Commit Graph

85 Commits

Author SHA1 Message Date
Daniel Lemire c13c2650a2
Merge pull request #940 from simdjson/issue938
Verifying (and fixing) issue 938
2020-06-18 18:25:31 -04:00
Daniel Lemire 04a19f9813 Fixes https://github.com/simdjson/simdjson/issues/937 2020-06-17 18:06:13 -04:00
Daniel Lemire 6537d0dc76 Avoiding the unused errors. 2020-06-17 14:19:58 +00:00
Daniel Lemire 8d609607e2 Verifying the bug. 2020-06-16 20:04:09 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, we have one thread doing the stage 1, while the main thread does stage 2. So if stage 1 and stage 2 take half the time, the parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

  This fixes our parse_stream benchmark which is just busted.
  This replaces the one-thread per batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of the thread create gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performances of the underlying system, since we create juste one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
Daniel Lemire 45e2178ada Duh. 2020-06-11 17:20:28 +00:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser fe01da077e Make threaded version work again 2020-06-07 16:21:00 -07:00
John Keiser c4a0fe1606 Add tests for parse_many() errors 2020-06-07 16:20:46 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
Daniel Lemire 12150baa5e
Using just ASCII. (#899)
* Using just ASCII.

* Let us prune checkperf.

* Moving the description of lookup2 to the HACKING.md file.
2020-05-21 21:59:06 -04:00
John Keiser 5312fd30e5 Fix CRT_SECURE warnings in clang 2020-05-04 11:36:00 -07:00
John Keiser 0e6ea76e88
Make checkperf work on Windows (#799)
* Make command line arguments work for Windows

* Run checkperf on Windows
2020-04-27 14:20:05 -04:00
John Keiser d4a37f6ef5 Enable conversion warnings on Linux and Windows 2020-04-22 14:21:30 -07:00
John Keiser ff09b6c824 Run fewer redundant steps and configs in CI 2020-04-17 12:23:05 -07:00
John Keiser 289cc3e7a0 Treat warnings as errors during compilation 2020-04-15 19:59:38 -07:00
Daniel Lemire 6d7c77ddc1
Let us try to check with the exceptions disabled. (#707)
* Tweaking code so that we can run all tests with exceptions off.
* Removing SIMDJSON_DISABLE_EXCEPTIONS
2020-04-15 16:45:36 -04:00
Daniel Lemire b523c43927
Can we provide a size() function to arrays and objects? (eager approach) [TO BE MERGED] (#690)
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
2020-04-15 10:15:48 -04:00
John Keiser b9ac0a79f1
Merge pull request #715 from simdjson/jkeiser/thorough-type-tests
Test more variants of cast, get, etc.
2020-04-14 16:08:36 -07:00
Daniel Lemire 8539896f3d
It is inconvenient to be unable to print a padded_string. (#713)
* It is inconvenient to be unable to print a padded_string.

* Allows us to print the padded_string even when it is embedded in result object when exceptions are enabled.
2020-04-14 19:07:32 -04:00
John Keiser a3b508ceff Test get<>(), exception vs. no exception, explicit vs. implicit cast 2020-04-14 13:18:42 -07:00
John Keiser 3dcc188d93 Add more tests to cmake 2020-04-08 14:52:56 -07:00
John Keiser 10b7556a37 Specify cmake tests, benchmarks and tools idiomatically 2020-04-08 14:52:56 -07:00
John Keiser 406240bae3 Support C++ 14 2020-04-08 14:00:13 -07:00
Daniel Lemire 5731c5437a
Sanity test. (#675) 2020-04-04 16:39:37 -04:00
Daniel Lemire 04f14ec026
This adds a test for std::ignore (#674) 2020-04-04 11:53:03 -04:00
John Keiser 13aee51011 Add element.type() for type switching 2020-04-02 14:07:19 -07:00
John Keiser 434776db1a Deprecate more things 2020-03-30 13:48:43 -07:00
John Keiser ea8a5020e2 Remove array indexer, make object indexer key lookup 2020-03-28 15:56:43 -07:00
John Keiser 622d9c9480 Replace as_X and is_X with get<T> and is<T> 2020-03-28 15:29:53 -07:00
John Keiser 62da98aef6 Rename dom::stream to dom::document_stream 2020-03-28 13:42:24 -07:00
John Keiser 03746b966b Move document/element/etc. under dom 2020-03-28 13:42:21 -07:00
John Keiser e836c28008 Deprecate parser error code methods
- Also make competitions compile without warnings
2020-03-28 10:13:20 -07:00
John Keiser 5ad405006c Return document::element from parse, load, parse_many, load_many 2020-03-27 12:24:41 -07:00
John Keiser c14b2fb36c Remove const char* variants for at_key()
- Remove const char * variants for at_key(), string_view covers them
- Add at_key_case_insensitive variants on *_result
- Add at(), at_key(), at_key_case_insensitive() tests
2020-03-27 09:09:08 -07:00
John Keiser f0f111b387 Make ParsedJson::Iterator backcompat test 2020-03-27 09:07:39 -07:00
Daniel Lemire 5fb149f833 Converted inter_tests... 2020-03-26 19:52:17 -04:00
Daniel Lemire abb0bf9247 Fixed basictests 2020-03-26 19:40:29 -04:00
John Keiser 2e420169c3 Remove document::parse and document::load 2020-03-26 10:13:09 -07:00
John Keiser 5aec2671ea Remove JsonStream. Use parse_many() instead. 2020-03-26 09:25:07 -07:00
John Keiser a0bce440a6 Remove document_iterator, document::iterator, ParsedJsonIterator
Keep ParsedJson::Iterator only, without template, in same form as
it was in 0.2
2020-03-25 18:26:51 -07:00
John Keiser e1b1500e3b Make _padded available without using namespace simdjson 2020-03-25 09:37:18 -07:00
John Keiser c34b1a1b2a Organize basic tests to make easier to turn on/off 2020-03-21 18:12:16 -07:00
John Keiser e4df0ca368 Add parse, parse_many, load, load_many tests 2020-03-21 18:12:16 -07:00
Daniel Lemire 5d1e3efce8
faster minifier (#568)
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.

Co-authored-by: John Keiser <john@johnkeiser.com>
2020-03-20 16:14:47 -04:00
John Keiser 7cf3a7511b Add fallback implementation to CI
- Also add SIMDJSON_IMPLEMENTATION_HASWELL/WESTMERE/ARM64/FALLBACK=1/0 to
enable/disable various implemnentations
2020-03-17 14:59:47 -07:00
John Keiser af203aaf86 Add fallback parser for pre-SSE4.2 machines 2020-03-17 14:59:47 -07:00
John Keiser 8e2c06cb0e Compile with -fno-exceptions 2020-03-17 13:54:37 -07:00