Commit Graph

1606 Commits

Author SHA1 Message Date
John Keiser 05bc664c11 Don't extend from tape_ref in public classes 2020-06-19 13:25:52 -07:00
Daniel Lemire 2cc84b6e51
Merge pull request #943 from simdjson/dlemire/fewer_perf_tests
Fewer performance tests.
2020-06-18 22:04:50 -04:00
Daniel Lemire 5ccdbef7d5
Merge pull request #936 from simdjson/dlemire/new_examples
New examples.
2020-06-18 18:29:06 -04:00
Daniel Lemire c13c2650a2
Merge pull request #940 from simdjson/issue938
Verifying (and fixing) issue 938
2020-06-18 18:25:31 -04:00
Daniel Lemire ec6c998a3a
Merge pull request #942 from simdjson/dlemire/better_doxygen_home_page
Tweaks doxygen so that we have a better main page.
2020-06-18 18:25:04 -04:00
Daniel Lemire 2f6091419f
Merge pull request #944 from simdjson/issue680
Document the complexity of array.at
2020-06-18 18:24:08 -04:00
Daniel Lemire 2022dd7d74
Merge pull request #945 from simdjson/issue678
Fixing issue 678
2020-06-18 18:23:56 -04:00
Daniel Lemire b8202dab3b
Merge pull request #946 from simdjson/issue937
Fixing issue 937
2020-06-18 18:20:44 -04:00
Daniel Lemire ef688a74fe Minor tweak to the documentation. 2020-06-18 18:18:12 -04:00
John Keiser f632e7c043 Put C++11 capable version back, change name to readme style 2020-06-18 12:50:49 -07:00
Daniel Lemire 04a19f9813 Fixes https://github.com/simdjson/simdjson/issues/937 2020-06-17 18:06:13 -04:00
Daniel Lemire 2cbc591c9d Fixing issue 678 2020-06-17 16:17:17 -04:00
Daniel Lemire 3f00e79bcb
Merge branch 'master' into dlemire/better_doxygen_home_page 2020-06-17 16:02:49 -04:00
Daniel Lemire 3586fc4910 Fix for issue 680 2020-06-17 18:49:22 +00:00
Daniel Lemire c9a6bbeb64
Merge pull request #935 from simdjson/dlemire/tuning_the_documentation
Some tweaks to the documentation
2020-06-17 14:33:23 -04:00
Daniel Lemire 0655a135e6 Reverting. 2020-06-17 17:52:07 +00:00
Daniel Lemire d3e8bb1889 Fewer performance tests. 2020-06-17 17:44:28 +00:00
Daniel Lemire 14ceacac73 Tweaking. 2020-06-17 13:27:17 -04:00
Daniel Lemire e4f33b5970 Tweaking the message. 2020-06-17 12:36:36 -04:00
Daniel Lemire 4474f8ef18 Cleaning a bit the examples. 2020-06-17 16:24:55 +00:00
John Keiser 76c9f4f5a6
Merge pull request #941 from simdjson/jkeiser/forgot
Remove unnecessary functions
2020-06-17 09:09:28 -07:00
Daniel Lemire 942ef3b7f2
Merge pull request #939 from simdjson/dlemire/lookup3
Introducing lookup3 (UTF-8 validation).
2020-06-17 11:19:09 -04:00
Daniel Lemire 0b9df6d8c4 It turns out that we need fairly complicated logic. 2020-06-17 15:17:10 +00:00
Daniel Lemire b5ea504ad2 Tweaks doxygen so that we have a better main page. 2020-06-17 11:07:21 -04:00
Daniel Lemire 803b0c4bdb Light touch. 2020-06-17 11:00:13 -04:00
Daniel Lemire 6537d0dc76 Avoiding the unused errors. 2020-06-17 14:19:58 +00:00
John Keiser f8f36c085c Remove unnecessary functions 2020-06-17 07:11:53 -07:00
John Keiser 7339f67dd7
Merge pull request #462 from simdjson/jkeiser/if-backslash
Wrap backslash processing in a branch
2020-06-17 07:07:58 -07:00
Daniel Lemire 0d4e501239 Fixing the bug. 2020-06-17 10:06:16 -04:00
Daniel Lemire 8d609607e2 Verifying the bug. 2020-06-16 20:04:09 -04:00
Daniel Lemire 71a889ed73 Introducing lookup3 (UTF-8 validation). 2020-06-16 19:08:25 -04:00
Daniel Lemire 27a75a9085 Tweaking. 2020-06-15 17:54:34 -04:00
Daniel Lemire 954d6c326d New examples. 2020-06-15 17:45:15 -04:00
Daniel Lemire 7ea05d038e
New API traversal tests. (#931)
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-15 13:15:52 -04:00
Daniel Lemire 33930ff046 Adding link. 2020-06-15 13:07:53 -04:00
Daniel Lemire 16f41ea059 Added a word. 2020-06-14 18:48:42 -04:00
Daniel Lemire 0a7270fc29 More tweaks. 2020-06-14 18:47:22 -04:00
Daniel Lemire 23fbd9d004 Some tweaks. 2020-06-14 18:28:09 -04:00
John Keiser 610c79fbf3 Don't use backslash branch on ARM 2020-06-13 07:51:28 -07:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 89b059b1ea
Testing with GCC 10 and clang 10 (#926)
* Testing with GCC 10 and clang 10

* Fixing spurious space

* gcc10 does not need the cmake installation.

* We don't want to run the perf test on ARM. I ignore them systematically. ARM performance
should be assessed manually.

* Switching to GCC 10 and Clang 10

* Disabling some tests under sanitizers when they involve rapidjson or other parsers.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 17:58:53 -04:00
Daniel Lemire bd2d0f769f
One unlikely too many (#930) 2020-06-12 17:58:10 -04:00
Daniel Lemire d830422489
Put back the amalgamation files and add tests (#929)
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 17:57:45 -04:00
Daniel Lemire d1a54249e7 New API traversal tests. 2020-06-12 17:42:57 -04:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, one thread runs stage 1 while the main thread runs stage 2. If stage 1 and stage 2 each took half the time, parse_many could run at twice the speed. That is unlikely in practice. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

* This fixes our parse_stream benchmark, which is just busted.
* This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.

This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.
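
The worker idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `batch_worker`, `process_batches`, and the two stand-in "stages" (stage 1 computes a batch's length, stage 2 adds it to a total) are hypothetical names invented for this example. It only shows the pattern of one long-lived thread doing stage 1 for the next batch while the main thread does stage 2 for the current one.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: a single long-lived thread that runs one job at a time,
// so no thread is created per batch.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand a job to the worker thread. The caller is expected to call wait()
  // before submitting another job.
  void submit(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      job_ = std::move(job);
      has_job_ = true;
    }
    cv_.notify_all();
  }

  // Block until the most recently submitted job (if any) has finished.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return has_job_ || stop_; });
      if (stop_ && !has_job_) { return; }
      std::function<void()> job = std::move(job_);
      lock.unlock();
      job(); // run the job without holding the lock
      lock.lock();
      has_job_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last so the members above exist before it starts
};

// Stand-in pipeline: "stage 1" on batch i+1 runs on the worker while
// "stage 2" on batch i runs on the calling thread.
std::size_t process_batches(const std::vector<std::string> &batches) {
  if (batches.empty()) { return 0; }
  std::size_t total = 0;
  std::size_t next_index = 0; // written by the worker, read after wait()
  batch_worker worker;
  worker.submit([&] { next_index = batches[0].size(); });
  for (std::size_t i = 0; i < batches.size(); i++) {
    worker.wait();
    const std::size_t current_index = next_index;
    if (i + 1 < batches.size()) {
      worker.submit([&, i] { next_index = batches[i + 1].size(); });
    }
    total += current_index; // "stage 2" overlaps with the worker's "stage 1"
  }
  return total;
}
```

In this sketch the thread is created once in the constructor and joined once in the destructor, so the per-batch cost is a lock and a condition-variable notification rather than a thread creation. On most toolchains it needs to be built with thread support enabled (for example `-pthread` with GCC or Clang).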

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
Daniel Lemire 1b6258ec8c Added std::minify 2020-06-12 16:37:41 -04:00
Daniel Lemire be707dbb6f Added a remark 2020-06-12 16:07:34 -04:00
John Keiser 664b03bb13 Short circuit find escapes if there is a backslash 2020-06-12 10:10:35 -07:00
John Keiser 7c6723d912 Print progress bar even if there is only one file 2020-06-12 10:01:19 -07:00