Commit Graph

237 Commits

Author SHA1 Message Date
Daniel Lemire 039d82ff1b
Returning basictests to its original function: basic tests (only) (#1010)
* The initial motivation behind basictests was to provide a quick set of sanity tests to check whether your code made sense. It
was not meant for thorough testing to find corner cases. However, over time, it grew to include expensive tests of that kind.
This PR takes them out. It also allows us to bring basictests back to the MinGW tests, since it is now cheap.

This is not an exercise in software engineering and making things prettier. This is a pragmatic change to improve our
test coverage and quality of life.

* Adds many more cheap tests.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-07-13 09:39:35 -04:00
Daniel Lemire 74870a8189
Fixing issue 1013. (#1016)
* Fixing issue 1013.

* Bumping to 0.4.6

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-07-01 14:14:51 -04:00
Daniel Lemire 0ef4d90ad0
Fix for issue 1014. (#1015)
* Fix for issue 1014.

* Explanation.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-30 19:36:26 -04:00
Daniel Lemire ccc94c9b05
Mingw tests (32-bit and 64-bit) (#1004) 2020-06-29 21:10:54 -04:00
Daniel Lemire cb8a9ef2c0 This removes git as a dependency 2020-06-24 15:13:47 -04:00
John Keiser 187084ce46
Merge pull request #970 from simdjson/jkeiser/singleheader-tests
Make singleheader tests be test-only
2020-06-23 17:07:03 -07:00
Daniel Lemire 544fa57641 Damn merge conflicts. 2020-06-23 19:15:47 -04:00
John Keiser 843b73dedb Make singleheader tests be test-only 2020-06-23 13:35:27 -07:00
Daniel Lemire b84a3a0230
Merge branch 'master' into issue961 2020-06-23 14:33:06 -04:00
John Keiser 257089884f
Merge pull request #958 from simdjson/jkeiser/is
Make simdjson_result<element>.is() return bool
2020-06-23 09:51:37 -07:00
John Keiser c650ea9765
Merge pull request #960 from simdjson/jkeiser/idiomatic-get
Convert simdjson to use .get()
2020-06-23 09:49:41 -07:00
John Keiser 2d84b6f6d9 Make simdjson_result<element>.is() return bool 2020-06-23 09:09:24 -07:00
John Keiser eef1171944
Merge pull request #954 from simdjson/jkeiser/parse-many-result
Return error from parse_many
2020-06-23 09:06:20 -07:00
Daniel Lemire 696b0e29e4 Fixing issue 961 2020-06-23 10:47:32 -04:00
Daniel Lemire dada5090b0 These compilers are insane. 2020-06-22 20:25:55 -04:00
Daniel Lemire 1c4593c648 These compilers are really pedantic. 2020-06-22 20:04:37 -04:00
Daniel Lemire e7004cef76 Removing a test so that it is all ASCII. 2020-06-22 16:55:16 -04:00
Daniel Lemire 2bb101bd19 Code reformatting. 2020-06-22 16:50:57 -04:00
Daniel Lemire 26baf70912 Pedantic compiler 2020-06-22 16:45:32 -04:00
Daniel Lemire 69a247d500 Adding tests. 2020-06-22 16:12:37 -04:00
Daniel Lemire a76c67c19f Fixing... 2020-06-22 15:57:54 -04:00
John Keiser 0c9dc11550 Use really_inline to help g++ detect initialized variable 2020-06-21 16:27:05 -07:00
John Keiser 1ff55c2729 Replace auto [x,error] with .get() everywhere 2020-06-21 16:26:59 -07:00
Daniel Lemire 38bb08778a With an example. 2020-06-21 17:57:22 -04:00
Daniel Lemire 5dbcdf1484 Ok 2020-06-21 17:52:30 -04:00
John Keiser 6fa5abcd7e Replace x.get<T>() with x.get(v) or T(x) 2020-06-21 14:36:38 -07:00
John Keiser 1b1a122b1f Fix copy constructor issue on older gcc 2020-06-21 12:06:14 -07:00
John Keiser ae1bd891e7 Remove deprecated uses of parse_many 2020-06-21 11:19:06 -07:00
John Keiser 9899e5021d Allow use of document_stream with tie() 2020-06-20 21:15:05 -07:00
John Keiser a7fc7d4ffb Switch from get(v,e) to e = get(v) 2020-06-20 17:57:09 -07:00
John Keiser f336103f63 Convert tools/docs/benchmarks to bool get() idiom 2020-06-20 17:55:46 -07:00
John Keiser 56e2b38048 Add bool result from tie()/get(), get<T>(T&,error_code&) 2020-06-20 17:55:46 -07:00
John Keiser 0b8c357eff Add get_X and is_X methods 2020-06-19 13:27:33 -07:00
John Keiser efc168f473 Make test changes only 2020-06-19 13:27:33 -07:00
John Keiser d8428f98d9 Add cast_tester.h 2020-06-19 13:27:33 -07:00
John Keiser 60f17d26a3 Move test macros to a header 2020-06-19 13:27:00 -07:00
Daniel Lemire 5ccdbef7d5
Merge pull request #936 from simdjson/dlemire/new_examples
New examples.
2020-06-18 18:29:06 -04:00
Daniel Lemire c13c2650a2
Merge pull request #940 from simdjson/issue938
Verifying (and fixing) issue 938
2020-06-18 18:25:31 -04:00
John Keiser f632e7c043 Put C++11 capable version back, change name to readme style 2020-06-18 12:50:49 -07:00
Daniel Lemire 04a19f9813 Fixes https://github.com/simdjson/simdjson/issues/937 2020-06-17 18:06:13 -04:00
Daniel Lemire 0655a135e6 Reverting. 2020-06-17 17:52:07 +00:00
Daniel Lemire 4474f8ef18 Cleaning a bit the examples. 2020-06-17 16:24:55 +00:00
Daniel Lemire 6537d0dc76 Avoiding the unused errors. 2020-06-17 14:19:58 +00:00
Daniel Lemire 8d609607e2 Verifying the bug. 2020-06-16 20:04:09 -04:00
Daniel Lemire 27a75a9085 Tweaking. 2020-06-15 17:54:34 -04:00
Daniel Lemire 954d6c326d New examples. 2020-06-15 17:45:15 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 89b059b1ea
Testing with GCC 10 and clang 10 (#926)
* Testing with GCC 10 and clang 10

* Fixing spurious space

* gcc10 does not need the cmake installation.

* We don't want to run the perf tests on ARM. I ignore them systematically. ARM performance
should be assessed manually.

* Switching to GCC 10 and Clang 10

* Disabling some tests under sanitizers when they involve rapidjson or other parsers.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 17:58:53 -04:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, one thread does stage 1 while the main thread does stage 2. So if stage 1 and stage 2 each took half the time, parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive, so our approach only works well over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

This fixes our parse_stream benchmark, which is just busted.
This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. It should be expected that the new code will depend less drastically on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
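The worker approach described in commit 4dfbf98e4e (#920) can be illustrated with a rough sketch. This is not the simdjson implementation: the names batch_worker, process_batches, stage1 and stage2 are hypothetical, and the two stages are stand-in arithmetic functions. It only shows the general idea, assuming one long-lived thread that runs stage 1 for the next batch while the calling thread runs stage 2 on the current one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: one thread is created once and reused for every batch.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand one job to the worker. Call wait() before submitting another.
  void submit(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!has_job_); // at most one job in flight in this sketch
      job_ = std::move(job);
      has_job_ = true;
    }
    cv_.notify_all();
  }

  // Block until the submitted job (if any) has finished.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return has_job_ || stop_; });
      if (!has_job_) { return; } // stop requested and nothing pending
      std::function<void()> job = std::move(job_);
      lock.unlock();
      job(); // run the job without holding the lock
      lock.lock();
      has_job_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last so the other members exist before it starts
};

// Placeholder stages (assumptions for illustration only).
inline int stage1(int batch) { return batch * 2; }
inline int stage2(int staged) { return staged + 1; }

// Stage 1 of batch i+1 runs on the worker while the caller runs stage 2 on batch i.
inline std::vector<int> process_batches(const std::vector<int> &batches) {
  std::vector<int> out;
  if (batches.empty()) { return out; }
  batch_worker worker;
  int staged = stage1(batches[0]); // the first batch has nothing to overlap with
  int staged_next = 0;
  for (std::size_t i = 0; i < batches.size(); i++) {
    const bool has_next = i + 1 < batches.size();
    if (has_next) {
      worker.submit([&, i] { staged_next = stage1(batches[i + 1]); });
    }
    out.push_back(stage2(staged));
    if (has_next) {
      worker.wait();
      staged = staged_next;
    }
  }
  return out;
}
```

Calling process_batches on a list of batches starts exactly one thread regardless of how many batches there are, which is the property the commit message relies on when it says small batches become worthwhile.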