Commit Graph

422 Commits

Author SHA1 Message Date
John Keiser 041d59cc17 Create acceptance_tests, all_tests, etc. make targets
And use them for mingw build and test
2020-12-23 09:14:45 -08:00
John Keiser a405173d59 Break up ondemand_basictests into smaller executables
(Allows for faster compilation speeds)
2020-12-23 09:14:44 -08:00
John Keiser ad4f718e0c Make assert test succeed (not run anything) in Release builds 2020-12-21 10:52:01 -08:00
John Keiser a1cf588d5f Fix GCC 7 warning when inlining does its job 2020-12-21 09:19:07 -08:00
John Keiser 195acc3e45 Add find_field / find_field_unordered to object 2020-12-20 11:39:50 -08:00
John Keiser b8426584fc Make field lookup order-insensitive. 2020-12-19 13:57:29 -08:00
Daniel Lemire 85001c55fb
Fixing UTF-8 validation under PPC64 (#1346)
* Entering a new UTF-8 test

* Maybe *I* had a bug in the tests.

* Replacing nulls with 1s.

* Let us try to be more verbose.

* Return 0.

* Fixing issue.

* Adding puzzler scenario.

* Fixing PPC64

Co-authored-by: Daniel Lemire <dlemire@rcs-power9-talos>
2020-12-19 10:42:27 -05:00
John Keiser f785f76d98
Merge pull request #1342 from simdjson/jkeiser/iter-safety
Safety: assert in debug mode when on demand values are used out of order
2020-12-16 15:58:29 -08:00
John Keiser d670906ff3 Don't keep a separate at_start boolean in object 2020-12-15 11:29:31 -08:00
Daniel Lemire c578f63f34
Fixing issue 1337 (#1338)
* Fixing issue 1337

* This test should always run.

* Removing bad attribute access.
2020-12-14 16:20:36 -05:00
Daniel Lemire 9a404bdcdc
Reenabling the optimized kernels (main branch). (#1344)
* Reenabling the optimized kernels (main branch).

* Defining SIMDJSON_CAN_ALWAYS_RUN_PPC64 and SIMDJSON_CAN_ALWAYS_RUN_ARM64

* Adding the bad UTF8 string from the fuzzer.

* Taking into account John's comments.

* Bumping the lib version.

* Update CMakeLists.txt
2020-12-14 14:28:45 -05:00
John Keiser f9c6dedca1 Add test for out-of-order parse asserts 2020-12-13 13:39:47 -08:00
John Keiser 806cb39103 Clean up some more benchmark/test code 2020-12-07 13:44:58 -08:00
John Keiser 3aa3175378 Add tests for recovering from partial child iteration 2020-12-06 16:19:06 -08:00
John Keiser 4c63956624 Enable object["x"]["y"] 2020-12-06 15:23:52 -08:00
John Keiser c89647af9e Make all values use same underlying iterator 2020-12-06 15:23:52 -08:00
John Keiser 1303a88769 Unconditionally track depth 2020-12-06 15:23:52 -08:00
Daniel Lemire aaa12c8fb6
more thread testing (#1324)
* Adding a few tests.

* More tests.

* Trailing.

* More links/documentations.
2020-12-02 11:33:36 -05:00
Daniel Lemire 9304d88920
Prototype test for issue 1299: using parse_many, find the location of the end of the last document (#1301)
* Prototype test for issue 1299.

* This improves the documentation.

* Removing trailing white spaces.

* Removing trailing spaces

* Trailing.
2020-12-01 15:59:20 -05:00
Daniel Lemire d9c4191e8a
This should fix an issue we have with unclosed strings with document_stream (#1321)
* This should fix an issue we have with unclosed strings.

* Patching the fallback kernel.

* Better guarding the code.

* Patching the fallback.
2020-11-30 13:43:57 -05:00
Daniel Lemire dc69bc28ae
Trying to verify recent document stream issues. (#1318)
* Trying to verify recent document stream issues.

* Adding another one.

* More thorough tests.

* Removing trailing spaces.

* Working toward exposing some issues.

* Tweaking.
2020-11-27 17:04:10 -05:00
Daniel Lemire 1abb64f6f6
This fixes issue 1302 (#1303)
* This verifies issue 1302

* I believe that this fixes the issue.

* Let us do it differently.

* Better comments.
2020-11-27 14:19:36 -05:00
Daniel Lemire 5609851747
This will guard the batch_size so it cannot be so low as to accidentally burn through your CPU. (#1319)
* This will guard the batch_size so it cannot be so low as to accidentally burn through your CPU.
2020-11-27 09:37:54 -05:00
friendlyanon c805fc28a4
Remove git modules (#1258)
* Bump minimum CMake version

* Remove unnecessary git checks

* Move benchmark options where they are used

* Declare helper functions for dependencies

The custom solution here is tailored for fast configure times, but only
works for dependencies on Github.

* Import dependencies using the declared commands

* Remove git submodules

* Call target_link_libraries properly

target_link_libraries must not be called without a requirement
specifier.

* Fix includes for competition

Co-authored-by: friendlyanon <friendlyanon@users.noreply.github.com>
2020-11-04 13:34:29 -05:00
Paul Dreik af4db55e66
remove trailing whitespace (#1284) 2020-11-03 21:48:09 +01:00
Paul Dreik 47669566da
fix unintended early return in ondemand unit test loops (#1263)
fix uninteded early return in ondemand unit test loops

go through and fix warnings appearing in qtcreator,

qualify with std::, add const, abort on error

get rid of ulp_distance, not needed anymore when parsing is exact
2020-11-01 19:08:40 +01:00
Paul Dreik 0b82f07115
fix segfault in numberparsing #1273 (#1274)
This was a read overflow.
2020-11-01 18:27:21 +01:00
Paul Dreik f93fb21c95
optionally disable deprecated apis (#1271)
Introduce cmake option SIMDJSON_DISABLE_DEPRECATED_API (default Off)
which turns off deprecated simdjson api functions by setting the macro
 SIMDJSON_DISABLE_DEPRECATED_API.

For non-cmake users, users will have to set SIMDJSON_DISABLE_DEPRECATED_API
by some other means to disable the api.

Closes #1264
2020-11-01 06:38:52 +01:00
Daniel Lemire a8bf10ea5a Minor patch. 2020-10-30 14:51:50 -04:00
Daniel Lemire a75c07065f
Fix for issue 1246. We document the relationship between parser instances and elements (#1250)
* Fix for issue 1246.

* Adopting John's wording.
2020-10-26 08:40:45 -04:00
Daniel Lemire 14039d05a9
Adding a new benchmark for ondemand: distinct user id (#1239)
* Adding a distinct user id benchmark

* reenabling everything

* Removing an unnecessary "value()".

* Better tests of the examples and some fixes.

* Guarding exception code.
2020-10-23 08:47:01 -04:00
Daniel Lemire 61fb4244f2
There are a couple of extra value() that should not be there (#1242) 2020-10-21 12:37:22 -04:00
Daniel Lemire 0d6919dd99
Reenable the on-demand tests and allows us to convert a raw string into a C++ string. (#1232)
* Reenable the on-demand tests and allows us to convert a raw string into a C++ string.

* Fixing a 1-byte buffer overrun.

* More documentation.

* Adding more tests.

* Enabling the new tests

* Committing a nicer example.

* Not yet happy but this should fix our failures.

* Duh.

* Ok. Making it easier to get string_view instances from field instances.

* It is a struct.

* Trying to satisfy VS.

* Adopting John's name.
2020-10-19 20:22:24 -04:00
Paul Dreik f1b4a54991
add fuzz element (#1204)
* add definitions for is_number and tie (by lemire)
* add fuzzer for element
* update fuzz documentation
* fix UB in creating an empty padded string
* don't bother null terminating padded_string, it is done by the std::memset already
*  refactor fuzz data splitting into a separate class
2020-10-17 05:48:50 +02:00
Daniel Lemire 07a6e098c8
This would allow users to find out what builtin is. (#1227)
* This would allow users to find out what builtin is.

* Trying another approach.

* Added instructions.

* Cleaning up the printout.

* Let us be less invasive.

* Adding a comment.
2020-10-15 21:58:42 -04:00
Daniel Lemire 3cd98df30d
This adds new tests regarding ordering. (#1233)
* This adds new tests regarding ordering.

* Updating the documentation with more examples.

* Adding compilation tests.

* Pruning code for exceptions.

* Guarding exceptionless.
2020-10-15 16:41:14 -04:00
Daniel Lemire bb2bc98a22
Fix issue https://github.com/simdjson/simdjson/issues/1127 (#1224) 2020-10-13 09:18:54 -04:00
Daniel Lemire 37e6d1e9c7
new number parsing (#1222)
* Remove our dependency on strtod_l by bundling our own slow path.

* Ok. Let us drop strtod entirely.

* Trimming down the powers to -342.

* Removing useless line.

* Many more comments.

* Adding some DLL exports.

* Let the gods help those who rely on windows+gcc.

* Marking the subnormals as unlikely. This is pretty much "performance neutral", but it might help just a bit with twitter.json.
2020-10-10 12:47:49 -04:00
John Keiser 676a3d068c Add non-top-level array iteration test 2020-10-06 11:29:46 -07:00
John Keiser 5533f8d87b Add object iteration error tests 2020-10-06 11:29:46 -07:00
John Keiser 93af7b61ce Add array iteration error tests 2020-10-06 11:29:46 -07:00
John Keiser 364ad5529d Add basic error tests 2020-10-06 11:29:46 -07:00
John Keiser 1974a46fe0 Remove validate tests and unimplemented cast tests from ondemand 2020-10-06 11:29:46 -07:00
John Keiser 5327ab9903 Remove ondemand minify and format tests 2020-10-06 11:29:46 -07:00
John Keiser 235d191bae Add Twitter exception tests 2020-10-06 11:29:46 -07:00
John Keiser 4eb80ec75a Add DOM API exception tests 2020-10-06 11:29:45 -07:00
John Keiser 2900459222 Separate twitter tests from DOM API tests 2020-10-06 11:29:45 -07:00
John Keiser 9088792b0e Disable value["x"] (unsafe, convert to object first) 2020-10-06 11:29:45 -07:00
John Keiser b41fe7beab Add object indexing tests 2020-10-06 11:29:45 -07:00
John Keiser 00f9bb8a07 Add null tests 2020-10-06 11:29:45 -07:00
John Keiser a90e1637cb Add boolean tests 2020-10-06 11:29:45 -07:00
John Keiser 4bad5c0241 Add numeric value tests 2020-10-06 11:29:45 -07:00
John Keiser ce09d82fc7 Test strings 2020-10-06 11:29:45 -07:00
John Keiser 5b926b8196 Support array iteration over document 2020-10-06 11:29:45 -07:00
John Keiser c719ccdb48 Add tests for empty object/array 2020-10-06 11:29:45 -07:00
John Keiser 9f1786aeb1 Add .get(T) to value/document 2020-10-06 11:29:45 -07:00
John Keiser 0bb83e06bc Fix root number parsing 2020-10-06 11:29:45 -07:00
John Keiser 99bc591366 Don't enable fallback test unless fallback is explicitly selected 2020-10-06 11:29:45 -07:00
John Keiser 8b3c8820e0 Fix numberparsingcheck to define found_invalid_number before simdjson.h 2020-10-06 11:29:39 -07:00
John Keiser ebcb3c6b3b On-demand parse implementation 2020-10-04 12:47:29 -07:00
Daniel Lemire f1841e48b3
Minor fixes to some headers (tweak) (#1198) 2020-10-02 12:29:05 -04:00
Daniel Lemire 9865bb6904
Make it possible to check that an implementation is supported at runtime (#1197)
* Make it possible to check that an implementation is supported at runtime.

* add CI fuzzing on arm 64 bit

This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* Make it possible to check that an implementation is supported at runtime.

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* We need to accomodate cxxopts

Co-authored-by: Paul Dreik <github@pauldreik.se>
2020-10-02 11:04:51 -04:00
Daniel Lemire 8b5a89c136
Parsing floats with 19 significant digits should be fine. (#1191)
* Parsing floats with 19 significant digits should be fine.

* Adding more tests with very long mantissa.
2020-09-29 19:42:43 -04:00
Daniel Lemire da093c1982
Fixing "undefined behavior" issue in new fast_itoa functions (#1186)
* Fixing "undefined behavior" issue.

* Simplifying our custom atoi

* Fixing minor bug
2020-09-29 19:17:03 -04:00
Daniel Lemire 048fb6278a
This adds two tests to verify a new fuzzer issue. (So far I could not verify.) (#1194) 2020-09-29 11:45:41 -04:00
Daniel Lemire 0e584fa4a5
Attempt to fix issue 1187. (#1192) 2020-09-27 12:04:47 -04:00
Daniel Lemire 60c139a844
Faster and more correct serialization (#1168)
* Adding new files.

* Better.

* Fixing minifier and adding tests.

* Adding benchmarks.

* Including the array header.

* Replacing old stream-based code by the new code.

* Doubling up the itoa.

* Hidden away to_chars in internal namespace.

* Removing the repetitions.

* Documented the atoi functions.

* Tuning the escape sequences.

* Moving the operators off the main namespace.

* Added more tests.

* Tweaking the implementation so that it works with and without exp.

* The string_builder template and mini_formatter class
 are not part of  our public API and are subject to change
 at any time!

* Adding a benchmark and some optimization.

* Cleaning.

* Strictly speaking, this header is needed.
2020-09-23 10:00:39 -04:00
Daniel Lemire f410213003
Improve documentation on padding
- Improves and clarifies the documentation on padding.
 - Use std:: prefix for memcpy, strlen etc.

Related to issues #1175 and #1178
2020-09-23 09:07:14 +02:00
Daniel Lemire 19cb5d57db
Some minor documentation fixes. (#1177) 2020-09-17 13:17:35 -04:00
Daniel Lemire 72c83d9430
This avoids locale-dependent number parsing at the standard library level (#1157)
* This avoids locale-dependent number parsing at the standard library level.

* Adding missing cast.

* Inserting the missing "endif"

* Trial and error.

* Another attempt.

* Another tweak.

* Another fix.

* Restricting it even more.

* Tweaking our symbol checks.

* Somewhat smarter tests.

* Nice comments.

* Minor simplification.

* Adding cerr.
2020-09-15 11:36:18 -04:00
Daniel Lemire bfbac12f76
We were forgetting to check the end bytes at the end of the UTF8 validation. (#1173)
* We were forgetting to check the end bytes at the end of the UTF8 validation.

* Silencing the sanitizer

* Better explanation.
2020-09-15 11:33:09 -04:00
Daniel Lemire 3e5497e2f9
Fixes issue 1170 and makes the usage of minify easier. (#1171)
* Fixes issue 1170 and makes the usage of minify easier.

* This should get the fallback implementation to detect unclosed strings.
2020-09-12 16:20:20 -04:00
Daniel Lemire 0552335ec1
Fixing the issue. (#1151) 2020-09-02 18:41:59 -04:00
Daniel Lemire 7aea774b21
Adding a tests and a fix for empty strings in at_pointer (#1148)
* Adding a test.

* More tests.
2020-09-02 17:04:56 -04:00
Daniel Lemire 5b10c38e43
Make parse_many safer. (#1137) 2020-08-20 22:22:46 -04:00
Daniel Lemire 3316df9195
Adding test for issue 1133 and improving documentation (#1134)
* Adding test.

* Saving.

* With exceptions.

* Added extensive tests.

* Better documentation.

* Tweaking CI

* Cleaning.

* Do not assume make.

* Let us make the build verbose

* Reorg

* I do not understand how circle ci works.

* Breaking it up.

* Better syntax.
2020-08-20 14:03:14 -04:00
Daniel Lemire 8a8eea53a2
Prefixing macros (issue 1035) (#1124)
* Renaming partially done.

* More prefixing.

* I thought that this was fixed.

* Missed one.

* Missed a few.

* Missed another one.

* Minor fixes.
2020-08-18 18:25:36 -04:00
Daniel Lemire 09bd7e8ef8
Verification and fix for issue 1063 (JSON Pointers) (#1064)
* Specification is not followed.

* Fixes.

* Do not pass string_view by reference.

* Better documentation.

* The example is written for exceptions.

* Better documentation.

* Updating with deprecation.

* Updating example.

* Updating example.
2020-08-18 17:23:18 -04:00
John Keiser 9356619380
Merge pull request #1110 from simdjson/jkeiser/number-corruption
Fix potential buffer overrun with heavily customized input and padding
2020-08-18 14:17:25 -07:00
Daniel Lemire fc15147cf5
This allows the users to disable threading. (#1122)
* This allows the users to disable threading.

* This would disable bash scripts under FreeBSD. (#1118)

* This would disable bash scripts under FreeBSD.

* Let us also disable GIT.

* Let us try to just disable GIT

* Nope. We must have both bash and git disabled.

* This allows the users to disable threading.
2020-08-18 16:43:08 -04:00
John Keiser fa355603fb Add test for corruption while parsing a number 2020-08-18 10:10:01 -07:00
Daniel Lemire 4a6eebc0e4
This corrects a small typo in the documentation. (#1121)
* This corrects a small typo in the documentation.

* Modifying the test as well.
2020-08-18 08:36:15 -04:00
Daniel Lemire 501fed6c4f
This would disable bash scripts under FreeBSD. (#1118)
* This would disable bash scripts under FreeBSD.

* Let us also disable GIT.

* Let us try to just disable GIT

* Nope. We must have both bash and git disabled.
2020-08-17 11:50:57 -04:00
Daniel Lemire daeca1bb18
Basics. (#1116) 2020-08-14 17:28:09 -04:00
Daniel Lemire 83615ff351
Fixes issue 1088 (#1096) 2020-08-06 11:42:13 -04:00
Daniel Lemire 039d82ff1b
Returning basictests to its original function: basic tests (only) (#1010)
* The initial motivation behind basictests was for a quick set of sanity tests to check whether your code made sense. It
was not meant for thorough testing to find corner cases. However, over time, it grew to include such expensive tests.
This PR takes them out. It also allows us to bring back basictests to MinGW tests, since it is now cheap.

This is not an exercise in software engineering and making things prettier. This is a pragmatic change to improve our
test coverage and quality of life.

* Adds many more cheap tests.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-07-13 09:39:35 -04:00
Daniel Lemire 74870a8189
Fixing issue 1013. (#1016)
* Fixing issue 1013.

* Bumping to 0.4.6

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-07-01 14:14:51 -04:00
Daniel Lemire 0ef4d90ad0
Fix for issue 1014. (#1015)
* Fix for issue 1014.

* Explanation.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-30 19:36:26 -04:00
Daniel Lemire ccc94c9b05
Mingw tests (32-bit and 64-bit) (#1004) 2020-06-29 21:10:54 -04:00
Daniel Lemire cb8a9ef2c0 This removes git as a dependency 2020-06-24 15:13:47 -04:00
John Keiser 187084ce46
Merge pull request #970 from simdjson/jkeiser/singleheader-tests
Make singleheader tests be test-only
2020-06-23 17:07:03 -07:00
Daniel Lemire 544fa57641 Damn merge conflicts. 2020-06-23 19:15:47 -04:00
John Keiser 843b73dedb Make singleheader tests be test-only 2020-06-23 13:35:27 -07:00
Daniel Lemire b84a3a0230
Merge branch 'master' into issue961 2020-06-23 14:33:06 -04:00
John Keiser 257089884f
Merge pull request #958 from simdjson/jkeiser/is
Make simdjson_result<element>.is() return bool
2020-06-23 09:51:37 -07:00
John Keiser c650ea9765
Merge pull request #960 from simdjson/jkeiser/idiomatic-get
Convert simdjson to use .get()
2020-06-23 09:49:41 -07:00
John Keiser 2d84b6f6d9 Make simdjson_result<element>.is() return bool 2020-06-23 09:09:24 -07:00
John Keiser eef1171944
Merge pull request #954 from simdjson/jkeiser/parse-many-result
Return error from parse_many
2020-06-23 09:06:20 -07:00
Daniel Lemire 696b0e29e4 Fixing issue 961 2020-06-23 10:47:32 -04:00
Daniel Lemire dada5090b0 These compilers are insane. 2020-06-22 20:25:55 -04:00
Daniel Lemire 1c4593c648 These compilers are really pedantic. 2020-06-22 20:04:37 -04:00
Daniel Lemire e7004cef76 Removing a test so that it is all ASCII. 2020-06-22 16:55:16 -04:00
Daniel Lemire 2bb101bd19 Code reformatting. 2020-06-22 16:50:57 -04:00
Daniel Lemire 26baf70912 Pedantic compiler 2020-06-22 16:45:32 -04:00
Daniel Lemire 69a247d500 Adding tests. 2020-06-22 16:12:37 -04:00
Daniel Lemire a76c67c19f Fixing... 2020-06-22 15:57:54 -04:00
John Keiser 0c9dc11550 Use really_inline to help g++ detect initialized variable 2020-06-21 16:27:05 -07:00
John Keiser 1ff55c2729 Replace auto [x,error] with .get() everywhere 2020-06-21 16:26:59 -07:00
Daniel Lemire 38bb08778a With an example. 2020-06-21 17:57:22 -04:00
Daniel Lemire 5dbcdf1484 Ok 2020-06-21 17:52:30 -04:00
John Keiser 6fa5abcd7e Replace x.get<T>() with x.get(v) or T(x) 2020-06-21 14:36:38 -07:00
John Keiser 1b1a122b1f Fix copy constructor issue on older gcc 2020-06-21 12:06:14 -07:00
John Keiser ae1bd891e7 Remove deprecated uses of parse_many 2020-06-21 11:19:06 -07:00
John Keiser 9899e5021d Allow use of document_stream with tie() 2020-06-20 21:15:05 -07:00
John Keiser a7fc7d4ffb Switch from get(v,e) to e = get(v) 2020-06-20 17:57:09 -07:00
John Keiser f336103f63 Convert tools/docs/benchmarks to bool get() idiom 2020-06-20 17:55:46 -07:00
John Keiser 56e2b38048 Add bool result from tie()/get(), get<T>(T&,error_code&) 2020-06-20 17:55:46 -07:00
John Keiser 0b8c357eff Add get_X and is_X methods 2020-06-19 13:27:33 -07:00
John Keiser efc168f473 Make test changes only 2020-06-19 13:27:33 -07:00
John Keiser d8428f98d9 Add cast_tester.h 2020-06-19 13:27:33 -07:00
John Keiser 60f17d26a3 Move test macros to a header 2020-06-19 13:27:00 -07:00
Daniel Lemire 5ccdbef7d5
Merge pull request #936 from simdjson/dlemire/new_examples
New examples.
2020-06-18 18:29:06 -04:00
Daniel Lemire c13c2650a2
Merge pull request #940 from simdjson/issue938
Verifying (and fixing) issue 938
2020-06-18 18:25:31 -04:00
John Keiser f632e7c043 Put C++11 capable version back, change name to readme style 2020-06-18 12:50:49 -07:00
Daniel Lemire 04a19f9813 Fixes https://github.com/simdjson/simdjson/issues/937 2020-06-17 18:06:13 -04:00
Daniel Lemire 0655a135e6 Reverting. 2020-06-17 17:52:07 +00:00
Daniel Lemire 4474f8ef18 Cleaning a bit the examples. 2020-06-17 16:24:55 +00:00
Daniel Lemire 6537d0dc76 Avoiding the unused errors. 2020-06-17 14:19:58 +00:00
Daniel Lemire 8d609607e2 Verifying the bug. 2020-06-16 20:04:09 -04:00
Daniel Lemire 27a75a9085 Tweaking. 2020-06-15 17:54:34 -04:00
Daniel Lemire 954d6c326d New examples. 2020-06-15 17:45:15 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 89b059b1ea
Testing with GCC 10 and clang 10 (#926)
* Testing with GCC 10 and clang 10

* Fixing spurious space

* gcc10 does not need the cmake installation.

* We don't want to run the perf test on ARM. I ignore them systematically. ARM performance
should be assessed manually.

* Switching to GCC 10 and Clang 10

* Disabling some tests under sanitizers when they involve rapidjson or other parsers.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 17:58:53 -04:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, we have one thread doing the stage 1, while the main thread does stage 2. So if stage 1 and stage 2 take half the time, the parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

  This fixes our parse_stream benchmark which is just busted.
  This replaces the one-thread per batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of the thread create gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performances of the underlying system, since we create juste one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
Daniel Lemire 45e2178ada Duh. 2020-06-11 17:20:28 +00:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser fe01da077e Make threaded version work again 2020-06-07 16:21:00 -07:00
John Keiser 3e226795f0 Run all passing json against parse_many. Empty documents pass, too. 2020-06-07 16:20:51 -07:00
John Keiser c4a0fe1606 Add tests for parse_many() errors 2020-06-07 16:20:46 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
Daniel Lemire 12150baa5e
Using just ASCII. (#899)
* Using just ASCII.

* Let us prune checkperf.

* Moving the description of lookup2 to the HACKING.md file.
2020-05-21 21:59:06 -04:00
Daniel Lemire d2c9ea8a9a
Detect bash instead of relying on MSVC detection. (#894) 2020-05-20 12:13:14 -04:00
John Keiser 5312fd30e5 Fix CRT_SECURE warnings in clang 2020-05-04 11:36:00 -07:00
John Keiser 1d06624d38 Unset /D_CRT_SECURE_NO_WARNINGS
- Also localize DISABLE_DEPRECATED_WARNING so that we catch other
  deprecations
2020-05-04 11:35:05 -07:00
Furkan Usta 064eb0b24f CMake: Make simdjson-internal-flags subsume simdjson-flags 2020-05-03 02:48:29 +03:00
Furkan Usta af968c5b44 Merge branch 'master' of github.com:simdjson/simdjson into cmake-flags 2020-05-03 02:12:23 +03:00
Furkan Usta 1e9488d4a6 Remove Microsoft comment regarding dirent in parsingchecks 2020-05-02 16:01:30 +03:00
Furkan Usta ff1d77ead9 Add NOMINMAX to parsingchecks 2020-05-02 15:33:53 +03:00