* Entering a new UTF-8 test
* Maybe *I* had a bug in the tests.
* Replacing nulls with 1s.
* Let us try to be more verbose.
* Return 0.
* Fixing issue.
* Adding puzzler scenario.
* Fixing PPC64
Co-authored-by: Daniel Lemire <dlemire@rcs-power9-talos>
* Reenabling the optimized kernels (main branch).
* Defining SIMDJSON_CAN_ALWAYS_RUN_PPC64 and SIMDJSON_CAN_ALWAYS_RUN_ARM64
* Adding the bad UTF8 string from the fuzzer.
* Taking into account John's comments.
* Bumping the lib version.
* Update CMakeLists.txt
* Bump minimum CMake version
* Remove unnecessary git checks
* Move benchmark options where they are used
* Declare helper functions for dependencies
The custom solution here is tailored for fast configure times, but only
works for dependencies on Github.
* Import dependencies using the declared commands
* Remove git submodules
* Call target_link_libraries properly
target_link_libraries must not be called without a requirement
specifier.
* Fix includes for competition
Co-authored-by: friendlyanon <friendlyanon@users.noreply.github.com>
fix unintended early return in ondemand unit test loops
go through and fix warnings appearing in qtcreator,
qualify with std::, add const, abort on error
get rid of ulp_distance, not needed anymore when parsing is exact
Introduce CMake option SIMDJSON_DISABLE_DEPRECATED_API (default Off)
which turns off deprecated simdjson API functions by setting the macro
SIMDJSON_DISABLE_DEPRECATED_API.
Non-CMake users will have to set SIMDJSON_DISABLE_DEPRECATED_API
by some other means to disable the deprecated API.
Closes #1264
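For readers unfamiliar with this kind of switch, here is a minimal sketch of how a macro such as SIMDJSON_DISABLE_DEPRECATED_API can hide deprecated declarations. The namespace `demo` and the functions `parse_length` and `old_parse_length` are hypothetical and are not part of simdjson; only the macro name comes from the message above.

```cpp
#include <string>

// Hypothetical illustration: none of these functions exist in simdjson.
namespace demo {

// The current, supported function.
inline int parse_length(const std::string &s) {
  return static_cast<int>(s.size());
}

#ifndef SIMDJSON_DISABLE_DEPRECATED_API
// Compiled only when the macro is NOT defined, so defining the macro
// turns any remaining use of the old function into a compile error.
[[deprecated("use parse_length instead")]]
inline int old_parse_length(const std::string &s) {
  return parse_length(s);
}
#endif

} // namespace demo
```

Without CMake, one way to define the macro is on the compiler command line, for example by passing `-DSIMDJSON_DISABLE_DEPRECATED_API` to GCC or Clang.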
* Adding a distinct user id benchmark
* reenabling everything
* Removing an unnecessary "value()".
* Better tests of the examples and some fixes.
* Guarding exception code.
* Reenabling the on-demand tests and allowing us to convert a raw string into a C++ string.
* Fixing a 1-byte buffer overrun.
* More documentation.
* Adding more tests.
* Enabling the new tests
* Committing a nicer example.
* Not yet happy but this should fix our failures.
* Duh.
* Ok. Making it easier to get string_view instances from field instances.
* It is a struct.
* Trying to satisfy VS.
* Adopting John's name.
* add definitions for is_number and tie (by lemire)
* add fuzzer for element
* update fuzz documentation
* fix UB in creating an empty padded string
* don't bother null terminating padded_string, it is done by the std::memset already
* refactor fuzz data splitting into a separate class
* This would allow users to find out what builtin is.
* Trying another approach.
* Added instructions.
* Cleaning up the printout.
* Let us be less invasive.
* Adding a comment.
* This adds new tests regarding ordering.
* Updating the documentation with more examples.
* Adding compilation tests.
* Pruning code for exceptions.
* Guarding exceptionless.
* Remove our dependency on strtod_l by bundling our own slow path.
* Ok. Let us drop strtod entirely.
* Trimming down the powers to -342.
* Removing useless line.
* Many more comments.
* Adding some DLL exports.
* Let the gods help those who rely on windows+gcc.
* Marking the subnormals as unlikely. This is pretty much "performance neutral", but it might help just a bit with twitter.json.
* Make it possible to check that an implementation is supported at runtime.
* add CI fuzzing on arm 64 bit
This adds fuzzing on drone.io arm64
For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.
Closes: #1188
* Guarding the implementation accesses.
* Better doc.
* Updating cxxopts.
* We need to accommodate cxxopts
Co-authored-by: Paul Dreik <github@pauldreik.se>
* Adding new files.
* Better.
* Fixing minifier and adding tests.
* Adding benchmarks.
* Including the array header.
* Replacing old stream-based code by the new code.
* Doubling up the itoa.
* Hidden away to_chars in internal namespace.
* Removing the repetitions.
* Documented the atoi functions.
* Tuning the escape sequences.
* Moving the operators off the main namespace.
* Added more tests.
* Tweaking the implementation so that it works with and without exp.
* The string_builder template and mini_formatter class
are not part of our public API and are subject to change
at any time!
* Adding a benchmark and some optimization.
* Cleaning.
* Strictly speaking, this header is needed.
* This avoids locale-dependent number parsing at the standard library level.
* Adding missing cast.
* Inserting the missing "endif"
* Trial and error.
* Another attempt.
* Another tweak.
* Another fix.
* Restricting it even more.
* Tweaking our symbol checks.
* Somewhat smarter tests.
* Nice comments.
* Minor simplification.
* Adding cerr.
* Adding test.
* Saving.
* With exceptions.
* Added extensive tests.
* Better documentation.
* Tweaking CI
* Cleaning.
* Do not assume make.
* Let us make the build verbose
* Reorg
* I do not understand how circle ci works.
* Breaking it up.
* Better syntax.
* Specification is not followed.
* Fixes.
* Do not pass string_view by reference.
* Better documentation.
* The example is written for exceptions.
* Better documentation.
* Updating with deprecation.
* Updating example.
* Updating example.
* This allows the users to disable threading.
* This would disable bash scripts under FreeBSD. (#1118)
* This would disable bash scripts under FreeBSD.
* Let us also disable GIT.
* Let us try to just disable GIT
* Nope. We must have both bash and git disabled.
* The initial motivation behind basictests was for a quick set of sanity tests to check whether your code made sense. It
was not meant for thorough testing to find corner cases. However, over time, it grew to include such expensive tests.
This PR takes them out. It also allows us to bring back basictests to MinGW tests, since it is now cheap.
This is not an exercise in software engineering and making things prettier. This is a pragmatic change to improve our
test coverage and quality of life.
* Adds many more cheap tests.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
* Testing with GCC 10 and clang 10
* Fixing spurious space
* gcc10 does not need the cmake installation.
* We don't want to run the perf test on ARM. I ignore them systematically. ARM performance
should be assessed manually.
* Switching to GCC 10 and Clang 10
* Disabling some tests under sanitizers when they involve rapidjson or other parsers.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
In the parse_many function, we have one thread doing stage 1, while the main thread does stage 2. So if stage 1 and stage 2 each take half the time, parse_many could in principle run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.
To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.
This fixes our parse_stream benchmark which is just busted.
This replaces the one-thread-per-batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation is already amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.
Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.
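The move from one thread per batch to a single reusable worker can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `batch_worker`, `run`, `finish` and `interleaved_sum` are hypothetical names, and the two "stages" are toy computations standing in for stage 1 and stage 2.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: one long-lived thread that runs one job at a time.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand a job to the worker. Call finish() before submitting another one.
  void run(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      job_ = std::move(job);
      has_work_ = true;
    }
    cv_.notify_all();
  }

  // Block until the submitted job has completed.
  void finish() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_work_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return has_work_ || stop_; });
      if (!has_work_) { return; } // woken up only to stop
      std::function<void()> job = std::move(job_);
      lock.unlock();
      job(); // run the job without holding the lock
      lock.lock();
      has_work_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_work_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last: starts after the other members exist
};

// Toy pipeline: "stage 1" (doubling) of the next batch runs on the worker
// while "stage 2" (summing) of the current batch runs on the calling thread.
inline long interleaved_sum(const std::vector<int> &batches) {
  if (batches.empty()) { return 0; }
  long total = 0;
  int current = 0;
  int next = 0;
  batch_worker worker; // created once, reused for every batch
  worker.run([&] { current = batches[0] * 2; });
  worker.finish();
  for (std::size_t i = 0; i < batches.size(); i++) {
    const bool has_next = i + 1 < batches.size();
    if (has_next) { worker.run([&, i] { next = batches[i + 1] * 2; }); }
    total += current; // stage 2 of batch i overlaps stage 1 of batch i + 1
    if (has_next) {
      worker.finish();
      current = next;
    }
  }
  return total;
}
```

Because the thread is created once in the constructor, the per-batch cost is reduced to a lock and a condition-variable notification, which is why smaller batches become viable.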
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>