* Minor edits regarding the On Demand documentation.
* Adding more instructions for CMake
* Tweaking.
* Adding changes requested by John.
* Bringing back detailed explanations of -march=native.
* Initial PPC64 support
* Add travis CI
* Fix outdated cmake version for travis
* Fix indentation
* Try another workaround for outdated cmake in travis
* Try beta cmake
* Add dash before beta
* Use builtin snaps
* Use cmake like rocksdb does
* Test cmake on bionic
* Remove unnecessary things from travis
* Remove unnecessary things from travis
* Another try of compiler install
* Add all major compilers
* Add all major compilers
* Add all major compilers
* Tweak travis a bit
* Typo
* More robust travis
* Typos typos typos
* Add fewer compilers, add a non-specific build for clang and gcc; should be the final config
* CMAKE_FLAGS is in the wrong place
* Remove default implementation
* Limit build thread number
* Fall back prefix_xor to a plain implementation, as no performance boost was noticed
* Test for power9 as it is the main architecture for OpenPOWER right now
* Add documentation on building with power9, as the implementation is compatible but the compiler optimizations are not
* Replace ARM with PPC in the comment
* Adding a distinct user id benchmark
* Reenabling everything.
* Removing an unnecessary "value()".
* Better tests of the examples and some fixes.
* Guarding exception code.
* Reenable the on-demand tests and allow converting a raw string into a C++ string.
* Fixing a 1-byte buffer overrun.
* More documentation.
* Adding more tests.
* Enabling the new tests
* Committing a nicer example.
* Not yet happy but this should fix our failures.
* Duh.
* Ok. Making it easier to get string_view instances from field instances.
* It is a struct.
* Trying to satisfy VS.
* Adopting John's name.
* This would allow users to find out what the builtin implementation is.
* Trying another approach.
* Added instructions.
* Cleaning up the printout.
* Let us be less invasive.
* Adding a comment.
* This adds new tests regarding ordering.
* Updating the documentation with more examples.
* Adding compilation tests.
* Pruning code for exceptions.
* Guarding exceptionless.
* Make it possible to check that an implementation is supported at runtime.
* Add CI fuzzing on 64-bit ARM
This adds fuzzing on drone.io arm64
For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.
Closes: #1188
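For context, a fuzz target for the parser generally follows the libFuzzer entry-point convention shown in the sketch below. This is a generic illustration, not the project's actual fuzzer; the `simdjson::dom::parser` usage and the leak-detection workaround are assumptions based on the description above.

```cpp
// Generic libFuzzer-style harness sketch (illustration only, not the project's fuzzer).
// Leak detection can typically be disabled at run time, e.g. ./fuzzer -detect_leaks=0
// (assumption: this is how the false crash report mentioned above was avoided).
#include <cstddef>
#include <cstdint>
#include "simdjson.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  simdjson::dom::parser parser;
  // Throw arbitrary bytes at the parser; parse errors are expected and ignored,
  // we only care about crashes and sanitizer findings.
  auto result = parser.parse(data, size);
  (void)result;
  return 0;
}
```

Running the same target on arm64 is then mostly a matter of CI configuration rather than code changes.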
* Guarding the implementation accesses.
* Better doc.
* Updating cxxopts.
* We need to accommodate cxxopts
Co-authored-by: Paul Dreik <github@pauldreik.se>
* Adding test.
* Saving.
* With exceptions.
* Added extensive tests.
* Better documentation.
* Tweaking CI
* Cleaning.
* Do not assume make.
* Let us make the build verbose
* Reorg
* I do not understand how circle ci works.
* Breaking it up.
* Better syntax.
* Specification is not followed.
* Fixes.
* Do not pass string_view by reference.
* Better documentation.
* The example is written for exceptions.
* Better documentation.
* Updating with deprecation.
* Updating example.
* Updating example.
Links to other files need to be either relative to themselves (doc/performance.md -> performance.md) or absolute (doc/performance.md -> /doc/performance.md). This change fixes the documentation when read on GitHub.
In the parse_many function, we have one thread doing stage 1, while the main thread does stage 2. So if stage 1 and stage 2 each take half the time, parse_many could run at twice the speed. It is unlikely to do so in practice. Still, we see benefits of about 40% due to threading.
To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive, so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.
This fixes our parse_stream benchmark which is just busted.
This replaces the one-thread-per-batch routine by a worker object that reuses the same thread (a rough sketch of this pattern follows below). In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.
Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performance of the underlying system, since we create just one thread.
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
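To make the design choice above concrete, here is a minimal, generic sketch of a worker object that keeps one thread alive and reuses it for successive batches, instead of creating a thread per batch. It is an illustration of the pattern under simple assumptions, not simdjson's actual internal worker code.

```cpp
// Minimal sketch of a reusable worker thread (pattern illustration,
// not simdjson's internal implementation).
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class reusable_worker {
public:
  reusable_worker() : thread_([this] { run(); }) {}

  ~reusable_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Hand a task (e.g. "run stage 1 on the next batch") to the existing thread.
  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      task_ = std::move(task);
      has_task_ = true;
    }
    cv_.notify_all();
  }

  // Block until the previously submitted task has finished.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_task_; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return has_task_ || stop_; });
      if (stop_) { return; }
      std::function<void()> task = std::move(task_);
      lock.unlock();
      task();                 // e.g. stage 1 over one batch
      lock.lock();
      has_task_ = false;
      cv_.notify_all();       // wake up wait()
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> task_;
  bool has_task_ = false;
  bool stop_ = false;
  std::thread thread_;        // created once, reused for every batch
};
```

With such a worker, the main thread can submit stage 1 for the next batch, run stage 2 on the current one, and call wait() before moving on; since the thread is created only once, small batches no longer pay the thread-creation cost on every iteration.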