* Make it possible to check that an implementation is supported at runtime.
* add CI fuzzing on arm 64 bit
This adds fuzzing on drone.io arm64
For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.
Closes: #1188
* Guarding the implementation accesses.
* Better doc.
* Updating cxxopts.
* We need to accommodate cxxopts
Co-authored-by: Paul Dreik <github@pauldreik.se>
* Adding new files.
* Better.
* Fixing minifier and adding tests.
* Adding benchmarks.
* Including the array header.
* Replacing old stream-based code by the new code.
* Doubling up the itoa.
* Hidden away to_chars in internal namespace.
* Removing the repetitions.
* Documented the atoi functions.
* Tuning the escape sequences.
* Moving the operators off the main namespace.
* Added more tests.
* Tweaking the implementation so that it works with and without exp.
* The string_builder template and mini_formatter class
are not part of our public API and are subject to change
at any time!
* Adding a benchmark and some optimization.
* Cleaning.
* Strictly speaking, this header is needed.
* This avoids locale-dependent number parsing at the standard library level.
* Adding missing cast.
* Inserting the missing "endif"
* Trial and error.
* Another attempt.
* Another tweak.
* Another fix.
* Restricting it even more.
* Tweaking our symbol checks.
* Somewhat smarter tests.
* Nice comments.
* Minor simplification.
* Adding cerr.
* Adding test.
* Saving.
* With exceptions.
* Added extensive tests.
* Better documentation.
* Tweaking CI
* Cleaning.
* Do not assume make.
* Let us make the build verbose
* Reorg
* I do not understand how circle ci works.
* Breaking it up.
* Better syntax.
* Specification is not followed.
* Fixes.
* Do not pass string_view by reference.
* Better documentation.
* The example is written for exceptions.
* Better documentation.
* Updating with deprecation.
* Updating example.
* Updating example.
* This allows the users to disable threading.
* This would disable bash scripts under FreeBSD. (#1118)
* This would disable bash scripts under FreeBSD.
* Let us also disable GIT.
* Let us try to just disable GIT
* Nope. We must have both bash and git disabled.
* The initial motivation behind basictests was for a quick set of sanity tests to check whether your code made sense. It
was not meant for thorough testing to find corner cases. However, over time, it grew to include such expensive tests.
This PR takes them out. It also allows us to bring back basictests to MinGW tests, since it is now cheap.
This is not an exercise in software engineering and making things prettier. This is a pragmatic change to improve our
test coverage and quality of life.
* Adds many more cheap tests.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
* Testing with GCC 10 and clang 10
* Fixing spurious space
* gcc10 does not need the cmake installation.
* We don't want to run the perf test on ARM. I ignore them systematically. ARM performance
should be assessed manually.
* Switching to GCC 10 and Clang 10
* Disabling some tests under sanitizers when they involve rapidjson or other parsers.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
In the parse_many function, we have one thread doing stage 1 while the main thread does stage 2. So if stage 1 and stage 2 each take half of the time, parse_many could in principle run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.
To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.
This fixes our parse_stream benchmark which is just busted.
This replaces the one-thread-per-batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.
Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, on others very expensive. It should be expected that the new code will depend less drastically on the performance of the underlying system, since we create just one thread.
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
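The worker idea described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `reusable_worker`, `run`, `finish` and `process_batches` are hypothetical names, and the "stages" are reduced to counters. It assumes at most one task is in flight at a time, which is all the interleaving needs.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: a single long-lived thread that runs one task at a time.
class reusable_worker {
public:
  reusable_worker() : thread_([this] { loop(); }) {}
  ~reusable_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  reusable_worker(const reusable_worker &) = delete;
  reusable_worker &operator=(const reusable_worker &) = delete;

  // Hand a task to the worker. The caller must call finish() before the next run().
  void run(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      task_ = std::move(task);
      has_work_ = true;
    }
    cv_.notify_all();
  }

  // Block until the task handed to run() has completed.
  void finish() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_work_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return has_work_ || stopping_; });
      if (stopping_) { return; } // a task that was never awaited is dropped
      std::function<void()> task = std::move(task_);
      lock.unlock();
      task(); // run outside the lock so the caller is never blocked by it
      lock.lock();
      has_work_ = false;
      cv_.notify_all();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> task_;
  bool has_work_ = false;
  bool stopping_ = false;
  std::thread thread_; // declared last: starts after the other members exist
};

// Stage 1 of batch i+1 runs on the worker while stage 2 of batch i runs here.
inline int process_batches(int batch_count) {
  int stage1_done = 0; // written only by the worker thread
  int stage2_done = 0; // written only by the calling thread
  reusable_worker worker; // the only thread creation, whatever the batch count
  if (batch_count > 0) {
    worker.run([&stage1_done] { ++stage1_done; }); // stage 1 of batch 0
    worker.finish();
  }
  for (int i = 0; i < batch_count; i++) {
    const bool has_next = (i + 1 < batch_count);
    if (has_next) { worker.run([&stage1_done] { ++stage1_done; }); }
    ++stage2_done; // stage 2 of batch i overlaps with the worker
    if (has_next) { worker.finish(); }
  }
  assert(stage1_done == stage2_done);
  return stage2_done;
}
```

The thread is created once in the constructor, so the per-batch cost is a lock and a notification rather than a thread creation, which is why small batches should benefit the most.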
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
To avoid using data belonging to a temporary, the parse functions are ref qualified to get a compile error if used on an rvalue. See https://github.com/simdjson/simdjson/issues/696
Compilation tests are also added, to make sure bad usage fails to compile.
Reviewed by jkeiser.
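A minimal sketch of the ref-qualification technique, using a hypothetical `toy_parser` rather than the real simdjson classes: the `&` overload is usable on named parsers, the `&&` overload is deleted, and a small detection trait (`can_parse`, also hypothetical) plays the role of a compilation test.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a parser that owns the memory behind its results.
struct toy_parser {
  std::string storage;
  // Allowed: the parser is an lvalue, so it outlives the returned reference.
  const std::string &parse(const std::string &json) & {
    storage = json;
    return storage;
  }
  // Rejected at compile time: the result would refer to a dying temporary.
  const std::string &parse(const std::string &json) && = delete;
};

// Detection idiom: is `p.parse(json)` well-formed when p has value category P?
template <typename P, typename = void>
struct can_parse : std::false_type {};
template <typename P>
struct can_parse<P, decltype(void(std::declval<P>().parse(
                        std::declval<const std::string &>())))>
    : std::true_type {};

static_assert(can_parse<toy_parser &>::value, "a named parser must be usable");
static_assert(!can_parse<toy_parser>::value, "a temporary parser must be rejected");

// Normal usage: the parser is a named object, so the reference stays valid.
inline std::size_t parsed_length() {
  toy_parser parser;
  const std::string &doc = parser.parse("[1,2]");
  assert(&doc == &parser.storage);
  return doc.size();
}
```

With these overloads, an expression such as `toy_parser{}.parse("[1,2]")` fails to compile instead of returning a dangling reference.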
* Trying to correct the documentation so that it actually describes how the code behaves.
* tweaking the wording.
* Improving.
* Removing confusing sentence.
* Fixing formatting.
* Now with working example, tested.
* Added a smaller piece of code
* It is inconvenient to be unable to print a padded_string.
* Allows us to print the padded_string even when it is embedded in result object when exceptions are enabled.
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.
Co-authored-by: John Keiser <john@johnkeiser.com>
* Currently, document::stream contains an attribute that is a reference:
```
document::parser &parser;
```
Yet we try to have it default on the move operator:
```
stream &operator=(document::stream &&other) = default;
stream &operator=(const document::stream &) = delete; // Disallow copying
```
```
stream(document::stream &&other) = default;
stream(const document::stream &) = delete; // Disallow copying
```
I am not sure what the move is supposed to do with the reference.
I cannot find where we test the copy constructor and assignment. This has me concerned that it is either dead code or buggy code.
* Remove non-working, unnecessary move constructors
* We still want to disallow copies.
Co-authored-by: John Keiser <john@johnkeiser.com>
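For illustration, here is a small self-contained sketch of what the language does in this situation. The types `parser_t`, `stream_with_ref` and `stream_with_ptr` are hypothetical and are not the simdjson classes. A defaulted move constructor simply copies the reference, while a defaulted move assignment operator on a class with a reference member is defined as deleted, because a reference cannot be rebound. Some compilers warn about the defaulted-but-deleted operator.

```cpp
#include <cassert>
#include <type_traits>

struct parser_t {};

// Same shape as described above: a reference member plus defaulted moves.
struct stream_with_ref {
  parser_t &parser;
  explicit stream_with_ref(parser_t &p) : parser(p) {}
  stream_with_ref(stream_with_ref &&) = default;            // copies the reference
  stream_with_ref &operator=(stream_with_ref &&) = default; // defined as deleted
};

static_assert(std::is_move_constructible<stream_with_ref>::value,
              "move construction is available");
static_assert(!std::is_move_assignable<stream_with_ref>::value,
              "move assignment is silently unavailable");

// One possible alternative (an assumption, not what this change does):
// store a pointer, which can be rebound, so both defaulted moves work.
struct stream_with_ptr {
  parser_t *parser;
  explicit stream_with_ptr(parser_t &p) : parser(&p) {}
  stream_with_ptr(stream_with_ptr &&) = default;
  stream_with_ptr &operator=(stream_with_ptr &&) = default;
};

static_assert(std::is_move_assignable<stream_with_ptr>::value,
              "pointer members can be reassigned");

// Move-assign a stream bound to b over a stream bound to a.
inline bool rebinding_works() {
  parser_t a, b;
  stream_with_ptr s(a);
  s = stream_with_ptr(b);
  assert(s.parser == &b);
  return s.parser == &b;
}
```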
* Make architecture implementations virtual functions
- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions
* Move implementation static methods to their own classes
* Detect best supported implementation on first use
* available_implementationsI() -> available_implementations
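The design listed above (virtual functions per architecture, a list of available implementations, lookup by name, detection on first use) can be sketched roughly as follows. Everything lives in a hypothetical `sketch` namespace: the class names, `find_implementation`, `active_implementation`, and the toy `count_quotes` kernel are illustrative assumptions and not the library's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// One virtual interface per architecture replaces enum-based templating.
class implementation {
public:
  virtual ~implementation() = default;
  virtual std::string name() const = 0;
  virtual bool supported_by_runtime_system() const = 0;
  virtual std::size_t count_quotes(const std::string &json) const noexcept = 0;
};

class fallback_implementation final : public implementation {
public:
  std::string name() const override { return "fallback"; }
  bool supported_by_runtime_system() const override { return true; }
  std::size_t count_quotes(const std::string &json) const noexcept override {
    std::size_t n = 0;
    for (char c : json) { n += (c == '"'); }
    return n;
  }
};

// Pretends to need a CPU feature that the current machine lacks.
class imaginary_simd_implementation final : public implementation {
public:
  std::string name() const override { return "imaginary_simd"; }
  bool supported_by_runtime_system() const override { return false; }
  std::size_t count_quotes(const std::string &) const noexcept override { return 0; }
};

// All compiled-in implementations, fastest first.
inline const std::vector<const implementation *> &available_implementations() {
  static const imaginary_simd_implementation simd{};
  static const fallback_implementation fallback{};
  static const std::vector<const implementation *> list{&simd, &fallback};
  return list;
}

// Explicit selection by name; returns nullptr when the name is unknown.
inline const implementation *find_implementation(const std::string &name) {
  for (const implementation *impl : available_implementations()) {
    if (impl->name() == name) { return impl; }
  }
  return nullptr;
}

// Detected once, on first use: the first implementation this machine supports.
inline const implementation &active_implementation() {
  static const implementation *best = []() -> const implementation * {
    for (const implementation *impl : available_implementations()) {
      if (impl->supported_by_runtime_system()) { return impl; }
    }
    return nullptr;
  }();
  assert(best != nullptr); // the fallback is always supported
  return *best;
}

} // namespace sketch
```

A caller that selects an implementation by name would be expected to check `supported_by_runtime_system()` before using it, since a compiled-in implementation may still be unusable on the current CPU.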
This creates a "document" class with only user-facing document state (no parser internals).
- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)
Usage:
```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```
```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
* Fix issue472: make JsonStream a template.
* Adding missing include.
* Tweaking headers and some minor formatting.
* Removing file from aggregation.
* Moving jsoncharutils
* Adding new header.
* Trying another header.
* Let us try to route around Visual Studio's nonsense.
* Fix for issue467
* Updating single-header
* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.
* Fixing parse_stream
* Updating documentation.