* Testing with GCC 10 and clang 10
* Fixing spurious space
* gcc10 does not need the cmake installation.
* We don't want to run the perf test on ARM. I ignore them systematically. ARM performance
should be assessed manually.
* Switching to GCC 10 and Clang 10
* Disabling some tests under sanitizers when they involve rapidjson or other parsers.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
In the parse_many function, we have one thread doing the stage 1, while the main thread does stage 2. So if stage 1 and stage 2 take half the time, the parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.
To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.
This fixes our parse_stream benchmark which is just busted.
This replaces the one-thread per batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of the thread create gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.
Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performances of the underlying system, since we create juste one thread.
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
To avoid using data belonging to a temporary, the parse functions are ref qualified to get a compile error if used on an rvalue. See https://github.com/simdjson/simdjson/issues/696
Compilation tests are also added, to make sure bad usage fails to compile.
Reviewed by jkeiser.
* Trying to correct the documentation so that it actually describes how the code behaves.
* tweaking the wording.
* Improving.
* Removing confusing sentence.
* Fixing formatting.
* Now with working example, tested.
* Added a smaller piece of code
* It is inconvenient to be unable to print a padded_string.
* Allows us to print the padded_string even when it is embedded in result object when exceptions are enabled.
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.
Co-authored-by: John Keiser <john@johnkeiser.com>
* Currently, document::stream contains an attribute that is a reference:
```
document::parser &parser;
```
Yet we try to have it default on the move operator:
```
stream &operator=(document::stream &&other) = default;
stream &operator=(const document::stream &) = delete; // Disallow copying
```
```
stream(document::stream &&other) = default;
stream(const document::stream &) = delete; // Disallow copying
```
I am not sure what the move is supposed to do with the reference.
I cannot find where we test the copy constructor and assignment. This has been concerned that it is either dead code or buggy code.
* Remove non-working, unnecessary move constructors
* We still want to disallow copies.
Co-authored-by: John Keiser <john@johnkeiser.com>
* Make architecture implementations virtual functions
- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions
* Move implementation static methods to their own classes
* Detect best supported implementation on first use
* available_implementationsI() -> available_implementations
This creates a "document" class with only user-facing document state (no parser internals).
- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)
Usage:
```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```
```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
* Fix issue472: make JsonStream a template.
* Adding missing include.
* Tweaking headers and some minor formatting.
* Removing file from aggregation.
* Moving jsoncharutils
* Adding new header.
* Trying another header.
* Let us try to route around Visual Studio's nonesense.