* Adding new files.
* Better.
* Fixing minifier and adding tests.
* Adding benchmarks.
* Including the array header.
* Replacing old stream-based code by the new code.
* Doubling up the itoa.
* Hid to_chars away in an internal namespace.
* Removing the repetitions.
* Documented the atoi functions.
* Tuning the escape sequences.
* Moving the operators off the main namespace.
* Added more tests.
* Tweaking the implementation so that it works with and without exp.
* The string_builder template and mini_formatter class
are not part of our public API and are subject to change
at any time!
* Adding a benchmark and some optimization.
* Cleaning.
* Strictly speaking, this header is needed.
* This avoids locale-dependent number parsing at the standard library level.
* Adding missing cast.
* Inserting the missing "endif"
* Trial and error.
* Another attempt.
* Another tweak.
* Another fix.
* Restricting it even more.
* Tweaking our symbol checks.
* Somewhat smarter tests.
* Nice comments.
* Minor simplification.
* Adding cerr.
* Adding test.
* Saving.
* With exceptions.
* Added extensive tests.
* Better documentation.
* Tweaking CI
* Cleaning.
* Do not assume make.
* Let us make the build verbose
* Reorg
* I do not understand how circle ci works.
* Breaking it up.
* Better syntax.
* Specification is not followed.
* Fixes.
* Do not pass string_view by reference.
* Better documentation.
* The example is written for exceptions.
* Better documentation.
* Updating with deprecation.
* Updating example.
* Updating example.
* This allows the users to disable threading.
* This would disable bash scripts under FreeBSD. (#1118)
* This would disable bash scripts under FreeBSD.
* Let us also disable GIT.
* Let us try to just disable GIT
* Nope. We must have both bash and git disabled.
* The initial motivation behind basictests was for a quick set of sanity tests to check whether your code made sense. It
was not meant for thorough testing to find corner cases. However, over time, it grew to include such expensive tests.
This PR takes them out. It also allows us to bring back basictests to MinGW tests, since it is now cheap.
This is not an exercise in software engineering and making things prettier. This is a pragmatic change to improve our
test coverage and quality of life.
* Adds many more cheap tests.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
* Testing with GCC 10 and clang 10
* Fixing spurious space
* gcc10 does not need the cmake installation.
* We don't want to run the perf tests on ARM. I ignore them systematically. ARM performance
should be assessed manually.
* Switching to GCC 10 and Clang 10
* Disabling some tests under sanitizers when they involve rapidjson or other parsers.
Co-authored-by: Daniel Lemire <lemire@gmai.com>
In the parse_many function, one thread runs stage 1 while the main thread runs stage 2. So if stage 1 and stage 2 each take half the time, parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.
To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive, so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.
This also fixes our parse_stream benchmark, which was broken.
This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.
Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. We should expect the new code to depend less drastically on the performance of the underlying system, since we create just one thread.
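The worker idea described above can be sketched as follows. This is a minimal illustration, not the simdjson code: `batch_worker` and `process_batches` are hypothetical names, and the two "stages" are toy computations. One long-lived thread waits on a condition variable, so each batch costs a wake-up rather than a thread creation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker (not the simdjson class): one long-lived thread that
// runs at most one job at a time.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand a job to the worker thread. The previous job must have completed.
  void run(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!busy_);
      job_ = std::move(job);
      busy_ = true;
    }
    cv_.notify_all();
  }

  // Block until the job in flight, if any, has completed.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !busy_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return stop_ || busy_; });
      if (!busy_) { return; } // woken only to stop
      std::function<void()> job = std::move(job_);
      lock.unlock();
      job(); // run the job without holding the lock
      lock.lock();
      busy_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool busy_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last so the other members exist before it starts
};

// Toy pipeline: "stage 1" of batch i stores i + 1, "stage 2" adds it to a sum.
// While the caller runs stage 2 on batch i, the worker runs stage 1 on batch i + 1.
inline std::size_t process_batches(std::size_t batch_count) {
  std::vector<std::size_t> stage1(batch_count, 0);
  if (batch_count > 0) { stage1[0] = 1; } // first batch: stage 1 runs inline
  batch_worker worker; // created once, reused for every batch
  std::size_t sum = 0;
  for (std::size_t i = 0; i < batch_count; i++) {
    if (i + 1 < batch_count) {
      worker.run([&stage1, i] { stage1[i + 1] = i + 2; });
    }
    sum += stage1[i]; // stage 2 for batch i
    worker.wait();    // stage 1 for batch i + 1 is now complete
  }
  return sum;
}
```

The thread is created once in the constructor and joined once in the destructor, which is why the per-batch overhead no longer depends on how expensive thread creation is on a given OS.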
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
To avoid using data belonging to a temporary, the parse functions are ref-qualified so that calling them on an rvalue is a compile error. See https://github.com/simdjson/simdjson/issues/696
Compilation tests are also added, to make sure bad usage fails to compile.
Reviewed by jkeiser.
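As an illustration of the technique (not the actual simdjson declarations), the sketch below uses a hypothetical `toy_parser` whose `parse` is callable on lvalues only. The `can_parse` trait is likewise an assumption made for this example; it shows one way a "must fail to compile" check can be expressed as a `static_assert`.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a parser that owns the memory its results refer to.
class toy_parser {
public:
  // Lvalue-qualified: the returned reference stays valid while the parser lives.
  const std::string &parse(const std::string &json) & {
    buffer_ = json;
    return buffer_;
  }
  // Deleted for rvalues: toy_parser().parse(...) would return a dangling reference.
  const std::string &parse(const std::string &json) && = delete;

private:
  std::string buffer_;
};

// Detection helper: true when `std::declval<T>().parse(json)` is well-formed.
template <typename T, typename = void>
struct can_parse : std::false_type {};

template <typename T>
struct can_parse<T, decltype(void(std::declval<T>().parse(
                        std::declval<const std::string &>())))> : std::true_type {};

static_assert(can_parse<toy_parser &>::value,
              "parse must be usable on an lvalue parser");
static_assert(!can_parse<toy_parser &&>::value,
              "parse must not compile on a temporary parser");
```

Declaring only the `&`-qualified overload would also reject rvalues; the explicitly deleted `&&` overload just makes the intent visible in the interface.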
* Trying to correct the documentation so that it actually describes how the code behaves.
* tweaking the wording.
* Improving.
* Removing confusing sentence.
* Fixing formatting.
* Now with working example, tested.
* Added a smaller piece of code
* It is inconvenient to be unable to print a padded_string.
* Allows us to print the padded_string even when it is embedded in result object when exceptions are enabled.
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.
Co-authored-by: John Keiser <john@johnkeiser.com>
* Currently, document::stream contains an attribute that is a reference:
```
document::parser &parser;
```
Yet we default both the move assignment operator and the move constructor:
```
stream &operator=(document::stream &&other) = default;
stream &operator=(const document::stream &) = delete; // Disallow copying
```
```
stream(document::stream &&other) = default;
stream(const document::stream &) = delete; // Disallow copying
```
I am not sure what the move is supposed to do with the reference.
I cannot find where we test the move constructor and move assignment. This makes me concerned that it is either dead code or buggy code.
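The concern can be checked with a small stand-alone sketch. The names `parser_like` and `stream_like` are hypothetical stand-ins, not the simdjson types. In standard C++, a defaulted move assignment operator is defined as deleted when the class has a reference member, while the defaulted move constructor simply binds the new reference to the same object.

```cpp
#include <type_traits>

struct parser_like {};

// Hypothetical stand-in for a class holding a reference member.
struct stream_like {
  parser_like &parser;
  explicit stream_like(parser_like &p) : parser(p) {}

  stream_like(stream_like &&other) = default;            // binds to the same parser
  stream_like &operator=(stream_like &&other) = default; // defined as deleted
  stream_like(const stream_like &) = delete;             // disallow copying
  stream_like &operator=(const stream_like &) = delete;  // disallow copying
};

static_assert(std::is_move_constructible<stream_like>::value,
              "a reference member does not prevent move construction");
static_assert(!std::is_move_assignable<stream_like>::value,
              "a reference cannot be reseated, so move assignment is deleted");
```

Some compilers warn that the explicitly defaulted move assignment operator is implicitly deleted, which is consistent with the suspicion that the declaration never did anything useful.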
* Remove non-working, unnecessary move constructors
* We still want to disallow copies.
Co-authored-by: John Keiser <john@johnkeiser.com>
* Make architecture implementations virtual functions
- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions
* Move implementation static methods to their own classes
* Detect best supported implementation on first use
* available_implementationsI() -> available_implementations
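The design listed above can be sketched roughly as follows. Every name here (`implementation`, `fallback_implementation`, `hypothetical_simd_implementation`, `find_implementation`, `active_implementation`, `count_quotes`) is an assumption made for illustration and is not the simdjson API; the "SIMD" class is a placeholder that reports itself as unsupported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: each architecture provides one subclass.
class implementation {
public:
  virtual ~implementation() = default;
  virtual std::string name() const = 0;
  virtual bool supported() const noexcept = 0;
  // Example kernel that every architecture must provide.
  virtual std::size_t count_quotes(const char *buf, std::size_t len) const noexcept = 0;
};

// Portable scalar loop shared by both toy implementations.
inline std::size_t scalar_count_quotes(const char *buf, std::size_t len) noexcept {
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; i++) {
    if (buf[i] == '"') { n++; }
  }
  return n;
}

class fallback_implementation final : public implementation {
public:
  std::string name() const override { return "fallback"; }
  bool supported() const noexcept override { return true; }
  std::size_t count_quotes(const char *buf, std::size_t len) const noexcept override {
    return scalar_count_quotes(buf, len);
  }
};

// Placeholder for a CPU-specific version; it pretends the CPU lacks the feature.
class hypothetical_simd_implementation final : public implementation {
public:
  std::string name() const override { return "hypothetical_simd"; }
  bool supported() const noexcept override { return false; }
  std::size_t count_quotes(const char *buf, std::size_t len) const noexcept override {
    return scalar_count_quotes(buf, len);
  }
};

// All compiled-in implementations, ordered from most to least preferred.
inline const std::vector<const implementation *> &available_implementations() {
  static const hypothetical_simd_implementation simd{};
  static const fallback_implementation fallback{};
  static const std::vector<const implementation *> list{&simd, &fallback};
  return list;
}

// Explicit selection by name; returns nullptr when the name is unknown.
inline const implementation *find_implementation(const std::string &name) {
  for (const implementation *impl : available_implementations()) {
    if (impl->name() == name) { return impl; }
  }
  return nullptr;
}

// The best supported implementation, detected once on first use.
inline const implementation &active_implementation() {
  static const implementation *best = []() -> const implementation * {
    for (const implementation *impl : available_implementations()) {
      if (impl->supported()) { return impl; }
    }
    return nullptr;
  }();
  assert(best != nullptr); // the fallback always reports itself as supported
  return *best;
}
```

With virtual functions, adding an architecture means adding one subclass and one entry in the list, and adding an algorithm means adding one virtual function, without touching an enum or template parameters at each call site.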
This creates a "document" class with only user-facing document state (no parser internals).
- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)
Usage:
```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```
```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
* Fix issue472: make JsonStream a template.
* Adding missing include.
* Tweaking headers and some minor formatting.
* Removing file from aggregation.
* Moving jsoncharutils
* Adding new header.
* Trying another header.
* Let us try to route around Visual Studio's nonsense.
* Fix for issue467
* Updating single-header
* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.
* Fixing parse_stream
* Updating documentation.
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.
* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy-paste comment error in the CircleCI build conf
* JsonStream threaded prototype
* JsonStream Threaded version working. Still supporting non-threaded version.
* Fix where invalid files would enter infinite loop.
* SingleHeader update
* I will remove -pthread in cmake for now.
* Attempt at resolving the -pthread issue
* rough prototype working. Needs more test and fine tuning.
* prototype working on large files.
* prototype working on large files.
* Adding benchmarks
* jsonstream API adjustment
* type
* minor fixes and cleaning.
* minor fixes and cleaning.
* removing warnings
* removing some copies
* runtime dispatch error fix
* makefile linking src/jsonstream.cpp
* fixing arm stage 1 headers
* fixing stage 2 headers
* fixing stage 1 arm header
* making jsonstream portable
* cleaning imports
* including <algorithm> for windows compiler
* cleaning benchmark imports
* adding jsonstream to amalgamation
* merged main into branch
* bug fix where JsonStream would fail in rare cases.
* Adding a JsonStream Demo to Amalgamation
* Fix for https://github.com/lemire/simdjson/issues/345
* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)
* Final (?) fix for https://github.com/lemire/simdjson/issues/345
* Verbose basictest
* Being more forgiving of powers of ten.
* Let us zero the tail end.
* add basic fuzzers (#348)
* add basic fuzzing using libFuzzer
* let cmake respect cflags, otherwise the fuzzer flags go unnoticed
also, integrates badly with oss-fuzz
* add new fuzzer for minification, simplify the old one
* add fuzzer for the dump example
* clang format
* adding Paul Dreik
* Fixing issue 351 (#352)
* Fixing issues 351 and 353
* Fix ARM compile errors on g++ 7.4 (#354)
* Fix ARM compilation errors
* Update singleheader
* fix integer overflow in subnormal_power10 (#355)
detected by oss-fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714
* Adding new test file, following https://github.com/lemire/simdjson/pull/355
* merging main
* make file fix