* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
To avoid using data belonging to a temporary, the parse functions are ref qualified to get a compile error if used on an rvalue. See https://github.com/simdjson/simdjson/issues/696
Compilation tests are also added, to make sure bad usage fails to compile.
Reviewed by jkeiser.
* Some users are interested, as a metric, in the number of documents parsed per second.
Obviously, this means reusing the same parser again and again.
* Adding a sentence
* This update the parsingcompetition benchmark so that it displays the number of documents parsed per second.
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.
Co-authored-by: John Keiser <john@johnkeiser.com>
* If we are going to have a google benchmark flag, we better make sure that we test it out minimal (it should build).
* Fix bench_dom_api
Co-authored-by: John Keiser <john@johnkeiser.com>
* Make architecture implementations virtual functions
- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions
* Move implementation static methods to their own classes
* Detect best supported implementation on first use
* available_implementationsI() -> available_implementations
This creates a "document" class with only user-facing document state (no parser internals).
- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)
Usage:
```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```
```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
* Fix for issue467
* Updating single-header
* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.
* Fixing parse_stream
* Updating documentation.
* This revert the code back to how it was prior to the silly "run two stages" routine and instead
adds an option to benchmark the code over hot buffers. It turns out that it can be expensive,
when the files are large, to allocate the pages.
* Instead of emulating the whole parsing as stage 1 + stage 2, let us
benchmark the real thing.
* Adding explicit constructor.
* Adding warning to the benchmark user.
* Making re-running optional.
* string literal + integer means unintended and incorrect pointer arithmetic
fixes a clang warning. it could not be triggered, because it can only be
triggered if the string given to getopt is not covered among the
cases in the switch.
* handle review comment
* I think we can align the numbers better (so it is prettier).
* Remove space before %, align third line better
Co-authored-by: John Keiser <john@johnkeiser.com>
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.
* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy paste comment error in the cirlce ci build conf
* JsonStream threaded prototype
* JsonStream Threaded version working. Still supporting non-threaded version.
* Fix where invalid files would enter infinite loop.
* SingleHeader update
* I will remove -pthread in cmake for now.
* Attempt at resolving the -pthread issue
* rough prototype working. Needs more test and fine tuning.
* prototype working on large files.
* prototype working on large files.
* Adding benchmarks
* jsonstream API adjustment
* type
* minor fixes and cleaning.
* minor fixes and cleaning.
* removing warnings
* removing some copies
* runtime dispatch error fix
* makefile linking src/jsonstream.cpp
* fixing arm stage 1 headers
* fixing stage 2 headers
* fixing stage 1 arm header
* making jsonstream portable
* cleaning imports
* including <algorithms> for windows compiler
* cleaning benchmark imports
* adding jsonstream to amalgamation
* merged main into branch
* bug fix where JsonStream would bug on rare cases.
* Addind a JsonStream Demo to Amalgamation
* Fix for https://github.com/lemire/simdjson/issues/345
* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)
* Final (?) fix for https://github.com/lemire/simdjson/issues/345
* Verbose basictest
* Being more forgiving of powers of ten.
* Let us zero the tail end.
* add basic fuzzers (#348)
* add basic fuzzing using libFuzzer
* let cmake respect cflags, otherwise the fuzzer flags go unnoticed
also, integrates badly with oss-fuzz
* add new fuzzer for minification, simplify the old one
* add fuzzer for the dump example
* clang format
* adding Paul Dreik
* rough prototype working. Needs more test and fine tuning.
* prototype working on large files.
* prototype working on large files.
* Adding benchmarks
* jsonstream API adjustment
* type
* minor fixes and cleaning.
* Fixing issue 351 (#352)
* Fixing issues 351 and 353
* minor fixes and cleaning.
* removing warnings
* removing some copies
* Fix ARM compile errors on g++ 7.4 (#354)
* Fix ARM compilation errors
* Update singleheader
* runtime dispatch error fix
* makefile linking src/jsonstream.cpp
* fixing arm stage 1 headers
* fixing stage 2 headers
* fixing stage 1 arm header
* fix integer overflow in subnormal_power10 (#355)
detected by oss-fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714
* Adding new test file, following https://github.com/lemire/simdjson/pull/355
* making jsonstream portable
* cleaning imports
* including <algorithms> for windows compiler
* cleaning benchmark imports
* adding jsonstream to amalgamation
* merged main into branch
* bug fix where JsonStream would bug on rare cases.
* Addind a JsonStream Demo to Amalgamation
* merging main
* rough prototype working. Needs more test and fine tuning.
* prototype working on large files.
* prototype working on large files.
* Adding benchmarks
* jsonstream API adjustment
* minor fixes and cleaning.
* minor fixes and cleaning.
* removing warnings
* removing some copies
* runtime dispatch error fix
* makefile linking src/jsonstream.cpp
* fixing arm stage 1 headers
* fixing stage 2 headers
* fixing stage 1 arm header
* making jsonstream portable
* cleaning imports
* including <algorithms> for windows compiler
* cleaning benchmark imports
* adding jsonstream to amalgamation
* bug fix where JsonStream would bug on rare cases.
* Addind a JsonStream Demo to Amalgamation
* rough prototype working. Needs more test and fine tuning.
* minor fixes and cleaning.
* adding jsonstream to amalgamation
* merged main into branch
* Addind a JsonStream Demo to Amalgamation
* merging main
* merging main
* make file fix
* Allow -f
* Support parse -s (force sse)
* Simplify flatten_bits
- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead
* De-unroll the flatten_bits loops
* Decrease dependencies in stage 1
- Do all finalize_structurals work before computing the quote mask; mask
out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
descriptiveness and possible lifting into a generic simd parsing library
* Mark branches as likely/unlikely
* Reorder and unroll+interleave stage 1 loop
* Nest the cnt > 16 branch inside cnt > 8
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
* Add -n and -w arguments
* Add Dockerfile that compares perf against master
* Add checkperf to .drone.yml
* Clone from github instead of .git since CI doesn't have .git
* Attempt 1 - fn targeting
GCC won't work with templates with different targets, need to specialize all the way up the call stack.
* Compiles properly with cmake. Does not with the Makefile.
* Compilation works with Makefile
* instruction_set changes to architecture
* some aesthetic changes
* fix amalgation and tests + aesthetic changes
* This now compiles and passes tests under CLANG
* Minor correction.
* Trying to make it work on ARM
* Adding missing namespace
* Missing bracket
* Fixing minor compilation issues.
* Getting parse to use runtime dispatch
* Fixing amalgamation script.
* Making sure that NEON is supported.
* Fixing typo
* Merging https://github.com/lemire/simdjson/pull/229
* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser (second part)
* Trying another way.
* Removing the paral.
* Fixing the make file
* Let us make the practice run long enough.
* Resolved the awful slowness.
* Cleaning the README.md
* With runtime dispatching, we should not need flags anymore.
* Changing isa detection file's name + fixing typos.
* Improving portability.
* Revisiting faulty logic regarding same-page overruns.
* Disabling same-page overruns under VS.
* Clarifying the documentation
* Fix for issue 131 + being more explicit regarding memory realloc.
* Fix for issue 137.
* removing "using namespace std" throughout. Fix for 50
* Introducing typed malloc/free.
* Introducing a custom class (padded_string) that solves several minor usability issues.
* Updating amalgamation for testing.
* Making sure we can run with the sanitizers on.
* Minor code simplification in the number parsing.
* Following @EmilGedda 's recommendations regarding the makefile.
* Reference to blog post.
* Adding link to https://johnnylee-sde.github.io/Fast-numeric-string-to-int/
* Better hex parsing.
* Minor change to benchmark cmake
* Moved ParsedJson and its Iterator to separate .cpp files
* Uncommented functions, that has nothing to do with this pr
* Removed really_inline comments
* Reinstated some inline functions to restore previous performance
* Re-merged iterator in ParsedJson
* Uncommented some WARN_UNUSED