Commit Graph

142 Commits

Author SHA1 Message Date
John Keiser 1f76737510 Make valstat-ish parse APIs 2020-02-18 08:37:07 -08:00
John Keiser bc8bc7d1a8
Lowercase Architecture and ErrorValues (#487)
ErrorValues -> error_code, Architecture -> architecture
2020-02-14 15:21:28 -08:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire 4518f1fba1 Some minor nitpicking. 2020-02-07 10:41:45 -05:00
Daniel Lemire 5c59b3a775 Fixing memory leaks. (Minor issue.) 2020-02-07 10:29:15 -05:00
Daniel Lemire 28710f8ad5
fix for Issue 467 (#469)
* Fix for issue467

* Updating single-header

* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.

* Fixing parse_stream

* Updating documentation.
2020-01-29 19:00:18 -05:00
John Keiser 6978a0b8d4 Benchmark escapes (#464)
* Add escapes as a feature we benchmark

* Don't print effectiveness metric unless verbose is on
2020-01-27 09:58:14 -05:00
Daniel Lemire aea79912ec
Adding a "get_corpus" benchmark. (#456)
* Adding a "get_corpus" benchmark.

* Improving portability.
2020-01-20 17:27:25 -05:00
Daniel Lemire 80b4dd2e8a
Removing all stdout, stderr from main library. (#455)
* Removing all stdout,stderr from main library.
2020-01-20 16:03:15 -05:00
Daniel Lemire f87e64f988
Add option to make buffers hot and remove recent benchmarking changes (#443)
* This revert the code back to how it was prior to the silly "run two stages" routine and instead
adds an option to benchmark the code over hot buffers. It turns out that it can be expensive,
when the files are large, to allocate the pages.
2020-01-15 19:48:00 -05:00
Daniel Lemire f97b655f02
Instead of emulating the whole parsing as stage 1 + stage 2, let us benchmark the real thing. (#441)
* Instead of emulating the whole parsing as stage 1 + stage 2, let us
benchmark the real thing.

* Adding explicit constructor.

* Adding warning to the benchmark user.

* Making re-running optional.
2020-01-11 10:14:22 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provide tests for it), as well as #401
2020-01-03 22:22:47 -05:00
John Keiser 3b9e6bff3c Print stage 2 information in feature benchmarker 2020-01-02 17:23:21 -07:00
Paul Dreik 27293cc1c1
don't add integers to string literals (#410)
* string literal + integer means unintended and incorrect pointer arithmetic

fixes a clang warning. it could not be triggered, because it can only be
triggered if the string given to getopt is not covered among the
cases in the switch.

* handle review comment
2019-12-24 20:19:22 +01:00
Daniel Lemire b2ebdb0d07
I think we can align the numbers better (so it is prettier). (#399)
* I think we can align the numbers better (so it is prettier).

* Remove space before %, align third line better

Co-authored-by: John Keiser <john@johnkeiser.com>
2019-12-20 19:58:49 -05:00
John Keiser 60916318f7 Show miss rate, make it more accurate 2019-12-18 14:38:25 -08:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire f32b97733b Updating the json minifier benchmark to match that of the new API. 2019-12-02 10:46:03 -05:00
Paul Dreik 6d14afd80e
Make threads optional in the cmake build (#376)
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.

* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy paste comment error in the cirlce ci build conf
2019-11-22 21:51:46 +01:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
Daniel Lemire 6cd8fb7982
Adding a getline benchmark (#344) 2019-11-20 20:33:16 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* merging main

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* rough prototype working.  Needs more test and fine tuning.

* minor fixes and cleaning.

* adding jsonstream to amalgamation

* merged main into branch

* Addind a JsonStream Demo to Amalgamation

* merging main

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
John Keiser de8df0a05f Combined performance patch (5% overall, 15% stage 1) (#317)
* Allow -f

* Support parse -s (force sse)

* Simplify flatten_bits

- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead

* De-unroll the flatten_bits loops

* Decrease dependencies in stage 1

- Do all finalize_structurals work before computing the quote mask; mask
  out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
  single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
  dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
  achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
  invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
  or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
  descriptiveness and possible lifting into a generic simd parsing library

* Mark branches as likely/unlikely

* Reorder and unroll+interleave stage 1 loop

* Nest the cnt > 16 branch inside cnt > 8
2019-10-01 12:01:08 -04:00
saka1 c1f27fb848 Accept large unsigned integers (#295)
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on  ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
2019-09-02 10:50:24 -04:00
Daniel Lemire 9f26355fe0
This should lower false positives. (#299) 2019-08-25 09:33:00 -04:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
John Keiser 85fb37b6ea Lower the bar for performance check 2019-08-16 12:34:28 -07:00
John Keiser c8d50a6060 Make perf validation more stable, check no-AVX as well (#275)
* Compare against v0.2.1, fail only if perf is less 6 times in a row

* Run AVX and no-AVX perf tests in Circle CI

* Set % difference threshold to 0.2%
2019-08-15 20:43:21 -04:00
Daniel Lemire 7f27e1e0e1 Merge branch 'master' of github.com:lemire/simdjson 2019-08-12 16:05:31 -04:00
John Keiser 875e2f9d0d check for performance degradation in CI (#270)
* Add -n and -w arguments

* Add Dockerfile that compares perf against master

* Add checkperf to .drone.yml

* Clone from github instead of .git since CI doesn't have .git
2019-08-12 16:03:56 -04:00
Daniel Lemire 5c538dd9d6 correcting weird formatting. 2019-08-12 15:56:15 -04:00
Daniel Lemire 144b10b35d
simdjson vs. JSON for Modern C++ (#247)
* New competitor.

* Fixing makefile.
2019-08-02 19:48:34 -04:00
Daniel Lemire 038b18edf1
Adding style scripts. (#243)
* Adding style scripts.
2019-08-01 16:09:26 -04:00
John Keiser bf59ba76f5 Fix most warnings on VS2019 (#241) 2019-07-31 17:43:45 -04:00
ioioioio c2eea8abba Style uniformization (#238)
* massive clang-format -style=LLVM

* naming harmonization

* adding commentary about sysinfoapi.h
2019-07-30 17:18:10 -04:00
Daniel Lemire eba02dc1b9 Runtime dispatch
* Attempt 1 - fn targeting

GCC won't work with templates with different targets, need to specialize all the way up the call stack.

* Compiles properly with cmake. Does not with the Makefile.

* Compilation works with Makefile

* instruction_set changes to architecture

* some aesthetic changes

* fix amalgation and tests + aesthetic changes

* This now compiles and passes tests under CLANG

* Minor correction.

* Trying to make it work on ARM

* Adding missing namespace

* Missing bracket

* Fixing minor compilation issues.

* Getting parse to use runtime dispatch

* Fixing amalgamation script.

* Making sure that NEON is supported.

* Fixing typo

* Merging https://github.com/lemire/simdjson/pull/229

* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser  (second part)

* Trying another way.

* Removing the paral.

* Fixing the make file

* Let us make the practice run long enough.

* Resolved the awful slowness.

* Cleaning the README.md

* With runtime dispatching, we should not need flags anymore.

* Changing isa detection file's name + fixing typos.
2019-07-28 22:46:33 -04:00
AmoghSubhedar 9aa2cd71b2 Fix benchmark cpp files (#225) 2019-07-26 08:37:52 -04:00
ioioioio 036f9d5a45 Merge branch 'master' of https://github.com/lemire/simdjson into Multiple_implementation_refactoring_stage2 2019-07-03 10:34:58 -04:00
ioioioio 3f24879157 Stage2 refactored to simplify multiple implementations 2019-07-02 17:12:00 -04:00
ioioioio 9230588ce8 conflicts are solved 2019-07-02 15:21:00 -04:00
Daniel Lemire aa78b70d69 Introducing a "native" instruction set so that you do not need to do #ifdef to select the right SIMD set all the time.
Fixing indentation.
Removing some obsolete WARN_UNUSED.
Fixing a weird warning with optind variable.
2019-07-01 14:18:30 -04:00
ioioioio 6723221a42 Refactoring stage1 to facilitate multiple implementations. 2019-06-28 15:14:42 -04:00
Daniel Lemire d7f7f1b200
Fixing issue. (#193) 2019-06-20 18:49:47 -04:00
Daniel Lemire b0e6bfa84c
Simpler iteration code (#190)
* Adding convenience method to simplify code.

* Simplifying the iteration code.
2019-06-12 16:29:24 -04:00
Daniel Lemire b1e8990654
Moving iterator functions in the header file (#189)
We want the compiler to inline hot functions in the iterators. Let us leave them in the header file. Please.
2019-06-11 21:09:58 -04:00
Daniel Lemire cdc75dec97 Adding GB/s to the table version of parse. 2019-06-03 13:45:34 -04:00
Daniel Lemire f0bee2ac8b Ease diagnostic with GHz reporting. 2019-06-03 13:24:54 -04:00
Daniel Lemire 295e481a2e Getting more precise timings (avoiding the overhead of linux perf. counters). 2019-06-03 10:59:07 -04:00
Daniel Lemire 5aaca27cda Making it practical to benchmark large files. 2019-05-31 20:33:16 -04:00
Daniel Lemire f220c1e9eb Removing bogus doc. 2019-05-31 19:48:52 -04:00