Commit Graph

374 Commits

Author SHA1 Message Date
Paul Dreik 6d14afd80e
Make threads optional in the cmake build (#376)
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.

* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy-paste comment error in the circle ci build conf
2019-11-22 21:51:46 +01:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
Daniel Lemire 6cd8fb7982
Adding a getline benchmark (#344) 2019-11-20 20:33:16 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* merging main

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* rough prototype working.  Needs more test and fine tuning.

* minor fixes and cleaning.

* adding jsonstream to amalgamation

* merged main into branch

* Adding a JsonStream Demo to Amalgamation

* merging main

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
John Keiser de8df0a05f Combined performance patch (5% overall, 15% stage 1) (#317)
* Allow -f

* Support parse -s (force sse)

* Simplify flatten_bits

- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead

* De-unroll the flatten_bits loops

* Decrease dependencies in stage 1

- Do all finalize_structurals work before computing the quote mask; mask
  out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
  single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
  dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
  achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
  invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
  or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
  descriptiveness and possible lifting into a generic simd parsing library

* Mark branches as likely/unlikely

* Reorder and unroll+interleave stage 1 loop

* Nest the cnt > 16 branch inside cnt > 8
2019-10-01 12:01:08 -04:00
saka1 c1f27fb848 Accept large unsigned integers (#295)
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on  ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
2019-09-02 10:50:24 -04:00
Daniel Lemire 9f26355fe0
This should lower false positives. (#299) 2019-08-25 09:33:00 -04:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
John Keiser 85fb37b6ea Lower the bar for performance check 2019-08-16 12:34:28 -07:00
John Keiser c8d50a6060 Make perf validation more stable, check no-AVX as well (#275)
* Compare against v0.2.1, fail only if perf is less 6 times in a row

* Run AVX and no-AVX perf tests in Circle CI

* Set % difference threshold to 0.2%
2019-08-15 20:43:21 -04:00
Daniel Lemire 7f27e1e0e1 Merge branch 'master' of github.com:lemire/simdjson 2019-08-12 16:05:31 -04:00
John Keiser 875e2f9d0d check for performance degradation in CI (#270)
* Add -n and -w arguments

* Add Dockerfile that compares perf against master

* Add checkperf to .drone.yml

* Clone from github instead of .git since CI doesn't have .git
2019-08-12 16:03:56 -04:00
Daniel Lemire 5c538dd9d6 correcting weird formatting. 2019-08-12 15:56:15 -04:00
Daniel Lemire 144b10b35d
simdjson vs. JSON for Modern C++ (#247)
* New competitor.

* Fixing makefile.
2019-08-02 19:48:34 -04:00
Daniel Lemire 038b18edf1
Adding style scripts. (#243)
* Adding style scripts.
2019-08-01 16:09:26 -04:00
John Keiser bf59ba76f5 Fix most warnings on VS2019 (#241) 2019-07-31 17:43:45 -04:00
ioioioio c2eea8abba Style uniformization (#238)
* massive clang-format -style=LLVM

* naming harmonization

* adding commentary about sysinfoapi.h
2019-07-30 17:18:10 -04:00
Daniel Lemire eba02dc1b9 Runtime dispatch
* Attempt 1 - fn targeting

GCC won't work with templates with different targets, need to specialize all the way up the call stack.

* Compiles properly with cmake. Does not with the Makefile.

* Compilation works with Makefile

* instruction_set changes to architecture

* some aesthetic changes

* fix amalgamation and tests + aesthetic changes

* This now compiles and passes tests under CLANG

* Minor correction.

* Trying to make it work on ARM

* Adding missing namespace

* Missing bracket

* Fixing minor compilation issues.

* Getting parse to use runtime dispatch

* Fixing amalgamation script.

* Making sure that NEON is supported.

* Fixing typo

* Merging https://github.com/lemire/simdjson/pull/229

* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser  (second part)

* Trying another way.

* Removing the paral.

* Fixing the make file

* Let us make the practice run long enough.

* Resolved the awful slowness.

* Cleaning the README.md

* With runtime dispatching, we should not need flags anymore.

* Changing isa detection file's name + fixing typos.
2019-07-28 22:46:33 -04:00
AmoghSubhedar 9aa2cd71b2 Fix benchmark cpp files (#225) 2019-07-26 08:37:52 -04:00
ioioioio 036f9d5a45 Merge branch 'master' of https://github.com/lemire/simdjson into Multiple_implementation_refactoring_stage2 2019-07-03 10:34:58 -04:00
ioioioio 3f24879157 Stage2 refactored to simplify multiple implementations 2019-07-02 17:12:00 -04:00
ioioioio 9230588ce8 conflicts are solved 2019-07-02 15:21:00 -04:00
Daniel Lemire aa78b70d69 Introducing a "native" instruction set so that you do not need to do #ifdef to select the right SIMD set all the time.
Fixing indentation.
Removing some obsolete WARN_UNUSED.
Fixing a weird warning with optind variable.
2019-07-01 14:18:30 -04:00
ioioioio 6723221a42 Refactoring stage1 to facilitate multiple implementations. 2019-06-28 15:14:42 -04:00
Daniel Lemire d7f7f1b200
Fixing issue. (#193) 2019-06-20 18:49:47 -04:00
Daniel Lemire b0e6bfa84c
Simpler iteration code (#190)
* Adding convenience method to simplify code.

* Simplifying the iteration code.
2019-06-12 16:29:24 -04:00
Daniel Lemire b1e8990654
Moving iterator functions in the header file (#189)
We want the compiler to inline hot functions in the iterators. Let us leave them in the header file. Please.
2019-06-11 21:09:58 -04:00
Daniel Lemire cdc75dec97 Adding GB/s to the table version of parse. 2019-06-03 13:45:34 -04:00
Daniel Lemire f0bee2ac8b Ease diagnostic with GHz reporting. 2019-06-03 13:24:54 -04:00
Daniel Lemire 295e481a2e Getting more precise timings (avoiding the overhead of linux perf. counters). 2019-06-03 10:59:07 -04:00
Daniel Lemire 5aaca27cda Making it practical to benchmark large files. 2019-05-31 20:33:16 -04:00
Daniel Lemire f220c1e9eb Removing bogus doc. 2019-05-31 19:48:52 -04:00
Daniel Lemire 642132920f Fixing performance regression caused by helpful code contributions
that moved inlineable functions into the source file combined with
helpful compilers which aren't smart enough to do the inlining in
any case.
2019-05-31 18:16:12 -04:00
Daniel Lemire 4e7e7d99cc We do not want to check unified_machine against a bool now that we return an integer. 2019-05-31 11:11:23 -04:00
Daniel Lemire 6b5231f930 Just improving the look and feel of 'parsingcompetition'. 2019-05-24 20:08:06 -04:00
Daniel Lemire 2c7a9734af Updating parsingcompetition to the new API. 2019-05-24 19:28:21 -04:00
Daniel Lemire e370a65383
Fix for issues 32, 50, 131, 137
* Improving portability.

* Revisiting faulty logic regarding same-page overruns.

* Disabling same-page overruns under VS.

* Clarifying the documentation

* Fix for issue 131 + being more explicit regarding memory realloc.

* Fix for issue 137.

* removing "using namespace std" throughout. Fix for 50

* Introducing typed malloc/free.

* Introducing a custom class (padded_string) that solves several minor usability issues.

* Updating amalgamation for testing.
2019-05-09 17:59:51 -04:00
Daniel Lemire 0d81fd287e
With this commit we can do all tests with full sanitizers on, and get no warning (#132)
* Making sure we can run with the sanitizers on.
* Minor code simplification in the number parsing.
* Following @EmilGedda 's recommendations regarding the makefile.
* Reference to blog post.
* Adding link to https://johnnylee-sde.github.io/Fast-numeric-string-to-int/
* Better hex parsing.
2019-04-24 17:31:47 -04:00
Geoff Langdale 5578401a0f benchmark/parse.cpp doesn't need intrinsics for itself. 2019-03-21 11:29:17 +11:00
Daniel Lemire df8f792183
Store the string lengths on the string tape (#101)
* Store string length in the string-tape item.
* Files are now limited to 4GB.
* Moving detection of unescaped chars to stage 1 to reduce the burden due to string parsing.

Fixes https://github.com/lemire/simdjson/issues/114

Fixes https://github.com/lemire/simdjson/issues/87
2019-03-13 19:32:57 -04:00
myd7349 2851ea490c Export CMake targets (#96) 2019-03-04 16:07:06 -05:00
Thomas Navennec 352dd5e7fa Change parse_json return type from bool to int (#82)
* Added simdjerr namespace

* Updated jsonparser files

* updated stage1 and stage2

* removed stage2 inline function

* Added forgotten return statements

* Updated tools and benchmarks

* Corrected parenthesis

* Removed extra =

* Accidentally undid reinterpret_cast

* Better comments, undid a header name fuckup

* Added an errorMsg method, updated readme

* Removed useless header from stage2

* Updated single-header file

* added simdjerr.cpp contents to simdjson.cpp

* Made single header version work

* Updated singleheader test, fixed simdjson.cpp

* Renamed simdjerr namespace and files to simdjson

* Updating the amalgamation.
2019-03-02 17:18:45 -05:00
Kai Wolf 772919ef11 Use unique_ptr instead of new/delete 2019-02-25 21:03:20 +01:00
Kai Wolf b521719b6f Fix old-style C-Casts 2019-02-23 17:31:38 +01:00
Kai Wolf ff22e75f95 Apply minor readability fixes 2019-02-23 17:28:20 +01:00
Geoff Langdale 3d30fd5440 Fixed a stage number message and we now fail out if no structural chars from stage 1 2019-02-23 10:51:45 +11:00
Thomas Navennec 9606343b2c ParsedJson & ParsedJson::iterator definitions in .cpp files (#47)
* Minor change to benchmark cmake

* Moved ParsedJson and its Iterator to separate .cpp files

* Uncommented functions, that has nothing to do with this pr

* Removed really_inline comments

* Reinstated some inline functions to restore previous performance

* Re-merged iterator in ParsedJson

* Uncommented some WARN_UNUSED
2019-02-22 14:38:35 -05:00
Daniel Lemire 1b115dbd3a Adding jsoncpp 2019-01-24 14:28:26 -05:00
Daniel Lemire c901865ac8 Including more cases. 2019-01-17 19:21:09 -05:00
Daniel Lemire 974babf69f Adding more competition. 2019-01-17 17:24:29 -05:00
Daniel Lemire 86de53ab17 Minor tweaks. 2019-01-03 19:05:21 +00:00
Daniel Lemire 741f3c8c7d updating stat model 2019-01-02 16:47:35 -05:00
Daniel Lemire e92b19a692 Saving... 2019-01-01 15:16:44 -05:00
Daniel Lemire df65355ded More details. 2019-01-01 15:12:51 -05:00
Daniel Lemire f1ee507bca More details. 2019-01-01 14:39:35 -05:00
Daniel Lemire c3e2ec1618 Cleaning. 2018-12-31 17:39:06 -05:00
Daniel Lemire 3ce1dd8087 Cleaning. 2018-12-31 17:13:32 -05:00
Daniel Lemire 58d41923fd
Porting to visual studio
Now builds on Visual Studio
2018-12-30 21:00:19 -05:00
Daniel Lemire 386bebb33b adding support for cmake. 2018-12-28 13:13:10 -05:00
Daniel Lemire 3b24ba9043 Adding cmake 2018-12-28 13:05:42 -05:00
Daniel Lemire bf4089b33b Removing custom types (more standard code). 2018-12-27 20:09:25 -05:00
Daniel Lemire d7e8d53a2a Cleaning. 2018-12-27 17:39:32 -05:00
Daniel Lemire 8db5e6d044 Tweaking. 2018-12-27 17:39:17 -05:00
Daniel Lemire 2654388c52 Saving... 2018-12-27 17:10:19 -05:00
Daniel Lemire a75ef43a2f Refreshing. 2018-12-24 17:07:44 -05:00
Daniel Lemire 8db5da9ffe Adding cannonlake results. 2018-12-24 15:30:25 -05:00
Daniel Lemire 0c97db52c5 Fixing things up 2018-12-24 15:20:00 -05:00
Daniel Lemire 579bc0d848 Done analysis. 2018-12-24 15:10:55 -05:00
Daniel Lemire 15719a0d0d Tweaking. 2018-12-24 13:18:19 -05:00
Daniel Lemire 3f157d955c Got model. 2018-12-24 12:36:45 -05:00
Daniel Lemire 061c62a5da Let us try this. 2018-12-24 12:28:27 -05:00
Daniel Lemire e979a0c93f Simplifying the build 2018-12-19 00:40:04 -05:00
Daniel Lemire 3b7830002a Cleaning 2018-12-18 22:48:24 -05:00
Daniel Lemire 14b55ab77f Preparing new version with plotting. 2018-12-18 22:18:23 -05:00
Daniel Lemire b1b5665343 Cleaning. 2018-12-14 21:45:38 -05:00
Daniel Lemire 0769c39e27 Ok. Looks complete. 2018-12-14 21:32:42 -05:00
Daniel Lemire dfabee5b80 Tmp file. 2018-12-12 23:01:36 -05:00
Daniel Lemire 05a2547829 Adding benchmark. 2018-12-12 22:42:19 -05:00
Daniel Lemire 1dad5d49a6 Adding remark. 2018-12-12 12:10:23 -05:00
Daniel Lemire 15161669ec Added a version of RapidJSON with static alloc. 2018-12-12 10:19:32 -05:00
Daniel Lemire 751dce98f5 Getting there slowly. 2018-12-11 22:39:39 -05:00
Daniel Lemire 0b48fb8bd7 Removing memory leaks. 2018-12-11 17:20:29 -05:00
Daniel Lemire e8d3d784ab More fixing. 2018-12-10 22:21:03 -05:00
Daniel Lemire 05636f3a1d Cleaning. 2018-12-10 16:47:02 -05:00
Daniel Lemire 8615760331 Should now pass. 2018-12-10 15:16:31 -05:00
Daniel Lemire 176d2ccda4 Tweaking. 2018-12-10 14:25:49 -05:00
Daniel Lemire 71cdb8d825 Adding memory allocations as part of the benchmark 2018-12-06 22:34:18 -05:00
Daniel Lemire beb030fc16 Tweaking 2018-12-06 22:23:57 -05:00
Daniel Lemire c2913d5d69 Adding dynamic memory allocation. 2018-12-06 21:44:26 -05:00
Daniel Lemire 8589a0588b More clever parse function. 2018-12-06 17:40:32 -05:00
Daniel Lemire c8706c66ec Solving some build issues 2018-12-05 21:33:32 -05:00
Daniel Lemire e3a4b41c2e Cleaning. 2018-11-30 22:02:32 -05:00
Daniel Lemire c11eefca32 More cleaning. 2018-11-30 21:31:05 -05:00
Daniel Lemire a8b99984f2 Intermediate step. 2018-11-30 20:27:16 -05:00
Daniel Lemire e5707331e9 Some refactoring. 2018-11-30 09:37:57 -05:00
Daniel Lemire 12b518578d Ok, the new code seems quite fast. 2018-11-29 22:15:02 -05:00
Daniel Lemire f143d4e3f4 Adding cache access stats. 2018-11-28 10:53:57 -05:00
Daniel Lemire 8648c4108e More cleaning. 2018-11-27 20:42:35 -05:00
Daniel Lemire 58ac242770 Ok. Let us benchmark this thing. 2018-11-27 15:05:50 -05:00
Daniel Lemire a43b0772e1 Lots and lots of cleaning. 2018-11-27 14:37:59 -05:00
Daniel Lemire 5fae7b2100 Still working 2018-11-27 10:10:39 -05:00
Daniel Lemire 86a75462c5 Adding the ability of doing a dump. 2018-11-23 22:20:57 -05:00
Daniel Lemire 5bdf19bb18 Removing parsers that are unfair. 2018-11-20 20:08:02 -05:00
Daniel Lemire 21ee490d18 Added memcpy. 2018-11-20 18:06:03 -05:00
Daniel Lemire e4d4158e3f Added dependencies. 2018-11-20 16:43:22 -05:00
Daniel Lemire bbff6c3edb Added another ref. 2018-11-20 14:32:12 -05:00
Daniel Lemire 7647cb2e49 Added dropbox 2018-11-20 14:09:43 -05:00
Daniel Lemire 76074a821f Various cleaning steps. 2018-11-09 21:31:14 -05:00
Daniel Lemire df65de4ae2 Tuning presentation and fixing a problem with minifier benchmark. 2018-10-23 21:36:32 -04:00
Daniel Lemire 6cc5131f7a Adding an allparserscheckfile program. 2018-10-17 12:00:44 -04:00
Daniel Lemire aeacd26366 Adding mispredicted branch counts. 2018-10-04 09:47:34 -04:00
Daniel Lemire 930533b6da Normalizing the number of cycles. 2018-10-04 09:33:41 -04:00
Daniel Lemire 577d6792f4 Integrating sajson. 2018-09-28 00:00:52 -04:00
Daniel Lemire 1c8339297d With new number parser (faster!). Removing the dependency on the doubleconv library (which proves to be useless). 2018-09-26 23:35:33 -04:00
Geoff Langdale 9f91650e72 Remove old 4-stage path. 2018-09-26 15:22:55 +10:00
Daniel Lemire dee1bbe54e Integrating the new 3-stage approach. 2018-09-25 17:26:58 -04:00
Geoff Langdale 77bfe6c984 Fix some bad messages and the failure to parse key strings. 2018-09-24 10:54:29 +10:00
Geoff Langdale 053f04b15d Crude first cut of "stage34", a unified code-based DFA with explicit stack for stages 3 and 4. 2018-09-24 10:42:30 +10:00
Daniel Lemire 94ea7cefb0 Moving include files into a sensible subdirectory. 2018-08-20 17:51:38 -04:00
Daniel Lemire ef0d14c35c Minor fixes + new scripts. 2018-08-20 17:40:50 -04:00
Daniel Lemire e76d25425a Another missing file. 2018-08-20 17:30:30 -04:00
Daniel Lemire f814bf6eab Missing file. 2018-08-20 17:30:00 -04:00
Daniel Lemire fb65be64bb Major surgery. 2018-08-20 17:27:25 -04:00
Daniel Lemire 726eb5a030 Moved the files into subdirectories. 2018-08-20 14:45:51 -04:00