Commit Graph

115 Commits

Author SHA1 Message Date
John Keiser 4dc2adf7f8 Update README, add README examples 2020-02-18 08:37:07 -08:00
John Keiser 1f76737510 Make valstat-ish parse APIs 2020-02-18 08:37:07 -08:00
John Keiser bc8bc7d1a8
Lowercase Architecture and ErrorValues (#487)
ErrorValues -> error_code, Architecture -> architecture
2020-02-14 15:21:28 -08:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. (#473)
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonesense.
2020-01-30 17:16:41 -05:00
John Keiser 6978a0b8d4 Benchmark escapes (#464)
* Add escapes as a feature we benchmark

* Don't print effectiveness metric unless verbose is on
2020-01-27 09:58:14 -05:00
Daniel Lemire aea79912ec
Adding a "get_corpus" benchmark. (#456)
* Adding a "get_corpus" benchmark.

* Improving portability.
2020-01-20 17:27:25 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provide tests for it), as well as #401
2020-01-03 22:22:47 -05:00
John Keiser 60916318f7 Show miss rate, make it more accurate 2019-12-18 14:38:25 -08:00
John Keiser d7c83397e4 lookup+cont-check algorithm 2019-12-18 14:37:21 -08:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 (#396)
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
Daniel Lemire c9cd8e6211
PMULL is slow on ARM64, let us not rely on it? (#391) 2019-12-09 17:15:34 -05:00
John Keiser 7356b4532f Perform UTF-8 detection via flag lookup algorithm
- adds the alternative zwegner, range and lookup utf8 algorithms as well, for
ability to do "shootouts"
2019-11-25 11:49:44 -08:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* merging main

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* rough prototype working.  Needs more test and fine tuning.

* minor fixes and cleaning.

* adding jsonstream to amalgamation

* merged main into branch

* Addind a JsonStream Demo to Amalgamation

* merging main

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
Daniel Lemire 3439ce19c9
Adding a flag which allows us to disable AVX detection. This exposes a bug. (#356) 2019-11-06 10:39:26 -05:00
John Keiser 4bc128f07e Move compute_quote_mask to generic bitmask library 2019-11-05 13:44:04 -08:00
John Keiser c89d6bf68b Genericize utf-8 check 2019-11-05 13:37:32 -08:00
John Keiser 64872bddf4 Eliminate stage1_find_marks_flatten.h 2019-10-14 12:33:46 -07:00
John Keiser 9bbd6bd874 Move headers to implementation area
- jsoncharutils.h, numberparsing.h, simdprune_tables.h
2019-10-14 11:51:41 -07:00
John Keiser 69caa477fb Use struct for UTF-8 checks, remove templating
- Removes templating from simd_input, utf8_checker, and parse_string
- Make drone gcc run a lot faster
- Make drone clang run a little faster (NOTE:
https://hub.docker.com/r/silkeh/clang helps even more, but I wasn't sure
whether we wanted to trust that)
- Make drone arm run in parallel to get results quicker
2019-10-08 17:58:45 -07:00
Daniel Lemire 92334a8e28 Better tests. 2019-09-02 12:32:44 -04:00
John Keiser bf8083888d Validate perf against master, not v0.2.1 2019-08-26 13:35:18 -07:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
John Keiser c8d50a6060 Make perf validation more stable, check no-AVX as well (#275)
* Compare against v0.2.1, fail only if perf is less 6 times in a row

* Run AVX and no-AVX perf tests in Circle CI

* Set % difference threshold to 0.2%
2019-08-15 20:43:21 -04:00
John Keiser 875e2f9d0d check for performance degradation in CI (#270)
* Add -n and -w arguments

* Add Dockerfile that compares perf against master

* Add checkperf to .drone.yml

* Clone from github instead of .git since CI doesn't have .git
2019-08-12 16:03:56 -04:00
John Keiser f3c3afd4cd Use direct call to templated flatten_bits instead of if (#262)
* Use direct call to templated flatten_bits instead of if

* Put really_inline back on find_structural_bits_64
2019-08-08 15:09:17 -04:00
ioioioio 2a24567370
Replace macros by include files (#236) (#248)
* stage1 compiles without macros

* cleaning

* amalgation is weird but works

* macros are removed from stringparsing

* amalgation fixed

* Huge macros are removed.

* clang-format
2019-08-04 15:58:35 -04:00
Daniel Lemire 144b10b35d
simdjson vs. JSON for Modern C++ (#247)
* New competitor.

* Fixing makefile.
2019-08-02 19:48:34 -04:00
Daniel Lemire 065805d6e1 Fixing amalgamation. 2019-07-30 09:10:40 -04:00
Daniel Lemire a53d95099c
Intrinsic-based flatten (#234)
* Providing a flatten function with intrinsics (for Visual Studio).
2019-07-29 13:28:02 -04:00
Daniel Lemire eba02dc1b9 Runtime dispatch
* Attempt 1 - fn targeting

GCC won't work with templates with different targets, need to specialize all the way up the call stack.

* Compiles properly with cmake. Does not with the Makefile.

* Compilation works with Makefile

* instruction_set changes to architecture

* some aesthetic changes

* fix amalgation and tests + aesthetic changes

* This now compiles and passes tests under CLANG

* Minor correction.

* Trying to make it work on ARM

* Adding missing namespace

* Missing bracket

* Fixing minor compilation issues.

* Getting parse to use runtime dispatch

* Fixing amalgamation script.

* Making sure that NEON is supported.

* Fixing typo

* Merging https://github.com/lemire/simdjson/pull/229

* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser  (second part)

* Trying another way.

* Removing the paral.

* Fixing the make file

* Let us make the practice run long enough.

* Resolved the awful slowness.

* Cleaning the README.md

* With runtime dispatching, we should not need flags anymore.

* Changing isa detection file's name + fixing typos.
2019-07-28 22:46:33 -04:00
ioioioio bcabdfc1ae Json pointer (#220)
* json pointer support

* Addition of tests for the json pointer

* Adding a new tool for the JSON Pointer support, and some documentation.
2019-07-26 18:38:10 -04:00
Daniel Lemire 47c6490115 Tweaking makefile. 2019-07-11 11:54:17 -04:00
Daniel Lemire 8ace2ba194
tput may fail. (#210) 2019-07-09 13:24:50 -04:00
Daniel Lemire d7b9a29dc6 Adding comments. 2019-07-04 19:10:05 -04:00
Daniel Lemire 2b2d93b05f Various minor tweaks. 2019-07-04 17:19:05 -04:00
Daniel Lemire f598a8666c Removing duplicate flag. 2019-07-04 15:59:43 -04:00
Daniel Lemire 85b5ccf5ae Removing swear word. 2019-07-04 15:57:51 -04:00
Daniel Lemire 8914b12db5 On ARM NEON, we need crypto support. 2019-06-19 16:45:09 -04:00
Daniel Lemire cf6f231be6 Allowing users to provide additional flags. 2019-06-03 14:17:42 -04:00
Daniel Lemire ba8aa46cd0 Makefile: A git failure should not be considered a critical failure. 2019-05-29 12:03:07 -04:00
Daniel Lemire 47beaff152 Adding white-listing for memory sanitizer. 2019-05-19 11:18:54 -04:00
Daniel Lemire f75280ac9c
Fix for issue 150 (#162)
* Checks for issue 150. We run through the test files with sanitizers on.

* Fix for issue 150: the remaining issues were an overrun on the depth capacity and an "off-by-1" overrun on tape capacity.

* Improving makefile.

* Safer git submodule command.

* Getting get 'git' on circleci
2019-05-09 20:51:33 -04:00
Daniel Lemire f0574d492c
Fix for issue 154 (#157)
* Changes necessary to reproduce

https://github.com/lemire/simdjson/issues/154

* Fixing issue 154.
2019-05-08 22:33:11 -04:00
Daniel Lemire 0d81fd287e
With this commit we can do all tests with full sanitizers on, and get no warning (#132)
* Making sure we can run with the sanitizers on.
* Minor code simplification in the number parsing.
* Following @EmilGedda 's recommendations regarding the makefile.
* Reference to blog post.
* Adding link to https://johnnylee-sde.github.io/Fast-numeric-string-to-int/
* Better hex parsing.
2019-04-24 17:31:47 -04:00
Emil Gedda f4a06036c6 Allows additional C(XX)FLAGS to be passed through command line (#142)
* Allow passing additional compiler flags through command line

* Simplify branching for compiler flags

* Optimize for debug while debugging or sanitizing specified
2019-04-16 22:07:03 -04:00
Daniel Lemire df8f792183
Store the string lengths on the string tape (#101)
* Store string length in the string-tape item.
* Files are now limited to 4GB.
* Moving detection of unescaped chars to stage 1 to reduce the burden due to string parsing.

Fixes https://github.com/lemire/simdjson/issues/114

Fixes https://github.com/lemire/simdjson/issues/87
2019-03-13 19:32:57 -04:00
Thomas Navennec 352dd5e7fa Change parse_json return type from bool to int (#82)
* Added simdjerr namespace

* Updated jsonparser files

* updated stage1 and stage2

* removed stage2 inline function

* Added forgotten return statements

* Updated tools and benchmarks

* Corrected parenthesis

* Removed extra =

* Accidentally undid reinterpret_cast

* Better comments, undid a header name fuckup

* Added an errorMsg method, updated readme

* Removed useless header from stage2

* Updated single-header file

* added simdjerr.cpp contents to simdjson.cpp

* Made single header version work

* Updated singleheader test, fixed simdjson.cpp

* Renamed simdjerr namespace and files to simdjson

* Updating the amalgamation.
2019-03-02 17:18:45 -05:00