Commit Graph

137 Commits

Author SHA1 Message Date
John Keiser 835b640ebd Run quickstart example in CI 2020-03-29 15:45:12 -07:00
John Keiser 5aec2671ea Remove JsonStream. Use parse_many() instead. 2020-03-26 09:25:07 -07:00
John Keiser a0bce440a6 Remove document_iterator, document::iterator, ParsedJsonIterator
Keep ParsedJson::Iterator only, without template, in same form as
it was in 0.2
2020-03-25 18:26:51 -07:00
Daniel Lemire 5d1e3efce8
faster minifier (#568)
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.

Co-authored-by: John Keiser <john@johnkeiser.com>
2020-03-20 16:14:47 -04:00
John Keiser f1744f5495 Break out string/structural scanning from tokenizer 2020-03-18 10:40:06 -07:00
Daniel Lemire 5750a173ff
We should not need to specify the architecture under ARM. (#564) 2020-03-18 08:24:21 -04:00
Daniel Lemire e3a4fd9f93 Minor tweaks on fallback. 2020-03-17 14:59:47 -07:00
John Keiser 7cf3a7511b Add fallback implementation to CI
- Also add SIMDJSON_IMPLEMENTATION_HASWELL/WESTMERE/ARM64/FALLBACK=1/0 to
enable/disable various implemnentations
2020-03-17 14:59:47 -07:00
John Keiser af203aaf86 Add fallback parser for pre-SSE4.2 machines 2020-03-17 14:59:47 -07:00
John Keiser 1aaad223c0 Simplify atom parsing 2020-03-13 19:00:08 -07:00
John Keiser ac0899c043 Add error tests, doc_ref_result[] chaining 2020-03-11 17:19:41 -07:00
John Keiser 40c6213d7e Add parser.load() and load_many() to load files 2020-03-11 17:19:41 -07:00
John Keiser 31e8a12e88 Make error_message(error_code) return C string
- Also move all error message logic to include inline
2020-03-06 15:41:51 -08:00
John Keiser 9a7c8fb5be Use parse_many in examples/tests/docs 2020-03-05 12:04:45 -08:00
John Keiser cfef4ff2ad Create parser.parse_many() API 2020-03-05 12:04:45 -08:00
John Keiser 5ff941ae3d Include <simdjson> directly, move document_parser_callbacks to top level 2020-03-04 14:26:54 -08:00
John Keiser a55f41a24a Move JsonStream inline implementation to inline .h 2020-03-04 14:26:54 -08:00
John Keiser eb147d9868 Mark jsonformatutils.h/isadetection.h internal
- Move jsonformatutils.h to internal/jsonformatutils.h (it is used by
document::print_json)
- Move isadetection.h to src/ (it is only used internally)
2020-03-04 14:26:54 -08:00
John Keiser f58a5d534e Move parser inline implementation to .cpp 2020-03-04 14:26:54 -08:00
John Keiser b3ea8c406e Add simdjson.cpp for unified use (#515) 2020-03-04 10:12:27 -08:00
Daniel Lemire a98d841983
Minor tweaks. (#507) 2020-02-24 17:13:10 -05:00
John Keiser 910f272467
Add parser implementation interface and selection API (#501)
* Make architecture implementations virtual functions

- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions

* Move implementation static methods to their own classes

* Detect best supported implementation on first use

* available_implementationsI() -> available_implementations
2020-02-21 16:34:27 -05:00
John Keiser 4dc2adf7f8 Update README, add README examples 2020-02-18 08:37:07 -08:00
John Keiser 1f76737510 Make valstat-ish parse APIs 2020-02-18 08:37:07 -08:00
John Keiser bc8bc7d1a8
Lowercase Architecture and ErrorValues (#487)
ErrorValues -> error_code, Architecture -> architecture
2020-02-14 15:21:28 -08:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. (#473)
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonesense.
2020-01-30 17:16:41 -05:00
John Keiser 6978a0b8d4 Benchmark escapes (#464)
* Add escapes as a feature we benchmark

* Don't print effectiveness metric unless verbose is on
2020-01-27 09:58:14 -05:00
Daniel Lemire aea79912ec
Adding a "get_corpus" benchmark. (#456)
* Adding a "get_corpus" benchmark.

* Improving portability.
2020-01-20 17:27:25 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provide tests for it), as well as #401
2020-01-03 22:22:47 -05:00
John Keiser 60916318f7 Show miss rate, make it more accurate 2019-12-18 14:38:25 -08:00
John Keiser d7c83397e4 lookup+cont-check algorithm 2019-12-18 14:37:21 -08:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 (#396)
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
Daniel Lemire c9cd8e6211
PMULL is slow on ARM64, let us not rely on it? (#391) 2019-12-09 17:15:34 -05:00
John Keiser 7356b4532f Perform UTF-8 detection via flag lookup algorithm
- adds the alternative zwegner, range and lookup utf8 algorithms as well, for
ability to do "shootouts"
2019-11-25 11:49:44 -08:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* merging main

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* rough prototype working.  Needs more test and fine tuning.

* minor fixes and cleaning.

* adding jsonstream to amalgamation

* merged main into branch

* Addind a JsonStream Demo to Amalgamation

* merging main

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
Daniel Lemire 3439ce19c9
Adding a flag which allows us to disable AVX detection. This exposes a bug. (#356) 2019-11-06 10:39:26 -05:00
John Keiser 4bc128f07e Move compute_quote_mask to generic bitmask library 2019-11-05 13:44:04 -08:00
John Keiser c89d6bf68b Genericize utf-8 check 2019-11-05 13:37:32 -08:00
John Keiser 64872bddf4 Eliminate stage1_find_marks_flatten.h 2019-10-14 12:33:46 -07:00
John Keiser 9bbd6bd874 Move headers to implementation area
- jsoncharutils.h, numberparsing.h, simdprune_tables.h
2019-10-14 11:51:41 -07:00
John Keiser 69caa477fb Use struct for UTF-8 checks, remove templating
- Removes templating from simd_input, utf8_checker, and parse_string
- Make drone gcc run a lot faster
- Make drone clang run a little faster (NOTE:
https://hub.docker.com/r/silkeh/clang helps even more, but I wasn't sure
whether we wanted to trust that)
- Make drone arm run in parallel to get results quicker
2019-10-08 17:58:45 -07:00
Daniel Lemire 92334a8e28 Better tests. 2019-09-02 12:32:44 -04:00
John Keiser bf8083888d Validate perf against master, not v0.2.1 2019-08-26 13:35:18 -07:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
John Keiser c8d50a6060 Make perf validation more stable, check no-AVX as well (#275)
* Compare against v0.2.1, fail only if perf is less 6 times in a row

* Run AVX and no-AVX perf tests in Circle CI

* Set % difference threshold to 0.2%
2019-08-15 20:43:21 -04:00
John Keiser 875e2f9d0d check for performance degradation in CI (#270)
* Add -n and -w arguments

* Add Dockerfile that compares perf against master

* Add checkperf to .drone.yml

* Clone from github instead of .git since CI doesn't have .git
2019-08-12 16:03:56 -04:00
John Keiser f3c3afd4cd Use direct call to templated flatten_bits instead of if (#262)
* Use direct call to templated flatten_bits instead of if

* Put really_inline back on find_structural_bits_64
2019-08-08 15:09:17 -04:00