Commit Graph

24 Commits

Author SHA1 Message Date
John Keiser 165e23773f Refactor stage 2 into structural_parser class 2020-01-02 13:12:22 -07:00
John Keiser d7c83397e4 lookup+cont-check algorithm 2019-12-18 14:37:21 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 (#396)
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
John Keiser 7356b4532f Perform UTF-8 detection via flag lookup algorithm
- adds the alternative zwegner, range and lookup utf8 algorithms as well, for
ability to do "shootouts"
2019-11-25 11:49:44 -08:00
John Keiser 7d7bec856d Remove lookup_lower_4_bits
It's only a coincidence that it works in current uses: it doesn't do
what the name says. Particularly, if the high bit is 1 it will yield
0 even if the lower 4 bits would yield something else.
2019-11-25 11:49:44 -08:00
John Keiser ce824f8653 Decrease stage 1 step size to 64 bytes on Westmere/ARM
- Templatize scan_step() with STAGE1_STEP_SIZE
- Fix simd8::store()
- add NUM_CHUNKS to simd8
2019-11-18 21:58:07 -08:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* merging main

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* bug fix where JsonStream would bug on rare cases.

* Addind a JsonStream Demo to Amalgamation

* rough prototype working.  Needs more test and fine tuning.

* minor fixes and cleaning.

* adding jsonstream to amalgamation

* merged main into branch

* Addind a JsonStream Demo to Amalgamation

* merging main

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
John Keiser 3828e1e538 Fix performance issues:
1. Don't recast "int" result of movemask to uint32_t
2. Call max_epu8 with the mask first and the bytes second.
2019-11-05 13:44:04 -08:00
John Keiser d89046d515 Use simd8 helpers for find_bs_bits_and_quote_bits 2019-11-05 13:44:04 -08:00
John Keiser 4bc128f07e Move compute_quote_mask to generic bitmask library 2019-11-05 13:44:04 -08:00
John Keiser e383b7a6ab Use generic simd operators for find_whitespace_and_operators 2019-11-05 13:37:56 -08:00
John Keiser c89d6bf68b Genericize utf-8 check 2019-11-05 13:37:32 -08:00
John Keiser 64872bddf4 Eliminate stage1_find_marks_flatten.h 2019-10-14 12:33:46 -07:00
John Keiser 9bbd6bd874 Move headers to implementation area
- jsoncharutils.h, numberparsing.h, simdprune_tables.h
2019-10-14 11:51:41 -07:00
John Keiser 69caa477fb Use struct for UTF-8 checks, remove templating
- Removes templating from simd_input, utf8_checker, and parse_string
- Make drone gcc run a lot faster
- Make drone clang run a little faster (NOTE:
https://hub.docker.com/r/silkeh/clang helps even more, but I wasn't sure
whether we wanted to trust that)
- Make drone arm run in parallel to get results quicker
2019-10-08 17:58:45 -07:00
John Keiser de8df0a05f Combined performance patch (5% overall, 15% stage 1) (#317)
* Allow -f

* Support parse -s (force sse)

* Simplify flatten_bits

- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead

* De-unroll the flatten_bits loops

* Decrease dependencies in stage 1

- Do all finalize_structurals work before computing the quote mask; mask
  out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
  single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
  dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
  achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
  invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
  or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
  descriptiveness and possible lifting into a generic simd parsing library

* Mark branches as likely/unlikely

* Reorder and unroll+interleave stage 1 loop

* Nest the cnt > 16 branch inside cnt > 8
2019-10-01 12:01:08 -04:00
John Keiser f7e893667d Use simd_input generic methods for utf8 checking (#301)
* Use generic each/reduce in simdutf8check

* Remove macros from generic simd_input uses

* Use array instead of members to store simd registers

* Default local checkperf to clone from .
2019-09-02 12:46:05 -04:00
John Keiser 7f249cd179 Use non-interleaved map() to make structurals clearer (#304) 2019-08-29 21:38:41 -04:00
John Keiser f4fa5b7340 Add MAP_CHUNKS2, make parameter name related to input 2019-08-26 09:46:49 -07:00
John Keiser 169568ca47 Use map() to interleave instructions for parallelism 2019-08-26 09:46:49 -07:00
John Keiser 9cc4ddfc88 Use map().to_bitmask() instead of build_bitmask() 2019-08-26 09:46:49 -07:00
John Keiser da0f1cacea Remove static modifiers 2019-08-26 09:46:48 -07:00
John Keiser b01222518d Genericize bitmask building to make algorithms clearer 2019-08-26 09:46:48 -07:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00