Commit Graph

228 Commits

Author SHA1 Message Date
John Keiser 76c706644a
Move stage 2 tape writing to ParsedJson ()
This is a first step to allowing alternate tape formats.
2020-02-04 14:28:42 -08:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. ()
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonsense.
2020-01-30 17:16:41 -05:00
Daniel Lemire 28710f8ad5
fix for Issue 467 ()
* Fix for issue467

* Updating single-header

* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.

* Fixing parse_stream

* Updating documentation.
2020-01-29 19:00:18 -05:00
Daniel Lemire 3488c49d0a
Basically, haswell processor should be able to count on lzcnt. () 2020-01-22 16:52:55 -05:00
John Keiser adaef43bc6 Find all escaped characters with simpler algorithm () 2020-01-22 14:11:14 -05:00
Daniel Lemire 80b4dd2e8a
Removing all stdout, stderr from main library. ()
* Removing all stdout,stderr from main library.
2020-01-20 16:03:15 -05:00
Daniel Lemire ab6d4871d8
Adding haswell amal. tests ()
* Adding an extra test.

* Disabling the AVX-accelerated minifier.

* Updating amalgamation.
2020-01-15 19:49:11 -05:00
Daniel Lemire f611b65bc0
This updates the minifier. () 2020-01-15 13:45:32 -05:00
Daniel Lemire a804351a76
I think that i and idx should be size_t (64-bit). () 2020-01-13 17:42:52 -05:00
dbj 85e84fc1fa improved string padded ()
* dirent portable latest version

* improved

std::string argument passed by const reference
ctor added with std::string_view  argument
`allocate_padded_buffer()`  moved here with **optional** check on `length < 1`

* allocate_padded_buffer moved to padded_string.h
2020-01-10 10:15:48 -05:00
Daniel Lemire 951c4bedf8
Simpler jsonstream ()
* One simplification.

* Removing untested functions.
2020-01-07 19:10:02 -05:00
Daniel Lemire 4c0c1c9830 Updating a comment. 2020-01-06 22:01:23 -05:00
Daniel Lemire a9e990251d
removing left over debug 2020-01-04 12:50:04 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream ()
Fixes  (and provide tests for it), as well as 
2020-01-03 22:22:47 -05:00
Daniel Lemire 5042dd52ce
This is implementing @jkeiser optimization idea. () 2020-01-03 09:21:36 -05:00
Daniel Lemire a2d05b21ff Merge branch 'master' of github.com:lemire/simdjson 2020-01-02 15:27:00 -05:00
Daniel Lemire f4f5f670a2 Better documentation of the padding. 2020-01-02 15:25:03 -05:00
John Keiser 165e23773f Refactor stage 2 into structural_parser class 2020-01-02 13:12:22 -07:00
Paul Dreik 399d08c86c use unique_ptr in class parsedjson ()
* refactor parsedjson to use unique_ptr instead of owning raw pointer
* fix a potential undefined behavior
* output only first cpu in /proc/cpuinfo
2019-12-31 14:31:45 -05:00
Daniel Lemire 6f799435b6 Removing commented out stuff. 2019-12-30 22:21:04 -05:00
John Keiser d7c83397e4 lookup+cont-check algorithm 2019-12-18 14:37:21 -08:00
Daniel Lemire 1d621bba37 Being more explicit about EMPTY errors. 2019-12-18 14:39:48 +00:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 ()
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
Daniel Lemire f02babe427 Adding analysis by @sebpop from https://github.com/lemire/simdjson/pull/391#issuecomment-565551462 2019-12-13 13:39:15 -05:00
Daniel Lemire fc6133b58f
Fixes issue 388 () 2019-12-11 08:13:29 -05:00
mswilson d33208c7db Correct detection of NEON support ()
... as the test as it is currently implemented will always evaluate to true.

Fixes 
2019-12-10 13:12:17 -05:00
Daniel Lemire c9cd8e6211
PMULL is slow on ARM64, let us not rely on it? () 2019-12-09 17:15:34 -05:00
Daniel Lemire 1211c01ca1
Resolves issue 186 ()
* Resolves issue 186
https://github.com/lemire/simdjson/issues/186
2019-12-02 12:23:45 -05:00
Jeremie Piotte 4e1c90f76f
Fix memory allocation of the max_depth in JsonStream. 2019-11-28 13:55:31 -05:00
Jeremie Piotte f163155929 JsonStream documentation ()
* adding Multiline JSON competition chart to doc
* Completing the comments for JsonStream
* Adding a page for JsonStream's documentation.
2019-11-25 18:11:55 -05:00
John Keiser 9b6377fd80 Precalculate the ASCII path 2019-11-25 11:49:44 -08:00
John Keiser 7356b4532f Perform UTF-8 detection via flag lookup algorithm
- adds the alternative zwegner, range and lookup utf8 algorithms as well, for
ability to do "shootouts"
2019-11-25 11:49:44 -08:00
John Keiser 7d7bec856d Remove lookup_lower_4_bits
It's only a coincidence that it works in current uses: it doesn't do
what the name says. Particularly, if the high bit is 1 it will yield
0 even if the lower 4 bits would yield something else.
2019-11-25 11:49:44 -08:00
Paul Dreik 6d14afd80e
Make threads optional in the cmake build ()
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.

* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy paste comment error in the circle ci build conf
2019-11-22 21:51:46 +01:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. ()
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
John Keiser ce824f8653 Decrease stage 1 step size to 64 bytes on Westmere/ARM
- Templatize scan_step() with STAGE1_STEP_SIZE
- Fix simd8::store()
- add NUM_CHUNKS to simd8
2019-11-18 21:58:07 -08:00
John Keiser 708f4a094d Move inline functions out of class definition for templating 2019-11-18 21:58:07 -08:00
Daniel Lemire 58d249ca16
Introducing move assignments. () 2019-11-09 10:34:32 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) () ()
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 ()

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers ()

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* Fixing issue 351 ()

* Fixing issues 351 and 353

* Fix ARM compile errors on g++ 7.4 ()

* Fix ARM compilation errors

* Update singleheader

* fix integer overflow in subnormal_power10 ()

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
Daniel Lemire c4f1baad31
Making get_corpus safer () 2019-11-06 12:22:42 -05:00
John Keiser 3828e1e538 Fix performance issues:
1. Don't recast "int" result of movemask to uint32_t
2. Call max_epu8 with the mask first and the bytes second.
2019-11-05 13:44:04 -08:00
John Keiser d89046d515 Use simd8 helpers for find_bs_bits_and_quote_bits 2019-11-05 13:44:04 -08:00
John Keiser 4bc128f07e Move compute_quote_mask to generic bitmask library 2019-11-05 13:44:04 -08:00
John Keiser e383b7a6ab Use generic simd operators for find_whitespace_and_operators 2019-11-05 13:37:56 -08:00
John Keiser c89d6bf68b Genericize utf-8 check 2019-11-05 13:37:32 -08:00
Paul Dreik cf493254b7 fix integer overflow in subnormal_power10 ()
detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714
2019-11-04 16:54:03 -05:00
John Keiser c97eb41dc6 Fix ARM compile errors on g++ 7.4 ()
* Fix ARM compilation errors

* Update singleheader
2019-11-04 10:36:34 -05:00
Daniel Lemire b1224a77db
Fixing issue 351 ()
* Fixing issues 351 and 353
2019-11-01 16:05:28 -04:00
Daniel Lemire 15740500af Let us zero the tail end. 2019-10-24 18:49:30 -04:00