Commit Graph

549 Commits

Author SHA1 Message Date
John Keiser a1aea4588f Move document stream state to implementation 2020-06-08 15:21:54 -07:00
John Keiser 1d4fffb799 Fix fallback implementation 2020-06-08 15:21:52 -07:00
John Keiser 6f90f5dc5f Remove templating from finish() method 2020-06-08 15:20:56 -07:00
John Keiser 9dd6972d26 Remove impossible checks, add EMPTY check to normal parser 2020-06-08 15:20:56 -07:00
John Keiser d731a7d52c Privatize structural_parser 2020-06-08 15:20:56 -07:00
John Keiser 059468b74e Eliminate streaming_structural_parser subclass with templates 2020-06-08 15:20:56 -07:00
John Keiser 5e69fb782a Call a function to parse structurals 2020-06-08 15:20:56 -07:00
John Keiser a5beffda78 Remove streaming_structural_parser.h 2020-06-08 15:20:56 -07:00
John Keiser 7de7ce5fdc Move document stream state to implementation 2020-06-08 15:20:56 -07:00
John Keiser 0dbda65e44 Fix fallback implementation 2020-06-08 14:52:23 -07:00
John Keiser d43a4e9df9 Remove SUCCESS_AND_HAS_MORE (internal only value) 2020-06-07 16:20:55 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
John Keiser 8c16ba372e Acknowledge that we always have a remainder 2020-06-06 16:46:38 -07:00
John Keiser 9be4a17687 Separate definition from declaration, arrange top down 2020-06-06 16:46:38 -07:00
John Keiser ed0c815735 Move unclosed array check to stage 2 2020-06-05 12:39:13 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
Daniel Lemire 52f44de257
This introduces a tiny simplification in number parsing. (#910)
* This introduces a tiny simplification in number parsing.

* Removing unnecessary function.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-04 17:13:02 -04:00
John Keiser b75fa26dc1 Move containing_scope and ret_address to .cpp 2020-06-01 12:15:55 -07:00
John Keiser 3d22a2d845 One weird trick: set a bogus error value in the parser impl
This makes us faster under both gcc and clang somehow.
2020-06-01 12:15:55 -07:00
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 86f8a4a9d2 Don't set parser.valid or parser.error
This regresses performance and is ONLY here because the next
two commits are here; this lets us see the impact of removing
parser.error separately from the impact of the next commit.
2020-06-01 12:14:09 -07:00
John Keiser db2cb061cb Remove on_error function
Solely here to make the next patch smaller and more isolatable
2020-06-01 12:14:09 -07:00
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser 84712a8bbc Store buf and len in parser implementation 2020-06-01 12:14:09 -07:00
John Keiser b86fb95306 Rename doc_parser -> parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
Daniel Lemire 12150baa5e
Using just ASCII. (#899)
* Using just ASCII.

* Let us prune checkperf.

* Moving the description of lookup2 to the HACKING.md file.
2020-05-21 21:59:06 -04:00
John Keiser 4551e60f8b Don't write start object/array until the end 2020-05-21 14:28:47 -07:00
John Keiser 5651fbedc4 Add logging to stage 2 2020-05-21 09:47:19 -07:00
Daniel Lemire 40d57da83c
fixes issue 891 (#893) 2020-05-20 11:54:53 -04:00
John Keiser e6c9dfbd91 Make include files more fine-grained 2020-05-19 14:42:04 -07:00
John Keiser 64abc3e86c Include top-level .h files outside #if statements 2020-05-19 13:33:14 -07:00
John Keiser 7ad4020829 Make main compilation chunks into .cpp files 2020-05-19 13:32:35 -07:00
John Keiser 72ab0d11ff Move stage 1 and 2 files to their own directories 2020-05-19 13:30:34 -07:00
John Keiser 4ea866f050 Move stage2 classes into their own files 2020-05-19 13:30:34 -07:00
John Keiser a476531524 Share ref_address everywhere it's used 2020-05-19 13:30:34 -07:00
John Keiser dbb3316511 Move current_string_buf_loc to stage 2 2020-05-11 06:11:32 -07:00
John Keiser cd6f204c77 Move write_tape() to stage 2 code 2020-05-11 06:09:48 -07:00
John Keiser 269131ed21 Move on_number_* to stage 2 code 2020-05-11 06:04:54 -07:00
John Keiser 65d784e88e Move on_start/end_string to stage 2 code 2020-05-11 05:49:40 -07:00
John Keiser 35afb6cae0 Move on_error, on_success to stage 2 code 2020-05-11 05:46:18 -07:00
John Keiser 27bce09be8 Consolidate start_scope/end_scope 2020-05-11 05:40:02 -07:00
John Keiser 4f25b6ac0c Move on_end_* to stage 2 code 2020-05-11 05:34:49 -07:00
John Keiser 3d5ed1a7e3 Move on_start_* to stage 2 code 2020-05-11 05:30:35 -07:00
John Keiser a03115a4a6 Move end_scope to stage 2 code 2020-05-11 05:24:12 -07:00
John Keiser 7219d28a31 Call end_scope directly from stage 2 code 2020-05-11 05:20:04 -07:00
John Keiser 0875bce68f Don't pass depth to on_end_* 2020-05-11 05:15:39 -07:00
John Keiser 54fe302907 Don't pass depth to end_scope 2020-05-11 05:06:41 -07:00
John Keiser edaa8f811f Move on_start_* depth management to stage 2 code 2020-05-11 05:03:25 -07:00
John Keiser 2c8fd109de Move increment_count to stage 2 2020-05-11 04:58:50 -07:00
John Keiser 07fe7ad1a2 Use the same increment_count() everywhere 2020-05-11 04:48:15 -07:00
John Keiser 16d88cc095 Don't pass depth to increment_count 2020-05-11 04:15:02 -07:00
Daniel Lemire 3c3a4db54e
Compile under Visual Studio for ARM64 (#861)
* Modifications so that we can compile under Visual Studio for ARM64
* Let us throw appveyor at this beast.
2020-05-06 23:08:10 -04:00
John Keiser afb369950c Disable Intellisense-only warnings in simdjson.h/cpp 2020-05-04 11:47:04 -07:00
John Keiser 1d06624d38 Unset /D_CRT_SECURE_NO_WARNINGS
- Also localize DISABLE_DEPRECATED_WARNING so that we catch other
  deprecations
2020-05-04 11:35:05 -07:00
Pavel P d40069a018 Disable deprecation warnings for VS builds
fopen/getenv are standard c++ that are not deprecated.
2020-05-04 11:34:00 -07:00
Furkan Usta af968c5b44 Merge branch 'master' of github.com:simdjson/simdjson into cmake-flags 2020-05-03 02:12:23 +03:00
Daniel Lemire 1c34707925 A gift to John. 2020-05-02 15:01:22 -04:00
Furkan Usta 293c104cc4 CMake: Separate public and private compilation flags
simdjson-internal-flags for macros and warnings
simdjson-flags for pthread, sanitizer, and libcpp
2020-05-02 04:08:47 +03:00
Daniel Lemire 9863f62321
Trying to avoid unused warnings in isa detection. (#846) 2020-05-01 11:43:31 -04:00
Daniel Lemire 8c45a18524
Hiding cpuid_*_bit as well as related enum. (#843) 2020-04-30 21:08:08 -04:00
Daniel Lemire 073ad0dada
This function is unused. (#842) 2020-04-30 19:45:45 -04:00
Furkan Usta 73d7d704c1 CMake: Remove export_private_library
Since we are exporting all the targets as part of the main simdjson target we do not need private
exports anymore
2020-04-30 02:06:19 +03:00
Daniel Lemire 2a1f8fa8f1
Provides support for clang under Windows. (#817) 2020-04-27 22:09:27 -04:00
PavelP 0514588175
Improves clang-cl build with Visual Studio (#809) 2020-04-27 08:59:32 -04:00
Pavel P 24a185d26b Amalgamate src/simdjson.cpp as-is
amalgamation.sh shouldn't change contents of src/simdjson.cpp by forcing dmalloc.h that didn't exist in non-amalgamated version and shouldn't change order of includes by placing simdjson.h at the top

fixes #739
2020-04-26 08:27:19 +06:00
Daniel Lemire c750095241
Trying to improve a bit. (#791)
* Trying to improve a bit.

* Correcting typo
2020-04-23 22:36:08 -04:00
John Keiser 66acab4130 Enable /sdl warnings on Windows 2020-04-23 15:12:21 -07:00
ostri 87acab0846
elimination of most of g++ -Weffc++ warnings (#764)
Co-authored-by: Matjaž Ostroveršnik <ostri@localhost.localdomain>
Co-authored-by: Daniel Lemire <lemire@gmail.com>
2020-04-23 10:06:44 -04:00
John Keiser a198abc485 Use int as index to reduce cast operations
Decreases the number of instructions per block by almost 1
2020-04-22 14:21:33 -07:00
John Keiser d4a37f6ef5 Enable conversion warnings on Linux and Windows 2020-04-22 14:21:30 -07:00
John Keiser a116e68a47
Merge pull request #729 from simdjson/jkeiser/cmake-amalgamate
Add amalgamation support to cmake
2020-04-22 14:16:10 -07:00
Daniel Lemire 536fe28f8f
Being explicit regarding the initialization of two member variables. (#765) 2020-04-22 12:46:55 -04:00
Daniel Lemire 5f04208dbd
This removes the problematic use of the intrinsic _addcarry_u64 for Visual Studio (#758)
in the ARM 64-bit kernel. This intrinsic does not appear in the documentation
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=vs-2019
and should probably not be used. Note that we expect the compiler to produce
efficient code out of our implementation.
2020-04-21 17:13:22 -04:00
PavelP ffaa292006
Master vs2019 x86 compile fixes (#743)
* Added bitexact implementations of _BitScanForward64 and _BitScanReverse64 for VS2019 32-bit builds

* Added bitexact implementations of _umul128 for VS2019 x86, arm, arm64 builds

* Implement mul_overflow for VS2019 arm64 builds

 + implement mul_overflow using __umulh (msvc/clang results: https://godbolt.org/z/smRwA7)

* Added Win32 for VS2019 to .appveyor.yml

* Update amalgamated headers (fix x86 builds with VS2019)
2020-04-21 14:42:53 -04:00
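Note on the commit above: a minimal sketch of how a 64x64-bit multiply with overflow detection can be written without compiler intrinsics. The names `portable_umul128` and `portable_mul_overflow` are hypothetical and this is not the code from #743; it only illustrates the idea that the product overflows 64 bits exactly when its high 64 bits (the value an intrinsic such as `__umulh` returns) are nonzero.

```cpp
#include <cstdint>

// Hypothetical helper (not simdjson's code): full 64x64 -> 128-bit multiply
// built from four 32x32 -> 64-bit partial products.
// Returns the low 64 bits and stores the high 64 bits in *high.
inline uint64_t portable_umul128(uint64_t a, uint64_t b, uint64_t *high) {
  const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

  const uint64_t lo_lo = a_lo * b_lo;
  const uint64_t hi_lo = a_hi * b_lo;
  const uint64_t lo_hi = a_lo * b_hi;
  const uint64_t hi_hi = a_hi * b_hi;

  // Middle column plus the carry out of lo_lo; this sum fits in 64 bits.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

  *high = hi_hi + (hi_lo >> 32) + (cross >> 32);
  return (cross << 32) | (lo_lo & 0xFFFFFFFFu);
}

// Hypothetical helper: stores the wrapped product in *result and
// returns true when the exact product does not fit in 64 bits.
inline bool portable_mul_overflow(uint64_t a, uint64_t b, uint64_t *result) {
  uint64_t high = 0;
  *result = portable_umul128(a, b, &high);
  return high != 0;
}
```

On targets that expose a high-multiply intrinsic, the four partial products collapse to one instruction, which is presumably why the commit prefers `__umulh` where it is available.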
John Keiser d3e44b1108 Add amalgamation support to cmake 2020-04-20 19:50:51 -07:00
John Keiser fbf274a42b
Merge pull request #727 from simdjson/jkeiser/cmake-checkperf
Add checkperf to cmake
2020-04-20 19:45:45 -07:00
Daniel Lemire 3c1b403c4e Fixing typo. 2020-04-20 18:19:38 -04:00
John Keiser 9bf9fba2ec Add checkperf to cmake 2020-04-20 11:14:46 -07:00
John Keiser 22b9a53bef Add SIMDJSON_FORCE_IMPLEMENTATION 2020-04-18 18:21:56 -07:00
John Keiser 289cc3e7a0 Treat warnings as errors during compilation 2020-04-15 19:59:38 -07:00
John Keiser fd418f568c Fix c++11 warnings on clang
- namespace x::y is C++17
- static_assert requires message in C++11
2020-04-15 17:27:48 -07:00
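Note on the commit above: the two C++11 incompatibilities it names can be illustrated with a small sketch. The namespace and function names here are made up for the example and do not come from the simdjson sources.

```cpp
// C++17 allows the nested form:   namespace outer::inner { ... }
// C++11 needs each namespace opened separately.
namespace outer {
namespace inner {
constexpr int answer() { return 42; }
} // namespace inner
} // namespace outer

// C++17 allows the message to be omitted:   static_assert(sizeof(int) >= 2);
// C++11 requires the second (message) argument.
static_assert(sizeof(int) >= 2, "int must be at least 16 bits");
static_assert(outer::inner::answer() == 42, "constexpr call works in C++11");
```

Both forms compile under `-std=c++11` and later, so writing them this way costs nothing on newer compilers.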
John Keiser 09cf18a646 Add C++11 tests to cmake
- Add simdjson-flags target so callers don't have flags forced on them
2020-04-15 17:26:25 -07:00
Daniel Lemire b523c43927
Can we provide a size() function to arrays and objects? (eager approach) [TO BE MERGED] (#690)
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
2020-04-15 10:15:48 -04:00
John Keiser 6835dd73bc Only apply compile flags to simdjson 2020-04-09 08:52:29 -07:00
John Keiser beaa6a9a7a Create simdjson-windows-headers interface library 2020-04-08 14:52:56 -07:00
John Keiser a9c8224f40 Add numberparsingcheck and stringparsingcheck tests 2020-04-08 14:52:56 -07:00
John Keiser 3dcc188d93 Add more tests to cmake 2020-04-08 14:52:56 -07:00
John Keiser 54b7291c34 Reference simdjson by name, don't specify include files individually 2020-04-08 14:52:55 -07:00
John Keiser 1e30b6e334 Compile under C++ 11 2020-04-08 14:00:13 -07:00
John Keiser 406240bae3 Support C++ 14 2020-04-08 14:00:13 -07:00
John Keiser 7656bd50ee Generate API docs at /api/docs 2020-03-29 17:01:12 -07:00
John Keiser 03746b966b Move document/element/etc. under dom 2020-03-28 13:42:21 -07:00
Daniel Lemire 5d1e3efce8
faster minifier (#568)
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.

Co-authored-by: John Keiser <john@johnkeiser.com>
2020-03-20 16:14:47 -04:00
John Keiser f1744f5495 Break out string/structural scanning from tokenizer 2020-03-18 10:40:06 -07:00
John Keiser 5a071c1907 Remove TARGET_FALLBACK 2020-03-17 14:59:47 -07:00
John Keiser 7cf3a7511b Add fallback implementation to CI
- Also add SIMDJSON_IMPLEMENTATION_HASWELL/WESTMERE/ARM64/FALLBACK=1/0 to
enable/disable various implementations
2020-03-17 14:59:47 -07:00
John Keiser af203aaf86 Add fallback parser for pre-SSE4.2 machines 2020-03-17 14:59:47 -07:00
John Keiser 1a5d8f1957 Add tests for SIMDJSON_EXCEPTIONS=0, add `tie()` support 2020-03-17 13:54:37 -07:00
Daniel Lemire 758dc511fb Better comment. 2020-03-17 14:22:40 -04:00
Daniel Lemire 0164723a8e Adding comments. 2020-03-16 20:17:48 -04:00
Daniel Lemire da3e064fc7 Added a comment. 2020-03-15 22:35:21 -04:00
Daniel Lemire 317fc6ba0e
accurate number parsing (#558) 2020-03-15 22:30:21 -04:00
John Keiser 1aaad223c0 Simplify atom parsing 2020-03-13 19:00:08 -07:00
John Keiser 81c86d7090 Structural iterator 2020-03-13 19:00:08 -07:00
John Keiser 40c6213d7e Add parser.load() and load_many() to load files 2020-03-11 17:19:41 -07:00
Daniel Lemire f669aafcf2 Correcting typo 2020-03-09 17:55:26 -04:00
John Keiser 31e8a12e88 Make error_message(error_code) return C string
- Also move all error message logic to include inline
2020-03-06 15:41:51 -08:00
John Keiser b2220d6157 Fix amalgamation 2020-03-05 11:13:25 -08:00
John Keiser 5ff941ae3d Include <simdjson> directly, move document_parser_callbacks to top level 2020-03-04 14:26:54 -08:00
John Keiser 5525c6f729 Stop using jsoncharutils.h in JsonStream 2020-03-04 14:26:54 -08:00
John Keiser eb147d9868 Mark jsonformatutils.h/isadetection.h internal
- Move jsonformatutils.h to internal/jsonformatutils.h (it is used by
document::print_json)
- Move isadetection.h to src/ (it is only used internally)
2020-03-04 14:26:54 -08:00
John Keiser f58a5d534e Move parser inline implementation to .cpp 2020-03-04 14:26:54 -08:00
John Keiser b3ea8c406e Add simdjson.cpp for unified use (#515) 2020-03-04 10:12:27 -08:00
John Keiser 99667f7c55 Create top level simdjson.h (#515)
- Allows everyone to #include the same way, singleheader or not.
2020-03-04 10:12:27 -08:00
John Keiser 0b21203141 Document navigation API 2020-03-02 14:49:03 -08:00
John Keiser 910f272467
Add parser implementation interface and selection API (#501)
* Make architecture implementations virtual functions

- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions

* Move implementation static methods to their own classes

* Detect best supported implementation on first use

* available_implementationsI() -> available_implementations
2020-02-21 16:34:27 -05:00
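Note on the commit above: a rough sketch of the pattern it describes, with architecture-specific code behind virtual functions, a list of available implementations, automatic selection of the best supported one, and lookup by name. Every class and function name below is hypothetical and chosen for illustration; this is not simdjson's actual interface, and the `supported()` method is a stand-in for real CPU feature detection.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: one virtual method per architecture-dependent operation.
class implementation {
public:
  virtual ~implementation() = default;
  virtual const char *name() const noexcept = 0;
  virtual bool supported() const noexcept = 0; // stand-in for a CPU feature check
  virtual std::size_t count_quotes(const std::string &json) const noexcept = 0;
};

// Scalar code path that works everywhere.
class fallback_impl final : public implementation {
public:
  const char *name() const noexcept override { return "fallback"; }
  bool supported() const noexcept override { return true; }
  std::size_t count_quotes(const std::string &json) const noexcept override {
    std::size_t n = 0;
    for (char c : json) {
      if (c == '"') { ++n; }
    }
    return n;
  }
};

// Pretend SIMD path; reports "unsupported" so the selection logic is exercised.
class fancy_simd_impl final : public implementation {
public:
  const char *name() const noexcept override { return "fancy_simd"; }
  bool supported() const noexcept override { return false; }
  std::size_t count_quotes(const std::string &json) const noexcept override {
    return fallback_impl().count_quotes(json); // placeholder body
  }
};

// Ordered from most to least preferred.
inline const std::vector<const implementation *> &available_implementations() {
  static const fancy_simd_impl fancy;
  static const fallback_impl fallback;
  static const std::vector<const implementation *> list{&fancy, &fallback};
  return list;
}

// First implementation the current machine reports as supported.
inline const implementation *detect_best_supported() {
  for (const implementation *impl : available_implementations()) {
    if (impl->supported()) { return impl; }
  }
  return nullptr;
}

// Explicit selection by name; returns nullptr when the name is unknown.
inline const implementation *find_implementation(const std::string &name) {
  for (const implementation *impl : available_implementations()) {
    if (name == impl->name()) { return impl; }
  }
  return nullptr;
}
```

With this shape, adding an architecture means adding one subclass and one entry in the list, which matches the "easier to add new architectures" goal stated in the commit message.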
John Keiser 1f76737510 Make valstat-ish parse APIs 2020-02-18 08:37:07 -08:00
John Keiser bc8bc7d1a8
Lowercase Architecture and ErrorValues (#487)
ErrorValues -> error_code, Architecture -> architecture
2020-02-14 15:21:28 -08:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire c879b56f41 Fixing logical error. 2020-02-07 10:44:17 -05:00
John Keiser 76c706644a
Move stage 2 tape writing to ParsedJson (#477)
This is a first step to allowing alternate tape formats.
2020-02-04 14:28:42 -08:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. (#473)
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonsense.
2020-01-30 17:16:41 -05:00
Daniel Lemire 28710f8ad5
fix for Issue 467 (#469)
* Fix for issue467

* Updating single-header

* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.

* Fixing parse_stream

* Updating documentation.
2020-01-29 19:00:18 -05:00
Daniel Lemire 3488c49d0a
Basically, haswell processor should be able to count on lzcnt. (#458) 2020-01-22 16:52:55 -05:00
John Keiser adaef43bc6 Find all escaped characters with simpler algorithm (#450) 2020-01-22 14:11:14 -05:00
Daniel Lemire 80b4dd2e8a
Removing all stdout, stderr from main library. (#455)
* Removing all stdout,stderr from main library.
2020-01-20 16:03:15 -05:00
Daniel Lemire ab6d4871d8
Adding haswell amal. tests (#447)
* Adding an extra test.

* Disabling the AVX-accelerated minifier.

* Updating amalgamation.
2020-01-15 19:49:11 -05:00
Daniel Lemire f611b65bc0
This updates the minifier. (#446) 2020-01-15 13:45:32 -05:00
Daniel Lemire a804351a76
I think that i and idx should be size_t (64-bit). (#438) 2020-01-13 17:42:52 -05:00
dbj 85e84fc1fa improved string padded (#440)
* dirent portable latest version

* improved

std::string argument passed by const reference
ctor added with std::string_view  argument
`allocate_padded_buffer()`  moved here with **optional** check on `length < 1`

* allocate_padded_buffer moved to padded_string.h
2020-01-10 10:15:48 -05:00
Daniel Lemire 951c4bedf8
Simpler jsonstream (#436)
* One simplification.

* Removing untested functions.
2020-01-07 19:10:02 -05:00
Daniel Lemire 4c0c1c9830 Updating a comment. 2020-01-06 22:01:23 -05:00
Daniel Lemire a9e990251d
removing left over debug 2020-01-04 12:50:04 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provide tests for it), as well as #401
2020-01-03 22:22:47 -05:00
Daniel Lemire 5042dd52ce
This is implementing @jkeiser optimization idea. (#431) 2020-01-03 09:21:36 -05:00
Daniel Lemire a2d05b21ff Merge branch 'master' of github.com:lemire/simdjson 2020-01-02 15:27:00 -05:00
Daniel Lemire f4f5f670a2 Better documentation of the padding. 2020-01-02 15:25:03 -05:00
John Keiser 165e23773f Refactor stage 2 into structural_parser class 2020-01-02 13:12:22 -07:00
Paul Dreik 399d08c86c use unique_ptr in class parsedjson (#417)
* refactor parsedjson to use unique_ptr instead of owning raw pointer
* fix a potential undefined behavior
* output only first cpu in /proc/cpuinfo
2019-12-31 14:31:45 -05:00
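Note on the commit above: a small sketch of the refactoring it names, replacing an owning raw pointer with `std::unique_ptr`. The class `tape_buffer` is hypothetical and much simpler than the real parsedjson class; it only shows how ownership, cleanup, and move semantics come for free.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical owning buffer: no destructor, no manual delete[] needed.
class tape_buffer {
public:
  // Allocates a zero-initialized buffer; returns false on allocation failure.
  bool allocate(std::size_t capacity) {
    tape_.reset(new (std::nothrow) std::uint64_t[capacity]());
    return tape_ != nullptr;
  }
  std::uint64_t *data() noexcept { return tape_.get(); }

private:
  std::unique_ptr<std::uint64_t[]> tape_;
};

// The unique_ptr member makes the class move-only, which rules out
// accidental double frees from copied raw pointers.
static_assert(!std::is_copy_constructible<tape_buffer>::value,
              "owning buffer must not be copyable");
static_assert(std::is_move_constructible<tape_buffer>::value,
              "owning buffer should be movable");
```

Calling `allocate` again releases the previous buffer automatically through `reset`, so reallocation cannot leak.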
Daniel Lemire 6f799435b6 Removing commented out stuff. 2019-12-30 22:21:04 -05:00
John Keiser d7c83397e4 lookup+cont-check algorithm 2019-12-18 14:37:21 -08:00
Daniel Lemire 1d621bba37 Being more explicit about EMPTY errors. 2019-12-18 14:39:48 +00:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 (#396)
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
Daniel Lemire f02babe427 Adding analysis by @sebpop from https://github.com/lemire/simdjson/pull/391#issuecomment-565551462 2019-12-13 13:39:15 -05:00
Daniel Lemire fc6133b58f
Fixes issue 388 (#394) 2019-12-11 08:13:29 -05:00
mswilson d33208c7db Correct detection of NEON support (#392)
... as the test as it is currently implemented will always evaluate to true.

Fixes #389
2019-12-10 13:12:17 -05:00
Daniel Lemire c9cd8e6211
PMULL is slow on ARM64, let us not rely on it? (#391) 2019-12-09 17:15:34 -05:00
Daniel Lemire 1211c01ca1
Resolves issue 186 (#383)
* Resolves issue 186
https://github.com/lemire/simdjson/issues/186
2019-12-02 12:23:45 -05:00
Jeremie Piotte 4e1c90f76f
Fix memory allocation of the max_depth in JsonStream. 2019-11-28 13:55:31 -05:00
Jeremie Piotte f163155929 JsonStream documentation (#381)
* adding Multiline JSON competition chart to doc
* Completing the comments for JsonStream
* Adding a page for JsonStream's documentation.
2019-11-25 18:11:55 -05:00
John Keiser 9b6377fd80 Precalculate the ASCII path 2019-11-25 11:49:44 -08:00
John Keiser 7356b4532f Perform UTF-8 detection via flag lookup algorithm
- adds the alternative zwegner, range and lookup utf8 algorithms as well, for
ability to do "shootouts"
2019-11-25 11:49:44 -08:00
John Keiser 7d7bec856d Remove lookup_lower_4_bits
It's only a coincidence that it works in current uses: it doesn't do
what the name says. Particularly, if the high bit is 1 it will yield
0 even if the lower 4 bits would yield something else.
2019-11-25 11:49:44 -08:00
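Note on the commit above: the behavior it describes matches the x86 byte shuffle (`PSHUFB` / `_mm_shuffle_epi8`), which outputs 0 for any index byte whose high bit is set. The commit does not say which instruction the helper used, so treat that as an assumption. The scalar model below uses hypothetical function names to contrast the shuffle semantics with a true lower-4-bits lookup.

```cpp
#include <array>
#include <cstdint>

// Scalar model of one lane of an x86 byte shuffle:
// a set high bit in the index forces the result to 0.
inline std::uint8_t shuffle_lookup(const std::array<std::uint8_t, 16> &table,
                                   std::uint8_t index) {
  if (index & 0x80) { return 0; }
  return table[index & 0x0F];
}

// What the name "lookup_lower_4_bits" promises: mask first, then look up.
inline std::uint8_t lookup_lower_4_bits_masked(
    const std::array<std::uint8_t, 16> &table, std::uint8_t index) {
  return table[index & 0x0F];
}

// Small table whose entry i holds i + 1, so a result of 0 can only
// come from the high-bit rule.
inline std::array<std::uint8_t, 16> make_table() {
  std::array<std::uint8_t, 16> t{};
  for (std::size_t i = 0; i < t.size(); ++i) {
    t[i] = static_cast<std::uint8_t>(i + 1);
  }
  return t;
}
```

For an index such as `0x83`, the two functions disagree (0 versus the entry for nibble 3), which is the mismatch between name and behavior that the commit message points out.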
Paul Dreik 6d14afd80e
Make threads optional in the cmake build (#376)
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.

* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy paste comment error in the circle ci build conf
2019-11-22 21:51:46 +01:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
John Keiser ce824f8653 Decrease stage 1 step size to 64 bytes on Westmere/ARM
- Templatize scan_step() with STAGE1_STEP_SIZE
- Fix simd8::store()
- add NUM_CHUNKS to simd8
2019-11-18 21:58:07 -08:00
John Keiser 708f4a094d Move inline functions out of class definition for templating 2019-11-18 21:58:07 -08:00
Daniel Lemire 58d249ca16
Introducing move assignments. (#363) 2019-11-09 10:34:32 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* merging main

* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* rough prototype working.  Needs more test and fine tuning.

* minor fixes and cleaning.

* adding jsonstream to amalgamation

* merged main into branch

* Adding a JsonStream Demo to Amalgamation

* merging main

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
Daniel Lemire c4f1baad31
Making get_corpus safer (#360) 2019-11-06 12:22:42 -05:00
John Keiser 3828e1e538 Fix performance issues:
1. Don't recast "int" result of movemask to uint32_t
2. Call max_epu8 with the mask first and the bytes second.
2019-11-05 13:44:04 -08:00
John Keiser d89046d515 Use simd8 helpers for find_bs_bits_and_quote_bits 2019-11-05 13:44:04 -08:00
John Keiser 4bc128f07e Move compute_quote_mask to generic bitmask library 2019-11-05 13:44:04 -08:00
John Keiser e383b7a6ab Use generic simd operators for find_whitespace_and_operators 2019-11-05 13:37:56 -08:00
John Keiser c89d6bf68b Genericize utf-8 check 2019-11-05 13:37:32 -08:00
Paul Dreik cf493254b7 fix integer overflow in subnormal_power10 (#355)
detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714
2019-11-04 16:54:03 -05:00
John Keiser c97eb41dc6 Fix ARM compile errors on g++ 7.4 (#354)
* Fix ARM compilation errors

* Update singleheader
2019-11-04 10:36:34 -05:00
Daniel Lemire b1224a77db
Fixing issue 351 (#352)
* Fixing issues 351 and 353
2019-11-01 16:05:28 -04:00
Daniel Lemire 15740500af Let us zero the tail end. 2019-10-24 18:49:30 -04:00
Daniel Lemire 59cad23aeb Merge branch 'master' of github.com:lemire/simdjson 2019-10-24 16:34:10 -04:00
Daniel Lemire da1c35d04b Final (?) fix for https://github.com/lemire/simdjson/issues/345 2019-10-24 16:33:37 -04:00
Daniel Lemire c469aed047
Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347) 2019-10-24 16:06:29 -04:00
John Keiser 64872bddf4 Eliminate stage1_find_marks_flatten.h 2019-10-14 12:33:46 -07:00
John Keiser 81f2249575 Move stage1 into a class to pass fewer parameters 2019-10-14 12:33:46 -07:00
John Keiser 9bbd6bd874 Move headers to implementation area
- jsoncharutils.h, numberparsing.h, simdprune_tables.h
2019-10-14 11:51:41 -07:00
John Keiser 69caa477fb Use struct for UTF-8 checks, remove templating
- Removes templating from simd_input, utf8_checker, and parse_string
- Make drone gcc run a lot faster
- Make drone clang run a little faster (NOTE:
https://hub.docker.com/r/silkeh/clang helps even more, but I wasn't sure
whether we wanted to trust that)
- Make drone arm run in parallel to get results quicker
2019-10-08 17:58:45 -07:00
Juho Lauri cf9dbe583d improved const correctness (#321) 2019-10-02 14:25:28 -04:00
John Keiser de8df0a05f Combined performance patch (5% overall, 15% stage 1) (#317)
* Allow -f

* Support parse -s (force sse)

* Simplify flatten_bits

- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead

* De-unroll the flatten_bits loops

* Decrease dependencies in stage 1

- Do all finalize_structurals work before computing the quote mask; mask
  out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
  single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
  dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
  achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
  invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
  or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
  descriptiveness and possible lifting into a generic simd parsing library

* Mark branches as likely/unlikely

* Reorder and unroll+interleave stage 1 loop

* Nest the cnt > 16 branch inside cnt > 8
2019-10-01 12:01:08 -04:00
Daniel Lemire 53b6deaeae
Safer handling of error codes, fixes https://github.com/lemire/simdjson/issues/318 (#319) 2019-09-29 12:12:15 -04:00
Opemipo 462858efa3 Fix Typo (#311)
escapted -> escaped
2019-09-12 10:16:33 -04:00
John Keiser f7e893667d Use simd_input generic methods for utf8 checking (#301)
* Use generic each/reduce in simdutf8check

* Remove macros from generic simd_input uses

* Use array instead of members to store simd registers

* Default local checkperf to clone from .
2019-09-02 12:46:05 -04:00
saka1 c1f27fb848 Accept large unsigned integers (#295)
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on  ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
2019-09-02 10:50:24 -04:00
John Keiser 7f249cd179 Use non-interleaved map() to make structurals clearer (#304) 2019-08-29 21:38:41 -04:00
John Keiser f4fa5b7340 Add MAP_CHUNKS2, make parameter name related to input 2019-08-26 09:46:49 -07:00
John Keiser 169568ca47 Use map() to interleave instructions for parallelism 2019-08-26 09:46:49 -07:00
John Keiser 9cc4ddfc88 Use map().to_bitmask() instead of build_bitmask() 2019-08-26 09:46:49 -07:00
John Keiser 441963c84c Add AMD64 build_bitmask 2019-08-26 09:46:49 -07:00
John Keiser da0f1cacea Remove static modifiers 2019-08-26 09:46:48 -07:00
John Keiser b01222518d Genericize bitmask building to make algorithms clearer 2019-08-26 09:46:48 -07:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
Vitaly Baranov e9be643db5 Fix condition in ParsedJson::allocate_capacity(). (#283) 2019-08-16 08:38:59 -04:00
Vitaly Baranov 6a2728e730 No allocation in the iterator's constructor (#276)
* Get rid of dynamic allocation in ParsedJson::Iterator.

* Implement copy assignment operator for ParsedJson::Iterator.

* ParsedJson::Iterator is now a template class.
2019-08-15 19:42:15 -04:00
Daniel Lemire 3fb82502f7
This gets rid of the silly ALLOW_SAME_PAGE_BUFFER_OVERRUN (#268) 2019-08-09 17:36:32 -04:00
Vitaly Baranov 0b927f059c Make dynamic dispatch free of TSan warnings (#256) 2019-08-08 16:16:35 -04:00
John Keiser f3c3afd4cd Use direct call to templated flatten_bits instead of if (#262)
* Use direct call to templated flatten_bits instead of if

* Put really_inline back on find_structural_bits_64
2019-08-08 15:09:17 -04:00
John Keiser b1beacd1f3 Make headers show up in Header Files in VS2019 (#257) 2019-08-05 16:36:52 -04:00
John Keiser d9a0e2b8f4 Fix Intellisense errors opening .h files on VS2019 (#253) 2019-08-04 19:57:55 -04:00
ioioioio 2a24567370 Replace macros by include files (#236) (#248)
* stage1 compiles without macros

* cleaning

* amalgation is weird but works

* macros are removed from stringparsing

* amalgation fixed

* Huge macros are removed.

* clang-format
2019-08-04 15:58:35 -04:00
Daniel Lemire 99a153d9e8 Hiding the pointer away... (#252)
* Hiding the runtime dispatch pointer in a source file so it is not an exported symbol
* Disabling hard failure on style check.
* Fixes https://github.com/lemire/simdjson/issues/250
2019-08-04 15:41:00 -04:00
Daniel Lemire 038b18edf1 Adding style scripts. (#243)
* Adding style scripts.
2019-08-01 16:09:26 -04:00
Daniel Lemire d83aef4e86 This should fix a warning in Visual Studio. 2019-07-31 18:12:58 -04:00
John Keiser bf59ba76f5 Fix most warnings on VS2019 (#241) 2019-07-31 17:43:45 -04:00
ioioioio c2eea8abba Style uniformization (#238)
* massive clang-format -style=LLVM

* naming harmonization

* adding commentary about sysinfoapi.h
2019-07-30 17:18:10 -04:00
Daniel Lemire 771e9cd68a Trying again... (#235) 2019-07-29 13:55:13 -04:00
Daniel Lemire c328afee57 This should fix master. 2019-07-29 13:44:25 -04:00
Daniel Lemire a53d95099c Intrinsic-based flatten (#234)
* Providing a flatten function with intrinsics (for Visual Studio).
2019-07-29 13:28:02 -04:00
Daniel Lemire f76ee5e5ef Fixes issue 221 (#222)
https://github.com/lemire/simdjson/issues/221
2019-07-29 10:07:07 -04:00
Daniel Lemire eba02dc1b9 Runtime dispatch
* Attempt 1 - fn targeting

GCC won't work with templates with different targets, need to specialize all the way up the call stack.

* Compiles properly with cmake. Does not with the Makefile.

* Compilation works with Makefile

* instruction_set changes to architecture

* some aesthetic changes

* fix amalgation and tests + aesthetic changes

* This now compiles and passes tests under CLANG

* Minor correction.

* Trying to make it work on ARM

* Adding missing namespace

* Missing bracket

* Fixing minor compilation issues.

* Getting parse to use runtime dispatch

* Fixing amalgamation script.

* Making sure that NEON is supported.

* Fixing typo

* Merging https://github.com/lemire/simdjson/pull/229

* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser  (second part)

* Trying another way.

* Removing the paral.

* Fixing the make file

* Let us make the practice run long enough.

* Resolved the awful slowness.

* Cleaning the README.md

* With runtime dispatching, we should not need flags anymore.

* Changing isa detection file's name + fixing typos.
2019-07-28 22:46:33 -04:00
ioioioio bcabdfc1ae Json pointer (#220)
* json pointer support

* Addition of tests for the json pointer

* Adding a new tool for the JSON Pointer support, and some documentation.
2019-07-26 18:38:10 -04:00
Daniel Lemire be956654b2 Minor cleaning = annotating simdjson namespaces and making sure that we don't have headers all over. 2019-07-09 19:24:08 -04:00
Daniel Lemire fba27ef4b9 I missed a few. Building up VS support. 2019-07-04 17:45:45 -04:00
Daniel Lemire 19cdc09928 Improving support for VS 2019-07-04 17:36:26 -04:00
ioioioio 861a6a17e4 SSE implementation integrated 2019-07-03 17:15:21 -04:00
ioioioio 036f9d5a45 Merge branch 'master' of https://github.com/lemire/simdjson into Multiple_implementation_refactoring_stage2 2019-07-03 10:34:58 -04:00
ioioioio 3f24879157 Stage2 refactored to simplify multiple implementations 2019-07-02 17:12:00 -04:00
ioioioio 9230588ce8 conflicts are solved 2019-07-02 15:21:00 -04:00
Daniel Lemire aa78b70d69 Introducing a "native" instruction set so that you do not need to do #ifdef to select the right SIMD set all the time.
Fixing indentation.
Removing some obsolete WARN_UNUSED.
Fixing a weird warning with optind variable.
2019-07-01 14:18:30 -04:00
ioioioio de08df6a7e Correction of identation. 2019-06-28 15:33:30 -04:00
ioioioio 6723221a42 Refactoring stage1 to facilitate multiple implementations. 2019-06-28 15:14:42 -04:00
Daniel Lemire d7f7f1b200 Fixing issue. (#193) 2019-06-20 18:49:47 -04:00
Daniel Lemire b1e8990654 Moving iterator functions in the header file (#189)
We want the compiler to inline hot functions in the iterators. Let us leave them in the header file. Please.
2019-06-11 21:09:58 -04:00
Daniel Lemire b32c72f1fc Adding a new compile-time flag (SIMDJSON_NAIVE_STRUCTURAL) for research purposes. 2019-06-03 16:41:50 -04:00
Daniel Lemire e27a46973c Introducing a fallback for clmul. 2019-06-03 15:10:14 -04:00
Daniel Lemire 06461a465b Tweaking. 2019-06-03 14:25:22 -04:00
Daniel Lemire cf6f231be6 Allowing users to provide additional flags. 2019-06-03 14:17:42 -04:00
Daniel Lemire 9239f75123 Adding compile-time option to test the speed of the fast flatten (research-oriented). (#181) 2019-06-03 13:37:09 -04:00
Daniel Lemire 642132920f Fixing performance regression caused by helpful code contributions
that moved inlineable functions into the source file combined with
helpful compilers which aren't smart enough to do the inlinining in
any case.
2019-05-31 18:16:12 -04:00
Daniel Lemire 8526387acb Improving error codes. (#176)
* This commit adds new error codes.
2019-05-24 17:28:56 -04:00
Daniel Lemire 17ac5c0525 This adds guards so that we can better detect the case where we have neither AVX2 nor ARM NEON. (#173) 2019-05-24 17:26:29 -04:00
Daniel Lemire 43dba8ac7f A slightly better "flatten"? (#166)
* This seems beneficial.
2019-05-19 12:33:45 -04:00
Daniel Lemire dcd0cb8080 Fix for https://github.com/lemire/simdjson/issues/58 (#168) 2019-05-19 12:25:27 -04:00
Daniel Lemire 47beaff152 Adding white-listing for memory sanitizer. 2019-05-19 11:18:54 -04:00
Daniel Lemire f75280ac9c Fix for issue 150 (#162)
* Checks for issue 150. We run through the test files with sanitizers on.

* Fix for issue 150: the remaining issues were an overrun on the depth capacity and an "off-by-1" overrun on tape capacity.

* Improving makefile.

* Safer git submodule command.

* Getting get 'git' on circleci
2019-05-09 20:51:33 -04:00
Daniel Lemire e370a65383 Fix for issues 32, 50, 131, 137
* Improving portability.

* Revisiting faulty logic regarding same-page overruns.

* Disabling same-page overruns under VS.

* Clarifying the documentation

* Fix for issue 131 + being more explicit regarding memory realloc.

* Fix for issue 137.

* removing "using namespace std" throughout. Fix for 50

* Introducing typed malloc/free.

* Introducing a custom class (padded_string) that solves several minor usability issues.

* Updating amalgamation for testing.
2019-05-09 17:59:51 -04:00
Daniel Lemire 20cda07eef Minor grammatical thing ("an integer" vs "a integer") 2019-05-09 10:48:31 -04:00
Heinz N. Gies c1975166a0 False atom fix (#156)
* Add failing test for falsy atom

* Fix false atom parsing
2019-05-09 10:45:42 -04:00
Daniel Lemire f0574d492c Fix for issue 154 (#157)
* Changes necessary to reproduce

https://github.com/lemire/simdjson/issues/154

* Fixing issue 154.
2019-05-08 22:33:11 -04:00
technateNG 6f0d350f2c Fix to issue #148. (#151)
* Issue #148 fix.

* Test cases for issue #148.
2019-05-07 20:56:36 -04:00
saka1 719dff1312 Add predicates to ParsedJson::iterator (#153) 2019-05-07 14:11:33 -04:00
Daniel Lemire 681cd33698 Making the iterator a tad safer (tweaking the constructor so that it can throw). 2019-04-22 10:53:25 -04:00
Dong Xie 1153778f92 fix a bug in copy constructor of ParsedJson::iterator. (#146) 2019-04-22 10:37:02 -04:00
Geoff Langdale 0250352139 Merge branch 'master' of https://github.com/lemire/simdjson 2019-04-01 02:08:15 -04:00
Geoff Langdale 134ba8d1dd Ratty version of transposed ARM SIMD stuff. Needs cleanup. 2019-04-01 02:07:38 -04:00
Geoff Langdale 777b9c9a9e Unbreak x86. Durp. 2019-03-30 15:50:35 +11:00
Geoff Langdale 5ba29122fd First cut of ARM port. Needs hand-hacked Makefile. 2019-03-30 00:47:35 -04:00
Geoff Langdale b4c815a60c Concentrate and encapsulate SIMD use somewhat in preparation for ARM port. 2019-03-21 15:15:41 +11:00
Geoff Langdale 473ab12a0a Stage 2 doesn't need to know about intrinsics either (for itself) 2019-03-21 11:41:15 +11:00
Daniel Lemire df8f792183 Store the string lengths on the string tape (#101)
* Store string length in the string-tape item.
* Files are now limited to 4GB.
* Moving detection of unescaped chars to stage 1 to reduce the burden due to string parsing.

Fixes https://github.com/lemire/simdjson/issues/114

Fixes https://github.com/lemire/simdjson/issues/87
2019-03-13 19:32:57 -04:00