Commit Graph

582 Commits

Author SHA1 Message Date
John Keiser b4b968ff44 Fix #953 2020-06-23 09:53:24 -07:00
Daniel Lemire 2bb101bd19 Code reformatting. 2020-06-22 16:50:57 -04:00
Daniel Lemire a6cbf1f922 Going generic... 2020-06-22 16:25:11 -04:00
Daniel Lemire b836164a38 Fix. 2020-06-22 02:12:49 +00:00
Daniel Lemire 058507badf Putting back the loop 2020-06-21 21:21:49 -04:00
Daniel Lemire ad40e90790 Patching. 2020-06-21 20:14:00 -04:00
Daniel Lemire 066269153e Explaining decision. 2020-06-21 18:02:34 -04:00
Daniel Lemire 5dbcdf1484 Ok 2020-06-21 17:52:30 -04:00
Daniel Lemire f03a6ab5a4 Tweaking. 2020-06-21 17:39:24 -04:00
Daniel Lemire 5dc07ed295 It builds. 2020-06-21 17:20:33 -04:00
Daniel Lemire 064d4255d5 Ok. 2020-06-21 17:09:06 -04:00
Daniel Lemire 04139eb82e Ok. 2020-06-21 17:05:55 -04:00
John Keiser 76c9f4f5a6
Merge pull request #941 from simdjson/jkeiser/forgot
Remove unnecessary functions
2020-06-17 09:09:28 -07:00
Daniel Lemire 942ef3b7f2
Merge pull request #939 from simdjson/dlemire/lookup3
Introducing lookup3 (UTF-8 validation).
2020-06-17 11:19:09 -04:00
John Keiser f8f36c085c Remove unnecessary functions 2020-06-17 07:11:53 -07:00
John Keiser 7339f67dd7
Merge pull request #462 from simdjson/jkeiser/if-backslash
Wrap backslash processing in a branch
2020-06-17 07:07:58 -07:00
Daniel Lemire 71a889ed73 Introducing lookup3 (UTF-8 validation). 2020-06-16 19:08:25 -04:00
John Keiser 610c79fbf3 Don't use backslash branch on ARM 2020-06-13 07:51:28 -07:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire bd2d0f769f
One unlikely too many (#930) 2020-06-12 17:58:10 -04:00
John Keiser 664b03bb13 Short circuit find escapes if there is a backslash 2020-06-12 10:10:35 -07:00
John Keiser bbd61eb13f Let tape writing be put in a register 2020-06-12 09:18:20 -07:00
John Keiser e15e1e253d peek_char -> peek_next_char 2020-06-12 09:10:16 -07:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser ea08e7d192 Remove unused extra copy of find_next_document_index 2020-06-09 17:52:13 -07:00
John Keiser d178e089a6 Stop caching current structural, keep current index around instead of next
2020-06-08 15:21:54 -07:00
John Keiser 5f00b37e21 Stop caching the buffer index 2020-06-08 15:21:54 -07:00
John Keiser 8a8792d47f Remove most uses of current_char() 2020-06-08 15:21:54 -07:00
John Keiser 59d9bc9e48 Store the pointer to the next structural instead of base structural_indexes and an index
2020-06-08 15:21:54 -07:00
John Keiser 8793dd3ceb Don't store len locally 2020-06-08 15:21:54 -07:00
John Keiser 48062380fa Move parser to structural_iterator 2020-06-08 15:21:54 -07:00
John Keiser 3636aa5522 Extend structural_parser from structural_iterator 2020-06-08 15:21:54 -07:00
John Keiser a1aea4588f Move document stream state to implementation 2020-06-08 15:21:54 -07:00
John Keiser 1d4fffb799 Fix fallback implementation 2020-06-08 15:21:52 -07:00
John Keiser 6f90f5dc5f Remove templating from finish() method 2020-06-08 15:20:56 -07:00
John Keiser 9dd6972d26 Remove impossible checks, add EMPTY check to normal parser 2020-06-08 15:20:56 -07:00
John Keiser d731a7d52c Privatize structural_parser 2020-06-08 15:20:56 -07:00
John Keiser 059468b74e Eliminate streaming_structural_parser subclass with templates 2020-06-08 15:20:56 -07:00
John Keiser 5e69fb782a Call a function to parse structurals 2020-06-08 15:20:56 -07:00
John Keiser a5beffda78 Remove streaming_structural_parser.h 2020-06-08 15:20:56 -07:00
John Keiser 7de7ce5fdc Move document stream state to implementation 2020-06-08 15:20:56 -07:00
John Keiser 0dbda65e44 Fix fallback implementation 2020-06-08 14:52:23 -07:00
John Keiser d43a4e9df9 Remove SUCCESS_AND_HAS_MORE (internal only value) 2020-06-07 16:20:55 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
John Keiser 8c16ba372e Acknowledge that we always have a remainder 2020-06-06 16:46:38 -07:00
John Keiser 9be4a17687 Separate definition from declaration, arrange top down 2020-06-06 16:46:38 -07:00
John Keiser ed0c815735 Move unclosed array check to stage 2 2020-06-05 12:39:13 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
Daniel Lemire 52f44de257
This introduces a tiny simplification in number parsing. (#910)
* This introduces a tiny simplification in number parsing.

* Removing unnecessary function.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-04 17:13:02 -04:00
John Keiser b75fa26dc1 Move containing_scope and ret_address to .cpp 2020-06-01 12:15:55 -07:00
John Keiser 3d22a2d845 One weird trick: set a bogus error value in the parser impl
This makes us faster under both gcc and clang somehow.
2020-06-01 12:15:55 -07:00
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 86f8a4a9d2 Don't set parser.valid or parser.error
This regresses performance and is ONLY here because the next
two commits are here; this lets us see the impact of removing
parser.error separately from the impact of the next commit.
2020-06-01 12:14:09 -07:00
John Keiser db2cb061cb Remove on_error function
Solely here to make the next patch smaller and more isolatable
2020-06-01 12:14:09 -07:00
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser 84712a8bbc Store buf and len in parser implementation 2020-06-01 12:14:09 -07:00
John Keiser b86fb95306 Rename doc_parser -> parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
Daniel Lemire 12150baa5e
Using just ASCII. (#899)
* Using just ASCII.

* Let us prune checkperf.

* Moving the description of lookup2 to the HACKING.md file.
2020-05-21 21:59:06 -04:00
John Keiser 4551e60f8b Don't write start object/array until the end 2020-05-21 14:28:47 -07:00
John Keiser 5651fbedc4 Add logging to stage 2 2020-05-21 09:47:19 -07:00
Daniel Lemire 40d57da83c
fixes issue 891 (#893) 2020-05-20 11:54:53 -04:00
John Keiser e6c9dfbd91 Make include files more fine-grained 2020-05-19 14:42:04 -07:00
John Keiser 64abc3e86c Include top-level .h files outside #if statements 2020-05-19 13:33:14 -07:00
John Keiser 7ad4020829 Make main compilation chunks into .cpp files 2020-05-19 13:32:35 -07:00
John Keiser 72ab0d11ff Move stage 1 and 2 files to their own directories 2020-05-19 13:30:34 -07:00
John Keiser 4ea866f050 Move stage2 classes into their own files 2020-05-19 13:30:34 -07:00
John Keiser a476531524 Share ref_address everywhere it's used 2020-05-19 13:30:34 -07:00
John Keiser dbb3316511 Move current_string_buf_loc to stage 2 2020-05-11 06:11:32 -07:00
John Keiser cd6f204c77 Move write_tape() to stage 2 code 2020-05-11 06:09:48 -07:00
John Keiser 269131ed21 Move on_number_* to stage 2 code 2020-05-11 06:04:54 -07:00
John Keiser 65d784e88e Move on_start/end_string to stage 2 code 2020-05-11 05:49:40 -07:00
John Keiser 35afb6cae0 Move on_error, on_success to stage 2 code 2020-05-11 05:46:18 -07:00
John Keiser 27bce09be8 Consolidate start_scope/end_scope 2020-05-11 05:40:02 -07:00
John Keiser 4f25b6ac0c Move on_end_* to stage 2 code 2020-05-11 05:34:49 -07:00
John Keiser 3d5ed1a7e3 Move on_start_* to stage 2 code 2020-05-11 05:30:35 -07:00
John Keiser a03115a4a6 Move end_scope to stage 2 code 2020-05-11 05:24:12 -07:00
John Keiser 7219d28a31 Call end_scope directly from stage 2 code 2020-05-11 05:20:04 -07:00
John Keiser 0875bce68f Don't pass depth to on_end_* 2020-05-11 05:15:39 -07:00
John Keiser 54fe302907 Don't pass depth to end_scope 2020-05-11 05:06:41 -07:00
John Keiser edaa8f811f Move on_start_* depth management to stage 2 code 2020-05-11 05:03:25 -07:00
John Keiser 2c8fd109de Move increment_count to stage 2 2020-05-11 04:58:50 -07:00
John Keiser 07fe7ad1a2 Use the same increment_count() everywhere 2020-05-11 04:48:15 -07:00
John Keiser 16d88cc095 Don't pass depth to increment_count 2020-05-11 04:15:02 -07:00
Daniel Lemire 3c3a4db54e
Compile under Visual Studio for ARM64 (#861)
* Modifications so that we can compile under Visual Studio for ARM64
* Let us throw appveyor at this beast.
2020-05-06 23:08:10 -04:00
John Keiser afb369950c Disable Intellisense-only warnings in simdjson.h/cpp 2020-05-04 11:47:04 -07:00
John Keiser 1d06624d38 Unset /D_CRT_SECURE_NO_WARNINGS
- Also localize DISABLE_DEPRECATED_WARNING so that we catch other
  deprecations
2020-05-04 11:35:05 -07:00
Pavel P d40069a018 Disable deprecation warnings for VS builds
fopen/getenv are standard C++ functions and are not deprecated.
2020-05-04 11:34:00 -07:00
Furkan Usta af968c5b44 Merge branch 'master' of github.com:simdjson/simdjson into cmake-flags 2020-05-03 02:12:23 +03:00
Daniel Lemire 1c34707925 A gift to John. 2020-05-02 15:01:22 -04:00
Furkan Usta 293c104cc4 CMake: Separate public and private compilation flags
simdjson-internal-flags for macros and warnings
simdjson-flags for pthread, sanitizer, and libcpp
2020-05-02 04:08:47 +03:00
Daniel Lemire 9863f62321
Trying to avoid unused warnings in isa detection. (#846) 2020-05-01 11:43:31 -04:00
Daniel Lemire 8c45a18524
Hiding cpuid_*_bit as well as related enum. (#843) 2020-04-30 21:08:08 -04:00
Daniel Lemire 073ad0dada
This function is unused. (#842) 2020-04-30 19:45:45 -04:00
Furkan Usta 73d7d704c1 CMake: Remove export_private_library
Since we are exporting all the targets as part of the main simdjson target, we do not need private
exports anymore
2020-04-30 02:06:19 +03:00
Daniel Lemire 2a1f8fa8f1
Provides support for clang under Windows. (#817) 2020-04-27 22:09:27 -04:00
PavelP 0514588175
Improves clang-cl build with Visual Studio (#809) 2020-04-27 08:59:32 -04:00
Pavel P 24a185d26b Amalgamate src/simdjson.cpp as-is
amalgamation.sh shouldn't change the contents of src/simdjson.cpp by forcing an include of dmalloc.h that the non-amalgamated version doesn't have, and shouldn't change the order of includes by placing simdjson.h at the top

fixes #739
2020-04-26 08:27:19 +06:00
Daniel Lemire c750095241
Trying to improve a bit. (#791)
* Trying to improve a bit.

* Correcting typo
2020-04-23 22:36:08 -04:00
John Keiser 66acab4130 Enable /sdl warnings on Windows 2020-04-23 15:12:21 -07:00
ostri 87acab0846
elimination of most of g++ -Weffc++ warnings (#764)
Co-authored-by: Matjaž Ostroveršnik <ostri@localhost.localdomain>
Co-authored-by: Daniel Lemire <lemire@gmail.com>
2020-04-23 10:06:44 -04:00
John Keiser a198abc485 Use int as index to reduce cast operations
Decreases the number of instructions per block by almost 1
2020-04-22 14:21:33 -07:00
John Keiser d4a37f6ef5 Enable conversion warnings on Linux and Windows 2020-04-22 14:21:30 -07:00
John Keiser a116e68a47
Merge pull request #729 from simdjson/jkeiser/cmake-amalgamate
Add amalgamation support to cmake
2020-04-22 14:16:10 -07:00
Daniel Lemire 536fe28f8f
Being explicit regarding the initialization of two member variables. (#765) 2020-04-22 12:46:55 -04:00
Daniel Lemire 5f04208dbd
This removes the problematic use of the intrinsic _addcarry_u64 for Visual Studio (#758)
in the ARM 64-bit kernel. This intrinsic does not appear in the documentation
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=vs-2019
and should probably not be used. Note that we expect the compiler to produce
efficient code out of our implementation.
2020-04-21 17:13:22 -04:00
PavelP ffaa292006
Master vs2019 x86 compile fixes (#743)
* Added bitexact implementations of _BitScanForward64 and _BitScanReverse64 for VS2019 32-bit builds

* Added bitexact implementations of _umul128 for VS2019 x86, arm, arm64 builds

* Implement mul_overflow for VS2019 arm64 builds

 + implement mul_overflow using __umulh (msvc/clang results: https://godbolt.org/z/smRwA7)

* Added Win32 for VS2019 to .appveyor.yml

* Update amalgamated headers (fix x86 builds with VS2019)
2020-04-21 14:42:53 -04:00
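The technique named in #743 (a bit-exact 64x64 -> 128-bit multiply for targets without `_umul128`, and an overflow check built on the high word) can be sketched with four 32x32-bit partial products. This is illustrative only: `umul128_portable` and `mul_overflow_portable` are hypothetical names, not the functions added in that PR.

```c++
#include <cstdint>

// Hypothetical portable 64x64 -> 128-bit unsigned multiply built from
// 32-bit halves. Returns the low 64 bits and writes the high 64 bits to *hi.
inline uint64_t umul128_portable(uint64_t a, uint64_t b, uint64_t* hi) {
  const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

  const uint64_t lo_lo = a_lo * b_lo;  // weight 2^0
  const uint64_t hi_lo = a_hi * b_lo;  // weight 2^32
  const uint64_t lo_hi = a_lo * b_hi;  // weight 2^32
  const uint64_t hi_hi = a_hi * b_hi;  // weight 2^64

  // Middle column: carry out of lo_lo, low half of hi_lo, and all of lo_hi.
  // Its maximum is exactly 2^64 - 1, so this sum cannot overflow.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

  *hi = hi_hi + (hi_lo >> 32) + (cross >> 32);
  return (cross << 32) | (lo_lo & 0xFFFFFFFFu);
}

// Hypothetical overflow check: the 64-bit product overflows
// exactly when the high word of the full product is non-zero.
inline bool mul_overflow_portable(uint64_t a, uint64_t b, uint64_t* result) {
  uint64_t hi = 0;
  *result = umul128_portable(a, b, &hi);
  return hi != 0;
}
```

For example, `(2^64 - 1) * (2^64 - 1)` yields a low word of 1 and a high word of `0xFFFFFFFFFFFFFFFE`.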
John Keiser d3e44b1108 Add amalgamation support to cmake 2020-04-20 19:50:51 -07:00
John Keiser fbf274a42b
Merge pull request #727 from simdjson/jkeiser/cmake-checkperf
Add checkperf to cmake
2020-04-20 19:45:45 -07:00
Daniel Lemire 3c1b403c4e Fixing typo. 2020-04-20 18:19:38 -04:00
John Keiser 9bf9fba2ec Add checkperf to cmake 2020-04-20 11:14:46 -07:00
John Keiser 22b9a53bef Add SIMDJSON_FORCE_IMPLEMENTATION 2020-04-18 18:21:56 -07:00
John Keiser 289cc3e7a0 Treat warnings as errors during compilation 2020-04-15 19:59:38 -07:00
John Keiser fd418f568c Fix c++11 warnings on clang
- namespace x::y is C++17
- static_assert requires message in C++11
2020-04-15 17:27:48 -07:00
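A minimal illustration of the two C++11 rules listed above. The namespace and function names are hypothetical.

```c++
#include <cstdint>

// C++17 permits `namespace example::internal { ... }`; under C++11 each
// namespace level must be opened separately.
namespace example {
namespace internal {

// C++11 requires a message argument; the one-argument form
// static_assert(condition) is only valid from C++17 on.
static_assert(sizeof(uint64_t) == 8, "uint64_t must be 64 bits wide");

inline int forty_two() { return 42; }

}  // namespace internal
}  // namespace example
```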
John Keiser 09cf18a646 Add C++11 tests to cmake
- Add simdjson-flags target so callers don't have flags forced on them
2020-04-15 17:26:25 -07:00
Daniel Lemire b523c43927
Can we provide a size() function to arrays and objects? (eager approach) [TO BE MERGED] (#690)
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
2020-04-15 10:15:48 -04:00
John Keiser 6835dd73bc Only apply compile flags to simdjson 2020-04-09 08:52:29 -07:00
John Keiser beaa6a9a7a Create simdjson-windows-headers interface library 2020-04-08 14:52:56 -07:00
John Keiser a9c8224f40 Add numberparsingcheck and stringparsingcheck tests 2020-04-08 14:52:56 -07:00
John Keiser 3dcc188d93 Add more tests to cmake 2020-04-08 14:52:56 -07:00
John Keiser 54b7291c34 Reference simdjson by name, don't specify include files individually 2020-04-08 14:52:55 -07:00
John Keiser 1e30b6e334 Compile under C++ 11 2020-04-08 14:00:13 -07:00
John Keiser 406240bae3 Support C++ 14 2020-04-08 14:00:13 -07:00
John Keiser 7656bd50ee Generate API docs at /api/docs 2020-03-29 17:01:12 -07:00
John Keiser 03746b966b Move document/element/etc. under dom 2020-03-28 13:42:21 -07:00
Daniel Lemire 5d1e3efce8
faster minifier (#568)
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.

Co-authored-by: John Keiser <john@johnkeiser.com>
2020-03-20 16:14:47 -04:00
John Keiser f1744f5495 Break out string/structural scanning from tokenizer 2020-03-18 10:40:06 -07:00
John Keiser 5a071c1907 Remove TARGET_FALLBACK 2020-03-17 14:59:47 -07:00
John Keiser 7cf3a7511b Add fallback implementation to CI
- Also add SIMDJSON_IMPLEMENTATION_HASWELL/WESTMERE/ARM64/FALLBACK=1/0 to
enable/disable various implementations
2020-03-17 14:59:47 -07:00
John Keiser af203aaf86 Add fallback parser for pre-SSE4.2 machines 2020-03-17 14:59:47 -07:00
John Keiser 1a5d8f1957 Add tests for SIMDJSON_EXCEPTIONS=0, add `tie()` support 2020-03-17 13:54:37 -07:00
Daniel Lemire 758dc511fb Better comment. 2020-03-17 14:22:40 -04:00
Daniel Lemire 0164723a8e Adding comments. 2020-03-16 20:17:48 -04:00
Daniel Lemire da3e064fc7 Added a comment. 2020-03-15 22:35:21 -04:00
Daniel Lemire 317fc6ba0e
accurate number parsing (#558) 2020-03-15 22:30:21 -04:00
John Keiser 1aaad223c0 Simplify atom parsing 2020-03-13 19:00:08 -07:00
John Keiser 81c86d7090 Structural iterator 2020-03-13 19:00:08 -07:00
John Keiser 40c6213d7e Add parser.load() and load_many() to load files 2020-03-11 17:19:41 -07:00
Daniel Lemire f669aafcf2 Correcting typo 2020-03-09 17:55:26 -04:00
John Keiser 31e8a12e88 Make error_message(error_code) return C string
- Also move all error message logic to include inline
2020-03-06 15:41:51 -08:00
John Keiser b2220d6157 Fix amalgamation 2020-03-05 11:13:25 -08:00
John Keiser 5ff941ae3d Include <simdjson> directly, move document_parser_callbacks to top level 2020-03-04 14:26:54 -08:00
John Keiser 5525c6f729 Stop using jsoncharutils.h in JsonStream 2020-03-04 14:26:54 -08:00
John Keiser eb147d9868 Mark jsonformatutils.h/isadetection.h internal
- Move jsonformatutils.h to internal/jsonformatutils.h (it is used by
document::print_json)
- Move isadetection.h to src/ (it is only used internally)
2020-03-04 14:26:54 -08:00
John Keiser f58a5d534e Move parser inline implementation to .cpp 2020-03-04 14:26:54 -08:00
John Keiser b3ea8c406e Add simdjson.cpp for unified use (#515) 2020-03-04 10:12:27 -08:00
John Keiser 99667f7c55 Create top level simdjson.h (#515)
- Allows everyone to #include the same way, singleheader or not.
2020-03-04 10:12:27 -08:00
John Keiser 0b21203141 Document navigation API 2020-03-02 14:49:03 -08:00
John Keiser 910f272467
Add parser implementation interface and selection API (#501)
* Make architecture implementations virtual functions

- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions

* Move implementation static methods to their own classes

* Detect best supported implementation on first use

* available_implementationsI() -> available_implementations
2020-02-21 16:34:27 -05:00
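The selection scheme described in #501 (one virtual-function implementation per architecture, the best supported one chosen on first use, explicit selection by name) can be sketched as below. Every name here is hypothetical and this is not simdjson's actual interface; the `fast_simd` implementation simply pretends the CPU lacks its instruction set so that the fallback is chosen.

```c++
#include <string>
#include <vector>

// Hypothetical interface: one subclass per architecture.
struct implementation {
  virtual ~implementation() = default;
  virtual std::string name() const = 0;
  virtual bool is_supported() const = 0;  // does this CPU have the needed ISA?
};

struct fast_simd_implementation : implementation {
  std::string name() const override { return "fast_simd"; }
  bool is_supported() const override { return false; }  // pretend the ISA is missing
};

struct fallback_implementation : implementation {
  std::string name() const override { return "fallback"; }
  bool is_supported() const override { return true; }  // portable code runs anywhere
};

// All implementations, ordered from most to least preferred.
inline const std::vector<const implementation*>& all_implementations() {
  static fast_simd_implementation fast;
  static fallback_implementation fallback;
  static const std::vector<const implementation*> list{&fast, &fallback};
  return list;
}

// Pick the best supported implementation once, on first use.
// Initialization of a function-local static is thread-safe since C++11.
inline const implementation& active_implementation() {
  static const implementation* chosen = [] {
    for (const implementation* impl : all_implementations()) {
      if (impl->is_supported()) { return impl; }
    }
    return all_implementations().back();  // fallback is always last
  }();
  return *chosen;
}

// Explicit selection by name; returns nullptr if nothing matches.
inline const implementation* find_implementation(const std::string& name) {
  for (const implementation* impl : all_implementations()) {
    if (impl->name() == name) { return impl; }
  }
  return nullptr;
}
```

Adding an architecture in this design means adding a subclass and one entry in the ordered list; callers never see an architecture enum or template parameter.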
John Keiser 1f76737510 Make valstat-ish parse APIs 2020-02-18 08:37:07 -08:00
John Keiser bc8bc7d1a8
Lowercase Architecture and ErrorValues (#487)
ErrorValues -> error_code, Architecture -> architecture
2020-02-14 15:21:28 -08:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire c879b56f41 Fixing logical error. 2020-02-07 10:44:17 -05:00
John Keiser 76c706644a
Move stage 2 tape writing to ParsedJson (#477)
This is a first step to allowing alternate tape formats.
2020-02-04 14:28:42 -08:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. (#473)
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonsense.
2020-01-30 17:16:41 -05:00
Daniel Lemire 28710f8ad5
fix for Issue 467 (#469)
* Fix for issue467

* Updating single-header

* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.

* Fixing parse_stream

* Updating documentation.
2020-01-29 19:00:18 -05:00
Daniel Lemire 3488c49d0a
Basically, haswell processor should be able to count on lzcnt. (#458) 2020-01-22 16:52:55 -05:00
John Keiser adaef43bc6 Find all escaped characters with simpler algorithm (#450) 2020-01-22 14:11:14 -05:00
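The commit above does not spell out its algorithm, so the following is only a scalar reference for what "escaped" means, not the SIMD method from #450: a character is escaped when it is preceded by an odd-length run of backslashes. `escaped_mask_scalar` is a hypothetical name, and carrying a pending escape across 64-byte blocks is left out.

```c++
#include <cstddef>
#include <cstdint>

// Scalar reference: bit i of the result is set when buf[i] is escaped.
// Handles one block of at most 64 bytes.
inline uint64_t escaped_mask_scalar(const char* buf, std::size_t len) {
  uint64_t escaped = 0;
  bool pending_escape = false;
  for (std::size_t i = 0; i < len && i < 64; i++) {
    if (pending_escape) {
      escaped |= uint64_t(1) << i;
      pending_escape = false;  // an escaped backslash escapes nothing further
    } else if (buf[i] == '\\') {
      pending_escape = true;
    }
  }
  return escaped;
}
```

In `\"` the quote is escaped, while in `\\"` only the second backslash is, so that quote still closes the string.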
Daniel Lemire 80b4dd2e8a
Removing all stdout, stderr from main library. (#455)
* Removing all stdout, stderr from main library.
2020-01-20 16:03:15 -05:00
Daniel Lemire ab6d4871d8
Adding haswell amal. tests (#447)
* Adding an extra test.

* Disabling the AVX-accelerated minifier.

* Updating amalgamation.
2020-01-15 19:49:11 -05:00
Daniel Lemire f611b65bc0
This updates the minifier. (#446) 2020-01-15 13:45:32 -05:00
Daniel Lemire a804351a76
I think that i and idx should be size_t (64-bit). (#438) 2020-01-13 17:42:52 -05:00
dbj 85e84fc1fa improved string padded (#440)
* dirent portable latest version

* improved

std::string argument passed by const reference
ctor added with std::string_view argument
`allocate_padded_buffer()` moved here with **optional** check on `length < 1`

* allocate_padded_buffer moved to padded_string.h
2020-01-10 10:15:48 -05:00
Daniel Lemire 951c4bedf8
Simpler jsonstream (#436)
* One simplification.

* Removing untested functions.
2020-01-07 19:10:02 -05:00
Daniel Lemire 4c0c1c9830 Updating a comment. 2020-01-06 22:01:23 -05:00
Daniel Lemire a9e990251d
removing left over debug 2020-01-04 12:50:04 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provides tests for it), as well as #401
2020-01-03 22:22:47 -05:00
Daniel Lemire 5042dd52ce
This is implementing @jkeiser optimization idea. (#431) 2020-01-03 09:21:36 -05:00
Daniel Lemire a2d05b21ff Merge branch 'master' of github.com:lemire/simdjson 2020-01-02 15:27:00 -05:00
Daniel Lemire f4f5f670a2 Better documentation of the padding. 2020-01-02 15:25:03 -05:00
John Keiser 165e23773f Refactor stage 2 into structural_parser class 2020-01-02 13:12:22 -07:00
Paul Dreik 399d08c86c use unique_ptr in class parsedjson (#417)
* refactor parsedjson to use unique_ptr instead of owning raw pointer
* fix a potential undefined behavior
* output only first cpu in /proc/cpuinfo
2019-12-31 14:31:45 -05:00
Daniel Lemire 6f799435b6 Removing commented out stuff. 2019-12-30 22:21:04 -05:00
John Keiser d7c83397e4 lookup+cont-check algorithm 2019-12-18 14:37:21 -08:00
Daniel Lemire 1d621bba37 Being more explicit about EMPTY errors. 2019-12-18 14:39:48 +00:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 (#396)
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
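A sketch of a trailing-zero count that is well defined for a zero input, returning 64 (the same convention as x86 TZCNT and C++20 `std::countr_zero`). The choice of 64 for zero is an assumption about the intended semantics, and `trailing_zeroes_safe` is a hypothetical name.

```c++
#include <cstdint>

// Hypothetical trailing-zero count that is defined for zero (returns 64).
// __builtin_ctzll (GCC/Clang) is undefined for 0, hence the explicit guard.
inline int trailing_zeroes_safe(uint64_t x) {
  if (x == 0) { return 64; }
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_ctzll(x);
#else
  int n = 0;
  while ((x & 1) == 0) { x >>= 1; n++; }  // portable fallback
  return n;
#endif
}
```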
Daniel Lemire f02babe427 Adding analysis by @sebpop from https://github.com/lemire/simdjson/pull/391#issuecomment-565551462 2019-12-13 13:39:15 -05:00
Daniel Lemire fc6133b58f
Fixes issue 388 (#394) 2019-12-11 08:13:29 -05:00
mswilson d33208c7db Correct detection of NEON support (#392)
... as the test as it is currently implemented will always evaluate to true.

Fixes #389
2019-12-10 13:12:17 -05:00
Daniel Lemire c9cd8e6211
PMULL is slow on ARM64, let us not rely on it? (#391) 2019-12-09 17:15:34 -05:00
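Assuming the PMULL use in question is the carry-less multiply that turns quote bits into an in-string mask (a prefix XOR), one portable alternative is a shift cascade. The commit title does not say which replacement was adopted; this sketch only shows that the two compute the same thing.

```c++
#include <cstdint>

// Prefix XOR: bit i of the result is the XOR of input bits 0..i.
// Carry-less multiplication by an all-ones operand (PCLMULQDQ on x86,
// PMULL on ARM64) gives the same low 64 bits in one instruction; this
// cascade needs only ordinary shifts and XORs.
inline uint64_t prefix_xor(uint64_t bits) {
  bits ^= bits << 1;
  bits ^= bits << 2;
  bits ^= bits << 4;
  bits ^= bits << 8;
  bits ^= bits << 16;
  bits ^= bits << 32;
  return bits;
}
```

With unescaped quotes at bit positions 1 and 4 (`0x12`), the result is `0xE`: the opening quote and the string contents are marked, the closing quote is not.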
Daniel Lemire 1211c01ca1
Resolves issue 186 (#383)
* Resolves issue 186
https://github.com/lemire/simdjson/issues/186
2019-12-02 12:23:45 -05:00
Jeremie Piotte 4e1c90f76f
Fix memory allocation of the max_depth in JsonStream. 2019-11-28 13:55:31 -05:00
Jeremie Piotte f163155929 JsonStream documentation (#381)
* adding Multiline JSON competition chart to doc
* Completing the comments for JsonStream
* Adding a page for JsonStream's documentation.
2019-11-25 18:11:55 -05:00
John Keiser 9b6377fd80 Precalculate the ASCII path 2019-11-25 11:49:44 -08:00
John Keiser 7356b4532f Perform UTF-8 detection via flag lookup algorithm
- adds the alternative zwegner, range and lookup utf8 algorithms as well, for
ability to do "shootouts"
2019-11-25 11:49:44 -08:00
John Keiser 7d7bec856d Remove lookup_lower_4_bits
It's only a coincidence that it works in current uses: it doesn't do
what the name says. Particularly, if the high bit is 1 it will yield
0 even if the lower 4 bits would yield something else.
2019-11-25 11:49:44 -08:00
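Assuming the removed helper was built on a byte shuffle such as x86 PSHUFB (`_mm_shuffle_epi8`), the pitfall described above can be modelled in scalar code: only the low 4 bits of each index select a table entry, but a set high bit forces the output byte to zero. `pshufb_model` is a hypothetical name.

```c++
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of PSHUFB on one 16-byte lane:
// out[i] is 0 when bit 7 of idx[i] is set, otherwise table[idx[i] & 0x0F].
inline std::array<uint8_t, 16> pshufb_model(const std::array<uint8_t, 16>& table,
                                            const std::array<uint8_t, 16>& idx) {
  std::array<uint8_t, 16> out{};
  for (std::size_t i = 0; i < 16; i++) {
    out[i] = (idx[i] & 0x80) ? uint8_t(0) : table[idx[i] & 0x0F];
  }
  return out;
}
```

So an index of `0x13` behaves like a true "lower 4 bits" lookup of entry 3, while `0x83` yields 0 rather than entry 3.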
Paul Dreik 6d14afd80e
Make threads optional in the cmake build (#376)
Only the simdjson library should optionally depend on threads,
the executables that link to simdjson will get the dependency
indirectly.

* add option for controlling threads (default is on)
* add CI testing with threading on/off for msvc, gcc and clang
* fix an unrelated copy paste comment error in the circle ci build conf
2019-11-22 21:51:46 +01:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
John Keiser ce824f8653 Decrease stage 1 step size to 64 bytes on Westmere/ARM
- Templatize scan_step() with STAGE1_STEP_SIZE
- Fix simd8::store()
- add NUM_CHUNKS to simd8
2019-11-18 21:58:07 -08:00
John Keiser 708f4a094d Move inline functions out of class definition for templating 2019-11-18 21:58:07 -08:00
Daniel Lemire 58d249ca16
Introducing move assignments. (#363) 2019-11-09 10:34:32 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working. Needs more testing and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would misbehave in rare cases.

* Adding a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* rough prototype working. Needs more testing and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* minor fixes and cleaning.

* removing warnings

* removing some copies

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* making jsonstream portable

* cleaning imports

* including <algorithms> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would misbehave in rare cases.

* Adding a JsonStream Demo to Amalgamation

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
Daniel Lemire c4f1baad31
Making get_corpus safer (#360) 2019-11-06 12:22:42 -05:00
John Keiser 3828e1e538 Fix performance issues:
1. Don't recast "int" result of movemask to uint32_t
2. Call max_epu8 with the mask first and the bytes second.
2019-11-05 13:44:04 -08:00
John Keiser d89046d515 Use simd8 helpers for find_bs_bits_and_quote_bits 2019-11-05 13:44:04 -08:00
John Keiser 4bc128f07e Move compute_quote_mask to generic bitmask library 2019-11-05 13:44:04 -08:00
John Keiser e383b7a6ab Use generic simd operators for find_whitespace_and_operators 2019-11-05 13:37:56 -08:00
John Keiser c89d6bf68b Genericize utf-8 check 2019-11-05 13:37:32 -08:00
Paul Dreik cf493254b7 fix integer overflow in subnormal_power10 (#355)
detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714
2019-11-04 16:54:03 -05:00
John Keiser c97eb41dc6 Fix ARM compile errors on g++ 7.4 (#354)
* Fix ARM compilation errors

* Update singleheader
2019-11-04 10:36:34 -05:00
Daniel Lemire b1224a77db
Fixing issue 351 (#352)
* Fixing issues 351 and 353
2019-11-01 16:05:28 -04:00
Daniel Lemire 15740500af Let us zero the tail end. 2019-10-24 18:49:30 -04:00
Daniel Lemire 59cad23aeb Merge branch 'master' of github.com:lemire/simdjson 2019-10-24 16:34:10 -04:00
Daniel Lemire da1c35d04b Final (?) fix for https://github.com/lemire/simdjson/issues/345 2019-10-24 16:33:37 -04:00
Daniel Lemire c469aed047
Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347) 2019-10-24 16:06:29 -04:00
John Keiser 64872bddf4 Eliminate stage1_find_marks_flatten.h 2019-10-14 12:33:46 -07:00
John Keiser 81f2249575 Move stage1 into a class to pass fewer parameters 2019-10-14 12:33:46 -07:00
John Keiser 9bbd6bd874 Move headers to implementation area
- jsoncharutils.h, numberparsing.h, simdprune_tables.h
2019-10-14 11:51:41 -07:00
John Keiser 69caa477fb Use struct for UTF-8 checks, remove templating
- Removes templating from simd_input, utf8_checker, and parse_string
- Make drone gcc run a lot faster
- Make drone clang run a little faster (NOTE:
https://hub.docker.com/r/silkeh/clang helps even more, but I wasn't sure
whether we wanted to trust that)
- Make drone arm run in parallel to get results quicker
2019-10-08 17:58:45 -07:00
Juho Lauri cf9dbe583d improved const correctness (#321) 2019-10-02 14:25:28 -04:00
John Keiser de8df0a05f Combined performance patch (5% overall, 15% stage 1) (#317)
* Allow -f

* Support parse -s (force sse)

* Simplify flatten_bits

- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead

* De-unroll the flatten_bits loops

* Decrease dependencies in stage 1

- Do all finalize_structurals work before computing the quote mask; mask
  out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
  single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
  dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
  achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
  invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
  or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
  descriptiveness and possible lifting into a generic simd parsing library

* Mark branches as likely/unlikely

* Reorder and unroll+interleave stage 1 loop

* Nest the cnt > 16 branch inside cnt > 8
2019-10-01 12:01:08 -04:00
Daniel Lemire 53b6deaeae
Safer handling of error codes, fixes https://github.com/lemire/simdjson/issues/318 (#319) 2019-09-29 12:12:15 -04:00
Opemipo 462858efa3 Fix Typo (#311)
escapted -> escaped
2019-09-12 10:16:33 -04:00
John Keiser f7e893667d Use simd_input generic methods for utf8 checking (#301)
* Use generic each/reduce in simdutf8check

* Remove macros from generic simd_input uses

* Use array instead of members to store simd registers

* Default local checkperf to clone from .
2019-09-02 12:46:05 -04:00
saka1 c1f27fb848 Accept large unsigned integers (#295)
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
2019-09-02 10:50:24 -04:00
John Keiser 7f249cd179 Use non-interleaved map() to make structurals clearer (#304) 2019-08-29 21:38:41 -04:00
John Keiser f4fa5b7340 Add MAP_CHUNKS2, make parameter name related to input 2019-08-26 09:46:49 -07:00
John Keiser 169568ca47 Use map() to interleave instructions for parallelism 2019-08-26 09:46:49 -07:00
John Keiser 9cc4ddfc88 Use map().to_bitmask() instead of build_bitmask() 2019-08-26 09:46:49 -07:00
John Keiser 441963c84c Add AMD64 build_bitmask 2019-08-26 09:46:49 -07:00
John Keiser da0f1cacea Remove static modifiers 2019-08-26 09:46:48 -07:00
John Keiser b01222518d Genericize bitmask building to make algorithms clearer 2019-08-26 09:46:48 -07:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
Vitaly Baranov e9be643db5 Fix condition in ParsedJson::allocate_capacity(). (#283) 2019-08-16 08:38:59 -04:00
Vitaly Baranov 6a2728e730 No allocation in the iterator's constructor (#276)
* Get rid of dynamic allocation in ParsedJson::Iterator.

* Implement copy assignment operator for ParsedJson::Iterator.

* ParsedJson::Iterator is now a template class.
2019-08-15 19:42:15 -04:00
Daniel Lemire 3fb82502f7
This gets rid of the silly ALLOW_SAME_PAGE_BUFFER_OVERRUN (#268) 2019-08-09 17:36:32 -04:00
Vitaly Baranov 0b927f059c Make dynamic dispatch free of TSan warnings (#256) 2019-08-08 16:16:35 -04:00
John Keiser f3c3afd4cd Use direct call to templated flatten_bits instead of if (#262)
* Use direct call to templated flatten_bits instead of if

* Put really_inline back on find_structural_bits_64
2019-08-08 15:09:17 -04:00
John Keiser b1beacd1f3 Make headers show up in Header Files in VS2019 (#257) 2019-08-05 16:36:52 -04:00
John Keiser d9a0e2b8f4 Fix Intellisense errors opening .h files on VS2019 (#253) 2019-08-04 19:57:55 -04:00
ioioioio 2a24567370
Replace macros by include files (#236) (#248)
* stage1 compiles without macros

* cleaning

* amalgamation is weird but works

* macros are removed from stringparsing

* amalgamation fixed

* Huge macros are removed.

* clang-format
2019-08-04 15:58:35 -04:00
Daniel Lemire 99a153d9e8
Hiding the pointer away... (#252)
* Hiding the runtime dispatch pointer in a source file so it is not an exported symbol
* Disabling hard failure on style check.
* Fixes https://github.com/lemire/simdjson/issues/250
2019-08-04 15:41:00 -04:00
Daniel Lemire 038b18edf1
Adding style scripts. (#243)
* Adding style scripts.
2019-08-01 16:09:26 -04:00
Daniel Lemire d83aef4e86 This should fix a warning in Visual Studio. 2019-07-31 18:12:58 -04:00
John Keiser bf59ba76f5 Fix most warnings on VS2019 (#241) 2019-07-31 17:43:45 -04:00
ioioioio c2eea8abba Style uniformization (#238)
* massive clang-format -style=LLVM

* naming harmonization

* adding commentary about sysinfoapi.h
2019-07-30 17:18:10 -04:00
Daniel Lemire 771e9cd68a
Trying again... (#235) 2019-07-29 13:55:13 -04:00
Daniel Lemire c328afee57 This should fix master. 2019-07-29 13:44:25 -04:00
Daniel Lemire a53d95099c
Intrinsic-based flatten (#234)
* Providing a flatten function with intrinsics (for Visual Studio).
2019-07-29 13:28:02 -04:00
Daniel Lemire f76ee5e5ef Fixes issue 221 (#222)
https://github.com/lemire/simdjson/issues/221
2019-07-29 10:07:07 -04:00
Daniel Lemire eba02dc1b9 Runtime dispatch
* Attempt 1 - fn targeting

GCC won't work with templates with different targets, need to specialize all the way up the call stack.

* Compiles properly with cmake. Does not with the Makefile.

* Compilation works with Makefile

* instruction_set changes to architecture

* some aesthetic changes

* fix amalgamation and tests + aesthetic changes

* This now compiles and passes tests under CLANG

* Minor correction.

* Trying to make it work on ARM

* Adding missing namespace

* Missing bracket

* Fixing minor compilation issues.

* Getting parse to use runtime dispatch

* Fixing amalgamation script.

* Making sure that NEON is supported.

* Fixing typo

* Merging https://github.com/lemire/simdjson/pull/229

* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser (second part)

* Trying another way.

* Removing the paral.

* Fixing the make file

* Let us make the practice run long enough.

* Resolved the awful slowness.

* Cleaning the README.md

* With runtime dispatching, we should not need flags anymore.

* Changing isa detection file's name + fixing typos.
2019-07-28 22:46:33 -04:00
ioioioio bcabdfc1ae Json pointer (#220)
* json pointer support

* Addition of tests for the json pointer

* Adding a new tool for the JSON Pointer support, and some documentation.
2019-07-26 18:38:10 -04:00
Daniel Lemire be956654b2 Minor cleaning = annotating simdjson namespaces and making sure that we don't have headers all over. 2019-07-09 19:24:08 -04:00
Daniel Lemire fba27ef4b9 I missed a few. Building up VS support. 2019-07-04 17:45:45 -04:00
Daniel Lemire 19cdc09928 Improving support for VS 2019-07-04 17:36:26 -04:00
ioioioio 861a6a17e4 SSE implementation integrated 2019-07-03 17:15:21 -04:00
ioioioio 036f9d5a45 Merge branch 'master' of https://github.com/lemire/simdjson into Multiple_implementation_refactoring_stage2 2019-07-03 10:34:58 -04:00
ioioioio 3f24879157 Stage2 refactored to simplify multiple implementations 2019-07-02 17:12:00 -04:00