Commit Graph

595 Commits

Author SHA1 Message Date
Daniel Lemire 2022dd7d74
Merge pull request #945 from simdjson/issue678
Fixing issue 678
2020-06-18 18:23:56 -04:00
Daniel Lemire ef688a74fe Minor tweak to the documentation. 2020-06-18 18:18:12 -04:00
Daniel Lemire 04a19f9813 Fixes https://github.com/simdjson/simdjson/issues/937 2020-06-17 18:06:13 -04:00
Daniel Lemire 2cbc591c9d Fixing issue 678 2020-06-17 16:17:17 -04:00
Daniel Lemire 3586fc4910 Fix for issue 680 2020-06-17 18:49:22 +00:00
Daniel Lemire 0b9df6d8c4 It turns out that we need fairly complicated logic. 2020-06-17 15:17:10 +00:00
Daniel Lemire 803b0c4bdb Light touch. 2020-06-17 11:00:13 -04:00
Daniel Lemire 0d4e501239 Fixing the bug. 2020-06-17 10:06:16 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, we have one thread doing the stage 1, while the main thread does stage 2. So if stage 1 and stage 2 each take half the time, parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

This fixes our parse_stream benchmark, which is just busted.
This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, on others very expensive. The new code should depend far less on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
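The worker idea described in commit 4dfbf98e4e can be sketched roughly as follows. This is not the simdjson implementation: `batch_worker` and `process_batches` are hypothetical names, and "stage 1" and "stage 2" are stand-ins (summing a batch, then accumulating the sum). The sketch only shows one long-lived thread being reused for every batch instead of a new thread being created per batch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: owns one long-lived thread and runs one job at a time.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand a job to the worker thread and return immediately.
  void run(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!has_job_ && "call wait() before the next run()");
      job_ = std::move(job);
      has_job_ = true;
    }
    cv_.notify_all();
  }

  // Block until the job handed to run() has completed.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return has_job_ || stopping_; });
      if (!has_job_) { return; } // woken only to stop
      std::function<void()> job = std::move(job_);
      lock.unlock();
      job(); // run outside the lock
      lock.lock();
      has_job_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool stopping_ = false;
  std::thread thread_; // declared last so the state above exists before loop() starts
};

// Stand-in pipeline: "stage 1" sums a batch, "stage 2" adds that sum to a total.
// Stage 1 of batch i+1 runs on the worker while stage 2 of batch i runs here.
long process_batches(const std::vector<std::vector<int>> &batches) {
  if (batches.empty()) { return 0; }
  batch_worker worker;
  long total = 0;
  long next_sum = 0;
  long current = std::accumulate(batches[0].begin(), batches[0].end(), 0L);
  for (std::size_t i = 0; i < batches.size(); i++) {
    const bool has_next = i + 1 < batches.size();
    if (has_next) {
      worker.run([&, i] {
        next_sum = std::accumulate(batches[i + 1].begin(), batches[i + 1].end(), 0L);
      });
    }
    total += current; // "stage 2" on the main thread
    if (has_next) {
      worker.wait();
      current = next_sum;
    }
  }
  return total;
}
```

Calling `process_batches` with any number of batches creates exactly one extra thread, which is why the per-batch overhead no longer depends on how expensive thread creation is on the host system.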
John Keiser bbd61eb13f Let tape writing be put in a register 2020-06-12 09:18:20 -07:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser fe01da077e Make threaded version work again 2020-06-07 16:21:00 -07:00
John Keiser d43a4e9df9 Remove SUCCESS_AND_HAS_MORE (internal only value) 2020-06-07 16:20:55 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
John Keiser b75fa26dc1 Move containing_scope and ret_address to .cpp 2020-06-01 12:15:55 -07:00
John Keiser 3d22a2d845 One weird trick: set a bogus error value in the parser impl
This makes us faster under both gcc and clang somehow.
2020-06-01 12:15:55 -07:00
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
Daniel Lemire 40d57da83c
fixes issue 891 (#893) 2020-05-20 11:54:53 -04:00
John Keiser e6c9dfbd91 Make include files more fine-grained 2020-05-19 14:42:04 -07:00
John Keiser 7ad4020829 Make main compilation chunks into .cpp files 2020-05-19 13:32:35 -07:00
John Keiser a476531524 Share ref_address everywhere it's used 2020-05-19 13:30:34 -07:00
Daniel Lemire e03c5e9f23
We should guard the include (#881) 2020-05-13 20:02:46 -04:00
John Keiser dbb3316511 Move current_string_buf_loc to stage 2 2020-05-11 06:11:32 -07:00
John Keiser cd6f204c77 Move write_tape() to stage 2 code 2020-05-11 06:09:48 -07:00
John Keiser 269131ed21 Move on_number_* to stage 2 code 2020-05-11 06:04:54 -07:00
John Keiser 65d784e88e Move on_start/end_string to stage 2 code 2020-05-11 05:49:40 -07:00
John Keiser 35afb6cae0 Move on_error, on_success to stage 2 code 2020-05-11 05:46:18 -07:00
John Keiser 4f25b6ac0c Move on_end_* to stage 2 code 2020-05-11 05:34:49 -07:00
John Keiser 3d5ed1a7e3 Move on_start_* to stage 2 code 2020-05-11 05:30:35 -07:00
John Keiser a03115a4a6 Move end_scope to stage 2 code 2020-05-11 05:24:12 -07:00
John Keiser 7219d28a31 Call end_scope directly from stage 2 code 2020-05-11 05:20:04 -07:00
John Keiser 0875bce68f Don't pass depth to on_end_* 2020-05-11 05:15:39 -07:00
John Keiser 54fe302907 Don't pass depth to end_scope 2020-05-11 05:06:41 -07:00
John Keiser edaa8f811f Move on_start_* depth management to stage 2 code 2020-05-11 05:03:25 -07:00
John Keiser 2c8fd109de Move increment_count to stage 2 2020-05-11 04:58:50 -07:00
John Keiser 16d88cc095 Don't pass depth to increment_count 2020-05-11 04:15:02 -07:00
Daniel Lemire 2a6e6b3dbd
Cleaning string_view (#872)
* Cleaning string_view

* Corrected typo

* Alignment.
2020-05-10 16:05:52 -04:00
John Keiser afb369950c Disable Intellisense-only warnings in simdjson.h/cpp 2020-05-04 11:47:04 -07:00
John Keiser 1d06624d38 Unset /D_CRT_SECURE_NO_WARNINGS
- Also localize DISABLE_DEPRECATED_WARNING so that we catch other
  deprecations
2020-05-04 11:35:05 -07:00
Pavel P d40069a018 Disable deprecation warnings for VS builds
fopen/getenv are standard c++ that are not deprecated.
2020-05-04 11:34:00 -07:00
Furkan Usta e04cbd71d0 Only install singleheader/simdjson.h as part of the public API 2020-05-02 01:44:11 +03:00
Daniel Lemire fc1ddcd2f8
Faster case-insensitive comparisons. (#837)
* Faster case-insensitive comparisons.
2020-04-30 15:52:28 -04:00
Furkan Usta 73d7d704c1 CMake: Remove export_private_library
Since we are exporting all the targets as part of the main simdjson target we do not need private
exports anymore
2020-04-30 02:06:19 +03:00
Furkan Usta eee07e6cfd Use the same export name for all targets 2020-04-29 23:47:27 +03:00
Nong Li 0f9dbf84b7
Fix incorrect check for case insensitive key lookup (#824) 2020-04-29 13:55:28 -04:00
Daniel Lemire 2a1f8fa8f1
Provides support for clang under Windows. (#817) 2020-04-27 22:09:27 -04:00
John Keiser 49da7e74cd
usage.md -> basics.md (#823) 2020-04-27 16:03:19 -04:00
PavelP 0514588175
Improves clang-cl build with Visual Studio (#809) 2020-04-27 08:59:32 -04:00
Daniel Lemire b99a7344c9
missing spaces. 2020-04-25 22:26:18 -04:00
Daniel Lemire f3ac0be0e6 Merge branch 'master' of github.com:simdjson/simdjson 2020-04-23 18:39:56 -04:00
Daniel Lemire 18c9468af5 Fixed typo 2020-04-23 18:39:32 -04:00
ostri d4239aaa8f
default initialisation (#779)
* padded_string.* default initialisation
parsedjson_iterator - copy constructor; depth_index not necessary
2020-04-23 18:32:11 -04:00
Daniel Lemire 4d0c7d706d
Warn 32-bit users about their doom. (#783) 2020-04-23 16:01:19 -04:00
Daniel Lemire 382392e03b
This should enable -Weffc++ (#777)
* Enabling -Weffc++
2020-04-23 13:03:04 -04:00
Daniel Lemire 0d1c574cb1
A few more changes... (#775)
* More nitpicking.
2020-04-23 11:36:52 -04:00
ostri 87acab0846
elimination of most of g++ -Weffc++ warnings (#764)
Co-authored-by: Matjaž Ostroveršnik <ostri@localhost.localdomain>
Co-authored-by: Daniel Lemire <lemire@gmail.com>
2020-04-23 10:06:44 -04:00
Daniel Lemire e030f02776 Merge branch 'master' into jkeiser/wconversion 2020-04-22 22:03:34 -04:00
Daniel Lemire 185274e70f
Let us see if we can test with libc++. (#732)
* Let us see if we can test with libc++.
* Fixed spacing.
2020-04-22 21:24:42 -04:00
John Keiser d4a37f6ef5 Enable conversion warnings on Linux and Windows 2020-04-22 14:21:30 -07:00
John Keiser 3e9e14f4d6 Reenable deprecation warnings on Windows 2020-04-22 08:53:19 -07:00
John Keiser 289cc3e7a0 Treat warnings as errors during compilation 2020-04-15 19:59:38 -07:00
John Keiser 7480b87e07
Merge pull request #693 from simdjson/jkeiser/cmake-quickstartcpp
Add C++11 tests to cmake
2020-04-15 19:53:14 -07:00
Daniel Lemire befa6423be
This massively improves the performance of tight loops relying on a type() call. (#721)
* This massively improves the performance of tight loops relying on a type() call.

* Adding a few more benchmarks
2020-04-15 20:45:40 -04:00
John Keiser fd418f568c Fix c++11 warnings on clang
- namespace x::y is C++17
- static_assert requires message in C++11
2020-04-15 17:27:48 -07:00
John Keiser 09cf18a646 Add C++11 tests to cmake
- Add simdjson-flags target so callers don't have flags forced on them
2020-04-15 17:26:25 -07:00
Daniel Lemire 326c175dcb
Massive performance boost for get<double>. (#719)
* Massive performance boost for get<double>.
2020-04-15 20:09:45 -04:00
Daniel Lemire 6d7c77ddc1
Let us try to check with the exceptions disabled. (#707)
* Tweaking code so that we can run all tests with exceptions off.
* Removing SIMDJSON_DISABLE_EXCEPTIONS
2020-04-15 16:45:36 -04:00
Daniel Lemire b523c43927
Can we provide a size() function to arrays and objects? (eager approach) [TO BE MERGED] (#690)
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
2020-04-15 10:15:48 -04:00
Paul Dreik 75545ff70d
ref qualify parser methods to avoid use of dangling objects (#703)
To avoid using data belonging to a temporary, the parse functions are ref qualified to get a compile error if used on an rvalue. See https://github.com/simdjson/simdjson/issues/696

Compilation tests are also added, to make sure bad usage fails to compile.

Reviewed by jkeiser.
2020-04-15 09:57:52 +02:00
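The ref-qualification technique mentioned in commit 75545ff70d can be illustrated with a small sketch. `toy_parser` and `can_parse` are hypothetical names and this is not the simdjson API; in particular, deleting the rvalue overload is one possible way to get the compile error and is assumed here for illustration.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical parser whose result refers to memory owned by the parser itself.
class toy_parser {
public:
  // Only callable on lvalues: the parser outlives the returned reference.
  const std::string &parse(const std::string &json) & {
    buffer_ = json;
    return buffer_;
  }
  // On a temporary, the returned reference would dangle, so reject it at compile time.
  const std::string &parse(const std::string &json) && = delete;

private:
  std::string buffer_;
};

// Compile-time check in the spirit of the "compilation tests" the commit mentions:
// true when calling parse() on an expression of type P is well-formed.
template <typename P, typename = void>
struct can_parse : std::false_type {};

template <typename P>
struct can_parse<P, decltype(void(std::declval<P>().parse(
                        std::declval<const std::string &>())))> : std::true_type {};

static_assert(can_parse<toy_parser &>::value, "a named parser can parse");
static_assert(!can_parse<toy_parser>::value, "a temporary parser cannot parse");
```

With this shape, `toy_parser p; p.parse(text);` compiles, while `toy_parser().parse(text)` is rejected by the compiler.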
Daniel Lemire 8539896f3d
It is inconvenient to be unable to print a padded_string. (#713)
* It is inconvenient to be unable to print a padded_string.

* Allows us to print the padded_string even when it is embedded in result object when exceptions are enabled.
2020-04-14 19:07:32 -04:00
Daniel Lemire 334a486737
Tweaking the doxygen. (#700)
* Tweaking the doxygen.

* Fixing typo.
2020-04-14 11:31:46 -04:00
Daniel Lemire 4af7d6f108
Disabling threads on apple's hardware when optimizer is turned off (#692)
* Disabling threads on apple's hardware.

* Turns out that you can have your bread, your butter and your cake too!
2020-04-10 18:41:05 -04:00
John Keiser 6835dd73bc Only apply compile flags to simdjson 2020-04-09 08:52:29 -07:00
John Keiser 54b7291c34 Reference simdjson by name, don't specify include files individually 2020-04-08 14:52:55 -07:00
John Keiser 1e30b6e334 Compile under C++ 11 2020-04-08 14:00:13 -07:00
John Keiser 406240bae3 Support C++ 14 2020-04-08 14:00:13 -07:00
Dirk Eddelbuettel 12ed6336b1
remove three trailing semicolons that -pedantic dislikes (#673) 2020-04-02 21:06:25 -04:00
Daniel Lemire 3cb79e6977
Trying again. (#671) 2020-04-02 19:24:43 -04:00
John Keiser 13aee51011 Add element.type() for type switching 2020-04-02 14:07:19 -07:00
Daniel Lemire 3116e29d16
Release candidate (#655)
* Release candidate
2020-03-31 17:47:25 -04:00
John Keiser d93af1161d Remove set_capacity, replace with allocate
Makes allocation point more predictable
2020-03-30 13:49:54 -07:00
John Keiser 434776db1a Deprecate more things 2020-03-30 13:48:43 -07:00
John Keiser 6167e9cefc Update doxygen to not show deprecated/private things 2020-03-30 13:47:27 -07:00
John Keiser dc918d764e
Merge pull request #646 from simdjson/jkeiser/quickstart-example
Compile all .md examples in CI
2020-03-30 13:44:43 -07:00
John Keiser 7656bd50ee Generate API docs at /api/docs 2020-03-29 17:01:12 -07:00
John Keiser 7ed65e42d7 Add actual examples from basics.md to readme_examples 2020-03-29 16:28:29 -07:00
John Keiser ea8a5020e2 Remove array indexer, make object indexer key lookup 2020-03-28 15:56:43 -07:00
John Keiser 622d9c9480 Replace as_X and is_X with get<T> and is<T> 2020-03-28 15:29:53 -07:00
John Keiser 62da98aef6 Rename dom::stream to dom::document_stream 2020-03-28 13:42:24 -07:00
John Keiser 03746b966b Move document/element/etc. under dom 2020-03-28 13:42:21 -07:00
John Keiser 836e1fc330 Use simdjson_result for all _result classes 2020-03-28 12:03:05 -07:00
John Keiser e836c28008 Deprecate parser error code methods
- Also make competitions compile without warnings
2020-03-28 10:13:20 -07:00
John Keiser 748df8d109 Use string_view instead of string/char* where possible 2020-03-27 13:11:41 -07:00
John Keiser 5ad405006c Return document::element from parse, load, parse_many, load_many 2020-03-27 12:24:41 -07:00
John Keiser e3efbcddc1 Cast padded_string to string_view instead of string 2020-03-27 09:13:11 -07:00
John Keiser c14b2fb36c Remove const char* variants for at_key()
- Remove const char * variants for at_key(), string_view covers them
- Add at_key_case_insensitive variants on *_result
- Add at(), at_key(), at_key_case_insensitive() tests
2020-03-27 09:09:08 -07:00
John Keiser f0f111b387 Make ParsedJson::Iterator backcompat test 2020-03-27 09:07:39 -07:00
Daniel Lemire abb0bf9247 Fixed basictests 2020-03-26 19:40:29 -04:00
John Keiser 56841bcede Fix conversion error on Windows 2020-03-26 12:48:07 -07:00
John Keiser 006cc2ed60 Remove simdjson_move_result 2020-03-26 12:48:03 -07:00
John Keiser 2e420169c3 Remove document::parse and document::load 2020-03-26 10:13:09 -07:00
John Keiser 5aec2671ea Remove JsonStream. Use parse_many() instead. 2020-03-26 09:25:07 -07:00
John Keiser 06587824be Deprecate ParsedJson::Iterator 2020-03-25 18:26:51 -07:00
John Keiser a0bce440a6 Remove document_iterator, document::iterator, ParsedJsonIterator
Keep ParsedJson::Iterator only, without template, in same form as
it was in 0.2
2020-03-25 18:26:51 -07:00
Daniel Lemire 8769e42a56
Fixes issue 600 (#614)
* Fixes issue 600
2020-03-25 18:01:23 -04:00
John Keiser 7cde65aa6e This deprecates json_parse() and build_parsed_json(). 2020-03-25 14:19:24 -07:00
John Keiser e1b1500e3b Make _padded available without using namespace simdjson 2020-03-25 09:37:18 -07:00
John Keiser b28cafc1d1 Remove backslash unescaping from JSON pointer impl
Also speed up non-escaped key lookup
2020-03-25 08:56:40 -07:00
John Keiser 0bcda5e384 Support JSON pointer in DOM navigation model 2020-03-23 15:05:20 -07:00
Daniel Lemire 3e39a998ce Merge branch 'master' of github.com:lemire/simdjson
Conflicts:
	include/simdjson/jsonstream.h
2020-03-22 12:40:34 -04:00
Daniel Lemire 2867dc50fa Minor typo. 2020-03-22 12:39:01 -04:00
Bruce Mitchener c3c43769ae Fix typos. 2020-03-22 09:14:14 -07:00
John Keiser 36ceaa4452 Keep loaded_bytes in parser to reduce allocation
Also centralized memory ownership to make it easy to keep data around
2020-03-21 18:12:16 -07:00
John Keiser e8b3f9eaad Support document::parse("[1,2,3]"_padded) 2020-03-21 11:15:20 -07:00
Daniel Lemire 5d1e3efce8
faster minifier (#568)
* Fallback should use our scalar code.
* parse should have a nicer error message.
* Making it so that "minify" can use different architectures.
* Let us change the minifier competition so that it tests all implementations.
* Documenting the untaken optimization opportunity.

Co-authored-by: John Keiser <john@johnkeiser.com>
2020-03-20 16:14:47 -04:00
Daniel Lemire 6cefeb338b
std::tie does not work on some compilers (#567)
* std::tie workaround.

* Cleaner solution
2020-03-19 16:56:45 -04:00
John Keiser 5a071c1907 Remove TARGET_FALLBACK 2020-03-17 14:59:47 -07:00
John Keiser 7cf3a7511b Add fallback implementation to CI
- Also add SIMDJSON_IMPLEMENTATION_HASWELL/WESTMERE/ARM64/FALLBACK=1/0 to
enable/disable various implementations
2020-03-17 14:59:47 -07:00
John Keiser af203aaf86 Add fallback parser for pre-SSE4.2 machines 2020-03-17 14:59:47 -07:00
John Keiser 8e2c06cb0e Compile with -fno-exceptions 2020-03-17 13:54:37 -07:00
John Keiser 1a5d8f1957 Add tests for SIMDJSON_EXCEPTIONS=0, add `tie()` support 2020-03-17 13:54:37 -07:00
John Keiser 03c828c7ad Add SIMDJSON_EXCEPTIONS=ON to turn on exception interface 2020-03-17 13:54:37 -07:00
John Keiser acc7bd79b0 Support cout << json, cout << minify(json) 2020-03-13 18:59:15 -07:00
Daniel Lemire 89d9de2353
Adding a check to see whether document::stream copy constructor and assignment actually compile (#556)
* Currently, document::stream contains an attribute that is a reference:

```
      document::parser &parser;
```

Yet we try to have it default on the move operator:

```
  stream &operator=(document::stream &&other) = default;
  stream &operator=(const document::stream &) = delete; // Disallow copying
```

```
  stream(document::stream &&other) = default;
  stream(const document::stream &) = delete; // Disallow copying
```

I am not sure what the move is supposed to do with the reference.

I cannot find where we test the copy constructor and assignment. This has me concerned that it is either dead code or buggy code.

* Remove non-working, unnecessary move constructors

* We still want to disallow copies.

Co-authored-by: John Keiser <john@johnkeiser.com>
2020-03-13 12:53:42 -04:00
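The question raised in commit 89d9de2353 (what a defaulted move does with a reference member) can be checked with a reduced sketch. `toy_parser` and `toy_stream` are hypothetical stand-ins for the real classes. Under standard C++ rules, the defaulted move constructor binds the new reference to the same object, while the defaulted move assignment is defined as deleted because a reference cannot be reseated. Some compilers warn about that second declaration.

```cpp
#include <type_traits>

struct toy_parser {};

struct toy_stream {
  toy_parser &parser; // reference member, as in the commit message

  explicit toy_stream(toy_parser &p) : parser(p) {}

  toy_stream(toy_stream &&other) = default;            // fine: refers to the same parser
  toy_stream &operator=(toy_stream &&other) = default; // defined as deleted by the language
  toy_stream(const toy_stream &) = delete;             // disallow copying
  toy_stream &operator=(const toy_stream &) = delete;  // disallow copying
};

static_assert(std::is_move_constructible<toy_stream>::value,
              "move construction works with a reference member");
static_assert(!std::is_move_assignable<toy_stream>::value,
              "move assignment is unusable with a reference member");
static_assert(!std::is_copy_constructible<toy_stream>::value, "copies stay disallowed");
```

So a defaulted move assignment on such a class compiles but can never be called.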
John Keiser a5afec1f94 Make #defines into simdjson::constants 2020-03-11 19:16:29 -07:00
John Keiser ac0899c043 Add error tests, doc_ref_result[] chaining 2020-03-11 17:19:41 -07:00
John Keiser 40c6213d7e Add parser.load() and load_many() to load files 2020-03-11 17:19:41 -07:00
John Keiser d140bc23f5 Automatically allocate memory as needed in parse 2020-03-11 16:14:54 -07:00
John Keiser 66a2807210 Rename invalid_json to simdjson_error 2020-03-06 16:12:51 -08:00
John Keiser 3bdfe167de Support cout << error 2020-03-06 15:41:51 -08:00
John Keiser 31e8a12e88 Make error_message(error_code) return C string
- Also move all error message logic to include inline
2020-03-06 15:41:51 -08:00
John Keiser 9a7c8fb5be Use parse_many in examples/tests/docs 2020-03-05 12:04:45 -08:00
John Keiser cfef4ff2ad Create parser.parse_many() API 2020-03-05 12:04:45 -08:00
John Keiser 1c922d3b73 Fix JsonStream reference to parser on_error 2020-03-04 14:26:54 -08:00
John Keiser b23dd28a06 Declare functions inline to surface "undefined" errors earlier 2020-03-04 14:26:54 -08:00
John Keiser a55f41a24a Move JsonStream inline implementation to inline .h 2020-03-04 14:26:54 -08:00
John Keiser 5525c6f729 Stop using jsoncharutils.h in JsonStream 2020-03-04 14:26:54 -08:00
John Keiser eb147d9868 Mark jsonformatutils.h/isadetection.h internal
- Move jsonformatutils.h to internal/jsonformatutils.h (it is used by
document::print_json)
- Move isadetection.h to src/ (it is only used internally)
2020-03-04 14:26:54 -08:00
John Keiser f58a5d534e Move parser inline implementation to .cpp 2020-03-04 14:26:54 -08:00
John Keiser b3ea8c406e Add simdjson.cpp for unified use (#515) 2020-03-04 10:12:27 -08:00
John Keiser 99667f7c55 Create top level simdjson.h (#515)
- Allows everyone to #include the same way, singleheader or not.
2020-03-04 10:12:27 -08:00
John Keiser 0b21203141 Document navigation API 2020-03-02 14:49:03 -08:00
John Keiser 910f272467
Add parser implementation interface and selection API (#501)
* Make architecture implementations virtual functions

- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection
(add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e.
parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions

* Move implementation static methods to their own classes

* Detect best supported implementation on first use

* available_implementationsI() -> available_implementations
2020-02-21 16:34:27 -05:00
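The design listed in commit 910f272467 (implementations as virtual functions, selectable by name, listable by the user) can be sketched as below. Every name here (`toy_implementation`, `available_toy_implementations`, `find_toy_implementation`) is hypothetical, and counting quote characters is only a placeholder for a dispatched algorithm. The real simdjson interface is not reproduced.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: each architecture would provide one subclass.
class toy_implementation {
public:
  virtual ~toy_implementation() = default;
  virtual std::string name() const = 0;
  // Placeholder for an architecture-specific algorithm.
  virtual std::size_t count_quotes(const std::string &json) const = 0;
};

class fallback_impl final : public toy_implementation {
public:
  std::string name() const override { return "fallback"; }
  std::size_t count_quotes(const std::string &json) const override {
    std::size_t n = 0;
    for (char c : json) { n += (c == '"') ? 1u : 0u; }
    return n;
  }
};

class unrolled_impl final : public toy_implementation {
public:
  std::string name() const override { return "unrolled"; }
  std::size_t count_quotes(const std::string &json) const override {
    std::size_t n = 0, i = 0;
    for (; i + 2 <= json.size(); i += 2) { // two characters per iteration
      n += (json[i] == '"') ? 1u : 0u;
      n += (json[i + 1] == '"') ? 1u : 0u;
    }
    for (; i < json.size(); i++) { n += (json[i] == '"') ? 1u : 0u; }
    return n;
  }
};

// Lets the caller list what is available, preferred implementation first.
const std::vector<const toy_implementation *> &available_toy_implementations() {
  static const unrolled_impl unrolled{};
  static const fallback_impl fallback{};
  static const std::vector<const toy_implementation *> all{&unrolled, &fallback};
  return all;
}

// Lets the caller select an implementation by name; nullptr if unknown.
const toy_implementation *find_toy_implementation(const std::string &name) {
  for (const toy_implementation *impl : available_toy_implementations()) {
    if (impl->name() == name) { return impl; }
  }
  return nullptr;
}
```

Adding an architecture then means adding one subclass and one entry in the list, with no enum or template parameter to thread through the callers.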
John Keiser 1f76737510 Make valstat-ish parse APIs 2020-02-18 08:37:07 -08:00
John Keiser bc8bc7d1a8
Lowercase Architecture and ErrorValues (#487)
ErrorValues -> error_code, Architecture -> architecture
2020-02-14 15:21:28 -08:00
Daniel Lemire 083569fca8
This code is terrible and should not be there. (#496) 2020-02-13 07:38:11 -05:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
John Keiser 76c706644a
Move stage 2 tape writing to ParsedJson (#477)
This is a first step to allowing alternate tape formats.
2020-02-04 14:28:42 -08:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. (#473)
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonsense.
2020-01-30 17:16:41 -05:00
Daniel Lemire 28710f8ad5
fix for Issue 467 (#469)
* Fix for issue467

* Updating single-header

* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.

* Fixing parse_stream

* Updating documentation.
2020-01-29 19:00:18 -05:00
Daniel Lemire e695a19d11
Trying to fix issue 465 (#466)
* Trying to fix issue 465

* Actually testing

* Refreshing amal.

* Removing spurious ;
2020-01-27 11:25:23 -05:00
Daniel Lemire ba14232628
Fixing mem leak. (#461) 2020-01-27 09:31:12 -05:00
Daniel Lemire 3488c49d0a
Basically, haswell processor should be able to count on lzcnt. (#458) 2020-01-22 16:52:55 -05:00
Daniel Lemire ce8fe1bdf6
This isolates a fix found in the large PR https://github.com/lemire/simdjson/pull/445 (#457) 2020-01-22 12:58:59 -05:00
Daniel Lemire fa04595d90 Correcting typo. 2020-01-22 11:08:53 -05:00
Daniel Lemire 80b4dd2e8a
Removing all stdout, stderr from main library. (#455)
* Removing all stdout,stderr from main library.
2020-01-20 16:03:15 -05:00
Daniel Lemire a4025788ae
Commenting out one attribute when SIMDJSON_THREADS_ENABLED is off. (#453) 2020-01-20 11:18:29 -05:00
Daniel Lemire 27861f6358 SIMDJSON_PADDING is now an absolute constant. This is temporary since
padding should go away once  https://github.com/lemire/simdjson/issues/174
is resolved.
2020-01-15 15:49:50 -05:00
Daniel Lemire 1498b78342 Minor simplifications. 2020-01-10 14:07:57 -05:00
dbj 85e84fc1fa improved string padded (#440)
* dirent portable latest version

* improved

std::string argument passed by const reference
ctor added with std::string_view  argument
`allocate_padded_buffer()`  moved here with **optional** check on `length < 1`

* allocate_padded_buffer moved to padded_string.h
2020-01-10 10:15:48 -05:00
UKABUER 773883c486 Fix #420 (#421) 2020-01-09 09:56:43 -05:00
Daniel Lemire 951c4bedf8
Simpler jsonstream (#436)
* One simplification.

* Removing untested functions.
2020-01-07 19:10:02 -05:00
Daniel Lemire 0a874a5063 Some tuning 2020-01-06 11:41:07 -05:00
dbj 2caa6e3370 C++ language version detection (#418)
* added visual_studio folder where visual_studio cmake generated, local artefacts are

* C++ version detection
2020-01-06 11:38:09 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provide tests for it), as well as #401
2020-01-03 22:22:47 -05:00
John Keiser 165e23773f Refactor stage 2 into structural_parser class 2020-01-02 13:12:22 -07:00
Paul Dreik 399d08c86c use unique_ptr in class parsedjson (#417)
* refactor parsedjson to use unique_ptr instead of owning raw pointer
* fix a potential undefined behavior
* output only first cpu in /proc/cpuinfo
2019-12-31 14:31:45 -05:00
dbj 9c3828fefe STRINGIFY implemented (#402)
* STRINGIFY implemented

* SIMDJSON_THREADS_ENABLED def/undef
2019-12-20 07:57:00 -05:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00
Daniel Lemire 102262c7ab
Fixing issue386 (#396)
* Creating arch-specific bitmanipulation.h files.
* Improving system and compiler portability.
* We want to allow trailing_zeroes on zero inputs.
2019-12-16 19:09:18 -05:00
mswilson d33208c7db Correct detection of NEON support (#392)
... as the test as it is currently implemented will always evaluate to true.

Fixes #389
2019-12-10 13:12:17 -05:00
Daniel Lemire 7c560fa137 Cleaning documentation. 2019-11-26 14:13:17 -05:00
Jeremie Piotte f163155929 JsonStream documentation (#381)
* adding Multiline JSON competition chart to doc
* Completing the comments for JsonStream
* Adding a page for JsonStream's documentation.
2019-11-25 18:11:55 -05:00
Jeremie Piotte 29fc51522a
Introducing concurrency mode in JsonStream. (#373)
* JsonStream threaded prototype

* JsonStream Threaded version working. Still supporting non-threaded version.

* Fix where invalid files would enter infinite loop.

* SingleHeader update

* I will remove -pthread in cmake for now.

* Attempt at resolving the -pthread issue
2019-11-21 11:22:06 -05:00
Daniel Lemire 58d249ca16
Introducing move assignments. (#363) 2019-11-09 10:34:32 -05:00
Jeremie Piotte bdc2b07339
Streams of JSON documents + Large files (>4GB) (#350) (#364)
* rough prototype working.  Needs more test and fine tuning.

* prototype working on large files.

* prototype working on large files.

* Adding benchmarks

* jsonstream API adjustment

* type

* minor fixes and cleaning.

* minor fixes and cleaning.

* removing warnings

* removing some copies

* runtime dispatch error fix

* makefile linking src/jsonstream.cpp

* fixing arm stage 1 headers

* fixing stage 2 headers

* fixing stage 1 arm header

* making jsonstream portable

* cleaning imports

* including <algorithm> for windows compiler

* cleaning benchmark imports

* adding jsonstream to amalgamation

* merged main into branch

* bug fix where JsonStream would bug on rare cases.

* Adding a JsonStream Demo to Amalgamation

* Fix for https://github.com/lemire/simdjson/issues/345

* Follow up test and fix for https://github.com/lemire/simdjson/issues/345 (#347)

* Final (?) fix for https://github.com/lemire/simdjson/issues/345

* Verbose basictest

* Being more forgiving of powers of ten.

* Let us zero the tail end.

* add basic fuzzers (#348)

* add basic fuzzing using libFuzzer

* let cmake respect cflags, otherwise the fuzzer flags go unnoticed

also, integrates badly with oss-fuzz

* add new fuzzer for minification, simplify the old one

* add fuzzer for the dump example

* clang format

* adding Paul Dreik

* Fixing issue 351 (#352)

* Fixing issues 351 and 353

* Fix ARM compile errors on g++ 7.4 (#354)

* Fix ARM compilation errors

* Update singleheader

* fix integer overflow in subnormal_power10 (#355)

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* Adding new test file, following https://github.com/lemire/simdjson/pull/355

* merging main

* make file fix
2019-11-08 17:39:45 -05:00
Paul Dreik 8ae818e17c add ossfuzz support (#362)
* initial oss-fuzz friendly build

parts taken from libfmt, which I wrote and have the copyright to

* fix build error

* add script for building a corpus zip

see https://google.github.io/oss-fuzz/getting-started/new-project-guide/#seed-corpus

* fix zip command

* drop setting the C++ standard

* disable the minify fuzzer, does not pass oss-fuzz check-build test

* fix integer overflow in subnormal_power10

detected by oss-fuzz

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18714

* invoke the build like oss fuzz does

* document what the scripts are for and how to use them

* add a page about fuzzing
2019-11-08 10:32:43 -05:00
Daniel Lemire 3439ce19c9
Adding a flag which allows us to disable AVX detection. This exposes a bug. (#356) 2019-11-06 10:39:26 -05:00
Daniel Lemire a065805b0f Fix for https://github.com/lemire/simdjson/issues/345 2019-10-24 15:34:30 -04:00
John Keiser 64872bddf4 Eliminate stage1_find_marks_flatten.h 2019-10-14 12:33:46 -07:00
John Keiser 9bbd6bd874 Move headers to implementation area
- jsoncharutils.h, numberparsing.h, simdprune_tables.h
2019-10-14 11:51:41 -07:00
Daniel Lemire 81f9aac13f Fixing minor perf. regression. 2019-10-07 16:31:44 -04:00
Juho Lauri b2eff3c90c case insensitive move_to_key (#324)
* case insensitive move_to_key
* portable strcmpi
2019-10-07 16:08:17 -04:00
Juho Lauri cf9dbe583d improved const correctness (#321) 2019-10-02 14:25:28 -04:00
John Keiser de8df0a05f Combined performance patch (5% overall, 15% stage 1) (#317)
* Allow -f

* Support parse -s (force sse)

* Simplify flatten_bits

- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead

* De-unroll the flatten_bits loops

* Decrease dependencies in stage 1

- Do all finalize_structurals work before computing the quote mask; mask
  out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
  single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
  dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
  achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
  invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
  or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
  descriptiveness and possible lifting into a generic simd parsing library

* Mark branches as likely/unlikely

* Reorder and unroll+interleave stage 1 loop

* Nest the cnt > 16 branch inside cnt > 8
2019-10-01 12:01:08 -04:00
Daniel Lemire 5765c81f66 Fixing number parsing of large ints 2019-09-02 12:40:39 -04:00
Daniel Lemire 92334a8e28 Better tests. 2019-09-02 12:32:44 -04:00
Daniel Lemire c4218c8e40
Accept large unsigned integers (#295) (#306)
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
2019-09-02 11:56:26 -04:00
saka1 c1f27fb848 Accept large unsigned integers (#295)
* handle uint64 value in JSON
* Add integer_tests
* Add get_unsigned_integer() on ParsedJson::BasicIterator
* Write 'u' to tape when the value seems unsigned
* Add to handle 'u' element
* Brush up integer_tests.cpp
* Append tests/integer_tests in .gitignore
* Add comments to is_integer and is_unsigned_integer
2019-09-02 10:50:24 -04:00
Daniel Lemire f667d4965d
This is a bug fix: our prev function was buggy. (#291) 2019-08-23 18:59:43 -04:00
John Keiser 585f84a734 Move architecture-specific headers to src/ (#287)
* Use namespaces instead of templates for stage1 impls

* Move stage1 implementation into the src/ directory

* Move architecture-specific code to src/
2019-08-21 07:59:49 -04:00
Daniel Lemire a1bff85263 Documenting the limits of move_to_key with respect to Unicode Equivalence. 2019-08-20 17:10:30 -04:00
John Keiser 94673bcdf2 Use methods for utf8 checker 2019-08-16 14:15:37 -07:00
John Keiser aa15917c9d Use methods instead of functions for simd_input 2019-08-16 14:07:30 -07:00
Vitaly Baranov 6a2728e730 No allocation in the iterator's constructor (#276)
* Get rid of dynamic allocation in ParsedJson::Iterator.

* Implement copy assignment operator for ParsedJson::Iterator.

* ParsedJson::Iterator is now a template class.
2019-08-15 19:42:15 -04:00
John Keiser 0042d9b406 Move UTF8 checking functions into their own file 2019-08-14 10:34:11 -07:00
John Keiser 237b8865f5 Correct header #define 2019-08-13 17:44:26 -07:00
John Keiser 8f01cece3a Move simd_input and associated functions to their own header 2019-08-13 17:44:06 -07:00
Daniel Lemire 2ca574d9e6
Removing windows.h (#273) 2019-08-12 19:40:21 -04:00
Daniel Lemire 3fb82502f7
This gets rid of the silly ALLOW_SAME_PAGE_BUFFER_OVERRUN (#268) 2019-08-09 17:36:32 -04:00
Vitaly Baranov 9dfab9d9a4 Disable UBSan error in trailing_zeroes(). (#266)
https://github.com/lemire/simdjson/issues/265
2019-08-09 14:37:22 -04:00
John Keiser f3c3afd4cd Use direct call to templated flatten_bits instead of if (#262)
* Use direct call to templated flatten_bits instead of if

* Put really_inline back on find_structural_bits_64
2019-08-08 15:09:17 -04:00
John Keiser b1beacd1f3 Make headers show up in Header Files in VS2019 (#257) 2019-08-05 16:36:52 -04:00
John Keiser d9a0e2b8f4 Fix Intellisense errors opening .h files on VS2019 (#253) 2019-08-04 19:57:55 -04:00
ioioioio 2a24567370
Replace macros by include files (#236) (#248)
* stage1 compiles without macros

* cleaning

* amalgamation is weird but works

* macros are removed from stringparsing

* amalgamation fixed

* Huge macros are removed.

* clang-format
2019-08-04 15:58:35 -04:00
Daniel Lemire bd9628df93 Producing a new release 2019-08-04 15:43:47 -04:00
Daniel Lemire 99a153d9e8
Hiding the pointer away... (#252)
* Hiding the runtime dispatch pointer in a source file so it is not an exported symbol
* Disabling hard failure on style check.
* Fixes https://github.com/lemire/simdjson/issues/250
2019-08-04 15:41:00 -04:00
Daniel Lemire 2a240e3fe2 Fixing style violation. 2019-08-01 16:38:51 -04:00
Daniel Lemire ee66fb1c60 Version 0.2.0. 2019-08-01 16:23:30 -04:00
Daniel Lemire 038b18edf1
Adding style scripts. (#243)
* Adding style scripts.
2019-08-01 16:09:26 -04:00
ioioioio 968117c940 preventing clang-format to move sysinfoapi.h (#244) 2019-08-01 15:06:50 -04:00
Daniel Lemire 6788b12d65 It is not beneficial to try to get clever with trailing zeroes. (Led to major performance
regression under haswell+ for stage 1).
2019-08-01 14:44:04 -04:00
Daniel Lemire 66ffc1b2d6 Adding a remark. 2019-08-01 11:33:51 -04:00
John Keiser bf59ba76f5 Fix most warnings on VS2019 (#241) 2019-07-31 17:43:45 -04:00
Daniel Lemire 76da659977 Fixing amalgamate under ARM 2019-07-30 22:10:48 +00:00
ioioioio c2eea8abba Style uniformization (#238)
* massive clang-format -style=LLVM

* naming harmonization

* adding commentary about sysinfoapi.h
2019-07-30 17:18:10 -04:00
ioioioio 5f20d3eb34 Merging No duplicate tail (PR#223) (#232)
* Use __forceinline on Windows for really_inline

https://docs.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=vs-2019#inline-__inline-and-__forceinline

* Don't duplicate find_structural_bits for final chunk

* writing coherent macro definitions
2019-07-29 14:11:42 -04:00
Daniel Lemire 3c0f5a3fe4 Improving the documentation. 2019-07-29 14:10:49 -04:00
Daniel Lemire 771e9cd68a
Trying again... (#235) 2019-07-29 13:55:13 -04:00
Daniel Lemire c328afee57 This should fix master. 2019-07-29 13:44:25 -04:00
Daniel Lemire 3dae86223d Changing intrinsic name. 2019-07-29 13:39:54 -04:00
Daniel Lemire 85e31a5479
This fixes the "big exp" bug, although we need to assess the performance and maybe do some tuning. (#233) 2019-07-29 13:28:16 -04:00
Daniel Lemire a53d95099c
Intrinsic-based flatten (#234)
* Providing a flatten function with intrinsics (for Visual Studio).
2019-07-29 13:28:02 -04:00
Daniel Lemire eba02dc1b9 Runtime dispatch
* Attempt 1 - fn targeting

GCC won't work with templates with different targets, need to specialize all the way up the call stack.

* Compiles properly with cmake. Does not with the Makefile.

* Compilation works with Makefile

* instruction_set changes to architecture

* some aesthetic changes

* fix amalgamation and tests + aesthetic changes

* This now compiles and passes tests under CLANG

* Minor correction.

* Trying to make it work on ARM

* Adding missing namespace

* Missing bracket

* Fixing minor compilation issues.

* Getting parse to use runtime dispatch

* Fixing amalgamation script.

* Making sure that NEON is supported.

* Fixing typo

* Merging https://github.com/lemire/simdjson/pull/229

* Manual merge of
https://github.com/lemire/simdjson/pull/229
by @jkeiser  (second part)

* Trying another way.

* Removing the paral.

* Fixing the make file

* Let us make the practice run long enough.

* Resolved the awful slowness.

* Cleaning the README.md

* With runtime dispatching, we should not need flags anymore.

* Changing isa detection file's name + fixing typos.
2019-07-28 22:46:33 -04:00
ioioioio bcabdfc1ae Json pointer (#220)
* json pointer support

* Addition of tests for the json pointer

* Adding a new tool for the JSON Pointer support, and some documentation.
2019-07-26 18:38:10 -04:00
Daniel Lemire a3beac8d13
This simplifies back the number parsing code... The extra work introduced recently is seemingly unnecessary. (#218) 2019-07-18 11:50:26 -04:00
Daniel Lemire e926b4b3c9
More accurate number parsing (#217)
* This drastically improves the accuracy (down to a ULP of 1)

* More comments and documentation.
2019-07-15 22:17:49 -04:00
Daniel Lemire 6c168f046d
Optimizing stage1 (#216)
* Optimizing stage 1-- avx edition

* Optimizing sse.

* Saving 0.5% in instruction count (NEON).
2019-07-11 20:59:21 -04:00
Daniel Lemire 4b7e87ec7f
Removing garbage. (#213) 2019-07-09 21:51:16 -04:00
Daniel Lemire 98b387aac3
Fixing a messed up interleaved #ifdef/namespace. (#211) 2019-07-09 19:48:20 -04:00
Daniel Lemire be956654b2 Minor cleaning = annotating simdjson namespaces and making sure that we don't have headers all over. 2019-07-09 19:24:08 -04:00
Daniel Lemire 977f57fd37 We need to guard the simdutf8check files. 2019-07-09 16:53:28 -04:00
ioioioio 7369339c88 Neon utf8validation (#207)
* utf8 validation on neon works
2019-07-09 15:14:34 -04:00
Daniel Lemire 3f79385160
Removing some fprintf. (#209) 2019-07-09 13:04:44 -04:00
ioioioio b0d9c074e1 check_utf8_helper has a more meaningful name 2019-07-05 11:09:28 -04:00
Daniel Lemire fba27ef4b9 I missed a few. Building up VS support. 2019-07-04 17:45:45 -04:00
Daniel Lemire 19cdc09928 Improving support for VS 2019-07-04 17:36:26 -04:00
Daniel Lemire 2b2d93b05f Various minor tweaks. 2019-07-04 17:19:05 -04:00
ioioioio f7ea2629e4 Fixing warnings and Microsoft intrinsics. 2019-07-04 10:13:40 -04:00
ioioioio 861a6a17e4 SSE implementation integrated 2019-07-03 17:15:21 -04:00
ioioioio 0df6d83f08 deleting useless comments and namespace indications 2019-07-03 10:47:45 -04:00
ioioioio 036f9d5a45 Merge branch 'master' of https://github.com/lemire/simdjson into Multiple_implementation_refactoring_stage2 2019-07-03 10:34:58 -04:00
ioioioio 3f24879157 Stage2 refactored to simplify multiple implementations 2019-07-02 17:12:00 -04:00
ioioioio 9230588ce8 conflicts are solved 2019-07-02 15:21:00 -04:00
Daniel Lemire aa78b70d69 Introducing a "native" instruction set so that you do not need to do #ifdef to select the right SIMD set all the time.
Fixing indentation.
Removing some obsolete WARN_UNUSED.
Fixing a weird warning with optind variable.
2019-07-01 14:18:30 -04:00