195 Commits

Author SHA1 Message Date
Daniel Lemire 0655a135e6 Reverting. 2020-06-17 17:52:07 +00:00
Daniel Lemire 4474f8ef18 Cleaning a bit the examples. 2020-06-17 16:24:55 +00:00
Daniel Lemire 27a75a9085 Tweaking. 2020-06-15 17:54:34 -04:00
Daniel Lemire 954d6c326d New examples. 2020-06-15 17:45:15 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 89b059b1ea
Testing with GCC 10 and clang 10 (#926)
* Testing with GCC 10 and clang 10

* Fixing spurious space

* gcc10 does not need the cmake installation.

* We don't want to run the perf tests on ARM. I ignore them systematically. ARM performance should be assessed manually.

* Switching to GCC 10 and Clang 10

* Disabling some tests under sanitizers when they involve rapidjson or other parsers.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 17:58:53 -04:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, one thread runs stage 1 while the main thread runs stage 2. If stage 1 and stage 2 each took half the time, parse_many could run at twice the speed. That is unlikely in practice, but we still see benefits of about 40% from threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

* This fixes our parse_stream benchmark, which is currently broken.
* This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this lets us reach the same maximal speed with smaller processing blocks. It does not help much with larger blocks, because the cost of thread creation is already amortized efficiently.

This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
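
The worker idea described in the commit above can be sketched roughly as follows. This is not the simdjson implementation: `batch_worker`, `pipeline_sum` and the toy "stage 1" (doubling) and "stage 2" (summing) steps are hypothetical names chosen for illustration. It only shows one long-lived thread being reused for every batch while the main thread works on the previous batch.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: one long-lived thread that runs one task at a time.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}

  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Hand a task to the worker. Call wait() before submitting the next one.
  void run(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      task_ = std::move(task);
      has_work_ = true;
    }
    cv_.notify_all();
  }

  // Block until the most recently submitted task has finished.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_work_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return has_work_ || stopping_; });
      if (stopping_ && !has_work_) { return; }
      std::function<void()> task = std::move(task_);
      lock.unlock();
      task(); // run outside the lock so the caller can work in parallel
      lock.lock();
      has_work_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> task_;
  bool has_work_ = false;
  bool stopping_ = false;
  std::thread thread_; // declared last: the other members exist before it starts
};

// Toy pipeline: "stage 1" doubles each value of the next batch on the worker
// thread while the main thread runs "stage 2" (a sum) on the current batch.
inline long pipeline_sum(const std::vector<std::vector<int>> &batches) {
  if (batches.empty()) { return 0; }
  std::vector<int> current, next;
  auto stage1 = [](const std::vector<int> &in, std::vector<int> &out) {
    out.clear();
    for (int v : in) { out.push_back(2 * v); }
  };
  batch_worker worker; // a single thread, created once for all batches
  stage1(batches[0], current);
  long total = 0;
  for (std::size_t i = 0; i < batches.size(); i++) {
    const bool has_next = i + 1 < batches.size();
    if (has_next) {
      worker.run([&, i] { stage1(batches[i + 1], next); });
    }
    for (int v : current) { total += v; } // stage 2 on the main thread
    if (has_next) {
      worker.wait();
      current.swap(next);
    }
  }
  return total;
}
```

With this shape, the per-batch cost is a mutex handoff and a condition-variable wakeup rather than a thread creation, which is why smaller batches become viable.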
Daniel Lemire 45e2178ada Duh. 2020-06-11 17:20:28 +00:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser fe01da077e Make threaded version work again 2020-06-07 16:21:00 -07:00
John Keiser 3e226795f0 Run all passing json against parse_many. Empty documents pass, too. 2020-06-07 16:20:51 -07:00
John Keiser c4a0fe1606 Add tests for parse_many() errors 2020-06-07 16:20:46 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
Daniel Lemire 12150baa5e
Using just ASCII. (#899)
* Using just ASCII.

* Let us prune checkperf.

* Moving the description of lookup2 to the HACKING.md file.
2020-05-21 21:59:06 -04:00
Daniel Lemire d2c9ea8a9a
Detect bash instead of relying on MSVC detection. (#894) 2020-05-20 12:13:14 -04:00
John Keiser 5312fd30e5 Fix CRT_SECURE warnings in clang 2020-05-04 11:36:00 -07:00
John Keiser 1d06624d38 Unset /D_CRT_SECURE_NO_WARNINGS
- Also localize DISABLE_DEPRECATED_WARNING so that we catch other
  deprecations
2020-05-04 11:35:05 -07:00
Furkan Usta 064eb0b24f CMake: Make simdjson-internal-flags subsume simdjson-flags 2020-05-03 02:48:29 +03:00
Furkan Usta af968c5b44 Merge branch 'master' of github.com:simdjson/simdjson into cmake-flags 2020-05-03 02:12:23 +03:00
Furkan Usta 1e9488d4a6 Remove Microsoft comment regarding dirent in parsingchecks 2020-05-02 16:01:30 +03:00
Furkan Usta ff1d77ead9 Add NOMINMAX to parsingchecks 2020-05-02 15:33:53 +03:00
Furkan Usta 977e1a94b2 Use dirent_portable.h only in MSVC 2020-05-02 15:16:50 +03:00
Furkan Usta 60ee5fc844 Enable numberparsingcheck and stringparsingcheck on MSVC 2020-05-02 15:12:30 +03:00
Furkan Usta 293c104cc4 CMake: Separate public and private compilation flags
simdjson-internal-flags for macros and warnings
simdjson-flags for pthread, sanitizer, and libcpp
2020-05-02 04:08:47 +03:00
Daniel Lemire fa4ce6a8bc
There is confusion between gigabytes and gibibytes. Let us standardize throughout. (#838)
* There is confusion between gigabytes and gibibytes.

* Trying to be consistent.
2020-05-01 12:16:18 -04:00
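
For reference, the two units in the commit above differ by about 7%: a gigabyte (GB) is 10^9 bytes and a gibibyte (GiB) is 2^30 bytes. The helper names below are hypothetical and only illustrate how a throughput figure changes with the unit chosen.

```cpp
#include <cstdint>

constexpr double bytes_per_gigabyte = 1000.0 * 1000.0 * 1000.0; // GB, 10^9 bytes
constexpr double bytes_per_gibibyte = 1024.0 * 1024.0 * 1024.0; // GiB, 2^30 bytes

// Throughput in GB/s for `bytes` processed in `seconds`.
inline double gigabytes_per_second(std::uint64_t bytes, double seconds) {
  return static_cast<double>(bytes) / bytes_per_gigabyte / seconds;
}

// Throughput in GiB/s for the same measurement.
inline double gibibytes_per_second(std::uint64_t bytes, double seconds) {
  return static_cast<double>(bytes) / bytes_per_gibibyte / seconds;
}
```

The same measurement therefore reads about 7% lower in GiB/s than in GB/s, so benchmark reports need to state which unit they use.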
John Keiser 0e6ea76e88
Make checkperf work on Windows (#799)
* Make command line arguments work for Windows

* Run checkperf on Windows
2020-04-27 14:20:05 -04:00
Daniel Lemire f397b6fedf
Another example. (#790)
* Another example.

* Adding a reference to error chaining.
2020-04-23 21:48:41 -04:00
Daniel Lemire 4f72d5cfac
This adds another example (#785) 2020-04-23 18:29:28 -04:00
Daniel Lemire e030f02776 Merge branch 'master' into jkeiser/wconversion 2020-04-22 22:03:34 -04:00
Daniel Lemire f0ac55ec0c
testing on freebsd (#768)
* Adding cirrus tests
* Adding cirrus badge.
2020-04-22 21:22:09 -04:00
John Keiser d4a37f6ef5 Enable conversion warnings on Linux and Windows 2020-04-22 14:21:30 -07:00
John Keiser d3e44b1108 Add amalgamation support to cmake 2020-04-20 19:50:51 -07:00
John Keiser 53d28a713c Fix cmake error when SIMDJSON_COMPETITION=OFF 2020-04-20 10:49:40 -07:00
John Keiser e5e6a46c37 Consolidate multi-implementation tests
Uses SIMDJSON_FORCE_IMPLEMENTATION to switch the implementation at test
time.
2020-04-19 09:59:49 -07:00
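
A rough sketch of how a switch like SIMDJSON_FORCE_IMPLEMENTATION could select an implementation at test time, assuming it is read as an environment variable. The implementation list and the function names are hypothetical, and the fallback-to-default behavior for unknown names is an assumption of this sketch, not a statement about what simdjson does.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiled-in implementations, best first.
inline const std::vector<std::string> &available_implementations() {
  static const std::vector<std::string> names = {"haswell", "westmere", "fallback"};
  return names;
}

// Returns `forced` if it names a known implementation; otherwise returns the
// default (first) entry. `forced` may be null when the variable is unset.
inline std::string select_implementation(const char *forced) {
  const std::vector<std::string> &names = available_implementations();
  if (forced != nullptr) {
    for (const std::string &name : names) {
      if (name == forced) { return name; }
    }
  }
  return names.front();
}

// Reads the override from the environment (assumed mechanism).
inline std::string select_implementation_from_env() {
  return select_implementation(std::getenv("SIMDJSON_FORCE_IMPLEMENTATION"));
}
```

A single test binary can then be run once per implementation by setting the variable before each run, instead of building one test executable per implementation.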
John Keiser 22b9a53bef Add SIMDJSON_FORCE_IMPLEMENTATION 2020-04-18 18:21:56 -07:00
John Keiser ff09b6c824 Run fewer redundant steps and configs in CI 2020-04-17 12:23:05 -07:00
John Keiser 289cc3e7a0 Treat warnings as errors during compilation 2020-04-15 19:59:38 -07:00
John Keiser fd418f568c Fix c++11 warnings on clang
- namespace x::y is C++17
- static_assert requires message in C++11
2020-04-15 17:27:48 -07:00
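
The two C++11 issues named in the commit above look like this in isolation. The `demo::detail` namespace and `answer()` function are hypothetical and exist only for illustration.

```cpp
// C++17 only:  namespace demo::detail { ... }
// C++11 form:  nest the namespaces explicitly.
namespace demo {
namespace detail {
constexpr int answer() { return 42; }
} // namespace detail
} // namespace demo

// C++17 allows static_assert(condition); C++11 requires the message argument.
static_assert(demo::detail::answer() == 42, "answer() must return 42");
```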
John Keiser 09cf18a646 Add C++11 tests to cmake
- Add simdjson-flags target so callers don't have flags forced on them
2020-04-15 17:26:25 -07:00
Daniel Lemire 6d7c77ddc1
Let us try to check with the exceptions disabled. (#707)
* Tweaking code so that we can run all tests with exceptions off.
* Removing SIMDJSON_DISABLE_EXCEPTIONS
2020-04-15 16:45:36 -04:00
Daniel Lemire efd706528b Minor tweaks to the CMake. 2020-04-15 10:19:05 -04:00
Daniel Lemire b523c43927
Can we provide a size() function to arrays and objects? (eager approach) [TO BE MERGED] (#690)
* This is an implementation of "size()" for arrays and objects.
* Adding benchmark
* Adding a size() remark in the documentation.
* Extending size() to result types.
2020-04-15 10:15:48 -04:00
Paul Dreik 75545ff70d
ref qualify parser methods to avoid use of dangling objects (#703)
To avoid using data belonging to a temporary, the parse functions are ref qualified to get a compile error if used on an rvalue. See https://github.com/simdjson/simdjson/issues/696

Compilation tests are also added, to make sure bad usage fails to compile.

Reviewed by jkeiser.
2020-04-15 09:57:52 +02:00
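
The ref-qualification technique from the commit above can be illustrated with a toy class. `toy_parser` and `can_parse` are hypothetical names, not simdjson types. The sketch shows a method that is usable on an lvalue but deleted on a temporary, plus a compile-time check that the bad usage is rejected.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical parser that returns a reference into its own storage.
class toy_parser {
public:
  // Allowed on lvalues: the parser outlives the returned reference.
  const std::string &parse(const std::string &text) & {
    buffer_ = text;
    return buffer_;
  }
  // Deleted on rvalues: toy_parser().parse(...) would return a dangling reference.
  const std::string &parse(const std::string &text) && = delete;

private:
  std::string buffer_;
};

// Detection helper: true if an object of type T can call parse(std::string).
template <typename T, typename = void>
struct can_parse : std::false_type {};

template <typename T>
struct can_parse<T, decltype(void(std::declval<T>().parse(
                        std::declval<const std::string &>())))>
    : std::true_type {};

static_assert(can_parse<toy_parser &>::value, "lvalue parsers can parse");
static_assert(!can_parse<toy_parser>::value, "temporary parsers cannot parse");
```

Writing `toy_parser().parse("[1,2]")` directly fails to compile because overload resolution selects the deleted overload, which is the kind of error the compilation tests mentioned in the commit are meant to confirm.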
Daniel Lemire 3c6ef83046
Trying to correct the documentation so that it actually describes how the code behaves. (Attempt two) (#712)
* Trying to correct the documentation so that it actually describes how the code behaves.

* tweaking the wording.

* Improving.

* Removing confusing sentence.

* Fixing formatting.

* Now with working example, tested.

* Added a smaller piece of code
2020-04-14 22:31:21 -04:00
John Keiser b9ac0a79f1
Merge pull request #715 from simdjson/jkeiser/thorough-type-tests
Test more variants of cast, get, etc.
2020-04-14 16:08:36 -07:00
Daniel Lemire 8539896f3d
It is inconvenient to be unable to print a padded_string. (#713)
* It is inconvenient to be unable to print a padded_string.

* Allows us to print the padded_string even when it is embedded in result object when exceptions are enabled.
2020-04-14 19:07:32 -04:00
John Keiser a3b508ceff Test get<>(), exception vs. no exception, explicit vs. implicit cast 2020-04-14 13:18:42 -07:00
John Keiser 1ff22c78b3 Add quickstart to cmake 2020-04-09 14:56:54 -07:00