365 Commits

Author SHA1 Message Date
John Keiser aa1eabbb56 Add benchmark that stops early 2020-12-06 15:23:51 -08:00
Paul Dreik f62ca21dd1
enable boost json (#1292)
* bump boost.json and see if it works in simdjson CI

* enable boost json

* clean up

* add boost json to deps

* use boost if std::string_view is available

* add build with c++20

* use docker image which has the proper libc++ installed
2020-11-10 13:55:04 -05:00
friendlyanon c805fc28a4
Remove git modules (#1258)
* Bump minimum CMake version

* Remove unnecessary git checks

* Move benchmark options where they are used

* Declare helper functions for dependencies

The custom solution here is tailored for fast configure times, but only
works for dependencies on Github.

* Import dependencies using the declared commands

* Remove git submodules

* Call target_link_libraries properly

target_link_libraries must not be called without a requirement
specifier.

* Fix includes for competition

Co-authored-by: friendlyanon <friendlyanon@users.noreply.github.com>
2020-11-04 13:34:29 -05:00
Daniel Lemire 218c274090
Updating main branch for legacy libc++ support (#1288)
* Updating main branch for legacy libc++ support

* Adopting

* Removing unnecessary math header.

* Updating the single-header files so we can pass the new tests.

* Portable infinite-value detection is hard.

* Working toward disabling boost json selectively.

* Selectively disabling Boost JSON

* More work toward selectively disabling boost json.
2020-11-04 12:24:42 -05:00
Paul Dreik af4db55e66
remove trailing whitespace (#1284) 2020-11-03 21:48:09 +01:00
Paul Dreik f93fb21c95
optionally disable deprecated apis (#1271)
Introduce the CMake option SIMDJSON_DISABLE_DEPRECATED_API (default Off),
which turns off deprecated simdjson API functions by setting the macro
SIMDJSON_DISABLE_DEPRECATED_API.

Non-CMake users will have to set SIMDJSON_DISABLE_DEPRECATED_API
by some other means to disable the deprecated API.

Closes #1264
2020-11-01 06:38:52 +01:00
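
The macro-based switch described in #1271 can be sketched as follows. This is a minimal illustration, not simdjson's actual header: the namespace mylib and the functions parse and build_parsed_json are hypothetical names chosen for the example. Only the macro name SIMDJSON_DISABLE_DEPRECATED_API comes from the commit message.

```cpp
#include <string>

// Hypothetical library fragment; all names except the macro are illustrative.
namespace mylib {

// Current entry point, always available.
inline std::string parse(const std::string& json) { return "parsed:" + json; }

#ifndef SIMDJSON_DISABLE_DEPRECATED_API
// True when the deprecated functions are compiled in.
constexpr bool deprecated_api_enabled = true;

// Old entry point, kept for compatibility unless the macro is defined.
[[deprecated("use mylib::parse instead")]]
inline std::string build_parsed_json(const std::string& json) {
  return parse(json);
}
#else
// The macro is defined, so the deprecated declarations above vanish and any
// caller still using them fails to compile.
constexpr bool deprecated_api_enabled = false;
#endif

} // namespace mylib
```

Without CMake, the macro would typically be passed on the compiler command line (for example with -DSIMDJSON_DISABLE_DEPRECATED_API) so that every translation unit sees the same setting.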
Danila Kutenin f46a0f64f2
PPC64 support (#1254)
* Initial PPC64 support

* Add travis CI

* Fix outdated cmake version for travis

* Fix indentation

* Try another workaround for outdated cmake in travis

* Try beta cmake

* Add dash before beta

* Use builtin snaps

* Use cmake as rocksdb

* Test cmake on bionic

* Remove unnecessary things from travis

* Remove unnecessary things from travis

* Another try of compiler install

* Add all major compilers

* Add all major compilers

* Add all major compilers

* Tweak travis a bit

* Typo

* More robust travis

* Typos typos typos

* Add fewer compilers, add non specific build for clang and gcc, should be the final config

* CMAKE_FLAGS is in incorrect place

* Remove default implementation

* Limit build thread number

* Fall back prefix_xor to a usual implementation, no performance boost is noticed

* Test for power9 as it is the main architecture for OpenPOWER right now

* Add a note to the documentation to build with power9, as the implementation is compatible but compiler optimizations are not

* Replace ARM with PPC in the comment
2020-10-27 18:43:39 -04:00
Daniel Lemire 14039d05a9
Adding a new benchmark for ondemand: distinct user id (#1239)
* Adding a distinct user id benchmark

* reenabling everything

* Removing an unnecessary "value()".

* Better tests of the examples and some fixes.

* Guarding exception code.
2020-10-23 08:47:01 -04:00
Daniel Lemire c592da4937
Adds yyjson to our internal benchmarks. (#1244) 2020-10-21 16:23:20 -04:00
Daniel Lemire 3e8e797bc2 Typo. 2020-10-19 17:30:52 -04:00
Paul Dreik 7bf391c54a
fix potential use of uninitialized value warning, avoid casting away const
This fixes a "potential use of uninitialized value" warning, as well as a C-style cast to non-const.
2020-10-16 22:14:42 +02:00
Daniel Lemire 07a6e098c8
This would allow users to find out what builtin is. (#1227)
* This would allow users to find out what builtin is.

* Trying another approach.

* Added instructions.

* Cleaning up the printout.

* Let us be less invasive.

* Adding a comment.
2020-10-15 21:58:42 -04:00
Daniel Lemire e4897d6b54
We have hardcoded 32 (#1236) 2020-10-15 21:57:10 -04:00
Daniel Lemire bb2bc98a22
Fix issue https://github.com/simdjson/simdjson/issues/1127 (#1224) 2020-10-13 09:18:54 -04:00
Paul Dreik 1d9926698e
update how boost.json is invoked, fix missing separators (#1203)
* initial try at adding boost json to the benchmark

* clean up

* qualify memcpy etc. with std::

* clang format

* extra space

* update benchmark with help from Vinnie Falco from Boost.json

* add missing separators
2020-10-09 18:22:37 -04:00
John Keiser 5b926b8196 Support array iteration over document 2020-10-06 11:29:45 -07:00
John Keiser cae91983ec Fix issue with early destruction 2020-10-06 11:29:45 -07:00
John Keiser 3190ef0c1f Check benchmark results in release builds 2020-10-06 11:29:45 -07:00
John Keiser c7c1372833 Allow reuse of value to try multiple types 2020-10-06 11:29:45 -07:00
John Keiser 6d978c383a Kinder, gentler implementation selection
- Allow user to specify SIMDJSON_BUILTIN_IMPLEMENTATION
- Make cmake -DSIMDJSON_IMPLEMENTATION=haswell *only* specify haswell
- Move negative implementation selection to
-DSIMDJSON_EXCLUDE_IMPLEMENTATION
- Automatically select SIMDJSON_BUILTIN_IMPLEMENTATION if
SIMDJSON_IMPLEMENTATION is set
- Move implementation enablement mostly to implementation files
- Make implementation enablement and selection simpler and more robust
- Fix bug where programs linked against simdjson were not passed
SIMDJSON_XXX_IMPLEMENTATION or SIMDJSON_EXCEPTIONS
2020-10-06 11:29:45 -07:00
John Keiser b70e85fd10 Only include source in bench_sax 2020-10-06 11:29:45 -07:00
John Keiser 30fe86ed32 Use simdjson::builtin instead of haswell/begin+end 2020-10-04 12:47:30 -07:00
John Keiser b4df0e7c9e Fix domnoexcept to actually be noexcept 2020-10-04 12:47:30 -07:00
John Keiser 6b219e3e25 Use ::stage2 where it's needed 2020-10-04 12:47:30 -07:00
John Keiser baf6607e74 Make ondemand build without #include "simdjson.cpp" 2020-10-04 12:47:30 -07:00
John Keiser a700848bae Move ondemand implementation to include/ 2020-10-04 12:47:30 -07:00
John Keiser b234d74f43 Remove unnamed namespace from ondemand 2020-10-04 12:47:30 -07:00
John Keiser 49faf7af1a Make simdjson_result implementation-specific 2020-10-04 12:47:30 -07:00
John Keiser 985b52331a Require object to be exact and in order 2020-10-04 12:47:30 -07:00
John Keiser 021dded9dd Add Kostya benchmarks 2020-10-04 12:47:30 -07:00
John Keiser 8fd0cdc732 Iterate value without going through indirection
Avoids issues with value being released early
2020-10-04 12:47:30 -07:00
John Keiser fe7a4d42d3 Fix top level values 2020-10-04 12:47:30 -07:00
John Keiser e89d6353af Add a "sum" benchmark with no appending to vector 2020-10-04 12:47:30 -07:00
John Keiser c5bb74d184 Pave the way for non-record-based benchmarks 2020-10-04 12:47:30 -07:00
Daniel Lemire 874349c928 Making the code cleaner. 2020-10-04 12:47:30 -07:00
Daniel Lemire 157604b3a5 I think that this is better (fairer) code. 2020-10-04 12:47:30 -07:00
John Keiser b935544d65 Make benchmark output easier to follow 2020-10-04 12:47:30 -07:00
Daniel Lemire 03271df579 This adds a frequency column (useful because if the frequency tanks, then other numbers are suspect). 2020-10-04 12:47:30 -07:00
John Keiser 0633d3a07d Make branch miss numbers integers 2020-10-04 12:47:30 -07:00
John Keiser 045377a594 Fix errors with g++ 2020-10-04 12:47:30 -07:00
John Keiser b5c8030f19 Fix LargeRandom<OnDemand> 2020-10-04 12:47:30 -07:00
John Keiser f75e856d2b Compare records to ensure benchmarks work 2020-10-04 12:47:30 -07:00
John Keiser 44d689bc6e Make instructions / cycle counters more useful 2020-10-04 12:47:30 -07:00
John Keiser 9e433c2f19 Move benchmarks into their own directories 2020-10-04 12:47:30 -07:00
John Keiser 4d89076bdc Check for EOF when skipping containers
Revert that?

Or not
2020-10-04 12:47:30 -07:00
John Keiser 283ac3191f Rename parse->iterate, add iterate_raw 2020-10-04 12:47:29 -07:00
John Keiser 4dd0c80dad Move current_string_buf_loc to json_iterator 2020-10-04 12:47:29 -07:00
John Keiser 3b53c6ca47 Use json_iterator as shared state instead of document 2020-10-04 12:47:29 -07:00
John Keiser a90b8fb449 Remove depth tracking from ondemand api 2020-10-04 12:47:29 -07:00
John Keiser 98be2c91df Fix SAX benchmarks to actually push to vector 2020-10-04 12:47:29 -07:00
John Keiser 2657e5e226 Fix points SAX to actually record points 2020-10-04 12:47:29 -07:00
John Keiser cfcb0d4fb7 Use json_iterator in array/object 2020-10-04 12:47:29 -07:00
John Keiser 4065529bdf Don't try to compile Haswell benchmarks on ARM 2020-10-04 12:47:29 -07:00
John Keiser 6be2db8c42 Fix SAX benchmark to actually add tweets 2020-10-04 12:47:29 -07:00
John Keiser 5cf68416d8 Don't bother comparing field names in parserandom 2020-10-04 12:47:29 -07:00
John Keiser ebcb3c6b3b On-demand parse implementation 2020-10-04 12:47:29 -07:00
Paul Dreik 04267e0f6b
add boost.json to benchmark (#1202)
Add boost.json to the benchmark.
It was accepted into Boost on 2020-10-03, see https://lists.boost.org/Archives/boost/2020/10/250129.php.

The upstream repo (expected to eventually be migrated to Boost) is: https://github.com/CPPAlliance/json
2020-10-04 10:00:09 +02:00
Daniel Lemire a540e6afc5
Testing on minimalist alpine (linux) images (#1200)
* Tweaking header includes to make it safer.

* Adding the actual tests.

* Fixing my syntax.
2020-10-02 13:32:09 -04:00
Daniel Lemire 9865bb6904
Make it possible to check that an implementation is supported at runtime (#1197)
* Make it possible to check that an implementation is supported at runtime.

* add CI fuzzing on arm 64 bit

This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* Make it possible to check that an implementation is supported at runtime.

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* We need to accommodate cxxopts

Co-authored-by: Paul Dreik <github@pauldreik.se>
2020-10-02 11:04:51 -04:00
Daniel Lemire 60c139a844
Faster and more correct serialization (#1168)
* Adding new files.

* Better.

* Fixing minifier and adding tests.

* Adding benchmarks.

* Including the array header.

* Replacing old stream-based code by the new code.

* Doubling up the itoa.

* Hidden away to_chars in internal namespace.

* Removing the repetitions.

* Documented the atoi functions.

* Tuning the escape sequences.

* Moving the operators off the main namespace.

* Added more tests.

* Tweaking the implementation so that it works with and without exp.

* The string_builder template and mini_formatter class
 are not part of our public API and are subject to change
 at any time!

* Adding a benchmark and some optimization.

* Cleaning.

* Strictly speaking, this header is needed.
2020-09-23 10:00:39 -04:00
Daniel Lemire f410213003
Improve documentation on padding
- Improves and clarifies the documentation on padding.
- Use std:: prefix for memcpy, strlen etc.

Related to issues #1175 and #1178
2020-09-23 09:07:14 +02:00
Daniel Lemire 4c11652808
This must be a typo (#1140) 2020-08-28 20:35:13 -04:00
John Keiser b2779c35df Fix issue with unsupported unreachable on Windows 2020-08-18 21:35:12 -07:00
John Keiser 18564f1ae2 Don't benchmark unless haswell is available 2020-08-18 21:25:03 -07:00
John Keiser 638f1deb62 Add DOM tweet reader for comparison 2020-08-18 21:25:03 -07:00
John Keiser 7e74d30f45 [WIP] tweet reader SAX benchmark 2020-08-18 21:25:03 -07:00
Daniel Lemire 8a8eea53a2
Prefixing macros (issue 1035) (#1124)
* Renaming partially done.

* More prefixing.

* I thought that this was fixed.

* Missed one.

* Missed a few.

* Missed another one.

* Minor fixes.
2020-08-18 18:25:36 -04:00
Daniel Lemire 501fed6c4f
This would disable bash scripts under FreeBSD. (#1118)
* This would disable bash scripts under FreeBSD.

* Let us also disable GIT.

* Let us try to just disable GIT

* Nope. We must have both bash and git disabled.
2020-08-17 11:50:57 -04:00
Daniel Lemire 2f92a34bb7
Turns out that passing dom::element by reference can be a performance killer. (#1086)
* Turns out that passing dom::element by reference can be a performance killer.

* Tweaking.
2020-08-01 10:31:47 -04:00
Daniel Lemire 84dc398d32 Adding a couple of tests. 2020-07-31 15:29:10 -04:00
Daniel Lemire f80668e87f
This removes the crazy alignment requirements. (#1073)
* This removes the crazy alignment requirements.
2020-07-27 16:19:01 -04:00
Daniel Lemire af18d5ed81
This adds a validation benchmark (#1040) 2020-07-20 18:56:39 -04:00
Daniel Lemire d0ce2f0b5a
Fixing clang under visual studio (#1028)
* Lots of fixes

* Removing some lambdas

* Removing some functional programming.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-07-06 18:58:19 -04:00
Daniel Lemire 29e744fdbb Adding warning message. 2020-06-24 19:23:02 -04:00
Daniel Lemire 515b87bcbe Disabling perfcheck for ninja 2020-06-24 18:45:47 -04:00
Daniel Lemire 5b4acf14ea Removing space. 2020-06-24 16:51:28 -04:00
Daniel Lemire 5fc6cb15b8 This should make things even more robust. If .git is not found, just disable all git work. 2020-06-24 16:12:19 -04:00
Daniel Lemire f6e9a8eee4 Making the cmake more verbose so we can figure out what is happening. 2020-06-24 15:44:22 -04:00
Daniel Lemire cb8a9ef2c0 This removes git as a dependency 2020-06-24 15:13:47 -04:00
John Keiser 1ff55c2729 Replace auto [x,error] with .get() everywhere 2020-06-21 16:26:59 -07:00
John Keiser 6fa5abcd7e Replace x.get<T>() with x.get(v) or T(x) 2020-06-21 14:36:38 -07:00
John Keiser 9899e5021d Allow use of document_stream with tie() 2020-06-20 21:15:05 -07:00
John Keiser a7fc7d4ffb Switch from get(v,e) to e = get(v) 2020-06-20 17:57:09 -07:00
John Keiser f336103f63 Convert tools/docs/benchmarks to bool get() idiom 2020-06-20 17:55:46 -07:00
John Keiser 56e2b38048 Add bool result from tie()/get(), get<T>(T&,error_code&) 2020-06-20 17:55:46 -07:00
John Keiser 7339f67dd7
Merge pull request #462 from simdjson/jkeiser/if-backslash
Wrap backslash processing in a branch
2020-06-17 07:07:58 -07:00
Daniel Lemire 7ea05d038e
New API traversal tests. (#931)
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-15 13:15:52 -04:00
Daniel Lemire 33930ff046 Adding link. 2020-06-15 13:07:53 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire d1a54249e7 New API traversal tests. 2020-06-12 17:42:57 -04:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, we have one thread doing stage 1 while the main thread does stage 2. So if stage 1 and stage 2 each take half the time, parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive, so that approach only works well over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

- This fixes our parse_stream benchmark, which is just busted.
- This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.

This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
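
The idea behind #920 (one reusable worker thread instead of one new thread per batch) can be sketched roughly as below. This is not simdjson's implementation: batch_worker and process_batches are hypothetical names, and the two "stages" are toy stand-ins (summing a batch on the worker, accumulating the result on the caller) for the real stage 1 and stage 2.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a single long-lived thread that runs one job at a time.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker&) = delete;
  batch_worker& operator=(const batch_worker&) = delete;

  // Hand a job to the worker; waits first if the previous job is still running.
  void run(std::function<void()> job) {
    assert(job); // an empty job would throw on the worker thread
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; });
    job_ = std::move(job);
    has_job_ = true;
    cv_.notify_all();
  }

  // Block until the most recently submitted job has finished.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return stop_ || has_job_; });
      if (!has_job_) return; // stop requested and nothing pending
      lock.unlock();
      job_();                // run the job outside the lock
      lock.lock();
      has_job_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last so the other members exist before it starts
};

// Toy pipeline: the worker pre-processes batch i while the caller consumes
// the result of batch i-1. Only one thread is created for all batches.
long process_batches(const std::vector<std::vector<int>>& batches) {
  batch_worker worker;
  long total = 0;
  long staged = 0; // result of the batch currently in flight on the worker
  for (std::size_t i = 0; i <= batches.size(); ++i) {
    long ready = 0;
    bool have_ready = false;
    if (i > 0) { // collect the result of the previous batch
      worker.wait();
      ready = staged;
      have_ready = true;
    }
    if (i < batches.size()) { // start toy "stage 1" for the next batch
      const std::vector<int>* batch = &batches[i];
      worker.run([&staged, batch] {
        staged = std::accumulate(batch->begin(), batch->end(), 0L);
      });
    }
    if (have_ready) total += ready; // toy "stage 2", overlapping with the worker
  }
  return total;
}
```

Because the thread is created once in the constructor and joined once in the destructor, the per-batch cost is a mutex handoff rather than a thread creation, which is why small batches are the case that should benefit most.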
Daniel Lemire 1b6258ec8c Added std::minify 2020-06-12 16:37:41 -04:00
John Keiser 7c6723d912 Print progress bar even if there is only one file 2020-06-12 10:01:19 -07:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser ae6dddfff4
Merge pull request #903 from simdjson/jkeiser/dom-parser-implementation
Move parser state to implementation-specific class
2020-06-04 13:09:57 -07:00
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
Daniel Lemire 2fe2dd170b
The "competition tests" are being made portable (#907)
* More portable competition

* This will enable SIMDJSON_COMPETITION everywhere by default.

* Minor fixes
2020-05-31 20:34:06 -04:00