Commit Graph

1986 Commits

Author SHA1 Message Date
John Keiser 4e3b4809ea [WIP] Nascent design doc for on demand 2020-10-04 12:47:29 -07:00
John Keiser a90b8fb449 Remove depth tracking from ondemand api 2020-10-04 12:47:29 -07:00
John Keiser 1da509027e Add root number/atom parsing functions 2020-10-04 12:47:29 -07:00
John Keiser 5b96e4761e Remove side-effecting assumption 2020-10-04 12:47:29 -07:00
John Keiser 311ea79238 Fix noexceptions builds 2020-10-04 12:47:29 -07:00
John Keiser 98be2c91df Fix SAX benchmarks to actually push to vector 2020-10-04 12:47:29 -07:00
John Keiser 2657e5e226 Fix points SAX to actually record points 2020-10-04 12:47:29 -07:00
John Keiser cfcb0d4fb7 Use json_iterator in array/object 2020-10-04 12:47:29 -07:00
John Keiser 97d03f3215 token_iterator -> json_iterator 2020-10-04 12:47:29 -07:00
John Keiser 4065529bdf Don't try to compile Haswell benchmarks on ARM 2020-10-04 12:47:29 -07:00
John Keiser 0a6260b1d8 Fix clang 6 compile issue 2020-10-04 12:47:29 -07:00
John Keiser 12caf2510e Mark unused variables 2020-10-04 12:47:29 -07:00
John Keiser a58d2f710d Fix C++11 error 2020-10-04 12:47:29 -07:00
John Keiser 6be2db8c42 Fix SAX benchmark to actually add tweets 2020-10-04 12:47:29 -07:00
John Keiser 5cf68416d8 Don't bother comparing field names in parserandom 2020-10-04 12:47:29 -07:00
John Keiser ebcb3c6b3b On-demand parse implementation 2020-10-04 12:47:29 -07:00
Paul Dreik 04267e0f6b
add boost.json to benchmark (#1202)
Add boost.json to the benchmark.
It was accepted into Boost on 2020-10-03, see https://lists.boost.org/Archives/boost/2020/10/250129.php.

The upstream repo (expected to eventually be migrated to Boost) is https://github.com/CPPAlliance/json
2020-10-04 10:00:09 +02:00
Daniel Lemire a540e6afc5
Testing on minimalist alpine (linux) images (#1200)
* Tweaking header includes to make it safer.

* Adding the actual tests.

* Fixing my syntax.
2020-10-02 13:32:09 -04:00
Daniel Lemire f1841e48b3
Minor fixes to some headers (tweak) (#1198) 2020-10-02 12:29:05 -04:00
Daniel Lemire 9865bb6904
Make it possible to check that an implementation is supported at runtime (#1197)
* Make it possible to check that an implementation is supported at runtime.

* add CI fuzzing on arm 64 bit

This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* We need to accommodate cxxopts.

Co-authored-by: Paul Dreik <github@pauldreik.se>
2020-10-02 11:04:51 -04:00
Paul Dreik e06ddea784
add CI fuzzing on arm 64 bit
This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188
2020-10-01 10:12:37 +02:00
Daniel Lemire 8b5a89c136
Parsing floats with 19 significant digits should be fine. (#1191)
* Parsing floats with 19 significant digits should be fine.

* Adding more tests with very long mantissa.
2020-09-29 19:42:43 -04:00
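The reasoning behind #1191 can be illustrated with a small sketch. A decimal significand of at most 19 digits is at most 10^19 - 1 = 9999999999999999999, which is below UINT64_MAX (18446744073709551615), so it can be accumulated in a uint64_t without overflow. Twenty digits can overflow. The helper `accumulate_digits` below is hypothetical and is not simdjson's number parser.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The largest 19-digit decimal value still fits in a uint64_t.
static_assert(9999999999999999999ULL < std::numeric_limits<uint64_t>::max(),
              "19 decimal digits fit in 64 bits");

// Hypothetical helper: accumulates the leading decimal digits of `digits`
// into `out`. Returns false (leaving `out` untouched) if more than 19
// digits are present, because only then could the accumulation overflow.
bool accumulate_digits(const char *digits, uint64_t &out) {
  assert(digits != nullptr);
  uint64_t value = 0;
  int count = 0;
  for (const char *p = digits; *p >= '0' && *p <= '9'; ++p) {
    if (++count > 19) { return false; }
    value = 10 * value + uint64_t(*p - '0');  // cannot overflow for <= 19 digits
  }
  out = value;
  return true;
}
```

Rejecting only at the 20th digit keeps every 19-digit significand on the fast integer path.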
Daniel Lemire da093c1982
Fixing "undefined behavior" issue in new fast_itoa functions (#1186)
* Fixing "undefined behavior" issue.

* Simplifying our custom atoi

* Fixing minor bug
2020-09-29 19:17:03 -04:00
Daniel Lemire 048fb6278a
This adds two tests to verify a new fuzzer issue. (So far I could not verify.) (#1194) 2020-09-29 11:45:41 -04:00
Paul Dreik f1b0778f79
add utf8 fuzzer
This enables the UTF-8 fuzzer, now that #1187 is fixed.
2020-09-27 21:11:13 +02:00
Daniel Lemire 0e584fa4a5
Attempt to fix issue 1187. (#1192) 2020-09-27 12:04:47 -04:00
Paul Dreik f44386008c
add minifier fuzzers (#1172)
This adds a minifier fuzzer. There is also a UTF-8 fuzzer, but it is disabled until #1187 is fixed.

Run all fuzzers but the UTF-8 one in the GitHub CI fuzz job.
2020-09-26 14:25:00 +02:00
Daniel Lemire 60c139a844
Faster and more correct serialization (#1168)
* Adding new files.

* Better.

* Fixing minifier and adding tests.

* Adding benchmarks.

* Including the array header.

* Replacing old stream-based code by the new code.

* Doubling up the itoa.

* Hidden away to_chars in internal namespace.

* Removing the repetitions.

* Documented the atoi functions.

* Tuning the escape sequences.

* Moving the operators off the main namespace.

* Added more tests.

* Tweaking the implementation so that it works with and without exp.

* The string_builder template and mini_formatter class
 are not part of our public API and are subject to change
 at any time!

* Adding a benchmark and some optimization.

* Cleaning.

* Strictly speaking, this header is needed.
2020-09-23 10:00:39 -04:00
Daniel Lemire f410213003
Improve documentation on padding
- Improves and clarifies the documentation on padding.
- Uses the std:: prefix for memcpy, strlen, etc.

Related to issues #1175 and #1178
2020-09-23 09:07:14 +02:00
Daniel Lemire 19cb5d57db
Some minor documentation fixes. (#1177) 2020-09-17 13:17:35 -04:00
Paul Dreik 30b912fc81
fuzz at_pointer
This adds a fuzzer for at_pointer() which recently had a bug.

The #1142 bug was found with this fuzzer.

Also, it polishes the github action job:

    cross pollinate the fuzzer corpora (lets fuzzers reuse results from other fuzzers)
    use github action syntax instead of bash checks
    only run on push if on master
2020-09-16 21:17:43 +02:00
Daniel Lemire 7fc07e2d5e Correcting typo 2020-09-16 11:11:49 -04:00
Daniel Lemire 72c83d9430
This avoids locale-dependent number parsing at the standard library level (#1157)
* This avoids locale-dependent number parsing at the standard library level.

* Adding missing cast.

* Inserting the missing "endif"

* Trial and error.

* Another attempt.

* Another tweak.

* Another fix.

* Restricting it even more.

* Tweaking our symbol checks.

* Somewhat smarter tests.

* Nice comments.

* Minor simplification.

* Adding cerr.
2020-09-15 11:36:18 -04:00
Daniel Lemire bfbac12f76
We were forgetting to check the end bytes at the end of the UTF8 validation. (#1173)
* We were forgetting to check the end bytes at the end of the UTF8 validation.

* Silencing the sanitizer

* Better explanation.
2020-09-15 11:33:09 -04:00
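The bug fixed in #1173 concerns input that ends partway through a multi-byte UTF-8 sequence. Such input is invalid even when every preceding byte is well-formed. Below is a minimal scalar sketch of that end-of-input check. simdjson's actual validator is SIMD-based, and the function name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar helper: returns true if the buffer ends partway
// through a multi-byte UTF-8 sequence (a lead byte not followed by enough
// continuation bytes). It does not validate the rest of the input.
bool ends_with_incomplete_sequence(const uint8_t *buf, size_t len) {
  assert(buf != nullptr || len == 0);
  // The final sequence's lead byte is at most 3 bytes from the end.
  for (size_t back = 1; back <= 3 && back <= len; ++back) {
    uint8_t b = buf[len - back];
    if ((b & 0xC0) == 0x80) { continue; }  // continuation byte: keep looking
    size_t needed = 1;                     // ASCII (or invalid lead, handled elsewhere)
    if ((b & 0xE0) == 0xC0) needed = 2;
    else if ((b & 0xF0) == 0xE0) needed = 3;
    else if ((b & 0xF8) == 0xF0) needed = 4;
    return back < needed;  // lead byte found, but too few bytes follow it
  }
  return false;
}
```

A full validator must also reject malformed sequences elsewhere in the input. This helper covers only the end-of-buffer case described in the commit message.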
Daniel Lemire 461f7dc9f9
Remove unnecessary comment. 2020-09-14 10:44:10 -04:00
Daniel Lemire 3e5497e2f9
Fixes issue 1170 and makes the usage of minify easier. (#1171)
* Fixes issue 1170 and makes the usage of minify easier.

* This should get the fallback implementation to detect unclosed strings.
2020-09-12 16:20:20 -04:00
Paul Dreik 6ecbcc7c19
add multi implementation fuzzer (#1162)
This adds a fuzzer which parses the same input using all the available implementations (haswell, westmere, fallback on x64).

This should get the otherwise uncovered source files (mostly fallback) to show up in the fuzz coverage.
For instance, the fallback directory has only one line covered.
As of the 20200909 report, 1866 lines are covered out of 4478.

Also, it will detect if the implementations behave differently:

    by making sure they all succeed, or all error
    turning the parsed data back into text should produce equal results

While at it, I corrected some minor things:

    clean up building too many variants, run with forced implementation (closes #815 )
    always store crashes as artefacts, good in case the fuzzer finds something
    return value of the fuzzer function should always be 0
    reduce log spam
    introduce max size for the seed corpus and the CI fuzzer
2020-09-11 23:46:22 +02:00
John Keiser 8cef02e8e8
Merge pull request #1167 from simdjson/jkeiser/isolate-checkperf-more
Isolate checkperf more
2020-09-11 11:58:37 -07:00
John Keiser caabfd14b3 Isolate checkperf more 2020-09-11 08:53:41 -07:00
Daniel Lemire 2ffbaa9578
This will isolate the perf checks in CI (#1164)
* This will isolate the perf checks.

* Fixed typo
2020-09-10 18:15:45 -04:00
Daniel Lemire c40aeaec3a
Fix for issue 1147 (#1153)
* This must be a typo

* Improving documentation of the string conversion.

* Minor update.
2020-09-03 13:18:15 -04:00
John Keiser 80e84a3ad0
Merge pull request #1143 from simdjson/jkeiser/classify
Simplify operator classification lookup on Intel
2020-09-03 10:08:14 -07:00
Daniel Lemire 0552335ec1
Fixing the issue. (#1151) 2020-09-02 18:41:59 -04:00
Daniel Lemire 7aea774b21
Adding tests and a fix for empty strings in at_pointer (#1148)
* Adding a test.

* More tests.
2020-09-02 17:04:56 -04:00
Daniel Lemire 4d4ed92055
Removes 5 KB of tables in the number parsing routine (#1139)
* Removes 5 KB of tables (and a load) at the expense
of a multiplication and a shift. I have not benchmarked this new
code, but my expectation is that it should be largely performance
neutral. The motivation is to reduce the size of the library slightly.
There is also a matter of elegance.
2020-09-02 15:47:11 -04:00
John Keiser f0ec26992a Remove bit_or (bad perf on Windows) 2020-09-01 08:43:09 -07:00
John Keiser 62e8332b34 Use simd8x64 abstractions in classification 2020-09-01 08:43:09 -07:00
John Keiser 0925f71987 Simplify operator classification lookup on Intel 2020-09-01 08:43:07 -07:00
Daniel Lemire 4c11652808
This must be a typo (#1140) 2020-08-28 20:35:13 -04:00
Daniel Lemire 5b10c38e43
Make parse_many safer. (#1137) 2020-08-20 22:22:46 -04:00