Commit Graph

1770 Commits

Author SHA1 Message Date
Paul Dreik 04267e0f6b
add boost.json to benchmark (#1202)
Add boost.json to the benchmark.
It was accepted into Boost on 2020-10-03, see https://lists.boost.org/Archives/boost/2020/10/250129.php.

The upstream repo (expected to be migrated to Boost eventually) is: https://github.com/CPPAlliance/json
2020-10-04 10:00:09 +02:00
Daniel Lemire a540e6afc5
Testing on minimalist alpine (linux) images (#1200)
* Tweaking header includes to make it safer.

* Adding the actual tests.

* Fixing my syntax.
2020-10-02 13:32:09 -04:00
Daniel Lemire f1841e48b3
Minor fixes to some headers (tweak) (#1198) 2020-10-02 12:29:05 -04:00
Daniel Lemire 9865bb6904
Make it possible to check that an implementation is supported at runtime (#1197)
* Make it possible to check that an implementation is supported at runtime.

* add CI fuzzing on arm 64 bit

This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* We need to accommodate cxxopts

Co-authored-by: Paul Dreik <github@pauldreik.se>
2020-10-02 11:04:51 -04:00
Paul Dreik e06ddea784
add CI fuzzing on arm 64 bit
This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188
2020-10-01 10:12:37 +02:00
Daniel Lemire 8b5a89c136
Parsing floats with 19 significant digits should be fine. (#1191)
* Parsing floats with 19 significant digits should be fine.

* Adding more tests with very long mantissa.
2020-09-29 19:42:43 -04:00
Daniel Lemire da093c1982
Fixing "undefined behavior" issue in new fast_itoa functions (#1186)
* Fixing "undefined behavior" issue.

* Simplifying our custom atoi

* Fixing minor bug
2020-09-29 19:17:03 -04:00
Daniel Lemire 048fb6278a
This adds two tests to verify a new fuzzer issue. (So far I could not verify.) (#1194) 2020-09-29 11:45:41 -04:00
Paul Dreik f1b0778f79
add utf8 fuzzer
This enables the utf8 fuzzer, now that #1187 is fixed
2020-09-27 21:11:13 +02:00
Daniel Lemire 0e584fa4a5
Attempt to fix issue 1187. (#1192) 2020-09-27 12:04:47 -04:00
Paul Dreik f44386008c
add minifier fuzzers (#1172)
This adds a minifier fuzzer. There is also a utf-8 fuzzer, but it is disabled until #1187 is fixed.

Run all fuzzers but the utf-8 one in the github CI fuzz.
2020-09-26 14:25:00 +02:00
Daniel Lemire 60c139a844
Faster and more correct serialization (#1168)
* Adding new files.

* Better.

* Fixing minifier and adding tests.

* Adding benchmarks.

* Including the array header.

* Replacing old stream-based code by the new code.

* Doubling up the itoa.

* Hidden away to_chars in internal namespace.

* Removing the repetitions.

* Documented the atoi functions.

* Tuning the escape sequences.

* Moving the operators off the main namespace.

* Added more tests.

* Tweaking the implementation so that it works with and without exp.

* The string_builder template and mini_formatter class are not part of our public API and are subject to change at any time!

* Adding a benchmark and some optimization.

* Cleaning.

* Strictly speaking, this header is needed.
2020-09-23 10:00:39 -04:00
Daniel Lemire f410213003
Improve documentation on padding
- Improves and clarifies the documentation on padding.
- Uses the std:: prefix for memcpy, strlen, etc.

Related to issues #1175 and #1178
2020-09-23 09:07:14 +02:00
Daniel Lemire 19cb5d57db
Some minor documentation fixes. (#1177) 2020-09-17 13:17:35 -04:00
Paul Dreik 30b912fc81
fuzz at_pointer
This adds a fuzzer for at_pointer() which recently had a bug.

The #1142 bug had been found with this fuzzer

Also, it polishes the github action job:

    * cross-pollinate the fuzzer corpora (lets fuzzers reuse results from other fuzzers)
    * use github action syntax instead of bash checks
    * only run on push if on master
2020-09-16 21:17:43 +02:00
Daniel Lemire 7fc07e2d5e Correcting typo 2020-09-16 11:11:49 -04:00
Daniel Lemire 72c83d9430
This avoids locale-dependent number parsing at the standard library level (#1157)
* This avoids locale-dependent number parsing at the standard library level.

* Adding missing cast.

* Inserting the missing "endif"

* Trial and error.

* Another attempt.

* Another tweak.

* Another fix.

* Restricting it even more.

* Tweaking our symbol checks.

* Somewhat smarter tests.

* Nice comments.

* Minor simplification.

* Adding cerr.
2020-09-15 11:36:18 -04:00
Daniel Lemire bfbac12f76
We were forgetting to check the end bytes at the end of the UTF8 validation. (#1173)
* We were forgetting to check the end bytes at the end of the UTF8 validation.

* Silencing the sanitizer

* Better explanation.
2020-09-15 11:33:09 -04:00
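
The end-of-input check mentioned in #1173 can be illustrated with a small scalar validator. This is only a sketch in Python, not simdjson's SIMD code: the function name `is_valid_utf8` and its structure are assumptions made for illustration. The point it shows is the final `pending == 0` test, without which an input that stops in the middle of a multi-byte sequence would be accepted.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Scalar UTF-8 validator (illustrative sketch, not simdjson's algorithm)."""
    pending = 0            # continuation bytes still expected
    lo, hi = 0x80, 0xBF    # allowed range for the next continuation byte
    for b in data:
        if pending:
            if not lo <= b <= hi:
                return False
            pending -= 1
            lo, hi = 0x80, 0xBF
        elif b <= 0x7F:
            continue                      # ASCII
        elif 0xC2 <= b <= 0xDF:
            pending = 1                   # 2-byte sequence
        elif 0xE0 <= b <= 0xEF:
            pending = 2                   # 3-byte sequence
            if b == 0xE0:
                lo = 0xA0                 # reject overlong encodings
            elif b == 0xED:
                hi = 0x9F                 # reject UTF-16 surrogates
        elif 0xF0 <= b <= 0xF4:
            pending = 3                   # 4-byte sequence
            if b == 0xF0:
                lo = 0x90                 # reject overlong encodings
            elif b == 0xF4:
                hi = 0x8F                 # reject code points above U+10FFFF
        else:
            return False                  # stray continuation or invalid lead byte
    # The end-of-input check: a sequence must not be left unfinished.
    return pending == 0
```

For example, the three bytes of the euro sign validate, while the same sequence with its last byte cut off does not.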
Daniel Lemire 461f7dc9f9
Remove unnecessary comment. 2020-09-14 10:44:10 -04:00
Daniel Lemire 3e5497e2f9
Fixes issue 1170 and makes the usage of minify easier. (#1171)
* Fixes issue 1170 and makes the usage of minify easier.

* This should get the fallback implementation to detect unclosed strings.
2020-09-12 16:20:20 -04:00
Paul Dreik 6ecbcc7c19
add multi implementation fuzzer (#1162)
This adds a fuzzer which parses the same input using all the available implementations (haswell, westmere, fallback on x64).

This should get the otherwise uncovered source files (mostly fallback) to show up in the fuzz coverage.
For instance, the fallback directory has only one line covered.
As of the 20200909 report, 1866 lines are covered out of 4478.

Also, it will detect if the implementations behave differently:

    * by making sure they all succeed, or all error
    * by checking that turning the parsed data into text again produces equal results

While at it, I corrected some minor things:

    * clean up building too many variants, run with forced implementation (closes #815)
    * always store crashes as artefacts, good in case the fuzzer finds something
    * return value of the fuzzer function should always be 0
    * reduce log spam
    * introduce max size for the seed corpus and the CI fuzzer
2020-09-11 23:46:22 +02:00
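
The two agreement checks described in #1162 (all implementations succeed or all error, and re-serialized output is equal) amount to differential testing. A minimal sketch in Python follows. It uses the standard `json` module as a stand-in for the simdjson implementations; the names `run_all`, `implementations_agree`, and the deliberately lenient parser are hypothetical and exist only for this illustration.

```python
import json


def run_all(implementations, text):
    """Parse `text` with every implementation and record a comparable outcome."""
    results = {}
    for name, parse in implementations.items():
        try:
            # Serialize the parsed value again so outcomes can be compared as text.
            results[name] = ("ok", json.dumps(parse(text), sort_keys=True))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            results[name] = ("error", None)
    return results


def implementations_agree(implementations, text):
    """True if all implementations fail, or all succeed with equal output."""
    outcomes = set(run_all(implementations, text).values())
    return len(outcomes) <= 1


# Two stand-in implementations that should always behave the same.
IMPLEMENTATIONS = {
    "loads": json.loads,
    "decoder": json.JSONDecoder().decode,
}

# A hypothetical buggy implementation that tolerates a trailing comma.
WITH_LENIENT = dict(IMPLEMENTATIONS)
WITH_LENIENT["lenient"] = lambda text: json.loads(text.replace(",]", "]"))
```

A fuzzer built this way would call `implementations_agree` on each generated input and report a failure whenever it returns `False`, as it does for `[1,]` once the lenient stand-in is included.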
John Keiser 8cef02e8e8
Merge pull request #1167 from simdjson/jkeiser/isolate-checkperf-more
Isolate checkperf more
2020-09-11 11:58:37 -07:00
John Keiser caabfd14b3 Isolate checkperf more 2020-09-11 08:53:41 -07:00
Daniel Lemire 2ffbaa9578
This will isolate the perf checks in CI (#1164)
* This will isolate the perf checks.

* Fixed typo
2020-09-10 18:15:45 -04:00
Daniel Lemire c40aeaec3a
Fix for issue 1147 (#1153)
* This must be a typo

* Improving documentation of the string conversion.

* Minor update.
2020-09-03 13:18:15 -04:00
John Keiser 80e84a3ad0
Merge pull request #1143 from simdjson/jkeiser/classify
Simplify operator classification lookup on Intel
2020-09-03 10:08:14 -07:00
Daniel Lemire 0552335ec1
Fixing the issue. (#1151) 2020-09-02 18:41:59 -04:00
Daniel Lemire 7aea774b21
Adding tests and a fix for empty strings in at_pointer (#1148)
* Adding a test.

* More tests.
2020-09-02 17:04:56 -04:00
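
The commit message for #1148 does not say which empty-string case was involved. As background, JSON Pointer (RFC 6901) has two: the empty pointer `""` refers to the whole document, and an empty reference token (as in `"/"`) refers to the member whose key is the empty string. The sketch below is a plain Python resolver written for illustration; `resolve_pointer` is a hypothetical name and is not simdjson's `at_pointer()`.

```python
def resolve_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    if pointer == "":
        return doc                          # empty pointer: the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):    # "/" yields one empty token
        # Unescape in this order so that "~01" becomes "~1", not "/".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            is_index = token == "0" or (
                token.isascii() and token.isdigit() and token[0] != "0"
            )
            if not is_index:
                raise KeyError(token)
            node = node[int(token)]
        elif isinstance(node, dict):
            node = node[token]              # the empty string is a valid key
        else:
            raise KeyError(token)
    return node
```

With `doc = {"": 1, "a": {"": [10, 20]}}`, the pointer `"/"` selects `1` and `"/a//1"` selects `20`.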
Daniel Lemire 4d4ed92055
Removes 5 KB of tables in the number parsing routine (#1139)
* Removes 5 KB of tables, and a load, at the expense
of a multiplication and a shift. I have not benchmarked this new
code, but my expectation is that it should be largely performance
neutral. The motivation is to reduce the size of the library slightly.
There is also a matter of elegance.
2020-09-02 15:47:11 -04:00
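
The message for #1139 does not show which table was removed, so the following is only an illustration of the general trade it describes: replacing a per-exponent lookup with one multiplication and one shift. It assumes the quantity being computed is the binary exponent of a power of ten, floor(log2(10^q)), and uses the fixed-point constant 217706 / 2^16 as an approximation of log2(10). Both the function names and the choice of constant are assumptions for this sketch.

```python
def binary_exponent_of_power_of_ten(q: int) -> int:
    """Approximate floor(log2(10**q)) with a multiplication and a shift.

    217706 / 65536 is slightly larger than log2(10); the error is small
    enough that the floor is still exact over the range checked below.
    Python's >> on negative integers rounds toward negative infinity.
    """
    return (217706 * q) >> 16


def exact_floor_log2_pow10(q: int) -> int:
    """Reference value computed with exact integer arithmetic."""
    if q >= 0:
        return (10 ** q).bit_length() - 1
    # 10**-q is never a power of two for q < 0, so the ceiling of its
    # base-2 logarithm equals its bit length.
    return -(10 ** -q).bit_length()


# The shortcut agrees with the exact value for every decimal exponent
# in the range relevant to IEEE 754 doubles.
MISMATCHES = [
    q for q in range(-342, 309)
    if binary_exponent_of_power_of_ten(q) != exact_floor_log2_pow10(q)
]
```

Under these assumptions `MISMATCHES` is empty, so a table of 651 precomputed exponents could be dropped in favor of the arithmetic.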
John Keiser f0ec26992a Remove bit_or (bad perf on Windows) 2020-09-01 08:43:09 -07:00
John Keiser 62e8332b34 Use simd8x64 abstractions in classification 2020-09-01 08:43:09 -07:00
John Keiser 0925f71987 Simplify operator classification lookup on Intel 2020-09-01 08:43:07 -07:00
Daniel Lemire 4c11652808
This must be a typo (#1140) 2020-08-28 20:35:13 -04:00
Daniel Lemire 5b10c38e43
Make parse_many safer. (#1137) 2020-08-20 22:22:46 -04:00
Daniel Lemire 3316df9195
Adding test for issue 1133 and improving documentation (#1134)
* Adding test.

* Saving.

* With exceptions.

* Added extensive tests.

* Better documentation.

* Tweaking CI

* Cleaning.

* Do not assume make.

* Let us make the build verbose

* Reorg

* I do not understand how circle ci works.

* Breaking it up.

* Better syntax.
2020-08-20 14:03:14 -04:00
Daniel Lemire 5d355f1a8b
release candidate (#1132) 2020-08-19 18:12:23 -04:00
John Keiser 2ff91103ca
Remove SIMDJSON_DO_NOT_USE_THREADS_NO_MATTER_WHAT (#1131) 2020-08-19 17:11:13 -04:00
Daniel Lemire a954d50ad4
This improves our documentation. (#1128)
* This improves our documentation.

* Removing tags for doxygen.

* Adding a remark that you need a recent cmake.
2020-08-19 14:02:08 -04:00
John Keiser 5be4d37aff
Merge pull request #1129 from simdjson/jkeiser/inl
Move inline/* to *-inl.h
2020-08-19 09:52:34 -07:00
John Keiser 1e6c9dbcfa Reamalgamate 2020-08-19 09:16:25 -07:00
John Keiser 708a56872d Move inline/* to *-inl.h 2020-08-19 09:09:31 -07:00
John Keiser 0a2bca3f73
Merge pull request #1101 from simdjson/jkeiser/yakety-sax
Basic SAX interface with benchmarks
2020-08-19 09:05:44 -07:00
Daniel Lemire 1ec710c985 Updating the documentation for hackers. 2020-08-19 10:59:24 -04:00
Daniel Lemire d5a44f9ad4 Merge branch 'master' of github.com:simdjson/simdjson 2020-08-19 10:36:01 -04:00
Daniel Lemire e64dca7144 Tweaking. 2020-08-19 10:35:49 -04:00
John Keiser b2779c35df Fix issue with unsupported unreachable on Windows 2020-08-18 21:35:12 -07:00
John Keiser 9b11e119d4 Make skip_double() comment more explicit 2020-08-18 21:25:03 -07:00
John Keiser 988c62baed Encapsulate significant_digits() 2020-08-18 21:25:03 -07:00
John Keiser eb3e640003 Return bool from compute_float_64 2020-08-18 21:25:03 -07:00
John Keiser 9475b947f5 Return error codes from parse_number 2020-08-18 21:25:03 -07:00