Commit Graph

2157 Commits

Author SHA1 Message Date
Daniel Lemire 19cb5d57db
Some minor documentation fixes. (#1177) 2020-09-17 13:17:35 -04:00
Paul Dreik 30b912fc81
fuzz at_pointer
This adds a fuzzer for at_pointer() which recently had a bug.

The #1142 bug had been found with this fuzzer

Also, it polishes the github action job:

    cross pollinate the fuzzer corpora (lets fuzzers reuse results from other fuzzers)
    use github action syntax instead of bash checks
    only run on push if on master
2020-09-16 21:17:43 +02:00
Daniel Lemire 7fc07e2d5e Correcting typo 2020-09-16 11:11:49 -04:00
Daniel Lemire 72c83d9430
This avoids locale-dependent number parsing at the standard library level (#1157)
* This avoids locale-dependent number parsing at the standard library level.

* Adding missing cast.

* Inserting the missing "endif"

* Trial and error.

* Another attempt.

* Another tweak.

* Another fix.

* Restricting it even more.

* Tweaking our symbol checks.

* Somewhat smarter tests.

* Nice comments.

* Minor simplification.

* Adding cerr.
2020-09-15 11:36:18 -04:00
Daniel Lemire bfbac12f76
We were forgetting to check the end bytes at the end of the UTF8 validation. (#1173)
* We were forgetting to check the end bytes at the end of the UTF8 validation.

* Silencing the sanitizer

* Better explanation.
2020-09-15 11:33:09 -04:00
Daniel Lemire 461f7dc9f9
Remove unnecessary comment. 2020-09-14 10:44:10 -04:00
Daniel Lemire 3e5497e2f9
Fixes issue 1170 and makes the usage of minify easier. (#1171)
* Fixes issue 1170 and makes the usage of minify easier.

* This should get the fallback implementation to detect unclosed strings.
2020-09-12 16:20:20 -04:00
Paul Dreik 6ecbcc7c19
add multi implementation fuzzer (#1162)
This adds a fuzzer which parses the same input using all the available implementations (haswell, westmere, fallback on x64).

This should get the otherwise uncovered source files (mostly fallback) to show up in the fuzz coverage.
For instance, the fallback directory has only one line covered.
As of the 20200909 report, 1866 lines are covered out of 4478.

Also, it will detect if the implementations behave differently:

    by making sure they all succeed, or all error
    by checking that turning the parsed data into text again produces equal results

While at it, I corrected some minor things:

    clean up building too many variants, run with forced implementation (closes #815)
    always store crashes as artefacts, good in case the fuzzer finds something
    return value of the fuzzer function should always be 0
    reduce log spam
    introduce max size for the seed corpus and the CI fuzzer
2020-09-11 23:46:22 +02:00
John Keiser 8cef02e8e8
Merge pull request #1167 from simdjson/jkeiser/isolate-checkperf-more
Isolate checkperf more
2020-09-11 11:58:37 -07:00
John Keiser caabfd14b3 Isolate checkperf more 2020-09-11 08:53:41 -07:00
Daniel Lemire 2ffbaa9578
This will isolate the perf checks in CI (#1164)
* This will isolate the perf checks.

* Fixed typo
2020-09-10 18:15:45 -04:00
Daniel Lemire c40aeaec3a
Fix for issue 1147 (#1153)
* This must be a typo

* Improving documentation of the string conversion.

* Minor update.
2020-09-03 13:18:15 -04:00
John Keiser 80e84a3ad0
Merge pull request #1143 from simdjson/jkeiser/classify
Simplify operator classification lookup on Intel
2020-09-03 10:08:14 -07:00
Daniel Lemire 0552335ec1
Fixing the issue. (#1151) 2020-09-02 18:41:59 -04:00
Daniel Lemire 7aea774b21
Adding tests and a fix for empty strings in at_pointer (#1148)
* Adding a test.

* More tests.
2020-09-02 17:04:56 -04:00
Daniel Lemire 4d4ed92055
Removes 5 KB of tables in the number parsing routine (#1139)
* Removes 5 KB of tables, and a load, at the expense
of a multiplication and a shift. I have not benchmarked this new
code, but my expectation is that it should be largely performance
neutral. The motivation is to reduce the size of the library slightly.
There is also a matter of elegance.
2020-09-02 15:47:11 -04:00
John Keiser f0ec26992a Remove bit_or (bad perf on Windows) 2020-09-01 08:43:09 -07:00
John Keiser 62e8332b34 Use simd8x64 abstractions in classification 2020-09-01 08:43:09 -07:00
John Keiser 0925f71987 Simplify operator classification lookup on Intel 2020-09-01 08:43:07 -07:00
Daniel Lemire 4c11652808
This must be a typo (#1140) 2020-08-28 20:35:13 -04:00
Daniel Lemire 5b10c38e43
Make parse_many safer. (#1137) 2020-08-20 22:22:46 -04:00
Daniel Lemire 3316df9195
Adding test for issue 1133 and improving documentation (#1134)
* Adding test.

* Saving.

* With exceptions.

* Added extensive tests.

* Better documentation.

* Tweaking CI

* Cleaning.

* Do not assume make.

* Let us make the build verbose

* Reorg

* I do not understand how circle ci works.

* Breaking it up.

* Better syntax.
2020-08-20 14:03:14 -04:00
Daniel Lemire 5d355f1a8b
release candidate (#1132) 2020-08-19 18:12:23 -04:00
John Keiser 2ff91103ca
Remove SIMDJSON_DO_NOT_USE_THREADS_NO_MATTER_WHAT (#1131) 2020-08-19 17:11:13 -04:00
Daniel Lemire a954d50ad4
This improves our documentation. (#1128)
* This improves our documentation.

* Removing tags for doxygen.

* You need a recent cmake remark.
2020-08-19 14:02:08 -04:00
John Keiser 5be4d37aff
Merge pull request #1129 from simdjson/jkeiser/inl
Move inline/* to *-inl.h
2020-08-19 09:52:34 -07:00
John Keiser 1e6c9dbcfa Reamalgamate 2020-08-19 09:16:25 -07:00
John Keiser 708a56872d Move inline/* to *-inl.h 2020-08-19 09:09:31 -07:00
John Keiser 0a2bca3f73
Merge pull request #1101 from simdjson/jkeiser/yakety-sax
Basic SAX interface with benchmarks
2020-08-19 09:05:44 -07:00
Daniel Lemire 1ec710c985 Updating the documentation for hackers. 2020-08-19 10:59:24 -04:00
Daniel Lemire d5a44f9ad4 Merge branch 'master' of github.com:simdjson/simdjson 2020-08-19 10:36:01 -04:00
Daniel Lemire e64dca7144 Tweaking. 2020-08-19 10:35:49 -04:00
John Keiser b2779c35df Fix issue with unsupported unreachable on Windows 2020-08-18 21:35:12 -07:00
John Keiser 9b11e119d4 Make skip_double() comment more explicit 2020-08-18 21:25:03 -07:00
John Keiser 988c62baed Encapsulate significant_digits() 2020-08-18 21:25:03 -07:00
John Keiser eb3e640003 Return bool from compute_float_64 2020-08-18 21:25:03 -07:00
John Keiser 9475b947f5 Return error codes from parse_number 2020-08-18 21:25:03 -07:00
John Keiser 18564f1ae2 Don't benchmark unless haswell is available 2020-08-18 21:25:03 -07:00
John Keiser 638f1deb62 Add DOM tweet reader for comparison 2020-08-18 21:25:03 -07:00
John Keiser 7e74d30f45 [WIP] tweet reader SAX benchmark 2020-08-18 21:25:03 -07:00
John Keiser ce8d0f8135 Add visit_primitive() helper in iterator 2020-08-18 21:25:03 -07:00
John Keiser 872127b722 Move is_array management together with depth 2020-08-18 21:25:03 -07:00
John Keiser e180dc44bc Move container logging into json_iterator 2020-08-18 21:25:03 -07:00
John Keiser 268b8845a9 Document tape_builder 2020-08-18 21:25:03 -07:00
John Keiser 74c47995a3 Document json_iterator 2020-08-18 21:25:03 -07:00
John Keiser 24f5936cbf Give is_array responsibility to json_iterator 2020-08-18 21:25:03 -07:00
John Keiser bdfa8aca28 Separate interface from implementation to make interface clearer 2020-08-18 21:25:03 -07:00
John Keiser 15eb1ad922 Preface visitor methods with visit() 2020-08-18 21:25:03 -07:00
John Keiser 6ec98ee8b1 Add error codes to all things 2020-08-18 21:25:03 -07:00
John Keiser c5862d6de9 Remove empty_object/empty_array 2020-08-18 21:25:03 -07:00