Commit Graph

569 Commits

Author SHA1 Message Date
John Keiser 3b53c6ca47 Use json_iterator as shared state instead of document 2020-10-04 12:47:29 -07:00
John Keiser a90b8fb449 Remove depth tracking from ondemand api 2020-10-04 12:47:29 -07:00
John Keiser 1da509027e Add root number/atom parsing functions 2020-10-04 12:47:29 -07:00
John Keiser 5b96e4761e Remove side-effecting assumption 2020-10-04 12:47:29 -07:00
John Keiser 311ea79238 Fix noexceptions builds 2020-10-04 12:47:29 -07:00
John Keiser cfcb0d4fb7 Use json_iterator in array/object 2020-10-04 12:47:29 -07:00
John Keiser 97d03f3215 token_iterator -> json_iterator 2020-10-04 12:47:29 -07:00
John Keiser 0a6260b1d8 Fix clang 6 compile issue 2020-10-04 12:47:29 -07:00
John Keiser 12caf2510e Mark unused variables 2020-10-04 12:47:29 -07:00
John Keiser a58d2f710d Fix C++11 error 2020-10-04 12:47:29 -07:00
John Keiser 5cf68416d8 Don't bother comparing field names in parserandom 2020-10-04 12:47:29 -07:00
John Keiser ebcb3c6b3b On-demand parse implementation 2020-10-04 12:47:29 -07:00
Daniel Lemire f1841e48b3
Minor fixes to some headers (tweak) (#1198) 2020-10-02 12:29:05 -04:00
Daniel Lemire 9865bb6904
Make it possible to check that an implementation is supported at runtime (#1197)
* Make it possible to check that an implementation is supported at runtime.

* add CI fuzzing on arm 64 bit

This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* Make it possible to check that an implementation is supported at runtime.

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* We need to accommodate cxxopts

Co-authored-by: Paul Dreik <github@pauldreik.se>
2020-10-02 11:04:51 -04:00
Daniel Lemire 8b5a89c136
Parsing floats with 19 significant digits should be fine. (#1191)
* Parsing floats with 19 significant digits should be fine.

* Adding more tests with very long mantissa.
2020-09-29 19:42:43 -04:00
Daniel Lemire 0e584fa4a5
Attempt to fix issue 1187. (#1192) 2020-09-27 12:04:47 -04:00
Daniel Lemire 60c139a844
Faster and more correct serialization (#1168)
* Adding new files.

* Better.

* Fixing minifier and adding tests.

* Adding benchmarks.

* Including the array header.

* Replacing old stream-based code by the new code.

* Doubling up the itoa.

* Hidden away to_chars in internal namespace.

* Removing the repetitions.

* Documented the atoi functions.

* Tuning the escape sequences.

* Moving the operators off the main namespace.

* Added more tests.

* Tweaking the implementation so that it works with and without exp.

* The string_builder template and mini_formatter class
 are not part of our public API and are subject to change
 at any time!

* Adding a benchmark and some optimization.

* Cleaning.

* Strictly speaking, this header is needed.
2020-09-23 10:00:39 -04:00
Daniel Lemire f410213003
Improve documentation on padding
- Improves and clarifies the documentation on padding.
- Use std:: prefix for memcpy, strlen, etc.

Related to issues #1175 and #1178
2020-09-23 09:07:14 +02:00
Daniel Lemire 7fc07e2d5e Correcting typo 2020-09-16 11:11:49 -04:00
Daniel Lemire 72c83d9430
This avoids locale-dependent number parsing at the standard library level (#1157)
* This avoids locale-dependent number parsing at the standard library level.

* Adding missing cast.

* Inserting the missing "endif"

* Trial and error.

* Another attempt.

* Another tweak.

* Another fix.

* Restricting it even more.

* Tweaking our symbol checks.

* Somewhat smarter tests.

* Nice comments.

* Minor simplification.

* Adding cerr.
2020-09-15 11:36:18 -04:00
Daniel Lemire bfbac12f76
We were forgetting to check the end bytes at the end of the UTF8 validation. (#1173)
* We were forgetting to check the end bytes at the end of the UTF8 validation.

* Silencing the sanitizer

* Better explanation.
2020-09-15 11:33:09 -04:00
Daniel Lemire 461f7dc9f9
Remove unnecessary comment. 2020-09-14 10:44:10 -04:00
Daniel Lemire 3e5497e2f9
Fixes issue 1170 and makes the usage of minify easier. (#1171)
* Fixes issue 1170 and makes the usage of minify easier.

* This should get the fallback implementation to detect unclosed strings.
2020-09-12 16:20:20 -04:00
John Keiser 80e84a3ad0
Merge pull request #1143 from simdjson/jkeiser/classify
Simplify operator classification lookup on Intel
2020-09-03 10:08:14 -07:00
Daniel Lemire 4d4ed92055
Removes 5 KB of tables in the number parsing routine (#1139)
* Removes 5 KB of tables, and a load, at the expense
of a multiplication and a shift. I have not benchmarked this new
code, but my expectation is that it should be largely performance
neutral. The motivation is to reduce the size of the library slightly.
There is also a matter of elegance.
2020-09-02 15:47:11 -04:00
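The trade described in the commit above, a table lookup swapped for a multiplication and a shift, can be illustrated with a minimal sketch. It assumes the removed table stored the binary exponent of each power of ten. The helper name `approx_log2_pow10` and the fixed-point constant are illustrative assumptions and are not quoted from the commit.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from the commit): approximates
// floor(log2(10^q)) for small non-negative q without a lookup table.
// 217706 / 65536 is slightly larger than log2(10), so one multiplication
// and one 16-bit right shift stand in for a table load.
constexpr int32_t approx_log2_pow10(int32_t q) {
  return static_cast<int32_t>((static_cast<int64_t>(q) * 217706) >> 16);
}
```

For instance, `approx_log2_pow10(3)` yields 9, consistent with 2^9 <= 1000 < 2^10. Because the constant only approximates log2(10), the result drifts once q grows large enough, so any real use would need to bound the accepted range of q.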
John Keiser f0ec26992a Remove bit_or (bad perf on Windows) 2020-09-01 08:43:09 -07:00
John Keiser 62e8332b34 Use simd8x64 abstractions in classification 2020-09-01 08:43:09 -07:00
John Keiser 0925f71987 Simplify operator classification lookup on Intel 2020-09-01 08:43:07 -07:00
John Keiser 9b11e119d4 Make skip_double() comment more explicit 2020-08-18 21:25:03 -07:00
John Keiser 988c62baed Encapsulate significant_digits() 2020-08-18 21:25:03 -07:00
John Keiser eb3e640003 Return bool from compute_float_64 2020-08-18 21:25:03 -07:00
John Keiser 9475b947f5 Return error codes from parse_number 2020-08-18 21:25:03 -07:00
John Keiser 18564f1ae2 Don't benchmark unless haswell is available 2020-08-18 21:25:03 -07:00
John Keiser 638f1deb62 Add DOM tweet reader for comparison 2020-08-18 21:25:03 -07:00
John Keiser 7e74d30f45 [WIP] tweet reader SAX benchmark 2020-08-18 21:25:03 -07:00
John Keiser ce8d0f8135 Add visit_primitive() helper in iterator 2020-08-18 21:25:03 -07:00
John Keiser 872127b722 Move is_array management together with depth 2020-08-18 21:25:03 -07:00
John Keiser e180dc44bc Move container logging into json_iterator 2020-08-18 21:25:03 -07:00
John Keiser 268b8845a9 Document tape_builder 2020-08-18 21:25:03 -07:00
John Keiser 74c47995a3 Document json_iterator 2020-08-18 21:25:03 -07:00
John Keiser 24f5936cbf Give is_array responsibility to json_iterator 2020-08-18 21:25:03 -07:00
John Keiser bdfa8aca28 Separate interface from implementation to make interface clearer 2020-08-18 21:25:03 -07:00
John Keiser 15eb1ad922 Preface visitor methods with visit() 2020-08-18 21:25:03 -07:00
John Keiser 6ec98ee8b1 Add error codes to all things 2020-08-18 21:25:03 -07:00
John Keiser c5862d6de9 Remove empty_object/empty_array 2020-08-18 21:25:03 -07:00
John Keiser 57eb55446f Cache string value locally 2020-08-18 21:25:03 -07:00
John Keiser abd1399a7f Don't check depth at the end (unnecessary check) 2020-08-18 21:25:03 -07:00
John Keiser 57eba21ee5 Fall through goto labels where possible 2020-08-18 21:25:03 -07:00
John Keiser 1b56211a70 Give start_*/end_* error codes 2020-08-18 21:25:03 -07:00
John Keiser d8974d53b2 Keep value around between states 2020-08-18 21:25:03 -07:00
John Keiser 5ecd17f49e Log unconsumed input as an error 2020-08-18 21:25:03 -07:00
John Keiser 6bb99aec3c Merge structural_parser+iterator into json_iterator 2020-08-18 21:25:03 -07:00
John Keiser a67e83e24e Remove parse_* from visitor method names 2020-08-18 21:25:03 -07:00
John Keiser 5a3c3134ec Move value into common place to be shared across states 2020-08-18 21:25:03 -07:00
John Keiser ce8b9ee8c4 Move finish() out to walk_document 2020-08-18 21:25:03 -07:00
John Keiser 970dfc9f67 builder -> visitor, parser -> iter 2020-08-18 21:25:03 -07:00
John Keiser 04d39c0961 Make tape_builder primary entry point to stage 2 2020-08-18 21:25:01 -07:00
John Keiser d6339aa015 Set is_array in builder 2020-08-18 17:41:48 -07:00
John Keiser 11076bf337 containing_scope -> open_container 2020-08-18 17:41:16 -07:00
Daniel Lemire 8a8eea53a2
Prefixing macros (issue 1035) (#1124)
* Renaming partially done.

* More prefixing.

* I thought that this was fixed.

* Missed one.

* Missed a few.

* Missed another one.

* Minor fixes.
2020-08-18 18:25:36 -04:00
John Keiser ab6b7a8044 Make last_structural() helper 2020-08-18 10:12:42 -07:00
John Keiser 07c2fe726e Fail if hash is unclosed at start 2020-08-18 10:10:01 -07:00
Daniel Lemire 17f6d5208f
Documenting and fixing the case where a string is immediately followed by a scalar (#1106)
* Documenting and fixing.

* More cleaning.

* Being a bit cleaner.
2020-08-14 16:19:57 -04:00
John Keiser bee4d7a12b
Merge pull request #1108 from simdjson/jkeiser/ncgdoc
Remove information about nonexistent computed gotos :)
2020-08-12 10:39:20 -07:00
John Keiser 1b69612246 Remove information about nonexistent computed gotos :) 2020-08-10 16:29:24 -07:00
Daniel Lemire 9e93509a56
Fix number parsing (too lenient). (#1107)
* Fix number parsing (too lenient).

* Minor tweak.

* These are Booleans.

* Tweaking test config
2020-08-10 18:10:11 -04:00
John Keiser 5dd625916b Decrement depth just before checking 2020-08-03 23:09:21 -07:00
John Keiser e3d7718cf3 Simplify value switch statements 2020-08-03 23:09:21 -07:00
John Keiser 9eccd7b1fb Inline start_object/start_array 2020-08-03 23:09:21 -07:00
John Keiser 5b05d126b4 Consolidate start_object calls 2020-08-03 23:09:20 -07:00
John Keiser 03aaf189c1 Use parse_primitive (negative perf!) 2020-08-03 23:09:20 -07:00
John Keiser 6ef9395419 "parser.parser" -> "parser.dom_parser" 2020-08-03 23:09:20 -07:00
John Keiser 3a56e13b78 Make parse() a method 2020-08-03 23:09:19 -07:00
John Keiser ec28acba3d De-templatize stage2::structural_parser 2020-08-03 23:09:15 -07:00
John Keiser ee6647ce40 Make parse part of structural_parser 2020-08-03 17:50:51 -07:00
John Keiser 03d54f8f6e Use SAX model for stage 2 2020-08-03 17:50:51 -07:00
John Keiser 553e6d7549 Don't check max depth on startup 2020-08-03 17:49:14 -07:00
John Keiser e6896ee71e Keep current JSON after checking primitive type 2020-08-03 13:30:13 -07:00
John Keiser e6762f9b48 Advance immediately upon evaluating a character 2020-08-03 13:26:56 -07:00
John Keiser 099bb1afef Pass buffer to primitive parse functions 2020-08-03 12:56:35 -07:00
John Keiser 9c33093c91 Name goto labels consistently 2020-08-03 11:47:38 -07:00
John Keiser 634d8038b9 Increment depth before starting a scope 2020-08-03 11:35:46 -07:00
John Keiser ad46154f2f Hardcode document start/end creation 2020-08-03 10:23:32 -07:00
John Keiser fa81068ea8 Simplify structural_parser.start() 2020-08-03 09:49:15 -07:00
John Keiser 70c2a1c9f9 Short-circuit empty objects/arrays 2020-08-03 09:36:18 -07:00
John Keiser 66a68ce264 Return errors immediately instead of using goto 2020-08-02 12:04:12 -07:00
John Keiser 6bca1225e6 Add unlikely in strategic places 2020-08-01 18:19:36 -07:00
John Keiser 379a4e6a01 namespace { -> unnamed namespace 2020-08-01 14:46:23 -07:00
John Keiser 460cfcaf3e Make parse_structurals inline 2020-08-01 14:43:50 -07:00
John Keiser 8e69103822 Remove computed GOTO 2020-08-01 14:43:50 -07:00
John Keiser 2f67dab2b6 Remove extraneous machine addresses 2020-08-01 14:43:50 -07:00
John Keiser bb65ebd8be Remove computed gotos from parse_value 2020-08-01 14:43:50 -07:00
John Keiser c46ea0390c Move { and [ to the start of the switch 2020-08-01 14:43:50 -07:00
John Keiser bc8a6dd2e3 Remove dead code 2020-08-01 14:43:10 -07:00
John Keiser b1478c37f6 Fix arm64 build 2020-08-01 14:43:10 -07:00
John Keiser 4e944a9f3c Eliminate unused functions in fallback 2020-08-01 14:43:10 -07:00
John Keiser c7fa9b5fe8 Make entire implementation namespaces anonymous 2020-08-01 14:43:10 -07:00
John Keiser 65148b123b Put anonymous namespace in front of everything 2020-08-01 14:43:10 -07:00
John Keiser 3acfc0b630
Merge pull request #1045 from simdjson/jkeiser/generic-2
Define namespaces inside generic files
2020-07-24 12:42:39 -07:00
Daniel Lemire 2ce5f69def
fix recently introduced overflow (#1060)
* Various fixes.

* Clearer comment.
2020-07-24 13:59:24 -04:00