Commit Graph

549 Commits

Author SHA1 Message Date
John Keiser 03aaf189c1 Use parse_primitive (negative perf!) 2020-08-03 23:09:20 -07:00
John Keiser 6ef9395419 "parser.parser" -> "parser.dom_parser" 2020-08-03 23:09:20 -07:00
John Keiser 3a56e13b78 Make parse() a method 2020-08-03 23:09:19 -07:00
John Keiser ec28acba3d De-templatize stage2::structural_parser 2020-08-03 23:09:15 -07:00
John Keiser ee6647ce40 Make parse part of structural_parser 2020-08-03 17:50:51 -07:00
John Keiser 03d54f8f6e Use SAX model for stage 2 2020-08-03 17:50:51 -07:00
John Keiser 553e6d7549 Don't check max depth on startup 2020-08-03 17:49:14 -07:00
John Keiser e6896ee71e Keep current JSON after checking primitive type 2020-08-03 13:30:13 -07:00
John Keiser e6762f9b48 Advance immediately upon evaluating a character 2020-08-03 13:26:56 -07:00
John Keiser 099bb1afef Pass buffer to primitive parse functions 2020-08-03 12:56:35 -07:00
John Keiser 9c33093c91 Name goto labels consistently 2020-08-03 11:47:38 -07:00
John Keiser 634d8038b9 Increment depth before starting a scope 2020-08-03 11:35:46 -07:00
John Keiser ad46154f2f Hardcode document start/end creation 2020-08-03 10:23:32 -07:00
John Keiser fa81068ea8 Simplify structural_parser.start() 2020-08-03 09:49:15 -07:00
John Keiser 70c2a1c9f9 Short-circuit empty objects/arrays 2020-08-03 09:36:18 -07:00
John Keiser 66a68ce264 Return errors immediately instead of using goto 2020-08-02 12:04:12 -07:00
John Keiser 6bca1225e6 Add unlikely in strategic places 2020-08-01 18:19:36 -07:00
John Keiser 379a4e6a01 namespace { -> unnamed namespace 2020-08-01 14:46:23 -07:00
John Keiser 460cfcaf3e Make parse_structurals inline 2020-08-01 14:43:50 -07:00
John Keiser 8e69103822 Remove computed GOTO 2020-08-01 14:43:50 -07:00
John Keiser 2f67dab2b6 Remove extraneous machine addresses 2020-08-01 14:43:50 -07:00
John Keiser bb65ebd8be Remove computed gotos from parse_value 2020-08-01 14:43:50 -07:00
John Keiser c46ea0390c Move { and [ to the start of the switch 2020-08-01 14:43:50 -07:00
John Keiser bc8a6dd2e3 Remove dead code 2020-08-01 14:43:10 -07:00
John Keiser b1478c37f6 Fix arm64 build 2020-08-01 14:43:10 -07:00
John Keiser 4e944a9f3c Eliminate unused functions in fallback 2020-08-01 14:43:10 -07:00
John Keiser c7fa9b5fe8 Make entire implementation namespaces anonymous 2020-08-01 14:43:10 -07:00
John Keiser 65148b123b Put anonymous namespace in front of everything 2020-08-01 14:43:10 -07:00
John Keiser 3acfc0b630 Merge pull request #1045 from simdjson/jkeiser/generic-2 2020-07-24 12:42:39 -07:00
    Define namespaces inside generic files
Daniel Lemire 2ce5f69def fix recently introduced overflow (#1060) 2020-07-24 13:59:24 -04:00
    * Various fixes.
    * Clearer comment.
John Keiser 7d347be902 Untangle amalgamated headers 2020-07-24 02:56:41 -07:00
John Keiser a456d78fe0 really_inline more things 2020-07-24 02:56:41 -07:00
John Keiser bf67c967d6 Inline jsoncharutils per-implementation 2020-07-24 02:56:41 -07:00
John Keiser 44b7a7145c Include bitmanip/simd everywhere 2020-07-24 02:56:39 -07:00
John Keiser 3867ee71ed Include files where they are used 2020-07-24 02:56:37 -07:00
John Keiser 464f4813e3 Define namespaces inside generic files 2020-07-24 02:56:36 -07:00
John Keiser af8b52e7e8 Target region for entire compilation of an implementation 2020-07-24 02:48:25 -07:00
Daniel Lemire 4beb2ed507 Make simd8 64 uncopyable and other Visual Studio optimizations (#1031) 2020-07-21 18:11:21 -04:00
    * Working on making simd8x64 immutable
    * Even less invasive
Daniel Lemire e9c91a1ce2 lookup4 (new UTF-8 validation) (#993) 2020-07-20 18:20:07 -04:00
    * lookup4
    * Self-document lookup4 and clean up extra bits
    * Maintenance, to match against upcoming PR.
    Co-authored-by: Daniel Lemire <lemire@gmai.com>
    Co-authored-by: John Keiser <john@johnkeiser.com>
Daniel Lemire 534632dc52 Minor tweak on number parsing (#1041) 2020-07-17 12:14:10 -04:00
    * Tweak.
Daniel Lemire 8bf5f3d869 Trying to document more carefully the use of memcpy. (#1038) 2020-07-17 09:58:34 -04:00
    * Trying to document more carefully the use of memcpy.
    * Patching spelling.
Vitaly Baranov 1e4aa116e5 Choose active implementation only once. (#1044) 2020-07-16 18:17:56 -04:00
John Keiser 90cc1411da Merge pull request #1018 from simdjson/jkeiser/simplify-integer-parse 2020-07-16 12:21:43 -07:00
    Remove some branches from number parsing
Vitaly Baranov 6bd64c6873 Fix clang warning -Wused-but-marked-unused. (#1042) 2020-07-15 13:28:51 -04:00
    * Fix clang warning -Wused-but-marked-unused.
    * Fix build.
Vitaly Baranov a2f0933d01 Fix undefined behavior: load of misaligned address in atomparsing.h (#1037) 2020-07-13 08:46:52 -04:00
John Keiser 6797a6ab56 Use const uint8_t * in number parsing 2020-07-10 09:17:23 -07:00
John Keiser 86b5928f5e Use parse_digit for decimal and exp parsing as well 2020-07-10 09:16:43 -07:00
John Keiser 6dbd15aa71 Move SIMDJSON_SKIPNUMBERPARSING method out 2020-07-09 15:55:10 -07:00
John Keiser 22e5b081c4 Remove is_integer 2020-07-09 15:55:10 -07:00
John Keiser d848f33c48 Simplify integer parsing 2020-07-09 15:55:10 -07:00
John Keiser c64367536d Eliminate "found_minus" parse_number() parameter 2020-07-09 15:55:09 -07:00
John Keiser fc0102b079 Use common parse_digit() function in int parsing 2020-07-09 15:33:22 -07:00
Daniel Lemire d0ce2f0b5a Fixing clang under visual studio (#1028) 2020-07-06 18:58:19 -04:00
    * Lots of fixes
    * Removing some lambdas
    * Removing some functional programming.
    Co-authored-by: Daniel Lemire <lemire@gmai.com>
John Keiser 82fb45aa2a Merge pull request #990 from simdjson/jkeiser/fast-large-integer 2020-07-01 12:49:43 -07:00
    Don't reparse large integers
Daniel Lemire 74870a8189 Fixing issue 1013. (#1016) 2020-07-01 14:14:51 -04:00
    * Fixing issue 1013.
    * Bumping to 0.4.6
    Co-authored-by: Daniel Lemire <lemire@gmai.com>
John Keiser 7a9f6b48f4 Replace TODOs with comments about why we DIDNTDO 2020-07-01 10:31:10 -07:00
John Keiser d3c089130d Check overflow without reparsing integers 2020-07-01 09:51:48 -07:00
John Keiser e0f3060527 Add negative/positive integer writing 2020-07-01 09:51:48 -07:00
John Keiser 4c1256acc4 Reduce nesting somewhat with different if() order 2020-07-01 09:51:48 -07:00
John Keiser 85f6f5bd29 Use macros to remove #ifdefs on every write 2020-07-01 09:51:48 -07:00
John Keiser 4d9eac663a Use a macro to get rid of #ifdefs on each invalid number check 2020-07-01 09:51:48 -07:00
Daniel Lemire 0ef4d90ad0 Fix for issue 1014. (#1015) 2020-06-30 19:36:26 -04:00
    * Fix for issue 1014.
    * Explanation.
    Co-authored-by: Daniel Lemire <lemire@gmai.com>
Daniel Lemire 444ec4ad27 Stupid me 2020-06-26 19:29:28 -04:00
Daniel Lemire bb5ce007e6 Something better. 2020-06-26 19:03:28 -04:00
Daniel Lemire deaa74d378 Re-enabling tests generally. 2020-06-26 18:57:34 -04:00
Daniel Lemire b6997a56df Patching things up and adding tests. 2020-06-26 12:15:16 -04:00
Brendan Knapp 41f33ecbb9 Permit 32-bit GCC compilation 2020-06-25 17:07:17 -07:00
John Keiser b4b968ff44 Fix #953 2020-06-23 09:53:24 -07:00
Daniel Lemire 2bb101bd19 Code reformatting. 2020-06-22 16:50:57 -04:00
Daniel Lemire a6cbf1f922 Going generic... 2020-06-22 16:25:11 -04:00
Daniel Lemire b836164a38 Fix. 2020-06-22 02:12:49 +00:00
Daniel Lemire 058507badf Putting back the loop 2020-06-21 21:21:49 -04:00
Daniel Lemire ad40e90790 Patching. 2020-06-21 20:14:00 -04:00
Daniel Lemire 066269153e Explaining decision. 2020-06-21 18:02:34 -04:00
Daniel Lemire 5dbcdf1484 Ok 2020-06-21 17:52:30 -04:00
Daniel Lemire f03a6ab5a4 Tweaking. 2020-06-21 17:39:24 -04:00
Daniel Lemire 5dc07ed295 It builds. 2020-06-21 17:20:33 -04:00
Daniel Lemire 064d4255d5 Ok. 2020-06-21 17:09:06 -04:00
Daniel Lemire 04139eb82e Ok. 2020-06-21 17:05:55 -04:00
John Keiser 76c9f4f5a6 Merge pull request #941 from simdjson/jkeiser/forgot 2020-06-17 09:09:28 -07:00
    Remove unnecessary functions
Daniel Lemire 942ef3b7f2 Merge pull request #939 from simdjson/dlemire/lookup3 2020-06-17 11:19:09 -04:00
    Introducing lookup3 (UTF-8 validation).
John Keiser f8f36c085c Remove unnecessary functions 2020-06-17 07:11:53 -07:00
John Keiser 7339f67dd7 Merge pull request #462 from simdjson/jkeiser/if-backslash 2020-06-17 07:07:58 -07:00
    Wrap backslash processing in a branch
Daniel Lemire 71a889ed73 Introducing lookup3 (UTF-8 validation). 2020-06-16 19:08:25 -04:00
John Keiser 610c79fbf3 Don't use backslash branch on ARM 2020-06-13 07:51:28 -07:00
John Keiser fd44c2a2ff Merge pull request #927 from simdjson/dlemire/exposingthestringminifier 2020-06-13 07:47:20 -07:00
    Exposing the string minifier.
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire bd2d0f769f One unlikely too many (#930) 2020-06-12 17:58:10 -04:00
John Keiser 664b03bb13 Short circuit find escapes if there is a backslash 2020-06-12 10:10:35 -07:00
John Keiser bbd61eb13f Let tape writing be put in a register 2020-06-12 09:18:20 -07:00
John Keiser e15e1e253d peek_char -> peek_next_char 2020-06-12 09:10:16 -07:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser ea08e7d192 Remove unused extra copy of find_next_document_index 2020-06-09 17:52:13 -07:00
John Keiser d178e089a6 Stop caching current structural, keep current index around instead of next 2020-06-08 15:21:54 -07:00
John Keiser 5f00b37e21 Stop caching the buffer index 2020-06-08 15:21:54 -07:00
John Keiser 8a8792d47f Remove most uses of current_char() 2020-06-08 15:21:54 -07:00
John Keiser 59d9bc9e48 Store the pointer to the next structural instead of base structural_indexes and an index 2020-06-08 15:21:54 -07:00
John Keiser 8793dd3ceb Don't store len locally 2020-06-08 15:21:54 -07:00
John Keiser 48062380fa Move parser to structural_iterator 2020-06-08 15:21:54 -07:00
John Keiser 3636aa5522 Extend structural_parser from structural_iterator 2020-06-08 15:21:54 -07:00
John Keiser a1aea4588f Move document stream state to implementation 2020-06-08 15:21:54 -07:00
John Keiser 1d4fffb799 Fix fallback implementation 2020-06-08 15:21:52 -07:00
John Keiser 6f90f5dc5f Remove templating from finish() method 2020-06-08 15:20:56 -07:00
John Keiser 9dd6972d26 Remove impossible checks, add EMPTY check to normal parser 2020-06-08 15:20:56 -07:00
John Keiser d731a7d52c Privatize structural_parser 2020-06-08 15:20:56 -07:00
John Keiser 059468b74e Eliminate streaming_structural_parser subclass with templates 2020-06-08 15:20:56 -07:00
John Keiser 5e69fb782a Call a function to parse structurals 2020-06-08 15:20:56 -07:00
John Keiser a5beffda78 Remove streaming_structural_parser.h 2020-06-08 15:20:56 -07:00
John Keiser 7de7ce5fdc Move document stream state to implementation 2020-06-08 15:20:56 -07:00
John Keiser 0dbda65e44 Fix fallback implementation 2020-06-08 14:52:23 -07:00
John Keiser d43a4e9df9 Remove SUCCESS_AND_HAS_MORE (internal only value) 2020-06-07 16:20:55 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
John Keiser 8c16ba372e Acknowledge that we always have a remainder 2020-06-06 16:46:38 -07:00
John Keiser 9be4a17687 Separate definition from declaration, arrange top down 2020-06-06 16:46:38 -07:00
John Keiser ed0c815735 Move unclosed array check to stage 2 2020-06-05 12:39:13 -07:00
Daniel Lemire 7a69da16e4 Fixing issue 906 (#912) 2020-06-05 15:37:09 -04:00
    * Fixing issue 906
    * Safe patching.
    * Now with explanations.
    * Bumping up memory allocation.
    * Putting the patch back.
    * fallback fixes.
    Co-authored-by: Daniel Lemire <lemire@gmai.com>
Daniel Lemire 52f44de257 This introduces a tiny simplification in number parsing. (#910) 2020-06-04 17:13:02 -04:00
    * This introduces a tiny simplification in number parsing.
    * Removing unnecessary function.
    Co-authored-by: Daniel Lemire <lemire@gmai.com>
John Keiser b75fa26dc1 Move containing_scope and ret_address to .cpp 2020-06-01 12:15:55 -07:00
John Keiser 3d22a2d845 One weird trick: set a bogus error value in the parser impl 2020-06-01 12:15:55 -07:00
    This makes us faster under both gcc and clang somehow.
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 86f8a4a9d2 Don't set parser.valid or parser.error 2020-06-01 12:14:09 -07:00
    This regresses performance and is ONLY here because the next
    two commits are here; this lets us see the impact of removing
    parser.error separately from the impact of the next commit.
John Keiser db2cb061cb Remove on_error function 2020-06-01 12:14:09 -07:00
    Solely here to make the next patch smaller and more isolatable
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser 84712a8bbc Store buf and len in parser implementation 2020-06-01 12:14:09 -07:00
John Keiser b86fb95306 Rename doc_parser -> parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
Daniel Lemire 12150baa5e Using just ASCII. (#899) 2020-05-21 21:59:06 -04:00
    * Using just ASCII.
    * Let us prune checkperf.
    * Moving the description of lookup2 to the HACKING.md file.
John Keiser 4551e60f8b Don't write start object/array until the end 2020-05-21 14:28:47 -07:00
John Keiser 5651fbedc4 Add logging to stage 2 2020-05-21 09:47:19 -07:00
Daniel Lemire 40d57da83c fixes issue 891 (#893) 2020-05-20 11:54:53 -04:00
John Keiser e6c9dfbd91 Make include files more fine-grained 2020-05-19 14:42:04 -07:00
John Keiser 64abc3e86c Include top-level .h files outside #if statements 2020-05-19 13:33:14 -07:00
John Keiser 7ad4020829 Make main compilation chunks into .cpp files 2020-05-19 13:32:35 -07:00
John Keiser 72ab0d11ff Move stage 1 and 2 files to their own directories 2020-05-19 13:30:34 -07:00
John Keiser 4ea866f050 Move stage2 classes into their own files 2020-05-19 13:30:34 -07:00
John Keiser a476531524 Share ref_address everywhere it's used 2020-05-19 13:30:34 -07:00
John Keiser dbb3316511 Move current_string_buf_loc to stage 2 2020-05-11 06:11:32 -07:00
John Keiser cd6f204c77 Move write_tape() to stage 2 code 2020-05-11 06:09:48 -07:00
John Keiser 269131ed21 Move on_number_* to stage 2 code 2020-05-11 06:04:54 -07:00
John Keiser 65d784e88e Move on_start/end_string to stage 2 code 2020-05-11 05:49:40 -07:00
John Keiser 35afb6cae0 Move on_error, on_success to stage 2 code 2020-05-11 05:46:18 -07:00
John Keiser 27bce09be8 Consolidate start_scope/end_scope 2020-05-11 05:40:02 -07:00
John Keiser 4f25b6ac0c Move on_end_* to stage 2 code 2020-05-11 05:34:49 -07:00
John Keiser 3d5ed1a7e3 Move on_start_* to stage 2 code 2020-05-11 05:30:35 -07:00
John Keiser a03115a4a6 Move end_scope to stage 2 code 2020-05-11 05:24:12 -07:00
John Keiser 7219d28a31 Call end_scope directly from stage 2 code 2020-05-11 05:20:04 -07:00
John Keiser 0875bce68f Don't pass depth to on_end_* 2020-05-11 05:15:39 -07:00
John Keiser 54fe302907 Don't pass depth to end_scope 2020-05-11 05:06:41 -07:00
John Keiser edaa8f811f Move on_start_* depth management to stage 2 code 2020-05-11 05:03:25 -07:00
John Keiser 2c8fd109de Move increment_count to stage 2 2020-05-11 04:58:50 -07:00