Commit Graph

486 Commits

Author SHA1 Message Date
Daniel Lemire 88da62ba09 Better documentation in the code. 2020-06-26 13:02:12 -04:00
Daniel Lemire b6997a56df Patching things up and adding tests. 2020-06-26 12:15:16 -04:00
Daniel Lemire 2956bce047 Minor fixes to avoid 32-bit warnings. 2020-06-25 21:12:26 -04:00
Brendan Knapp 41f33ecbb9 Permit 32-bit GCC compilation 2020-06-25 17:07:17 -07:00
Daniel Lemire 86241e2871
Merge pull request #987 from simdjson/issue985
Removing optional since it is not C++11, and it is not used
2020-06-25 11:04:36 -04:00
Daniel Lemire 1b63a9a9b5 Removing optional since it is not C++11 2020-06-25 10:25:57 -04:00
Daniel Lemire 32348c2b0b Elaborating. 2020-06-25 10:14:29 -04:00
Daniel Lemire 5e690c5d04 Fixing the string_view issue. 2020-06-25 10:02:10 -04:00
Daniel Lemire e01f1434fb Bumping up the version number 2020-06-23 20:55:52 -04:00
John Keiser 187084ce46
Merge pull request #970 from simdjson/jkeiser/singleheader-tests
Make singleheader tests be test-only
2020-06-23 17:07:03 -07:00
Daniel Lemire 544fa57641 Damn merge conflicts. 2020-06-23 19:15:47 -04:00
John Keiser d9929edbc1 Run -Weffc++ in CI 2020-06-23 13:44:25 -07:00
Daniel Lemire b84a3a0230
Merge branch 'master' into issue961 2020-06-23 14:33:06 -04:00
Daniel Lemire 49d70232f8
Merge pull request #969 from simdjson/dlemire/minor_pre0.4_cleaning
Very minor cleaning.
2020-06-23 14:30:47 -04:00
John Keiser 257089884f
Merge pull request #958 from simdjson/jkeiser/is
Make simdjson_result<element>.is() return bool
2020-06-23 09:51:37 -07:00
John Keiser c650ea9765
Merge pull request #960 from simdjson/jkeiser/idiomatic-get
Convert simdjson to use .get()
2020-06-23 09:49:41 -07:00
John Keiser e369d45b9c Fix non-compileable examples 2020-06-23 09:48:17 -07:00
John Keiser 2d84b6f6d9 Make simdjson_result<element>.is() return bool 2020-06-23 09:09:24 -07:00
John Keiser eef1171944
Merge pull request #954 from simdjson/jkeiser/parse-many-result
Return error from parse_many
2020-06-23 09:06:20 -07:00
Daniel Lemire f1a03bfb04 Very minor cleaning. 2020-06-23 11:05:58 -04:00
Daniel Lemire 696b0e29e4 Fixing issue 961 2020-06-23 10:47:32 -04:00
Daniel Lemire 33e003616d Fixing the name of the variable 2020-06-22 16:29:38 -04:00
Daniel Lemire bf03d77ab9 Passing by value the string_view 2020-06-22 16:28:35 -04:00
Daniel Lemire d6f056f266 Fixing documentation issues. 2020-06-22 16:17:11 -04:00
Daniel Lemire a76c67c19f Fixing... 2020-06-22 15:57:54 -04:00
John Keiser 1ff55c2729 Replace auto [x,error] with .get() everywhere 2020-06-21 16:26:59 -07:00
Daniel Lemire 5dbcdf1484 Ok 2020-06-21 17:52:30 -04:00
Daniel Lemire f03a6ab5a4 Tweaking. 2020-06-21 17:39:24 -04:00
John Keiser 6fa5abcd7e Replace x.get<T>() with x.get(v) or T(x) 2020-06-21 14:36:38 -07:00
Daniel Lemire 5dc07ed295 It builds. 2020-06-21 17:20:33 -04:00
John Keiser 1b1a122b1f Fix copy constructor issue on older gcc 2020-06-21 12:06:14 -07:00
John Keiser ae1bd891e7 Remove deprecated uses of parse_many 2020-06-21 11:19:06 -07:00
John Keiser 9899e5021d Allow use of document_stream with tie() 2020-06-20 21:15:05 -07:00
John Keiser 94440e0170 Return simdjson_result from load_many/parse_many 2020-06-20 20:51:53 -07:00
John Keiser a7fc7d4ffb Switch from get(v,e) to e = get(v) 2020-06-20 17:57:09 -07:00
John Keiser 56e2b38048 Add bool result from tie()/get(), get<T>(T&,error_code&) 2020-06-20 17:55:46 -07:00
John Keiser 1d8c2d6c22 Make get_xxx the primary functions 2020-06-20 13:29:12 -07:00
John Keiser 0b8c357eff Add get_X and is_X methods 2020-06-19 13:27:33 -07:00
John Keiser 05bc664c11 Don't extend from tape_ref in public classes 2020-06-19 13:25:52 -07:00
Daniel Lemire c13c2650a2
Merge pull request #940 from simdjson/issue938
Verifying (and fixing) issue 938
2020-06-18 18:25:31 -04:00
Daniel Lemire 2f6091419f
Merge pull request #944 from simdjson/issue680
Document the complexity of array.at
2020-06-18 18:24:08 -04:00
Daniel Lemire 2022dd7d74
Merge pull request #945 from simdjson/issue678
Fixing issue 678
2020-06-18 18:23:56 -04:00
Daniel Lemire ef688a74fe Minor tweak to the documentation. 2020-06-18 18:18:12 -04:00
Daniel Lemire 04a19f9813 Fixes https://github.com/simdjson/simdjson/issues/937 2020-06-17 18:06:13 -04:00
Daniel Lemire 2cbc591c9d Fixing issue 678 2020-06-17 16:17:17 -04:00
Daniel Lemire 3586fc4910 Fix for issue 680 2020-06-17 18:49:22 +00:00
Daniel Lemire 0b9df6d8c4 It turns out that we need fairly complicated logic. 2020-06-17 15:17:10 +00:00
Daniel Lemire 803b0c4bdb Light touch. 2020-06-17 11:00:13 -04:00
Daniel Lemire 0d4e501239 Fixing the bug. 2020-06-17 10:06:16 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, one thread does stage 1 while the main thread does stage 2. If stage 1 and stage 2 each took half the time, parse_many could run at twice the speed. It is unlikely to do so in practice. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

* This fixes our parse_stream benchmark, which is just busted.
* This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.

This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
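
To illustrate the worker approach described in commit 4dfbf98e4e above, here is a minimal sketch of a worker object that reuses a single thread across batches instead of creating one thread per batch. It is not simdjson's implementation: the names batch_worker, run, finish, and process_batches are hypothetical, and summing integers stands in for the real stage 1 work.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: one long-lived thread that runs one job at a time.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    thread_.join();  // a job still pending is completed before the thread exits
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand a job to the worker thread. Precondition: no job is in flight,
  // that is, finish() was called after the previous run().
  void run(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!has_job_);
      job_ = std::move(job);
      has_job_ = true;
    }
    cv_.notify_all();
  }

  // Block until the current job (if any) has completed.
  void finish() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return stopping_ || has_job_; });
      if (!has_job_) { return; }  // stopping, nothing left to do
      lock.unlock();
      job_();                     // run the job outside the lock
      lock.lock();
      has_job_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool stopping_ = false;
  std::thread thread_;  // declared last so the other members exist before it starts
};

// Hypothetical driver: the worker pre-processes batch i+1 ("stage 1")
// while the calling thread consumes batch i ("stage 2").
inline long process_batches(const std::vector<std::vector<int>> &batches) {
  std::vector<long> stage1(batches.size(), 0);
  long total = 0;
  batch_worker worker;  // a single thread, created once for all batches
  auto start_stage1 = [&](std::size_t i) {
    worker.run([&, i] {
      stage1[i] = std::accumulate(batches[i].begin(), batches[i].end(), 0L);
    });
  };
  if (!batches.empty()) { start_stage1(0); }
  for (std::size_t i = 0; i < batches.size(); i++) {
    worker.finish();                                      // stage 1 of batch i is done
    if (i + 1 < batches.size()) { start_stage1(i + 1); }  // overlaps with stage 2 below
    total += stage1[i];                                   // "stage 2" on the calling thread
  }
  return total;
}
```

In this sketch the thread is created once, in the batch_worker constructor, so the per-batch cost reduces to a mutex handoff and a condition-variable wake-up. That is the property the commit message relies on when it says smaller batches become worthwhile.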
John Keiser bbd61eb13f Let tape writing be put in a register 2020-06-12 09:18:20 -07:00
Daniel Lemire a6e4933d93 Exposing the string minifier. 2020-06-11 13:07:18 -04:00
John Keiser fe01da077e Make threaded version work again 2020-06-07 16:21:00 -07:00
John Keiser d43a4e9df9 Remove SUCCESS_AND_HAS_MORE (internal only value) 2020-06-07 16:20:55 -07:00
John Keiser ef63a84a3e Move document stream state to implementation 2020-06-07 16:20:44 -07:00
Daniel Lemire 7a69da16e4
Fixing issue 906 (#912)
* Fixing issue 906

* Safe patching.

* Now with explanations.

* Bumping up memory allocation.

* Putting the patch back.

* fallback fixes.

Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-05 15:37:09 -04:00
John Keiser b75fa26dc1 Move containing_scope and ret_address to .cpp 2020-06-01 12:15:55 -07:00
John Keiser 3d22a2d845 One weird trick: set a bogus error value in the parser impl
This makes us faster under both gcc and clang somehow.
2020-06-01 12:15:55 -07:00
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
Daniel Lemire 40d57da83c
fixes issue 891 (#893) 2020-05-20 11:54:53 -04:00
John Keiser e6c9dfbd91 Make include files more fine-grained 2020-05-19 14:42:04 -07:00
John Keiser 7ad4020829 Make main compilation chunks into .cpp files 2020-05-19 13:32:35 -07:00
John Keiser a476531524 Share ref_address everywhere it's used 2020-05-19 13:30:34 -07:00
Daniel Lemire e03c5e9f23
We should guard the include (#881) 2020-05-13 20:02:46 -04:00
John Keiser dbb3316511 Move current_string_buf_loc to stage 2 2020-05-11 06:11:32 -07:00
John Keiser cd6f204c77 Move write_tape() to stage 2 code 2020-05-11 06:09:48 -07:00
John Keiser 269131ed21 Move on_number_* to stage 2 code 2020-05-11 06:04:54 -07:00
John Keiser 65d784e88e Move on_start/end_string to stage 2 code 2020-05-11 05:49:40 -07:00
John Keiser 35afb6cae0 Move on_error, on_success to stage 2 code 2020-05-11 05:46:18 -07:00
John Keiser 4f25b6ac0c Move on_end_* to stage 2 code 2020-05-11 05:34:49 -07:00
John Keiser 3d5ed1a7e3 Move on_start_* to stage 2 code 2020-05-11 05:30:35 -07:00
John Keiser a03115a4a6 Move end_scope to stage 2 code 2020-05-11 05:24:12 -07:00
John Keiser 7219d28a31 Call end_scope directly from stage 2 code 2020-05-11 05:20:04 -07:00
John Keiser 0875bce68f Don't pass depth to on_end_* 2020-05-11 05:15:39 -07:00
John Keiser 54fe302907 Don't pass depth to end_scope 2020-05-11 05:06:41 -07:00
John Keiser edaa8f811f Move on_start_* depth management to stage 2 code 2020-05-11 05:03:25 -07:00
John Keiser 2c8fd109de Move increment_count to stage 2 2020-05-11 04:58:50 -07:00
John Keiser 16d88cc095 Don't pass depth to increment_count 2020-05-11 04:15:02 -07:00
Daniel Lemire 2a6e6b3dbd
Cleaning string_view (#872)
* Cleaning string_view

* Corrected typo

* Alignment.
2020-05-10 16:05:52 -04:00
John Keiser afb369950c Disable Intellisense-only warnings in simdjson.h/cpp 2020-05-04 11:47:04 -07:00
John Keiser 1d06624d38 Unset /D_CRT_SECURE_NO_WARNINGS
- Also localize DISABLE_DEPRECATED_WARNING so that we catch other
  deprecations
2020-05-04 11:35:05 -07:00
Pavel P d40069a018 Disable deprecation warnings for VS builds
fopen/getenv are standard C++ functions that are not deprecated.
2020-05-04 11:34:00 -07:00
Furkan Usta e04cbd71d0 Only install singleheader/simdjson.h as part of the public API 2020-05-02 01:44:11 +03:00
Daniel Lemire fc1ddcd2f8
Faster case-insensitive comparisons. (#837)
* Faster case-insensitive comparisons.
2020-04-30 15:52:28 -04:00
Furkan Usta 73d7d704c1 CMake: Remove export_private_library
Since we are exporting all the targets as part of the main simdjson target we do not need private
exports anymore
2020-04-30 02:06:19 +03:00
Furkan Usta eee07e6cfd Use the same export name for all targets 2020-04-29 23:47:27 +03:00
Nong Li 0f9dbf84b7
Fix incorrect check for case insensitive key lookup (#824) 2020-04-29 13:55:28 -04:00
Daniel Lemire 2a1f8fa8f1
Provides support for clang under Windows. (#817) 2020-04-27 22:09:27 -04:00
John Keiser 49da7e74cd
usage.md -> basics.md (#823) 2020-04-27 16:03:19 -04:00
PavelP 0514588175
Improves clang-cl build with Visual Studio (#809) 2020-04-27 08:59:32 -04:00
Daniel Lemire b99a7344c9
missing spaces. 2020-04-25 22:26:18 -04:00
Daniel Lemire f3ac0be0e6 Merge branch 'master' of github.com:simdjson/simdjson 2020-04-23 18:39:56 -04:00
Daniel Lemire 18c9468af5 Fixed typo 2020-04-23 18:39:32 -04:00
ostri d4239aaa8f
default initialisation (#779)
* padded_string.* default initialisation
parsedjson_iterator - copy constructor; depth_index not necessary
2020-04-23 18:32:11 -04:00
Daniel Lemire 4d0c7d706d
Warn 32-bit users about their doom. (#783) 2020-04-23 16:01:19 -04:00
Daniel Lemire 382392e03b
This should enable -Weffc++ (#777)
* Enabling -Weffc++
2020-04-23 13:03:04 -04:00