203 Commits

Author SHA1 Message Date
Daniel Lemire bd2a31a0fe
Minor edits regarding the On Demand documentation. (#1384)
* Minor edits regarding the On Demand documentation.

* Adding more instructions for CMake

* Tweaking.

* Adding changes requested by John.

* Bringing back detailed explanations of -march=native.
2021-01-11 18:48:02 -05:00
John Keiser 17f4f82827 Ondemand usage docs (and associated tests)
Also disallowed parsing a temporary padded_string, since the JSON *must*
live through the whole parse.
2021-01-01 19:17:58 -08:00
John Keiser d91491bf13 Update documentation for out-of-order fields 2020-12-23 09:14:45 -08:00
John Keiser 2eaeac53e4 Revamp design documentation to match new design 2020-12-07 13:09:44 -08:00
Daniel Lemire 9304d88920
Prototype test for issue 1299: using parse_many, find the location of the end of the last document (#1301)
* Prototype test for issue 1299.

* This improves the documentation.

* Removing trailing white spaces.

* Removing trailing spaces

* Trailing.
2020-12-01 15:59:20 -05:00
Daniel Lemire 3fa40b8dc2
Adding an example corresponding to issue 1316 (documentation enhancement) (#1317)
* Adding an example.

* Updated other doc file.

* Trying to take into account @jkeiser's comments.

* Some people prefer empty final lines.
2020-11-27 17:40:29 -05:00
Paul Dreik def624a50c
update version tag (#1297) 2020-11-08 10:17:06 -05:00
Paul Dreik af4db55e66
remove trailing whitespace (#1284) 2020-11-03 21:48:09 +01:00
Danila Kutenin f46a0f64f2
PPC64 support (#1254)
* Initial PPC64 support

* Add travis CI

* Fix outdated cmake version for travis

* Fix indentation

* Try another workaround for outdated cmake in travis

* Try beta cmake

* Add dash before beta

* Use builtin snaps

* Use cmake as rocksdb

* Test cmake on bionic

* Remove unnecessary things from travis

* Remove unnecessary things from travis

* Another try of compiler install

* Add all major compilers

* Add all major compilers

* Add all major compilers

* Tweak travis a bit

* Typo

* More robust travis

* Typos typos typos

* Add fewer compilers, add non specific build for clang and gcc, should be the final config

* CMAKE_FLAGS is in incorrect place

* Remove default implementation

* Limit build thread number

* Fall back prefix_xor to a usual implementation, no performance boost is noticed

* Test for power9 as it is the main architecture for OpenPOWER right now

* Add to documentation to build with power9, as the implementation is compatible but the compiler optimizations are not

* Replace ARM with PPC in the comment
2020-10-27 18:43:39 -04:00
Jonathan Wakely 1fd0447dbb
Remove repeated words (#1252) 2020-10-26 20:41:01 -04:00
Daniel Lemire a75c07065f
Fix for issue 1246. We document the relationship between parser instances and elements (#1250)
* Fix for issue 1246.

* Adopting John's wording.
2020-10-26 08:40:45 -04:00
Daniel Lemire 14039d05a9
Adding a new benchmark for ondemand: distinct user id (#1239)
* Adding a distinct user id benchmark

* reenabling everything

* Removing an unnecessary "value()".

* Better tests of the examples and some fixes.

* Guarding exception code.
2020-10-23 08:47:01 -04:00
Daniel Lemire 0942dc0764
This fixes a typo and makes the types more explicit (#1241) 2020-10-20 17:41:37 -04:00
Daniel Lemire 0d6919dd99
Reenable the on-demand tests and allows us to convert a raw string into a C++ string. (#1232)
* Reenable the on-demand tests and allows us to convert a raw string into a C++ string.

* Fixing a 1-byte buffer overrun.

* More documentation.

* Adding more tests.

* Enabling the new tests

* Committing a nicer example.

* Not yet happy but this should fix our failures.

* Duh.

* Ok. Making it easier to get string_view instances from field instances.

* It is a struct.

* Trying to satisfy VS.

* Adopting John's name.
2020-10-19 20:22:24 -04:00
Daniel Lemire 0a907ec694
Tweaking further the documentation. (#1237)
* Tweaking further the documentation.

* More details.

* Another sentence.

* Saving.

* Tweaking more
2020-10-19 16:51:04 -04:00
Daniel Lemire 07a6e098c8
This would allow users to find out what builtin is. (#1227)
* This would allow users to find out what builtin is.

* Trying another approach.

* Added instructions.

* Cleaning up the printout.

* Let us be less invasive.

* Adding a comment.
2020-10-15 21:58:42 -04:00
Daniel Lemire 23026d966b Tweaking. 2020-10-15 21:55:23 -04:00
Daniel Lemire 3cd98df30d
This adds new tests regarding ordering. (#1233)
* This adds new tests regarding ordering.

* Updating the documentation with more examples.

* Adding compilation tests.

* Pruning code for exceptions.

* Guarding exceptionless.
2020-10-15 16:41:14 -04:00
Daniel Lemire 001be23258
Being more specific regarding the padding. (#1228)
* Being more specific regarding the padding.

* Even more precise.
2020-10-14 13:35:51 -04:00
Daniel Lemire c85b6682e0
This is a cleaner on-demand documentation (for discussion). (#1226)
* This is a cleaner on-demand documentation (for discussion).

* Added stable APIs.
2020-10-14 13:35:28 -04:00
Daniel Lemire ce94411dff Tweaking the documentation to better answer https://github.com/simdjson/simdjson/issues/1218 2020-10-09 10:02:56 -04:00
John Keiser a9480a768b
Merge pull request #947 from simdjson/jkeiser/stream-parse
On-Demand Parsing
2020-10-06 16:04:27 -07:00
Daniel Lemire 1f41cc2030
Making it clearer that parse_many is meant for *small* documents. (#1205)
* Making it clearer that parse_many is meant for *small* documents.

* Update parse_many.md
2020-10-06 17:19:34 -04:00
John Keiser 938678f87f Complete draft design doc 2020-10-06 11:29:45 -07:00
John Keiser 9dcf5fca5b Add ondemand rationale to beginning of document 2020-10-06 11:29:45 -07:00
John Keiser 4e3b4809ea [WIP] Nascent design doc for on demand 2020-10-04 12:47:29 -07:00
Daniel Lemire 9865bb6904
Make it possible to check that an implementation is supported at runtime (#1197)
* Make it possible to check that an implementation is supported at runtime.

* add CI fuzzing on arm 64 bit

This adds fuzzing on drone.io arm64

For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing.

Closes: #1188

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* Make it possible to check that an implementation is supported at runtime.

* Guarding the implementation accesses.

* Better doc.

* Updating cxxopts.

* We need to accommodate cxxopts

Co-authored-by: Paul Dreik <github@pauldreik.se>
2020-10-02 11:04:51 -04:00
Daniel Lemire f410213003
Improve documentation on padding
- Improves and clarifies the documentation on padding.
- Use std:: prefix for memcpy, strlen, etc.

Related to issues #1175 and #1178
2020-09-23 09:07:14 +02:00
Daniel Lemire 19cb5d57db
Some minor documentation fixes. (#1177) 2020-09-17 13:17:35 -04:00
Daniel Lemire 3e5497e2f9
Fixes issue 1170 and makes the usage of minify easier. (#1171)
* Fixes issue 1170 and makes the usage of minify easier.

* This should get the fallback implementation to detect unclosed strings.
2020-09-12 16:20:20 -04:00
Daniel Lemire c40aeaec3a
Fix for issue 1147 (#1153)
* This must be a typo

* Improving documentation of the string conversion.

* Minor update.
2020-09-03 13:18:15 -04:00
Daniel Lemire 5b10c38e43
Make parse_many safer. (#1137) 2020-08-20 22:22:46 -04:00
Daniel Lemire 3316df9195
Adding test for issue 1133 and improving documentation (#1134)
* Adding test.

* Saving.

* With exceptions.

* Added extensive tests.

* Better documentation.

* Tweaking CI

* Cleaning.

* Do not assume make.

* Let us make the build verbose

* Reorg

* I do not understand how circle ci works.

* Breaking it up.

* Better syntax.
2020-08-20 14:03:14 -04:00
Daniel Lemire 5d355f1a8b
release candidate (#1132) 2020-08-19 18:12:23 -04:00
Daniel Lemire a954d50ad4
This improves our documentation. (#1128)
* This improves our documentation.

* Removing tags for doxygen.

* You need a recent cmake remark.
2020-08-19 14:02:08 -04:00
Daniel Lemire 1ec710c985 Updating the documentation for hackers. 2020-08-19 10:59:24 -04:00
Daniel Lemire 09bd7e8ef8
Verification and fix for issue 1063 (JSON Pointers) (#1064)
* Specification is not followed.

* Fixes.

* Do not pass string_view by reference.

* Better documentation.

* The example is written for exceptions.

* Better documentation.

* Updating with deprecation.

* Updating example.

* Updating example.
2020-08-18 17:23:18 -04:00
Daniel Lemire 4a6eebc0e4
This corrects a small typo in the documentation. (#1121)
* This corrects a small typo in the documentation.

* Modifying the test as well.
2020-08-18 08:36:15 -04:00
John Keiser 1b69612246 Remove information about nonexistent computed gotos :) 2020-08-10 16:29:24 -07:00
Daniel Lemire ef45cd3342
Let us be explicit about standard compliance (#1099)
* Let us be explicit about standard compliance

* More explicit.
2020-08-06 18:24:36 -04:00
Daniel Lemire 2f92a34bb7
Turns out that passing dom::element by reference can be a performance killer. (#1086)
* Turns out that passing dom::element by reference can be a performance killer.

* Tweaking.
2020-08-01 10:31:47 -04:00
Daniel Lemire 268df9f67a
Update basics.md 2020-07-31 15:43:34 -04:00
Daniel Lemire 0ff6833e96
Update basics.md 2020-07-21 17:29:10 -04:00
Daniel Lemire ba58d868e5
Update performance.md 2020-07-14 15:00:31 -04:00
Ben McMorran c50799ba3b Fix TOC links in basics documentation
The "++" in "C++" gets stripped from the generated anchors, so the links in the table of contents didn't work.
2020-07-13 17:02:35 -04:00
Daniel Lemire 77e1e3cc18
Update performance.md 2020-07-12 18:35:15 -04:00
Daniel Lemire 7bdd41350a
Update performance.md 2020-07-12 18:31:45 -04:00
Daniel Lemire 62a39639c2
Update performance.md 2020-07-09 11:47:33 -04:00
Daniel Lemire 158aaff384
Update performance.md 2020-07-09 11:46:35 -04:00
Daniel Lemire fd836145fe
Update performance.md 2020-07-09 11:45:47 -04:00
Daniel Lemire 697bafdd0a
Update performance.md 2020-07-08 08:32:41 -04:00
Daniel Lemire 9675dcac44
Update performance.md 2020-07-06 19:03:18 -04:00
Daniel Lemire f7d99f97a3
Update performance.md 2020-07-04 11:52:40 -04:00
Daniel Lemire 8b7df0c12e
Update performance.md 2020-07-03 23:14:01 -04:00
Daniel Lemire bd780817f7
Update performance.md 2020-07-02 15:33:36 -04:00
Daniel Lemire b6f1f4ef64
Update basics.md 2020-06-29 21:41:50 -04:00
Daniel Lemire 1fd30db726
This example in our documentation would not compile (#1005)
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-29 16:25:11 -04:00
Daniel Lemire 4582a13360 Final steps. 2020-06-26 20:31:24 -04:00
Daniel Lemire 4c9f11b78a Missing character. 2020-06-25 10:15:13 -04:00
Daniel Lemire 5e690c5d04 Fixing the string_view issue. 2020-06-25 10:02:10 -04:00
Daniel Lemire 8f2a5649fe
Merge pull request #983 from TkTech/patch-1
Fix documentation links in basics.md
2020-06-24 20:44:46 -04:00
Daniel Lemire c3b25e12a5
Update implementation-selection.md 2020-06-24 20:42:04 -04:00
Daniel Lemire 6d3e33d440
Update parse_many.md 2020-06-24 20:41:38 -04:00
Daniel Lemire c11f7ce54f
Update performance.md 2020-06-24 20:41:06 -04:00
Tyler Kennedy 84806cc174
Fix documentation links in basics.md
Links to other files need to be either relative to themselves (doc/performance.md -> performance.md) or absolute (doc/performance.md -> /doc/performance.md). This change fixes the documentation when read on GitHub.
2020-06-24 20:20:14 -04:00
Daniel Lemire 3e35729eb6
Merge pull request #968 from simdjson/issue961
Fixing issue 961
2020-06-23 19:48:43 -04:00
Daniel Lemire 7e94309046
Update basics.md 2020-06-23 19:08:14 -04:00
Daniel Lemire c8a70a0a73 Tweaking the documentation. 2020-06-23 14:39:16 -04:00
Daniel Lemire b84a3a0230
Merge branch 'master' into issue961 2020-06-23 14:33:06 -04:00
Daniel Lemire 8cc9f496ee
Merge branch 'master' into dlemire/improving_documentation 2020-06-23 13:07:29 -04:00
Daniel Lemire 1547f2ec80 Pleasing John 2020-06-23 13:05:19 -04:00
John Keiser c650ea9765
Merge pull request #960 from simdjson/jkeiser/idiomatic-get
Convert simdjson to use .get()
2020-06-23 09:49:41 -07:00
John Keiser eef1171944
Merge pull request #954 from simdjson/jkeiser/parse-many-result
Return error from parse_many
2020-06-23 09:06:20 -07:00
John Keiser 12ccdcf858 Include document_stream line in parse_many docs 2020-06-23 08:49:47 -07:00
Daniel Lemire 696b0e29e4 Fixing issue 961 2020-06-23 10:47:32 -04:00
Daniel Lemire 5eb748ae17 This improves slightly the documentation, adding instructions for CMake users. 2020-06-23 09:33:15 -04:00
Daniel Lemire 89c2582376 Extending the documentation. 2020-06-22 16:32:00 -04:00
Daniel Lemire a76c67c19f Fixing... 2020-06-22 15:57:54 -04:00
John Keiser 1ff55c2729 Replace auto [x,error] with .get() everywhere 2020-06-21 16:26:59 -07:00
Daniel Lemire 38bb08778a With an example. 2020-06-21 17:57:22 -04:00
John Keiser 6fa5abcd7e Replace x.get<T>() with x.get(v) or T(x) 2020-06-21 14:36:38 -07:00
John Keiser a7fc7d4ffb Switch from get(v,e) to e = get(v) 2020-06-20 17:57:09 -07:00
John Keiser f336103f63 Convert tools/docs/benchmarks to bool get() idiom 2020-06-20 17:55:46 -07:00
John Keiser 56e2b38048 Add bool result from tie()/get(), get<T>(T&,error_code&) 2020-06-20 17:55:46 -07:00
Daniel Lemire 5ccdbef7d5
Merge pull request #936 from simdjson/dlemire/new_examples
New examples.
2020-06-18 18:29:06 -04:00
John Keiser f632e7c043 Put C++11 capable version back, change name to readme style 2020-06-18 12:50:49 -07:00
Daniel Lemire 3f00e79bcb
Merge branch 'master' into dlemire/better_doxygen_home_page 2020-06-17 16:02:49 -04:00
Daniel Lemire 14ceacac73 Tweaking. 2020-06-17 13:27:17 -04:00
Daniel Lemire 4474f8ef18 Cleaning a bit the examples. 2020-06-17 16:24:55 +00:00
Daniel Lemire b5ea504ad2 Tweaks doxygen so that we have a better main page. 2020-06-17 11:07:21 -04:00
Daniel Lemire 27a75a9085 Tweaking. 2020-06-15 17:54:34 -04:00
Daniel Lemire 954d6c326d New examples. 2020-06-15 17:45:15 -04:00
Daniel Lemire 16f41ea059 Added a word. 2020-06-14 18:48:42 -04:00
Daniel Lemire 0a7270fc29 More tweaks. 2020-06-14 18:47:22 -04:00
Daniel Lemire 23fbd9d004 Some tweaks. 2020-06-14 18:28:09 -04:00
John Keiser fd44c2a2ff
Merge pull request #927 from simdjson/dlemire/exposingthestringminifier
Exposing the string minifier.
2020-06-13 07:47:20 -07:00
John Keiser a86a82b39c Rename minify class to minifier so the minify() method is cleared up 2020-06-12 17:05:25 -07:00
Daniel Lemire 4dfbf98e4e
Using a worker instead of a thread per batch (#920)
In the parse_many function, one thread runs stage 1 while the main thread runs stage 2. So if stage 1 and stage 2 each take half the time, parse_many could run at twice the speed. It is unlikely to do so in practice. Still, we see benefits of about 40% due to threading.

To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches.

* This fixes our parse_stream benchmark, which is just busted.
* This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently.
This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time.

Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
2020-06-12 16:51:18 -04:00
Daniel Lemire be707dbb6f Added a remark 2020-06-12 16:07:34 -04:00
Daniel Lemire 45e2178ada Duh. 2020-06-11 17:20:28 +00:00