Commit Graph

96 Commits

Author SHA1 Message Date
Nicolas Boyer 5c590b8434
Bringing ndjson(document_stream) to On Demand (#1643)
* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

* Add document_stream constructors and iterate_many

* Attempt to implement streaming.

* Kind of fixed next() for getting next document

* Temporary save.

* Putting in working order.

* Add working doc_index and add function next_document()

* Attempt to implement streaming.

* Re-anchoring json_iterator after a call to stage 1

* I am convinced it should be a 'while'.

* Add source() with test.

* Add truncated_bytes().

* Fix casting issues.

* Fix old style cast.

* Fix privacy issue.

* Fix privacy issues.

* Again

* .

* Add more tests. Add error() for iterator class.

* Fix source() to not include whitespace between documents.

* Fixing CI.

* Fix source() for multiple batches. Add new tests.

* Fix batch_start when document has leading spaces. Add new tests for that.

* Add new tests.

* Temporary save.

* Working hacky multithread version.

* Small fix in header files.

* Correct version (not working).

* Adding a move assignment to ondemand::parser.

* Fix attempt by changing std::swap.

* Moving DEFAULT_BATCH_SIZE and MINIMAL_BATCH_SIZE.

* Update doc and readme tests.

* Update basics.md

* Update readme_examples tests.

* Fix exceptions in test.

* Partial setup for amazon_cellphones.

* Benchmark with vectors.

* Benchmark with maps

* With vectors again.

* Fix for weighted average.

* DOM benchmark.

* Fix typos. Add On Demand benchmark.

* Add large amazon_cellphones benchmark for DOM

* Add benchmark for On demand.

* Fix broken read_me test.

* Add parser.threaded to enable/disable thread usage.

Co-authored-by: Daniel Lemire <lemire@gmail.com>
2021-07-20 14:17:23 -04:00
Daniel Lemire 2dac3705d2
renames 'to_string' to 'to_json_string' and makes it ridiculously fast (#1642)
* Changing the name of the function to 'to_json_string' from 'to_string' to avoid confusion.

* Moving to a fast string_view model

* Making it exception-safe.

* Tweaking.

* Workaround for exceptions.

* more robust to_json_string (#1651)

* WIP.

* Fuzzing timeout  (bug fix) (#1650)

* prove pull request #1648 introduces an infinite loop

* Interesting bug!

* Tweak.

Co-authored-by: Paul Dreik <github@pauldreik.se>

* It should now work.

* Moving car examples to exception mode

* Simplifying somewhat.

* I forgot to abandon. Let us do that.

* Adding more tests.

Co-authored-by: Paul Dreik <github@pauldreik.se>
2021-07-19 10:24:36 -04:00
Nicolas Boyer eb849662c0
Update basic.md to document JSON pointer for On Demand. (#1618)
* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

Co-authored-by: Daniel Lemire <lemire@gmail.com>
2021-06-26 11:38:17 -04:00
Daniel Lemire f146294a85
Partial documentation regarding relative JSON pointers. (#1630)
* Attempt at bringing some sanity to partial/relative JSON pointers.

* Removing some white spaces.
2021-06-26 11:36:38 -04:00
Daniel Lemire 5b99a75ae1
count_elements did not like empty arrays. (#1631)
* count_elements did not like empty arrays.

* Minor cleaning.

* I don't understand.

* More cleaning.
2021-06-24 11:08:13 -04:00
Daniel Lemire cfe3adb599
Added tests over invalid documents. (#1626)
* Added tests over invalid documents.

* Tweaking.
2021-06-23 18:02:00 -04:00
Daniel Lemire 1c01fc35eb
This better documents invalidation. (#1625)
* This better documents invalidation.

* Tweak.
2021-06-22 11:33:25 -04:00
Nicolas Boyer ce38fe7bea
Add automatic rewind for at_pointer (#1624) 2021-06-21 15:17:24 -04:00
Nicolas Boyer a4803d50c5
Add JSON Pointer for On Demand (#1615)
* Add working JSON pointer for array of atoms.

* Add working JSON pointer for object with key-atom pairs.

* Add first version of JSON pointer.

* Update tests (2 tests).

* Make tests exceptionless.

* Fix build issues.

* Add more tests. Add json_pointer validation in array-inl.h and object-inl.h and empty json_pointer in document-inl.h.

* Fix errors in tests.

* Review.

* Add missing comment.
2021-06-11 14:20:05 -04:00
Nicolas Boyer 3ba221eb8e
Add max_capacity setting for On Demand (#1610)
* First try at implementing max_capacity for simdjson_ondemand.

* Add max_capacity check.

* Update doc.

* Add one more example in doc for fixed capacity.

* Make allocate() public.

* Remove whitespace

* Found culprit whitespace.

* Duplicating variable.
2021-06-08 14:42:42 -04:00
Daniel Lemire 13ab123daf
Testing issue 1607. (#1608) 2021-06-07 10:50:48 -04:00
Daniel Lemire 16e8db1f17
Adding 'count_elements' method. (#1577)
* Adding 'count_elements' method.

* Actually reporting errors.

* removing white space.

* Removing white space again.

* Adding an extra example.

* Prettier.

* Making the functionality more error-proof.

* Avoiding exceptions.

* Various fixes including extending count_elements to value types.

* Various fixes.

* Minor fixes.

* Correcting comment.

* Trimming white spaces.
2021-06-06 17:56:00 -04:00
Daniel Lemire eb0ae041e3
Verification and bug fix of issue 1511 (#1602)
* Verification and bug fix.

* Removing comment.

* Removing spaces.

* Guarding exceptions.

* Tweaking the test
2021-06-06 17:55:33 -04:00
Daniel Lemire 19c3b1315a
Rewind functionality. (#1539)
* Rewind functionality.


* Keeping just the document rewind.
2021-06-04 09:22:33 -04:00
Daniel Lemire f44a53271d
Documentation for issue 1562 (Accessing escaped key with on-demand API) (#1563)
* Documentation for issue 1562.

* Making exception-free.

* Improving wording.
2021-06-04 09:21:52 -04:00
Daniel Lemire 1032f70ddf
Verifies and fixes issue 1588 (#1589)
* Verifies and fixes issue 1588

* Removing a trailing space.
2021-05-27 19:35:42 -04:00
strager 16e2323153
Fix UB in dev checks when iterating empty object (#1587)
When find_field_unordered is used on an empty object, it calls
json_iterator::reenter_child. reenter_child asserts that it doesn't
rewind too far back by consulting parser->start_positions.

When the On Demand parser sees an empty object, it fails to update
parser->start_positions. This means that the assertion in
json_iterator::reenter_child reads stale data, or potentially
uninitialized memory. Reading uninitialized memory can cause spurious
assertion failures and Valgrind memcheck reports:

    Running missing_keys_for_empty_top_level_object ...
    ==170679== Conditional jump or move depends on uninitialised value(s)
    ==170679==    at 0x4943D7: reenter_child (json_iterator-inl.h:208)
    ==170679==    by 0x4943D7: find_field_unordered_raw (value_iterator-inl.h:197)
    ==170679==    by 0x4943D7: find_field_unordered (object-inl.h:13)
    ==170679==    by 0x4943D7: find_field_unordered (object-inl.h:96)
    ==170679==    by 0x4943D7: find_field_unordered (value-inl.h:110)
    ==170679==    by 0x4943D7: find_field_unordered (document-inl.h:105)
    ==170679==    by 0x4943D7: object_tests::missing_keys_for_empty_top_level_object() (ondemand_object_tests.cpp:117)
    ==170679==    by 0x4CA761: object_tests::run() (ondemand_object_tests.cpp:1085)
    ==170679==    by 0x8BA314: int test_main<bool ()>(int, char**, bool ( const&)()) (test_ondemand.h:81)
    ==170679==    by 0x4CA9C8: main (ondemand_object_tests.cpp:1119)
    ==170679==

Fix the read of uninitialized or stale memory by updating
parser->start_positions regardless of whether we see an empty object or
an object with some keys.

This commit only affects builds where development checks
(SIMDJSON_DEVELOPMENT_CHECKS) are enabled. Builds where development
checks are disabled are unaffected by this bug.
2021-05-27 08:34:28 -04:00
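
The failure mode described in the commit above (#1587) can be illustrated with a small toy model. This is only a sketch: `ToyIterator`, `kUnset`, `start_object_buggy`, `start_object_fixed`, and `can_validate_rewind` are hypothetical names invented for illustration, not simdjson's types or functions, and a sentinel value stands in for the stale or uninitialized memory so that the behaviour is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these names are simdjson's real types.
// kUnset stands in for "never written" (stale or uninitialized in real code).
constexpr std::size_t kUnset = static_cast<std::size_t>(-1);

struct ToyIterator {
  // One recorded start position per nesting depth.
  std::vector<std::size_t> start_positions;

  explicit ToyIterator(std::size_t max_depth)
      : start_positions(max_depth, kUnset) {}

  // Buggy variant: the start position is recorded only for non-empty objects.
  void start_object_buggy(std::size_t depth, std::size_t position, bool is_empty) {
    if (!is_empty) { start_positions[depth] = position; }
  }

  // Fixed variant: the start position is recorded whether or not the object is empty.
  void start_object_fixed(std::size_t depth, std::size_t position, bool /*is_empty*/) {
    start_positions[depth] = position;
  }

  // Stand-in for the development check: a rewind can only be validated
  // when a start position was actually recorded for this depth.
  bool can_validate_rewind(std::size_t depth, std::size_t position) const {
    return start_positions[depth] != kUnset && position >= start_positions[depth];
  }
};
```

In this model, calling `start_object_buggy(1, 10, true)` leaves slot 1 unset, so the later check has nothing valid to compare against, whereas `start_object_fixed(1, 10, true)` records the position and the check succeeds.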
Daniel Lemire 4fb09824bf
Restricting how we can end key searches (#1575)
* Verifies bug with missing keys.

* Allowing search from any key.

* Workaround for buggy msys

* Restricting how we can end key searches.

* Adding a few tests.
2021-05-20 16:23:38 -04:00
Daniel Lemire ad1cd6a2ce
Documenting raw string access. (#1566)
* Documenting raw string access.

* Removing trailing space.
2021-05-20 13:57:48 -04:00
Daniel Lemire a27367210a
Improving how to_string is explained. (#1583) 2021-05-20 11:22:31 -04:00
Daniel Lemire efe9761f80
Fixing issue 1579. (#1580) 2021-05-19 12:23:17 -04:00
Daniel Lemire c1dffac28c
This moves all DOM (benchmark + test) files to a subdir (#1549)
* This moves all DOM (benchmark + test) files to a subdir

* Missing file.

* CMake + DLL is not pretty.

* Capitalizing AND

* Fixing mismatch endif

* Flipping the order.

* onedemand => ondemand
2021-04-30 18:33:45 -04:00
Daniel Lemire b3a22bea56
My third attempt at fixing issue 1521 (not being merged due to performance concerns) (#1530)
* Reduction of the missing-key bug.

* Adding the other test cases.

* Really simple fix for 1529
2021-04-05 11:55:39 -04:00
Daniel Lemire d0821adf0e
This implements string serialization for On Demand instances. (#1527)
* This implements string serialization for On Demand instances.

* Adding more documentation.

* Another remark.

* Marking the new functions as inline.

* casts apparently do not work.

* Upgrading the API.

* Making the code really free from exceptions.

* Another fix for exceptionless usage.

* Modify to_chars so that it does not pad integers with '.0'.

* Negative 0 cannot be expressed as an integer.

* Again, accommodating exceptionless usage.

* Using x <= -0 does not allow you to determine the sign since 0 <= -0. I am not sure where
this bug comes from.
2021-04-01 11:25:00 -04:00
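
The last note in the commit above (#1527) observes that a comparison against negative zero cannot reveal the sign, because positive and negative zero compare equal. A minimal sketch of the issue follows, using the standard `std::signbit` as one way to read the sign. The function names are made up for illustration, and this is not necessarily how the library itself resolves the problem.

```cpp
#include <cassert>
#include <cmath>

// Naive sign test: +0.0 and -0.0 compare equal, so this returns true for both.
bool is_negative_naive(double x) { return x <= -0.0; }

// Reads the sign bit directly, so it distinguishes -0.0 from +0.0.
bool is_negative(double x) { return std::signbit(x); }

// Small self-check of the behaviour described above.
void demo_negative_zero() {
  assert(is_negative_naive(0.0));   // false positive: +0.0 looks "negative"
  assert(is_negative_naive(-0.0));
  assert(!is_negative(0.0));        // the sign bit tells them apart
  assert(is_negative(-0.0));
}
```

This assumes IEEE 754 style floating point with default compiler settings. Aggressive floating-point optimization flags may treat signed zeros differently.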
Daniel Lemire a6576f1d09
We should be able to open empty files (paranoid test) (#1519)
* We should be able to open empty files.

* Testing also the ondemand API.
2021-03-26 11:43:40 -04:00
Daniel Lemire 95b4870e20
Avoiding stack allocation. (#1515) 2021-03-23 11:32:04 -04:00
Daniel Lemire ddf610125f
Easy fix. (#1507) 2021-03-19 19:53:22 -04:00
Daniel Lemire 8e8fbc4cff
fixing issue 1480 (#1485) 2021-03-08 19:31:42 -05:00
John Keiser bad582c2d3 Add value.raw_json_token() 2021-03-05 09:07:41 -08:00
John Keiser f0e92e3bdd Pass "capacity" straight to iterate, support std::string 2021-03-03 12:51:00 -08:00
John Keiser 3db1a214ce Support user-provided buffers via promise_padded 2021-03-03 12:50:56 -08:00
John Keiser 9944db6d73 Move json_type to ondemand to prevent target mismatch inline errors 2021-03-02 18:31:17 -08:00
John Keiser 2ed24666b5 Add value.type() 2021-03-02 17:02:50 -08:00
John Keiser bcab8d3abf Check for end object/array at top level
This avoids a very unlikely buffer overrun that can occur in a particular kind of invalid JSON:
- the document is invalid with an unclosed top level array or object
- the last thing in the document is a number that ends at EOF
- the padding is filled entirely with numeric digits
2021-02-22 09:35:21 -08:00
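
The three conditions listed in the commit above can be reproduced in a toy setting. The sketch below is an assumption-laden illustration, not simdjson code: `digits_from` and `top_level_is_closed` are hypothetical helpers, and the bounded loop exists only to keep the toy itself safe. It shows that a digit scan relying on a non-digit terminator runs past the end of the document when the padding is all digits, and that checking for the closing bracket or brace up front rejects such input before any number is scanned.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Counts consecutive digits starting at `start`. The scan has no notion of
// where the document ends; it stops only at a non-digit (or at the end of
// the whole buffer, a bound added here so the toy cannot overrun).
std::size_t digits_from(const std::string& buffer, std::size_t start) {
  std::size_t n = 0;
  while (start + n < buffer.size() &&
         std::isdigit(static_cast<unsigned char>(buffer[start + n]))) {
    ++n;
  }
  return n;
}

// Returns false when the document opens a top-level array or object but its
// last non-whitespace character is not the matching closer.
bool top_level_is_closed(const std::string& doc) {
  const char* ws = " \t\n\r";
  const std::size_t first = doc.find_first_not_of(ws);
  if (first == std::string::npos) { return true; }  // nothing to check
  const std::size_t last = doc.find_last_not_of(ws);
  if (doc[first] == '[') { return doc[last] == ']'; }
  if (doc[first] == '{') { return doc[last] == '}'; }
  return true;  // top-level scalar: no container to close
}
```

For the document `[123` followed by four bytes of padding filled with `0`, `digits_from` starting at index 1 reports 7 digits although the document holds only 3, while `top_level_is_closed("[123")` returns false and flags the input early.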
John Keiser 74d6658f39 Make out of order iteration tests actually test errors in the loop 2021-02-21 11:43:36 -08:00
John Keiser 3076de0405 Use SIMDJSON_DEVELOPMENT_CHECKS instead of SIMDJSON_PRODUCTION
Don't enable in retail
2021-02-20 11:46:01 -08:00
John Keiser 4a0a0ed4c6 Split more tests into separate methods 2021-02-20 11:22:24 -08:00
John Keiser 9651efe626 Split up tests for compile times 2021-02-06 11:07:14 -08:00
John Keiser 14315ec5cd Default SIMDJSON_PRODUCTION to OFF for bare header usage 2021-02-06 11:06:37 -08:00
John Keiser a33bf40a7d Add tests for sibling indexing detection 2021-02-05 16:39:52 -08:00
John Keiser 3801ea7777 Disable all OUT_OF_ORDER_ITERATION checks when SIMDJSON_API_USAGE_CHECKS
is off
2021-02-05 16:39:44 -08:00
John Keiser e4626d233c Descend into fields at the value position, not the key 2021-02-05 10:18:01 -08:00
John Keiser 9934f65987 Store start index of each depth for safety 2021-02-05 10:17:28 -08:00
John Keiser 1bfbb6448a Check out-of-order error in object index 2021-01-26 20:49:14 -08:00
John Keiser fe726b0f80 Split up ondemand_dom_api_tests for sanitize build times 2021-01-26 19:42:37 -08:00
John Keiser 18ecc0032d Reenable test that is now working 2021-01-26 15:15:09 -08:00
John Keiser 1a1532c8cc Return INCORRECT_TYPE when numbers fail to parse
Also add tests for trying to get multiple types in a row
2021-01-26 14:59:13 -08:00
John Keiser e6d2b7759a Fix assertion when getting array after failing to get a scalar
Also remove distinction between & and && for array start, acting like
other types
2021-01-26 14:09:54 -08:00
Daniel Lemire 2a714f4e37
Hide the std::pair inheritance in our result instances (#1396)
* Fixing issue 1243

* The tie must go.

* Having std::pair be a protected inheritance breaks on demand.

* Putting it back.

* You really want to use emplace.

* Fixing one botched test.

* Prettier test.

* Using safer code.

* Fixing unsafe code.

* Simplifying the fuzzer.

* Trying another way.

* Ok. It should work without exceptions.

* Removing trailing spaces.
2021-01-18 12:00:02 -05:00
John Keiser 55faf4c5bc
Recommend simdjson::ondemand over simdjson::builtin::ondemand (#1380)
Co-authored-by: Daniel Lemire <lemire@gmail.com>
2021-01-14 17:33:49 -05:00