Commit Graph

680 Commits

Author SHA1 Message Date
Daniel Lemire eb93b98d6a
verify and fix issue 1668 (#1673)
* Adding test.

* Verifies and fixes issue 1668. This commit changes the On Demand stream support
so that it returns a value type (document_reference) instead of a reference to a
document. This lets it integrate with the usual simdjson error system and its
simdjson_result types.

* Minor reformat.

* Adds a test with initial tests passing.

* Adding an example.
2021-07-27 08:51:07 -04:00
Nicolas Boyer 7d887fdc1e
Parse numbers inside strings (#1667)
* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

* Naive implementation for doubles in string.

* Add double from string in atom doc.

* Simplification (removed all *_from_string())

* Add int and uint parsing in string.

* Make duplicates instead.

* Make tests exceptionless.

* Add missing declarations.

* Add more tests (errors, JSON pointer).

* Add crypto json tests.

* Update doc.

* Update doc after review.

Co-authored-by: Daniel Lemire <lemire@gmail.com>
2021-07-27 08:50:44 -04:00
Daniel Lemire b79261eebc
This cleans up the current code a bit, especially with respect to EOF guards. (#1669)
* Upgrading the GitHub Actions.

* Upgrading appveyor

* Upgrading circle ci.

* Cleaning.
2021-07-25 10:36:22 -04:00
Daniel Lemire 47a62db559
Isolated jkeiser fix for issue 1632: make it so that INCORRECT_TYPE is a recoverable condition in On Demand (#1663) 2021-07-23 11:32:26 -04:00
Nicolas Boyer 5c590b8434
Bringing ndjson (document_stream) to On Demand (#1643)
* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

* Add document_stream constructors and iterate_many

* Attempt to implement streaming.

* Kind of fixed next() for getting next document

* Temporary save.

* Putting in working order.

* Add working doc_index and add function next_document()

* Attempt to implement streaming.

* Re-anchoring json_iterator after a call to stage 1

* I am convinced it should be a 'while'.

* Add source() with test.

* Add truncated_bytes().

* Fix casting issues.

* Fix old style cast.

* Fix privacy issue.

* Fix privacy issues.

* Again

* .

* Add more tests. Add error() for iterator class.

* Fix source() so that it does not include whitespace between documents.

* Fixing CI.

* Fix source() for multiple batches. Add new tests.

* Fix batch_start when document has leading spaces. Add new tests for that.

* Add new tests.

* Temporary save.

* Working hacky multithread version.

* Small fix in header files.

* Correct version (not working).

* Adding a move assignment to ondemand::parser.

* Fix attempt by changing std::swap.

* Moving DEFAULT_BATCH_SIZE and MINIMAL_BATCH_SIZE.

* Update doc and readme tests.

* Update basics.md

* Update readme_examples tests.

* Fix exceptions in test.

* Partial setup for amazon_cellphones.

* Benchmark with vectors.

* Benchmark with maps

* With vectors again.

* Fix for weighted average.

* DOM benchmark.

* Fix typos. Add On Demand benchmark.

* Add large amazon_cellphones benchmark for DOM

* Add benchmark for On Demand.

* Fix broken read_me test.

* Add parser.threaded to enable/disable thread usage.

Co-authored-by: Daniel Lemire <lemire@gmail.com>
2021-07-20 14:17:23 -04:00
Daniel Lemire 2dac3705d2
renames 'to_string' to 'to_json_string' and makes it ridiculously fast (#1642)
* Changing the name of the function to 'to_json_string' from 'to_string' to avoid confusion.

* Moving to a fast string_view model

* Making it exception-safe.

* Tweaking.

* Workaround for exceptions.

* more robust to_json_string (#1651)

* WIP.

* Fuzzing timeout (bug fix) (#1650)

* prove pull request #1648 introduces an infinite loop

* Interesting bug!

* Tweak.

Co-authored-by: Paul Dreik <github@pauldreik.se>

* It should now work.

* Moving car examples to exception mode

* Simplifying somewhat.

* I forgot to abandon. Let us do that.

* Adding more tests.

* WIP.

* It should now work.

* Moving car examples to exception mode

* Simplifying somewhat.

* I forgot to abandon. Let us do that.

* Adding more tests.

Co-authored-by: Paul Dreik <github@pauldreik.se>
2021-07-19 10:24:36 -04:00
Daniel Lemire bea1483cde
Fixing minor issue with document stream (DOM). (#1648)
* Fixing minor issue with document stream (DOM).

* Porting over the fix.
2021-07-05 17:40:04 -04:00
Daniel Lemire 7e646efd0f
This should print out (once) some instructions to interpret the logging traces. (#1637)
* This should print out (once) some instructions to interpret the logging traces.

* More details.
2021-06-26 11:38:38 -04:00
Nicolas Boyer eb849662c0
Update basic.md to document JSON pointer for On Demand. (#1618)
* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

Co-authored-by: Daniel Lemire <lemire@gmail.com>
2021-06-26 11:38:17 -04:00
Daniel Lemire f146294a85
Partial documentation regarding relative JSON pointers. (#1630)
* Attempt at bringing some sanity to partial/relative JSON pointers.

* Removing some white spaces.
2021-06-26 11:36:38 -04:00
Daniel Lemire 374de826ab
This introduces a reset functionality for object and array containers (#1639)
* This introduces a reset functionality.

* Minor simplification.

* Tweaking further.

* This should fix the tests.
2021-06-26 11:33:37 -04:00
Daniel Lemire 1fd3e32051
Removes is_at_container_start() and documents is_at_iterator_start(), move_at_start(), enter_at_container_start() (#1638)
* Removes is_at_container_start() and documents is_at_iterator_start()

* More documentation.
2021-06-25 13:26:43 -04:00
Daniel Lemire 5b99a75ae1
count_elements did not like empty arrays. (#1631)
* count_elements did not like empty arrays.

* Minor cleaning.

* I don't understand.

* More cleaning.
2021-06-24 11:08:13 -04:00
Daniel Lemire cfe3adb599
Added tests over invalid documents. (#1626)
* Added tests over invalid documents.

* Tweaking.
2021-06-23 18:02:00 -04:00
Daniel Lemire 1c01fc35eb
This better documents invalidation. (#1625)
* This better documents invalidation.

* Tweak.
2021-06-22 11:33:25 -04:00
Nicolas Boyer ce38fe7bea
Add automatic rewind for at_pointer (#1624) 2021-06-21 15:17:24 -04:00
Nicolas Boyer a4803d50c5
Add JSON Pointer for On Demand (#1615)
* Add working JSON pointer for array of atoms.

* Add working JSON pointer for object with key-atom pairs.

* Add first version of JSON pointer.

* Update tests (2 tests).

* Make tests exceptionless.

* Fix building issues.

* Add more tests. Add json_pointer validation in array-inl.h and object-inl.h and empty json_pointer in document-inl.h.

* Fix errors in tests.

* Review.

* Add missing comment.
2021-06-11 14:20:05 -04:00
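For context on the entry above: a JSON Pointer (RFC 6901) such as `/0/name` is a sequence of `/`-separated reference tokens, where `~1` stands for `/` and `~0` for `~`, and the empty pointer designates the whole document. The sketch below splits and unescapes a pointer under those rules only. The function name `split_json_pointer` is hypothetical; this is not simdjson's `at_pointer` implementation, which navigates the document directly rather than building a token list.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits an RFC 6901 JSON Pointer into unescaped
// reference tokens. Returns std::nullopt for a malformed pointer.
std::optional<std::vector<std::string>> split_json_pointer(std::string_view pointer) {
  std::vector<std::string> tokens;
  if (pointer.empty()) { return tokens; }          // "" designates the whole document
  if (pointer[0] != '/') { return std::nullopt; }  // non-empty pointers start with '/'
  std::string current;
  for (std::size_t i = 1; i < pointer.size(); ++i) {
    char c = pointer[i];
    if (c == '/') {
      tokens.push_back(current);  // end of one reference token
      current.clear();
    } else if (c == '~') {
      // '~' must be followed by '0' or '1'; a single left-to-right scan
      // decodes "~01" as "~1", as the RFC requires.
      if (i + 1 >= pointer.size()) { return std::nullopt; }
      char next = pointer[++i];
      if (next == '0') { current += '~'; }
      else if (next == '1') { current += '/'; }
      else { return std::nullopt; }
    } else {
      current += c;
    }
  }
  tokens.push_back(current);  // last token (may be empty, e.g. for "/")
  return tokens;
}
```

With this sketch, `/a~1b/m~0n` yields the tokens `a/b` and `m~n`, and `/` yields a single empty token (the key `""`).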
Nicolas Boyer 3ba221eb8e
Add max_capacity setting for On Demand (#1610)
* First try at implementing max_capacity for simdjson_ondemand.

* Add max_capacity check.

* Update doc.

* Add one more example in doc for fixed capacity.

* Make allocate() public.

* Remove whitespace

* Found culprit whitespace.

* Duplicating variable.
2021-06-08 14:42:42 -04:00
Daniel Lemire 16e8db1f17
Adding 'count_elements' method. (#1577)
* Adding 'count_elements' method.

* Actually reporting errors.

* removing white space.

* Removing white space again.

* Adding an extra example.

* Prettier.

* Making the functionality more error-proof.

* Avoiding exceptions.

* Various fixes including extending count_elements to value types.

* Various fixes.

* Minor fixes.

* Correcting comment.

* Trimming white spaces.
2021-06-06 17:56:00 -04:00
Daniel Lemire eb0ae041e3
Verification and bug fix of issue 1511 (#1602)
* Verification and bug fix.

* Removing comment.

* Removing spaces.

* Guarding exceptions.

* Tweaking the test
2021-06-06 17:55:33 -04:00
Daniel Lemire 19c3b1315a
Rewind functionality. (#1539)
* Rewind functionality.


* Keeping just the document rewind.
2021-06-04 09:22:33 -04:00
Daniel Lemire 5d2eca2363 Correcting a couple of typographic errors. 2021-06-01 13:59:32 -04:00
Daniel Lemire 1032f70ddf
Verifies and fixes issue 1588 (#1589)
* Verifies and fixes issue 1588

* Removing a trailing space.
2021-05-27 19:35:42 -04:00
strager 16e2323153
Fix UB in dev checks when iterating empty object (#1587)
When find_field_unordered is used on an empty object, it calls
json_iterator::reenter_child. reenter_child asserts that it doesn't
rewind too far back by consulting parser->start_positions.

When the On Demand parser sees an empty object, it fails to update
parser->start_positions. This means that the assertion in
json_iterator::reenter_child reads stale data, or potentially
uninitialized memory. Reading uninitialized memory can cause spurious
assertion failures and Valgrind memcheck reports:

    Running missing_keys_for_empty_top_level_object ...
    ==170679== Conditional jump or move depends on uninitialised value(s)
    ==170679==    at 0x4943D7: reenter_child (json_iterator-inl.h:208)
    ==170679==    by 0x4943D7: find_field_unordered_raw (value_iterator-inl.h:197)
    ==170679==    by 0x4943D7: find_field_unordered (object-inl.h:13)
    ==170679==    by 0x4943D7: find_field_unordered (object-inl.h:96)
    ==170679==    by 0x4943D7: find_field_unordered (value-inl.h:110)
    ==170679==    by 0x4943D7: find_field_unordered (document-inl.h:105)
    ==170679==    by 0x4943D7: object_tests::missing_keys_for_empty_top_level_object() (ondemand_object_tests.cpp:117)
    ==170679==    by 0x4CA761: object_tests::run() (ondemand_object_tests.cpp:1085)
    ==170679==    by 0x8BA314: int test_main<bool ()>(int, char**, bool ( const&)()) (test_ondemand.h:81)
    ==170679==    by 0x4CA9C8: main (ondemand_object_tests.cpp:1119)
    ==170679==

Fix the read of uninitialized or stale memory by updating
parser->start_positions regardless of whether we see an empty object or
an object with some keys.

This commit only affects builds where development checks
(SIMDJSON_DEVELOPMENT_CHECKS) are enabled. Builds where development
checks are disabled are unaffected by this bug.
2021-05-27 08:34:28 -04:00
Daniel Lemire 4fb09824bf
Restricting how we can end key searches (#1575)
* Verifies bug with missing keys.

* Allowing search from any key.

* Workaround for buggy msys

* Restricting how we can end key searches.

* Adding a few tests.
2021-05-20 16:23:38 -04:00
Daniel Lemire efe9761f80
Fixing issue 1579. (#1580) 2021-05-19 12:23:17 -04:00
Ivan Volnov 0b75de12ef
Don't allocate std::string just for padded_string::load() (#1578)
* Don't allocate std::string just for padded_string::load()

Use std::string_view

* Remove reference from string_view
2021-05-18 12:32:51 -04:00
Dirk Stolle 2abcc35031
fix several typos (#1558)
* fix typos in markdown files

* fix typos in CMake files

* fix typos in headers and test code
2021-05-01 10:19:53 -04:00
Daniel Lemire 85b910814e
Under ARM, it is slightly better to reverse the word once and then extract the bits. (#1545)
* Under ARM, it is slightly better to reverse the word once and then extract the bits.

* Guarding the zero_leading_bit call to avoid sanitizer warnings.
2021-04-30 18:34:21 -04:00
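One plausible reading of the ARM entry above, offered as an assumption rather than a description of simdjson's code: on AArch64 a trailing-zero count is typically compiled to a bit reversal (RBIT) followed by a leading-zero count (CLZ), so reversing the mask once up front and then using only leading-zero counts saves one instruction per extracted bit. The sketch below extracts set-bit positions that way. All names are hypothetical (`clear` here plays the role the commit assigns to `zero_leading_bit`), and the portable bit reversal stands in for RBIT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reverses the bit order of a 64-bit word (bit 0 <-> bit 63) with portable
// shift-and-mask steps; on AArch64 a compiler may emit a single RBIT instead.
uint64_t reverse_bits(uint64_t x) {
  x = ((x >> 1) & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
  x = ((x >> 2) & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
  x = ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
  x = ((x >> 8) & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
  x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
  x = (x >> 32) | (x << 32);
  return x;
}

// Counts leading zero bits; precondition: x != 0.
int leading_zeroes(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_clzll(x);  // maps to CLZ on AArch64
#else
  int n = 0;
  while (!(x & (uint64_t(1) << 63))) { x <<= 1; ++n; }
  return n;
#endif
}

// Returns the positions of the set bits of `mask`, in ascending order.
std::vector<int> set_bit_positions(uint64_t mask) {
  std::vector<int> out;
  uint64_t rev = reverse_bits(mask);  // reverse once, outside the loop
  while (rev != 0) {
    // Leading zeros of the reversed word equal trailing zeros of the
    // remaining original bits, i.e. the next set-bit position.
    int lz = leading_zeroes(rev);
    out.push_back(lz);
    rev &= ~(uint64_t(1) << (63 - lz));  // clear the leading set bit
  }
  return out;
}
```

For example, `set_bit_positions(0xB)` (binary `1011`) gives positions 0, 1 and 3.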
D. Stolle be9d5d4e31
adjust GitHub links to current repository URL (#1553)
Switch links (mostly in comments) from old repository URL
<https://github.com/lemire/simdjson/> to the current URL
<https://github.com/simdjson/simdjson/>.
2021-04-26 09:08:14 -04:00
bobergj ef8c2c434e
In realloc_if_needed, use the loaded_bytes buffer rather than always allocating a temporary one. (#1518) 2021-04-23 10:10:03 -04:00
friendlyanon 5ec85197f8
CMake refactor stage1 (#1512)
* Remove CMP0025 policy

This policy is already set to NEW by the minimum required version.

* Use HOMEPAGE_URL in the project call

* Use VERSION in the project call

* Detect if this is the top project

* Port simdjson-user-cmakecache to a CMake script

* Create a developer mode

The SIMDJSON_DEVELOPER_MODE option set to ON will enable targets that
are only useful for developers of simdjson.

* Consolidate root CML commands into logical sections

* Warn about intended use of developer mode

* Prettify the just_ascii test

* Remove redundant CMake variables

* Inline CML contents from include and src

* Raise minimum CMake requirement to 3.14

* Define proper install rules

* Restore thread support variable

* Add BUILD_SHARED_LIBS as a top level only option

* Force developer mode to be on in CI

* Include flags earlier in developer mode

* Set CMAKE_BUILD_TYPE conditionally

CMAKE_BUILD_TYPE is used only by single configuration generators and is
otherwise completely ignored.

* Remove useless static/shared options

simdjson now uses the CMake builtin BUILD_SHARED_LIBS to switch the
built artifact's type.

* Remove unused CMAKE_MODULE_PATH variable

* Refactor implementation switching into a module

* Factor exception option out into a module

* Reformat simdjson-flags.cmake

* Rename simdjson-flags to developer-options

* Accumulate properties into an include module

This is done this way to avoid using utility targets that must be
exported and installed, which could potentially be misused by users of
the library.

* Port impl definitions to props

* Port exception options to props

* Lift normal options to the top

* Port developer options to props

* Remove simdjson-flags from benchmark

* Document the developer mode in HACKING

* Fix include path in installed config file

* Fix formatting of prop commands

* Fix tests that include .cpp files

* Change GCC AVX fixes back to compile options

* Deprecate SIMDJSON_BUILD_STATIC

* Always link fuzz targets to simdjson

* Install CMake from simdjson's debian repo

* Add gnupg for apt-key

* Make sure ASan link flags come first

* Pass CI env variable to cmake invocation

* Install package for apt-add-repository

* Remove return() from flush macro

* Use directory level commands instead of props

* Restore the github repository variable

* Set developer mode unconditionally for checkperf

The CI env variable is only set in the CI and this target is always run
in developer mode.

* Attempt to fix ODR violation in parsing checks

These tests compiled the simdjson.cpp file again, so also linking them to
the simdjson library target caused ODR violations.

Instead of linking to the target, just inherit its props.

* Move variables before the source dir

* Mark props to be flushed after adding more

* Use props for every command for the library

* Use keyword form for linking libs

* Handle deprecation of SIMDJSON_JUST_LIBRARY

* Handle deprecations in a separate module

Co-authored-by: friendlyanon <friendlyanon@users.noreply.github.com>
2021-04-23 09:24:56 -04:00
Daniel Lemire 8eed8f5155
Document stream: truncate final unfinished document and give access to the number of truncated bytes. (#1534)
* Truncate final unclosed string.

* Adding more precise remarks.

* Better documentation and more robust code.

* ARM + PPC corrections.

* Patching ARM implementation with new stage1_mode parameter.

* Fixed most problems.

* Correcting white spaces and adding a remark.

* This adds the truncated_bytes() method to the stream instances.
2021-04-23 09:24:00 -04:00
Daniel Lemire b3a22bea56
My third attempt at fixing issue 1521 (not being merged due to performance concerns) (#1530)
* Reduction of the missing-key bug.

* Adding the other test cases.

* Really simple fix for 1529
2021-04-05 11:55:39 -04:00
Daniel Lemire d0821adf0e
This implements string serialization for On Demand instances. (#1527)
* This implements string serialization for On Demand instances.

* Adding more documentation.

* Another remark.

* Marking the new functions as inline.

* casts apparently do not work.

* Upgrading the API.

* Making the code really free from exceptions.

* Another fix for exceptionless usage.

* Modify to_chars so that it does not pad integers with '.0'.

* Negative 0 cannot be expressed as an integer.

* Again, accommodating exceptionless usage.

* Using x <= -0 does not allow you to determine the sign, since 0 <= -0 also holds. I am not sure where
this bug comes from.
2021-04-01 11:25:00 -04:00
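The last two remarks in the entry above can be illustrated in standard C++: IEEE 754 defines `0.0 == -0.0`, so any comparison with `-0` is true for both zeros, and only the sign bit (for example via `std::signbit`) tells them apart. The helper `can_print_without_fraction` is hypothetical and only mirrors the stated rule that a whole number may drop its `.0` while negative zero may not; it is not simdjson's `to_chars`.

```cpp
#include <cassert>
#include <cmath>

// True only for negative zero: it compares equal to 0.0 but has its sign bit set.
bool is_negative_zero(double x) {
  return x == 0.0 && std::signbit(x);
}

// Shows that `x <= -0.0` cannot separate the two zeros: both satisfy it,
// so this returns false.
bool comparison_detects_negative_zero() {
  double pos = 0.0;
  double neg = -0.0;
  return (pos <= -0.0) != (neg <= -0.0);
}

// Hypothetical rule: a finite whole number may be printed without a trailing
// ".0", except negative zero, whose sign an integer cannot carry.
bool can_print_without_fraction(double x) {
  return std::isfinite(x) && std::trunc(x) == x && !is_negative_zero(x);
}
```

Under this rule, 3.0 and 0.0 may be printed as integers, while 3.5 and -0.0 keep their fractional form.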
Daniel Lemire ddf610125f
Easy fix. (#1507) 2021-03-19 19:53:22 -04:00
Daniel Lemire b6cce3d744
Let us stop evoluating. (#1506) 2021-03-18 22:42:36 -04:00
Daniel Lemire 8a3b2f20e4 Version 0.9.1 2021-03-18 11:31:38 -04:00
Daniel Lemire 62cd5f7984 get_root_value is dead code that should have been removed. 2021-03-18 11:30:40 -04:00
Daniel Lemire 2db4592571
Last commit for version 0.9.0. (#1503)
* Last commit for version 0.9.0.

* Removing space.
2021-03-17 11:08:44 -04:00
Daniel Lemire 02f9b83353
This moves us to On Demand as the default front-end. (#1494)
* This moves us to On Demand as the default front-end.

* Made casting magical

* Adding another section

* Undoing my damage.
2021-03-12 14:19:11 -05:00
John Keiser cfc965ff9a
Merge pull request #1490 from simdjson/jkeiser/single-ondemand
Don't compile On Demand with extra flags
2021-03-09 16:03:58 -08:00
John Keiser 751696d7eb Move implementation selection to implementations.h 2021-03-09 09:10:08 -08:00
Daniel Lemire 8b8af6aee5
Making input capacity more robust. (#1488) 2021-03-09 09:58:38 -05:00
Daniel Lemire 8e8fbc4cff
fixing issue 1480 (#1485) 2021-03-08 19:31:42 -05:00
John Keiser 985dfab2c4 Don't use TARGET unless the target options are *not* specified
This eliminates the possibility of inlining target failures for ondemand

Also makes it so we always compile common architectures needed by simdjson.cpp in simdjson.h, since amalgamation has no way to reason about whether to include / exclude it.
2021-03-08 13:49:09 -08:00
John Keiser 633161fe86 Don't include target flags if the compiler already has them on 2021-03-08 13:48:58 -08:00
John Keiser f51d50399c Only include builtin implementation from header 2021-03-08 13:48:53 -08:00
John Keiser cf4e538536 Separate builtin implementation from "all implementations" 2021-03-06 13:08:42 -08:00
John Keiser ec5ba79447 Add base.h to allow src/ to pick and choose includes 2021-03-05 11:48:34 -08:00