Commit Graph

2235 Commits

Author SHA1 Message Date
Nicolas Boyer 03f7396d50
Fix branches. (#1619) 2021-06-17 18:31:40 -04:00
Nicolas Boyer a4803d50c5
Add JSON Pointer for On Demand (#1615)
* Add working JSON pointer for array of atoms.

* Add working JSON pointer for object with key-atom pairs.

* Add first version of JSON pointer.

* Update tests (2 tests).

* Make tests exceptionless.

* Fix building issues.

* Add more tests. Add json_pointer validation in array-inl.h and object-inl.h and empty json_pointer in document-inl.h.

* Fix errors in tests.

* Review.

* Add missing comment.
2021-06-11 14:20:05 -04:00
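The JSON Pointer commit above mentions validation in arrays and objects and handling of the empty pointer. As a rough illustration of what such validation involves under RFC 6901, here is a minimal, self-contained sketch that splits a pointer into unescaped reference tokens. The function name `split_json_pointer` is hypothetical, and this is not the code from the commit.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Split an RFC 6901 JSON Pointer into unescaped reference tokens.
// Returns std::nullopt when the pointer is malformed.
// Hypothetical helper for illustration; not simdjson's implementation.
std::optional<std::vector<std::string>> split_json_pointer(std::string_view pointer) {
  std::vector<std::string> tokens;
  if (pointer.empty()) { return tokens; }                // "" refers to the whole document
  if (pointer.front() != '/') { return std::nullopt; }   // a non-empty pointer must start with '/'
  std::string current;
  for (std::size_t i = 1; i < pointer.size(); i++) {
    const char c = pointer[i];
    if (c == '/') {                                      // token separator
      tokens.push_back(current);
      current.clear();
    } else if (c == '~') {                               // escape: "~0" is '~', "~1" is '/'
      if (i + 1 >= pointer.size()) { return std::nullopt; }
      const char next = pointer[++i];
      if (next == '0') { current += '~'; }
      else if (next == '1') { current += '/'; }
      else { return std::nullopt; }
    } else {
      current += c;
    }
  }
  tokens.push_back(current);                             // last token (may be empty, e.g. for "/")
  return tokens;
}
```

With this sketch, `"/a~1b/0"` yields the two tokens `a/b` and `0`, while `"a"` and `"/~2"` are rejected.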
Daniel Lemire 40cba172ed
Adds compile-test for Visual Studio + ARM and turns on developer mode throughout CI. (#1609)
* Adds compile-test for Visual Studio + ARM and turns on developer mode throughout CI.

* Correcting YAML error.

* Disabling google benchmarks under Windows ARM.

* Turning off exceptions under ARM.
2021-06-09 16:42:37 -04:00
Nicolas Boyer 3ba221eb8e
Add max_capacity setting for On Demand (#1610)
* First try at implementing max_capacity for simdjson_ondemand.

* Add max_capacity check.

* Update doc.

* Add one more example in doc for fixed capacity.

* Make allocate() public.

* Remove whitespace

* Found culprit whitespace.

* Duplicating variable.
2021-06-08 14:42:42 -04:00
Daniel Lemire 8bc12fe7cb
Update basics.md 2021-06-07 14:54:18 -04:00
Daniel Lemire 34bb2079e7
Adding documentation regarding versions. (#1611)
* Adding documentation regarding versions.

* Minor tweaks.
2021-06-07 14:19:23 -04:00
Daniel Lemire 7ca016652e
Update README.md 2021-06-07 11:27:48 -04:00
Daniel Lemire 13ab123daf
Testing issue 1607. (#1608) 2021-06-07 10:50:48 -04:00
Daniel Lemire f54bd69b5b
Update bug_report.md 2021-06-07 09:57:43 -04:00
Daniel Lemire 16e8db1f17
Adding 'count_elements' method. (#1577)
* Adding 'count_elements' method.

* Actually reporting errors.

* removing white space.

* Removing white space again.

* Adding an extra example.

* Prettier.

* Making the functionality more error-proof.

* Avoiding exceptions.

* Various fixes including extending count_elements to value types.

* Various fixes.

* Minor fixes.

* Correcting comment.

* Trimming white spaces.
2021-06-06 17:56:00 -04:00
Daniel Lemire eb0ae041e3
Verification and bug fix of issue 1511 (#1602)
* Verification and bug fix.

* Removing comment.

* Removing spaces.

* Guarding exceptions.

* Tweaking the test
2021-06-06 17:55:33 -04:00
John Keiser 893e613faa
Don't #include "simdjson.cpp" in tests (#1605) 2021-06-06 14:44:04 -04:00
Daniel Lemire 714f0ba222
This deletes most of our data files making the repository much smaller (#1582)
* This deletes most of our data files making the repository much smaller.

* Removing dead code.

* Various minor fixes.
2021-06-04 09:24:03 -04:00
Daniel Lemire 19c3b1315a
Rewind functionality. (#1539)
* Rewind functionality.


* Keeping just the document rewind.
2021-06-04 09:22:33 -04:00
Daniel Lemire f44a53271d
Documentation for issue 1562 (Accessing escaped key with on-demand API) (#1563)
* Documentation for issue 1562.

* Making exception-free.

* Improving wording.
2021-06-04 09:21:52 -04:00
Nicolas Boyer d90714e8df
Add RapidJSON and nlohmann_json SAX to partial_tweets benchmark (#1597)
* Add first working version of rapidjson_sax for partial tweets.

* Add cleaner and faster rapidjson_sax

* Add nlohmann_json_sax.

* Replace array of bool by bitsets.

* Replace strdup to copy string in rapidjson_sax.

* Change std::string_view assignment in rapidjson_sax.
2021-06-03 16:41:20 -04:00
Nicolas Boyer c7fd7353a8
Add RapidJSON and nlohmann_json SAX to top_tweet benchmark (#1599)
* Add rapidjson_sax.h and fix typo in rapidjson.h

* Add nlohmann_json_sax.h and add user key check for screen_name in rapidjson_sax

* Change std::string_view assignment for text and screen_name.
2021-06-03 16:41:00 -04:00
Nicolas Boyer 05f15d88b6
Add large_random/rapidjson_sax.h and large_random/nlohmann_json_sax.h. Clean up kostya/rapidjson_sax.h (add flags also) and kostya/nlohmann_json_sax.h (#1600) 2021-06-03 16:40:39 -04:00
Nicolas Boyer d7d81c7152
Add RapidJSON and nlohmann_json SAX to find_tweet benchmark (#1598)
* Add rapidjson_sax.h .

* Add nlohmann_json_sax.h . Fix typos in distinct_user_id/nlohmann_json_sax.h, find_tweet/rapidjson.h and find_tweet/rapidjson_sax.h .

* Add extra check for id key when looking for find_id.
2021-06-03 12:43:54 -04:00
Nicolas Boyer 73b510225f
Add RapidJSON and nlohmann_json SAX to distinct_user_id benchmark (#1593)
* Add rapidjson_sax for distinct_user_id

* Add nlohmann_json_sax.h for distinct_user_id

* Add flags for RapidJSON.

* Fix revisions.

* Fix revisions again.

* Replace strcpy with memcpy to improve performance.
2021-06-01 14:51:27 -04:00
Daniel Lemire 5d2eca2363 Correcting a couple of typographic errors. 2021-06-01 13:59:32 -04:00
Daniel Lemire 939b6b854a
This adds /permissive- to recent visual studio builds (#1596)
* This adds /permissive-.

* Typo.

* Trying this simple fix.
2021-06-01 10:57:37 -04:00
Daniel Lemire 4f8bdf517a
Adds a warning message when SIMDJSON_DEVELOPER_MODE is OFF. (#1594) 2021-06-01 10:29:11 -04:00
Nicolas Boyer 369f66be35
Add RapidJSON and nlohmann_json SAX to kostya benchmark (#1592)
* Add RapidJSON and nlohmann_json SAX to kostya benchmark

* Remove trailing whitespaces

* Fix typo
2021-05-31 10:15:50 -04:00
Daniel Lemire 8a75dbf719
Update README.md 2021-05-28 09:05:56 -04:00
Daniel Lemire 1032f70ddf
Verifies and fixes issue 1588 (#1589)
* Verifies and fixes issue 1588

* Removing a trailing space.
2021-05-27 19:35:42 -04:00
strager 16e2323153
Fix UB in dev checks when iterating empty object (#1587)
When find_field_unordered is used on an empty object, it calls
json_iterator::reenter_child. reenter_child asserts that it doesn't
rewind too far back by consulting parser->start_positions.

When the On Demand parser sees an empty object, it fails to update
parser->start_positions. This means that the assertion in
json_iterator::reenter_child reads stale data, or potentially
uninitialized memory. Reading uninitialized memory can cause spurious
assertion failures and Valgrind memcheck reports:

    Running missing_keys_for_empty_top_level_object ...
    ==170679== Conditional jump or move depends on uninitialised value(s)
    ==170679==    at 0x4943D7: reenter_child (json_iterator-inl.h:208)
    ==170679==    by 0x4943D7: find_field_unordered_raw (value_iterator-inl.h:197)
    ==170679==    by 0x4943D7: find_field_unordered (object-inl.h:13)
    ==170679==    by 0x4943D7: find_field_unordered (object-inl.h:96)
    ==170679==    by 0x4943D7: find_field_unordered (value-inl.h:110)
    ==170679==    by 0x4943D7: find_field_unordered (document-inl.h:105)
    ==170679==    by 0x4943D7: object_tests::missing_keys_for_empty_top_level_object() (ondemand_object_tests.cpp:117)
    ==170679==    by 0x4CA761: object_tests::run() (ondemand_object_tests.cpp:1085)
    ==170679==    by 0x8BA314: int test_main<bool ()>(int, char**, bool ( const&)()) (test_ondemand.h:81)
    ==170679==    by 0x4CA9C8: main (ondemand_object_tests.cpp:1119)
    ==170679==

Fix the read of uninitialized or stale memory by updating
parser->start_positions regardless of whether we see an empty object or
an object with some keys.

This commit only affects builds where development checks
(SIMDJSON_DEVELOPMENT_CHECKS) are enabled. Builds where development
checks are disabled are unaffected by this bug.
2021-05-27 08:34:28 -04:00
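The bug pattern described in the commit above (a bookkeeping slot that is skipped on the empty-object path and later read by a development check) can be sketched with a toy model. Everything here is an assumption made for illustration: `toy_iterator`, its member functions, and the `unset` sentinel that stands in for stale or uninitialized memory are hypothetical and do not reproduce the simdjson sources.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of per-depth start-position bookkeeping (all names hypothetical).
struct toy_iterator {
  // Sentinel standing in for stale or uninitialized memory.
  static constexpr std::uint32_t unset = 0xFFFFFFFFu;
  std::array<std::uint32_t, 8> start_positions;

  toy_iterator() { start_positions.fill(unset); }

  // Buggy variant: an empty object returns before recording its start position.
  void start_object_buggy(std::size_t depth, std::uint32_t position, bool is_empty) {
    if (is_empty) { return; }
    start_positions[depth] = position;
  }

  // Fixed variant: record the start position whether or not the object has keys.
  void start_object_fixed(std::size_t depth, std::uint32_t position, bool is_empty) {
    (void)is_empty;  // emptiness no longer affects the bookkeeping
    start_positions[depth] = position;
  }

  // Stand-in for the development check: re-entering a child must not rewind
  // to before the recorded start, and the recorded start must be valid.
  bool can_reenter(std::size_t depth, std::uint32_t position) const {
    return start_positions[depth] != unset && position >= start_positions[depth];
  }
};
```

In this model, calling `start_object_buggy(1, 5, true)` leaves slot 1 unset, so a later `can_reenter(1, 6)` fails; with `start_object_fixed` the same sequence passes.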
Pavel Novikov 2ec23bdf37
fixed some typos (#1585) 2021-05-24 09:21:00 -04:00
Daniel Lemire 4fb09824bf
Restricting how we can end key searches (#1575)
* Verifies bug with missing keys.

* Allowing search from any key.

* Workaround for buggy msys

* Restricting how we can end key searches.

* Adding a few tests.
2021-05-20 16:23:38 -04:00
Daniel Lemire ad1cd6a2ce
Documenting raw string access. (#1566)
* Documenting raw string access.

* Removing trailing space.
2021-05-20 13:57:48 -04:00
Daniel Lemire a27367210a
Improving how to_string is explained. (#1583) 2021-05-20 11:22:31 -04:00
Daniel Lemire efe9761f80
Fixing issue 1579. (#1580) 2021-05-19 12:23:17 -04:00
Ivan Volnov 0b75de12ef
Don't allocate std::string just for padded_string::load() (#1578)
* Don't allocate std::string just for padded_string::load()

Use std::string_view

* Remove reference from string_view
2021-05-18 12:32:51 -04:00
Daniel Lemire af5c8175b4
By default, we should not do the DOM checkperf… (#1571)
* By default, we should not do the DOM checkperf. These targets assume that the main branch remains
compatible, an assumption that will break over time.
2021-05-15 15:28:59 -04:00
Amos Bird 8df32cea33
Return err when alloc failure (#1567) 2021-05-14 22:51:07 -04:00
Luigi Pinca e4150443ca
Update journal reference (#1565)
Update the journal reference of the "Validating UTF-8 In Less Than One
Instruction Per Byte" paper.
2021-05-10 08:16:37 -04:00
Daniel Lemire d539781cf3
This attempts to fix the fuzzers. (#1564)
* This attempts to fix the fuzzers.

* Retiring bintray.

* Disabling ARM fuzzing.
2021-05-07 22:59:26 -04:00
PavelP 2bbab7d892
Update CONTRIBUTORS (#1560) 2021-05-02 12:30:25 -04:00
Daniel Lemire 729c35c0f8 Removes docker file which is unused and untested, and updates the path to dom/parse. 2021-05-01 10:31:00 -04:00
Dirk Stolle 2abcc35031
fix several typos (#1558)
* fix typos in markdown files

* fix typos in CMake files

* fix typos in headers and test code
2021-05-01 10:19:53 -04:00
Daniel Lemire 85b910814e
Under ARM, it is slightly better to reverse the word once and then extract the bits. (#1545)
* Under ARM, it is slightly better to reverse the word once and then extract the bits.

* Guarding the zero_leading_bit call to avoid sanitizer warnings.
2021-04-30 18:34:21 -04:00
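The idea in the commit above can be illustrated with a portable sketch: reverse the 64-bit word once, then repeatedly take the leading-zero count of the reversed word, which equals the index of the lowest set bit in the original. The motivation assumed here is that on 64-bit ARM a trailing-zero count is typically computed as a bit reversal followed by a leading-zero count, so reversing once avoids repeating the reversal per bit. The function names are hypothetical, and the loops below are plain C++ stand-ins for hardware instructions, not the code from the commit.

```cpp
#include <cstdint>
#include <vector>

// Portable 64-bit bit reversal (stand-in for a single hardware instruction).
inline std::uint64_t reverse_bits(std::uint64_t x) {
  x = ((x >> 1) & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
  x = ((x >> 2) & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
  x = ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
  x = ((x >> 8) & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
  x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
  return (x >> 32) | (x << 32);
}

// Portable leading-zero count. The caller must guarantee x != 0,
// mirroring the guard mentioned in the commit message.
inline int leading_zeroes(std::uint64_t x) {
  int n = 0;
  while ((x & 0x8000000000000000ULL) == 0) { x <<= 1; n++; }
  return n;
}

// Return the indexes of the set bits of `bits`, lowest index first.
std::vector<int> set_bit_indexes(std::uint64_t bits) {
  std::vector<int> out;
  std::uint64_t rev = reverse_bits(bits);       // reverse the word once
  while (rev != 0) {                            // guard: never count zeros of 0
    const int lz = leading_zeroes(rev);         // equals the lowest set index in `bits`
    out.push_back(lz);
    rev ^= 0x8000000000000000ULL >> lz;         // clear the bit just reported
  }
  return out;
}
```

For example, `set_bit_indexes(0x12)` reports indexes 1 and 4, and a zero input produces an empty result without ever calling `leading_zeroes`.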
Daniel Lemire c1dffac28c
This moves all DOM (benchmark + test) files to a subdir (#1549)
* This moves all DOM (benchmark + test) files to a subdir

* Missing file.

* CMake + DLL is not pretty.

* Capitalizing AND

* Fixing mismatch endif

* Flipping the order.

* onedemand => ondemand
2021-04-30 18:33:45 -04:00
Daniel Lemire 911b06186b
Delete Dockerfile 2021-04-26 09:08:34 -04:00
D. Stolle be9d5d4e31
adjust GitHub links to current repository URL (#1553)
Switch links (mostly in comments) from old repository URL
<https://github.com/lemire/simdjson/> to the current URL
<https://github.com/simdjson/simdjson/>.
2021-04-26 09:08:14 -04:00
Daniel Lemire b32d66e7b6
Update README.md 2021-04-24 16:59:41 -04:00
Daniel Lemire 939bfc701a
Update README.md 2021-04-24 16:58:57 -04:00
Daniel Lemire 9c470822a1 Putting back the rstrip. 2021-04-23 10:54:21 -04:00
Daniel Lemire 59195bd5dc Removing unsupported '--parallel'. 2021-04-23 10:14:10 -04:00
bobergj ef8c2c434e
When realloc_if_needed, use loaded_bytes buffer rather than always allocating a tmp one. (#1518) 2021-04-23 10:10:03 -04:00
friendlyanon 5ec85197f8
CMake refactor stage1 (#1512)
* Remove CMP0025 policy

This policy is already set to NEW by the minimum required version.

* Use HOMEPAGE_URL in the project call

* Use VERSION in the project call

* Detect if this is the top project

* Port simdjson-user-cmakecache to a CMake script

* Create a developer mode

The SIMDJSON_DEVELOPER_MODE option set to ON will enable targets that
are only useful for developers of simdjson.

* Consolidate root CML commands into logical sections

* Warn about intended use of developer mode

* Prettify the just_ascii test

* Remove redundant CMake variables

* Inline CML contents from include and src

* Raise minimum CMake requirement to 3.14

* Define proper install rules

* Restore thread support variable

* Add BUILD_SHARED_LIBS as a top level only option

* Force developer mode to be on in CI

* Include flags earlier in developer mode

* Set CMAKE_BUILD_TYPE conditionally

CMAKE_BUILD_TYPE is used only by single configuration generators and is
otherwise completely ignored.

* Remove useless static/shared options

simdjson now uses the CMake builtin BUILD_SHARED_LIBS to switch the
built artifact's type.

* Remove unused CMAKE_MODULE_PATH variable

* Refactor implementation switching into a module

* Factor exception option out into a module

* Reformat simdjson-flags.cmake

* Rename simdjson-flags to developer-options

* Accumulate properties into an include module

This is done this way to avoid using utility targets that must be
exported and installed, which could potentially be misused by users of
the library.

* Port impl definitions to props

* Port exception options to props

* Lift normal options to the top

* Port developer options to props

* Remove simdjson-flags from benchmark

* Document the developer mode in HACKING

* Fix include path in installed config file

* Fix formatting of prop commands

* Fix tests that include .cpp files

* Change GCC AVX fixes back to compile options

* Deprecate SIMDJSON_BUILD_STATIC

* Always link fuzz targets to simdjson

* Install CMake from simdjson's debian repo

* Add gnupg for apt-key

* Make sure ASan link flags come first

* Pass CI env variable to cmake invocation

* Install package for apt-add-repository

* Remove return() from flush macro

* Use directory level commands instead of props

* Restore the github repository variable

* Set developer mode unconditionally for checkperf

The CI env variable is only set in the CI and this target is always run
in developer mode.

* Attempt to fix ODR violation in parsing checks

These tests were compiling the simdjson.cpp file again and linking to
the simdjson library target causes ODR violations.

Instead of linking to the target, just inherit its props.

* Move variables before the source dir

* Mark props to be flushed after adding more

* Use props for every command for the library

* Use keyword form for linking libs

* Handle deprecation of SIMDJSON_JUST_LIBRARY

* Handle deprecations in a separate module

Co-authored-by: friendlyanon <friendlyanon@users.noreply.github.com>
2021-04-23 09:24:56 -04:00