Commit Graph

966 Commits

Author SHA1 Message Date
Daniel Lemire 083569fca8
This code is terrible and should not be there. (#496) 2020-02-13 07:38:11 -05:00
Frank Wessels a8903d9765
Update reference for simdjson-go to minio (#492) 2020-02-11 12:58:04 -05:00
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire c879b56f41 Fixing logical error. 2020-02-07 10:44:17 -05:00
Daniel Lemire 4518f1fba1 Some minor nitpicking. 2020-02-07 10:41:45 -05:00
Daniel Lemire 5c59b3a775 Fixing memory leaks. (Minor issue.) 2020-02-07 10:29:15 -05:00
Yufei Huang 299dfcdd3c
Update README.md (#486) 2020-02-07 07:56:16 -05:00
John Keiser 76c706644a
Move stage 2 tape writing to ParsedJson (#477)
This is a first step to allowing alternate tape formats.
2020-02-04 14:28:42 -08:00
John Keiser 0c8f2b9d85
Make "make amalgamate" more automatic (#480)
- automatically include local includes in the right places
2020-02-03 12:51:24 -05:00
Daniel Lemire c924aaede9
Fix issue472: make JsonStream a template. (#473)
* Fix issue472: make JsonStream a template.

* Adding missing include.

* Tweaking headers and some minor formatting.

* Removing file from aggregation.

* Moving jsoncharutils

* Adding new header.

* Trying another header.

* Let us try to route around Visual Studio's nonesense.
2020-01-30 17:16:41 -05:00
Daniel Lemire 28710f8ad5
fix for Issue 467 (#469)
* Fix for issue467

* Updating single-header

* Let us make it so that JsonStream is constructed from a padded_string which will avoid dangerous overruns.

* Fixing parse_stream

* Updating documentation.
2020-01-29 19:00:18 -05:00
Daniel Lemire e695a19d11
Trying to fix issue 465 (#466)
* Trying to fix issue 465

* Actually testing

* Refreshing amal.

* Removing spurious ;
2020-01-27 11:25:23 -05:00
John Keiser 6978a0b8d4 Benchmark escapes (#464)
* Add escapes as a feature we benchmark

* Don't print effectiveness metric unless verbose is on
2020-01-27 09:58:14 -05:00
Daniel Lemire 6784530b8b
Update CONTRIBUTING.md 2020-01-27 09:37:55 -05:00
Daniel Lemire 03d5fc33ca
Update CONTRIBUTING.md 2020-01-27 09:34:29 -05:00
Daniel Lemire ba14232628
Fixing mem leak. (#461) 2020-01-27 09:31:12 -05:00
Daniel Lemire 1cdf5581f3 Adding a contributing document. 2020-01-25 12:32:15 -05:00
Daniel Lemire 3488c49d0a
Basically, haswell processor should be able to count on lzcnt. (#458) 2020-01-22 16:52:55 -05:00
John Keiser adaef43bc6 Find all escaped characters with simpler algorithm (#450) 2020-01-22 14:11:14 -05:00
Daniel Lemire ce8fe1bdf6
This isolates a fix found in the large PR https://github.com/lemire/simdjson/pull/445 (#457) 2020-01-22 12:58:59 -05:00
Daniel Lemire fa04595d90 Correcting typo. 2020-01-22 11:08:53 -05:00
Daniel Lemire aea79912ec
Adding a "get_corpus" benchmark. (#456)
* Adding a "get_corpus" benchmark.

* Improving portability.
2020-01-20 17:27:25 -05:00
Daniel Lemire 80b4dd2e8a
Removing all stdout, stderr from main library. (#455)
* Removing all stdout,stderr from main library.
2020-01-20 16:03:15 -05:00
Daniel Lemire 48530b89ea
adding link to R bindings 2020-01-20 14:55:31 -05:00
Daniel Lemire a4025788ae
Commenting out one attribute when SIMDJSON_THREADS_ENABLED is off. (#453) 2020-01-20 11:18:29 -05:00
dbj c6f2f60b03 clarifications -- documentation update (#448) 2020-01-20 10:39:47 -05:00
Daniel Lemire 33060738b6
Making the project tag in simdjson more explicit and disabling LTO (#452)
* Making the project tag in simdjson more explicit

* Let us disable deliberately LTO.
2020-01-20 10:18:58 -05:00
Daniel Lemire ab6d4871d8
Adding haswell amal. tests (#447)
* Adding an extra test.

* Disabling the AVX-accelerated minifier.

* Updating amalgamation.
2020-01-15 19:49:11 -05:00
Daniel Lemire f87e64f988
Add option to make buffers hot and remove recent benchmarking changes (#443)
* This revert the code back to how it was prior to the silly "run two stages" routine and instead
adds an option to benchmark the code over hot buffers. It turns out that it can be expensive,
when the files are large, to allocate the pages.
2020-01-15 19:48:00 -05:00
Daniel Lemire 27861f6358 SIMDJSON_PADDING is now an absolute constant. This is temporary since
padding should go away once  https://github.com/lemire/simdjson/issues/174
is resolved.
2020-01-15 15:49:50 -05:00
Daniel Lemire f611b65bc0
This updates the minifier. (#446) 2020-01-15 13:45:32 -05:00
Daniel Lemire 2dc61fbdc4
Update README.md 2020-01-15 11:57:48 -05:00
Daniel Lemire 22be05400d
Update README.md 2020-01-15 10:18:03 -05:00
Daniel Lemire a9f501fe7d
Update README.md 2020-01-14 11:50:14 -05:00
Daniel Lemire e9077370ec
Update README.md 2020-01-14 10:42:04 -05:00
Daniel Lemire a804351a76
I think that i and idx should be size_t (64-bit). (#438) 2020-01-13 17:42:52 -05:00
Daniel Lemire f97b655f02
Instead of emulating the whole parsing as stage 1 + stage 2, let us benchmark the real thing. (#441)
* Instead of emulating the whole parsing as stage 1 + stage 2, let us
benchmark the real thing.

* Adding explicit constructor.

* Adding warning to the benchmark user.

* Making re-running optional.
2020-01-11 10:14:22 -05:00
Daniel Lemire 1498b78342 Minor simplifications. 2020-01-10 14:07:57 -05:00
dbj 85e84fc1fa improved string padded (#440)
* dirent portable latest version

* improved

std::string argument passed by const reference
ctor added with std::string_view  argument
`allocate_padded_buffer()`  moved here with **optional** check on `length < 1`

* allocate_padded_buffer moved to padded_string.h
2020-01-10 10:15:48 -05:00
Daniel Lemire 833e5d8bf1
Update README.md 2020-01-09 17:01:41 -05:00
UKABUER 773883c486 Fix #420 (#421) 2020-01-09 09:56:43 -05:00
Daniel Lemire 6e5e0278c2 Exposing bug #420 2020-01-09 09:55:54 -05:00
Daniel Lemire 951c4bedf8
Simpler jsonstream (#436)
* One simplification.

* Removing untested functions.
2020-01-07 19:10:02 -05:00
dbj 9842e1f9d0 dirent portable latest version (#435) 2020-01-07 18:41:57 -05:00
Daniel Lemire 4c0c1c9830 Updating a comment. 2020-01-06 22:01:23 -05:00
Daniel Lemire 6706d6053e Upgrading gcc to gcc 8 2020-01-06 18:28:29 -05:00
Daniel Lemire 0a874a5063 Some tuning 2020-01-06 11:41:07 -05:00
dbj 2caa6e3370 C++ language version detection (#418)
* added visual_studio folder where visual_studio cmake generated, local artefacts are

* C++ version detection
2020-01-06 11:38:09 -05:00
Daniel Lemire a9e990251d
removing left over debug 2020-01-04 12:50:04 -05:00
Daniel Lemire 7bde23590a
Debugging jsonstream (#432)
Fixes #424 (and provide tests for it), as well as #401
2020-01-03 22:22:47 -05:00