38 Commits

Author SHA1 Message Date
Daniel Lemire 8eed8f5155
Document stream: truncate final unfinished document and give access to the number of truncated bytes. (#1534)
* Truncate final unclosed string.

* Adding more precise remarks.

* Better documentation and more robust code.

* ARM + PPC corrections.

* Patching ARM implementation with new stage1_mode parameter.

* Fixed most problems.

* Correcting white spaces and adding a remark.

* This adds the truncated_bytes() method to the stream instances.
2021-04-23 09:24:00 -04:00
Daniel Lemire d6f33e4830
This adds a little test to see if we can compile with very strict flags (conventional casts) (#1417)
* This adds a little test to see if we can compile with very strict flags.

* Trimming a leftover old-style cast.

* More cleaning.

* A few more pedantic casts.
2021-01-27 18:37:30 -05:00
John Keiser 62ded15cd8 Rename tweets/text/points -> result 2021-01-05 11:55:57 -08:00
Paul Dreik af4db55e66
remove trailing whitespace (#1284) 2020-11-03 21:48:09 +01:00
Daniel Lemire 8a8eea53a2
Prefixing macros (issue 1035) (#1124)
* Renaming partially done.

* More prefixing.

* I thought that this was fixed.

* Missed one.

* Missed a few.

* Missed another one.

* Minor fixes.
2020-08-18 18:25:36 -04:00
Daniel Lemire f80668e87f
This removes the crazy alignment requirements. (#1073)
* This removes the crazy alignment requirements.
2020-07-27 16:19:01 -04:00
John Keiser a7fc7d4ffb Switch from get(v,e) to e = get(v) 2020-06-20 17:57:09 -07:00
John Keiser 1aab4752e2 Store all parser state in the implementation 2020-06-01 12:15:54 -07:00
John Keiser 6a71b24495 Reuse stored buf and len from parser 2020-06-01 12:14:09 -07:00
John Keiser a3a9bde83e Move DOM parsing into concrete interface implementation 2020-06-01 12:14:09 -07:00
John Keiser 5312fd30e5 Fix CRT_SECURE warnings in clang 2020-05-04 11:36:00 -07:00
Daniel Lemire fa4ce6a8bc
There is confusion between gigabytes and gibibytes. Let us standardize throughout. (#838)
* There is confusion between gigabytes and gibibytes.

* Trying to be consistent.
2020-05-01 12:16:18 -04:00
John Keiser 92d7af0881 Don't include benchmark overhead in documents/s 2020-04-28 13:15:01 -07:00
John Keiser 0e6ea76e88
Make checkperf work on Windows (#799)
* Make command line arguments work for Windows

* Run checkperf on Windows
2020-04-27 14:20:05 -04:00
Daniel Lemire 0d1c574cb1
A few more changes... (#775)
* More nitpicking.
2020-04-23 11:36:52 -04:00
John Keiser d4a37f6ef5 Enable conversion warnings on Linux and Windows 2020-04-22 14:21:30 -07:00
Daniel Lemire 21dce6cca9
Displaying the number of documents parsed per second (#652)
* Some users are interested, as a metric, in the number of documents parsed per second.
Obviously, this means reusing the same parser again and again.

* Adding a sentence

* This updates the parsingcompetition benchmark so that it displays the number of documents parsed per second.
2020-03-30 17:51:03 -04:00
John Keiser d93af1161d Remove set_capacity, replace with allocate
Makes allocation point more predictable
2020-03-30 13:49:54 -07:00
John Keiser 434776db1a Deprecate more things 2020-03-30 13:48:43 -07:00
John Keiser 03746b966b Move document/element/etc. under dom 2020-03-28 13:42:21 -07:00
Daniel Lemire 6cefeb338b
std::tie does not work on some compilers (#567)
* std::tie workaround.

* Cleaner solution
2020-03-19 16:56:45 -04:00
John Keiser 8e2c06cb0e Compile with -fno-exceptions 2020-03-17 13:54:37 -07:00
John Keiser 1a5d8f1957 Add tests for SIMDJSON_EXCEPTIONS=0, add `tie()` support 2020-03-17 13:54:37 -07:00
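The two commits above concern building and testing without exceptions and adding `tie()` support. As a rough illustration of exception-free error handling, the sketch below returns a value together with an error code. Every name in it (`result`, `error_code`, `find_count`) is hypothetical and chosen for illustration; it is not the library's actual API.

```cpp
#include <string>

// Hypothetical error codes for this sketch.
enum class error_code { success, no_such_field };

// Hypothetical value-plus-error pair, usable when exceptions are disabled.
template <typename T>
struct result {
  T value;
  error_code error;

  // Copies both parts out in one call, in the spirit of tie().
  void tie(T &v, error_code &e) const {
    v = value;
    e = error;
  }

  // Writes the value only on success and returns the error code,
  // so callers can write `e = r.get(v)`.
  error_code get(T &v) const {
    if (error == error_code::success) { v = value; }
    return error;
  }
};

// Hypothetical lookup that knows a single key, "count".
inline result<int> find_count(const std::string &key) {
  if (key == "count") { return {42, error_code::success}; }
  return {0, error_code::no_such_field};
}
```

A caller checks the returned code instead of catching an exception, for example `if (find_count("count").get(v) != error_code::success) { /* handle */ }`.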
John Keiser e4e89fe27a
Fix parse benchmarker (#554)
* Fix parse benchmarker

* Make CI fail when parse doesn't work
2020-03-13 16:19:21 -04:00
John Keiser 40c6213d7e Add parser.load() and load_many() to load files 2020-03-11 17:19:41 -07:00
John Keiser d140bc23f5 Automatically allocate memory as needed in parse 2020-03-11 16:14:54 -07:00
John Keiser 31e8a12e88 Make error_message(error_code) return C string
- Also move all error message logic to include inline
2020-03-06 15:41:51 -08:00
John Keiser b3ea8c406e Add simdjson.cpp for unified use (#515) 2020-03-04 10:12:27 -08:00
John Keiser 99667f7c55 Create top level simdjson.h (#515)
- Allows everyone to #include the same way, singleheader or not.
2020-03-04 10:12:27 -08:00
John Keiser 910f272467
Add parser implementation interface and selection API (#501)
* Make architecture implementations virtual functions

- Easier to add new architectures (add implementation to implementation.cpp)
- Easier to add new algorithms / functions to architecture selection (add to implementation.h, implement)
- Automatically select best implementation in static initialization
- Allow user to explicitly select implementation with a string (i.e. parameter)
- Allow user to inspect current implementation name/description
- Allow user to list available implementations
- Eliminate architecture enum and architecture-based templating
- Add noexcept in non-inline functions

* Move implementation static methods to their own classes

* Detect best supported implementation on first use

* available_implementationsI() -> available_implementations
2020-02-21 16:34:27 -05:00
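The design described in commit 910f272467 above (virtual functions per architecture, detection on first use, selection by name, and listing of available implementations) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the class names, `count_open_braces`, `active_slot`, and `select_implementation` are hypothetical, and no real CPU feature detection is performed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: one subclass per architecture.
class implementation {
public:
  explicit implementation(std::string name) : name_(std::move(name)) {}
  virtual ~implementation() = default;
  const std::string &name() const noexcept { return name_; }
  // True when the running CPU can execute this implementation.
  virtual bool supported() const noexcept = 0;
  // Stand-in for a real parsing routine: counts '{' bytes.
  virtual std::size_t count_open_braces(const char *buf, std::size_t len) const noexcept = 0;
private:
  std::string name_;
};

// Portable implementation that runs everywhere.
class fallback_implementation final : public implementation {
public:
  fallback_implementation() : implementation("fallback") {}
  bool supported() const noexcept override { return true; }
  std::size_t count_open_braces(const char *buf, std::size_t len) const noexcept override {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; i++) { n += (buf[i] == '{'); }
    return n;
  }
};

// Placeholder for a SIMD implementation; it reports itself as unsupported
// because this sketch does no real CPU feature detection.
class wide_implementation final : public implementation {
public:
  wide_implementation() : implementation("wide") {}
  bool supported() const noexcept override { return false; }
  std::size_t count_open_braces(const char *buf, std::size_t len) const noexcept override {
    return fallback_implementation().count_open_braces(buf, len);
  }
};

// Lists every compiled-in implementation, best first.
inline const std::vector<const implementation *> &available_implementations() {
  static const wide_implementation wide;
  static const fallback_implementation fallback;
  static const std::vector<const implementation *> list{&wide, &fallback};
  return list;
}

// Holds the current choice; null until the first use.
inline const implementation *&active_slot() {
  static const implementation *slot = nullptr;
  return slot;
}

// Detects the best supported implementation on first use.
inline const implementation &active_implementation() {
  const implementation *&slot = active_slot();
  if (slot == nullptr) {
    for (const implementation *impl : available_implementations()) {
      if (impl->supported()) { slot = impl; break; }
    }
  }
  assert(slot != nullptr); // the fallback is always supported
  return *slot;
}

// Lets the user pick an implementation by name; fails if unknown or unsupported.
inline bool select_implementation(const std::string &name) {
  for (const implementation *impl : available_implementations()) {
    if (impl->name() == name && impl->supported()) {
      active_slot() = impl;
      return true;
    }
  }
  return false;
}
```

Adding an architecture in this sketch means writing one more subclass and registering it in `available_implementations()`, which mirrors the "easier to add new architectures" point in the commit message. The selection state here is not synchronized across threads.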
John Keiser 8e7d1a5f09
Separate document state from ParsedJson
This creates a "document" class with only user-facing document state (no parser internals).

- document: user-facing document state
- document::iterator: iterator (equivalent of ParsedJsonIterator)
- document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson)

Usage:

```c++
auto doc = simdjson::document::parse(buf, len); // less efficient but simplest
```

```c++
simdjson::document::parser parser; // reusable parser
parser.allocate_capacity(len);
simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser
doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient
```
2020-02-07 10:02:36 -08:00
Daniel Lemire 4518f1fba1 Some minor nitpicking. 2020-02-07 10:41:45 -05:00
John Keiser 6978a0b8d4 Benchmark escapes (#464)
* Add escapes as a feature we benchmark

* Don't print effectiveness metric unless verbose is on
2020-01-27 09:58:14 -05:00
Daniel Lemire f87e64f988
Add option to make buffers hot and remove recent benchmarking changes (#443)
* This reverts the code to how it was prior to the silly "run two stages" routine and instead adds an option to benchmark the code over hot buffers. It turns out that it can be expensive, when the files are large, to allocate the pages.
2020-01-15 19:48:00 -05:00
Daniel Lemire f97b655f02
Instead of emulating the whole parsing as stage 1 + stage 2, let us benchmark the real thing. (#441)
* Instead of emulating the whole parsing as stage 1 + stage 2, let us benchmark the real thing.

* Adding explicit constructor.

* Adding warning to the benchmark user.

* Making re-running optional.
2020-01-11 10:14:22 -05:00
John Keiser 3b9e6bff3c Print stage 2 information in feature benchmarker 2020-01-02 17:23:21 -07:00
Daniel Lemire b2ebdb0d07
I think we can align the numbers better (so it is prettier). (#399)
* I think we can align the numbers better (so it is prettier).

* Remove space before %, align third line better

Co-authored-by: John Keiser <john@johnkeiser.com>
2019-12-20 19:58:49 -05:00
John Keiser e2f349e7bd Measure impact of utf-8 blocks and structurals per block directly 2019-12-17 11:41:13 -08:00