Update README.md

Daniel Lemire 2021-03-12 15:01:12 -05:00 committed by GitHub
parent 727644c13a
commit 430f230940
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 7 additions and 37 deletions

@@ -11,12 +11,11 @@ simdjson : Parsing gigabytes of JSON per second
<img src="images/logo.png" width="10%" style="float: right"> <img src="images/logo.png" width="10%" style="float: right">
JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh
approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms
to parse JSON 2.5x faster than RapidJSON and 25x faster than JSON for Modern C++. to parse JSON 4x faster than RapidJSON and 25x faster than JSON for Modern C++.
* **Fast:** Over 2.5x faster than commonly used production-grade JSON parsers. * **Fast:** Over 4x faster than commonly used production-grade JSON parsers.
* **Record Breaking Features:** Minify JSON at 6 GB/s, validate UTF-8 at 13 GB/s, NDJSON at 3.5 GB/s. * **Record Breaking Features:** Minify JSON at 6 GB/s, validate UTF-8 at 13 GB/s, NDJSON at 3.5 GB/s.
* **Easy:** First-class, easy to use and carefully documented APIs. * **Easy:** First-class, easy to use and carefully documented APIs.
* **Beyond DOM:** Try the new On Demand API for twice the speed (>4GB/s).
* **Strict:** Full JSON and UTF-8 validation, lossless parsing. Performance with no compromises. * **Strict:** Full JSON and UTF-8 validation, lossless parsing. Performance with no compromises.
* **Automatic:** Selects a CPU-tailored parser at runtime. No configuration needed. * **Automatic:** Selects a CPU-tailored parser at runtime. No configuration needed.
* **Reliable:** From memory allocation to error handling, simdjson's design avoids surprises. * **Reliable:** From memory allocation to error handling, simdjson's design avoids surprises.
@@ -85,44 +84,16 @@ Usage documentation is available:
Performance results Performance results
------------------- -------------------
The simdjson library uses three-quarters less instructions than state-of-the-art parser [RapidJSON](https://rapidjson.org) and The simdjson library uses three-quarters less instructions than state-of-the-art parser [RapidJSON](https://rapidjson.org). To our knowledge, simdjson is the first fully-validating JSON parser
fifty percent less than sajson. To our knowledge, simdjson is the first fully-validating JSON parser
to run at [gigabytes per second](https://en.wikipedia.org/wiki/Gigabyte) (GB/s) on commodity processors. It can parse millions of JSON documents per second on a single core. to run at [gigabytes per second](https://en.wikipedia.org/wiki/Gigabyte) (GB/s) on commodity processors. It can parse millions of JSON documents per second on a single core.
The following figure represents parsing speed in GB/s for parsing various files The following figure represents parsing speed in GB/s for parsing various files
on an Intel Skylake processor (3.4 GHz) using the GNU GCC 9 compiler (with the -O3 flag). on an Intel Skylake processor (3.4 GHz) using the GNU GCC 10 compiler (with the -O3 flag).
We compare against the best and fastest C++ libraries. We compare against the best and fastest C++ libraries on benchmarks that load and process the data.
The simdjson library offers full unicode ([UTF-8](https://en.wikipedia.org/wiki/UTF-8)) validation and exact The simdjson library offers full unicode ([UTF-8](https://en.wikipedia.org/wiki/UTF-8)) validation and exact
number parsing. The RapidJSON library is tested in two modes: fast and number parsing.
exact number parsing. The sajson library offers fast (but not exact)
number parsing and partial unicode validation. In this data set, the file
sizes range from 65KB (github_events) all the way to 3.3GB (gsoc-2018).
Many files are mostly made of numbers: canada, mesh.pretty, mesh, random
and numbers: in such instances, we see lower JSON parsing speeds due to the
high cost of number parsing. The simdjson library uses exact number parsing which
is particular taxing.
<img src="doc/gbps.png" width="90%">
On a Skylake processor, the parsing speeds (in GB/s) of various processors on the twitter.json file are as follows, using again GNU GCC 9.1 (with the -O3 flag). The popular JSON for Modern C++ library is particularly slow: it obviously trades parsing speed for other desirable features.
| parser | GB/s |
| ------------------------------------- | ---- |
| simdjson | 2.5 |
| RapidJSON UTF8-validation | 0.29 |
| RapidJSON UTF8-valid., exact numbers | 0.28 |
| RapidJSON insitu, UTF8-validation | 0.41 |
| RapidJSON insitu, UTF8-valid., exact | 0.39 |
| sajson (insitu, dynamic) | 0.62 |
| sajson (insitu, static) | 0.88 |
| dropbox | 0.13 |
| fastjson | 0.27 |
| gason | 0.59 |
| ultrajson | 0.34 |
| jsmn | 0.25 |
| cJSON | 0.31 |
| JSON for Modern C++ (nlohmann/json) | 0.11 |
<img src="doc/rome.png" width="90%">
The simdjson library offers high speed whether it processes tiny files (e.g., 300 bytes) The simdjson library offers high speed whether it processes tiny files (e.g., 300 bytes)
or larger files (e.g., 3MB). The following plot presents parsing or larger files (e.g., 3MB). The following plot presents parsing
@@ -132,7 +103,6 @@ speed for [synthetic files over various sizes generated with a script](https://g
[All our experiments are reproducible](https://github.com/simdjson/simdjson_experiments_vldb2019). [All our experiments are reproducible](https://github.com/simdjson/simdjson_experiments_vldb2019).
You can go beyond 4 GB/s with our new [On Demand API](https://github.com/simdjson/simdjson/blob/master/doc/ondemand.md).
For NDJSON files, we can exceed 3 GB/s with [our multithreaded parsing functions](https://github.com/simdjson/simdjson/blob/master/doc/parse_many.md). For NDJSON files, we can exceed 3 GB/s with [our multithreaded parsing functions](https://github.com/simdjson/simdjson/blob/master/doc/parse_many.md).