Updating the performance numbers. (#634)

* Updating the performance numbers.

* Updating with growing file sizes.
Daniel Lemire 2020-03-27 14:11:02 -04:00 committed by GitHub
parent 2e420169c3
commit 1b6a31b277
3 changed files with 36 additions and 14 deletions


@ -76,25 +76,47 @@ The simdjson library uses three-quarters less instructions than state-of-the-art
 fifty percent less than sajson. To our knowledge, simdjson is the first fully-validating JSON parser
 to run at gigabytes per second on commodity processors.
+The following figure represents parsing speed in GB/s for parsing various files
+on an Intel Skylake processor (3.4 GHz) using the GNU GCC 9 compiler (with the -O3 flag).
+We compare against the best and fastest C++ libraries.
+The simdjson library offers full unicode (UTF-8) validation and exact
+number parsing. The RapidJSON library is tested in two modes: fast and
+exact number parsing. The sajson library offers fast (but not exact)
+number parsing and partial unicode validation. In this data set, the file
+sizes range from 65KB (github_events) all the way to 3.3GB (gsoc-2018).
+Many files are mostly made of numbers: canada, mesh.pretty, mesh, random
+and numbers: in such instances, we see lower JSON parsing speeds due to the
+high cost of number parsing.
 <img src="doc/gbps.png" width="90%">
-On a Skylake processor, the parsing speeds (in GB/s) of various processors on the twitter.json file
-are as follows.
+On a Skylake processor, the parsing speeds (in GB/s) of various processors on the twitter.json file are as follows, using again GNU GCC 9.1 (with the -O3 flag). The popular JSON for Modern C++ library is particularly slow: it obviously trades parsing speed for other desirable features.
 | parser | GB/s |
 | ------------------------------------- | ---- |
-| simdjson | 2.2 |
-| RapidJSON encoding-validation | 0.51 |
-| RapidJSON encoding-validation, insitu | 0.71 |
-| sajson (insitu, dynamic) | 0.70 |
-| sajson (insitu, static) | 0.97 |
-| dropbox | 0.14 |
-| fastjson | 0.26 |
-| gason | 0.85 |
-| ultrajson | 0.42 |
-| jsmn | 0.28 |
-| cJSON | 0.34 |
-| JSON for Modern C++ (nlohmann/json) | 0.10 |
+| simdjson | 2.5 |
+| RapidJSON UTF8-validation | 0.29 |
+| RapidJSON UTF8-valid., exact numbers | 0.28 |
+| RapidJSON insitu, UTF8-validation | 0.41 |
+| RapidJSON insitu, UTF8-valid., exact | 0.39 |
+| sajson (insitu, dynamic) | 0.62 |
+| sajson (insitu, static) | 0.88 |
+| dropbox | 0.13 |
+| fastjson | 0.27 |
+| gason | 0.59 |
+| ultrajson | 0.34 |
+| jsmn | 0.25 |
+| cJSON | 0.31 |
+| JSON for Modern C++ (nlohmann/json) | 0.11 |
+The simdjson library offer high speed whether it processes tiny files (e.g., 300 bytes)
+or larger files (e.g., 3MB). The following plot presents parsing
+speed for [synthetic files over various sizes generated with a script](https://github.com/simdjson/simdjson_experiments_vldb2019/blob/master/experiments/growing/gen.py) on a 3.4 GHz Skylake processor (GNU GCC 9, -O3).
+<img src="doc/growing.png" width="90%">
+[All our experiments are reproducible](https://github.com/simdjson/simdjson_experiments_vldb2019).
 Real-world usage
 ----------------

Binary file not shown. Before: 49 KiB. After: 67 KiB.

BIN
doc/growing.png Normal file

Binary file not shown. After: 44 KiB.
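
For readers who want to see how a GB/s figure of the kind reported in the README hunk is derived, here is a minimal sketch. It is not the simdjson benchmark harness and it is not the linked gen.py script: it uses Python's standard `json` module as a stand-in parser, and the helper names (`make_document`, `throughput_gbps`, `best_parse_time`), the record shape, and the convention 1 GB = 10^9 bytes are all assumptions made for illustration.

```python
import json
import time


def make_document(n_records: int) -> bytes:
    """Build a synthetic JSON array of n_records small objects.

    The record shape is hypothetical; it is not what the linked gen.py emits.
    """
    records = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(n_records)]
    return json.dumps(records).encode("utf-8")


def throughput_gbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes parsed in `seconds` to GB/s, assuming 1 GB = 10**9 bytes."""
    return num_bytes / seconds / 1e9


def best_parse_time(doc: bytes, repeats: int = 5) -> float:
    """Return the fastest of `repeats` timed parses, to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        json.loads(doc)  # stand-in parser; swap in the parser under test
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Growing input sizes, in the spirit of the "growing file sizes" plot.
    for n in (10, 1_000, 100_000):
        doc = make_document(n)
        gbps = throughput_gbps(len(doc), best_parse_time(doc))
        print(f"{len(doc):>10} bytes  {gbps:.3f} GB/s")
```

Taking the minimum over several runs is one common way to suppress scheduling noise; whether the published numbers use the minimum, the mean, or another statistic is not stated in this commit.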