Update README.md

Daniel Lemire 2018-12-27 20:33:18 -05:00 committed by GitHub
parent 46ef59c679
commit 5c6e156ea0
1 changed file with 12 additions and 1 deletion


@@ -1,11 +1,13 @@
# simdjson : Parsing gigabytes of JSON per second
A C++ library to see how fast we can parse JSON with complete validation.
# A C++ library to see how fast we can parse JSON with complete validation.
Goal: Speed up the parsing of JSON per se.
Insight: Use commonly available SIMD instructions as much as possible.
Constraint: Do full validation (including character encoding).
## Requirements
- Linux or macOS (currently)
@@ -14,6 +16,15 @@ Insight: Use commonly available SIMD instructions as much as possible.
## Some performance results
In [Mison: A Fast JSON Parser for Data Analytics](http://www.vldb.org/pvldb/vol10/p1118-li.pdf) (VLDB 2017), Li et al. show how their SIMD-accelerated parser can achieve speeds slightly exceeding 2 GB/s by skipping as many of the input bytes as possible. Thus, by design, Mison does not attempt to validate the document. In contrast, we find that we can achieve similar speeds, but with full parsing:
<img src="doc/gbps.png" width="90%">
One key difference is that the Mison parser makes only moderate use of the SIMD instructions available on commodity processors, whereas we use them as much as possible.
We find that, for some inputs, we are limited in speed: for canada.json, marine_ik, mesh.json, mesh-pretty, about half of the processing time is due to number parsing (mostly floating-point numbers); for twitterescaped and random, string parsing is a burden.
We present the time (in cycles per input byte) needed to fully parse a JSON file (with error checking) and to collect some statistics about the document (e.g., the number of integers), for some JSON files. For these tests, we use an Intel processor with a Skylake microarchitecture. All results are single-threaded.
*Lower results are better.*