
simdjson: Parsing gigabytes of JSON per second

JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to parse JSON 2.5x faster than RapidJSON and 25x faster than JSON for Modern C++.
  • Fast: Over 2.5x faster than commonly used production-grade JSON parsers.
  • Record Breaking Features: Minify JSON at 6 GB/s, validate UTF-8 at 13 GB/s, NDJSON at 3.5 GB/s.
  • Easy: First-class, easy to use and carefully documented APIs.
  • Beyond DOM: Try the new On Demand API for twice the speed (>4 GB/s).
  • Strict: Full JSON and UTF-8 validation, lossless parsing. Performance with no compromises.
  • Automatic: Selects a CPU-tailored parser at runtime. No configuration needed.
  • Reliable: From memory allocation to error handling, simdjson's design avoids surprises.
  • Peer Reviewed: Our research appears in venues such as the VLDB Journal and Software: Practice and Experience.
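The Strict bullet promises full UTF-8 validation. The following plain scalar C++ sketch shows what a validator has to reject. It is a byte-at-a-time loop, not simdjson's SIMD algorithm, and `is_valid_utf8` is a hypothetical name. It rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Returns true if `s` is well-formed UTF-8 (RFC 3629 rules).
bool is_valid_utf8(std::string_view s) {
  const auto* p = reinterpret_cast<const unsigned char*>(s.data());
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    unsigned char c = p[i];
    if (c < 0x80) { ++i; continue; }             // ASCII byte
    std::size_t len;
    std::uint32_t cp;
    if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return false;                           // stray continuation byte or 0xF8..0xFF
    if (n - i < len) return false;               // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char cc = p[i + k];
      if ((cc & 0xC0) != 0x80) return false;     // expected a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    // Each sequence length has a minimum code point; smaller values are overlong.
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len]) return false;
    if (cp > 0x10FFFF) return false;             // beyond the Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogates
    i += len;
  }
  return true;
}
```

For example, the two bytes 0xC0 0xAF decode to '/' but are an overlong encoding, so a strict validator must reject them.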

This library is part of the Awesome Modern C++ list.


Quick Start

The simdjson library is easily consumable with a single .h and .cpp file.

  1. Prerequisites: g++ (version 7 or better) or clang++ (version 6 or better), and a 64-bit system with a command-line shell (e.g., Linux, macOS, FreeBSD). We also support programming environments like Visual Studio and Xcode, but different steps are needed.

  2. Pull simdjson.h and simdjson.cpp into a directory, along with the sample file twitter.json.

    wget https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.h https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.cpp https://raw.githubusercontent.com/simdjson/simdjson/master/jsonexamples/twitter.json
    
  3. Create quickstart.cpp:

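    #include <iostream>
    #include "simdjson.h"

    int main(void) {
      simdjson::dom::parser parser;
      // load() reads and parses the file; converting the result to a
      // dom::element throws simdjson::simdjson_error if parsing fails.
      simdjson::dom::element tweets = parser.load("twitter.json");
      std::cout << tweets["search_metadata"]["count"] << " results." << std::endl;
    }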
    
  4. c++ -o quickstart quickstart.cpp simdjson.cpp

  5. ./quickstart

    100 results.
    

Documentation

Usage documentation is available:

  • Basics is an overview of how to use simdjson and its APIs.
  • Performance shows some more advanced scenarios and how to tune for them.
  • Implementation Selection describes runtime CPU detection and how you can work with it.
  • API contains the automatically generated API documentation.
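Runtime CPU detection, as described under Implementation Selection, follows a general pattern: probe the CPU once, then route calls through a function pointer. The following is a minimal sketch of that pattern in plain C++, not simdjson's own API. The function names are hypothetical, both kernels are scalar stand-ins so the sketch runs anywhere, and `__builtin_cpu_supports` is a GCC/Clang builtin available only on x86.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical kernels. A real library would build the "fast" one with
// AVX2 instructions; here both are scalar so the sketch is portable.
static std::size_t count_quotes_generic(std::string_view s) {
  std::size_t n = 0;
  for (char c : s) n += (c == '"');
  return n;
}
static std::size_t count_quotes_fast(std::string_view s) {
  return count_quotes_generic(s);  // placeholder for a SIMD version
}

using kernel_fn = std::size_t (*)(std::string_view);

// Probe the CPU and pick the best available kernel.
static kernel_fn select_kernel() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  if (__builtin_cpu_supports("avx2")) return count_quotes_fast;
#endif
  return count_quotes_generic;
}

// The function-local static is initialized once (thread-safe since C++11),
// so the detection cost is paid only on the first call.
std::size_t count_quotes(std::string_view s) {
  static const kernel_fn kernel = select_kernel();
  return kernel(s);
}
```

Callers never see which kernel was chosen, which is what "no configuration needed" amounts to in practice.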

Performance results

The simdjson library uses three-quarters fewer instructions than the state-of-the-art parser RapidJSON and fifty percent fewer than sajson. To our knowledge, simdjson is the first fully-validating JSON parser to run at gigabytes per second (GB/s) on commodity processors. It can parse millions of JSON documents per second on a single core.

The following figure shows parsing speed in GB/s for various files on an Intel Skylake processor (3.4 GHz) using the GNU GCC 9 compiler (with the -O3 flag). We compare against the best and fastest C++ libraries. The simdjson library offers full Unicode (UTF-8) validation and exact number parsing. The RapidJSON library is tested in two modes: fast and exact number parsing. The sajson library offers fast (but not exact) number parsing and partial Unicode validation. In this data set, the file sizes range from 65 KB (github_events) to 3.3 MB (gsoc-2018). Many files consist mostly of numbers (canada, mesh.pretty, mesh, random and numbers); for these, parsing speeds are lower because number parsing is expensive. The simdjson library uses exact number parsing, which is particularly taxing.
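Exact number parsing means returning the double nearest to the decimal text, which is what a correctly rounded `std::strtod` does. One well-known shortcut, Clinger's fast path, covers many common inputs. When the digits form an integer w ≤ 2^53 and there are at most 22 fractional digits, both w and the power of ten are exact doubles, so a single IEEE division gives the correctly rounded result. The sketch below illustrates that technique; it is not simdjson's number parser. It handles only plain decimals such as `-12.375`, does not validate JSON grammar, falls back to `std::strtod` for everything else, and assumes IEEE-754 doubles without excess precision (FLT_EVAL_METHOD == 0, typical on x86-64 and ARM64). `parse_decimal` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Parses a decimal like "-12.375", using the exact fast path when possible.
double parse_decimal(const std::string& text) {
  static const double kPow10[] = {1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
                                  1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
                                  1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
  std::size_t i = 0;
  bool negative = false;
  if (i < text.size() && text[i] == '-') { negative = true; ++i; }
  std::uint64_t w = 0;     // all digits with the decimal point removed
  int frac_digits = 0;     // digits after the decimal point
  int total_digits = 0;
  bool seen_point = false;
  for (; i < text.size(); ++i) {
    char c = text[i];
    if (c == '.' && !seen_point) { seen_point = true; continue; }
    if (c < '0' || c > '9') return std::strtod(text.c_str(), nullptr);   // e.g. exponent
    if (++total_digits > 19) return std::strtod(text.c_str(), nullptr);  // w could overflow
    w = w * 10 + static_cast<std::uint64_t>(c - '0');
    if (seen_point) ++frac_digits;
  }
  // Fast path: w and 10^frac_digits are exact, so one division rounds correctly.
  if (w <= (std::uint64_t{1} << 53) && frac_digits <= 22) {
    double d = static_cast<double>(w) / kPow10[frac_digits];
    return negative ? -d : d;
  }
  return std::strtod(text.c_str(), nullptr);  // slow but exact fallback
}
```

Inputs with long digit strings or large exponents leave the fast path, which is one reason number-heavy files such as canada parse more slowly.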

On a Skylake processor, the parsing speeds (in GB/s) of various parsers on the twitter.json file are as follows, again using GNU GCC 9.1 (with the -O3 flag). The popular JSON for Modern C++ library is particularly slow: it trades parsing speed for other desirable features.

| Parser                                 | GB/s |
|----------------------------------------|------|
| simdjson                               | 2.5  |
| RapidJSON UTF8-validation              | 0.29 |
| RapidJSON UTF8-valid., exact numbers   | 0.28 |
| RapidJSON insitu, UTF8-validation      | 0.41 |
| RapidJSON insitu, UTF8-valid., exact   | 0.39 |
| sajson (insitu, dynamic)               | 0.62 |
| sajson (insitu, static)                | 0.88 |
| dropbox                                | 0.13 |
| fastjson                               | 0.27 |
| gason                                  | 0.59 |
| ultrajson                              | 0.34 |
| jsmn                                   | 0.25 |
| cJSON                                  | 0.31 |
| JSON for Modern C++ (nlohmann/json)    | 0.11 |

The simdjson library offers high speed whether it processes tiny files (e.g., 300 bytes) or larger files (e.g., 3 MB). The following plot presents parsing speed for synthetic files of various sizes, generated with a script, on a 3.4 GHz Skylake processor (GNU GCC 9, -O3).
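Throughput in GB/s is the input size in bytes divided by the parse time in seconds, divided by 10^9. The following is a minimal timing sketch using `std::chrono`. `gigabytes_per_second` and `measure_gbps` are hypothetical helpers, the callable stands in for whatever parser is being measured, and keeping the best of several repetitions is one common convention rather than this project's documented methodology.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <string_view>

// GB/s = bytes / seconds / 1e9 (decimal gigabytes).
double gigabytes_per_second(std::size_t bytes, double seconds) {
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Runs parse(input) `repetitions` times and reports the best throughput.
template <typename Parse>
double measure_gbps(std::string_view input, Parse parse, int repetitions = 10) {
  double best_seconds = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repetitions; ++r) {
    auto start = std::chrono::steady_clock::now();
    parse(input);
    auto stop = std::chrono::steady_clock::now();
    best_seconds = std::min(best_seconds,
                            std::chrono::duration<double>(stop - start).count());
  }
  return gigabytes_per_second(input.size(), best_seconds);
}
```

As a sanity check on the arithmetic, at 2.5 GB/s a 1,000,000-byte document takes 0.4 ms to parse.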

All our experiments are reproducible.

You can go beyond 4 GB/s with our new On Demand API. For NDJSON files, we can exceed 3 GB/s with our multithreaded parsing functions.
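NDJSON stores one JSON document per line, so a large input can be cut at newline boundaries and each piece handed to a separate thread. The following is a minimal, standard-library-only sketch of that splitting step. `split_at_newlines` is a hypothetical helper, and this is not how simdjson's multithreaded functions are implemented.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `input` into at most `parts` chunks. Every chunk except possibly
// the last ends just after a '\n', so no document is cut in half.
std::vector<std::string_view> split_at_newlines(std::string_view input, std::size_t parts) {
  std::vector<std::string_view> chunks;
  if (parts == 0) parts = 1;
  const std::size_t target = input.size() / parts + 1;  // rough chunk size
  std::size_t begin = 0;
  while (begin < input.size()) {
    std::size_t end = begin + target;
    if (end >= input.size()) {
      end = input.size();
    } else {
      // Extend the chunk to the next line break.
      std::size_t nl = input.find('\n', end);
      end = (nl == std::string_view::npos) ? input.size() : nl + 1;
    }
    chunks.push_back(input.substr(begin, end - begin));
    begin = end;
  }
  return chunks;
}
```

In a threaded design, each worker would normally own its own parser object and process one chunk.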

Real-world usage

If you are planning to use simdjson in a product, please work from one of our releases.

Bindings and Ports of simdjson

We distinguish between "bindings" (which just wrap the C++ code) and "ports" to other programming languages (which reimplement everything).

About simdjson

The simdjson library takes advantage of modern microarchitectures by parallelizing with SIMD vector instructions, reducing branch misprediction, and reducing data dependencies to take advantage of each CPU core's multiple execution units.
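A central idea here is to classify a whole block of input at once and store the result as bitmasks, replacing per-character branches with bitwise arithmetic. The sketch below is a simplified, portable scalar illustration, not simdjson's code. It builds a 64-bit mask of quote positions for a 64-byte block, turns it into an "inside a string" mask with a prefix XOR (the simdjson paper computes this with a carry-less multiplication), and keeps only the structural characters `{}[]:,` that lie outside strings. It assumes the block contains no backslash escapes and does not begin inside a string. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string_view>

// Bit i is set when s[i] == c (only the first 64 bytes are examined).
std::uint64_t match_mask(std::string_view s, char c) {
  std::uint64_t m = 0;
  for (std::size_t i = 0; i < s.size() && i < 64; ++i)
    m |= static_cast<std::uint64_t>(s[i] == c) << i;  // no data-dependent branch
  return m;
}

// Prefix XOR: bit i becomes the XOR of bits 0..i. Applied to the quote mask,
// it sets every bit from an opening quote up to, but not including, its closing quote.
std::uint64_t prefix_xor(std::uint64_t x) {
  x ^= x << 1;
  x ^= x << 2;
  x ^= x << 4;
  x ^= x << 8;
  x ^= x << 16;
  x ^= x << 32;
  return x;
}

// Positions of { } [ ] : , that lie outside string literals.
std::uint64_t structural_mask(std::string_view block) {
  std::uint64_t in_string = prefix_xor(match_mask(block, '"'));
  std::uint64_t structural = 0;
  for (char c : {'{', '}', '[', ']', ':', ','})
    structural |= match_mask(block, c);
  return structural & ~in_string;
}
```

For `{"a:b":[1,2]}` the colon inside the string key is masked out, leaving bits 0, 6, 7, 9, 11 and 12 set.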

Some people enjoy reading our papers. A description of the design and implementation of simdjson is in our research article:

We have an in-depth paper focused on UTF-8 validation:

We also have an informal blog post providing some background and context.

For the video inclined: simdjson at QCon San Francisco 2019 (it was the best-voted talk; we're kinda proud of it).

Funding

The work is supported by the Natural Sciences and Engineering Research Council of Canada under grant number RGPIN-2017-03910.

Contributing to simdjson

Head over to CONTRIBUTING.md for information on contributing to simdjson, and HACKING.md for information on source, building, and architecture/design.

License

This code is made available under the Apache License 2.0.

Under Windows, we build some tools using the windows/dirent_portable.h file (which is outside our library code): it is under the liberal (business-friendly) MIT license.

For compilers that do not support C++17, we bundle the string-view library which is published under the Boost license (http://www.boost.org/LICENSE_1_0.txt). Like the Apache license, the Boost license is a permissive license allowing commercial redistribution.