John Keiser 90a7503181 Rename pj -> doc, fix a few other idioms 2020-03-27 09:22:46 -07:00

simdjson : Parsing gigabytes of JSON per second

JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to parse JSON 2.5x faster than anything else out there.
  • Fast: Over 2.5x faster than other production-grade JSON parsers.
  • Easy: First-class, easy-to-use API.
  • Strict: Full JSON and UTF-8 validation, lossless parsing. Performance with no compromises.
  • Automatic: Selects a CPU-tailored parser at runtime. No configuration needed.
  • Reliable: From memory allocation to error handling, simdjson's design avoids surprises.

This library is part of the Awesome Modern C++ list.

Quick Start

The simdjson library is easily consumable with a single .h and .cpp file.

  1. Prerequisites: g++ or clang++.

  2. Pull simdjson.h and simdjson.cpp into a directory, along with the sample file twitter.json.

    wget https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.h https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.cpp https://raw.githubusercontent.com/simdjson/simdjson/master/jsonexamples/twitter.json
    
  3. Create parser.cpp:

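    #include <iostream>
    #include "simdjson.h"
    int main(void) {
      simdjson::document::parser parser;
      // Load and parse the sample file; tweets refers to the parsed document.
      simdjson::document& tweets = parser.load("twitter.json");
      // Print the "count" field nested under "search_metadata".
      std::cout << tweets["search_metadata"]["count"] << " results." << std::endl;
    }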
    
  4. c++ -o parser parser.cpp simdjson.cpp -std=c++17

  5. ./parser

    100 results.
    

Documentation

Usage documentation is available:

  • Basics is an overview of how to use simdjson and its APIs.
  • Performance shows some more advanced scenarios and how to tune for them.
  • Implementation Selection describes runtime CPU detection and how you can work with it.
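
Runtime selection of a CPU-tailored routine generally follows a detect-then-dispatch pattern. The block below is a minimal sketch of that general pattern, not simdjson's actual implementation. The names `implementation`, `count_quotes_generic`, `count_quotes_avx2`, `cpu_has_avx2`, and `active_implementation` are hypothetical, and `__builtin_cpu_supports` is a GCC/Clang builtin that is assumed to be available only on x86 targets.

```cpp
#include <cassert>
#include <cstddef>

// Signature shared by every variant of the (hypothetical) kernel.
using kernel_fn = std::size_t (*)(const char*, std::size_t);

// Portable fallback: counts '"' bytes one at a time.
std::size_t count_quotes_generic(const char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; i++) { n += (data[i] == '"'); }
  return n;
}

// Stand-in for a variant built with AVX2 enabled. In this sketch it simply
// forwards to the fallback so the example stays portable.
std::size_t count_quotes_avx2(const char* data, std::size_t len) {
  return count_quotes_generic(data, len);
}

// One entry per supported instruction set (hypothetical type).
struct implementation {
  const char* name;
  kernel_fn count_quotes;
};

// Asks the CPU at runtime whether AVX2 is usable; false on non-x86 builds.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2");
#else
  return false;
#endif
}

// Detects once, on first use, and returns the same choice afterwards.
const implementation& active_implementation() {
  static const implementation chosen =
      cpu_has_avx2() ? implementation{"avx2", count_quotes_avx2}
                     : implementation{"generic", count_quotes_generic};
  return chosen;
}
```

A caller would write `active_implementation().count_quotes(data, len)` and never name an instruction set directly, which is what makes the selection need no configuration.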

Performance results

The simdjson library uses three-quarters fewer instructions than the state-of-the-art parser RapidJSON and fifty percent fewer than sajson. To our knowledge, simdjson is the first fully-validating JSON parser to run at gigabytes per second on commodity processors.

On a Skylake processor, the parsing speeds (in GB/s) of various parsers on the twitter.json file are as follows.

| parser                                | GB/s |
| ------------------------------------- | ---- |
| simdjson                              | 2.2  |
| RapidJSON encoding-validation         | 0.51 |
| RapidJSON encoding-validation, insitu | 0.71 |
| sajson (insitu, dynamic)              | 0.70 |
| sajson (insitu, static)               | 0.97 |
| dropbox                               | 0.14 |
| fastjson                              | 0.26 |
| gason                                 | 0.85 |
| ultrajson                             | 0.42 |
| jsmn                                  | 0.28 |
| cJSON                                 | 0.34 |
| JSON for Modern C++ (nlohmann/json)   | 0.10 |
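
For readers who want to produce a comparable GB/s figure on their own machine, the following is a rough sketch of how such a throughput number can be computed. It is not the benchmark harness used for the results above. The helper names `gigabytes_per_second` and `measure_gbps` are hypothetical, and the sketch assumes 1 GB = 10^9 bytes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Converts a byte count and an elapsed time into GB/s (assuming 1 GB = 1e9 bytes).
double gigabytes_per_second(std::size_t bytes, double seconds) {
  assert(seconds > 0.0);
  return (static_cast<double>(bytes) / 1e9) / seconds;
}

// Runs `work` the given number of times and reports the average throughput,
// where `bytes` is the input size processed by a single call to `work`.
// The work must take long enough for the clock to register a nonzero time.
template <typename Work>
double measure_gbps(std::size_t bytes, int repetitions, Work&& work) {
  assert(repetitions > 0);
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; i++) { work(); }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return gigabytes_per_second(bytes * static_cast<std::size_t>(repetitions),
                              elapsed.count());
}
```

Passing a lambda that parses an in-memory copy of the file keeps disk I/O out of the measurement. Repeating the work several times reduces timer noise on small inputs.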

Real-world usage

If you are planning to use simdjson in a product, please work from one of our releases.

Bindings and Ports of simdjson

We distinguish between "bindings" (which just wrap the C++ code) and a port to another programming language (which reimplements everything).

About simdjson

The simdjson library takes advantage of modern microarchitectures: it parallelizes with SIMD vector instructions, reduces branch misprediction, and reduces data dependency to make use of each CPU's multiple execution cores.
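
To give a feel for what reducing branch misprediction can look like, here is a small scalar sketch, not code from simdjson. It turns a block of up to 64 bytes into a bitmask of matching positions without a data-dependent branch per character. The function names `match_mask` and `count_matches` are hypothetical. A SIMD version would do the comparison on many bytes per instruction, which this portable sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a bitmask in which bit i is set when block[i] == target.
// The comparison result is used as a 0/1 value rather than as a branch
// condition, so the only branch is the predictable loop test.
std::uint64_t match_mask(const char* block, std::size_t len, char target) {
  assert(len <= 64);
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < len; i++) {
    mask |= static_cast<std::uint64_t>(block[i] == target) << i;
  }
  return mask;
}

// Counts the set bits by clearing the lowest set bit on each iteration.
int count_matches(std::uint64_t mask) {
  int n = 0;
  while (mask != 0) {
    mask &= mask - 1;
    n++;
  }
  return n;
}
```

Once the positions of interest live in a single 64-bit word, later steps can work on that word with a handful of bit operations instead of revisiting each character.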

Some people enjoy reading our paper: a description of the design and implementation of simdjson is in our research article in the VLDB Journal: Geoff Langdale, Daniel Lemire, Parsing Gigabytes of JSON per Second, VLDB Journal 28 (6), 2019.

We also have an informal blog post providing some background and context.

For the video inclined, there is a talk: simdjson at QCon San Francisco 2019 (it was the best voted talk, we're kinda proud of it).

Funding

The work is supported by the Natural Sciences and Engineering Research Council of Canada under grant number RGPIN-2017-03910.

Contributing to simdjson

Head over to CONTRIBUTING.md for information on contributing to simdjson, and HACKING.md for information on source, building, and architecture/design.

License

This code is made available under the Apache License 2.0.

Under Windows, we build some tools using the windows/dirent_portable.h file (which is outside our library code): it is under the liberal (business-friendly) MIT license.