On Demand Basics

On Demand is a new, faster simdjson API with all the ease of use you are used to. While it provides a familiar DOM interface, it works differently under the hood: it parses values as you use them. With On Demand, you do not waste time parsing JSON you do not use, and you do not pay the cost of generating an intermediate DOM tree.

We provide an overview of what you need to know to use the simdjson On Demand API, with examples.

The On Demand API supports the same JSON standards and C++ compilers as simdjson's DOM API. Refer to the DOM docs for more information.

For deeper information about the design and implementation of the simdjson On Demand API, refer to the design document.

Including On Demand

To include simdjson, copy simdjson.h and simdjson.cpp into your project. Then include it in your project with:

#include "simdjson.h"
using namespace simdjson; // optional

You can generally compile with:

c++ -O3 myproject.cpp simdjson.cpp

Note:

  • Users on macOS and other platforms where the compiler does not default to C++11 or later should request a standard explicitly with the appropriate flag (e.g., c++ -march=native -std=c++17 myproject.cpp simdjson.cpp).

The Basics: Loading and Parsing JSON Documents

The simdjson library offers a simple DOM-like API, which you can access by creating an ondemand::parser and calling the iterate() method:

ondemand::parser parser;
auto json = padded_string::load("twitter.json");
ondemand::document doc = parser.iterate(json); // load and parse a file

Or by creating a padded string (for efficiency reasons, simdjson requires a string with SIMDJSON_PADDING bytes at the end) and calling iterate():

ondemand::parser parser;
auto json = "[1,2,3]"_padded; // The _padded suffix creates a simdjson::padded_string instance
ondemand::document doc = parser.iterate(json); // parse a string
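
To make the padding requirement concrete, here is a minimal standard-library sketch of what a padded buffer looks like. It is not simdjson's padded_string: make_padded_copy and kAssumedPadding are hypothetical names, and the value 64 is an assumption chosen for illustration, not the library's actual SIMDJSON_PADDING.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical padding size, for illustration only. The real value is
// whatever SIMDJSON_PADDING is defined to be in your simdjson version.
constexpr std::size_t kAssumedPadding = 64;

// Copy `json` into a buffer that has `padding` extra zero bytes at the end,
// so that a wide read starting near the end of the text stays inside memory
// that we own.
std::vector<char> make_padded_copy(std::string_view json,
                                   std::size_t padding = kAssumedPadding) {
  std::vector<char> buffer(json.size() + padding, '\0');
  std::copy(json.begin(), json.end(), buffer.begin());
  return buffer;
}
```

In real code, prefer padded_string::load or the _padded suffix shown above, which take care of the padding for you.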

Documents Are Iterators

A document is not a fully-parsed JSON value; rather, it is an iterator over the JSON text. This means that while you iterate an array, or search for a field in an object, it is actually walking through the original JSON text, merrily reading commas and colons and brackets to make sure you get where you are going. This is the key to On Demand's performance: since it's just an iterator, it lets you parse values as you use them. And particularly, it lets you skip values you do not want to use.
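
As a rough illustration of what skipping a value means, the sketch below walks over JSON text and finds where a value ends without converting any number or string inside it. It is a toy model written against the standard library only, not simdjson's implementation; skip_value is a hypothetical helper, and it assumes well-formed input.

```cpp
#include <cstddef>
#include <string_view>

// Toy model: find the end of the JSON value that starts at `pos`.
// Returns the index just past a string or container, or the index of the
// separator or closing bracket that ends a number, true, false or null.
// Assumes `json` is well formed and `pos` is the first character of a value.
std::size_t skip_value(std::string_view json, std::size_t pos) {
  std::size_t depth = 0;
  bool in_string = false;
  for (std::size_t i = pos; i < json.size(); ++i) {
    const char c = json[i];
    if (in_string) {
      if (c == '\\') {
        ++i;  // skip the escaped character
      } else if (c == '"') {
        in_string = false;
        if (depth == 0) { return i + 1; }  // a top-level string just ended
      }
      continue;
    }
    switch (c) {
      case '"': in_string = true; break;
      case '[': case '{': ++depth; break;
      case ']': case '}':
        if (depth == 0) { return i; }        // end of the enclosing container
        if (--depth == 0) { return i + 1; }  // the skipped container closed
        break;
      case ',':
        if (depth == 0) { return i; }        // a scalar ended at this comma
        break;
      default: break;
    }
  }
  return json.size();
}
```

Note that the loop only looks at quotes, brackets and commas. The digits of a number or the contents of a string are never interpreted, which is the sense in which unused values are cheap.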

Parser, Document and JSON Scope

Because a document is an iterator over the JSON text, both the JSON text and the parser must remain alive (in scope) while you are using it. Further, a parser may have at most one document open at a time, since it holds allocated memory used for the parsing.

During the iterate call, the original JSON text is never modified--only read. After you are done with the document, the source (whether file or string) can be safely discarded.

For best performance, a parser instance should be reused over several files: otherwise you will needlessly reallocate memory, an expensive process. It is also possible to avoid memory allocations entirely during parsing when using simdjson. See our performance notes for details.
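
The toy model below, which does not use simdjson at all, shows why reuse helps. A long-lived object keeps its buffer between inputs, while a freshly created one has to grow its buffer every time. ToyParser, prepare and growths are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a parser that owns a scratch buffer.
struct ToyParser {
  std::vector<char> scratch;
  std::size_t growths = 0;  // how many times the buffer had to grow

  // Make sure the scratch buffer can hold an input of `input_size` bytes.
  void prepare(std::size_t input_size) {
    if (scratch.size() < input_size) {
      scratch.resize(input_size);
      ++growths;
    }
  }
};

// Count buffer growths when a single parser handles every input.
std::size_t growths_when_reused(const std::vector<std::size_t>& sizes) {
  ToyParser parser;
  for (std::size_t n : sizes) { parser.prepare(n); }
  return parser.growths;
}

// Count buffer growths when each input gets a brand-new parser.
std::size_t growths_when_recreated(const std::vector<std::size_t>& sizes) {
  std::size_t total = 0;
  for (std::size_t n : sizes) {
    ToyParser parser;
    parser.prepare(n);
    total += parser.growths;
  }
  return total;
}
```

For inputs of 100, 50 and 80 bytes, the reused toy parser grows its buffer once, while recreating it for each input grows a buffer three times.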

Using the Parsed JSON

Once you have a document, you can navigate it with idiomatic C++ iterators, operators and casts. The following shows how to use the JSON when exceptions are enabled, but simdjson has full, idiomatic support for users who avoid exceptions. See the simdjson DOM API's error handling documentation for more.

  • Extracting Values: You can cast a JSON element to a native type: double(element) or double x = json_element. This works for double, uint64_t, int64_t, bool, ondemand::object and ondemand::array. At this point, the number, string or boolean will be parsed, or the initial [ or { will be verified. An exception is thrown if the cast is not possible.

    IMPORTANT NOTE: values can only be parsed once. Since documents are iterators, once you have parsed a value (such as by casting to double), you cannot get at it again.

  • Field Access: To get the value of the "foo" field in an object, use object["foo"]. This will scan through the object looking for the field with the matching string.

    NOTE: simdjson does not unescape keys when matching. This is not generally a problem for applications with well-defined key names (which generally do not use escapes). If you do need this support, it's best to iterate through the object fields to find the field you are looking for.

    By default, field lookup is order-insensitive, so you can look up values in any order. However, we still encourage you to look up fields in the order you expect them in the JSON, as it is still much faster.

    If you want to enforce finding fields in order, you can use object.find_field("foo") instead. This will only look forward, and will fail to find fields in the wrong order. For example, the following fails:

    ondemand::parser parser;
    auto json = R"(  { "x": 1, "y": 2 }  )"_padded;
    auto doc = parser.iterate(json);
    double y = doc.find_field("y"); // The cursor is now after the 2 (at })
    double x = doc.find_field("x"); // This fails, because there are no more fields after "y"
    

    By contrast, using the default (order-insensitive) lookup succeeds:

    ondemand::parser parser;
    auto json = R"(  { "x": 1, "y": 2 }  )"_padded;
    auto doc = parser.iterate(json);
    double y = doc["y"]; // The cursor is now after the 2 (at })
    double x = doc["x"]; // Success: [] loops back around to find "x"
    
  • Array Iteration: To iterate through an array, use for (auto value : array) { ... }. This will step through each value in the JSON array.

    If you know the type of the value, you can cast it right there, too! for (double value : array) { ... }.

  • Object Iteration: You can iterate through an object's fields, as well: for (auto field : object) { ... }

    • field.unescaped_key() will get you the key string.
    • field.value() will get you the value, which you can then use all these other methods on.
  • Array Index: Because it is forward-only, you cannot look up an array element by index. Instead, you will need to iterate through the array and keep an index yourself.
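
The difference between the forward-only and the order-insensitive field lookups described above can be modeled with a cursor over a list of fields. This is a toy sketch using only the standard library, not simdjson's implementation; ToyObject, find_forward and find_any_order are hypothetical names standing in for the object, find_field and [] respectively.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Toy stand-in for an object being iterated: the fields in document order
// plus a cursor marking the next field that has not been visited yet.
struct ToyObject {
  std::vector<std::pair<std::string_view, double>> fields;
  std::size_t cursor = 0;

  // Forward-only lookup (models find_field): scan from the cursor to the end.
  std::optional<double> find_forward(std::string_view key) {
    for (std::size_t i = cursor; i < fields.size(); ++i) {
      if (fields[i].first == key) {
        cursor = i + 1;
        return fields[i].second;
      }
    }
    cursor = fields.size();  // the failed search consumed the remaining fields
    return std::nullopt;
  }

  // Order-insensitive lookup (models []): scan from the cursor, wrap around
  // to the first field, and give up after one full lap.
  std::optional<double> find_any_order(std::string_view key) {
    const std::size_t n = fields.size();
    for (std::size_t step = 0; step < n; ++step) {
      const std::size_t i = (cursor + step) % n;
      if (fields[i].first == key) {
        cursor = i + 1;
        return fields[i].second;
      }
    }
    return std::nullopt;
  }
};
```

With the fields "x" then "y", asking for "y" and then "x" fails on the second call with find_forward and succeeds with find_any_order. The wrap-around search has to revisit fields it already passed, which is consistent with the advice to look fields up in document order when you can.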

Examples

The following code illustrates many of the above concepts:

ondemand::parser parser;
auto cars_json = R"( [
  { "make": "Toyota", "model": "Camry",  "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
  { "make": "Kia",    "model": "Soul",   "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
  { "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;

// Iterating through an array of objects
for (ondemand::object car : parser.iterate(cars_json)) {
  // Accessing a field by name
  cout << "Make/Model: " << std::string_view(car["make"]) << "/" << std::string_view(car["model"]) << endl;

  // Casting a JSON element to an integer
  uint64_t year = car["year"];
  cout << "- This car is " << 2020 - year << " years old." << endl;

  // Iterating through an array of floats
  double total_tire_pressure = 0;
  for (double tire_pressure : car["tire_pressure"]) {
    total_tire_pressure += tire_pressure;
  }
  cout << "- Average tire pressure: " << (total_tire_pressure / 4) << endl;
}

Here is a different example illustrating the same ideas:

ondemand::parser parser;
auto points_json = R"( [
    {  "12345" : {"x":12.34, "y":56.78, "z": 9998877}   },
    {  "12545" : {"x":11.44, "y":12.78, "z": 11111111}  }
  ] )"_padded;

// Parse and iterate through an array of objects
for (ondemand::object points : parser.iterate(points_json)) {
  for (auto point : points) {
    cout << "id: " << std::string_view(point.unescaped_key()) << ": (";
    cout << point.value()["x"].get_double() << ", ";
    cout << point.value()["y"].get_double() << ", ";
    cout << point.value()["z"].get_int64() << endl;
  }
}

And another one:

auto abstract_json = R"(
  { "str" : { "123" : {"abc" : 3.14 } } }
)"_padded;
ondemand::parser parser;
auto doc = parser.iterate(abstract_json);
cout << doc["str"]["123"]["abc"].get_double() << endl; // Prints 3.14

  • Extracting Values (without exceptions): You can use a variant of get() that returns an error code instead of throwing. First declare a variable of the appropriate type (double, uint64_t, int64_t, bool, ondemand::object or ondemand::array), then pass it by reference to get(), which gives you back an error code. For example:

    auto abstract_json = R"(
      { "str" : { "123" : {"abc" : 3.14 } } }
    )"_padded;
    ondemand::parser parser;
    
    double value;
    auto doc = parser.iterate(abstract_json);
    auto error = doc["str"]["123"]["abc"].get(value);
    if (error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
    cout << value << endl; // Prints 3.14