On Demand Basics
================

On Demand is a new, faster simdjson API with all the ease-of-use you're used to. While it provides a
familiar DOM interface, under the hood it is anything but: it parses values *as you use them*. This
means you don't waste time parsing JSON you don't use, and you don't pay the cost of generating an
intermediate DOM tree.

This is an overview of what you need to know to use simdjson's On Demand API, with examples.

* [Including ondemand](#including-ondemand)
* [The Basics: Loading and Parsing JSON Documents](#the-basics-loading-and-parsing-json-documents)
* [Using the Parsed JSON](#using-the-parsed-json)

On Demand supports the same JSON standards and C++ compilers as simdjson's older DOM API. Refer to
the DOM docs for more information:

* [Requirements](basics.md#requirements)
* [Using simdjson as a CMake dependency](basics.md#using-simdjson-as-a-cmake-dependency)
* [Error Handling](basics.md#error-handling)
* [Error Handling Example](basics.md#error-handling-example)
* [Exceptions](basics.md#exceptions)
* [Thread Safety](basics.md#thread-safety)
* [Standard Compliance](basics.md#standard-compliance)
* [C++11 Support and string_view](basics.md#c11-support-and-string_view)
* [C++17 Support](basics.md#c17-support)
* [Backwards Compatibility](basics.md#backwards-compatibility)

For deeper information about the design and implementation of simdjson's On Demand API, refer to the
[design document](ondemand.md).

Including ondemand
------------------

To include simdjson, copy [simdjson.h](/singleheader/simdjson.h) and
[simdjson.cpp](/singleheader/simdjson.cpp) into your project.
Then include it in your project with:

```c++
#include "simdjson.h"
using namespace simdjson; // optional
using namespace simdjson::builtin; // optional, for ondemand
```

You can compile with:

```
c++ -march=native myproject.cpp simdjson.cpp
```

Note:
- Users on macOS and other platforms where the compiler does not default to C++11 or later should
  request a standard explicitly with the appropriate flag (e.g.,
  `c++ -march=native -std=c++17 myproject.cpp simdjson.cpp`).

### The native architecture flag

Passing `-march=native` to the compiler makes On Demand much faster by allowing it to use
optimizations specific to your machine. You cannot do this, however, if you are compiling code that
might be run on less advanced machines.

On Demand uses advanced architecture-specific code for many common processors to make JSON
preprocessing and string parsing faster. By default, however, most C++ compilers will compile to the
least common denominator (since the program could theoretically be run anywhere). Since On Demand is
inlined into your own code, it cannot use these advanced versions unless the compiler is told to
target them. `-march=native` says "target the current computer," which is a reasonable default for
the many applications that are compiled and run on the same processor.

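
To see what the flag changes on your own machine, you can ask the compiler which instruction sets it
was allowed to target. The sketch below is only an illustration: `enabled_simd_targets` is a
hypothetical helper, the macros it checks (`__AVX2__`, `__SSE4_2__`, `__ARM_NEON`) are the ones GCC
and Clang predefine, and it is not how simdjson itself selects an implementation.

```c++
#include <string>

// Hypothetical helper (not part of simdjson): lists the SIMD instruction
// sets this translation unit was compiled to target. The macros are
// predefined by GCC and Clang; other compilers may use different names.
std::string enabled_simd_targets() {
  std::string targets;
#if defined(__AVX2__)
  targets += "AVX2 ";
#endif
#if defined(__SSE4_2__)
  targets += "SSE4.2 ";
#endif
#if defined(__ARM_NEON)
  targets += "NEON ";
#endif
  if (targets.empty()) {
    targets = "baseline only"; // none of the checked sets were enabled
  }
  return targets;
}
```

Printing the result once in a build with `-march=native` and once without it should show the
difference on processors that support these extensions.
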
The Basics: Loading and Parsing JSON Documents
----------------------------------------------

The simdjson library offers a simple, DOM-like API, which you can access by creating an
`ondemand::parser` and calling the `iterate()` method:

```c++
ondemand::parser parser;
auto json = padded_string::load("twitter.json");
ondemand::document doc = parser.iterate(json); // load and parse a file
```

Or by creating a padded string (for efficiency reasons, simdjson requires a string with
SIMDJSON_PADDING bytes at the end) and calling `iterate()`:

```c++
ondemand::parser parser;
auto json = "[1,2,3]"_padded; // The _padded suffix creates a simdjson::padded_string instance
ondemand::document doc = parser.iterate(json); // parse a string
```

Documents Are Iterators
-----------------------

A `document` is *not* a fully-parsed JSON value; rather, it is an **iterator** over the JSON text.
This means that while you iterate an array, or search for a field in an object, it is actually
walking through the original JSON text, merrily reading commas and colons and brackets to make sure
you get where you're going. This is the key to On Demand's performance: since it's just an iterator,
it lets you parse values as you use them. And particularly, it lets you *skip* values you don't want
to use.

### Parser, Document and JSON Scope

Because a document is an iterator over the JSON text, both the JSON text and the parser must remain
alive (in scope) while you are using it. Further, a `parser` may have at most one document open at a
time, since it holds allocated memory used for the parsing.

During the `iterate` call, the original JSON text is never modified, only read. After you're done
with the document, the source (whether file or string) can be safely discarded.

For best performance, a `parser` instance should be reused over several files: otherwise you will
needlessly reallocate memory, an expensive process.
It is also possible to avoid memory allocations entirely during parsing when using simdjson.
[See our performance notes for details](performance.md).

Using the Parsed JSON
---------------------

Once you have a document, you can navigate it with idiomatic C++ iterators, operators and casts.
The following shows how to use the JSON when exceptions are enabled, but simdjson has full,
idiomatic support for users who avoid exceptions. See
[the simdjson DOM API's error handling documentation](basics.md#error-handling) for more.

* **Extracting Values:** You can cast a JSON element to a native type: `double(element)` or
  `double x = json_element`. This works for `double`, `uint64_t`, `int64_t`, `bool`,
  `ondemand::object` and `ondemand::array`. At this point, the number, string or boolean will be
  parsed, or the initial `[` or `{` will be verified. An exception is thrown if the cast is not
  possible.

  > IMPORTANT NOTE: values can only be parsed once. Since documents are *iterators*, once you have
  > parsed a value (such as by casting to double), you can't get at it again.
* **Field Access:** To get the value of the "foo" field in an object, use `object["foo"]`. This will
  scan through the object looking for the field with the matching string.

  > NOTE: simdjson does *not* unescape keys when matching. This is not generally a problem for
  > applications with well-defined key names (which generally do not use escapes). If you do need this
  > support, it's best to iterate through the object fields to find the field you are looking for.
  >
  > By default, field lookup is order-insensitive, so you can look up values in any order. However,
  > we still encourage you to look up fields in the order you expect them in the JSON, as it is still
  > much faster.
  >
  > If you want to enforce finding fields in order, you can use `object.find_field("foo")` instead.
  > This will only look forward, and will fail to find fields that appear earlier in the object. For
  > example, this will fail:
  >
  > ```c++
  > ondemand::parser parser;
  > auto json = R"(  { "x": 1, "y": 2 }  )"_padded;
  > auto doc = parser.iterate(json);
  > double y = doc.find_field("y"); // The cursor is now after the 2 (at })
  > double x = doc.find_field("x"); // This fails, because there are no more fields after "y"
  > ```
  >
  > By contrast, using the default (order-insensitive) lookup succeeds:
  >
  > ```c++
  > ondemand::parser parser;
  > auto json = R"(  { "x": 1, "y": 2 }  )"_padded;
  > auto doc = parser.iterate(json);
  > double y = doc["y"]; // The cursor is now after the 2 (at })
  > double x = doc["x"]; // Success: [] loops back around to find "x"
  > ```
* **Array Iteration:** To iterate through an array, use `for (auto value : array) { ... }`. This will
  step through each value in the JSON array.

  If you know the type of the value, you can cast it right there, too:
  `for (double value : array) { ... }`.
* **Object Iteration:** You can iterate through an object's fields, as well:
  `for (auto field : object) { ... }`
  - `field.unescaped_key()` will get you the key string.
  - `field.value()` will get you the value, which you can then use all these other methods on.
* **Array Index:** Because an array is forward-only, you cannot look up an array element by index.
  Instead, you will need to iterate through the array and keep an index yourself.

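
Keeping an index yourself can be sketched without simdjson at all. The snippet below is a toy model,
not the On Demand API: `element_at` is a hypothetical helper, and it works on any pair of
single-pass iterators (for instance `std::istream_iterator`, which, like an On Demand array, can
only move forward).

```c++
#include <cstddef>
#include <iterator>
#include <sstream>

// Hypothetical helper (not part of simdjson): walk a forward-only sequence
// and copy the element at position `wanted` into `out`.
// Returns false, leaving `out` untouched, if the sequence is too short.
template <typename InputIt, typename T>
bool element_at(InputIt first, InputIt last, std::size_t wanted, T &out) {
  std::size_t index = 0; // the sequence has no index, so we count ourselves
  for (; first != last; ++first, ++index) {
    if (index == wanted) {
      out = *first;
      return true;
    }
  }
  return false;
}
```

For example, given a `std::istringstream` holding `40.1 39.9 37.7 40.4` and a pair of
`std::istream_iterator<double>` over it, `element_at(first, last, 2, value)` stores `37.7` in
`value`. The elements before the requested one are consumed along the way, which mirrors the
"values can only be parsed once" rule above.
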
### Examples

The following code illustrates many of the above concepts:

```c++
ondemand::parser parser;
auto cars_json = R"( [
  { "make": "Toyota", "model": "Camry",  "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
  { "make": "Kia",    "model": "Soul",   "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
  { "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;

// Iterating through an array of objects
for (ondemand::object car : parser.iterate(cars_json)) {
  // Accessing a field by name
  std::cout << "Make/Model: " << std::string_view(car["make"]) << "/" << std::string_view(car["model"]) << std::endl;

  // Casting a JSON element to an integer
  uint64_t year = car["year"];
  std::cout << "- This car is " << 2020 - year << " years old." << std::endl;

  // Iterating through an array of floats
  double total_tire_pressure = 0;
  for (double tire_pressure : car["tire_pressure"]) {
    total_tire_pressure += tire_pressure;
  }
  std::cout << "- Average tire pressure: " << (total_tire_pressure / 4) << std::endl;
}
```

Here is a different example illustrating the same ideas:

```c++
ondemand::parser parser;
auto points_json = R"( [
    {  "12345" : {"x":12.34, "y":56.78, "z": 9998877}   },
    {  "12545" : {"x":11.44, "y":12.78, "z": 11111111}  }
  ] )"_padded;

// Parse and iterate through an array of objects
for (ondemand::object points : parser.iterate(points_json)) {
  for (auto point : points) {
    std::cout << "id: " << std::string_view(point.unescaped_key()) << ": (";
    std::cout << point.value()["x"].get_double() << ", ";
    std::cout << point.value()["y"].get_double() << ", ";
    std::cout << point.value()["z"].get_int64() << ")" << std::endl;
  }
}
```

And another one:

```c++
auto abstract_json = R"(
  { "str" : { "123" : {"abc" : 3.14 } } }
)"_padded;
ondemand::parser parser;
auto doc = parser.iterate(abstract_json);
std::cout << doc["str"]["123"]["abc"].get_double() << std::endl; // Prints 3.14
```

* **Extracting Values (without exceptions):** You can use a variant of `get()` that returns an
  error code to avoid exceptions.
  You first declare a variable of the appropriate type (`double`, `uint64_t`, `int64_t`, `bool`,
  `ondemand::object` or `ondemand::array`) and pass it by reference to `get()`, which gives you back
  an error code. For example:

  ```c++
  auto abstract_json = R"(
    { "str" : { "123" : {"abc" : 3.14 } } }
  )"_padded;
  ondemand::parser parser;

  double value;
  auto doc = parser.iterate(abstract_json);
  auto error = doc["str"]["123"]["abc"].get(value);
  if (error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
  std::cout << value << std::endl; // Prints 3.14
  ```
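
The shape of this pattern (declare the variable, pass it by reference, check the returned code) can
be shown in isolation. The sketch below is a toy stand-in, not simdjson code: `toy_error` and
`toy_get` are hypothetical names, and the parsing is delegated to the standard `std::strtod`.

```c++
#include <cerrno>
#include <cstdlib>

// Hypothetical error codes for this sketch only; simdjson defines its own.
enum class toy_error { success = 0, incorrect_type, number_out_of_range };

// Hypothetical stand-in for a `get(value)`-style call: parses `text` as a
// double, writes the result through `out`, and reports failure through the
// return value instead of throwing. `out` is left untouched on failure.
toy_error toy_get(const char *text, double &out) {
  char *end = nullptr;
  errno = 0;
  double parsed = std::strtod(text, &end);
  if (end == text || *end != '\0') {
    return toy_error::incorrect_type; // not (entirely) a number
  }
  if (errno == ERANGE) {
    return toy_error::number_out_of_range;
  }
  out = parsed;
  return toy_error::success;
}
```

A caller declares `double value;`, calls `toy_get("3.14", value)`, and only reads `value` when the
returned code is `toy_error::success`, which is the same discipline the `get()` example above
follows.
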