On Demand Basics
================
On Demand is a new, faster simdjson API with all the ease-of-use you are used to. While it provides a
familiar DOM interface, under the hood it is different: it is parsing values *as you use them.*
With On Demand, you do not waste time parsing JSON you do not use, and you do not pay the cost of generating
an intermediate DOM tree.
We provide an overview of what you need to know to use the simdjson On Demand API, with examples.

* [Including On Demand](#including-on-demand)
* [The Basics: Loading and Parsing JSON Documents](#the-basics-loading-and-parsing-json-documents)
* [Using the Parsed JSON](#using-the-parsed-json)

The On Demand API supports the same JSON standards and C++ compilers as simdjson's DOM API. Refer to the DOM docs for more information:

* [Requirements](basics.md#requirements)
* [Using simdjson as a CMake dependency](basics.md#using-simdjson-as-a-cmake-dependency)
* [Error Handling](basics.md#error-handling)
* [Error Handling Example](basics.md#error-handling-example)
* [Exceptions](basics.md#exceptions)
* [Thread Safety](basics.md#thread-safety)
* [Standard Compliance](basics.md#standard-compliance)
* [C++11 Support and string_view](basics.md#c11-support-and-string_view)
* [C++17 Support](basics.md#c17-support)
* [Backwards Compatibility](basics.md#backwards-compatibility)

For deeper information about the design and implementation of the simdjson On Demand API, refer to
the [design document](ondemand.md).

Including On Demand
-------------------
To include simdjson, copy [simdjson.h](/singleheader/simdjson.h) and [simdjson.cpp](/singleheader/simdjson.cpp)
into your project. Then include it in your project with:
```c++
#include "simdjson.h"
using namespace simdjson; // optional
```
You can generally compile with:
```
c++ -O3 myproject.cpp simdjson.cpp
```
Note:
- Users on macOS and other platforms where the default compiler does not enable C++11 (or later)
should request it with the appropriate flag (e.g., `c++ -march=native -std=c++17 myproject.cpp simdjson.cpp`).

The Basics: Loading and Parsing JSON Documents
----------------------------------------------
On Demand offers a simple, DOM-like API, which you can access by creating an
`ondemand::parser` and calling the `iterate()` method:
```c++
ondemand::parser parser;
auto json = padded_string::load("twitter.json");
ondemand::document doc = parser.iterate(json); // load and parse a file
```
Or by creating a padded string (for efficiency reasons, simdjson requires a string with
`SIMDJSON_PADDING` bytes at the end) and calling `iterate()`:
```c++
ondemand::parser parser;
auto json = "[1,2,3]"_padded; // The _padded suffix creates a simdjson::padded_string instance
ondemand::document doc = parser.iterate(json); // parse a string
```
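To see why the padding matters, here is a toy sketch in plain standard C++ (not simdjson code). It copies JSON text into a buffer with extra zeroed bytes, then scans it in fixed 16-byte blocks, the way SIMD code loads whole registers at a time. The names `kPadding`, `kBlock`, `make_padded` and `count_brackets` are hypothetical, and the padding value of 64 is an assumption for this sketch; simdjson defines its own `SIMDJSON_PADDING`.

```c++
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical values for this sketch only.
constexpr std::size_t kPadding = 64; // extra zeroed bytes after the JSON text
constexpr std::size_t kBlock = 16;   // pretend SIMD register width in bytes
static_assert(kBlock <= kPadding, "a block must fit inside the padding");

// Copy the JSON text into a buffer followed by kPadding zero bytes.
std::vector<char> make_padded(const std::string &json) {
  std::vector<char> buf(json.size() + kPadding, '\0');
  std::memcpy(buf.data(), json.data(), json.size());
  return buf;
}

// Count brackets and braces by loading whole kBlock-sized blocks.
// The last block may extend past the JSON text, but thanks to the padding
// it never reads outside the allocated buffer, so no tail check is needed.
std::size_t count_brackets(const std::vector<char> &buf, std::size_t json_len) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < json_len; i += kBlock) {
    char block[kBlock];
    std::memcpy(block, buf.data() + i, kBlock); // full-width load
    for (char c : block) {
      if (c == '[' || c == ']' || c == '{' || c == '}') count++;
    }
  }
  return count;
}
```

The zeroed padding bytes are never mistaken for JSON structure, so reading them is harmless; without the padding, the final load would have to be handled with a slower, length-checked path.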
Documents Are Iterators
-----------------------
A `document` is *not* a fully-parsed JSON value; rather, it is an **iterator** over the JSON text.
This means that while you iterate an array, or search for a field in an object, it is actually
walking through the original JSON text, merrily reading commas and colons and brackets to make sure
you get where you are going. This is the key to On Demand's performance: since it's just an iterator,
it lets you parse values as you use them. In particular, it lets you *skip* values you do not want
to use.
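A toy model can make the iterator idea concrete. The sketch below is a hypothetical `ToyArrayIterator` written in plain C++17, not simdjson's implementation. It only handles flat arrays of unsigned integers, converts a value to a number only when asked, can step over a value without converting it, and never moves backwards, so a value that has been read cannot be read again.

```c++
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical forward-only iterator over a flat JSON array such as "[10, 20, 30]".
class ToyArrayIterator {
 public:
  explicit ToyArrayIterator(std::string_view json) : text_(json) {
    skip_ws();
    if (pos_ < text_.size() && text_[pos_] == '[') pos_++; // enter the array
  }

  // Parse the next value as an unsigned integer; nullopt at the end of the array.
  std::optional<uint64_t> next_uint() {
    if (!at_value()) return std::nullopt;
    uint64_t v = 0;
    while (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
      v = v * 10 + static_cast<uint64_t>(text_[pos_] - '0');
      pos_++;
    }
    finish_value();
    return v;
  }

  // Step over the next value without converting it; false at the end of the array.
  bool skip() {
    if (!at_value()) return false;
    while (pos_ < text_.size() && text_[pos_] != ',' && text_[pos_] != ']') pos_++;
    finish_value();
    return true;
  }

 private:
  void skip_ws() {
    while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) pos_++;
  }
  bool at_value() {
    skip_ws();
    return pos_ < text_.size() && text_[pos_] != ']';
  }
  void finish_value() {
    skip_ws();
    if (pos_ < text_.size() && text_[pos_] == ',') pos_++; // step past the comma
  }

  std::string_view text_;
  std::size_t pos_ = 0; // the cursor only ever moves forward
};
```

For example, on `"[10, 20, 30]"`, calling `skip()` passes over `10` without converting it, the next two `next_uint()` calls return `20` and `30`, and a further call returns an empty optional because the cursor has reached `]`.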
### Parser, Document and JSON Scope
Because a document is an iterator over the JSON text, both the JSON text and the parser must
remain alive (in scope) while you are using it. Further, a `parser` may have at most
one document open at a time, since it holds allocated memory used for the parsing.

During the `iterate` call, the original JSON text is never modified; it is only read. After you are done
with the document, the source (whether file or string) can be safely discarded.

For best performance, a `parser` instance should be reused over several files: otherwise you will
needlessly reallocate memory, an expensive process. It is also possible to avoid memory allocations
entirely during parsing when using simdjson. [See our performance notes for details](performance.md).
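The benefit of reusing a parser can be illustrated with a hypothetical `ToyParser` (not simdjson's parser). It keeps one scratch buffer and grows it only when a document is larger than any it has processed before; the allocation counter exists only to make the effect visible.

```c++
#include <cstddef>
#include <memory>
#include <string_view>

// Hypothetical parser that owns a reusable scratch buffer.
class ToyParser {
 public:
  // Prepare to process `json`, reallocating only if this document is larger
  // than every document processed so far.
  void iterate(std::string_view json) {
    if (json.size() > capacity_) {
      buffer_ = std::make_unique<char[]>(json.size());
      capacity_ = json.size();
      allocations_++;
    }
    // A real parser would build its index of the document in buffer_ here.
  }

  std::size_t allocations() const { return allocations_; }

 private:
  std::unique_ptr<char[]> buffer_;
  std::size_t capacity_ = 0;
  std::size_t allocations_ = 0;
};
```

Feeding three documents to one `ToyParser`, largest first, costs a single allocation; creating a fresh `ToyParser` for each document would allocate every time.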

Using the Parsed JSON
---------------------
Once you have a document, you can navigate it with idiomatic C++ iterators, operators and casts.
The following describes how to use the JSON when exceptions are enabled; simdjson also has full, idiomatic
support for users who avoid exceptions. See [the simdjson DOM API's error handling documentation](basics.md#error-handling) for more.
* **Extracting Values:** You can cast a JSON element to a native type:
`double(element)` or `double x = json_element`. This works for `double`, `uint64_t`, `int64_t`, `bool`,
`ondemand::object` and `ondemand::array`. At this point, the number, string or boolean will be parsed,
or the initial `[` or `{` will be verified. An exception is thrown if the cast is not possible.
> IMPORTANT NOTE: values can only be parsed once. Since documents are *iterators*, once you have
> parsed a value (such as by casting to double), you cannot get at it again.
* **Field Access:** To get the value of the "foo" field in an object, use `object["foo"]`. This will
scan through the object looking for the field with the matching string.
> NOTE: simdjson does *not* unescape keys when matching. This is not generally a problem for
> applications with well-defined key names (which generally do not use escapes). If you do need this
> support, it's best to iterate through the object fields to find the field you are looking for.
>
> By default, field lookup is order-insensitive, so you can look up values in any order. However,
> we still encourage you to look up fields in the order you expect them to appear in the JSON, as
> lookups in document order are faster.
>
> If you want to enforce finding fields in order, you can use `object.find_field("foo")` instead.
> This will only look forward, and will fail to find fields in the wrong order: for example, this
> will fail:
>
> ```c++
> ondemand::parser parser;
> auto json = R"( { "x": 1, "y": 2 } )"_padded;
> auto doc = parser.iterate(json);
> double y = doc.find_field("y"); // The cursor is now after the 2 (at })
> double x = doc.find_field("x"); // This fails, because there are no more fields after "y"
> ```
>
> By contrast, using the default (order-insensitive) lookup succeeds:
>
> ```c++
> ondemand::parser parser;
> auto json = R"( { "x": 1, "y": 2 } )"_padded;
> auto doc = parser.iterate(json);
> double y = doc["y"]; // The cursor is now after the 2 (at })
> double x = doc["x"]; // Success: [] loops back around to find "x"
> ```
* **Array Iteration:** To iterate through an array, use `for (auto value : array) { ... }`. This will
step through each value in the JSON array.
If you know the type of the value, you can cast it right there, too! `for (double value : array) { ... }`.
* **Object Iteration:** You can iterate through an object's fields, as well: `for (auto field : object) { ... }`
- `field.unescaped_key()` will get you the key string.
- `field.value()` will get you the value, which you can then use all these other methods on.
* **Array Index:** Because a document is a forward-only iterator, you cannot look up an array element by index. Instead,
you will need to iterate through the array and keep an index yourself.
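The difference between ordered and unordered field lookup can be modeled with a hypothetical `ToyObject` that holds an object's fields as (key, value) pairs plus a cursor. This is a sketch of the behavior described above, assuming keys need no unescaping; it is not how simdjson implements `find_field()` or `operator[]`.

```c++
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object with a forward-moving cursor over its fields.
class ToyObject {
 public:
  explicit ToyObject(std::vector<std::pair<std::string, double>> fields)
      : fields_(std::move(fields)) {}

  // Ordered lookup: scans forward from the cursor and never goes back.
  std::optional<double> find_field(const std::string &key) {
    while (cursor_ < fields_.size()) {
      const auto &field = fields_[cursor_++];
      if (field.first == key) return field.second;
    }
    return std::nullopt; // reached the end of the object
  }

  // Unordered lookup: scans forward, then wraps around to the start,
  // examining each field at most once.
  std::optional<double> operator[](const std::string &key) {
    for (std::size_t n = 0; n < fields_.size(); n++) {
      std::size_t i = (cursor_ + n) % fields_.size();
      if (fields_[i].first == key) {
        cursor_ = i + 1; // leave the cursor just after the match
        return fields_[i].second;
      }
    }
    return std::nullopt;
  }

 private:
  std::vector<std::pair<std::string, double>> fields_;
  std::size_t cursor_ = 0;
};
```

With the fields `{"x": 1, "y": 2}`, `find_field("y")` followed by `find_field("x")` fails on the second call, while `["y"]` followed by `["x"]` succeeds because the unordered lookup wraps around. The wrap-around is also why looking fields up in document order is cheaper: each lookup then finds its key right after the cursor.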
### Examples
The following code illustrates many of the above concepts:
```c++
ondemand::parser parser;
auto cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
// Iterating through an array of objects
for (ondemand::object car : parser.iterate(cars_json)) {
  // Accessing a field by name
  cout << "Make/Model: " << std::string_view(car["make"]) << "/" << std::string_view(car["model"]) << endl;
  // Casting a JSON element to an integer
  uint64_t year = car["year"];
  cout << "- This car is " << 2020 - year << " years old." << endl;
  // Iterating through an array of floats
  double total_tire_pressure = 0;
  for (double tire_pressure : car["tire_pressure"]) {
    total_tire_pressure += tire_pressure;
  }
  cout << "- Average tire pressure: " << (total_tire_pressure / 4) << endl;
}
```
Here is a different example illustrating the same ideas:
```C++
ondemand::parser parser;
auto points_json = R"( [
{ "12345" : {"x":12.34, "y":56.78, "z": 9998877} },
{ "12545" : {"x":11.44, "y":12.78, "z": 11111111} }
] )"_padded;
// Parse and iterate through an array of objects
for (ondemand::object points : parser.iterate(points_json)) {
  for (auto point : points) {
    cout << "id: " << std::string_view(point.unescaped_key()) << ": (";
    cout << point.value()["x"].get_double() << ", ";
    cout << point.value()["y"].get_double() << ", ";
    cout << point.value()["z"].get_int64() << ")" << endl;
  }
}
```
And another one:
```C++
auto abstract_json = R"(
{ "str" : { "123" : {"abc" : 3.14 } } }
)"_padded;
ondemand::parser parser;
auto doc = parser.iterate(abstract_json);
cout << doc["str"]["123"]["abc"].get_double() << endl; // Prints 3.14
```
* **Extracting Values (without exceptions):** To avoid exceptions, you can use the form of `get()` that
reports errors through its return value. First declare a variable of the appropriate type (`double`,
`uint64_t`, `int64_t`, `bool`, `ondemand::object` or `ondemand::array`), then pass it by reference
to `get()`, which returns an error code. For example:
```c++
auto abstract_json = R"(
{ "str" : { "123" : {"abc" : 3.14 } } }
)"_padded;
ondemand::parser parser;
double value;
auto doc = parser.iterate(abstract_json);
auto error = doc["str"]["123"]["abc"].get(value);
if (error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
cout << value << endl; // Prints 3.14
```
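The error-code style above (declare the variable, pass it by reference to `get()`, then check the returned code) can be mimicked in a few lines. The `toy_error` enum and `toy_result` template below are hypothetical stand-ins written for this sketch; simdjson's own result and error types are richer.

```c++
// Hypothetical error codes for this sketch; simdjson has its own error_code enum.
enum class toy_error { SUCCESS, NO_SUCH_FIELD, INCORRECT_TYPE };

// Hypothetical result type: holds a value together with an error code.
template <typename T>
struct toy_result {
  T value{};
  toy_error error = toy_error::SUCCESS;

  // Copies the value into `out` only on success and always returns the
  // error code, so the caller checks one code instead of catching an exception.
  toy_error get(T &out) const {
    if (error == toy_error::SUCCESS) { out = value; }
    return error;
  }
};
```

In this sketch, `get()` leaves `out` untouched on failure, so the caller should only read the variable after checking that the returned code is `toy_error::SUCCESS`.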