520 lines
20 KiB
Markdown
520 lines
20 KiB
Markdown
The Document-Object-Model (DOM) front-end
|
|
==========
|
|
|
|
An overview of what you need to know to use simdjson, with examples.
|
|
|
|
* [DOM vs On Demand](#dom-vs-ondemand)
|
|
* [The Basics: Loading and Parsing JSON Documents](#the-basics-loading-and-parsing-json-documents)
|
|
* [Using the Parsed JSON](#using-the-parsed-json)
|
|
* [C++17 Support](#c17-support)
|
|
* [JSON Pointer](#json-pointer)
|
|
* [Error Handling](#error-handling)
|
|
* [Error Handling Example](#error-handling-example)
|
|
* [Exceptions](#exceptions)
|
|
* [Tree Walking and JSON Element Types](#tree-walking-and-json-element-types)
|
|
|
|
DOM vs On Demand
|
|
----------------------------------------------
|
|
|
|
The simdjson library offers two distinct approaches on how to access a JSON document. We support
|
|
a conventional Document-Object-Model (DOM) front-end. In such a scenario, the JSON document is
|
|
entirely parsed, validated and materialized in memory as the first step. The programmer may
|
|
then access the parsed data using this in-memory model.
|
|
|
|
The Basics: Loading and Parsing JSON Documents using the DOM front-end
|
|
----------------------------------------------
|
|
|
|
The simdjson library offers a simple DOM tree API, which you can access by creating a
|
|
`dom::parser` and calling the `load()` method:
|
|
|
|
```c++
|
|
dom::parser parser;
|
|
dom::element doc = parser.load(filename); // load and parse a file
|
|
```
|
|
|
|
Or by creating a padded string (for efficiency reasons, simdjson requires a string with
|
|
SIMDJSON_PADDING bytes at the end) and calling `parse()`:
|
|
|
|
```c++
|
|
dom::parser parser;
|
|
dom::element doc = parser.parse("[1,2,3]"_padded); // parse a string, the _padded suffix creates a simdjson::padded_string instance
|
|
```
|
|
|
|
The parsed document resulting from the `parser.load` and `parser.parse` calls depends on the `parser` instance. Thus the `parser` instance must remain in scope. Furthermore, you must have at most one parsed document in play per `parser` instance.
|
|
You cannot copy a `parser` instance, you may only move it.
|
|
|
|
If you need to keep a document around long term, you can keep or move the parser instance. Note that moving a parser instance, or keeping one in a movable data structure like vector or map, can cause any outstanding `element`, `object` or `array` instances to be invalidated. If you need to store a parser in a movable data structure, you should use a `std::unique_ptr` to avoid this invalidation(e.g., `std::unique_ptr<dom::parser> parser(new dom::parser{})`).
|
|
|
|
During the`load` or `parse` calls, neither the input file nor the input string are ever modified. After calling `load` or `parse`, the source (either a file or a string) can be safely discarded. All of the JSON data is stored in the `parser` instance. The parsed document is also immutable in simdjson: you do not modify it by accessing it.
|
|
|
|
For best performance, a `parser` instance should be reused over several files: otherwise you will needlessly reallocate memory, an expensive process. It is also possible to avoid entirely memory allocations during parsing when using simdjson. [See our performance notes for details](performance.md).
|
|
|
|
If you need a lower-level interface, you may call the function `parser.parse(const char * p, size_t l)` on a pointer `p` while specifying the
|
|
length of your input `l` in bytes. To see how to get the very best performance from a low-level approach, you way want to read our [performance notes](https://github.com/simdjson/simdjson/blob/master/doc/performance.md#padding-and-temporary-copies) on this topic (see the Padding and Temporary Copies section).
|
|
|
|
|
|
Using the Parsed JSON
|
|
---------------------
|
|
|
|
Once you have an element, you can navigate it with idiomatic C++ iterators, operators and casts.
|
|
|
|
* **Extracting Values (with exceptions):** You can cast a JSON element to a native type: `double(element)` or
|
|
`double x = json_element`. This works for double, uint64_t, int64_t, bool,
|
|
dom::object and dom::array. An exception is thrown if the cast is not possible.
|
|
* **Extracting Values (without exceptions):** You can use a variant usage of `get()` with error codes to avoid exceptions. You first declare the variable of the appropriate type (`double`, `uint64_t`, `int64_t`, `bool`,
|
|
`dom::object` and `dom::array`) and pass it by reference to `get()` which gives you back an error code: e.g.,
|
|
```c++
|
|
simdjson::error_code error;
|
|
simdjson::padded_string numberstring = "1.2"_padded; // our JSON input ("1.2")
|
|
simdjson::dom::parser parser;
|
|
double value; // variable where we store the value to be parsed
|
|
error = parser.parse(numberstring).get(value);
|
|
if (error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
|
|
std::cout << "I parsed " << value << " from " << numberstring.data() << std::endl;
|
|
```
|
|
* **Field Access:** To get the value of the "foo" field in an object, use `object["foo"]`.
|
|
* **Array Iteration:** To iterate through an array, use `for (auto value : array) { ... }`. If you
|
|
know the type of the value, you can cast it right there, too! `for (double value : array) { ... }`
|
|
* **Object Iteration:** You can iterate through an object's fields, too: `for (auto [key, value] : object)`
|
|
* **Array Index:** To get at an array value by index, use the at() method: `array.at(0)` gets the
|
|
first element.
|
|
> Note that array[0] does not compile, because implementing [] gives the impression indexing is a
|
|
> O(1) operation, which it is not presently in simdjson. Instead, you should iterate over the elements
|
|
> using a for-loop, as in our examples.
|
|
* **Array and Object size** Given an array or an object, you can get its size (number of elements or keys)
|
|
with the `size()` method.
|
|
* **Checking an Element Type:** You can check an element's type with `element.type()`. It
|
|
returns an `element_type` with values such as `simdjson::dom::element_type::ARRAY`, `simdjson::dom::element_type::OBJECT`, `simdjson::dom::element_type::INT64`, `simdjson::dom::element_type::UINT64`,`simdjson::dom::element_type::DOUBLE`, `simdjson::dom::element_type::BOOL` or, `simdjson::dom::element_type::NULL_VALUE`.
|
|
* **Output to streams and strings:** Given a document or an element (or node) out of a JSON document, you can output a minified string version using the C++ stream idiom (`out << element`). You can also request the construction of a minified string version (`simdjson::minify(element)`).
|
|
|
|
### Examples
|
|
|
|
The following code illustrates all of the above:
|
|
|
|
```c++
|
|
auto cars_json = R"( [
|
|
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
|
|
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
|
|
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
|
|
] )"_padded;
|
|
dom::parser parser;
|
|
|
|
// Iterating through an array of objects
|
|
for (dom::object car : parser.parse(cars_json)) {
|
|
// Accessing a field by name
|
|
cout << "Make/Model: " << car["make"] << "/" << car["model"] << endl;
|
|
|
|
// Casting a JSON element to an integer
|
|
uint64_t year = car["year"];
|
|
cout << "- This car is " << 2020 - year << "years old." << endl;
|
|
|
|
// Iterating through an array of floats
|
|
double total_tire_pressure = 0;
|
|
for (double tire_pressure : car["tire_pressure"]) {
|
|
total_tire_pressure += tire_pressure;
|
|
}
|
|
cout << "- Average tire pressure: " << (total_tire_pressure / 4) << endl;
|
|
|
|
// Writing out all the information about the car
|
|
for (auto field : car) {
|
|
cout << "- " << field.key << ": " << field.value << endl;
|
|
}
|
|
}
|
|
```
|
|
|
|
Here is a different example illustrating the same ideas:
|
|
|
|
```C++
|
|
auto abstract_json = R"( [
|
|
{ "12345" : {"a":12.34, "b":56.78, "c": 9998877} },
|
|
{ "12545" : {"a":11.44, "b":12.78, "c": 11111111} }
|
|
] )"_padded;
|
|
dom::parser parser;
|
|
|
|
// Parse and iterate through an array of objects
|
|
for (dom::object obj : parser.parse(abstract_json)) {
|
|
for(const auto key_value : obj) {
|
|
cout << "key: " << key_value.key << " : ";
|
|
dom::object innerobj = key_value.value;
|
|
cout << "a: " << double(innerobj["a"]) << ", ";
|
|
cout << "b: " << double(innerobj["b"]) << ", ";
|
|
cout << "c: " << int64_t(innerobj["c"]) << endl;
|
|
}
|
|
}
|
|
```
|
|
|
|
And another one:
|
|
|
|
|
|
```C++
|
|
auto abstract_json = R"(
|
|
{ "str" : { "123" : {"abc" : 3.14 } } } )"_padded;
|
|
dom::parser parser;
|
|
double v = parser.parse(abstract_json)["str"]["123"]["abc"];
|
|
cout << "number: " << v << endl;
|
|
```
|
|
|
|
|
|
C++17 Support
|
|
-------------
|
|
|
|
While the simdjson library can be used in any project using C++ 11 and above, field iteration has special support C++ 17's destructuring syntax. For example:
|
|
|
|
```c++
|
|
padded_string json = R"( { "foo": 1, "bar": 2 } )"_padded;
|
|
dom::parser parser;
|
|
dom::object object;
|
|
auto error = parser.parse(json).get(object);
|
|
if (error) { cerr << error << endl; return; }
|
|
for (auto [key, value] : object) {
|
|
cout << key << " = " << value << endl;
|
|
}
|
|
```
|
|
|
|
For comparison, here is the C++ 11 version of the same code:
|
|
|
|
```c++
|
|
// C++ 11 version for comparison
|
|
padded_string json = R"( { "foo": 1, "bar": 2 } )"_padded;
|
|
dom::parser parser;
|
|
dom::object object;
|
|
auto error = parser.parse(json).get(object);
|
|
if (error) { cerr << error << endl; return; }
|
|
for (dom::key_value_pair field : object) {
|
|
cout << field.key << " = " << field.value << endl;
|
|
}
|
|
```
|
|
|
|
|
|
JSON Pointer
|
|
------------
|
|
|
|
The simdjson library also supports [JSON pointer](https://tools.ietf.org/html/rfc6901) through the
|
|
`at_pointer()` method, letting you reach further down into the document in a single call:
|
|
|
|
```c++
|
|
auto cars_json = R"( [
|
|
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
|
|
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
|
|
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
|
|
] )"_padded;
|
|
dom::parser parser;
|
|
dom::element cars = parser.parse(cars_json);
|
|
cout << cars.at_pointer("/0/tire_pressure/1") << endl; // Prints 39.9
|
|
```
|
|
|
|
A JSON Path is a sequence of segments each starting with the '/' character. Within arrays, an integer
|
|
index allows you to select the indexed node. Within objects, the string value of the key allows you to
|
|
select the value. If your keys contain the characters '/' or '~', they must be escaped as '~1' and
|
|
'~0' respectively. An empty JSON Path refers to the whole document.
|
|
|
|
We also extend the JSON Pointer support to include *relative* paths.
|
|
You can apply a JSON path to any node and the path gets interpreted relatively, as if the currrent node were a whole JSON document.
|
|
|
|
Consider the following example:
|
|
|
|
```c++
|
|
auto cars_json = R"( [
|
|
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
|
|
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
|
|
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
|
|
] )"_padded;
|
|
dom::parser parser;
|
|
dom::element cars = parser.parse(cars_json);
|
|
cout << cars.at_pointer("/0/tire_pressure/1") << endl; // Prints 39.9
|
|
for (dom::element car_element : cars) {
|
|
dom::object car;
|
|
simdjson::error_code error;
|
|
if ((error = car_element.get(car))) { std::cerr << error << std::endl; return; }
|
|
double x = car.at_pointer("/tire_pressure/1");
|
|
cout << x << endl; // Prints 39.9, 31 and 30
|
|
}
|
|
```
|
|
|
|
|
|
|
|
Error Handling
|
|
--------------
|
|
|
|
All simdjson APIs that can fail return `simdjson_result<T>`, which is a <value, error_code>
|
|
pair. You can retrieve the value with .get(), like so:
|
|
|
|
```c++
|
|
dom::element doc;
|
|
auto error = parser.parse(json).get(doc);
|
|
if (error) { cerr << error << endl; exit(1); }
|
|
```
|
|
|
|
When you use the code this way, it is your responsibility to check for error before using the
|
|
result: if there is an error, the result value will not be valid and using it will caused undefined
|
|
behavior.
|
|
|
|
We can write a "quick start" example where we attempt to parse the following JSON file and access some data, without triggering exceptions:
|
|
```JavaScript
|
|
{
|
|
"statuses": [
|
|
{
|
|
"id": 505874924095815700
|
|
},
|
|
{
|
|
"id": 505874922023837700
|
|
}
|
|
],
|
|
"search_metadata": {
|
|
"count": 100
|
|
}
|
|
}
|
|
```
|
|
|
|
Our program loads the file, selects value corresponding to key "search_metadata" which expected to be an object, and then
|
|
it selects the key "count" within that object.
|
|
|
|
```C++
|
|
#include "simdjson.h"
|
|
|
|
int main(void) {
|
|
simdjson::dom::parser parser;
|
|
simdjson::dom::element tweets;
|
|
auto error = parser.load("twitter.json").get(tweets);
|
|
if (error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
|
|
|
|
simdjson::dom::element res;
|
|
if ((error = tweets["search_metadata"]["count"].get(res))) {
|
|
std::cerr << "could not access keys" << std::endl;
|
|
return EXIT_FAILURE;
|
|
}
|
|
std::cout << res << " results." << std::endl;
|
|
}
|
|
```
|
|
|
|
The following is a similar example where one wants to get the id of the first tweet without
|
|
triggering exceptions. To do this, we use `["statuses"].at(0)["id"]`. We break that expression down:
|
|
|
|
- Get the list of tweets (the `"statuses"` key of the document) using `["statuses"]`). The result is expected to be an array.
|
|
- Get the first tweet using `.at(0)`. The result is expected to be an object.
|
|
- Get the id of the tweet using ["id"]. We expect the value to be a non-negative integer.
|
|
|
|
Observe how we use the `at` method when querying an index into an array, and not the bracket operator.
|
|
|
|
```C++
|
|
#include "simdjson.h"
|
|
|
|
int main(void) {
|
|
simdjson::dom::parser parser;
|
|
simdjson::dom::element tweets;
|
|
auto error = parser.load("twitter.json").get(tweets);
|
|
if(error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
|
|
uint64_t identifier;
|
|
error = tweets["statuses"].at(0)["id"].get(identifier);
|
|
if(error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
|
|
std::cout << identifier << std::endl;
|
|
return EXIT_SUCCESS;
|
|
}
|
|
```
|
|
|
|
### Error Handling Example
|
|
|
|
This is how the example in "Using the Parsed JSON" could be written using only error code checking:
|
|
|
|
```c++
|
|
auto cars_json = R"( [
|
|
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
|
|
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
|
|
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
|
|
] )"_padded;
|
|
dom::parser parser;
|
|
dom::array cars;
|
|
auto error = parser.parse(cars_json).get(cars);
|
|
if (error) { cerr << error << endl; exit(1); }
|
|
|
|
// Iterating through an array of objects
|
|
for (dom::element car_element : cars) {
|
|
dom::object car;
|
|
if ((error = car_element.get(car))) { cerr << error << endl; exit(1); }
|
|
|
|
// Accessing a field by name
|
|
std::string_view make, model;
|
|
if ((error = car["make"].get(make))) { cerr << error << endl; exit(1); }
|
|
if ((error = car["model"].get(model))) { cerr << error << endl; exit(1); }
|
|
cout << "Make/Model: " << make << "/" << model << endl;
|
|
|
|
// Casting a JSON element to an integer
|
|
uint64_t year;
|
|
if ((error = car["year"].get(year))) { cerr << error << endl; exit(1); }
|
|
cout << "- This car is " << 2020 - year << "years old." << endl;
|
|
|
|
// Iterating through an array of floats
|
|
double total_tire_pressure = 0;
|
|
dom::array tire_pressure_array;
|
|
if ((error = car["tire_pressure"].get(tire_pressure_array))) { cerr << error << endl; exit(1); }
|
|
for (dom::element tire_pressure_element : tire_pressure_array) {
|
|
double tire_pressure;
|
|
if ((error = tire_pressure_element.get(tire_pressure))) { cerr << error << endl; exit(1); }
|
|
total_tire_pressure += tire_pressure;
|
|
}
|
|
cout << "- Average tire pressure: " << (total_tire_pressure / 4) << endl;
|
|
|
|
// Writing out all the information about the car
|
|
for (auto field : car) {
|
|
cout << "- " << field.key << ": " << field.value << endl;
|
|
}
|
|
}
|
|
```
|
|
|
|
Here is another example:
|
|
|
|
```C++
|
|
auto abstract_json = R"( [
|
|
{ "12345" : {"a":12.34, "b":56.78, "c": 9998877} },
|
|
{ "12545" : {"a":11.44, "b":12.78, "c": 11111111} }
|
|
] )"_padded;
|
|
dom::parser parser;
|
|
dom::array array;
|
|
auto error = parser.parse(abstract_json).get(array);
|
|
if (error) { cerr << error << endl; exit(1); }
|
|
// Iterate through an array of objects
|
|
for (dom::element elem : array) {
|
|
dom::object obj;
|
|
if ((error = elem.get(obj))) { cerr << error << endl; exit(1); }
|
|
for (auto & key_value : obj) {
|
|
cout << "key: " << key_value.key << " : ";
|
|
dom::object innerobj;
|
|
if ((error = key_value.value.get(innerobj))) { cerr << error << endl; exit(1); }
|
|
|
|
double va, vb;
|
|
if ((error = innerobj["a"].get(va))) { cerr << error << endl; exit(1); }
|
|
cout << "a: " << va << ", ";
|
|
if ((error = innerobj["b"].get(vc))) { cerr << error << endl; exit(1); }
|
|
cout << "b: " << vb << ", ";
|
|
|
|
int64_t vc;
|
|
if ((error = innerobj["c"].get(vc))) { cerr << error << endl; exit(1); }
|
|
cout << "c: " << vc << endl;
|
|
}
|
|
}
|
|
```
|
|
|
|
And another one:
|
|
|
|
```C++
|
|
auto abstract_json = R"(
|
|
{ "str" : { "123" : {"abc" : 3.14 } } } )"_padded;
|
|
dom::parser parser;
|
|
double v;
|
|
auto error = parser.parse(abstract_json)["str"]["123"]["abc"].get(v);
|
|
if (error) { cerr << error << endl; exit(1); }
|
|
cout << "number: " << v << endl;
|
|
```
|
|
|
|
Notice how we can string several operations (`parser.parse(abstract_json)["str"]["123"]["abc"].get(v)`) and only check for the error once, a strategy we call *error chaining*.
|
|
|
|
The next two functions will take as input a JSON document containing an array with a single element, either a string or a number. They return true upon success.
|
|
|
|
```C++
|
|
simdjson::dom::parser parser{};
|
|
|
|
bool parse_double(const char *j, double &d) {
|
|
auto error = parser.parse(j, std::strlen(j))
|
|
.at(0)
|
|
.get(d, error);
|
|
if (error) { return false; }
|
|
return true;
|
|
}
|
|
|
|
bool parse_string(const char *j, std::string &s) {
|
|
std::string_view answer;
|
|
auto error = parser.parse(j,strlen(j))
|
|
.at(0)
|
|
.get(answer, error);
|
|
if (error) { return false; }
|
|
s.assign(answer.data(), answer.size());
|
|
return true;
|
|
}
|
|
```
|
|
|
|
To ensure you don't write any code that uses exceptions, compile with `SIMDJSON_EXCEPTIONS=OFF`. For example, if including the project via cmake:
|
|
|
|
```cmake
|
|
target_compile_definitions(simdjson PUBLIC SIMDJSON_EXCEPTIONS=OFF)
|
|
```
|
|
|
|
### Exceptions
|
|
|
|
Users more comfortable with an exception flow may choose to directly cast the `simdjson_result<T>` to the desired type:
|
|
|
|
```c++
|
|
dom::element doc = parser.parse(json); // Throws an exception if there was an error!
|
|
```
|
|
|
|
When used this way, a `simdjson_error` exception will be thrown if an error occurs, preventing the
|
|
program from continuing if there was an error.
|
|
|
|
|
|
If one is willing to trigger exceptions, it is possible to write simpler code:
|
|
|
|
```C++
|
|
#include "simdjson.h"
|
|
|
|
int main(void) {
|
|
simdjson::dom::parser parser;
|
|
simdjson::dom::element tweets = parser.load("twitter.json");
|
|
std::cout << "ID: " << tweets["statuses"].at(0)["id"] << std::endl;
|
|
return EXIT_SUCCESS;
|
|
}
|
|
```
|
|
|
|
|
|
Tree Walking and JSON Element Types
|
|
-----------------------------------
|
|
|
|
Sometimes you don't necessarily have a document with a known type, and are trying to generically
|
|
inspect or walk over JSON elements. To do that, you can use iterators and the type() method. For
|
|
example, here's a quick and dirty recursive function that verbosely prints the JSON document as JSON
|
|
(* ignoring nuances like trailing commas and escaping strings, for brevity's sake):
|
|
|
|
```c++
|
|
void print_json(dom::element element) {
|
|
switch (element.type()) {
|
|
case dom::element_type::ARRAY:
|
|
cout << "[";
|
|
for (dom::element child : dom::array(element)) {
|
|
print_json(child);
|
|
cout << ",";
|
|
}
|
|
cout << "]";
|
|
break;
|
|
case dom::element_type::OBJECT:
|
|
cout << "{";
|
|
for (dom::key_value_pair field : dom::object(element)) {
|
|
cout << "\"" << field.key << "\": ";
|
|
print_json(field.value);
|
|
}
|
|
cout << "}";
|
|
break;
|
|
case dom::element_type::INT64:
|
|
cout << int64_t(element) << endl;
|
|
break;
|
|
case dom::element_type::UINT64:
|
|
cout << uint64_t(element) << endl;
|
|
break;
|
|
case dom::element_type::DOUBLE:
|
|
cout << double(element) << endl;
|
|
break;
|
|
case dom::element_type::STRING:
|
|
cout << std::string_view(element) << endl;
|
|
break;
|
|
case dom::element_type::BOOL:
|
|
cout << bool(element) << endl;
|
|
break;
|
|
case dom::element_type::NULL_VALUE:
|
|
cout << "null" << endl;
|
|
break;
|
|
}
|
|
}
|
|
|
|
void basics_treewalk_1() {
|
|
dom::parser parser;
|
|
print_json(parser.load("twitter.json"));
|
|
}
|
|
```
|