The simdjson library offers two distinct approaches to accessing a JSON document. We support
a conventional Document-Object-Model (DOM) front-end, which is the focus of this document. In this approach, the JSON document is
entirely parsed, validated and materialized in memory as the first step. The programmer may
then access the parsed data using this in-memory model.

The Basics: Loading and Parsing JSON Documents using the DOM front-end
----------------------------------------------
The simdjson library offers a simple DOM tree API, which you can access by creating a
`dom::parser` and calling the `load()` method:
```c++
dom::parser parser;
dom::element doc = parser.load(filename); // load and parse a file
```
Or by creating a padded string (for efficiency reasons, simdjson requires a string with
SIMDJSON_PADDING bytes at the end) and calling `parse()`:
```c++
dom::parser parser;
dom::element doc = parser.parse("[1,2,3]"_padded); // parse a string, the _padded suffix creates a simdjson::padded_string instance
```
The parsed document resulting from the `parser.load` and `parser.parse` calls depends on the `parser` instance. Thus the `parser` instance must remain in scope. Furthermore, you must have at most one parsed document in play per `parser` instance.

You cannot copy a `parser` instance; you may only move it.

If you need to keep a document around long term, you can keep or move the parser instance. Note that moving a parser instance, or keeping one in a container that relocates its elements, like `std::vector`, can cause any outstanding `element`, `object` or `array` instances to be invalidated. If you need to store a parser in such a data structure, you should use a `std::unique_ptr` to avoid this invalidation (e.g., `std::unique_ptr<dom::parser> parser(new dom::parser{})`).

During the `load` or `parse` calls, neither the input file nor the input string is ever modified. After calling `load` or `parse`, the source (either a file or a string) can be safely discarded. All of the JSON data is stored in the `parser` instance. The parsed document is also immutable in simdjson: you do not modify it by accessing it.

For best performance, a `parser` instance should be reused over several files: otherwise you will needlessly reallocate memory, an expensive process. It is also possible to avoid memory allocations entirely during parsing when using simdjson. [See our performance notes for details](performance.md).

If you need a lower-level interface, you may call the function `parser.parse(const char * p, size_t l)` on a pointer `p` while specifying the
length of your input `l` in bytes. To see how to get the very best performance from a low-level approach, you may want to read our [performance notes](https://github.com/simdjson/simdjson/blob/master/doc/performance.md#padding-and-temporary-copies) on this topic (see the Padding and Temporary Copies section).

Using the Parsed JSON
---------------------
Once you have an element, you can navigate it with idiomatic C++ iterators, operators and casts.
* **Extracting Values (with exceptions):** You can cast a JSON element to a native type: `double(element)` or
`double x = json_element`. This works for double, uint64_t, int64_t, bool,
dom::object and dom::array. An exception is thrown if the cast is not possible.
* **Extracting Values (without exceptions):** You can use the `get()` method with error codes to avoid exceptions. You first declare a variable of the appropriate type (`double`, `uint64_t`, `int64_t`, `bool`,
`dom::object` and `dom::array`) and pass it by reference to `get()`, which gives you back an error code, e.g.,
```c++
dom::parser parser;
padded_string numberstring = "1.2"_padded; // sample JSON input
double value; // variable where we store the value to be parsed
auto error = parser.parse(numberstring).get(value);
if (error) { std::cerr << error << std::endl; return EXIT_FAILURE; }
std::cout << "I parsed " << value << " from " << numberstring.data() << std::endl;
```
* **Field Access:** To get the value of the "foo" field in an object, use `object["foo"]`.
* **Array Iteration:** To iterate through an array, use `for (auto value : array) { ... }`. If you
know the type of the value, you can cast it right there, too! `for (double value : array) { ... }`
* **Object Iteration:** You can iterate through an object's fields, too: `for (auto [key, value] : object)`
* **Array Index:** To get at an array value by index, use the `at()` method: `array.at(0)` gets the
first element.
> Note that `array[0]` does not compile, because implementing `[]` would give the impression that indexing is an
> O(1) operation, which it presently is not in simdjson. Instead, you should iterate over the elements
> using a for-loop, as in our examples.
* **Array and Object size:** Given an array or an object, you can get its size (number of elements or keys)
with the `size()` method.
* **Checking an Element Type:** You can check an element's type with `element.type()`. It
returns an `element_type` with values such as `simdjson::dom::element_type::ARRAY`, `simdjson::dom::element_type::OBJECT`, `simdjson::dom::element_type::INT64`, `simdjson::dom::element_type::UINT64`, `simdjson::dom::element_type::DOUBLE`, `simdjson::dom::element_type::STRING`, `simdjson::dom::element_type::BOOL` or `simdjson::dom::element_type::NULL_VALUE`.
* **Output to streams and strings:** Given a document or an element (or node) out of a JSON document, you can output a minified string version using the C++ stream idiom (`out << element`). You can also request the construction of a minified string version (`simdjson::minify(element)`). Numbers are serialized as 64-bit floating-point numbers (`double`).
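
For example, with exceptions enabled, you can chain field accesses and cast the result directly to a native type:
```c++
dom::parser parser;
padded_string abstract_json = R"( {"str": {"123": {"abc": 3.14159}}} )"_padded; // sample input
double v = parser.parse(abstract_json)["str"]["123"]["abc"];
cout << "number: " << v << endl;
```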
C++17 Support
-------------
While the simdjson library can be used in any project using C++11 and above, field iteration has special support for C++17's destructuring syntax (structured bindings): you can write `for (auto [key, value] : object)`.
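
Independently of the C++ version, you can avoid exceptions entirely by checking error codes. Here is the `["str"]["123"]["abc"]` lookup shown earlier, written with error codes instead of exceptions:
```c++
dom::parser parser;
padded_string abstract_json = R"( {"str": {"123": {"abc": 3.14159}}} )"_padded; // sample input
double v;
auto error = parser.parse(abstract_json)["str"]["123"]["abc"].get(v);
if (error) { cerr << error << endl; exit(1); }
cout << "number: " << v << endl;
```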
Notice how we can string several operations (`parser.parse(abstract_json)["str"]["123"]["abc"].get(v)`) and only check for the error once, a strategy we call *error chaining*.

The next two functions take as input a JSON document containing an array with a single element, either a string or a number. They return true upon success.
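
Reusing the Parser
------------------
If you parse several documents, create one parser and reuse it: simdjson keeps its internal buffers between calls, so later parses avoid new memory allocations.
```c++
dom::parser parser;
// This initializes buffers and a document big enough to handle this JSON.
dom::element doc = parser.parse("[ true, false ]"_padded);
cout << doc << endl;
// This reuses the existing buffers, and reuses and *overwrites* the old document
doc = parser.parse("[1, 2, 3]"_padded);
cout << doc << endl;
// This also reuses the existing buffers, and reuses and *overwrites* the old document
dom::element doc2 = parser.parse("true"_padded);
// Even if you keep the old reference around, doc and doc2 refer to the same document.
cout << doc << endl;
cout << doc2 << endl;
```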
It's not just internal buffers, though: the simdjson library reuses the document itself. The `dom::element`, `dom::object` and `dom::array` instances are *references* into the internal document.
You are only *borrowing* the document from simdjson, which purposely reuses and overwrites it each
time you call `parse`. This prevents wasteful and unnecessary memory allocation in the common case where
JSON is just read, used, and converted to native values or thrown away.
> **You are only borrowing the document from the simdjson parser. Don't keep it long term!**

This is key: don't keep the `document&`, `dom::element`, `dom::array`, `dom::object`
or `string_view` objects you get back from the API. Convert them to C++ native values, structs and
arrays that you own.

Server Loops: Long-Running Processes and Memory Capacity
--------------------------------------------------------
The simdjson library automatically expands its memory capacity when larger documents are parsed, so
that you don't unexpectedly fail. In a short process that reads a bunch of files and then exits,
this works well.

Server loops, though, are long-running processes that keep the parser around indefinitely. This
means that once you encounter a very large document, simdjson keeps the enlarged buffers and does not resize back down.

The simdjson library lets you adjust your allocation strategy to prevent your server's memory usage from growing
without bound:
* You can set a *max capacity* when constructing a parser:
```c++
dom::parser parser(1000*1000); // Never grow past documents > 1MB
for (web_request request : listen()) {
dom::element doc;
auto error = parser.parse(request.body).get(doc);
// If the document was above our limit, emit 413 = payload too large
if (error == CAPACITY) { request.respond(413); continue; }
// ...
}
```
This parser will grow normally as it encounters larger documents, but will never pass 1MB.
* You can set a *fixed capacity* that never grows, as well, which can be excellent for
predictability and reliability, since simdjson will never call malloc after startup!
```c++
dom::parser parser(0); // This parser will refuse to automatically grow capacity
auto error = parser.allocate(1000*1000); // This allocates enough capacity to handle documents <= 1MB
if (error) { cerr << error << endl; exit(1); }
for (web_request request : listen()) {
dom::element doc;
error = parser.parse(request.body).get(doc);
// If the document was above our limit, emit 413 = payload too large
if (error == CAPACITY) { request.respond(413); continue; }
// ...
}
```
Best Use of the DOM API
-------------------------
The simdjson API provides access to the JSON DOM (document-object-model) content as a tree of `dom::element` instances, each representing an object, an array or an atomic type (null, true, false, number). These `dom::element` instances are lightweight objects (e.g., spanning 16 bytes) and it might be advantageous to pass them by value, as opposed to passing them by reference or by pointer.

Padding and Temporary Copies
--------------
The simdjson function `parser.parse` reads data from a padded buffer, containing SIMDJSON_PADDING extra bytes added at the end.
If you are passing a `padded_string` to `parser.parse` or loading the JSON directly from
disk (`parser.load`), padding is automatically handled.

When calling `parser.parse` on a pointer (e.g., `parser.parse(my_char_pointer, my_length_in_bytes)`), a temporary copy with adequate padding is made by default, so again you do not need to be concerned with padding.

Some users may not be able to use our `padded_string` class or to load the data directly from disk (`parser.load`); they may need to pass data pointers to the library. If these users wish to avoid temporary copies and the corresponding temporary memory allocations, they may want to call `parser.parse` with the `realloc_if_needed` parameter set to false (e.g., `parser.parse(my_char_pointer, my_length_in_bytes, false)`). In such cases, they need to ensure that there are at least SIMDJSON_PADDING extra bytes at the end that can be safely accessed and read. They do not need to initialize the padded bytes to any value in particular. The following example is safe:
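```c++
// padded_json_copy: json_len bytes of JSON followed by at least SIMDJSON_PADDING readable bytes
simdjson::dom::element element = parser.parse(padded_json_copy.get(), json_len, false);
```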
Setting the `realloc_if_needed` parameter to `false` in this manner may lead to better performance, since copies are avoided, but it requires the user to take on more responsibility: the simdjson library cannot verify that the input buffer was padded with SIMDJSON_PADDING extra bytes.