This streaming approach means that unused fields and values are not parsed or
converted, thus saving space and time. In our example, the `"name"`, `"followers_count"`,
and `"friends_count"` keys and matching values are skipped.
Further, the On Demand API does not parse a value *at all* until you try to convert it (e.g., to `double`,
`int`, `string`, or `bool`). In our example, when accessing the key-value pair `"retweet_count": 82`, the parser
may not convert the pair of characters `82` to the binary integer 82. Because the programmer specifies the data
type, we avoid branch mispredictions related to data type determination and improve the performance.
We expect users of an On Demand API to work in terms of a JSON dialect, which is a set of expectations and
specifications that come in addition to the [JSON specification](https://www.rfc-editor.org/rfc/rfc8259.txt).
The On Demand approach is designed around several principles:
* **Streaming (\*):** It avoids preparsing values, keeping the memory usage and the latency down.
* **Forward-Only:** To prevent reiteration of the same values and to keep the number of variables down (literally), only a single index is maintained and everything uses it (even if you have nested for loops). This means, for example, that when you are going through an array of arrays, the inner array loop will advance the index to the next comma, and the outer array loop can just pick it up and look at it.
* **Natural Iteration:** A JSON array or object can be iterated with a normal C++ for loop. Nested arrays and objects are supported by nested for loops.
* **Use-Specific Parsing:** Parsing is always specific to the type required by the programmer. For example, if the programmer asks for an unsigned integer, we just start parsing digits. If there were no digits, we toss an error. There are even different parsers for `double`, `uint64_t` and `int64_t` values. This use-specific parsing avoids the branchiness of a generic "type switch," and makes the code more inlineable and compact.
* **Validate What You Use:** On Demand deliberately validates the values you use and the structure leading to them, but nothing else. The goal is a guarantee that the value you asked for is the correct one and is not malformed: there must be no confusion over whether you got the right value.
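
The forward-only and use-specific principles above can be sketched in a few lines of standard C++. This is a simplified illustration under our own assumptions, not the simdjson implementation: the `cursor` type and the `get_uint64`, `consume` and `sum_array` names are hypothetical, and the sketch ignores whitespace, overflow and nesting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical illustration only: this is not simdjson's API or internals.
// A single forward-only position is shared by every accessor.
struct cursor {
  const std::string &input;
  std::size_t pos;

  explicit cursor(const std::string &json) : input(json), pos(0) {}

  // Use-specific parsing: the caller asked for an unsigned integer, so we
  // read digits directly instead of first asking "what type is this value?".
  // Overflow and whitespace handling are omitted to keep the sketch short.
  bool get_uint64(std::uint64_t &out) {
    assert(pos <= input.size());
    const std::size_t start = pos;
    out = 0;
    while (pos < input.size() && input[pos] >= '0' && input[pos] <= '9') {
      out = out * 10 + static_cast<std::uint64_t>(input[pos] - '0');
      ++pos;
    }
    return pos != start;  // no digits at all is an error
  }

  // Consume one expected structural character such as ',' or ']'.
  bool consume(char expected) {
    if (pos < input.size() && input[pos] == expected) { ++pos; return true; }
    return false;
  }
};

// Sum a flat array such as "[1,22,333]" in one forward pass.
inline bool sum_array(const std::string &json, std::uint64_t &total) {
  cursor c(json);
  total = 0;
  if (!c.consume('[')) { return false; }
  if (c.consume(']')) { return true; }  // empty array
  do {
    std::uint64_t value = 0;
    if (!c.get_uint64(value)) { return false; }
    total += value;
  } while (c.consume(','));
  return c.consume(']');
}
```

With this sketch, `sum_array("[1,22,333]", total)` succeeds and leaves 356 in `total`, whereas `sum_array("[1,x]", total)` fails at the second element because no digits are found where an unsigned integer was requested.
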
To understand why On Demand is different, it is helpful to review the major
approaches to parsing and parser APIs in use today.
### DOM Parsers
Many of the most usable, popular JSON APIs (including simdjson) deserialize into a **DOM**: an intermediate tree of
objects, arrays and values. In this model, we convert the input data all at once into a tree-like structure (the DOM).
The DOM is then accessed by the programmer like any other in-memory data structure. The resulting API lets
you refer to each array or object separately, using familiar techniques like iteration (`for (auto value : array)`)
or indexing (`object["key"]`). In some cases, the values are even deserialized directly into familiar C++ constructs like vectors and
maps.
The DOM approach is conceptually simple and "programmer friendly". Using the
DOM tree is often easy enough that many users use the DOM as-is instead of creating
their own custom data structures.
The DOM approach was the only way to parse JSON documents up to version 0.6 of the simdjson library.
Our DOM API looks similar to our On Demand example, except
This is a large amount of code, requiring mental gymnastics even to read. An actual implementation is harder to write
and to maintain.
Pros of the event-based approach:
* Speed and space benefits from low, predictable memory usage.
* Parsing can be done more lazily: the API can delegate work to the programmer for better performance.
* It is highly flexible: given enough effort, most tasks can be accomplished efficiently.
Cons of the event-based approach:
* Performance drain from context blindness (e.g., switch statements for "where am I in the document")
* Difficult to use (high code complexity, high maintenance, difficult to debug)
* Lacks the safety of DOM: malformed documents could be ingested.
Though an event-based approach might have its niche uses, we believe that it is rarely ideally suited. We suspect that it is mostly used when performance and memory are concerns, and no other option (except DOM) is readily available.
### Schema-Based Parser Generators
In a schema-based model, the programmer provides a description of a data structure, and the parser constructs the data structure in question during parsing. These parsers take a schema--a description of
your JSON, with field names, types, everything--and generate classes/structs in your language of
choice, as well as a parser to deserialize the JSON into those structs. Some such parsers let you
define your own data structures (`struct`) and have a preprocessor inspect them and generate a custom JSON parser.
Not all of these schema-based parser generators actually generate a parser or optimize for
streaming, but they are *able* to in principle. Unlike the DOM and the event-based models, a schema-based approach assumes
that the structure of the document is known at compile-time.
Pros of the schema-based approach:
* Ease of Use is on par with DOM
* Parsers that generate iterators and lazy values in structs can keep memory pressure down to event-based levels.
* Type Blindness can be entirely solved with specific parsers for each type, saving many branches.
* Context Blindness can be solved, especially if object fields are required and in order, saving even more branches.
* Can be made as safe as DOM: the input can be entirely validated prior to ingestion.
Cons of the schema-based approach:
* It is less flexible than the DOM or event-based approaches, sometimes limited to a deserialization-to-objects scenario.
* The structure of the data must be fully known at compile-time.
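
To make the schema-based idea concrete, here is a hand-written sketch of the kind of code a generator might emit for a tiny schema. Everything in it is an assumption made for illustration: the `counts` struct, the `favorite_count` field and the helper names are hypothetical, and the sketch accepts only compact JSON (no whitespace) with both fields present and in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical schema: an object with two required unsigned fields, in order.
struct counts {
  std::uint64_t retweet_count;
  std::uint64_t favorite_count;
};

// Advance past `literal` if the input continues with it at `pos`.
inline bool expect(const std::string &in, std::size_t &pos,
                   const std::string &literal) {
  if (in.compare(pos, literal.size(), literal) != 0) { return false; }
  pos += literal.size();
  return true;
}

// Read a run of decimal digits (overflow checks omitted for brevity).
inline bool read_uint64(const std::string &in, std::size_t &pos,
                        std::uint64_t &out) {
  const std::size_t start = pos;
  out = 0;
  while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
    out = out * 10 + static_cast<std::uint64_t>(in[pos] - '0');
    ++pos;
  }
  return pos != start;
}

// No type switch and no key lookup: the schema fixes both the order and the
// type of every field, so each step knows exactly what must come next.
inline bool parse_counts(const std::string &in, counts &out) {
  std::size_t pos = 0;
  return expect(in, pos, "{\"retweet_count\":")
      && read_uint64(in, pos, out.retweet_count)
      && expect(in, pos, ",\"favorite_count\":")
      && read_uint64(in, pos, out.favorite_count)
      && expect(in, pos, "}")
      && pos == in.size();
}
```

The same rigidity that removes the branches is also the limitation listed above: swapping the two fields in the input makes `parse_counts` fail, even though the document is still valid JSON.
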
### Type Blindness and Branch Misprediction
The DOM and event-based parsing models suffer from **type
blindness**: even when the programmer knows exactly what fields and what types are in the JSON document,
the parser does not. This means it has to look at each value blind with a big "switch"
statement, asking "is this a number? A string? A boolean? An array? An object?"
In modern processors, this kind of switch statement can make your program run slower
than it needs to because of the high cost of branch misprediction. Indeed, modern processor
cores rely on speculative execution for speed. They "read ahead" in your program, predicting
which instructions to run as soon as the data is available. A single-threaded program can
execute 2, 3 or even more instructions per cycle--largely because of speculative execution.
Unfortunately, when the processor mispredicts the instructions, typically due to a mispredicted
branch, all of the work done from the misprediction has to be discarded and started anew. The
processor may have been executing 3 or 4 instructions per cycle, and consuming the corresponding
power, but all of the work may have been wasteful.
Type blindness means that the processor has to guess, for every JSON value, whether it will be an array,
an object, number, string or boolean since these correspond to distinct code paths.
Though some JSON files have predictable content, we find in practice that many JSON files
stress the branch prediction. Though branch predictors improve with each new generation of processors,
the cost of branch mispredictions also tends to increase as pipelines expand, and the processors become
able to schedule longer streams of instructions.
On Demand parsing is tailor-made to solve this problem at the source, parsing values only after the
user declares their type by asking for a `double`, an `int`, a `string`, etc. It attempts to do so while
preserving most of the flexibility of DOM parsing.
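
The contrast can be sketched as follows. The `classify` function stands for the data-dependent "type switch" that a type-blind parser runs on every value, while `expect_bool` stands for a typed accessor that only checks what the caller asked for. Both functions and the `json_type` enumeration are hypothetical names introduced for this illustration; they are not part of simdjson, and `expect_bool` does not check what follows the literal.

```cpp
#include <cassert>
#include <string>

enum class json_type { object, array, string, number, boolean, null, invalid };

// Type-blind: every value goes through this multi-way branch, and the
// processor has to guess which case will be taken next.
inline json_type classify(char first) {
  switch (first) {
    case '{': return json_type::object;
    case '[': return json_type::array;
    case '"': return json_type::string;
    case 't':
    case 'f': return json_type::boolean;
    case 'n': return json_type::null;
    case '-': return json_type::number;
    default:
      return (first >= '0' && first <= '9') ? json_type::number
                                            : json_type::invalid;
  }
}

// Type-aware: the caller declared that a boolean is expected, so the only
// question left is "true or false?"; anything else is an error.
inline bool expect_bool(const std::string &text, bool &out) {
  if (text.compare(0, 4, "true") == 0) { out = true; return true; }
  if (text.compare(0, 5, "false") == 0) { out = false; return true; }
  return false;
}
```
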
Algorithm
---------
To help visualize the algorithm, we'll walk through the example C++ given at the top, for this JSON:
We expect that the On Demand approach has many of the performance benefits of the schema-based approach, while providing a flexibility that is similar to that of the DOM-based approach.
* Faster than DOM in some cases. Reduced memory usage.
* Straightforward, programmer-friendly interface (arrays and objects).
* Highly expressive, beyond deserialization and pointer queries: many tasks can be accomplished with little code.
### Limitations of the On Demand Approach
The On Demand approach has some limitations:
* Because it operates in streaming mode, you only have access to the current element in the JSON document. Furthermore, the document is traversed in order so the code is sensitive to the order of the JSON nodes in the same manner as an event-based approach (e.g., SAX). (The one exception to this is field lookup, which is more *performant* when the order of lookups matches the order of fields in the document, but which will still work with out-of-order fields, with a performance hit.)
* The On Demand approach is less safe than DOM: we only validate the components of the JSON document that are used and it is possible to begin ingesting an invalid document only to find out later that the document is invalid. Are you fine ingesting a large JSON document that starts with well formed JSON but ends with invalid JSON content?
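
The second limitation can be illustrated with a deliberately naive sketch. The `first_element` function below is hypothetical and is not simdjson code; it only shows, in principle, how a parser that validates what it uses can hand back a value from a document whose tail is malformed. The real library also runs an initial scan over the whole document, so its behavior differs in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical illustration, not simdjson code: read the first unsigned
// integer of an array and stop, never looking at what follows it.
inline bool first_element(const std::string &json, std::uint64_t &out) {
  if (json.empty() || json[0] != '[') { return false; }
  std::size_t pos = 1;
  const std::size_t start = pos;
  out = 0;
  while (pos < json.size() && json[pos] >= '0' && json[pos] <= '9') {
    out = out * 10 + static_cast<std::uint64_t>(json[pos] - '0');
    ++pos;
  }
  return pos != start;  // succeeds as soon as one value was read
}
```

Here `first_element("[82,oops", value)` succeeds and yields 82 even though the input is not valid JSON: the error would only surface if the caller went on to read the second element.
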
There are currently additional technical limitations which we expect to resolve in future releases of the simdjson library:
* The simdjson library offers runtime dispatching which allows you to compile one binary and have it run at full speed on different processors, taking advantage of the specific features of the processor. The On Demand API has limited runtime dispatch support. Under x64 systems, to fully benefit from the On Demand API, we recommend that you compile your code for a specific processor. E.g., if your processor supports AVX2 instructions, you should compile your binary executable with AVX2 instruction support (by using your compiler's commands). If you are sufficiently technically proficient, you can implement runtime dispatching within your application, by compiling your On Demand code for different processors.
* There is an initial phase which scans the entire document quickly, irrespective of the size of the document. We plan to break this phase into distinct steps for large files in a future release as we have done with other components of our API (e.g., `parse_many`).
* The On Demand API does not support JSON Pointer. This capability is currently limited to our core API.
### Applicability of the On Demand Approach
At this time we recommend the On Demand API in the following cases:
1. The 64-bit hardware (CPU) used to run the software is known at compile time. If you need runtime dispatching because you cannot be certain of the hardware used to run your software, you will be better served with the core simdjson API. (This only applies to x64 (AMD/Intel). On 64-bit ARM hardware, runtime dispatching is unnecessary.)
2. The unused parts of JSON files do not need to be validated and the layout of the nodes follows a strict JSON dialect. If you are receiving JSON from other systems, you might be better served with the core simdjson API as it fully validates the JSON inputs and allows you to navigate through the document at will.
3. Speed and efficiency are of the utmost importance. Keep in mind that the core simdjson API is highly efficient so adopting the On Demand API is not necessary for high efficiency.
4. As a developer, you value a clean, flexible and maintainable API.
Good applications for the On Demand API might be:
* You are working from pre-existing large JSON files that have been vetted. You expect them to be well formed according to a known JSON dialect and to have a consistent layout. For example, you might be doing biomedical research or machine learning on top of static data dumps in JSON.
* Both the generation and the consumption of JSON data are within your system. Your team controls both the software that produces the JSON and the software that parses it, and your team knows and controls the hardware. Thus you can fully test your system.
The On Demand API uses advanced architecture-specific code for many common processors to make JSON preprocessing and string parsing faster. By default, however, most C++ compilers will compile to the least common denominator (since the program could theoretically be run anywhere). Since On Demand is inlined into your own code, it cannot always use these advanced versions unless the compiler is told to target them.
On relevant systems, the On Demand API provides some support for runtime dispatching: that is, it will attempt to detect, at runtime, the instructions that your processor supports and optimize the code accordingly. However, it cannot always make full use of the features of your processor.
Some users wish to run at the best possible speed. Under recent Intel and AMD processors, these users should take additional steps to verify that their code is well optimized.
Given that the On Demand API offers limited runtime dispatching, it matters that your code is compiled against a specific CPU target. You should verify that the code is compiled against the target you expect. Thankfully, the simdjson library will tell you exactly what it detects as an implementation: `haswell` (AVX2 x64 processors), `westmere` (SSE4 x64 processors), `arm64` (64-bit ARM), `ppc64` (64-bit POWER), `fallback` (others). Under x64 processors, many programmers will want to target `haswell` whereas under ARM, most programmers will want to target `arm64` (and it should do so automatically). The `fallback` is probably only good for testing purposes, not for deployment.
If the `simdjson::builtin_implementation()->name()` call does not return the architecture you wish to target, you may need to pass flags to your compiler.
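
As a complementary check that does not involve simdjson at all, you can inspect the macros that the compiler predefines for the selected target. The sketch below relies on GCC and Clang conventions (`__AVX2__`, `__SSE4_2__`, `__aarch64__`, `__powerpc64__`); other compilers may behave differently. The function name `compiled_target` and the labels it returns are ours, chosen for illustration, and are not simdjson implementation names.

```cpp
#include <cassert>
#include <string>

// Reports which instruction-set family the compiler was told to target,
// based on predefined macros (GCC/Clang conventions are assumed here).
// This is independent of what simdjson itself detects.
inline std::string compiled_target() {
#if defined(__AVX2__)
  return "x86 with AVX2";
#elif defined(__SSE4_2__)
  return "x86 with SSE4.2";
#elif defined(__aarch64__)
  return "64-bit ARM";
#elif defined(__powerpc64__) || defined(__PPC64__)
  return "64-bit POWER";
#else
  return "generic";
#endif
}
```

If this function returns the generic label (or only the SSE4.2 one) on a machine that supports AVX2, the compiler was most likely not told to target the more advanced instruction set.
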
If you are using CMake for your C++ project, then you can pass compilation flags to your compiler by using the `CMAKE_CXX_FLAGS` variable:
You can also pass the flags directly to your compiler when compiling 'by hand':
````
c++ -march=haswell -O3 myproject.cpp simdjson.cpp
````
In these examples, the `-march=haswell` flag targets a Haswell processor, and the resulting binary will run on processors that support all features of Haswell processors.
Instead of specifying a specific microarchitecture, you can let your compiler do the work. The `-march=native` flag says "target the current computer," which is a reasonable default for many applications which both compile and run on the same processor.
Passing `-march=native` to the compiler may make On Demand faster by allowing it to use optimizations specific to your machine. You cannot do this, however, if you are compiling code that might be run on less advanced machines. That is, be mindful that when compiling with the `-march=native` flag, the resulting binary will run on the current system but may not run on other systems (e.g., on an old processor).
If you are compiling on an ARM or POWER system, you do not need to be concerned with CPU selection during compilation. The `-march=native` flag is useful for best performance on x64 (e.g., Intel) systems, but it may be unsupported on other platforms such as ARM (aarch64) or POWER.