Trimming the documentation a bit.
This commit is contained in:
parent
5c590b8434
commit
3dbecab494
|
@ -10,8 +10,8 @@ An overview of what you need to know to use simdjson, with examples.
|
||||||
* [Versions](#versions)
|
* [Versions](#versions)
|
||||||
* [The Basics: Loading and Parsing JSON Documents](#the-basics-loading-and-parsing-json-documents)
|
* [The Basics: Loading and Parsing JSON Documents](#the-basics-loading-and-parsing-json-documents)
|
||||||
* [Documents Are Iterators](#documents-are-iterators)
|
* [Documents Are Iterators](#documents-are-iterators)
|
||||||
* [Using the Parsed JSON](#using-the-parsed-json)
|
|
||||||
* [C++11 Support and string_view](#c11-support-and-string_view)
|
* [C++11 Support and string_view](#c11-support-and-string_view)
|
||||||
|
* [Using the Parsed JSON](#using-the-parsed-json)
|
||||||
* [C++17 Support](#c17-support)
|
* [C++17 Support](#c17-support)
|
||||||
* [Minifying JSON strings without parsing](#minifying-json-strings-without-parsing)
|
* [Minifying JSON strings without parsing](#minifying-json-strings-without-parsing)
|
||||||
* [UTF-8 validation (alone)](#utf-8-validation-alone)
|
* [UTF-8 validation (alone)](#utf-8-validation-alone)
|
||||||
|
@ -177,6 +177,41 @@ For best performance, a `parser` instance should be reused over several files: o
|
||||||
needlessly reallocate memory, an expensive process. It is also possible to avoid entirely memory
|
needlessly reallocate memory, an expensive process. It is also possible to avoid entirely memory
|
||||||
allocations during parsing when using simdjson. [See our performance notes for details](performance.md).
|
allocations during parsing when using simdjson. [See our performance notes for details](performance.md).
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
C++11 Support and string_view
|
||||||
|
-------------
|
||||||
|
|
||||||
|
The simdjson library builds on compilers supporting the [C++11 standard](https://en.wikipedia.org/wiki/C%2B%2B11). It is also a strict requirement: we have no plan to support older C++ compilers.
|
||||||
|
|
||||||
|
We represent parsed strings in simdjson using the `std::string_view` class. It avoids
|
||||||
|
the need to copy the data, as would be necessary with the `std::string` class. It also
|
||||||
|
avoids the pitfalls of null-terminated C strings. It makes it easier for our users to
|
||||||
|
copy the data into their own favorite class instances (e.g., alternatives to `std::string`).
|
||||||
|
|
||||||
|
A `std::string_view` instance is effectively just a pointer to a region in memory representing
|
||||||
|
a string. In simdjson, we return `std::string_view` instances that either point within the
|
||||||
|
input string you parsed, or to a temporary string buffer inside our parser class instances.
|
||||||
|
When using `std::string_view` instances, it is your responsibility to ensure that
|
||||||
|
`std::string_view` instance does not outlive the pointed-to memory (e.g., either the input
|
||||||
|
buffer or the parser instance). Furthermore, some operations reset the string buffer
|
||||||
|
inside our parser instances: e.g., when we parse a new document. Thus a `std::string_view` instance
|
||||||
|
is often best viewed as a temporary string value that is tied to the document you are parsing.
|
||||||
|
At the cost of some memory allocation, you may convert your `std::string_view` instances for long-term storage into `std::string` instances:
|
||||||
|
`std::string mycopy(view)` (C++17) or `std::string mycopy(view.begin(), view.end())` (prior to C++17).
|
||||||
|
|
||||||
|
|
||||||
|
The `std::string_view` class has become standard as part of C++17 but it is not always available
|
||||||
|
on compilers which only supports C++11. When we detect that `string_view` is natively
|
||||||
|
available, we define the macro `SIMDJSON_HAS_STRING_VIEW`.
|
||||||
|
|
||||||
|
When we detect that it is unavailable,
|
||||||
|
we use [string-view-lite](https://github.com/martinmoene/string-view-lite) as a
|
||||||
|
substitute. In such cases, we use the type alias `using string_view = nonstd::string_view;` to
|
||||||
|
offer the same API, irrespective of the compiler and standard library. The macro
|
||||||
|
`SIMDJSON_HAS_STRING_VIEW` will be *undefined* to indicate that we emulate `string_view`.
|
||||||
|
|
||||||
|
|
||||||
Using the Parsed JSON
|
Using the Parsed JSON
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
|
@ -470,26 +505,6 @@ void basics_treewalk() {
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
C++11 Support and string_view
|
|
||||||
-------------
|
|
||||||
|
|
||||||
The simdjson library builds on compilers supporting the [C++11 standard](https://en.wikipedia.org/wiki/C%2B%2B11). It is also a strict requirement: we have no plan to support older C++ compilers.
|
|
||||||
|
|
||||||
We represent parsed strings in simdjson using the `std::string_view` class. It avoids
|
|
||||||
the need to copy the data, as would be necessary with the `std::string` class. It also
|
|
||||||
avoids the pitfalls of null-terminated C strings.
|
|
||||||
|
|
||||||
The `std::string_view` class has become standard as part of C++17 but it is not always available
|
|
||||||
on compilers which only supports C++11. When we detect that `string_view` is natively
|
|
||||||
available, we define the macro `SIMDJSON_HAS_STRING_VIEW`.
|
|
||||||
|
|
||||||
When we detect that it is unavailable,
|
|
||||||
we use [string-view-lite](https://github.com/martinmoene/string-view-lite) as a
|
|
||||||
substitute. In such cases, we use the type alias `using string_view = nonstd::string_view;` to
|
|
||||||
offer the same API, irrespective of the compiler and standard library. The macro
|
|
||||||
`SIMDJSON_HAS_STRING_VIEW` will be *undefined* to indicate that we emulate `string_view`.
|
|
||||||
|
|
||||||
|
|
||||||
C++17 Support
|
C++17 Support
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
|
@ -928,6 +943,7 @@ before printout the data.
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Performance note: the On Demand front-end does not materialize the parsed numbers and other values. If you are accessing everything twice, you may need to parse them twice. Thus the rewind functionality is best suited for cases where the first pass only scans the structure of the document.
|
||||||
|
|
||||||
Direct Access to the Raw String
|
Direct Access to the Raw String
|
||||||
--------------------------------
|
--------------------------------
|
||||||
|
@ -945,15 +961,26 @@ simdjson::padded_string docdata = R"({"value":123213232132132132132132132132112
|
||||||
simdjson::ondemand::document doc = parser.iterate(docdata);
|
simdjson::ondemand::document doc = parser.iterate(docdata);
|
||||||
simdjson::ondemand::object obj = doc.get_object();
|
simdjson::ondemand::object obj = doc.get_object();
|
||||||
std::string_view token = obj["value"].raw_json_token();
|
std::string_view token = obj["value"].raw_json_token();
|
||||||
// token has value "12321323213213213213213213213211223"
|
// token has value 12321323213213213213213213213211223, it points inside the input string
|
||||||
```
|
```
|
||||||
|
|
||||||
Performance note: the On Demand front-end does not materialize the parsed numbers and other values. If you are accessing everything twice, you may need to parse them twice. Thus the rewind functionality is
|
|
||||||
best suited for cases where the first pass only scans the structure of the document.
|
|
||||||
The `raw_json_token` method even works when the JSON value is a string. In such cases, it
|
The `raw_json_token` method even works when the JSON value is a string. In such cases, it
|
||||||
will return the complete string with the quotes and with eventual escaped sequences as in the
|
will return the complete string with the quotes and with eventual escaped sequences as in the
|
||||||
source document.
|
source document.
|
||||||
|
|
||||||
|
```C++
|
||||||
|
simdjson::ondemand::parser parser;
|
||||||
|
simdjson::padded_string docdata = R"({"value":"12321323213213213213213213213211223"})"_padded;
|
||||||
|
simdjson::ondemand::document doc = parser.iterate(docdata);
|
||||||
|
simdjson::ondemand::object obj = doc.get_object();
|
||||||
|
string_view token = obj["value"].raw_json_token();
|
||||||
|
// token has value "12321323213213213213213213213211223", it points inside the input string
|
||||||
|
```
|
||||||
|
|
||||||
|
The `raw_json_token()` should be fast and free of allocation.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Newline-Delimited JSON (ndjson) and JSON lines
|
Newline-Delimited JSON (ndjson) and JSON lines
|
||||||
----------------------------------------------
|
----------------------------------------------
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue