Update basics.md

This commit is contained in:
Daniel Lemire 2021-06-07 14:54:18 -04:00 committed by GitHub
parent 34bb2079e7
commit 8bc12fe7cb
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 7 additions and 21 deletions

@ -247,12 +247,12 @@ support for users who avoid exceptions. See [the simdjson error handling documen
- `field.value()` will get you the value, which you can then use all these other methods on. - `field.value()` will get you the value, which you can then use all these other methods on.
* **Array Index:** Because it is forward-only, you cannot look up an array element by index. Instead, * **Array Index:** Because it is forward-only, you cannot look up an array element by index. Instead,
you will need to iterate through the array and keep an index yourself. you will need to iterate through the array and keep an index yourself.
* **Output to sstrings:** Given a document or an element (or node) out of a JSON document, you can output a JSON string version suitable to be parsed again as JSON content: `simdjson::to_string(element)` returns a `simdjson::simdjson_result<std::string>` instance. You can cast it to `std::string` and it will throw when an error was encountered (`std::string(simdjson::to_string(element))`). Or else you can do `std::string s; if(simdjson::to_string(element).get(s) == simdjson::SUCCESS) { ... }`. This consumes fully the element: if you apply it on a document, the JSON pointer is advanced to the end of the document. The returned string contains a serialized version of the element or document that is suitable to be parsed again. It is also a newly allocated `std::string` that is independent from the simdjson parser. The `to_string` function should not be confused with retrieving the value of a string instance which are escaped and represented using a lightweight `std::string_view` instance pointing at an internal string buffer inside the parser instance. To illustrate, the first of the following two code segments will print the unescaped string `"test"` complete with the quote whereas the second one will print the escaped content of the string (without the quotes). Th * **Output to strings (simdjson 1.0 or better):** Given a document or an element (or node) out of a JSON document, you can output a JSON string version suitable to be parsed again as JSON content: `simdjson::to_string(element)` returns a `simdjson::simdjson_result<std::string>` instance. You can cast it to `std::string` and it will throw when an error was encountered (`std::string(simdjson::to_string(element))`). Or else you can do `std::string s; if(simdjson::to_string(element).get(s) == simdjson::SUCCESS) { ... }`. This consumes fully the element: if you apply it on a document, the JSON pointer is advanced to the end of the document. 
The returned string contains a serialized version of the element or document that is suitable to be parsed again. It is also a newly allocated `std::string` that is independent of the simdjson parser. The `to_string` function should not be confused with retrieving the value of a string instance, which is unescaped and represented using a lightweight `std::string_view` instance pointing at an internal string buffer inside the parser instance. To illustrate, the first of the following two code segments will print the escaped string `"result"` complete with the quotes, whereas the second one will print the unescaped content of the string (without the quotes). Th
> ```C++ > ```C++
> // serialize a JSON to an escaped std::string instance so that it can be parsed again as JSON > // serialize a JSON to an escaped std::string instance so that it can be parsed again as JSON
> auto cars_json = R"( { "test": "result" } )"_padded; > auto cars_json = R"( { "test": "result" } )"_padded;
> ondemand::document doc = parser.iterate(cars_json); > ondemand::document doc = parser.iterate(cars_json);
> std::cout << simdjson::to_string(doc["test"]) << std::endl; > std::cout << simdjson::to_string(doc["test"]) << std::endl; // Requires simdjson 1.0 or better
>```` >````
> ```C++ > ```C++
> // retrieves an unescaped string value as a string_view instance > // retrieves an unescaped string value as a string_view instance
@ -348,7 +348,7 @@ whole array. You may use it as follows if your document is itself an array:
```C++ ```C++
auto cars_json = R"( [ 40.1, 39.9, 37.7, 40.4 ] )"_padded; auto cars_json = R"( [ 40.1, 39.9, 37.7, 40.4 ] )"_padded;
auto doc = parser.iterate(cars_json); auto doc = parser.iterate(cars_json);
size_t count = doc.count_elements(); size_t count = doc.count_elements(); // requires simdjson 1.0 or better
std::vector<double> values(count); std::vector<double> values(count);
size_t index = 0; size_t index = 0;
for(double x : doc) { values[index++] = x; } for(double x : doc) { values[index++] = x; }
@ -361,7 +361,7 @@ You should not let the array instance go out of scope before consuming it after
auto cars_json = R"( { "test":[ { "val1":1, "val2":2 }, { "val1":1, "val2":2 } ] } )"_padded; auto cars_json = R"( { "test":[ { "val1":1, "val2":2 }, { "val1":1, "val2":2 } ] } )"_padded;
auto doc = parser.iterate(cars_json); auto doc = parser.iterate(cars_json);
auto test_array = doc.find_field("test").get_array(); auto test_array = doc.find_field("test").get_array();
size_t count = test_array.count_elements(); size_t count = test_array.count_elements(); // requires simdjson 1.0 or better
std::cout << "Number of elements: " << count << std::endl; std::cout << "Number of elements: " << count << std::endl;
for(ondemand::object elem: test_array) { for(ondemand::object elem: test_array) {
std::cout << simdjson::to_string(elem); std::cout << simdjson::to_string(elem);
@ -801,7 +801,8 @@ Rewinding
---------- ----------
In some instances, you may need to go through a document more than once. For that purpose, you may In some instances, you may need to go through a document more than once. For that purpose, you may
call the `rewind()` method on the document instance. It allows you to restart processing from the beginning without rescanning all of the input data again. It invalidates all values, objects and arrays call the `rewind()` method on the document instance. It allows you to restart processing from the
beginning without rescanning all of the input data again. It invalidates all values, objects and arrays
that you have created so far (including unescaped strings). that you have created so far (including unescaped strings).
In the following example, we print on the screen the number of cars in the JSON input file In the following example, we print on the screen the number of cars in the JSON input file
@ -820,27 +821,12 @@ before printout the data.
if(car["make"] == "Toyota") { count++; } if(car["make"] == "Toyota") { count++; }
} }
std::cout << "We have " << count << " Toyota cars.\n"; std::cout << "We have " << count << " Toyota cars.\n";
doc.rewind(); doc.rewind(); // requires simdjson 1.0 or better
for (ondemand::object car : doc) { for (ondemand::object car : doc) {
cout << "Make/Model: " << std::string_view(car["make"]) << "/" << std::string_view(car["model"]) << endl; cout << "Make/Model: " << std::string_view(car["make"]) << "/" << std::string_view(car["model"]) << endl;
} }
``` ```
You may also rewind arrays and objects, as in the following example:
```C++
auto doc = parser.iterate(cars_json);
ondemand::array doc_array = doc;
size_t count = 0;
for (simdjson_unused ondemand::object car : doc) {
if(car["make"] == "Toyota") { count++; }
}
std::cout << "We have " << count << " Toyota cars.\n";
doc_array.rewind();
for (ondemand::object car : doc_array) {
cout << "Make/Model: " << std::string_view(car["make"]) << "/" << std::string_view(car["model"]) << endl;
}
```
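The two-pass pattern in this section (scan once, rewind, scan again) can be sketched without simdjson at all. The block below is only an analogy under stated assumptions: a `std::istringstream` stands in for the forward-only document, and the helper names `count_matches`, `rewind_stream` and `collect_all` are hypothetical and not part of the simdjson API.

```C++
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not simdjson): first pass over a forward-only
// stream, counting the whitespace-separated tokens equal to `wanted`.
std::size_t count_matches(std::istream &in, const std::string &wanted) {
  std::size_t count = 0;
  std::string token;
  while (in >> token) {
    if (token == wanted) { count++; }
  }
  return count;
}

// Hypothetical helper (not simdjson): go back to the start of the stream.
// Reading to the end sets eofbit and failbit, so the state must be
// cleared before seeking.
void rewind_stream(std::istream &in) {
  in.clear();
  in.seekg(0, std::ios::beg);
  assert(in.good() && "the stand-in stream must be seekable");
}

// Hypothetical helper (not simdjson): second pass, copying every token
// into a vector that owns its strings.
std::vector<std::string> collect_all(std::istream &in) {
  std::vector<std::string> tokens;
  std::string token;
  while (in >> token) { tokens.push_back(token); }
  return tokens;
}
```

With `std::istringstream in("Toyota Honda Toyota Ford");`, a call to `count_matches(in, "Toyota")` followed by `rewind_stream(in)` lets `collect_all(in)` see all four tokens again. The analogy is loose in one respect: the strings copied here own their data, whereas the `std::string_view` values obtained from simdjson are invalidated by `rewind()` as described above.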
Direct Access to the Raw String Direct Access to the Raw String
-------------------------------- --------------------------------