Update basic.md to document JSON pointer for On Demand. (#1618)

* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

Co-authored-by: Daniel Lemire <lemire@gmail.com>
Nicolas Boyer 2021-06-26 11:38:17 -04:00 committed by GitHub
parent f146294a85
commit eb849662c0
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 154 additions and 28 deletions

@ -528,27 +528,9 @@ Your input string does not need any padding. Any string will do. The `validate_u
JSON Pointer
------------
The simdjson library also supports [JSON pointer](https://tools.ietf.org/html/rfc6901) through the
`at_pointer()` method, letting you reach further down into the document in a single call:
The simdjson library also supports [JSON pointer](https://tools.ietf.org/html/rfc6901) through the `at_pointer()` method, letting you reach further down into the document in a single call. JSON pointer is supported by both the [DOM approach](https://github.com/simdjson/simdjson/blob/master/doc/dom.md#json-pointer) and the On Demand approach.
```c++
auto cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
dom::parser parser;
dom::element cars = parser.parse(cars_json);
cout << cars.at_pointer("/0/tire_pressure/1") << endl; // Prints 39.9
```
A JSON pointer is a sequence of segments, each starting with the '/' character. Within arrays, an integer
index allows you to select the indexed node. Within objects, the string value of the key allows you to
select the value. If your keys contain the characters '/' or '~', they must be escaped as '~1' and
'~0' respectively. An empty JSON pointer refers to the whole document.
We also extend the JSON pointer support to include *relative* paths.
You can apply a JSON pointer to any node and the pointer gets interpreted relatively, as if the current node were a whole JSON document.
**Note:** The On Demand implementation of JSON pointer relies on `find_field`, so keys are not unescaped when matching.
Consider the following example:
@ -558,18 +540,101 @@ auto cars_json = R"( [
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
dom::parser parser;
dom::element cars = parser.parse(cars_json);
ondemand::parser parser;
auto cars = parser.iterate(cars_json);
cout << cars.at_pointer("/0/tire_pressure/1") << endl; // Prints 39.9
for (dom::element car_element : cars) {
dom::object car;
simdjson::error_code error;
if ((error = car_element.get(car))) { std::cerr << error << std::endl; return; }
double x = car.at_pointer("/tire_pressure/1");
cout << x << endl; // Prints 39.9, 31 and 30
```
A JSON pointer is a sequence of segments, each starting with the '/' character. Within arrays, an integer
index allows you to select the indexed node. Within objects, the string value of the key allows you to
select the value. If your keys contain the characters '/' or '~', they must be escaped as '~1' and
'~0' respectively. An empty JSON pointer refers to the whole document.
For multiple JSON pointer queries on a document, one can call `at_pointer` multiple times.
```c++
auto cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
ondemand::parser parser;
auto cars = parser.iterate(cars_json);
size_t size = cars.count_elements();
for (size_t i = 0; i < size; i++) {
std::string json_pointer = "/" + std::to_string(i) + "/tire_pressure/1";
double x = cars.at_pointer(json_pointer);
std::cout << x << std::endl; // Prints 39.9, 31 and 30
}
```
Note that `at_pointer` calls [`rewind`](#rewind) to reset the parser to the beginning of the document. Hence, it invalidates all previously parsed values, objects and arrays: make sure to consume the result of each `at_pointer` call before making the next one. Consider the following example where one wants to store each object from the JSON into a vector of `struct car_type`:
```c++
struct car_type {
std::string make;
std::string model;
uint64_t year;
std::vector<double> tire_pressure;
car_type(std::string_view _make, std::string_view _model, uint64_t _year,
std::vector<double>&& _tire_pressure) :
make{_make}, model{_model}, year(_year), tire_pressure(_tire_pressure) {}
};
auto cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
ondemand::parser parser;
ondemand::document cars;
std::vector<double> measured;
parser.iterate(cars_json).get(cars);
std::vector<car_type> content;
for (int i = 0; i < 3; i++) {
ondemand::object obj;
std::string json_pointer = "/" + std::to_string(i);
// Each successive at_pointer call invalidates
// previously parsed values, strings, objects and arrays.
cars.at_pointer(json_pointer).get(obj);
// We materialize the object.
std::string_view make;
ASSERT_SUCCESS(obj["make"].get(make));
std::string_view model;
ASSERT_SUCCESS(obj["model"].get(model));
uint64_t year;
ASSERT_SUCCESS(obj["year"].get(year));
// We materialize the array.
ondemand::array arr;
ASSERT_SUCCESS(obj["tire_pressure"].get(arr));
std::vector<double> values;
for(auto x : arr) {
double value_double;
ASSERT_SUCCESS(x.get(value_double));
values.push_back(value_double);
}
content.emplace_back(make, model, year, std::move(values));
}
```
Furthermore, `at_pointer` calls `rewind` at the beginning of the call (i.e. the document is not reset after `at_pointer`). Consider the following example:
```c++
auto json = R"( {
"k0": 27,
"k1": [13,26],
"k2": true
} )"_padded;
ondemand::parser parser;
auto doc = parser.iterate(json);
std::cout << doc.at_pointer("/k1/1") << std::endl; // Prints 26
std::cout << doc.at_pointer("/k2") << std::endl; // Prints true
doc.rewind(); // Need to manually rewind to be able to use find_field properly from start of document
std::cout << doc.find_field("k0") << std::endl; // Prints 27
```
Error Handling

@ -341,6 +341,7 @@ public:
* are invalidated. After calling at_pointer, you need to consume the result: string values
* should be stored in your own variables, arrays should be decoded and stored in your own array-like
* structures and so forth.
*
* Also note that at_pointer() relies on find_field(), which implies that we do not unescape keys when matching.
*
* @return The value associated with the given JSON pointer, or:

@ -292,6 +292,63 @@ bool using_the_parsed_json_6() {
TEST_SUCCEED();
}
const padded_string cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
bool json_pointer_simple() {
TEST_START();
ondemand::parser parser;
ondemand::document cars;
double x;
ASSERT_SUCCESS(parser.iterate(cars_json).get(cars));
ASSERT_SUCCESS(cars.at_pointer("/0/tire_pressure/1").get(x));
ASSERT_EQUAL(x,39.9);
TEST_SUCCEED();
}
bool json_pointer_multiple() {
TEST_START();
ondemand::parser parser;
ondemand::document cars;
size_t size;
ASSERT_SUCCESS(parser.iterate(cars_json).get(cars));
ASSERT_SUCCESS(cars.count_elements().get(size));
double expected[] = {39.9, 31, 30};
for (size_t i = 0; i < size; i++) {
std::string json_pointer = "/" + std::to_string(i) + "/tire_pressure/1";
double x;
ASSERT_SUCCESS(cars.at_pointer(json_pointer).get(x));
ASSERT_EQUAL(x,expected[i]);
}
TEST_SUCCEED();
}
bool json_pointer_rewind() {
TEST_START();
auto json = R"( {
"k0": 27,
"k1": [13,26],
"k2": true
} )"_padded;
ondemand::parser parser;
ondemand::document doc;
uint64_t i;
bool b;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
ASSERT_SUCCESS(doc.at_pointer("/k1/1").get(i));
ASSERT_EQUAL(i,26);
ASSERT_SUCCESS(doc.at_pointer("/k2").get(b));
ASSERT_EQUAL(b,true);
doc.rewind(); // Need to manually rewind to be able to use find_field properly from start of document
ASSERT_SUCCESS(doc.find_field("k0").get(i));
ASSERT_EQUAL(i,27);
TEST_SUCCEED();
}
int main() {
if (
true
@ -312,6 +369,9 @@ int main() {
&& using_the_parsed_json_5()
#endif
&& using_the_parsed_json_6()
&& json_pointer_simple()
&& json_pointer_multiple()
&& json_pointer_rewind()
) {
return 0;
} else {