Tweaking current_location(). (#1707)

* Tweaking current_location().

* Well.
Daniel Lemire 2021-08-28 20:19:30 -04:00 committed by GitHub
parent ed7343f7f2
commit cebe3fb299
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 87 additions and 33 deletions


@@ -966,11 +966,11 @@ int main(void) {
 ### Current location in document
-Sometimes, it might be helpful to know the current location in the document during iteration. This is especially useful when
-encountering errors. Using `current_location()` in combination of exception-free error handling makes it easy to identify broken JSON
-and errors. Users can call the `current_location()` method on a document instance to retrieve a `const char *` pointer to the current
-location in the document. This method also works even after an error has invalidated the document and the parser (e.g. `TAPE_ERROR`,
-`INCOMPLETE_ARRAY_OR_OBJECT`). As an example, consider the following,
+Sometimes, it might be helpful to know the current location in the document during iteration. This is especially useful when encountering errors. The `current_location()` method on
+`document` instances makes it easy to identify common JSON errors. Users can call the `current_location()` method on a valid `document` instance to retrieve a `const char *` pointer to the current location in the document. This method also works even after an error has invalidated the document and the parser (e.g. `TAPE_ERROR`, `INCOMPLETE_ARRAY_OR_OBJECT`).
+When the input was a `padded_string` or another null-terminated source, then you may
+use the `const char *` pointer as a C string. As an example, consider the following
+example where we used the exception-free simdjson interface:
 ```c++
 auto broken_json = R"( {"double": 13.06, false, "integer": -343} )"_padded; // Missing key
@@ -984,9 +984,24 @@ if (error) {
 }
 ```
-In the previous example, we tried to access the `"integer"` key, but since the parser had to go through a value without a key before
-(`false`), a `TAPE_ERROR` error gets thrown. `current_location()` will then point at the location of the error, and the user can now easily see the relevant problem. `current_location()` also has uses when the error/exception is triggered but an incorrect
-call done by the user. For example,
+You may also use `current_location()` with exceptions as follows:
+```c++
+auto broken_json = R"( {"double": 13.06, false, "integer": -343} )"_padded;
+ondemand::parser parser;
+ondemand::document doc = parser.iterate(broken_json);
+try {
+  return int64_t(doc["integer"]);
+} catch(simdjson_error& err) {
+  std::cerr << doc.current_location() << std::endl;
+  return -1;
+}
+```
+In these examples, we tried to access the `"integer"` key, but since the parser
+had to go through a value without a key before (`false`), a `TAPE_ERROR` error is thrown.
+The pointer returned by the `current_location()` method then points at the location of the error. The `current_location()` method may also be used when the error is triggered
+by a user action, even if the JSON input is valid. Consider the following example:
 ```c++
 auto json = R"( [1,2,3] )"_padded;
@@ -1000,7 +1015,8 @@ if (error) {
 }
 ```
-If the location is invalid (i.e. at the end of a document), `current_location()` will return an `OUT_OF_BOUNDS` error. Example:
+If the location is invalid (i.e. at the end of a document), the `current_location()`
+method returns an `OUT_OF_BOUNDS` error. For example:
 ```c++
 auto json = R"( [1,2,3] )"_padded;
@@ -1012,8 +1028,8 @@ for (auto val : doc) {
 std::cout << doc.current_location() << std::endl; // Throws OUT_OF_BOUNDS
 ```
-Finally, note that `current_location()` can also be used even when no exceptions/errors are thrown. This can be helpful for users
-that want to know the current state of iteration during parsing. For example,
+Finally, the `current_location()` method may also be used even when no exceptions/errors
+are thrown. This can be helpful for users that want to know the current state of iteration during parsing. For example:
 ```c++
 auto json = R"( [[1,2,3], -23.4, {"key": "value"}, true] )"_padded;
@@ -1028,6 +1044,15 @@ for (auto val : doc) {
 }
 ```
+The `current_location()` method requires a valid `document` instance. If the
+`iterate` function fails to return a valid document, then you cannot use
+`current_location()` to identify the location of an error in the input string.
+The errors reported by the `iterate` function include `EMPTY` if no JSON document is detected,
+`UTF8_ERROR` if the string is not a valid UTF-8 string, `UNESCAPED_CHARS` if a string
+contains control characters that must be escaped, and `UNCLOSED_STRING` if there
+is an unclosed string in the document. We do not provide location information for these
+errors.
 Rewinding
 ----------
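The documentation changed in this file says that `current_location()` returns a `const char *` pointing at the current location in the input. A common next step is to turn that pointer into a byte offset and a line/column pair for an error message. The sketch below assumes the pointer refers into the same input buffer that was passed to the parser; `text_position` and `position_of` are hypothetical names, not simdjson API, and columns are counted in bytes rather than Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of simdjson): where a location pointer sits
// inside the input, expressed as a byte offset and a 1-based line/column.
struct text_position {
  std::size_t offset;  // bytes from the start of the input
  std::size_t line;    // 1-based line number
  std::size_t column;  // 1-based column, counted in bytes, not code points
};

// Assumes `location` points into `input` or one past its last byte,
// e.g. a pointer obtained from current_location() for that same buffer.
text_position position_of(std::string_view input, const char* location) {
  assert(location >= input.data() && location <= input.data() + input.size());
  text_position pos{static_cast<std::size_t>(location - input.data()), 1, 1};
  for (std::size_t i = 0; i < pos.offset; ++i) {
    if (input[i] == '\n') {
      ++pos.line;      // a newline starts the next line...
      pos.column = 1;  // ...at column 1
    } else {
      ++pos.column;
    }
  }
  return pos;
}
```

For example, with the input `"{\n  \"double\": 13.06,\n  false\n}"` and a pointer to the stray `false`, `position_of` reports offset 23, line 3, column 3.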


@@ -454,37 +454,37 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<raw_json_string> val
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iterator::get_uint64() noexcept {
   auto result = numberparsing::parse_unsigned(peek_non_root_scalar("uint64"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("uint64"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("uint64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iterator::get_uint64_in_string() noexcept {
   auto result = numberparsing::parse_unsigned_in_string(peek_non_root_scalar("uint64"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("uint64"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("uint64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_int64() noexcept {
   auto result = numberparsing::parse_integer(peek_non_root_scalar("int64"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("int64"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("int64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_int64_in_string() noexcept {
   auto result = numberparsing::parse_integer_in_string(peek_non_root_scalar("int64"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("int64"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("int64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_double() noexcept {
   auto result = numberparsing::parse_double(peek_non_root_scalar("double"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("double"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("double"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_double_in_string() noexcept {
   auto result = numberparsing::parse_double_in_string(peek_non_root_scalar("double"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("double"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("double"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<bool> value_iterator::get_bool() noexcept {
   auto result = parse_bool(peek_non_root_scalar("bool"));
-  if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("bool"); }
+  if(result.error() == SUCCESS) { advance_non_root_scalar("bool"); }
   return result;
 }
 simdjson_really_inline bool value_iterator::is_null() noexcept {
@@ -533,9 +533,8 @@ simdjson_really_inline simdjson_result<number> value_iterator::get_root_number()
   }
   number num;
   error_code error = numberparsing::parse_number(tmpbuf, num);
-  if(error == INCORRECT_TYPE) { return error; }
-  advance_root_scalar("number"); // we consume!
   if(error) { return error; }
+  advance_root_scalar("number");
   return num;
 }
@@ -554,7 +553,7 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iter
     return NUMBER_ERROR;
   }
   auto result = numberparsing::parse_unsigned(tmpbuf);
-  if(result.error() != INCORRECT_TYPE) { advance_root_scalar("uint64"); }
+  if(result.error() == SUCCESS) { advance_root_scalar("uint64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iterator::get_root_uint64_in_string() noexcept {
@@ -566,7 +565,7 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iter
     return NUMBER_ERROR;
   }
   auto result = numberparsing::parse_unsigned_in_string(tmpbuf);
-  if(result.error() != INCORRECT_TYPE) { advance_root_scalar("uint64"); }
+  if(result.error() == SUCCESS) { advance_root_scalar("uint64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_root_int64() noexcept {
@@ -579,7 +578,7 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_itera
   }
   auto result = numberparsing::parse_integer(tmpbuf);
-  if(result.error() != INCORRECT_TYPE) { advance_root_scalar("int64"); }
+  if(result.error() == SUCCESS) { advance_root_scalar("int64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_root_int64_in_string() noexcept {
@@ -592,7 +591,7 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_itera
   }
   auto result = numberparsing::parse_integer_in_string(tmpbuf);
-  if(result.error() != INCORRECT_TYPE) { advance_root_scalar("int64"); }
+  if(result.error() == SUCCESS) { advance_root_scalar("int64"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_root_double() noexcept {
@@ -607,7 +606,7 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterat
     return NUMBER_ERROR;
   }
   auto result = numberparsing::parse_double(tmpbuf);
-  if(result.error() != INCORRECT_TYPE) { advance_root_scalar("double"); }
+  if(result.error() == SUCCESS) { advance_root_scalar("double"); }
   return result;
 }
@@ -623,7 +622,7 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterat
     return NUMBER_ERROR;
   }
   auto result = numberparsing::parse_double_in_string(tmpbuf);
-  if(result.error() != INCORRECT_TYPE) { advance_root_scalar("double"); }
+  if(result.error() == SUCCESS) { advance_root_scalar("double"); }
   return result;
 }
 simdjson_warn_unused simdjson_really_inline simdjson_result<bool> value_iterator::get_root_bool() noexcept {
@@ -631,13 +630,14 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<bool> value_iterator
   auto json = peek_root_scalar("bool");
   uint8_t tmpbuf[5+1];
   if (!_json_iter->copy_to_buffer(json, max_len, tmpbuf)) { return incorrect_type_error("Not a boolean"); }
-  advance_root_scalar("bool");
-  return parse_bool(tmpbuf);
+  auto result = parse_bool(tmpbuf);
+  if(result.error() == SUCCESS) { advance_root_scalar("bool"); }
+  return result;
 }
 simdjson_really_inline bool value_iterator::is_root_null() noexcept {
   auto max_len = peek_start_length();
   auto json = peek_root_scalar("null");
-  auto result = (max_len >= 4 && !atomparsing::str4ncmp(json, "null") &&
+  bool result = (max_len >= 4 && !atomparsing::str4ncmp(json, "null") &&
     (max_len == 4 || jsoncharutils::is_structural_or_whitespace(json[5])));
   if(result) { advance_root_scalar("null"); }
   return result;
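Every getter in this file follows the same change: before, the iterator advanced past a scalar unless the error was `INCORRECT_TYPE`, so after a `NUMBER_ERROR` the location already pointed past the bad token (the old test expectation `", 2.3]}} "`); now it advances only on `SUCCESS`, so the location stays on the offending value (`"1.2., 2.3]}} "`). The self-contained toy below models only that choice. `toy_cursor`, `advance_policy` and the `toy_error` enum are hypothetical stand-ins, not simdjson's `value_iterator`, and the toy parser accepts nothing but unsigned decimal integers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for simdjson's error codes in this sketch.
enum class toy_error { success, incorrect_type, number_error };

struct toy_result {
  std::uint64_t value;  // meaningful only when error == success
  toy_error error;
};

// The two advance policies contrasted in this commit.
enum class advance_policy {
  unless_incorrect_type,  // before: skip the token on NUMBER_ERROR too
  only_on_success         // after: stay on the token after any error
};

// Toy cursor over a null-terminated buffer; location() plays the role of
// current_location(). Not simdjson's value_iterator.
class toy_cursor {
 public:
  toy_cursor(const char* json, advance_policy policy) : pos_(json), policy_(policy) {
    assert(json != nullptr);
  }
  const char* location() const { return pos_; }

  toy_result get_uint64() {
    while (*pos_ == ' ') { ++pos_; }  // skip spaces before the value
    // The token ends at a space, ',', ']', '}' or the terminating NUL.
    const char* end = pos_;
    while (*end != '\0' && std::strchr(" ,]}", *end) == nullptr) { ++end; }

    toy_result r{0, toy_error::success};
    if (pos_ == end || *pos_ < '0' || *pos_ > '9') {
      r.error = toy_error::incorrect_type;  // not a number at all
    } else {
      for (const char* p = pos_; p != end; ++p) {
        if (*p < '0' || *p > '9') { r.error = toy_error::number_error; break; }
        r.value = r.value * 10 + static_cast<std::uint64_t>(*p - '0');  // overflow ignored
      }
    }
    bool advance = (policy_ == advance_policy::only_on_success)
                       ? r.error == toy_error::success
                       : r.error != toy_error::incorrect_type;
    if (advance) { pos_ = end; }
    return r;
  }

 private:
  const char* pos_;
  advance_policy policy_;
};
```

On the input `"1.2., 2"`, the old policy leaves `location()` at `", 2"` after the failure, while the new one leaves it at `"1.2., 2"`, which mirrors the updated `error_location_tests` expectations.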


@@ -83,7 +83,7 @@ namespace error_location_tests {
     double d;
     ASSERT_ERROR(doc.at_pointer("/b/c/0").get(d), NUMBER_ERROR);
     ASSERT_SUCCESS(doc.current_location().get(ptr));
-    ASSERT_EQUAL(ptr, ", 2.3]}} ");
+    ASSERT_EQUAL(ptr, "1.2., 2.3]}} ");
     uint64_t i;
     ASSERT_ERROR(doc.at_pointer("/a/2/1").get(i), TAPE_ERROR);
     ASSERT_SUCCESS(doc.current_location().get(ptr));
@@ -93,14 +93,14 @@ namespace error_location_tests {
   bool broken_json1() {
     TEST_START();
-    auto json = R"( <20>{"a":1, 3} )"_padded;
+    auto json = " \xc3\x94\xc3\xb8\xe2\x84\xa6{\"a\":1, 3} "_padded;
     ondemand::parser parser;
     ondemand::document doc;
     const char * ptr;
     ASSERT_SUCCESS(parser.iterate(json).get(doc));
     ASSERT_ERROR(doc["a"], INCORRECT_TYPE);
     ASSERT_SUCCESS(doc.current_location().get(ptr));
-    ASSERT_EQUAL(ptr, "<EFBFBD>{\"a\":1, 3} ");
+    ASSERT_EQUAL(ptr, "\xc3\x94\xc3\xb8\xe2\x84\xa6{\"a\":1, 3} ");
     TEST_SUCCEED();
   }
@@ -221,7 +221,20 @@ namespace error_location_tests {
     double d;
     ASSERT_ERROR(doc.at_pointer("/0").get_double().get(d), NUMBER_ERROR);
     ASSERT_SUCCESS(doc.current_location().get(ptr));
-    ASSERT_EQUAL(ptr, "] ");
+    ASSERT_EQUAL(ptr, "13.34.514] ");
+    TEST_SUCCEED();
+  }
+  bool number_parsing_root_error() {
+    TEST_START();
+    auto json = R"( 13.34.514 )"_padded;
+    ondemand::parser parser;
+    ondemand::document doc;
+    ASSERT_SUCCESS(parser.iterate(json).get(doc));
+    const char * ptr;
+    double d;
+    ASSERT_ERROR(doc.get_double().get(d), NUMBER_ERROR);
+    ASSERT_SUCCESS(doc.current_location().get(ptr));
+    ASSERT_EQUAL(ptr, "13.34.514 ");
     TEST_SUCCEED();
   }
@@ -239,6 +252,7 @@ namespace error_location_tests {
       no_such_field() &&
       object_with_no_such_field() &&
       number_parsing_error() &&
+      number_parsing_root_error() &&
       true;
   }
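The new `number_parsing_root_error` test covers a document that is nothing but a malformed number, `13.34.514`, and expects the location to stay at the start of that token. One way to see why: a prefix parser happily reads `13.34`, so the parser must also check that only whitespace follows, and it should move the location only once that check passes. The sketch below uses `std::strtod` as a stand-in number parser; `parse_root_double` is a hypothetical name, not simdjson code, and `strtod` accepts forms JSON forbids (`inf`, `nan`, hex floats, a leading `+`), so this is not a JSON validator.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical sketch, not simdjson code: parse a document that should hold
// exactly one number. On failure, *location is left at the start of the
// number; on success it moves past the number and any trailing whitespace.
// std::strtod stands in for a real JSON number parser.
bool parse_root_double(const char* json, double* out, const char** location) {
  assert(json != nullptr && out != nullptr && location != nullptr);
  while (std::isspace(static_cast<unsigned char>(*json))) { ++json; }
  *location = json;  // the scalar we are about to parse
  char* end = nullptr;
  double value = std::strtod(json, &end);
  if (end == json) { return false; }  // no number at all
  const char* rest = end;
  while (std::isspace(static_cast<unsigned char>(*rest))) { ++rest; }
  if (*rest != '\0') { return false; }  // e.g. ".514" left over: do not advance
  *out = value;
  *location = rest;  // consume only after the whole token validated
  return true;
}
```

With `" 13.34.514 "` the function fails and leaves the location at `"13.34.514 "`, matching what the new test asserts for the real parser.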


@@ -708,6 +708,20 @@ bool simple_error_example() {
     return false;
   }
 }
+int64_t current_location_tape_error_with_except() {
+  auto broken_json = R"( {"double": 13.06, false, "integer": -343} )"_padded;
+  ondemand::parser parser;
+  ondemand::document doc = parser.iterate(broken_json);
+  try {
+    return int64_t(doc["integer"]);
+  } catch(simdjson_error& err) {
+    std::cerr << err.error() << std::endl;
+    std::cerr << doc.current_location() << std::endl;
+    return -1;
+  }
+}
 #endif
 int load_example() {
@@ -885,6 +899,7 @@ int main() {
     && current_location_no_error()
 #if SIMDJSON_EXCEPTIONS
     && number_tests()
+    && current_location_tape_error_with_except()
 #endif
   ) {
     return 0;
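The new `current_location_tape_error_with_except` example declares `doc` outside the `try` block, which is what lets the `catch` handler still call `current_location()` after the exception. The toy below reproduces only that control flow without simdjson: `toy_exception` stands in for `simdjson_error`, `toy_document` for the document, and `std::strtoll` for the integer parser. All three are assumptions for illustration, not simdjson API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>

// Hypothetical stand-in for simdjson_error in this sketch.
class toy_exception : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Toy document: reads one integer and, like the new getters, advances only
// when parsing succeeds. Not simdjson's ondemand::document.
class toy_document {
 public:
  explicit toy_document(const char* json) : pos_(json) { assert(json != nullptr); }
  const char* current_location() const { return pos_; }
  std::int64_t get_int64() {
    char* end = nullptr;
    long long v = std::strtoll(pos_, &end, 10);  // skips leading whitespace
    if (end == pos_) { throw toy_exception("INCORRECT_TYPE"); }
    pos_ = end;  // advance only on success; range errors ignored here
    return static_cast<std::int64_t>(v);
  }

 private:
  const char* pos_;
};

std::int64_t read_or_report(const char* json) {
  toy_document doc(json);  // declared outside try so the handler can use it
  try {
    return doc.get_int64();
  } catch (const toy_exception& err) {
    std::fprintf(stderr, "%s at: %s\n", err.what(), doc.current_location());
    return -1;
  }
}
```

Had `doc` been declared inside the `try` block, it would be out of scope in the handler and the location could not be reported.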