Parse numbers inside strings (#1667)

* Update basic.md to document JSON pointer for On Demand.

* Add automatic rewind for at_pointer

* Remove DOM examples in basics.md and update documentation reflecting addition of at_pointer automatic rewinding.

* Review

* Add test

* Naive implementation for doubles in string.

* Add double from string in atom doc.

* Simplification (removed all *_from_string())

* Add int and uint parsing in string.

* Make duplicates instead.

* Make tests exceptionless.

* Add missing declarations.

* Add more tests (errors, JSON pointer).

* Add crypto json tests.

* Update doc.

* Update doc after review.

Co-authored-by: Daniel Lemire <lemire@gmail.com>
Nicolas Boyer 2021-07-27 08:50:44 -04:00 committed by GitHub
parent 9d405a5df4
commit 7d887fdc1e
10 changed files with 687 additions and 2 deletions

@ -22,6 +22,7 @@ An overview of what you need to know to use simdjson, with examples.
* [Rewinding](#rewinding)
* [Direct Access to the Raw String](#direct-access-to-the-raw-string)
* [Newline-Delimited JSON (ndjson) and JSON lines](#newline-delimited-json-ndjson-and-json-lines)
* [Parsing Numbers Inside Strings](#parsing-numbers-inside-strings)
* [Thread Safety](#thread-safety)
* [Standard Compliance](#standard-compliance)
@ -984,7 +985,7 @@ The `raw_json_token()` should be fast and free of allocation.
Newline-Delimited JSON (ndjson) and JSON lines
----------------------------------------------
The simdjson library also support multithreaded JSON streaming through a large file containing many
The simdjson library also supports multithreaded JSON streaming through a large file containing many
smaller JSON documents in either [ndjson](http://ndjson.org) or [JSON lines](http://jsonlines.org)
format. If your JSON documents all contain arrays or objects, we even support direct file
concatenation without whitespace. The concatenated file has no size restrictions (including larger
@ -1016,6 +1017,105 @@ If your documents are large (e.g., larger than a megabyte), then the `iterate_ma
See [iterate_many.md](iterate_many.md) for detailed information and design.
Parsing Numbers Inside Strings
------------------------------
Though the JSON specification distinguishes between number and string values, many engineers choose to store numbers inside strings, e.g., they prefer `{"a":"1.9"}` to `{"a":1.9}`.
The simdjson library supports parsing valid numbers inside strings, which makes it more convenient for people working with those types of documents. This feature is supported through
three methods: `get_double_in_string`, `get_int64_in_string` and `get_uint64_in_string`. However, it is important to note that these methods are not substitutes for the regular
`get_double`, `get_int64` and `get_uint64`. The `get_*_in_string` methods exist solely to parse valid JSON numbers inside strings, and so we expect users to call these
methods appropriately. In particular, a valid JSON number has no leading and no trailing whitespace, and the strings `"nan"`, `"1e"` and `"infinity"` will not be accepted as valid
numbers. As an example, suppose we have the following JSON text:
```c++
// Assumes `using namespace simdjson;` so that the _padded literal is available.
auto json = R"(
{
  "ticker":{
    "base":"BTC",
    "target":"USD",
    "price":"443.7807865468",
    "volume":"31720.1493969300",
    "change":"Infinity",
    "markets":[
      {
        "market":"bitfinex",
        "price":"447.5000000000",
        "volume":"10559.5293639000"
      },
      {
        "market":"bitstamp",
        "price":"448.5400000000",
        "volume":"11628.2880079300"
      },
      {
        "market":"btce",
        "price":"432.8900000000",
        "volume":"8561.0563600000"
      }
    ]
  },
  "timestamp":1399490941,
  "timestampstr":"1399490941"
}
)"_padded;
```
Now, suppose that a user wants to get the timestamp from the `timestampstr` key. One could do the following:
```c++
ondemand::parser parser;
auto doc = parser.iterate(json);
uint64_t time = doc.at_pointer("/timestampstr").get_uint64_in_string();
std::cout << time << std::endl; // Prints 1399490941
```
Another thing a user might want to do is extract the `markets` array and get the market name, price and volume. Here is one way to do so:
```c++
ondemand::parser parser;
auto doc = parser.iterate(json);
// Getting markets array
ondemand::array markets = doc.find_field("ticker").find_field("markets").get_array();
// Iterating through markets array
for (auto value : markets) {
  std::cout << "Market: " << value.find_field("market").get_string();
  std::cout << "\tPrice: " << value.find_field("price").get_double_in_string();
  std::cout << "\tVolume: " << value.find_field("volume").get_double_in_string() << std::endl;
}
/* The above prints
Market: bitfinex Price: 447.5 Volume: 10559.5
Market: bitstamp Price: 448.54 Volume: 11628.3
Market: btce Price: 432.89 Volume: 8561.06
*/
```
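The prices and volumes printed above look shortened only because `std::cout` formats a `double` with six significant digits by default; the parsed values themselves are not rounded to that precision. The following standalone sketch does not use simdjson, and the helper names `format_default` and `format_fixed` are hypothetical. It shows one way to print more digits:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double the way `std::cout << value` does
// by default (general notation, six significant digits).
std::string format_default(double value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Hypothetical helper: formats a double with a fixed number of decimals.
std::string format_fixed(double value, int decimals) {
  std::ostringstream out;
  out << std::fixed << std::setprecision(decimals) << value;
  return out.str();
}
```

For instance, `format_default(10559.5293639)` gives `10559.5`, while `format_fixed(10559.5293639, 4)` gives `10559.5294`.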
Finally, here is an example dealing with errors, where the user wants to convert the string `"Infinity"` (the value of the `"change"` key) to a `double` holding infinity.
```c++
ondemand::parser parser;
auto doc = parser.iterate(json);
// Get "change"/"Infinity" key/value pair
ondemand::value value = doc.find_field("ticker").find_field("change");
double d;
std::string_view view;
auto error = value.get_double_in_string().get(d);
// Check if parsed value into double successfully
if (error) {
  // Not a valid JSON number: fall back to reading the raw string
  error = value.get_string().get(view);
  if (error) { /* Handle error */ }
  else if (view == "Infinity") {
    d = std::numeric_limits<double>::infinity(); // requires <limits>
  }
  else { /* Handle wrong value */ }
}
```
It is also important to note that when dealing with an invalid number inside a string, simdjson will report a `NUMBER_ERROR` error if the string begins with a number, whereas simdjson
will report an `INCORRECT_TYPE` error otherwise.
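To make the distinction concrete, here is a minimal standalone sketch of that classification for the floating-point case. It is not simdjson's implementation: the names `Outcome` and `classify_number_in_string` are hypothetical, the function receives the string content without its surrounding quotes, and range/overflow checks are deliberately left out.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names, for illustration only (not part of simdjson).
enum class Outcome { Ok, NumberError, IncorrectType };

// Classifies the content of a JSON string (quotes already removed) against
// the JSON number grammar: -? int frac? exp?
Outcome classify_number_in_string(std::string_view s) {
  std::size_t p = 0;
  auto is_digit = [&](std::size_t k) {
    return k < s.size() && s[k] >= '0' && s[k] <= '9';
  };
  if (p < s.size() && s[p] == '-') { p++; }
  // No leading digit: the string does not begin with a number at all.
  if (!is_digit(p)) { return Outcome::IncorrectType; }
  // Integer part: a leading zero must stand alone (no "0123").
  const std::size_t start = p;
  while (is_digit(p)) { p++; }
  if (s[start] == '0' && p - start > 1) { return Outcome::NumberError; }
  // Optional fraction: at least one digit must follow the '.'.
  if (p < s.size() && s[p] == '.') {
    p++;
    if (!is_digit(p)) { return Outcome::NumberError; }
    while (is_digit(p)) { p++; }
  }
  // Optional exponent: at least one digit after e/E and an optional sign.
  if (p < s.size() && (s[p] == 'e' || s[p] == 'E')) {
    p++;
    if (p < s.size() && (s[p] == '+' || s[p] == '-')) { p++; }
    if (!is_digit(p)) { return Outcome::NumberError; }
    while (is_digit(p)) { p++; }
  }
  // Anything left over, including trailing whitespace, invalidates the number.
  return p == s.size() ? Outcome::Ok : Outcome::NumberError;
}
```

Under these assumptions, `"1e"` and `"13.06.54"` are classified as `NumberError` because they start with a digit, whereas `"nan"` and `"infinity"` are classified as `IncorrectType`.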
Thread Safety
-------------

@ -513,6 +513,9 @@ simdjson_really_inline error_code parse_number(const uint8_t *const, W &writer)
simdjson_unused simdjson_really_inline simdjson_result<uint64_t> parse_unsigned(const uint8_t * const src) noexcept { return 0; }
simdjson_unused simdjson_really_inline simdjson_result<int64_t> parse_integer(const uint8_t * const src) noexcept { return 0; }
simdjson_unused simdjson_really_inline simdjson_result<double> parse_double(const uint8_t * const src) noexcept { return 0; }
simdjson_unused simdjson_really_inline simdjson_result<uint64_t> parse_unsigned_in_string(const uint8_t * const src) noexcept { return 0; }
simdjson_unused simdjson_really_inline simdjson_result<int64_t> parse_integer_in_string(const uint8_t * const src) noexcept { return 0; }
simdjson_unused simdjson_really_inline simdjson_result<double> parse_double_in_string(const uint8_t * const src) noexcept { return 0; }
#else
@ -773,6 +776,54 @@ simdjson_unused simdjson_really_inline simdjson_result<uint64_t> parse_unsigned(
return i;
}
// Parse any number from 0 to 18,446,744,073,709,551,615
simdjson_unused simdjson_really_inline simdjson_result<uint64_t> parse_unsigned_in_string(const uint8_t * const src) noexcept {
const uint8_t *p = src + 1;
//
// Parse the integer part.
//
// PERF NOTE: we don't use is_made_of_eight_digits_fast because large integers like 123456789 are rare
const uint8_t *const start_digits = p;
uint64_t i = 0;
while (parse_digit(*p, i)) { p++; }
// If there were no digits, or if the integer starts with 0 and has more than one digit, it's an error.
// Optimization note: size_t is expected to be unsigned.
size_t digit_count = size_t(p - start_digits);
// The longest positive 64-bit number is 20 digits.
// We do it this way so we don't trigger this branch unless we must.
// Optimization note: the compiler can probably merge
// ((digit_count == 0) || (digit_count > 20))
// into a single branch since digit_count is unsigned.
if ((digit_count == 0) || (digit_count > 20)) { return INCORRECT_TYPE; }
// Here digit_count > 0.
if (('0' == *start_digits) && (digit_count > 1)) { return NUMBER_ERROR; }
// We can do the following...
// if (!jsoncharutils::is_structural_or_whitespace(*p)) {
// return (*p == '.' || *p == 'e' || *p == 'E') ? INCORRECT_TYPE : NUMBER_ERROR;
// }
// as a single table lookup:
if (*p != '"') { return NUMBER_ERROR; }
if (digit_count == 20) {
// Positive overflow check:
// - A 20 digit number starting with 2-9 is overflow, because 18,446,744,073,709,551,615 is the
// biggest uint64_t.
// - A 20 digit number starting with 1 is overflow if it is less than INT64_MAX.
// If we got here, it's a 20 digit number starting with the digit "1".
// - If a 20 digit number starting with 1 overflowed (i*10+digit), the result will be smaller
// than 1,553,255,926,290,448,384.
// - That is smaller than the smallest possible 20-digit number the user could write:
// 10,000,000,000,000,000,000.
// - Therefore, if the number is positive and lower than that, it's overflow.
// - The value we are looking at is less than or equal to 9,223,372,036,854,775,807 (INT64_MAX).
//
// Note: src[0] is the opening quote of the string, so the first digit is src[1].
if (src[1] != uint8_t('1') || i <= uint64_t(INT64_MAX)) { return INCORRECT_TYPE; }
}
return i;
}
// Parse any number from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
simdjson_unused simdjson_really_inline simdjson_result<int64_t> parse_integer(const uint8_t *src) noexcept {
//
@ -859,6 +910,48 @@ simdjson_unused simdjson_really_inline simdjson_result<int64_t> parse_integer(co
return negative ? (~i+1) : i;
}
// Parse any number from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
simdjson_unused simdjson_really_inline simdjson_result<int64_t> parse_integer_in_string(const uint8_t *src) noexcept {
//
// Check for minus sign
//
bool negative = (*(src + 1) == '-');
const uint8_t *p = src + negative + 1;
//
// Parse the integer part.
//
// PERF NOTE: we don't use is_made_of_eight_digits_fast because large integers like 123456789 are rare
const uint8_t *const start_digits = p;
uint64_t i = 0;
while (parse_digit(*p, i)) { p++; }
// If there were no digits, or if the integer starts with 0 and has more than one digit, it's an error.
// Optimization note: size_t is expected to be unsigned.
size_t digit_count = size_t(p - start_digits);
// We go from
// -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
// so we can never represent numbers that have more than 19 digits.
size_t longest_digit_count = 19;
// Optimization note: the compiler can probably merge
// ((digit_count == 0) || (digit_count > longest_digit_count))
// into a single branch since digit_count is unsigned.
if ((digit_count == 0) || (digit_count > longest_digit_count)) { return INCORRECT_TYPE; }
// Here digit_count > 0.
if (('0' == *start_digits) && (digit_count > 1)) { return NUMBER_ERROR; }
// We can do the following...
// if (!jsoncharutils::is_structural_or_whitespace(*p)) {
// return (*p == '.' || *p == 'e' || *p == 'E') ? INCORRECT_TYPE : NUMBER_ERROR;
// }
// as a single table lookup:
if(*p != '"') { return NUMBER_ERROR; }
// Negative numbers can go down to - INT64_MAX - 1 whereas positive numbers are limited to INT64_MAX.
// Performance note: This check is only needed when digit_count == longest_digit_count but it is
// so cheap that we might as well always make it.
if(i > uint64_t(INT64_MAX) + uint64_t(negative)) { return INCORRECT_TYPE; }
return negative ? (~i+1) : i;
}
simdjson_unused simdjson_really_inline simdjson_result<double> parse_double(const uint8_t * src) noexcept {
//
// Check for minus sign
@ -1020,6 +1113,83 @@ simdjson_unused simdjson_really_inline simdjson_result<double> parse_double(cons
return d;
}
simdjson_unused simdjson_really_inline simdjson_result<double> parse_double_in_string(const uint8_t * src) noexcept {
//
// Check for minus sign
//
bool negative = (*(src + 1) == '-');
src += negative + 1;
//
// Parse the integer part.
//
uint64_t i = 0;
const uint8_t *p = src;
p += parse_digit(*p, i);
bool leading_zero = (i == 0);
while (parse_digit(*p, i)) { p++; }
// no integer digits, or 0123 (zero must be solo)
if ( p == src ) { return INCORRECT_TYPE; }
if ( (leading_zero && p != src+1)) { return NUMBER_ERROR; }
//
// Parse the decimal part.
//
int64_t exponent = 0;
bool overflow;
if (simdjson_likely(*p == '.')) {
p++;
const uint8_t *start_decimal_digits = p;
if (!parse_digit(*p, i)) { return NUMBER_ERROR; } // no decimal digits
p++;
while (parse_digit(*p, i)) { p++; }
exponent = -(p - start_decimal_digits);
// Overflow check. More than 19 digits (minus the decimal) may be overflow.
overflow = p-src-1 > 19;
if (simdjson_unlikely(overflow && leading_zero)) {
// Skip leading 0.00000 and see if it still overflows
const uint8_t *start_digits = src + 2;
while (*start_digits == '0') { start_digits++; }
overflow = start_digits-src > 19;
}
} else {
overflow = p-src > 19;
}
//
// Parse the exponent
//
if (*p == 'e' || *p == 'E') {
p++;
bool exp_neg = *p == '-';
p += exp_neg || *p == '+';
uint64_t exp = 0;
const uint8_t *start_exp_digits = p;
while (parse_digit(*p, exp)) { p++; }
// no exp digits, or 20+ exp digits
if (p-start_exp_digits == 0 || p-start_exp_digits > 19) { return NUMBER_ERROR; }
exponent += exp_neg ? 0-exp : exp;
}
if (*p != '"') { return NUMBER_ERROR; }
overflow = overflow || exponent < simdjson::internal::smallest_power || exponent > simdjson::internal::largest_power;
//
// Assemble (or slow-parse) the float
//
double d;
if (simdjson_likely(!overflow)) {
if (compute_float_64(exponent, i, negative, d)) { return d; }
}
if (!parse_float_fallback(src-negative, &d)) {
return NUMBER_ERROR;
}
return d;
}
} //namespace {}
#endif // SIMDJSON_SKIPNUMBERPARSING

@ -64,12 +64,21 @@ simdjson_really_inline simdjson_result<object> document::get_object() & noexcept
simdjson_really_inline simdjson_result<uint64_t> document::get_uint64() noexcept {
return get_root_value_iterator().get_root_uint64();
}
simdjson_really_inline simdjson_result<uint64_t> document::get_uint64_in_string() noexcept {
return get_root_value_iterator().get_root_uint64_in_string();
}
simdjson_really_inline simdjson_result<int64_t> document::get_int64() noexcept {
return get_root_value_iterator().get_root_int64();
}
simdjson_really_inline simdjson_result<int64_t> document::get_int64_in_string() noexcept {
return get_root_value_iterator().get_root_int64_in_string();
}
simdjson_really_inline simdjson_result<double> document::get_double() noexcept {
return get_root_value_iterator().get_root_double();
}
simdjson_really_inline simdjson_result<double> document::get_double_in_string() noexcept {
return get_root_value_iterator().get_root_double_in_string();
}
simdjson_really_inline simdjson_result<std::string_view> document::get_string() noexcept {
return get_root_value_iterator().get_root_string();
}

@ -53,6 +53,13 @@ public:
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit unsigned integer.
*/
simdjson_really_inline simdjson_result<uint64_t> get_uint64() noexcept;
/**
* Cast this JSON value (inside string) to an unsigned integer.
*
* @returns An unsigned 64-bit integer.
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit unsigned integer.
*/
simdjson_really_inline simdjson_result<uint64_t> get_uint64_in_string() noexcept;
/**
* Cast this JSON value to a signed integer.
*
@ -60,6 +67,13 @@ public:
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit integer.
*/
simdjson_really_inline simdjson_result<int64_t> get_int64() noexcept;
/**
* Cast this JSON value (inside string) to a signed integer.
*
* @returns A signed 64-bit integer.
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit integer.
*/
simdjson_really_inline simdjson_result<int64_t> get_int64_in_string() noexcept;
/**
* Cast this JSON value to a double.
*
@ -67,6 +81,14 @@ public:
* @returns INCORRECT_TYPE If the JSON value is not a valid floating-point number.
*/
simdjson_really_inline simdjson_result<double> get_double() noexcept;
/**
* Cast this JSON value (inside string) to a double.
*
* @returns A double.
* @returns INCORRECT_TYPE If the JSON value is not a valid floating-point number.
*/
simdjson_really_inline simdjson_result<double> get_double_in_string() noexcept;
/**
* Cast this JSON value to a string.
*
@ -408,6 +430,7 @@ public:
simdjson_really_inline simdjson_result<uint64_t> get_uint64() noexcept;
simdjson_really_inline simdjson_result<int64_t> get_int64() noexcept;
simdjson_really_inline simdjson_result<double> get_double() noexcept;
simdjson_really_inline simdjson_result<double> get_double_from_string() noexcept;
simdjson_really_inline simdjson_result<std::string_view> get_string() noexcept;
simdjson_really_inline simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::raw_json_string> get_raw_json_string() noexcept;
simdjson_really_inline simdjson_result<bool> get_bool() noexcept;

@ -36,12 +36,21 @@ simdjson_really_inline simdjson_result<std::string_view> value::get_string() noe
simdjson_really_inline simdjson_result<double> value::get_double() noexcept {
return iter.get_double();
}
simdjson_really_inline simdjson_result<double> value::get_double_in_string() noexcept {
return iter.get_double_in_string();
}
simdjson_really_inline simdjson_result<uint64_t> value::get_uint64() noexcept {
return iter.get_uint64();
}
simdjson_really_inline simdjson_result<uint64_t> value::get_uint64_in_string() noexcept {
return iter.get_uint64_in_string();
}
simdjson_really_inline simdjson_result<int64_t> value::get_int64() noexcept {
return iter.get_int64();
}
simdjson_really_inline simdjson_result<int64_t> value::get_int64_in_string() noexcept {
return iter.get_int64_in_string();
}
simdjson_really_inline simdjson_result<bool> value::get_bool() noexcept {
return iter.get_bool();
}
@ -221,14 +230,26 @@ simdjson_really_inline simdjson_result<uint64_t> simdjson_result<SIMDJSON_IMPLEM
if (error()) { return error(); }
return first.get_uint64();
}
simdjson_really_inline simdjson_result<uint64_t> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::get_uint64_in_string() noexcept {
if (error()) { return error(); }
return first.get_uint64_in_string();
}
simdjson_really_inline simdjson_result<int64_t> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::get_int64() noexcept {
if (error()) { return error(); }
return first.get_int64();
}
simdjson_really_inline simdjson_result<int64_t> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::get_int64_in_string() noexcept {
if (error()) { return error(); }
return first.get_int64_in_string();
}
simdjson_really_inline simdjson_result<double> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::get_double() noexcept {
if (error()) { return error(); }
return first.get_double();
}
simdjson_really_inline simdjson_result<double> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::get_double_in_string() noexcept {
if (error()) { return error(); }
return first.get_double_in_string();
}
simdjson_really_inline simdjson_result<std::string_view> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::get_string() noexcept {
if (error()) { return error(); }
return first.get_string();

@ -69,11 +69,19 @@ public:
/**
* Cast this JSON value to an unsigned integer.
*
* @returns A signed 64-bit integer.
* @returns An unsigned 64-bit integer.
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit unsigned integer.
*/
simdjson_really_inline simdjson_result<uint64_t> get_uint64() noexcept;
/**
* Cast this JSON value (inside string) to an unsigned integer.
*
* @returns An unsigned 64-bit integer.
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit unsigned integer.
*/
simdjson_really_inline simdjson_result<uint64_t> get_uint64_in_string() noexcept;
/**
* Cast this JSON value to a signed integer.
*
@ -82,6 +90,14 @@ public:
*/
simdjson_really_inline simdjson_result<int64_t> get_int64() noexcept;
/**
* Cast this JSON value (inside string) to a signed integer.
*
* @returns A signed 64-bit integer.
* @returns INCORRECT_TYPE If the JSON value is not a 64-bit integer.
*/
simdjson_really_inline simdjson_result<int64_t> get_int64_in_string() noexcept;
/**
* Cast this JSON value to a double.
*
@ -90,6 +106,14 @@ public:
*/
simdjson_really_inline simdjson_result<double> get_double() noexcept;
/**
* Cast this JSON value (inside string) to a double
*
* @returns A double.
* @returns INCORRECT_TYPE If the JSON value is not a valid floating-point number.
*/
simdjson_really_inline simdjson_result<double> get_double_in_string() noexcept;
/**
* Cast this JSON value to a string.
*
@ -416,8 +440,11 @@ public:
simdjson_really_inline simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::object> get_object() noexcept;
simdjson_really_inline simdjson_result<uint64_t> get_uint64() noexcept;
simdjson_really_inline simdjson_result<uint64_t> get_uint64_in_string() noexcept;
simdjson_really_inline simdjson_result<int64_t> get_int64() noexcept;
simdjson_really_inline simdjson_result<int64_t> get_int64_in_string() noexcept;
simdjson_really_inline simdjson_result<double> get_double() noexcept;
simdjson_really_inline simdjson_result<double> get_double_in_string() noexcept;
simdjson_really_inline simdjson_result<std::string_view> get_string() noexcept;
simdjson_really_inline simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::raw_json_string> get_raw_json_string() noexcept;
simdjson_really_inline simdjson_result<bool> get_bool() noexcept;

@ -455,16 +455,31 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iter
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("uint64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iterator::get_uint64_in_string() noexcept {
auto result = numberparsing::parse_unsigned_in_string(peek_non_root_scalar("uint64"));
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("uint64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_int64() noexcept {
auto result = numberparsing::parse_integer(peek_non_root_scalar("int64"));
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("int64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_int64_in_string() noexcept {
auto result = numberparsing::parse_integer_in_string(peek_non_root_scalar("int64"));
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("int64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_double() noexcept {
auto result = numberparsing::parse_double(peek_non_root_scalar("double"));
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("double"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_double_in_string() noexcept {
auto result = numberparsing::parse_double_in_string(peek_non_root_scalar("double"));
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("double"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<bool> value_iterator::get_bool() noexcept {
auto result = parse_bool(peek_non_root_scalar("bool"));
if(result.error() != INCORRECT_TYPE) { advance_non_root_scalar("bool"); }
@ -496,6 +511,18 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iter
if(result.error() != INCORRECT_TYPE) { advance_root_scalar("uint64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> value_iterator::get_root_uint64_in_string() noexcept {
auto max_len = peek_start_length();
auto json = peek_root_scalar("uint64");
uint8_t tmpbuf[20+1]; // <20 digits> is the longest possible unsigned integer
if (!_json_iter->copy_to_buffer(json, max_len, tmpbuf)) {
logger::log_error(*_json_iter, start_position(), depth(), "Root number more than 20 characters");
return NUMBER_ERROR;
}
auto result = numberparsing::parse_unsigned_in_string(tmpbuf);
if(result.error() != INCORRECT_TYPE) { advance_root_scalar("uint64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_root_int64() noexcept {
auto max_len = peek_start_length();
auto json = peek_root_scalar("int64");
@ -509,6 +536,19 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_itera
if(result.error() != INCORRECT_TYPE) { advance_root_scalar("int64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> value_iterator::get_root_int64_in_string() noexcept {
auto max_len = peek_start_length();
auto json = peek_root_scalar("int64");
uint8_t tmpbuf[20+1]; // -<19 digits> is the longest possible integer
if (!_json_iter->copy_to_buffer(json, max_len, tmpbuf)) {
logger::log_error(*_json_iter, start_position(), depth(), "Root number more than 20 characters");
return NUMBER_ERROR;
}
auto result = numberparsing::parse_integer_in_string(tmpbuf);
if(result.error() != INCORRECT_TYPE) { advance_root_scalar("int64"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_root_double() noexcept {
auto max_len = peek_start_length();
auto json = peek_root_scalar("double");
@ -524,6 +564,21 @@ simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterat
if(result.error() != INCORRECT_TYPE) { advance_root_scalar("double"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<double> value_iterator::get_root_double_in_string() noexcept {
auto max_len = peek_start_length();
auto json = peek_root_scalar("double");
// Per https://www.exploringbinary.com/maximum-number-of-decimal-digits-in-binary-floating-point-numbers/,
// 1074 is the maximum number of significant fractional digits. Add 8 more digits for the biggest
// number: -0.<fraction>e-308.
uint8_t tmpbuf[1074+8+1];
if (!_json_iter->copy_to_buffer(json, max_len, tmpbuf)) {
logger::log_error(*_json_iter, start_position(), depth(), "Root number more than 1082 characters");
return NUMBER_ERROR;
}
auto result = numberparsing::parse_double_in_string(tmpbuf);
if(result.error() != INCORRECT_TYPE) { advance_root_scalar("double"); }
return result;
}
simdjson_warn_unused simdjson_really_inline simdjson_result<bool> value_iterator::get_root_bool() noexcept {
auto max_len = peek_start_length();
auto json = peek_root_scalar("bool");

@ -283,16 +283,22 @@ public:
simdjson_warn_unused simdjson_really_inline simdjson_result<std::string_view> get_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<raw_json_string> get_raw_json_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> get_uint64() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> get_uint64_in_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> get_int64() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> get_int64_in_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<double> get_double() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<double> get_double_in_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<bool> get_bool() noexcept;
simdjson_really_inline bool is_null() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<std::string_view> get_root_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<raw_json_string> get_root_raw_json_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> get_root_uint64() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<uint64_t> get_root_uint64_in_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> get_root_int64() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<int64_t> get_root_int64_in_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<double> get_root_double() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<double> get_root_double_in_string() noexcept;
simdjson_warn_unused simdjson_really_inline simdjson_result<bool> get_root_bool() noexcept;
simdjson_really_inline bool is_root_null() noexcept;

@ -13,6 +13,7 @@ add_cpp_test(ondemand_json_pointer_tests LABELS ondemand acceptance per_impl
add_cpp_test(ondemand_key_string_tests LABELS ondemand acceptance per_implementation)
add_cpp_test(ondemand_misc_tests LABELS ondemand acceptance per_implementation)
add_cpp_test(ondemand_number_tests LABELS ondemand acceptance per_implementation)
add_cpp_test(ondemand_number_in_string_tests LABELS ondemand acceptance per_implementation)
add_cpp_test(ondemand_object_tests LABELS ondemand acceptance per_implementation)
add_cpp_test(ondemand_object_error_tests LABELS ondemand acceptance per_implementation)
add_cpp_test(ondemand_ordering_tests LABELS ondemand acceptance per_implementation)

@ -0,0 +1,273 @@
#include "simdjson.h"
#include "test_ondemand.h"
#include <string>
using namespace simdjson;
namespace number_in_string_tests {
const padded_string CRYPTO_JSON = R"(
{
"ticker":{
"base":"BTC",
"target":"USD",
"price":"443.7807865468",
"volume":"31720.1493969300",
"change":"Infinity",
"markets":[
{
"market":"bitfinex",
"price":"447.5000000000",
"volume":"10559.5293639000"
},
{
"market":"bitstamp",
"price":"448.5400000000",
"volume":"11628.2880079300"
},
{
"market":"btce",
"price":"432.8900000000",
"volume":"8561.0563600000"
}
]
},
"timestamp":1399490941,
"timestampstr":"1399490941"
}
)"_padded;
bool array_double() {
TEST_START();
auto json = R"(["1.2","2.3","-42.3","2.43442e3", "-1.234e3"])"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
size_t counter{0};
std::vector<double> expected = {1.2, 2.3, -42.3, 2434.42, -1234};
double d;
for (auto value : doc) {
ASSERT_SUCCESS(value.get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
}
TEST_SUCCEED();
}
bool array_int() {
TEST_START();
auto json = R"(["1", "2", "-3", "1000", "-7844"])"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
size_t counter{0};
std::vector<int> expected = {1, 2, -3, 1000, -7844};
int64_t i;
for (auto value : doc) {
ASSERT_SUCCESS(value.get_int64_in_string().get(i));
ASSERT_EQUAL(i,expected[counter++]);
}
TEST_SUCCEED();
}
bool array_unsigned() {
TEST_START();
auto json = R"(["1", "2", "24", "9000", "156934"])"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
size_t counter{0};
std::vector<int> expected = {1, 2, 24, 9000, 156934};
uint64_t u;
for (auto value : doc) {
ASSERT_SUCCESS(value.get_uint64_in_string().get(u));
ASSERT_EQUAL(u,expected[counter++]);
}
TEST_SUCCEED();
}
bool object() {
TEST_START();
auto json = R"({"a":"1.2", "b":"-2.342e2", "c":"22", "d":"-112358", "e":"1080", "f":"123456789"})"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
size_t counter{0};
std::vector<double> expected = {1.2, -234.2, 22, -112358, 1080, 123456789};
double d;
int64_t i;
uint64_t u;
// Doubles
ASSERT_SUCCESS(doc.find_field("a").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
ASSERT_SUCCESS(doc.find_field("b").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
// Integers
ASSERT_SUCCESS(doc.find_field("c").get_int64_in_string().get(i));
ASSERT_EQUAL(i,expected[counter++]);
ASSERT_SUCCESS(doc.find_field("d").get_int64_in_string().get(i));
ASSERT_EQUAL(i,expected[counter++]);
// Unsigned integers
ASSERT_SUCCESS(doc.find_field("e").get_uint64_in_string().get(u));
ASSERT_EQUAL(u,expected[counter++]);
ASSERT_SUCCESS(doc.find_field("f").get_uint64_in_string().get(u));
ASSERT_EQUAL(u,expected[counter++]);
TEST_SUCCEED();
}
bool docs() {
TEST_START();
auto double_doc = R"( "-1.23e1" )"_padded;
auto int_doc = R"( "-243" )"_padded;
auto uint_doc = R"( "212213" )"_padded;
ondemand::parser parser;
ondemand::document doc;
double d;
int64_t i;
uint64_t u;
// Double
ASSERT_SUCCESS(parser.iterate(double_doc).get(doc));
ASSERT_SUCCESS(doc.get_double_in_string().get(d));
ASSERT_EQUAL(d,-12.3);
// Integer
ASSERT_SUCCESS(parser.iterate(int_doc).get(doc));
ASSERT_SUCCESS(doc.get_int64_in_string().get(i));
ASSERT_EQUAL(i,-243);
// Unsigned integer
ASSERT_SUCCESS(parser.iterate(uint_doc).get(doc));
ASSERT_SUCCESS(doc.get_uint64_in_string().get(u));
ASSERT_EQUAL(u,212213);
TEST_SUCCEED();
}
bool number_parsing_error() {
TEST_START();
auto json = R"( ["13.06.54", "1.0e", "2e3r4,,."])"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
size_t counter{0};
std::string expected[3] = {"13.06.54", "1.0e", "2e3r4,,."};
for (auto value : doc) {
double d;
std::string_view view;
ASSERT_ERROR(value.get_double_in_string().get(d),NUMBER_ERROR);
ASSERT_SUCCESS(value.get_string().get(view));
ASSERT_EQUAL(view,expected[counter++]);
}
ASSERT_EQUAL(counter,3);
TEST_SUCCEED();
}
bool incorrect_type_error() {
TEST_START();
auto json = R"( ["e", "i", "pi", "one", "zero"])"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
size_t counter{0};
std::string expected[5] = {"e", "i", "pi", "one", "zero"};
for (auto value : doc) {
double d;
std::string_view view;
ASSERT_ERROR(value.get_double_in_string().get(d),INCORRECT_TYPE);
ASSERT_SUCCESS(value.get_string().get(view));
ASSERT_EQUAL(view,expected[counter++]);
}
ASSERT_EQUAL(counter,5);
TEST_SUCCEED();
}
bool json_pointer_test() {
TEST_START();
auto json = R"( ["12.34", { "a":["3","5.6"], "b":{"c":"1.23e1"} }, ["1", "3.5"] ])"_padded;
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(json).get(doc));
std::vector<double> expected = {12.34, 5.6, 12.3, 1, 3.5};
size_t counter{0};
double d;
ASSERT_SUCCESS(doc.at_pointer("/0").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
ASSERT_SUCCESS(doc.at_pointer("/1/a/1").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
ASSERT_SUCCESS(doc.at_pointer("/1/b/c").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
ASSERT_SUCCESS(doc.at_pointer("/2/0").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
ASSERT_SUCCESS(doc.at_pointer("/2/1").get_double_in_string().get(d));
ASSERT_EQUAL(d,expected[counter++]);
ASSERT_EQUAL(counter,5);
TEST_SUCCEED();
}
bool crypto_timestamp() {
TEST_START();
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(CRYPTO_JSON).get(doc));
uint64_t u;
ASSERT_SUCCESS(doc.at_pointer("/timestampstr").get_uint64_in_string().get(u));
ASSERT_EQUAL(u,1399490941);
TEST_SUCCEED();
}
bool crypto_market() {
TEST_START();
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(CRYPTO_JSON).get(doc));
ondemand::array markets;
ASSERT_SUCCESS(doc.find_field("ticker").find_field("markets").get_array().get(markets));
std::string_view expected_views[3] = {"bitfinex", "bitstamp", "btce"};
double expected_prices[3] = {447.5, 448.54, 432.89};
double expected_volumes[3] = {10559.5293639, 11628.28800793, 8561.05636};
size_t counter{0};
for (auto value : markets) {
std::string_view view;
double price;
double volume;
ASSERT_SUCCESS(value.find_field("market").get_string().get(view));
ASSERT_EQUAL(view,expected_views[counter]);
ASSERT_SUCCESS(value.find_field("price").get_double_in_string().get(price));
ASSERT_EQUAL(price,expected_prices[counter]);
ASSERT_SUCCESS(value.find_field("volume").get_double_in_string().get(volume));
ASSERT_EQUAL(volume,expected_volumes[counter]);
counter++;
}
ASSERT_EQUAL(counter,3);
TEST_SUCCEED();
}
bool crypto_infinity() {
TEST_START();
ondemand::parser parser;
ondemand::document doc;
ASSERT_SUCCESS(parser.iterate(CRYPTO_JSON).get(doc));
ondemand::value value;
double d;
std::string_view view;
ASSERT_SUCCESS(doc.find_field("ticker").find_field("change").get(value));
ASSERT_ERROR(value.get_double_in_string().get(d), INCORRECT_TYPE);
ASSERT_SUCCESS(value.get_string().get(view));
ASSERT_EQUAL(view,"Infinity");
TEST_SUCCEED();
}
bool run() {
return array_double() &&
array_int() &&
array_unsigned() &&
object() &&
docs() &&
number_parsing_error() &&
incorrect_type_error() &&
json_pointer_test() &&
crypto_timestamp() &&
crypto_market() &&
crypto_infinity() &&
true;
}
} // number_in_string_tests
int main(int argc, char *argv[]) {
return test_main(argc, argv, number_in_string_tests::run);
}