From ef45cd334247180ca97a17821c54bcb2448ce459 Mon Sep 17 00:00:00 2001
From: Daniel Lemire
Date: Thu, 6 Aug 2020 18:24:36 -0400
Subject: [PATCH] Let us be explicit about standard compliance (#1099)

* Let us be explicit about standard compliance

* More explicit.
---
 doc/basics.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/basics.md b/doc/basics.md
index 083df1c9..5d278cf5 100644
--- a/doc/basics.md
+++ b/doc/basics.md
@@ -19,6 +19,7 @@ An overview of what you need to know to use simdjson, with examples.
 * [Tree Walking and JSON Element Types](#tree-walking-and-json-element-types)
 * [Newline-Delimited JSON (ndjson) and JSON lines](#newline-delimited-json-ndjson-and-json-lines)
 * [Thread Safety](#thread-safety)
+* [Standard Compliance](#standard-compliance)
 
 
 Requirements
@@ -568,6 +569,25 @@ The parsed results (`dom::document`, `dom::element`, `array`, `object`) depend o
 
 The CPU detection, which runs the first time parsing is attempted and switches to the fastest parser for your CPU, is transparent and thread-safe.
 
+
+Standard Compliance
+--------------------
+
+The simdjson library is fully compliant with the [RFC 8259](https://www.tbray.org/ongoing/When/201x/2017/12/14/rfc8259.html) JSON specification.
+
+- The only insignificant whitespace characters allowed are the space, the horizontal tab, the line feed and the carriage return. In particular, a JSON document may not contain an unescaped null character.
+- A single string or a single number is considered to be a valid JSON document.
+- We fully validate the numbers according to the JSON specification. For example, the string `01` is not a valid JSON document since the specification states that *leading zeros are not allowed*.
+- The specification allows implementations to set limits on the range and precision of numbers accepted. We support 64-bit floating-point numbers as well as integer values.
+  - We parse integers and floating-point numbers as separate types, which allows us to support all signed (two's complement) 64-bit integers, like a Java `long` or a C/C++ `long long`, and all 64-bit unsigned integers. When an integer cannot be represented exactly as a signed or unsigned 64-bit value, we reject the JSON document.
+  - We support the full range of 64-bit floating-point numbers (binary64). The values range from `std::numeric_limits<double>::lowest()` to `std::numeric_limits<double>::max()`, so from -1.7976e308 all the way to 1.7976e308. Extreme values (less than or equal to -1e308, greater than or equal to 1e308) are rejected: we refuse to parse the input document. Numbers are parsed with perfect accuracy (ULP 0): the nearest floating-point value is chosen, rounding to even when needed. If you serialized your floating-point numbers with 17 significant digits in a standard-compliant manner, the simdjson library is guaranteed to recover the same numbers exactly.
+- The specification states that JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8. The simdjson library does full UTF-8 validation as part of the parsing. The specification states that implementations MUST NOT add a byte order mark: the simdjson library rejects documents starting with a byte order mark.
+- The simdjson library validates string content for unescaped characters. Unescaped line breaks and tabs in strings are not allowed.
+- The simdjson library accepts objects with repeated keys: all of the name/value pairs, including duplicates, are reported. We do not enforce key uniqueness.
+- The specification states that an implementation may set limits on the size of texts that it accepts. The simdjson library limits single JSON documents to 4 GiB. It will refuse to parse a JSON document larger than 4294967295 bytes. (This limitation does not apply to streams of JSON documents, only to single JSON documents.)
+- The specification states that an implementation may set limits on the maximum depth of nesting. By default, the simdjson library will refuse to parse documents with a depth exceeding 1024.
+
+
 Backwards Compatibility
 -----------------------