Improving the documentation.
parent 771e9cd68a
commit 3c0f5a3fe4
@ -325,9 +325,8 @@ The parser builds a useful immutable (read-only) DOM (document-object model) whi
To simplify the engineering, we make some assumptions.

- We support UTF-8 (and thus ASCII), nothing else (no Latin, no UTF-16). We do not believe this is a genuine limitation, because we do not think there is any serious application that needs to process JSON data without an ASCII or UTF-8 encoding.
- We support UTF-8 (and thus ASCII), nothing else (no Latin, no UTF-16). We do not believe this is a genuine limitation, because we do not think there is any serious application that needs to process JSON data without an ASCII or UTF-8 encoding. If the UTF-8 input starts with a byte-order mark (BOM), the BOM must be omitted: the user is responsible for detecting and skipping it. UTF-8 BOMs are discouraged.
- All strings in the JSON document may have up to 4294967295 bytes in UTF-8 (4GB). To enforce this constraint, we refuse to parse a document that contains more than 4294967295 bytes (4GB). This should accommodate most JSON documents.
- In cases of failure, we report a failure without any indication as to the nature of the problem. (This can be easily improved without affecting performance.)
- As allowed by the specification, we allow repeated keys within an object (other parsers like sajson do the same).
- Performance is optimized for JSON documents spanning from tens of kilobytes up to many megabytes: the performance issues with having to parse many tiny JSON documents or one truly enormous JSON document are different.
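
Since skipping a BOM is left to the caller, that step might be sketched as follows. The helper name `skip_utf8_bom` is hypothetical and not part of the library; the sketch relies only on the fact that the UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of the library): returns the pointer and
// length that remain after dropping a leading UTF-8 BOM, if one is present.
inline std::pair<const uint8_t *, size_t> skip_utf8_bom(const uint8_t *buf, size_t len) {
  // The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF.
  if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
    return {buf + 3, len - 3};
  }
  // No BOM: the input is returned unchanged.
  return {buf, len};
}
```

The returned pointer and length can then be passed to the parser in place of the original buffer.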
@ -339,9 +338,11 @@ _We do not aim to provide a general-purpose JSON library._ A library like RapidJ
- We parse integers and floating-point numbers as separate types, which allows us to support large 64-bit integers in [-9223372036854775808,9223372036854775808), like a Java `long` or a C/C++ `long long`. Among the parsers that differentiate between integers and floating-point numbers, not all support 64-bit integers. (For example, sajson rejects JSON files with integers larger than or equal to 2147483648. RapidJSON will parse a file containing an overly long integer like 18446744073709551616 as a floating-point number.) When we cannot represent an integer exactly as a signed 64-bit value, we reject the JSON document.
- We support the full range of 64-bit floating-point numbers (binary64). The values range from `std::numeric_limits<double>::lowest()` to `std::numeric_limits<double>::max()`, so from -1.7976e308 all the way to 1.7976e308. Extreme values (less than or equal to -1e308, greater than or equal to 1e308) are rejected: we refuse to parse the input document.
- We aim for accurate float parsing, with an error of at most one [unit of least precision (ULP)](https://en.wikipedia.org/wiki/Unit_in_the_last_place).
- We do full UTF-8 validation as part of the parsing. (Parsers like fastjson, gason and dropbox json11 do not do UTF-8 validation.)
- We do full UTF-8 validation as part of the parsing. (Parsers like fastjson, gason and dropbox json11 do not do UTF-8 validation. The sajson parser does incomplete UTF-8 validation, accepting invalid byte sequences like 0xb1 0x87.)
- We fully validate the numbers. (Parsers like gason and ultrajson will accept `[0e+]` as valid JSON.)
- We validate string content for unescaped characters. (Parsers like fastjson and ultrajson accept unescaped line breaks and tabs in strings.)
- We fully validate the white-space characters outside of the strings. Parsers like RapidJSON will accept JSON documents with null characters outside of strings.
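
To illustrate the rule that an integer is accepted only when it fits exactly in a signed 64-bit value, here is a minimal scalar sketch. The function name `parse_int64` is hypothetical and this is not the parser's actual implementation; it assumes the input is an optional minus sign followed by decimal digits and ignores other JSON rules, such as the ban on leading zeros.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical sketch (not the library's implementation): parse an optional
// '-' followed by decimal digits into out. Returns false if the text is
// malformed or the value does not fit in a signed 64-bit integer.
inline bool parse_int64(const char *s, size_t len, int64_t &out) {
  size_t i = 0;
  bool negative = false;
  if (i < len && s[i] == '-') {
    negative = true;
    ++i;
  }
  if (i == len) return false; // no digits at all
  // The largest allowed magnitude is 2^63 for negative numbers and
  // 2^63 - 1 otherwise.
  const uint64_t limit = negative
      ? (uint64_t(1) << 63)
      : uint64_t(std::numeric_limits<int64_t>::max());
  uint64_t magnitude = 0;
  for (; i < len; ++i) {
    if (s[i] < '0' || s[i] > '9') return false;
    const uint64_t digit = uint64_t(s[i] - '0');
    // Check before multiplying so the accumulator never exceeds the limit.
    if (magnitude > (limit - digit) / 10) return false;
    magnitude = magnitude * 10 + digit;
  }
  if (negative) {
    // A magnitude of 2^63 cannot be negated as an int64_t, so map it directly.
    out = (magnitude == limit) ? std::numeric_limits<int64_t>::min()
                               : -int64_t(magnitude);
  } else {
    out = int64_t(magnitude);
  }
  return true;
}
```

With this sketch, 9223372036854775807 and -9223372036854775808 are accepted, while 9223372036854775808 and 18446744073709551616 are rejected rather than silently converted to a floating-point number.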

## Architecture

@ -74,6 +74,10 @@ int json_parse_implementation(const uint8_t *buf, size_t len, ParsedJson &pj, bo
}

// Parse a document found in buf.
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// You need to preallocate ParsedJson with a capacity of len (e.g., pj.allocateCapacity(len)).
//
// The function returns simdjson::SUCCESS (an integer = 0) in case of a success or an error code from
@ -94,6 +98,10 @@ inline int json_parse(const uint8_t *buf, size_t len, ParsedJson &pj, bool reall
}

// Parse a document found in buf.
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// You need to preallocate ParsedJson with a capacity of len (e.g., pj.allocateCapacity(len)).
//
// The function returns simdjson::SUCCESS (an integer = 0) in case of a success or an error code from
@ -131,6 +139,10 @@ inline int json_parse(const std::string &s, ParsedJson &pj) {
}

// Parse a document found in string s.
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// You need to preallocate ParsedJson with a capacity of len (e.g., pj.allocateCapacity(len)).
//
// The function returns simdjson::SUCCESS (an integer = 0) in case of a success or an error code from
@ -150,8 +162,11 @@ inline int json_parse(const padded_string &s, ParsedJson &pj) {
// If reallocifneeded is true (default) then a temporary buffer is created when needed during processing
// (a copy of the input string is made).
//
// the input buf should be readable up to buf + len + SIMDJSON_PADDING if reallocifneeded is false,
// The input buf should be readable up to buf + len + SIMDJSON_PADDING if reallocifneeded is false;
// all bytes at and after buf + len are ignored (can be garbage).
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// This is a convenience function which calls json_parse.
WARN_UNUSED
@ -162,9 +177,14 @@ WARN_UNUSED
// by calling pj.isValid(). This does the memory allocation needed for ParsedJson.
// If reallocifneeded is true (default) then a temporary buffer is created when needed during processing
// (a copy of the input string is made).
//
// The input buf should be readable up to buf + len + SIMDJSON_PADDING if reallocifneeded is false;
// all bytes at and after buf + len are ignored (can be garbage).
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// This is a convenience function which calls json_parse.
inline ParsedJson build_parsed_json(const char * buf, size_t len, bool reallocifneeded = true) {
  return build_parsed_json(reinterpret_cast<const uint8_t *>(buf), len, reallocifneeded);
@ -182,6 +202,9 @@ ParsedJson build_parsed_json(const char *buf) = delete;
//
// A temporary buffer is created when needed during processing
// (a copy of the input string is made).
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// This is a convenience function which calls json_parse.
WARN_UNUSED
@ -194,6 +217,9 @@ inline ParsedJson build_parsed_json(const std::string &s) {
// You need to preallocate ParsedJson with a capacity of len (e.g., pj.allocateCapacity(len)).
// Return SUCCESS (an integer = 0) in case of a success. You can also check validity
// by calling pj.isValid(). The same ParsedJson can be reused for other documents.
//
// The content should be a valid JSON document encoded as UTF-8. If there is a UTF-8 BOM, the caller
// is responsible for omitting it; UTF-8 BOMs are discouraged.
//
// This is a convenience function which calls json_parse.
WARN_UNUSED