diff --git a/HACKING.md b/HACKING.md index 1fbc3b83..df2fe9e0 100644 --- a/HACKING.md +++ b/HACKING.md @@ -324,10 +324,9 @@ merely parse the document. This may change in the future. - We parse integers and floating-point numbers as separate types which allows us to support large signed 64-bit integers in [-9223372036854775808,9223372036854775808), like a Java `long` or a C/C++ `long long` and large unsigned integers up to the value 18446744073709551615. Among the parsers that differentiate between integers and floating-point numbers, not all support 64-bit integers. (For example, sajson rejects JSON files with integers larger than or equal to 2147483648. RapidJSON will parse a file containing an overly long integer like 18446744073709551616 as a floating-point number.) When we cannot represent exactly an integer as a signed or unsigned 64-bit value, we reject the JSON document. - We support the full range of 64-bit floating-point numbers (binary64). The values range from ` std::numeric_limits::lowest()` to `std::numeric_limits::max()`, so from -1.7976e308 all the way to 1.7975e308. Extreme values (less or equal to -1e308, greater or equal to 1e308) are rejected: we refuse to parse the input document. - We test for accurate float parsing with a perfect accuracy (ULP 0). Many parsers offer only approximate floating parsing. For example, RapidJSON also offers the option of accurate float parsing (`kParseFullPrecisionFlag`) but it comes at a significant performance penalty compared to the default settings. By default, RapidJSON tolerates an error of 3 ULP. -- We do full UTF-8 validation as part of the parsing. (Parsers like fastjson, gason and dropbox json11 do not do UTF-8 validation. The sajson parser does incomplete UTF-8 validation, accepting code point -sequences like 0xb1 0x87.) -- We fully validate the numbers. (Parsers like gason and ultranjson will accept `[0e+]` as valid JSON.) -- We validate string content for unescaped characters. (Parsers like fastjson and ultrajson accept unescaped line breaks and tabs in strings.) +- The simdjson library does full UTF-8 validation as part of the parsing. Parsers like fastjson, gason and dropbox json11 do not do UTF-8 validation. The sajson parser does incomplete UTF-8 validation, accepting code point sequences like 0xb1 0x87. +- We fully validate the numbers. Parsers like gason and ultranjson will accept `[0e+]` as valid JSON. +- We validate string content for unescaped characters. Parsers like fastjson and ultrajson accept unescaped line breaks and tabs in strings. - We fully validate the white-space characters outside of the strings. Parsers like RapidJSON will accept JSON documents with null characters outside of strings. ### Architecture