This adds new tests regarding ordering. (#1233)

* This adds new tests regarding ordering.

* Updating the documentation with more examples.

* Adding compilation tests.

* Pruning code for exceptions.

* Guarding exceptionless.
Daniel Lemire 2020-10-15 16:41:14 -04:00 committed by GitHub
parent 001be23258
commit 3cd98df30d
2 changed files with 112 additions and 3 deletions


@ -466,7 +466,7 @@ in production systems:
if it was `nullptr` but did not care what the actual value was--it will iterate. The destructor automates
the iteration.
### Applicability and Limitations of the On Demand Approach
### Limitations of the On Demand Approach
We expect that the On Demand approach has many of the performance benefits of the schema-based approach, while providing a flexibility that is similar to that of the DOM-based approach. However, there are some limitations.
@ -480,12 +480,23 @@ Cons of the On Demand approach:
There are currently additional technical limitations which we expect to resolve in future releases of the simdjson library:
* We intend to help users who wish to use the On Demand API but require support for order-insensitive semantics, but in our current implementation support for out-of-order keys (if needed) must be provided by the programmer.
* The simdjson library offers runtime dispatching which allows you to compile one binary and have it run at full speed on different processors, taking advantage of the specific features of the processor. The On Demand API does not have runtime dispatch support at this time. To benefit from the On Demand API, you must compile your code for a specific processor. E.g., if your processor supports AVX2 instructions, you should compile your binary executable with AVX2 instruction support (by using your compiler's commands). If you are sufficiently technically proficient, you can implement runtime dispatching within your application, by compiling your On Demand code for different processors.
* There is an initial phase which scans the entire document quickly, irrespective of the size of the document. We plan to break this phase into distinct steps for large files in a future release as we have done with other components of our API (e.g., `parse_many`).
* The On Demand API does not support JSON Pointer. This capability is currently limited to our core API.
* We intend to help users who wish to use the On Demand API but require support for order-insensitive semantics, but in our current implementation support for out-of-order keys (if needed) must be provided by the programmer. Currently, one might proceed in the following manner as a fallback measure if keys can appear in any order:
```C++
for (ondemand::object my_object : doc["mykey"]) {
for (auto field : my_object) {
if (field.key() == "key_value1") { process1(field.value()); }
else if (field.key() == "key_value2") { process2(field.value()); }
else if (field.key() == "key_value3") { process3(field.value()); }
}
}
```
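
The fallback loop above visits each field once and dispatches on its key, so the result does not depend on key order. A minimal sketch of that pattern follows. To keep it compilable without simdjson, it uses a plain `std::vector` of key/value pairs as a hypothetical stand-in for `ondemand::object`; the names `Fields`, `Point`, and `read_point` are illustrative and are not part of the library.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fields of one JSON object, in document order.
// With simdjson you would iterate over an ondemand::object instead.
using Fields = std::vector<std::pair<std::string, double>>;

struct Point {
  double x{0};
  double y{0};
  double z{0};
};

// Visit every field exactly once and dispatch on its key, so the result
// is the same whatever order the keys appear in.
Point read_point(const Fields& fields) {
  Point p;
  for (const auto& [key, value] : fields) {
    if (key == "x") { p.x += value; }
    else if (key == "y") { p.y += value; }
    else if (key == "z") { p.z += value; }
    // Keys that are not recognized are skipped.
  }
  return p;
}
```

Calling `read_point` with the keys in the order `z`, `x`, `y` yields the same `Point` as calling it with `x`, `y`, `z`, which is the property the `robust_order` test in this commit checks against the real API.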
Hence, at this time we recommend the On Demand API in the following cases:
### Applicability of the On Demand Approach
At this time we recommend the On Demand API in the following cases:
1. The 64-bit hardware (CPU) used to run the software is known at compile time. If you need runtime dispatching because you cannot be certain of the hardware used to run your software, you will be better served with the core simdjson API. (This only applies to x64 (AMD/Intel). On 64-bit ARM hardware, runtime dispatching is unnecessary.)
2. The used parts of JSON files do not need to be validated and the layout of the nodes is in a known order. If you are receiving JSON from other systems, you might be better served with core simdjson API as it fully validates the JSON inputs and allows you to navigate through the document at will.
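
The documentation above notes that runtime dispatching for On Demand code must currently be implemented by the application. One possible shape for this is sketched below, assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The kernels `count_quotes_generic` and `count_quotes_avx2` are hypothetical placeholders with identical bodies; in a real build each would live in its own translation unit compiled with different instruction-set flags and would contain the On Demand code.

```cpp
#include <cstddef>

// Signature shared by every processor-specific kernel (hypothetical).
using count_fn = std::size_t (*)(const char*, std::size_t);

// Portable kernel: counts double-quote characters in a buffer.
std::size_t count_quotes_generic(const char* data, std::size_t len) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; i++) { n += (data[i] == '"'); }
  return n;
}

// Placeholder for a kernel that would be compiled with AVX2 enabled
// (for example in a separate file built with -mavx2). Same body here.
std::size_t count_quotes_avx2(const char* data, std::size_t len) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; i++) { n += (data[i] == '"'); }
  return n;
}

// Ask the processor at run time whether AVX2 is available.
// On other compilers or architectures this sketch reports false.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

// Pick a kernel once; callers then invoke it through the pointer.
count_fn select_kernel() {
  return cpu_has_avx2() ? count_quotes_avx2 : count_quotes_generic;
}
```

An application would call `select_kernel()` once at startup and reuse the returned pointer. This is a sketch of the general technique only and does not reflect how the simdjson core API implements its own dispatching.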


@ -21,6 +21,28 @@
using namespace simdjson;
using namespace simdjson::builtin;
#if SIMDJSON_EXCEPTIONS
// bogus functions for compilation tests
void process1(int ) {}
void process2(int ) {}
void process3(int ) {}
// Do not run this, it is only meant to compile
void compilation_test_1() {
const padded_string bogus = ""_padded;
ondemand::parser parser;
auto doc = parser.iterate(bogus);
for (ondemand::object my_object : doc["mykey"]) {
for (auto field : my_object) {
if (field.key() == "key_value1") { process1(field.value()); }
else if (field.key() == "key_value2") { process2(field.value()); }
else if (field.key() == "key_value3") { process3(field.value()); }
}
}
}
#endif
#define ONDEMAND_SUBTEST(NAME, JSON, TEST) \
{ \
std::cout << "- Subtest " << (NAME) << " - JSON: " << (JSON) << " ..." << std::endl; \
@ -637,6 +659,81 @@ namespace dom_api_tests {
}
}
namespace ordering_tests {
using namespace std;
using namespace simdjson;
using namespace simdjson::dom;
#if SIMDJSON_EXCEPTIONS
auto json = "{\"coordinates\":[{\"x\":1.1,\"y\":2.2,\"z\":3.3}]}"_padded;
bool in_order() {
TEST_START();
ondemand::parser parser{};
auto doc = parser.iterate(json);
double x{0};
double y{0};
double z{0};
for (ondemand::object point_object : doc["coordinates"]) {
x += double(point_object["x"]);
y += double(point_object["y"]);
z += double(point_object["z"]);
}
return (x == 1.1) && (y == 2.2) && (z == 3.3);
}
bool out_of_order() {
TEST_START();
ondemand::parser parser{};
auto doc = parser.iterate(json);
double x{0};
double y{0};
double z{0};
for (ondemand::object point_object : doc["coordinates"]) {
z += double(point_object["z"]);
try {
x += double(point_object["x"]);
return false;
} catch(simdjson_error&) {}
try {
y += double(point_object["y"]);
return false;
} catch(simdjson_error&) {}
}
return (x == 0) && (y == 0) && (z == 3.3);
}
bool robust_order() {
TEST_START();
ondemand::parser parser{};
auto doc = parser.iterate(json);
double x{0};
double y{0};
double z{0};
for (ondemand::object point_object : doc["coordinates"]) {
for (auto field : point_object) {
if (field.key() == "z") { z += double(field.value()); }
else if (field.key() == "x") { x += double(field.value()); }
else if (field.key() == "y") { y += double(field.value()); }
}
}
return (x == 1.1) && (y == 2.2) && (z == 3.3);
}
#endif
bool run() {
return
#if SIMDJSON_EXCEPTIONS
in_order() &&
out_of_order() &&
robust_order() &&
#endif
true;
}
}
namespace twitter_tests {
using namespace std;
using namespace simdjson;
@ -1251,6 +1348,7 @@ int main(int argc, char *argv[]) {
// twitter_tests::run() &&
// number_tests::run() &&
error_tests::run() &&
ordering_tests::run() &&
true
) {
std::cout << "Basic tests are ok." << std::endl;