Fix for issue 1246. We document the relationship between parser instances and elements (#1250)

* Fix for issue 1246.

* Adopting John's wording.
This commit is contained in:
Daniel Lemire 2020-10-26 08:40:45 -04:00 committed by GitHub
parent 6a86ef5a7d
commit a75c07065f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 41 additions and 1 deletions

View File

@ -112,6 +112,9 @@ dom::element doc = parser.parse("[1,2,3]"_padded); // parse a string, the _padde
```
The parsed document resulting from the `parser.load` and `parser.parse` calls depends on the `parser` instance. Thus the `parser` instance must remain in scope. Furthermore, you must have at most one parsed document in play per `parser` instance.
You cannot copy a `parser` instance, you may only move it.
If you need to keep a document around long term, you can keep or move the parser instance. Note that moving a parser instance, or keeping one in a movable data structure like vector or map, can cause any outstanding `element`, `object` or `array` instances to be invalidated. If you need to store a parser in a movable data structure, you should use a `std::unique_ptr` to avoid this invalidation(e.g., `std::unique_ptr<dom::parser> parser(new dom::parser{})`).
During the`load` or `parse` calls, neither the input file nor the input string are ever modified. After calling `load` or `parse`, the source (either a file or a string) can be safely discarded. All of the JSON data is stored in the `parser` instance. The parsed document is also immutable in simdjson: you do not modify it by accessing it.

View File

@ -93,6 +93,9 @@ dom::element doc = parser.parse("[1,2,3]"_padded); // parse a string, the _padde
```
The parsed document resulting from the `parser.load` and `parser.parse` calls depends on the `parser` instance. Thus the `parser` instance must remain in scope. Furthermore, you must have at most one parsed document in play per `parser` instance.
You cannot copy a `parser` instance, you may only move it.
If you need to keep a document around long term, you can keep or move the parser instance. Note that moving a parser instance, or keeping one in a movable data structure like vector or map, can cause any outstanding `element`, `object` or `array` instances to be invalidated. If you need to store a parser in a movable data structure, you should use a `std::unique_ptr` to avoid this invalidation(e.g., `std::unique_ptr<dom::parser> parser(new dom::parser{})`).
During the`load` or `parse` calls, neither the input file nor the input string are ever modified. After calling `load` or `parse`, the source (either a file or a string) can be safely discarded. All of the JSON data is stored in the `parser` instance. The parsed document is also immutable in simdjson: you do not modify it by accessing it.

View File

@ -29,6 +29,15 @@ static constexpr size_t DEFAULT_BATCH_SIZE = 1000000;
* as well as memory for a single document. The parsed document is overwritten on each parse.
*
* This class cannot be copied, only moved, to avoid unintended allocations.
*
* @note Moving a parser instance may invalidate "dom::element" instances. If you need to
* preserve both the "dom::element" instances and the parser, consider wrapping the parser
* instance in a std::unique_ptr instance:
*
* std::unique_ptr<dom::parser> parser(new dom::parser{});
* auto error = parser->load(f).get(root);
*
* You can then move std::unique_ptr safely.
*
* @note This is not thread safe: one parser cannot produce two documents at the same time!
*/
@ -79,6 +88,10 @@ public:
* documents because it reuses the same buffers, but you *must* use the document before you
* destroy the parser or call parse() again.
*
* Moving the parser instance is safe, but it invalidates the element instances. You may store
* the parser instance without moving it by wrapping it inside an `unique_ptr` instance like
* so: `std::unique_ptr<dom::parser> parser(new dom::parser{});`.
*
* ### Parser Capacity
*
* If the parser's current capacity is less than the file length, it will allocate enough capacity
@ -108,6 +121,10 @@ public:
* The JSON document still lives in the parser: this is the most efficient way to parse JSON
* documents because it reuses the same buffers, but you *must* use the document before you
* destroy the parser or call parse() again.
*
* Moving the parser instance is safe, but it invalidates the element instances. You may store
* the parser instance without moving it by wrapping it inside an `unique_ptr` instance like
* so: `std::unique_ptr<dom::parser> parser(new dom::parser{});`.
*
* ### REQUIRED: Buffer Padding
*

View File

@ -334,7 +334,23 @@ namespace parse_api_tests {
const padded_string BASIC_JSON = "[1,2,3]"_padded;
const padded_string BASIC_NDJSON = "[1,2,3]\n[4,5,6]"_padded;
const padded_string EMPTY_NDJSON = ""_padded;
bool parser_moving_parser() {
std::cout << "Running " << __func__ << std::endl;
typedef std::tuple<std::string, std::unique_ptr<parser>,element> simdjson_tuple;
std::vector<simdjson_tuple> results;
std::vector<std::string> my_data = {"[1,2,3]", "[1,2,3]", "[1,2,3]"};
for (std::string s : my_data) {
std::unique_ptr<dom::parser> parser(new dom::parser{});
element root;
ASSERT_SUCCESS( parser->parse(s).get(root) );
results.emplace_back(s, std::move(parser), root);
}
for (auto &t : results) {
std::cout << "reserialized: " << simdjson::to_string(std::get<2>(t)) << " ...\n";
}
return true;
}
bool parser_parse() {
std::cout << "Running " << __func__ << std::endl;
dom::parser parser;
@ -502,7 +518,8 @@ namespace parse_api_tests {
#endif
bool run() {
return parser_parse() &&
return parser_moving_parser() &&
parser_parse() &&
parser_parse_many() &&
parser_parse_many_deprecated() &&
parser_parse_many_empty() &&