// simdjson/benchmark/parse_stream.cpp


#include <algorithm>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>
#include "simdjson.h"
#define NB_ITERATION 20
#define MIN_BATCH_SIZE 10000
#define MAX_BATCH_SIZE 10000000
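
// Which benchmark phases to run (toggle before compiling):
//  - test_baseline:   parse the input line by line with getline() + parser.parse()
//  - test_per_batch:  sweep parse_many() batch sizes from MIN_BATCH_SIZE to MAX_BATCH_SIZE
//  - test_best_batch: re-run parse_many() at the batch size the sweep found fastest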
bool test_baseline = false;
bool test_per_batch = true;
bool test_best_batch = false;
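
// Comparator for std::min_element below. Because it orders pairs by
// *descending* speed, min_element returns the (batch size, speed) pair with
// the highest throughput.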
bool compare(std::pair<size_t, double> i, std::pair<size_t, double> j) {
return i.second > j.second;
}
int main(int argc, char *argv[]) {
if (argc <= 1) {
std::cerr << "Usage: " << argv[0] << " <jsonfile>" << std::endl;
exit(1);
}
const char *filename = argv[1];
auto[p, err] = simdjson::padded_string::load(filename);
if (err) {
std::cerr << "Could not load the file " << filename << std::endl;
return EXIT_FAILURE;
}
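
  // Phase 1 (baseline): split the input on newlines and parse each line as a
  // separate document with the single-document API, giving a reference point
  // with no batching and no threading.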
if (test_baseline) {
std::wclog << "Baseline: Getline + normal parse... " << std::endl;
std::cout << "Gigabytes/second\t"
<< "Nb of documents parsed" << std::endl;
for (auto i = 0; i < 3; i++) {
// Actual test
simdjson::dom::parser parser;
simdjson::error_code alloc_error = parser.allocate(p.size());
if (alloc_error) {
std::cerr << alloc_error << std::endl;
return EXIT_FAILURE;
}
std::istringstream ss(std::string(p.data(), p.size()));
auto start = std::chrono::steady_clock::now();
int count = 0;
std::string line;
      simdjson::error_code parse_res = simdjson::SUCCESS;
while (getline(ss, line)) {
// TODO we're likely triggering simdjson's padding reallocation here. Is
// that intentional?
        // Record any parse failure so the post-loop check actually fires.
        auto parse_error = parser.parse(line).error();
        if (parse_error) { parse_res = parse_error; }
count++;
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> secs = end - start;
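      // Throughput = input bytes / (elapsed seconds * 1e9), i.e. GB/s.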
double speedinGBs = static_cast<double>(p.size()) /
(static_cast<double>(secs.count()) * 1000000000.0);
std::cout << speedinGBs << "\t\t\t\t" << count << std::endl;
if (parse_res != simdjson::SUCCESS) {
std::cerr << "Parsing failed" << std::endl;
exit(1);
}
}
}
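
  // Phase 2: sweep parse_many() batch sizes. parse_many() runs stage 1
  // (structural indexing) on a worker thread while stage 2 (document
  // construction) runs on the calling thread, so batches must be large enough
  // to amortize per-batch overhead yet small enough to keep both stages busy.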
std::map<size_t, double> batch_size_res;
if (test_per_batch) {
std::wclog << "parse_many: Speed per batch_size... from " << MIN_BATCH_SIZE
<< " bytes to " << MAX_BATCH_SIZE << " bytes..." << std::endl;
std::cout << "Batch Size\t"
<< "Gigabytes/second\t"
<< "Nb of documents parsed" << std::endl;
for (size_t i = MIN_BATCH_SIZE; i <= MAX_BATCH_SIZE;
i += (MAX_BATCH_SIZE - MIN_BATCH_SIZE) / 100) {
batch_size_res.insert(std::pair<size_t, double>(i, 0));
      int count = 0;
for (size_t j = 0; j < 5; j++) {
// Actual test
simdjson::dom::parser parser;
simdjson::error_code error;
auto start = std::chrono::steady_clock::now();
count = 0;
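        // parse_many() hands back a document_stream; batches are parsed
        // incrementally as the loop below iterates over the stream.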
simdjson::dom::document_stream docs;
if ((error = parser.parse_many(p, i).get(docs))) {
std::wcerr << "Parsing failed with: " << error << std::endl;
exit(1);
}
for (auto result : docs) {
error = result.error();
if (error) {
std::wcerr << "Parsing failed with: " << error << std::endl;
exit(1);
}
count++;
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> secs = end - start;
double speedinGBs = static_cast<double>(p.size()) /
(static_cast<double>(secs.count()) * 1000000000.0);
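        // Keep the fastest of the five runs for this batch size.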
if (speedinGBs > batch_size_res.at(i))
batch_size_res[i] = speedinGBs;
}
std::cout << i << "\t\t" << std::fixed << std::setprecision(3)
<< batch_size_res.at(i) << "\t\t\t\t" << count << std::endl;
}
}
size_t optimal_batch_size{};
double best_speed{};
if (test_per_batch) {
    std::pair<size_t, double> best_results = *std::min_element(
        batch_size_res.begin(), batch_size_res.end(), compare);
optimal_batch_size = best_results.first;
best_speed = best_results.second;
} else {
optimal_batch_size = MIN_BATCH_SIZE;
}
std::wclog << "Seemingly optimal batch_size: " << optimal_batch_size << "..."
<< std::endl;
std::wclog << "Best speed: " << best_speed << "..." << std::endl;
if (test_best_batch) {
std::wclog << "Starting speed test... Best of " << NB_ITERATION
<< " iterations..." << std::endl;
std::vector<double> res;
for (int i = 0; i < NB_ITERATION; i++) {
// Actual test
simdjson::dom::parser parser;
simdjson::error_code error;
auto start = std::chrono::steady_clock::now();
// This includes allocation of the parser
simdjson::dom::document_stream docs;
if ((error = parser.parse_many(p, optimal_batch_size).get(docs))) {
std::wcerr << "Parsing failed with: " << error << std::endl;
exit(1);
}
for (auto result : docs) {
error = result.error();
if (error) {
std::wcerr << "Parsing failed with: " << error << std::endl;
exit(1);
}
}
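// Record the wall-clock time of this repetition; every measurement is kept
// so that the best (minimum) run can be reported below.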
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> secs = end - start;
res.push_back(secs.count());
}
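// Report the fastest repetition: the minimum over the recorded runs is the
// measurement least perturbed by other system activity. Throughput is the
// number of input bytes divided by that best time, with the 1e9 factor
// converting bytes per second into gigabytes per second.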
double min_result = *std::min_element(res.begin(), res.end());
double speedinGBs =
static_cast<double>(p.size()) / (min_result * 1000000000.0);
std::cout << "Min: " << min_result << " bytes read: " << p.size()
<< " Gigabytes/second: " << speedinGBs << std::endl;
}
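// When simdjson is built with thread support, parse_many runs stage 1 in a
// worker thread while the main thread runs stage 2; that overhead only pays
// off once the input is large enough to keep both threads busy, hence the
// warning below.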
#ifdef SIMDJSON_THREADS_ENABLED
// Multithreading probably does not help matters for small files (less than 10
// MB).
if (p.size() < 10000000) {
std::cout << std::endl;
std::cout << "Warning: your file is small and the performance results are "
"probably meaningless"
<< std::endl;
std::cout << "as far as multithreaded performance goes." << std::endl;
std::cout << std::endl;
std::cout
<< "Try to concatenate the file with itself to generate a large one."
<< std::endl;
std::cout << "In bash: " << std::endl;
std::cout << "for i in {1..1000}; do cat '" << filename
<< "' >> bar.ndjson; done" << std::endl;
std::cout << argv[0] << " bar.ndjson" << std::endl;
}
#endif
return 0;
}