simdjson

Commit Graph

Author	SHA1	Message	Date
John Keiser	aa1eabbb56	Add benchmark that stops early	2020-12-06 15:23:51 -08:00
Paul Dreik	f62ca21dd1	enable boost json (#1292 ) * bump boost.json and see if it works in simdjson CI * enable boost json * clean up * add boost json to deps * use boost if std::string_view is available * add build with c++20 * use docker image which has the proper libc++ installed	2020-11-10 13:55:04 -05:00
friendlyanon	c805fc28a4	Remove git modules (#1258 ) * Bump minimum CMake version * Remove unnecessary git checks * Move benchmark options where they are used * Declare helper functions for dependencies The custom solution here is tailored for fast configure times, but only works for dependencies on Github. * Import dependencies using the declared commands * Remove git submodules * Call target_link_libraries properly target_link_libraries must not be called without a requirement specifier. * Fix includes for competition Co-authored-by: friendlyanon <friendlyanon@users.noreply.github.com>	2020-11-04 13:34:29 -05:00
Daniel Lemire	218c274090	Updating main branch for legacy libc++ support (#1288 ) * Updating main branch for legacy libc++ support * Adopting * Removing unnecessary math header. * Updating the single-header files so we can pass the new tests. * Portable infinite-value detection is hard. * Working toward disabling boost json selectively. * Selectively disabling Boost JSON * More work toward selectively disabling boost json.	2020-11-04 12:24:42 -05:00
Paul Dreik	af4db55e66	remove trailing whitespace (#1284 )	2020-11-03 21:48:09 +01:00
Paul Dreik	f93fb21c95	optionally disable deprecated apis (#1271 ) Introduce cmake option SIMDJSON_DISABLE_DEPRECATED_API (default Off) which turns off deprecated simdjson api functions by setting the macro SIMDJSON_DISABLE_DEPRECATED_API. For non-cmake users, users will have to set SIMDJSON_DISABLE_DEPRECATED_API by some other means to disable the api. Closes #1264	2020-11-01 06:38:52 +01:00
Danila Kutenin	f46a0f64f2	PPC64 support (#1254 ) * Initial PPC64 support * Add travis CI * Fix outdated cmake version for travis * Fix indendtation * Try another workaround for outdated cmake in travis * Try beta cmake * Add dash before beta * Use builtin snaps * Use cmake as rocksdb * Test cmake on bionic * Remove unnecessary things from travis * Remove unnecessary things from travis * Another try of compiler install * Add all major compilers * Add all major compilers * Add all major compilers * Tweak travis a bit * Typo * More robust travis * Typos typos typos * Add fewer compilers, add non specific build for clang and gcc, should be the final config * CMAKE_FLAGS is in incorrect place * Remove default implementation * Limit build thread number * Fall back prefix_xor to a usual implementation, no performance boost is noticed * Test for power9 as it is the main architecture for OpenPOWER right now * Add to documentation to build with power9 as the implementation is compatible but compiler optimizations is not * Replace ARM with PPC in the comment	2020-10-27 18:43:39 -04:00
Daniel Lemire	14039d05a9	Adding a new benchmark for ondemand: distinct user id (#1239 ) * Adding a distinct user id benchmark * reenabling everything * Removing an unnecessary "value()". * Better tests of the examples and some fixes. * Guarding exception code.	2020-10-23 08:47:01 -04:00
Daniel Lemire	c592da4937	Adds yyjson to our internal benchmarks. (#1244 )	2020-10-21 16:23:20 -04:00
Daniel Lemire	3e8e797bc2	Typo.	2020-10-19 17:30:52 -04:00
Paul Dreik	7bf391c54a	fix potential use of uninitialized value warning, avoid casting away const This fixes a "potentially use of uninitialized value" warning, as well as a cstyle cast to non-const.	2020-10-16 22:14:42 +02:00
Daniel Lemire	07a6e098c8	This would allow users to find out what builtin is. (#1227 ) * This would allow users to find out what builtin is. * Trying another approach. * Added instructions. * Cleaning up the printout. * Let us be less invasive. * Adding a comment.	2020-10-15 21:58:42 -04:00
Daniel Lemire	e4897d6b54	We have hardcoded 32 (#1236 )	2020-10-15 21:57:10 -04:00
Daniel Lemire	bb2bc98a22	Fix issue https://github.com/simdjson/simdjson/issues/1127 (#1224 )	2020-10-13 09:18:54 -04:00
Paul Dreik	1d9926698e	update how boost.json is invoked, fix missing separators (#1203 ) * initial try at adding boost json to the benchmark * clean up * qualify memcpy etc. with std:: * clang format * extra space * update benchmark with help from Vinnie Falco from Boost.json * add missing separators	2020-10-09 18:22:37 -04:00
John Keiser	5b926b8196	Support array iteration over document	2020-10-06 11:29:45 -07:00
John Keiser	cae91983ec	Fix issue with early destruction	2020-10-06 11:29:45 -07:00
John Keiser	3190ef0c1f	Check benchmark results in release builds	2020-10-06 11:29:45 -07:00
John Keiser	c7c1372833	Allow reuse of value to try multiple types	2020-10-06 11:29:45 -07:00
John Keiser	6d978c383a	Kinder, gentler implementation selection - Allow user to specify SIMDJSON_BUILTIN_IMPLEMENTATION - Make cmake -DSIMDJSON_IMPLEMENTATION=haswell only specify haswell - Move negative implementation selection to -DSIMDJSON_EXCLUDE_IMPLEMENTATION - Automatically select SIMDJSON_BUILTIN_IMPLEMENTATION if SIMDJSON_IMPLEMENTATION is set - Move implementation enablement mostly to implementation files - Make implementation enablement and selection simpler and more robust - Fix bug where programs linked against simdjson were not passed SIMDJSON_XXX_IMPLEMENTATION or SIMDJSON_EXCEPTIONS	2020-10-06 11:29:45 -07:00
John Keiser	b70e85fd10	Only include source in bench_sax	2020-10-06 11:29:45 -07:00
John Keiser	30fe86ed32	Use simdjson::builtin instead of haswell/begin+end	2020-10-04 12:47:30 -07:00
John Keiser	b4df0e7c9e	Fix domnoexcept to actually be noexcept	2020-10-04 12:47:30 -07:00
John Keiser	6b219e3e25	Use ::stage2 where it's needed	2020-10-04 12:47:30 -07:00
John Keiser	baf6607e74	Make ondemand build without #include "simdjson.cpp"	2020-10-04 12:47:30 -07:00
John Keiser	a700848bae	Move ondemand implementation to include/	2020-10-04 12:47:30 -07:00
John Keiser	b234d74f43	Remove unnamed namespace from ondemand	2020-10-04 12:47:30 -07:00
John Keiser	49faf7af1a	Make simdjson_result implementation-specific	2020-10-04 12:47:30 -07:00
John Keiser	985b52331a	Require object to be exact and in order	2020-10-04 12:47:30 -07:00
John Keiser	021dded9dd	Add Kostya benchmarks	2020-10-04 12:47:30 -07:00
John Keiser	8fd0cdc732	Iterate value without going through indirection Avoids issues with value being released early	2020-10-04 12:47:30 -07:00
John Keiser	fe7a4d42d3	Fix top level values	2020-10-04 12:47:30 -07:00
John Keiser	e89d6353af	Add a "sum" benchmark with no appending to vector	2020-10-04 12:47:30 -07:00
John Keiser	c5bb74d184	Pave the way for non-record-based benchmarks	2020-10-04 12:47:30 -07:00
Daniel Lemire	874349c928	Making the code cleaner.	2020-10-04 12:47:30 -07:00
Daniel Lemire	157604b3a5	I think that this is better (fairer) code.	2020-10-04 12:47:30 -07:00
John Keiser	b935544d65	Make benchmark output easier to follow	2020-10-04 12:47:30 -07:00
Daniel Lemire	03271df579	This adds a frequency column (useful because if the frequency tanks, then other numbers are suspect).	2020-10-04 12:47:30 -07:00
John Keiser	0633d3a07d	Make branch miss numbers integers	2020-10-04 12:47:30 -07:00
John Keiser	045377a594	Fix errors with g++	2020-10-04 12:47:30 -07:00
John Keiser	b5c8030f19	Fix LargeRandom<OnDemand>	2020-10-04 12:47:30 -07:00
John Keiser	f75e856d2b	Compare records to ensure benchmarks work	2020-10-04 12:47:30 -07:00
John Keiser	44d689bc6e	Make instructions / cycle counters more useful	2020-10-04 12:47:30 -07:00
John Keiser	9e433c2f19	Move benchmarks into their own directories	2020-10-04 12:47:30 -07:00
John Keiser	4d89076bdc	Check for EOF when skipping containers Revert that? Or not	2020-10-04 12:47:30 -07:00
John Keiser	283ac3191f	Rename parse->iterate, add iterate_raw	2020-10-04 12:47:29 -07:00
John Keiser	4dd0c80dad	Move current_string_buf_loc to json_iterator	2020-10-04 12:47:29 -07:00
John Keiser	3b53c6ca47	Use json_iterator as shared state instead of document	2020-10-04 12:47:29 -07:00
John Keiser	a90b8fb449	Remove depth tracking from ondemand api	2020-10-04 12:47:29 -07:00
John Keiser	98be2c91df	Fix SAX benchmarks to actually push to vector	2020-10-04 12:47:29 -07:00
John Keiser	2657e5e226	Fix points SAX to actually record points	2020-10-04 12:47:29 -07:00
John Keiser	cfcb0d4fb7	Use json_iterator in array/object	2020-10-04 12:47:29 -07:00
John Keiser	4065529bdf	Don't try to compile Haswell benchmarks on ARM	2020-10-04 12:47:29 -07:00
John Keiser	6be2db8c42	Fix SAX benchmark to actually add tweets	2020-10-04 12:47:29 -07:00
John Keiser	5cf68416d8	Don't bother comparing field names in parserandom	2020-10-04 12:47:29 -07:00
John Keiser	ebcb3c6b3b	On-demand parse implementation	2020-10-04 12:47:29 -07:00
Paul Dreik	04267e0f6b	add boost.json to benchmark (#1202 ) Add boost.json to the benchmark. It was accepted into boost 20201003, see https://lists.boost.org/Archives/boost/2020/10/250129.php. The upstream repo is (expected to eventually be migrated to boost): https://github.com/CPPAlliance/json	2020-10-04 10:00:09 +02:00
Daniel Lemire	a540e6afc5	Testing on minimalist alpine (linux) images (#1200 ) * Tweaking header includes to make it safer. * Adding the actual tests. * Fixing my syntax.	2020-10-02 13:32:09 -04:00
Daniel Lemire	9865bb6904	Make it possible to check that an implementation is supported at runtime (#1197 ) * Make it possible to check that an implementation is supported at runtime. * add CI fuzzing on arm 64 bit This adds fuzzing on drone.io arm64 For some reason, leak detection had to be disabled. If it is enabled, the fuzzer falsely reports a crash at the end of fuzzing. Closes: #1188 * Guarding the implementation accesses. * Better doc. * Updating cxxopts. * Make it possible to check that an implementation is supported at runtime. * Guarding the implementation accesses. * Better doc. * Updating cxxopts. * We need to accomodate cxxopts Co-authored-by: Paul Dreik <github@pauldreik.se>	2020-10-02 11:04:51 -04:00
Daniel Lemire	60c139a844	Faster and more correct serialization (#1168 ) * Adding new files. * Better. * Fixing minifier and adding tests. * Adding benchmarks. * Including the array header. * Replacing old stream-based code by the new code. * Doubling up the itoa. * Hidden away to_chars in internal namespace. * Removing the repetitions. * Documented the atoi functions. * Tuning the escape sequences. * Moving the operators off the main namespace. * Added more tests. * Tweaking the implementation so that it works with and without exp. * The string_builder template and mini_formatter class are not part of our public API and are subject to change at any time! * Adding a benchmark and some optimization. * Cleaning. * Strictly speaking, this header is needed.	2020-09-23 10:00:39 -04:00
Daniel Lemire	f410213003	Improve documentation on padding - Improves and clarifies the documentation on padding. - Use std:: prefix for memcpy, strlen etc. Related to issues #1175 and #1178	2020-09-23 09:07:14 +02:00
Daniel Lemire	4c11652808	This must be a typo (#1140 )	2020-08-28 20:35:13 -04:00
John Keiser	b2779c35df	Fix issue with unsupported unreachable on Windows	2020-08-18 21:35:12 -07:00
John Keiser	18564f1ae2	Don't benchmark unless haswell is available	2020-08-18 21:25:03 -07:00
John Keiser	638f1deb62	Add DOM tweet reader for comparison	2020-08-18 21:25:03 -07:00
John Keiser	7e74d30f45	[WIP] tweet reader SAX benchmark	2020-08-18 21:25:03 -07:00
Daniel Lemire	8a8eea53a2	Prefixing macros (issue 1035) (#1124 ) * Renaming partially done. * More prefixing. * I thought that this was fixed. * Missed one. * Missed a few. * Missed another one. * Minor fixes.	2020-08-18 18:25:36 -04:00
Daniel Lemire	501fed6c4f	This would disable bash scripts under FreeBSD. (#1118 ) * This would disable bash scripts under FreeBSD. * Let us also disable GIT. * Let us try to just disable GIT * Nope. We must have both bash and git disabled.	2020-08-17 11:50:57 -04:00
Daniel Lemire	2f92a34bb7	Turns out that passing dom::element by reference can be a performance killer. (#1086 ) * Turns out that passing dom::element by reference can be a performance killer. * Tweaking.	2020-08-01 10:31:47 -04:00
Daniel Lemire	84dc398d32	Adding a couple of tests.	2020-07-31 15:29:10 -04:00
Daniel Lemire	f80668e87f	This removes the crazy alignment requirements. (#1073 ) * This removes the crazy alignment requirements.	2020-07-27 16:19:01 -04:00
Daniel Lemire	af18d5ed81	This adds a validation benchmark (#1040 )	2020-07-20 18:56:39 -04:00
Daniel Lemire	d0ce2f0b5a	Fixing clang under visual studio (#1028 ) * Lots of fixes * Removing some lambdas * Removing some functional programming. Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-07-06 18:58:19 -04:00
Daniel Lemire	29e744fdbb	Adding warning message.	2020-06-24 19:23:02 -04:00
Daniel Lemire	515b87bcbe	Disabling perfcheck for ninja	2020-06-24 18:45:47 -04:00
Daniel Lemire	5b4acf14ea	Removing space.	2020-06-24 16:51:28 -04:00
Daniel Lemire	5fc6cb15b8	This should make things even more robust. If .git is not found, just disable all git work.	2020-06-24 16:12:19 -04:00
Daniel Lemire	f6e9a8eee4	Making the cmake more verbose so we can figure out what is happening.	2020-06-24 15:44:22 -04:00
Daniel Lemire	cb8a9ef2c0	This removes git as a dependency	2020-06-24 15:13:47 -04:00
John Keiser	1ff55c2729	Replace auto [x,error] with .get() everywhere	2020-06-21 16:26:59 -07:00
John Keiser	6fa5abcd7e	Replace x.get<T>() with x.get(v) or T(x)	2020-06-21 14:36:38 -07:00
John Keiser	9899e5021d	Allow use of document_stream with tie()	2020-06-20 21:15:05 -07:00
John Keiser	a7fc7d4ffb	Switch from get(v,e) to e = get(v)	2020-06-20 17:57:09 -07:00
John Keiser	f336103f63	Convert tools/docs/benchmarks to bool get() idiom	2020-06-20 17:55:46 -07:00
John Keiser	56e2b38048	Add bool result from tie()/get(), get<T>(T&,error_code&)	2020-06-20 17:55:46 -07:00
John Keiser	7339f67dd7	Merge pull request #462 from simdjson/jkeiser/if-backslash Wrap backslash processing in a branch	2020-06-17 07:07:58 -07:00
Daniel Lemire	7ea05d038e	New API traversal tests. (#931 ) Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-15 13:15:52 -04:00
Daniel Lemire	33930ff046	Adding link.	2020-06-15 13:07:53 -04:00
John Keiser	fd44c2a2ff	Merge pull request #927 from simdjson/dlemire/exposingthestringminifier Exposing the string minifier.	2020-06-13 07:47:20 -07:00
John Keiser	a86a82b39c	Rename minify class to minifier so the minify() method is cleared up	2020-06-12 17:05:25 -07:00
Daniel Lemire	d1a54249e7	New API traversal tests.	2020-06-12 17:42:57 -04:00
Daniel Lemire	4dfbf98e4e	Using a worker instead of a thread per batch (#920 ) In the parse_many function, we have one thread doing the stage 1, while the main thread does stage 2. So if stage 1 and stage 2 take half the time, the parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading. To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches. This fixes our parse_stream benchmark which is just busted. This replaces the one-thread per batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of the thread create gets amortized efficiently. This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time. Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performances of the underlying system, since we create juste one thread. Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 16:51:18 -04:00
Daniel Lemire	1b6258ec8c	Added std::minify	2020-06-12 16:37:41 -04:00
John Keiser	7c6723d912	Print progress bar even if there is only one file	2020-06-12 10:01:19 -07:00
Daniel Lemire	a6e4933d93	Exposing the string minifier.	2020-06-11 13:07:18 -04:00
John Keiser	ae6dddfff4	Merge pull request #903 from simdjson/jkeiser/dom-parser-implementation Move parser state to implementation-specific class	2020-06-04 13:09:57 -07:00
John Keiser	1aab4752e2	Store all parser state in the implementation	2020-06-01 12:15:54 -07:00
John Keiser	6a71b24495	Reuse stored buf and len from parser	2020-06-01 12:14:09 -07:00
John Keiser	a3a9bde83e	Move DOM parsing into concrete interface implementation	2020-06-01 12:14:09 -07:00
Daniel Lemire	2fe2dd170b	The "competition tests" are being made portable (#907 ) * More portable competition * This will enable SIMDJSON_COMPETITION everywhere by default. * Minor fixes	2020-05-31 20:34:06 -04:00

1 2 3 4 5 ...

365 Commits