simdjson

Commit Graph

Author	SHA1	Message	Date
Daniel Lemire	8eed8f5155	Document stream: truncate final unfinished document and give access to the number of truncated bytes. (#1534) * Truncate final unclosed string. * Adding more precise remarks. * Better documentation and more robust code. * ARM + PPC corrections. * Patching ARM implementation with new stage1_mode parameter. * Fixed most problems. * Correcting white spaces and adding a remark. * This adds the truncated_bytes() method to the stream instances.	2021-04-23 09:24:00 -04:00
Daniel Lemire	d6f33e4830	This adds a little test to see if we can compiler with very strict flags (conventional casts) (#1417) * This adds a little test to see if we can compiler with very strict flags. * Trimming a leftover old-style cast. * More cleaning. * A few more pedantic casts.	2021-01-27 18:37:30 -05:00
John Keiser	62ded15cd8	Rename tweets/text/points -> result	2021-01-05 11:55:57 -08:00
Paul Dreik	af4db55e66	remove trailing whitespace (#1284)	2020-11-03 21:48:09 +01:00
Daniel Lemire	8a8eea53a2	Prefixing macros (issue 1035) (#1124) * Renaming partially done. * More prefixing. * I thought that this was fixed. * Missed one. * Missed a few. * Missed another one. * Minor fixes.	2020-08-18 18:25:36 -04:00
Daniel Lemire	f80668e87f	This removes the crazy alignment requirements. (#1073) * This removes the crazy alignment requirements.	2020-07-27 16:19:01 -04:00
John Keiser	a7fc7d4ffb	Switch from get(v,e) to e = get(v)	2020-06-20 17:57:09 -07:00
John Keiser	1aab4752e2	Store all parser state in the implementation	2020-06-01 12:15:54 -07:00
John Keiser	6a71b24495	Reuse stored buf and len from parser	2020-06-01 12:14:09 -07:00
John Keiser	a3a9bde83e	Move DOM parsing into concrete interface implementation	2020-06-01 12:14:09 -07:00
John Keiser	5312fd30e5	Fix CRT_SECURE warnings in clang	2020-05-04 11:36:00 -07:00
Daniel Lemire	fa4ce6a8bc	There is confusion between gigabytes and gigibytes. Let us standardize throughout. (#838) * There is confusion between gigabytes and gigibytes. * Trying to be consistent.	2020-05-01 12:16:18 -04:00
John Keiser	92d7af0881	Don't include benchmark overhead in documents/s	2020-04-28 13:15:01 -07:00
John Keiser	0e6ea76e88	Make checkperf work on Windows (#799) * Make command line arguments work for Windows * Run checkperf on Windows	2020-04-27 14:20:05 -04:00
Daniel Lemire	0d1c574cb1	A few more changes... (#775) * More nitpicking.	2020-04-23 11:36:52 -04:00
John Keiser	d4a37f6ef5	Enable conversion warnings on Linux and Windows	2020-04-22 14:21:30 -07:00
Daniel Lemire	21dce6cca9	Displaying the numbers of documents parsed per second (#652) * Some users are interested, as a metric, in the number of documents parsed per second. Obviously, this means reusing the same parser again and again. * Adding a sentence * This update the parsingcompetition benchmark so that it displays the number of documents parsed per second.	2020-03-30 17:51:03 -04:00
John Keiser	d93af1161d	Remove set_capacity, replace with allocate Makes allocation point more predictable	2020-03-30 13:49:54 -07:00
John Keiser	434776db1a	Deprecate more things	2020-03-30 13:48:43 -07:00
John Keiser	03746b966b	Move document/element/etc. under dom	2020-03-28 13:42:21 -07:00
Daniel Lemire	6cefeb338b	std::tie does not work on some compilers (#567) * std::tie workaround. * Cleaner solution	2020-03-19 16:56:45 -04:00
John Keiser	8e2c06cb0e	Compile with -fno-exceptions	2020-03-17 13:54:37 -07:00
John Keiser	1a5d8f1957	Add tests for SIMDJSON_EXCEPTIONS=0, add `tie()` support	2020-03-17 13:54:37 -07:00
John Keiser	e4e89fe27a	Fix parse benchmarker (#554) * Fix parse benchmarker * Make CI fail when parse doesn't work	2020-03-13 16:19:21 -04:00
John Keiser	40c6213d7e	Add parser.load() and load_many() to load files	2020-03-11 17:19:41 -07:00
John Keiser	d140bc23f5	Automatically allocate memory as needed in parse	2020-03-11 16:14:54 -07:00
John Keiser	31e8a12e88	Make error_message(error_code) return C string - Also move all error message logic to include inline	2020-03-06 15:41:51 -08:00
John Keiser	b3ea8c406e	Add simdjson.cpp for unified use (#515)	2020-03-04 10:12:27 -08:00
John Keiser	99667f7c55	Create top level simdjson.h (#515) - Allows everyone to #include the same way, singleheader or not.	2020-03-04 10:12:27 -08:00
John Keiser	910f272467	Add parser implementation interface and selection API (#501) * Make architecture implementations virtual functions - Easier to add new architectures (add implementation to implementation.cpp) - Easier to add new algorithms / functions to architecture selection (add to implementation.h, implement) - Automatically select best implementation in static initialization - Allow user to explicitly select implementation with a string (i.e. parameter) - Allow user to inspect current implementation name/description - Allow user to list available implementations - Eliminate architecture enum and architecture-based templating - Add noexcept in non-inline functions * Move implementation static methods to their own classes * Detect best supported implementation on first use * available_implementationsI() -> available_implementations	2020-02-21 16:34:27 -05:00
John Keiser	8e7d1a5f09	Separate document state from ParsedJson This creates a "document" class with only user-facing document state (no parser internals). - document: user-facing document state - document::iterator: iterator (equivalent of ParsedJsonIterator) - document::parser: parser state plus a "docked" document we parse into (equivalent of ParsedJson) Usage: ```c++ auto doc = simdjson::document::parse(buf, len); // less efficient but simplest ``` ```c++ simdjson::document::parser parser; // reusable parser parser.allocate_capacity(len); simdjson::document* doc = parser.parse(buf, len); // pointer to doc inside parser doc = parser.parse(buf2, len); // reuses all buffers and overwrites doc; more efficient ```	2020-02-07 10:02:36 -08:00
Daniel Lemire	4518f1fba1	Some minor nitpicking.	2020-02-07 10:41:45 -05:00
John Keiser	6978a0b8d4	Benchmark escapes (#464) * Add escapes as a feature we benchmark * Don't print effectiveness metric unless verbose is on	2020-01-27 09:58:14 -05:00
Daniel Lemire	f87e64f988	Add option to make buffers hot and remove recent benchmarking changes (#443) * This revert the code back to how it was prior to the silly "run two stages" routine and instead adds an option to benchmark the code over hot buffers. It turns out that it can be expensive, when the files are large, to allocate the pages.	2020-01-15 19:48:00 -05:00
Daniel Lemire	f97b655f02	Instead of emulating the whole parsing as stage 1 + stage 2, let us benchmark the real thing. (#441) * Instead of emulating the whole parsing as stage 1 + stage 2, let us benchmark the real thing. * Adding explicit constructor. * Adding warning to the benchmark user. * Making re-running optional.	2020-01-11 10:14:22 -05:00
John Keiser	3b9e6bff3c	Print stage 2 information in feature benchmarker	2020-01-02 17:23:21 -07:00
Daniel Lemire	b2ebdb0d07	I think we can align the numbers better (so it is prettier). (#399) * I think we can align the numbers better (so it is prettier). * Remove space before %, align third line better Co-authored-by: John Keiser <john@johnkeiser.com>	2019-12-20 19:58:49 -05:00
John Keiser	e2f349e7bd	Measure impact of utf-8 blocks and structurals per block directly	2019-12-17 11:41:13 -08:00

38 Commits