simdjson

Commit Graph

Author	SHA1	Message	Date
Daniel Lemire	14ceacac73	Tweaking.	2020-06-17 13:27:17 -04:00
Daniel Lemire	16f41ea059	Added a word.	2020-06-14 18:48:42 -04:00
Daniel Lemire	0a7270fc29	More tweaks.	2020-06-14 18:47:22 -04:00
Daniel Lemire	23fbd9d004	Some tweaks.	2020-06-14 18:28:09 -04:00
John Keiser	fd44c2a2ff	Merge pull request #927 from simdjson/dlemire/exposingthestringminifier Exposing the string minifier.	2020-06-13 07:47:20 -07:00
John Keiser	a86a82b39c	Rename minify class to minifier so the minify() method is cleared up	2020-06-12 17:05:25 -07:00
Daniel Lemire	89b059b1ea	Testing with GCC 10 and clang 10 (#926) * Testing with GCC 10 and clang 10 * Fixing spurious space * gcc10 does not need the cmake installation. * We don't want to run the perf test on ARM. I ignore them systematically. ARM performance should be assessed manually. * Switching to GCC 10 and Clang 10 * Disabling some tests under sanitizers when they involve rapidjson or other parsers. Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 17:58:53 -04:00
Daniel Lemire	bd2d0f769f	One unlikely too many (#930)	2020-06-12 17:58:10 -04:00
Daniel Lemire	d830422489	Put back the amalgamation files and add tests (#929) Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 17:57:45 -04:00
Daniel Lemire	4dfbf98e4e	Using a worker instead of a thread per batch (#920) In the parse_many function, we have one thread doing the stage 1, while the main thread does stage 2. So if stage 1 and stage 2 take half the time, the parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading. To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches. This fixes our parse_stream benchmark which is just busted. This replaces the one-thread per batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of the thread create gets amortized efficiently. This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time. Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performances of the underlying system, since we create just one thread. Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 16:51:18 -04:00
Daniel Lemire	1b6258ec8c	Added std::minify	2020-06-12 16:37:41 -04:00
Daniel Lemire	be707dbb6f	Added a remark	2020-06-12 16:07:34 -04:00
John Keiser	1febf2ec83	Merge pull request #919 from simdjson/jkeiser/move-current-loc [4/4] Stop persisting current_loc (+2% parse throughput)	2020-06-12 09:55:04 -07:00
John Keiser	fe69928764	Merge pull request #918 from simdjson/jkeiser/remove-iterator-variables [3/4] Remove unneeded structural_iterator variables	2020-06-12 09:52:35 -07:00
John Keiser	bbd61eb13f	Let tape writing be put in a register	2020-06-12 09:18:20 -07:00
John Keiser	e15e1e253d	peek_char -> peek_next_char	2020-06-12 09:10:16 -07:00
Daniel Lemire	45e2178ada	Duh.	2020-06-11 17:20:28 +00:00
Daniel Lemire	a6e4933d93	Exposing the string minifier.	2020-06-11 13:07:18 -04:00
Daniel Lemire	98599e0972	Remove the circleci badge since it may appear to fail due to perfdiff	2020-06-11 11:37:53 -04:00
John Keiser	b4837f2e2f	Merge pull request #915 from simdjson/jkeiser/stage2-common [2/4] Use same state machine for stage 2 streaming and non-streaming	2020-06-10 08:37:08 -07:00
John Keiser	ea08e7d192	Remove unused extra copy of find_next_document_index	2020-06-09 17:52:13 -07:00
John Keiser	d178e089a6	Stop caching current structural, keep current index around instead of next	2020-06-08 15:21:54 -07:00
John Keiser	5f00b37e21	Stop caching the buffer index	2020-06-08 15:21:54 -07:00
John Keiser	8a8792d47f	Remove most uses of current_char()	2020-06-08 15:21:54 -07:00
John Keiser	59d9bc9e48	Store the pointer to the next structural instead of base structural_indexes and an index	2020-06-08 15:21:54 -07:00
John Keiser	8793dd3ceb	Don't store len locally	2020-06-08 15:21:54 -07:00
John Keiser	48062380fa	Move parser to structural_iterator	2020-06-08 15:21:54 -07:00
John Keiser	3636aa5522	Extend structural_parser from structural_iterator	2020-06-08 15:21:54 -07:00
John Keiser	a1aea4588f	Move document stream state to implementation	2020-06-08 15:21:54 -07:00
John Keiser	1d4fffb799	Fix fallback implementation	2020-06-08 15:21:52 -07:00
John Keiser	6f90f5dc5f	Remove templating from finish() method	2020-06-08 15:20:56 -07:00
John Keiser	9dd6972d26	Remove impossible checks, add EMPTY check to normal parser	2020-06-08 15:20:56 -07:00
John Keiser	d731a7d52c	Privatize structural_parser	2020-06-08 15:20:56 -07:00
John Keiser	059468b74e	Eliminate streaming_structural_parser subclass with templates	2020-06-08 15:20:56 -07:00
John Keiser	5e69fb782a	Call a function to parse structurals	2020-06-08 15:20:56 -07:00
John Keiser	a5beffda78	Remove streaming_structural_parser.h	2020-06-08 15:20:56 -07:00
John Keiser	7de7ce5fdc	Move document stream state to implementation	2020-06-08 15:20:56 -07:00
John Keiser	383e8c7f68	Merge pull request #913 from simdjson/jkeiser/internal-streaming [1/4] Simplify parse_many() and fix bugs	2020-06-08 15:19:27 -07:00
John Keiser	0dbda65e44	Fix fallback implementation	2020-06-08 14:52:23 -07:00
John Keiser	fe01da077e	Make threaded version work again	2020-06-07 16:21:00 -07:00
John Keiser	d43a4e9df9	Remove SUCCESS_AND_HAS_MORE (internal only value)	2020-06-07 16:20:55 -07:00
John Keiser	3e226795f0	Run all passing json against parse_many. Empty documents pass, too.	2020-06-07 16:20:51 -07:00
John Keiser	c4a0fe1606	Add tests for parse_many() errors	2020-06-07 16:20:46 -07:00
John Keiser	ef63a84a3e	Move document stream state to implementation	2020-06-07 16:20:44 -07:00
John Keiser	8c16ba372e	Acknowledge that we always have a remainder	2020-06-06 16:46:38 -07:00
John Keiser	9be4a17687	Separate definition from declaration, arrange top down	2020-06-06 16:46:38 -07:00
Furkan	89332e1696	Temporary fix to #914 (#917)	2020-06-05 21:01:41 -04:00
John Keiser	8a56129def	Merge pull request #916 from simdjson/jkeiser/issue906stage2 Move unclosed array check to stage 2	2020-06-05 14:23:44 -07:00
John Keiser	ed0c815735	Move unclosed array check to stage 2	2020-06-05 12:39:13 -07:00
Daniel Lemire	7a69da16e4	Fixing issue 906 (#912) * Fixing issue 906 * Safe patching. * Now with explanations. * Bumping up memory allocation. * Putting the patch back. * fallback fixes. Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-05 15:37:09 -04:00
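
Note on commit 4dfbf98e4e: the message describes replacing a thread per batch with a worker object that reuses a single thread, so that stage 1 of the next batch overlaps with stage 2 of the current one. The following is a minimal sketch of that general pattern only, not the code from the commit. The names batch_worker and process_batches are hypothetical, and the two "stages" are stand-ins (summing a batch, then accumulating the sums) rather than simdjson's real stages.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: one long-lived thread that runs one job at a time,
// so the thread creation cost is paid once rather than once per batch.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  batch_worker(const batch_worker &) = delete;
  batch_worker &operator=(const batch_worker &) = delete;

  // Hand a job to the worker. The caller must have called finish()
  // for the previous job before submitting a new one.
  void run(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      job_ = std::move(job);
      has_work_ = true;
    }
    cv_.notify_all();
  }

  // Block until the submitted job (if any) has completed.
  void finish() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_work_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return has_work_ || stop_; });
      if (!has_work_) return; // woken only to stop
      lock.unlock();
      job_();                 // run the job without holding the lock
      lock.lock();
      has_work_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_work_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last: the other members must exist first
};

// Stand-in pipeline (an assumption for illustration): "stage 1" sums one
// batch, "stage 2" adds that sum to a running total on the main thread.
long process_batches(const std::vector<std::vector<int>> &batches) {
  if (batches.empty()) return 0;
  batch_worker worker;
  long total = 0;
  long next_result = 0; // written by stage 1, read only after finish()
  auto stage1 = [&](std::size_t i) {
    next_result = std::accumulate(batches[i].begin(), batches[i].end(), 0L);
  };
  stage1(0); // the first batch has nothing to overlap with
  for (std::size_t i = 0; i < batches.size(); i++) {
    long current = next_result;
    if (i + 1 < batches.size()) {
      worker.run([&stage1, i] { stage1(i + 1); }); // stage 1 of next batch
    }
    total += current;  // stage 2 of the current batch, on the main thread
    worker.finish();   // wait before reading next_result again
  }
  return total;
}
```

With a thread per batch, the loop body would construct and join a std::thread on every iteration; here the only per-batch cost is a lock and a condition-variable notification, which is why the approach is expected to matter most for small batches.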

1368 Commits