simdjson

Commit Graph

Author	SHA1	Message	Date
Daniel Lemire	4474f8ef18	Cleaning a bit the examples.	2020-06-17 16:24:55 +00:00
John Keiser	76c9f4f5a6	Merge pull request #941 from simdjson/jkeiser/forgot Remove unnecessary functions	2020-06-17 09:09:28 -07:00
Daniel Lemire	942ef3b7f2	Merge pull request #939 from simdjson/dlemire/lookup3 Introducing lookup3 (UTF-8 validation).	2020-06-17 11:19:09 -04:00
Daniel Lemire	0b9df6d8c4	It turns out that we need fairly complicated logic.	2020-06-17 15:17:10 +00:00
Daniel Lemire	b5ea504ad2	Tweaks doxygen so that we have a better main page.	2020-06-17 11:07:21 -04:00
Daniel Lemire	803b0c4bdb	Light touch.	2020-06-17 11:00:13 -04:00
Daniel Lemire	6537d0dc76	Avoiding the unused errors.	2020-06-17 14:19:58 +00:00
John Keiser	f8f36c085c	Remove unnecessary functions	2020-06-17 07:11:53 -07:00
John Keiser	7339f67dd7	Merge pull request #462 from simdjson/jkeiser/if-backslash Wrap backslash processing in a branch	2020-06-17 07:07:58 -07:00
Daniel Lemire	0d4e501239	Fixing the bug.	2020-06-17 10:06:16 -04:00
Daniel Lemire	8d609607e2	Verifying the bug.	2020-06-16 20:04:09 -04:00
Daniel Lemire	71a889ed73	Introducing lookup3 (UTF-8 validation).	2020-06-16 19:08:25 -04:00
Daniel Lemire	27a75a9085	Tweaking.	2020-06-15 17:54:34 -04:00
Daniel Lemire	954d6c326d	New examples.	2020-06-15 17:45:15 -04:00
Daniel Lemire	7ea05d038e	New API traversal tests. (#931) Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-15 13:15:52 -04:00
Daniel Lemire	33930ff046	Adding link.	2020-06-15 13:07:53 -04:00
Daniel Lemire	16f41ea059	Added a word.	2020-06-14 18:48:42 -04:00
Daniel Lemire	0a7270fc29	More tweaks.	2020-06-14 18:47:22 -04:00
Daniel Lemire	23fbd9d004	Some tweaks.	2020-06-14 18:28:09 -04:00
John Keiser	610c79fbf3	Don't use backslash branch on ARM	2020-06-13 07:51:28 -07:00
John Keiser	fd44c2a2ff	Merge pull request #927 from simdjson/dlemire/exposingthestringminifier Exposing the string minifier.	2020-06-13 07:47:20 -07:00
John Keiser	a86a82b39c	Rename minify class to minifier so the minify() method is cleared up	2020-06-12 17:05:25 -07:00
Daniel Lemire	89b059b1ea	Testing with GCC 10 and clang 10 (#926) * Testing with GCC 10 and clang 10 * Fixing spurious space * gcc10 does not need the cmake installation. * We don't want to run the perf test on ARM. I ignore them systematically. ARM performance should be assessed manually. * Switching to GCC 10 and Clang 10 * Disabling some tests under sanitizers when they involve rapidjson or other parsers. Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 17:58:53 -04:00
Daniel Lemire	bd2d0f769f	One unlikely too many (#930)	2020-06-12 17:58:10 -04:00
Daniel Lemire	d830422489	Put back the amalgamation files and add tests (#929) Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 17:57:45 -04:00
Daniel Lemire	d1a54249e7	New API traversal tests.	2020-06-12 17:42:57 -04:00
Daniel Lemire	4dfbf98e4e	Using a worker instead of a thread per batch (#920) In the parse_many function, we have one thread doing stage 1, while the main thread does stage 2. So if stage 1 and stage 2 each take half the time, parse_many could run at twice the speed. It is unlikely to do so. Still, we see benefits of about 40% due to threading. To achieve this interleaving, we load the data in batches (blocks) of some size. In the current code (master), we create a new thread for each batch. Thread creation is expensive so our approach only works over sizeable batches. This PR improves things and makes parse_many faster when using small batches. This fixes our parse_stream benchmark which is just busted. This replaces the one-thread-per-batch routine by a worker object that reuses the same thread. In benchmarks, this allows us to get the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks because the cost of thread creation gets amortized efficiently. This PR makes parse_many beneficial over small datasets. It also makes us less dependent on the thread creation time. Unfortunately, it is going to be difficult to say anything definitive in general. The cost of creating a thread varies widely depending on the OS. On some systems, it might be cheap, in others very expensive. It should be expected that the new code will depend less drastically on the performance of the underlying system, since we create just one thread. Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Daniel Lemire <lemire@gmai.com>	2020-06-12 16:51:18 -04:00
Daniel Lemire	1b6258ec8c	Added std::minify	2020-06-12 16:37:41 -04:00
Daniel Lemire	be707dbb6f	Added a remark	2020-06-12 16:07:34 -04:00
John Keiser	664b03bb13	Short circuit find escapes if there is a backslash	2020-06-12 10:10:35 -07:00
John Keiser	7c6723d912	Print progress bar even if there is only one file	2020-06-12 10:01:19 -07:00
John Keiser	1febf2ec83	Merge pull request #919 from simdjson/jkeiser/move-current-loc [4/4] Stop persisting current_loc (+2% parse throughput)	2020-06-12 09:55:04 -07:00
John Keiser	fe69928764	Merge pull request #918 from simdjson/jkeiser/remove-iterator-variables [3/4] Remove unneeded structural_iterator variables	2020-06-12 09:52:35 -07:00
John Keiser	bbd61eb13f	Let tape writing be put in a register	2020-06-12 09:18:20 -07:00
John Keiser	e15e1e253d	peek_char -> peek_next_char	2020-06-12 09:10:16 -07:00
Daniel Lemire	45e2178ada	Duh.	2020-06-11 17:20:28 +00:00
Daniel Lemire	a6e4933d93	Exposing the string minifier.	2020-06-11 13:07:18 -04:00
Daniel Lemire	98599e0972	Remove the circleci badge since it may appear to fail due to perfdiff	2020-06-11 11:37:53 -04:00
John Keiser	b4837f2e2f	Merge pull request #915 from simdjson/jkeiser/stage2-common [2/4] Use same state machine for stage 2 streaming and non-streaming	2020-06-10 08:37:08 -07:00
John Keiser	ea08e7d192	Remove unused extra copy of find_next_document_index	2020-06-09 17:52:13 -07:00
John Keiser	d178e089a6	Stop caching current structural, keep current index around instead of next	2020-06-08 15:21:54 -07:00
John Keiser	5f00b37e21	Stop caching the buffer index	2020-06-08 15:21:54 -07:00
John Keiser	8a8792d47f	Remove most uses of current_char()	2020-06-08 15:21:54 -07:00
John Keiser	59d9bc9e48	Store the pointer to the next structural instead of base structural_indexes and an index	2020-06-08 15:21:54 -07:00
John Keiser	8793dd3ceb	Don't store len locally	2020-06-08 15:21:54 -07:00
John Keiser	48062380fa	Move parser to structural_iterator	2020-06-08 15:21:54 -07:00
John Keiser	3636aa5522	Extend structural_parser from structural_iterator	2020-06-08 15:21:54 -07:00
John Keiser	a1aea4588f	Move document stream state to implementation	2020-06-08 15:21:54 -07:00
John Keiser	1d4fffb799	Fix fallback implementation	2020-06-08 15:21:52 -07:00
John Keiser	6f90f5dc5f	Remove templating from finish() method	2020-06-08 15:20:56 -07:00
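
Note on commit 4dfbf98e4e: the message describes replacing a new thread per batch with a worker object that reuses a single thread, so that stage 1 of the next batch overlaps with stage 2 of the current one. The following is a minimal sketch of that pattern only, not simdjson's actual code. The names batch_worker and run_pipeline, and the toy "stages" (summing a batch, then doubling the sum), are hypothetical stand-ins chosen for illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not simdjson's class): one long-lived thread that
// runs one submitted job at a time, so no thread is created per batch.
class batch_worker {
public:
  batch_worker() : thread_([this] { loop(); }) {}
  ~batch_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
  // Hand a job to the worker. Call wait() before submitting another one.
  void submit(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      job_ = std::move(job);
      busy_ = true;
    }
    cv_.notify_all();
  }
  // Block until the submitted job has finished.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !busy_; });
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return busy_ || stopping_; });
      if (!busy_) return;  // woken only to stop, no job pending
      std::function<void()> job = std::move(job_);
      lock.unlock();
      job();               // run the job without holding the lock
      lock.lock();
      busy_ = false;
      cv_.notify_all();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool busy_ = false;
  bool stopping_ = false;
  std::thread thread_;  // declared last: starts after the members above exist
};

// Toy pipeline: "stage 1" sums a batch, "stage 2" doubles that sum.
// Stage 1 of batch i+1 runs on the worker while stage 2 of batch i
// runs on the calling thread.
int run_pipeline(const std::vector<std::vector<int>>& batches) {
  if (batches.empty()) return 0;
  auto stage1 = [](const std::vector<int>& batch) {
    int sum = 0;
    for (int v : batch) sum += v;
    return sum;
  };
  batch_worker worker;
  int current = stage1(batches[0]);  // first batch has nothing to overlap with
  int next = 0;
  int total = 0;
  for (std::size_t i = 0; i < batches.size(); i++) {
    const bool has_next = i + 1 < batches.size();
    if (has_next) {
      worker.submit([&, i] { next = stage1(batches[i + 1]); });
    }
    total += current * 2;  // stage 2 on the calling thread
    if (has_next) {
      worker.wait();       // stage 1 of the next batch is now complete
      current = next;
    }
  }
  return total;
}
```

With this shape, the cost of creating the thread is paid once for the whole run rather than once per batch, which is the effect the commit message attributes to the change for small batches.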

1587 Commits
