flushing out

2018-03-23 10:00:21 -04:00 · 2018-03-23 10:00:21 -04:00 · e6e3f42491
parent bc1331283a
commit e6e3f42491
1 changed files with 26 additions and 12 deletions
--- a/README.md
+++ b/README.md
@@ -1,27 +1,41 @@
-# simdjson
+# simdjson : Parsing gigabytes of JSON per second

-A *research* library. The purpose of this repository is to support an eventual paper.
+A *research* library. The purpose of this repository is to support an eventual paper. The immediate goal is not to produce a library for use in production systems. Of course, the end goal is, indeed, to have an impact on production systems.

 Goal: Speed up the parsing of JSON per se. No materialization.

-Parsing gigabytes of JSON per second
+## Architecture

+The parser works in three stages, with a fourth planned:

-## Todo
+- Stage 1. Quickly identifies structural elements, strings, and so forth. Currently, there is no validation (JSON is assumed to be correct).
+- Stage 2. Involves the "flattening" of the data from stage 1, that is, converting bitsets into arrays of indexes.
+- Stage 3. (Structure building) Involves constructing a "tree" of sorts to navigate through the data.
+- Stage 4. (Currently unimplemented) Iterates through the structure without "stalling" (fighting back against latency).
+
+## Todo (Priority)
+
+Geoff is unhappy with stage 3. He writes:

- Write unit tests
- Write bona fide, accurate benchmarks (with fair comparisons using good alternatives). Geoff wrote:
 > 44% of the time is in the tree construction. Of the remaining 56%, pretty much half is in the code to discover structural characters and half is in the naive code to flatten that out into a vector of u32 offsets (the 'iterate over set bits' problem).
- Document better the code, make the code easier to use
- Add some measure of error handling (maybe optional)
- Geoff wrote:
-> A future goal for Stage 3 will also be to thread together vectors or lists of similar structural elements (e.g. strings, objects, numbers, etc). A putative 'stage 4' will then be able to iterate in parallel fashion over these vectors (branchlessly, or at least without a pipeline-killing "Giant What I Am Doing Anyway" switch at the top) and transform them into more usable values. Some of this code is also inherently interesting (de-escaping strings, high speed atoi - an old favorite).
- Geoff wrote:
+
 > I'm focusing on the tree construction at the moment. I think we can abstract the structural characters to 3 operations during that stage (UP, DOWN, SIDEWAYS), batch them, and build out tree structure in bulk with data-driven SIMD operations rather than messing around with branches. It's probably OK to have a table with 3^^5 or even 3^^6 entries, and it's still probably OK to have some hard cases eliminated and handled on a slow path (e.g. someone hits you with 6 scope closes in a row, forcing you to pop 24 bytes into your tree construction stack). (...) I spend some time with SIMD and found myself going in circles. There's an utterly fantastic solution in there somewhere involving turning the tree building code into a transformation over multiple abstracted symbols (e.g. UP/UP/SIDEWAYS/DOWN). It looked great until I smacked up against the fact that the oh-so-elegant solution I had sketched out had, on its critical path, an unaligned store followed by load partially overlapping with that store, to maintain the stack of 'up' pointers. Ugh. (...) So I worked through about 6 alternate solutions of various levels of pretentiousness none of which made me particularly happy.

+> The structure building is too slow for my taste. I'm not sure I want too much richer functionality in it. Many of the transformations which I would like to do seem better done on the tree (i.e. pruning out every second " character). But this code takes 44% of our time; it's outrageous.


+Of course, stage 4 is totally unimplemented so it might be a priority as well:

+> A future goal for Stage 3 will also be to thread together vectors or lists of similar structural elements (e.g. strings, objects, numbers, etc). A putative 'stage 4' will then be able to iterate in parallel fashion over these vectors (branchlessly, or at least without a pipeline-killing "Giant What I Am Doing Anyway" switch at the top) and transform them into more usable values. Some of this code is also inherently interesting (de-escaping strings, high speed atoi - an old favorite).
+
+
+## Todo (Secondary)
+
+- Build up a paper (use overleaf.com)
+- Write unit tests
+- Write bona fide, accurate benchmarks (with fair comparisons using good alternatives).
+- Document the code better and make it easier to use
+- Add some measure of error handling (maybe optional)

 ## References

@@ -83,4 +97,4 @@ containing structural element ("up").
 - It seems that the use of  ``bzero`` is discouraged.
 - Per input byte,  multiple bytes are allocated which could potentially be a problem when processing a very large document, hence one might want to be more incremental in practice maybe to minimize memory usage. For really large documents, there might be caching issues as well.
 - The ``clmul`` thing is tricky but nice. (Geoff's remark:  find the spaces between quotes, is actually a ponderous way of doing parallel prefix over XOR, which a mathematically adept person would have realized could be done with clmul by -1. Not me, I had to look it up: http://bitmath.blogspot.com.au/2016/11/parallel-prefixsuffix-operations.html.)
- It is possible, though unlikely, that parallelizing the bitset decoding could be useful (https://lemire.me/blog/2018/03/08/iterating-over-set-bits-quickly-simd-edition/), and there is VCOMPRESSB (AVX-512)
+- It is possible, though maybe unlikely, that parallelizing the bitset decoding could be useful (https://lemire.me/blog/2018/03/08/iterating-over-set-bits-quickly-simd-edition/), and there is VCOMPRESSB (AVX-512)
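
Two of the ideas mentioned in this diff, the parallel prefix over XOR used to find the regions between quotes and the "iterate over set bits" flattening of stage 2, can be illustrated with a minimal scalar sketch. This is not the repository's code: the names `prefix_xor` and `flatten_bits` are hypothetical, the prefix XOR uses plain shifts in place of the `clmul` by -1 that the notes describe (the low 64 bits of the result are the same), and `__builtin_ctzll` assumes GCC or Clang.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: bit i of the result is the XOR of bits 0..i of x.
// Applied to a bitset of quote positions, the set bits of the result mark
// each opening quote and the string contents, but not the closing quote.
// A carry-less multiply by an all-ones word would yield the same low 64
// bits; shifts are used here only to keep the sketch portable.
inline uint64_t prefix_xor(uint64_t x) {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

// Hypothetical helper: appends the position of every set bit in `bits`,
// offset by `base`, to `out`. This is the naive scalar form of flattening
// a bitset into an array of indexes.
inline void flatten_bits(uint64_t bits, uint32_t base,
                         std::vector<uint32_t>& out) {
    while (bits != 0) {
        // Index of the lowest set bit (GCC/Clang builtin, assumed available).
        out.push_back(base + static_cast<uint32_t>(__builtin_ctzll(bits)));
        // Clear the lowest set bit.
        bits &= bits - 1;
    }
}
```

For example, with quotes at bit positions 1 and 4 (the word `0x12`), `prefix_xor` sets bits 1 through 3, and `flatten_bits(0x12, 64, out)` appends 65 and 68 for a word that starts at byte offset 64 of the input. The loop in `flatten_bits` runs once per set bit, which is why its cost depends on how dense the structural characters are.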