Update rapidjsondesign.md

This commit is contained in:
geofflangdale 2018-04-27 18:59:43 +10:00 committed by GitHub
parent 8d3c10ca74
commit fd56854c3f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 13 additions and 0 deletions

View File

@ -31,3 +31,16 @@ What RapidJSON does not do:
- Lazy evaluation (it figures out the content of strings and numbers)
- It does not do computed gotos
Geoff's Loose Theory on Why This Is Better Than You'd Think
The if there was such a thing as the high church of branch-free programming would allege the above code is a branch-miss nightmare. However, there are very likely regular patterns among our results here both due to the JSON grammar - if you just took a ParseObject you're going to be doing a ParseString pretty soon now - and due to the fact that many of the large scale inputs we have are very regular (i.e. contained objects are close to having a "schema" even though that's not a formal part of JSON, and dynamically we may do reasonably well).
Even if we take a branch miss per structural character we are looking at 14 cycles or so of penalty - which given a structural character rate of 0.1 gets us to 1.4. In fact, they do better than that (at least at a high level) as they may not be taking unpredictable branches on all structural characters - they may do very well with processing ":" characters in objects as well as "," characters in objects and arrays, and likely handle the second quote character in a string without much problem. So the penalty of branchiness is perhaps not enormous.
Also in their favor is that that SIMD scanning or simple input traversal is always appropriate to the actual input, while we are having to do "everything everywhere" (we are looking for odd-length backslash sequences outside of strings, and for structural characters inside of strings).
To investigate:
- At 4 cycles per byte, this result is quite a bit faster than most benchmarks. The Mison paper and all the JSON benchmarks I've dug up suggest 100-300 Megabytes per second, but 4 cycles or 4.7 cycles per byte would be close to 700-800MBps on most systems. This this because those benchmarks are more onerous than what RapidJSON is doing in our context?