Update Notes.md
Commit 17f8a4e6e7 (parent 080a636638)
Notes.md | 20
@ -43,8 +43,7 @@ over bit vectors to discover backslash sequences and quote pairs; we introduce b
 these properties.
 
 We also make use of our ability to quickly detect whitespace in this early stage. We can use another bit-vector based
-transformation to discover locations in our data that follow a structural character or quote followed by zero or more
-characters of whitespace; excluding locations within strings, and the structural characters we have already discovered,
+transformation to discover locations in our data that follow a structural character, quote, or whitespace character and are not themselves whitespace. Excluding locations within strings and the structural characters we have already discovered,
 these locations are the only place that we can expect to see the starts of the JSON 'atoms'. These locations are thus
 treated as 'structural' ('pseudo-structural characters').
 
@ -77,17 +76,10 @@ for each bitmap.
 
 ### Stage 3: Operation over indices
 
-The indices form a relatively concise map of structurally important parts of our JSON input. However, since JSON is
-recursively defined, we may nest structures (JSON "objects" and "arrays") inside other JSON structures. It is important
-to be able to quickly traverse portions of our JSON structure at any given level - it is trivial for us to move around
-in a way that follows the input text, but skipping to the next item at a given level may involve searching hundreds of
-bytes of text).
+This stage now works over a dual structure:
 
-We can construct a simple data structure that allows us to thread together such structures relatively simply; at this
-stage this code is not branch-free. We use an implicit 'stack' structure by virtue of threading together 'up-level
-pointers' within the structure as we build it (these are pointers that, for each item in the structure we have seen
-already, tell us which item in the structure that contains this one); to pop up a level, we simply follow one layer
-of 'up-level pointers'.
+1. The "state machine", whose role is to validate the sequence of structural characters and ensure that the input is at least generally structured like valid JSON (after this stage, the only permissible errors should be malformed atoms and numbers). If and only if the "state machine" accepts the input, then:
 
-An equivalent operation requiring an external data structure would be to maintain a stack that essentially describes
-all current levels of our structure as we traverse it; this may have performance advantages.
+2. The "tape machine" will have produced valid output. The tape machine works blindly over the characters, writing records to tapes. These records form a lean but somewhat traversable linked structure that, for valid inputs, should represent what we need to know about the JSON input.
+
+FIXME: a lot more detail is required on the operation of both these machines.
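To make the revised pseudo-structural rule concrete (a position that follows a structural character, quote, or whitespace, is not itself whitespace, and lies outside strings and the already-discovered structurals), here is a minimal Python sketch. It is not the SIMD implementation: the whole input sits in one arbitrary-width integer (bit i stands for byte i), backslash-escaped quotes are assumed absent, the start of input is treated as if it followed whitespace, and the helper names (`bitmask`, `prefix_xor`, `pseudo_structural_mask`, `indices`) are hypothetical.

```python
WHITESPACE = b" \t\n\r"
STRUCTURAL = b"{}[]:,"


def bitmask(data, chars):
    """Return an int whose bit i is set when data[i] is one of `chars`."""
    mask = 0
    for i, byte in enumerate(data):
        if byte in chars:
            mask |= 1 << i
    return mask


def prefix_xor(x, n):
    """Bit i of the result is the XOR of bits 0..i of x, within an n-bit window."""
    shift = 1
    while shift < n:
        x ^= x << shift
        shift <<= 1
    return x & ((1 << n) - 1)


def pseudo_structural_mask(data):
    n = len(data)
    full = (1 << n) - 1
    # Assumption: no backslash-escaped quotes, so every '"' opens or closes a string.
    quotes = bitmask(data, b'"')
    # Set from an opening quote through the string body; the closing quote is clear.
    in_string = prefix_xor(quotes, n)
    structural = bitmask(data, STRUCTURAL) & ~in_string
    whitespace = bitmask(data, WHITESPACE) & ~in_string
    # Qualifying predecessors: structural characters, quotes and whitespace.
    pred = structural | quotes | whitespace
    # Move each predecessor bit onto the following byte; bit 0 is set because
    # (by assumption) the start of input behaves like preceding whitespace.
    follows = ((pred << 1) | 1) & full
    return follows & ~whitespace & ~in_string & ~structural


def indices(mask):
    """List the positions of set bits, lowest first."""
    out = []
    while mask:
        low = mask & -mask
        out.append(low.bit_length() - 1)
        mask ^= low
    return out


print(indices(pseudo_structural_mask(b'{"a": 1, "b": [true, null]}')))  # [6, 15, 21]
```

The three reported positions are the first bytes of `1`, `true` and `null`. Key and string characters are suppressed by the in-string mask. The doubling loop in `prefix_xor` stands in for the carry-less multiplication a vectorized version would typically use to build the same mask.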
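Stage 3 is only outlined in these notes (the FIXME says as much), so the following is a toy Python sketch of the state-machine/tape-machine split, not a description of the real code. It assumes each structural or pseudo-structural index has already been reduced to a single-character token (`"` for a string start, any other non-structural character for an atom). The names `validate`, `build_tape` and `skip`, and the tape record layout (a token plus the position of its matching bracket), are hypothetical.

```python
STRUCTURAL_TOKENS = "{}[]:,"


def validate(tokens):
    """Toy 'state machine': accept only well-formed sequences of tokens."""
    stack = []            # currently open containers, '{' or '['
    state = "value"       # what the next token is allowed to be
    for t in tokens:
        if state in ("value", "value_or_close"):
            if state == "value_or_close" and t == "]":
                stack.pop()                   # empty array
                state = "after_value"
            elif t in "{[":
                stack.append(t)
                state = "key_or_close" if t == "{" else "value_or_close"
            elif t in STRUCTURAL_TOKENS:
                return False                  # '}', ']', ':' or ',' where a value belongs
            else:
                state = "after_value"         # string or atom
        elif state in ("key", "key_or_close"):
            if state == "key_or_close" and t == "}":
                stack.pop()                   # empty object
                state = "after_value"
            elif t == '"':
                state = "colon"
            else:
                return False
        elif state == "colon":
            if t != ":":
                return False
            state = "value"
        else:  # after_value
            if not stack:
                return False                  # extra token after the top-level value
            if t == ",":
                state = "key" if stack[-1] == "{" else "value"
            elif t == {"{": "}", "[": "]"}[stack[-1]]:
                stack.pop()                   # close the innermost container
            else:
                return False
    return state == "after_value" and not stack


def build_tape(tokens):
    """Toy 'tape machine': writes records blindly, linking each bracket to its partner."""
    tape = []
    open_positions = []   # explicit stack of unmatched '{' / '[' records
    for t in tokens:
        if t in ",:":
            continue                          # this toy tape does not record separators
        if t in "{[":
            open_positions.append(len(tape))
            tape.append((t, None))            # partner is filled in when the close arrives
        elif t in "}]":
            start = open_positions.pop() if open_positions else None
            if start is not None:
                tape[start] = (tape[start][0], len(tape))
            tape.append((t, start))
        else:
            tape.append((t, None))            # string or atom
    return tape


def skip(tape, i):
    """Position of the next record at the same nesting level as record i."""
    token, partner = tape[i]
    if token in "{[" and partner is not None:
        return partner + 1
    return i + 1


tokens = list('{":1,":[t,n]}')   # tokens for {"a": 1, "b": [true, null]}
if validate(tokens):
    tape = build_tape(tokens)
    print(skip(tape, 4))          # 8: from the '[' record straight to the closing '}'
```

The tape machine here never checks anything, and its output is only meaningful when `validate` accepts the same tokens, mirroring the "if and only if" relationship between the two machines. Storing each bracket's partner lets `skip` step over a whole nested array or object at once rather than walking every record inside it.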