* Allow -f
* Support parse -s (force SSE)
* Simplify flatten_bits
- Add directly to base instead of storing variable
- Don't modify base_ptr after beginning of function
- Eliminate base variable and increment base_ptr instead
* De-unroll the flatten_bits loops
* Decrease dependencies in stage 1
- Do all finalize_structurals work before computing the quote mask; mask
out the quote mask later
- Join find_whitespace_and_structurals and finalize_structurals into
single find_structurals call, to reduce variable leakage
- Rework pseudo_pred algorithm to refer to "primitive" for clarity and some
dependency reduction
- Rename quote_mask to in_string to describe what we're trying to
achieve ("mask" could mean many things)
- Break up find_quote_mask_and_bits into find_quote_mask and
invalid_string_bytes to reduce data leakage (i.e. don't expose quote bits
or odd_ends at all to find_structural_bits)
- Genericize overflow methods "follows" and "follows_odd_sequence" for
descriptiveness and possible lifting into a generic SIMD parsing library
* Mark branches as likely/unlikely
* Reorder and unroll+interleave stage 1 loop
* Nest the cnt > 16 branch inside cnt > 8
* Use generic each/reduce in simdutf8check
* Remove macros from generic simd_input uses
* Use array instead of members to store simd registers
* Default local checkperf to clone from .
* Add -n and -w arguments
* Add Dockerfile that compares perf against master
* Add checkperf to .drone.yml
* Clone from GitHub instead of .git since CI doesn't have .git
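
For reviewers, the combined effect of the flatten_bits items above (increment base_ptr directly, de-unrolled loops, branches marked unlikely, cnt > 16 nested inside cnt > 8) can be sketched roughly as follows. This is an illustrative sketch, not the code in this change: the signature, the `unlikely` and `trailing_zeroes` helpers, and the use of GCC/Clang builtins are all assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical branch hint helper; assumes GCC or Clang (__builtin_expect).
static inline bool unlikely(bool x) { return __builtin_expect(x, 0); }

// Hypothetical helper: index of the lowest set bit. Returns 64 for zero so
// that the speculative writes below stay well defined.
static inline uint32_t trailing_zeroes(uint64_t bits) {
  return bits ? static_cast<uint32_t>(__builtin_ctzll(bits)) : 64u;
}

// Writes idx + (position of each set bit in bits) to base_ptr, lowest bit
// first, then advances base_ptr by the number of set bits.
// Assumption: the caller leaves at least 16 writable slots past base_ptr,
// because the first two loops write 8 entries each regardless of cnt.
static inline void flatten_bits(uint32_t *&base_ptr, uint32_t idx, uint64_t bits) {
  if (bits == 0) return;
  const uint32_t cnt = static_cast<uint32_t>(__builtin_popcountll(bits));

  // Common case: at most 8 set bits. Always write 8 entries; the extras are
  // garbage that the next call overwrites.
  for (int i = 0; i < 8; i++) {
    base_ptr[i] = idx + trailing_zeroes(bits);
    bits &= bits - 1;  // clear the lowest set bit
  }

  // Rare cases are nested so the common path tests a single branch.
  if (unlikely(cnt > 8)) {
    for (int i = 8; i < 16; i++) {
      base_ptr[i] = idx + trailing_zeroes(bits);
      bits &= bits - 1;
    }
    if (unlikely(cnt > 16)) {
      for (uint32_t i = 16; i < cnt; i++) {
        base_ptr[i] = idx + trailing_zeroes(bits);
        bits &= bits - 1;
      }
    }
  }

  base_ptr += cnt;
}
```

Nesting the cnt > 16 test inside cnt > 8 means an input block with few structural characters pays for one predictable branch rather than two; whether that matches the measured gains here would need to be confirmed with checkperf.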