respond to @bhamiltoncx comments

parrt 2017-03-29 14:19:37 -07:00
parent 1fcb5951c6
commit 6dd9a3fbe3
2 changed files with 16 additions and 10 deletions

@@ -42,7 +42,7 @@ public class WriteBinaryFile {
 };
 public static void main(String[] args) throws IOException {
-Files.write(new File("resources/ips").toPath(), bytes);
+Files.write(new File("/tmp/ips").toPath(), bytes);
 }
 }
 ```
@@ -50,14 +50,14 @@ public class WriteBinaryFile {
 Now we need to create a stream of bytes satisfactory to ANTLR, which is as simple as:
 ```java
-ANTLRFileStream bytesAsChar = new ANTLRFileStream("resources/ips", "ISO-8859-1");
+CharStream bytesAsChar = CharStreams.fromFileName("/tmp/ips", StandardCharsets.ISO_8859_1);
 ```
 The `ISO-8859-1` encoding is just the 8-bit char encoding for LATIN-1, which effectively tells the stream to treat each byte as a character. That's what we want. Then we have the usual test rig:
 ```java
-//ANTLRFileStream bytesAsChar = new ANTLRFileStream("resources/ips", "ISO-8859-1"); DEPRECATED in 4.7
+//ANTLRFileStream bytesAsChar = new ANTLRFileStream("/tmp/ips", "ISO-8859-1"); DEPRECATED in 4.7
 CharStream bytesAsChar = CharStreams.fromFileName("/tmp/ips", StandardCharsets.ISO_8859_1);
 IPLexer lexer = new IPLexer(bytesAsChar);
 CommonTokenStream tokens = new CommonTokenStream(lexer);
@@ -127,7 +127,7 @@ class BinaryANTLRFileStream extends ANTLRFileStream {
 The new test code starts out like this:
 ```java
-ANTLRFileStream bytesAsChar = new BinaryANTLRFileStream("resources/ips");
+ANTLRFileStream bytesAsChar = new BinaryANTLRFileStream("/tmp/ips");
 IPLexer lexer = new IPLexer(bytesAsChar);
 ...
 ```
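The documentation in the diff above says that the `ISO-8859-1` encoding "effectively tells the stream to treat each byte as a character." That mapping can be checked with plain JDK calls, without ANTLR. A minimal sketch follows; the class name `Latin1BytesDemo`, the helper `decodeLatin1`, and the sample bytes (written as an IPv4 address) are hypothetical. It shows that each byte decodes to exactly one `char` whose value equals the unsigned byte value, and that re-encoding with the same charset returns the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1BytesDemo {
    // Decode raw bytes as ISO-8859-1 (LATIN-1): each byte 0x00-0xFF becomes
    // exactly one char whose value equals the unsigned byte value.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 192.168.0.255 as four raw bytes,
        // including values >= 0x80 that are negative as Java bytes
        byte[] bytes = {(byte) 192, (byte) 168, (byte) 0, (byte) 255};
        String chars = decodeLatin1(bytes);
        System.out.println(chars.length()); // 4: one char per byte
        for (int i = 0; i < chars.length(); i++) {
            // Unsigned byte value on the left, resulting char value on the right
            System.out.println((bytes[i] & 0xFF) + " -> " + (int) chars.charAt(i));
        }
        // Encoding back with the same charset recovers the original bytes
        byte[] roundTrip = chars.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(bytes, roundTrip)); // true
    }
}
```

Because the mapping is one-to-one in both directions, a lexer reading such a stream can match byte values directly as character values.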

@@ -4,10 +4,10 @@ Prior to ANTLR 4.7, generated lexers only supported part of the Unicode standard
 long as the input `CharStream` is opened using `CharStreams.fromPath()`, `CharStreams.fromFileName()`, etc...
 or the equivalent method for your runtime's language.
-The deprecated `ANTLRInputStream` and `ANTLRFileStream` APIs only support Unicode code points up to `U+FFFF`.
+The deprecated `ANTLRInputStream` and `ANTLRFileStream` *Java-target* APIs only support Unicode code points up to `U+FFFF`.
 A big shout out to Ben Hamilton (github bhamiltoncx) for his superhuman
-efforts across all targets to get true Unicode 3.1 support for U+10FFFF.
+efforts across all targets to get true support for U+10FFFF code points.
 ## Example
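The `U+FFFF` ceiling mentioned for the deprecated Java-target streams lines up with how Java represents text: a `char` is a 16-bit UTF-16 code unit, so a code point above `U+FFFF` occupies two `char`s (a surrogate pair). The JDK-only sketch below does not use ANTLR; the class name is hypothetical, and it assumes, as the stated limitation suggests, that the old streams index their input one `char` at a time. It shows what such a reader sees for U+1F600.

```java
public class SupplementaryCodePointDemo {
    // Returns {UTF-16 code units (Java chars), Unicode code points} for s
    static int[] unitsAndCodePoints(String s) {
        return new int[] {s.length(), s.codePointCount(0, s.length())};
    }

    public static void main(String[] args) {
        // U+1F600 (GRINNING FACE) lies above U+FFFF, outside the Basic Multilingual Plane
        String s = new String(Character.toChars(0x1F600));
        int[] counts = unitsAndCodePoints(s);
        System.out.println("chars: " + counts[0]);       // chars: 2
        System.out.println("code points: " + counts[1]); // code points: 1
        // Reading one char at a time yields two surrogate halves, not one symbol
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        // Reading by code point recovers the single value
        System.out.printf("U+%X%n", s.codePointAt(0)); // U+1F600
    }
}
```

A lexer rule written against the single code point U+1F600 can only match if the stream hands it code points rather than UTF-16 halves, which is the behavior the `CharStreams` factory methods are described as providing.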
@@ -61,7 +61,7 @@ Code for **4.6** looked like this:
 ```java
-CharStream input = new ANTLRFileStream("myinputfile")
+CharStream input = new ANTLRFileStream("myinputfile");
 JavaLexer lexer = new JavaLexer(input);
 CommonTokenStream tokens = new CommonTokenStream(lexer);
 ```
@@ -77,7 +77,7 @@ CommonTokenStream tokens = new CommonTokenStream(lexer);
 Or, if you'd like to specify the file encoding:
 ```java
-CharStream input = CharStreams.fromFileName("inputfile", StandardCharsets.UTF_16);
+CharStream input = CharStreams.fromFileName("inputfile", Charset.forName("windows-1252"));
 ```
 ### Motivation
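The updated encoding example switches from `StandardCharsets.UTF_16` to `Charset.forName("windows-1252")`; `StandardCharsets` has no constant for windows-1252, so the charset is looked up by name. The charset choice changes what the lexer sees even for identical bytes. A JDK-only sketch follows; the class name and the helpers `decode` and `codes` are hypothetical, and it assumes the JVM ships windows-1252 (mainstream JDKs do, but it is not among the charsets every Java implementation must support, so the code checks first).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetChoiceDemo {
    // Decode the same raw bytes under the given charset
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Render each char of s as a U+XXXX code, separated by spaces
    static String codes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // windows-1252 is not a guaranteed standard charset, so check availability first
        if (!Charset.isSupported("windows-1252")) {
            System.out.println("windows-1252 is not available on this JVM");
            return;
        }
        byte[] bytes = {(byte) 0x80, (byte) 0x93, 0x41}; // 0x41 is ASCII 'A'
        System.out.println("windows-1252: " + codes(decode(bytes, Charset.forName("windows-1252"))));
        System.out.println("ISO-8859-1:   " + codes(decode(bytes, StandardCharsets.ISO_8859_1)));
    }
}
```

On a JVM with windows-1252 this prints `U+20AC U+201C U+0041` for windows-1252 (euro sign, left double quotation mark, `A`) and `U+0080 U+0093 U+0041` for ISO-8859-1, where 0x80 and 0x93 are C1 control characters. Passing the wrong charset therefore silently feeds the lexer different characters rather than failing.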
@@ -112,7 +112,13 @@ of unbuffered input. See the [ANTLR 4 book](https://www.amazon.com/Definitive-AN
 useful for processing infinite streams *during the parse* and require that you manually buffer characters. Use `UnbufferedCharStream` and `UnbufferedTokenStream`.
 ```java
-CharStream input = new UnbufferedCharStream(is); CSVLexer lex = new CSVLexer(input); // copy text out of sliding buffer and store in tokens lex.setTokenFactory(new CommonTokenFactory(true)); TokenStream tokens = new UnbufferedTokenStream<CommonToken>(lex); CSVParser parser = new CSVParser(tokens); parser.setBuildParseTree(false); parser.file();
+CharStream input = new UnbufferedCharStream(is);
+CSVLexer lex = new CSVLexer(input); // copy text out of sliding buffer and store in tokens
+lex.setTokenFactory(new CommonTokenFactory(true));
+TokenStream tokens = new UnbufferedTokenStream<CommonToken>(lex);
+CSVParser parser = new CSVParser(tokens);
+parser.setBuildParseTree(false);
+parser.file();
 ```
 Your grammar that needs to have embedded actions that access the tokens as they are created, but before they disappear and are garbage collected. For example,
@@ -133,4 +139,4 @@ implementation of `CharStream.getText` in
 allows `Token.getText` to be called at any time regardless of the
 input stream implementation.
-*Currently, only Java and C# have these unbuffered streams implemented*.
+*Currently, only Java, C++, and C# have these unbuffered streams implemented*.