Merge remote-tracking branch 'upstream/master'

thomasb8138 2017-03-02 22:41:01 +01:00
commit b2b799b425
61 changed files with 1191 additions and 301 deletions

View File

@ -137,4 +137,4 @@ YYYY/MM/DD, github id, Full name, email
2017/02/14, lecode-official, David Neumann, david.neumann@lecode.de
2017/02/14, xied75, Dong Xie, xied75@gmail.com
2017/02/20, Thomasb81, Thomas Burg, thomasb81@gmail.com
2017/02/26, jvasileff, John Vasileff, john@vasileff.com

View File

@ -106,16 +106,12 @@ For gcc and clang it is possible to use the `-fvisibility=hidden` setting to hid
Since C++ has no built-in memory management we need to take extra care. For that we rely mostly on smart pointers, which however can incur time penalties or memory side effects (like cyclic references) if not used with care. Currently, however, memory management looks very stable. Generally, when you see a raw pointer in code, consider it as being managed elsewhere. You should never try to manage such a pointer yourself (delete it, assign it to a smart pointer, etc.).
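To illustrate this ownership rule, here is a minimal, hedged sketch of typical driver code (the grammar name `MyGrammar`, its generated `MyGrammarLexer`/`MyGrammarParser` classes, and the start rule `program` are hypothetical placeholders, not part of this commit):
```cpp
// Hedged sketch: MyGrammarLexer, MyGrammarParser and the rule program() are
// hypothetical; substitute the classes generated from your own grammar.
#include <iostream>

#include "antlr4-runtime.h"
#include "MyGrammarLexer.h"
#include "MyGrammarParser.h"

int main() {
  antlr4::ANTLRInputStream input("some input text");
  MyGrammarLexer lexer(&input);
  antlr4::CommonTokenStream tokens(&lexer);
  MyGrammarParser parser(&tokens);

  // The returned context is a raw pointer owned by the parser's internal
  // tracker: observe it, but never delete it or hand it to a smart pointer.
  antlr4::tree::ParseTree *tree = parser.program();
  std::cout << tree->toStringTree(&parser) << std::endl;
  return 0;
}
```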
### Unicode Support
Encoding is mostly an input issue, i.e. it matters when the lexer converts text input into lexer tokens. The parser is completely encoding unaware. However, lexer input in the grammar is defined by character ranges with a single member (e.g. 'a' or [a] or [abc]), an explicit range (e.g. 'a'..'z' or [a-z]), the full Unicode range (for a wildcard), or the full Unicode range minus a sub range (for negated ranges, e.g. ~[a]). The explicit ranges (including single-member ranges) are encoded in the serialized ATN as 16-bit numbers and hence cannot reach beyond 0xFFFF (the Unicode BMP), while the implicit ranges can include any value (and hence support the full Unicode set, up to 0x10FFFF).
Encoding is mostly an input issue, i.e. when the lexer converts text input into lexer tokens. The parser is completely encoding unaware.
> An interesting side note here is that the Java target fully supports Unicode as well, despite the inherent limitations of the serialized ATN. That's possible because the Java String class represents characters beyond the BMP as surrogate pairs (two 16-bit values) and even reads them as two separate input characters. To make this work, a character range for an identifier in a grammar must include the surrogate pairs area (for a Java parser).
The C++ target however always expects UTF-8 input (either in a string or via a wide stream) which is then converted to UTF-32 (a char32_t array) and fed to the lexer. ANTLR, when parsing your grammar, limits character ranges explicitly to the BMP currently. So, in order to allow specifying the full Unicode set the C++ target uses a little trick: whenever an explicit character range includes the (unused) codepoint 0xFFFF in a grammar it is silently extended to the full Unicode range. It's clear that this is an all-or-nothing solution. You cannot define a subset of Unicode codepoints > 0xFFFF that way. This can only be solved if ANTLR supports larger character intervals.
The differences in handling characters beyond the BMP lead to a difference between Java and C++ lexers: the character offsets may not match. This is because Java reads two 16-bit values per Unicode character (if it falls into the surrogate area) while a C++ parser reads only one 32-bit value. That usually doesn't have practical consequences, but it might confuse people when comparing token positions.
The C++ target always expects UTF-8 input (either in a string or stream) which is then converted to UTF-32 (a char32_t array) and fed to the lexer.
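As a hedged sketch of that pipeline (again using a hypothetical generated `MyGrammarLexer`), the input stream is handed UTF-8 bytes and the runtime converts them to UTF-32 before the lexer sees them, so token offsets count one unit per code point rather than per UTF-16 surrogate:
```cpp
// Hedged sketch: MyGrammarLexer is a hypothetical generated lexer.
#include <iostream>
#include <string>

#include "antlr4-runtime.h"
#include "MyGrammarLexer.h"

int main() {
  // UTF-8 bytes for: id = <U+1F4A9> ;   (a code point above U+FFFF)
  std::string utf8Text = "id = \xF0\x9F\x92\xA9 ;";

  antlr4::ANTLRInputStream input(utf8Text); // expects UTF-8, converted to UTF-32 internally
  MyGrammarLexer lexer(&input);
  antlr4::CommonTokenStream tokens(&lexer);
  tokens.fill();

  // Each token's start/stop index is measured in code points (char32_t units).
  for (antlr4::Token *token : tokens.getTokens()) {
    std::cout << token->toString() << std::endl;
  }
  return 0;
}
```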
### Named Actions
To help customize the generated files there are a number of additional so-called **named actions**. These actions are tied to specific areas in the generated code and allow you to add custom (target-specific) code. All targets support these actions:
* @parser::header
* @parser::members

View File

@ -58,13 +58,29 @@ Match that character or sequence of characters. E.g., while or =.</t
<tr>
<td>[char set]</td><td>
Match one of the characters specified in the character set. Interpret x-y as the set of characters between x and y, inclusive. The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, and \f. To get ], \, or - you must escape them with \. You can also use Unicode character specifications: \uXXXX. Here are a few examples:
<p>Match one of the characters specified in the character set. Interpret <tt>x-y</tt> as the set of characters between <tt>x</tt> and <tt>y</tt>, inclusive. The following escaped characters are interpreted as single special characters: <tt>\n</tt>, <tt>\r</tt>, <tt>\b</tt>, <tt>\t</tt>, <tt>\f</tt>, <tt>\uXXXX</tt>, and <tt>\u{XXXXXX}</tt>. To get <tt>]</tt>, <tt>\</tt>, or <tt>-</tt> you must escape them with <tt>\</tt>.</p>
<p>You can also include all characters matching Unicode properties (general category, boolean, script, or block) with <tt>\p{PropertyName}</tt>. (You can invert the test with <tt>\P{PropertyName}</tt>).</p>
<p>For a list of valid Unicode property names, see <a href="http://unicode.org/reports/tr44/#Properties">Unicode Standard Annex #44</a>. (ANTLR also supports <a href="http://unicode.org/reports/tr44/#General_Category_Values">short and long Unicode general category names</a> like <tt>\p{Lu}</tt>, <tt>\p{Z}</tt>, and <tt>\p{Symbol}</tt>.)</p>
<p>Property names include <a href="http://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt">Unicode block names</a> prefixed with <tt>In</tt> (they overlap with script names) and with spaces changed to <tt>_</tt>. For example: <tt>\p{InLatin_1_Supplement}</tt>, <tt>\p{InYijing_Hexagram_Symbols}</tt>, and <tt>\p{InAncient_Greek_Numbers}</tt>.</p>
<p>Property names are <b>case-insensitive</b>, and <tt>_</tt> and <tt>-</tt> are treated identically.</p>
<p>Here are a few examples:</p>
<pre>
WS : [ \n\u000D] -> skip ; // same as [ \n\r]
UNICODE_WS : [\p{White_Space}] -> skip; // match all Unicode whitespace
ID : [a-zA-Z] [a-zA-Z0-9]* ; // match usual identifier spec
UNICODE_ID : [\p{Alpha}] [\p{Alnum}]* ; // match full Unicode alphabetic ids
EMOJI : [\u{1F4A9}\u{1F926}] ; // note Unicode code points > U+FFFF
DASHBRACK : [\-\]]+ ; // match - or ] one or more times
</pre>
</td>

View File

@ -81,7 +81,11 @@ These more or less correspond to `isJavaIdentifierPart` and `isJavaIdentifierSta
ANTLR does not distinguish between character and string literals as most languages do. All literal strings one or more characters in length are enclosed in single quotes such as `;`, `if`, `>=`, and `\'` (refers to the one-character string containing the single quote character). Literals never contain regular expressions.
Literals can contain Unicode escape sequences of the form `\uXXXX`, where XXXX is the hexadecimal Unicode character value. For example, `\u00E8` is the French letter with a grave accent: `'è'`. ANTLR also understands the usual special escape sequences: `\n` (newline), `\r` (carriage return), `\t` (tab), `\b` (backspace), and `\f` (form feed). You can use Unicode characters directly within literals or use the Unicode escape sequences:
Literals can contain Unicode escape sequences of the form `\uXXXX` (for Unicode code points up to `U+FFFF`) or `\u{XXXXXX}` (for all Unicode code points), where `XXXX` is the hexadecimal Unicode code point value.
For example, `\u00E8` is the French letter with a grave accent: `'è'`, and `\u{1F4A9}` is the famous emoji: `'💩'`.
ANTLR also understands the usual special escape sequences: `\n` (newline), `\r` (carriage return), `\t` (tab), `\b` (backspace), and `\f` (form feed). You can use Unicode code points directly within literals or use the Unicode escape sequences:
```
grammar Foreign;

View File

@ -482,11 +482,14 @@ public class BaseCppTest implements RuntimeTestSupport {
String os = System.getProperty("os.name", "generic").toLowerCase(Locale.ENGLISH);
if ((os.indexOf("mac") >= 0) || (os.indexOf("darwin") >= 0)) {
detectedOS = "mac";
} else if (os.indexOf("win") >= 0) {
}
else if (os.indexOf("win") >= 0) {
detectedOS = "windows";
} else if (os.indexOf("nux") >= 0) {
}
else if (os.indexOf("nux") >= 0) {
detectedOS = "linux";
} else {
}
else {
detectedOS = "unknown";
}
}

View File

@ -48,7 +48,6 @@ import java.io.File;
import java.io.FileFilter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
@ -215,7 +214,8 @@ public class BaseGoTest implements RuntimeTestSupport {
ParserATNFactory f;
if (g.isLexer()) {
f = new LexerATNFactory((LexerGrammar) g);
} else {
}
else {
f = new ParserATNFactory(g);
}
@ -286,7 +286,8 @@ public class BaseGoTest implements RuntimeTestSupport {
ttype = interp.match(input, Lexer.DEFAULT_MODE);
if (ttype == Token.EOF) {
tokenTypes.add("EOF");
} else {
}
else {
tokenTypes.add(lg.typeToTokenList.get(ttype));
}
@ -369,7 +370,8 @@ public class BaseGoTest implements RuntimeTestSupport {
this.stderrDuringParse = null;
if (parserName == null) {
writeLexerTestFile(lexerName, false);
} else {
}
else {
writeParserTestFile(parserName, lexerName, listenerName,
visitorName, parserStartRuleName, debug);
}
@ -752,7 +754,8 @@ public class BaseGoTest implements RuntimeTestSupport {
String parserStartRuleName, boolean debug) {
if (parserName == null) {
writeLexerTestFile(lexerName, debug);
} else {
}
else {
writeParserTestFile(parserName, lexerName, listenerName,
visitorName, parserStartRuleName, debug);
}
@ -776,7 +779,8 @@ public class BaseGoTest implements RuntimeTestSupport {
for (File file : files) {
if (file.isDirectory()) {
eraseDirectory(file);
} else {
}
else {
file.delete();
}
}

View File

@ -45,7 +45,6 @@ import org.stringtemplate.v4.STGroup;
import org.stringtemplate.v4.STGroupString;
import java.io.File;
import java.io.FileWriter;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
@ -147,7 +146,8 @@ public class BaseNodeTest implements RuntimeTestSupport {
ParserATNFactory f;
if (g.isLexer()) {
f = new LexerATNFactory((LexerGrammar) g);
} else {
}
else {
f = new ParserATNFactory(g);
}
@ -218,7 +218,8 @@ public class BaseNodeTest implements RuntimeTestSupport {
ttype = interp.match(input, Lexer.DEFAULT_MODE);
if (ttype == Token.EOF) {
tokenTypes.add("EOF");
} else {
}
else {
tokenTypes.add(lg.typeToTokenList.get(ttype));
}
@ -310,7 +311,8 @@ public class BaseNodeTest implements RuntimeTestSupport {
this.stderrDuringParse = null;
if (parserName == null) {
writeLexerTestFile(lexerName, false);
} else {
}
else {
writeParserTestFile(parserName, lexerName, listenerName,
visitorName, parserStartRuleName, debug);
}
@ -698,7 +700,8 @@ public class BaseNodeTest implements RuntimeTestSupport {
String parserStartRuleName, boolean debug) {
if (parserName == null) {
writeLexerTestFile(lexerName, debug);
} else {
}
else {
writeParserTestFile(parserName, lexerName, listenerName,
visitorName, parserStartRuleName, debug);
}

View File

@ -135,7 +135,8 @@ public class BaseSwiftTest implements RuntimeTestSupport {
String prop = System.getProperty(propName);
if (prop != null && prop.length() > 0) {
tmpdir = prop;
} else {
}
else {
tmpdir = new File(System.getProperty("java.io.tmpdir"), getClass().getSimpleName() +
"-" + Thread.currentThread().getName() + "-" + System.currentTimeMillis()).getAbsolutePath();
}
@ -372,7 +373,8 @@ public class BaseSwiftTest implements RuntimeTestSupport {
outputFileST.add("profile",
"let profiler = ProfilingATNSimulator(parser)\n" +
"parser.setInterpreter(profiler)");
} else {
}
else {
outputFileST.add("profile", new ArrayList<Object>());
}
outputFileST.add("createParser", createParserST);

View File

@ -38,7 +38,7 @@ void myBarLexerAction() { /* do something*/ };
channels { CommentsChannel, DirectiveChannel }
tokens {
DUMMY
}
Return: 'return';
@ -48,7 +48,7 @@ INT: Digit+;
Digit: [0-9];
ID: LETTER (LETTER | '0'..'9')*;
fragment LETTER : [a-zA-Z\u0080-\uFFFF];
fragment LETTER : [a-zA-Z\u0080-\u{10FFFF}];
LessThan: '<';
GreaterThan: '>';
@ -68,7 +68,7 @@ QuestionMark: '?';
Comma: ',' -> skip;
Dollar: '$' -> more, mode(Mode1);
Ampersand: '&' -> type(DUMMY);
String: '"' .*? '"';
Foo: {canTestFoo()}? 'foo' {isItFoo()}? { myFooLexerAction(); };
Bar: 'bar' {isItBar()}? { myBarLexerAction(); };

View File

@ -93,8 +93,9 @@ namespace antlr4 {
/// Tests whether or not {@code recognizer} is in the process of recovering
/// from an error. In error recovery mode, <seealso cref="Parser#consume"/> adds
/// symbols to the parse tree by calling
/// <seealso cref="ParserRuleContext#addErrorNode(Token)"/> instead of
/// <seealso cref="ParserRuleContext#addChild(Token)"/>.
/// {@link Parser#createErrorNode(ParserRuleContext, Token)} then
/// {@link ParserRuleContext#addErrorNode(ErrorNode)} instead of
/// {@link Parser#createTerminalNode(ParserRuleContext, Token)}.
/// </summary>
/// <param name="recognizer"> the parser instance </param>
/// <returns> {@code true} if the parser is currently recovering from a parse

View File

@ -8,6 +8,7 @@
#include "dfa/DFA.h"
#include "ParserRuleContext.h"
#include "tree/TerminalNode.h"
#include "tree/ErrorNodeImpl.h"
#include "Lexer.h"
#include "atn/ParserATNSimulator.h"
#include "misc/IntervalSet.h"
@ -111,7 +112,7 @@ Token* Parser::match(size_t ttype) {
if (_buildParseTrees && t->getTokenIndex() == INVALID_INDEX) {
// we must have conjured up a new token during single token insertion
// if it's not the current symbol
_ctx->addErrorNode(_tracker, t);
_ctx->addChild(createErrorNode(t));
}
}
return t;
@ -127,7 +128,7 @@ Token* Parser::matchWildcard() {
if (_buildParseTrees && t->getTokenIndex() == INVALID_INDEX) {
// we must have conjured up a new token during single token insertion
// if it's not the current symbol
_ctx->addErrorNode(_tracker, t);
_ctx->addChild(createErrorNode(t));
}
}
@ -293,17 +294,19 @@ Token* Parser::consume() {
if (o->getType() != EOF) {
getInputStream()->consume();
}
bool hasListener = !_parseListeners.empty();
if (_buildParseTrees || hasListener) {
if (_errHandler->inErrorRecoveryMode(this)) {
tree::ErrorNode* node = _ctx->addErrorNode(_tracker, o);
tree::ErrorNode *node = createErrorNode(o);
_ctx->addChild(node);
if (_parseListeners.size() > 0) {
for (auto listener : _parseListeners) {
listener->visitErrorNode(node);
}
}
} else {
tree::TerminalNode *node = _ctx->addChild(_tracker, o);
tree::TerminalNode *node = _ctx->addChild(createTerminalNode(o));
if (_parseListeners.size() > 0) {
for (auto listener : _parseListeners) {
listener->visitTerminal(node);
@ -617,6 +620,14 @@ bool Parser::isTrace() const {
return _tracer != nullptr;
}
tree::TerminalNode *Parser::createTerminalNode(Token *t) {
return _tracker.createInstance<tree::TerminalNodeImpl>(t);
}
tree::ErrorNode *Parser::createErrorNode(Token *t) {
return _tracker.createInstance<tree::ErrorNodeImpl>(t);
}
void Parser::InitializeInstanceFields() {
_errHandler = std::make_shared<DefaultErrorStrategy>();
_precedenceStack.clear();

View File

@ -54,13 +54,14 @@ namespace antlr4 {
/// Match current input symbol against {@code ttype}. If the symbol type
/// matches, <seealso cref="ANTLRErrorStrategy#reportMatch"/> and <seealso cref="#consume"/> are
/// called to complete the match process.
/// <p/>
///
/// If the symbol type does not match,
/// <seealso cref="ANTLRErrorStrategy#recoverInline"/> is called on the current error
/// strategy to attempt recovery. If <seealso cref="#getBuildParseTree"/> is
/// {@code true} and the token index of the symbol returned by
/// <seealso cref="ANTLRErrorStrategy#recoverInline"/> is -1, the symbol is added to
/// the parse tree by calling <seealso cref="ParserRuleContext#addErrorNode"/>.
/// the parse tree by calling {@link #createErrorNode(ParserRuleContext, Token)} then
/// {@link ParserRuleContext#addErrorNode(ErrorNode)}.
/// </summary>
/// <param name="ttype"> the token type to match </param>
/// <returns> the matched symbol </returns>
@ -258,11 +259,11 @@ namespace antlr4 {
/// </pre>
///
/// If the parser is not in error recovery mode, the consumed symbol is added
/// to the parse tree using <seealso cref="ParserRuleContext#addChild(Token)"/>, and
/// to the parse tree using <seealso cref="ParserRuleContext#addChild(TerminalNode)"/>, and
/// <seealso cref="ParseTreeListener#visitTerminal"/> is called on any parse listeners.
/// If the parser <em>is</em> in error recovery mode, the consumed symbol is
/// added to the parse tree using
/// <seealso cref="ParserRuleContext#addErrorNode(Token)"/>, and
/// added to the parse tree using {@link #createErrorNode(ParserRuleContext, Token)} then
/// {@link ParserRuleContext#addErrorNode(ErrorNode)} and
/// <seealso cref="ParseTreeListener#visitErrorNode"/> is called on any parse
/// listeners.
virtual Token* consume();
@ -376,6 +377,30 @@ namespace antlr4 {
tree::ParseTreeTracker& getTreeTracker() { return _tracker; };
/** How to create a token leaf node associated with a parent.
* Typically, the terminal node to create is not a function of the parent
* but this method must still set the parent pointer of the terminal node
* returned. I would prefer having {@link ParserRuleContext#addAnyChild(ParseTree)}
* set the parent pointer, but the parent pointer is implementation dependent
* and currently there is no setParent() in {@link TerminalNode} (and we can't
* add a method in Java 1.7 without breaking backward compatibility).
*
* @since 4.7
*/
tree::TerminalNode *createTerminalNode(Token *t);
/** How to create an error node, given a token, associated with a parent.
* Typically, the error node to create is not a function of the parent
* but this method must still set the parent pointer of the error node
* returned. I would prefer having {@link ParserRuleContext#addAnyChild(ParseTree)}
* set the parent pointer, but the parent pointer is implementation dependent
* and currently there is no setParent() in {@link ErrorNode} (and we can't
* add a method in Java 1.7 without breaking backward compatibility).
*
* @since 4.7
*/
tree::ErrorNode *createErrorNode(Token *t);
protected:
/// The ParserRuleContext object for the currently executing rule.
/// This is always non-null during the parsing process.

View File

@ -23,6 +23,7 @@
#include "Vocabulary.h"
#include "InputMismatchException.h"
#include "CommonToken.h"
#include "tree/ErrorNode.h"
#include "support/CPPUtils.h"
@ -288,14 +289,14 @@ void ParserInterpreter::recover(RecognitionException &e) {
_errorToken = getTokenFactory()->create({ tok->getTokenSource(), tok->getTokenSource()->getInputStream() },
expectedTokenType, tok->getText(), Token::DEFAULT_CHANNEL, INVALID_INDEX, INVALID_INDEX, // invalid start/stop
tok->getLine(), tok->getCharPositionInLine());
_ctx->addErrorNode(_tracker, _errorToken.get());
_ctx->addChild(createErrorNode(_errorToken.get()));
}
else { // NoViableAlt
Token *tok = e.getOffendingToken();
_errorToken = getTokenFactory()->create({ tok->getTokenSource(), tok->getTokenSource()->getInputStream() },
Token::INVALID_TYPE, tok->getText(), Token::DEFAULT_CHANNEL, INVALID_INDEX, INVALID_INDEX, // invalid start/stop
tok->getLine(), tok->getCharPositionInLine());
_ctx->addErrorNode(_tracker, _errorToken.get());
_ctx->addChild(createErrorNode(_errorToken.get()));
}
}
}

View File

@ -3,7 +3,8 @@
* can be found in the LICENSE.txt file in the project root.
*/
#include "tree/ErrorNodeImpl.h"
#include "tree/TerminalNode.h"
#include "tree/ErrorNode.h"
#include "misc/Interval.h"
#include "Parser.h"
#include "Token.h"
@ -34,6 +35,22 @@ void ParserRuleContext::copyFrom(ParserRuleContext *ctx) {
this->start = ctx->start;
this->stop = ctx->stop;
// copy any error nodes to alt label node
if (!ctx->children.empty()) {
for (auto child : ctx->children) {
auto errorNode = dynamic_cast<ErrorNode *>(child);
if (errorNode != nullptr) {
errorNode->setParent(this);
children.push_back(errorNode);
}
}
// Remove the just reparented error nodes from the source context.
ctx->children.erase(std::remove_if(ctx->children.begin(), ctx->children.end(), [this](tree::ParseTree *e) -> bool {
return std::find(children.begin(), children.end(), e) != children.end();
}), ctx->children.end());
}
}
void ParserRuleContext::enterRule(tree::ParseTreeListener * /*listener*/) {
@ -43,6 +60,7 @@ void ParserRuleContext::exitRule(tree::ParseTreeListener * /*listener*/) {
}
tree::TerminalNode* ParserRuleContext::addChild(tree::TerminalNode *t) {
t->setParent(this);
children.push_back(t);
return t;
}
@ -58,20 +76,6 @@ void ParserRuleContext::removeLastChild() {
}
}
tree::TerminalNode* ParserRuleContext::addChild(ParseTreeTracker &tracker, Token *matchedToken) {
auto t = tracker.createInstance<tree::TerminalNodeImpl>(matchedToken);
addChild(t);
t->parent = this;
return t;
}
tree::ErrorNode* ParserRuleContext::addErrorNode(ParseTreeTracker &tracker, Token *badToken) {
auto t = tracker.createInstance<tree::ErrorNodeImpl>(badToken);
addChild(t);
t->parent = this;
return t;
}
tree::TerminalNode* ParserRuleContext::getToken(size_t ttype, size_t i) {
if (i >= children.size()) {
return nullptr;

View File

@ -70,7 +70,8 @@ namespace antlr4 {
virtual ~ParserRuleContext() {}
/** COPY a ctx (I'm deliberately not using copy constructor) to avoid
* confusion with creating node with parent. Does not copy children.
* confusion with creating node with parent. Does not copy children
* (except error leaves).
*/
virtual void copyFrom(ParserRuleContext *ctx);
@ -80,7 +81,7 @@ namespace antlr4 {
virtual void enterRule(tree::ParseTreeListener *listener);
virtual void exitRule(tree::ParseTreeListener *listener);
/// Does not set parent link; other add methods do that.
/** Add a token leaf node child and force its parent to be this node. */
tree::TerminalNode* addChild(tree::TerminalNode *t);
RuleContext* addChild(RuleContext *ruleInvocation);
@ -89,9 +90,6 @@ namespace antlr4 {
/// generic ruleContext object.
virtual void removeLastChild();
virtual tree::TerminalNode* addChild(tree::ParseTreeTracker &tracker, Token *matchedToken);
virtual tree::ErrorNode* addErrorNode(tree::ParseTreeTracker &tracker, Token *badToken);
virtual tree::TerminalNode* getToken(size_t ttype, std::size_t i);
virtual std::vector<tree::TerminalNode *> getTokens(size_t ttype);
@ -132,14 +130,14 @@ namespace antlr4 {
* Note that the range from start to stop is inclusive, so for rules that do not consume anything
* (for example, zero length or error productions) this token may exceed stop.
*/
virtual Token*getStart();
virtual Token *getStart();
/**
* Get the final token in this context.
* Note that the range from start to stop is inclusive, so for rules that do not consume anything
* (for example, zero length or error productions) this token may precede start.
*/
virtual Token* getStop();
virtual Token *getStop();
/// <summary>
/// Used for rule context info debugging during parse-time, not so much for ATN debugging </summary>

View File

@ -203,11 +203,13 @@ void TokenStreamRewriter::Delete(Token *from, Token *to) {
}
void TokenStreamRewriter::Delete(const std::string &programName, size_t from, size_t to) {
replace(programName, from, to, nullptr);
std::string nullString;
replace(programName, from, to, nullString);
}
void TokenStreamRewriter::Delete(const std::string &programName, Token *from, Token *to) {
replace(programName, from, to, nullptr);
std::string nullString;
replace(programName, from, to, nullString);
}
size_t TokenStreamRewriter::getLastRewriteTokenIndex() {

View File

@ -13,6 +13,17 @@ namespace tree {
class ANTLR4CPP_PUBLIC TerminalNode : public ParseTree {
public:
virtual Token* getSymbol() = 0;
/** Set the parent for this leaf node.
*
* Technically, this is not backward compatible as it changes
* the interface but no one was able to create custom
* TerminalNodes anyway so I'm adding it as it improves internal
* code quality.
*
* @since 4.7
*/
virtual void setParent(RuleContext *parent) = 0;
};
} // namespace tree

View File

@ -5,6 +5,7 @@
#include "misc/Interval.h"
#include "Token.h"
#include "RuleContext.h"
#include "tree/ParseTreeVisitor.h"
#include "tree/TerminalNodeImpl.h"
@ -19,6 +20,10 @@ Token* TerminalNodeImpl::getSymbol() {
return symbol;
}
void TerminalNodeImpl::setParent(RuleContext *parent) {
this->parent = parent;
}
misc::Interval TerminalNodeImpl::getSourceInterval() {
if (symbol == nullptr) {
return misc::Interval::INVALID;

View File

@ -17,6 +17,7 @@ namespace tree {
TerminalNodeImpl(Token *symbol);
virtual Token* getSymbol() override;
virtual void setParent(RuleContext *parent) override;
virtual misc::Interval getSourceInterval() override;
virtual antlrcpp::Any accept(ParseTreeVisitor *visitor) override;

View File

@ -70,14 +70,17 @@ public final class CodePointCharStream implements CharStream {
if (i == 0) {
// Undefined
return 0;
} else if (i < 0) {
}
else if (i < 0) {
if (codePointBuffer.position() + i < initialPosition) {
return IntStream.EOF;
}
return codePointBuffer.get(relativeBufferPosition(i));
} else if (i > codePointBuffer.remaining()) {
}
else if (i > codePointBuffer.remaining()) {
return IntStream.EOF;
} else {
}
else {
return codePointBuffer.get(relativeBufferPosition(i - 1));
}
}

View File

@ -10,7 +10,7 @@ import org.antlr.v4.runtime.misc.Pair;
/** The default mechanism for creating tokens. It's used by default in Lexer and
* the error handling strategy (to create missing tokens). Notifying the parser
* of a new factory means that it notifies it's token source and error strategy.
* of a new factory means that it notifies its token source and error strategy.
*/
public interface TokenFactory<Symbol extends Token> {
/** This is the method used to create tokens in the lexer and in the

View File

@ -134,8 +134,9 @@ public class UTF8CodePointDecoder {
public IntBuffer decodeCodePointsFromBuffer(
ByteBuffer utf8BytesIn,
IntBuffer codePointsOut,
boolean endOfInput
) throws CharacterCodingException {
boolean endOfInput)
throws CharacterCodingException
{
while (utf8BytesIn.hasRemaining()) {
if (decodingTrailBytesNeeded == -1) {
// Start a new UTF-8 sequence by checking the leading byte.
@ -216,7 +217,8 @@ public class UTF8CodePointDecoder {
int trailingValue = (trailingByte & 0xFF) - 0x80;
if (trailingValue < 0x00 || trailingValue > 0x3F) {
return false;
} else {
}
else {
decodingCurrentCodePoint = (decodingCurrentCodePoint << 6) | trailingValue;
return true;
}
@ -225,8 +227,9 @@ public class UTF8CodePointDecoder {
private IntBuffer appendCodePointFromInterval(
int codePoint,
Interval validCodePointRange,
IntBuffer codePointsOut
) throws CharacterCodingException {
IntBuffer codePointsOut)
throws CharacterCodingException
{
assert validCodePointRange != Interval.INVALID;
// Security check: UTF-8 must represent code points using their
@ -239,7 +242,8 @@ public class UTF8CodePointDecoder {
codePoint,
validCodePointRange),
codePointsOut);
} else {
}
else {
return appendCodePoint(codePoint, codePointsOut);
}
}
@ -258,11 +262,13 @@ public class UTF8CodePointDecoder {
private IntBuffer handleDecodeError(
final String error,
IntBuffer codePointsOut
) throws CharacterCodingException {
IntBuffer codePointsOut)
throws CharacterCodingException
{
if (decodingErrorAction == CodingErrorAction.REPLACE) {
codePointsOut = appendCodePoint(SUBSTITUTION_CHARACTER, codePointsOut);
} else if (decodingErrorAction == CodingErrorAction.REPORT) {
}
else if (decodingErrorAction == CodingErrorAction.REPORT) {
throw new CharacterCodingException() {
@Override
public String getMessage() {

View File

@ -161,7 +161,8 @@ public class ATNConfig {
public boolean equals(ATNConfig other) {
if (this == other) {
return true;
} else if (other == null) {
}
else if (other == null) {
return false;
}

View File

@ -105,7 +105,8 @@ public class ATNDeserializer {
return 1;
}
};
} else {
}
else {
return new UnicodeDeserializer() {
@Override
public int readUnicode(char[] data, int p) {

View File

@ -16,8 +16,8 @@ import java.io.InvalidClassException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;
@ -168,7 +168,8 @@ public class ATNSerializer {
for (IntervalSet set : sets.keySet()) {
if (set.getMaxElement() <= Character.MAX_VALUE) {
bmpSets.add(set);
} else {
}
else {
smpSets.add(set);
}
}

View File

@ -26,7 +26,8 @@ public abstract class CodePointTransitions {
public static Transition createWithCodePoint(ATNState target, int codePoint) {
if (Character.isSupplementaryCodePoint(codePoint)) {
return new SetTransition(target, IntervalSet.of(codePoint));
} else {
}
else {
return new AtomTransition(target, codePoint);
}
}
@ -43,7 +44,8 @@ public abstract class CodePointTransitions {
if (Character.isSupplementaryCodePoint(codePointFrom) ||
Character.isSupplementaryCodePoint(codePointTo)) {
return new SetTransition(target, IntervalSet.of(codePointFrom, codePointTo));
} else {
}
else {
return new RangeTransition(target, codePointFrom, codePointTo);
}
}

View File

@ -738,7 +738,8 @@ public class LexerATNSimulator extends ATNSimulator {
if ( curChar=='\n' ) {
line++;
charPositionInLine=0;
} else {
}
else {
charPositionInLine++;
}
input.consume();

View File

@ -6,8 +6,6 @@
package org.antlr.v4.runtime.misc;
import org.antlr.v4.runtime.Token;
import java.util.List;
/**
@ -132,10 +130,11 @@ public interface IntSet {
/**
* Returns the single value contained in the set, if {@link #size} is 1;
* otherwise, returns {@link Token#INVALID_TYPE}.
* otherwise, result is undefined. Check {@link #isNil()} before using
* this function.
*
* @return the single value contained in the set, if {@link #size} is 1;
* otherwise, returns {@link Token#INVALID_TYPE}.
* otherwise, result is undefined.
*/
int getSingleElement();

View File

@ -36,7 +36,8 @@ public class IntegerList {
if (capacity == 0) {
_data = EMPTY_DATA;
} else {
}
else {
_data = new int[capacity];
}
}
@ -258,7 +259,8 @@ public class IntegerList {
int newLength;
if (_data.length == 0) {
newLength = INITIAL_SIZE;
} else {
}
else {
newLength = _data.length;
}

View File

@ -429,10 +429,11 @@ public class IntervalSet implements IntSet {
}
/**
* Returns the maximum value contained in the set.
* Returns the maximum value contained in the set if not isNil().
* Otherwise, result is undefined.
*
* @return the maximum value contained in the set. If the set is empty, this
* method returns {@link Token#INVALID_TYPE}.
* @return the maximum value contained in the set. If the set is empty,
* result is undefined.
*/
public int getMaxElement() {
if ( isNil() ) {
@ -443,10 +444,11 @@ public class IntervalSet implements IntSet {
}
/**
* Returns the minimum value contained in the set.
* Returns the minimum value contained in the set if not isNil().
* Otherwise, result is undefined.
*
* @return the minimum value contained in the set. If the set is empty, this
* method returns {@link Token#INVALID_TYPE}.
* @return the minimum value contained in the set. If the set is empty,
* result is undefined.
*/
public int getMinElement() {
if ( isNil() ) {

View File

@ -32,9 +32,11 @@ public class IterativeParseTreeWalker extends ParseTreeWalker {
// pre-order visit
if (currentNode instanceof ErrorNode) {
listener.visitErrorNode((ErrorNode) currentNode);
} else if (currentNode instanceof TerminalNode) {
}
else if (currentNode instanceof TerminalNode) {
listener.visitTerminal((TerminalNode) currentNode);
} else {
}
else {
final RuleNode r = (RuleNode) currentNode;
enterRule(listener, r);
}

View File

@ -72,11 +72,12 @@ public protocol ANTLRErrorStrategy {
/// the parsing process
func sync(_ recognizer: Parser) throws // RecognitionException;
/// Tests whether or not {@code recognizer} is in the process of recovering
/// Tests whether or not {@code recognizer} is in the process of recovering
/// from an error. In error recovery mode, {@link org.antlr.v4.runtime.Parser#consume} adds
/// symbols to the parse tree by calling
/// {@link org.antlr.v4.runtime.ParserRuleContext#addErrorNode(org.antlr.v4.runtime.Token)} instead of
/// {@link org.antlr.v4.runtime.ParserRuleContext#addChild(org.antlr.v4.runtime.Token)}.
/// {@link Parser#createErrorNode(ParserRuleContext, Token)} then
/// {@link ParserRuleContext#addErrorNode(ErrorNode)} instead of
/// {@link Parser#createTerminalNode(ParserRuleContext, Token)}.
///
/// - parameter recognizer: the parser instance
/// - returns: {@code true} if the parser is currently recovering from a parse

View File

@ -1,18 +1,15 @@
/* Copyright (c) 2012-2016 The ANTLR Project. All rights reserved.
* Use of this file is governed by the BSD 3-clause license that
* can be found in the LICENSE.txt file in the project root.
*/
/** This is all the parsing support code essentially; most of it is error recovery stuff. */
//public abstract class Parser : Recognizer<Token, ParserATNSimulator> {
///
/// Copyright (c) 2012-2016 The ANTLR Project. All rights reserved.
/// Use of this file is governed by the BSD 3-clause license that
/// can be found in the LICENSE.txt file in the project root.
///
import Foundation
/// This is all the parsing support code essentially; most of it is error recovery stuff.
open class Parser: Recognizer<ParserATNSimulator> {
public static let EOF: Int = -1
public static var ConsoleError = true
//false
public class TraceListener: ParseTreeListener {
var host: Parser
@ -62,10 +59,7 @@ open class Parser: Recognizer<ParserATNSimulator> {
public func exitEveryRule(_ ctx: ParserRuleContext) {
//TODO: check necessary
// if (ctx.children is ArrayList) {
// (ctx.children as ArrayList<?>).trimToSize();
// }
// TODO: Print exit info.
}
}
@ -75,10 +69,8 @@ open class Parser: Recognizer<ParserATNSimulator> {
*
* @see org.antlr.v4.runtime.atn.ATNDeserializationOptions#isGenerateRuleBypassTransitions()
*/
//private let bypassAltsAtnCache : Dictionary<String, ATN> =
// WeakHashMap<String, ATN>(); MapTable<NSString, ATN>
private let bypassAltsAtnCache: HashMap<String, ATN> = HashMap<String, ATN>()
/**
* The error handling strategy for the parser. The default value is a new
* instance of {@link org.antlr.v4.runtime.DefaultErrorStrategy}.
@ -86,7 +78,6 @@ open class Parser: Recognizer<ParserATNSimulator> {
* @see #getErrorHandler
* @see #setErrorHandler
*/
public var _errHandler: ANTLRErrorStrategy = DefaultErrorStrategy()
/**
@ -177,14 +168,15 @@ open class Parser: Recognizer<ParserATNSimulator> {
* strategy to attempt recovery. If {@link #getBuildParseTree} is
* {@code true} and the token index of the symbol returned by
* {@link org.antlr.v4.runtime.ANTLRErrorStrategy#recoverInline} is -1, the symbol is added to
* the parse tree by calling {@link org.antlr.v4.runtime.ParserRuleContext#addErrorNode}.</p>
* the parse tree by calling {@link #createErrorNode(ParserRuleContext, Token)} then
* {@link ParserRuleContext#addErrorNode(ErrorNode)}.</p>
*
* @param ttype the token type to match
* @return the matched symbol
* @throws org.antlr.v4.runtime.RecognitionException if the current input symbol did not match
* {@code ttype} and the error strategy could not recover from the
* mismatched symbol
*///; RecognitionException
*/
@discardableResult
public func match(_ ttype: Int) throws -> Token {
var t: Token = try getCurrentToken()
@ -196,7 +188,7 @@ open class Parser: Recognizer<ParserATNSimulator> {
if _buildParseTrees && t.getTokenIndex() == -1 {
// we must have conjured up a new token during single token insertion
// if it's not the current symbol
_ctx!.addErrorNode(t)
_ctx!.addErrorNode(createErrorNode(parent: _ctx!, t: t))
}
}
return t
@ -212,7 +204,8 @@ open class Parser: Recognizer<ParserATNSimulator> {
* strategy to attempt recovery. If {@link #getBuildParseTree} is
* {@code true} and the token index of the symbol returned by
* {@link org.antlr.v4.runtime.ANTLRErrorStrategy#recoverInline} is -1, the symbol is added to
* the parse tree by calling {@link org.antlr.v4.runtime.ParserRuleContext#addErrorNode}.</p>
* the parse tree by calling {@link #createErrorNode(ParserRuleContext, Token)} then
* {@link ParserRuleContext#addErrorNode(ErrorNode)}.</p>
*
* @return the matched symbol
* @throws org.antlr.v4.runtime.RecognitionException if the current input symbol did not match
@ -230,7 +223,7 @@ open class Parser: Recognizer<ParserATNSimulator> {
if _buildParseTrees && t.getTokenIndex() == -1 {
// we must have conjured up a new token during single token insertion
// if it's not the current symbol
_ctx!.addErrorNode(t)
_ctx!.addErrorNode(createErrorNode(parent: _ctx!, t: t))
}
}
@ -562,11 +555,11 @@ open class Parser: Recognizer<ParserATNSimulator> {
* </pre>
*
* If the parser is not in error recovery mode, the consumed symbol is added
* to the parse tree using {@link org.antlr.v4.runtime.ParserRuleContext#addChild(org.antlr.v4.runtime.Token)}, and
* to the parse tree using {@link ParserRuleContext#addChild(TerminalNode)}, and
* {@link org.antlr.v4.runtime.tree.ParseTreeListener#visitTerminal} is called on any parse listeners.
* If the parser <em>is</em> in error recovery mode, the consumed symbol is
* added to the parse tree using
* {@link org.antlr.v4.runtime.ParserRuleContext#addErrorNode(org.antlr.v4.runtime.Token)}, and
* added to the parse tree using {@link #createErrorNode(ParserRuleContext, Token)} then
* {@link ParserRuleContext#addErrorNode(ErrorNode)} and
* {@link org.antlr.v4.runtime.tree.ParseTreeListener#visitErrorNode} is called on any parse
* listeners.
*/
@ -583,14 +576,14 @@ open class Parser: Recognizer<ParserATNSimulator> {
if _buildParseTrees || hasListener {
if _errHandler.inErrorRecoveryMode(self) {
let node: ErrorNode = _ctx.addErrorNode(o)
let node: ErrorNode = _ctx.addErrorNode(createErrorNode(parent: _ctx, t: o))
if let _parseListeners = _parseListeners {
for listener: ParseTreeListener in _parseListeners {
listener.visitErrorNode(node)
}
}
} else {
let node: TerminalNode = _ctx.addChild(o)
let node: TerminalNode = _ctx.addChild(createTerminalNode(parent: _ctx, t: o))
if let _parseListeners = _parseListeners {
for listener: ParseTreeListener in _parseListeners {
listener.visitTerminal(node)
@ -600,6 +593,24 @@ open class Parser: Recognizer<ParserATNSimulator> {
}
return o
}
/** How to create a token leaf node associated with a parent.
* Typically, the terminal node to create is not a function of the parent.
*
* @since 4.7
*/
public func createTerminalNode(parent: ParserRuleContext, t: Token) -> TerminalNode {
return TerminalNodeImpl(t);
}
/** How to create an error node, given a token, associated with a parent.
* Typically, the error node to create is not a function of the parent.
*
* @since 4.7
*/
public func createErrorNode(parent: ParserRuleContext, t: Token) -> ErrorNode {
return ErrorNode(t);
}
internal func addContextToParseTree() {

View File

@ -72,6 +72,14 @@ open class ParserRuleContext: RuleContext {
/** COPY a ctx (I'm deliberately not using copy constructor) to avoid
* confusion with creating node with parent. Does not copy children.
*
* This is used in the generated parser code to flip a generic XContext
* node for rule X to a YContext for alt label Y. In that sense, it is
* not really a generic copy function.
*
* If we do an error sync() at start of a rule, we might add error nodes
* to the generic XContext so this function must copy those nodes to
* the YContext as well else they are lost!
*/
open func copyFrom(_ ctx: ParserRuleContext) {
self.parent = ctx.parent
@ -81,13 +89,12 @@ open class ParserRuleContext: RuleContext {
self.stop = ctx.stop
// copy any error nodes to alt label node
if ctx.children != nil{
if ctx.children != nil {
self.children = Array<ParseTree>()
// reset parent pointer for any error nodes
for child: ParseTree in ctx.children! {
if child is ErrorNode{
self.children?.append(child)
( (child as! ErrorNode)).parent = self
for child: ParseTree in ctx.children! {
if child is ErrorNode {
addChild(child as! ErrorNode)
}
}
}
@ -105,51 +112,90 @@ open class ParserRuleContext: RuleContext {
open func exitRule(_ listener: ParseTreeListener) {
}
/** Does not set parent link; other add methods do that */
/** Add a parse tree node to this as a child. Works for
* internal and leaf nodes. Does not set parent link;
* other add methods must do that. Other addChild methods
* call this.
*
* We cannot set the parent pointer of the incoming node
* because the existing interfaces do not have a setParent()
* method and I don't want to break backward compatibility for this.
*
* @since 4.7
*/
@discardableResult
open func addChild(_ t: TerminalNode) -> TerminalNode {
open func addAnyChild<T: ParseTree>(_ t: T) -> T {
if children == nil {
children = Array<ParseTree>()
children = [T]()
}
children!.append(t)
return t
}
@discardableResult
open func addChild(_ ruleInvocation: RuleContext) -> RuleContext {
if children == nil {
children = Array<ParseTree>()
}
children!.append(ruleInvocation)
return ruleInvocation
return addAnyChild(ruleInvocation)
}
/** Add a token leaf node child and force its parent to be this node. */
@discardableResult
open func addChild(_ t: TerminalNode) -> TerminalNode {
t.setParent(self)
return addAnyChild(t)
}
/** Add an error node child and force its parent to be this node.
*
* @since 4.7
*/
@discardableResult
open func addErrorNode(_ errorNode: ErrorNode) -> ErrorNode {
errorNode.setParent(self)
return addAnyChild(errorNode)
}
/** Add a child to this node based upon matchedToken. It
* creates a TerminalNodeImpl rather than using
* {@link Parser#createTerminalNode(ParserRuleContext, Token)}. I'm leaving this
* in for compatibility but the parser doesn't use this anymore.
*/
@available(*, deprecated)
open func addChild(_ matchedToken: Token) -> TerminalNode {
let t: TerminalNodeImpl = TerminalNodeImpl(matchedToken)
addAnyChild(t)
t.setParent(self)
return t
}
/** Add a child to this node based upon badToken. It
* creates a ErrorNodeImpl rather than using
* {@link Parser#createErrorNode(ParserRuleContext, Token)}. I'm leaving this
* in for compatibility but the parser doesn't use this anymore.
*/
@discardableResult
@available(*, deprecated)
open func addErrorNode(_ badToken: Token) -> ErrorNode {
let t: ErrorNode = ErrorNode(badToken)
addAnyChild(t)
t.setParent(self)
return t
}
// public void trace(int s) {
// if ( states==null ) states = new ArrayList<Integer>();
// states.add(s);
// }
/** Used by enterOuterAlt to toss out a RuleContext previously added as
* we entered a rule. If we have # label, we will need to remove
* generic ruleContext object.
*/
*/
open func removeLastChild() {
children?.removeLast()
//children.remove(children.size()-1);
}
// public void trace(int s) {
// if ( states==null ) states = new ArrayList<Integer>();
// states.add(s);
// }
open func addChild(_ matchedToken: Token) -> TerminalNode {
let t: TerminalNodeImpl = TerminalNodeImpl(matchedToken)
addChild(t)
t.parent = self
return t
}
@discardableResult
open func addErrorNode(_ badToken: Token) -> ErrorNode {
let t: ErrorNode = ErrorNode(badToken)
addChild(t)
t.parent = self
return t
if children != nil {
children!.remove(at: children!.count-1)
}
}
override
/** Override to make type more specific */

View File

@ -9,4 +9,18 @@ public class TerminalNode: ParseTree {
fatalError()
}
/** Set the parent for this leaf node.
*
* Technically, this is not backward compatible as it changes
* the interface but no one was able to create custom
* TerminalNodes anyway so I'm adding it as it improves internal
* code quality.
*
* @since 4.7
*/
public func setParent(_ parent: RuleContext) {
RuntimeException("setParent must be overridden!")
fatalError()
}
}

View File

@ -27,6 +27,11 @@ public class TerminalNodeImpl: TerminalNode {
return parent
}
override
public func setParent(_ parent: RuleContext) {
self.parent = parent
}
override
public func getPayload() -> AnyObject {
return symbol

View File

@ -41,12 +41,16 @@ static private void addProperty<k>() {
addPropertyAliases();
}
private static String normalize(String propertyCodeOrAlias) {
return propertyCodeOrAlias.toLowerCase(Locale.US).replace('-', '_');
}
/**
* Given a Unicode property (general category code, binary property name, or script name),
* returns the {@link IntervalSet} of Unicode code point ranges which have that property.
*/
public static IntervalSet getPropertyCodePoints(String propertyCodeOrAlias) {
String normalizedPropertyCodeOrAlias = propertyCodeOrAlias.toLowerCase(Locale.US);
String normalizedPropertyCodeOrAlias = normalize(propertyCodeOrAlias);
IntervalSet result = propertyCodePointRanges.get(normalizedPropertyCodeOrAlias);
if (result == null) {
String propertyCode = propertyAliases.get(normalizedPropertyCodeOrAlias);

View File

@ -75,11 +75,13 @@ public abstract class UnicodeDataTemplateController {
addUnicodeCategoryCodesToCodePointRanges(propertyCodePointRanges);
addUnicodeBinaryPropertyCodesToCodePointRanges(propertyCodePointRanges);
addUnicodeScriptCodesToCodePointRanges(propertyCodePointRanges);
addUnicodeBlocksToCodePointRanges(propertyCodePointRanges);
Map<String, String> propertyAliases = new LinkedHashMap<>();
addUnicodeCategoryCodesToNames(propertyAliases);
addUnicodeBinaryPropertyCodesToNames(propertyAliases);
addUnicodeScriptCodesToNames(propertyAliases);
addUnicodeBlocksToNames(propertyAliases);
Map<String, Object> properties = new LinkedHashMap<>();
properties.put("propertyCodePointRanges", propertyCodePointRanges);
@ -171,17 +173,17 @@ public abstract class UnicodeDataTemplateController {
}
}
private static void addUnicodeScriptCodesToCodePointRanges(Map<String, IntervalSet> propertyCodePointRanges) {
for (int script = UCharacter.getIntPropertyMinValue(UProperty.SCRIPT);
script <= UCharacter.getIntPropertyMaxValue(UProperty.SCRIPT);
script++) {
private static void addIntPropertyRanges(int property, String namePrefix, Map<String, IntervalSet> propertyCodePointRanges) {
for (int propertyValue = UCharacter.getIntPropertyMinValue(property);
propertyValue <= UCharacter.getIntPropertyMaxValue(property);
propertyValue++) {
UnicodeSet set = new UnicodeSet();
set.applyIntPropertyValue(UProperty.SCRIPT, script);
String scriptName = UCharacter.getPropertyValueName(UProperty.SCRIPT, script, UProperty.NameChoice.SHORT);
IntervalSet intervalSet = propertyCodePointRanges.get(scriptName);
set.applyIntPropertyValue(property, propertyValue);
String propertyName = namePrefix + UCharacter.getPropertyValueName(property, propertyValue, UProperty.NameChoice.SHORT);
IntervalSet intervalSet = propertyCodePointRanges.get(propertyName);
if (intervalSet == null) {
intervalSet = new IntervalSet();
propertyCodePointRanges.put(scriptName, intervalSet);
propertyCodePointRanges.put(propertyName, intervalSet);
}
for (UnicodeSet.EntryRange range : set.ranges()) {
intervalSet.add(range.codepoint, range.codepointEnd);
@ -189,16 +191,24 @@ public abstract class UnicodeDataTemplateController {
}
}
private static void addUnicodeScriptCodesToNames(Map<String, String> propertyAliases) {
for (int script = UCharacter.getIntPropertyMinValue(UProperty.SCRIPT);
script <= UCharacter.getIntPropertyMaxValue(UProperty.SCRIPT);
script++) {
String propertyName = UCharacter.getPropertyValueName(UProperty.SCRIPT, script, UProperty.NameChoice.SHORT);
private static void addUnicodeScriptCodesToCodePointRanges(Map<String, IntervalSet> propertyCodePointRanges) {
addIntPropertyRanges(UProperty.SCRIPT, "", propertyCodePointRanges);
}
private static void addUnicodeBlocksToCodePointRanges(Map<String, IntervalSet> propertyCodePointRanges) {
addIntPropertyRanges(UProperty.BLOCK, "In", propertyCodePointRanges);
}
private static void addIntPropertyAliases(int property, String namePrefix, Map<String, String> propertyAliases) {
for (int propertyValue = UCharacter.getIntPropertyMinValue(property);
propertyValue <= UCharacter.getIntPropertyMaxValue(property);
propertyValue++) {
String propertyName = namePrefix + UCharacter.getPropertyValueName(property, propertyValue, UProperty.NameChoice.SHORT);
int nameChoice = UProperty.NameChoice.LONG;
String alias;
while (true) {
try {
alias = UCharacter.getPropertyValueName(UProperty.SCRIPT, script, nameChoice);
alias = namePrefix + UCharacter.getPropertyValueName(property, propertyValue, nameChoice);
} catch (IllegalArgumentException e) {
// No more aliases.
break;
@ -209,4 +219,12 @@ public abstract class UnicodeDataTemplateController {
}
}
}
private static void addUnicodeScriptCodesToNames(Map<String, String> propertyAliases) {
addIntPropertyAliases(UProperty.SCRIPT, "", propertyAliases);
}
private static void addUnicodeBlocksToNames(Map<String, String> propertyAliases) {
addIntPropertyAliases(UProperty.BLOCK, "In", propertyAliases);
}
}

View File

@ -115,6 +115,129 @@ public class TestATNConstruction extends BaseJavaToolTest {
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSet() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [abc] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{97..99}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetRange() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [a-c] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{97..99}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodeBMPEscape() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [\\uABCD] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-43981->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodeBMPEscapeRange() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [a-c\\uABCD-\\uABFF] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{97..99, 43981..44031}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodeSMPEscape() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [\\u{10ABCD}] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-1092557->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodeSMPEscapeRange() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [a-c\\u{10ABCD}-\\u{10ABFF}] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{97..99, 1092557..1092607}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodePropertyEscape() throws Exception {
// The Gothic script is long dead and unlikely to change (which would
// cause this test to fail)
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [\\p{Gothic}] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{66352..66378}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodePropertyInvertEscape() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [\\P{Gothic}] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{0..66351, 66379..1114111}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodeMultiplePropertyEscape() throws Exception {
// Ditto the Mahajani script. Not going to change soon. I hope.
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [\\p{Gothic}\\p{Mahajani}] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{66352..66378, 69968..70006}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testCharSetUnicodePropertyOverlap() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+
"A : [\\p{ASCII_Hex_Digit}\\p{Hex_Digit}] ;"
);
String expecting =
"s0->RuleStart_A_1\n" +
"RuleStart_A_1->s3\n" +
"s3-{48..57, 65..70, 97..102, 65296..65305, 65313..65318, 65345..65350}->s4\n" +
"s4->RuleStop_A_2\n";
checkTokensRule(g, null, expecting);
}
@Test public void testRangeOrRange() throws Exception {
LexerGrammar g = new LexerGrammar(
"lexer grammar P;\n"+

View File

@ -0,0 +1,138 @@
/*
* Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
* Use of this file is governed by the BSD 3-clause license that
* can be found in the LICENSE.txt file in the project root.
*/
package org.antlr.v4.test.tool;
import org.antlr.v4.misc.EscapeSequenceParsing;
import org.antlr.v4.runtime.misc.IntervalSet;
import org.junit.Test;
import static org.antlr.v4.misc.EscapeSequenceParsing.Result;
import static org.junit.Assert.assertEquals;
public class TestEscapeSequenceParsing {
@Test
public void testParseEmpty() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("", 0));
}
@Test
public void testParseJustBackslash() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\", 0));
}
@Test
public void testParseInvalidEscape() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\z", 0));
}
@Test
public void testParseNewline() {
assertEquals(
new Result(Result.Type.CODE_POINT, '\n', IntervalSet.EMPTY_SET, 2),
EscapeSequenceParsing.parseEscape("\\n", 0));
}
@Test
public void testParseTab() {
assertEquals(
new Result(Result.Type.CODE_POINT, '\t', IntervalSet.EMPTY_SET, 2),
EscapeSequenceParsing.parseEscape("\\t", 0));
}
@Test
public void testParseUnicodeTooShort() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\uABC", 0));
}
@Test
public void testParseUnicodeBMP() {
assertEquals(
new Result(Result.Type.CODE_POINT, 0xABCD, IntervalSet.EMPTY_SET, 6),
EscapeSequenceParsing.parseEscape("\\uABCD", 0));
}
@Test
public void testParseUnicodeSMPTooShort() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\u{}", 0));
}
@Test
public void testParseUnicodeSMPMissingCloseBrace() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\u{12345", 0));
}
@Test
public void testParseUnicodeTooBig() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\u{110000}", 0));
}
@Test
public void testParseUnicodeSMP() {
assertEquals(
new Result(Result.Type.CODE_POINT, 0x10ABCD, IntervalSet.EMPTY_SET, 10),
EscapeSequenceParsing.parseEscape("\\u{10ABCD}", 0));
}
@Test
public void testParseUnicodePropertyTooShort() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\p{}", 0));
}
@Test
public void testParseUnicodePropertyMissingCloseBrace() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\p{1234", 0));
}
@Test
public void testParseUnicodeProperty() {
assertEquals(
new Result(Result.Type.PROPERTY, -1, IntervalSet.of(66560, 66639), 11),
EscapeSequenceParsing.parseEscape("\\p{Deseret}", 0));
}
@Test
public void testParseUnicodePropertyInvertedTooShort() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\P{}", 0));
}
@Test
public void testParseUnicodePropertyInvertedMissingCloseBrace() {
assertEquals(
EscapeSequenceParsing.Result.INVALID,
EscapeSequenceParsing.parseEscape("\\P{Deseret", 0));
}
@Test
public void testParseUnicodePropertyInverted() {
IntervalSet expected = IntervalSet.of(0, 66559);
expected.add(66640, Character.MAX_CODE_POINT);
assertEquals(
new Result(Result.Type.PROPERTY, -1, expected, 11),
EscapeSequenceParsing.parseEscape("\\P{Deseret}", 0));
}
}

View File

@ -389,12 +389,13 @@ public class TestSymbolIssues extends BaseJavaToolTest {
"TOKEN_RANGE: [aa-f];\n" +
"TOKEN_RANGE_2: [A-FD-J];\n" +
"TOKEN_RANGE_3: 'Z' | 'K'..'R' | 'O'..'V';\n" +
"TOKEN_RANGE_4: 'g'..'l' | [g-l];\n", // Handling in ATNOptimizer.
"TOKEN_RANGE_4: 'g'..'l' | [g-l];\n" +
"TOKEN_RANGE_WITHOUT_COLLISION: '_' | [a-zA-Z];",
"warning(" + ErrorType.CHARACTERS_COLLISION_IN_SET.code + "): L.g4:2:18: chars \"a-f\" used multiple times in set [aa-f]\n" +
"warning(" + ErrorType.CHARACTERS_COLLISION_IN_SET.code + "): L.g4:3:18: chars \"D-J\" used multiple times in set [A-FD-J]\n" +
"warning(" + ErrorType.CHARACTERS_COLLISION_IN_SET.code + "): L.g4:4:13: chars \"O-V\" used multiple times in set 'Z' | 'K'..'R' | 'O'..'V'\n" +
"warning(" + ErrorType.CHARACTERS_COLLISION_IN_SET.code + "): L.g4::: chars \"g-l\" used multiple times in set [g-l]\n"
"warning(" + ErrorType.CHARACTERS_COLLISION_IN_SET.code + "): L.g4::: chars \"g\" used multiple times in set {'g'..'l'}\n"
};
testErrors(test, false);

View File

@ -439,7 +439,7 @@ public class TestToolSyntaxErrors extends BaseJavaToolTest {
@Test public void testValidEscapeSequences() {
String grammar =
"lexer grammar A;\n" +
"NORMAL_ESCAPE : '\\b \\t \\n \\f \\r \\\" \\' \\\\';\n" +
"NORMAL_ESCAPE : '\\b \\t \\n \\f \\r \\' \\\\';\n" +
"UNICODE_ESCAPE : '\\u0001 \\u00A1 \\u00a1 \\uaaaa \\uAAAA';\n";
String expected =
"";
@ -462,9 +462,9 @@ public class TestToolSyntaxErrors extends BaseJavaToolTest {
"lexer grammar A;\n" +
"RULE : 'Foo \\uAABG \\x \\u';\n";
String expected =
"error(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): A.g4:2:12: invalid escape sequence\n" +
"error(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): A.g4:2:19: invalid escape sequence\n" +
"error(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): A.g4:2:22: invalid escape sequence\n";
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): A.g4:2:12: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): A.g4:2:19: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): A.g4:2:22: invalid escape sequence\n";
String[] pair = new String[] {
grammar,
@ -501,25 +501,66 @@ public class TestToolSyntaxErrors extends BaseJavaToolTest {
super.testErrors(pair, true);
}
@Test public void testInvalidCharSetAndRange() {
@Test public void testInvalidCharSetsAndStringLiterals() {
String grammar =
"lexer grammar Test;\n" +
"INVALID_RANGE: 'GH'..'LM';\n" +
"INVALID_RANGE_2: 'F'..'A' | 'Z';\n" +
"VALID_STRING_LITERALS: '\\u1234' | '\\t' | [\\-\\]];\n" +
"INVALID_CHAR_SET: [f-az][];\n" +
"INVALID_CHAR_SET_2: [\\u24\\uA2][\\u24];\n" + //https://github.com/antlr/antlr4/issues/1077
"INVALID_CHAR_SET_3: [\\t\\{];";
"INVALID_STRING_LITERAL: '\\\"' | '\\]' | '\\u24';\n" +
"INVALID_STRING_LITERAL_RANGE: 'GH'..'LM';\n" +
"INVALID_CHAR_SET: [\\u24\\uA2][\\{];\n" + //https://github.com/antlr/antlr4/issues/1077
"EMPTY_STRING_LITERAL_RANGE: 'F'..'A' | 'Z';\n" +
"EMPTY_CHAR_SET: [f-az][];\n" +
"VALID_STRING_LITERALS: '\\u1234' | '\\t' | '\\'';\n" +
"VALID_CHAR_SET: [`\\-=\\]];";
String expected =
"error(" + ErrorType.INVALID_LITERAL_IN_LEXER_SET.code + "): Test.g4:2:23: multi-character literals are not allowed in lexer sets: 'GH'\n" +
"error(" + ErrorType.INVALID_LITERAL_IN_LEXER_SET.code + "): Test.g4:2:29: multi-character literals are not allowed in lexer sets: 'LM'\n" +
"error(" + ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED.code + "): Test.g4:3:26: string literals and sets cannot be empty: 'F'..'A'\n" +
"error(" + ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED.code + "): Test.g4:5:23: string literals and sets cannot be empty: [f-a]\n" +
"error(" + ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED.code + "): Test.g4:5:29: string literals and sets cannot be empty: []\n" +
"error(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:6:23: invalid escape sequence\n" +
"error(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:6:33: invalid escape sequence\n" +
"error(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:7:23: invalid escape sequence\n";
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:2:31: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:2:38: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:2:45: invalid escape sequence\n" +
"error(" + ErrorType.INVALID_LITERAL_IN_LEXER_SET.code + "): Test.g4:3:30: multi-character literals are not allowed in lexer sets: 'GH'\n" +
"error(" + ErrorType.INVALID_LITERAL_IN_LEXER_SET.code + "): Test.g4:3:36: multi-character literals are not allowed in lexer sets: 'LM'\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:4:30: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:4:40: invalid escape sequence\n" +
"error(" + ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED.code + "): Test.g4:5:33: string literals and sets cannot be empty: 'F'..'A'\n" +
"error(" + ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED.code + "): Test.g4:6:30: string literals and sets cannot be empty: [f-a]\n" +
"error(" + ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED.code + "): Test.g4:6:36: string literals and sets cannot be empty: []\n";
String[] pair = new String[] {
grammar,
expected
};
super.testErrors(pair, true);
}
@Test public void testInvalidUnicodeEscapesInCharSet() {
String grammar =
"lexer grammar Test;\n" +
"INVALID_EXTENDED_UNICODE_EMPTY: [\\u{}];\n" +
"INVALID_EXTENDED_UNICODE_NOT_TERMINATED: [\\u{];\n" +
"INVALID_EXTENDED_UNICODE_TOO_LONG: [\\u{110000}];\n" +
"INVALID_UNICODE_PROPERTY_EMPTY: [\\p{}];\n" +
"INVALID_UNICODE_PROPERTY_NOT_TERMINATED: [\\p{];\n" +
"INVALID_INVERTED_UNICODE_PROPERTY_EMPTY: [\\P{}];\n" +
"INVALID_UNICODE_PROPERTY_UNKNOWN: [\\p{NotAProperty}];\n" +
"INVALID_INVERTED_UNICODE_PROPERTY_UNKNOWN: [\\P{NotAProperty}];\n" +
"UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE: [\\p{Uppercase_Letter}-\\p{Lowercase_Letter}];\n" +
"UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE_2: [\\p{Letter}-Z];\n" +
"UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE_3: [A-\\p{Number}];\n" +
"INVERTED_UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE: [\\P{Uppercase_Letter}-\\P{Number}];\n";
String expected =
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:2:32: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:3:41: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:4:35: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:5:32: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:6:41: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:7:41: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:8:34: invalid escape sequence\n" +
"warning(" + ErrorType.INVALID_ESCAPE_SEQUENCE.code + "): Test.g4:9:43: invalid escape sequence\n" +
"error(" + ErrorType.UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE.code + "): Test.g4:10:39: unicode property escapes not allowed in lexer charset range: [\\p{Uppercase_Letter}-\\p{Lowercase_Letter}]\n" +
"error(" + ErrorType.UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE.code + "): Test.g4:11:41: unicode property escapes not allowed in lexer charset range: [\\p{Letter}-Z]\n" +
"error(" + ErrorType.UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE.code + "): Test.g4:12:41: unicode property escapes not allowed in lexer charset range: [A-\\p{Number}]\n" +
"error(" + ErrorType.UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE.code + "): Test.g4:13:48: unicode property escapes not allowed in lexer charset range: [\\P{Uppercase_Letter}-\\P{Number}]\n";
String[] pair = new String[] {
grammar,

View File

@ -108,6 +108,20 @@ public class TestUnicodeData {
assertTrue(UnicodeData.getPropertyCodePoints("Cyrillic").contains(0x0404));
}
@Test
public void testUnicodeBlocks() {
assertTrue(UnicodeData.getPropertyCodePoints("InASCII").contains('0'));
assertTrue(UnicodeData.getPropertyCodePoints("InCJK").contains(0x4E04));
assertTrue(UnicodeData.getPropertyCodePoints("InCyrillic").contains(0x0404));
assertTrue(UnicodeData.getPropertyCodePoints("InMisc_Pictographs").contains(0x1F4A9));
}
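// Hypothetical additional case (a sketch, not part of the original change; it assumes
// the "In" + block-name scheme shown above also exposes the Arabic block):
@Test
public void testUnicodeBlockArabicSketch() {
assertTrue(UnicodeData.getPropertyCodePoints("InArabic").contains(0x0627));
}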
@Test
public void testUnicodeBlockAliases() {
assertTrue(UnicodeData.getPropertyCodePoints("InBasic_Latin").contains('0'));
assertTrue(UnicodeData.getPropertyCodePoints("InMiscellaneous_Mathematical_Symbols_B").contains(0x29BE));
}
@Test
public void testPropertyCaseInsensitivity() {
assertTrue(UnicodeData.getPropertyCodePoints("l").contains('x'));
@ -116,6 +130,11 @@ public class TestUnicodeData {
assertTrue(UnicodeData.getPropertyCodePoints("Alnum").contains('0'));
}
@Test
public void testPropertyDashSameAsUnderscore() {
assertTrue(UnicodeData.getPropertyCodePoints("InLatin-1").contains('\u00F0'));
}
@Test
public void modifyingUnicodeDataShouldThrow() {
thrown.expect(IllegalStateException.class);

View File

@ -98,17 +98,24 @@ public class ATNOptimizer {
if (matchTransition instanceof NotSetTransition) {
throw new UnsupportedOperationException("Not yet implemented.");
}
IntervalSet set = matchTransition.label();
int minElem = set.getMinElement();
int maxElem = set.getMaxElement();
for (int k = minElem; k <= maxElem; k++) {
if (matchSet.contains(k)) {
// TODO: Token is missing (i.e. position in source will not be displayed).
g.tool.errMgr.grammarError(ErrorType.CHARACTERS_COLLISION_IN_SET, g.fileName,
null,
CharSupport.toRange(minElem, maxElem, CharSupport.ToRangeMode.NOT_BRACKETED),
CharSupport.toRange(set.getMinElement(), set.getMaxElement(), CharSupport.ToRangeMode.BRACKETED));
break;
IntervalSet set = matchTransition.label();
List<Interval> intervals = set.getIntervals();
int n = intervals.size();
for (int k = 0; k < n; k++) {
Interval setInterval = intervals.get(k);
int a = setInterval.a;
int b = setInterval.b;
if (a != -1 && b != -1) {
for (int v = a; v <= b; v++) {
if (matchSet.contains(v)) {
// TODO: Token is missing (i.e. position in source will not be displayed).
g.tool.errMgr.grammarError(ErrorType.CHARACTERS_COLLISION_IN_SET, g.fileName,
null,
String.valueOf(Character.toChars(v)),
matchSet.toString(true));
break;
}
}
}
}
matchSet.addAll(set);

View File

@ -10,6 +10,7 @@ import org.antlr.runtime.CommonToken;
import org.antlr.runtime.Token;
import org.antlr.v4.codegen.CodeGenerator;
import org.antlr.v4.misc.CharSupport;
import org.antlr.v4.misc.EscapeSequenceParsing;
import org.antlr.v4.parse.ANTLRParser;
import org.antlr.v4.runtime.IntStream;
import org.antlr.v4.runtime.Lexer;
@ -28,7 +29,6 @@ import org.antlr.v4.runtime.atn.LexerPushModeAction;
import org.antlr.v4.runtime.atn.LexerSkipAction;
import org.antlr.v4.runtime.atn.LexerTypeAction;
import org.antlr.v4.runtime.atn.NotSetTransition;
import org.antlr.v4.runtime.atn.RangeTransition;
import org.antlr.v4.runtime.atn.RuleStartState;
import org.antlr.v4.runtime.atn.SetTransition;
import org.antlr.v4.runtime.atn.TokensStartState;
@ -49,6 +49,7 @@ import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
public class LexerATNFactory extends ParserATNFactory {
@ -303,7 +304,8 @@ public class LexerATNFactory extends ParserATNFactory {
if (set.getIntervals().size() == 1) {
Interval interval = set.getIntervals().get(0);
transition = CodePointTransitions.createWithCodePointRange(right, interval.a, interval.b);
} else {
}
else {
transition = new SetTransition(right, set);
}
@ -365,7 +367,7 @@ public class LexerATNFactory extends ParserATNFactory {
return new Handle(left, right);
}
/** [Aa\t \u1234a-z\]\-] char sets */
/** [Aa\t \u1234a-z\]\p{Letter}\-] char sets */
@Override
public Handle charSetLiteral(GrammarAST charSetAST) {
ATNState left = newState(charSetAST);
@ -376,10 +378,68 @@ public class LexerATNFactory extends ParserATNFactory {
return new Handle(left, right);
}
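/**
* Descriptive comment added for clarity (not part of the original change):
* CharSetParseState is an immutable cursor used by getSetFromCharSetLiteral() below.
* mode records whether the previously parsed element was a single code point or a
* \p{...} property set, and inRange records whether a '-' has been seen, meaning the
* next code point should close a range.
*/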
private static class CharSetParseState {
enum Mode {
NONE,
ERROR,
PREV_CODE_POINT,
PREV_PROPERTY
}
public static final CharSetParseState NONE = new CharSetParseState(Mode.NONE, false, -1, IntervalSet.EMPTY_SET);
public static final CharSetParseState ERROR = new CharSetParseState(Mode.ERROR, false, -1, IntervalSet.EMPTY_SET);
public final Mode mode;
public final boolean inRange;
public final int prevCodePoint;
public final IntervalSet prevProperty;
public CharSetParseState(
Mode mode,
boolean inRange,
int prevCodePoint,
IntervalSet prevProperty) {
this.mode = mode;
this.inRange = inRange;
this.prevCodePoint = prevCodePoint;
this.prevProperty = prevProperty;
}
@Override
public String toString() {
return String.format(
"%s mode=%s inRange=%s prevCodePoint=%d prevProperty=%s",
super.toString(),
mode,
inRange,
prevCodePoint,
prevProperty);
}
@Override
public boolean equals(Object other) {
if (!(other instanceof CharSetParseState)) {
return false;
}
CharSetParseState that = (CharSetParseState) other;
if (this == that) {
return true;
}
return Objects.equals(this.mode, that.mode) &&
Objects.equals(this.inRange, that.inRange) &&
Objects.equals(this.prevCodePoint, that.prevCodePoint) &&
Objects.equals(this.prevProperty, that.prevProperty);
}
@Override
public int hashCode() {
return Objects.hash(mode, inRange, prevCodePoint, prevProperty);
}
}
public IntervalSet getSetFromCharSetLiteral(GrammarAST charSetAST) {
String chars = charSetAST.getText();
chars = chars.substring(1, chars.length() - 1);
String cset = '"' + chars + '"';
IntervalSet set = new IntervalSet();
if (chars.length() == 0) {
@ -387,46 +447,127 @@ public class LexerATNFactory extends ParserATNFactory {
g.fileName, charSetAST.getToken(), "[]");
return set;
}
// unescape all valid escape char like \n, leaving escaped dashes as '\-'
// so we can avoid seeing them as '-' range ops.
chars = CharSupport.getStringFromGrammarStringLiteral(cset);
if (chars == null) {
g.tool.errMgr.grammarError(ErrorType.INVALID_ESCAPE_SEQUENCE,
g.fileName, charSetAST.getToken());
return set;
}
CharSetParseState state = CharSetParseState.NONE;
int n = chars.length();
// now make x-y become set of char
for (int i = 0; i < n; ) {
if (state.mode == CharSetParseState.Mode.ERROR) {
return new IntervalSet();
}
int c = chars.codePointAt(i);
int offset = Character.charCount(c);
if (c == '\\' && i+offset < n && chars.codePointAt(i+offset) == '-') { // \-
checkSetCollision(charSetAST, set, '-');
set.add('-');
offset++;
if (c == '\\') {
EscapeSequenceParsing.Result escapeParseResult =
EscapeSequenceParsing.parseEscape(chars, i);
switch (escapeParseResult.type) {
case INVALID:
g.tool.errMgr.grammarError(ErrorType.INVALID_ESCAPE_SEQUENCE,
g.fileName, charSetAST.getToken(), charSetAST.getText());
state = CharSetParseState.ERROR;
break;
case CODE_POINT:
state = applyPrevStateAndMoveToCodePoint(charSetAST, set, state, escapeParseResult.codePoint);
break;
case PROPERTY:
state = applyPrevStateAndMoveToProperty(charSetAST, set, state, escapeParseResult.propertyIntervalSet);
break;
}
offset = escapeParseResult.parseLength;
}
else if (i+offset+1 < n && chars.codePointAt(i+offset) == '-') { // range x-y
int x = c;
int y = chars.codePointAt(i+offset+1);
if (x <= y) {
checkSetCollision(charSetAST, set, x, y);
set.add(x,y);
else if (c == '-' && !state.inRange) {
if (state.mode == CharSetParseState.Mode.PREV_PROPERTY) {
g.tool.errMgr.grammarError(ErrorType.UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE,
g.fileName, charSetAST.getToken(), charSetAST.getText());
state = CharSetParseState.ERROR;
}
else {
g.tool.errMgr.grammarError(ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED,
g.fileName, charSetAST.getToken(), CharSupport.toRange(x, y, CharSupport.ToRangeMode.BRACKETED));
state = new CharSetParseState(state.mode, true, state.prevCodePoint, state.prevProperty);
}
offset += Character.charCount(y) + 1;
}
else {
checkSetCollision(charSetAST, set, c);
set.add(c);
state = applyPrevStateAndMoveToCodePoint(charSetAST, set, state, c);
}
i += offset;
}
if (state.mode == CharSetParseState.Mode.ERROR) {
return new IntervalSet();
}
// Whether or not we were in a range, we'll add the last code point found to the set.
// If the range wasn't terminated, we'll treat it as a standalone codepoint.
applyPrevState(charSetAST, set, state);
if (state.inRange) {
// Unterminated range; add a literal hyphen to the set.
checkSetCollision(charSetAST, set, '-');
set.add('-');
}
return set;
}
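// Worked example added for clarity (not part of the original change): for [a-z\-] the
// loop above sees 'a' (PREV_CODE_POINT), then '-' (inRange becomes true), then 'z'
// (the range a-z is added), then the escape \- (a pending '-' that applyPrevState adds
// once the loop ends). For [\p{Letter}-Z] the '-' follows a property, so
// UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE is reported and an empty set is returned.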
private CharSetParseState applyPrevStateAndMoveToCodePoint(
GrammarAST charSetAST,
IntervalSet set,
CharSetParseState state,
int codePoint) {
if (state.inRange) {
if (state.prevCodePoint > codePoint) {
g.tool.errMgr.grammarError(
ErrorType.EMPTY_STRINGS_AND_SETS_NOT_ALLOWED,
g.fileName,
charSetAST.getToken(),
CharSupport.toRange(state.prevCodePoint, codePoint, CharSupport.ToRangeMode.BRACKETED));
}
checkSetCollision(charSetAST, set, state.prevCodePoint, codePoint);
set.add(state.prevCodePoint, codePoint);
state = CharSetParseState.NONE;
}
else {
applyPrevState(charSetAST, set, state);
state = new CharSetParseState(
CharSetParseState.Mode.PREV_CODE_POINT,
false,
codePoint,
IntervalSet.EMPTY_SET);
}
return state;
}
private CharSetParseState applyPrevStateAndMoveToProperty(
GrammarAST charSetAST,
IntervalSet set,
CharSetParseState state,
IntervalSet property) {
if (state.inRange) {
g.tool.errMgr.grammarError(ErrorType.UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE,
g.fileName, charSetAST.getToken(), charSetAST.getText());
return CharSetParseState.ERROR;
}
else {
applyPrevState(charSetAST, set, state);
state = new CharSetParseState(
CharSetParseState.Mode.PREV_PROPERTY,
false,
-1,
property);
}
return state;
}
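// Descriptive comment added for clarity (not part of the original change): applyPrevState
// flushes whatever element is still pending into the result set, i.e. a single code point
// in PREV_CODE_POINT mode, the whole property interval set in PREV_PROPERTY mode, and
// nothing in NONE or ERROR mode.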
private void applyPrevState(GrammarAST charSetAST, IntervalSet set, CharSetParseState state) {
switch (state.mode) {
case NONE:
case ERROR:
break;
case PREV_CODE_POINT:
checkSetCollision(charSetAST, set, state.prevCodePoint);
set.add(state.prevCodePoint);
break;
case PREV_PROPERTY:
set.addAll(state.prevProperty);
break;
}
}
protected void checkSetCollision(GrammarAST ast, IntervalSet set, int el) {
if (set.contains(el)) {
g.tool.errMgr.grammarError(ErrorType.CHARACTERS_COLLISION_IN_SET, g.fileName, ast.getToken(),
@ -543,23 +684,30 @@ public class LexerATNFactory extends ParserATNFactory {
if (command.equals("skip")) {
if (ruleCommands.contains("more")) {
firstCommand = "more";
} else if (ruleCommands.contains("type")) {
}
else if (ruleCommands.contains("type")) {
firstCommand = "type";
} else if (ruleCommands.contains("channel")) {
}
else if (ruleCommands.contains("channel")) {
firstCommand = "channel";
}
} else if (command.equals("more")) {
}
else if (command.equals("more")) {
if (ruleCommands.contains("skip")) {
firstCommand = "skip";
} else if (ruleCommands.contains("type")) {
}
else if (ruleCommands.contains("type")) {
firstCommand = "type";
} else if (ruleCommands.contains("channel")) {
}
else if (ruleCommands.contains("channel")) {
firstCommand = "channel";
}
} else if (command.equals("type") || command.equals("channel")) {
}
else if (command.equals("type") || command.equals("channel")) {
if (ruleCommands.contains("more")) {
firstCommand = "more";
} else if (ruleCommands.contains("skip")) {
}
else if (ruleCommands.contains("skip")) {
firstCommand = "skip";
}
}

View File

@ -45,7 +45,8 @@ public class TailEpsilonRemover extends ATNVisitor {
// skip over q
if (p.transition(0) instanceof RuleTransition) {
((RuleTransition) p.transition(0)).followState = r;
} else {
}
else {
p.transition(0).target = r;
}
_atn.removeState(q);

View File

@ -221,7 +221,6 @@ public abstract class Target {
switch (escapedCodePoint) {
// Pass through any escapes that Java also needs
//
case '"':
case 'n':
case 'r':
case 't':
@ -239,7 +238,8 @@ public abstract class Target {
toAdvance++;
}
toAdvance++;
} else {
}
else {
toAdvance += 4;
}
String fullEscape = is.substring(i, i + toAdvance);
@ -250,19 +250,23 @@ public abstract class Target {
default:
if (shouldUseUnicodeEscapeForCodePointInDoubleQuotedString(escapedCodePoint)) {
appendUnicodeEscapedCodePoint(escapedCodePoint, sb);
} else {
}
else {
sb.appendCodePoint(escapedCodePoint);
}
break;
}
} else {
}
else {
if (codePoint == 0x22) {
// ANTLR doesn't escape " in literal strings,
// but every other language needs to do so.
sb.append("\\\"");
} else if (shouldUseUnicodeEscapeForCodePointInDoubleQuotedString(codePoint)) {
}
else if (shouldUseUnicodeEscapeForCodePointInDoubleQuotedString(codePoint)) {
appendUnicodeEscapedCodePoint(codePoint, sb);
} else {
}
else {
sb.appendCodePoint(codePoint);
}
}

View File

@ -17,7 +17,8 @@ public abstract class UnicodeEscapes {
// to int before passing to the %X formatter or else it throws.
sb.append(String.format("\\u%04X", (int)Character.highSurrogate(codePoint)));
sb.append(String.format("\\u%04X", (int)Character.lowSurrogate(codePoint)));
} else {
}
else {
sb.append(String.format("\\u%04X", codePoint));
}
}
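// Example added for clarity (not part of the original change): for code point 0x1F4A9 the
// supplementary branch above emits the surrogate pair "\uD83D\uDCA9", while a BMP code
// point such as 0x0404 emits "\u0404".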
@ -25,7 +26,8 @@ public abstract class UnicodeEscapes {
static public void appendPythonStyleEscapedCodePoint(int codePoint, StringBuilder sb) {
if (Character.isSupplementaryCodePoint(codePoint)) {
sb.append(String.format("\\U%08X", codePoint));
} else {
}
else {
sb.append(String.format("\\u%04X", codePoint));
}
}
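// Example added for clarity (not part of the original change): the Python-style form
// emits "\U0001F4A9" for code point 0x1F4A9 and "\u0404" for 0x0404.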

View File

@ -40,9 +40,11 @@ public class CSharpTarget extends Target {
String formatted;
if (v >= 0 && v < targetCharValueEscape.length && targetCharValueEscape[v] != null) {
formatted = targetCharValueEscape[v];
} else if (v >= 0x20 && v < 127 && (v < '0' || v > '9') && (v < 'a' || v > 'f') && (v < 'A' || v > 'F')) {
}
else if (v >= 0x20 && v < 127 && (v < '0' || v > '9') && (v < 'a' || v > 'f') && (v < 'A' || v > 'F')) {
formatted = Character.toString((char)v);
} else {
}
else {
formatted = String.format("\\x%X", v & 0xFFFF);
}
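// Note added for clarity (an assumption, not stated in the original change): hex digits
// are excluded from verbatim output above presumably because C#'s \x escape is
// variable-length, so a literal hex digit right after an escaped character could be
// absorbed into it; such values are emitted as escapes instead, e.g. 'A' (0x41) becomes
// "\x41" and 0x2028 becomes "\x2028".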

View File

@ -124,7 +124,8 @@ public class SwiftTarget extends Target {
if (g.isLexer() && lexerAtnJSON == null) {
lexerAtnJSON = getLexerOrParserATNJson(g, fileName);
} else if (!g.isLexer() && parserAtnJSON == null && g.atn != null) {
}
else if (!g.isLexer() && parserAtnJSON == null && g.atn != null) {
parserAtnJSON = getLexerOrParserATNJson(g, fileName);
}

View File

@ -93,7 +93,8 @@ public class GraphicsSupport {
job.print(doc, attributes);
out.close();
}
} else {
}
else {
// parrt: works with [image/jpeg, image/png, image/x-png, image/vnd.wap.wbmp, image/bmp, image/gif]
Rectangle rect = comp.getBounds();
BufferedImage image = new BufferedImage(rect.width, rect.height,

View File

@ -20,11 +20,8 @@ import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.atn.PredictionMode;
import javax.print.PrintException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
@ -163,10 +160,11 @@ public class TestRig {
CharStream charStream;
if ( charset.equals(StandardCharsets.UTF_8)) {
charStream = CharStreams.createWithUTF8Stream(System.in);
} else {
}
else {
try ( InputStreamReader r = new InputStreamReader(System.in, charset) ) {
charStream = new ANTLRInputStream(r);
}
}
}
process(lexer, parserClass, parser, charStream);
return;
@ -175,10 +173,11 @@ public class TestRig {
CharStream charStream;
if ( charset.equals(StandardCharsets.UTF_8) ) {
charStream = CharStreams.createWithUTF8(Paths.get(inputFile));
} else {
}
else {
try ( InputStreamReader r = new InputStreamReader(System.in, charset) ) {
charStream = new ANTLRInputStream(r);
}
}
}
if ( inputFiles.size()>1 ) {
System.err.println(inputFile);

View File

@ -159,7 +159,8 @@ public class TreeViewer extends JComponent {
double ctrly2 = y1;
c.setCurve(x1, y1, ctrlx1, ctrly1, ctrlx2, ctrly2, x2, y2);
((Graphics2D) g).draw(c);
} else {
}
else {
g.drawLine((int) x1, (int) y1,
(int) x2, (int) y2);
}

View File

@ -31,17 +31,12 @@ public class CharSupport {
ANTLRLiteralEscapedCharValue['b'] = '\b';
ANTLRLiteralEscapedCharValue['f'] = '\f';
ANTLRLiteralEscapedCharValue['\\'] = '\\';
ANTLRLiteralEscapedCharValue['\''] = '\'';
ANTLRLiteralEscapedCharValue['"'] = '"';
ANTLRLiteralEscapedCharValue['-'] = '-';
ANTLRLiteralEscapedCharValue[']'] = ']';
ANTLRLiteralCharValueEscape['\n'] = "\\n";
ANTLRLiteralCharValueEscape['\r'] = "\\r";
ANTLRLiteralCharValueEscape['\t'] = "\\t";
ANTLRLiteralCharValueEscape['\b'] = "\\b";
ANTLRLiteralCharValueEscape['\f'] = "\\f";
ANTLRLiteralCharValueEscape['\\'] = "\\\\";
ANTLRLiteralCharValueEscape['\''] = "\\'";
}
/** Return a string representing the escaped char for code c. E.g., If c
@ -68,7 +63,8 @@ public class CharSupport {
}
if (c <= 0xFFFF) {
return String.format("\\u%04X", c);
} else {
}
else {
return String.format("\\u{%06X}", c);
}
}
@ -103,7 +99,8 @@ public class CharSupport {
return null; // invalid escape sequence.
}
}
} else {
}
else {
for (end = i + 2; end < i + 6; end++) {
if ( end>n ) return null; // invalid escape sequence.
char charAt = literal.charAt(end);
@ -137,10 +134,10 @@ public class CharSupport {
case 2:
if ( cstr.charAt(0)!='\\' ) return -1;
// '\x' (antlr lexer will catch invalid char)
if ( Character.isDigit(cstr.charAt(1)) ) return -1;
int escChar = cstr.charAt(1);
char escChar = cstr.charAt(1);
if (escChar == '\'') return escChar; // escape quote only in string literals.
int charVal = ANTLRLiteralEscapedCharValue[escChar];
if ( charVal==0 ) return -1;
if (charVal == 0) return -1;
return charVal;
case 6:
// '\\u1234' or '\\u{12}'
@ -150,7 +147,8 @@ public class CharSupport {
if ( cstr.charAt(2) == '{' ) {
startOff = 3;
endOff = cstr.indexOf('}');
} else {
}
else {
startOff = 2;
endOff = cstr.length();
}
@ -163,18 +161,18 @@ public class CharSupport {
}
}
private static int parseHexValue(String cstr, int startOff, int endOff) {
public static int parseHexValue(String cstr, int startOff, int endOff) {
if (startOff < 0 || endOff < 0) {
return -1;
}
String unicodeChars = cstr.substring(startOff, endOff);
int result = -1;
try {
result = Integer.parseInt(unicodeChars, 16);
}
catch (NumberFormatException e) {
}
return result;
int result = -1;
try {
result = Integer.parseInt(unicodeChars, 16);
}
catch (NumberFormatException e) {
}
return result;
}
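// Usage sketch added for clarity (not part of the original change): for the literal
// "\\uABCD", parseHexValue(literal, 2, 6) returns 0xABCD; for "\\u{10ABCD}",
// parseHexValue(literal, 3, literal.indexOf('}')) returns 0x10ABCD.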
public static String capitalize(String s) {

View File

@ -0,0 +1,169 @@
/*
* Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
* Use of this file is governed by the BSD 3-clause license that
* can be found in the LICENSE.txt file in the project root.
*/
package org.antlr.v4.misc;
import org.antlr.v4.runtime.misc.IntervalSet;
import org.antlr.v4.unicode.UnicodeData;
import java.util.Objects;
/**
* Utility class to parse escapes like:
* \\n
* \\uABCD
* \\u{10ABCD}
* \\p{Foo}
* \\P{Bar}
*/
public abstract class EscapeSequenceParsing {
public static class Result {
public enum Type {
INVALID,
CODE_POINT,
PROPERTY
};
public static Result INVALID = new Result(Type.INVALID, -1, IntervalSet.EMPTY_SET, -1);
public final Type type;
public final int codePoint;
public final IntervalSet propertyIntervalSet;
public final int parseLength;
public Result(Type type, int codePoint, IntervalSet propertyIntervalSet, int parseLength) {
this.type = type;
this.codePoint = codePoint;
this.propertyIntervalSet = propertyIntervalSet;
this.parseLength = parseLength;
}
@Override
public String toString() {
return String.format(
"%s type=%s codePoint=%d propertyIntervalSet=%s parseLength=%d",
super.toString(),
type,
codePoint,
propertyIntervalSet,
parseLength);
}
@Override
public boolean equals(Object other) {
if (!(other instanceof Result)) {
return false;
}
Result that = (Result) other;
if (this == that) {
return true;
}
return Objects.equals(this.type, that.type) &&
Objects.equals(this.codePoint, that.codePoint) &&
Objects.equals(this.propertyIntervalSet, that.propertyIntervalSet) &&
Objects.equals(this.parseLength, that.parseLength);
}
@Override
public int hashCode() {
return Objects.hash(type, codePoint, propertyIntervalSet, parseLength);
}
}
/**
* Parses a single escape sequence starting at {@code startOff}.
*
* Returns {@link Result#INVALID} if no valid escape sequence was found, otherwise a {@link Result} describing the parsed escape.
*/
public static Result parseEscape(String s, int startOff) {
int offset = startOff;
if (offset + 2 > s.length() || s.codePointAt(offset) != '\\') {
return Result.INVALID;
}
// Move past backslash
offset++;
int escaped = s.codePointAt(offset);
// Move past escaped code point
offset += Character.charCount(escaped);
if (escaped == 'u') {
// \\u{1} is the shortest we support
if (offset + 3 > s.length()) {
return Result.INVALID;
}
int hexStartOffset;
int hexEndOffset;
if (s.codePointAt(offset) == '{') {
hexStartOffset = offset + 1;
hexEndOffset = s.indexOf('}', hexStartOffset);
if (hexEndOffset == -1) {
return Result.INVALID;
}
offset = hexEndOffset + 1;
}
else {
if (offset + 4 > s.length()) {
return Result.INVALID;
}
hexStartOffset = offset;
hexEndOffset = offset + 4;
offset = hexEndOffset;
}
int codePointValue = CharSupport.parseHexValue(s, hexStartOffset, hexEndOffset);
if (codePointValue == -1 || codePointValue > Character.MAX_CODE_POINT) {
return Result.INVALID;
}
return new Result(
Result.Type.CODE_POINT,
codePointValue,
IntervalSet.EMPTY_SET,
offset - startOff);
}
else if (escaped == 'p' || escaped == 'P') {
// \p{L} is the shortest we support
if (offset + 3 > s.length() || s.codePointAt(offset) != '{') {
return Result.INVALID;
}
int openBraceOffset = offset;
int closeBraceOffset = s.indexOf('}', openBraceOffset);
if (closeBraceOffset == -1) {
return Result.INVALID;
}
String propertyName = s.substring(openBraceOffset + 1, closeBraceOffset);
IntervalSet propertyIntervalSet = UnicodeData.getPropertyCodePoints(propertyName);
if (propertyIntervalSet == null) {
return Result.INVALID;
}
offset = closeBraceOffset + 1;
if (escaped == 'P') {
propertyIntervalSet = propertyIntervalSet.complement(IntervalSet.COMPLETE_CHAR_SET);
}
return new Result(
Result.Type.PROPERTY,
-1,
propertyIntervalSet,
offset - startOff);
}
else if (escaped < CharSupport.ANTLRLiteralEscapedCharValue.length) {
int codePoint = CharSupport.ANTLRLiteralEscapedCharValue[escaped];
if (codePoint == 0) {
if (escaped != ']' && escaped != '-') { // escape ']' and '-' only in char sets.
return Result.INVALID;
}
else {
codePoint = escaped;
}
}
return new Result(
Result.Type.CODE_POINT,
codePoint,
IntervalSet.EMPTY_SET,
offset - startOff);
}
else {
return Result.INVALID;
}
}
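// Usage sketch added for clarity (not part of the original change), mirroring the unit
// tests: parseEscape("\\n", 0) yields a CODE_POINT result for '\n' with parseLength 2,
// parseEscape("\\u{10ABCD}", 0) yields a CODE_POINT result for 0x10ABCD with parseLength 10,
// and parseEscape("\\p{Deseret}", 0) yields a PROPERTY result covering code points
// 66560-66639 with parseLength 11.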
}

View File

@ -122,7 +122,8 @@ public class Utils {
public static void setSize(List<?> list, int size) {
if (size < list.size()) {
list.subList(size, list.size()).clear();
} else {
}
else {
while (size > list.size()) {
list.add(null);
}

View File

@ -646,7 +646,7 @@ ESC_SEQ
// The standard escaped character set such as tab, newline,
// etc.
//
'b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'
'b'|'t'|'n'|'f'|'r'|'\''|'\\'
| // A Java style Unicode escape sequence
//
@ -664,6 +664,9 @@ ESC_SEQ
t.setLine(input.getLine());
t.setCharPositionInLine(input.getCharPositionInLine()-1);
grammarError(ErrorType.INVALID_ESCAPE_SEQUENCE, t);
if ( state.text==null ) {
setText(input.substring(state.tokenStartCharIndex, getCharIndex()-2));
}
}
)
;
@ -720,6 +723,9 @@ UNICODE_ESC
t.setLine(input.getLine());
t.setCharPositionInLine(input.getCharPositionInLine()-hCount-2);
grammarError(ErrorType.INVALID_ESCAPE_SEQUENCE, t);
if ( state.text==null ) {
setText(input.substring(state.tokenStartCharIndex, getCharIndex()-hCount-3));
}
}
}
;
@ -741,7 +747,10 @@ UNICODE_EXTENDED_ESC
t.setLine(input.getLine());
t.setCharPositionInLine(input.getCharPositionInLine()-numDigits);
grammarError(ErrorType.INVALID_ESCAPE_SEQUENCE, t);
}
if ( state.text==null ) {
setText(input.substring(state.tokenStartCharIndex, getCharIndex()-numDigits-3));
}
}
}
;

View File

@ -312,7 +312,8 @@ public class ScopeParser {
// do we see a matching '>' ahead? if so, hope it's a generic
// and not less followed by expr with greater than
p = _splitArgumentList(actionText, p + 1, '>', separatorChar, args);
} else {
}
else {
p++; // treat as normal char
}
break;

View File

@ -17,20 +17,20 @@ import org.antlr.v4.tool.ErrorManager;
import org.antlr.v4.tool.ErrorType;
import org.antlr.v4.tool.Grammar;
import org.antlr.v4.tool.LabelElementPair;
import org.antlr.v4.tool.LabelType;
import org.antlr.v4.tool.LeftRecursiveRule;
import org.antlr.v4.tool.LexerGrammar;
import org.antlr.v4.tool.Rule;
import org.antlr.v4.tool.ast.AltAST;
import org.antlr.v4.tool.ast.GrammarAST;
import org.antlr.v4.tool.LabelType;
import org.antlr.v4.tool.LeftRecursiveRule;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.ArrayList;
/** Check for symbol problems; no side-effects. Inefficient to walk rules
* and such multiple times, but I like isolating all error checking outside
@ -96,7 +96,8 @@ public class SymbolChecks {
nameNode = (GrammarAST) ampersandAST.getChild(0);
if (ampersandAST.getChildCount() == 2) {
name = nameNode.getText();
} else {
}
else {
scope = nameNode.getText();
name = ampersandAST.getChild(1).getText();
}
@ -107,7 +108,8 @@ public class SymbolChecks {
}
if (!scopeActions.contains(name)) {
scopeActions.add(name);
} else {
}
else {
errMgr.grammarError(ErrorType.ACTION_REDEFINITION,
g.fileName, nameNode.token, name);
}
@ -145,7 +147,8 @@ public class SymbolChecks {
List<LabelElementPair> list;
if (labelPairs.containsKey(labelName)) {
list = labelPairs.get(labelName);
} else {
}
else {
list = new ArrayList<>();
labelPairs.put(labelName, list);
}

View File

@ -289,7 +289,8 @@ public class DOTGenerator {
edgeST.add("arrowhead", arrowhead);
if (s.getNumberOfTransitions() > 1) {
edgeST.add("transitionIndex", i);
} else {
}
else {
edgeST.add("transitionIndex", false);
}
dot.add("edges", edgeST);

View File

@ -824,7 +824,7 @@ public enum ErrorType {
*
* @since 4.2.1
*/
INVALID_ESCAPE_SEQUENCE(156, "invalid escape sequence", ErrorSeverity.ERROR),
INVALID_ESCAPE_SEQUENCE(156, "invalid escape sequence", ErrorSeverity.WARNING),
/**
* Compiler Warning 157.
*
@ -1060,6 +1060,20 @@ public enum ErrorType {
*/
TOKEN_RANGE_IN_PARSER(181, "token ranges not allowed in parser: <arg>..<arg2>", ErrorSeverity.ERROR),
/**
* Compiler Error 182.
*
* <p>Unicode properties cannot be part of a lexer charset range.</p>
*
* <pre>
* A: [\\p{Letter}-\\p{Number}];
* </pre>
*/
UNICODE_PROPERTY_NOT_ALLOWED_IN_RANGE(
182,
"unicode property escapes not allowed in lexer charset range: <arg>",
ErrorSeverity.ERROR),
/*
* Backward incompatibility errors
*/