BailErrorStrategy is supposed to throw an error that's different from
the ordinary recognition error, specifically so that it can be handled
differently by client code. This was not ported over from Java correctly.
Fix this by moving parseCancellation from ANTLRError to ANTLRException,
adding its RecognitionException argument, and throwing it from the
two handlers in BailErrorStrategy.
Also remove ANTLRException.cannotInvokeStartRule, which is unused.
(The Java runtime uses it when ParseTreePatternMatcher throws a generic
exception, but we don't have that.)
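For reference, the intended shape is roughly as follows. This is a sketch
with simplified stand-in types (the real ANTLRException, RecognitionException,
and error-strategy signatures carry more state), and runParse is a
hypothetical helper, not runtime code:

    // Sketch with stand-in types: the point is the separate error case that
    // client code can catch independently of ordinary recognition errors.
    struct RecognitionException: Error {}

    enum ANTLRException: Error {
        case recognition(e: RecognitionException)
        case parseCancellation(e: RecognitionException)   // thrown by BailErrorStrategy
    }

    struct BailErrorStrategy {
        // Both handlers wrap the underlying error and bail out.
        func recover(_ e: RecognitionException) throws {
            throw ANTLRException.parseCancellation(e: e)
        }
    }

    // Client code can then treat a bail-out differently from a syntax error:
    func runParse(_ body: () throws -> Void) {
        do {
            try body()
        } catch ANTLRException.parseCancellation(let e) {
            print("parse cancelled: \(e)")    // e.g. retry with a tolerant strategy
        } catch {
            print("recognition error: \(error)")
        }
    }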
Remove pointless do block from LexerATNSimulator. This is a translation
from Java of a try/finally block, but we have the finally clause in a
defer block so we don't need the do block.
Change the initializer to ANTLRFileStream so that it throws any errors that
occur while reading the file. Previously, it was just dropping any errors on
the floor (inside Utils.readFile).
Remove Utils.readFile, it's not used anywhere else.
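The new initializer is along these lines (a sketch with a stand-in class; the
real class derives from ANTLRInputStream and the encoding parameter is
illustrative):

    import Foundation

    // Sketch: read the file eagerly and let any I/O error propagate to the
    // caller instead of being swallowed inside a helper.
    final class FileStreamSketch {
        let fileName: String
        let content: String

        init(_ fileName: String, encoding: String.Encoding = .utf8) throws {
            self.fileName = fileName
            // Throws if the file cannot be read, so the caller sees the error.
            self.content = try String(contentsOfFile: fileName, encoding: encoding)
        }
    }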
Fix initialization of {Lexer,Parser}Interpreter.decisionToDFA. These
were always being created as empty arrays, which would never work.
I don't know if anyone's using this code; presumably not.
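The fix amounts to building one DFA per decision in the ATN rather than
leaving the array empty. In stand-in form (hypothetical sketch types, not the
runtime's API):

    // Stand-in sketch: decisionToDFA needs one DFA per decision in the ATN,
    // each seeded with that decision's start state, indexed by decision number.
    struct DecisionStateSketch { let stateNumber: Int }
    struct DFASketch {
        let atnStartState: DecisionStateSketch
        let decision: Int
    }

    func makeDecisionToDFA(_ decisionStates: [DecisionStateSketch]) -> [DFASketch] {
        return decisionStates.enumerated().map { (i, state) in
            DFASketch(atnStartState: state, decision: i)
        }
    }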
Remove unused ATN.modeNameToStartState. In the Java runtime this is
only used by LexerATNFactory (i.e. during lexer generation) and we don't
have the equivalent in the Swift runtime at all.
Remove Recognizer.tokenTypeMapCache and .ruleIndexMapCache. These
were easily replaced in Swift with lazy vars. The inputs to these
two caches are fixed fields on the Recognizer (the Vocabulary and
the rule names respectively), so a lazy var suffices.
Note that these differed compared with the Java runtime -- they are
declared as static in Java and therefore the caches are shared across
all recognizer instances, but the Swift runtime had them as Recognizer
instance variables, which meant that at most we had a cache with one
entry which got destroyed along with the parser. Regardless, using
lazy vars is still simpler.
This removes the only usage of ArrayWrapper in the Swift runtime, so
delete that too.
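The pattern, in stand-in form (the real code builds the token-type map from
the Vocabulary and the rule-index map from the rule names):

    // Stand-in sketch: the inputs are fixed at init time, so a lazy var
    // computed once per instance replaces the old cache dictionaries.
    final class RecognizerSketch {
        private let ruleNames: [String]

        init(ruleNames: [String]) {
            self.ruleNames = ruleNames
        }

        // Computed on first access, then stored for the instance's lifetime.
        lazy var ruleIndexMap: [String: Int] = {
            var map = [String: Int]()
            for (index, name) in self.ruleNames.enumerated() {
                map[name] = index
            }
            return map
        }()
    }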
Make DFA.precedenceDfa be a "let" rather than a "var", and remove
setPrecedenceDfa. This field never varies after construction. The
code in setPrecedenceDfa was carried over from the Java runtime, but
it only threw an exception, and was deprecated. There's no need for
that in the Swift runtime.
Replace IntervalSet.setReadonly(Bool) with makeReadonly(). This
operation only ever works in one direction, and would throw an exception
if a caller attempted to make a read-only IntervalSet read-write again.
By changing the interface we remove the need to check this, and so we
don't need to declare the exception. Unlike in the Java runtime, we
need to declare the possibility of the exception at the callsite, so this
was just pointless clutter.
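In sketch form, the one-way API looks like this (stand-in type; the real
IntervalSet carries much more state):

    // Stand-in sketch: read-only is a latch that can only be set, never
    // cleared, so makeReadonly() has no failure case and does not throw.
    enum IntervalSetError: Error { case readonly }

    final class IntervalSetSketch {
        private(set) var isReadonly = false
        private var intervals = [ClosedRange<Int>]()

        func makeReadonly() {
            isReadonly = true
        }

        func add(_ range: ClosedRange<Int>) throws {
            guard !isReadonly else { throw IntervalSetError.readonly }
            intervals.append(range)
        }
    }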
Make ParseTree, RuleNode, and TerminalNode be protocols rather than
classes. These had no useful functionality (which is not surprising,
since they are interfaces in the Java implementation) so there is
no need for them to be classes. This reduces the depth of the inheritance
tree.
Add a subscript getter to ParseTree (and corresponding implementations in
the concrete classes). This has two advantages over Tree.getChild(_: Int):
it can be declared to return ParseTree rather than Tree, and it can fault
on index-out-of-range rather than returning nil. Note that covariant
specialization of the return type is not supported through protocols in Swift
yet (https://bugs.swift.org/browse/SR-522). This means that ParseTree
cannot specialize Tree.getChild()'s return type in the way that the Java
implementation does.
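A stand-in sketch of the shape (the real protocols and classes have many more
members):

    // Stand-in sketch: ParseTree is now a protocol refining Tree, and the new
    // subscript is typed as ParseTree and traps on an out-of-range index.
    protocol TreeSketch {
        func getChild(_ i: Int) -> TreeSketch?
        func getChildCount() -> Int
    }

    protocol ParseTreeSketch: TreeSketch {
        subscript(index: Int) -> ParseTreeSketch { get }
    }

    final class RuleNodeSketch: ParseTreeSketch {
        var children = [ParseTreeSketch]()

        func getChild(_ i: Int) -> TreeSketch? {
            guard i >= 0 && i < children.count else { return nil }
            return children[i]
        }

        func getChildCount() -> Int {
            return children.count
        }

        subscript(index: Int) -> ParseTreeSketch {
            return children[index]   // faults on index-out-of-range, by design
        }
    }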
Remove the return value from addChild / addErrorNode / addAnyChild.
This kind of chaining where a function returns its parameter does not fit
well with Swift's generics / protocols model.
Change ParserRuleContext.exception to be RecognitionException?
rather than AnyObject!. I don't know why it was declared that
way because the Java code uses RecognitionException.
Remove ParserRuleContext.addChild(Token) and addErrorNode(Token).
These are deprecated in the Java code and there was no need to
bring them over to the Swift runtime.
Fix ParserRuleContext.toInfoString, which was mangled when it was
ported from Java.
Various other tidyups: removal of useless type annotations, use of
if let, etc.
Remove some functions that are no longer used, and update the
rest to Swift 4's String API. lastIndexOf changes to lastIndex(of: ),
matching the standard library naming conventions, and returns a
String.Index? instead of an Int.
Add an implementation of Substring.hasPrefix for Linux; this
is in the Apple standard library but not the Linux one.
https://bugs.swift.org/browse/SR-5627
Add unit tests for StringExtension.
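The shim is along these lines (a sketch; the real extension may differ in
detail):

    #if os(Linux)
    extension Substring {
        // Substring.hasPrefix(_:) is available via the Apple standard library
        // overlay but not on Linux (SR-5627), so provide a simple equivalent.
        func hasPrefix(_ prefix: String) -> Bool {
            return starts(with: prefix)
        }
    }
    #endif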
Bump the Swift download for the Travis Linux tests from 4.0
to 4.0.2. There is a bug in Substring.range(of:) in 4.0.0
(https://bugs.swift.org/browse/SR-5663) that we need to avoid.
In Swift 4, Strings have a set of sequence operations that we can use, so
that we don't need our String extensions. Tidy up a bunch of places where
the code has been converted from Java through Swift 3 and Swift 4 and has
become a mess.
Remove the Java-style toString() methods; in Swift, description serves the
same purpose. All of these either just forwarded to or replicated the
description implementation, except for PredicateTransition, which now
implements CustomStringConvertible.
The affected code now uses a String parameter rather than a StringBuilder.
Tidy up the rest of the class on the way through.
This is the last use of StringBuilder, so we can remove that class entirely.
Remove the uses of StringBuilder where it is simply accumulating a String
for us. In Swift we can use a var String for this; there is no need for
a StringBuilder class like in Java.
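The replacement pattern is trivial; for example (hypothetical helper, just to
show the shape):

    // Before (ported Java style):
    //   let buf = StringBuilder()
    //   buf.append("[").append(name).append("]")
    //   return buf.toString()
    //
    // After: a plain mutable String, no helper class required.
    func bracketed(_ name: String) -> String {
        var buf = ""
        buf += "["
        buf += name
        buf += "]"
        return buf
    }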
Fix the parsing inside ParseTreePatternMatcher.split. It was trivially
broken in a number of ways, with bugs that aren't in the Java version
that it was ported from, so it's obviously never been run before.
This adds unit tests for ParseTreePatternMatcher.split, and makes Chunk
implement Equatable so that we can compare Chunk instances in the tests.
Tidy up the description implementations at the same time.
Remove lots of unnecessary type annotations, replace unnecessarily
complicated static initializers, and use "if let" and "guard let" to remove
lots of casting.
Bring together a couple of hundred lines of copy-paste code between
the deserialize and deserializeFromJson paths.
Fix some obvious bugs in the deserialize path. This code is entirely unused;
we use deserializeFromJson in the autogenerated parsers. I'm inclined to
remove deserialize since it was so broken, but I'm leaving it for now, in
case someone needs compatibility with ATNs from different language targets
and wants to fix it.
The implementation here previously just tried to make a UUID from the empty
string.
Remove the unused UUID.toUUID. It was broken too.
Rename the file that this was in, since NSUUID and Foundation.UUID are not
the same thing.
The call to stream.read needs to use buffer.count, not buffer.capacity,
as the maxLength. Otherwise, some bytes get dropped on the floor and the
stream is corrupted.
Remove the code that padded self.data back up to its previous capacity when
copying data at the end of release. This came over from the Java port, but
I don't think it makes sense in Swift, given the copy-on-write Array
value semantics. Instead, just copy the tail of the buffer if there is
anything left to read (i.e. self.data gets smaller), and when there is
nothing left in the buffer to read, reset to the specified bufferSize
(i.e. self.data goes back to the specified self.bufferSize).
Remove debug print statement that was accidentally left in.
- Add "explicit" to Interval(size_t, size_t) constructor.
- Change an IntervalSet constructor to delegate part of the construction to
another constructor.
- Add "explicit" to Interval(size_t, size_t) constructor.
- Change an IntervalSet constructor to delegate part of the construction
RecognitionException holds references to the recognizer and the
token stream that triggered the error.
These are useful for error diagnostics, but if client code wants to throw
the RecognitionException but discard the parser and token stream, then
the fields in RecognitionException need to be cleared.
This adds RecognitionException.{clearRecognizer,clearInputStream} so that
client code can clear those fields if desired. It also makes
RecognitionException.ctx weak, so it will go nil at the same time as
the parser is discarded.
This was causing all the tokens, streams, and lexers to be retained. The
primary cycle was because of the backreference at CommonToken.source, and
the fact that the token streams buffer the tokens that they create.
Fix this by replacing the use of a (TokenSource?, CharStream?) pair with
TokenSourceAndStream, which does the same job but references its fields
weakly. This means that Token.getTokenSource() and Token.getInputStream()
will return valid values as long as you retain the lexer / stream elsewhere,
but a Token won't itself retain those things.
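In stand-in form, the change looks like this (simplified sketch types; the
real pair lives on CommonToken and is consulted by getTokenSource() and
getInputStream()):

    // Stand-in sketch: hold the token source and char stream weakly so a
    // Token never keeps the lexer or its input alive by itself.
    final class TokenSourceSketch {}   // stands in for TokenSource (the lexer)
    final class CharStreamSketch {}    // stands in for CharStream

    final class TokenSourceAndStreamSketch {
        weak var source: TokenSourceSketch?
        weak var stream: CharStreamSketch?

        init(_ source: TokenSourceSketch?, _ stream: CharStreamSketch?) {
            self.source = source
            self.stream = stream
        }
    }

    final class TokenSketch {
        let origin: TokenSourceAndStreamSketch

        init(_ origin: TokenSourceAndStreamSketch) {
            self.origin = origin
        }

        // Non-nil only while the lexer / stream are retained somewhere else.
        func getTokenSource() -> TokenSourceSketch? { return origin.source }
        func getInputStream() -> CharStreamSketch? { return origin.stream }
    }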
This was causing the entire parser to be retained, resulting in a large
memory leak.
This fix simply changes the reference from ParserATNSimulator to Parser
to be unowned.
Ditto between Lexer and LexerATNSimulator, except this reference is made
weak because LexerATNSimulator.recog is nullable. (That difference is
dubious IMHO, but I'm leaving it intact for now.)
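In sketch form, the two back-references become (stand-in types):

    // Stand-in sketch: the parser owns its ATN simulator, so the simulator's
    // back-reference must not also own the parser.
    final class ParserSketch {
        var interpreter: ParserATNSimulatorSketch?

        init() {
            interpreter = ParserATNSimulatorSketch(parser: self)
        }
    }

    final class ParserATNSimulatorSketch {
        // unowned: non-optional, and the simulator never outlives its parser.
        unowned let parser: ParserSketch

        init(parser: ParserSketch) {
            self.parser = parser
        }
    }

    final class LexerATNSimulatorSketch {
        // weak: the lexer-side recognizer reference is nullable in the
        // runtime, so an optional weak reference is used instead of unowned.
        weak var recog: AnyObject?
    }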
This is a port of the equivalent code in the Java runtime.
This required a change to the CharStream interface: getText was documented
as throwing exceptions, but it wasn't actually declared as such. The
UnbufferedCharStream.getText implementation throws exceptions (in order to
match the semantics of the Java implementation), so this declaration is now
needed, and callsites need to be adjusted appropriately.
These classes throw exceptions if the instance is read-only, and only in
that case. This means that there is no need for us to propagate exception
declarations in the cases where we have guaranteed by construction
that the instance is writable. In particular, this means that IntervalSet
and ATNConfigSet's constructors won't throw exceptions(!) The set operations
that return a new set (e.g. complement) no longer throw either.
To help with this, this changeset adds BitSet.firstSetBit(). This is
equivalent to BitSet.nextSetBit(0), but is guaranteed not to throw an
exception.
As a consequence, ANTLRErrorListener / DiagnosticErrorListener no longer
throw exceptions through any of their functions (syntaxError and report*),
and DefaultErrorStrategy can no longer throw exceptions as part of its
internal operations (though of course it can still throw exceptions if
recovery fails and a real parsing error needs to be reported).
Also, LL1Analyzer no longer throws exceptions at all, and so ATN.nextTokens
doesn't throw either.
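A minimal sketch of the BitSet.firstSetBit() idea mentioned above, assuming a
word-array-backed bit set (stand-in type, not the runtime's):

    // Sketch: unlike nextSetBit(_ fromIndex:), there is no index argument to
    // validate, so there is no error path and nothing to throw.
    struct BitSetSketch {
        private var words = [UInt64]()   // bit i lives in words[i / 64]

        mutating func set(_ bit: Int) {
            let word = bit / 64
            while words.count <= word {
                words.append(0)
            }
            words[word] |= UInt64(1) << (bit % 64)
        }

        func firstSetBit() -> Int {
            for (index, word) in words.enumerated() where word != 0 {
                return index * 64 + word.trailingZeroBitCount
            }
            return -1   // no bit set (mirrors the Java convention)
        }
    }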
This removes the generic parameter on RecognitionException, to make it
easier to handle them. This means that we no longer need to store them as
AnyObject and cast them back again. To do this, we add RecognizerProtocol,
which is a non-generic equivalent of the Recognizer interface (at least, the
parts of it that we need for error handling).
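The shape is roughly as follows (stand-in sketch; the member list shown is
illustrative, not the runtime's exact protocol):

    // Stand-in sketch: a non-generic view of the recognizer exposing only
    // what error handling needs, so RecognitionException can hold it directly
    // instead of storing AnyObject and casting it back.
    protocol RecognizerProtocolSketch {
        func getState() -> Int
        func getRuleNames() -> [String]
    }

    final class RecognitionExceptionSketch: Error {
        private let recognizer: RecognizerProtocolSketch?

        init(_ recognizer: RecognizerProtocolSketch?) {
            self.recognizer = recognizer
        }

        func getRecognizer() -> RecognizerProtocolSketch? {
            return recognizer
        }
    }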
Remove all paths where the RecognitionException subclasses were throwing
exceptions in their initializers. This is just insane.
This has been ported over from the Java code, but it was deprecated there.
There's no point having it in the Swift runtime because we don't have the
legacy code to support. Also, it wasn't implemented properly, so it
never worked.
Remove {DFA,IntervalSet}.toString(_:[String?]?)
and the inits in ParserInterpreter and DFASerializer for the same reason.
Switch the unit tests to use the alternate toString(_:Vocabulary).
This fixes some hangovers from the port from Java:
* unnecessary type annotations;
* failure to use "if let" for nil checks;
* comments with Java code in them;
* a couple of fields that should have been declared private;
* some whitespace issues.
No semantic change.
These were ported over from the Java runtime, but they were all deprecated
there, and were commented as such here. There is no point having them in
the Swift runtime because we don't have legacy code to support.
Use Swift's overflowing operators rather than multipliedReportingOverflow
etc.
Use UInt32 for the hash values. This matches how MurmurHash3 is generally
defined (e.g. on Wikipedia).
Add support for decoding Strings (UTF-8, then little-endian) and hashing
the resultant UInt32 values.
Add a test set, using test patterns from Ian Boyd (public domain).
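For reference, the 32-bit mixing rounds with the overflowing operators look
like this (a sketch of the standard MurmurHash3 steps with illustrative
function names, not a copy of the runtime file):

    // Sketch of MurmurHash3 (32-bit) mixing using UInt32 state and Swift's
    // overflowing operators (&*, &+) instead of multipliedReportingOverflow.
    func murmurUpdate(_ hash: UInt32, _ value: UInt32) -> UInt32 {
        var k = value &* 0xCC9E_2D51
        k = (k << 15) | (k >> 17)          // rotl32(k, 15)
        k = k &* 0x1B87_3593
        var h = hash ^ k
        h = (h << 13) | (h >> 19)          // rotl32(h, 13)
        return h &* 5 &+ 0xE654_6B64
    }

    func murmurFinish(_ hash: UInt32, wordCount: Int) -> UInt32 {
        var h = hash ^ (UInt32(wordCount) &* 4)   // total length in bytes
        h ^= h >> 16
        h = h &* 0x85EB_CA6B
        h ^= h >> 13
        h = h &* 0xC2B2_AE35
        h ^= h >> 16
        return h
    }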
Remove a number of generic type constraints, since these can now
be inferred by the compiler.
Match the syntax change when passing a tuple into a function (adding
an extra set of parens).
Change filterPrecedencePredicates to avoid a now-illegal cast.
Match the renames truncatingBitPattern -> truncatingIfNeeded,
multiplyWithOverflow -> multipliedReportingOverflow, etc. In
some cases the multiplyWithOverflow calls are replaced by
overflowing operators (e.g. &*) instead.
This test is run by `go test`. Also add tests and testing utils.
Note: `github.com/stretchr/testify/assert` is required. This assert library
provides roughly the same functionality as Java's assert.
Since the install target installs static and shared libs into the same
folder, and because on Windows a shared lib also outputs a .lib file to
link against, we need to make sure the static and shared .lib files do not
clobber each other.
ANTLR parsers in Java are allowed to access the number of encountered
syntax errors via the getNumberOfSyntaxErrors method. However, the
Python variants must use the protected _syntaxErrors member to get this
value. The patch defines the same getter for Python targets too.
Setting ATNConfig properties can change the hash code of the instance, leading
to cases where the closureBusy set places objects in the wrong buckets. While
this has not led to known cases of stack overflow, it has led to cases where
one or more buckets contains a large number of duplicate objects, and the set's
add operation goes from O(1) to O(n).
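The issue is general to hashed containers; a small Swift illustration (not
the runtime code, which is Java) of why mutating a member's hash after
insertion strands it in the wrong bucket:

    // Illustration: once an object is in a Set, changing a property that
    // feeds its hash leaves it filed under the old bucket.
    final class ConfigSketch: Hashable {
        var alt: Int
        init(_ alt: Int) { self.alt = alt }

        var hashValue: Int { return alt }
        static func == (lhs: ConfigSketch, rhs: ConfigSketch) -> Bool {
            return lhs.alt == rhs.alt
        }
    }

    var closureBusy = Set<ConfigSketch>()
    let c = ConfigSketch(1)
    closureBusy.insert(c)

    c.alt = 2                        // hash changes while c is already in the set
    print(closureBusy.contains(c))   // usually false: c sits in the bucket for hash 1
    // The fix is to finish setting the hash-relevant properties before the
    // config is added, or to add an object whose hash cannot change.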
When compiling under gcc, ANTLR4CPP_PUBLIC macro expands to the following
gcc visibility attribute:
__attribute__((visibility ("default")))
(when compiling under Windows it expands to the corresponding __declspec
attribute)
This change was introduced in commit 8ff852640a
Although the attribute makes perfect sense when applied to a "class"
declaration, it makes no sense (has no effect) when applied to an
"enum class" declaration. I assume that doing so was unintentional; that
when the change was introduced it was it was added mechanically to all
"class XXX" instances in the source code, a process which accidentally
picked up one "enum class XXX" instance.
Although it has no effect on the object code, it leads to the following
warning when compiling under gcc:
/usr/local/include/antlr4-runtime/atn/PredictionMode.h:18:31: error: type attributes ignored after type is already defined [-Werror=attributes]
enum class ANTLR4CPP_PUBLIC PredictionMode {
This is a problem for people who would like their builds to be warning-free.
Happily, this declaration can be safely removed. The "enum class" construct
(just like a regular enum) does not cause any linker symbols to be emitted,
so having a linker attribute on the type has no effect.
The TokenStreamRewriter implementation was missing.
Ported the code from the Java version; however, there are a couple of
deviations due to the difference between composition (Go) and inheritance
(Java).
Ported the tests from Swift for LexerA.
Iterators on an unordered_map were being dereferenced after dropping a
read lock, leading to races where iterators could be invalidated before
they were used.
- Remove the readonly status from IntervalSet.
- Remove virtual functions from IntervalSet and Interval. These are
passed by value throughout the C++ runtime; meaningful inheritance is
not possible anyway.
- Move the atomic flag into ATNState as a "now cached" flag.
- Return a const reference from ATN::nextTokens(ATNState*) so the read-only
status is enforced by the compiler rather than at runtime.
- Use value semantics with std::move to reduce the number of copies performed,
consistent with how these classes are used in the C++ runtime source.
- Remove the type-unsafe varargs constructor in IntervalSet, and replace it
with a type-safe variadic templates implementation.
This is a proposed fix for bug #1826, which removes a race condition where
multiple threads could update ATNState::nextTokenWithinRule, leading to
corrupted std::vector instances in an IntervalSet.
ATN::nextTokens(ATNState* s) updates s->nextTokenWithinRule if the
IntervalSet is empty, and then sets it to be read only. However, if the
updated IntervalSet value was also empty, it becomes a read-only empty
set, causing an exception on a second call on the same state.
This was exposed by a change I made to make IntervalSet::operator=()
respect the _readonly flag. (Which in turn was found by compiling with a
high warning level.)
The approach in this update is to perform the update if the updated
value is not empty or if the current value is not read-only. This
preserves the previous behaviour of creating a read-only empty set that
keeps working on subsequent calls. It will throw on an attempt to update a
read-only value, where previously the read-only value would have been
silently discarded and made updatable again.
The Travis CI build is failing after an include of <cstddef> -- This is
an attempt to work around that by including <stddef.h> instead. Problem
not apparent in my FreeBSD environment.
These changes are for compiling with high warning levels and -Werror.
There are no functional changes in this commit. Compiled with gcc 5.4
and clang 3.8.
Summary:
- Put virtual destructors into the appropriate .cpp file instead
of the inline version in the header to avoid many vtables.
- Change C-style casts to modern C++ casts.
- Add explicit casts in some signed to/from unsigned conversions.
- Remove unreached code in BufferedTokenStream.cpp and
LexerATNSimulator.cpp.
- Remove shadowed variables by qualifying constructor arguments with the
same name as a member variable.
- Add explicitly defined copy constructors and assignment operators
where required by gcc's -Weff-c++.
- Use std::numeric_limits<size_t>::max() instead of assigning a negative
number.
- Remove semi-colons after function definitions.
- Remove unnecessary casts.
- In preprocessor statements "#if label > value" change to
"#if defined(label) && label > value" to avoid warnings about the
undefined symbol being seen as zero.
- Remove ANTLR4CPP_PUBLIC from "enum class" definitions.
- Change the FinalAction move constructor to move instead of copy the
_cleanUp std::function object. (A side-effect of explicitly
initialising member variables as required by gcc's -Weff-c++. I turned
this one off because most constructors needed to be touched,
especially the classes implemented with InitializeInstanceFields()).
- Mark hex digit conversion functions as file static in guid.cpp.
Make ParseTreeWalker::DEFAULT an IterativeParseTreeWalker, as intended.
The existing code intended for ParseTreeWalker::DEFAULT to provide an
IterativeParseTreeWalker. However, the implementation initialized
ParseTreeWalker::DEFAULT by doing a (value) copy of an
ParseTreeWalker::DEFAULT by doing a (value) copy of an
IterativeParseTreeWalker, which sliced the object and therefore,
unfortunately, transformed it back into a regular ParseTreeWalker.
This change implements the desired behavior. Furthermore by making DEFAULT
a reference, we are able to preserve the interface to existing code.
Currently the JS runtime sometimes returns (and mangles) the global
`window` object instead of a proper InputStream. This is prevented by
using the `new` keyword in all cases.
- A wrong check for EOF has been corrected in the UnbufferedTokenStream (now using the correct data type for the cast to avoid warnings).
- The interpreter data write function no longer implicitly writes out imported grammars. Grammars are merged and hence contain everything from imported grammars already. If interpreter data for an imported grammar is required pass that grammar explicitly to the ANTLR tool.
Especially when you want to use LexerInterpreter and/or ParserInterpreter in any of the non-Java targets you have to provide the ATN and other data. The classes to generate these values are not in the runtime, however. Hence we need a way to tell ANTLR to produce that in a way that can be consumed by all targets.
This patch adds a new command line parameter (-interpreter) which causes ANTLR to parse the given grammars as usual and then generate a file for each grammar with the required interpreter values. A new InterpreterDataReader class has been added to the Java + C++ runtimes. This class can load the data file (a plain text file) and generate the structures that can directly be fed to the interpreters.
Starting with iOS 10, macOS 10.12, tvOS 10.0 and watchOS 3.0, Foundation contains
its own definition of String.contains(_:), which conflicts with the extension
provided by antlr.