token stream that triggered the error.
These are useful for error diagnostics, but if client code wants to throw
the RecognitionException but discard the parser and token stream, then
the fields in RecognitionException need to be cleared.
This adds RecognitionException.{clearRecognizer,clearInputStream} so that
client code can clear those fields if desired. It also makes
RecognitionException.ctx weak, so it will go nil at the same time as
the parser is discarded.
This was causing all the tokens, streams, and lexers to be retained. The
primary cycle was because of the backreference at CommonToken.source, and
the fact that the token streams buffer the tokens that they create.
Fix this by replacing the use of a (TokenSource?, CharStream?) pair with
TokenSourceAndStream, which does the same job but references its fields
weakly. This means that Token.getTokenSource() and Token.getInputStream()
will return valid values as long as you retain the lexer / stream elsewhere,
but a Token won't itself retain those things.
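For comparison, the same ownership shape can be sketched in C++, with
std::weak_ptr standing in for a Swift weak reference. DemoSource,
DemoToken and emitToken are hypothetical names used for illustration
only; they are not runtime types.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct DemoToken;

// Hypothetical stand-in for a token source that buffers what it creates.
struct DemoSource {
  std::vector<std::shared_ptr<DemoToken>> buffered;  // strong: source owns tokens
};

// Hypothetical stand-in for a token with a back-reference to its source.
struct DemoToken {
  std::weak_ptr<DemoSource> source;  // weak: token does not keep the source alive
};

// Create a token, record it in the source's buffer, and return it.
inline std::shared_ptr<DemoToken> emitToken(const std::shared_ptr<DemoSource>& src) {
  assert(src != nullptr);
  auto tok = std::make_shared<DemoToken>();
  tok->source = src;
  src->buffered.push_back(tok);
  return tok;
}

// Usage: the back-reference is valid only while something else owns the source.
inline void demoWeakBackReference() {
  auto src = std::make_shared<DemoSource>();
  auto tok = emitToken(src);
  assert(tok->source.lock() == src);  // source is still retained by `src`
  src.reset();                        // drop the only strong reference
  assert(tok->source.expired());      // the token did not keep it alive
}
```

With a strong back-reference in DemoToken instead, the source and its
buffered tokens would keep each other alive after src.reset().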
This was causing the entire parser to be retained, resulting in a large
memory leak.
This fix simply changes the reference from ParserATNSimulator to Parser
to be unowned.
Ditto between Lexer and LexerATNSimulator, except this reference is made
weak because LexerATNSimulator.recog is nullable. (That difference is
dubious IMHO, but I'm leaving it intact for now.)
This is a port of the equivalent code in the Java runtime.
This required a change to the CharStream interface: getText was documented
as throwing exceptions, but it wasn't actually declared as such. The
UnbufferedCharStream.getText implementation throws exceptions (in order to
match the semantics of the Java implementation), so this declaration is now
needed, and callsites need to be adjusted appropriately.
These classes throw exceptions if the instance is read-only, and only in
that case. This means that there is no need for us to propagate exception
declarations in the cases where we have guaranteed by construction
that the instance is writable. In particular, this means that IntervalSet
and ATNConfigSet's constructors won't throw exceptions(!) The set operations
that return a new set (e.g. complement) no longer throw either.
To help with this, this cset adds BitSet.firstSetBit(). This is equivalent
to BitSet.nextSetBit(0), but is guaranteed not to throw an exception.
As a consequence, ANTLRErrorListener / DiagnosticErrorListener no longer
throw exceptions through any of their functions (syntaxError and report*),
and DefaultErrorStrategy can no longer throw exceptions as part of its
internal operations (though of course it can still throw exceptions if
recovery fails and a real parsing error needs to be reported).
Also, LL1Analyzer no longer throws exceptions at all, and so ATN.nextTokens
doesn't throw either.
This removes the generic parameter on RecognitionException, to make it
easier to handle them. This means that we no longer need to store them as
AnyObject and cast them back again. To do this, we add RecognizerProtocol,
which is a non-generic equivalent of the Recognizer interface (at least, the
parts of it that we need for error handling).
Remove all paths where the RecognitionException subclasses were throwing
exceptions in their initializers. This is just insane.
This has been ported over from the Java code, but it was deprecated there.
There's no point having it in the Swift runtime because we don't have the
legacy code to support. Also, it wasn't implemented properly, so it
never worked.
Remove {DFA,IntervalSet}.toString(_:[String?]?)
and the inits in ParserInterpreter and DFASerializer for the same reason.
Switch the unit tests to use the alternate toString(_:Vocabulary).
This fixes some hangovers from the port from Java:
* unnecessary type annotations;
* failure to use "if let" for nil checks;
* comments with Java code in them;
* a couple of fields that should have been declared private;
* some whitespace issues.
No semantic change.
These were ported over from the Java runtime, but they were all deprecated
there, and were commented as such here. There is no point having them in
the Swift runtime because we don't have legacy code to support.
Use Swift's overflowing operators rather than multipliedReportingOverflow
etc.
Use UInt32 for the hash values. This matches how MurmurHash3 is generally
defined (e.g. on Wikipedia).
Add support for decoding Strings (UTF-8, then little-endian) and hashing
the resultant UInt32 values.
Add a test set, using test patterns from Ian Boyd (public domain).
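As a point of comparison, here is a sketch of the generic MurmurHash3
(x86, 32-bit) algorithm in C++, where uint32_t arithmetic wraps in the
same way as Swift's overflowing operators. This is the textbook
algorithm over bytes, not a port of the runtime's MurmurHash class; the
names rotl32, mixBlock and murmur3_32 are chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Rotate left; the shifts are well defined for 0 < r < 32.
inline std::uint32_t rotl32(std::uint32_t x, int r) {
  assert(r > 0 && r < 32);
  return (x << r) | (x >> (32 - r));
}

// Scramble one 32-bit block. uint32_t multiplication wraps modulo 2^32,
// which is what Swift's &* operator makes explicit.
inline std::uint32_t mixBlock(std::uint32_t k) {
  k *= 0xcc9e2d51u;
  k = rotl32(k, 15);
  k *= 0x1b873593u;
  return k;
}

// Generic MurmurHash3 (x86, 32-bit) over the bytes of `data`.
inline std::uint32_t murmur3_32(const std::string& data, std::uint32_t seed) {
  const std::size_t n = data.size();
  auto byteAt = [&data](std::size_t i) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(data[i]));
  };
  std::uint32_t h = seed;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    // Assemble four bytes into one little-endian word.
    std::uint32_t k = byteAt(i) | (byteAt(i + 1) << 8) |
                      (byteAt(i + 2) << 16) | (byteAt(i + 3) << 24);
    h ^= mixBlock(k);
    h = rotl32(h, 13);
    h = h * 5u + 0xe6546b64u;
  }
  // Fold in the 1 to 3 trailing bytes, lowest address in the lowest byte.
  if (n > i) {
    std::uint32_t tail = 0;
    for (std::size_t j = n; j > i; --j) {
      tail = (tail << 8) | byteAt(j - 1);
    }
    h ^= mixBlock(tail);
  }
  // Finalization: mix in the length, then avalanche.
  h ^= static_cast<std::uint32_t>(n);
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Usage: two empty-input checks that exercise only the finalization step.
inline void demoMurmur() {
  assert(murmur3_32("", 0u) == 0u);
  assert(murmur3_32("", 1u) == 0x514E28B7u);
}
```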
Remove a number of generic type constraints, since these can now
be inferred by the compiler.
Match the syntax change when passing a tuple into a function (adding
an extra set of parens).
Change filterPrecedencePredicates to avoid a now-illegal cast.
Match the renames truncatingBitPattern -> truncatingIfNeeded,
multiplyWithOverflow -> multipliedReportingOverflow, etc. In
some cases the multiplyWithOverflow calls are replaced by
overflowing operators (e.g. &*) instead.
This test is run by `go test`.
and also add test and testing utils.
Note: `github.com/stretchr/testify/assert` is required.
This assertion library provides almost the same functionality as Java's assert.
Since the install target installs static and shared libs into the same
folder, and because on Windows a shared lib also outputs a
.lib file to link against, we need to make sure the static and shared
.lib files do not clobber each other.
ANTLR parsers in Java are allowed to access the number of encountered
syntax errors via the getNumberOfSyntaxErrors method. However, the
Python variants must use the protected _syntaxErrors member to get this
value. The patch defines the same getter for Python targets too.
Setting ATNConfig properties can change the hash code of the instance, leading
to cases where the closureBusy set places objects in the wrong buckets. While
this has not led to known cases of stack overflow, it has led to cases where
one or more buckets contains a large number of duplicate objects, and the set's
add operation goes from O(1) to O(n).
When compiling under gcc, ANTLR4CPP_PUBLIC macro expands to the following
gcc visibility attribute:
__attribute__((visibility ("default")))
(when compiling under Windows it expands to the corresponding __declspec
attribute)
This change was introduced in commit 8ff852640a
Although the attribute makes perfect sense when applied to a "class"
declaration, it makes no sense (has no effect) when applied to an
"enum class" declaration. I assume that doing so was unintentional; that
when the change was introduced it was added mechanically to all
"class XXX" instances in the source code, a process which accidentally
picked up one "enum class XXX" instance.
Although it has no effect on the object code, it leads to the following
warning when compiling under gcc (promoted to an error here by -Werror):
/usr/local/include/antlr4-runtime/atn/PredictionMode.h:18:31: error: type attributes ignored after type is already defined [-Werror=attributes]
enum class ANTLR4CPP_PUBLIC PredictionMode {
This is a problem for people who would like their builds to be warning-free.
Happily, the attribute can be safely removed. The "enum class" construct
(just like a regular enum) does not cause any linker symbols to be
emitted, so a linker attribute on the type does not actually have any
effect.
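A minimal sketch of the distinction, assuming a hypothetical DEMO_PUBLIC
macro that follows the same pattern as ANTLR4CPP_PUBLIC. DemoRecognizer,
DemoMode and modeName are illustrative names, not runtime types.

```cpp
#include <cassert>
#include <string>

// DEMO_PUBLIC is a hypothetical export macro following the pattern above.
#if defined(_MSC_VER)
#  define DEMO_PUBLIC __declspec(dllexport)
#else
#  define DEMO_PUBLIC __attribute__((visibility("default")))
#endif

// A class with out-of-line members emits linker symbols, so the
// attribute is meaningful here.
class DEMO_PUBLIC DemoRecognizer {
public:
  int state() const;
};

int DemoRecognizer::state() const { return 7; }

// A scoped enum is only a set of compile-time constants. No symbols are
// emitted for it, so it carries no export attribute.
enum class DemoMode { Fast, Exact };

// Inline helper, so again nothing about the enum itself needs exporting.
inline std::string modeName(DemoMode mode) {
  switch (mode) {
    case DemoMode::Fast: return "fast";
    case DemoMode::Exact: return "exact";
  }
  assert(false && "unreachable: all enumerators are handled");
  return std::string();
}
```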
TokenStreamRewriter implementation was missing
Ported code from the Java version; however, there are a couple of deviations due to the difference between composition (Go) and inheritance (Java).
Ported tests from Swift for LexerA
Iterators on an unordered_map were being dereferenced after dropping a
read lock, leading to races where iterators could be invalidated before
they were used.
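The safe pattern can be sketched roughly as follows. DemoCache and its
members are hypothetical names, and a plain std::mutex stands in for the
runtime's lock. The point is only that the iterator is dereferenced
before the lock is released.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical shared cache guarded by a single lock.
class DemoCache {
public:
  void put(const std::string& key, int value) {
    std::lock_guard<std::mutex> lock(_mutex);
    _entries[key] = value;  // an insert may rehash and invalidate iterators
  }

  // Racy shape (not shown): return _entries.find(key) and let the caller
  // dereference it after the lock has been released.

  // Safe shape: dereference the iterator while the lock is still held and
  // hand back a copy of the value.
  bool get(const std::string& key, int& out) const {
    std::lock_guard<std::mutex> lock(_mutex);
    auto it = _entries.find(key);
    if (it == _entries.end()) return false;
    out = it->second;
    return true;
  }

private:
  mutable std::mutex _mutex;
  std::unordered_map<std::string, int> _entries;
};

// Usage: lookups never expose an iterator outside the locked region.
inline void demoCache() {
  DemoCache cache;
  cache.put("s0", 1);
  int value = 0;
  assert(cache.get("s0", value) && value == 1);
  assert(!cache.get("missing", value));
}
```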
- Remove the readonly status from IntervalSet.
- Remove virtual functions from IntervalSet and Interval. These are
passed by value throughout the C++ runtime; meaningful inheritance is
not possible anyway.
- Move the atomic flag into ATNState as a "now cached" flag.
- Return a const reference from ATN::nextStates(ATNState*) so the readonly
status is enforced by the compiler not at runtime in the code.
- Use value semantics using std::move to reduce the number of copies performed,
consistent with how these classes are used in the C++ runtime source.
- Remove type-unsafe varargs constructor in IntervalSet, replace with
type-safe variadic templates implementation.
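As an illustration of the last item, a variadic template lets the
compiler check every argument, which a C-style "..." parameter cannot do.
This is a minimal sketch only: DemoIntSet, AllInts and of() are
hypothetical names and do not mirror the actual IntervalSet interface.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <set>
#include <type_traits>

// True only when every type in the pack is exactly int (C++11 compatible).
template <typename... Ts>
struct AllInts : std::true_type {};

template <typename T, typename... Rest>
struct AllInts<T, Rest...>
    : std::integral_constant<bool, std::is_same<T, int>::value &&
                                       AllInts<Rest...>::value> {};

// Hypothetical integer set with a type-checked variadic factory.
class DemoIntSet {
public:
  template <typename... Ts>
  static DemoIntSet of(Ts... values) {
    // A wrong argument type is rejected at compile time instead of being
    // misread at runtime, as can happen with va_arg.
    static_assert(AllInts<Ts...>::value, "DemoIntSet::of takes int arguments only");
    DemoIntSet result;
    std::initializer_list<int> items = {values...};
    for (int v : items) result._values.insert(v);
    return result;
  }

  bool contains(int v) const { return _values.count(v) != 0; }
  std::size_t size() const { return _values.size(); }

private:
  std::set<int> _values;
};

// Usage: duplicates collapse, and of() also accepts an empty pack.
inline void demoIntSet() {
  DemoIntSet s = DemoIntSet::of(3, 1, 3);
  assert(s.size() == 2);
  assert(s.contains(1) && s.contains(3));
  assert(DemoIntSet::of().size() == 0);
}
```

A call such as DemoIntSet::of(1, 2.5) fails the static_assert rather
than compiling.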
This is a proposed fix to bug #1826 which removes a race condition where
multiple threads could update ATNState::nextTokenWithinRule, leading to
corrupted std::vector instances in an IntervalSet.
ATN::nextTokens(ATNState* s) updates s->nextTokenWithinRule if the
IntervalSet is empty, and then sets it to be read only. However, if the
updated IntervalSet value was also empty, it becomes a read-only empty
set, causing an exception on a second call on the same state.
This was exposed by a change I made to make IntervalSet::operator=()
respect the _readonly flag. (Which in turn was found by compiling with a
high warning level.)
The approach in this update is to perform the update if the updated
value is not empty or if the current value is not read only. This
preserves the previous behaviour of creating a read-only empty set and
working on subsequent calls. It will throw on an attempt to update a
read-only value, where previously the read-only value would be silently
discarded and set to updatable.
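The update rule described above can be sketched as follows. DemoSet and
DemoState are simplified, hypothetical stand-ins for IntervalSet and
ATNState, and the `computed` parameter stands in for the freshly
calculated set, so this shows the shape of the condition rather than the
runtime code.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a set with a read-only flag.
class DemoSet {
public:
  DemoSet() = default;
  explicit DemoSet(std::vector<int> values) : _values(std::move(values)) {}

  bool empty() const { return _values.empty(); }
  bool isReadOnly() const { return _readonly; }
  void setReadOnly() { _readonly = true; }

  // Replace the contents, respecting the read-only flag.
  void assign(const DemoSet& other) {
    if (_readonly) throw std::logic_error("can't alter read-only set");
    _values = other._values;
  }

private:
  std::vector<int> _values;
  bool _readonly = false;
};

// Hypothetical stand-in for a state that caches its lookahead set.
struct DemoState {
  DemoSet nextTokenWithinRule;
};

inline const DemoSet& nextTokens(DemoState& s, const DemoSet& computed) {
  if (s.nextTokenWithinRule.empty()) {
    // Update if the new value is not empty, or if the cached value is
    // still writable. A read-only empty cache plus an empty result is
    // left alone, so repeated calls do not throw.
    if (!computed.empty() || !s.nextTokenWithinRule.isReadOnly()) {
      s.nextTokenWithinRule.assign(computed);
    }
    s.nextTokenWithinRule.setReadOnly();
  }
  return s.nextTokenWithinRule;
}

// Usage: two calls with an empty result succeed; a later non-empty
// result on the read-only empty cache throws.
inline void demoNextTokens() {
  DemoState s;
  assert(nextTokens(s, DemoSet()).isReadOnly());
  assert(nextTokens(s, DemoSet()).empty());
  bool threw = false;
  try {
    nextTokens(s, DemoSet(std::vector<int>{1}));
  } catch (const std::logic_error&) {
    threw = true;
  }
  assert(threw);
}
```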
The Travis CI build is failing after an include of <cstddef> -- This is
an attempt to work around that by including <stddef.h> instead. Problem
not apparent in my FreeBSD environment.
These changes are for compiling with high warning levels and -Werror.
There are no functional changes in this commit. Compiled with gcc 5.4
and clang 3.8.
Summary:
- Put virtual destructors into the appropriate .cpp file instead
of the inline version in the header to avoid many vtables.
- Change C-style casts to modern C++ casts.
- Add explicit casts in some signed to/from unsigned conversions.
- Remove unreached code in BufferedTokenStream.cpp and
LexerATNSimulator.cpp.
- Remove shadowed variables by qualifying constructor arguments with
the same name as a member variable.
- Add explicitly defined copy constructors and assignment operators
where required by gcc's -Weff-c++.
- Use std::numeric_limits<size_t>::max() instead of assigning a negative
number.
- Remove semi-colons after function definitions.
- Remove unnecessary casts.
- In preprocessor statements "#if label > value" change to
"#if defined(label) && label > value" to avoid warnings about the
undefined symbol being seen as zero.
- Remove ANTLR4CPP_PUBLIC from "enum class" definitions.
- Change the FinalAction move constructor to move instead of copy the
_cleanUp std::function object. (A side-effect of explicitly
initialising member variables as required by gcc's -Weff-c++. I turned
this one off because most constructors needed to be touched,
especially the classes implemented with InitializeInstanceFields()).
- Mark hex digit conversion functions as file static in guid.cpp.
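A few of the items above can be illustrated in one short sketch.
DEMO_LEVEL, kHighLevel, kInvalidIndex and toIndex are hypothetical names
for this example and do not appear in the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// DEMO_LEVEL is a hypothetical macro. Testing defined() first avoids
// -Wundef, which warns when an undefined name is evaluated as 0 in #if.
#if defined(DEMO_LEVEL) && DEMO_LEVEL > 2
constexpr bool kHighLevel = true;
#else
constexpr bool kHighLevel = false;
#endif

// Spell out the "no index" sentinel instead of assigning -1 to a size_t.
constexpr std::size_t kInvalidIndex = std::numeric_limits<std::size_t>::max();

// Signed to unsigned conversion made explicit with a named cast rather
// than a C-style cast or an implicit conversion.
inline std::size_t toIndex(int value) {
  return value < 0 ? kInvalidIndex : static_cast<std::size_t>(value);
}

// Usage: negative inputs map to the sentinel, others convert unchanged.
inline void demoWarnings() {
  assert(toIndex(-1) == kInvalidIndex);
  assert(toIndex(3) == 3u);
  assert(kHighLevel == false || kHighLevel == true);
}
```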
as intended.
The existing code intended for ParseTreeWalker::DEFAULT to provide a
IterativeParseTreeWalker. However, the implementation initialized
ParseTreeWalker::DEFAULT by doing a (value) copy of an
IterativeParseTreeWalker, which sliced the object and therefore,
unfortunately, transformed it back into a regular ParseTreeWalker.
This change implements the desired behavior. Furthermore by making DEFAULT
a reference, we are able to preserve the interface to existing code.
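The slicing problem and the reference-based fix can be reproduced with
two small classes. DemoWalker and DemoIterativeWalker are hypothetical
stand-ins for ParseTreeWalker and IterativeParseTreeWalker; only the
initialization pattern is meant to match.

```cpp
#include <cassert>
#include <string>

class DemoWalker {
public:
  virtual ~DemoWalker() = default;
  virtual std::string kind() const { return "recursive"; }
};

class DemoIterativeWalker : public DemoWalker {
public:
  std::string kind() const override { return "iterative"; }
};

// Broken shape: a base-class object is copy-constructed from the derived
// one, so the derived part is sliced away.
static DemoWalker slicedDefault = DemoIterativeWalker();

// Fixed shape: keep one derived instance and expose it through a
// base-class reference, so virtual dispatch still reaches the override.
static DemoIterativeWalker iterativeInstance;
static DemoWalker& referenceDefault = iterativeInstance;

// Usage: callers use both objects the same way, but only the reference
// keeps the iterative behaviour.
inline void demoSlicing() {
  assert(slicedDefault.kind() == "recursive");
  assert(referenceDefault.kind() == "iterative");
}
```

Because referenceDefault is used with the same syntax as a value
(referenceDefault.kind()), existing call sites compile unchanged.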
Currently the JS runtime sometimes returns (and mangles) the global
`window` object instead of a proper InputStream. This is prevented by
using the `new` keyword in all cases.
- A wrong check for EOF has been corrected in the UnbufferedTokenStream (now using the correct data type for the cast to avoid warnings).
- The interpreter data write function no longer implicitly writes out imported grammars. Grammars are merged and hence contain everything from imported grammars already. If interpreter data for an imported grammar is required, pass that grammar explicitly to the ANTLR tool.
Especially when you want to use LexerInterpreter and/or ParserInterpreter in any of the non-Java targets you have to provide the ATN and other data. The classes to generate these values are not in the runtime, however. Hence we need a way to tell ANTLR to produce that in a way that can be consumed by all targets.
This patch adds a new command line parameter (-interpreter) which causes ANTLR to parse the given grammars as usual and then generate a file for each grammar with the required interpreter values. A new InterpreterDataReader class has been added to the Java and C++ runtimes. This class can load the data file (a plain text file) and generate the structures that can be fed directly to the interpreters.
Starting with iOS 10, macOS 10.12, tvOS 10.0 and watchOS 3.0, Foundation contains
its own definition of String.contains(_:), which conflicts with the extension
provided by antlr.
The current version of the Swift Package Manager doesn't support shell
commands or any other mechanism that we can leverage to generate parser
files. Add a Python script to kick off the unit tests.
The lock for the shared DFA state needs to protect a few more operations than just the addDFAState stuff and had to move up one call level. This in turn means there are now 2 places where the lock must be acquired.