- Reverted the change from RecognitionException to exception_ptr (and removed the make_exception_ptr() call). This is not needed and only leads to object slicing. Instead we directly check the given exception object for the individual handling in reportError() and use std::current_exception() to get the exception pointer for our contexts.
- Removed precompiled header stuff from cmake file for the same reasons as I did in the XCode project (Visual Studio still pending).
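The slicing problem behind the exception_ptr revert can be sketched as follows. This is a minimal illustration, not the runtime's code: the two exception classes are reduced stand-ins, and storedType(), viaMakeExceptionPtr() and viaCurrentException() are hypothetical helpers made up for the example.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Reduced stand-ins for the runtime's exception hierarchy (assumption).
struct RecognitionException : std::runtime_error {
  explicit RecognitionException(const std::string &msg) : std::runtime_error(msg) {}
};
struct NoViableAltException : RecognitionException {
  explicit NoViableAltException(const std::string &msg) : RecognitionException(msg) {}
};

// Hypothetical helper: reports which type is actually stored in an exception_ptr.
std::string storedType(std::exception_ptr p) {
  try {
    std::rethrow_exception(p);
  } catch (const NoViableAltException &) {
    return "NoViableAltException";
  } catch (const RecognitionException &) {
    return "RecognitionException";
  } catch (...) {
    return "unknown";
  }
}

// make_exception_ptr() copies the static type of its argument,
// so the derived part of the object is sliced off.
std::exception_ptr viaMakeExceptionPtr(const RecognitionException &e) {
  return std::make_exception_ptr(e);
}

// current_exception() refers to the exception object being handled,
// which keeps its full dynamic type.
std::exception_ptr viaCurrentException() {
  try {
    throw NoViableAltException("no viable alternative");
  } catch (const RecognitionException &) {
    return std::current_exception();
  }
}
```

Passing a NoViableAltException to viaMakeExceptionPtr() stores a plain RecognitionException, while viaCurrentException() preserves the derived type.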
In order to ease including the antlr runtime in other projects the include structure has been changed:
- Removed precompiled header usage (only OSX, VS + cmake still need updates).
- Created umbrella header antlr4-runtime.h that contains everything needed in a target application.
- Changed all includes to use relative paths, so it is enough to add the src folder to the header search path in an application.
- Also fixed a smaller issue in the C++ template wrt the serialized ATN storage for large grammars.
- The upper char limit in Lexer.h was wrong. Now correctly set to 10FFFF.
- The lexer ATN simulator now uses lower + upper limit of char32_t instead of hardcoded values.
- Added a little hack to Interval, where a range ending with 0xFFFF will automatically be extended to 0x10FFFF. This is necessary until ANTLR generates full Unicode intervals. This hack allows including Unicode chars beyond the BMP in char classes in a lexer.
- Fixed an error display issue in Lexer.
- In order to support UTF-8 the input streams now support loading data from UTF-8 strings and convert them internally to UTF32.
- Additionally, all the toString() functions that (unnecessarily) used wstring are now on string as well. Only some corner cases remain where we still have std::wstring (ATNSerializer).
- The transition classes use size_t instead of int now for vocabulary matching.
- The max char value in the lexer has been increased to 1FFFE, to allow matching the full Unicode range. This is also used in the Interval class, e.g. to negate sets.
- Renamed Strings.h to StringUtils.h, to avoid a conflict with an OSX header (used with Obj-C compilation, e.g. in a project that uses the runtime).
- Also the ArrayPredictionContext has parent references (like the SingletonPredictionContext) which need to be strong refs or we may lose some of the parent contexts if they are not held somewhere else.
- Don't use WCHAR_MIN as lower bound for char input checks. It's not 0 as you would expect but -2G, so EOF passes this check even though it should fail.
- Don't resize the parents array when merging parents + return states in PredictionContext or we will try to access parents outside of the available range.
- Use an unordered set when merging parents in PredictionContext, so that the normal equality pattern kicks in when comparing contexts.
- Some parameters in AbstractParseTreeVisitor were wrongly commented out entirely where only the parameter name should have been.
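The Interval workaround from the list above (a range ending at 0xFFFF gets extended to 0x10FFFF) could look roughly like this. It is only a sketch: the struct is a simplified stand-in for the runtime's Interval class, and the constant names kBmpMax and kUnicodeMax are made up for the example.

```cpp
// Hypothetical constants, named for this example only.
constexpr char32_t kBmpMax = 0xFFFF;        // last code point of the BMP
constexpr char32_t kUnicodeMax = 0x10FFFF;  // last valid Unicode code point

// Simplified stand-in for the runtime's Interval class (assumption).
struct Interval {
  char32_t a;  // inclusive lower bound
  char32_t b;  // inclusive upper bound

  // A range that ends exactly at the BMP limit is treated as "up to the end
  // of Unicode", because the tool does not yet emit full Unicode intervals.
  Interval(char32_t start, char32_t end)
      : a(start), b(end == kBmpMax ? kUnicodeMax : end) {}

  bool contains(char32_t c) const { return c >= a && c <= b; }
};
```

With this, a char class that the tool serialized as 0x80..0xFFFF also matches a code point like 0x1F600, while ordinary ranges such as 'a'..'z' stay untouched.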
C++ template:
- No longer include the DEFAULT_MODE in the generated lexer (it's defined elsewhere).
- Corrected formatting and finished some reference rules that were not done yet.
- Reversed the meaning of grammar sections members + declarations to maintain the same meaning for members between C++ and Java target. Now members are placed in the public section of a class, while declarations use the private section. This change helps to minimize language specific parts in grammar actions.
- Removed deleted cpp files from XCode project.
- Cpp.stg:
- Renamed all occurrences of "result" back to "_localctx" as they appear in the Java.stg file. While the name "result" better fits the purpose the rename increases differences between targets, hence it was taken back, so we can use the same actions in all targets.
- TokenPropertyRef_text is now complete.
Some warnings in generated files cannot be fixed in a general way because usage of parameters depends on the grammar, hence we suppress unused-parameter warnings in the grammar (for lexer and parser files).
- @parser::context or @lexer::context are now also accepted for code that should be placed directly before the class declaration (e.g. additional types, like enums etc.)
- Reverted the removal of explicit EOF handling. Thought we can just live with the EOF macro, but that doesn't work out, so we go with the same approach as the ANTLR3 C target: #undef EOF and use EOF member constants as in the original Java code.
- Fixed a crash when trying to create a hash from a null parent in PredictionContext.cpp.
- Generated token and rule enums are now placed in the lexer/parser classes, which allows using them without qualification within those classes and makes actions in a grammar more language independent. Outside code still has to use e.g. TParser::ID to access them.
- Made some lambda capture lists more explicit. Need to test yet if we can just use a default capture instead.
- Compiling with cmake brought up quite a number of new warnings. Some of them have to be disabled yet (dollar in identifier, four char constant, overloaded virtual). The others have all been fixed.
- Updated the README to include build instructions.
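To illustrate the enum placement described in the list above: with the token enum nested in the generated class, code inside the class can use the bare names, while outside code qualifies them. The class below is a hand-written sketch, not generator output. Only the TParser::ID spelling comes from the notes above; the INT token and both functions are made up.

```cpp
#include <cstddef>

// Hypothetical, hand-written stand-in for a generated parser class.
class TParser {
public:
  // Token types live inside the class.
  enum : std::size_t { ID = 1, INT = 2 };

  // Code inside the class (e.g. a grammar action) can use the bare name.
  bool isIdentifier(std::size_t tokenType) const { return tokenType == ID; }
};

// Code outside the class has to qualify the name.
inline bool isInteger(std::size_t tokenType) {
  return tokenType == TParser::INT;
}
```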
termination condition.
The PredictionMode::hasSLLConflictTerminatingPrediction method aims to
create ATNConfig objects from another ATNConfig and SemanticContext
objects. In case of the Python targets, the initialization
happened without keyword arguments. Since the called __init__
method had default values set for all the parameters, the parameter
substitution worked by indices. As a consequence, the first ATNConfig
parameter was wrongly interpreted as an ATNState and the SemanticContext as
an alternative. The patch fixes this by adding the missing keywords.
C++ target:
- More sections are now supported: pre + post include, declarations, definitions (in addition to header and members).
- Added specific variants of these sections for (base)listener + (base)visitor files (baselistenerpreinclude etc.).
Had to add named sections to VisitorFile.java + ListenerFile.java.
Also added the new namedActions parameter to all target stg files where needed.
The runtime folder now contains the individual project files for each platform (and associated files/folders). The actual source code moved down one level to folder src.
The default install name is /usr/libs/antlrcpp.dylib, which prevents running the demo from the command line (location and install name of the dylib differ). Since it is much more likely that the lib is placed in an app bundle than installed in the system, the default has been changed to just the dylib name. This can be changed to anything else if needed in a concrete project.
- Updated XCode project (removed obsolete cpp files refs).
- Set channel datatype in lexer + tokens to size_t.
- Removed one unit test that no longer works, now that we require all objects in a vector given to murmur hash code computation to support the hashCode() function.
Removed a few unneeded cpp files on the way.
Note: building as DLL produces many of the well known compiler warnings C4251 (type needs to have a DLL interface ..) because the runtime uses many STL classes in exported classes. I tried exporting all those types via explicit template instantiation, but once I hit unordered_map I gave up. Just crazy what you have to export just to make this map export properly (and I already had like 50 exports in place). So in the end I disabled this warning on project level. So make sure you build this DLL and all binaries using it with exactly the same compiler and linker settings (same C++ runtime etc.).
Sometimes in the prediction process temporary ATNConfig instances are created which either share their prediction context with other configs or get a new context which receives the context of another config as parent. Later in the process such temporary configs are released, but the parent prediction context that was set should stay alive as it is used later. Since there is no top level structure that would keep them alive we need a way to make them stay. For this purpose the SingletonPredictionContext (which is the only prediction context that keeps a parent reference) uses a shared_ptr instead of a weak_ptr for the parent reference.
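The lifetime difference between the two kinds of parent reference can be shown with a small sketch. Both structs are hypothetical and reduced to the parent link plus a return state; they are not the runtime's prediction context classes.

```cpp
#include <memory>
#include <utility>

// Hypothetical context with a strong parent reference.
struct StrongContext {
  std::shared_ptr<StrongContext> parent;  // keeps the parent alive
  int returnState;
  StrongContext(std::shared_ptr<StrongContext> p, int rs)
      : parent(std::move(p)), returnState(rs) {}
};

// Hypothetical context with a weak parent reference.
struct WeakContext {
  std::weak_ptr<WeakContext> parent;  // does not keep the parent alive
  int returnState;
  WeakContext(const std::shared_ptr<WeakContext> &p, int rs)
      : parent(p), returnState(rs) {}
};

// The parent is only owned by a local variable, like the context of a
// temporary config that is released later on.
std::shared_ptr<StrongContext> makeStrongChild() {
  auto temporaryParent = std::make_shared<StrongContext>(nullptr, 1);
  return std::make_shared<StrongContext>(temporaryParent, 2);
}

std::shared_ptr<WeakContext> makeWeakChild() {
  auto temporaryParent = std::make_shared<WeakContext>(nullptr, 1);
  return std::make_shared<WeakContext>(temporaryParent, 2);
}
```

After makeStrongChild() returns, the child still reaches its parent. After makeWeakChild() returns, the parent is already gone and parent.expired() is true.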
- Changed the empty return state PredictionContext to max int (instead of fixing it on int 16), like it's done in the Java runtime.
- Converted 2 static init lambdas in the Cpp.stg to normal code, as VS doesn't allow us to access private class members in such lambdas.
We can certainly revert to simple locks, but then have to do synchronization differently than the Java target does, which is not in the scope of the current work.
- Finished adding all new ATN related classes (e.g. profiling infos, lexer actions).
- Introduced a global Ref<X> alias for std::shared_ptr<X>. The previous approach (defining a type individually for each class) led to cyclic #includes.
- Converted all ::Ref typedefs and uses to the new Ref template (+ a number of std::shared_ptr<> occurrences). Work in progress still, all those occurrences will be converted for 100% consistency.
- Removed rules no longer used in the Cpp.stg.
- Removed CppTarget.java in old place.
- Converted + added most new LexerAction* classes to XCode project.
- Added changes to a number of classes.
- Class ATN is currently leaking states (delete disabled in d-tor) until I have found a solution for the failing return value optimization from ATNDeserializer::deserialize().
- Added member init in BlockEndState.
Originally the serialized ATN was coded as a wide string literal with embedded Unicode escapes (where necessary). This might fail on Windows however, as VS is very strict when checking Unicode code points and rejects compilation if it finds undefined code points. Hence this generation has been changed to hex numbers instead of that string literal.
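A rough sketch of the number based form follows. The values are made up and do not form a real serialized ATN, and the variable name is an assumption. The surrogate value 0xD800 is my own example of a value that cannot be written as a \u escape in a string literal, but is unproblematic as a plain number.

```cpp
#include <cstdint>
#include <vector>

// Illustrative values only, not a real serialized ATN.
// As plain numbers, every 16 bit value is acceptable to the compiler,
// including values like 0xD800 that are not valid as \u escapes.
static const std::vector<std::uint16_t> serializedATN = {
  0x3, 0xD800, 0xDFFF, 0xFFFF, 0x2
};
```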
- Solved all compilation issues. Updated the antlr4cpp library project. At the moment we only build a static lib. Need to add exports for a DLL. Since we want this library to be compatible with VS 2013 still, we cannot use std::rethrow_with_nested(), hence we do a simple unnested throw in such cases. Starting with VS 2015 this works fully then.
- Added demo application.
- Added parser generation script.
- The needed ANTLR jar is provided now, so it's not needed to build it yourself.
- The generate.sh script has been updated to use the new jar.
- Small update of the readme too.
- All allocations are now checked for proper deallocation.
- Ran LLVM analyzer over the runtime but it found mostly valid stuff and did not find non-freed allocations I left undeleted by intention. So it's not worth much.
- Added move and copy assignment operator overloading, as well as a copy c-tor to ATN class to avoid a copy (and to be able to free content properly) after deserialization.
- Some clean up.
- Removed ultra simple test grammar + parser. No longer needed.
- Removed long list of keywords from (regular) test grammar.
- Fixed a number of toString() methods to get better debug output.
- Moved Ref typedefs from Declarations.h to the individual classes as defining them on the forward declarations totally confuses the XCode debugger.
- Removed reference to the owning ATN in an ATNState. We cannot guarantee to have the correct address there due to the way the states are created. The reference is not needed anyway.
- ATNDeserializationOptions now has verifyATN set by default (as in the Java target).
- Had to add a workaround for a weird situation: static initialization in ATNDeserializer stopped working for no apparent reason. Need to investigate this.
- Added a few support methods to the CPPUtils, mostly to ease debug output creation.
- Added console listener by default to the listeners list (as done in the Java target).
- Fixed translation mistakes in the CommonTokenStream class.
- Fixed some memory leaks and exception handling bugs.
- Removed a few unused classes.
- More raw pointers to smart pointers conversion: RuleContext, ParserRuleContext, ParseTree, Token, ParseTreeWalker, Tree...
- BitSet is now used directly instead of all those dynamic allocations and is a derived class instead of a composite.
- Replaced ATNState equals with == operator overload.
- Corrected the wrong iterator over ATNConfigSets.
- Added utility function that mimics Java's generic toString().
- Exceptions are now consistently thrown by value and captured by reference. C++11 exception_ptr and nested_exception are used when exception references are needed or when implementing the equivalent of Java's nesting.
- The is<> helper didn't properly handle (const) references, which is now explicitly handled. Added new unit tests for that.
- Fixed a number of places where a catch all was used to implement a "finally" (which hides exceptions).
- Changed exceptions to hold (temporary) raw pointers instead of shared pointers, as otherwise the shared pointers would try to free wrapped pointers which might just be references to static objects. Might later be updated again when we continue with removing raw pointers.
- Some smaller fixes.
- The generated simple parser now runs through without any error (yet, it doesn't do anything useful).
- ANTLR C++ target template:
- Added genListener and genVisitor bool members to ANTLR's LexerFile + ParserFile classes, so we can use them in the template.
- Made addition of listener #include dependent on the new genListener member, which allows to run parser generation without listeners/visitors.
- Added an even simpler grammar to ease debugging while getting the lib into a working state.
- Added helper template is<> to ease frequent type checks (for value types, ref types and shared_ptr). Added some unit tests for that as well.
- Changed the MurmurHash::hashCode() function to take shared_ptr as this is the only variant we need. Had to change the MurmurHash unit tests for that.
- Removed conflicting IntStream::_EOF (and other variants). We use the C runtime EOF value instead.
- Changed all references to semantic contexts, prediction context and the prediction context cache to use shared_ptr<>. Created *Ref typedefs to simplify usage.
- Adjusted the C++ string templates for that.
- Fixed a number of memory leaks + some cleanup.
- Reworked the exception hierarchy to conform with the Java hierarchy (where we mimic that). The ultimate base class is std::exception, which uses std::string (char* actually) for messages, so all exceptions use std::string for that as well. Consider that as first step to rework the entire lib to use std::string instead of std::wstring (with utf-8 for full Unicode support).
- Removed ASSERTException + TODOException and fixed the places where they were used.
- Removed ANTLRException, which was only an intermediate layer without an equivalent on Java side.
- Replaced some equals() calls by == (with defined operator overloading).
- Enhanced Arrays::equals() to ensure it compiles only if the actual types being compared support the != operator (both value + reference types).
- Made the Recognizer class template free by using plain polymorphism. Some adjustments were also needed in the Cpp template to support that. Could convert the .inl file to .cpp then.
- Added IntervalSet unit tests. Fixed a few bugs found by that.
- Enhanced the demo grammar so that we use as many template rules from Cpp.stg as possible. Still not fully done it seems.
- Fixed bugs in size determination for arrays (vectors now).
- Simplified PredictionContext and SemanticContext (one template parameter less).
- Removed no longer used Utils.h/cpp. Fixed CPPUtils.h/cpp.
- Extended LexerXXCommand template rules to take a new grammar parameter (code gen has been updated) as we need this context in the Cpp target. This change requires updating all existing templates! Cannot do that here as this is an old revision.
- Some cleanup.
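The is<> helper mentioned in the list above (type checks for pointers, references and shared_ptr) might be sketched like this. This is not the runtime's implementation: the call syntax is<Derived>(x), the overload set and the Base/Derived/Other test types are assumptions made for the example. Only the Ref<X> alias for std::shared_ptr<X> is taken from these notes.

```cpp
#include <memory>

// Alias as introduced in the runtime for std::shared_ptr.
template <typename T>
using Ref = std::shared_ptr<T>;

// Raw pointer variant; From may be deduced as a const type.
template <typename To, typename From>
bool is(From *obj) {
  return dynamic_cast<const To *>(obj) != nullptr;
}

// Reference variant; covers both const and non-const references.
template <typename To, typename From>
bool is(const From &obj) {
  return dynamic_cast<const To *>(&obj) != nullptr;
}

// Shared pointer variant; checks the pointee, not the smart pointer itself.
template <typename To, typename From>
bool is(const Ref<From> &obj) {
  return dynamic_cast<const To *>(obj.get()) != nullptr;
}

// Hypothetical polymorphic types to try the helper with.
struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};
```

A check like is<Derived>(node) then reads the same regardless of whether node is a pointer, a reference or a Ref<Base>.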
While testing Interval() and Interval::of() I found that the latter is twice as slow as the normal object creation. Seems caching single element intervals doesn't have the same impact as in Java (quite the opposite), so I removed Interval::of and the interval cache.
The MurmurHash implementation was actually for a 32bit platform, so I added a 64 bit version too (stripped down from 128 bit MurmurHash3). Tests cannot directly check the correctness of the algorithm, but duplicate checks over 300K hashes (for short input, which is more prone to duplicates than longer input) showed there are no duplicates. So I take it that the code is good.
Fixed a hash creation bug in PredictionContext.cpp.
- Added first real unit test set and enable code coverage collection in XCode (for ANTLRInputStream).
- Reworked ANTLRFileStream::load, which is now more flexible (supports Unicode BOM + 3 possible encodings), can load from Unicode file names and has almost no platform code.
- Enabled strict data size and sign checks in XCode (clang) and fixed a million places...
- Started converting int to size_t where it makes more sense.
- Started working on const correctness.
- Fixed a ton of memory leaks.
- The ATN and ATNConfigSet classes now entirely work as value types. Same for Interval(Set). These seem to be the most critical data structures (ATNConfig + ATNState are pending).
- The abstract IntSet class is gone now.
- Murmur hash code now works with size_t instead of int (need to add unit tests for that).
- Fixed a number of TODOs and other smaller things.
- The Cpp template now properly handles grammar rule return values.
- Reworked the ATNConfigSet + the config lookup implementation it uses. The new implementation no longer needs the hand written Array2DHashSet class but instead relies now on std::unordered_set with custom hasher and comparer classes.
- Fixed a bug where the ATNConfigSet was deriving from std::set while in the original Java code it only implements the Set interface (not the config set itself is a set but the config lookup is). As a consequence all iterations over ATNConfigSet now iterate over ATNConfigSet->configLookup.
- Removed the Any class as it didn't solve the problems we had in mind.
- Removed the no longer necessary Array2DHashSet, AbstractEqualityComparer and ObjectEqualityComparer classes.
- Instead there is a new ConfigLookup implementation with a templated config lookup implementation.
- Removed ATNConfig::equals, as this is already implemented in the == operator overloading. So the operator is used instead where needed.
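The idea of replacing the hand written hash set with std::unordered_set plus custom hasher and comparer classes can be sketched as follows. Config is a heavily reduced, hypothetical stand-in for ATNConfig, and the names ConfigHasher, ConfigComparer and ConfigLookup as well as the hash formula are made up for the example.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_set>

// Hypothetical, heavily reduced stand-in for ATNConfig.
struct Config {
  int state;
  int alt;
  bool operator==(const Config &other) const {
    return state == other.state && alt == other.alt;
  }
};

// Hashes the pointee instead of the pointer value.
struct ConfigHasher {
  std::size_t operator()(const std::shared_ptr<Config> &c) const {
    std::size_t h = std::hash<int>()(c->state);
    return h * 31 + std::hash<int>()(c->alt);
  }
};

// Compares the pointees via operator==, not the pointer identity.
struct ConfigComparer {
  bool operator()(const std::shared_ptr<Config> &a,
                  const std::shared_ptr<Config> &b) const {
    return *a == *b;
  }
};

using ConfigLookup =
    std::unordered_set<std::shared_ptr<Config>, ConfigHasher, ConfigComparer>;
```

Two separately allocated configs with the same state and alt count as one entry here, which a set keyed on the raw shared_ptr would not do.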
ATNs are top level structures, which are created and kept by parser/lexer classes (or their simulator equivalents). Hence they are now value types in their controlling class and passed around as const &.
IRecognizer was a template class without real need, which has been changed to make it a simple interface easily usable without having to find C++ hacks for fancy Java wildcard generics.
The Any class is loosely modelled after boost::any and allows equals() and hashCode() functions to be used where we have no common base class (like Java's Object class). By introducing this class we can replace all void* occurrences that would otherwise not work.
- Reformatted every single file to have a consistent indentation style using only space chars, with 2 chars per indentation. Reduced huge indentation due to deep namespace nesting by not indenting namespaces.
- Reduced #include usage to a minimum.
- Made copyright header the first entry in all files.
- Moved the previously mac-only prefix file (antlrcpp-Prefix.h) to the runtime. It can now be used by all platforms and includes all necessary standard headers.
- Removed a number of unused files.
- ATN deserialization finally works.
- Changed a number of pointers to STL classes to just the STL classes and pass them around by const & where necessary.
- The demo now uses a real setup to parse something and print output.
- Replaced empty UUID implementation by the guid implementation from Graeme Hill. Fixed uuid handling in a few places.
- Removed some obsolete (and mostly empty) lib files.
- Mac: the XCode project now does regenerate the files only after a grammar change, not always.
- Simplified ATNDeserializer + ATNSerializer and fixed a few things there (e.g. feature determination, duplicate copy op of the input).
- Removed some deprecated functions from ATNSimulator.h
- Fixed some bugs (e.g. uninitialized vars + leaks).
- Corrected a mistake I made in CppTarget which led to wrong serialized ATN output.
- Cpp template:
- Added proper static init code generation for Lexer + Parser classes.
- Added some missing functions.
- Created a new command line target in the XCode project. Win + Linux still outstanding.
- Reorganized the C++ runtime folder structure a bit
* Put everything in a new folder "runtime"
* Added a "demo" folder for the demo grammar + app
* Renamed Apple folder to Mac in demo folder
* Added a script with some descriptions to run parser regeneration (via jar or classes). This is also used in the XCode project to regenerate the files.
* Moved all C++ runtime files up in the folder hierarchy. No need to mimic the deep nesting from Java.
- Some adjustments here and there in the C++ runtime for consistency.
- Overhaul of the Cpp.stg file to produce compilable code. Extracted file level templates into a new template Files.stg. Experimented with new named actions (@parser::listenerheader) but the result is not satisfying yet. Need to investigate more.
- Extended ANTLR to produce header files if a target class returns true in the new function needsHeader().
- Added generated folder from demo to .gitignore
- Added myself to contributors file + maven xml.
AND.evalPrecedence() method is attempting to invoke a method on a
non-existent class "SemanticPredicate". It should be using
"SemanticContext" instead.
- according to the corresponding Java implementation, the equals()
method for LexerActionExecutor should be doing an equals() test
on each of the actions rather than testing the Array reference
is equal.
It is used nowhere but imports java.awt.*; Android runtime
has no java.awt.* so Android SDK build tools say "it includes
invalid packages". It's better if antlr4-runtime has no dependency
on java.awt.*, especially since it is not used anymore.
LexerActionExecutor caches its hash string in a member called
'hashString'. However, the class also has a method with the
same name which leads to unexpected results.
The member has been renamed to '_hashString' to avoid the name
clash.
The PredictionContext should be passed to the ATNConfig constructor
in the first argument, the params object. Instead, it is being passed
as the second argument which is intended to be the config.