- The ATN simular generated lots of duplicate DFAStates that consumed a lot of memory, due to missing hashing/== functionality. By using an own hasher/comparer for the unordered_map in the DFA this problem could be solved. Memory consumption is now 100% stable on a low level (e.g. 115MB for a fairly complex parser) and doesn't increase anymore after the warmup phase.
- Fixed also a number of things reported by the clang static analyzer.
- Fixed a bug in Parser.cpp where the stop token in the current context would be set twice overriding a previously set value.
- Optimized token range check via ATN::nextTokens, by not duplicating the stored interval set for a given state.
- Slightly improved IntervalSet::contains() by moving some tests out of the loop and not using intermediate variables.
- Reverted the change from RecognitionException to exception_ptr (and removed the make_exception_ptr() call). This is not needed and only leads to object slicing. Instead we check directly the given exception object for the individual handling in reportError() and use std::current_exception() to get the exception pointer for our contexts.
- Removed precompiled header stuff from cmake file for the same reasons as I did in the XCode project (Visual Studio still pending).
In order to ease including the antlr runtime in other projects the include structure has been changed:
- Removed precompiled header usage (only OSX, VS + cmake still need updates).
- Created umbrella header antlr4-runtime.h that contains everything needed in a target application.
- Changed all includes to use relative paths, so it is enough to add the src folder to the header search path in an application.
- Also fixed a smaller issue in the C++ template wrt the serialized ATN storage for large grammars.
- The upper char limit in Lexer.h was wrong. Now correctly set to 10FFFF.
- The lexer ATN simulator now uses lower + upper limit of char32_t instead of hardcoded values.
- Added a little hack to Interval, where a range ending with 0xFFFF will automatically be extended to 0x10FFFF. This is necessary until ANTLR generates full Unicode intervals. This hack allows to include Unicode chars beyond the BMP in char classes in a lexer.
- Fixed an error display issue in Lexer.
- In order to support UTF-8 the input streams now support loading data from UTF-8 strings and convert them internally to UTF32.
- Additionally, all the toString() functions that (unnecessarily) used wstring are now on string as well. Only some corner cases remain where we still have std::wstring (ATNSerializer).
- The transition classes use size_t instead of int now for vocabulary matching.
- The max char value in the lexer has been increased to 1FFFE, to allow matching the full Unicode range. This is also used in the Interval class to e.g. to negate sets.
- Renamed Strings.h to StringUtils.h, to avoid a conflict with an OSX header (used with Obj-C compilation, e.g. in a project that uses the runtime).
- Also the ArrayPredictionContext has parent references (like the SingletonPredictionContext) which need to be strong refs or we may lose some of the parent contexts if they are not held somewhere else.
- Don't use WCHAR_MIN as lower bounds for char input checks, it's not 0 as you would expect but -2G, making so EOF succeed even though it should fail this check.
- Don't resize the parents array when merging parents + return states in PredictionContext or we will try to access parents outside of the available range.
- Use an unordered set when merging parents in PredictionContext, so that the normal equality pattern kicks in when comparing contexts.
- Some parameters in AbstractParseTreeVisitor where wrongly outcommented where only the param name should.
C++ template:
- No longer include the DEFAULT_MODE in the generated lexer (it's defined elsewhere).
- Corrected formatting and finished some reference rules that were not done yet.
- Reversed the meaning of grammar sections members + declarations to maintain the same meaning for members between C++ and Java target. Now members are placed in the public section of a class, while declarations use the private section. This change helps to minimize language specific parts in grammar actions.
- Removed deleted cpp files from XCode project.
- Cpp.stg:
- Renamed all occurences of "result" back to "_localctx" as they appear in the Java.stg file. While the name "result" better fits the purpose the rename increases differences between targets, hence it was taken back, so we can use the same actions in all targets.
- TokenPropertyRef_text is now complete.
Some warnings in generated files cannot be fixed in a general way because usage of parameters depends on the grammar, hence we suppress unused-parameter warnings in the grammar (for lexer and parser files).
- @parser::context or @lexer::context are now also accepted for code that should be placed directly before the class declaration (e.g. additional types, like enums etc.)
- Reverted the removal of explicit EOF handling. Thought we can just live with the EOF macro, but that doesn't work out, so we go with the same approach as the ANTLR3 C target: #undef EOF and use EOF member constants as in the original Java code.
- Fixed a crash when trying to create a hash from a null parent in PredictionContext.cpp.
- Generated token and rule enums are now placed in the lexer/parser classes which allows to use them without qualfication within those classes, making so actions in a grammar more language independent. Outside code still has to use e.g. TParser::ID to access them.
- Made some lambda capture lists more explicit. Need to test yet if we can just use a default capture instead.
- Compiling with cmake brought up quite a number of new warnings. Some of them have to be disabled yet (dollar in identifier, four char constant, overloaded virtual). The others have all been fixed.
- Updated the README to include build instruction.