- The ATN simular generated lots of duplicate DFAStates that consumed a lot of memory, due to missing hashing/== functionality. By using an own hasher/comparer for the unordered_map in the DFA this problem could be solved. Memory consumption is now 100% stable on a low level (e.g. 115MB for a fairly complex parser) and doesn't increase anymore after the warmup phase.
- Fixed also a number of things reported by the clang static analyzer.
- Fixed a bug in Parser.cpp where the stop token in the current context would be set twice overriding a previously set value.
- Optimized token range check via ATN::nextTokens, by not duplicating the stored interval set for a given state.
- Slightly improved IntervalSet::contains() by moving some tests out of the loop and not using intermediate variables.
- Reverted the change from RecognitionException to exception_ptr (and removed the make_exception_ptr() call). This is not needed and only leads to object slicing. Instead we check directly the given exception object for the individual handling in reportError() and use std::current_exception() to get the exception pointer for our contexts.
- Removed precompiled header stuff from cmake file for the same reasons as I did in the XCode project (Visual Studio still pending).
In order to ease including the antlr runtime in other projects the include structure has been changed:
- Removed precompiled header usage (only OSX, VS + cmake still need updates).
- Created umbrella header antlr4-runtime.h that contains everything needed in a target application.
- Changed all includes to use relative paths, so it is enough to add the src folder to the header search path in an application.
- Also fixed a smaller issue in the C++ template wrt the serialized ATN storage for large grammars.
- The upper char limit in Lexer.h was wrong. Now correctly set to 10FFFF.
- The lexer ATN simulator now uses lower + upper limit of char32_t instead of hardcoded values.
- Added a little hack to Interval, where a range ending with 0xFFFF will automatically be extended to 0x10FFFF. This is necessary until ANTLR generates full Unicode intervals. This hack allows to include Unicode chars beyond the BMP in char classes in a lexer.
- Fixed an error display issue in Lexer.
- In order to support UTF-8 the input streams now support loading data from UTF-8 strings and convert them internally to UTF32.
- Additionally, all the toString() functions that (unnecessarily) used wstring are now on string as well. Only some corner cases remain where we still have std::wstring (ATNSerializer).
- The transition classes use size_t instead of int now for vocabulary matching.
- The max char value in the lexer has been increased to 1FFFE, to allow matching the full Unicode range. This is also used in the Interval class to e.g. to negate sets.
- Renamed Strings.h to StringUtils.h, to avoid a conflict with an OSX header (used with Obj-C compilation, e.g. in a project that uses the runtime).
- Also the ArrayPredictionContext has parent references (like the SingletonPredictionContext) which need to be strong refs or we may lose some of the parent contexts if they are not held somewhere else.
- Don't use WCHAR_MIN as lower bounds for char input checks, it's not 0 as you would expect but -2G, making so EOF succeed even though it should fail this check.
- Don't resize the parents array when merging parents + return states in PredictionContext or we will try to access parents outside of the available range.
- Use an unordered set when merging parents in PredictionContext, so that the normal equality pattern kicks in when comparing contexts.
- Some parameters in AbstractParseTreeVisitor where wrongly outcommented where only the param name should.
C++ template:
- No longer include the DEFAULT_MODE in the generated lexer (it's defined elsewhere).
- Corrected formatting and finished some reference rules that were not done yet.
- Reversed the meaning of grammar sections members + declarations to maintain the same meaning for members between C++ and Java target. Now members are placed in the public section of a class, while declarations use the private section. This change helps to minimize language specific parts in grammar actions.
- Removed deleted cpp files from XCode project.
- Cpp.stg:
- Renamed all occurences of "result" back to "_localctx" as they appear in the Java.stg file. While the name "result" better fits the purpose the rename increases differences between targets, hence it was taken back, so we can use the same actions in all targets.
- TokenPropertyRef_text is now complete.
Some warnings in generated files cannot be fixed in a general way because usage of parameters depends on the grammar, hence we suppress unused-parameter warnings in the grammar (for lexer and parser files).
- @parser::context or @lexer::context are now also accepted for code that should be placed directly before the class declaration (e.g. additional types, like enums etc.)
- Reverted the removal of explicit EOF handling. Thought we can just live with the EOF macro, but that doesn't work out, so we go with the same approach as the ANTLR3 C target: #undef EOF and use EOF member constants as in the original Java code.
- Fixed a crash when trying to create a hash from a null parent in PredictionContext.cpp.
- Generated token and rule enums are now placed in the lexer/parser classes which allows to use them without qualfication within those classes, making so actions in a grammar more language independent. Outside code still has to use e.g. TParser::ID to access them.
- Made some lambda capture lists more explicit. Need to test yet if we can just use a default capture instead.
- Compiling with cmake brought up quite a number of new warnings. Some of them have to be disabled yet (dollar in identifier, four char constant, overloaded virtual). The others have all been fixed.
- Updated the README to include build instruction.
C++ target:
- More sections are now supported: pre + post include, declarations, definitions (in addition to header and members).
- Added specific variants of these sections for (base)listener + (base)visitor files (baselistenerpreinclude etc.).
Had to add named sections to VisitorFile.java + ListenerFile.java.
Also added the new namedActions parameter to all target stg files where needed.
The runtime folder now contains the individual project files for each platform (and associated files/folders). The actual source code moved down one level to folder src.
The default install name is /usr/libs/antlrcpp.dylib which makes running the demo from command line not working (location and install name of the dylib differ). Since the probability that the lib is placed in an app bundle is much higher than that the lib is being installed in the system, the default has been changed to just the dylib name. This can be changed to anything else if needed in a concrete project.
- Updated XCode project (removed obsolete cpp files refs).
- Set channel datatype in lexer + tokens to size_t.
- Removed one unit test that no longer works, now that we require all objects in a vector given to murmur hash code computation to support the hashCode() function.
One of the lambdas used in a "finally" expression executed already while defining the "finally" instance, which is way too early of course (others did not show this behavior), which happens only with the Visual Studio compiler, not clang. By changing the capture list to a general reference capture things started working as they should.
Removed a few unneeded cpp files on the way.
Note: building as DLL produces many of the well known compiler warnings C4251 (type needs to have a DLL interface ..) because the runtime uses many STL classes in exported classes. I tried exporting all those types via explicit template instantion, but once I hit unordered_map I gave up. Just crazy what you have to export just to make this map export properly (and I already had like 50 exports already in place). So at the end I disabled this warning on project level. So make sure you build this DLL and all binaries using it with exact the same compiler and linker settings (same C++ runtime etc.).
Sometimes in the prediction process temporary ATNConfig instances are created which either share their prediction context with other configs or get a new context which receives the context of another config as parent. Later in the process such temporary configs are released, but the set parent prediction context should stay alive as it is used later. Since there is no top level structure that would keep them alive we need a way to make them stay. For this effect the SinglePredictionContext (which is the only prediction context that keeps a parent reference) uses a shared_ptr instead of a weak_ptr for the parent reference.
- Changed the empty return state PredictionContext to max int (instead fixing it on int 16), like it's done in the Java runtime.
- Converted 2 static init lambdas in the Cpp.stg to normal code, as VS doesn't allow us to access private class members in such lambdas.
We can certainly revert to simple locks, but then have to do synchronization differently than the Java target does, which is not in the scope of the current work.
- Finished adding all new ATN related classes (e.g. profiling infos, lexer actions).
- Introduced a global Ref<X> alias for std::shared_ptr<X>. The previous approach (defining a type individually for each class led to cyclic #includes).
- Converted all ::Ref typedefs and uses to the new Ref template (+ a number of std::shared_ptr<> occurances). Work in progress still, all those occurences will be converted for 100% consistency.