- Mutexes have been consolidated. Instead of one per DFA (which can easily get to hundreds of them) we only have one mutex in the Recognizer class and all other parties use this for serialization. It's only about protected the DFA anyway, which is stored in a recognizer (lexer/parser).
- ATNState::getStateType() returns a size_t value now (actually an enum).
- Replaced checks via RTTI for transitions by the (serialization) type of the transition, for simplicity.
- Added some missing initialization for fields in certain ATN state classes.
- Fixed mem leak in DFA by shadowing the s0 field. That way still have a ref to the self created instance, even is s0 was replaced later.
- Added variable init in code generation for a rule context declaration (e.g. for labels).
There are 2 parts in an ANTLR genrated parser where memory is allocated: the actual parsing (with or w/o creating a parse tree) and the prediction part (via DFA/ATN etc.). The first part is highly volatile as it recreates parse tree instances (the class) on each parser run. In fact also lexer tokens belong to that part, but are already managed via unique pointers. This first part works without any smart pointer now. Instead there is a simple tracker class which holds all created references and frees them when the parser is reset or destroyed. This is a bit less optimal if the parser is set to create no parse tree, as created rule context objects are not freed immediately (like with smart pointers), but during reset. On the other hand this change gives (depending on the input) a nice speed up (0%-100%, after the warm up phase). Additionally memory consumption drops by a good amount.
Everything in the simulartors (and interpreters) remains unchanged. This is the shared prediction part.
- Had to adjust a comparison <= 0 to the new unsigned EOF.
- For testing: extended the runtime tests to all C++ tests.
- Install uuid-dev in Travis CI in order to be able to build the C++ runtime.
- Switched most symbolic signed constants to unsigned variants. Redefined EOF in particular to become (size)-1, to avoid having to use signed token type values.
- Introduced INVALID_INDEX for all previous -1 values to indicate e.g. not found indexes etc.
- Added 2 helpers to convert between symbolic and numeric form (mostly for intervals and toString()).
- Removed many no longer needed type casts to size_t.
- Updated templates for these changes.
- Limited runtime tests to C++ tests only, to see how Travis CI copes with that.
- Lesser use of shared_ptr, e.g. in listeners and some loops.
- Removed useless access methods for children in ParseRuleContext. The child list is public. Fixed initialization for start and stop nodes.
- Simplified parent + child organization in Tree and all derived classes. Instead of using overridable functions in various descendants we have now central parent + child fields in the base tree class (where they belong actually, considering this is about forming a tree). Users have to cast to the appropriate classes if necessary.
- Removed obsolete getChildren() function in Trees helper. We can just return the child vector.
- Changed edges member to an unordered_map, as this is a sparse container. This speeds up certain grammars by 1000% (e.g. highly recursive expression rules) and avoids wasting a lot of memory. This change also simplifies handling significantly.
- Had to escape tabs + linebreaks in DefaultErrorStrategy when generating a text representation. Also removed a few explicit string instance creations on the way.
- Member vars in parser context classes that take (optional) Token references must be initialized.
- Fixed a warning that copyFrom() would hide a virtual function in a ParserRuleContext.
- Another attempt to limit genrating double semicolons.
- Added new rule to test grammar to get code generation for wildcard capture.
- Updated the Cpp.stg template file for that.
- Made the Unicode hack (auto extend 0xFFFF to 0x10FFFF) dependent on a parameter, so we only use this hack when deserializing an ATN. This avoids trouble with intervals used in other contexts (like string offsets).
- Added a few operator != overloads, to fix compilation after recent changes.
- Simplified operands comparison in SemanticContext (uses the Arrays class now). Some cleanup in that class too.
- The abstract parse tree visitor now uses const& for Any references, to avoid reallocating new instances over and over again.
- The lexer counts syntax errors the same way as the parser does. So we can directly determine if there was any error by simply examining that (which avoids having to use a temporary listener).
The translation from Java generics to templates in C++ lead to the need of virtual template functions, which is not supported by C++. Instead we use now the Any class for results of visits and no longer need templates for that part.
No need to use shared_ptr for management. Listeners are, like the other main classes (parser, lexer, input stream etc.) provided by the application and hence managed there.
In order to lower the overhead when passing around Token instances via smart pointers and because the ownership is clear (token streams own them), these instances can be passed as raw pointers.