Currently the tool-codegen project generates UnicodeData.java into the /target directory of the tool project. Running “mvn clean package” in tool erases this file and causes a compile error because UnicodeData.java can’t be found.
This patch changes tool-codegen to generate UnicodeData.java into the /src directory of tool.
* Properly handle elements that are optional in some alts but not others
* Properly handle block sets (a group of terminals producing a SetTransition)
* Properly handle OPTIONAL subrule
This analysis is required for proper code generation in the TypeScript target
when strict null checks are enabled. It also applies to targets intending to
differentiate optional values from required values.
```
beast:/tmp $ a4.6 T.g4
org/antlr/v4/parse/GrammarTreeVisitor.g: node from line 2:7 no viable alternative at input '..'
org/antlr/v4/parse/GrammarTreeVisitor.g: node from line 2:7 no viable alternative at input '..'
org/antlr/v4/parse/GrammarTreeVisitor.g: node from line 2:7 no viable alternative at input '..'
org/antlr/v4/parse/GrammarTreeVisitor.g: node from line 2:7 no viable alternative at input '..'
org/antlr/v4/parse/GrammarTreeVisitor.g: node from line 2:7 no viable alternative at input '..'
context [/report INTERNAL_ERROR] 1:17 attribute arg isn't defined
error(20): internal error:
beast:/tmp $ a4.6.1 T.g4
error(181): T.g4:2:4: token ranges not allowed in parser: 'A'..'Z'
```
To export generated classes from DLLs we need a way to specify the __declspec setting. This is usually done via a macro that expands to the import or export value. The new parameter (`-export-macro`) allows specifying this macro, which increases the flexibility of the generated classes.
The C++ target documentation has been extended to describe build-specific details, including this new parameter.
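For illustration, the usual import/export macro pattern looks roughly like the sketch below. All names here (`MYPARSER_API`, `MYPARSER_EXPORTS`, `GeneratedParserExample`) are hypothetical and not part of ANTLR. The macro name is what would be handed to the new parameter.

```cpp
// Hypothetical names throughout: MYPARSER_API, MYPARSER_EXPORTS and
// GeneratedParserExample are illustrations, not part of ANTLR.
#if defined(_WIN32)
#  if defined(MYPARSER_EXPORTS)
#    define MYPARSER_API __declspec(dllexport)  // set while building the DLL
#  else
#    define MYPARSER_API __declspec(dllimport)  // set while consuming the DLL
#  endif
#else
#  define MYPARSER_API  // expands to nothing on other platforms
#endif

// A generated class would carry the macro between `class` and its name.
class MYPARSER_API GeneratedParserExample {
public:
  int ruleCount() const { return 3; }
};
```

The project that builds the DLL defines `MYPARSER_EXPORTS`; client code does not, so the same header serves both sides.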
This comes from https://github.com/janyou/ANTLR-Swift-Target and is
marked Copyright (c) 2016 janyou on top of the BSD license and Copyrights
for Terence Parr and Sam Harwell derived from the original ANTLR source.
Incorrect code was generated for $e.v in a rule like this one:
e returns [int v] ::=
INT {$v = $INT.int;}
| '(' e ')' {$v = $e.v;}
;
After parsing "(99)" the result would have v == 0 instead of 99.
- Mutexes have been consolidated. Instead of one per DFA (which can easily add up to hundreds) we now have a single mutex in the Recognizer class, and all other parties use it for serialization. It is only about protecting the DFA anyway, which is stored in a recognizer (lexer/parser).
- ATNState::getStateType() returns a size_t value now (actually an enum).
- Replaced checks via RTTI for transitions by the (serialization) type of the transition, for simplicity.
- Added some missing initialization for fields in certain ATN state classes.
- Fixed a memory leak in DFA by shadowing the s0 field. That way we still have a reference to the self-created instance, even if s0 was replaced later.
- Added variable init in code generation for a rule context declaration (e.g. for labels).
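The mutex consolidation in the list above can be pictured with the following minimal sketch. `RecognizerSketch` and `DfaSketch` are hypothetical stand-ins, not the runtime classes; the only point is that the DFAs carry no lock of their own and every access goes through the one mutex in the recognizer.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a DFA: note that it has no mutex of its own.
struct DfaSketch {
  std::vector<int> states;
};

// Hypothetical stand-in for a recognizer that stores the DFAs.
class RecognizerSketch {
public:
  explicit RecognizerSketch(std::size_t decisionCount) : _dfas(decisionCount) {}

  // Every writer takes the single recognizer-wide lock.
  void addState(std::size_t decision, int state) {
    std::lock_guard<std::mutex> lock(_mutex);
    _dfas.at(decision).states.push_back(state);
  }

  // Readers are serialized through the same lock.
  std::size_t stateCount(std::size_t decision) {
    std::lock_guard<std::mutex> lock(_mutex);
    return _dfas.at(decision).states.size();
  }

private:
  std::mutex _mutex;             // one mutex instead of one per DFA
  std::vector<DfaSketch> _dfas;  // the DFAs protected by that mutex
};
```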
There are two parts of an ANTLR-generated parser where memory is allocated: the actual parsing (with or without creating a parse tree) and the prediction part (via DFA/ATN etc.). The first part is highly volatile, as it recreates parse tree instances (the class) on each parser run. Lexer tokens also belong to that part, but they are already managed via unique pointers. This first part now works without any smart pointers. Instead there is a simple tracker class which holds all created references and frees them when the parser is reset or destroyed. This is slightly less optimal if the parser is set to create no parse tree, as created rule context objects are not freed immediately (as with smart pointers), but only during reset. On the other hand this change gives (depending on the input) a nice speedup (0%-100%, after the warm-up phase). Additionally, memory consumption drops by a good amount.
Everything in the simulators (and interpreters) remains unchanged. This is the shared prediction part.
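A minimal sketch of such a tracker, under the assumption that all tracked objects share a base class with a virtual destructor. `TrackerSketch`, `TreeNodeSketch` and `createInstance` are hypothetical names used only for illustration and need not match the runtime.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical base type standing in for parse tree / rule context classes.
struct TreeNodeSketch {
  explicit TreeNodeSketch(int v) : value(v) {}
  virtual ~TreeNodeSketch() = default;
  int value;
};

// Hypothetical tracker: remembers every node it creates and frees them in bulk.
class TrackerSketch {
public:
  TrackerSketch() = default;
  TrackerSketch(const TrackerSketch &) = delete;
  TrackerSketch &operator=(const TrackerSketch &) = delete;
  ~TrackerSketch() { reset(); }  // destruction frees whatever is still tracked

  // Creates a node and hands back a raw, non-owning pointer.
  template <typename T, typename... Args>
  T *createInstance(Args &&...args) {
    T *result = new T(std::forward<Args>(args)...);
    _allocated.push_back(result);
    return result;
  }

  // Frees all tracked nodes at once, e.g. when the parser is reset.
  void reset() {
    for (TreeNodeSketch *node : _allocated) delete node;
    _allocated.clear();
  }

  std::size_t liveCount() const { return _allocated.size(); }

private:
  std::vector<TreeNodeSketch *> _allocated;
};
```

Nothing is released between resets, which is the trade-off mentioned above for parsers that build no parse tree.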
- Had to adjust a comparison <= 0 to the new unsigned EOF.
- For testing: extended the runtime tests to all C++ tests.
- Install uuid-dev in Travis CI in order to be able to build the C++ runtime.
- Switched most symbolic signed constants to unsigned variants. Redefined EOF in particular to become (size_t)-1, to avoid having to use signed token type values.
- Introduced INVALID_INDEX for all previous -1 values to indicate e.g. not found indexes etc.
- Added 2 helpers to convert between symbolic and numeric form (mostly for intervals and toString()).
- Removed many no longer needed type casts to size_t.
- Updated templates for these changes.
- Limited runtime tests to C++ tests only, to see how Travis CI copes with that.
- Reduced use of shared_ptr, e.g. in listeners and some loops.
- Removed useless access methods for children in ParserRuleContext. The child list is public. Fixed initialization for start and stop nodes.
- Simplified parent + child organization in Tree and all derived classes. Instead of using overridable functions in various descendants we have now central parent + child fields in the base tree class (where they belong actually, considering this is about forming a tree). Users have to cast to the appropriate classes if necessary.
- Removed obsolete getChildren() function in Trees helper. We can just return the child vector.
- Changed edges member to an unordered_map, as this is a sparse container. This speeds up certain grammars by 1000% (e.g. highly recursive expression rules) and avoids wasting a lot of memory. This change also simplifies handling significantly.
- Had to escape tabs + linebreaks in DefaultErrorStrategy when generating a text representation. Also removed a few explicit string instance creations on the way.
- Member vars in parser context classes that take (optional) Token references must be initialized.
- Fixed a warning that copyFrom() would hide a virtual function in a ParserRuleContext.
- Another attempt to limit generating double semicolons.
- Added new rule to test grammar to get code generation for wildcard capture.
- Updated the Cpp.stg template file for that.
- Made the Unicode hack (auto extend 0xFFFF to 0x10FFFF) dependent on a parameter, so we only use this hack when deserializing an ATN. This avoids trouble with intervals used in other contexts (like string offsets).
- Added a few operator != overloads, to fix compilation after recent changes.
- Simplified operands comparison in SemanticContext (uses the Arrays class now). Some cleanup in that class too.
- The abstract parse tree visitor now uses const& for Any references, to avoid reallocating new instances over and over again.
- The lexer counts syntax errors the same way as the parser does. So we can directly determine if there was any error by simply examining that (which avoids having to use a temporary listener).
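The switch to unsigned constants in the list above could look roughly like this sketch. The constant and helper names (`EOF_SKETCH`, `INVALID_INDEX_SKETCH`, `symbolToString`, `indexOf`) are hypothetical; only the idea of using the largest `size_t` value as a sentinel is taken from the notes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical constants: the largest size_t value serves as the sentinel.
constexpr std::size_t EOF_SKETCH = static_cast<std::size_t>(-1);
constexpr std::size_t INVALID_INDEX_SKETCH = static_cast<std::size_t>(-1);

// With an unsigned EOF, a check like `symbol <= 0` is only true for 0,
// so EOF has to be compared explicitly.
inline bool isEof(std::size_t symbol) { return symbol == EOF_SKETCH; }

// Helper converting the numeric form to a symbolic one, e.g. for toString().
inline std::string symbolToString(std::size_t symbol) {
  return isEof(symbol) ? "EOF" : std::to_string(symbol);
}

// A lookup that reports "not found" via the sentinel instead of -1.
inline std::size_t indexOf(const std::vector<int> &values, int wanted) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] == wanted) return i;
  }
  return INVALID_INDEX_SKETCH;
}
```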
The translation from Java generics to templates in C++ led to a need for virtual template functions, which C++ does not support. Instead we now use the Any class for results of visits and no longer need templates for that part.
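To illustrate the idea, here is a minimal sketch that uses C++17 `std::any` as a stand-in for the runtime's Any class (an assumption made for brevity). `VisitorSketch`, `ExprNode`, `DoublingVisitor` and `TextVisitor` are hypothetical names.

```cpp
#include <any>
#include <string>

// Hypothetical node type.
struct ExprNode {
  int value;
};

// A `template <typename T> virtual T visit(...)` is not legal C++, so the
// result travels in a type-erased box and the interface stays non-template.
class VisitorSketch {
public:
  virtual ~VisitorSketch() = default;
  virtual std::any visit(const ExprNode &node) = 0;
};

// One visitor produces an int ...
class DoublingVisitor : public VisitorSketch {
public:
  std::any visit(const ExprNode &node) override { return node.value * 2; }
};

// ... another produces a string, through the same virtual function.
class TextVisitor : public VisitorSketch {
public:
  std::any visit(const ExprNode &node) override {
    return std::to_string(node.value);
  }
};
```

The caller has to know which type to extract, e.g. `std::any_cast<int>(visitor.visit(node))`; that is the price for dropping the template parameter.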
No need to use shared_ptr for management. Listeners are, like the other main classes (parser, lexer, input stream etc.) provided by the application and hence managed there.
In order to lower the overhead when passing around Token instances via smart pointers and because the ownership is clear (token streams own them), these instances can be passed as raw pointers.
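A minimal sketch of that ownership model, with hypothetical names (`TokenSketch`, `TokenStreamSketch`, `textLength`): the stream owns its tokens through unique pointers, and everyone else receives raw, non-owning pointers that stay valid as long as the stream lives.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type.
struct TokenSketch {
  std::string text;
};

// Hypothetical token stream: the sole owner of its tokens.
class TokenStreamSketch {
public:
  // Stores the token and returns a raw, non-owning pointer to it.
  TokenSketch *add(std::string text) {
    _tokens.push_back(
        std::unique_ptr<TokenSketch>(new TokenSketch{std::move(text)}));
    return _tokens.back().get();
  }

  TokenSketch *get(std::size_t index) const { return _tokens.at(index).get(); }

private:
  std::vector<std::unique_ptr<TokenSketch>> _tokens;
};

// Consumers take raw pointers: no reference counting, no ownership transfer.
inline std::size_t textLength(const TokenSketch *token) {
  return token->text.size();
}
```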
And some related fixes.
- Combined multiple blank lines between package decls into one
- Moved some doc comments to just above what they document
- Changed some doc comments to start with the documented thing's name
- Removed some redundant anchor and wrap ST options
- Changed an empty struct type containing just a newline to struct{}
- Added some missing doc comments to exported interface methods
- Removed unneeded parentheses around a single import
- Prevents generating empty const blocks
- Generates single-line package const decls if only one decl
- Indented switch case bodies
- Content checks for some template vars, e.g. if v defined then v else nothing
- Removed template redundancies, e.g. from <attrs:{a | <a>}> to <attrs>.
- Removed blank lines that group embedded types and fields separately
- Changed some trivial one-line func/method decls to multi-line
- Changed empty package slice decls to use nil, e.g. var foo []uint16
- Changed const int enums that used the ST <i> var to use iota instead
- Replaced separator in GoTarget.encodeIntAsCharEscape with a template separator
All shared_ptr<> now use const& for function parameters to avoid constant copies + locks. Ownership and lifetime control is still ensured by the owning containers. Code templates have been updated as well.
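The difference is easy to observe with `use_count()`. In this sketch `ConfigSketch` and both functions are hypothetical; passing by value copies the pointer and increments the reference count, while passing by `const&` leaves it alone.

```cpp
#include <memory>

// Hypothetical payload type.
struct ConfigSketch {
  int alt;
};

// By value: the shared_ptr is copied, which increments the reference count
// for the duration of the call.
inline long useCountByValue(std::shared_ptr<ConfigSketch> config) {
  return config.use_count();
}

// By const reference: no copy is made and the count stays unchanged.
inline long useCountByRef(const std::shared_ptr<ConfigSketch> &config) {
  return config.use_count();
}
```

The caller's shared_ptr keeps the object alive for the whole call, so the reference version does not weaken lifetime guarantees in this situation.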
- Changed namespace chain (org::antlr::v4::runtime) to just antlr4 in all files.
- Fixed runtime tests for that.
- Added conversion of the xpath code, which compiles now (no tests, though, as there are runtime tests for it).
- Removed TestRig stuff. That doesn't work in C++.
- Avoiding double semicolons is tricky with the kind of rule nesting. Previous changes for that caused the tests to break as there were semicolons missing then.
- VS complained about the shift code generated using 1L as base, which is signed. Changed that to 1ULL, which is what is actually intended.
- Reverted the change to avoid a warning in RuleSempredFunction() in Cpp.stg as the fix didn't work 100%. We need a different solution.
- Fixed all disabled tests and enabled them.
- Some more adjustments of the test template + target template were needed.
- Worked on semicolon usage in Cpp.stg to avoid double semicolons. Might need more work, though.
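The 1L versus 1ULL point in the list above matters because `long` is only 32 bits wide with Visual Studio, so shifting `1L` by 32 or more is not valid there. A minimal sketch of a set-membership test in that style follows; `inTokenSet` is a hypothetical helper, not the code the template actually emits.

```cpp
#include <cstdint>

// Hypothetical helper resembling a bit-set membership test. The base of the
// shift is 1ULL, which is unsigned and at least 64 bits wide, so every shift
// count from 0 to 63 is well defined.
inline bool inTokenSet(std::uint64_t mask, unsigned tokenType) {
  return tokenType < 64 && ((1ULL << tokenType) & mask) != 0;
}
```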