C++

The C++ target supports all platforms that can run MS Visual Studio 2013 (or newer), XCode 7 (or newer), or CMake (C++11 required). All build tools can create static or dynamic libraries, in either 64-bit or 32-bit variants. Additionally, XCode can create an iOS library. Also see Antlr4 for C++ with CMake: A practical example.

How to create a C++ lexer or parser?

This is pretty much the same as creating a Java lexer or parser, except you need to specify the language target, for example:

$ antlr4 -Dlanguage=Cpp MyGrammar.g4

This call generates a number of files. With both listener and visitor generation enabled (the listener is generated by default; the visitor requires the -visitor option), you'll get:

  • MyGrammarLexer.h + MyGrammarLexer.cpp
  • MyGrammarParser.h + MyGrammarParser.cpp
  • MyGrammarVisitor.h + MyGrammarVisitor.cpp
  • MyGrammarBaseVisitor.h + MyGrammarBaseVisitor.cpp
  • MyGrammarListener.h + MyGrammarListener.cpp
  • MyGrammarBaseListener.h + MyGrammarBaseListener.cpp

Where can I get the runtime?

Once you've generated the lexer and/or parser code, you need to download or build the runtime. Prebuilt C++ runtime binaries for Windows (Visual Studio 2013/2015), OSX/macOS and iOS are available on the ANTLR web site.

Use CMake to build a Linux library (this also works on OSX, but not for the iOS library).

Instead of downloading a prebuilt binary, you can also build your own library on OSX or Windows: use the provided XCode or Visual Studio projects. They should build out of the box without any additional dependencies.

How do I run the generated lexer and/or parser?

Putting it all together to get a working parser is straightforward. Look in the runtime/Cpp/demo folder for a simple example. The README there briefly describes how to build and run the demo on OSX, Windows or Linux.

How do I create and run a custom listener?

The generation step above created a listener and base listener class for you. The listener class is an abstract interface, which declares enter and exit methods for each of your parser rules. The base listener implements all those abstract methods with an empty body, so you don't have to do it yourself if you just want to implement a single function. Hence use this base listener as the base class for your custom listener:

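#include <fstream>
#include <iostream>

#include "antlr4-runtime.h"
#include "MyGrammarLexer.h"
#include "MyGrammarParser.h"
#include "MyGrammarBaseListener.h"

using namespace org::antlr::v4::runtime;

class TreeShapeListener : public MyGrammarBaseListener {
public:
  // The generated listener methods take the rule-specific context type,
  // so the override must use KeyContext rather than ParserRuleContext.
  void enterKey(MyGrammarParser::KeyContext *ctx) override {
    // Do something when entering the key rule.
  }
};

int main(int argc, const char* argv[]) {
  // The input file is expected as the first command-line argument.
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " <input-file>" << std::endl;
    return 1;
  }

  std::ifstream stream;
  stream.open(argv[1]);
  ANTLRInputStream input(stream);
  MyGrammarLexer lexer(&input);
  CommonTokenStream tokens(&lexer);
  MyGrammarParser parser(&tokens);

  // Parse starting at the key rule, then walk the resulting tree.
  tree::ParseTree *tree = parser.key();
  TreeShapeListener listener;
  tree::ParseTreeWalker::DEFAULT->walk(&listener, tree);

  return 0;
}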

This example assumes your grammar contains a parser rule named key for which the enterKey function was generated.
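
To make the relationship between the listener interface and the base listener more concrete, here is a minimal, self-contained sketch that does not use the ANTLR runtime at all. KeyContext, SketchListener, SketchBaseListener, CountingListener and visitKeyNode are hypothetical stand-ins chosen for illustration; the real generated headers differ in detail.

```cpp
#include <cstddef>

// Hypothetical stand-in for a generated rule context class.
struct KeyContext {};

// Stand-in for the generated listener: a pure abstract interface.
class SketchListener {
public:
  virtual ~SketchListener() = default;
  virtual void enterKey(KeyContext *ctx) = 0;
  virtual void exitKey(KeyContext *ctx) = 0;
};

// Stand-in for the generated base listener: every method has an empty body.
class SketchBaseListener : public SketchListener {
public:
  void enterKey(KeyContext * /*ctx*/) override {}
  void exitKey(KeyContext * /*ctx*/) override {}
};

// A custom listener overrides only the callbacks it cares about.
class CountingListener : public SketchBaseListener {
public:
  void enterKey(KeyContext * /*ctx*/) override { ++entered; }
  std::size_t entered = 0;
};

// Simulates what a tree walker does for a single key node.
inline void visitKeyNode(SketchListener &listener, KeyContext &ctx) {
  listener.enterKey(&ctx);
  listener.exitKey(&ctx);
}
```

Because CountingListener derives from the base listener, it compiles without implementing exitKey; deriving directly from the abstract interface would require both methods.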

Specialities of this ANTLR target

There are a couple of things that only the C++ ANTLR target has to deal with. They are described here.

Build Aspects

Code generation (running the ANTLR4 jar) lets you specify two optional values that help integrate the generated files into your application:

  • A namespace: use the -package parameter to specify the namespace you want.
  • An export macro: especially in VC++, extra work is required to export your classes from a DLL. This is usually accomplished by a macro that has different values depending on whether you are creating the DLL or importing it. The ANTLR4 runtime itself also uses one for its classes:
  #ifdef ANTLR4CPP_EXPORTS
    #define ANTLR4CPP_PUBLIC __declspec(dllexport)
  #else
    #ifdef ANTLR4CPP_STATIC
      #define ANTLR4CPP_PUBLIC
    #else
      #define ANTLR4CPP_PUBLIC __declspec(dllimport)
    #endif
  #endif

Just like the ANTLR4CPP_PUBLIC macro shown here, you can specify your own macro for the generated classes using the -DexportMacro=... command-line parameter or the grammar option options {exportMacro='...';} in your grammar file.

To create a static library in Visual Studio, define the ANTLR4CPP_STATIC macro in addition to the project settings required for a static library (if you compile the runtime yourself).

For gcc and clang it is possible to use the -fvisibility=hidden setting to hide all symbols except those that are made default-visible (which has been defined for all public classes in the runtime).
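
As an illustration of the build aspects above, the following sketch shows what an export macro of your own might look like. The names MYPARSER_PUBLIC, MYPARSER_EXPORTS, MYPARSER_STATIC and ExportedExample are hypothetical and not defined by ANTLR; the gcc/clang branch assumes you compile with -fvisibility=hidden and want the class to stay visible.

```cpp
// Hypothetical export macro for generated classes, modelled on ANTLR4CPP_PUBLIC.
#if defined(_MSC_VER)
  #if defined(MYPARSER_EXPORTS)
    #define MYPARSER_PUBLIC __declspec(dllexport)  // building the DLL
  #elif defined(MYPARSER_STATIC)
    #define MYPARSER_PUBLIC                        // static library: no decoration
  #else
    #define MYPARSER_PUBLIC __declspec(dllimport)  // consuming the DLL
  #endif
#else
  // gcc/clang: keep the symbol visible even under -fvisibility=hidden.
  #define MYPARSER_PUBLIC __attribute__((visibility("default")))
#endif

// Stand-in for a generated class that carries the export macro.
class MYPARSER_PUBLIC ExportedExample {
public:
  int ruleCount() const { return 3; }
};
```

With such a macro in place, passing -DexportMacro=MYPARSER_PUBLIC when running the ANTLR4 jar should decorate the generated classes in the same way as ExportedExample here.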

Memory Management

Since C++ has no built-in memory management, we need to take extra care. We rely mostly on smart pointers, which can, however, cause time penalties or memory side effects (like cyclic references) if not used with care. Currently, memory handling appears very stable. In general, when you see a raw pointer in the code, treat it as being managed elsewhere. You should never try to manage such a pointer yourself (delete it, assign it to a smart pointer, etc.).
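
The ownership convention described above can be illustrated with a small, self-contained sketch. Node, NodeStore and textLength are hypothetical names and not part of the ANTLR runtime; the point is only that the owner holds smart pointers while everyone else receives non-owning raw pointers.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type; not part of the ANTLR runtime.
struct Node {
  explicit Node(std::string t) : text(std::move(t)) {}
  std::string text;
};

// The store owns every node through a smart pointer.
class NodeStore {
public:
  // Returns a non-owning raw pointer; the store keeps ownership.
  Node *create(std::string text) {
    nodes_.push_back(std::unique_ptr<Node>(new Node(std::move(text))));
    return nodes_.back().get();
  }

  std::size_t size() const { return nodes_.size(); }

private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Callers only observe the node: no delete, no wrapping in a smart pointer.
inline std::size_t textLength(const Node *node) {
  return node->text.size();
}
```

A pointer returned by create stays valid for as long as the store lives, and all nodes are released when the store is destroyed. Deleting such a pointer or handing it to a second smart pointer would lead to a double free.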

Unicode Support

Encoding is mostly an input issue, i.e. when the lexer converts text input into lexer tokens. The parser is completely encoding unaware.

The C++ target always expects UTF-8 input (either in a string or stream) which is then converted to UTF-32 (a char32_t array) and fed to the lexer.
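
To show what a UTF-8 to UTF-32 conversion involves, here is a minimal decoder sketch. It is not the converter the runtime uses; decodeUtf8 is a hypothetical helper that checks the lead/continuation byte structure but does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes a UTF-8 byte string into UTF-32 code points (illustration only).
inline std::u32string decodeUtf8(const std::string &bytes) {
  std::u32string result;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    char32_t codePoint = 0;
    std::size_t extra = 0;  // number of continuation bytes that follow

    if (lead < 0x80) {
      codePoint = lead;  // 1-byte sequence (ASCII)
    } else if ((lead & 0xE0) == 0xC0) {
      codePoint = lead & 0x1F;  // 2-byte sequence
      extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      codePoint = lead & 0x0F;  // 3-byte sequence
      extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      codePoint = lead & 0x07;  // 4-byte sequence
      extra = 3;
    } else {
      throw std::invalid_argument("invalid UTF-8 lead byte");
    }

    if (i + extra >= bytes.size()) {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
      if ((next & 0xC0) != 0x80) {
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      }
      // Each continuation byte contributes its low 6 bits.
      codePoint = (codePoint << 6) | (next & 0x3F);
    }

    result.push_back(codePoint);
    i += extra + 1;
  }
  return result;
}
```

Each element of the returned string is one full code point, which is why a lexer working on the UTF-32 form can treat every input symbol uniformly, regardless of how many bytes it occupied in the UTF-8 input.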

Named Actions

To help customize the generated files, there are a number of additional so-called named actions. These actions are tied to specific areas in the generated code and allow you to add custom (target-specific) code. All targets support these actions:

  • @parser::header
  • @parser::members
  • @lexer::header
  • @lexer::members

(and their scopeless alternatives @header and @members) where header doesn't mean a C/C++ header file, but the top of a code file. The content of the header action appears in all generated files at the first line. So it's good for things like license/copyright information.

The content of a members action is placed in the public section of lexer or parser class declarations. Hence it can be used for public variables or predicate functions used in a grammar predicate. Since all targets support header and members, they are the best place for content that should also be available in generated files for other languages.

In addition to that the C++ target supports many more such named actions. Unfortunately, it's not possible to define new scopes (e.g. listener in addition to parser) so they had to be defined as part of the existing scopes (lexer or parser). The grammar in the demo application contains all of the named actions as well for reference. Here's the list:

  • @lexer::preinclude - Placed right before the first #include (e.g. good for headers that must appear first, for system headers etc.). Appears in both lexer h and cpp file.
  • @lexer::postinclude - Placed right after the last #include, but before any class code (e.g. for additional namespaces). Appears in both lexer h and cpp file.
  • @lexer::context - Placed right before the lexer class declaration. Use for e.g. additional types, aliases, forward declarations and the like. Appears in the lexer h file.
  • @lexer::declarations - Placed in the private section of the lexer declaration (generated sections in all classes strictly follow the pattern: public, protected, private, from top to bottom). Use this for private vars etc.
  • @lexer::definitions - Placed before other implementations in the cpp file (but after @postinclude). Use this to implement e.g. private types.
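
The placement rules in this list can be visualized with a rough sketch of a generated lexer, shown here as a single compilable file. SketchLexer, TokenText, allowKeywords, isKeyword and keyword are hypothetical; a real generated lexer derives from the runtime's lexer class and contains far more code. The comments mark where the content of each named action would presumably end up.

```cpp
// ---- header file part (sketch) ----

// @lexer::preinclude content lands here, before the first #include.
#include <string>
// @lexer::postinclude content lands here, after the last #include.

// @lexer::context content lands here, right before the class declaration.
using TokenText = std::string;

class SketchLexer {  // hypothetical stand-in for the generated lexer class
public:
  // Hypothetical predicate helper that a grammar predicate could call.
  bool isKeyword(const TokenText &text) const {
    return allowKeywords && text == keyword();
  }

  // @lexer::members content lands here (public section).
  bool allowKeywords = true;

private:
  // @lexer::declarations content lands here (private section).
  static TokenText keyword();
};

// ---- cpp file part (sketch) ----

// @lexer::definitions content lands here, before other implementations.
TokenText SketchLexer::keyword() { return "key"; }
```

In this sketch the members action supplies the public flag, while the declarations and definitions actions together supply a private helper, split between the h and cpp file in the same way the list above describes.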

For the parser there are the same actions as shown above for the lexer. In addition to that there are even more actions for visitor and listener classes:

  • @parser::listenerpreinclude
  • @parser::listenerpostinclude
  • @parser::listenerdeclarations
  • @parser::listenermembers
  • @parser::listenerdefinitions
  • @parser::baselistenerpreinclude
  • @parser::baselistenerpostinclude
  • @parser::baselistenerdeclarations
  • @parser::baselistenermembers
  • @parser::baselistenerdefinitions
  • @parser::visitorpreinclude
  • @parser::visitorpostinclude
  • @parser::visitordeclarations
  • @parser::visitormembers
  • @parser::visitordefinitions
  • @parser::basevisitorpreinclude
  • @parser::basevisitorpostinclude
  • @parser::basevisitordeclarations
  • @parser::basevisitormembers
  • @parser::basevisitordefinitions

These should be self-explanatory by now. Note: there is no context action for listeners or visitors, simply because it would be used even less than the other actions, and there are already so many.