diff --git a/.gitignore b/.gitignore
index 189e07153..010914716 100644
--- a/.gitignore
+++ b/.gitignore
@@ -62,5 +62,4 @@ tool/target
 runtime-testsuite/target
 tool-testsuite/target
 runtime/Cpp/demo/generated
-runtime/Cpp/demo/Mac/antlrcpp.xcodeproj/xcuserdata
-runtime/Cpp/demo/Mac/antlrcpp.xcodeproj/project.xcworkspace/xcuserdata
+xcuserdata
diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 000000000..faf7fcfa0
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,19 @@
+sudo: true
+language: java
+script:
+  - mvn install
+jdk:
+  - openjdk6
+  - oraclejdk7
+  - oraclejdk8
+before_install:
+  - sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
+  - sudo add-apt-repository ppa:fkrull/deadsnakes -y
+  - sudo add-apt-repository ppa:rwky/nodejs -y
+  - sudo apt-get update -qq
+  - sudo apt-get install -qq python3.5
+  - sudo apt-get install -qq nodejs
+  - echo "deb http://download.mono-project.com/repo/debian wheezy/snapshots/3.12.1 main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list
+  - sudo apt-get install -qq mono-complete
+  - python --version
+  - python3 --version
diff --git a/CHANGES.txt b/CHANGES.txt
index 8e4d978e3..b2eef1054 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,45 @@
+****************************************************************************
+As of ANTLR 4.2.1, March 25, 2014, we are no longer updating this file. Instead,
+we are using the GitHub release mechanism. For example, here are the
+4.2.1 release notes:
+
+https://github.com/antlr/antlr4/releases/tag/4.2.1
+****************************************************************************
+
 ANTLR v4 Honey Badger

+January 15, 2014
+
+* Unit tests for lexer actions from yesterday.
+* Refactored TreeView so we can refresh the tree externally w/o creating a new one.
+  Needed for the IntelliJ plugin.
+
+January 14, 2014
+
+* Updated serialized ATN representation of lexer actions, allowing the lexer
+  interpreter to execute the majority of lexer commands (#408)
+
+January 12, 2014
+
+* Support executing precedence predicates during the SLL phase of
+  adaptivePredict (#401). The result is a massive performance boost for grammars
+  containing direct left-recursion (improvements of 5% to 1000+% have been
+  observed, depending on the grammar and input).
+
+December 29, 2013
+
+* Internal change: Tool.loadGrammar() -> parseGrammar(). Tool.load() -> parse()
+
+* Added Tool.loadGrammar(fileName) that completely parses, extracts the implicit lexer,
+  and processes it into a Grammar object. Does not generate code. Use
+  Grammar.getImplicitLexer() to get the lexer created during processing of a
+  combined grammar.
+
+* Added Grammar.load(fileName) that creates the Tool object for you. loadGrammar()
+  lets you create your own Tool for setting error handlers, etc.
+
+  final Grammar g = Grammar.load("/tmp/MyGrammar.g4");
+
 December 19, 2013

 * Sam:
@@ -14,19 +54,19 @@ November 24, 2013

 * Ter adds tree pattern matching.
  Preferred interface:

-	ParseTree t = parser.expr();
-	ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr);
-	ParseTreeMatch m = p.match(t);
-	String id = m.get("ID");
+    ParseTree t = parser.expr();
+    ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr);
+    ParseTreeMatch m = p.match(t);
+    String id = m.get("ID");

  or

-	String xpath = "//blockStatement/*";
-	String treePattern = "int <Identifier> = <expression>;";
-	ParseTreePattern p =
-		parser.compileParseTreePattern(treePattern,
-		                               JavaParser.RULE_localVariableDeclarationStatement);
-	List<ParseTreeMatch> matches = p.findAll(tree, xpath);
+    String xpath = "//blockStatement/*";
+    String treePattern = "int <Identifier> = <expression>;";
+    ParseTreePattern p =
+        parser.compileParseTreePattern(treePattern,
+                                       JavaParser.RULE_localVariableDeclarationStatement);
+    List<ParseTreeMatch> matches = p.findAll(tree, xpath);

 November 20, 2013
diff --git a/LICENSE.txt b/LICENSE.txt
index 49a546774..95d0a2554 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,5 +1,5 @@
 [The "BSD license"]
-Copyright (c) 2013 Terence Parr, Sam Harwell
+Copyright (c) 2015 Terence Parr, Sam Harwell
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/README.md b/README.md
new file mode 100644
index 000000000..c1d6902de
--- /dev/null
+++ b/README.md
@@ -0,0 +1,47 @@
+# ANTLR v4
+
+**ANTLR** (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.
+
+*Given day-job constraints, my time working on this project is limited, so I'll have to focus first on fixing bugs rather than changing/improving the feature set. Likely I'll do it in bursts every few months.
Please do not be offended if your bug or pull request does not yield a response! --parrt*
+
+## Authors and major contributors
+
+* [Terence Parr](http://www.cs.usfca.edu/~parrt/), parrt@cs.usfca.edu
+ANTLR project lead and supreme dictator for life
+[University of San Francisco](http://www.usfca.edu/)
+* [Sam Harwell](http://tunnelvisionlabs.com/) (Tool co-author, Java and C# target)
+* Eric Vergnaud (JavaScript, Python2, Python3 targets and significant work on the C# target)
+
+## Useful information
+
+* [Release notes](https://github.com/antlr/antlr4/releases)
+* [Getting started with v4](https://raw.githubusercontent.com/antlr/antlr4/master/doc/getting-started.md)
+* [Official site](http://www.antlr.org/)
+* [Documentation](https://raw.githubusercontent.com/antlr/antlr4/master/doc/index.md)
+* [FAQ](https://raw.githubusercontent.com/antlr/antlr4/master/doc/faq/index.md)
+* [API](http://www.antlr.org/api/Java/index.html)
+* [ANTLR v3](http://www.antlr3.org/)
+* [v3 to v4 migration, differences](https://raw.githubusercontent.com/antlr/antlr4/master/doc/faq/general.md)
+
+You might also find the following pages useful, particularly if you want to mess around with the various target languages.
+
+* [How to build ANTLR itself](https://raw.githubusercontent.com/antlr/antlr4/master/doc/building-antlr.md)
+* [How we create and deploy an ANTLR release](https://raw.githubusercontent.com/antlr/antlr4/master/doc/releasing-antlr.md)
+
+## The Definitive ANTLR 4 Reference
+
+Programmers run into parsing problems all the time. Whether it’s a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language—ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top.
This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features.
+
+You can buy the book [The Definitive ANTLR 4 Reference](http://amzn.com/1934356999) at Amazon or an [electronic version at the publisher's site](https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference).
+
+You will find the [book source code](http://pragprog.com/titles/tpantlr2/source_code) useful.
+
+## Additional grammars
+
+[This repository](https://github.com/antlr/grammars-v4) is a collection of grammars without actions where the
+root directory name is the all-lowercase name of the language parsed
+by the grammar. For example: java, cpp, csharp, c, etc.
+
+Travis Status
+---------
+
diff --git a/README.txt b/README.txt
deleted file mode 100644
index 8aac63ffd..000000000
--- a/README.txt
+++ /dev/null
@@ -1,121 +0,0 @@
-ANTLR v4
-
-Terence Parr, parrt@cs.usfca.edu
-ANTLR project lead and supreme dictator for life
-University of San Francisco
-
-INTRODUCTION
-
-Hi and welcome to the Honey Badger 4.1 release of ANTLR!
-
-INSTALLATION
-
-UNIX
-
-0. Install Java (version 1.6 or higher)
-
-1. Download
-
-    $ cd /usr/local/lib
-    $ curl -O http://www.antlr4.org/download/antlr-4.1-complete.jar
-
-   Or just download in a browser using the URL:
-
-    http://www.antlr4.org/download/antlr-4.1-complete.jar
-
-   and put it somewhere rational like /usr/local/lib.
-
-2. Add antlr-4.1-complete.jar to your CLASSPATH:
-
-    $ export CLASSPATH=".:/usr/local/lib/antlr-4.1-complete.jar:$CLASSPATH"
-
-   It is also a good idea to put this in your .bash_profile or whatever your
-   startup script is.
-
-3. Create aliases for the ANTLR Tool, and TestRig.
-
-    $ alias antlr4='java -jar /usr/local/lib/antlr-4.1-complete.jar'
-    $ alias grun='java org.antlr.v4.runtime.misc.TestRig'
-
-WINDOWS (Thanks to Graham Wideman)
-
-0. Install Java (version 1.6 or higher)
-
-1.
Download http://antlr.org/download/antlr-4.1-complete.jar
-   Save to your directory for 3rd party Java libraries, say C:\Javalib
-
-2. Add antlr-4.1-complete.jar to CLASSPATH, either:
-
-   * Permanently: Using System Properties dialog > Environment variables >
-     Create or append to CLASSPATH variable
-
-   * Temporarily, at command line:
-     SET CLASSPATH=C:\Javalib\antlr-4.1-complete.jar;%CLASSPATH%
-
-3. Create short convenient commands for the ANTLR Tool, and TestRig,
-   using batch files or doskey commands:
-
-   * Batch files (in directory in system PATH)
-
-     antlr4.bat: java org.antlr.v4.Tool %*
-     run.bat:    java org.antlr.v4.runtime.misc.TestRig %*
-
-   * Or, use doskey commands:
-
-     doskey antlr4=java org.antlr.v4.Tool $*
-     doskey grun  =java org.antlr.v4.runtime.misc.TestRig $*
-
-TESTING INSTALLATION
-
-Either launch org.antlr.v4.Tool directly:
-
-$ java org.antlr.v4.Tool
-ANTLR Parser Generator Version 4.1
- -o ___      specify output directory where all output is generated
- -lib ___    specify location of .tokens files
-...
-
-or use the -jar option on java:
-
-$ java -jar /usr/local/lib/antlr-4.1-complete.jar
-ANTLR Parser Generator Version 4.1
- -o ___      specify output directory where all output is generated
- -lib ___    specify location of .tokens files
-...
-
-
-EXAMPLE
-
-In a temporary directory, put the following grammar inside file Hello.g4:
-
-// Define a grammar called Hello
-// match keyword hello followed by an identifier
-// match lower-case identifiers
-grammar Hello;
-r  : 'hello' ID ;
-ID : [a-z]+ ;
-WS : [ \t\n]+ -> skip ; // skip spaces, tabs, newlines
-
-Then run the ANTLR tool on it:
-
-$ cd /tmp
-$ antlr4 Hello.g4
-$ javac Hello*.java
-
-Now test it:
-
-$ grun Hello r -tree
-hello parrt
-^D
-(r hello parrt)
-
-(That ^D means EOF on unix; it's ^Z in Windows.) The -tree option prints
-the parse tree in LISP notation.
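As an aside, the same parse shown above with TestRig can also be driven from plain Java. This is a minimal sketch, not part of the original README; it assumes the classes ANTLR generated from Hello.g4 (HelloLexer, HelloParser, both hypothetical names derived from the grammar) and the ANTLR 4.1 runtime jar are on the classpath:

```java
// Minimal driver for the Hello.g4 example; assumes the generated
// HelloLexer/HelloParser and the ANTLR runtime jar are on the classpath.
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class HelloMain {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRInputStream("hello parrt");
        HelloLexer lexer = new HelloLexer(input);            // tokenize the input
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        HelloParser parser = new HelloParser(tokens);
        ParseTree tree = parser.r();                         // invoke start rule 'r'
        // Prints the same LISP-style tree TestRig shows, e.g. (r hello parrt)
        System.out.println(tree.toStringTree(parser));
    }
}
```

This mirrors what `grun Hello r -tree` does internally: lex, parse from the start rule, and print the tree.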
- -BOOK SOURCE CODE - -http://pragprog.com/titles/tpantlr2/source_code - -GRAMMARS - -https://github.com/antlr/grammars-v4 diff --git a/antlr4-maven-plugin/.classpath b/antlr4-maven-plugin/.classpath deleted file mode 100644 index 595a5bf4f..000000000 --- a/antlr4-maven-plugin/.classpath +++ /dev/null @@ -1,7 +0,0 @@ - - - - - - - diff --git a/antlr4-maven-plugin/.project b/antlr4-maven-plugin/.project deleted file mode 100644 index 1aedcc067..000000000 --- a/antlr4-maven-plugin/.project +++ /dev/null @@ -1,23 +0,0 @@ - - - antlr4-maven-plugin - - - - - - org.eclipse.jdt.core.javabuilder - - - - - org.eclipse.m2e.core.maven2Builder - - - - - - org.eclipse.jdt.core.javanature - org.eclipse.m2e.core.maven2Nature - - diff --git a/antlr4-maven-plugin/.settings/org.eclipse.core.resources.prefs b/antlr4-maven-plugin/.settings/org.eclipse.core.resources.prefs deleted file mode 100644 index 99f26c020..000000000 --- a/antlr4-maven-plugin/.settings/org.eclipse.core.resources.prefs +++ /dev/null @@ -1,2 +0,0 @@ -eclipse.preferences.version=1 -encoding/=UTF-8 diff --git a/antlr4-maven-plugin/.settings/org.eclipse.jdt.core.prefs b/antlr4-maven-plugin/.settings/org.eclipse.jdt.core.prefs deleted file mode 100644 index 60105c1b9..000000000 --- a/antlr4-maven-plugin/.settings/org.eclipse.jdt.core.prefs +++ /dev/null @@ -1,5 +0,0 @@ -eclipse.preferences.version=1 -org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.6 -org.eclipse.jdt.core.compiler.compliance=1.6 -org.eclipse.jdt.core.compiler.problem.forbiddenReference=warning -org.eclipse.jdt.core.compiler.source=1.6 diff --git a/antlr4-maven-plugin/.settings/org.eclipse.m2e.core.prefs b/antlr4-maven-plugin/.settings/org.eclipse.m2e.core.prefs deleted file mode 100644 index f897a7f1c..000000000 --- a/antlr4-maven-plugin/.settings/org.eclipse.m2e.core.prefs +++ /dev/null @@ -1,4 +0,0 @@ -activeProfiles= -eclipse.preferences.version=1 -resolveWorkspaceProjects=true -version=1 diff --git a/antlr4-maven-plugin/pom.xml 
b/antlr4-maven-plugin/pom.xml index 3e518518a..b9e144d33 100644 --- a/antlr4-maven-plugin/pom.xml +++ b/antlr4-maven-plugin/pom.xml @@ -1,9 +1,8 @@ + + 4.0.0 + + org.antlr + antlr4-master + 4.5.4-SNAPSHOT + + antlr4-maven-plugin + maven-plugin + ANTLR 4 Maven plugin + Maven plugin for ANTLR 4 grammars + + 3.0 + + + 2009 - + - 4.0.0 + + - - org.antlr - antlr4-master - 4.1.1-SNAPSHOT - + + + org.apache.maven + maven-plugin-api + 3.0.5 + compile + + + org.apache.maven + maven-project + 2.2.1 + + + org.codehaus.plexus + plexus-compiler-api + 2.2 + + + org.sonatype.plexus + plexus-build-api + 0.0.7 + + + + org.antlr + antlr4 + ${project.version} + + + + junit + junit + 4.11 + test + + + org.apache.maven.shared + maven-plugin-testing-harness + 1.1 + test + + + org.apache.maven.plugin-tools + maven-plugin-annotations + 3.2 + provided + + - antlr4-maven-plugin - maven-plugin + + + + org.apache.maven.plugins + maven-plugin-plugin + 3.3 + + + true + + + + mojo-descriptor + + descriptor + + + + help-goal + + helpmojo + + + + + + - ANTLR 4 Maven plugin - Maven plugin for ANTLR 4 grammars - http://www.antlr.org - - - 3.0 - - - - 2009 - - - - - - - - - org.apache.maven - maven-plugin-api - 3.0.5 - compile - - - - org.apache.maven - maven-project - 2.2.1 - - - - org.codehaus.plexus - plexus-compiler-api - 2.2 - - - - - org.antlr - antlr4 - ${project.version} - - - - - - junit - junit - 4.11 - test - - - - - org.apache.maven.shared - maven-plugin-testing-harness - 1.1 - test - - - - org.apache.maven.plugin-tools - maven-plugin-annotations - 3.2 - provided - - - - - - - install - - - - - org.apache.maven.plugins - maven-plugin-plugin - 3.2 - - - true - - - - mojo-descriptor - - descriptor - - - - help-goal - - helpmojo - - - - - - - org.apache.maven.plugins - maven-site-plugin - 3.3 - - - - org.apache.maven.plugins - maven-project-info-reports-plugin - 2.7 - - false - - - - - - - - - - - org.apache.maven.plugins - maven-plugin-plugin - 3.2 - - - - org.apache.maven.plugins - 
maven-javadoc-plugin - 2.9 - - true - - - - - org.apache.maven.plugins - maven-jxr-plugin - 2.3 - - - + + + + org.apache.maven.plugins + maven-plugin-plugin + 3.3 + + + org.apache.maven.plugins + maven-javadoc-plugin + 2.9 + + true + + + + org.apache.maven.plugins + maven-jxr-plugin + 2.3 + + + diff --git a/antlr4-maven-plugin/resources/META-INF/m2e/lifecycle-mapping-metadata.xml b/antlr4-maven-plugin/resources/META-INF/m2e/lifecycle-mapping-metadata.xml new file mode 100644 index 000000000..de806eee9 --- /dev/null +++ b/antlr4-maven-plugin/resources/META-INF/m2e/lifecycle-mapping-metadata.xml @@ -0,0 +1,18 @@ + + + + + + + antlr4 + + + + + true + true + + + + + diff --git a/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4ErrorLog.java b/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4ErrorLog.java index e573b7a5f..b2518786e 100644 --- a/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4ErrorLog.java +++ b/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4ErrorLog.java @@ -28,10 +28,14 @@ */ package org.antlr.mojo.antlr4; -import org.antlr.v4.runtime.misc.NotNull; +import org.antlr.v4.Tool; import org.antlr.v4.tool.ANTLRMessage; import org.antlr.v4.tool.ANTLRToolListener; import org.apache.maven.plugin.logging.Log; +import org.sonatype.plexus.build.incremental.BuildContext; +import org.stringtemplate.v4.ST; + +import java.io.File; /** * This implementation of {@link ANTLRToolListener} reports messages to the @@ -41,6 +45,8 @@ import org.apache.maven.plugin.logging.Log; */ public class Antlr4ErrorLog implements ANTLRToolListener { + private final Tool tool; + private final BuildContext buildContext; private final Log log; /** @@ -48,43 +54,70 @@ public class Antlr4ErrorLog implements ANTLRToolListener { * * @param log The Maven log */ - public Antlr4ErrorLog(@NotNull Log log) { + public Antlr4ErrorLog(Tool tool, BuildContext buildContext, Log log) { + this.tool = tool; + this.buildContext = buildContext; this.log = 
log;
     }

     /**
      * {@inheritDoc}
-     * <p/>
+     * <p>
      * This implementation passes the message to the Maven log.
-
+     * </p>
      * @param message The message to send to Maven
      */
     @Override
     public void info(String message) {
+        if (tool.errMgr.formatWantsSingleLineMessage()) {
+            message = message.replace('\n', ' ');
+        }
         log.info(message);
     }

     /**
      * {@inheritDoc}
-     * <p/>
+     * <p>
      * This implementation passes the message to the Maven log.
-
+     * </p>
      * @param message The message to send to Maven.
      */
     @Override
     public void error(ANTLRMessage message) {
-        log.error(message.toString());
+        ST msgST = tool.errMgr.getMessageTemplate(message);
+        String outputMsg = msgST.render();
+        if (tool.errMgr.formatWantsSingleLineMessage()) {
+            outputMsg = outputMsg.replace('\n', ' ');
+        }
+
+        log.error(outputMsg);
+
+        if (message.fileName != null) {
+            String text = message.getMessageTemplate(false).render();
+            buildContext.addMessage(new File(message.fileName), message.line, message.charPosition, text, BuildContext.SEVERITY_ERROR, message.getCause());
+        }
     }

     /**
      * {@inheritDoc}
-     * <p/>
+     * <p>
      * This implementation passes the message to the Maven log.
-
+     * </p>
      * @param message
      */
     @Override
     public void warning(ANTLRMessage message) {
-        log.warn(message.toString());
+        ST msgST = tool.errMgr.getMessageTemplate(message);
+        String outputMsg = msgST.render();
+        if (tool.errMgr.formatWantsSingleLineMessage()) {
+            outputMsg = outputMsg.replace('\n', ' ');
+        }
+
+        log.warn(outputMsg);
+
+        if (message.fileName != null) {
+            String text = message.getMessageTemplate(false).render();
+            buildContext.addMessage(new File(message.fileName), message.line, message.charPosition, text, BuildContext.SEVERITY_WARNING, message.getCause());
+        }
     }
 }
diff --git a/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4Mojo.java b/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4Mojo.java
index a190b13bf..d8712ee6a 100644
--- a/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4Mojo.java
+++ b/antlr4-maven-plugin/src/main/java/org/antlr/mojo/antlr4/Antlr4Mojo.java
@@ -32,13 +32,13 @@ package org.antlr.mojo.antlr4;
 import org.antlr.v4.Tool;
 import org.antlr.v4.codegen.CodeGenerator;
 import org.antlr.v4.runtime.misc.MultiMap;
-import org.antlr.v4.runtime.misc.NotNull;
 import org.antlr.v4.runtime.misc.Utils;
 import org.antlr.v4.tool.Grammar;
 import org.apache.maven.plugin.AbstractMojo;
 import org.apache.maven.plugin.MojoExecutionException;
 import org.apache.maven.plugin.MojoFailureException;
 import org.apache.maven.plugin.logging.Log;
+import org.apache.maven.plugins.annotations.Component;
 import org.apache.maven.plugins.annotations.LifecyclePhase;
 import org.apache.maven.plugins.annotations.Mojo;
 import org.apache.maven.plugins.annotations.Parameter;
@@ -49,11 +49,13 @@ import org.codehaus.plexus.compiler.util.scan.SimpleSourceInclusionScanner;
 import org.codehaus.plexus.compiler.util.scan.SourceInclusionScanner;
 import org.codehaus.plexus.compiler.util.scan.mapping.SourceMapping;
 import org.codehaus.plexus.compiler.util.scan.mapping.SuffixMapping;
+import org.sonatype.plexus.build.incremental.BuildContext;

 import
java.io.BufferedWriter;
 import java.io.File;
-import java.io.FileWriter;
 import java.io.IOException;
+import java.io.OutputStream;
+import java.io.OutputStreamWriter;
 import java.io.StringWriter;
 import java.io.Writer;
 import java.net.URI;
@@ -143,10 +145,11 @@ public class Antlr4Mojo extends AbstractMojo {
     * the generate phase of the plugin. Note that the plugin is smart enough to
     * realize that imported grammars should be included but not acted upon
     * directly by the ANTLR Tool.
-    * <p/>
+    * <p>
     * A set of Ant-like inclusion patterns used to select files from the source
     * directory for processing. By default, the pattern
     * **/*.g4 is used to select grammar files.
+    * </p>
*/ @Parameter protected Set includes = new HashSet(); @@ -181,6 +184,9 @@ public class Antlr4Mojo extends AbstractMojo { @Parameter(defaultValue = "${basedir}/src/main/antlr4/imports") private File libDirectory; + @Component + private BuildContext buildContext; + public File getSourceDirectory() { return sourceDirectory; } @@ -206,9 +212,9 @@ public class Antlr4Mojo extends AbstractMojo { * The main entry point for this Mojo, it is responsible for converting * ANTLR 4.x grammars into the target language specified by the grammar. * - * @throws MojoExecutionException if a configuration or grammar error causes + * @exception MojoExecutionException if a configuration or grammar error causes * the code generation process to fail - * @throws MojoFailureException if an instance of the ANTLR 4 {@link Tool} + * @exception MojoFailureException if an instance of the ANTLR 4 {@link Tool} * cannot be created */ @Override @@ -347,9 +353,9 @@ public class Antlr4Mojo extends AbstractMojo { /** * * @param sourceDirectory - * @throws InclusionScanException + * @exception InclusionScanException */ - @NotNull + private List> processGrammarFiles(List args, File sourceDirectory) throws InclusionScanException { // Which files under the source set should we be looking for as grammar files SourceMapping mapping = new SuffixMapping("g4", Collections.emptySet()); @@ -366,6 +372,22 @@ public class Antlr4Mojo extends AbstractMojo { scan.addSourceMapping(mapping); Set grammarFiles = scan.getIncludedSources(sourceDirectory, null); + // We don't want the plugin to run for every grammar, regardless of whether + // it's changed since the last compilation. Check the mtime of the tokens vs + // the grammar file mtime to determine whether we even need to execute. + Set grammarFilesToProcess = new HashSet(); + + for (File grammarFile : grammarFiles) { + String tokensFileName = grammarFile.getName().split("\\.")[0] + ".tokens"; + File outputFile = new File(outputDirectory, tokensFileName); + if ( (! 
outputFile.exists()) || + outputFile.lastModified() < grammarFile.lastModified() ) { + grammarFilesToProcess.add(grammarFile); + } + } + + grammarFiles = grammarFilesToProcess; + if (grammarFiles.isEmpty()) { getLog().info("No grammars to process"); return Collections.emptyList(); @@ -375,6 +397,12 @@ public class Antlr4Mojo extends AbstractMojo { // Iterate each grammar file we were given and add it into the tool's list of // grammars to process. for (File grammarFile : grammarFiles) { + if (!buildContext.hasDelta(grammarFile)) { + continue; + } + + buildContext.removeMessages(grammarFile); + getLog().debug("Grammar file '" + grammarFile.getPath() + "' detected."); String relPathBase = findSourceSubdir(sourceDirectory, grammarFile.getPath()); @@ -452,7 +480,7 @@ public class Antlr4Mojo extends AbstractMojo { public CustomTool(String[] args) { super(args); - addListener(new Antlr4ErrorLog(getLog())); + addListener(new Antlr4ErrorLog(this, buildContext, getLog())); } @Override @@ -486,8 +514,8 @@ public class Antlr4Mojo extends AbstractMojo { URI relativePath = project.getBasedir().toURI().relativize(outputFile.toURI()); getLog().debug(" Writing file: " + relativePath); - FileWriter fw = new FileWriter(outputFile); - return new BufferedWriter(fw); + OutputStream outputStream = buildContext.newFileOutputStream(outputFile); + return new BufferedWriter(new OutputStreamWriter(outputStream)); } } } diff --git a/build.xml b/build.xml deleted file mode 100644 index e447a3394..000000000 --- a/build.xml +++ /dev/null @@ -1,279 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/build/Antlr4.Runtime.nuspec b/build/Antlr4.Runtime.nuspec new file mode 100644 index 000000000..f629ba1fe --- /dev/null +++ b/build/Antlr4.Runtime.nuspec @@ -0,0 +1,64 @@ + + + + Antlr4.Runtime + 0.0.0 + Sam Harwell, Terence Parr + Sam Harwell + The runtime library for parsers generated by the C# target of ANTLR 4. This package supports projects targeting .NET 2.0 or newer, and built using Visual Studio 2008 or newer. + en-us + https://github.com/sharwell/antlr4cs + https://raw.github.com/sharwell/antlr4cs/master/LICENSE.txt + https://raw.github.com/antlr/website-antlr4/master/images/icons/antlr.png + Copyright © Sam Harwell 2014 + https://github.com/sharwell/antlr4cs/releases/v$version$ + true + antlr antlr4 parsing + ANTLR 4 Runtime + The runtime library for parsers generated by the C# target of ANTLR 4. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/build/Antlr4.VS2008.nuspec b/build/Antlr4.VS2008.nuspec new file mode 100644 index 000000000..36ffe7ec9 --- /dev/null +++ b/build/Antlr4.VS2008.nuspec @@ -0,0 +1,35 @@ + + + + Antlr4.VS2008 + 0.0.0 + Sam Harwell, Terence Parr + Sam Harwell + The C# target of the ANTLR 4 parser generator for Visual Studio 2008 projects. This package supports projects targeting .NET 2.0 or newer, and built using Visual Studio 2008. + en-us + https://github.com/sharwell/antlr4cs + https://raw.github.com/sharwell/antlr4cs/master/LICENSE.txt + https://raw.github.com/antlr/website-antlr4/master/images/icons/antlr.png + Copyright © Sam Harwell 2014 + https://github.com/sharwell/antlr4cs/releases/v$version$ + true + true + antlr antlr4 parsing + ANTLR 4 (Visual Studio 2008) + The C# target of the ANTLR 4 parser generator for Visual Studio 2008 projects. 
+ + + + + + + + + + + + + + + + diff --git a/build/Antlr4.nuspec b/build/Antlr4.nuspec new file mode 100644 index 000000000..9dc4e4754 --- /dev/null +++ b/build/Antlr4.nuspec @@ -0,0 +1,35 @@ + + + + Antlr4 + 0.0.0 + Sam Harwell, Terence Parr + Sam Harwell + The C# target of the ANTLR 4 parser generator for Visual Studio 2010+ projects. This package supports projects targeting .NET 2.0 or newer, and built using Visual Studio 2010 or newer. + en-us + https://github.com/sharwell/antlr4cs + https://raw.github.com/sharwell/antlr4cs/master/LICENSE.txt + https://raw.github.com/antlr/website-antlr4/master/images/icons/antlr.png + Copyright © Sam Harwell 2014 + https://github.com/sharwell/antlr4cs/releases/v$version$ + true + true + antlr antlr4 parsing + ANTLR 4 + The C# target of the ANTLR 4 parser generator for Visual Studio 2010+ projects. + + + + + + + + + + + + + + + + diff --git a/build/build.ps1 b/build/build.ps1 new file mode 100644 index 000000000..5dcc5e607 --- /dev/null +++ b/build/build.ps1 @@ -0,0 +1,139 @@ +param ( + [switch]$Debug, + [string]$VisualStudioVersion = "12.0", + [switch]$NoClean, + [string]$Java6Home, + [string]$MavenHome, + [string]$MavenRepo = "$($env:USERPROFILE)\.m2", + [switch]$SkipMaven, + [switch]$SkipKeyCheck +) + +# build the solutions +$SolutionPath = "..\Runtime\CSharp\Antlr4.sln" +$CF35SolutionPath = "..\Runtime\CSharp\Antlr4.VS2008.sln" + +# make sure the script was run from the expected path +if (!(Test-Path $SolutionPath)) { + echo "The script was run from an invalid working directory." + exit 1 +} + +. 
.\version.ps1 + +If ($Debug) { + $BuildConfig = 'Debug' +} Else { + $BuildConfig = 'Release' +} + +If ($NoClean) { + $Target = 'build' +} Else { + $Target = 'rebuild' +} + +If (-not $MavenHome) { + $MavenHome = $env:M2_HOME +} + +$Java6RegKey = 'HKLM:\SOFTWARE\JavaSoft\Java Runtime Environment\1.6' +$Java6RegValue = 'JavaHome' +If (-not $Java6Home -and (Test-Path $Java6RegKey)) { + $JavaHomeKey = Get-Item -LiteralPath $Java6RegKey + If ($JavaHomeKey.GetValue($Java6RegValue, $null) -ne $null) { + $JavaHomeProperty = Get-ItemProperty $Java6RegKey $Java6RegValue + $Java6Home = $JavaHomeProperty.$Java6RegValue + } +} + +# this is configured here for path checking, but also in the .props and .targets files +[xml]$pom = Get-Content "..\tool\pom.xml" +$CSharpToolVersionNodeInfo = Select-Xml "/mvn:project/mvn:version" -Namespace @{mvn='http://maven.apache.org/POM/4.0.0'} $pom +$CSharpToolVersion = $CSharpToolVersionNodeInfo.Node.InnerText.trim() + +# build the main project +$msbuild = "$env:windir\Microsoft.NET\Framework64\v4.0.30319\msbuild.exe" + +&$msbuild '/nologo' '/m' '/nr:false' "/t:$Target" "/p:Configuration=$BuildConfig" "/p:VisualStudioVersion=$VisualStudioVersion" $SolutionPath +if ($LASTEXITCODE -ne 0) { + $host.ui.WriteErrorLine('Build failed, aborting!') + exit $p.ExitCode +} + +# build the compact framework project +$msbuild = "$env:windir\Microsoft.NET\Framework\v4.0.30319\msbuild.exe" + +&$msbuild '/nologo' '/m' '/nr:false' '/t:rebuild' "/p:Configuration=$BuildConfig" $CF35SolutionPath +if ($LASTEXITCODE -ne 0) { + $host.ui.WriteErrorLine('.NET 3.5 Compact Framework Build failed, aborting!') + exit $p.ExitCode +} + +if (-not (Test-Path 'nuget')) { + mkdir "nuget" +} + +# Build the Java library using Maven +If (-not $SkipMaven) { + $OriginalPath = $PWD + + cd '..\tool' + $MavenPath = "$MavenHome\bin\mvn.bat" + If (-not (Test-Path $MavenPath)) { + $host.ui.WriteErrorLine("Couldn't locate Maven binary: $MavenPath") + cd $OriginalPath + exit 1 + } + + If (-not 
(Test-Path $Java6Home)) { + $host.ui.WriteErrorLine("Couldn't locate Java 6 installation: $Java6Home") + cd $OriginalPath + exit 1 + } + + $MavenGoal = 'package' + &$MavenPath '-DskipTests=true' '--errors' '-e' '-Dgpg.useagent=true' "-Djava6.home=$Java6Home" '-Psonatype-oss-release' $MavenGoal + if ($LASTEXITCODE -ne 0) { + $host.ui.WriteErrorLine('Maven build of the C# Target custom Tool failed, aborting!') + cd $OriginalPath + exit $p.ExitCode + } + + cd $OriginalPath +} + +$JarPath = "..\tool\target\antlr4-csharp-$CSharpToolVersion-complete.jar" +if (!(Test-Path $JarPath)) { + $host.ui.WriteErrorLine("Couldn't locate the complete jar used for building C# parsers: $JarPath") + exit 1 +} + +# By default, do not create a NuGet package unless the expected strong name key files were used +if (-not $SkipKeyCheck) { + . .\keys.ps1 + + foreach ($pair in $Keys.GetEnumerator()) { + $assembly = Resolve-FullPath -Path "..\runtime\CSharp\Antlr4.Runtime\bin\$($pair.Key)\$BuildConfig\Antlr4.Runtime.dll" + # Run the actual check in a separate process or the current process will keep the assembly file locked + powershell -Command ".\check-key.ps1 -Assembly '$assembly' -ExpectedKey '$($pair.Value)' -Build '$($pair.Key)'" + if ($LASTEXITCODE -ne 0) { + Exit $p.ExitCode + } + } +} + +$packages = @( + 'Antlr4.Runtime' + 'Antlr4' + 'Antlr4.VS2008') + +$nuget = '..\runtime\CSharp\.nuget\NuGet.exe' +ForEach ($package in $packages) { + If (-not (Test-Path ".\$package.nuspec")) { + $host.ui.WriteErrorLine("Couldn't locate NuGet package specification: $package") + exit 1 + } + + &$nuget 'pack' ".\$package.nuspec" '-OutputDirectory' 'nuget' '-Prop' "Configuration=$BuildConfig" '-Version' "$AntlrVersion" '-Prop' "M2_REPO=$M2_REPO" '-Prop' "CSharpToolVersion=$CSharpToolVersion" '-Symbols' +} diff --git a/build/check-key.ps1 b/build/check-key.ps1 new file mode 100644 index 000000000..b92a9cdcc --- /dev/null +++ b/build/check-key.ps1 @@ -0,0 +1,31 @@ +param( + [string]$Assembly, + 
[string]$ExpectedKey, + [string]$Build = $null +) + +function Get-PublicKeyToken() { + param([string]$assembly = $null) + if ($assembly) { + $bytes = $null + $bytes = [System.Reflection.Assembly]::ReflectionOnlyLoadFrom($assembly).GetName().GetPublicKeyToken() + if ($bytes) { + $key = "" + for ($i=0; $i -lt $bytes.Length; $i++) { + $key += "{0:x2}" -f $bytes[$i] + } + + $key + } + } +} + +if (-not $Build) { + $Build = $Assembly +} + +$actual = Get-PublicKeyToken -assembly $Assembly +if ($actual -ne $ExpectedKey) { + $host.ui.WriteErrorLine("Invalid publicKeyToken for '$Build'; expected '$ExpectedKey' but found '$actual'") + exit 1 +} diff --git a/build/keys.ps1 b/build/keys.ps1 new file mode 100644 index 000000000..4e2f34250 --- /dev/null +++ b/build/keys.ps1 @@ -0,0 +1,17 @@ +# Note: these values may only change during minor release +$Keys = @{ + 'net20' = '7983ae52036899ac' + 'net30' = '7671200403f6656a' + 'net35-cf' = '770a97458f51159e' + 'net35-client' = '4307381ae04f9aa7' + 'net40-client' = 'bb1075973a9370c4' + 'net45' = 'edc21c04cf562012' + 'netcore45' = 'e4e9019902d0b6e2' + 'portable-net40' = '90bf14da8e1462b4' + 'portable-net45' = '3d23c8e77559f391' +} + +function Resolve-FullPath() { + param([string]$Path) + [System.IO.Path]::GetFullPath((Join-Path (pwd) $Path)) +} diff --git a/build/push.ps1 b/build/push.ps1 new file mode 100644 index 000000000..17791c1cd --- /dev/null +++ b/build/push.ps1 @@ -0,0 +1,29 @@ +. 
.\version.ps1 + +If ($AntlrVersion.EndsWith('-dev')) { + $host.ui.WriteErrorLine("Cannot push development version '$AntlrVersion' to NuGet.") + Exit 1 +} + +$packages = @( + 'Antlr4.Runtime' + 'Antlr4' + 'Antlr4.VS2008') + +# Make sure all packages exist before pushing any packages +ForEach ($package in $packages) { + If (-not (Test-Path ".\nuget\$package.$AntlrVersion.nupkg")) { + $host.ui.WriteErrorLine("Couldn't locate NuGet package: .\nuget\$package.$AntlrVersion.nupkg") + exit 1 + } + + If (-not (Test-Path ".\nuget\$package.$AntlrVersion.symbols.nupkg")) { + $host.ui.WriteErrorLine("Couldn't locate NuGet symbols package: .\nuget\$package.$AntlrVersion.symbols.nupkg") + exit 1 + } +} + +$nuget = '..\runtime\CSharp\.nuget\NuGet.exe' +ForEach ($package in $packages) { + &$nuget 'push' ".\nuget\$package.$AntlrVersion.nupkg" +} diff --git a/build/version.ps1 b/build/version.ps1 new file mode 100644 index 000000000..457481bd8 --- /dev/null +++ b/build/version.ps1 @@ -0,0 +1 @@ +$AntlrVersion = "4.5.1" diff --git a/contributors.txt b/contributors.txt index 91c887522..203aa1eec 100644 --- a/contributors.txt +++ b/contributors.txt @@ -55,3 +55,41 @@ YYYY/MM/DD, github id, Full name, email 2013/01/29, metadave, Dave Parfitt, diparfitt@gmail.com 2013/03/06, bkiers, Bart Kiers, bkiers@gmail.com 2013/08/20, cayhorstmann, Cay Horstmann, cay@horstmann.com +2014/03/18, aphyr, Kyle Kingsbury, aphyr@aphyr.com +2014/06/07, ericvergnaud, Eric Vergnaud, eric.vergnaud@wanadoo.fr +2014/07/04, jimidle, Jim Idle, jimi@Idle.ws +2014/09/04, 
jeduden, Jan-Eric Duden, jeduden@gmail.com +2014/09/27, petrbel, Petr Bělohlávek, antlr@petrbel.cz +2014/10/18, sergiusignacius, Sérgio Silva, serge.a.silva@gmail.com +2014/10/26, bdkearns, Brian Kearns, bdkearns@gmail.com +2014/10/27, michaelpj, Michael Peyton Jones, michaelpj@gmail.com +2015/01/29, TomLottermann, Thomas Lottermann, tomlottermann@gmail.com +2015/02/15, pavlo, Pavlo Lysov, pavlikus@gmail.com +2015/03/07, RedTailedHawk, Lawrence Parker, larry@answerrocket.com +2015/04/03, rljacobson, Robert Jacobson, rljacobson@gmail.com +2015/04/06, ojakubcik, Ondrej Jakubcik, ojakubcik@gmail.com +2015/04/29, jszheng, Jinshan Zheng, zheng_js@hotmail.com +2015/05/08, ViceIce, Michael Kriese, michael.kriese@gmx.de +2015/05/09, lkraz, Luke Krasnoff, luke.krasnoff@gmail.com +2015/05/12, Pursuit92, Josh Chase, jcjoshuachase@gmail.com +2015/05/20, peturingi, Pétur Ingi Egilsson, petur@petur.eu +2015/05/27, jcbrinfo, Jean-Christophe Beaupré, jcbrinfo@users.noreply.github.com +2015/06/29, jvanzyl, Jason van Zyl, jason@takari.io +2015/08/18, krzkaczor, Krzysztof Kaczor, krzysztof@kaczor.io +2015/09/18, worsht, Rajiv Subrahmanyam, rajiv.public@gmail.com +2015/09/24, HSorensen, Henrik Sorensen, henrik.b.sorensen@gmail.com +2015/10/06, brwml, Bryan Wilhelm, bryan.wilhelm@microsoft.com +2015/10/08, fedotovalex, Alex Fedotov, me@alexfedotov.com +2015/10/12, KvanTTT, Ivan Kochurkin, ivan.kochurkin@gmail.com +2015/10/21, martin-probst, Martin Probst, martin-probst@web.de +2015/10/21, hkff, Walid Benghabrit, walid.benghabrit@mines-nantes.fr +2015/11/12, cooperra, Robbie Cooper, cooperra@users.noreply.github.com +2015/11/25, abego, Udo Borkowski, ub@abego.org +2015/12/17, sebadur, Sebastian Badur, sebadur@users.noreply.github.com +2015/12/23, pboyer, Peter Boyer, peter.b.boyer@gmail.com +2015/12/24, dtymon, David Tymon, david.tymon@gmail.com +2016/02/18, reitzig, Raphael Reitzig, reitzig[at]cs.uni-kl.de +2016/03/10, mike-lischke, Mike Lischke, mike@lischke-online.de +2016/03/27, 
beardlybread, Bradley Steinbacher, bradley.j.steinbacher@gmail.com +2016/03/29, msteiger, Martin Steiger, antlr@martin-steiger.de +2016/03/28, gagern, Martin von Gagern, gagern@ma.tum.de diff --git a/doc/IDEs.md b/doc/IDEs.md new file mode 100644 index 000000000..c36d68f10 --- /dev/null +++ b/doc/IDEs.md @@ -0,0 +1,8 @@ +# Integrating ANTLR into Development Systems + +The Java target is the reference implementation mirrored by other targets. The following pages help you integrate ANTLR into development environments and build systems appropriate for your target language. As of January 2015, we have Java, C#, Python 2, Python 3, and JavaScript targets. + +The easiest thing is probably just to use an [ANTLR plug-in](http://www.antlr.org/tools.html) for your favorite development environment. + +Java IDE Integration +C# IDE Integration diff --git a/doc/ace-javascript-target.md b/doc/ace-javascript-target.md new file mode 100644 index 000000000..fbba9e9b9 --- /dev/null +++ b/doc/ace-javascript-target.md @@ -0,0 +1,247 @@ +# Integrating ANTLR JavaScript parsers with ACE editor + +Having the ability to parse code other than JavaScript is great, but nowadays users expect to be able to edit code with nice edit features such as keyword highlighting, indentation and brace matching, and advanced ones such as syntax checking. + +I have been through the process of integrating an ANTLR parser with ACE, the dominant code editor for web based code editing. Information about ACE can be found on their web site. + +This page describes my experience, and humbly aims to help you get started. It is not however a reference guide, and no support is provided. + +## Architecture + +The ACE editor is organized as follows + +1. The editor itself is a
`<div>` which once initialized comprises a number of elements. This UI element is responsible for the display, and the generation of edit events. +1. The editor relies on a Session, which manages events and configuration. +1. The code itself is stored in a Document. Any insertion or deletion of text is reflected in the Document. +1. Keyword highlighting, indentation and brace matching are delegated to a mode. There is no direct equivalent of an ACE mode in ANTLR. While keywords are the equivalent of ANTLR lexer tokens, indentation and brace matching are edit tasks, not parsing ones. A given ACE editor can only have one mode, which corresponds to the language being edited. There is no need for ANTLR integration to support keyword highlighting, indentation and brace matching. +1. Syntax checking is delegated to a worker. This is where ANTLR integration is needed. If syntax checking is enabled, ACE asks the mode to create a worker. In JavaScript, workers run in complete isolation i.e. they don't share code or variables with other workers, or with the HTML page itself. +1. The below diagram describes how the whole system works. In green are the components *you* need to provide. You'll notice that there is no need to load ANTLR in the HTML page itself. You'll also notice that ACE maintains a document in each thread. This is done through low level events sent by the ACE session to the worker which describe the delta. Once applied to the worker document, a high level event is triggered, which is easy to handle since at this point the worker document is a perfect copy of the UI document. + + + +## Step-by-step guide + +The first thing to do is to create an editor in your html page. This is thoroughly described in the ACE documentation, so we'll just sum it up here with a minimal setup (the script path is illustrative): + +```xml +<script src="../js/ace/ace.js" type="text/javascript" charset="utf-8"></script> +<div id="editor"></div> +<script> + var editor = ace.edit("editor"); +</script> +``` + +This should give you a working editor. You may want to control its sizing using CSS. I personally load the editor in an iframe and set its style to position: absolute, top: 0, left: 0 etc... 
but I'm sure you know better than me how to achieve results. + +The second thing to do is to configure the ACE editor to use your mode i.e. language configuration. A good place to start is to inherit from the built-in TextMode. The following is a very simple example, which only caters for comments, literals, and a limited subset of separators and keywords : + +```javascript +ace.define('ace/mode/my-mode',["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/text_highlight_rules", "ace/worker/worker_client" ], function(require, exports, module) { + var oop = require("ace/lib/oop"); + var TextMode = require("ace/mode/text").Mode; + var TextHighlightRules = require("ace/mode/text_highlight_rules").TextHighlightRules; + + var MyHighlightRules = function() { + var keywordMapper = this.createKeywordMapper({ + "keyword.control": "if|then|else", + "keyword.operator": "and|or|not", + "keyword.other": "class", + "storage.type": "int|float|text", + "storage.modifier": "private|public", + "support.function": "print|sort", + "constant.language": "true|false" + }, "identifier"); + this.$rules = { + "start": [ + { token : "comment", regex : "//" }, + { token : "string", regex : '["](?:(?:\\\\.)|(?:[^"\\\\]))*?["]' }, + { token : "constant.numeric", regex : "0[xX][0-9a-fA-F]+\\b" }, + { token : "constant.numeric", regex: "[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b" }, + { token : "keyword.operator", regex : "!|%|\\\\|/|\\*|\\-|\\+|~=|==|<>|!=|<=|>=|=|<|>|&&|\\|\\|" }, + { token : "punctuation.operator", regex : "\\?|\\:|\\,|\\;|\\." 
}, + { token : "paren.lparen", regex : "[[({]" }, + { token : "paren.rparen", regex : "[\\])}]" }, + { token : "text", regex : "\\s+" }, + { token: keywordMapper, regex: "[a-zA-Z_$][a-zA-Z0-9_$]*\\b" } + ] + }; + }; + oop.inherits(MyHighlightRules, TextHighlightRules); + + var MyMode = function() { + this.HighlightRules = MyHighlightRules; + }; + oop.inherits(MyMode, TextMode); + + (function() { + + this.$id = "ace/mode/my-mode"; + + }).call(MyMode.prototype); + + exports.Mode = MyMode; +}); +``` + +Now if you store the above in a file called "my-mode.js", setting up the ACE editor to use it is straightforward (again, the script path is illustrative): + +```xml +<script src="../js/my-mode.js" type="text/javascript" charset="utf-8"></script> +<script> + editor.getSession().setMode("ace/mode/my-mode"); +</script> +``` + +At this point you should have a working editor, able to highlight keywords. You may wonder why you need to set the tokens when you have already done so in your ANTLR lexer grammar. First, ACE expects a classification (control, operator, type...) which does not exist in ANTLR. Second, there is no need for ANTLR to achieve this, since ACE comes with its own lexer. + +OK, now that we have a working editor, it's time to add syntax validation. This is where the worker comes into the picture. + +Creating the worker is the responsibility of the mode you provide. 
So you need to enhance it with something like the following: + +```javascript +var WorkerClient = require("ace/worker/worker_client").WorkerClient; +this.createWorker = function(session) { + this.$worker = new WorkerClient(["ace"], "ace/worker/my-worker", "MyWorker", "../js/my-worker.js"); + this.$worker.attachToDocument(session.getDocument()); + + this.$worker.on("errors", function(e) { + session.setAnnotations(e.data); + }); + + this.$worker.on("annotate", function(e) { + session.setAnnotations(e.data); + }); + + this.$worker.on("terminate", function() { + session.clearAnnotations(); + }); + + return this.$worker; + +}; +``` + +The above code needs to be placed in the existing mode, after: + +```javascript +this.$id = "ace/mode/my-mode"; +``` + +Please note that the mode code runs on the UI side, not the worker side. The event handlers here are for events sent by the worker, not to the worker. + +Obviously the above won't work out of the box, because you need to provide the "my-worker.js" file. + +Creating a worker from scratch is not something I've tried. Simply put, your worker needs to handle all messages sent by ACE using the WorkerClient created by the mode. This is not a simple task, and is better delegated to existing ACE code, so we can focus on tasks specific to our language. + +What I did was start from "mode-json.js", a rather simple worker which comes with ACE, stripped all the JSON validation related stuff out of it, and saved the remaining code in a file named "worker-base.js", which you can find [here](resources/worker-base.js). 
Once this was done, I was able to create a simple worker, as follows: + +```javascript +importScripts("worker-base.js"); +ace.define('ace/worker/my-worker',["require","exports","module","ace/lib/oop","ace/worker/mirror"], function(require, exports, module) { + "use strict"; + + var oop = require("ace/lib/oop"); + var Mirror = require("ace/worker/mirror").Mirror; + + var MyWorker = function(sender) { + Mirror.call(this, sender); + this.setTimeout(200); + this.$dialect = null; + }; + + oop.inherits(MyWorker, Mirror); + + (function() { + + this.onUpdate = function() { + var value = this.doc.getValue(); + var annotations = validate(value); + this.sender.emit("annotate", annotations); + }; + + }).call(MyWorker.prototype); + + exports.MyWorker = MyWorker; +}); + +var validate = function(input) { + return [ { row: 0, column: 0, text: "MyMode says Hello!", type: "error" } ]; +}; +``` + +At this point, you should have an editor which displays an error icon next to the first line. When you hover over the error icon, it should display: MyMode says Hello!. Is that not a friendly worker? Yum. + +What remains to be done is have our validate function actually validate the input. Finally, ANTLR comes into the picture! + +To start with, let's load ANTLR and your parser, listener etc. Easy, since you could write: + +```js +var antlr4 = require('antlr4/index'); +``` + +This may work, but it's actually unreliable. The reason is that the require function used by ANTLR, which exactly mimics the NodeJS require function, uses a different syntax than the require function that comes with ACE. So we need to bring in a require function that conforms to the NodeJS syntax. I personally use one that comes from Torben Haase's Honey project, which you can find here. But hey, now we're going to have 2 'require' functions not compatible with each other! 
Indeed, this is why you need to take special care, as follows: + +```js +// load nodejs compatible require +var ace_require = require; +require = undefined; +var Honey = { 'requirePath': ['..'] }; // walk up to js folder, see Honey docs +importScripts("../lib/require.js"); +var antlr4_require = require; +require = ace_require; +``` +Now it's safe to load ANTLR, and the parsers generated for your language. Assuming that your language files (generated or hand-built) are in a folder with an index.js file that calls require for each file, your parser loading code can be as simple as follows: +```js +// load antlr4 and myLanguage +var antlr4, mylanguage; +try { + require = antlr4_require; + antlr4 = require('antlr4/index'); + mylanguage = require('mylanguage/index'); +} finally { + require = ace_require; +} +``` +Please note the try-finally construct. ANTLR uses 'require' synchronously so it's perfectly safe to ignore the ACE 'require' while running ANTLR code. ACE itself does not guarantee synchronous execution, so you are much safer always switching 'require' back to 'ace_require'. +Now detecting deep syntax errors in your code is a task for your ANTLR listener or visitor or whatever piece of code you've delegated this to. We're not going to describe this here, since it would require some knowledge of your language. However, detecting grammar syntax errors is something ANTLR does beautifully (isn't that why you went for ANTLR in the first place?). So what we will illustrate here is how to report grammar syntax errors. I have no doubt that from there, you will be able to extend the validator to suit your specific needs. +Whenever ANTLR encounters an unexpected token, it fires an error. By default, the error is routed to an error listener which simply writes to the console. +What we need to do is replace this listener with our own listener, so we can route errors to the ACE editor. 
First, let's create such a listener: +```js +// class for gathering errors and posting them to ACE editor +var AnnotatingErrorListener = function(annotations) { + antlr4.error.ErrorListener.call(this); + this.annotations = annotations; + return this; +}; + +AnnotatingErrorListener.prototype = Object.create(antlr4.error.ErrorListener.prototype); +AnnotatingErrorListener.prototype.constructor = AnnotatingErrorListener; + +AnnotatingErrorListener.prototype.syntaxError = function(recognizer, offendingSymbol, line, column, msg, e) { + this.annotations.push({ + row: line - 1, + column: column, + text: msg, + type: "error" + }); +}; +``` +With this, all that remains to be done is plug the listener in when we parse the code. Here is how I do it: +```js +var validate = function(input) { + var stream = new antlr4.InputStream(input); + var lexer = new mylanguage.MyLexer(stream); + var tokens = new antlr4.CommonTokenStream(lexer); + var parser = new mylanguage.MyParser(tokens); + var annotations = []; + var listener = new AnnotatingErrorListener(annotations); + parser.removeErrorListeners(); + parser.addErrorListener(listener); + parser.parseMyRule(); + return annotations; +}; +``` +You know what? That's it! You now have an ACE editor that does syntax validation using ANTLR! I hope you find this useful, and simple enough to get started. +What I did not address here is packaging, not something I'm an expert at. The good news is that it makes development simple, since I don't have to run any compilation process. I just edit my code, reload my editor page, and check how it goes. +Now wait, hey! How do you debug this? Well, as usual, using Chrome, since neither Firefox nor Safari is able to debug worker code. What a shame... 
diff --git a/doc/actions.md b/doc/actions.md new file mode 100644 index 000000000..91b6de1e4 --- /dev/null +++ b/doc/actions.md @@ -0,0 +1,204 @@ +# Actions and Attributes + +In Chapter 10, Attributes and Actions, we learned how to embed actions within grammars and looked at the most common token and rule attributes. This section summarizes the important syntax and semantics from that chapter and provides a complete list of all available attributes. (You can learn more about actions in the grammar from the free excerpt on listeners and actions.) + +Actions are blocks of text written in the target language and enclosed in curly braces. The recognizer triggers them according to their locations within the grammar. For example, the following rule emits "found a decl" after the parser has seen a valid declaration: + +``` +decl: type ID ';' {System.out.println("found a decl");} ; +type: 'int' | 'float' ; +``` + +Most often, actions access the attributes of tokens and rule references: + +``` +decl: type ID ';' + {System.out.println("var "+$ID.text+":"+$type.text+";");} + | t=ID id=ID ';' + {System.out.println("var "+$id.text+":"+$t.text+";");} + ; +``` + +## Token Attributes + +All tokens have a collection of predefined, read-only attributes. The attributes include useful token properties such as the token type and text matched for a token. Actions can access these attributes via $ label.attribute where label labels a particular instance of a token reference (a and b in the example below are used in the action code as $a and $b). Often, a particular token is only referenced once in the rule, in which case the token name itself can be used unambiguously in the action code (token INT can be used as $INT in the action). The following example illustrates token attribute expression syntax: + +``` +r : INT {int x = $INT.line;} + ( ID {if ($INT.line == $ID.line) ...;} )? 
+ a=FLOAT b=FLOAT {if ($a.line == $b.line) ...;} + ; +``` + +The action within the `(...)?` subrule can see the `INT` token matched before it in the outer level. + +Because there are two references to the `FLOAT` token, a reference to `$FLOAT` in an action is not unique; you must use labels to specify which token reference you’re interested in. + +Token references within different alternatives are unique because only one of them can be matched for any invocation of the rule. For example, in the following rule, actions in both alternatives can reference $ID directly without using a label: + +``` + r : ... ID {System.out.println($ID.text);} + | ... ID {System.out.println($ID.text);} + ; +``` + +To access the tokens matched for literals, you must use a label: + +``` + stat: r='return' expr ';' {System.out.println("line="+$r.line);} ; +``` + +Most of the time you access the attributes of the token, but sometimes it is useful to access the Token object itself because it aggregates all the attributes. Further, you can use it to test whether an optional subrule matched a token: + +``` + stat: 'if' expr 'then' stat (el='else' stat)? + {if ( $el!=null ) System.out.println("found an else");} + | ... + ; +``` + +`$T` and `$L` evaluate to `Token` objects for token name `T` and token label `L`. `$ll` evaluates to `List` for list label `ll`. `$T.attr` evaluates to the type and value specified in the following table for attribute `attr`: + + +|Attribute|Type|Description| +|---------|----|-----------| +|text|String|The text matched for the token; translates to a call to getText. Example: $ID.text.| +|type|int|The token type (nonzero positive integer) of the token such as INT; translates to a call to getType. Example: $ID.type.| +|line|int|The line number on which the token occurs, counting from 1; translates to a call to getLine. 
Example: $ID.line.| +|pos|int|The character position within the line at which the token’s first character occurs, counting from zero; translates to a call to getCharPositionInLine. Example: $ID.pos.| +|index|int|The overall index of this token in the token stream, counting from zero; translates to a call to getTokenIndex. Example: $ID.index.| +|channel|int|The token’s channel number. The parser tunes to only one channel, effectively ignoring off-channel tokens. The default channel is 0 (Token.DEFAULT_CHANNEL), and the default hidden channel is Token.HIDDEN_CHANNEL. Translates to a call to getChannel. Example: $ID.channel.| +|int|int|The integer value of the text held by this token; it assumes that the text is a valid numeric string. Handy for building calculators and so on. Translates to Integer.valueOf(text-of-token). Example: $INT.int.| + +## Parser Rule Attributes + +ANTLR predefines a number of read-only attributes associated with parser rule references that are available to actions. Actions can access rule attributes only for references that precede the action. The syntax is $r.attr for rule name r or a label assigned to a rule reference. For example, $expr.text returns the complete text matched by a preceding invocation of rule expr: + +``` +returnStat : 'return' expr {System.out.println("matched "+$expr.text);} ; +``` + +Using a rule label looks like this: + +``` +returnStat : 'return' e=expr {System.out.println("matched "+$e.text);} ; +``` + +You can also use `$` followed by the name of the attribute to access the value associated with the currently executing rule. For example, `$start` is the starting token of the current rule. + +``` +returnStat : 'return' expr {System.out.println("first token "+$start.getText());} ; +``` + +`$r` and `$rl` evaluate to `ParserRuleContext` objects of type `RContext` for rule name `r` and rule label `rl`. `$rll` evaluates to `List` for rule list label `rll`. 
`$r.attr` evaluates to the type and value specified in the following table for attribute `attr`: + +|Attribute|Type|Description| +|---------|----|-----------| +|text|String|The text matched for a rule or the text matched from the start of the rule up until the point of the `$text` expression evaluation. Note that this includes the text for all tokens including those on hidden channels, which is what you want because usually that has all the whitespace and comments. When referring to the current rule, this attribute is available in any action including any exception actions.| +|start|Token|The first token to be potentially matched by the rule that is on the main token channel; in other words, this attribute is never a hidden token. For rules that end up matching no tokens, this attribute points at the first token that could have been matched by this rule. When referring to the current rule, this attribute is available to any action within the rule.| +|stop|Token|The last nonhidden channel token to be matched by the rule. When referring to the current rule, this attribute is available only to the after and finally actions.| +|ctx|ParserRuleContext|The rule context object associated with a rule invocation. All of the other attributes are available through this attribute. For example, `$ctx.start` accesses the start field within the current rules context object. It’s the same as `$start`.| + +## Dynamically-Scoped Attributes + +You can pass information to and from rules using parameters and return values, just like functions in a general-purpose programming language. Programming languages don’t allow functions to access the local variables or parameters of invoking functions, however. 
For example, the following reference to local variable x from a nested method call is illegal in Java: + +```java +void f() { + int x = 0; + g(); +} +void g() { + h(); +} +void h() { + int y = x; // INVALID reference to f's local variable x +} +``` + +Variable x is available only within the scope of f, which is the text lexically delimited by curly brackets. For this reason, Java is said to use lexical scoping. Lexical scoping is the norm for most programming languages. Languages that allow methods further down in the call chain to access local variables defined earlier are said to use dynamic scoping. The term dynamic refers to the fact that a compiler cannot statically determine the set of visible variables. This is because the set of variables visible to a method changes depending on who calls that method. + +It turns out that, in the grammar realm, distant rules sometimes need to communicate with each other, mostly to provide context information to rules matched below in the rule invocation chain. (Naturally, this assumes that you are using actions directly in the grammar instead of the parse-tree listener event mechanism.) ANTLR allows dynamic scoping in that actions can access attributes from invoking rules using syntax `$r::x` where `r` is a rule name and `x` is an attribute within that rule. It is up to the programmer to ensure that `r` is in fact an invoking rule of the current rule. A runtime exception occurs if `r` is not in the current call chain when you access `$r::x`. + +To illustrate the use of dynamic scoping, consider the real problem of defining variables and ensuring that variables in expressions are defined. The following grammar defines the `symbols` attribute where it belongs in the `block` rule but adds variable names to it in rule `decl`. Rule `stat` then consults the list to see whether variables have been defined. 
+ +``` +grammar DynScope; + +prog: block ; + +block + /* List of symbols defined within this block */ + locals [ + List<String> symbols = new ArrayList<String>() + ] + : '{' decl* stat+ '}' + // print out all symbols found in block + // $block::symbols evaluates to a List<String> as defined in scope + {System.out.println("symbols="+$symbols);} + ; + +/** Match a declaration and add identifier name to list of symbols */ +decl: 'int' ID {$block::symbols.add($ID.text);} ';' ; + +/** Match an assignment then test list of symbols to verify + * that it contains the variable on the left side of the assignment. + * Method contains() is List.contains() because $block::symbols + * is a List<String>. + */ +stat: ID '=' INT ';' + { + if ( !$block::symbols.contains($ID.text) ) { + System.err.println("undefined variable: "+$ID.text); + } + } + | block + ; + +ID : [a-z]+ ; +INT : [0-9]+ ; +WS : [ \t\r\n]+ -> skip ; +``` + +Here’s a simple build and test sequence: + +```bash +$ antlr4 DynScope.g4 +$ javac DynScope*.java +$ grun DynScope prog +=> { +=> int i; +=> i = 0; +=> j = 3; +=> } +=> EOF +<= undefined variable: j + symbols=[i] +``` + +There’s an important difference between a simple field declaration in a `@members` action and dynamic scoping. `symbols` is a local variable and so there is a copy for each invocation of rule `block`. That’s exactly what we want for nested blocks so that we can reuse the same input variable name in an inner block. For example, the following nested code block redefines `i` in the inner scope. This new definition must hide the definition in the outer scope. + +``` +{ + int i; + int j; + i = 0; + { + int i; + int x; + x = 5; + } + x = 3; +} +``` + +Here’s the output generated for that input by DynScope: + +```bash +$ grun DynScope prog nested-input +symbols=[i, x] +undefined variable: x +symbols=[i, j] +``` + +Referencing `$block::symbols` accesses the `symbols` field of the most recently invoked `block`’s rule context object. 
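The parent-chain walk behind that lookup can be pictured with a minimal stand-in for context objects, plain Java with no ANTLR runtime; the `Ctx` class and its fields are illustrative only, mimicking how each generated rule context keeps a reference to its parent:

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for a rule context: each context knows its parent.
class Ctx {
    final Ctx parent;
    final String rule;
    final List<String> symbols; // plays the role of block's "symbols" local

    Ctx(Ctx parent, String rule, List<String> symbols) {
        this.parent = parent;
        this.rule = rule;
        this.symbols = symbols;
    }

    Ctx getParent() { return parent; }
}

public class WalkDemo {
    public static void main(String[] args) {
        // Invocation chain for the nested example: prog -> block -> block -> stat
        Ctx prog  = new Ctx(null,  "prog",  null);
        Ctx outer = new Ctx(prog,  "block", Arrays.asList("i", "j"));
        Ctx inner = new Ctx(outer, "block", Arrays.asList("i", "x"));
        Ctx stat  = new Ctx(inner, "stat",  null);

        // Walk backwards from the current context; the first "block" hit is
        // what $block::symbols resolves to, but the walk can keep going.
        for (Ctx p = stat.getParent(); p != null; p = p.getParent()) {
            if ("block".equals(p.rule)) {
                System.out.println("symbols=" + p.symbols);
            }
        }
    }
}
```

The first line printed corresponds to the innermost enclosing block, matching what `$block::symbols` would return; subsequent lines come from blocks farther up the chain.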
If you need access to a symbols instance from a rule invocation farther up the call chain, you can walk backwards starting at the current context, `$ctx`. Use `getParent` to walk up the chain. diff --git a/doc/adding-tests.md b/doc/adding-tests.md new file mode 100644 index 000000000..769383d31 --- /dev/null +++ b/doc/adding-tests.md @@ -0,0 +1,119 @@ +# Adding unit tests + +## Generating Runtime Tests + +Because ANTLR supports multiple target languages, the unit tests are broken into two groups: the unit tests that test the tool itself (in `tool-testsuite`) and the unit tests that test the parser runtimes (in antlr4/runtime-testsuite). To avoid a lot of cut-and-paste, we generate all **runtime** tests from a set of templates using [runtime-testsuite/src/org/antlr/v4/testgen/TestGenerator.java](../runtime-testsuite/src/org/antlr/v4/testgen/TestGenerator.java). The `mvn` command is simple to use: + +``` +$ cd ~/antlr/code/antlr4/runtime-testsuite +$ mvn -Pgen generate-test-sources +... +rootDir = /Users/parrt/antlr/code/antlr4/runtime-testsuite +outputDir = /Users/parrt/antlr/code/antlr4/runtime-testsuite/test +templates = /Users/parrt/antlr/code/antlr4/runtime-testsuite/resources/org/antlr/v4/test/runtime/templates +target = ALL +browsers = false +viz = false +``` + +It basically runs the Java program: + +```bash +$ java org.antlr.v4.testgen.TestGenerator \ + -root ~/antlr/code/antlr4/runtime-testsuite \ + -outdir ~/antlr/code/antlr4/runtime-testsuite/test \ + -templates ~/antlr/code/antlr4/runtime-testsuite/resources/org/antlr/v4/test/runtime/templates +``` + +## Adding a runtime test + +For each target, you will find an `Index.stg` file with a dictionary of all test groups. 
E.g., `runtime-testsuite/resources/org/antlr/v4/test/runtime/templates/Index.stg` looks like: + +``` +TestFolders ::= [ + "CompositeLexers": [], + "CompositeParsers": [], + "FullContextParsing": [], + "LeftRecursion": [], + "LexerErrors": [], + "LexerExec": [], + "Listeners": [], + "ParserErrors": [], + "ParserExec": [], + "ParseTrees": [], + "Performance": [], + "SemPredEvalLexer": [], + "SemPredEvalParser": [], + "Sets": [] +] +``` + +Then each group has a subdirectory with another index. E.g., `Sets/Index.stg` looks like: + +``` +TestTemplates ::= [ + "SeqDoesNotBecomeSet": [], + "ParserSet": [], + "ParserNotSet": [], + "ParserNotToken": [], + "ParserNotTokenWithLabel": [], + "RuleAsSet": [], + "NotChar": [], + "OptionalSingleElement": [], +... +``` + +For every name mentioned, you will find a `.stg` file with the actual test. E.g., `Sets/StarSet.stg`: + +``` +TestType() ::= "Parser" + +Options ::= [ + "Debug": false +] + +Grammar ::= [ + "T": {<grammar("T")>} +] + +Input() ::= "abaac" + +Rule() ::= "a" + +Output() ::= << +abaac<\n> +>> + +Errors() ::= "" + +grammar(grammarName) ::= << +grammar <grammarName>; +a : ('a'|'b')* 'c' {<writeln("$text")>} ; +>> +``` + +### Cross-language actions embedded within grammars + +To get: + +``` +System.out.println($set.stop); +``` + +Use instead the language-neutral: + +``` +<writeln("$set.stop")> +``` + +File `runtime-testsuite/resources/org/antlr/v4/test/runtime/java/Java.test.stg` has templates like: + +``` +writeln(s) ::= <<System.out.println(<s>);>> +``` + +## Adding an ANTLR tool unit test + +Just go into the appropriate Java test class in dir `antlr4/tool-testsuite/test/org/antlr/v4/test/tool` and add your unit test. + + diff --git a/doc/building-antlr.md b/doc/building-antlr.md new file mode 100644 index 000000000..f6780d970 --- /dev/null +++ b/doc/building-antlr.md @@ -0,0 +1,125 @@ +# Building ANTLR + +Most programmers do not need the information on this page because they will simply download the appropriate jar(s) or use ANTLR through maven (via ANTLR's antlr4-maven-plugin). 
If you would like to fork the project and fix bugs or tweak the runtime code generation, then you will almost certainly need to build ANTLR itself. There are two components: + + 1. the tool that compiles grammars down into parsers and lexers in one of the target languages + 1. the runtime used by those generated parsers and lexers. + +I will assume that the root directory is `/tmp` for the purposes of explaining how to build ANTLR in this document. + +# Get the source + +The first step is to get the Java source code from the ANTLR 4 repository at github. You can download the repository from github, but the easiest thing to do is simply clone the repository on your local disk: + +```bash +$ cd /tmp +/tmp $ git clone git@github.com:antlr/antlr4.git +Cloning into 'antlr4'... +remote: Counting objects: 43273, done. +remote: Compressing objects: 100% (57/57), done. +remote: Total 43273 (delta 26), reused 0 (delta 0) +Receiving objects: 100% (43273/43273), 18.76 MiB | 1.60 MiB/s, done. +Resolving deltas: 100% (22419/22419), done. +Checking connectivity... done. +``` + +# Compile + +```bash +$ cd /tmp +$ git clone git@github.com:antlr/antlr4.git +Cloning into 'antlr4'... +remote: Counting objects: 59858, done. +remote: Compressing objects: 100% (57/57), done. +remote: Total 59858 (delta 28), reused 9 (delta 9), pack-reused 59786 +Receiving objects: 100% (59858/59858), 31.10 MiB | 819.00 KiB/s, done. +Resolving deltas: 100% (31898/31898), done. +Checking connectivity... done. +$ cd antlr4 +$ mvn compile +.. +[INFO] Reactor Summary: +[INFO] +[INFO] ANTLR 4 ............................................ SUCCESS [ 0.447 s] +[INFO] ANTLR 4 Runtime .................................... SUCCESS [ 3.113 s] +[INFO] ANTLR 4 Tool ....................................... SUCCESS [ 14.408 s] +[INFO] ANTLR 4 Maven plugin ............................... SUCCESS [ 1.276 s] +[INFO] ANTLR 4 Runtime Test Generator ..................... 
SUCCESS [ 0.773 s] +[INFO] ANTLR 4 Tool Tests ................................. SUCCESS [ 6.920 s] +[INFO] ------------------------------------------------------------------------ +[INFO] BUILD SUCCESS +... +``` + +# Testing tool and targets + +In order to perform the tests on all target languages, make sure that you have `mono` and `nodejs` installed. For example, on OS X: + +```bash +$ brew install mono +$ brew install node +``` + +To run the tests and **install into local repository** `~/.m2/repository/org/antlr`, do this: + +```bash +$ mvn install +... +------------------------------------------------------- + T E S T S +------------------------------------------------------- +Running org.antlr.v4.test.runtime.csharp.TestCompositeLexers +dir /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeLexers-1446068612451 +Starting build /usr/bin/xbuild /p:Configuration=Release /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeLexers-1446068612451/Antlr4.Test.mono.csproj +dir /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeLexers-1446068615081 +Starting build /usr/bin/xbuild /p:Configuration=Release /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeLexers-1446068615081/Antlr4.Test.mono.csproj +Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.451 sec +Running org.antlr.v4.test.runtime.csharp.TestCompositeParsers +dir /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeParsers-1446068615864 +antlr reports warnings from [-visitor, -Dlanguage=CSharp, -o, /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeParsers-1446068615864, -lib, /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeParsers-1446068615864, -encoding, UTF-8, /var/folders/s1/h3qgww1x0ks3pb30l8t1wgd80000gn/T/TestCompositeParsers-1446068615864/M.g4] +... 
+[INFO] ------------------------------------------------------------------------ +[INFO] Reactor Summary: +[INFO] +[INFO] ANTLR 4 ............................................ SUCCESS [  0.462 s] +[INFO] ANTLR 4 Runtime .................................... SUCCESS [  9.163 s] +[INFO] ANTLR 4 Tool ....................................... SUCCESS [  3.683 s] +[INFO] ANTLR 4 Maven plugin ............................... SUCCESS [  1.897 s] +[INFO] ANTLR 4 Runtime Test Generator ..................... SUCCESS [07:11 min] +[INFO] ANTLR 4 Tool Tests ................................. SUCCESS [ 16.694 s] +[INFO] ------------------------------------------------------------------------ +[INFO] BUILD SUCCESS +[INFO] ------------------------------------------------------------------------ +[INFO] Total time: 07:43 min +... +``` + +You should see these jars (building 4.5.2-SNAPSHOT): + +```bash +/Users/parrt/.m2/repository/org/antlr $ find antlr4* -name '*.jar' +antlr4/4.5/antlr4-4.5.jar +antlr4/4.5.2-SNAPSHOT/antlr4-4.5.2-SNAPSHOT-tests.jar +antlr4/4.5.2-SNAPSHOT/antlr4-4.5.2-SNAPSHOT.jar +antlr4-maven-plugin/4.5/antlr4-maven-plugin-4.5.jar +antlr4-maven-plugin/4.5.2-SNAPSHOT/antlr4-maven-plugin-4.5.2-SNAPSHOT.jar +antlr4-runtime/4.5/antlr4-runtime-4.5.jar +antlr4-runtime/4.5.2-SNAPSHOT/antlr4-runtime-4.5.2-SNAPSHOT.jar +antlr4-runtime-testsuite/4.5.2-SNAPSHOT/antlr4-runtime-testsuite-4.5.2-SNAPSHOT-tests.jar +antlr4-runtime-testsuite/4.5.2-SNAPSHOT/antlr4-runtime-testsuite-4.5.2-SNAPSHOT.jar +antlr4-tool-testsuite/4.5.2-SNAPSHOT/antlr4-tool-testsuite-4.5.2-SNAPSHOT.jar +``` + +Note that ANTLR is written in itself, which is why maven downloads antlr4-4.5.jar for bootstrapping 4.5.2-SNAPSHOT purposes. + +To build without running the tests (saves about 8 minutes), do this: + +```bash +mvn -DskipTests install +``` + +## Building ANTLR in Intellij IDE + +After downloading the ANTLR source, just "import project from existing sources" and click on the "Maven Projects" tab in the right gutter of the IDE.
It should build stuff in the background automatically. + + \ No newline at end of file diff --git a/doc/creating-a-language-target.md b/doc/creating-a-language-target.md new file mode 100644 index 000000000..fe6150cb0 --- /dev/null +++ b/doc/creating-a-language-target.md @@ -0,0 +1,22 @@ +# Creating an ANTLR Language Target + +This document describes how to make ANTLR generate parsers in a new language, *X*. + +## Overview + +Creating a new target involves the following key elements: + +1. For the tool, create class *X*Target as a subclass of class `Target` in package `org.antlr.v4.codegen.target`. This class describes language-specific details about escape characters and strings and so on. There is typically very little to do here. +1. Create *X*.stg in directory tool/resources/org/antlr/v4/tool/templates/codegen/*X*/*X*.stg. This is a [StringTemplate](http://www.stringtemplate.org/) group file (`.stg`) that tells ANTLR how to express all of the parsing elements needed to generate code. You will see templates called `ParserFile`, `Parser`, `Lexer`, `CodeBlockForAlt`, `AltBlock`, etc... Each of these describes how to build the indicated chunk of code. Your best bet is to find the closest existing target, copy that template file, and tweak it to suit. +1. Create a runtime library to support the parsers generated by ANTLR. Under directory runtime/*X*, you are in complete control of the directory structure as dictated by common usage of that target language. For example, Java has: `runtime/Java/lib` and `runtime/Java/src` directories. Under `src`, you will find a directory structure for package `org.antlr.v4.runtime` and below. +1. Create a template file for runtime tests. All you have to do is provide a few simple templates that indicate how to print values and declare variables. Our runtime test mechanism in dir `runtime-testsuite` will automatically generate code in a new target and check the results.
All it needs to know is how to generate a test rig (i.e., a `main` program), how to define various class fields, compare members and so on. You must create a *X* directory underneath `runtime-testsuite/resources/org/antlr/v4/test/runtime`. Again, your best bet is to copy the templates from the closest language to your target and tweak it to suit. + +## Getting started + +1. Fork the `antlr/antlr4` repository at github to your own user so that you have repository `username/antlr4`. +2. Clone `username/antlr4`, forked repository, to your local disk. Your remote `origin` will be the forked repository on GitHub. Add a remote `upstream` to the original `antlr/antlr4` repository (URL `https://github.com/antlr/antlr4.git`). Changes that you would like to contribute back to the project are done with [pull requests](https://help.github.com/articles/using-pull-requests/). +3. Try to build it before doing anything +```bash +$ mvn compile +``` +That should proceed with success. See [Building ANTLR](building-antlr.md) for more details. (That link does not currently work as I have that documentation in a branch. see https://github.com/parrt/antlr4/blob/move-doc-to-repo/doc/building-antlr.md for now.) diff --git a/doc/csharp-target.md b/doc/csharp-target.md new file mode 100644 index 000000000..05c2df7c1 --- /dev/null +++ b/doc/csharp-target.md @@ -0,0 +1,99 @@ +# C♯ + +See also [Sam Harwell's Alternative C# target](https://github.com/tunnelvisionlabs/antlr4cs) + +### Which frameworks are supported? + +The C# runtime is CLS compliant, and only requires a corresponding 3.5 .Net framework. + +In practice, the runtime has been extensively tested against: + +* Microsoft .Net 3.5 framework +* Mono .Net 3.5 framework + +No issue was found, so you should find that the runtime works pretty much against any recent .Net framework. + +### How do I get started? + +You will find full instructions on the Git web page for ANTLR C# runtime. + +### How do I use the runtime from my project? 
+ +(i.e., How do I run the generated lexer and/or parser?) + +Let's suppose that your grammar is named, as above, "MyGrammar". + +Let's suppose this parser comprises a rule named "StartRule". + +The tool will have generated for you the following files: + +* MyGrammarLexer.cs +* MyGrammarParser.cs +* MyGrammarListener.cs (if you have not activated the -no-listener option) +* MyGrammarBaseListener.cs (if you have not activated the -no-listener option) +* MyGrammarVisitor.cs (if you have activated the -visitor option) +* MyGrammarBaseVisitor.cs (if you have activated the -visitor option) + +A fully functioning program might look like the following: + +``` +using Antlr4.Runtime; + +public void MyParseMethod() { +      string input = "your text to parse here"; +      AntlrInputStream stream = new AntlrInputStream(input); +      ITokenSource lexer = new MyGrammarLexer(stream); +      ITokenStream tokens = new CommonTokenStream(lexer); +      MyGrammarParser parser = new MyGrammarParser(tokens); +      parser.BuildParseTree = true; +      IParseTree tree = parser.StartRule(); +} +``` + +This program will work. But it won't be useful unless you do one of the following: + +* you visit the parse tree using a custom listener +* you visit the parse tree using a custom visitor +* your grammar comprises production code (like ANTLR3) + +(please note that production code is target specific, so you can't have multi-target grammars that include production code) + +### How do I create and run a custom listener? + +Let's suppose your MyGrammar grammar comprises 2 rules: "key" and "value".
+ +The antlr4 tool will have generated the following listener (only partial code shown here): + +``` +interface IMyGrammarParserListener : IParseTreeListener { +      void EnterKey (MyGrammarParser.KeyContext context); +      void ExitKey (MyGrammarParser.KeyContext context); +      void EnterValue (MyGrammarParser.ValueContext context); +      void ExitValue (MyGrammarParser.ValueContext context); +} +``` + +In order to provide custom behavior, you might want to create the following class: + +``` +class KeyPrinter : MyGrammarBaseListener { +      // override default listener behavior +      public override void ExitKey (MyGrammarParser.KeyContext context) { +            Console.WriteLine("Oh, a key!"); +      } +} +``` + +In order to execute this listener, you would simply add the following lines to the above code: + +``` +... +IParseTree tree = parser.StartRule(); // repeated here only for reference +KeyPrinter printer = new KeyPrinter(); +ParseTreeWalker.Default.Walk(printer, tree); +``` + +Further information can be found in *The Definitive ANTLR 4 Reference* book. + +The C# implementation of ANTLR is as close as possible to the Java one, so you shouldn't find it difficult to adapt the examples for C#. diff --git a/doc/faq/actions-preds.md b/doc/faq/actions-preds.md new file mode 100644 index 000000000..46fe9759c --- /dev/null +++ b/doc/faq/actions-preds.md @@ -0,0 +1,11 @@ +# Actions and semantic predicates + +## How do I test if an optional rule was matched? + +For optional rule references such as the initialization clause in the following + +``` +decl : 'var' ID (EQUALS expr)? ; +``` + +you can test whether that clause was matched using `$EQUALS!=null` or `$expr.ctx!=null`, where `$expr.ctx` points to the context or parse tree created for that reference to rule expr.
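As a sketch, the same test can also drive an embedded action directly in the grammar (Java target assumed; the printed message is illustrative):

```
decl : 'var' ID (EQUALS expr)?
       {if ($EQUALS != null) System.out.println("initializer for " + $ID.text);}
     ;
```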
\ No newline at end of file diff --git a/doc/faq/error-handling.md b/doc/faq/error-handling.md new file mode 100644 index 000000000..ebc329909 --- /dev/null +++ b/doc/faq/error-handling.md @@ -0,0 +1,5 @@ +# Error handling + +## How do I perform semantic checking with ANTLR? + +See [How to implement error handling in ANTLR4](http://stackoverflow.com/questions/21613421/how-to-implement-error-handling-in-antlr4/21615751#21615751). diff --git a/doc/faq/general.md b/doc/faq/general.md new file mode 100644 index 000000000..fb9c386f5 --- /dev/null +++ b/doc/faq/general.md @@ -0,0 +1,100 @@ +# General + +## Why do we need ANTLR v4? + +*Oliver Zeigermann asked me some questions about v4. Here is our conversation.* + +*See the [preface from the book](http://media.pragprog.com/titles/tpantlr2/preface.pdf)* + +**Q: Why is the new version of ANTLR also called “honey badger”?** + +ANTLR v4 is called the honey badger release after the fearless hero of the YouTube sensation, The Crazy Nastyass Honey Badger. + +**Q: Why did you create a new version of ANTLR?** + +Well, I started creating a new version because v3 had gotten very messy on the inside and also relied on grammars written in ANTLR v2. Unfortunately, v2's open-source license was unclear and so projects such as Eclipse could not include v3 because of its dependency on v2. In the end, Sam Harwell converted all of the v2 grammars into v3 so that v3 was written in itself. Because v3 has a very clean BSD license, the Eclipse project okayed v3 for inclusion in the summer of 2011. + +As I was rewriting ANTLR, I wanted to experiment with a new variation of the LL(\*) parsing algorithm. As luck would have it, I came up with a cool new version called adaptive LL(\*) that pushes all of the grammar analysis effort to runtime. The parser warms up like Java does with its JIT on-the-fly compiler; the code gets faster and faster the longer it runs.
The benefit is that the adaptive algorithm is much stronger than the static LL(\*) grammar analysis algorithm in v3. Honey Badger takes any grammar that you give it; it just doesn't give a damn. (v4 accepts even left recursive grammars, except for indirectly left recursive grammars where x calls y which calls x). + +v4 is the culmination of 25 years of research into parsers and parser generators. I think I finally know what I want to build. :) + +**Q: What makes you excited about ANTLR4?** + +The biggest thing is the new adaptive parsing strategy, which lets us accept any grammar we care to write. That gives us a huge productivity boost because we can now write much more natural expression rules (which occur in almost every grammar). For example, bottom-up parser generators such as yacc let you write very natural grammars like this: + +``` +e : e '*' e + | e '+' e + | INT + ; +``` + +ANTLR v4 will also take that grammar now, translating it secretly to a non-left recursive version. + +Another big thing with v4 is that my goal has shifted from performance to ease-of-use. For example, ANTLR automatically can build parse trees for you and generate listeners and visitors. This is not only a huge productivity win, but also an important step forward in building grammars that don't depend on embedded actions. Those embedded actions (raw Java code or whatever) locked the grammar into use with only one language. If we keep all of the actions out of the grammar and put them into external visitors, we can reuse the same grammar to generate code in any language for which we have an ANTLR target. + +**Q: What do you think are the things people had problems with in ANTLR3?** + +The biggest problem was figuring out why ANTLR did not like their grammar. The static analysis often could not figure out how to generate a parser for the grammar. This problem totally goes away with the honey badger because it will take just about anything you give it without a whimper. 
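Returning to the expression example above: a complete v4 grammar built around that left-recursive rule is accepted as-is (a minimal sketch; the lexer rules here are illustrative):

```
grammar Expr;
e   : e '*' e
    | e '+' e
    | INT
    ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

v4 translates the direct left recursion away internally, so no grammar rewriting is needed on your part.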
+ +**Q: And what about other compiler generator tools?** + +The biggest problem for the average practitioner is that most parser generators do not produce code you can load into a debugger and step through. This immediately removes bottom-up parser generators and the really powerful GLR parser generators from consideration by the average programmer. There are a few other tools that generate source code like ANTLR does, but they don't have v4's adaptive LL(\*) parsers. You will be stuck contorting your grammar to fit the needs of the tool's weaker, say, LL(k) parsing strategy. PEG-based tools have a number of weaknesses, but to mention one, they have essentially no error recovery because they cannot report an error until they have parsed the entire input. + +**Q: What are the main design decisions in ANTLR4?** + +Ease-of-use over performance. I will worry about performance later. Simplicity over complexity. For example, I have taken out explicit/manual AST construction facilities and the tree grammar facilities. For 20 years I've been trying to get people to go that direction, but I've since decided that it was a mistake. It's much better to give people a parser generator that can automatically build trees and then let them use pure code to do whatever tree walking they want. People are extremely familiar and comfortable with visitors, for example. + +**Q: What do you think people will like most about ANTLR4?** + +The lack of errors when you run your grammar through ANTLR. The automatic tree construction and listener/visitor generation. + +**Q: What do you think are the problems people will try to solve with ANTLR4?** + +In my experience, almost no one uses parser generators to build commercial compilers. So, people are using ANTLR for their everyday work, building everything from configuration files to little scripting languages.
+ +In response to a question about this entry from stackoverflow.com: I believe that compiler developers are very concerned with parsing speed, error reporting, and error recovery. For that, they want absolute control over their parser. Also, some languages are so complicated, such as C++, that parser generators might build parsers slower than compiler developers want. The compiler developers also like the control of a recursive-descent parser for predicating the parse to handle context-sensitive constructs such as `T(i)` in C++. + +There is also likely a sense that parsing is the easy part of building a compiler, so they don't automatically jump to parser generators. I think this is also a function of previous-generation parser generators. McPeak's Elkhound GLR-based parser generator is powerful enough and fast enough, in the hands of someone that knows what they're doing, to be suitable for compilers. I can also attest to the fact that ANTLR v4 is now powerful enough and fast enough to compete well with hand-built parsers. E.g., after warm-up, it's now taking just 1s to parse the entire JDK java/\* library. + +## What is the difference between ANTLR 3 and 4? + +The biggest difference between ANTLR 3 and 4 is that ANTLR 4 takes any grammar you give it unless the grammar has indirect left recursion. That means we don't need syntactic predicates or backtracking so ANTLR 4 does not support that syntax; you will get a warning for using it. ANTLR 4 allows direct left recursion so that expressing things like arithmetic expression syntax is very easy and natural: + +``` +expr : expr '*' expr +     | expr '+' expr +     | INT +     ; +``` + +ANTLR 4 automatically constructs parse trees for you and abstract syntax tree (AST) construction is no longer an option. See also What if I need ASTs not parse trees for a compiler, for example?
+ +Another big difference is that we discourage the use of actions directly within the grammar because ANTLR 4 automatically generates [listeners and visitors](https://raw.githubusercontent.com/antlr/antlr4/master/doc/listeners.md) for you that trigger method calls when phrases of interest are recognized during a tree walk after parsing. See also [Parse Tree Matching and XPath](https://raw.githubusercontent.com/antlr/antlr4/master/doc/tree-matching.md). + +Semantic predicates are still allowed in both the parser and lexer rules, as are actions. For efficiency's sake, keep semantic predicates to the right edge of lexical rules. + +There are no tree grammars because we use listeners and visitors instead. + +## Why is my expression parser slow? + +Make sure to use two-stage parsing. See example in [bug report](https://github.com/antlr/antlr4/issues/374). + +```java + +CharStream input = new ANTLRFileStream(args[0]); +ExprLexer lexer = new ExprLexer(input); +CommonTokenStream tokens = new CommonTokenStream(lexer); +ExprParser parser = new ExprParser(tokens); +parser.getInterpreter().setPredictionMode(PredictionMode.SLL); +try { +  parser.stat();  // STAGE 1 +} +catch (Exception ex) { +  tokens.seek(0); // rewind input stream +  parser.reset(); +  parser.getInterpreter().setPredictionMode(PredictionMode.LL); +  parser.stat(); // STAGE 2 +  // if we parse ok, it's LL not SLL +} +``` diff --git a/doc/faq/getting-started.md b/doc/faq/getting-started.md new file mode 100644 index 000000000..bd0d4ffae --- /dev/null +++ b/doc/faq/getting-started.md @@ -0,0 +1,11 @@ +# Getting started + +## How do I install and run a simple grammar? + +See [Getting Started with ANTLR v4](https://raw.githubusercontent.com/antlr/antlr4/master/doc/getting-started.md). + +## Why does my parser test program hang? + +Your test program is likely not hanging but simply waiting for you to type some input on standard input.
Don't forget that you need to type the end of file character, generally on a line by itself, at the end of the input. On a Mac or Linux machine it is ctrl-D, as gawd intended, or ctrl-Z on a Windows machine. + +See [Getting Started with ANTLR v4](https://raw.githubusercontent.com/antlr/antlr4/master/doc/getting-started.md). \ No newline at end of file diff --git a/doc/faq/index.md b/doc/faq/index.md new file mode 100644 index 000000000..734fc6c13 --- /dev/null +++ b/doc/faq/index.md @@ -0,0 +1,50 @@ +# Frequently-Asked Questions (FAQ) + +This is the main landing page for the ANTLR 4 FAQ. The links below will take you to the appropriate file containing all answers for that subcategory. + +*To add to or improve this FAQ, [fork](https://help.github.com/articles/fork-a-repo/) the [antlr/antlr4 repo](https://github.com/antlr/antlr4) then update this `doc/faq/index.md` or file(s) in that directory. Submit a [pull request](https://help.github.com/articles/creating-a-pull-request/) to get your changes incorporated into the main repository. 
Do not mix code and FAQ updates in the same pull request.* **You must sign the contributors.txt certificate of origin with your pull request if you've not done so before.** + +## Getting Started + +* [How do I install and run a simple grammar?](getting-started.md) +* [Why does my parser test program hang?](getting-started.md) + +## Installation + +* [Why can't ANTLR (grun) find my lexer or parser?](installation.md) +* [Why can't I run the ANTLR tool?](installation.md) +* [Why doesn't my parser compile?](installation.md) + +## General + +* [Why do we need ANTLR v4?](general.md) +* [What is the difference between ANTLR 3 and 4?](general.md) +* [Why is my expression parser slow?](general.md) + +## Grammar syntax + +## Lexical analysis + +* [How can I parse non-ASCII text and use characters in token rules?](lexical.md) +* [How do I replace escape characters in string tokens?](lexical.md) +* [Why are my keywords treated as identifiers?](lexical.md) +* [Why are there no whitespace tokens in the token stream?](lexical.md) + +## Parse Trees + +* [How do I get the input text for a parse-tree subtree?](parse-trees.md) +* [What if I need ASTs not parse trees for a compiler, for example?](parse-trees.md) +* [When do I use listener/visitor vs XPath vs Tree pattern matching?](parse-trees.md) + +## Translation + +* [ASTs vs parse trees](parse-trees.md) +* [Decoupling input walking from output generation](parse-trees.md) + +## Actions and semantic predicates + +* [How do I test if an optional rule was matched?](actions-preds.md) + +## Error handling + +* [How do I perform semantic checking with ANTLR?](error-handling.md) diff --git a/doc/faq/installation.md b/doc/faq/installation.md new file mode 100644 index 000000000..fb945dac8 --- /dev/null +++ b/doc/faq/installation.md @@ -0,0 +1,60 @@ +# Installation + +Please read carefully: [Getting Started with ANTLR v4](https://raw.githubusercontent.com/antlr/antlr4/master/doc/getting-started.md).
+ +## Why can't ANTLR (grun) find my lexer or parser? + +If you see "Can't load Hello as lexer or parser", it's because you don't have '.' (current directory) in your CLASSPATH. + +```bash +$ alias antlr4='java -jar /usr/local/lib/antlr-4.2.2-complete.jar' +$ alias grun='java org.antlr.v4.runtime.misc.TestRig' +$ export CLASSPATH="/usr/local/lib/antlr-4.2.2-complete.jar" +$ antlr4 Hello.g4 +$ javac Hello*.java +$ grun Hello r -tree +Can't load Hello as lexer or parser +$ +``` + +For mac/linux, use: + +```bash +export CLASSPATH=".:/usr/local/lib/antlr-4.2.2-complete.jar:$CLASSPATH" +``` + +or for Windows: + +``` +SET CLASSPATH=.;C:\Javalib\antlr4-complete.jar;%CLASSPATH% +``` + +**See the dot at the beginning?** It's critical. + +## Why can't I run the ANTLR tool? + +If you get a no class definition found error, you are missing the ANTLR jar in your `CLASSPATH` (or you might only have the runtime jar): + +```bash +/tmp $ java org.antlr.v4.Tool Hello.g4 +Exception in thread "main" java.lang.NoClassDefFoundError: org/antlr/v4/Tool +Caused by: java.lang.ClassNotFoundException: org.antlr.v4.Tool + at java.net.URLClassLoader$1.run(URLClassLoader.java:202) + at java.security.AccessController.doPrivileged(Native Method) + at java.net.URLClassLoader.findClass(URLClassLoader.java:190) + at java.lang.ClassLoader.loadClass(ClassLoader.java:306) + at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) + at java.lang.ClassLoader.loadClass(ClassLoader.java:247) +``` + +## Why doesn't my parser compile? + +If you see these kinds of errors, it's because you don't have the runtime or complete ANTLR library in your CLASSPATH. + +```bash +/tmp $ javac Hello*.java +HelloBaseListener.java:3: package org.antlr.v4.runtime does not exist +import org.antlr.v4.runtime.ParserRuleContext; + ^ +... 
+``` diff --git a/doc/faq/lexical.md b/doc/faq/lexical.md new file mode 100644 index 000000000..371d1f93b --- /dev/null +++ b/doc/faq/lexical.md @@ -0,0 +1,63 @@ +# Lexical analysis + +## How can I parse non-ASCII text and use characters in token rules? + +See [Using non-ASCII characters in token rules](http://stackoverflow.com/questions/28126507/antlr4-using-non-ascii-characters-in-token-rules/28129510#28129510). + +## How do I replace escape characters in string tokens? + +Unfortunately, manipulating the text of the token matched by a lexical rule is cumbersome (as of 4.2). You have to build up a buffer and then set the text at the end. Actions in the lexer execute at the associated position in the input just like they do in the parser. Here's an example that does escape character replacement in strings. It's not pretty but it works. + +``` +grammar Foo; + +@members { +StringBuilder buf = new StringBuilder(); // can't make locals in lexer rules +} + +STR : '"' + ( '\\' + ( 'r' {buf.append('\r');} + | 'n' {buf.append('\n');} + | 't' {buf.append('\t');} + | '\\' {buf.append('\\');} + | '\"' {buf.append('"');} + ) + | ~('\\'|'"') {buf.append((char)_input.LA(-1));} + )* + '"' + {setText(buf.toString()); buf.setLength(0); System.out.println(getText());} + ; +``` + +It's easier and more efficient to return original input string and then use a small function to rewrite the string later during a parse tree walk or whatever. But, here's how to do it from within the lexer. + +Lexer actions don't work in the interpreter, which includes xpath and tree patterns. + +For more on the argument against doing complicated things in the lexer, see the [related lexer-action issue at github](https://github.com/antlr/antlr4/issues/483#issuecomment-37326067). + +## Why are my keywords treated as identifiers? + +Keywords such as `begin` are also valid identifiers lexically and so that input is ambiguous. To resolve ambiguities, ANTLR gives precedence to the lexical rules specified first. 
That implies that you must put the identifier rule after all of your keywords: + +``` +grammar T; + +decl : DEF 'int' ID ';' ; + +DEF : 'def' ; // ambiguous with ID as is 'int' +ID  : [a-z]+ ; +``` + +Notice that literal `'int'` is also physically before the ID rule and will also get precedence. + +## Why are there no whitespace tokens in the token stream? + +The lexer is not sending white space to the parser, which means that the rewrite stream doesn't have access to the tokens either. It is because of the skip lexer command: + +``` +WS : [ \t\r\n\u000C]+ -> skip +   ; +``` + +You have to change all those to `-> channel(HIDDEN)`, which puts them in the token stream on a different channel, making them available in the stream but invisible to the parser. \ No newline at end of file diff --git a/doc/faq/parse-trees.md b/doc/faq/parse-trees.md new file mode 100644 index 000000000..5a243cedb --- /dev/null +++ b/doc/faq/parse-trees.md @@ -0,0 +1,73 @@ +# Parse Trees + +## How do I get the input text for a parse-tree subtree? + +In ParseTree, you have this method: + +```java +/** Return the combined text of all leaf nodes. Does not get any + *  off-channel tokens (if any) so won't return whitespace and + *  comments if they are sent to parser on hidden channel. + */ +String getText(); +``` + +But, you probably want this method from TokenStream: + +```java +/** + * Return the text of all tokens in the source interval of the specified + * context. This method behaves like the following code, including potential + * exceptions from the call to {@link #getText(Interval)}, but may be + * optimized by the specific implementation. + * + *

<p> + * If {@code ctx.getSourceInterval()} does not return a valid interval of + * tokens provided by this stream, the behavior is unspecified.</p> + * + * <pre> + * TokenStream stream = ...; + * String text = stream.getText(ctx.getSourceInterval()); + * </pre>
+ * + * @param ctx The context providing the source interval of tokens to get + * text for. + * @return The text of all tokens within the source interval of {@code ctx}. + */ +public String getText(RuleContext ctx); +``` + +That is, do this: + +``` +mytokens.getText(mySubTree); +``` + +## What if I need ASTs not parse trees for a compiler, for example? + +For writing a compiler, either generate [LLVM-type static-single-assignment](http://llvm.org/docs/LangRef.html) form or construct an AST from the parse tree using a listener or visitor. Or, use actions in the grammar, turning off auto-parse-tree construction. + +## When do I use listener/visitor vs XPath vs Tree pattern matching? + +### XPath + +XPath works great when you need to find specific nodes, possibly in certain contexts. The context is limited to the parents on the way to the root of the tree. For example, if you want to find all ID nodes, use path `//ID`. If you want all variable declarations, you might use path `//vardecl`. If you only want field declarations, then you can use some context information via path `/classdef/vardecl`, which would only find vardecls that are children of class definitions. You can merge the results of multiple XPath `findAll()`s, simulating a set union for XPath. The only caveat is that the order from the original tree is not preserved when you union multiple `findAll()` sets. + +### Tree pattern matching + +Use tree pattern matching when you want to find specific subtree structures such as all assignments to 0 using pattern `x = 0;`. (Recall that these are very convenient because you specify the tree structure in the concrete syntax of the language described by the grammar.) If you want to find all assignments of any kind, you can use pattern `x = <expr>;` where `<expr>` will find any expression. This works great for matching particular substructures and therefore gives you a bit more ability to specify context.
I.e., instead of just finding all identifiers, you can find all identifiers on the left-hand side of an expression. + +### Listeners/Visitors + +Using the listener or visitor interfaces gives you the most power but requires implementing more methods. It might be more challenging to discover the emergent behavior of the listener than a simple tree pattern matcher that says *go find me X under node Y*. + +Listeners are great when you want to visit many nodes in a tree. + +Listeners allow you to compute and save context information necessary for processing at various nodes. For example, when building a symbol table manager for a compiler or translator, you need to compute symbol scopes such as globals, class, function, and code block. When you enter a class or function, you push a new scope and then pop it when you exit that class or function. When you see a symbol, you need to define it or look it up in the proper scope. By having enter/exit listener functions push and pop scopes, listener functions for defining variables simply say something like: + +```java +scopeStack.peek().define(new VariableSymbol("foo")) +``` + +That way each listener function does not have to compute its appropriate scope.
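To make the scope-stack idea concrete outside of any generated listener, here is a self-contained sketch. The `Scope` class is a hypothetical stand-in for real symbol-table machinery, and the comments mark where enter/exit listener methods would do the pushing and popping:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal scope: defines names and resolves them,
// falling back to the enclosing scope on lookup.
class Scope {
    final String name;
    final Scope enclosing;
    private final Map<String, String> symbols = new HashMap<>();
    Scope(String name, Scope enclosing) { this.name = name; this.enclosing = enclosing; }
    void define(String sym) { symbols.put(sym, name); }
    // Returns the name of the scope that defines sym, or null if undefined.
    String resolve(String sym) {
        if (symbols.containsKey(sym)) return symbols.get(sym);
        return enclosing != null ? enclosing.resolve(sym) : null;
    }
}

public class ScopeDemo {
    public static void main(String[] args) {
        Deque<Scope> scopeStack = new ArrayDeque<>();
        scopeStack.push(new Scope("global", null));          // e.g., enterProg()
        scopeStack.peek().define("g");
        scopeStack.push(new Scope("f", scopeStack.peek()));  // e.g., enterFunction()
        scopeStack.peek().define("x");
        System.out.println(scopeStack.peek().resolve("x"));  // defined in f
        System.out.println(scopeStack.peek().resolve("g"));  // found via enclosing scope
        scopeStack.pop();                                    // e.g., exitFunction()
        System.out.println(scopeStack.peek().resolve("x"));  // no longer visible
    }
}
```

A real listener would replace the `main` method: the push/pop lines move into the enter/exit methods, and `define` is called wherever a declaration is recognized.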
+ +Examples: [DefScopesAndSymbols.java](https://github.com/mantra/compiler/blob/master/src/java/mantra/semantics/DefScopesAndSymbols.java) and [SetScopeListener.java](https://github.com/mantra/compiler/blob/master/src/java/mantra/semantics/SetScopeListener.java) and [VerifyListener.java](https://github.com/mantra/compiler/blob/master/src/java/mantra/semantics/VerifyListener.java) \ No newline at end of file diff --git a/doc/faq/translation.md b/doc/faq/translation.md new file mode 100644 index 000000000..3f452128f --- /dev/null +++ b/doc/faq/translation.md @@ -0,0 +1,9 @@ +# Translation + +## ASTs vs parse trees + +I used to do specialized AST (**abstract** syntax tree) nodes rather than (concrete) parse trees because I used to think more about compilation and generating bytecode/assembly code. When I started thinking more about translation, I started using parse trees. For v4, I realized that I did mostly translation. I guess what I'm saying is that maybe parse trees are not as good as ASTs for generating bytecodes. Personally, I would rather see `(+ 3 4)` rather than `(expr 3 + 4)` for generating byte codes, but it's not the end of the world. (*Can someone fill this in?*) + +## Decoupling input walking from output generation + +I suggest creating an intermediate model that represents your output. You walk the parse tree to collect information and create your model. Then, you could almost certainly automatically walk this internal model to generate output based upon stringtemplates that match the class names of the internal model. In other words, define a special `IFStatement` object that has all of the fields you want and then create them as you walk the parse tree. This decoupling of the input from the output is very powerful. Just because we have a parse tree listener doesn't mean that the parse tree itself is necessarily the best data structure to hold all information necessary to generate code. 
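To make the intermediate-model idea concrete, here is a minimal sketch. The `IfStatement` and `Assignment` classes and the `render()` method are hypothetical illustrations, not ANTLR API: a real translator would populate such objects from listener or visitor methods while walking the parse tree, and could pair each model class with a template named after it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an output model decoupled from the parse tree: the model
// holds exactly the information needed to generate output, nothing else.
public class OutputModelSketch {
    interface OutputModelObject { String render(); }

    // Hypothetical model object for an if-statement in the output language.
    static class IfStatement implements OutputModelObject {
        final String condition;
        final List<OutputModelObject> thenBody = new ArrayList<>();
        IfStatement(String condition) { this.condition = condition; }
        public String render() {
            StringBuilder buf = new StringBuilder("if (" + condition + ") {\n");
            for (OutputModelObject stmt : thenBody) {
                buf.append("  ").append(stmt.render()).append("\n");
            }
            return buf.append("}").toString();
        }
    }

    // Hypothetical model object for an assignment in the output language.
    static class Assignment implements OutputModelObject {
        final String lhs, rhs;
        Assignment(String lhs, String rhs) { this.lhs = lhs; this.rhs = rhs; }
        public String render() { return lhs + " = " + rhs + ";"; }
    }
}
```

Because generation walks the model rather than the parse tree, the output order is entirely decoupled from the input order.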
Imagine a situation where the output is the exact reverse of the input. In that case, you really want to walk the input just to collect data. Generating output should be driven by the internal model not the way it was represented in the input. \ No newline at end of file diff --git a/doc/getting-started.md b/doc/getting-started.md new file mode 100644 index 000000000..2530ba4a4 --- /dev/null +++ b/doc/getting-started.md @@ -0,0 +1,131 @@ +# Getting Started with ANTLR v4 + +Hi and welcome to the version 4 release of ANTLR! It's named after the fearless hero of the [Crazy Nasty-Ass Honey Badger](http://www.youtube.com/watch?v=4r7wHMg5Yjg) since ANTLR v4 takes whatever you give it--it just doesn't give a crap! See [Why do we need ANTLR v4?](faq/general.md) and the [preface of the ANTLR v4 book](http://media.pragprog.com/titles/tpantlr2/preface.pdf). + +## Installation + +ANTLR is really two things: a tool that translates your grammar to a parser/lexer in Java (or other target language) and the runtime needed by the generated parsers/lexers. Even if you are using the ANTLR Intellij plug-in or ANTLRWorks to run the ANTLR tool, the generated code will still need the runtime library. + +The first thing you should do is probably download and install a development tool plug-in. Even if you only use such tools for editing, they are great. Then, follow the instructions below to get the runtime environment available to your system to run generated parsers/lexers. In what follows, I talk about antlr-4.5-complete.jar, which has the tool and the runtime and any other support libraries (e.g., ANTLR v4 is written in v3). + +If you are going to integrate ANTLR into your existing build system using mvn, ant, or want to get ANTLR into your IDE such as eclipse or intellij, see Integrating ANTLR into Development Systems. + +### UNIX + +0. Install Java (version 1.6 or higher) +1. 
Download +``` +$ cd /usr/local/lib +$ curl -O http://www.antlr.org/download/antlr-4.5-complete.jar +``` +Or just download in browser from website: + [http://www.antlr.org/download.html](http://www.antlr.org/download.html) +and put it somewhere rational like `/usr/local/lib`. +2. Add `antlr-4.5-complete.jar` to your `CLASSPATH`: +``` +$ export CLASSPATH=".:/usr/local/lib/antlr-4.5-complete.jar:$CLASSPATH" +``` +It's also a good idea to put this in your `.bash_profile` or whatever your startup script is. +3. Create aliases for the ANTLR Tool, and `TestRig`. +``` +$ alias antlr4='java -Xmx500M -cp "/usr/local/lib/antlr-4.5-complete.jar:$CLASSPATH" org.antlr.v4.Tool' +$ alias grun='java org.antlr.v4.runtime.misc.TestRig' +``` + +### WINDOWS + +(*Thanks to Graham Wideman*) + +0. Install Java (version 1.6 or higher) +1. Download antlr-4.5-complete.jar (or whatever version) from [http://www.antlr.org/download/](http://www.antlr.org/download/) +Save to your directory for 3rd party Java libraries, say `C:\Javalib` +2. Add `antlr-4.5-complete.jar` to CLASSPATH, either: + * Permanently: Using System Properties dialog > Environment variables > Create or append to `CLASSPATH` variable + * Temporarily, at command line: +``` +SET CLASSPATH=.;C:\Javalib\antlr-4.5-complete.jar;%CLASSPATH% +``` +3. Create short convenient commands for the ANTLR Tool, and TestRig, using batch files or doskey commands: + * Batch files (in directory in system PATH) antlr4.bat and grun.bat +``` +java org.antlr.v4.Tool %* +``` +``` +java org.antlr.v4.runtime.misc.TestRig %* +``` + * Or, use doskey commands: +``` +doskey antlr4=java org.antlr.v4.Tool $* +doskey grun =java org.antlr.v4.runtime.misc.TestRig $* +``` + +### Testing the installation + +Either launch org.antlr.v4.Tool directly: + +``` +$ java org.antlr.v4.Tool +ANTLR Parser Generator Version 4.5 +-o ___ specify output directory where all output is generated +-lib ___ specify location of .tokens files +... 
+```
+
+or use the -jar option on java:
+
+```
+$ java -jar /usr/local/lib/antlr-4.5-complete.jar
+ANTLR Parser Generator Version 4.5
+-o ___ specify output directory where all output is generated
+-lib ___ specify location of .tokens files
+...
+```
+
+## A First Example
+
+In a temporary directory, put the following grammar inside file Hello.g4:
+
+```
+// Define a grammar called Hello
+grammar Hello;
+r : 'hello' ID ; // match keyword hello followed by an identifier
+ID : [a-z]+ ; // match lower-case identifiers
+WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
+```
+
+Then run the ANTLR tool on it:
+
+```
+$ cd /tmp
+$ antlr4 Hello.g4
+$ javac Hello*.java
+```
+
+Now test it:
+
+```
+$ grun Hello r -tree
+hello parrt
+^D
+(r hello parrt)
+```
+
+(That ^D means EOF on unix; it's ^Z in Windows.) The -tree option prints the parse tree in LISP notation. It's nicer to look at parse trees visually:
+
+```
+$ grun Hello r -gui
+hello parrt
+^D
+```
+
+That pops up a dialog box showing that rule `r` matched keyword `hello` followed by identifier `parrt`.
+
+![](images/hello-parrt.png)
+
+## Book source code
+
+The book has lots and lots of examples that should be useful to you. You can download them here for free:
+
+[http://pragprog.com/titles/tpantlr2/source_code](http://pragprog.com/titles/tpantlr2/source_code)
+
+Also, there is a large collection of grammars for v4 at github:
+
+[https://github.com/antlr/grammars-v4](https://github.com/antlr/grammars-v4) diff --git a/doc/grammars.md b/doc/grammars.md new file mode 100644 index 000000000..c40d974b6 --- /dev/null +++ b/doc/grammars.md @@ -0,0 +1,184 @@ +# Grammar Structure
+
+A grammar is essentially a grammar declaration followed by a list of rules, but has the general form:
+
+```
+/** Optional javadoc style comment */
+grammar Name; ①
+options {...}
+import ... ;
+
+tokens {...}
+channels {...} // lexer only
+@actionName {...}
+
+rule1 // parser and lexer rules, possibly intermingled
+...
+ruleN +``` + +The file name containing grammar `X` must be called `X.g4`. You can specify options, imports, token specifications, and actions in any order. There can be at most one each of options, imports, and token specifications. All of those elements are optional except for the header ① and at least one rule. Rules take the basic form: + +``` +ruleName : alternative1 | ... | alternativeN ; +``` + +Parser rule names must start with a lowercase letter and lexer rules must start with a capital letter. + +Grammars defined without a prefix on the `grammar` header are combined grammars that can contain both lexical and parser rules. To make a parser grammar that only allows parser rules, use the following header. + +``` +parser grammar Name; +... +``` + +And, naturally, a pure lexer grammar looks like this: + +``` +lexer grammar Name; +... +``` + +Only lexer grammars can contain `mode` specifications. + +Only lexer grammars can contain custom channels specifications + +``` +channels { + WHITESPACE_CHANNEL, + COMMENTS_CHANNEL +} +``` + +Those channels can then be used like enums within lexer rules: + +``` +WS : [ \r\t\n]+ -> channel(WHITESPACE_CHANNEL) ; +``` + +Sections 15.5, [Lexer Rules](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) and Section 15.3, [Parser Rules](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) contain details on rule syntax. Section 15.8, Options describes grammar options and Section 15.4, Actions and Attributes has information on grammar-level actions. + +## Grammar Imports + +Grammar `imports` let you break up a grammar into logical and reusable chunks, as we saw in [Importing Grammars](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference). ANTLR treats imported grammars very much like object-oriented programming languages treat superclasses. A grammar inherits all of the rules, tokens specifications, and named actions from the imported grammar. 
Rules in the “main grammar” override rules from imported grammars to implement inheritance. + +Think of `import` as more like a smart include statement (which does not include rules that are already defined). The result of all imports is a single combined grammar; the ANTLR code generator sees a complete grammar and has no idea there were imported grammars. + +To process a main grammar, the ANTLR tool loads all of the imported grammars into subordinate grammar objects. It then merges the rules, token types, and named actions from the imported grammars into the main grammar. In the diagram below, the grammar on the right illustrates the effect of grammar `MyELang` importing grammar `ELang`. + + + +`MyELang` inherits rules `stat`, `WS`, and `ID`, but overrides rule `expr` and adds `INT`. Here’s a sample build and test run that shows `MyELang` can recognize integer expressions whereas the original `ELang` can’t. The third, erroneous input statement triggers an error message that also demonstrates the parser was looking for `MyELang`’s expr not `ELang`’s. + +``` +$ antlr4 MyELang.g4 +$ javac MyELang*.java +$ grun MyELang stat +=> 34; +=> a; +=> ; +=> EOF +<= line 3:0 extraneous input ';' expecting {INT, ID} +``` + +If there were any `tokens` specifications, the main grammar would merge the token sets. Any named actions such as `@members` would be merged. In general, you should avoid named actions and actions within rules in imported grammars since that limits their reuse. ANTLR also ignores any options in imported grammars. + +Imported grammars can also import other grammars. ANTLR pursues all imported grammars in a depth-first fashion. If two or more imported grammars define rule `r`, ANTLR chooses the first version of `r` it finds. In the following diagram, ANTLR examines grammars in the following order `Nested`, `G1`, `G3`, `G2`. + + + +`Nested` includes the `r` rule from `G3` because it sees that version before the `r` in `G2`. 
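Based on the description above, the `ELang`/`MyELang` pair might look roughly like this (a hypothetical reconstruction from the prose; the book's actual grammar files may differ):

```
// ELang.g4 -- the imported grammar
grammar ELang;
stat : expr ';' ;
expr : ID ;
ID   : [a-z]+ ;
WS   : [ \t\r\n]+ -> skip ;

// MyELang.g4 -- imports ELang, overrides expr, adds INT
grammar MyELang;
import ELang;
expr : INT | ID ;
INT  : [0-9]+ ;
```

With such a pair, `MyELang` inherits `stat`, `ID`, and `WS`, but its own `expr` wins over the imported one, which is why `34;` parses and why the error message above expects `{INT, ID}`.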
+ +Not every kind of grammar can import every other kind of grammar: + +* Lexer grammars can import lexers. +* Parsers can import parsers. +* Combined grammars can import lexers or parsers. + +ANTLR adds imported rules to the end of the rule list in a main lexer grammar. That means lexer rules in the main grammar get precedence over imported rules. For example, if a main grammar defines rule `IF : ’if’ ;` and an imported grammar defines rule `ID : [a-z]+ ;` (which also recognizes `if`), the imported `ID` won’t hide the main grammar’s `IF` token definition. + +## Tokens Section + +The purpose of the `tokens` section is to define token types needed by a grammar for which there is no associated lexical rule. The basic syntax is: + +``` +tokens { Token1, ..., TokenN } +``` + +Most of the time, the tokens section is used to define token types needed by actions in the grammar as shown in Section 10.3, [Recognizing Languages whose Keywords Aren’t Fixed](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference): + +``` +// explicitly define keyword token types to avoid implicit definition warnings +tokens { BEGIN, END, IF, THEN, WHILE } + +@lexer::members { // keywords map used in lexer to assign token types +Map keywords = new HashMap() {{ + put("begin", KeywordsParser.BEGIN); + put("end", KeywordsParser.END); + ... +}}; +} +``` + +The `tokens` section really just defines a set of tokens to add to the overall set. + +``` +$ cat Tok.g4 +grammar Tok; +tokens { A, B, C } +a : X ; +$ antlr4 Tok.g4 +warning(125): Tok.g4:3:4: implicit definition of token X in parser +$ cat Tok.tokens +A=1 +B=2 +C=3 +X=4 +``` + +## Actions at the Grammar Level + +Currently there are only two defined named actions (for the Java target) used outside of grammar rules: `header` and `members`. The former injects code into the generated recognizer class file, before the recognizer class definition, and the latter injects code into the recognizer class definition, as fields and methods. 
+ +For combined grammars, ANTLR injects the actions into both the parser and the lexer. To restrict an action to the generated parser or lexer, use `@parser::name` or `@lexer::name`. + +Here’s an example where the grammar specifies a package for the generated code: + +``` +grammar Count; + +@header { +package foo; +} + +@members { +int count = 0; +} + +list +@after {System.out.println(count+" ints");} +: INT {count++;} (',' INT {count++;} )* +; + +INT : [0-9]+ ; +WS : [ \r\t\n]+ -> skip ; +``` + +The grammar itself then should be in directory `foo` so that ANTLR generates code in that same `foo` directory (at least when not using the `-o` ANTLR tool option): + +``` +$ cd foo +$ antlr4 Count.g4 # generates code in the current directory (foo) +$ ls +Count.g4 CountLexer.java CountParser.java +Count.tokens CountLexer.tokens +CountBaseListener.java CountListener.java +$ javac *.java +$ cd .. +$ grun foo.Count list +=> 9, 10, 11 +=> EOF +<= 3 ints +``` + +The Java compiler expects classes in package `foo` to be in directory `foo`. 
diff --git a/doc/images/ACE-Architecture.001.png b/doc/images/ACE-Architecture.001.png new file mode 100644 index 000000000..ef36f98a8 Binary files /dev/null and b/doc/images/ACE-Architecture.001.png differ diff --git a/doc/images/combined.png b/doc/images/combined.png new file mode 100644 index 000000000..8131bb31b Binary files /dev/null and b/doc/images/combined.png differ diff --git a/doc/images/foreign.png b/doc/images/foreign.png new file mode 100644 index 000000000..daaa8d83d Binary files /dev/null and b/doc/images/foreign.png differ diff --git a/doc/images/hello-parrt.png b/doc/images/hello-parrt.png new file mode 100644 index 000000000..5bc0596ed Binary files /dev/null and b/doc/images/hello-parrt.png differ diff --git a/doc/images/idea-prefs-after-install.png b/doc/images/idea-prefs-after-install.png new file mode 100644 index 000000000..2d5c8bde7 Binary files /dev/null and b/doc/images/idea-prefs-after-install.png differ diff --git a/doc/images/idea-prefs.png b/doc/images/idea-prefs.png new file mode 100644 index 000000000..ab375b86c Binary files /dev/null and b/doc/images/idea-prefs.png differ diff --git a/doc/images/intellij-maven.png b/doc/images/intellij-maven.png new file mode 100644 index 000000000..775e532ef Binary files /dev/null and b/doc/images/intellij-maven.png differ diff --git a/doc/images/nested-fuzzy.png b/doc/images/nested-fuzzy.png new file mode 100644 index 000000000..4da816c52 Binary files /dev/null and b/doc/images/nested-fuzzy.png differ diff --git a/doc/images/nested.png b/doc/images/nested.png new file mode 100644 index 000000000..9c9bb1de9 Binary files /dev/null and b/doc/images/nested.png differ diff --git a/doc/images/nonascii.png b/doc/images/nonascii.png new file mode 100644 index 000000000..c431feedc Binary files /dev/null and b/doc/images/nonascii.png differ diff --git a/doc/images/nonnested-fuzzy.png b/doc/images/nonnested-fuzzy.png new file mode 100644 index 000000000..d17594255 Binary files /dev/null and 
b/doc/images/nonnested-fuzzy.png differ diff --git a/doc/images/process.png b/doc/images/process.png new file mode 100644 index 000000000..5185aaec5 Binary files /dev/null and b/doc/images/process.png differ diff --git a/doc/images/teronbook.png b/doc/images/teronbook.png new file mode 100644 index 000000000..3eb53d8d8 Binary files /dev/null and b/doc/images/teronbook.png differ diff --git a/doc/images/tertalk.png b/doc/images/tertalk.png new file mode 100644 index 000000000..069db0a83 Binary files /dev/null and b/doc/images/tertalk.png differ diff --git a/doc/images/tpantlr2.png b/doc/images/tpantlr2.png new file mode 100644 index 000000000..fb2e5c61e Binary files /dev/null and b/doc/images/tpantlr2.png differ diff --git a/doc/images/tpdsl.png b/doc/images/tpdsl.png new file mode 100644 index 000000000..05f137a03 Binary files /dev/null and b/doc/images/tpdsl.png differ diff --git a/doc/images/xyz.png b/doc/images/xyz.png new file mode 100644 index 000000000..ab08138d3 Binary files /dev/null and b/doc/images/xyz.png differ diff --git a/doc/images/xyz_opt.png b/doc/images/xyz_opt.png new file mode 100644 index 000000000..61d98948e Binary files /dev/null and b/doc/images/xyz_opt.png differ diff --git a/doc/images/xyz_plus.png b/doc/images/xyz_plus.png new file mode 100644 index 000000000..2f6fb984b Binary files /dev/null and b/doc/images/xyz_plus.png differ diff --git a/doc/images/xyz_star.png b/doc/images/xyz_star.png new file mode 100644 index 000000000..7a7841efd Binary files /dev/null and b/doc/images/xyz_star.png differ diff --git a/doc/index.md b/doc/index.md new file mode 100644 index 000000000..96d557377 --- /dev/null +++ b/doc/index.md @@ -0,0 +1,66 @@ +# ANTLR 4 Documentation + +Please check [Frequently asked questions (FAQ)](faq/index.md) before asking questions on stackoverflow or antlr-discussion list. + +Notes: +
    +
+* To add to or improve this documentation, fork the antlr/antlr4 repo, then update this `doc/index.md` or the file(s) in that directory and submit a pull request to get your changes incorporated into the main repository. Do not mix code and documentation updates in the same pull request. You must sign the contributors.txt certificate of origin with your pull request if you've not done so before.
+
+* Copyright © 2012, The Pragmatic Bookshelf. Pragmatic Bookshelf grants a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, prepare derivative works, and otherwise use this contribution as part of the ANTLR project and associated documentation.
+
+* This text was copied with permission from The Definitive ANTLR 4 Reference, though it is being morphed over time as the tool changes.
+ +Links in the documentation refer to various sections of the book but have been redirected to the general book page on the publisher's site. There are two excerpts on the publisher's website that might be useful to you without having to purchase the book: [Let's get Meta](http://media.pragprog.com/titles/tpantlr2/picture.pdf) and [Building a Translator with a Listener](http://media.pragprog.com/titles/tpantlr2/listener.pdf). You should also consider reading the following books (the vid describes the reference book): + + + + + +This documentation is a reference and summarizes grammar syntax and the key semantics of ANTLR grammars. The source code for all examples in the book, not just this chapter, are free at the publisher's website. The following video is a general tour of ANTLR 4 and includes a description of how to use parse tree listeners to process Java files easily: + + + +## Sections + +* [Getting Started with ANTLR v4](getting-started.md) + +* [Grammar Lexicon](lexicon.md) + +* [Grammar Structure](grammars.md) + +* [Parser Rules](parser-rules.md) + +* [Left-recursive rules](left-recursion.md) + +* [Actions and Attributes](actions.md) + +* [Lexer Rules](lexer-rules.md) + +* [Wildcard Operator and Nongreedy Subrules](wildcard.md) + +* [Parse Tree Listeners](listeners.md) + +* [Parse Tree Matching and XPath](tree-matching.md) + +* [Semantic Predicates](predicates.md) + +* [Options](options.md) + +* [ANTLR Tool Command Line Options](tool-options.md) + +* [Runtime Libraries and Code Generation Targets](targets.md) + +* [Parser and lexer interpreters](interpreters.md) + +* [Resources](resources.md) + +# Building / releasing ANTLR itself + +* [Building ANTLR itself](building-antlr.md) + +* [Cutting an ANTLR Release](releasing-antlr.md) + +* [Adding ANTLR unit tests](adding-tests.md) + +* [Creating an ANTLR Language Target](creating-a-language-target.md) diff --git a/doc/interpreters.md b/doc/interpreters.md new file mode 100644 index 000000000..c99e6d580 --- 
/dev/null +++ b/doc/interpreters.md @@ -0,0 +1,79 @@ +# Parser and lexer interpreters + +*Since ANTLR 4.2* + +For small parsing tasks it is sometimes convenient to use ANTLR in interpreted mode, rather than generating a parser in a particular target, compiling it and running it as part of your application. Here's some sample code that creates lexer and parser Grammar objects and then creates interpreters. Once we have a ParserInterpreter, we can use it to parse starting in any rule we like, given a rule index (which the Grammar can provide). + +```java +LexerGrammar lg = new LexerGrammar( + "lexer grammar L;\n" + + "A : 'a' ;\n" + + "B : 'b' ;\n" + + "C : 'c' ;\n"); +Grammar g = new Grammar( + "parser grammar T;\n" + + "s : (A|B)* C ;\n", + lg); +LexerInterpreter lexEngine = + lg.createLexerInterpreter(new ANTLRInputStream(input)); +CommonTokenStream tokens = new CommonTokenStream(lexEngine); +ParserInterpreter parser = g.createParserInterpreter(tokens); +ParseTree t = parser.parse(g.rules.get(startRule).index); +``` + +You can also load combined grammars from a file: + +```java +public static ParseTree parse(String fileName, + String combinedGrammarFileName, + String startRule) + throws IOException +{ + final Grammar g = Grammar.load(combinedGrammarFileName); + LexerInterpreter lexEngine = g.createLexerInterpreter(new ANTLRFileStream(fileName)); + CommonTokenStream tokens = new CommonTokenStream(lexEngine); + ParserInterpreter parser = g.createParserInterpreter(tokens); + ParseTree t = parser.parse(g.getRule(startRule).index); + System.out.println("parse tree: "+t.toStringTree(parser)); + return t; +} +``` + +Then: + +```java +ParseTree t = parse("T.om", + MantraGrammar, + "compilationUnit"); +``` + +To load separate lexer/parser grammars, do this: + +```java +public static ParseTree parse(String fileNameToParse, + String lexerGrammarFileName, + String parserGrammarFileName, + String startRule) + throws IOException +{ + final LexerGrammar lg = (LexerGrammar) 
Grammar.load(lexerGrammarFileName); + final Grammar pg = Grammar.load(parserGrammarFileName, lg); + ANTLRFileStream input = new ANTLRFileStream(fileNameToParse); + LexerInterpreter lexEngine = lg.createLexerInterpreter(input); + CommonTokenStream tokens = new CommonTokenStream(lexEngine); + ParserInterpreter parser = pg.createParserInterpreter(tokens); + ParseTree t = parser.parse(pg.getRule(startRule).index); + System.out.println("parse tree: " + t.toStringTree(parser)); + return t; +} +``` + +Then: + +```java +ParseTree t = parse(fileName, XMLLexerGrammar, XMLParserGrammar, "document"); +``` + +This is also how we will integrate instantaneous parsing into ANTLRWorks2 and development environment plug-ins. + +See [TestParserInterpreter.java](https://github.com/antlr/antlr4/blob/master/tool-testsuite/test/org/antlr/v4/test/tool/TestParserInterpreter.java). diff --git a/doc/java-target.md b/doc/java-target.md new file mode 100644 index 000000000..f6e615482 --- /dev/null +++ b/doc/java-target.md @@ -0,0 +1,244 @@ +# Java + +## Development environments + +### Intellij + +There is a very complete and useful plug-in for intellij 12-14, you can grab at the [download page](https://plugins.jetbrains.com/plugin/7358?pr=). Check the [plugin readme](https://github.com/antlr/intellij-plugin-v4) for feature set. Just go to the preferences and click on the "Install plug-in from disk..." button from this dialog box: + + + +Select the intellij-plugin-1.x.zip (or whatever version) file and hit okay or apply. It will ask you to restart the IDE. If you look at the plug-ins again, you will see: + + + +Also, I have prepared a [video](https://youtu.be/eW4WFgRtFeY) that will help you generate grammars and so on using ANTLR v4 in Intellij (w/o the plugin). + +### Eclipse + +Edgar Espina has created an [eclipse plugin for ANTLR v4](https://youtu.be/eW4WFgRtFeY). 
Features: Advanced Syntax Highlighting, Automatic Code Generation (on save), Manual Code Generation (through External Tools menu), Code Formatter (Ctrl+Shift+F), Syntax Diagrams, Advanced Rule Navigation between files (F3), Quick fixes. + +### NetBeans + +Sam Harwell's [ANTLRWorks2](http://tunnelvisionlabs.com/products/demo/antlrworks) works also as a plug-in, not just a stand-alone tool built on top of NetBeans. + +## Build systems + +### ant + +### mvn + +*Maven Plugin Reference* + +The reference pages for the latest version of the Maven plugin for ANTLR 4 can be found here: + +[http://www.antlr.org/api/maven-plugin/latest/index.html](http://www.antlr.org/api/maven-plugin/latest/index.html) + +*Walkthrough* + +This section describes how to create a simple Antlr 4 project and build it using maven. We are going to use the ArrayInit.g4 example from chapter 3 of the book, and bring it under maven. We will need to rename files and modify them. We will conclude by building a portable stand alone application. + +Generate the skeleton. To generate the maven skeleton, type these commands: + +```bash +mkdir SimpleAntlrMavenProject +cd SimpleAntlrMavenProject +mvn archetype:generate -DgroupId=org.abcd.examples -DartifactId=array-example -Dpackage=org.abcd.examples.ArrayInit -Dversion=1.0 +# Accept all the default values +cd array-example +``` + +Maven will ask a series of questions, simply accept the default answers by hitting enter. + +Move into the directory created by maven: + +```bash +cd array-example +``` + +We can use the find command to see the files created by maven: + +```bash +$ find . -type f +./pom.xml +./src/test/java/org/abcd/examples/ArrayInit/AppTest.java +./src/main/java/org/abcd/examples/ArrayInit/App.java +``` + +We need to edit the pom.xml file extensively. The App.java will be renamed to ArrayInit.java and will contain the main ANTLR java program which we will download from the book examples. 
The AppTest.java file will be renamed ArrayInitTest.java but will remain the empty test as created by maven. We will also be adding the grammar file ArrayInit.g4 from the book examples in there. + +Get the examples for the book and put them in the Downloads folder. To obtain the ArrayInit.g4 grammar from the book, simply download it: + +```bash +pushd ~/Downloads +wget http://media.pragprog.com/titles/tpantlr2/code/tpantlr2-code.tgz +tar xvfz tpantlr2-code.tgz +popd +``` + +Copy the grammar to the maven project. The grammar file goes into a special folder under the src/ directory. The folder name must match the maven package name org.abcd.examples.ArrayInit. + +```bash +mkdir -p src/main/antlr4/org/abcd/examples/ArrayInit +cp ~/Downloads/code/starter/ArrayInit.g4 src/main/antlr4/org/abcd/examples/ArrayInit +``` + +Copy the main program to the maven project. We replace the maven App.java file with the main java program from the book. In the book, that main program is called Test.java, we rename it to ArrayInit.java: + +```bash +# Remove the maven file +rm ./src/main/java/org/abcd/examples/ArrayInit/App.java +# Copy and rename the example from the book +cp ~/Downloads/code/starter/Test.java ./src/main/java/org/abcd/examples/ArrayInit/ArrayInit.java +``` + +Spend a few minutes to read the main program. Notice that it reads the standard input stream. We need to remember this when we run the application. + +Edit the ArrayInit.java file. We need to add a package declaration and to rename the class. Edit the file ./src/main/java/org/abcd/examples/ArrayInit/ArrayInit.java in your favorite editor. The head of the file should look like this when you are done: + +```java +package org.abcd.examples.ArrayInit; +import org.antlr.v4.runtime.*; +import org.antlr.v4.runtime.tree.*; + +public class ArrayInit { +... +``` + +Edit the ArrayInitTest.java file. 
Maven creates a test file called AppTest.java; we need to rename it to match the name of our application:
+
+```bash
+pushd src/test/java/org/abcd/examples/ArrayInit
+mv AppTest.java ArrayInitTest.java
+sed 's/App/ArrayInit/g' ArrayInitTest.java >ArrayInitTest.java.tmp
+mv ArrayInitTest.java.tmp ArrayInitTest.java
+popd
+```
+
+Edit the pom.xml file. Now we need to extensively modify the pom.xml file. The final product looks like this (the XML tags were reconstructed from the values listed; the schema boilerplate is the standard Maven header):
+
+```xml
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>org.abcd.examples</groupId>
+  <artifactId>array-init</artifactId>
+  <version>1.0</version>
+  <packaging>jar</packaging>
+  <name>array-init</name>
+  <url>http://maven.apache.org</url>
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+  </properties>
+  <dependencies>
+    <dependency>
+      <groupId>org.antlr</groupId>
+      <artifactId>antlr4-runtime</artifactId>
+      <version>4.5</version>
+    </dependency>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>3.8.1</version>
+    </dependency>
+  </dependencies>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <version>3.1</version>
+        <configuration>
+          <source>1.7</source>
+          <target>1.7</target>
+        </configuration>
+      </plugin>
+      <plugin>
+        <groupId>org.antlr</groupId>
+        <artifactId>antlr4-maven-plugin</artifactId>
+        <version>4.5</version>
+        <executions>
+          <execution>
+            <goals>
+              <goal>antlr4</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <configuration>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+        </configuration>
+        <executions>
+          <execution>
+            <id>simple-command</id>
+            <phase>package</phase>
+            <goals>
+              <goal>attached</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
+```
+
+This concludes the changes we had to make. We can look at the list of files we have with the find command:
+
+```bash
+$ find . -type f
+./pom.xml
+./src/test/java/org/abcd/examples/ArrayInit/ArrayInitTest.java
+./src/main/antlr4/org/abcd/examples/ArrayInit/ArrayInit.g4
+./src/main/java/org/abcd/examples/ArrayInit/ArrayInit.java
+```
+
+Building a stand-alone application. With all the files now in place, we can ask maven to create a standalone application. The following command does this:
+
+```bash
+mvn package
+```
+
+Maven creates a self-contained jar file called target/array-init-1.0-jar-with-dependencies.jar. We can execute the jar file, but remember that it expects some input on the standard input stream, which means the command will hang until we feed it some input:
+
+```bash
+java -cp target/array-init-1.0-jar-with-dependencies.jar org.abcd.examples.ArrayInit.ArrayInit
+```
+
+And let's feed it the following input:
+
+```bash
+{1,2,3}
+^D
+```
+
+The ^D signals the end of the input to the standard input stream and gets the rest of the application going.
You should see the following output: + +```bash +(init { (value 1) , (value 2) , (value 3) }) +``` + +You can also build a jar file without the dependencies, and execute it with a maven command instead: + +```bash +mvn install +mvn exec:java -Dexec.mainClass=org.abcd.examples.ArrayInit.ArrayInit +{1,2,3} +^D +``` \ No newline at end of file diff --git a/doc/javascript-target.md b/doc/javascript-target.md new file mode 100644 index 000000000..97ab22160 --- /dev/null +++ b/doc/javascript-target.md @@ -0,0 +1,157 @@ +# JavaScript + +## Which browsers are supported? + +In theory, all browsers supporting ECMAScript 5.1. + +In practice, this target has been extensively tested against: + +* Firefox 34.0.5 +* Safari 8.0.2 +* Chrome 39.0.2171 +* Explorer 11.0.3 + +The tests were conducted using Selenium. No issue was found, so you should find that the runtime works pretty much against any recent JavaScript engine. + +## Is NodeJS supported? + +The runtime has also been extensively tested against Node.js 0.10.33. No issue was found. + +## How to create a JavaScript lexer or parser? + +This is pretty much the same as creating a Java lexer or parser, except you need to specify the language target, for example: + +```bash +$ antlr4 -Dlanguage=JavaScript MyGrammar.g4 +``` + +For a full list of antlr4 tool options, please visit the [tool documentation page](tool-options.md). + +## Where can I get the runtime? + +Once you've generated the lexer and/or parser code, you need to download the runtime. + +The JavaScript runtime is available from the ANTLR web site [download section](http://www.antlr.org/download/index.html). The runtime is provided in the form of source code, so no additional installation is required. + +We will not document here how to refer to the runtime from your project, since this would differ a lot depending on your project type and IDE. + +## How do I get the runtime in my browser? 
The runtime is quite big and is currently maintained in the form of around 50 scripts, which follow the same structure as the runtimes for the other targets (Java, C#, Python...).

This structure is key to keeping the code maintainable and consistent across targets.

However, it is a bit of a problem when it comes to getting the runtime into a browser. Nobody wants to write 50 times:

```
<script src='lib/antlr4/<one-of-the-50-scripts>.js'>
```

To avoid this, and to keep the code identical between Node.js and the browser, the runtime is loaded through a `require` function, which in the browser is provided by a loader script such as require.js. Include the loader and then require the runtime's entry point:

```
<script src='lib/require.js'></script>
<script>
    var antlr4 = require('antlr4/index');
</script>
```

This will load the runtime asynchronously.

## How do I get the runtime in Node.js?

Right now, there is no npm package available, so you need to register a link instead. This can be done by running the following command from the antlr4 directory:

```bash
$ npm link antlr4
```

This will install antlr4 using the package.json descriptor that comes with the script.

## How do I run the generated lexer and/or parser?

Let's suppose that your grammar is named, as above, "MyGrammar". Let's suppose this parser comprises a rule named "StartRule". The tool will have generated for you the following files:

* MyGrammarLexer.js
* MyGrammarParser.js
* MyGrammarListener.js (if you have not activated the -no-listener option)
* MyGrammarVisitor.js (if you have activated the -visitor option)

(Developers used to Java/C# ANTLR will notice that there is no base listener or visitor generated; this is because JavaScript has no support for interfaces, so the generated listener and visitor are fully fledged classes.)

Now a fully functioning script might look like the following:

```javascript
var input = "your text to parse here";
var chars = new antlr4.InputStream(input);
var lexer = new MyGrammarLexer.MyGrammarLexer(chars);
var tokens = new antlr4.CommonTokenStream(lexer);
var parser = new MyGrammarParser.MyGrammarParser(tokens);
parser.buildParseTrees = true;
var tree = parser.StartRule();
```

This program will work.
But it won't be useful unless you do one of the following:

* you visit the parse tree using a custom listener
* you visit the parse tree using a custom visitor
* your grammar comprises production code (as in ANTLR 3)

(Please note that production code is target specific, so you can't have multi-target grammars that include production code.)

## How do I create and run a custom listener?

Let's suppose your MyGrammar grammar comprises 2 rules: "key" and "value". The antlr4 tool will have generated the following listener:

```javascript
MyGrammarListener = function(ParseTreeListener) {
    // some code here
}
// some code here
MyGrammarListener.prototype.enterKey = function(ctx) {};
MyGrammarListener.prototype.exitKey = function(ctx) {};
MyGrammarListener.prototype.enterValue = function(ctx) {};
MyGrammarListener.prototype.exitValue = function(ctx) {};
```

In order to provide custom behavior, you might want to create the following class:

```javascript
KeyPrinter = function() {
    MyGrammarListener.call(this); // inherit default listener
    return this;
};

// inherit default listener
KeyPrinter.prototype = Object.create(MyGrammarListener.prototype);
KeyPrinter.prototype.constructor = KeyPrinter;

// override default listener behavior
KeyPrinter.prototype.exitKey = function(ctx) {
    console.log("Oh, a key!");
};
```

In order to execute this listener, you would simply add the following lines to the above code:

```javascript
...
tree = parser.StartRule(); // only repeated here for reference
var printer = new KeyPrinter();
antlr4.tree.ParseTreeWalker.DEFAULT.walk(printer, tree);
```

## How do I integrate my parser with ACE editor?

This specific task is described in this [dedicated page](ace-javascript-target.md).

## How can I learn more about ANTLR?

Further information can be found in "The Definitive ANTLR 4 Reference" book.
The JavaScript implementation of ANTLR is as close as possible to the Java one, so you shouldn't find it difficult to adapt the book's examples to JavaScript.

diff --git a/doc/left-recursion.md b/doc/left-recursion.md new file mode 100644 index 000000000..3430e10e9 --- /dev/null +++ b/doc/left-recursion.md @@ -0,0 +1,50 @@

# Left-recursive rules

The most natural expression of some common language constructs is left recursive; for example, C declarators and arithmetic expressions. Unfortunately, left-recursive specifications of arithmetic expressions are typically ambiguous, but much easier to write out than the multiple levels required in a typical top-down grammar. Here is a sample ANTLR 4 grammar with a left-recursive expression rule:

```
stat: expr '=' expr ';' // e.g., x=y; or x=f(x);
    | expr ';'          // e.g., f(x); or f(g(x));
    ;
expr: expr '*' expr
    | expr '+' expr
    | expr '(' expr ')' // f(x)
    | id
    ;
```

In a straight context-free grammar, such a rule is ambiguous because, for input like `1+2*3`, it can interpret either operator as occurring first, but ANTLR rewrites the rule to be non-left-recursive and unambiguous using semantic predicates:

```
expr[int pr] : id
               ( {4 >= $pr}? '*' expr[5]
               | {3 >= $pr}? '+' expr[4]
               | {2 >= $pr}? '(' expr[0] ')'
               )*
             ;
```

The predicates resolve ambiguities by comparing the precedence of the current operator against the precedence of the previous operator. An expansion of expr[pr] can match only those subexpressions whose precedence meets or exceeds pr.

## Formal rules

The formal 4.0, 4.1 ANTLR left-recursion elimination rules were changed (simplified) for 4.2 and are laid out in the [ALL(*) tech report](http://www.antlr.org/papers/allstar-techreport.pdf):

* Binary expressions are expressions which contain a recursive invocation of the rule as the first and last element of the alternative.
* Suffix expressions contain a recursive invocation of the rule as the first element of the alternative, but not as the last element.
* Prefix expressions contain a recursive invocation of the rule as the last element of the alternative, but not as the first element.

There is no such thing as a "ternary" expression--they are just binary expressions in disguise.

The right-associativity specifiers used to be on the individual tokens, but associativity is a property of an alternative anyway, so the option is now on the individual alternative; e.g.,

```
e : e '*' e
  | e '+' e
  |<assoc=right> e '?' e ':' e
  |<assoc=right> e '=' e
  | INT
  ;
```

If your 4.0 or 4.1 grammar uses a right-associative ternary operator, you will need to update your grammar to include `<assoc=right>` on the alternative operator. To smooth the transition, `<assoc=right>` is still allowed on token references, but it is ignored.

diff --git a/doc/lexer-rules.md b/doc/lexer-rules.md new file mode 100644 index 000000000..adda9e8b0 --- /dev/null +++ b/doc/lexer-rules.md @@ -0,0 +1,283 @@

# Lexer Rules

A lexer grammar is composed of lexer rules, optionally broken into multiple modes. Lexical modes allow us to split a single lexer grammar into multiple sublexers. The lexer can only return tokens matched by rules from the current mode.

Lexer rules specify token definitions and more or less follow the syntax of parser rules, except that lexer rules cannot have arguments, return values, or local variables. Lexer rule names must begin with an uppercase letter, which distinguishes them from parser rule names:

```
/** Optional document comment */
TokenName : alternative1 | ... | alternativeN ;
```

You can also define rules that are not tokens but rather aid in the recognition of tokens. These fragment rules do not result in tokens visible to the parser:

```
fragment
HelperTokenRule : alternative1 | ...
                | alternativeN ;
```

For example, `DIGIT` is a pretty common fragment rule:

```
INT : DIGIT+ ; // references the DIGIT helper rule
fragment DIGIT : [0-9] ; // not a token by itself
```

## Lexical Modes

Modes allow you to group lexical rules by context, such as inside and outside of XML tags. It’s like having multiple sublexers, one for each context. The lexer can only return tokens matched by entering a rule in the current mode. Lexers start out in the so-called default mode. All rules are considered to be within the default mode unless you specify a mode command. Modes are not allowed within combined grammars, just lexer grammars. (See grammar `XMLLexer` from [Tokenizing XML](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference).)

```
rules in default mode
...
mode MODE1;
rules in MODE1
...
mode MODEN;
rules in MODEN
...
```

## Lexer Rule Elements

Lexer rules allow two constructs that are unavailable to parser rules: the .. range operator and the character set notation enclosed in square brackets, [characters]. Don’t confuse character sets with arguments to parser rules. [characters] only means character set in a lexer. Here’s a summary of all lexer rule elements:
`T`
Match token T at the current input position. Tokens always begin with a capital letter.

`'literal'`
Match that character or sequence of characters. E.g., `'while'` or `'='`.

`[char set]`
Match one of the characters specified in the character set. Interpret x-y as the set of characters between range x and y, inclusively. The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, and \f. To get ], \, or - you must escape them with \. You can also use Unicode character specifications: \uXXXX. Here are a few examples:

```
WS : [ \n\u000D] -> skip ; // same as [ \n\r]

ID : [a-zA-Z] [a-zA-Z0-9]* ; // match usual identifier spec

DASHBRACK : [\-\]]+ ; // match - or ] one or more times
```

`'x'..'y'`
Match any single character between range x and y, inclusively. E.g., `'a'..'z'`. `'a'..'z'` is identical to `[a-z]`.

`T` (lexer rule reference)
Invoke lexer rule T; recursion is allowed in general, but not left recursion. T can be a regular token or fragment rule.

```
ID : LETTER (LETTER|'0'..'9')* ;

fragment
LETTER : [a-zA-Z\u0080-\u00FF_] ;
```

`.`
The dot is a single-character wildcard that matches any single character. Example:

```
ESC : '\\' . ; // match any escaped \x character
```

`{«action»}`
Lexer actions can appear anywhere as of 4.2, not just at the end of the outermost alternative. The lexer executes the actions at the appropriate input position, according to the placement of the action within the rule. To execute a single action for a rule that has multiple alternatives, you can enclose the alts in parentheses and put the action afterwards:

```
END : ('endif'|'end') {System.out.println("found an end");} ;
```

The action conforms to the syntax of the target language. ANTLR copies the action’s contents into the generated code verbatim; there is no translation of expressions like $x.y as there is in parser actions.

Only actions within the outermost token rule are executed. In other words, if STRING calls ESC_CHAR and ESC_CHAR has an action, that action is not executed when the lexer starts matching in STRING.

`{«p»}?`
Evaluate semantic predicate «p». If «p» evaluates to false at runtime, the surrounding rule becomes “invisible” (nonviable). Expression «p» conforms to the target language syntax. While semantic predicates can appear anywhere within a lexer rule, it is most efficient to have them at the end of the rule. The one caveat is that semantic predicates must precede lexer actions. See Predicates in Lexer Rules.

`~x`
Match any single character not in the set described by x. Set x can be a single character literal, a range, or a subrule set like ~('x'|'y'|'z') or ~[xyz]. Here is a rule that uses ~ to match any character other than carriage return or newline, using ~[\r\n]*:

```
COMMENT : '#' ~[\r\n]* '\r'? '\n' -> skip ;
```
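As a rough illustration of how these elements compose, here is a hand-rolled sketch (hypothetical `CommentScan` helper, not ANTLR-generated code) of the matching loop implied by `COMMENT : '#' ~[\r\n]* '\r'? '\n'`:

```java
// Hand-rolled sketch of COMMENT : '#' ~[\r\n]* '\r'? '\n'
// Returns the end index of the match starting at pos, or -1 if no match.
class CommentScan {
    static int matchComment(String input, int pos) {
        int i = pos;
        if (i >= input.length() || input.charAt(i) != '#') return -1; // '#'
        i++;
        while (i < input.length()
                && input.charAt(i) != '\r' && input.charAt(i) != '\n') {
            i++; // ~[\r\n]* : any character except carriage return or newline
        }
        if (i < input.length() && input.charAt(i) == '\r') i++;       // '\r'?
        if (i < input.length() && input.charAt(i) == '\n') return i + 1; // '\n'
        return -1; // no terminating newline: the rule fails to match
    }
}
```

A real ANTLR lexer compiles the rule into its ATN instead of looping like this, and the `-> skip` command then discards the matched text.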
+ +Just as with parser rules, lexer rules allow subrules in parentheses and EBNF operators: `?`, `*`, `+`. The `COMMENT` rule illustrates the `*` and `?` operators. A common use of `+` is `[0-9]+` to match integers. Lexer subrules can also use the nongreedy `?` suffix on those EBNF operators. + +## Recursive Lexer Rules + +ANTLR lexer rules can be recursive, unlike most lexical grammar tools. This comes in really handy when you want to match nested tokens like nested action blocks: `{...{...}...}`. + +``` +lexer grammar Recur; + +ACTION : '{' ( ACTION | ~[{}] )* '}' ; + +WS : [ \r\t\n]+ -> skip ; +``` + +## Redundant String Literals + +Be careful that you don’t specify the same string literal on the right-hand side of multiple lexer rules. Such literals are ambiguous and could match multiple token types. ANTLR makes this literal unavailable to the parser. The same is true for rules across modes. For example, the following lexer grammar defines two tokens with the same character sequence: + +``` +lexer grammar L; +AND : '&' ; +mode STR; +MASK : '&' ; +``` + +A parser grammar cannot reference literal ’&’, but it can reference the name of the tokens: + +``` +parser grammar P; +options { tokenVocab=L; } +a : '&' // results in a tool error: no such token + AND // no problem + MASK // no problem + ; +``` + +Here’s a build and test sequence: + +```bash +$ antlr4 L.g4 # yields L.tokens file needed by tokenVocab option in P.g4 +$ antlr4 P.g4 +error(126): P.g4:3:4: cannot create implicit token for string literal '&' in non-combined grammar +``` + +## Lexer Rule Actions + +An ANTLR lexer creates a Token object after matching a lexical rule. Each request for a token starts in `Lexer.nextToken`, which calls `emit` once it has identified a token. `emit` collects information from the current state of the lexer to build the token. It accesses fields `_type`, `_text`, `_channel`, `_tokenStartCharIndex`, `_tokenStartLine`, and `_tokenStartCharPositionInLine`. 
You can set the state of these with the various setter methods such as `setType`. For example, the following rule turns `enum` into an identifier if `enumIsKeyword` is false. + +``` +ENUM : 'enum' {if (!enumIsKeyword) setType(Identifier);} ; +``` + +ANTLR does no special `$x` attribute translations in lexer actions (unlike v3). + +There can be at most a single action for a lexical rule, regardless of how many alternatives there are in that rule. + +## Lexer Commands + +To avoid tying a grammar to a particular target language, ANTLR supports lexer commands. Unlike arbitrary embedded actions, these commands follow specific syntax and are limited to a few common commands. Lexer commands appear at the end of the outermost alternative of a lexer rule definition. Like arbitrary actions, there can only be one per token rule. A lexer command consists of the `->` operator followed by one or more command names that can optionally take parameters: + +``` +TokenName : «alternative» -> command-name +TokenName : «alternative» -> command-name («identifier or integer») +``` + +An alternative can have more than one command separated by commas. Here are the valid command names: + +* skip +* more +* popMode +* mode( x ) +* pushMode( x ) +* type( x ) +* channel( x ) + +See the book source code for usage, some examples of which are shown here: + +### skip + +A 'skip' command tells the lexer to get another token and throw out the current text. + +``` +ID : [a-zA-Z]+ ; // match identifiers +INT : [0-9]+ ; // match integers +NEWLINE:'\r'? '\n' ; // return newlines to parser (is end-statement signal) +WS : [ \t]+ -> skip ; // toss out whitespace +``` + +### mode(), pushMode(), popMode, and more + +The mode commands alter the mode stack and hence the mode of the lexer. The 'more' command forces the lexer to get another token but without throwing out the current text. The token type will be that of the "final" rule matched (i.e., the one without a more or skip command). 
```
// Default "mode": Everything OUTSIDE of a tag
COMMENT : '<!--' .*? '-->' ;
CDATA   : '<![CDATA[' .*? ']]>' ;
OPEN        : '<' -> pushMode(INSIDE) ;
 ...
XMLDeclOpen : '<?xml' S -> pushMode(INSIDE) ;
SPECIAL_OPEN: '<?' Name -> more, pushMode(PROC_INSTR) ;
// ----------------- Everything INSIDE of a tag ---------------------
mode INSIDE;
CLOSE        : '>' -> popMode ;
SPECIAL_CLOSE: '?>' -> popMode ; // close <?xml...?>
SLASH_CLOSE  : '/>' -> popMode ;
```

Also check out:

```
lexer grammar Strings;
LQUOTE : '"' -> more, mode(STR) ;
WS : [ \r\t\n]+ -> skip ;
mode STR;
STRING : '"' -> mode(DEFAULT_MODE) ; // token we want parser to see
TEXT : . -> more ; // collect more text for string
```

Popping the bottom layer of a mode stack will result in an exception. Switching modes with `mode` changes the current stack top. More than one `more` is the same as just one, and the position does not matter.

### type()

```
lexer grammar SetType;
tokens { STRING }
DOUBLE : '"' .*? '"' -> type(STRING) ;
SINGLE : '\'' .*? '\'' -> type(STRING) ;
WS : [ \r\t\n]+ -> skip ;
```

For multiple 'type()' commands, only the rightmost has an effect.

### channel()

```
BLOCK_COMMENT
 : '/*' .*? '*/' -> channel(HIDDEN)
 ;
LINE_COMMENT
 : '//' ~[\r\n]* -> channel(HIDDEN)
 ;
...
// ----------
// Whitespace
//
// Characters and character constructs that are of no import
// to the parser and are used to make the grammar easier to read
// for humans.
//
WS : [ \t\r\n\f]+ -> channel(HIDDEN) ;
```

As of 4.5, you can also define channel names like enumerations with the following construct above the lexer rules:

```
channels { WSCHANNEL, MYHIDDEN }
```

diff --git a/doc/lexicon.md b/doc/lexicon.md new file mode 100644 index 000000000..804a7456c --- /dev/null +++ b/doc/lexicon.md @@ -0,0 +1,110 @@

# Grammar Lexicon

The lexicon of ANTLR is familiar to most programmers because it follows the syntax of C and its derivatives, with some extensions for grammatical descriptions.
+ +## Comments + +There are single-line, multiline, and Javadoc-style comments: + +``` +/** This grammar is an example illustrating the three kinds + * of comments. + */ +grammar T; +/* a multi-line + comment +*/ + +/** This rule matches a declarator for my language */ +decl : ID ; // match a variable name +``` + +The Javadoc comments are hidden from the parser and are ignored at the moment. They are intended to be used only at the start of the grammar and any rule. + +## Identifiers + +Token names always start with a capital letter and so do lexer rules as defined by Java’s `Character.isUpperCase` method. Parser rule names always start with a lowercase letter (those that fail `Character.isUpperCase`). The initial character can be followed by uppercase and lowercase letters, digits, and underscores. Here are some sample names: + +``` +ID, LPAREN, RIGHT_CURLY // token names/rules +expr, simpleDeclarator, d2, header_file // rule names +``` + +Like Java, ANTLR accepts Unicode characters in ANTLR names: + + + +To support Unicode parser and lexer rule names, ANTLR uses the following rule: + +``` +ID : a=NameStartChar NameChar* + { + if ( Character.isUpperCase(getText().charAt(0)) ) setType(TOKEN_REF); + else setType(RULE_REF); + } + ; +``` + +Rule `NameChar` identifies the valid identifier characters: + +``` +fragment +NameChar + : NameStartChar + | '0'..'9' + | '_' + | '\u00B7' + | '\u0300'..'\u036F' + | '\u203F'..'\u2040' + ; +fragment +NameStartChar + : 'A'..'Z' | 'a'..'z' + | '\u00C0'..'\u00D6' + | '\u00D8'..'\u00F6' + | '\u00F8'..'\u02FF' + | '\u0370'..'\u037D' + | '\u037F'..'\u1FFF' + | '\u200C'..'\u200D' + | '\u2070'..'\u218F' + | '\u2C00'..'\u2FEF' + | '\u3001'..'\uD7FF' + | '\uF900'..'\uFDCF' + | '\uFDF0'..'\uFFFD' + ; +``` + +Rule `NameStartChar` is the list of characters that can start an identifier (rule, token, or label name): +These more or less correspond to `isJavaIdentifierPart` and `isJavaIdentifierStart` in Java’s Character class. 
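The upper/lowercase dispatch in the `ID` rule above boils down to one character test. A tiny sketch (hypothetical `NameKind` helper, not part of ANTLR's API) of the same decision:

```java
class NameKind {
    // Mirrors the ID rule's action: an uppercase first letter means a token
    // name (TOKEN_REF); anything else is a parser rule name (RULE_REF),
    // using the same Character.isUpperCase test ANTLR uses.
    static String classify(String name) {
        return Character.isUpperCase(name.charAt(0)) ? "TOKEN_REF" : "RULE_REF";
    }
}
```

Because `Character.isUpperCase` understands Unicode, a name like `Änderung` classifies as a token name just as `LPAREN` does.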
Make sure to use the `-encoding` option on the ANTLR tool if your grammar file is not in UTF-8 format, so that ANTLR reads characters properly.

## Literals

ANTLR does not distinguish between character and string literals as most languages do. All literal strings of one or more characters in length are enclosed in single quotes, such as `';'`, `'if'`, `'>='`, and `'\''` (which refers to the one-character string containing the single quote character). Literals never contain regular expressions.

Literals can contain Unicode escape sequences of the form `\uXXXX`, where XXXX is the hexadecimal Unicode character value. For example, `'\u00E8'` is the French letter with a grave accent: `'è'`. ANTLR also understands the usual special escape sequences: `'\n'` (newline), `'\r'` (carriage return), `'\t'` (tab), `'\b'` (backspace), and `'\f'` (form feed). You can use Unicode characters directly within literals or use the Unicode escape sequences:

```
grammar Foreign;
a : '外' ;
```

The recognizers that ANTLR generates assume a character vocabulary containing all Unicode characters. The input file encoding assumed by the runtime library depends on the target language. For the Java target, the runtime library assumes files are in UTF-8. Using the constructors, you can specify a different encoding. See, for example, ANTLR’s `ANTLRFileStream`.

## Actions

Actions are code blocks written in the target language. You can use actions in a number of places within a grammar, but the syntax is always the same: arbitrary text surrounded by curly braces. You don’t need to escape a closing curly character if it’s in a string or comment: `"}"` or `/*}*/`. If the curlies are balanced, you also don’t need to escape }: `{...}`. Otherwise, escape extra curlies with a backslash: `\{` or `\}`. The action text should conform to the target language as specified with the `language` option.
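The "balanced curlies need no escaping" rule can be pictured with a simple depth counter. This sketch (hypothetical `ActionBraces` helper; the real tool also ignores braces inside strings and comments, which is omitted here for brevity) checks whether an action's braces balance:

```java
class ActionBraces {
    // Returns true when every '{' has a matching '}' and no '}' closes early.
    // ANTLR additionally skips braces inside string literals and comments;
    // this sketch leaves out that refinement.
    static boolean balanced(String action) {
        int depth = 0;
        for (int i = 0; i < action.length(); i++) {
            char c = action.charAt(i);
            if (c == '{') depth++;
            else if (c == '}' && --depth < 0) return false;
        }
        return depth == 0;
    }
}
```

An action like `{ if (x) { f(); } }` balances and needs no escapes; a lone `}` does not, which is why an unbalanced curly must be escaped as `\}`.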
Embedded code can appear in: `@header` and `@members` named actions, parser and lexer rules, exception catching specifications, attribute sections for parser rules (return values, arguments, and locals), and some rule element options (currently predicates).

The only interpretation ANTLR does inside actions relates to grammar attributes; see [Token Attributes](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) and Chapter 10, [Attributes and Actions](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference). Actions embedded within lexer rules are emitted without any interpretation or translation into generated lexers.

## Keywords

Here’s a list of the reserved words in ANTLR grammars:

```
import, fragment, lexer, parser, grammar, returns,
locals, throws, catch, finally, mode, options, tokens
```

Also, although it is not a keyword, do not use the word `rule` as a rule name. Further, do not use any keyword of the target language as a token, label, or rule name. For example, rule `if` would result in a generated function called `if`. That would obviously not compile.

diff --git a/doc/listeners.md b/doc/listeners.md new file mode 100644 index 000000000..c3bcad9c1 --- /dev/null +++ b/doc/listeners.md @@ -0,0 +1,38 @@

# Parse Tree Listeners

*Partially taken from the publicly visible [excerpt from the ANTLR 4 book](http://media.pragprog.com/titles/tpantlr2/picture.pdf)*

By default, ANTLR-generated parsers build a data structure called a parse tree or syntax tree that records how the parser recognized the structure of the input sentence and its component phrases.

The interior nodes of the parse tree are phrase names that group and identify their children. The root node is the most abstract phrase name, in this case `stat` (short for statement). The leaves of a parse tree are always the input tokens. Parse trees sit between a language recognizer and an interpreter or translator implementation.
They are extremely effective data structures because they contain all of the input and complete knowledge of how the parser grouped the symbols into phrases. Better yet, they are easy to understand, and the parser generates them automatically (unless you turn them off with `parser.setBuildParseTree(false)`).

Because we specify phrase structure with a set of rules, parse tree subtree roots correspond to grammar rule names. ANTLR has a ParseTreeWalker that knows how to walk these parse trees and trigger events in listener implementation objects that you can create. The ANTLR tool also generates listener interfaces for you, unless you turn that off with a command-line option. You can also have it generate visitors. For example, from a Java.g4 grammar, ANTLR generates:

```java
public interface JavaListener extends ParseTreeListener {
    void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx);
    void exitClassDeclaration(JavaParser.ClassDeclarationContext ctx);
    void enterMethodDeclaration(JavaParser.MethodDeclarationContext ctx);
    ...
}
```

where there is an enter and exit method for each rule in the parser grammar. ANTLR also generates a base listener with empty implementations of all listener interface methods, in this case called JavaBaseListener. You can build your listener by subclassing this base and overriding the methods of interest.
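The enter/exit pattern is easy to see in miniature. Below is a self-contained sketch (hypothetical `Node`, `MiniListener`, and `MiniWalker` names, not the ANTLR runtime API) of how a depth-first walker fires listener methods:

```java
import java.util.ArrayList;
import java.util.List;

// A toy parse tree node labeled with a rule name.
class Node {
    final String rule;
    final List<Node> children = new ArrayList<>();
    Node(String rule, Node... kids) {
        this.rule = rule;
        for (Node k : kids) children.add(k);
    }
}

interface MiniListener {
    void enter(Node n);
    void exit(Node n);
}

// Analogue of a generated base listener: empty default implementations.
class MiniBaseListener implements MiniListener {
    public void enter(Node n) { }
    public void exit(Node n) { }
}

class MiniWalker {
    // Depth-first walk: enter the node, walk its children, then exit it.
    static void walk(MiniListener listener, Node n) {
        listener.enter(n);
        for (Node child : n.children) walk(listener, child);
        listener.exit(n);
    }
}

// Analogue of your listener subclass: override only what you care about.
class Trace extends MiniBaseListener {
    final List<String> events = new ArrayList<>();
    @Override public void enter(Node n) { events.add("enter " + n.rule); }
    @Override public void exit(Node n)  { events.add("exit " + n.rule); }
}
```

Walking a toy tree `stat(expr(INT))` fires enter events on the way down and exit events on the way back up, the same ordering ParseTreeWalker uses for the generated enter/exit rule methods.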
Assuming you've created a listener object called `MyListener`, here is how to call the Java parser and walk the parse tree:

```java
JavaLexer lexer = new JavaLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
JavaParser.CompilationUnitContext tree = parser.compilationUnit(); // parse a compilationUnit

MyListener extractor = new MyListener(parser);
ParseTreeWalker.DEFAULT.walk(extractor, tree); // initiate walk of tree with listener in use of default walker
```

Listeners and visitors are great because they keep application-specific code out of grammars, making grammars easier to read and preventing them from getting entangled with a particular application.

See the book for more information on listeners and to learn how to use visitors. (The biggest difference between the listener and visitor mechanisms is that listener methods are called independently by an ANTLR-provided walker object, whereas visitor methods must walk their children with explicit visit calls. Forgetting to invoke visitor methods on a node’s children means those subtrees don’t get visited.)

diff --git a/doc/options.md b/doc/options.md new file mode 100644 index 000000000..7ce277551 --- /dev/null +++ b/doc/options.md @@ -0,0 +1,101 @@

# Options

There are a number of options that you can specify at the grammar and rule element level. (There are currently no rule options.) These change how ANTLR generates code from your grammar. The general syntax is:

```
options { name1=value1; ... nameN=valueN; } // ANTLR not target language syntax
```

where a value can be an identifier, a qualified identifier (for example, a.b.c), a string, a multi-line string in curly braces `{...}`, or an integer.

## Grammar Options

All grammars can use the following options. In combined grammars, all options except language pertain only to the generated parser.
Options may be set either within the grammar file using the options syntax (described above) or when invoking ANTLR on the command line, using the `-D` option. (see Section 15.9, [ANTLR Tool Command Line Options](tool-options.md).) The following examples demonstrate both mechanisms; note that `-D` overrides options within the grammar. + +* `superClass`. Set the superclass of the generated parser or lexer. For combined grammars, it sets the superclass of the parser. +``` +$ cat Hi.g4 +grammar Hi; +a : 'hi' ; +$ antlr4 -DsuperClass=XX Hi.g4 +$ grep 'public class' HiParser.java +public class HiParser extends XX { +$ grep 'public class' HiLexer.java +public class HiLexer extends Lexer { +``` +* `language` Generate code in the indicated language, if ANTLR is able to do so. Otherwise, you will see an error message like this: +``` +$ antlr4 -Dlanguage=C MyGrammar.g4 +error(31): ANTLR cannot generate C code as of version 4.0 +``` +* `tokenVocab` ANTLR assigns token type numbers to the tokens as it encounters them in a file. To use different token type values, such as with a separate lexer, use this option to have ANTLR pull in the tokens file. ANTLR generates a tokens file from each grammar. +``` +$ cat SomeLexer.g4 +lexer grammar SomeLexer; +ID : [a-z]+ ; +$ cat R.g4 +parser grammar R; +options {tokenVocab=SomeLexer;} +tokens {A,B,C} // normally, these would be token types 1, 2, 3 +a : ID ; +$ antlr4 SomeLexer.g4 +$ cat SomeLexer.tokens +ID=1 +$ antlr4 R.g4 +$ cat R.tokens +A=2 +B=3 +C=4 +ID=1 +``` +* `TokenLabelType` ANTLR normally uses type Token when it generates variables referencing tokens. If you have passed a TokenFactory to your parser and lexer so that they create custom tokens, you should set this option to your specific type. This ensures that the context objects know your type for fields and method return values. 
```
$ cat T2.g4
grammar T2;
options {TokenLabelType=MyToken;}
a : x=ID ;
$ antlr4 T2.g4
$ grep MyToken T2Parser.java
    public MyToken x;
```
* `contextSuperClass`. Specify the super class of parse tree internal nodes. Default is `ParserRuleContext`. Should ultimately derive from `RuleContext` at minimum.
Java target can use `contextSuperClass=org.antlr.v4.runtime.RuleContextWithAltNum` for convenience. It adds a backing field for `altNumber`, the alt matched for the associated rule node.

## Rule Options

There are currently no valid rule-level options, but the tool still supports the following syntax for future use:

```
rulename
options {...}
 : ...
 ;
```

## Rule Element Options

Token options have the form `T<name=value>`, as we saw in Section 5.4, [Dealing with Precedence, Left Recursion, and Associativity](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference). The only token option is `assoc`, and it accepts values `left` and `right`. Here’s a sample grammar with a left-recursive expression rule that specifies a token option on the `^` exponent operator token:

```
grammar ExprLR;

expr : expr '^'<assoc=right> expr
     | expr '*' expr // match subexpressions joined with '*' operator
     | expr '+' expr // match subexpressions joined with '+' operator
     | INT // matches simple integer atom
     ;

INT : '0'..'9'+ ;
WS : [ \n]+ -> skip ;
```

Semantic predicates also accept an option, per [Catching failed semantic predicates](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference). The only valid option is the `fail` option, which takes either a string literal in double-quotes or an action that evaluates to a string. The string literal or string result from the action should be the message to emit upon predicate failure.

```
ints[int max]
 locals [int i=1]
 : INT ( ',' {$i++;} {$i<=$max}?<fail={"exceeded max "+$max}>
INT )*
 ;
```

The action can execute a function as well as compute a string when a predicate fails: `{...}?<fail={...}>`

diff --git a/doc/parser-rules.md b/doc/parser-rules.md new file mode 100644 index 000000000..2fa7d6b5c --- /dev/null +++ b/doc/parser-rules.md @@ -0,0 +1,480 @@

# Parser Rules

Parsers consist of a set of parser rules, either in a parser grammar or a combined grammar. A Java application launches a parser by invoking the rule function, generated by ANTLR, associated with the desired start rule. The most basic rule is just a rule name followed by a single alternative terminated with a semicolon:

```
/** Javadoc comment can precede rule */
retstat : 'return' expr ';' ;
```

Rules can also have alternatives separated by the | operator:

```
stat: retstat
    | 'break' ';'
    | 'continue' ';'
    ;
```

Alternatives are either a list of rule elements or empty. For example, here’s a rule with an empty alternative that makes the entire rule optional:

```
superClass
    : 'extends' ID
    | // empty means other alternative(s) are optional
    ;
```

## Alternative Labels

As we saw in Section 7.4, Labeling Rule Alternatives for Precise Event Methods, we can get more precise parse-tree listener events by labeling the outermost alternatives of a rule using the # operator. All alternatives within a rule must be labeled, or none of them. Here are two rules with labeled alternatives:

```
grammar A;
stat: 'return' e ';' # Return
    | 'break' ';' # Break
    ;
e : e '*' e # Mult
  | e '+' e # Add
  | INT # Int
  ;
```

Alternative labels do not have to be at the end of the line, and there does not have to be a space after the # symbol.
ANTLR generates a rule context class definition for each label.
For example, here is the listener that ANTLR generates: + +```java +public interface AListener extends ParseTreeListener { + void enterReturn(AParser.ReturnContext ctx); + void exitReturn(AParser.ReturnContext ctx); + void enterBreak(AParser.BreakContext ctx); + void exitBreak(AParser.BreakContext ctx); + void enterMult(AParser.MultContext ctx); + void exitMult(AParser.MultContext ctx); + void enterAdd(AParser.AddContext ctx); + void exitAdd(AParser.AddContext ctx); + void enterInt(AParser.IntContext ctx); + void exitInt(AParser.IntContext ctx); +} +``` + +There are enter and exit methods associated with each labeled alternative. The parameters to those methods are specific to alternatives. + +You can reuse the same label on multiple alternatives to indicate that the parse tree walker should trigger the same event for those alternatives. For example, here’s a variation on rule e from grammar A above: + +``` + e : e '*' e # BinaryOp + | e '+' e # BinaryOp + | INT # Int + ; +``` + +ANTLR would generate the following listener methods for e: + +```java + void enterBinaryOp(AParser.BinaryOpContext ctx); + void exitBinaryOp(AParser.BinaryOpContext ctx); + void enterInt(AParser.IntContext ctx); + void exitInt(AParser.IntContext ctx); + ``` + +ANTLR gives errors if an alternative name conflicts with a rule name. Here’s another rewrite of rule e where two +alternative labels conflict with rule names: + +``` + e : e '*' e # e + | e '+' e # Stat + | INT # Int + ; +``` + +The context objects generated from rule names and labels get capitalized and so label Stat conflicts with rule stat: + +```bash + $ antlr4 A.g4 + error(124): A.g4:5:23: rule alt label e conflicts with rule e + error(124): A.g4:6:23: rule alt label Stat conflicts with rule stat + warning(125): A.g4:2:13: implicit definition of token INT in parser +``` + +## Rule Context Objects + +ANTLR generates methods to access the rule context objects (parse tree nodes) associated with each rule reference. 
For rules with a single rule reference, ANTLR generates a method with no arguments. Consider the following rule.

```
inc : e '++' ;
```

ANTLR generates this context class:

```java
public static class IncContext extends ParserRuleContext {
    public EContext e() { ... } // return context object associated with e
    ...
}
```

ANTLR also provides support for accessing context objects when there is more than a single reference to a rule:

```
field : e '.' e ;
```

ANTLR generates a method with an index to access the ith element as well as a method to get the context for all references to that rule:

```java
public static class FieldContext extends ParserRuleContext {
    public EContext e(int i) { ... } // get ith e context
    public List<EContext> e() { ... } // return ALL e contexts
    ...
}
```

If we had another rule, s, that references field, an embedded action could access the list of e rule matches performed by field:

```
s : field
    {
    List<EContext> x = $field.ctx.e();
    ...
    }
  ;
```

A listener or visitor could do the same thing. Given a pointer to a FieldContext object, f, f.e() would return `List<EContext>`.

## Rule Element Labels

You can label rule elements using the `=` operator to add fields to the rule context objects:

```
stat: 'return' value=e ';' # Return
    | 'break' ';'          # Break
    ;
```

Here value is the label for the return value of rule e, which is defined elsewhere.
Labels become fields in the appropriate parse tree node class. In this case, label value becomes a field in ReturnContext because of the Return alternative label:

```java
public static class ReturnContext extends StatContext {
    public EContext value;
    ...
}
```

It's often handy to track a number of tokens, which you can do with the `+=` "list label" operator.
For example, the following rule creates a list of the Token objects matched for a simple array construct:

```
array : '{' el+=INT (',' el+=INT)* '}' ;
```

ANTLR generates a List field in the appropriate rule context class:

```
public static class ArrayContext extends ParserRuleContext {
    public List<Token> el = new ArrayList<Token>();
    ...
}
```

These list labels also work for rule references:

```
elist : exprs+=e (',' exprs+=e)* ;
```

ANTLR generates a field holding the list of context objects:

```
public static class ElistContext extends ParserRuleContext {
    public List<EContext> exprs = new ArrayList<EContext>();
    ...
}
```

## Rule Elements

Rule elements specify what the parser should do at a given moment, just like statements in a programming language. The elements can be a rule reference, a token reference, or a string literal, such as `expression`, `ID`, or `'return'`. Here's a complete list of the rule elements (we'll look at actions and predicates in more detail later):
<table>
<tr><th>Syntax</th><th>Description</th></tr>
<tr><td><code>T</code></td><td>
Match token T at the current input position. Tokens always begin with a capital letter.
</td></tr>
<tr><td><code>'literal'</code></td><td>
Match the string literal at the current input position. A string literal is simply a token with a fixed string.
</td></tr>
<tr><td><code>r</code></td><td>
Match rule r at the current input position, which amounts to invoking the rule just like a function call. Parser rule names always begin with a lowercase letter.
</td></tr>
<tr><td><code>r [«args»]</code></td><td>
Match rule r at the current input position, passing in a list of arguments just like a function call. The arguments inside the square brackets are in the syntax of the target language and are usually a comma-separated list of expressions.
</td></tr>
<tr><td><code>{«action»}</code></td><td>
Execute an action immediately after the preceding alternative element and immediately before the following alternative element. The action conforms to the syntax of the target language. ANTLR copies the action code to the generated class verbatim, except for substituting attribute and token references such as $x and $x.y.
</td></tr>
<tr><td><code>{«p»}?</code></td><td>
Evaluate semantic predicate «p». Do not continue parsing past a predicate if «p» evaluates to false at runtime. Predicates encountered during prediction, when ANTLR distinguishes between alternatives, enable or disable the alternative(s) surrounding the predicate(s).
</td></tr>
<tr><td><code>.</code></td><td>
Match any single token except for the end of file token. The "dot" operator is called the wildcard.
</td></tr>
</table>

When you want to match everything but a particular token or set of tokens, use the `~` "not" operator. This operator is rarely used in the parser but is available. `~INT` matches any token except the `INT` token. `~','` matches any token except the comma. `~(INT|ID)` matches any token except an INT or an ID.

Token, string literal, and semantic predicate rule elements can take options. See Rule Element Options.

## Subrules

A rule can contain alternative blocks called subrules (as allowed in Extended BNF Notation: EBNF). A subrule is like a rule that lacks a name and is enclosed in parentheses. Subrules can have one or more alternatives inside the parentheses. Subrules cannot define attributes with `locals` and `returns` like rules can. There are four kinds of subrules (x, y, and z represent grammar fragments):
<table>
<tr><th>Syntax</th><th>Description</th></tr>
<tr><td><code>(x|y|z)</code></td><td>
Match any alternative within the subrule exactly once. Example:
<pre>
returnType : (type | 'void') ;
</pre>
</td></tr>
<tr><td><code>(x|y|z)?</code></td><td>
Match nothing or any alternative within the subrule. Example:
<pre>
classDeclaration
    : 'class' ID (typeParameters)? ('extends' type)?
      ('implements' typeList)?
      classBody
    ;
</pre>
</td></tr>
<tr><td><code>(x|y|z)*</code></td><td>
Match an alternative within the subrule zero or more times. Example:
<pre>
annotationName : ID ('.' ID)* ;
</pre>
</td></tr>
<tr><td><code>(x|y|z)+</code></td><td>
Match an alternative within the subrule one or more times. Example:
<pre>
annotations : (annotation)+ ;
</pre>
</td></tr>
</table>
You can suffix the `?`, `*`, and `+` subrule operators with the nongreedy operator, which is also a question mark: `??`, `*?`, and `+?`. See Section 15.6, Wildcard Operator and Nongreedy Subrules.

As a shorthand, you can omit the parentheses for subrules composed of a single alternative with a single rule element reference. For example, `annotation+` is the same as `(annotation)+` and `ID+` is the same as `(ID)+`. Labels also work with the shorthand. `ids+=INT+` makes a list of `INT` token objects.

## Catching Exceptions

When a syntax error occurs within a rule, ANTLR catches the exception, reports the error, attempts to recover (possibly by consuming more tokens), and then returns from the rule. Every rule is wrapped in a `try/catch/finally` statement:

```
void r() throws RecognitionException {
    try {
        rule-body
    }
    catch (RecognitionException re) {
        _errHandler.reportError(this, re);
        _errHandler.recover(this, re);
    }
    finally {
        exitRule();
    }
}
```

In Section 9.5, Altering ANTLR's Error Handling Strategy, we saw how to use a strategy object to alter ANTLR's error handling. Replacing the strategy changes the strategy for all rules, however. To alter the exception handling for a single rule, specify an exception after the rule definition:

```
r : ...
  ;
  catch[RecognitionException e] { throw e; }
```

That example shows how to avoid default error reporting and recovery. Rule `r` rethrows the exception, which is useful when it makes more sense for a higher-level rule to report the error. Specifying any exception clause prevents ANTLR from generating a clause to handle `RecognitionException`.

You can specify other exceptions as well:

```
r : ...
  ;
  catch[FailedPredicateException fpe] { ... }
  catch[RecognitionException e] { ... }
```

The code snippets inside curly braces and the exception "argument" actions must be written in the target language; Java, in this case.
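The generated wrapper above is Java, but the control flow is language-neutral. Here is a rough Python sketch (hypothetical names, not ANTLR's generated code) contrasting the default report-and-recover wrapper with a rule whose custom `catch` clause rethrows so the invoking rule sees the error:

```python
class RecognitionException(Exception):
    """Stand-in for ANTLR's RecognitionException (illustration only)."""

def rule_with_default_handling(body, err_handler):
    # Default wrapper: report the error, recover, then fall through to cleanup.
    try:
        body()
    except RecognitionException as re:
        err_handler.report_error(re)
        err_handler.recover(re)
    finally:
        pass  # exitRule()-style cleanup runs here in generated code

def rule_with_custom_catch(body):
    # Models "catch[RecognitionException e] { throw e; }": no reporting,
    # no recovery; the exception propagates to the invoking rule.
    try:
        body()
    except RecognitionException as e:
        raise e
    finally:
        pass  # cleanup still runs before the exception propagates
```

With the default wrapper the caller never sees the exception; with the custom rethrow it does, which is the behavior the `catch[...] { throw e; }` idiom buys you.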
+When you need to execute an action even if an exception occurs, put it into the `finally` clause: + +``` +r : ... + ; + // catch blocks go first + finally { System.out.println("exit rule r"); } +``` + +The finally clause executes right before the rule triggers `exitRule` before returning. If you want to execute an action after the rule finishes matching the alternatives but before it does its cleanup work, use an `after` action. + +Here’s a complete list of exceptions: + + + + + + + + + + + + + + + + + + + + +
<table>
<tr><th>Exception name</th><th>Description</th></tr>
<tr><td><code>RecognitionException</code></td><td>
The superclass of all exceptions thrown by an ANTLR-generated recognizer. It's a subclass of RuntimeException to avoid the hassles of checked exceptions. This exception records where the recognizer (lexer or parser) was in the input, where it was in the ATN (the internal graph data structure representing the grammar), the rule invocation stack, and what kind of problem occurred.
</td></tr>
<tr><td><code>NoViableAltException</code></td><td>
Indicates that the parser could not decide which of two or more paths to take by looking at the remaining input. This exception tracks the starting token of the offending input and also knows where the parser was in the various paths when the error occurred.
</td></tr>
<tr><td><code>LexerNoViableAltException</code></td><td>
The equivalent of NoViableAltException but for lexers only.
</td></tr>
<tr><td><code>InputMismatchException</code></td><td>
The current input token does not match what the parser expected.
</td></tr>
<tr><td><code>FailedPredicateException</code></td><td>
A semantic predicate that evaluates to false during prediction renders the surrounding alternative nonviable. Prediction occurs when a rule is predicting which alternative to take. If all viable paths disappear, the parser throws NoViableAltException. The parser throws this exception when a semantic predicate evaluates to false outside of prediction, during the normal parsing process of matching tokens and calling rules.
</td></tr>
</table>
## Rule Attribute Definitions

There are a number of action-related syntax elements associated with rules to be aware of. Rules can have arguments, return values, and local variables just like functions in a programming language. (Rules can have actions embedded among the rule elements, as we'll see in Section 15.4, Actions and Attributes.) ANTLR collects all of the variables you define and stores them in the rule context object. These variables are usually called attributes. Here's the general syntax showing all possible attribute definition locations:

```
rulename[args] returns [retvals] locals [localvars] : ... ;
```

The attributes defined within those `[...]` can be used like any other variable. Here is a sample rule that copies parameters to return values:

```
// Return the argument plus the integer value of the INT token
add[int x] returns [int result] : '+=' INT {$result = $x + $INT.int;} ;
```

As with the grammar level, you can specify rule-level named actions. For rules, the valid names are `init` and `after`. As the names imply, parsers execute `init` actions immediately before trying to match the associated rule and execute `after` actions immediately after matching the rule. ANTLR `after` actions do not execute as part of the `finally` code block of the generated rule function. Use the ANTLR `finally` action to place code in the generated rule function's `finally` code block.

The actions come after any argument, return value, or local attribute definition actions. The `row` rule preamble from Section 10.2, Accessing Token and Rule Attributes illustrates the syntax nicely:

actions/CSV.g4

```
/** Derived from rule "row : field (',' field)* '\r'? '\n' ;" */
row[String[] columns]
    returns [Map<String,String> values]
    locals [int col=0]
    @init {
    $values = new HashMap<String,String>();
    }
    @after {
    if ($values!=null && $values.size()>0) {
        System.out.println("values = "+$values);
    }
    }
    : ...
    ;
```

Rule row takes argument columns, returns values, and defines local variable col. The "actions" in square brackets are copied directly into the generated code:

```java
public class CSVParser extends Parser {
    ...
    public static class RowContext extends ParserRuleContext {
        public String [] columns;
        public Map<String,String> values;
        public int col=0;
        ...
    }
    ...
}
```

The generated rule functions also specify the rule arguments as function arguments, but they are quickly copied into the local RowContext object:

```java
public class CSVParser extends Parser {
    ...
    public final RowContext row(String [] columns) throws RecognitionException {
        RowContext _localctx = new RowContext(_ctx, 4, columns);
        enterRule(_localctx, RULE_row);
        ...
    }
    ...
}
```

ANTLR tracks nested `[...]` within the action so that `String[] columns` is parsed properly. It also tracks angle brackets so that commas within generic type parameters do not signify the start of another attribute. `Map<String,String> values` is one attribute definition.

There can be multiple attributes in each action, even for return values. Use a comma to separate attributes within the same action:

```
a[Map<String,String> x, int y] : ... ;
```

ANTLR interprets that action to define two arguments, x and y:

```java
public final AContext a(Map<String,String> x, int y)
    throws RecognitionException
{
    AContext _localctx = new AContext(_ctx, 0, x, y);
    enterRule(_localctx, RULE_a);
    ...
}
```

## Start Rules and EOF

A start rule is the rule engaged first by the parser; it's the rule function called by the language application. For example, a language application that parsed Java code might call `parser.compilationUnit()` for a `JavaParser` object called `parser`. Any rule in the grammar can act as a start rule.

Start rules don't necessarily consume all of the input. They consume only as much input as needed to match an alternative of the rule.
For example, consider the following rule that matches one, two, or three tokens, depending on the input. + +``` +s : ID + | ID '+' + | ID '+' INT + ; +``` + +Upon `a+3`, rule `s` matches the third alternative. Upon `a+b`, it matches the second alternative and ignores the final `b` token. Upon `a b`, it matches the first alternative, ignoring the `b` token. The parser does not consume the complete input in the latter two cases because rule `s` doesn’t explicitly say that end of file must occur after matching an alternative of the rule. + +This default functionality is very useful for building things like IDEs. Imagine the IDE wanting to parse a method somewhere in the middle of a big Java file. Calling rule `methodDeclaration` should try to match just a method and ignore whatever comes next. + +On the other hand, rules that describe entire input files should reference special predefined-token `EOF`. If they don’t, you might scratch your head for a while wondering why the start rule doesn’t report errors for any input no matter what you give it. Here’s a rule that’s part of a grammar for reading configuration files: + +``` +config : element*; // can "match" even with invalid input. +``` + +Invalid input would cause `config` to return immediately without matching any input and without reporting an error. Here’s the proper specification: + +``` +file : element* EOF; // don't stop early. must match all input +``` \ No newline at end of file diff --git a/doc/predicates.md b/doc/predicates.md new file mode 100644 index 000000000..09998c425 --- /dev/null +++ b/doc/predicates.md @@ -0,0 +1,164 @@ +# Semantic Predicates + +Semantic predicates, `{...}?`, are boolean expressions written in the target language that indicate the validity of continuing the parse along the path "guarded" by the predicate. 
Predicates can appear anywhere within a parser rule just like actions can, but only those appearing on the left edge of alternatives can affect prediction (choosing between alternatives). This section provides all of the fine print regarding the use of semantic predicates in parser and lexer rules. Let's start out by digging deeper into how the parser incorporates predicates into parsing decisions. + +## Making Predicated Parsing Decisions + +ANTLR's general decision-making strategy is to find all viable alternatives and then ignore the alternatives guarded with predicates that currently evaluate to false. (A viable alternative is one that matches the current input.) If more than one viable alternative remains, the parser chooses the alternative specified first in the decision. + +Consider a variant of C++ where array references also use parentheses instead of square brackets. If we only predicate one of the alternatives, we still have an ambiguous decision in expr: + +``` +expr: ID '(' expr ')' // array reference (ANTLR picks this one) + | {istype()}? ID '(' expr ')' // ctor-style typecast + | ID '(' expr ')' // function call + ; +``` + +In this case, all three alternatives are viable for input `x(i)`. When `x` is not a type name, the predicate evaluates to false, leaving only the first and third alternatives as possible matches for expr. ANTLR automatically chooses the first alternative matching the array reference to resolve the ambiguity. Leaving ANTLR with more than one viable alternative because of too few predicates is probably not a good idea. It's best to cover n viable alternatives with at least n-1 predicates. In other words, don't build rules like expr with too few predicates. + +Sometimes, the parser finds multiple visible predicates associated with a single choice. No worries. ANTLR just combines the predicates with appropriate logical operators to conjure up a single meta-predicate on-the-fly. 
For example, the decision in rule `stat` joins the predicates from both alternatives of expr with the `||` operator to guard the second stat alternative:

```
stat: decl | expr ;
decl: ID ID ;
expr: {istype()}? ID '(' expr ')' // ctor-style typecast
    | {isfunc()}? ID '(' expr ')' // function call
    ;
```

The parser will only predict an expr from stat when `istype()||isfunc()` evaluates to true. This makes sense because the parser should only choose to match an expression if the upcoming `ID` is a type name or function name. It wouldn't make sense to just test one of the predicates in this case. Note that, when the parser gets to `expr` itself, the parsing decision tests the predicates individually, one for each alternative.

If multiple predicates occur in a sequence, the parser joins them with the `&&` operator. For example, consider changing `stat` to include a predicate before the call to `expr`:

```
stat: decl | {java5}? expr ;
```

Now, the parser would only predict the second alternative if `java5&&(istype()||isfunc())` evaluated to true.

Turning to the code inside the predicates themselves now, keep in mind the following guidelines.

Even when the parser isn't making decisions, predicates can deactivate alternatives, causing rules to fail. This happens when a rule has only a single alternative. There is no choice to make, but ANTLR evaluates the predicate as part of the normal parsing process, just like it does for actions. That means that the following rule always fails to match.

```
prog: {false}? 'return' INT ; // throws FailedPredicateException
```

ANTLR converts `{false}?` in the grammar to a conditional in the generated parser:

```
if ( !false ) throw new FailedPredicateException(...);
```

So far, all of the predicates we've seen have been visible and available to the prediction process, but that's not always the case.
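The decision logic from this section can be condensed into a toy model. The following Python sketch (illustrative only; not ANTLR's actual adaptive LL(*) implementation) filters viable alternatives by their guarding predicates and resolves any remaining ambiguity in favor of the first alternative:

```python
def predict(alternatives, true_preds):
    """Model of predicated prediction. alternatives is a list of
    (name, predicates) pairs in grammar order, where predicates is the
    list of predicate names guarding that alternative (joined with &&).
    An unguarded alternative is always viable."""
    viable = [name for name, preds in alternatives
              if all(p in true_preds for p in preds)]
    if not viable:
        raise RuntimeError("NoViableAltException")
    return viable[0]  # ambiguity resolved in favor of the first alternative
```

For the array-reference/typecast/function-call example above, `predict([("array", []), ("typecast", ["istype"]), ("func", [])], set())` returns `"array"`: the false `istype` predicate prunes the typecast alternative, and the tie between the two remaining viable alternatives goes to the one specified first.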
+ +## Finding Visible Predicates + +The parser will not evaluate predicates during prediction that occur after an action or token reference. Let's think about the relationship between actions and predicates first. + +ANTLR has no idea what's inside the raw code of an action and so it must assume any predicate could depend on side effects of that action. Imagine an action that computed value `x` and a predicate that tested `x`. Evaluating that predicate before the action executed to create `x` would violate the implied order of operations within the grammar. + +More importantly, the parser can't execute actions until it has decided which alternative to match. That's because actions have side effects and we can't undo things like print statements. For example, in the following rule, the parser can't execute the action in front of the `{java5}?` predicate before committing to that alternative. + +``` +@members {boolean allowgoto=false;} +stat: {System.out.println("goto"); allowgoto=true;} {java5}? 'goto' ID ';' + | ... + ; +``` + +If we can't execute the action during prediction, we shouldn't evaluate the `{java5}?` predicate because it depends on that action. + +The prediction process also can't see through token references. Token references have the side effect of advancing the input one symbol. A predicate that tested the current input symbol would find itself out of sync if the parser shifted it over the token reference. For example, in the following grammar, the predicates expect `getCurrentToken` to return an `ID` token. + +``` +stat: '{' decl '}' + | '{' stat '}' + ; +decl: {istype(getCurrentToken().getText())}? ID ID ';' ; +expr: {isvar(getCurrentToken().getText())}? ID ; +``` + +The decision in stat can't test those predicates because, at the start of stat, the current token is a left curly. To preserve the semantics, ANTLR won't test the predicates in that decision. 
Visible predicates are those that prediction encounters before encountering an action or token. The prediction process ignores nonvisible predicates, treating them as if they don't exist.

In rare cases, the parser won't be able to use a predicate, even if it's visible to a particular decision. That brings us to our next fine print topic.

## Using Context-Dependent Predicates

A predicate that depends on a parameter or local variable of the surrounding rule is considered a context-dependent predicate. Clearly, we can only evaluate such predicates within the rules in which they're defined. For example, it makes no sense for the decision in prog below to test context-dependent predicate `{$i<=5}?`. That `$i` local variable is not even defined in `prog`.

```
prog: vec5
    | ...
    ;
vec5
locals [int i=1]
    : ( {$i<=5}? INT {$i++;} )* // match 5 INTs
    ;
```

ANTLR ignores context-dependent predicates that it can't evaluate in the proper context. Normally the proper context is simply the rule defining the predicate, but sometimes the parser can't even evaluate a context-dependent predicate from within the same rule! Detecting these cases is done on the fly at runtime during adaptive LL(*) prediction.

For example, prediction for the optional branch of the else subrule in stat below "falls off" the end of stat and continues looking for symbols in the invoking prog rule.

```
prog: stat+ ; // stat can follow stat
stat
locals [int i=0]
    : {$i==0}? 'if' expr 'then' stat {$i=5;} ('else' stat)?
    | 'break' ';'
    ;
```

The prediction process is trying to figure out what can follow an if statement other than an else clause. Since the input can have multiple stats in a row, the prediction for the optional branch of the else subrule reenters stat. This time, of course, it gets a new copy of `$i` with a value of 0, not 5. ANTLR ignores context-dependent predicate `{$i==0}?` because it knows that the parser isn't in the original stat call.
The predicate would test a different version of `$i`, so the parser can't evaluate it.

The fine print for predicates in the lexer more or less follows these same guidelines, except of course lexer rules can't have parameters and local variables. Let's look at all of the lexer-specific guidelines in the next section.

## Predicates in Lexer Rules

In parser rules, predicates must appear on the left edge of alternatives to aid in alternative prediction. Lexers, on the other hand, prefer predicates on the right edge of lexer rules because they choose rules after seeing a token's entire text. Predicates in lexer rules can technically be anywhere within the rule. Some positions might be more or less efficient than others; ANTLR makes no guarantees about the optimal spot. A predicate in a lexer rule might be executed multiple times even during a single token match. You can embed multiple predicates per lexer rule and they are evaluated as the lexer reaches them during matching.

Loosely speaking, the lexer's goal is to choose the rule that matches the most input characters. At each character, the lexer decides which rules are still viable. Eventually, only a single rule will still be viable. At that point, the lexer creates a token object according to the rule's token type and matched text.

Sometimes the lexer is faced with more than a single viable matching rule. For example, input enum would match an `ENUM` rule and an `ID` rule. If the next character after enum is a space, neither rule can continue. The lexer resolves the ambiguity by choosing the viable rule specified first in the grammar. That's why we have to place keyword rules before an identifier rule like this:

```
ENUM : 'enum' ;
ID   : [a-z]+ ;
```

If, on the other hand, the next character after input `enum` is a letter, then only `ID` is viable.

Predicates come into play by pruning the set of viable lexer rules.
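This pruning combines with the longest-match and rule-order resolution just described. Here is a toy Python model of that selection scheme (illustrative only; the real lexer uses an ATN, not regular expressions, and hypothetical helper names are used):

```python
import re

def choose_rule(rules, text):
    """rules: list of (name, regex, predicate) triples in grammar order.
    A rule is viable if its regex matches at the start of text and its
    predicate holds for the matched text. Pick the viable rule with the
    longest match; ties go to the rule listed first."""
    best = None
    for order, (name, pattern, pred) in enumerate(rules):
        m = re.match(pattern, text)
        if m is None or not pred(m.group(0)):
            continue  # no match, or predicate pruned this rule
        key = (-len(m.group(0)), order)  # longer match first, then order
        if best is None or key < best[0]:
            best = (key, name, m.group(0))
    return (best[1], best[2]) if best else None

rules = [
    ("ENUM", r"[a-z]+", lambda s: s == "enum"),  # like {getText().equals("enum")}?
    ("ID",   r"[a-z]+", lambda s: True),
]
```

With these rules, `"enum"` is tokenized as `ENUM` (both rules match four characters, so grammar order breaks the tie), while `"enums"` falls through to `ID` because the predicate prunes `ENUM`.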
When the lexer encounters a false predicate, it deactivates that rule just like parsers deactivate alternatives with false predicates.

Like parser predicates, lexer predicates can't depend on side effects from lexer actions. That's because actions can only execute after the lexer positively identifies the rule to match. Since predicates are part of the rule selection process, they can't rely on action side effects. Lexer actions must appear after predicates in lexer rules. As an example, here's another way to match enum as a keyword in the lexer:

```
ENUM: [a-z]+ {getText().equals("enum")}?
      {System.out.println("enum!");}
    ;
ID  : [a-z]+ {System.out.println("ID "+getText());} ;
```

The print action in `ENUM` appears last and executes only if the current input matches `[a-z]+` and the predicate is true. Let's build and test `Enum3` to see if it distinguishes between enum and an identifier:

```bash
$ antlr4 Enum3.g4
$ javac Enum3.java
$ grun Enum3 tokens
=> enum abc
=> EOF
<= enum!
   ID abc
```

That works great, but it's really just for instructional purposes. It's easier to understand and more efficient to match enum keywords with a simple rule like this:

```
ENUM : 'enum' ;
```

diff --git a/doc/python-target.md b/doc/python-target.md
new file mode 100644
index 000000000..f081cb099
--- /dev/null
+++ b/doc/python-target.md
@@ -0,0 +1,128 @@

# Python (2 and 3)

The examples from the ANTLR 4 book converted to Python are [here](https://github.com/jszheng/py3antlr4book).

There are 2 Python targets: `Python2` and `Python3`. This is because there is only limited compatibility between those 2 versions of the language. Please refer to the [Python documentation](https://wiki.python.org/moin/Python2orPython3) for full details.

## How do I create a Python lexer or parser?
This is pretty much the same as creating a Java lexer or parser, except you need to specify the language target, for example:

```
$ antlr4 -Dlanguage=Python2 MyGrammar.g4
```

or

```
$ antlr4 -Dlanguage=Python3 MyGrammar.g4
```

For a full list of antlr4 tool options, please visit the tool documentation page.

## Where can I get the runtime?

Once you've generated the lexer and/or parser code, you need to download the runtime. The Python runtimes are available from PyPI:

* https://pypi.python.org/pypi/antlr4-python2-runtime/
* https://pypi.python.org/pypi/antlr4-python3-runtime/

The runtimes are provided in the form of source code, so no additional installation is required.

We will not document here how to refer to the runtime from your Python project, since this would differ a lot depending on your project type and IDE.

## How do I run the generated lexer and/or parser?

Let's suppose that your grammar is named, as above, "MyGrammar". Let's suppose this parser comprises a rule named "StartRule". The tool will have generated for you the following files:

* MyGrammarLexer.py
* MyGrammarParser.py
* MyGrammarListener.py (if you have not activated the -no-listener option)
* MyGrammarVisitor.py (if you have activated the -visitor option)

(Developers used to Java/C# ANTLR will notice that there is no base listener or visitor generated; because Python has no support for interfaces, the generated listener and visitor are fully fledged classes.)

Now a fully functioning script might look like the following:

```python
import sys
from antlr4 import *
from MyGrammarLexer import MyGrammarLexer
from MyGrammarParser import MyGrammarParser

def main(argv):
    input = FileStream(argv[1])
    lexer = MyGrammarLexer(input)
    stream = CommonTokenStream(lexer)
    parser = MyGrammarParser(stream)
    tree = parser.StartRule()

if __name__ == '__main__':
    main(sys.argv)
```

This program will work.
But it won't be useful unless you do one of the following:

* you visit the parse tree using a custom listener
* you visit the parse tree using a custom visitor
* your grammar comprises production code (like ANTLR3)

(Please note that production code is target specific, so you can't have multi-target grammars that include production code, except for very limited use cases; see below.)

## How do I create and run a custom listener?

Let's suppose your MyGrammar grammar comprises 2 rules: "key" and "value". The antlr4 tool will have generated the following listener:

```python
class MyGrammarListener(ParseTreeListener):
    def enterKey(self, ctx):
        pass
    def exitKey(self, ctx):
        pass
    def enterValue(self, ctx):
        pass
    def exitValue(self, ctx):
        pass
```

In order to provide custom behavior, you might want to create the following class:

```python
class KeyPrinter(MyGrammarListener):
    def exitKey(self, ctx):
        print("Oh, a key!")
```

In order to execute this listener, you would simply add the following lines to the above code:

```python
    ...
    tree = parser.StartRule()  # only repeated here for reference
    printer = KeyPrinter()
    walker = ParseTreeWalker()
    walker.walk(printer, tree)
```

Further information can be found in the ANTLR 4 definitive guide.

The Python implementation of ANTLR is as close as possible to the Java one, so you shouldn't find it difficult to adapt the examples for Python.

## Target agnostic grammars

If your grammar is targeted to Python only, you may ignore the following. But if your goal is to get your Java parser to also run in Python, then you might find it useful.

1. Do not embed production code inside your grammar. This is not portable and will not be. Move all your code to listeners or visitors.
1. The only production code absolutely required to sit with the grammar should be semantic predicates, like:

```
ID {$text.equals("test")}?
+```
+
+Unfortunately, this is not portable, but you can work around it. The trick involves:
+
+* deriving your parser from a parser you provide, such as BaseParser
+* implementing utility methods in this BaseParser, such as "isEqualText"
+* adding a "self" field to the Java/C# BaseParser, and initializing it with "this"
+
+Thanks to the above, you should be able to rewrite the above semantic predicate as follows:
+
+```
+ID {$self.isEqualText($text,"test")}?
+```
diff --git a/doc/releasing-antlr.md b/doc/releasing-antlr.md
new file mode 100644
index 000000000..05815086e
--- /dev/null
+++ b/doc/releasing-antlr.md
@@ -0,0 +1,268 @@
+# Cutting an ANTLR Release
+
+## Github
+
+Create a pre-release or full release at github; [Example 4.5-rc-1](https://github.com/antlr/antlr4/releases/tag/4.5-rc-1).
+
+Whack any existing tag, as mvn will create one and it fails if one is already there.
+
+```
+$ git tag -d 4.5.2
+$ git push origin :refs/tags/4.5.2
+$ git push upstream :refs/tags/4.5.2
+```
+
+## Bump version
+
+Edit the repository looking for 4.5 or whatever and update it. Bump the version in the following files:
+
+ * runtime/Java/src/org/antlr/v4/runtime/RuntimeMetaData.java
+ * runtime/Python2/setup.py
+ * runtime/Python2/src/antlr4/Recognizer.py
+ * runtime/Python3/setup.py
+ * runtime/Python3/src/antlr4/Recognizer.py
+ * runtime/CSharp/runtime/CSharp/Antlr4.Runtime/Properties/AssemblyInfo.cs
+ * runtime/JavaScript/src/antlr4/package.json
+ * runtime/JavaScript/src/antlr4/Recognizer.js
+ * tool/src/org/antlr/v4/codegen/target/CSharpTarget.java
+ * tool/src/org/antlr/v4/codegen/target/JavaScriptTarget.java
+ * tool/src/org/antlr/v4/codegen/target/Python2Target.java
+ * tool/src/org/antlr/v4/codegen/target/Python3Target.java
+
+Here is a simple script to display any line from the critical files with, say, `4.5` in it:
+
+```bash
+find /tmp/antlr4 -type f -exec grep -Hn '4\.5' {} \;
+```
+
+Commit to repository.
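As a sketch of what that output looks like, the following runs the same `find`/`grep` combination over a hypothetical throwaway tree (`/tmp/antlr4-demo` and its `setup.py` are stand-ins for a real checkout, not files from this repo); `grep -Hn` prints `file:line:content` for every match:

```shell
# Sketch: build a throwaway tree, then show each line carrying the old version.
# /tmp/antlr4-demo and its contents are hypothetical stand-ins for a checkout.
mkdir -p /tmp/antlr4-demo
printf 'version = "4.5.1"\n' > /tmp/antlr4-demo/setup.py
find /tmp/antlr4-demo -type f -exec grep -Hn '4\.5' {} \;
# prints: /tmp/antlr4-demo/setup.py:1:version = "4.5.1"
```

Any file that still shows up here after the bump is one you missed.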
+
+## Maven Repository Settings
+
+First, make sure you have maven set up to communicate with staging servers etc... Create file `~/.m2/settings.xml` with the appropriate username/password for the staging server and gpg.keyname/passphrase for signing. Make sure it has strict visibility privileges, to just you. On unix, it looks like:
+
+```bash
+beast:~/.m2 $ ls -l settings.xml
+-rw-------  1 parrt  staff  914 Jul 15 14:42 settings.xml
+```
+
+Here is the file template
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<settings>
+  <servers>
+    <server>
+      <id>sonatype-nexus-staging</id>
+      <username>sonatype-username</username>
+      <password>XXX</password>
+    </server>
+    <server>
+      <id>sonatype-nexus-snapshots</id>
+      <username>sonatype-username</username>
+      <password>XXX</password>
+    </server>
+  </servers>
+  <profiles>
+    <profile>
+      <activation>
+        <activeByDefault>false</activeByDefault>
+      </activation>
+      <properties>
+        <gpg.keyname>UUU</gpg.keyname>
+        <gpg.passphrase>XXX</gpg.passphrase>
+      </properties>
+    </profile>
+  </profiles>
+</settings>
+```
+
+## Maven release
+
+The maven deploy lifecycle phase deploys the artifacts and the poms for the ANTLR project to the [sonatype remote staging server](https://oss.sonatype.org/content/repositories/snapshots/).
+
+```bash
+mvn deploy -DskipTests
+```
+
+With JDK 1.7 (not 6 or 8), do this:
+
+```bash
+mvn release:prepare -Darguments="-DskipTests"
+```
+
+It will start out by asking you the version number:
+
+```
+...
+What is the release version for "ANTLR 4"? (org.antlr:antlr4-master) 4.5.2: : 4.5.2
+What is the release version for "ANTLR 4 Runtime"? (org.antlr:antlr4-runtime) 4.5.2: :
+What is the release version for "ANTLR 4 Tool"? (org.antlr:antlr4) 4.5.2: :
+What is the release version for "ANTLR 4 Maven plugin"? (org.antlr:antlr4-maven-plugin) 4.5.2: :
+What is the release version for "ANTLR 4 Runtime Test Generator"? (org.antlr:antlr4-runtime-testsuite) 4.5.2: :
+What is the release version for "ANTLR 4 Tool Tests"? (org.antlr:antlr4-tool-testsuite) 4.5.2: :
+What is SCM release tag or label for "ANTLR 4"? (org.antlr:antlr4-master) antlr4-master-4.5.2: : 4.5.2
+What is the new development version for "ANTLR 4"? (org.antlr:antlr4-master) 4.5.3-SNAPSHOT:
+...
+```
+
+Maven will go through your pom.xml files to update versions from 4.5.2-SNAPSHOT to 4.5.2 for the release and then to 4.5.3-SNAPSHOT after the release, which is done with:
+
+```bash
+mvn release:perform -Darguments="-DskipTests"
+```
+
+Maven will use git to push pom.xml changes.
+
+Now, go here:
+
+    [https://oss.sonatype.org/#welcome](https://oss.sonatype.org/#welcome)
+
+and on the left click "Staging Repositories". Click the staging repo and close it, then refresh, click it again and release it. It's done when you see it here:
+
+    [http://repo1.maven.org/maven2/org/antlr/antlr4-runtime/](http://repo1.maven.org/maven2/org/antlr/antlr4-runtime/)
+
+Copy the jars to the antlr.org site and update download/index.html
+
+```bash
+cp ~/.m2/repository/org/antlr/antlr4-runtime/4.5.2/antlr4-runtime-4.5.2.jar ~/antlr/sites/website-antlr4/download/antlr-runtime-4.5.2.jar
+cp ~/.m2/repository/org/antlr/antlr4/4.5.2/antlr4-4.5.2.jar ~/antlr/sites/website-antlr4/download/antlr-4.5.2-complete.jar
+cd ~/antlr/sites/website-antlr4/download
+git add antlr-4.5.2-complete.jar
+git add antlr-runtime-4.5.2.jar
+```
+
+Update on site:
+
+* download.html
+* index.html
+* api/index.html
+* download/index.html
+* scripts/topnav.js
+
+```
+git commit -a -m 'add 4.5.2 jars'
+git push origin gh-pages
+```
+
+## Deploying Targets
+
+### JavaScript
+
+```bash
+cd runtime/JavaScript/src
+zip -r /tmp/antlr-javascript-runtime-4.5.2.zip antlr4
+cp /tmp/antlr-javascript-runtime-4.5.2.zip ~/antlr/sites/website-antlr4/download
+# git add, commit, push
+```
+
+Move target to website
+
+```bash
+pushd ~/antlr/sites/website-antlr4/download
+git add antlr-javascript-runtime-4.5.2.zip
+git commit -a -m 'update JS runtime'
+git push origin gh-pages
+popd
+```
+
+### CSharp
+
+```bash
+cd ~/antlr/code/antlr4/runtime/CSharp/runtime/CSharp
+# kill previous ones manually as "xbuild /t:Clean" didn't seem to do it
+rm Antlr4.Runtime/bin/net20/Release/Antlr4.Runtime.dll
+rm
Antlr4.Runtime/obj/net20/Release/Antlr4.Runtime.dll +# build +xbuild /p:Configuration=Release Antlr4.Runtime/Antlr4.Runtime.mono.csproj +# zip it up to get a version number on zip filename +zip --junk-paths /tmp/antlr-csharp-runtime-4.5.2.zip Antlr4.Runtime/bin/net35/Release/Antlr4.Runtime.dll +cp /tmp/antlr-csharp-runtime-4.5.2.zip ~/antlr/sites/website-antlr4/download +``` + +Move target to website + +```bash +pushd ~/antlr/sites/website-antlr4/download +git add antlr-csharp-runtime-4.5.2.zip +git commit -a -m 'update C# runtime' +git push origin gh-pages +popd +``` + +### Python + +The Python targets get deployed with `setup.py`. First, set up `~/.pypirc` with tight privileges: + +```bash +beast:~ $ ls -l ~/.pypirc +-rw------- 1 parrt staff 267 Jul 15 17:02 /Users/parrt/.pypirc +``` + +``` +[distutils] # this tells distutils what package indexes you can push to +index-servers = + pypi + pypitest + +[pypi] +repository: https://pypi.python.org/pypi +username: parrt +password: XXX + +[pypitest] +repository: https://testpypi.python.org/pypi +username: parrt +``` + +Then run the usual python set up stuff: + +```bash +cd ~/antlr/code/antlr4/runtime/Python2 +# assume you have ~/.pypirc set up +python setup.py register -r pypi +python setup.py sdist bdist_wininst upload -r pypi +``` + +and do again for Python 3 target + +```bash +cd ~/antlr/code/antlr4/runtime/Python3 +# assume you have ~/.pypirc set up +python setup.py register -r pypi +python setup.py sdist bdist_wininst upload -r pypi +``` + +Add links to the artifacts from download.html + +## Update javadoc for runtime and tool + +First gen javadoc: + +```bash +$ cd antlr4 +$ mvn -DskipTests javadoc:jar install +``` + +Then copy to website: + +```bash +cd ~/antlr/sites/website-antlr4/api +git checkout gh-pages +git pull origin gh-pages +cd Java +jar xvf ~/.m2/repository/org/antlr/antlr4-runtime/4.5.2/antlr4-runtime-4.5.2-javadoc.jar +cd ../JavaTool +jar xvf 
~/.m2/repository/org/antlr/antlr4/4.5.2/antlr4-4.5.2-javadoc.jar +git commit -a -m 'freshen api doc' +git push origin gh-pages +``` + +## Update Intellij plug-in + +Rebuild antlr plugin with new antlr jar. diff --git a/doc/resources.md b/doc/resources.md new file mode 100644 index 000000000..8ee31a3c8 --- /dev/null +++ b/doc/resources.md @@ -0,0 +1,33 @@ +# Articles and Resources + +## Books + + + + + + +## Articles + +* [Playing with ANTLR4, Primefaces extensions for Code Mirror and web-based DSLs](http://leonotepad.blogspot.com.br/2014/01/playing-with-antlr4-primefaces.html) +* [A Tale of Two Grammars](https://dexvis.wordpress.com/2012/11/22/a-tale-of-two-grammars/) +* [ANTLR 4: using the lexer, parser and listener with example grammar](http://www.theendian.com/blog/antlr-4-lexer-parser-and-listener-with-example-grammar/) +* [Creating External DSLs using ANTLR and Java](http://java.dzone.com/articles/creating-external-dsls-using) + +## Presentations + +* [Introduction to ANTLR 4 by Oliver Zeigermann](https://docs.google.com/presentation/d/1XS_VIdicCQVonPK6AGYkWTp-3VeHfGuD2l8yNMpAfuQ/edit#slide=id.p) + +## Videos + + + +## Resources + +* [Stack overflow ANTLR4 tag](http://stackoverflow.com/questions/tagged/antlr4) +* [Antlr 4 with C# and Visual Studio 2012](http://programming-pages.com/2013/12/14/antlr-4-with-c-and-visual-studio-2012/) +* [ANTLR Language Support in VisualStudio](http://visualstudiogallery.msdn.microsoft.com/25b991db-befd-441b-b23b-bb5f8d07ee9f) +* [Upgrading to ANTLR 4 with C#](http://andrevdm.blogspot.com/2013/08/upgrading-to-antlr-4-with-c.html) +* [Generate parsers with Antlr4 via Maven](http://ljelonek.wordpress.com/2014/01/03/generate-parsers-with-antlr4-via-maven/) +* [Exploring ANTLR v4](http://johnsquibb.like97.com/blog/read/exploring-antlr-v4) +* [antlr4dart](http://pub.dartlang.org/packages/antlr4dart) \ No newline at end of file diff --git a/doc/resources/worker-base.js b/doc/resources/worker-base.js new file mode 100644 index 
000000000..3494b3916 --- /dev/null +++ b/doc/resources/worker-base.js @@ -0,0 +1,1079 @@ +"no use strict"; +(function(e) { + if (typeof e.window != "undefined" && e.document) return; + e.console = function() { + var e = Array.prototype.slice.call(arguments, 0); + postMessage({ + type: "log", + data: e + }) + }, e.console.error = e.console.warn = e.console.log = e.console.trace = e.console, e.window = e, e.ace = e, e.onerror = function(e, t, n, r, i) { + postMessage({ + type: "error", + data: { + message: e, + file: t, + line: n, + col: r, + stack: i.stack + } + }) + }, e.normalizeModule = function(t, n) { + if (n.indexOf("!") !== -1) { + var r = n.split("!"); + return e.normalizeModule(t, r[0]) + "!" + e.normalizeModule(t, r[1]) + } + if (n.charAt(0) == ".") { + var i = t.split("/") + .slice(0, -1) + .join("/"); + n = (i ? i + "/" : "") + n; + while (n.indexOf(".") !== -1 && s != n) { + var s = n; + n = n.replace(/^\.\//, "") + .replace(/\/\.\//, "/") + .replace(/[^\/]+\/\.\.\//, "") + } + } + return n + }, e.require = function(t, n) { + n || (n = t, t = null); + if (!n.charAt) throw new Error("worker.js require() accepts only (parentId, id) as arguments"); + n = e.normalizeModule(t, n); + var r = e.require.modules[n]; + if (r) return r.initialized || (r.initialized = !0, r.exports = r.factory() + .exports), r.exports; + var i = n.split("/"); + if (!e.require.tlns) return console.log("unable to load " + n); + i[0] = e.require.tlns[i[0]] || i[0]; + var s = i.join("/") + ".js"; + return e.require.id = n, importScripts(s), e.require(t, n) + }, e.require.modules = {}, e.require.tlns = {}, e.define = function(t, n, r) { + arguments.length == 2 ? 
(r = n, typeof t != "string" && (n = t, t = e.require.id)) : arguments.length == 1 && (r = t, n = [], t = e.require.id); + if (typeof r != "function") { + e.require.modules[t] = { + exports: r, + initialized: !0 + }; + return + } + n.length || (n = ["require", "exports", "module"]); + var i = function(n) { + return e.require(t, n) + }; + e.require.modules[t] = { + exports: {}, + factory: function() { + var e = this, + t = r.apply(this, n.map(function(t) { + switch (t) { + case "require": + return i; + case "exports": + return e.exports; + case "module": + return e; + default: + return i(t) + } + })); + return t && (e.exports = t), e + } + } + }, e.define.amd = {}, e.initBaseUrls = function(t) { + require.tlns = t + }, e.initSender = function() { + var n = e.require("ace/lib/event_emitter") + .EventEmitter, + r = e.require("ace/lib/oop"), + i = function() {}; + return function() { + r.implement(this, n), this.callback = function(e, t) { + postMessage({ + type: "call", + id: t, + data: e + }) + }, this.emit = function(e, t) { + postMessage({ + type: "event", + name: e, + data: t + }) + } + }.call(i.prototype), new i + }; + var t = e.main = null, + n = e.sender = null; + e.onmessage = function(r) { + var i = r.data; + if (i.command) { + if (!t[i.command]) throw new Error("Unknown command:" + i.command); + t[i.command].apply(t, i.args) + } else if (i.init) { + initBaseUrls(i.tlns), require("ace/lib/es5-shim"), n = e.sender = initSender(); + var s = require(i.module)[i.classname]; + t = e.main = new s(n) + } else i.event && n && n._signal(i.event, i.data) + } +})(this), ace.define("ace/lib/oop", ["require", "exports", "module"], function(e, t, n) { + "use strict"; + t.inherits = function(e, t) { + e.super_ = t, e.prototype = Object.create(t.prototype, { + constructor: { + value: e, + enumerable: !1, + writable: !0, + configurable: !0 + } + }) + }, t.mixin = function(e, t) { + for (var n in t) e[n] = t[n]; + return e + }, t.implement = function(e, n) { + t.mixin(e, n) + 
} +}), ace.define("ace/lib/event_emitter", ["require", "exports", "module"], function(e, t, n) { + "use strict"; + var r = {}, + i = function() { + this.propagationStopped = !0 + }, + s = function() { + this.defaultPrevented = !0 + }; + r._emit = r._dispatchEvent = function(e, t) { + this._eventRegistry || (this._eventRegistry = {}), this._defaultHandlers || (this._defaultHandlers = {}); + var n = this._eventRegistry[e] || [], + r = this._defaultHandlers[e]; + if (!n.length && !r) return; + if (typeof t != "object" || !t) t = {}; + t.type || (t.type = e), t.stopPropagation || (t.stopPropagation = i), t.preventDefault || (t.preventDefault = s), n = n.slice(); + for (var o = 0; o < n.length; o++) { + n[o](t, this); + if (t.propagationStopped) break + } + if (r && !t.defaultPrevented) return r(t, this) + }, r._signal = function(e, t) { + var n = (this._eventRegistry || {})[e]; + if (!n) return; + n = n.slice(); + for (var r = 0; r < n.length; r++) n[r](t, this) + }, r.once = function(e, t) { + var n = this; + t && this.addEventListener(e, function r() { + n.removeEventListener(e, r), t.apply(null, arguments) + }) + }, r.setDefaultHandler = function(e, t) { + var n = this._defaultHandlers; + n || (n = this._defaultHandlers = { + _disabled_: {} + }); + if (n[e]) { + var r = n[e], + i = n._disabled_[e]; + i || (n._disabled_[e] = i = []), i.push(r); + var s = i.indexOf(t); + s != -1 && i.splice(s, 1) + } + n[e] = t + }, r.removeDefaultHandler = function(e, t) { + var n = this._defaultHandlers; + if (!n) return; + var r = n._disabled_[e]; + if (n[e] == t) { + var i = n[e]; + r && this.setDefaultHandler(e, r.pop()) + } else if (r) { + var s = r.indexOf(t); + s != -1 && r.splice(s, 1) + } + }, r.on = r.addEventListener = function(e, t, n) { + this._eventRegistry = this._eventRegistry || {}; + var r = this._eventRegistry[e]; + return r || (r = this._eventRegistry[e] = []), r.indexOf(t) == -1 && r[n ? 
"unshift" : "push"](t), t + }, r.off = r.removeListener = r.removeEventListener = function(e, t) { + this._eventRegistry = this._eventRegistry || {}; + var n = this._eventRegistry[e]; + if (!n) return; + var r = n.indexOf(t); + r !== -1 && n.splice(r, 1) + }, r.removeAllListeners = function(e) { + this._eventRegistry && (this._eventRegistry[e] = []) + }, t.EventEmitter = r +}), ace.define("ace/range", ["require", "exports", "module"], function(e, t, n) { + "use strict"; + var r = function(e, t) { + return e.row - t.row || e.column - t.column + }, + i = function(e, t, n, r) { + this.start = { + row: e, + column: t + }, this.end = { + row: n, + column: r + } + }; + (function() { + this.isEqual = function(e) { + return this.start.row === e.start.row && this.end.row === e.end.row && this.start.column === e.start.column && this.end.column === e.end.column + }, this.toString = function() { + return "Range: [" + this.start.row + "/" + this.start.column + "] -> [" + this.end.row + "/" + this.end.column + "]" + }, this.contains = function(e, t) { + return this.compare(e, t) == 0 + }, this.compareRange = function(e) { + var t, n = e.end, + r = e.start; + return t = this.compare(n.row, n.column), t == 1 ? (t = this.compare(r.row, r.column), t == 1 ? 2 : t == 0 ? 1 : 0) : t == -1 ? -2 : (t = this.compare(r.row, r.column), t == -1 ? -1 : t == 1 ? 42 : 0) + }, this.comparePoint = function(e) { + return this.compare(e.row, e.column) + }, this.containsRange = function(e) { + return this.comparePoint(e.start) == 0 && this.comparePoint(e.end) == 0 + }, this.intersects = function(e) { + var t = this.compareRange(e); + return t == -1 || t == 0 || t == 1 + }, this.isEnd = function(e, t) { + return this.end.row == e && this.end.column == t + }, this.isStart = function(e, t) { + return this.start.row == e && this.start.column == t + }, this.setStart = function(e, t) { + typeof e == "object" ? 
(this.start.column = e.column, this.start.row = e.row) : (this.start.row = e, this.start.column = t) + }, this.setEnd = function(e, t) { + typeof e == "object" ? (this.end.column = e.column, this.end.row = e.row) : (this.end.row = e, this.end.column = t) + }, this.inside = function(e, t) { + return this.compare(e, t) == 0 ? this.isEnd(e, t) || this.isStart(e, t) ? !1 : !0 : !1 + }, this.insideStart = function(e, t) { + return this.compare(e, t) == 0 ? this.isEnd(e, t) ? !1 : !0 : !1 + }, this.insideEnd = function(e, t) { + return this.compare(e, t) == 0 ? this.isStart(e, t) ? !1 : !0 : !1 + }, this.compare = function(e, t) { + return !this.isMultiLine() && e === this.start.row ? t < this.start.column ? -1 : t > this.end.column ? 1 : 0 : e < this.start.row ? -1 : e > this.end.row ? 1 : this.start.row === e ? t >= this.start.column ? 0 : -1 : this.end.row === e ? t <= this.end.column ? 0 : 1 : 0 + }, this.compareStart = function(e, t) { + return this.start.row == e && this.start.column == t ? -1 : this.compare(e, t) + }, this.compareEnd = function(e, t) { + return this.end.row == e && this.end.column == t ? 1 : this.compare(e, t) + }, this.compareInside = function(e, t) { + return this.end.row == e && this.end.column == t ? 1 : this.start.row == e && this.start.column == t ? 
-1 : this.compare(e, t) + }, this.clipRows = function(e, t) { + if (this.end.row > t) var n = { + row: t + 1, + column: 0 + }; + else if (this.end.row < e) var n = { + row: e, + column: 0 + }; + if (this.start.row > t) var r = { + row: t + 1, + column: 0 + }; + else if (this.start.row < e) var r = { + row: e, + column: 0 + }; + return i.fromPoints(r || this.start, n || this.end) + }, this.extend = function(e, t) { + var n = this.compare(e, t); + if (n == 0) return this; + if (n == -1) var r = { + row: e, + column: t + }; + else var s = { + row: e, + column: t + }; + return i.fromPoints(r || this.start, s || this.end) + }, this.isEmpty = function() { + return this.start.row === this.end.row && this.start.column === this.end.column + }, this.isMultiLine = function() { + return this.start.row !== this.end.row + }, this.clone = function() { + return i.fromPoints(this.start, this.end) + }, this.collapseRows = function() { + return this.end.column == 0 ? new i(this.start.row, 0, Math.max(this.start.row, this.end.row - 1), 0) : new i(this.start.row, 0, this.end.row, 0) + }, this.toScreenRange = function(e) { + var t = e.documentToScreenPosition(this.start), + n = e.documentToScreenPosition(this.end); + return new i(t.row, t.column, n.row, n.column) + }, this.moveBy = function(e, t) { + this.start.row += e, this.start.column += t, this.end.row += e, this.end.column += t + } + }) + .call(i.prototype), i.fromPoints = function(e, t) { + return new i(e.row, e.column, t.row, t.column) + }, i.comparePoints = r, i.comparePoints = function(e, t) { + return e.row - t.row || e.column - t.column + }, t.Range = i +}), ace.define("ace/anchor", ["require", "exports", "module", "ace/lib/oop", "ace/lib/event_emitter"], function(e, t, n) { + "use strict"; + var r = e("./lib/oop"), + i = e("./lib/event_emitter") + .EventEmitter, + s = t.Anchor = function(e, t, n) { + this.$onChange = this.onChange.bind(this), this.attach(e), typeof n == "undefined" ? 
this.setPosition(t.row, t.column) : this.setPosition(t, n) + }; + (function() { + r.implement(this, i), this.getPosition = function() { + return this.$clipPositionToDocument(this.row, this.column) + }, this.getDocument = function() { + return this.document + }, this.$insertRight = !1, this.onChange = function(e) { + var t = e.data, + n = t.range; + if (n.start.row == n.end.row && n.start.row != this.row) return; + if (n.start.row > this.row) return; + if (n.start.row == this.row && n.start.column > this.column) return; + var r = this.row, + i = this.column, + s = n.start, + o = n.end; + if (t.action === "insertText") + if (s.row === r && s.column <= i) { + if (s.column !== i || !this.$insertRight) s.row === o.row ? i += o.column - s.column : (i -= s.column, r += o.row - s.row) + } else s.row !== o.row && s.row < r && (r += o.row - s.row); + else t.action === "insertLines" ? (s.row !== r || i !== 0 || !this.$insertRight) && s.row <= r && (r += o.row - s.row) : t.action === "removeText" ? s.row === r && s.column < i ? o.column >= i ? i = s.column : i = Math.max(0, i - (o.column - s.column)) : s.row !== o.row && s.row < r ? (o.row === r && (i = Math.max(0, i - o.column) + s.column), r -= o.row - s.row) : o.row === r && (r -= o.row - s.row, i = Math.max(0, i - o.column) + s.column) : t.action == "removeLines" && s.row <= r && (o.row <= r ? r -= o.row - s.row : (r = s.row, i = 0)); + this.setPosition(r, i, !0) + }, this.setPosition = function(e, t, n) { + var r; + n ? 
r = { + row: e, + column: t + } : r = this.$clipPositionToDocument(e, t); + if (this.row == r.row && this.column == r.column) return; + var i = { + row: this.row, + column: this.column + }; + this.row = r.row, this.column = r.column, this._signal("change", { + old: i, + value: r + }) + }, this.detach = function() { + this.document.removeEventListener("change", this.$onChange) + }, this.attach = function(e) { + this.document = e || this.document, this.document.on("change", this.$onChange) + }, this.$clipPositionToDocument = function(e, t) { + var n = {}; + return e >= this.document.getLength() ? (n.row = Math.max(0, this.document.getLength() - 1), n.column = this.document.getLine(n.row) + .length) : e < 0 ? (n.row = 0, n.column = 0) : (n.row = e, n.column = Math.min(this.document.getLine(n.row) + .length, Math.max(0, t))), t < 0 && (n.column = 0), n + } + }) + .call(s.prototype) +}), ace.define("ace/document", ["require", "exports", "module", "ace/lib/oop", "ace/lib/event_emitter", "ace/range", "ace/anchor"], function(e, t, n) { + "use strict"; + var r = e("./lib/oop"), + i = e("./lib/event_emitter") + .EventEmitter, + s = e("./range") + .Range, + o = e("./anchor") + .Anchor, + u = function(e) { + this.$lines = [], e.length === 0 ? this.$lines = [""] : Array.isArray(e) ? this._insertLines(0, e) : this.insert({ + row: 0, + column: 0 + }, e) + }; + (function() { + r.implement(this, i), this.setValue = function(e) { + var t = this.getLength(); + this.remove(new s(0, 0, t, this.getLine(t - 1) + .length)), this.insert({ + row: 0, + column: 0 + }, e) + }, this.getValue = function() { + return this.getAllLines() + .join(this.getNewLineCharacter()) + }, this.createAnchor = function(e, t) { + return new o(this, e, t) + }, "aaa".split(/a/) + .length === 0 ? 
this.$split = function(e) { + return e.replace(/\r\n|\r/g, "\n") + .split("\n") + } : this.$split = function(e) { + return e.split(/\r\n|\r|\n/) + }, this.$detectNewLine = function(e) { + var t = e.match(/^.*?(\r\n|\r|\n)/m); + this.$autoNewLine = t ? t[1] : "\n", this._signal("changeNewLineMode") + }, this.getNewLineCharacter = function() { + switch (this.$newLineMode) { + case "windows": + return "\r\n"; + case "unix": + return "\n"; + default: + return this.$autoNewLine || "\n" + } + }, this.$autoNewLine = "", this.$newLineMode = "auto", this.setNewLineMode = function(e) { + if (this.$newLineMode === e) return; + this.$newLineMode = e, this._signal("changeNewLineMode") + }, this.getNewLineMode = function() { + return this.$newLineMode + }, this.isNewLine = function(e) { + return e == "\r\n" || e == "\r" || e == "\n" + }, this.getLine = function(e) { + return this.$lines[e] || "" + }, this.getLines = function(e, t) { + return this.$lines.slice(e, t + 1) + }, this.getAllLines = function() { + return this.getLines(0, this.getLength()) + }, this.getLength = function() { + return this.$lines.length + }, this.getTextRange = function(e) { + if (e.start.row == e.end.row) return this.getLine(e.start.row) + .substring(e.start.column, e.end.column); + var t = this.getLines(e.start.row, e.end.row); + t[0] = (t[0] || "") + .substring(e.start.column); + var n = t.length - 1; + return e.end.row - e.start.row == n && (t[n] = t[n].substring(0, e.end.column)), t.join(this.getNewLineCharacter()) + }, this.$clipPosition = function(e) { + var t = this.getLength(); + return e.row >= t ? (e.row = Math.max(0, t - 1), e.column = this.getLine(t - 1) + .length) : e.row < 0 && (e.row = 0), e + }, this.insert = function(e, t) { + if (!t || t.length === 0) return e; + e = this.$clipPosition(e), this.getLength() <= 1 && this.$detectNewLine(t); + var n = this.$split(t), + r = n.splice(0, 1)[0], + i = n.length == 0 ? 
null : n.splice(n.length - 1, 1)[0]; + return e = this.insertInLine(e, r), i !== null && (e = this.insertNewLine(e), e = this._insertLines(e.row, n), e = this.insertInLine(e, i || "")), e + }, this.insertLines = function(e, t) { + return e >= this.getLength() ? this.insert({ + row: e, + column: 0 + }, "\n" + t.join("\n")) : this._insertLines(Math.max(e, 0), t) + }, this._insertLines = function(e, t) { + if (t.length == 0) return { + row: e, + column: 0 + }; + while (t.length > 2e4) { + var n = this._insertLines(e, t.slice(0, 2e4)); + t = t.slice(2e4), e = n.row + } + var r = [e, 0]; + r.push.apply(r, t), this.$lines.splice.apply(this.$lines, r); + var i = new s(e, 0, e + t.length, 0), + o = { + action: "insertLines", + range: i, + lines: t + }; + return this._signal("change", { + data: o + }), i.end + }, this.insertNewLine = function(e) { + e = this.$clipPosition(e); + var t = this.$lines[e.row] || ""; + this.$lines[e.row] = t.substring(0, e.column), this.$lines.splice(e.row + 1, 0, t.substring(e.column, t.length)); + var n = { + row: e.row + 1, + column: 0 + }, + r = { + action: "insertText", + range: s.fromPoints(e, n), + text: this.getNewLineCharacter() + }; + return this._signal("change", { + data: r + }), n + }, this.insertInLine = function(e, t) { + if (t.length == 0) return e; + var n = this.$lines[e.row] || ""; + this.$lines[e.row] = n.substring(0, e.column) + t + n.substring(e.column); + var r = { + row: e.row, + column: e.column + t.length + }, + i = { + action: "insertText", + range: s.fromPoints(e, r), + text: t + }; + return this._signal("change", { + data: i + }), r + }, this.remove = function(e) { + e instanceof s || (e = s.fromPoints(e.start, e.end)), e.start = this.$clipPosition(e.start), e.end = this.$clipPosition(e.end); + if (e.isEmpty()) return e.start; + var t = e.start.row, + n = e.end.row; + if (e.isMultiLine()) { + var r = e.start.column == 0 ? 
t : t + 1, + i = n - 1; + e.end.column > 0 && this.removeInLine(n, 0, e.end.column), i >= r && this._removeLines(r, i), r != t && (this.removeInLine(t, e.start.column, this.getLine(t) + .length), this.removeNewLine(e.start.row)) + } else this.removeInLine(t, e.start.column, e.end.column); + return e.start + }, this.removeInLine = function(e, t, n) { + if (t == n) return; + var r = new s(e, t, e, n), + i = this.getLine(e), + o = i.substring(t, n), + u = i.substring(0, t) + i.substring(n, i.length); + this.$lines.splice(e, 1, u); + var a = { + action: "removeText", + range: r, + text: o + }; + return this._signal("change", { + data: a + }), r.start + }, this.removeLines = function(e, t) { + return e < 0 || t >= this.getLength() ? this.remove(new s(e, 0, t + 1, 0)) : this._removeLines(e, t) + }, this._removeLines = function(e, t) { + var n = new s(e, 0, t + 1, 0), + r = this.$lines.splice(e, t - e + 1), + i = { + action: "removeLines", + range: n, + nl: this.getNewLineCharacter(), + lines: r + }; + return this._signal("change", { + data: i + }), r + }, this.removeNewLine = function(e) { + var t = this.getLine(e), + n = this.getLine(e + 1), + r = new s(e, t.length, e + 1, 0), + i = t + n; + this.$lines.splice(e, 2, i); + var o = { + action: "removeText", + range: r, + text: this.getNewLineCharacter() + }; + this._signal("change", { + data: o + }) + }, this.replace = function(e, t) { + e instanceof s || (e = s.fromPoints(e.start, e.end)); + if (t.length == 0 && e.isEmpty()) return e.start; + if (t == this.getTextRange(e)) return e.end; + this.remove(e); + if (t) var n = this.insert(e.start, t); + else n = e.start; + return n + }, this.applyDeltas = function(e) { + for (var t = 0; t < e.length; t++) { + var n = e[t], + r = s.fromPoints(n.range.start, n.range.end); + n.action == "insertLines" ? this.insertLines(r.start.row, n.lines) : n.action == "insertText" ? this.insert(r.start, n.text) : n.action == "removeLines" ? 
this._removeLines(r.start.row, r.end.row - 1) : n.action == "removeText" && this.remove(r) + } + }, this.revertDeltas = function(e) { + for (var t = e.length - 1; t >= 0; t--) { + var n = e[t], + r = s.fromPoints(n.range.start, n.range.end); + n.action == "insertLines" ? this._removeLines(r.start.row, r.end.row - 1) : n.action == "insertText" ? this.remove(r) : n.action == "removeLines" ? this._insertLines(r.start.row, n.lines) : n.action == "removeText" && this.insert(r.start, n.text) + } + }, this.indexToPosition = function(e, t) { + var n = this.$lines || this.getAllLines(), + r = this.getNewLineCharacter() + .length; + for (var i = t || 0, s = n.length; i < s; i++) { + e -= n[i].length + r; + if (e < 0) return { + row: i, + column: e + n[i].length + r + } + } + return { + row: s - 1, + column: n[s - 1].length + } + }, this.positionToIndex = function(e, t) { + var n = this.$lines || this.getAllLines(), + r = this.getNewLineCharacter() + .length, + i = 0, + s = Math.min(e.row, n.length); + for (var o = t || 0; o < s; ++o) i += n[o].length + r; + return i + e.column + } + }) + .call(u.prototype), t.Document = u +}), ace.define("ace/lib/lang", ["require", "exports", "module"], function(e, t, n) { + "use strict"; + t.last = function(e) { + return e[e.length - 1] + }, t.stringReverse = function(e) { + return e.split("") + .reverse() + .join("") + }, t.stringRepeat = function(e, t) { + var n = ""; + while (t > 0) { + t & 1 && (n += e); + if (t >>= 1) e += e + } + return n + }; + var r = /^\s\s*/, + i = /\s\s*$/; + t.stringTrimLeft = function(e) { + return e.replace(r, "") + }, t.stringTrimRight = function(e) { + return e.replace(i, "") + }, t.copyObject = function(e) { + var t = {}; + for (var n in e) t[n] = e[n]; + return t + }, t.copyArray = function(e) { + var t = []; + for (var n = 0, r = e.length; n < r; n++) e[n] && typeof e[n] == "object" ? 
t[n] = this.copyObject(e[n]) : t[n] = e[n]; + return t + }, t.deepCopy = function s(e) { + if (typeof e != "object" || !e) return e; + var t; + if (Array.isArray(e)) { + t = []; + for (var n = 0; n < e.length; n++) t[n] = s(e[n]); + return t + } + var r = e.constructor; + if (r === RegExp) return e; + t = r(); + for (var n in e) t[n] = s(e[n]); + return t + }, t.arrayToMap = function(e) { + var t = {}; + for (var n = 0; n < e.length; n++) t[e[n]] = 1; + return t + }, t.createMap = function(e) { + var t = Object.create(null); + for (var n in e) t[n] = e[n]; + return t + }, t.arrayRemove = function(e, t) { + for (var n = 0; n <= e.length; n++) t === e[n] && e.splice(n, 1) + }, t.escapeRegExp = function(e) { + return e.replace(/([.*+?^${}()|[\]\/\\])/g, "\\$1") + }, t.escapeHTML = function(e) { + return e.replace(/&/g, "&") + .replace(/"/g, """) + .replace(/'/g, "'") + .replace(/ 0 || -1) * Math.floor(Math.abs(e))), e + } + + function B(e) { + var t = typeof e; + return e === null || t === "undefined" || t === "boolean" || t === "number" || t === "string" + } + + function j(e) { + var t, n, r; + if (B(e)) return e; + n = e.valueOf; + if (typeof n == "function") { + t = n.call(e); + if (B(t)) return t + } + r = e.toString; + if (typeof r == "function") { + t = r.call(e); + if (B(t)) return t + } + throw new TypeError + } + Function.prototype.bind || (Function.prototype.bind = function(t) { + var n = this; + if (typeof n != "function") throw new TypeError("Function.prototype.bind called on incompatible " + n); + var i = u.call(arguments, 1), + s = function() { + if (this instanceof s) { + var e = n.apply(this, i.concat(u.call(arguments))); + return Object(e) === e ? 
e : this + } + return n.apply(t, i.concat(u.call(arguments))) + }; + return n.prototype && (r.prototype = n.prototype, s.prototype = new r, r.prototype = null), s + }); + var i = Function.prototype.call, + s = Array.prototype, + o = Object.prototype, + u = s.slice, + a = i.bind(o.toString), + f = i.bind(o.hasOwnProperty), + l, c, h, p, d; + if (d = f(o, "__defineGetter__")) l = i.bind(o.__defineGetter__), c = i.bind(o.__defineSetter__), h = i.bind(o.__lookupGetter__), p = i.bind(o.__lookupSetter__); + if ([1, 2].splice(0) + .length != 2) + if (! function() { + function e(e) { + var t = new Array(e + 2); + return t[0] = t[1] = 0, t + } + var t = [], + n; + t.splice.apply(t, e(20)), t.splice.apply(t, e(26)), n = t.length, t.splice(5, 0, "XXX"), n + 1 == t.length; + if (n + 1 == t.length) return !0 + }()) Array.prototype.splice = function(e, t) { + var n = this.length; + e > 0 ? e > n && (e = n) : e == void 0 ? e = 0 : e < 0 && (e = Math.max(n + e, 0)), e + t < n || (t = n - e); + var r = this.slice(e, e + t), + i = u.call(arguments, 2), + s = i.length; + if (e === n) s && this.push.apply(this, i); + else { + var o = Math.min(t, n - e), + a = e + o, + f = a + s - o, + l = n - a, + c = n - o; + if (f < a) + for (var h = 0; h < l; ++h) this[f + h] = this[a + h]; + else if (f > a) + for (h = l; h--;) this[f + h] = this[a + h]; + if (s && e === c) this.length = c, this.push.apply(this, i); + else { + this.length = c + s; + for (h = 0; h < s; ++h) this[e + h] = i[h] + } + } + return r + }; + else { + var v = Array.prototype.splice; + Array.prototype.splice = function(e, t) { + return arguments.length ? v.apply(this, [e === void 0 ? 0 : e, t === void 0 ? 
this.length - e : t].concat(u.call(arguments, 2))) : [] + } + } + Array.isArray || (Array.isArray = function(t) { + return a(t) == "[object Array]" + }); + var m = Object("a"), + g = m[0] != "a" || !(0 in m); + Array.prototype.forEach || (Array.prototype.forEach = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? this.split("") : n, + i = arguments[1], + s = -1, + o = r.length >>> 0; + if (a(t) != "[object Function]") throw new TypeError; + while (++s < o) s in r && t.call(i, r[s], s, n) + }), Array.prototype.map || (Array.prototype.map = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? this.split("") : n, + i = r.length >>> 0, + s = Array(i), + o = arguments[1]; + if (a(t) != "[object Function]") throw new TypeError(t + " is not a function"); + for (var u = 0; u < i; u++) u in r && (s[u] = t.call(o, r[u], u, n)); + return s + }), Array.prototype.filter || (Array.prototype.filter = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? this.split("") : n, + i = r.length >>> 0, + s = [], + o, u = arguments[1]; + if (a(t) != "[object Function]") throw new TypeError(t + " is not a function"); + for (var f = 0; f < i; f++) f in r && (o = r[f], t.call(u, o, f, n) && s.push(o)); + return s + }), Array.prototype.every || (Array.prototype.every = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? this.split("") : n, + i = r.length >>> 0, + s = arguments[1]; + if (a(t) != "[object Function]") throw new TypeError(t + " is not a function"); + for (var o = 0; o < i; o++) + if (o in r && !t.call(s, r[o], o, n)) return !1; + return !0 + }), Array.prototype.some || (Array.prototype.some = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? 
this.split("") : n, + i = r.length >>> 0, + s = arguments[1]; + if (a(t) != "[object Function]") throw new TypeError(t + " is not a function"); + for (var o = 0; o < i; o++) + if (o in r && t.call(s, r[o], o, n)) return !0; + return !1 + }), Array.prototype.reduce || (Array.prototype.reduce = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? this.split("") : n, + i = r.length >>> 0; + if (a(t) != "[object Function]") throw new TypeError(t + " is not a function"); + if (!i && arguments.length == 1) throw new TypeError("reduce of empty array with no initial value"); + var s = 0, + o; + if (arguments.length >= 2) o = arguments[1]; + else + do { + if (s in r) { + o = r[s++]; + break + } + if (++s >= i) throw new TypeError("reduce of empty array with no initial value") + } while (!0); + for (; s < i; s++) s in r && (o = t.call(void 0, o, r[s], s, n)); + return o + }), Array.prototype.reduceRight || (Array.prototype.reduceRight = function(t) { + var n = F(this), + r = g && a(this) == "[object String]" ? this.split("") : n, + i = r.length >>> 0; + if (a(t) != "[object Function]") throw new TypeError(t + " is not a function"); + if (!i && arguments.length == 1) throw new TypeError("reduceRight of empty array with no initial value"); + var s, o = i - 1; + if (arguments.length >= 2) s = arguments[1]; + else + do { + if (o in r) { + s = r[o--]; + break + } + if (--o < 0) throw new TypeError("reduceRight of empty array with no initial value") + } while (!0); + do o in this && (s = t.call(void 0, s, r[o], o, n)); while (o--); + return s + }); + if (!Array.prototype.indexOf || [0, 1].indexOf(1, 2) != -1) Array.prototype.indexOf = function(t) { + var n = g && a(this) == "[object String]" ? this.split("") : F(this), + r = n.length >>> 0; + if (!r) return -1; + var i = 0; + arguments.length > 1 && (i = H(arguments[1])), i = i >= 0 ? 
i : Math.max(0, r + i); + for (; i < r; i++) + if (i in n && n[i] === t) return i; + return -1 + }; + if (!Array.prototype.lastIndexOf || [0, 1].lastIndexOf(0, -3) != -1) Array.prototype.lastIndexOf = function(t) { + var n = g && a(this) == "[object String]" ? this.split("") : F(this), + r = n.length >>> 0; + if (!r) return -1; + var i = r - 1; + arguments.length > 1 && (i = Math.min(i, H(arguments[1]))), i = i >= 0 ? i : r - Math.abs(i); + for (; i >= 0; i--) + if (i in n && t === n[i]) return i; + return -1 + }; + Object.getPrototypeOf || (Object.getPrototypeOf = function(t) { + return t.__proto__ || (t.constructor ? t.constructor.prototype : o) + }); + if (!Object.getOwnPropertyDescriptor) { + var y = "Object.getOwnPropertyDescriptor called on a non-object: "; + Object.getOwnPropertyDescriptor = function(t, n) { + if (typeof t != "object" && typeof t != "function" || t === null) throw new TypeError(y + t); + if (!f(t, n)) return; + var r, i, s; + r = { + enumerable: !0, + configurable: !0 + }; + if (d) { + var u = t.__proto__; + t.__proto__ = o; + var i = h(t, n), + s = p(t, n); + t.__proto__ = u; + if (i || s) return i && (r.get = i), s && (r.set = s), r + } + return r.value = t[n], r + } + } + Object.getOwnPropertyNames || (Object.getOwnPropertyNames = function(t) { + return Object.keys(t) + }); + if (!Object.create) { + var b; + Object.prototype.__proto__ === null ? 
b = function() { + return { + __proto__: null + } + } : b = function() { + var e = {}; + for (var t in e) e[t] = null; + return e.constructor = e.hasOwnProperty = e.propertyIsEnumerable = e.isPrototypeOf = e.toLocaleString = e.toString = e.valueOf = e.__proto__ = null, e + }, Object.create = function(t, n) { + var r; + if (t === null) r = b(); + else { + if (typeof t != "object") throw new TypeError("typeof prototype[" + typeof t + "] != 'object'"); + var i = function() {}; + i.prototype = t, r = new i, r.__proto__ = t + } + return n !== void 0 && Object.defineProperties(r, n), r + } + } + if (Object.defineProperty) { + var E = w({}), + S = typeof document == "undefined" || w(document.createElement("div")); + if (!E || !S) var x = Object.defineProperty + } + if (!Object.defineProperty || x) { + var T = "Property description must be an object: ", + N = "Object.defineProperty called on non-object: ", + C = "getters & setters can not be defined on this javascript engine"; + Object.defineProperty = function(t, n, r) { + if (typeof t != "object" && typeof t != "function" || t === null) throw new TypeError(N + t); + if (typeof r != "object" && typeof r != "function" || r === null) throw new TypeError(T + r); + if (x) try { + return x.call(Object, t, n, r) + } catch (i) {} + if (f(r, "value")) + if (d && (h(t, n) || p(t, n))) { + var s = t.__proto__; + t.__proto__ = o, delete t[n], t[n] = r.value, t.__proto__ = s + } else t[n] = r.value; + else { + if (!d) throw new TypeError(C); + f(r, "get") && l(t, n, r.get), f(r, "set") && c(t, n, r.set) + } + return t + } + } + Object.defineProperties || (Object.defineProperties = function(t, n) { + for (var r in n) f(n, r) && Object.defineProperty(t, r, n[r]); + return t + }), Object.seal || (Object.seal = function(t) { + return t + }), Object.freeze || (Object.freeze = function(t) { + return t + }); + try { + Object.freeze(function() {}) + } catch (k) { + Object.freeze = function(t) { + return function(n) { + return typeof n == 
"function" ? n : t(n) + } + }(Object.freeze) + } + Object.preventExtensions || (Object.preventExtensions = function(t) { + return t + }), Object.isSealed || (Object.isSealed = function(t) { + return !1 + }), Object.isFrozen || (Object.isFrozen = function(t) { + return !1 + }), Object.isExtensible || (Object.isExtensible = function(t) { + if (Object(t) === t) throw new TypeError; + var n = ""; + while (f(t, n)) n += "?"; + t[n] = !0; + var r = f(t, n); + return delete t[n], r + }); + if (!Object.keys) { + var L = !0, + A = ["toString", "toLocaleString", "valueOf", "hasOwnProperty", "isPrototypeOf", "propertyIsEnumerable", "constructor"], + O = A.length; + for (var M in { + toString: null + }) L = !1; + Object.keys = function I(e) { + if (typeof e != "object" && typeof e != "function" || e === null) throw new TypeError("Object.keys called on a non-object"); + var I = []; + for (var t in e) f(e, t) && I.push(t); + if (L) + for (var n = 0, r = O; n < r; n++) { + var i = A[n]; + f(e, i) && I.push(i) + } + return I + } + } + Date.now || (Date.now = function() { + return (new Date) + .getTime() + }); + var _ = " \n \f\r \u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u202f\u205f\u3000\u2028\u2029\ufeff"; + if (!String.prototype.trim || _.trim()) { + _ = "[" + _ + "]"; + var D = new RegExp("^" + _ + _ + "*"), + P = new RegExp(_ + _ + "*$"); + String.prototype.trim = function() { + return String(this) + .replace(D, "") + .replace(P, "") + } + } + var F = function(e) { + if (e == null) throw new TypeError("can't convert " + e + " to object"); + return Object(e) + } +}) \ No newline at end of file diff --git a/doc/targets.md b/doc/targets.md new file mode 100644 index 000000000..170431d93 --- /dev/null +++ b/doc/targets.md @@ -0,0 +1,23 @@ +# Runtime Libraries and Code Generation Targets + +This page lists the available and upcoming ANTLR runtimes. Please note that you won't find here language specific code generators. 
This is because there is a single tool, written in Java, that can generate lexer and parser code for all targets through command-line options. The tool can be invoked from the command line, or from an integration plugin for popular IDEs and build systems: Eclipse, IntelliJ, Visual Studio, Maven. So whatever your environment and target are, you should be able to run the tool and produce code in the target language. As of this writing, the available targets are the following: + +* [Java](java-target.md)
+The [ANTLR v4 book](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) has a decent summary of the runtime library. We have added a useful XPath feature since the book was printed that lets you select bits of parse trees. +
[Runtime API](http://www.antlr.org/api/Java/index.html) +
See [Getting Started with ANTLR v4](getting-started.md) + +* [C#](csharp-target.md) +* [Python](python-target.md) (2 and 3) +* [JavaScript](javascript-target.md) +* Swift (not yet available) +* C++ (not yet available) + +## Target feature parity + +New features generally appear in the Java target and then migrate to the other targets, but these other targets don't always get updated in the same overall tool release. This section tries to identify features added to Java that have not been added to the other targets. + +|Feature|Java|C♯|JavaScript|Python2|Python3|Swift|C++| +|---|---|---|---|---|---|---|---| +|Ambiguous tree construction|4.5.1|-|-|-|-|-|-| + diff --git a/doc/tool-options.md b/doc/tool-options.md new file mode 100644 index 000000000..fe5d4a399 --- /dev/null +++ b/doc/tool-options.md @@ -0,0 +1,161 @@ +# ANTLR Tool Command Line Options + +If you invoke the ANTLR tool without command line arguments, you’ll get a help message: + +```bash +$ antlr4 +ANTLR Parser Generator Version 4.5 + -o ___ specify output directory where all output is generated + -lib ___ specify location of grammars, tokens files + -atn generate rule augmented transition network diagrams + -encoding ___ specify grammar file encoding; e.g., euc-jp + -message-format ___ specify output style for messages in antlr, gnu, vs2005 + -long-messages show exception details when available for errors and warnings + -listener generate parse tree listener (default) + -no-listener don't generate parse tree listener + -visitor generate parse tree visitor + -no-visitor don't generate parse tree visitor (default) + -package ___ specify a package/namespace for the generated code + -depend generate file dependencies + -D