added big comments

[git-p4: depot-paths = "//depot/code/antlr4/main/": change = 9565]
2011-12-12 16:33:10 -08:00 · 2011-12-12 16:33:10 -08:00 · 8f7fb98e16
parent b9afdf6e07
commit 8f7fb98e16
1 changed files with 85 additions and 1 deletions
--- a/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java
+++ b/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java
@@ -43,6 +43,57 @@ import org.stringtemplate.v4.misc.MultiMap;

 import java.util.*;

+/**
+ The embodiment of the adaptive LL(*) parsing strategy.
+
+ The basic complexity of the adaptive strategy makes it harder to
+ understand. We begin with ATN simulation to build paths in a
+ DFA. Subsequent prediction requests go through the DFA first. If
+ they reach a state without an edge for the current symbol, the
+ algorithm fails over to the ATN simulation to complete the DFA
+ path for the current input (until it finds a conflict state or
+ uniquely predicting state).
+
+ All of that is done without using the outer context because we
+ want to create a DFA that is not dependent upon the rule
+ invocation stack when we do a prediction.  One DFA works in all
+ contexts. We avoid using context not necessarily because it is
+ slower, although it can be, but because of the DFA caching
+ problem.  The closure routine only considers the rule invocation
+ stack created during prediction beginning in the entry rule.  For
+ example, if prediction occurs without invoking another rule's
+ ATN, there are no context stacks in the configurations. When this
+ leads to a conflict, we don't know if it's an ambiguity or a
+ weakness in the strong LL(*) parsing strategy (versus full
+ LL(*)).
+
+ So, we simply retry the ATN simulation, this time
+ using full outer context and filling a dummy DFA (to avoid
+ polluting the context insensitive DFA). Configuration context
+ stacks will be the full invocation stack from the start rule. If
+ we get a conflict using full context, then we can definitively
+ say we have a true ambiguity for that input sequence. If we don't
+ get a conflict, it implies that the decision is sensitive to the
+ outer context. (It is not context-sensitive in the sense of
+ context sensitive grammars.) We create a special DFA accept state
+ that maps rule context to a predicted alternative. That is the
+ only modification needed to handle full LL(*) prediction. In
+ general, full context prediction will use more lookahead than
+ necessary, but it pays to share the same DFA. For a sketch of a
+ proof that full context prediction uses at most the same amount
+ of lookahead as a context insensitive prediction, see the comment
+ on method retryWithContext().
+
+ So, the strategy is complex because we bounce back and forth from
+ the ATN to the DFA, simultaneously performing predictions and
+ extending the DFA according to previously unseen input
+ sequences. The retry with full context is naturally a recursive call
+ to the same function because it does the same thing, just with a
+ different initial context. The problem is that we need to pass
+ in a "full context mode" parameter so that it knows to report
+ conflicts differently. It also knows not to do a retry, to avoid
+ infinite recursion, if it is already using full context.
+ */
 public class ParserATNSimulator<Symbol> extends ATNSimulator {
 	public static boolean debug = false;
 	public static boolean dfa_debug = false;
@@ -222,7 +273,7 @@ public class ParserATNSimulator<Symbol> extends ATNSimulator {
 			}
 			// if no edge, pop over to ATN interpreter, update DFA and return
 			if ( s.edges == null || t >= s.edges.length || t < -1 || s.edges[t+1] == null ) {
-				if ( dfa_debug ) System.out.println("no edge for "+parser.getTokenNames()[t]);
+				if ( dfa_debug && t>=0 ) System.out.println("no edge for "+parser.getTokenNames()[t]);
 				int alt = -1;
 				if ( dfa_debug ) {
 					System.out.println("ATN exec upon "+
@@ -285,6 +336,39 @@ public class ParserATNSimulator<Symbol> extends ATNSimulator {
 		return prevAcceptState.prediction;
 	}

+	/** Performs ATN simulation to compute a predicted alternative based
+	 *  upon the remaining input, but also updates the DFA cache to avoid
+	 *  having to traverse the ATN again for the same input sequence.
+
+	 There are some key conditions we're looking for after computing a new
+	 set of ATN configs (proposed DFA state):
+	       * if the set is empty, there is no viable alternative for current symbol
+	       * does the state uniquely predict an alternative?
+	       * does the state have a conflict that would prevent us from
+	         putting it on the work list?
+	       * if in a non-greedy decision, is there a config at a rule stop state?
+
+	 We also have some key operations to do:
+	       * add an edge from previous DFA state to potentially new DFA state, D,
+	         upon current symbol but only if adding to work list, which means in all
+	         cases except no viable alternative (and possibly non-greedy decisions?)
+	       * collecting predicates and adding semantic context to DFA accept states
+	       * adding rule context to context-sensitive DFA accept states
+	       * consuming an input symbol
+	       * reporting a conflict
+	       * reporting an ambiguity
+	       * reporting a context sensitivity
+	       * reporting insufficient predicates
+
+	 We should isolate those operations, which are side-effecting, to the
+	 main work loop. We can isolate lots of code into other functions, but
+	 they should be side effect free. They can return a package that
+	 indicates whether we should report something, whether we need to add a
+	 DFA edge, whether we need to augment an accept state with semantic
+	 context or rule invocation context. Actually, it seems like we always
+	 add predicates if they exist, so that can simply be done in the main
+	 loop for any accept state creation or modification request.
+	 */
 	public int execATN(@NotNull SymbolStream<Symbol> input,
 					   @NotNull DFA dfa,
 					   int startIndex,