add large comment describing the decision rule for continuing analysis.

2012-09-09 17:55:52 -07:00 · 2012-09-09 17:55:52 -07:00 · 3c5dec1172
parent 0ac781aebf
commit 3c5dec1172
1 changed files with 120 additions and 0 deletions
--- a/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java
+++ b/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java
@ -653,6 +653,7 @@ public class ParserATNSimulator<Symbol extends Token> extends ATNSimulator {
 				D.prediction = predictedAlt;
 			}
 			else {
+				boolean cont = needMoreLookaheadSLL(reach);
 				D.configs.conflictingAlts = getConflictingAlts(reach);
 				if ( D.configs.conflictingAlts!=null ) {
 					if ( greedy ) {
@ -1335,6 +1336,125 @@ public class ParserATNSimulator<Symbol extends Token> extends ATNSimulator {
 		return new ATNConfig(config, t.target, newContext);
 	}

+	/**
+	 SLL analysis termination.
+
+	 */
+	public boolean needMoreLookaheadSLL(@NotNull ATNConfigSet configs) {
+		return false;
+	}
+
+	/**
+	 Full LL analysis termination.
+
+	 Can we stop looking ahead during ATN simulation or is there some
+	 uncertainty as to which alternative we will ultimately pick, after
+	 consuming more input?  Even if there are partial conflicts, we might
+	 know that everything is going to resolve to the same minimum
+	 alt. That means we can stop since no more lookahead will change that
+	 fact. On the other hand, there might be multiple conflicts that
+	 resolve to different minimums.  That means we need more look ahead to
+	 decide which of those alternatives we should predict.
+
+	 The basic idea is to split the set of configurations down into a set
+	 with all configs that don't conflict with each other and then
+	 (possibly multiple) subsets of configurations that are mutually
+	 conflicting. Then reduce each set of configurations to the set of
+	 possible alternatives.  This is just the set of all alternatives
+	 within the nonconflicting set. It is the minimum alt of each
+	 conflicting subset. If the union of these alternatives sets is a
+	 singleton, then no amount of more lookahead will help us. We will
+	 always pick that alternative. If, however, there are more
+	 alternatives then we are uncertain which alt to predict and must
+	 continue looking for resolution. We may or may not discover an
+	 ambiguity in the future, even if there are no conflicting subsets
+	 this round.
+
+	 The biggest sin is to terminate early because it means we've made a
+	 decision but were uncertain as to the eventual outcome. We haven't
+	 used enough lookahead.  On the other hand, announcing a conflict too
+	 late is no big deal; you will still have the conflict. It's just
+	 inefficient.
+
+	 Semantic predicates for full LL aren't involved in this decision
+	 because the predicates are evaluated during start state computation.
+	 This set of configurations was derived from the initial subset with
+	 configurations holding false predicate stripped out.
+
+	 CONFLICTING CONFIGS
+
+	 Two configurations, (s, i, x) and (s, j, x'), conflict when i!=j but
+	 x = x'. Because we merge all (s, i, _) configurations together, that
+	 means that there are at most n configurations associated with state s
+	 for n possible alternatives in the decision. The merged stacks
+	 complicate the comparison of config contexts, x and x'. Sam checks to
+	 see if one is a subset of the other by calling merge and checking to
+	 see if the merged result is either x or x'. If the x associated with
+	 lowest alternative i is the superset, then i is the only possible
+	 prediction since the others resolve to min i as well. If, however, x
+	 is associated with j>i then at least one stack configuration for j is
+	 not in conflict with alt i. The algorithm should keep going, looking
+	 for more lookahead due to the uncertainty.
+
+	 For simplicity, I'm doing a simple equality check between x and x'
+	 that lets the algorithm continue to consume lookahead longer than
+	 necessary. I will check to see if it is efficient enough. The reason
+	 I like the simple equality is of course the simplicity but also
+	 because that is the test you need to detect the alternatives that are
+	 actually in conflict.  If all states report the same conflicting alt
+	 set, then we know we have the real ambiguity set.
+
+	 CONTINUE/STOP RULE
+
+	 Ok, here's the decision to keep going: continue if union of resolved
+	 alt sets from nonconflicting and conflicting subsets has more than
+	 one alt. The complete set of alternatives, [i for (_,i,_)], tells us
+	 which alternatives are still in the running for the amount of input
+	 we've consumed at this point. The conflicting sets let us to strip
+	 away configurations that won't lead to more states (because we
+	 resolve conflicts to the configuration with a minimum alternate for
+	 given conflicting set.)  The set of viable alternatives for a
+	 configuration set is therefore a subset of the complete [i for
+	 (_,i,_)] set.
+
+	 CASES:
+
+	 * no conflicts & > 1 alt in set => continue
+
+	 * (s, 1, x), (s, 2, x), (s, 3, z)
+	   (s', 1, y), (s', 2, y)
+	   yields nonconflicting set {3} U conflicting sets {1} U {1} = {1,3}
+	     => continue
+
+	 * (s, 1, x), (s, 2, x),
+	   (s', 1, y), (s', 2, y)
+	   (s'', 1, z)
+	   yields nonconflicting set you this {1} U conflicting sets {1} U {1} = {1}
+	     => stop and predict 1, announce ambiguity {1,2}
+
+	 * (s, 1, x), (s, 2, x),
+	   (s', 1, y), (s', 2, y)
+	   yields conflicting sets {1} U {1} = {1}
+	     => stop and predict 1, announce ambiguity {1,2}
+
+	 * (s, 1, x), (s, 2, x)
+	   (s', 2, y), (s', 3, y)
+	   yields conflicting sets {1} U {2} = {1,2}
+	     => continue
+
+	 * (s, 1, x), (s, 2, x)
+	   (s', 3, y), (s', 4, y)
+	   yields conflicting sets {1} U {3} = {1,3}
+	     => continue
+
+	 The function implementation tries to bail out as soon as
+	 possible. That means we can stop as soon as our final alt set gets
+	 more than a single alternative in it.
+	 */
+	public boolean needMoreLookaheadLL(@NotNull ATNConfigSet configs) {
+		return false;
+	}
+
 	/**
 	 * From grammar: