Merge pull request #695 from parrt/prec-filter-comments

add parrt summary of conversation with Sam about precedence DFA optimization
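
For readers skimming the diff, the basic filter rule from the comment (drop (p, 2, pi) when (p, 1, pi) exists, unless the suppress flag is set) can be sketched in isolation. This is only an illustration under stated assumptions, not the ParserATNSimulator code: `PrecedenceFilterSketch`, `Config`, and `filter` are hypothetical names, contexts are plain strings, and the precedence predicates are assumed to have already evaluated to true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrecedenceFilterSketch {
    /** Hypothetical stand-in for an ATN configuration (state, alt, context). */
    static final class Config {
        final int state;
        final int alt;
        final String context;
        // Stand-in for the SUPPRESS_PRECEDENCE_FILTER bit described in the comment.
        final boolean suppressFilter;

        Config(int state, int alt, String context, boolean suppressFilter) {
            this.state = state;
            this.alt = alt;
            this.context = context;
            this.suppressFilter = suppressFilter;
        }

        @Override
        public String toString() {
            return "(" + state + "," + alt + "," + context + ")";
        }
    }

    /** Removes alt != 1 configs shadowed by an alt 1 config with the same state and context. */
    static List<Config> filter(List<Config> configs) {
        // Pass 1: remember every (state, context) pair reached through alt 1 (enter the loop).
        Set<String> enterKeys = new HashSet<>();
        for (Config c : configs) {
            if (c.alt == 1) {
                enterKeys.add(c.state + "|" + c.context);
            }
        }
        // Pass 2: keep a config unless it is an unsuppressed duplicate of an alt 1 config.
        List<Config> result = new ArrayList<>();
        for (Config c : configs) {
            boolean shadowed = c.alt != 1
                    && !c.suppressFilter
                    && enterKeys.contains(c.state + "|" + c.context);
            if (!shadowed) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the 1+2+3 example from the comment, with predicates already stripped.
        List<Config> input = Arrays.asList(
                new Config(11, 1, "[$]", false),
                new Config(14, 1, "[$]", false),
                new Config(5, 2, "[$]", false),
                new Config(11, 2, "[$]", false),
                new Config(14, 2, "[$]", false));
        System.out.println(filter(input)); // prints [(11,1,[$]), (14,1,[$]), (5,2,[$])]
    }
}
```

Setting `suppressFilter` on an exit-branch config keeps it in the result, which is the behavior the comment attributes to Sam's fix for configs that returned from a precedence-0 invocation.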
2014-09-01 13:16:23 -07:00 · 2014-09-01 13:16:23 -07:00 · 6e869b3e80
parent 7bf47e1670 fd194f073b
commit 6e869b3e80
1 changed files with 107 additions and 0 deletions
--- a/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java
+++ b/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java
@@ -992,6 +992,113 @@ public class ParserATNSimulator extends ATNSimulator {
 		return configs;
 	}

+	/* parrt internal source braindump that doesn't mess up
+	 * external API spec.
+
+		applyPrecedenceFilter is an optimization to avoid highly
+		nonlinear prediction of expressions and other left recursive
+		rules. The precedence predicates such as {3>=prec}? are highly
+		context-sensitive in that they can only be properly evaluated
+		in the context of the proper prec argument. Without pruning,
+		these predicates are normal predicates evaluated when we reach
+		a conflict state (or unique prediction). As we cannot evaluate
+		these predicates out of context, the resulting conflict leads
+		to full LL evaluation and nonlinear prediction which shows up
+		very clearly with fairly large expressions.
+
+		Example grammar:
+
+		e : e '*' e
+		  | e '+' e
+		  | INT
+		  ;
+
+		We convert that to the following:
+
+		e[int prec]
+			:   INT
+				( {3>=prec}? '*' e[4]
+				| {2>=prec}? '+' e[3]
+				)*
+			;
+
+		The (..)* loop has a decision for the inner block as well as
+		an enter or exit decision, which is what concerns us here. At
+		the 1st + of input 1+2+3, the loop entry sees both predicates
+		and the loop exit also sees both predicates by falling off the
+		edge of e.  This is because we have no stack information with
+		SLL and find the follow of e, which will hit the return states
+		inside the loop after e[4] and e[3], which brings it back to
+		the enter or exit decision. In this case, we know that we
+		cannot evaluate those predicates because we have fallen off
+		the edge of the stack and will in general not know which prec
+		parameter is the right one to use in the predicate.
+
+		Because we have special information, that these are precedence
+		predicates, we can resolve them without failing over to full
+		LL despite their context-sensitive nature. We make an
+		assumption that prec[-1] <= prec[0], meaning that the current
+		precedence level is greater than or equal to the precedence
+		level of recursive invocations above us in the stack. For
+		example, if predicate {3>=prec}? is true of the current prec,
+		then one option is to enter the loop to match it now. The
+		other option is to exit the loop and the left recursive rule
+		to match the current operator in a rule invocation further up
+		the stack. But we know that all of those prec values are lower
+		or the same, so we can decide to enter the loop instead
+		of matching it later. That means we can strip out the other
+		configuration for the exit branch.
+
+		So imagine we have (14,1,$,{2>=prec}?) and then
+		(14,2,$-dipsIntoOuterContext,{2>=prec}?). The optimization
+		allows us to collapse these two configurations. We know that
+		if {2>=prec}? is true for the current prec parameter, it will
+		also be true for any prec from an invoking e call, indicated
+		by dipsIntoOuterContext. As the predicates are both true, we
+		have the option to evaluate them early in the decision start
+		state. We do this by stripping both predicates and choosing to
+		enter the loop as it is consistent with the notion of operator
+		precedence. It's also how the full LL conflict resolution
+		would work.
+
+		The solution requires a different DFA start state for each
+		precedence level.
+
+		The basic filter mechanism is to remove configurations of the
+		form (p, 2, pi) if (p, 1, pi) exists for the same p and pi. In
+		other words, for the same ATN state and predicate context,
+		remove any configuration associated with an exit branch if
+		there is a configuration associated with the enter branch.
+
+		It's also the case that the filter evaluates precedence
+		predicates and resolves conflicts according to precedence
+		levels. For example, for input 1+2+3 at the first +, we see
+		prediction filtering
+
+		[(11,1,[$],{3>=prec}?), (14,1,[$],{2>=prec}?), (5,2,[$],up=1),
+		 (11,2,[$],up=1), (14,2,[$],up=1)],hasSemanticContext=true,dipsIntoOuterContext
+
+		to
+
+		[(11,1,[$]), (14,1,[$]), (5,2,[$],up=1)],dipsIntoOuterContext
+
+		This filters because {3>=prec}? evaluates to true and collapses
+		(11,1,[$],{3>=prec}?) and (11,2,[$],up=1) since early conflict
+		resolution based upon rules of operator precedence fits with
+		our usual match first alt upon conflict.
+
+		We noticed a problem where a recursive call resets precedence
+		to 0. Sam's fix: each config has a flag indicating whether it
+		has returned from an expr[0] call; we then simply don't filter
+		any config with that flag set. The flag is carried along in
+		closure(). To avoid adding a field, we set the bit just under
+		the sign bit of dipsIntoOuterContext
+		(SUPPRESS_PRECEDENCE_FILTER). With the change, you filter
+		"unless (p, 2, pi) was reached after leaving the rule stop
+		state of the LR rule containing state p, corresponding to a
+		rule invocation with precedence level 0".
+	 */
+	
 	/**
 	 * This method transforms the start state computed by
 	 * {@link #computeStartState} to the special start state used by a