antlr/tool/playground/R.g4

/**
derived from http://svn.r-project.org/R/trunk/src/main/gram.y
http://cran.r-project.org/doc/manuals/R-lang.html#Parser
*/
grammar R;

// ambig upon a(i)<-  (delayed a bit since ';' could follow--really ambig on "a(i)")

/** ambig since stacks are exact as it loops around; no way to distinguish

    I tried tracking input index in stack to differentiate the 2 invocations
    of expr_or_assign, but that would mean altering the our context from
    the decision-making in expr_or_assign.  Also, later we need to have
    context stacks that are not dependent on input position to reuse them.

    The fact that the recursive version correctly matches the input while the
    looping version does not is a problem. We base the notion of ambiguous
    on the same state, different alternatives, same stack. But, if the
    rule invocation stack does not uniquely indicate context, we are not accurately
    detecting ambiguities. We are detecting ambiguities overzealously.

    We need a way for the context stack or configuration to distinguish between
    iterations of the loop that dive into the same rule such as expr_or_assign*.
    Perhaps the answer is to track iteration number in the configuration:

	(s, alt, ctx, iter#)

    When we reached the state following '<-', say p, in expr then we need

	(p, 1, [expr expr_or_assign prog], 1)
	(p, 2, [expr expr_or_assign prog], 2)

    But, that number would be useful... we might pass through 3 or 4 loops.
    The iteration index really has to be a part of the stack context.
    Perhaps we and an additional stack element as if we were doing the
    recursive version

	prog : expr_or_assign prog | ;

	(p, 1, [expr expr_or_assign prog])
	(p, 2, [expr expr_or_assign prog expr_or_assign prog])

    The "expr expr_or_assign prog" represents the second call back down
    into expr_or_assign like the loop would except that the stack looks different.
    
    Or, we could mark stack references with the loop iteration index.

	(p, 1, [expr expr_or_assign prog])
	(p, 2, [expr expr_or_assign.2 prog])

    This seems reusable as opposed to the input index. It might be complicated
    to track this. In the general case, we would need a mapping from rule
    invocation of rule r to a count, and within a specific rule context. That
    might add a HashMap for every RuleContext. ick. Also, one about the context
    that I create during ATN simulation? I would have to track that as well
    as the generated code in the parser. Rule invocation states would act
    like triggers that would bump account for that target rule in the current ctx.
*/
prog	:	expr_or_assign* ;

/** This one is not ambig since 2nd time into expr_or_assign has different
    context where expr_or_assign* shows same context.
 */
//prog	:	expr_or_assign expr_or_assign ;

// not ambig, context different
//prog	:	expr_or_assign prog | ;

expr_or_assign
	:	expr '=' expr_or_assign
	|	expr	// match ID a, fall out, reenter, match "(i)<-x" via alt 1
        ;

expr : expr_primary ('<-' ID)? ;
expr_primary
    : '(' ID ')'
    | ID '(' ID ')'
    | ID
    ;

/*
expr	:	'(' ID ')'  // and this
	|	expr '<-'<assoc=right> ID
	|	ID '(' ID ')'
 	|	ID
	;
*/

HEX	:	'0' ('x'|'X') HEXDIGIT+ [Ll]? ;

INT	:	DIGIT+ [Ll]? ;

fragment
HEXDIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

FLOAT	:	DIGIT+ '.' DIGIT* EXP? [Ll]?
	|	DIGIT+ EXP? [Ll]?
	|	'.' DIGIT+ EXP? [Ll]?
	;
fragment
DIGIT	:   '0'..'9' ;
fragment
EXP	:   ('E' | 'e') ('+' | '-')? INT ;

COMPLEX	:   INT 'i'
	|   FLOAT 'i'
	;

STRING	:	'"' ( ESC | ~('\\'|'"') )* '"'
	|	'\'' ( ESC | ~('\\'|'\'') )* '\''
	;

fragment
ESC
    :   '\\' ([abtnfrv]|'"'|'\'')
    |   UNICODE_ESCAPE
    |	HEX_ESCAPE
    |   OCTAL_ESCAPE
    ;

fragment
UNICODE_ESCAPE
    :   '\\' 'u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
    |   '\\' 'u' '{' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT '}'
    ;

fragment
OCTAL_ESCAPE
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
HEX_ESCAPE
    :   '\\' HEXDIGIT HEXDIGIT?
    ;

ID      :   '.'? (LETTER|'_'|'.') (LETTER|DIGIT|'_'|'.')*
	|   LETTER (LETTER|DIGIT|'_'|'.')*
	;

fragment
LETTER      :   'a'..'z'|'A'..'Z'|'\u0080'..'\u00FF' ;

USER_OP	    :	'%' .* '%' ;

COMMENT :   '#' .* '\n' {skip();} ;

/** Doesn't handle '\n' correctly. it's context-sensitive */
WS          :   (' '|'\t'|'\n'|'\r')+ {skip();} ;
add some tests [git-p4: depot-paths = "//depot/code/antlr4/main/": change = 9763] 2011-12-29 11:49:37 +08:00			`/**`
			`derived from http://svn.r-project.org/R/trunk/src/main/gram.y`
			`http://cran.r-project.org/doc/manuals/R-lang.html#Parser`
			`*/`
			`grammar R;`

			`// ambig upon a(i)<- (delayed a bit since ';' could follow--really ambig on "a(i)")`

			`/** ambig since stacks are exact as it loops around; no way to distinguish`

			`I tried tracking input index in stack to differentiate the 2 invocations`
			`of expr_or_assign, but that would mean altering the our context from`
			`the decision-making in expr_or_assign. Also, later we need to have`
			`context stacks that are not dependent on input position to reuse them.`

			`The fact that the recursive version correctly matches the input while the`
			`looping version does not is a problem. We base the notion of ambiguous`
			`on the same state, different alternatives, same stack. But, if the`
			`rule invocation stack does not uniquely indicate context, we are not accurately`
			`detecting ambiguities. We are detecting ambiguities overzealously.`

			`We need a way for the context stack or configuration to distinguish between`
			`iterations of the loop that dive into the same rule such as expr_or_assign*.`
			`Perhaps the answer is to track iteration number in the configuration:`

			`(s, alt, ctx, iter#)`

			`When we reached the state following '<-', say p, in expr then we need`

			`(p, 1, [expr expr_or_assign prog], 1)`
			`(p, 2, [expr expr_or_assign prog], 2)`

			`But, that number would be useful... we might pass through 3 or 4 loops.`
			`The iteration index really has to be a part of the stack context.`
			`Perhaps we and an additional stack element as if we were doing the`
			`recursive version`

			`prog : expr_or_assign prog \| ;`

			`(p, 1, [expr expr_or_assign prog])`
			`(p, 2, [expr expr_or_assign prog expr_or_assign prog])`

			`The "expr expr_or_assign prog" represents the second call back down`
			`into expr_or_assign like the loop would except that the stack looks different.`

			`Or, we could mark stack references with the loop iteration index.`

			`(p, 1, [expr expr_or_assign prog])`
			`(p, 2, [expr expr_or_assign.2 prog])`

			`This seems reusable as opposed to the input index. It might be complicated`
			`to track this. In the general case, we would need a mapping from rule`
			`invocation of rule r to a count, and within a specific rule context. That`
			`might add a HashMap for every RuleContext. ick. Also, one about the context`
			`that I create during ATN simulation? I would have to track that as well`
			`as the generated code in the parser. Rule invocation states would act`
			`like triggers that would bump account for that target rule in the current ctx.`
			`*/`
			`prog : expr_or_assign* ;`

			`/** This one is not ambig since 2nd time into expr_or_assign has different`
			`context where expr_or_assign* shows same context.`
			`*/`
			`//prog : expr_or_assign expr_or_assign ;`

			`// not ambig, context different`
			`//prog : expr_or_assign prog \| ;`

			`expr_or_assign`
			`: expr '=' expr_or_assign`
			`\| expr // match ID a, fall out, reenter, match "(i)<-x" via alt 1`
			`;`

			`expr : expr_primary ('<-' ID)? ;`
			`expr_primary`
			`: '(' ID ')'`
			`\| ID '(' ID ')'`
			`\| ID`
			`;`

			`/*`
			`expr : '(' ID ')' // and this`
			`\| expr '<-'<assoc=right> ID`
			`\| ID '(' ID ')'`
			`\| ID`
			`;`
			`*/`

			`HEX : '0' ('x'\|'X') HEXDIGIT+ [Ll]? ;`

			`INT : DIGIT+ [Ll]? ;`

			`fragment`
			`HEXDIGIT : ('0'..'9'\|'a'..'f'\|'A'..'F') ;`

			`FLOAT : DIGIT+ '.' DIGIT* EXP? [Ll]?`
			`\| DIGIT+ EXP? [Ll]?`
			`\| '.' DIGIT+ EXP? [Ll]?`
			`;`
			`fragment`
			`DIGIT : '0'..'9' ;`
			`fragment`
			`EXP : ('E' \| 'e') ('+' \| '-')? INT ;`

			`COMPLEX : INT 'i'`
			`\| FLOAT 'i'`
			`;`

			`STRING : '"' ( ESC \| ~('\\'\|'"') )* '"'`
			`\| '\'' ( ESC \| ~('\\'\|'\'') )* '\''`
			`;`

			`fragment`
			`ESC`
			`: '\\' ([abtnfrv]\|'"'\|'\'')`
			`\| UNICODE_ESCAPE`
			`\| HEX_ESCAPE`
			`\| OCTAL_ESCAPE`
			`;`

			`fragment`
			`UNICODE_ESCAPE`
			`: '\\' 'u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT`
			`\| '\\' 'u' '{' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT '}'`
			`;`

			`fragment`
			`OCTAL_ESCAPE`
			`: '\\' ('0'..'3') ('0'..'7') ('0'..'7')`
			`\| '\\' ('0'..'7') ('0'..'7')`
			`\| '\\' ('0'..'7')`
			`;`

			`fragment`
			`HEX_ESCAPE`
			`: '\\' HEXDIGIT HEXDIGIT?`
			`;`

			`ID : '.'? (LETTER\|'_'\|'.') (LETTER\|DIGIT\|'_'\|'.')*`
			`\| LETTER (LETTER\|DIGIT\|'_'\|'.')*`
			`;`

			`fragment`
			`LETTER : 'a'..'z'\|'A'..'Z'\|'\u0080'..'\u00FF' ;`

			`USER_OP : '%' .* '%' ;`

			`COMMENT : '#' .* '\n' {skip();} ;`

			`/** Doesn't handle '\n' correctly. it's context-sensitive */`
			`WS : (' '\|'\t'\|'\n'\|'\r')+ {skip();} ;`