Next: yacc Up: Syntax Analysis Previous: Table-Driven Parsing

LR Parsers

Introduction

Recognize CFGs for most programming languages. No need to rewrite grammar.
Most general O(n) shift-reduce method.
Optimal syntax error detection.
LR grammars superset of LL grammars.

Bottom-Up Parsing

Construct parse tree from the leaves to the root.
Corresponds to a rightmost derivation in reverse.

Rightmost derivation corresponding to Figure 4.25.

Handles during parsing of Figure 4.25.

Definition of "handle" $\beta$ :

The leftmost simple phrase of a right-sentencial form is the handle.

Shift-Reduce Parsing

Parser performs actions based on parse table information:

Shift. Shift the next input symbol onto the top of the stack.
Reduce. The right end of the string to be reduced must be at the top of the stack. Locate the left end of the string within the stack and decide with what nonterminal to replace the string.
Accept. Announce successful completion of parsing.
Error. Discover a syntax error and call an error recovery routine.

Figure 4.28. Shift-reduce parsing on input id*id.

LR(0) Automaton

The LR(0) automaton is a DFA which accepts viable prefixes of right sentencial forms, ending in a handle.
States of the NFA correspond to LR(0) Items.
Use subset construction algorithm to convert NFA to DFA.

LR(0) Item: A grammar rule with a dot ( $\cdot$ ) added between symbols on the RHS
Example: The production rule $A \rightarrow X Y Z$ yields four items:

$A \rightarrow \cdot X Y Z$
$A \rightarrow X \cdot Y Z$
$A \rightarrow X Y \cdot Z$
$A \rightarrow X Y Z \cdot$

Items indicate how much of a production has been seen at a given point in parsing process.

$A \rightarrow X \cdot Y Z$ indicates a string derivable from

appears on input and a string derivable from

is expected on input.
$A \rightarrow X Y Z \cdot$ indicates input derivable from the RHS

has been seen and a reduction may occur.

Kernel Items: All items whose dots are not at the beginning of the RHS, plus the augmented initial item $S' \rightarrow \cdot S$ .
Nonkernel Items: Items with dots at beginning of RHS, except $S' \rightarrow \cdot S$ .

NFA of LR(0) Items: formed using each item as a state of NFA with transitions corresponding to movement of dots by one symbol.
Items with a dot preceding a nonterminal have epsilon transitions to all items beginning with that nonterminal.

LR(0) Automaton is the DFA formed by subset construction of the LR(0) NFA.

Expression grammar (4.1) produces the following LR(0) Automaton:

Nonkernel items are generated from kernel items using the closure operation
and are shown shaded.

LR(0) Parse Table

Form NFA of LR(0) Items.
Construct LR(0) Automaton (DFA) using subset construction.
Construct LR Parse Table from DFA.
LR Parse Table contains Action and GoTo sections.

Action Table:

If $A \rightarrow \alpha \cdot a \beta \in I_i$ and , for terminal , then = shift .
For a complete item $A \rightarrow \alpha \cdot \in I_i$ , = reduce $[A\rightarrow\alpha]$ for all terminals .
If $S' \rightarrow S \cdot \in I_i$ , then $Action[i,\$]$ = accept.
If no conflicts exist in the parse table, the grammar is LR(0).

GoTo Table:

becomes for all nonterminals .

Note: The expression grammar (4.1) is not LR(0), due to shift/reduce conflicts in states 2 & 9 of the LR(0) automaton.

SLR Parse Table

Use Follow() set to mitigate shift/reduce conflicts.
Step #2 in the LR(0) parse table construction is modified as follows:
For a complete item $A \rightarrow \alpha \cdot \in I_i$ , = reduce $[A\rightarrow\alpha]$ for all terminals in Follow(), ( $A \neq S'$ ).

Note: The expression grammar (4.1) is SLR, using the parse table in Figure 4.37.

Figure 4.34: SLR parsing actions for input id*id.

Ambiguous Grammars

Ambiguous grammar can't be LR.
LR proves grammar unambiguous.
Resolve parse table conflicts by disambiguating rules.
Control operator precedence/associativity in ambiguous grammar.

Dangling-Else Ambiguity

Abstraction of if-then-else grammar (4.67):

$S' \rightarrow S$
$S \rightarrow i S e S \mid i S \mid a$

Note: Figure 4.50 is missing the item $S \rightarrow i S \cdot$ in state

Figure 4.51: LR parsing table for dangling-else grammar (4.67).

Note: Conflict in state 4 on input e between s5/r2 not shown.

Figure 4.52: Parsing actions on input iiaea.

Notes:

In step (5), (state 4, input e), s5 and r2 are both valid.
Choosing s5 gives parse tree with e grouped with 2nd i.
Choosing r2 gives parse tree with e grouped with 1st i.
Resolve conflict in favor of s5, to group else with preceding unmatched if, as shown in top parse tree in Figure 4.9.

LR(1) Grammars

Example 4.48: Abstraction of C assignment and pointer dereference operators, grammar (4.49):

$S \rightarrow L {\tt\bf =} R \mid R$
$L \rightarrow {\tt\bf *} R \mid {\tt\bf id}$
$R \rightarrow L$

Notes:

State 2 contains shift/reduce conflict.
Grammar is not SLR.
Grammar is not ambiguous.
Reduction in state 2 is not valid.

LR(1) Items

Recall SLR parse table:

If contains $A \rightarrow \alpha\cdot$ and a in Follow(A), reduce by $A \rightarrow \alpha$ on input a.
But $\beta\alpha$ on stack may not allow $\beta A$ followed by a. Thus, $A \rightarrow \alpha$ would be invalid.
Must know which terminals can follow $\alpha$ for reduction $A \rightarrow \alpha$ .

LR(1) items add "lookaheads" to rule out invalid reductions.

Example: $[A \rightarrow \alpha \cdot \beta, {\tt\bf a}]$ , LR(1) item with lookahead a. If $\beta \ne \epsilon$ , no change.

If $[A \rightarrow \alpha \cdot, {\tt\bf a}]$ , reduce only on input a. LR(1) lookaheads a subset of Follow(A).

Modify Closure for LR(1) items:

: Consider $[A \rightarrow \alpha \cdot B \beta, {\tt\bf a}]$
: For each $B \rightarrow \gamma$ , we have $[B \rightarrow \cdot \gamma, {\tt\bf b}]$ , ${\tt\bf b}$ in First( $\beta{\tt\bf a}$ ).

Calculate GOTOs only on valid lookahead terminals.

Example 4.54: Consider the augmented grammar (4.55):

: $S' \rightarrow S$
: $S \rightarrow C C$
: $C \rightarrow {\tt\bf c} C \mid {\tt\bf d}$
: Note: Grammar is SLR.

Figure 4.42: LR(1) parse table for grammar (4.55).

LALR Parse Table

Merge LR(1) states with common cores (LR(0) items).
No new shift/reduce conflicts.
May have reduce/reduce conflicts. See Example 4.58.

Consider LR(1) parse table for grammar (4.55).

Combine LR(1) states to form new LALR states:

$I_{47}$ = + .
$I_{36}$ = + .
$I_{89}$ = + .

Notes:

LALR parse table has same number of states as SLR when grammar is SLR.
LALR may take longer to detect error than LR, but will never shift a symbol past the point where LR detects error.

Figure 4.43: LALR parse table for grammar (4.55).

Next: yacc Up: Syntax Analysis Previous: Table-Driven Parsing

CS 631 Class Account 2009-12-03