next up previous
Next: yacc Up: Syntax Analysis Previous: Table-Driven Parsing

LR Parsers

Introduction


Bottom-Up Parsing

  • Construct parse tree from the leaves to the root.
  • Corresponds to a rightmost derivation in reverse.
Fig4_25.gif
Rightmost derivation corresponding to Figure 4.25.
Fig4_25a.gif
Handles during parsing of Figure 4.25.
Fig4_26.gif
Definition of "handle" $\beta$:
Fig4_26a.gif
The leftmost simple phrase of a right-sentencial form is the handle.
Fig4_27.gif

Shift-Reduce Parsing

Parser performs actions based on parse table information:

  1. Shift. Shift the next input symbol onto the top of the stack.
  2. Reduce. The right end of the string to be reduced must be at the top of the stack. Locate the left end of the string within the stack and decide with what nonterminal to replace the string.
  3. Accept. Announce successful completion of parsing.
  4. Error. Discover a syntax error and call an error recovery routine.


Figure 4.28. Shift-reduce parsing on input id*id.

Fig4_28.gif
Fig4_35.gif
Fig4_36.gif
Fig4_37.gif

LR(0) Automaton

  • The LR(0) automaton is a DFA which accepts viable prefixes of right sentencial forms, ending in a handle.
  • States of the NFA correspond to LR(0) Items.
  • Use subset construction algorithm to convert NFA to DFA.

LR(0) Item: A grammar rule with a dot ($\cdot$) added between symbols on the RHS
Example: The production rule $A \rightarrow X Y Z$ yields four items:

$A \rightarrow \cdot X Y Z$
$A \rightarrow X \cdot Y Z$
$A \rightarrow X Y \cdot Z$
$A \rightarrow X Y Z \cdot$

Items indicate how much of a production has been seen at a given point in parsing process.

$A \rightarrow X \cdot Y Z$ indicates a string derivable from $X$ appears on input and a string derivable from $Y Z$ is expected on input.
$A \rightarrow X Y Z \cdot$ indicates input derivable from the RHS $X Y Z$ has been seen and a reduction may occur.

Kernel Items: All items whose dots are not at the beginning of the RHS, plus the augmented initial item $S' \rightarrow \cdot S$.
Nonkernel Items: Items with dots at beginning of RHS, except $S' \rightarrow \cdot S$.

NFA of LR(0) Items: formed using each item as a state of NFA with transitions corresponding to movement of dots by one symbol.
Items with a dot preceding a nonterminal have epsilon transitions to all items beginning with that nonterminal.

LR(0) Automaton is the DFA formed by subset construction of the LR(0) NFA.

Expression grammar (4.1) produces the following LR(0) Automaton:

Fig4_31.gif

Nonkernel items are generated from kernel items using the closure operation
and are shown shaded.

LR(0) Parse Table

  • Form NFA of LR(0) Items.
  • Construct LR(0) Automaton (DFA) using subset construction.
  • Construct LR Parse Table from DFA.
  • LR Parse Table contains Action and GoTo sections.

Action Table:
  1. If $A \rightarrow \alpha \cdot a \beta \in I_i$ and $GoTo(I_i,a)=I_j$, for terminal $a$, then $Action[i,a]$ = shift $j$.
  2. For a complete item $A \rightarrow \alpha \cdot \in I_i$, $Action[i,a]$ = reduce $[A\rightarrow\alpha]$ for all terminals $a$.
  3. If $S' \rightarrow S \cdot \in I_i$, then $Action[i,\$]$ = accept.
  4. If no conflicts exist in the parse table, the grammar is LR(0).

GoTo Table:
  1. $GoTo[I_i,A] = I_j$ becomes $GoTo[i,A] = j$ for all nonterminals $A$.

Note: The expression grammar (4.1) is not LR(0), due to shift/reduce conflicts in states 2 & 9 of the LR(0) automaton.

SLR Parse Table

  • Use Follow() set to mitigate shift/reduce conflicts.
  • Step #2 in the LR(0) parse table construction is modified as follows:
  • For a complete item $A \rightarrow \alpha \cdot \in I_i$, $Action[i,a]$ = reduce $[A\rightarrow\alpha]$ for all terminals $a$ in Follow($A$), ($A \neq S'$).

Note: The expression grammar (4.1) is SLR, using the parse table in Figure 4.37.

Figure 4.34: SLR parsing actions for input id*id.
Fig4_34.gif

Ambiguous Grammars

  • Ambiguous grammar can't be LR.
  • LR proves grammar unambiguous.
  • Resolve parse table conflicts by disambiguating rules.
  • Control operator precedence/associativity in ambiguous grammar.

Dangling-Else Ambiguity

Abstraction of if-then-else grammar (4.67):

$S' \rightarrow S$
$S \rightarrow i S e S \mid i S \mid a$

Fig4_50.gif

Note: Figure 4.50 is missing the item $S \rightarrow i S \cdot$ in state $I_4$.

Figure 4.51: LR parsing table for dangling-else grammar (4.67).

Fig4_51.gif

Note: Conflict in state 4 on input e between s5/r2 not shown.

Figure 4.52: Parsing actions on input iiaea.

Fig4_52.gif

Notes:
  • In step (5), (state 4, input e), s5 and r2 are both valid.
  • Choosing s5 gives parse tree with e grouped with 2nd i.
  • Choosing r2 gives parse tree with e grouped with 1st i.
  • Resolve conflict in favor of s5, to group else with preceding unmatched if, as shown in top parse tree in Figure 4.9.

Fig4_9.gif


LR(1) Grammars

Example 4.48: Abstraction of C assignment and pointer dereference operators, grammar (4.49):

$S \rightarrow L {\tt\bf =} R \mid R $
$L \rightarrow {\tt\bf *} R \mid {\tt\bf id} $
$R \rightarrow L$

Fig4_39.gif

Notes:
  • State 2 contains shift/reduce conflict.
  • Grammar is not SLR.
  • Grammar is not ambiguous.
  • Reduction in state 2 is not valid.

LR(1) Items


Recall SLR parse table:
  • If $I_i$ contains $A \rightarrow \alpha\cdot$ and a in Follow(A), reduce by $A \rightarrow \alpha$ on input a.
  • But $\beta\alpha$ on stack may not allow $\beta A$ followed by a. Thus, $A \rightarrow \alpha$ would be invalid.
  • Must know which terminals can follow $\alpha$ for reduction $A \rightarrow \alpha$.

LR(1) items add "lookaheads" to rule out invalid reductions.
Example: $[A \rightarrow \alpha \cdot \beta, {\tt\bf a}]$, LR(1) item with lookahead a. If $\beta \ne \epsilon$, no change.
If $[A \rightarrow \alpha \cdot, {\tt\bf a}]$, reduce only on input a. LR(1) lookaheads a subset of Follow(A).
Modify Closure for LR(1) items:
Consider $[A \rightarrow \alpha \cdot B \beta, {\tt\bf a}]$
For each $B \rightarrow \gamma$, we have $[B \rightarrow \cdot \gamma, {\tt\bf b}]$, ${\tt\bf b}$ in First( $\beta{\tt\bf a}$).

Calculate GOTOs only on valid lookahead terminals.

Example 4.54: Consider the augmented grammar (4.55):

$S' \rightarrow S$
$S \rightarrow C C$
$C \rightarrow {\tt\bf c} C \mid {\tt\bf d}$
Note: Grammar is SLR.
Fig4_41.gif

Figure 4.42: LR(1) parse table for grammar (4.55).
Fig4_42.gif

LALR Parse Table

  • Merge LR(1) states with common cores (LR(0) items).
  • No new shift/reduce conflicts.
  • May have reduce/reduce conflicts. See Example 4.58.

Consider LR(1) parse table for grammar (4.55).
Combine LR(1) states to form new LALR states:
  • $I_{47}$ = $I_4$ + $I_7$.
  • $I_{36}$ = $I_3$ + $I_6$.
  • $I_{89}$ = $I_8$ + $I_9$.

Notes:
  • LALR parse table has same number of states as SLR when grammar is SLR.
  • LALR may take longer to detect error than LR, but will never shift a symbol past the point where LR detects error.

Figure 4.43: LALR parse table for grammar (4.55).

Fig4_43.gif



next up previous
Next: yacc Up: Syntax Analysis Previous: Table-Driven Parsing
CS 631 Class Account 2009-12-03