Formal Aspects Term 2, Week 4
• LECTURE: LR “Shift-Reduce” Parsers. The JavaCup parser-generator creates LR “Shift-Reduce” parsers; they are very commonly used and handle a larger class of grammars than LL parsers.
• TUTORIAL: How to use, and how to create, a Shift-Reduce Parser.
LR “Shift-Reduce” Parsers
• An LR shift-reduce parser consists of a Parsing Table (a DFSM) PLUS a Stack of States and Symbols. States are numbered in the Table, and Symbols are tokens or non-terminals.
• The Parser is given an input string which it has to parse. It shifts the tokens from the string onto the stack.
[Figure: Tokens, a State and a Symbol are input to the PARSING TABLE, which gives ACTIONS ON THE STACK of States and Symbols]
LR “Shift-Reduce” Parser - The Start
• Assume the string T1 T2 T3 T4 ... Tn is input.
• The first token T1 from the Left of the string is input to the Table together with state 1. The Table is used to find out what to do: SHIFT or REDUCE.
EXAMPLE:
Stack 1: state 1
INPUT T1 ... consult table => SHIFT T1, move to state X
Stack 2 (top of stack first): state X, T1, state 1
LR “Shift-Reduce” Parsers - General Workings
• Given a symbol and a state input to the Table, carry out the action found there (see page 60 in Appel’s book):
Sn (means “Shift symbol, move to state n”):
  Put the symbol onto the top of the stack;
  Put the new state number n on top of the stack.
Rk (means “Reduce with rule k”):
  Match the RHS of rule k with the top of the stack and REMOVE all the matched symbols, together with their states;
  Push the LHS of rule k onto the top of the stack;
  Input the LHS of rule k plus the state below it to the Table to find the next state, and push that state.
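The shift and reduce steps above can be sketched as a small driver loop. This is only an illustrative sketch: the grammar (E ::= E + n | n), the hand-written ACTION and GOTO tables, the state numbers and the function name parse are all assumptions made for the example, not taken from Appel's book or from JavaCup.

```python
# Hypothetical toy grammar (not from the lecture):
#   rule 1: E ::= E + n
#   rule 2: E ::= n
RULES = {1: ("E", ["E", "+", "n"]), 2: ("E", ["n"])}
TOKENS = ["n", "+", "$"]

# Hand-written parsing table for the toy grammar (states numbered from 1).
ACTION = {
    (1, "n"): ("shift", 2),
    (3, "+"): ("shift", 4),
    (3, "$"): ("accept", None),
    (4, "n"): ("shift", 5),
}
for tok in TOKENS:
    ACTION[(2, tok)] = ("reduce", 2)   # state 2 holds E ::= n.
    ACTION[(5, tok)] = ("reduce", 1)   # state 5 holds E ::= E + n.
GOTO = {(1, "E"): 3}


def parse(tokens):
    """Return the rule numbers used, in order, or raise SyntaxError."""
    stack = [1]                        # alternating: state, symbol, state, ...
    tokens = list(tokens) + ["$"]      # end-of-input marker
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, tok), ("error", None))
        if kind == "shift":            # Sn: push the symbol, then state n
            stack += [tok, arg]
            pos += 1
        elif kind == "reduce":         # Rk: pop the RHS, push LHS, consult table
            lhs, rhs = RULES[arg]
            del stack[len(stack) - 2 * len(rhs):]
            below = stack[-1]
            stack += [lhs, GOTO[(below, lhs)]]
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
```

For example, parse(["n", "+", "n"]) first reduces with rule 2 and then with rule 1, while parse(["n", "n"]) raises SyntaxError because the table has no entry for a second n after an E.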
To Create an LR(1) Parser
• We will now go through the steps required to BUILD a shift-reduce parser.
• This method is embedded in JavaCup.
Jargon 1: ITEM
• An ITEM is a production rule of the grammar with a “DOT” somewhere in its Right Hand Side.
• The DOT represents a notional parsing position.
• e.g. the following are example items from Grammar 3.1:
E ::= (.S,E)
E ::= (S,.E)
S ::= .S;S
S ::= .id := E
Jargon 2: Closure of an Item
• The CLOSURE of an item R (or set of items) is the smallest set C of items such that
(1) C contains R, AND
(2) IF there is a member of C of the form X ::= w .Y z, where Y is a non-terminal, then ALL the production rules defining Y appear in C with the DOT at the start of their RHS.
• E.g. closure(E ::= (.S,E)) =
{ E ::= (.S,E),
  S ::= .S ; S,
  S ::= .id := E,
  S ::= .print (L) }
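Items and their closure can be sketched in a few lines. The representation below (an item as a triple of LHS, RHS tuple and DOT position) and the names GRAMMAR, closure and show are assumptions chosen for illustration. The rule list is a transcription of Grammar 3.1 as commonly given in Appel's book; only the three S rules and E ::= (S,E) are confirmed by these slides, so treat the rest as an assumption.

```python
# Assumed transcription of Grammar 3.1; each rule is (LHS, RHS tuple).
GRAMMAR = [
    ("S", ("S", ";", "S")),
    ("S", ("id", ":=", "E")),
    ("S", ("print", "(", "L", ")")),
    ("E", ("id",)),
    ("E", ("num",)),
    ("E", ("E", "+", "E")),
    ("E", ("(", "S", ",", "E", ")")),
    ("L", ("E",)),
    ("L", ("L", ",", "E")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """items: iterable of (lhs, rhs, dot) triples; returns a frozenset."""
    result = set(items)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        # If the DOT sits before a non-terminal Y, add every rule for Y
        # with the DOT at the start of its RHS.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                if lhs2 == rhs[dot]:
                    item = (lhs2, rhs2, 0)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def show(item):
    """Render an item in the slide notation, e.g. 'E ::= ( . S , E )'."""
    lhs, rhs, dot = item
    return f"{lhs} ::= {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


if __name__ == "__main__":
    start = ("E", ("(", "S", ",", "E", ")"), 1)   # E ::= (.S,E)
    for item in sorted(closure([start])):
        print(show(item))
```

Run on E ::= (.S,E), this yields that item plus the three S rules with the DOT at the start, matching the example set on the slide.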
LR “Shift-Reduce” Parsers - Generation
• TWO STAGE PROCESS:
1: CREATE A FINITE STATE MACHINE WITH
  NODES = SETS OF ITEMS
  ARCS ANNOTATED WITH NON-TERMINALS OR TOKENS
2: CREATE A PARSING TABLE FROM THE MACHINE
1: CREATING THE FINITE STATE MACHINE
• To generate a new state from an old one:
newstate(w: SYMBOL, S: OLDSTATE) =
  closure( set of items of the form
    Z ::= ... w . ...
  where Z ::= ... . w ... is a member of S )
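A minimal sketch of newstate follows, assuming items are (lhs, rhs, dot) triples and using an assumed transcription of Grammar 3.1 (only the S rules and E ::= (S,E) are confirmed by these slides). The helper names are illustrative, not part of JavaCup.

```python
# Assumed transcription of Grammar 3.1; each rule is (LHS, RHS tuple).
GRAMMAR = [
    ("S", ("S", ";", "S")),
    ("S", ("id", ":=", "E")),
    ("S", ("print", "(", "L", ")")),
    ("E", ("id",)),
    ("E", ("num",)),
    ("E", ("E", "+", "E")),
    ("E", ("(", "S", ",", "E", ")")),
    ("L", ("E",)),
    ("L", ("L", ",", "E")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Smallest item set containing `items` and closed under rule (2)."""
    result = set(items)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def newstate(w, state):
    """Move the DOT over symbol w in every item that allows it, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in state
        if dot < len(rhs) and rhs[dot] == w
    }
    return closure(moved)


# Old state: closure(E ::= (.S,E)); new state reached on symbol S.
old = closure({("E", ("(", "S", ",", "E", ")"), 1)})
new = newstate("S", old)
```

Here new contains exactly E ::= (S.,E) and S ::= S.;S: in both items the DOT now sits before a token, so the closure adds nothing further. A symbol that no item expects, such as num in this state, gives the empty set.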
ALGORITHM TO CREATE FSM
• T = set of STATES in the FSM, E = set of TRANSITIONS
E = { } ; T = { closure( S’ ::= .S$ ) } ;
repeat
  for each state S in T
    for each item ‘Z ::= ... .w ...’ in S
      add newstate(w,S) to T
      add S --w--> newstate(w,S) to E
    end for
  end for
until E and T do not change
• NB ‘ACCEPT’ STATE OF FSM = newstate($, anystate)
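The repeat-until loop above can be sketched as follows. To keep the machine small, the sketch uses a hypothetical toy grammar (S' ::= E $, E ::= E + n, E ::= n) rather than Grammar 3.1; the item representation and all function names are assumptions made for illustration.

```python
# Hypothetical toy grammar, with the augmented start rule first.
GRAMMAR = [
    ("S'", ("E", "$")),
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Smallest item set containing `items` and closed under expansion."""
    result = set(items)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def newstate(w, state):
    """Advance the DOT over w wherever possible, then take the closure."""
    return closure({(l, r, d + 1) for l, r, d in state
                    if d < len(r) and r[d] == w})


def build_fsm():
    """Return (T, E): the list of states and the set of transitions."""
    start_lhs, start_rhs = GRAMMAR[0]
    T = [closure({(start_lhs, start_rhs, 0)})]   # T[0] is the start state
    E = set()                                    # triples (S, w, newstate)
    changed = True
    while changed:                               # "until E and T do not change"
        changed = False
        for S in list(T):
            for lhs, rhs, dot in S:
                if dot == len(rhs):              # DOT at the end: nothing to move over
                    continue
                w = rhs[dot]
                J = newstate(w, S)
                if J not in T:
                    T.append(J)
                    changed = True
                if (S, w, J) not in E:
                    E.add((S, w, J))
                    changed = True
    return T, E
```

For the toy grammar this produces six states and five transitions, one of which is the $ transition into the accept state { S' ::= E $. }. The order in which states are discovered depends on set iteration order, so state numbering should be fixed separately before building a table.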
2: TO CREATE THE TABLE FROM THE FSM
1. NUMBER THE STATES 1, 2, 3, ...
2. For a transition n --x--> m where m contains an item of the form Z ::= ... w. (DOT at the end), put ‘reduce k’ all along row m under every token column, where k is the number of the rule Z ::= ... w.
Otherwise:
3. For a transition n --x--> m where x is a token, put ‘shift m’ in row n, column x.
4. For a transition n --Y--> m where Y is a non-terminal, put ‘goto m’ in row n, column Y.
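The table-filling steps can be sketched end to end on a hypothetical toy grammar (S' ::= E $, E ::= E + n, E ::= n). Everything here is an illustrative assumption: the grammar, the function names, the worklist with sorted symbols (used only to make the state numbering repeatable), and the choice to record ‘accept’ on the $ transition. Reduce entries go under every token, as in step 2, so this is the lookahead-free form of the table, and conflicting entries are reported rather than resolved.

```python
# Hypothetical toy grammar; the list index of a rule is its rule number.
GRAMMAR = [
    ("S'", ("E", "$")),       # rule 0: augmented start rule
    ("E", ("E", "+", "n")),   # rule 1
    ("E", ("n",)),            # rule 2
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TOKENS = ["n", "+", "$"]


def closure(items):
    result = set(items)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def newstate(w, state):
    return closure({(l, r, d + 1) for l, r, d in state
                    if d < len(r) and r[d] == w})


def build_fsm():
    """Worklist version of the FSM algorithm; states are numbered from 1."""
    states = [closure({(GRAMMAR[0][0], GRAMMAR[0][1], 0)})]
    edges = {}                                    # (n, symbol) -> m
    for S in states:                              # the list grows as we iterate
        for w in sorted({r[d] for _, r, d in S if d < len(r)}):
            J = newstate(w, S)
            if J not in states:
                states.append(J)
            edges[(states.index(S) + 1, w)] = states.index(J) + 1
    return states, edges


def build_table(states, edges):
    table = {}

    def put(key, entry):
        if table.get(key, entry) != entry:        # two different actions collide
            raise ValueError(f"conflict at {key}: {table[key]} vs {entry}")
        table[key] = entry

    for m, S in enumerate(states, start=1):       # step 2: reduce entries
        for lhs, rhs, dot in S:
            if dot == len(rhs) and lhs != "S'":
                k = GRAMMAR.index((lhs, rhs))
                for tok in TOKENS:
                    put((m, tok), ("reduce", k))
    for (n, x), m in edges.items():               # steps 3 and 4
        if x == "$":
            put((n, x), ("accept", None))
        elif x in NONTERMINALS:
            put((n, x), ("goto", m))
        else:
            put((n, x), ("shift", m))
    return table


states, edges = build_fsm()
table = build_table(states, edges)
```

With this numbering, row 1 holds ‘shift 3’ under n and ‘goto 2’ under E, row 2 holds ‘accept’ under $ and ‘shift 5’ under +, and rows 3 and 6 are filled with ‘reduce 2’ and ‘reduce 1’ respectively.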
LR Parsers - Summary
• In this lecture we have seen HOW LR parsers work and HOW they can be automatically created from a grammar specification.
• NB:
• LR means the string is parsed from Left to right while a Rightmost derivation is constructed in reverse, so the parse tree is built bottom-up.
• “Most” parsers are “LR(1)”: the “1” means they look ahead at the next 1 token in the string.