
CPSC 388 – Compiler Design and Construction



Presentation Transcript


  1. CPSC 388 – Compiler Design and Construction
     Implementing a Parser
     LL(1) and LALR Grammars
     FBI: Noon, Dining Hall, Vicki Anderson, Recruiter

  2. Announcements
     • PROG 3 out, due Oct 9th
       • Get started NOW!
     • HW due Friday
     • HW6 posted, due next Friday

  3. Parsing using CFGs
     • Algorithms can parse using CFGs in O(n^3) time (n is the number of characters in the input stream) – TOO SLOW
     • Subclasses of grammars can be parsed in O(n) time
     • LL(1)
       • One token of look-ahead
       • Do a leftmost derivation
       • Scan the input from left to right
     • LALR(1)
       • One token of look-ahead
       • Do a rightmost derivation in reverse
       • Scan the input left-to-right
       • LA means "look-ahead" (nothing to do with the number of tokens)

  4. LALR(1)
     • More general than LL(1) grammars (every LL(1) grammar is an LALR(1) grammar, but not vice versa)
     • Class of grammars used by java_cup, Bison, YACC
     • Parsed bottom-up (start with terminals and build the tree from the leaves up to the root)
     • Covered in text sections 4.6-4.7
     • For this class you need to understand the details of LL(1) grammars only

  5. LL(1) Grammars – Predictive Parsers
     • “Build” the parse tree top-down
       • Actually discover the tree top-down; don’t actually build it
     • Keep track of work to be done using a stack
     • Scanned tokens along with the stack correspond to the leaves of the incomplete tree
     • Use a parse table to decide how to parse the input
       • Rows are non-terminals
       • Columns are tokens (plus the EOF token)
       • Cells are the bodies of production rules

  6. Predictive Parser Algorithm
     s.push(EOF)    // special EOF terminal
     s.push(start)  // start is start non-terminal
     x = s.peek()
     t = scanner.next_token()
     while (x != EOF):
         if x == t:
             s.pop()
             t = scanner.next_token()
         else:
             if x is terminal: error
             else:
                 if table[x][t] == empty: error
                 else:
                     let body = table[x][t]  // body of production
                     output x → body
                     s.pop()
                     s.push(…)  // push body from right to left
         x = s.peek()

  7. Example Parse using algorithm
     • Consider the language of balanced parentheses and brackets, e.g. ([])
     • Input string is "([])EOF"
     • Grammar: S → ε | ( S ) | [ S ]
     • Parse Table: (figure not included in the transcript)
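
The predictive parser algorithm from slide 6 can be sketched in Python against this grammar. This is a minimal sketch, not the course's implementation: the `TABLE` contents are an assumption derived by hand (the bracket rules are chosen on their first token, and the ε rule on the tokens that can follow S, namely `)`, `]` and EOF), and the names `parse`, `TABLE` and `EOF` are hypothetical.

```python
EOF = "EOF"

# Assumed parse table for S → ε | ( S ) | [ S ]; an empty tuple stands for ε.
TABLE = {
    "S": {
        "(": ("(", "S", ")"),
        "[": ("[", "S", "]"),
        ")": (),
        "]": (),
        EOF: (),
    }
}


def parse(tokens, table=TABLE, start="S"):
    """Return the productions used, in order, or raise SyntaxError."""
    stream = iter(list(tokens) + [EOF])
    stack = [EOF, start]          # top of stack is the end of the list
    output = []
    t = next(stream)
    x = stack[-1]
    while x != EOF:
        if x == t:                # terminal on the stack matches the token
            stack.pop()
            t = next(stream)
        elif x not in table:      # x is a terminal that does not match
            raise SyntaxError(f"expected {x!r}, saw {t!r}")
        elif t not in table[x]:   # empty table cell
            raise SyntaxError(f"no rule for {x} on {t!r}")
        else:
            body = table[x][t]
            output.append((x, body))
            stack.pop()
            stack.extend(reversed(body))  # push body from right to left
        x = stack[-1]
    if t != EOF:                  # stack is done but input remains
        raise SyntaxError(f"unexpected {t!r} after end of parse")
    return output


if __name__ == "__main__":
    for head, body in parse("([])"):
        print(head, "→", " ".join(body) or "ε")
```

Run on `([])`, the sketch reports three productions: `S → ( S )`, `S → [ S ]`, then `S → ε`. The final check that the input is exhausted is an addition to the slide's pseudocode; without it a string such as `()(` would be accepted once the stack reaches EOF.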

  8. Not All Grammars LL(1)
     • Not all grammars are LL(1): S → ( S ) | [ S ] | ( ) | [ ]
     • If the input is ( we don’t know which rule to use!
     • Try the input “[[]]” on the LL(1) grammar from the previous slide using the predictive parser
       • Draw input seen so far
       • Stack
       • Action taken

  9. Is Grammar LL(1)
     • Given a grammar, how do you tell if it is LL(1)?
     • How do you build the parse table?
     • If the parse table is built and there is at most one entry per cell, then the grammar is LL(1)

  10. Non-LL(1) Grammars
      • A grammar is not LL(1) if it is left-recursive
      • A grammar is not LL(1) if it is not left-factored
      • It is sometimes possible to change a grammar to remove left-recursion and to make it left-factored

  11. Left-Recursion
      • Grammar g is recursive in a non-terminal X if X can derive, in one or more steps, a string that contains X:
        • Recursive: X ⇒+ α X β
        • Left recursive: X ⇒+ X β
        • Right recursive: X ⇒+ α X

  12. Removing Immediate Left-Recursion
      • Consider the grammar A → A α | β
        • A is a non-terminal
        • α is a sequence of terminals and/or non-terminals
        • β is a sequence of terminals and/or non-terminals not starting with A
      • Replace the production with
          A → β A'
          A' → α A' | ε
      • The two grammars are equivalent (recognize the same set of input strings)
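
As a rough illustration of the rewrite above, the following sketch applies it to one non-terminal. The function name, the tuple representation of bodies (an empty tuple for ε) and the example grammar `list → list COMMA ID | ID` are all assumptions made for illustration; the sketch also assumes every α is non-empty.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A → A α | β as A → β A', A' → α A' | ε.

    bodies is a list of tuples of symbols; () stands for ε.
    Returns a dict mapping each head to its list of bodies.
    """
    new_head = head + "'"
    # α parts: bodies that start with the head, minus that leading symbol.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # β parts: bodies that do not start with the head.
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:                      # nothing to do
        return {head: list(bodies)}
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }


if __name__ == "__main__":
    rules = remove_immediate_left_recursion(
        "list", [("list", "COMMA", "ID"), ("ID",)]
    )
    for h, bs in rules.items():
        print(h, "→", " | ".join(" ".join(b) or "ε" for b in bs))
```

For the assumed example this yields `list → ID list'` and `list' → COMMA ID list' | ε`.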

  13. You Try It
      • Remove left recursion from the grammar:
          exp → exp - factor | factor
          factor → INTLITERAL | ( exp )
      • Construct parse trees using the original grammar and the new grammar for the input “5-3-2”
      • In general it is more difficult than this to remove left recursion; see text 4.3.3

  14. Left Factored
      • A grammar is NOT left-factored if a non-terminal has two productions whose bodies have a common prefix
          exp → ( exp ) | ( )
      • A top-down predictive parser would not know which production rule to use when seeing the input token “(”

  15. Left Factoring
      • Given a pair of productions: A → α β1 | α β2
        • α is a sequence of terminals and non-terminals
        • β1 and β2 are sequences of terminals and non-terminals that have no common prefix (either may be ε)
      • Change to:
          A → α A'
          A' → β1 | β2

  16. Left Factoring Example
      • So for the grammar
          exp → ( exp ) | ( )
      • It becomes
          exp → ( exp'
          exp' → exp ) | )
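
The left-factoring step can be sketched in Python and checked against this example. This is one possible formulation under assumptions: the names `common_prefix` and `left_factor` are hypothetical, bodies are tuples of symbols, and only a single round of factoring is performed (the new rules are not factored again).

```python
def common_prefix(bodies):
    """Longest sequence of symbols that starts every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(head, bodies):
    """One round of left factoring: A → α β1 | α β2 becomes A → α A', A' → β1 | β2."""
    groups = {}
    for body in bodies:                 # group bodies by their first symbol
        groups.setdefault(body[:1], []).append(body)
    rules = {head: []}
    count = 0
    for first, group in groups.items():
        if len(group) == 1 or not first:
            rules[head].extend(group)   # no shared prefix, keep unchanged
            continue
        count += 1
        new_head = head + "'" * count   # A', A'', ... for successive groups
        alpha = common_prefix(group)
        rules[head].append(alpha + (new_head,))
        rules[new_head] = [body[len(alpha):] for body in group]
    return rules


if __name__ == "__main__":
    for h, bs in left_factor("exp", [("(", "exp", ")"), ("(", ")")]).items():
        print(h, "→", " | ".join(" ".join(b) or "ε" for b in bs))
```

On the slide's grammar this reproduces `exp → ( exp'` and `exp' → exp ) | )`.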

  17. You Try It
      • Remove left recursion and do left factoring for the grammar
          exp → ( exp ) | exp exp | ( )

  18. Building Parse Tables
      • Recall a parse table
        • Every row is a non-terminal
        • Every column is an input token
        • Every cell contains a production body
      • If any cell contains more than one production body, then the grammar is not LL(1)
      • To build the parse table we need the FIRST and FOLLOW sets

  19. FIRST set
      • FIRST(α)
        • α is some sequence of terminals and non-terminals
        • FIRST(α) is the set of terminals that begin the strings derivable from α
        • If α can derive ε, then ε is in FIRST(α)

  20. FIRST(X)
      • X is a single terminal, non-terminal or ε
      • FIRST(X) = {X}  // X is a terminal
      • FIRST(X) = {ε}  // X is ε
      • FIRST(X) = …    // X is a non-terminal
        • Look at all production rules with X as head
        • For each production rule X → Y1 Y2 … Yn
          • Put FIRST(Y1) - {ε} into FIRST(X).
          • If ε is in FIRST(Y1), then put FIRST(Y2) - {ε} into FIRST(X).
          • If ε is also in FIRST(Y2), then put FIRST(Y3) - {ε} into FIRST(X).
          • etc...
          • If ε is in FIRST(Yi) for all 1 <= i <= n (every symbol on the production's right-hand side), then put ε into FIRST(X).

  21. Example FIRST Sets
      • Compute FIRST sets for each non-terminal:
          exp → term exp'
          exp' → - term exp' | ε
          term → factor term'
          term' → / factor term' | ε
          factor → INTLITERAL | ( exp )

          X         FIRST(X)
          -------------------------------
          exp       { INTLITERAL, ( }
          exp'      { -, ε }
          term      { INTLITERAL, ( }
          term'     { /, ε }
          factor    { INTLITERAL, ( }

  22. FIRST(α) for any α
      • α is of the form X1 X2 … Xn
        • Where each Xi is a terminal, non-terminal or ε
      • Put FIRST(X1) - {ε} into FIRST(α)
      • If ε is in FIRST(X1), put FIRST(X2) - {ε} into FIRST(α).
      • etc...
      • If ε is in the FIRST set of every Xi, put ε into FIRST(α).

  23. Example FIRST sets for rules
          FIRST( term exp' )      = { INTLITERAL, ( }
          FIRST( - term exp' )    = { - }
          FIRST( ε )              = { ε }
          FIRST( factor term' )   = { INTLITERAL, ( }
          FIRST( / factor term' ) = { / }
          FIRST( ε )              = { ε }
          FIRST( INTLITERAL )     = { INTLITERAL }
          FIRST( ( exp ) )        = { ( }
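
The FIRST rules from slides 20 and 22 can be sketched as a fixed-point computation and checked against the example sets above. This is an illustrative sketch: the dictionary encoding of the grammar, the empty tuple for ε, and the names `compute_first` and `first_of_sequence` are assumptions, not part of the course material.

```python
EPS = "ε"

# The expression grammar from slide 21; () stands for an ε body.
GRAMMAR = {
    "exp":    [("term", "exp'")],
    "exp'":   [("-", "term", "exp'"), ()],
    "term":   [("factor", "term'")],
    "term'":  [("/", "factor", "term'"), ()],
    "factor": [("INTLITERAL",), ("(", "exp", ")")],
}


def first_of_sequence(symbols, first):
    """FIRST(α) for a tuple of symbols, given FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        # A symbol without an entry in `first` is a terminal: FIRST is {sym}.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result           # this symbol cannot vanish, so stop
    result.add(EPS)                 # every symbol (or none) can derive ε
    return result


def compute_first(grammar):
    """Repeat the slide-20 rules until no FIRST set grows any further."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


if __name__ == "__main__":
    first = compute_first(GRAMMAR)
    for head in GRAMMAR:
        print(head, sorted(first[head]))
```

Iterating to a fixed point avoids having to order the non-terminals by dependency, and it also terminates on recursive grammars because the sets only ever grow.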

  24. Why Do We Care about FIRST(α)?
      • During parsing, suppose the top-of-stack symbol is non-terminal A, and that there are two productions:
        • A → α
        • A → β
      • And that the current token is x
      • If x is in FIRST(α) then use the first production
      • If x is in FIRST(β) then use the second production

  25. FOLLOW(A) sets
      • Only defined for single non-terminals, A
      • The set of terminals that can appear immediately to the right of A (may include EOF but never ε)

  26. Calculating FOLLOW(A)
      • If A is the start non-terminal, put EOF in FOLLOW(A)
      • Find productions with A in the body:
        • For each production X → α A β
          • Put FIRST(β) - {ε} in FOLLOW(A)
          • If ε is in FIRST(β), put FOLLOW(X) into FOLLOW(A)
        • For each production X → α A
          • Put FOLLOW(X) into FOLLOW(A)

  27. FIRST and FOLLOW sets
      • To compute FIRST(A) you must look for A on a production's left-hand side.
      • To compute FOLLOW(A) you must look for A on a production's right-hand side.
      • FIRST and FOLLOW sets are always sets of terminals (plus, perhaps, ε for FIRST sets, and EOF for FOLLOW sets).
      • Non-terminals are never in a FIRST or a FOLLOW set.

  28. Example FOLLOW sets
      CAPS are non-terminals and lower-case are terminals
          S → B c | D B
          B → a b | c S
          D → d | ε

          X    FIRST(X)       FOLLOW(X)
          -----------------------------------
          D    { d, ε }       { a, c }
          B    { a, c }       { c, EOF }
          S    { a, c, d }    { EOF, c }

      Note: FOLLOW of S always includes EOF
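
The FOLLOW rules from slide 26 can be sketched the same way and checked against this table. In this sketch the FIRST sets are copied from the table rather than computed, and the names `compute_follow` and `first_of_sequence` as well as the tuple encoding of bodies are assumptions for illustration.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "S": [("B", "c"), ("D", "B")],
    "B": [("a", "b"), ("c", "S")],
    "D": [("d",), ()],              # () stands for the ε body
}
# FIRST sets taken from the slide's table.
FIRST = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}


def first_of_sequence(symbols):
    """FIRST(β) for a tuple of symbols; terminals have FIRST = {terminal}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                 # β is empty or can vanish entirely
    return result


def compute_follow(grammar, start):
    follow = {head: set() for head in grammar}
    follow[start].add(EOF)          # rule 1: EOF follows the start symbol
    changed = True
    while changed:                  # repeat until no FOLLOW set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue    # FOLLOW is only defined for non-terminals
                    beta_first = first_of_sequence(body[i + 1:])
                    new = beta_first - {EPS}
                    if EPS in beta_first:   # covers X → α A as well
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    for head, terminals in compute_follow(GRAMMAR, "S").items():
        print(head, sorted(terminals))
```

Treating an empty β as a sequence whose FIRST set is { ε } lets one branch handle both production shapes from slide 26.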

  29. You Try It
      • Compute FIRST and FOLLOW sets for:
          methodHeader → VOID ID LPAREN paramList RPAREN
          paramList → ε
          paramList → nonEmptyParamList
          nonEmptyParamList → ID ID
          nonEmptyParamList → ID ID COMMA nonEmptyParamList
      • Remember you need FIRST and FOLLOW sets for all non-terminals and FIRST sets for all bodies of rules

  30. Parse Table
      (figure: columns are the current token, rows are non-terminals, cells are rule bodies)

  31. Parse Table Construction Algorithm
      for each production X → α:
          for each terminal t in FIRST(α):
              put α in Table[X,t]
          if ε is in FIRST(α) then:
              for each terminal t in FOLLOW(X):
                  put α in Table[X,t]

  32. Example Parse Table Construction
          S → B c | D B
          B → a b | c S
          D → d | ε
      For this grammar:
      • Construct FIRST and FOLLOW sets
      • Apply the algorithm to calculate the parse table

  33. Example Parse Table Construction
          X      FIRST(X)       FOLLOW(X)
          -------------------------------------
          D      { d, ε }       { a, c }
          B      { a, c }       { c, EOF }
          S      { a, c, d }    { EOF, c }
          B c    { a, c }
          D B    { d, a, c }
          a b    { a }
          c S    { c }
          d      { d }
          ε      { ε }
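
The construction algorithm from slide 31 can be sketched directly on these sets. The sketch copies FIRST and FOLLOW from the table above rather than recomputing them; the names `build_table` and `conflicts`, and the choice to key cells by `(non-terminal, token)` pairs, are assumptions for illustration.

```python
EPS = "ε"

# (head, body, FIRST(body)) with the FIRST sets copied from the table above;
# () stands for the ε body.
PRODUCTIONS = [
    ("S", ("B", "c"), {"a", "c"}),
    ("S", ("D", "B"), {"d", "a", "c"}),
    ("B", ("a", "b"), {"a"}),
    ("B", ("c", "S"), {"c"}),
    ("D", ("d",),     {"d"}),
    ("D", (),         {EPS}),
]
FOLLOW = {"S": {"EOF", "c"}, "B": {"c", "EOF"}, "D": {"a", "c"}}


def build_table(productions, follow):
    """Map (non-terminal, token) to the list of bodies placed in that cell."""
    table = {}
    for head, body, first in productions:
        lookaheads = first - {EPS}
        if EPS in first:                 # body can vanish: use FOLLOW(head) too
            lookaheads |= follow[head]
        for t in lookaheads:
            table.setdefault((head, t), []).append(body)
    return table


def conflicts(table):
    """Cells holding more than one body; any such cell means not LL(1)."""
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}


if __name__ == "__main__":
    table = build_table(PRODUCTIONS, FOLLOW)
    for (head, t), bodies in sorted(table.items()):
        print(f"Table[{head},{t}] =", [" ".join(b) or "ε" for b in bodies])
    print("conflicting cells:", sorted(conflicts(table)))
```

With the sets as given, the cells Table[S,a] and Table[S,c] each receive both `B c` and `D B`, so by the one-entry-per-cell test from slide 18 this grammar is not LL(1).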

  34. Parse Table
      • Finish filling in the table

  35. Predictive Parser Algorithm
      s.push(EOF)    // special EOF terminal
      s.push(start)  // start is start non-terminal
      x = s.peek()
      t = scanner.next_token()
      while (x != EOF):
          if x == t:
              s.pop()
              t = scanner.next_token()
          else:
              if x is terminal: error
              else:
                  if table[x][t] == empty: error
                  else:
                      let body = table[x][t]  // body of production
                      output x → body
                      s.pop()
                      s.push(…)  // push body from right to left
          x = s.peek()
