
Topic #4: Syntactic Analysis (Parsing)


Presentation Transcript


  1. Topic #4: Syntactic Analysis (Parsing) EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003

  2. Lexical Analyzer and Parser

  3. Parser • Accepts string of tokens from lexical analyzer (usually one token at a time) • Verifies whether or not string can be generated by grammar • Reports syntax errors (recovers if possible)

  4. Errors • Lexical errors (e.g. misspelled word) • Syntax errors (e.g. unbalanced parentheses, missing semicolon) • Semantic errors (e.g. type errors) • Logical errors (e.g. infinite recursion)

  5. Error Handling • Report errors clearly and accurately • Recover quickly if possible • Poor error recovery may lead to an avalanche of errors

  6. Error Recovery • Panic mode: discard tokens one at a time until a synchronizing token is found (see the sketch below) • Phrase-level recovery: Perform local correction that allows parsing to continue • Error Productions: Augment grammar to handle predicted, common errors • Global Correction: Use a complex algorithm to compute least-cost sequence of changes leading to parseable code
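
A minimal sketch of the panic-mode strategy. The parser skeleton, the token strings, and the synchronizing set are illustrative assumptions, not from the slides; a real parser would pick synchronizing tokens (e.g. statement terminators) to suit its grammar.

```python
# Sketch of panic-mode recovery (names are hypothetical).
SYNC_TOKENS = {";", "}", "$"}   # assumed synchronizing tokens

def panic_mode_recover(tokens, pos):
    """Discard tokens one at a time until a synchronizing token is found.

    Returns the position just past the synchronizing token so the
    parser can resume from a known-good point.
    """
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                          # discard the offending token
    return min(pos + 1, len(tokens))      # resume just past the sync token

# After an error at position 1, skip ahead to just past the first ';'.
tokens = ["id", "+", "+", "id", ";", "id", "=", "id", ";", "$"]
print(panic_mode_recover(tokens, 1))      # -> 5
```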

  7. Context Free Grammars • CFGs can represent recursive constructs that regular expressions cannot • A CFG consists of: • Tokens (terminals, symbols) • Nonterminals (syntactic variables denoting sets of strings) • Productions (rules specifying how terminals and nonterminals can combine to form strings) • A start symbol (the set of strings it denotes is the language of the grammar)

  8. Derivations (Part 1) • One definition of language: the set of strings that have valid parse trees • Another definition: the set of strings that can be derived from the start symbol E → E + E | E * E | (E) | -E | id E => -E (read: E derives -E) E => -E => -(E) => -(id)
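
To make the one-step derives relation concrete, here is a small sketch that enumerates every sentential form reachable in one step; representing sentential forms as tuples of symbols is an illustrative assumption.

```python
# One-step derivation (=>) for the grammar E -> E+E | E*E | (E) | -E | id.
# Sentential forms are tuples of symbols; "E" is the only nonterminal.
PRODUCTIONS = {"E": [("E", "+", "E"), ("E", "*", "E"),
                     ("(", "E", ")"), ("-", "E"), ("id",)]}

def derive_one_step(form):
    """Yield every sentential form reachable from `form` in one step."""
    for i, sym in enumerate(form):
        for body in PRODUCTIONS.get(sym, []):
            yield form[:i] + body + form[i + 1:]

# The last step of E => -E => -(E) => -(id):
print(("-", "(", "id", ")") in set(derive_one_step(("-", "(", "E", ")"))))  # True
```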

  9. Derivations (Part 2) • αAβ => αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols • If α1 => α2 => … => αn, we say α1 derives αn • => means derives in one step • *=> means derives in zero or more steps • +=> means derives in one or more steps

  10. Sentences and Languages • Let L(G) be the language generated by the grammar G with start symbol S: • Strings in L(G) may contain only tokens of G • A string w is in L(G) if and only if S +=> w • Such a string w is a sentence of G • Any language that can be generated by a CFG is said to be a context-free language • If two grammars generate the same language, they are said to be equivalent

  11. Sentential Forms • If S *=> α, where α may contain nonterminals, we say that α is a sentential form of G • A sentence is a sentential form with no nonterminals

  12. Leftmost Derivations • Only the leftmost nonterminal in any sentential form is replaced at each step • A leftmost step can be written as wAγ lm=> wδγ, where A → δ is a production • w consists of only terminals • γ is a string of grammar symbols • If α derives β by a leftmost derivation, then we write α lm*=> β • If S lm*=> α then we say that α is a left-sentential form of the grammar • Analogous terms exist for rightmost derivations

  13. Parse Trees • A parse tree can be viewed as a graphical representation of a derivation • Every parse tree has a unique leftmost derivation (not true of every sentence) • An ambiguous grammar has: • more than one parse tree for at least one sentence • more than one leftmost derivation for at least one sentence

  14. Capability of Grammars • Can describe most programming language constructs • An exception: requiring that variables are declared before they are used • Therefore, grammar accepts superset of actual language • Later phase (semantic analysis) does type checking

  15. Regular Expressions vs. CFGs • Every construct that can be described by an RE can also be described by a CFG • Why use REs at all? • Lexical rules are simpler to describe this way • REs are often easier to read • More efficient lexical analyzers can be constructed

  16. Verifying Grammars • A proof that a grammar generates a language has two parts: • Must show that every string generated by the grammar is part of the language • Must show that every string that is part of the language can be generated by the grammar • Rarely done for complete programming languages!

  17. Eliminating Ambiguity (1) stmt → if expr then stmt | if expr then stmt else stmt | other The sentence if E1 then if E2 then S1 else S2 has two parse trees: the else can attach to either if (the dangling-else ambiguity)

  18. Eliminating Ambiguity (2)

  19. Eliminating Ambiguity (3) stmt → matched | unmatched matched → if expr then matched else matched | other unmatched → if expr then stmt | if expr then matched else unmatched

  20. Left Recursion • A grammar is left recursive if there exists a nonterminal A such that A +=> Aα for some string α • Most top-down parsing methods cannot handle left-recursive grammars

  21. Eliminating Left Recursion (1) A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn becomes A → β1A' | β2A' | … | βnA' A' → α1A' | α2A' | … | αmA' | ε Harder case: S → Aa | b A → Ac | Sd | ε

  22. Eliminating Left Recursion (2) • First arrange the nonterminals in some order A1, A2, …, An • Apply the following algorithm (sketched below): for i = 1 to n { for j = 1 to i-1 { replace each production of the form Ai → Ajγ by the productions Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are the current Aj productions } eliminate the immediate left recursion among the Ai productions }
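
The following sketch implements both this algorithm and the immediate-elimination rule from the previous slide. The grammar representation (nonterminals mapped to lists of bodies, with the empty body standing for ε) is an assumption for illustration.

```python
# Sketch: nonterminals map to lists of bodies (lists of symbols);
# the empty body [] stands for eps.

def eliminate_immediate(A, bodies, grammar):
    """A -> A a1 | .. | A am | b1 | .. | bn  becomes
    A -> b1 A' | .. | bn A'  and  A' -> a1 A' | .. | am A' | eps."""
    alphas = [b[1:] for b in bodies if b[:1] == [A]]
    betas = [b for b in bodies if b[:1] != [A]]
    if not alphas:
        grammar[A] = bodies
        return
    A2 = A + "'"                                   # fresh nonterminal A'
    grammar[A] = [b + [A2] for b in betas]
    grammar[A2] = [a + [A2] for a in alphas] + [[]]

def eliminate_left_recursion(grammar, order):
    """The algorithm from this slide, applied in the given order."""
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            new_bodies = []
            for body in grammar[Ai]:
                if body[:1] == [Aj]:               # Ai -> Aj gamma
                    new_bodies += [d + body[1:] for d in grammar[Aj]]
                else:
                    new_bodies.append(body)
            grammar[Ai] = new_bodies
        eliminate_immediate(Ai, grammar[Ai], grammar)
    return grammar

# The "harder case" from the previous slide: S -> Aa | b, A -> Ac | Sd | eps
g = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
print(eliminate_left_recursion(g, ["S", "A"]))
# A -> bdA' | A'   and   A' -> cA' | adA' | eps   (S is unchanged)
```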

  23. Left Factoring • Rewriting productions to delay decisions • Helpful for predictive parsing • Not guaranteed to remove ambiguity A → αβ1 | αβ2 becomes A → αA' A' → β1 | β2
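
A sketch of a single left-factoring step under the same assumed representation; it handles only the simple case where all alternatives of A share one common prefix.

```python
# Left-factor A -> alpha beta1 | alpha beta2 into
# A -> alpha A', A' -> beta1 | beta2.

def common_prefix(bodies):
    """Longest common prefix of a list of bodies (lists of symbols)."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor_once(A, grammar):
    """Factor the longest prefix shared by all A-productions.

    A full left-factoring pass would group alternatives by common
    prefix and repeat until no two alternatives share one.
    """
    bodies = grammar[A]
    alpha = common_prefix(bodies)
    if not alpha or len(bodies) < 2:
        return
    A2 = A + "'"                                   # fresh nonterminal A'
    grammar[A] = [alpha + [A2]]
    grammar[A2] = [b[len(alpha):] for b in bodies]  # [] stands for eps

# stmt -> if expr then stmt | if expr then stmt else stmt
g = {"stmt": [["if", "expr", "then", "stmt"],
              ["if", "expr", "then", "stmt", "else", "stmt"]]}
left_factor_once("stmt", g)
print(g)   # stmt -> if expr then stmt stmt'; stmt' -> eps | else stmt
```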

  24. Limitations of CFGs • Cannot verify repeated strings • Example: L1 = {wcw | w is in (a|b)*} • Abstracts checking that variables are declared • Cannot verify repeated counts • Example: L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1} • Abstracts checking that the numbers of formal and actual parameters are equal • Therefore, some checks are put off until semantic analysis

  25. Top Down Parsing • Can be viewed two ways: • Attempt to find leftmost derivation for input string • Attempt to create parse tree, starting at the root, creating nodes in preorder • General form is recursive descent parsing • May require backtracking • Backtracking parsers are not used frequently because they are rarely needed

  26. Predictive Parsing • A special case of recursive-descent parsing that does not require backtracking • Must always know which production to use based on current input symbol • Can often create an appropriate grammar by: • removing left recursion • left factoring the resulting grammar

  27. Transition Diagrams • For parser: • One diagram for each nonterminal • Edge labels can be tokens or nonterminals • A transition on a token means we should take that transition if token is next input symbol • A transition on a nonterminal can be thought of as a call to a procedure for that nonterminal • As opposed to lexical analyzers: • One (or more) diagrams for each token • Labels are symbols of input alphabet

  28. Creating Transition Diagrams • First eliminate left recursion from grammar • Then left factor grammar • For each nonterminal A: • Create an initial and final state • For every production A → X1X2…Xn, create a path from initial to final state with edges labeled X1, X2, …, Xn

  29. Using Transition Diagrams • Predictive parsers: • Start at start symbol of grammar • From state s with edge to state t labeled with token a, if next input token is a: • State changes to t • Input cursor moves one position right • If edge labeled by nonterminal A: • State changes to start state for A • Input cursor is not moved • If final state of A reached, then state changes to t • If edge labeled by ε, state changes to t • Can be recursive or non-recursive using stack

  30. Transition Diagram Example E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id (obtained from E → E + T | T, T → T * F | F, F → (E) | id by eliminating left recursion) [Slide shows the transition diagrams for E, E', T, T', and F; a recursive-descent sketch appears below]
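
A minimal recursive-descent sketch of these diagrams, one procedure per nonterminal. The token list (with id already a single token) and the exception-based error handling are illustrative assumptions.

```python
# Recursive-descent parser for E -> TE', E' -> +TE' | eps,
# T -> FT', T' -> *FT' | eps, F -> (E) | id.
# Each method mirrors one transition diagram.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1                      # advance the input cursor

    def E(self):                           # E -> T E'
        self.T(); self.Ep()

    def Ep(self):                          # E' -> + T E' | eps
        if self.peek() == "+":
            self.match("+"); self.T(); self.Ep()

    def T(self):                           # T -> F T'
        self.F(); self.Tp()

    def Tp(self):                          # T' -> * F T' | eps
        if self.peek() == "*":
            self.match("*"); self.F(); self.Tp()

    def F(self):                           # F -> ( E ) | id
        if self.peek() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

p = Parser(["id", "+", "id", "*", "id"])
p.E()
assert p.peek() == "$"                     # entire input consumed
print("accepted")
```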

  31. Simplifying Transition Diagrams [Slide shows the diagrams for E' and E being merged and simplified]

  32. Nonrecursive Predictive Parsing (1) [Slide shows the model of a table-driven predictive parser: input buffer, stack, parsing program, and parsing table M]

  33. Nonrecursive Predictive Parsing (2) • Program considers X, the symbol on top of the stack, and a, the next input symbol • If X = a = $, parser halts successfully • If X = a ≠ $, parser pops X off the stack and advances to the next input symbol • If X is a nonterminal, the program consults M[X, a] (a production or an error entry)

  34. Nonrecursive Predictive Parsing (3) • Initialize stack with start symbol of grammar • Initialize input pointer to first symbol of input • After consulting parsing table: • If entry is production, parser replaces top entry of stack with right side of production (leftmost symbol on top) • Otherwise, an error recovery routine is called
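
A sketch of this parsing program, with the table M for the expression grammar written out by hand (how to construct it is the subject of the following slides).

```python
# Nonrecursive predictive parser: stack + input + parsing table M.
# Grammar: E -> TE', E' -> +TE' | eps, T -> FT', T' -> *FT' | eps,
# F -> (E) | id. Entries map (nonterminal, token) -> production body.
M = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]                    # start symbol on top
    tokens = tokens + ["$"]
    pos = 0
    while True:
        X, a = stack[-1], tokens[pos]
        if X == a == "$":
            return True                   # parser halts successfully
        if X == a:
            stack.pop(); pos += 1         # match: pop and advance
        elif X in NONTERMINALS:
            body = M.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        else:
            raise SyntaxError(f"expected {X!r}, got {a!r}")

print(parse(["id", "+", "id", "*", "id"]))   # True
```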

  35. Predictive Parsing Table

  36. Using a Predictive Parsing Table

  37. FIRST • FIRST(α) is the set of all terminals that begin any string derived from α • Computing FIRST: • If X is a terminal, FIRST(X) = {X} • If X → ε is a production, add ε to FIRST(X) • If X is a nonterminal and X → Y1Y2…Yn is a production: • For all terminals a, add a to FIRST(X) if a is a member of some FIRST(Yi) and ε is a member of FIRST(Y1), FIRST(Y2), …, FIRST(Yi-1) • If ε is a member of FIRST(Y1), FIRST(Y2), …, FIRST(Yn), add ε to FIRST(X) • A sketch of this computation appears below
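
A fixpoint sketch of this computation for the expression grammar, in the same assumed representation as the earlier sketches; "eps" stands for ε.

```python
# Fixpoint computation of FIRST for every grammar symbol.
EPS = "eps"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    first = {}
    def FIRST(X):
        if X not in grammar:               # X is a terminal: FIRST(X) = {X}
            return {X}
        return first.setdefault(X, set())
    changed = True
    while changed:                         # iterate until nothing is added
        changed = False
        for X, bodies in grammar.items():
            fx = FIRST(X)
            before = len(fx)
            for body in bodies:
                all_eps = True
                for Y in body:
                    fy = FIRST(Y)
                    fx |= fy - {EPS}       # a in FIRST(Yi), eps in Y1..Yi-1
                    if EPS not in fy:
                        all_eps = False
                        break
                if all_eps:                # covers X -> eps as well
                    fx.add(EPS)
            changed |= len(fx) != before
    return first

print(compute_first(GRAMMAR))
# FIRST(E) = FIRST(T) = FIRST(F) = {(, id}; FIRST(E') = {+, eps}; ...
```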

  38. FOLLOW • FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form • More formally, a is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αAaβ • $ is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αA

  39. Computing FOLLOW • Place $ in FOLLOW(S), where S is the start symbol • If there is a production A → αBβ, then everything in FIRST(β) (except for ε) is in FOLLOW(B) • If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B) • Apply these rules until nothing more can be added (see the sketch below)
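
A fixpoint sketch of these rules for the expression grammar; the per-symbol FIRST sets are taken as given here (they match the example on the next slide).

```python
# Fixpoint computation of FOLLOW, given per-symbol FIRST sets.
EPS = "eps"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}

def first_of_string(alpha):
    """FIRST of a string of symbols (FIRST(a) = {a} for a terminal a)."""
    out = set()
    for X in alpha:
        fx = FIRST.get(X, {X})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}                     # every symbol can derive eps

def compute_follow(grammar, start="E"):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                 # rule 1: $ in FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:   # FOLLOW only for nonterminals
                        continue
                    beta = first_of_string(body[i + 1:])
                    before = len(follow[B])
                    follow[B] |= beta - {EPS}     # rule 2
                    if EPS in beta:               # rule 3 (incl. B at end)
                        follow[B] |= follow[A]
                    changed |= len(follow[B]) != before
    return follow

print(compute_follow(GRAMMAR))
```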

  40. FIRST and FOLLOW Example E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id FIRST(E) = FIRST(T) = FIRST(F) = {(, id} FIRST(E') = {+, ε} FIRST(T') = {*, ε} FOLLOW(E) = FOLLOW(E') = {), $} FOLLOW(T) = FOLLOW(T') = {+, ), $} FOLLOW(F) = {+, *, ), $}

  41. Creating a Predictive Parsing Table • For each production A → α: • For each terminal a in FIRST(α), add A → α to M[A, a] • If ε is in FIRST(α), add A → α to M[A, b] for every terminal b in FOLLOW(A) • If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] • Mark each undefined entry of M as an error entry (use some recovery strategy) • A sketch of this construction appears below
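
A sketch of this construction, reusing the FIRST and FOLLOW sets from the previous slide; first_of_string extends FIRST from single symbols to a whole right side.

```python
# Build the predictive parsing table M from FIRST and FOLLOW.
EPS = "eps"
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(alpha):
    """FIRST of a whole right side (FIRST(a) = {a} for a terminal a)."""
    out = set()
    for X in alpha:
        fx = FIRST.get(X, {X})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}

def build_table(grammar):
    M = {}
    for A, bodies in grammar.items():
        for alpha in bodies:
            fa = first_of_string(alpha)
            targets = fa - {EPS}          # rule 1: terminals in FIRST(alpha)
            if EPS in fa:
                targets |= FOLLOW[A]      # rules 2 and 3 ($ is in FOLLOW)
            for a in targets:
                if (A, a) in M:           # multiply-defined: not LL(1)
                    raise ValueError(f"conflict at M[{A}, {a}]")
                M[(A, a)] = alpha
    return M

for (A, a), alpha in sorted(build_table(GRAMMAR).items()):
    print(f"M[{A}, {a}]: {A} -> {' '.join(alpha) or EPS}")
```

Run on the grammar of the next slide, the same construction would raise a conflict at M[S', e], since both S' → eS and S' → ε land in that entry.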

  42. Multiply-Defined Entries Example S → iEtSS' | a S' → eS | ε E → b

  43. LL(1) Grammars (1) • Algorithm covered in class can be applied to any grammar to produce a parsing table • If parsing table has no multiply-defined entries, grammar is said to be “LL(1)” • First “L”, left-to-right scanning of input • Second “L”, produces leftmost derivation • “1” refers to the number of lookahead symbols needed to make decisions

  44. LL(1) Grammars (2) • No ambiguous or left-recursive grammar can be LL(1) • Eliminating left recursion and left factoring does not always lead to an LL(1) grammar • Some grammars cannot be transformed into an LL(1) grammar at all • Although the example of a non-LL(1) grammar we covered has a fix, there are no universal rules to handle cases like this

  45. Shift-Reduce Parsing • One simple form of bottom-up parsing is shift-reduce parsing • Starts at the bottom (leaves, terminals) and works its way up to the top (root, start symbol) • Each step is a “reduction”: • Substring of input matching the right side of a production is “reduced” • Replaced with the nonterminal on the left of the production • If all substrings are chosen correctly, a rightmost derivation is traced in reverse

  46. Shift-Reduce Parsing Example S → aABe A → Abc | b B → d Reduction sequence: abbcde => aAbcde => aAde => aABe => S This traces the rightmost derivation in reverse: S rm=> aABe rm=> aAde rm=> aAbcde rm=> abbcde

  47. Handles (1) • Informally, a “handle” of a string: • Is a substring of the string • Matches the right side of a production • Reduction to left side of production is one step along reverse of rightmost derivation • Leftmost substring matching right side of production is not necessarily a handle • Might not be able to reduce resulting string to start symbol • In the example from the previous slide, if we reduce aAbcde to aAAcde, we cannot reduce aAAcde to S

  48. Handles (2) • Formally, a handle of a right-sentential form γ: • Is a production A → β and a position of γ where β may be found and replaced with A • Replacing β with A leads to the previous right-sentential form in a rightmost derivation of γ • So if S rm*=> αAw rm=> αβw, then A → β in the position following α is a handle of αβw • The string w to the right of the handle contains only terminals • There can be more than one handle if the grammar is ambiguous (more than one rightmost derivation)

  49. Ambiguity and Handles Example E → E + E E → E * E E → (E) E → id Two rightmost derivations of id1 + id2 * id3: E rm=> E + E rm=> E + E * E rm=> E + E * id3 rm=> E + id2 * id3 rm=> id1 + id2 * id3 E rm=> E * E rm=> E * id3 rm=> E + E * id3 rm=> E + id2 * id3 rm=> id1 + id2 * id3

  50. Handle Pruning • Repeat the following process, starting from the string of tokens, until the start symbol is obtained: • Locate the handle in the current right-sentential form • Replace the handle with the left side of the appropriate production • Two problems need to be solved (a brute-force sketch appears below): • How to locate the handle • How to choose the appropriate production
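
A brute-force sketch that sidesteps both problems by trying every possible reduction with backtracking, using the shift-reduce example grammar. It is exponential in general, which is why real shift-reduce parsers locate handles with tables instead.

```python
# Brute-force handle pruning: try every substring matching a right
# side, reduce it, and backtrack on dead ends.
# Grammar from the shift-reduce example: S -> aABe, A -> Abc | b, B -> d.
PRODUCTIONS = [("S", ["a", "A", "B", "e"]),
               ("A", ["A", "b", "c"]),
               ("A", ["b"]),
               ("B", ["d"])]

def reduce_to_start(form, start="S"):
    """Return one reduction sequence from `form` to `start`, or None."""
    if form == [start]:
        return [form]
    for head, body in PRODUCTIONS:
        n = len(body)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == body:                 # candidate handle
                reduced = form[:i] + [head] + form[i + n:]
                rest = reduce_to_start(reduced, start)
                if rest is not None:                  # else: backtrack
                    return [form] + rest
    return None

for step in reduce_to_start(list("abbcde")):
    print("".join(step))   # abbcde, aAbcde, aAde, aABe, S
```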
