1 / 32

The Parser

The Parser checks and verifies syntax based on specified rules, reporting errors and building an IR. Learn about parsing, CFGs, top-down and bottom-up parsing, derivations, parse trees, ambiguity, and top-down parsing algorithms.

orowell
Download Presentation

The Parser

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Parser • Its job: • Check and verify syntax based on specified syntax rules • Report errors • Build IR • Good news • the process can be automated

  2. Parsing • The scanner recognizes words • The parser recognizes syntactic units • Can we use regular expressions? • No. How would we recognize L={pkqk} then? • We use Context-Free Grammars (CFGs) to specify context-free syntax. • Coming up: • CFGs • Top-down parsing • Bottom-up parsing

  3. CFGs • A CFG describes how a sentence of a language may be generated. • Example: • Use this grammar to generate the sentence mwa ha ha ha EvilLaugh  mwa EvilCackle EvilCackle  ha EvilCackle EvilCackle  ha

  4. CFGs • A CFG is a quadruple (N, T, R, S) where • N is the set of non-terminal symbols • T is the set of terminal symbols • S N is the starting symbol • R  N(NT)* is a set of rules • Example: Consider G = (N, T, R, S) where • N = {S} • T ={ (, ) } • R ={ S (S) , SSS, S }

  5. Derivations • Thelanguage described by a CFG is the set of strings that can be derived from the start symbol using the rules of the grammar. • At each step, we choose a non-terminal to replace. S(S) (SS) ((S)S) (( )S) (( )(S)) (( )((S))) (( )(( ))) sentential form derivation This example demonstrates a leftmost derivation : one where we always expand the leftmost non-terminal in the sentential form.

  6. Derivations and parse trees • We can describe a derivation using a graphical representation called parse tree: • the root is labeled with the start symbol, S • each internal node is labeled with a non-terminal • the children of an internal node A are the right-hand side of a production A • each leaf is labeled with a terminal • A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree)

  7. Derivations and parse trees • So, how can we use the grammar described earlier to verify the syntax of "(( )((( ))))"? • We must try to find a derivation for that string. • We can work top-down (starting at the root/start symbol) or bottom-up (starting at the leaves). • Careful! • There may be more than one grammars to describe the same language. • Not all grammars are suitable

  8. Problems in parsing • Consider S  if E then S else S | if E then S • What is the parse tree for if E1 then S1 elseif E2 then S2 else S3 • There are two possible parse trees! This problem is called ambiguity • A CFG is ambiguous if one or more terminal strings have multiple leftmost derivations from the start symbol.

  9. Ambiguity • There is no general algorithm to tell whether a CFG is ambiguous or not. • There is no standard procedure for eliminating ambiguity. • Some languages are inherently ambiguous. • In those cases, any grammar we come up with will be ambiguous.

  10. Ambiguity • In general, we try to eliminate ambiguity by rewriting the grammar. • Example: • EE+E | EE | id becomes: EE+T | T TTF | F F id

  11. Ambiguity • In general, we try to eliminate ambiguity by rewriting the grammar. • Example: • Sif E then S else S | if E then S | other becomes: S  EwithElse | EnoElseEwithElse  if E then EwithElse else EwithElse | otherEnoElse  if E then S | if E then EwithElse else EnoElse

  12. Top-down parsing • Main idea: • Start at the root, grow towards leaves • Pick a production and try to match input • May need to backtrack • Some grammars are backtrack-free • Example: • Use the expression grammar to parse x-2*y

  13. Grammar problems • Because we try to generate a leftmost derivation by scanning the input from left to right, grammars of the form A  A x may cause endless recursion. • Such grammars are called left-recursive and they must be transformed if we want to use a top-down parser.

  14. Left recursion • A grammar is left recursive if for a non-terminal A, there is a derivation A+ A • There are three types of left recursion: • direct (A  A x) • indirect (A  B C, B  A ) • hidden (A  B A, B  )

  15. Left recursion • To eliminate direct left recursion replace A  A1 | A2 | ... | Am | 1 | 2 | ... | n with A  1B | 2B | ... | nB B  1B | 2B | ... | mB | 

  16. Left recursion • How about this: S  EE  E+T E  T T  E-T T  id There is direct recursion: EE+TThere is indirect recursion: TE+T, ET Algorithm for eliminating indirect recursion List the nonterminals in some order A1, A2, ...,An for i=1 to n for j=1 to i-1 if there is a production AiAj, replace Ajwith its rhs eliminate any direct left recursion on Ai

  17. Eliminating indirect left recursion ordering: S, E, T, F i=S i=E i=T, j=E S  EE  E+T E  T T  E-T T  F F  E*F F  id S  EE  E+T E  T T  E-T T  F F  E*F F  id S  EE  TE' E'+TE'| T  E-T T  F F  E*F F  id S  EE  TE' E'+TE'| T  TE'-T T  F F  E*F F  id S  EE  TE' E'+TE'| T  FT' T'  E'-TT'| F  E*F F  id

  18. Eliminating indirect left recursion i=F, j=E i=F, j=T S  EE  TE' E'+TE'| T  FT' T'  E'-TT'| F  TE'*F F  id S  EE  TE' E'+TE'| T  FT' T'  E'-TT'| F  FT'E'*F F  id S  EE  TE' E'+TE'| T  FT' T'  E'-TT'| F  idF' F'  T'E'*FF'|

  19. Grammar problems • Consider S  if E then S else S | if E then S • Which of the two productions should we use to expand non-terminal S when the next token is if? • We can solve this problem by factoring out the common part in these rules. This way, we are postponing the decision about which rule to choose until we have more information (namely, whether there is an else or not). • This is called left factoring

  20. Left factoring A  1 | 2 |...| n |  becomes A  B| B 1 | 2 |...| n

  21. Grammar problems • A symbol XV is useless if • there is no derivation from X to any string in the language (non-terminating) • there is no derivation from S that reaches a sentential form containing X (non-reachable) • Reduced grammar = a grammar that does not contain any useless symbols.

  22. Useless symbols • In order to remove useless symbols, apply two algorithms: • First, remove all non-terminating symbols • Then, remove all non-reachable symbols. • The order is important! • For example, consider S+ X where  contains a non-terminating symbol. What will happen if we apply the algorithms in the wrong order? • Concrete example: S  AB | a, A a

  23. Useless symbols • Example Initial grammar: S AB | CA A a B CB | AB C cB | b D aD | d Algorithm 1 (terminating symbols): A is in because of A a C is in because of C b D is in because of D d S is in because A, C are in and S AC

  24. Useless symbols • Example continued After algorithm 1: S CA A a C b D aD | d Algorithm 2 (reachable symbols): S is in because it is the start symbol C and A are in because S is in and S CA Final grammar: S CA A a C b

  25. Predictive parsing • Basic idea: • GivenA   |  the parser should be able to choose between  and • How? • What if we do some "preprocessing" to answer the question: Given a non-terminal A and lookahead t, which (if any) production of A is guaranteed to start with a t?

  26. Predictive parsing • If we have two productions: A , we want a distinct way of choosing the correct one. • Define: • for G, x  FIRST() iff  * x • If FIRST() and FIRST() contain no common symbols, we will know whether we should choose A or A by looking at the lookahead symbol. • Almost right...

  27. Predictive parsing • Compute FIRST(X) as follows: • if X is a terminal, then FIRST(X)={X} • if X is a production, then add  to FIRST(X) • if X is a non-terminal and XY1Y2...Yn is a production, add FIRST(Yi) to FIRST(X) if the preceding Yjs contain  in their FIRSTs

  28. Predictive parsing • What if we have a "candidate" production A where = or *? • We could expand if we knew that there is some sentential form where the current input symbol appears after A. • Define: • for AN, xFOLLOW(A) iff  S*Ax

  29. Predictive parsing • Compute FOLLOW as follows: • FOLLOW(S) contains EOF • For productions AB, everything in FIRST() except  goes into FOLLOW(B) • For productions AB or AB where FIRST() contains , FOLLOW(B) contains everything that is in FOLLOW(A)

  30. Predictive parsing • Armed with • FIRST • FOLLOW • we can build parser where no backtracking is required!

  31. Predictive parsing (w/table) • For each production A do: • For each terminal a  FIRST() add A to entry M[A,a] • If FIRST(), add A to entry M[A,b] for each terminal b  FOLLOW(A). • If FIRST() and EOFFOLLOW(A), add A to M[A,EOF] • Use table and stack to simulate recursion.

  32. Recursive Descent Parsing • Basic idea: • Write a routine to recognize each lhs • This produces a parser with mutually recursive routines. • Good for hand-coded parsers.

More Related