
SYNTAX ANALYSIS - 1



  1. UNIT - 2 SYNTAX ANALYSIS - 1 -Compiled by: Namratha Nayak www.Bookspar.com | Website for Students | VTU - Notes - Question Papers

  2. OBJECTIVES
  • The Role of the Parser
  • Context-Free Grammars
  • Writing a Grammar
  • Parsing – Top-Down and Bottom-Up
  • Simple LR Parsers
  • Powerful LR Parsers
  • Using Ambiguous Grammars
  • Parser Generators

  3. Syntax Analysis and Parsing
  • Syntax
  • The way in which words are strung together to form phrases, clauses and sentences
  • Syntax Analysis
  • The task of fitting a sequence of tokens into the grammatical structure specified for the language
  • Parsing
  • Breaking a sentence down into its component parts, with an explanation of the form, function, and syntactic relationship of each part

  4. Syntax Analysis
  • Every PL has rules that describe the syntactic structure of well-formed programs
  • In C, a program is made up of functions, declarations, statements, etc.
  • Syntax can be specified using context-free grammars or BNF
  • Grammars offer benefits to both language designers and compiler writers
  • Give a precise yet easy-to-understand syntactic specification of a PL
  • Help construct efficient parsers automatically
  • Provide a good structure to the language, which in turn helps generate correct object code
  • Allow a language to be evolved iteratively by adding new constructs to perform new tasks

  5. The Role of a Parser
  • The parser obtains a string of tokens from the lexical analyzer
  • Verifies that the string of token names can be generated by the grammar
  • Constructs a parse tree and passes it to the rest of the compiler
  • Reports errors when the program does not match the syntactic structure of the source language

  6. Parsing Methods
  • Universal Parsing Methods
  • Can parse any grammar, but too inefficient for use in practical compilers
  • Example: Earley's Algorithm
  • Top-Down Parsing
  • Builds parse trees from the root to the leaves
  • Parsers can be generated automatically or written manually
  • Example: LL parsers
  • Bottom-Up Parsing
  • Starts from the leaves and works up to the root
  • Parsers can only be generated automatically
  • Example: LR parsers

  7. Representative Grammars
  • Some of the grammars used in the discussion
  • LR grammar for bottom-up parsing
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
  • Non-left-recursive variant of the grammar
  E → T E′
  E′ → + T E′ | ε
  T → F T′
  T′ → * F T′ | ε
  F → ( E ) | id
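As an illustration (the encoding and names here are my own, not from the slides), a grammar like the one above can be represented in code as a mapping from each nonterminal to its list of alternatives:

```python
# Hypothetical encoding (not from the slides): each nonterminal maps to a
# list of alternatives, and each alternative is a tuple of grammar symbols.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def nonterminals(grammar):
    """The nonterminals are exactly the left-hand sides."""
    return set(grammar)

def terminals(grammar):
    """Every right-hand-side symbol that is not a left-hand side."""
    return {sym for alts in grammar.values()
            for alt in alts for sym in alt} - set(grammar)
```

This representation is reused informally in the sketches on the later slides.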

  8. Syntax Error Handling
  • A compiler must locate and report errors, but error handling is left to the compiler designer
  • Common PL errors occur at many different levels
  • Lexical Errors
  • Misspellings of identifiers, keywords or operators
  • Syntactic Errors
  • Misplaced semicolons, or extra or missing braces "{" or "}"
  • Semantic Errors
  • Type mismatches between operators and operands
  • Logical Errors
  • Incorrect reasoning by the programmer, or use of the assignment operator (=) instead of the comparison operator (==)

  9. Syntax Error Handling
  • Parsing methods detect errors as soon as possible, i.e., when the stream of tokens cannot be parsed further
  • They have the viable-prefix property
  • Detect that an error has occurred as soon as they see a prefix of the input that cannot be completed to form a string of the language
  • Errors appear syntactic and are exposed when parsing cannot continue
  • Goals of the error handler in a parser
  • Report the presence of errors clearly and accurately
  • Recover from each error quickly enough to detect subsequent errors
  • Add minimal overhead to the processing of correct programs

  10. Error-Recovery Strategies
  • How should the parser recover once an error is detected?
  • It could quit with an informative error message when it detects the first error
  • Additional errors are uncovered if the parser can restore itself to a state where processing of the input can continue
  • If errors pile up, it is better to stop after exceeding some error limit
  • Four error-recovery strategies
  • Panic-Mode Recovery
  • Phrase-Level Recovery
  • Error Productions
  • Global Correction

  11. Error-Recovery Strategies
  • Panic-Mode Recovery
  • The parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found
  • Synchronizing tokens – delimiters like the semicolon or "}"
  • The compiler designer must select the synchronizing tokens appropriate for the source language
  • Has the advantage of simplicity and is guaranteed not to go into an infinite loop
  • Phrase-Level Recovery
  • Performs local correction on the remaining input, that is, may replace a prefix of the remaining input by some string that allows the parser to continue
  • Replacements must be chosen so that they do not lead to infinite loops
  • Drawback is the difficulty of coping with situations in which the actual error occurred before the point of detection
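A minimal sketch of panic-mode recovery, assuming tokens are plain strings and the synchronizing set is chosen by the compiler designer (the function name and its defaults are hypothetical):

```python
def panic_mode_skip(tokens, pos, sync_tokens=frozenset({";", "}"})):
    """On an error at position pos, discard tokens one at a time until a
    synchronizing token (or the end of the input) is reached, and return
    the position at which parsing should resume."""
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1
    # Consuming the synchronizing token itself is one possible design
    # choice; another is to resume parsing on it.
    return pos + 1 if pos < len(tokens) else pos
```

Because the loop only ever advances the position, this recovery cannot loop forever, which is the guarantee the slide mentions.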

  12. Error-Recovery Strategies
  • Error Productions
  • Anticipate the common errors that might be encountered
  • Augment the grammar for the language with productions that generate the erroneous constructs
  • Such a parser detects the anticipated errors when an error production is used during parsing
  • Global Correction
  • Make as few changes as possible in processing an incorrect input string
  • Use algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction
  • Given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y such that the number of changes required to transform x into y is as small as possible
  • Too costly to implement in terms of time and space

  13. Context-Free Grammars – Formal Definition
  • A context-free grammar G = (T, N, S, P) consists of:
  • T, a set of terminals (scanner tokens – symbols that may not appear on the left side of a rule)
  • N, a set of nonterminals (syntactic variables generated by productions – symbols that may appear on the left or right of a rule)
  • S, a designated start nonterminal
  • P, a set of productions (rules). Each production consists of
  • A nonterminal called the left side (head) of the production
  • The symbol →
  • A right side (body) consisting of zero or more terminals and nonterminals

  14. Context-Free Grammars – Formal Definition
  • Example grammar for simple arithmetic expressions
  expression → expression + term
  expression → expression – term
  expression → term
  term → term * factor
  term → term / factor
  term → factor
  factor → ( expression )
  factor → id

  15. Notational Conventions
  • Terminal symbols
  • Lowercase letters early in the alphabet, like a, b, c
  • Operator symbols such as +, *
  • Punctuation symbols such as parentheses and the comma
  • Digits 0, 1, …, 9
  • Nonterminal symbols
  • Uppercase letters early in the alphabet, like A, B, C
  • The letter S, which, when it appears, stands for the start symbol
  • Lowercase letters late in the alphabet, like u, v, …, z, represent strings of terminals
  • Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either terminals or nonterminals

  16. Notational Conventions
  • Lowercase Greek letters α, β, γ represent strings of grammar symbols
  • If A → α1, A → α2, …, A → αn are all the productions with A on the LHS (known as A-productions), then we write A → α1 | α2 | … | αn (α1, α2, …, αn are the alternatives of A)
  • Unless otherwise stated, the LHS of the first production is the start symbol
  • Example: the previous grammar written using these conventions
  E → E + T | E – T | T
  T → T * F | T / F | F
  F → ( E ) | id

  17. Derivations
  • A way of showing how an input sentence is recognized with a grammar
  • Beginning with the start symbol, each rewriting step replaces a nonterminal by the body of one of its productions
  • Consider the following grammar
  E → E + E | E * E | – E | ( E ) | id
  • "E derives –E" can be denoted by E ==> – E
  • A derivation of – ( id ) from E:
  E ==> – E ==> – ( E ) ==> – ( id )
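The derivation above can be mimicked by literal string rewriting; this toy sketch (my own illustration, not the slides' notation) replaces the leftmost occurrence of a nonterminal at each step and records every sentential form:

```python
def derive(steps):
    """Starting from the sentential form "E", apply each
    (nonterminal, replacement) rewrite in turn, replacing the leftmost
    occurrence, and return the list of sentential forms produced."""
    form = "E"
    forms = [form]
    for nonterminal, body in steps:
        form = form.replace(nonterminal, body, 1)  # leftmost occurrence
        forms.append(form)
    return forms

# The derivation E ==> -E ==> -(E) ==> -(id) from the slide:
steps = [("E", "-E"), ("E", "(E)"), ("E", "id")]
```

Running `derive(steps)` reproduces the sequence of sentential forms shown on the slide.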

  18. Derivations
  • Symbols to indicate derivation
  • ==> denotes "derives in one step"
  • ==>* denotes "derives in zero or more steps"
  • ==>+ denotes "derives in one or more steps"
  • Given a grammar G with start symbol S, a string ω is in the language L(G) if and only if S ==>* ω
  • If S ==>* α and α contains
  • Only terminals, then α is a sentence of G
  • Terminals and/or nonterminals, then α is a sentential form of G
  • Two choices are made at each step in a derivation
  • Choose which nonterminal to replace
  • Pick a production with that nonterminal as head

  19. Derivations
  • Leftmost Derivation
  • The leftmost nonterminal in each sentential form is always chosen to be replaced
  • If α ==> β is a step in which the leftmost nonterminal in α is replaced, we write α ==>lm β
  • Rightmost Derivation
  • Replace the rightmost nonterminal in each sentential form
  • Written as α ==>rm β
  • If S ==>*lm α, then we say that α is a left-sentential form of the grammar

  20. Parse Trees and Derivations
  • A parse tree is a graphical representation of a derivation, showing how to derive a string of the language from the start symbol of the grammar
  • Each interior node is labeled with the nonterminal in the head of a production
  • Its children are labeled by the symbols in the body (RHS) of that production
  • Yield of the tree
  • The string derived or generated from the nonterminal at the root of the tree
  • Obtained by reading the leaves of the parse tree from left to right

  21. Parse Trees and Derivations Fig: Parse tree for – ( id + id )

  22. Ambiguity
  • A grammar that produces more than one parse tree for some sentence is said to be ambiguous
  • Equivalently, there is more than one leftmost derivation, or more than one rightmost derivation, for the same sentence
  Fig: Two parse trees for id + id * id

  23. CFGs Versus Regular Expressions
  • Every construct that can be described by a regular expression can be described by a grammar, but not vice versa
  • Grammar construction from an NFA
  • For each state i of the NFA, create a nonterminal Ai
  • If state i has a transition to state j on input a, add the production Ai → a Aj
  • If state i goes to state j on input ε, add the production Ai → Aj
  • If i is an accepting state, add Ai → ε
  • If i is the start state, make Ai the start symbol of the grammar
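These four rules can be sketched directly in code; the representation below (state numbers, `None` standing for an ε-move, nonterminals named `A<i>`) is my own assumption:

```python
def nfa_to_grammar(transitions, accepting, start):
    """Build a right-linear grammar from an NFA using the rules above.
    transitions: dict mapping (state, symbol) -> set of next states,
    where symbol None stands for an epsilon-move. Returns the start
    symbol and a list of (head, body) productions; an empty body ()
    stands for an epsilon production."""
    productions = []
    for (i, sym), targets in transitions.items():
        for j in targets:
            if sym is None:
                productions.append((f"A{i}", (f"A{j}",)))      # Ai -> Aj
            else:
                productions.append((f"A{i}", (sym, f"A{j}")))  # Ai -> a Aj
    for i in accepting:
        productions.append((f"A{i}", ()))                      # Ai -> epsilon
    return f"A{start}", productions

# Example: an NFA for a* -- a single state 0, both start and accepting,
# looping to itself on "a".
start_symbol, productions = nfa_to_grammar({(0, "a"): {0}}, {0}, 0)
```

For the a* example this yields the grammar A0 → a A0 | ε with start symbol A0.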

  24. Exercises


  26. Writing a Grammar
  • Lexical versus Syntactic Analysis
  • Why use REs to define the lexical syntax of a language?
  • Separating the syntactic structure into lexical and non-lexical parts modularizes a compiler into two components
  • Lexical rules are simple and easy to describe
  • REs provide a concise and easier-to-understand notation for tokens than grammars
  • Efficient lexical analyzers can be constructed automatically from REs, more easily than from arbitrary grammars
  • REs are most useful for describing the structure of constructs such as identifiers, constants, keywords and whitespace
  • Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-ends, and corresponding if-then-elses

  27. Eliminating Ambiguity
  • Ambiguous grammars can sometimes be rewritten to eliminate the ambiguity
  • Example: the "dangling-else" grammar
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
  • Parse tree for the statement: if E1 then S1 else if E2 then S2 else S3

  28. Eliminating Ambiguity
  • The grammar is ambiguous since the following string has two parse trees
  if E1 then if E2 then S1 else S2

  29. Eliminating Ambiguity
  • Rewriting the "dangling-else" grammar
  • General rule: "Match each else with the closest unmatched then"
  • The idea is that a statement appearing between a then and an else must be "matched"
  • That is, each interior statement must not end with an unmatched or open then
  • A matched statement is either an if-then-else statement containing no open statements, or any other kind of unconditional statement

  30. Eliminating Left Recursion
  • Left-recursive grammar
  • A grammar that has a nonterminal A such that there is a derivation A ==>+ Aα for some string α
  • Top-down parsing methods cannot handle left-recursive grammars
  • Immediate left recursion: a production of the form A → Aα
  • A left-recursive pair of productions A → Aα | β can be replaced by:
  A → β A′
  A′ → α A′ | ε
  • Eliminating immediate left recursion
  • First group the A-productions as
  A → Aα1 | Aα2 | . . . | Aαm | β1 | β2 | . . . | βn
  where no βi begins with an A
  • Replace the A-productions by
  A → β1 A′ | β2 A′ | . . . | βn A′
  A′ → α1 A′ | α2 A′ | . . . | αm A′ | ε

  31. Eliminating Left Recursion • Algorithm to eliminate left recursion from a grammar
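The transformation for immediate left recursion described on the previous slide might be sketched as follows, with alternatives represented as tuples of symbols (a simplified sketch: the full algorithm also handles indirect left recursion, which is not covered here):

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Split A's alternatives into left-recursive ones (A -> A alpha)
    and the rest (A -> beta), then rewrite them as
        A  -> beta A'    and    A' -> alpha A' | epsilon.
    Alternatives are tuples of symbols; A' is spelled head + "'" and
    an empty tuple () stands for epsilon."""
    new_head = head + "'"
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}  # nothing to do
    return {
        head: [beta + (new_head,) for beta in others],
        new_head: [alpha + (new_head,) for alpha in recursive] + [()],
    }
```

Applied to E → E + T | T, this produces E → T E′ and E′ → + T E′ | ε, matching the non-left-recursive variant shown earlier.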

  32. Left Factoring
  • When the choice between two alternative A-productions is not clear,
  • We rewrite the grammar to defer the decision until enough of the input has been seen
  • In general, if A → αβ1 | αβ2 are two A-productions
  • We do not know whether to expand A to αβ1 or to αβ2
  • Defer the decision by expanding A to αA′
  • After seeing the input derived from α, we expand A′ to β1 or to β2
  • Left-factored, the original productions become
  A → α A′
  A′ → β1 | β2

  33. Left Factoring
  • Algorithm for left-factoring a grammar
  • For each nonterminal A, find the longest prefix α common to two or more of its alternatives
  • If α ≠ ε, replace all of the A-productions A → αβ1 | αβ2 | . . . | αβn | γ, where γ represents the alternatives that do not begin with α, by
  A → αA′ | γ
  A′ → β1 | β2 | . . . | βn
  • Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix
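One application of this transformation might look like the sketch below (the representation and helper name are my own); repeating it until no common prefix remains completes the left factoring:

```python
def left_factor_once(head, alternatives):
    """One application of the transformation above: find the longest
    prefix alpha shared by two or more of A's alternatives and factor it
    out into A -> alpha A', A' -> beta1 | beta2 | ... Alternatives are
    tuples of symbols; () stands for epsilon."""
    best = ()
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            a, b = alternatives[i], alternatives[j]
            k = 0
            while k < min(len(a), len(b)) and a[k] == b[k]:
                k += 1
            if k > len(best):
                best = a[:k]
    if not best:
        return {head: alternatives}  # no common prefix; nothing to do
    new_head = head + "'"
    factored = [alt[len(best):] for alt in alternatives
                if alt[:len(best)] == best]
    rest = [alt for alt in alternatives if alt[:len(best)] != best]
    return {head: [best + (new_head,)] + rest, new_head: factored}
```

On the abbreviated dangling-else grammar S → iEtS | iEtSeS | a, this factors out the common prefix iEtS, giving S → iEtS S′ | a and S′ → ε | eS.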

  34. Non-Context-Free Language Constructs
  • Some syntactic constructs found in PLs cannot be specified using grammars alone
  • Example 1: The problem of checking that identifiers are declared before they are used in the program
  The abstract language is L1 = {wcw | w is in (a|b)*}
  • Example 2: The problem of checking that the number of formal parameters in the declaration of a function agrees with the number of actual parameters in a use of the function
  The abstract language is L2 = {aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1}

  35. Top-Down Parsing
  • Constructing a parse tree for the input string, starting from the root and creating the nodes in preorder
  • Can be viewed as finding a leftmost derivation for an input string
  • Consider the grammar below
  E → T E′
  E′ → + T E′ | ε
  T → F T′
  T′ → * F T′ | ε
  F → ( E ) | id
  Sequence of parse trees for the input id + id * id

  36. Top-Down Parsing

  37. Top-Down Parsing
  • Determining the production to be applied for a nonterminal A is the key problem at each step of a top-down parse
  • Once an A-production is chosen, the parsing process consists of "matching" the terminal symbols in the production body with the input string
  • Two types
  • Recursive-Descent Parsing
  • May require backtracking to find the correct A-production to be applied
  • Predictive Parsing
  • No backtracking is required
  • Chooses the correct A-production by looking ahead at the input a fixed number of symbols
  • LL(k) Grammars: grammars for which we can construct predictive parsers that look k symbols ahead in the input

  38. Recursive-Descent Parsing
  • The parsing program consists of a set of procedures, one for each nonterminal
  • Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string

  39. Recursive-Descent Parsing
  • To allow backtracking, the code needs to be modified
  • We cannot choose a unique A-production at line (1), so we must try each of the several productions in some order
  • Failure at line (7) is not ultimate failure, but tells us to return to line (1) and try another A-production
  • Only if there are no more A-productions to try do we declare that an input error has been found
  • To try another A-production, we need to be able to reset the input pointer to where it was when we first reached line (1)
  • A local variable is needed to store this input pointer

  40. Recursive-Descent Parsing
  • Consider the grammar:
  S → c A d
  A → a b | a
  and the input string w = cad
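A backtracking recursive-descent parser for this grammar can be sketched as follows (a hand-written illustration, not the slides' pseudocode). Each procedure returns the input position after a successful match, or None, so that the caller can reset and try the next alternative:

```python
def parse(word):
    """Backtracking recursive-descent parse of S -> c A d, A -> ab | a.
    Returns True iff the whole input is derived from S."""
    def match(pos, ch):
        # Consume one terminal, or fail with None.
        return pos + 1 if pos < len(word) and word[pos] == ch else None

    def A(pos):
        # Try A -> a b first; on failure, backtrack to pos and try A -> a.
        p = match(pos, "a")
        if p is not None:
            q = match(p, "b")
            if q is not None:
                return q
        return match(pos, "a")

    def S(pos):
        # S -> c A d
        p = match(pos, "c")
        if p is None:
            return None
        p = A(p)
        if p is None:
            return None
        return match(p, "d")

    return S(0) == len(word)
```

On w = cad, A first tries ab, fails at d, backtracks, succeeds with a, and then S matches the final d, exactly the behavior the slide describes.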

  41. FIRST and FOLLOW
  • During parsing, FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol
  • FIRST(α)
  • The set of terminals that begin strings derived from α. If α derives ε, then ε is also in FIRST(α)
  • Example: if A ==>* c γ, then c is in FIRST(A)
  • Consider two A-productions A → α | β, where FIRST(α) and FIRST(β) are disjoint sets
  • We can choose between them by looking at the next input symbol a, since a can be in at most one of FIRST(α) and FIRST(β), not both

  42. FIRST and FOLLOW
  • FOLLOW(A), for a nonterminal A
  • The set of terminals a that can appear immediately to the right of A in some sentential form
  • That is, the set of terminals a such that there exists a derivation of the form S ==>* αAaβ
  • If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A)

  43. FIRST and FOLLOW
  • To compute FIRST(X), apply the following rules until no more terminals or ε can be added to any FIRST set
  • If X is a terminal, then FIRST(X) = { X }
  • If X is a nonterminal and X → Y1 Y2 . . . Yk is a production for some k ≥ 1
  • Place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), . . . , FIRST(Yi-1); i.e., Y1 . . . Yi-1 ==>* ε
  • If ε is in FIRST(Yj) for all j = 1, 2, . . ., k, then add ε to FIRST(X)
  • If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ==>* ε, then we add FIRST(Y2), and so on
  • If X → ε is a production, then add ε to FIRST(X)
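A sketch of this fixed-point computation (representation assumed, not from the slides: nonterminals map to lists of alternatives, each a tuple of symbols; any other symbol is a terminal, and () stands for ε):

```python
EPS = "ε"

def compute_first(grammar):
    """Fixed-point computation of FIRST for every nonterminal, following
    the rules above."""
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # A terminal's FIRST set is just itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                if not alt:  # X -> epsilon
                    if EPS not in first[head]:
                        first[head].add(EPS)
                        changed = True
                    continue
                all_derive_eps = True
                for sym in alt:
                    f = first_of(sym)
                    new = (f - {EPS}) - first[head]
                    if new:
                        first[head] |= new
                        changed = True
                    if EPS not in f:
                        all_derive_eps = False
                        break  # later symbols cannot contribute
                if all_derive_eps and EPS not in first[head]:
                    first[head].add(EPS)
                    changed = True
    return first

# The non-left-recursive expression grammar (E' and T' spelled with ').
G = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
     "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
     "F": [("(", "E", ")"), ("id",)]}
first = compute_first(G)
```

For this grammar the computation gives FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E′) = { +, ε } and FIRST(T′) = { *, ε }.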

  44. FIRST and FOLLOW
  • To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set
  • Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker
  • If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B)
  • If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
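These three rules might be implemented as the following fixed-point sketch, assuming the FIRST sets have already been computed (here supplied as a literal for the running expression grammar):

```python
EPS = "ε"

G = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
     "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
     "F": [("(", "E", ")"), ("id",)]}

# FIRST sets for G, taken as given here (they follow from the rules on
# the previous slide).
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def compute_follow(grammar, start, first):
    """Fixed-point computation of FOLLOW from the three rules above."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # rule 1: $ follows the start symbol

    def first_of(sym):
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is defined only for nonterminals
                    # FIRST of the suffix beta that follows sym (rule 2)
                    suffix_first, suffix_eps = set(), True
                    for nxt in alt[i + 1:]:
                        f = first_of(nxt)
                        suffix_first |= f - {EPS}
                        if EPS not in f:
                            suffix_eps = False
                            break
                    new = suffix_first - follow[sym]
                    if suffix_eps:  # rule 3: beta can vanish (or is empty)
                        new |= follow[head] - follow[sym]
                    if new:
                        follow[sym] |= new
                        changed = True
    return follow

follow = compute_follow(G, "E", FIRST)
```

For this grammar the result is FOLLOW(E) = FOLLOW(E′) = { ), $ }, FOLLOW(T) = FOLLOW(T′) = { +, ), $ } and FOLLOW(F) = { *, +, ), $ }.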

  45. LL(1) Grammars
  • Predictive parsers needing no backtracking can be constructed for a class of grammars called LL(1) grammars
  • The first "L" stands for scanning the input from left to right
  • The second "L" stands for producing a leftmost derivation
  • The "1" stands for using one input symbol of lookahead at each step to make parsing decisions
  • A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions, the following hold:
  • For no terminal a do both α and β derive strings beginning with a
  • At most one of α and β can derive the empty string
  • If β ==>* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if α ==>* ε, then β does not derive any string beginning with a terminal in FOLLOW(A)

  46. LL(1) Grammars
  • Predictive parsers can be constructed for LL(1) grammars, since the proper production to apply can be selected by looking only at the current input symbol
  • The next algorithm collects the information from the FIRST and FOLLOW sets
  • Into a predictive parsing table M[A, a], where A is a nonterminal and a is a terminal
  • IDEA
  • Production A → α is chosen if the next input symbol a is in FIRST(α)
  • If α derives ε, we again choose A → α if the current input symbol is in FOLLOW(A), or if the end of the input has been reached and $ is in FOLLOW(A)

  47. LL(1) Grammars
  Algorithm: Construction of a Predictive Parsing Table
  Input: Grammar G
  Output: Parsing table M
  Method: For each production A → α of the grammar, do the following:
  • For each terminal a in FIRST(α), add A → α to M[A, a]
  • If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well
  If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error
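A sketch of this table construction for the non-left-recursive expression grammar; the FIRST and FOLLOW sets are written out as givens rather than computed here, and a missing table entry plays the role of error:

```python
EPS = "ε"

GRAMMAR = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
           "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
           "F": [("(", "E", ")"), ("id",)]}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

def first_of_string(alpha):
    """FIRST of a string of grammar symbols."""
    result = set()
    for sym in alpha:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}  # every symbol (or none) derives epsilon

def build_table(grammar):
    """Predictive parsing table M[A, a] built by the method above; a
    duplicate entry would mean the grammar is not LL(1)."""
    table = {}
    for head, alts in grammar.items():
        for alt in alts:
            f = first_of_string(alt)
            lookaheads = (f - {EPS}) | (FOLLOW[head] if EPS in f else set())
            for a in lookaheads:
                assert (head, a) not in table, "grammar is not LL(1)"
                table[(head, a)] = alt
    return table

M = build_table(GRAMMAR)
```

For example, M[E′, +] holds E′ → + T E′, while M[E′, )] and M[E′, $] hold E′ → ε, because ε is in FIRST(E′) and ) and $ are in FOLLOW(E′).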

  48. Nonrecursive Predictive Parsing
  • Built by maintaining a stack explicitly, rather than implicitly via recursive calls
  • The parser mimics a leftmost derivation
  • If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S ==>*lm w α

  49. Nonrecursive Predictive Parsing

  50. Nonrecursive Predictive Parsing
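The table-driven parser might be sketched as below (my own rendering of the scheme on these slides, not the book's pseudocode), using a hand-written parsing table for the non-left-recursive expression grammar; an empty tuple body stands for an ε-production:

```python
TABLE = {
    ("E", "id"): ("T", "E'"),  ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),  ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"), ("T'", "+"): (), ("T'", ")"): (),
    ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Table-driven LL(1) parser: a stack of grammar symbols mimics a
    leftmost derivation. Returns the list of productions used, or None
    on a syntax error."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos, output = 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                      # match a terminal
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            return None                   # terminal mismatch
        elif (top, a) in TABLE:
            body = TABLE[(top, a)]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body))  # push RHS, leftmost symbol on top
        else:
            return None                   # error entry of the table
    return output if tokens[pos] == "$" else None

result = predictive_parse(["id", "+", "id", "*", "id"])
```

The recorded productions, read in order, give exactly the leftmost derivation of id + id * id.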
