Parsing

Definition: parsing is the process of finding the derivation of a word generated by a particular grammar. There are different parsing techniques, including the following three:
• Top-down parsing.
• Bottom-up parsing.
• A parsing technique for a particular grammar of arithmetic expressions.

Top-Down Parsing
• The parse tree is created top to bottom.
• Top-down parsers:
  • Recursive-Descent Parsing
    • Backtracking is needed (if a choice of production rule does not work, we backtrack to try the other alternatives).
    • It is a general parsing technique, but not widely used.
    • Not efficient.
  • Predictive Parsing
    • No backtracking.
    • Efficient.
    • Needs a special form of grammar (LL(1) grammars).
    • Recursive Predictive Parsing is a special form of Recursive-Descent Parsing without backtracking.
    • The Non-Recursive (Table-Driven) Predictive Parser is also known as the LL(1) parser.

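Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
Grammar:
  S → aBc
  B → bc | b
Input: abc
• First attempt: S ⇒ aBc, then B → bc gives a b c c. The final c has no input left to match, so this attempt fails and we backtrack.
• Second attempt: S ⇒ aBc, then B → b gives a b c, which matches the input.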
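The backtracking behaviour on this grammar can be sketched in a few lines of Python. This is a minimal illustration rather than a production parser: the names `GRAMMAR`, `derive` and `accepts` are invented for the example, and the generator-based search is just one convenient way to express "try the next alternative when this one fails".

```python
# Grammar from the slide: S -> aBc, B -> bc | b
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}


def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative in order. An alternative that
        # fails yields nothing, so the loop moves on (this is the backtrack).
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: consume one input character.
        yield from derive(rest, text, pos + 1)


def accepts(text):
    """True if the whole of `text` can be derived from S."""
    return any(end == len(text) for end in derive(["S"], text, 0))
```

For the input abc, the alternative B → bc is tried first and dead-ends at the trailing c, after which B → b succeeds. The search terminates only because this grammar has no left recursion.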

Top Down Parsing
• Left-to-right parse.
• Leftmost derivation.
• k-token lookahead → LL(k); k is often 1, resulting in LL(1) parsing.
• Needs to predict which production to use after seeing only the next k input tokens.
• Can be implemented by a recursive-descent parser, which implicitly uses the call stack.
  • Considered faster to write, but might use backtracking.
• Can be implemented by a table-driven parser, which makes explicit use of a parse stack.
• It must be possible to predict which production to use for a nonterminal by looking ahead by only k symbols.
• To predict a production we need to calculate the first set for the RHS of that production.
• If a production's RHS is nullable (derives ε), then it is also necessary to use the follow set of the LHS nonterminal to predict that production.

Left recursion
• Any productions of the form
  A → Aα
  A → BA where B ⇒* ε
  are called left-recursive productions.

Left Recursion Removal
• Consider the grammar A → Aa | b
• The halting condition of top-down parsing depends upon the generation of terminal prefixes to discover dead ends.
• Repeated application of the rule A → Aa never generates a terminal prefix, so nothing can terminate the parse.
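To see the problem concretely, here is a small Python sketch of a naive recursive-descent routine for A → Aa | b that tries the left-recursive alternative first. The function names are made up for the illustration; the point is only that the recursive call happens before any input is consumed.

```python
def parse_A(text, pos=0):
    """Naive recursive descent for A -> Aa | b, trying A -> Aa first."""
    # Alternative 1: A -> A a. The recursive call is made with the same
    # `pos`, so no terminal prefix is ever produced and the call never returns.
    end = parse_A(text, pos)
    if end is not None and end < len(text) and text[end] == "a":
        return end + 1
    # Alternative 2: A -> b (never reached).
    if pos < len(text) and text[pos] == "b":
        return pos + 1
    return None


def loops_forever(text):
    """Report whether parsing `text` exhausts Python's recursion limit."""
    try:
        parse_A(text)
    except RecursionError:
        return True
    return False
```

In Python the runaway recursion surfaces as a `RecursionError` rather than a literal infinite loop, but the cause is the one described on the slide.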

Removing left recursion
• To remove left recursion from A, the A rules are divided into two groups: the left-recursive rules and the others.
  A → AU1 | AU2 | AU3 | … | AUj
  A → V1 | V2 | V3 | … | Vk
Solution (Z is a new nonterminal):
  A → V1 | V2 | V3 | … | Vk | V1Z | V2Z | … | VkZ
  Z → U1Z | U2Z | U3Z | … | UjZ | U1 | U2 | U3 | … | Uj

Example
  A → Aa | b
Solution:
  A → bZ | b
  Z → aZ | a

  A → Aa | Ab | b | c
Solution:
  A → bZ | cZ | b | c
  Z → aZ | bZ | a | b

Consider another example:
  A → AB | BA | a
  B → b | c
Solution:
  A → BAZ | aZ | BA | a
  Z → BZ | B
  B → b | c
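The rewriting rule is mechanical, so it can be sketched as a short Python function. This is an illustration under stated assumptions: it handles only direct left recursion for one nonterminal, it assumes there are no ε alternatives, and the function name and the default new nonterminal `"Z"` are choices made for the example.

```python
def remove_left_recursion(nonterminal, alternatives, fresh="Z"):
    """Apply the rule above to the alternatives of one nonterminal.

    Each alternative is a list of symbols. Returns a dict holding the
    rewritten rules for `nonterminal` and for the new nonterminal `fresh`.
    """
    # U parts: what follows the leading A in each left-recursive rule.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # V parts: the alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: others + [v + [fresh] for v in others],
        fresh: [u + [fresh] for u in recursive] + recursive,
    }
```

Calling `remove_left_recursion("A", [["A", "a"], ["b"]])` gives A → b | bZ and Z → aZ | a, the same rules as the first example up to the order of alternatives.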

Top down parsing
• A simple grammar is a grammar whose rules all have the following form:
  • A → aα
  • where a is a terminal and α is a string of terminals and nonterminals.
• No Λ-productions.
• When a nonterminal has several productions, each of their right-hand sides starts with a different terminal.

Example
• Consider the CFG (the terminal that selects each production is shown in braces):
  S → aSB   {a}
  S → b     {b}
  B → a     {a}
  B → bBa   {b}
• R/R means: Replace the top of the stack by the reversed right side of the production, and Retain the current pointer of the read head.

Parser Code

char inp;   // current input symbol, shared by all functions

function S() {
    if (inp == 'a') { read(inp); S(); B(); }     // S → aSB
    else if (inp == 'b') read(inp);              // S → b
    else reject;
}

function B() {
    if (inp == 'a') read(inp);                   // B → a
    else if (inp == 'b') {                       // B → bBa
        read(inp); B();
        if (inp == 'a') read(inp); else reject;
    }
    else reject;
}

void main() {
    read(inp);
    S();
    if (inp == '$') accept; else reject;
}
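For readers who want to run the parser, the pseudocode can be transcribed into Python roughly as follows. The class and helper names (`SimpleParser`, `expect`, `accepts`) are assumptions of this sketch, and "reject" is modelled as an exception.

```python
class ParseError(Exception):
    """Raised where the pseudocode says `reject`."""


class SimpleParser:
    """Recursive predictive parser for S -> aSB | b and B -> a | bBa."""

    def __init__(self, text):
        self.text = text + "$"   # '$' marks the end of the input
        self.pos = 0

    @property
    def inp(self):
        return self.text[self.pos]

    def read(self):
        self.pos += 1

    def expect(self, ch):
        if self.inp != ch:
            raise ParseError(f"expected {ch!r} at position {self.pos}")
        self.read()

    def S(self):
        if self.inp == "a":        # S -> aSB
            self.read()
            self.S()
            self.B()
        elif self.inp == "b":      # S -> b
            self.read()
        else:
            raise ParseError(f"unexpected {self.inp!r} at position {self.pos}")

    def B(self):
        if self.inp == "a":        # B -> a
            self.read()
        elif self.inp == "b":      # B -> bBa
            self.read()
            self.B()
            self.expect("a")
        else:
            raise ParseError(f"unexpected {self.inp!r} at position {self.pos}")


def accepts(text):
    parser = SimpleParser(text)
    try:
        parser.S()
    except ParseError:
        return False
    # Accept only if all input up to the end marker was consumed.
    return parser.pos == len(parser.text) - 1
```

Because every alternative starts with a different terminal, one symbol of lookahead is enough to pick the production and no backtracking is needed.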

Compute FIRST for Any String X
• If X is a terminal symbol → FIRST(X) = {X}
• If X is a non-terminal symbol and X → ε is a production rule → ε is in FIRST(X).
• If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule
  → if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
  → if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).
• If X is ε → FIRST(X) = {ε}
• If X is the string Y1Y2..Yn
  → if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
  → if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).

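First Sets
• To compute first(α) for any string of grammar symbols α:
  • If α = Λ, then first(α) = {Λ}.
  • If α is a single terminal a, then first(α) = {a}.
  • If α is a single nonterminal A and A → X1 | … | Xn are all the productions for A, then first(α) = first(X1) ∪ … ∪ first(Xn).
  • If α = X1X2…Xn for grammar symbols X1, …, Xn, then first(α) contains the terminals of first(X1); if X1 derives Λ it also contains those of first(X2), and so on. Λ is in first(α) only if every Xi derives Λ.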

FIRST Example
  E → TE’
  E’ → +TE’ | ε
  T → FT’
  T’ → *FT’ | ε
  F → (E) | id

FIRST(F) = {(, id}        FIRST(TE’) = {(, id}
FIRST(T’) = {*, ε}        FIRST(+TE’) = {+}
FIRST(T) = {(, id}        FIRST(ε) = {ε}
FIRST(E’) = {+, ε}        FIRST(FT’) = {(, id}
FIRST(E) = {(, id}        FIRST(*FT’) = {*}
                          FIRST((E)) = {(}
                          FIRST(id) = {id}
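The FIRST rules amount to a fixed-point computation, which the following Python sketch makes explicit for the example grammar. The representation (a dict of alternatives, with an empty list standing for the ε alternative) and the function names are assumptions of the sketch.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for the ε alternative
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result            # this symbol is not nullable: stop here
    result.add(EPS)                  # every symbol was nullable (or none at all)
    return result


def compute_first(grammar):
    """Apply the rules repeatedly until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

`compute_first(GRAMMAR)` returns the FIRST sets of the five nonterminals, and `first_of_string` can then be applied to any right-hand side such as `["T", "E'"]`.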

Compute FOLLOW (for non-terminals)
• If S is the start symbol → $ is in FOLLOW(S).
• If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B).
• If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)) → everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.

FOLLOW Example
  E → TE’
  E’ → +TE’ | ε
  T → FT’
  T’ → *FT’ | ε
  F → (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
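The three FOLLOW rules can be applied in a loop until nothing changes. The sketch below does that for the example grammar; it recomputes FIRST internally so that it is self-contained, and all names in it are illustrative rather than taken from the slides.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for the ε alternative
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                          # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue                    # FOLLOW is only defined for nonterminals
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}        # rule 2: FIRST(β) without ε
                    if EPS in beta_first:           # rule 3: β is empty or nullable
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Running `compute_follow(GRAMMAR, "E")` reproduces the five FOLLOW sets listed in the example.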

Predicting a Production
• The tokens needed to predict production A → α are given by P(A → α), where
  P(A → α) = first(α)                        if α is not nullable
  P(A → α) = (first(α) − {ε}) ∪ follow(A)    if α is nullable
• Sometimes we may simply write P(α) with the A implied.
• Returning to the example:
  P(E → TE’) = first(T) = {id, (}
  P(E’ → +TE’) = {+}
  P(E’ → ε) = follow(E’) = {), $}
  P(T → FT’) = first(F) = {id, (}
  P(T’ → *FT’) = {*}
  P(T’ → ε) = follow(T’) = {+, ), $}
  P(F → id) = {id}
  P(F → (E)) = {(}

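LL(1) Parsing table
Production A → α is entered in row A under every token in P(A → α); an empty entry means error.
  P(E → TE’) = first(T) = {id, (}
  P(E’ → +TE’) = {+}
  P(E’ → ε) = follow(E’) = {), $}
  P(T → FT’) = first(F) = {id, (}
  P(T’ → *FT’) = {*}
  P(T’ → ε) = follow(T’) = {+, ), $}
  P(F → id) = {id}
  P(F → (E)) = {(}

        id         (          +            *            )         $
E       E → TE’    E → TE’
E’                            E’ → +TE’                 E’ → ε    E’ → ε
T       T → FT’    T → FT’
T’                            T’ → ε       T’ → *FT’    T’ → ε    T’ → ε
F       F → id     F → (E)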
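As a rough sketch of how the table is filled in, the Python below computes each prediction set from FIRST and FOLLOW and enters the production under every predicting token. The FIRST and FOLLOW values are copied from the earlier example slides; the data layout and the function names are assumptions made for the illustration.

```python
EPS = "ε"

PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),                 # E' -> ε
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),                 # T' -> ε
    ("F", ["id"]),
    ("F", ["(", "E", ")"]),
]

# FIRST and FOLLOW of the nonterminals, as given in the earlier examples.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def predict(lhs, rhs):
    """Tokens that select the production lhs -> rhs."""
    first = first_of_string(rhs)
    if EPS in first:                        # nullable RHS: FOLLOW(lhs) is needed too
        return (first - {EPS}) | FOLLOW[lhs]
    return first


def build_table():
    """Map (nonterminal, token) to the right-hand side to expand."""
    table = {}
    for lhs, rhs in PRODUCTIONS:
        for token in predict(lhs, rhs):
            if (lhs, token) in table:
                raise ValueError(f"conflict at ({lhs}, {token}): not LL(1)")
            table[(lhs, token)] = rhs
    return table
```

A missing key in the resulting dict plays the role of an empty (error) cell, and the `ValueError` branch is where a grammar that is not LL(1) would show up as two productions competing for the same cell.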

Writing Code
T’ is renamed as Tlist; E’ is renamed as Elist.

function E() {
    if (inp == '(' or inp == 'id') { T(); Elist(); }           // E → TE’
    else reject;
}

function Elist() {
    if (inp == '+') { read(inp); T(); Elist(); }               // E’ → +TE’
    else if (inp == ')' or inp == '$') { /* do nothing */ }    // E’ → ε
    else reject;
}

Prediction sets used:
  P(E → TE’) = first(T) = {id, (}
  P(E’ → +TE’) = {+}
  P(E’ → ε) = follow(E’) = {), $}
  P(T → FT’) = first(F) = {id, (}
  P(T’ → *FT’) = {*}
  P(T’ → ε) = follow(T’) = {+, ), $}
  P(F → id) = {id}
  P(F → (E)) = {(}
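The slide shows only E and Elist. A possible Python rendering of the complete recursive predictive parser is sketched below; the functions T, Tlist and F are filled in here by following the same pattern with the prediction sets above, so they are this sketch's completion rather than code from the slides. Input is assumed to be a list of already-classified tokens such as "id" and "+".

```python
class ParseError(Exception):
    """Raised where the pseudocode says `reject`."""


class ExprParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # '$' marks the end of the input
        self.pos = 0

    @property
    def inp(self):
        return self.tokens[self.pos]

    def read(self):
        self.pos += 1

    def reject(self):
        raise ParseError(f"unexpected {self.inp!r} at token {self.pos}")

    def E(self):                              # E -> T E'
        if self.inp in ("(", "id"):
            self.T()
            self.Elist()
        else:
            self.reject()

    def Elist(self):
        if self.inp == "+":                   # E' -> + T E'
            self.read()
            self.T()
            self.Elist()
        elif self.inp in (")", "$"):          # E' -> ε
            pass
        else:
            self.reject()

    def T(self):                              # T -> F T'
        if self.inp in ("(", "id"):
            self.F()
            self.Tlist()
        else:
            self.reject()

    def Tlist(self):
        if self.inp == "*":                   # T' -> * F T'
            self.read()
            self.F()
            self.Tlist()
        elif self.inp in ("+", ")", "$"):     # T' -> ε
            pass
        else:
            self.reject()

    def F(self):
        if self.inp == "id":                  # F -> id
            self.read()
        elif self.inp == "(":                 # F -> ( E )
            self.read()
            self.E()
            if self.inp == ")":
                self.read()
            else:
                self.reject()
        else:
            self.reject()


def accepts(tokens):
    parser = ExprParser(tokens)
    try:
        parser.E()
    except ParseError:
        return False
    # Accept only if everything up to the end marker was consumed.
    return parser.pos == len(parser.tokens) - 1
```

Each function tests the lookahead against the prediction sets of its nonterminal, which is why the ε alternatives check the follow-set tokens before doing nothing.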

Road map
• Determining if a grammar is LL(1)
  • First sets
  • “Follow” sets (we’ll come back to this later)
• “Massaging” a grammar into LL
  • Using the technique of “left factoring”
  • …and eliminating “left recursion”

Example
Want a derivation tree for any program like: x + y * z
[Figure: derivation tree for x + y * z, rooted at S]
 1. S → E $
 2. E → T E’
 3. E’ → + T E’
 4. E’ → - T E’
 5. E’ → ε
 6. T → F T’
 7. T’ → * F T’
 8. T’ → / F T’
 9. T’ → ε
10. F → id
11. F → num
12. F → ( E )

Example
First(S) = {id, num, (}
First(F) = {id, num, (}
First(E’) = {+, -, ε}
 1. S → E $
 2. E → T E’
 3. E’ → + T E’
 4. E’ → - T E’
 5. E’ → ε
 6. T → F T’
 7. T’ → * F T’
 8. T’ → / F T’
 9. T’ → ε
10. F → id
11. F → num
12. F → ( E )
Please excuse my “senior moment” last time: this grammar is LL(1)!
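One way to check the LL(1) claim is to compute the prediction set of every production and confirm that no two productions of the same nonterminal share a token. The Python sketch below does this for the twelve productions. It treats $ as an ordinary terminal because production 1 mentions it explicitly, and all function names are invented for the example.

```python
EPS = "ε"

# The twelve productions, in slide order; [] is the ε right-hand side.
PRODUCTIONS = [
    ("S", ["E", "$"]),
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", ["-", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", ["/", "F", "T'"]), ("T'", []),
    ("F", ["id"]), ("F", ["num"]), ("F", ["(", "E", ")"]),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def first_of(symbols, first):
    out = set()
    for sym in symbols:
        f = first[sym] if sym in NONTERMINALS else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def grow(target, new):
    """Add `new` to `target`; report whether `target` actually grew."""
    before = len(target)
    target |= new
    return len(target) != before


def analyse():
    """Compute FIRST and FOLLOW for every nonterminal by fixed-point iteration."""
    first = {nt: set() for nt in NONTERMINALS}
    follow = {nt: set() for nt in NONTERMINALS}
    # The list (not a generator) makes sure every production is visited per pass.
    while any([grow(first[lhs], first_of(rhs, first)) for lhs, rhs in PRODUCTIONS]):
        pass
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODUCTIONS:
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    rest = first_of(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new = new | follow[lhs]
                    changed |= grow(follow[sym], new)
    return first, follow


def is_ll1():
    first, follow = analyse()
    seen = set()
    for lhs, rhs in PRODUCTIONS:
        f = first_of(rhs, first)
        predict = (f - {EPS}) | follow[lhs] if EPS in f else f
        for token in predict:
            if (lhs, token) in seen:
                return False     # two productions compete for one table cell
            seen.add((lhs, token))
    return True
```

`is_ll1()` returns True for this grammar, since each table cell receives at most one production.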

Table-driven Predictive Parsing
 1. S → E $
 2. E → T E’
 3. E’ → + T E’
 4. E’ → - T E’
 5. E’ → ε
 6. T → F T’
 7. T’ → * F T’
 8. T’ → / F T’
 9. T’ → ε
10. F → id
11. F → num
12. F → ( E )

Parsing table (entries are production numbers; empty = error):

        +     *     id    $
S                   1
E                   2
E’      3                 5
T                   6
T’      9     7           9
F                   10

Predictive Parsing example
• Given the left-most input token and the top of the stack (initially “x” and “S”), determine the action using the parsing table.
• Actions are “predict”, “match”, and “accept”.
• Update the input and the stack accordingly.

Input tokens      Parse stack       Action
x + y * z $       S                 predict 1
x + y * z $       E $               predict 2
x + y * z $       T E’ $            predict 6
x + y * z $       F T’ E’ $         predict 10
x + y * z $       id T’ E’ $        match
+ y * z $         T’ E’ $           predict 9
+ y * z $         E’ $              predict 3
+ y * z $         + T E’ $          match
y * z $           T E’ $            predict 6
y * z $           F T’ E’ $         predict 10
y * z $           id T’ E’ $        match
* z $             T’ E’ $           predict 7
* z $             * F T’ E’ $       match
z $               F T’ E’ $         predict 10
z $               id T’ E’ $        match
$                 T’ E’ $           predict 9
$                 E’ $              predict 5
$                 $                 match
(empty)           (empty)           accept

• Note that we are elaborating a parse tree for the input: after the first three predictions, S has children E and $, and E has children T and E’.
• At the first match we have a terminal (“id”) on top of the stack. The match succeeds because the input token “x” is an id.
• We match by consuming one input token and popping one terminal from the stack.
• This is where a parse error might occur: the token and the terminal may differ.
• Predicting an ε production (9 or 5) just pops the nonterminal and pushes nothing.
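The whole trace can be reproduced with a small table-driven parser. The sketch below is written under a few assumptions: the input has already been tokenised (so x, y and z arrive as "id"), the names `PRODUCTIONS`, `build_table` and `parse` are invented for the example, and the table entries for the tokens -, /, num, ( and ) are filled in from the prediction sets of the twelve-production grammar instead of being read off a slide.

```python
PRODUCTIONS = {
    1: ("S", ["E", "$"]),
    2: ("E", ["T", "E'"]),
    3: ("E'", ["+", "T", "E'"]),
    4: ("E'", ["-", "T", "E'"]),
    5: ("E'", []),
    6: ("T", ["F", "T'"]),
    7: ("T'", ["*", "F", "T'"]),
    8: ("T'", ["/", "F", "T'"]),
    9: ("T'", []),
    10: ("F", ["id"]),
    11: ("F", ["num"]),
    12: ("F", ["(", "E", ")"]),
}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}


def build_table():
    """Map (nonterminal, lookahead token) to a production number."""
    table = {}

    def fill(nonterminal, tokens, number):
        for token in tokens:
            table[(nonterminal, token)] = number

    starters = ["id", "num", "("]
    fill("S", starters, 1)
    fill("E", starters, 2)
    fill("E'", ["+"], 3)
    fill("E'", ["-"], 4)
    fill("E'", [")", "$"], 5)
    fill("T", starters, 6)
    fill("T'", ["*"], 7)
    fill("T'", ["/"], 8)
    fill("T'", ["+", "-", ")", "$"], 9)
    fill("F", ["id"], 10)
    fill("F", ["num"], 11)
    fill("F", ["("], 12)
    return table


def parse(tokens):
    """Return the list of actions taken; raise SyntaxError on bad input."""
    table = build_table()
    remaining = list(tokens) + ["$"]
    stack = ["S"]                 # the top of the stack is the end of the list
    actions = []
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = remaining[pos] if pos < len(remaining) else None
        if top in NONTERMINALS:
            number = table.get((top, lookahead))
            if number is None:    # empty table cell
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            # Replace the nonterminal by the reversed right-hand side.
            stack.extend(reversed(PRODUCTIONS[number][1]))
            actions.append(f"predict {number}")
        elif top == lookahead:
            pos += 1              # consume the token and the terminal together
            actions.append("match")
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    if pos != len(remaining):
        raise SyntaxError("input remains after the stack emptied")
    actions.append("accept")
    return actions
```

Calling `parse(["id", "+", "id", "*", "id"])` yields the same sequence of predict and match actions as the trace, ending in accept. Pushing the right-hand side in reverse keeps its first symbol on top of the stack.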
