Top-Down Parsing Teoría de Autómatas y Lenguajes Formales M. Luisa González Díaz Universidad de Valladolid, 2006
Task
Parsing (of course); but do it:
• Top-down
• In an easy, algorithmic way
• Efficiently
• Knowing as little of the input as possible
• Marking errors as soon as possible
Example
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num
(ptpt is a single token, presumably the range symbol "..")
Input supplied by the lexical analyzer: array [ num ptpt num ] of char $
Trace (lookahead: parser action):
1. array: expand type → array [ simple ] of type; match array
2. [: match [
3. num: expand simple → num ptpt num; match num
4. ptpt: match ptpt
5. num: match num
6. ]: match ]
7. of: match of
8. char: expand type → simple, then simple → char; match char
9. $: no more input; the parse tree is complete
Code
procedure type;
begin
  if … then begin
    match(array); match('['); simple; match(']'); match(of); type
  end
  else if … then begin
    match('^'); simple
  end
  else if … then
    simple
  else
    error
end
Example
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num
Input supplied by the lexical analyzer: ^ integer $
Trace (lookahead: parser action):
1. ^: expand type → ^ simple; match ^
2. integer: expand simple → integer; match integer
3. $: no more input
match
procedure match(t: token);
begin
  if lookahead = t then
    lookahead := nexttoken   { next token, supplied by the lexical analyzer }
  else
    error
end
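Solving the ifs
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num
We can say "that rule" because we know the next input symbol (the lookahead). In procedure type:
• array: rule 3 (type → array [ simple ] of type)
• ^: rule 2 (type → ^ simple)
• integer, char, num: rule 1 (type → simple)
• any other: error, detected now (if rule 1 were chosen for any other symbol, the error would only be detected later, inside simple)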
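The procedures type and match, with the ifs solved as above, can be sketched in Python. This is a minimal illustration, not the slides' own code: the class name Parser, the helper accepts, the exception ParseError and the use of a pre-split token list in place of a real lexical analyzer are all assumptions made for the example.

```python
class ParseError(Exception):
    """Raised as soon as the lookahead cannot be accepted (hypothetical name)."""


class Parser:
    def __init__(self, tokens):
        # Stand-in for the lexical analyzer: a token list ended by "$".
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.lookahead = self.tokens[0]

    def match(self, t):
        # Same logic as the match procedure: advance only if lookahead = t.
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1
        self.lookahead = self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def type_(self):
        if self.lookahead == "array":            # rule 3
            self.match("array"); self.match("["); self.simple()
            self.match("]"); self.match("of"); self.type_()
        elif self.lookahead == "^":              # rule 2
            self.match("^"); self.simple()
        elif self.lookahead in ("integer", "char", "num"):   # rule 1
            self.simple()
        else:                                    # error detected now
            raise ParseError(f"unexpected {self.lookahead!r} at start of type")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num"); self.match("ptpt"); self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} at start of simple")


def accepts(tokens):
    """Return True if the token list is a complete type (hypothetical helper)."""
    parser = Parser(tokens)
    try:
        parser.type_()
        parser.match("$")   # nothing may remain after the type
        return True
    except ParseError:
        return False
```

For instance, accepts("array [ num ptpt num ] of char".split()) and accepts(["^", "integer"]) follow the two traces shown in the examples.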
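Generalizing
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num
Choose rule A → α when a (the lookahead) can appear as the first symbol derived from α:
FIRST(α) := { a ∈ Σ_T | α ⇒* aβ }   (still not complete!)
[Diagrams: parse trees under A in which a is the leftmost leaf, either directly (α = aβ) or through a leading nonterminal (α = Bδ, with B deriving aγ).]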
Predictive Parsing Table
Add each rule A → α with α ≠ ε to PPT[A, a] for every a in FIRST(α).
FIRST
FIRST(α) := { a ∈ Σ_T | α ⇒* aβ }   (still not complete!)
• FIRST(a) = { a }
• FIRST(A) = ∪ FIRST(α), taken over all rules A → α ∈ P
• FIRST(Aα) = FIRST(A)
FIRST: example
E → T E’
E’ → + T E’ | ab
T → F T’
T’ → * F T’ | a
F → ( E ) | n

First(T’) = First(*FT’) ∪ First(a) = { *, a }
First(E’) = First(+TE’) ∪ First(ab) = { +, a }
First(E) = First(TE’) = First(T)
First(T) = First(FT’) = First(F)
First(F) = { (, n }

      FIRST
E     ( n
E’    + a
T     ( n
T’    * a
F     ( n
Predictive parsing table: example
E → T E’
E’ → + T E’ | ab
T → F T’
T’ → * F T’ | a
F → ( E ) | n

      FIRST
E     ( n
E’    + a
T     ( n
T’    * a
F     ( n

Add A → α with α ≠ ε to T[A, a] for every a in First(α) − {ε}:

        +         *         (        n        a
E                           TE’      TE’
E’      +TE’                                  ab
T                           FT’      FT’
T’                *FT’                        a
F                           (E)      n

Each entry is the right-hand side of the rule to apply; empty entries are errors.
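Filling the table from FIRST can be sketched in Python for the ε-free grammar of this example. The names GRAMMAR, first and build_table are hypothetical, and the sketch relies on the simplified rule FIRST(Aα) = FIRST(A), so it is only valid while no rule derives ε.

```python
# Grammar of the example; each right-hand side is a list of symbols.
# Symbols that are not keys of GRAMMAR are terminals (assumed convention).
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["a", "b"]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["a"]],
    "F":  [["(", "E", ")"], ["n"]],
}


def first(symbol, grammar, visiting=frozenset()):
    """FIRST of one symbol, assuming no rule derives ε."""
    if symbol not in grammar:          # FIRST(a) = { a }
        return {symbol}
    if symbol in visiting:             # guard against left-recursive loops
        return set()
    result = set()
    for rhs in grammar[symbol]:        # union over all rules A → α
        result |= first(rhs[0], grammar, visiting | {symbol})
    return result


def build_table(grammar):
    """Map (nonterminal, lookahead) to the right-hand side to apply."""
    table = {}
    for nonterminal, alternatives in grammar.items():
        for rhs in alternatives:
            for a in first(rhs[0], grammar):
                if (nonterminal, a) in table:
                    raise ValueError(f"conflict at [{nonterminal}, {a}]")
                table[(nonterminal, a)] = rhs
    return table


TABLE = build_table(GRAMMAR)
```

Raising on a conflict is a design choice of this sketch: two rules landing in the same entry mean the lookahead alone cannot choose between them.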
A bad example: backtracking
program → program id ; | program id ( par-list ) ;
Both alternatives start with program id, so the lookahead program cannot choose between them.
1. Input program id ; : the parser tries the first alternative, program id ;, and it happens to fit.
2. Input program id ( … : the parser tries the first alternative, matches program and id, and then finds ( where ; was expected. Oops!
3. The parser has to backtrack: undo what it built and go back in the input.
4. It then retries with the second alternative, program id ( par-list ) ;.
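Not so bad: factorising
program → program id R ;
R → ( par-list ) | ε
Input supplied by the lexical analyzer: program id ; $
1. program: expand program → program id R ; and match program
2. id: match id
3. ;: expand R → ε, then match ;
4. $: no more input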
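Another bad example
E → E + n | n
Input supplied by the lexical analyzer: n + n + n $
The correct tree applies E → E + n twice and then E → n, but both alternatives can start with n, so the lookahead n cannot choose between them. Worse, expanding E → E + n calls E again without consuming any input, so the parser can keep expanding E forever (left recursion).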
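Not so bad either: eliminating left recursion
E → E + n | n
becomes
E → n E’
E’ → + n E’ | ε
Input supplied by the lexical analyzer: n + n + n $
Tree: E → n E’, then E’ → + n E’ twice (one for each +), then E’ → ε when the lookahead is $.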
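The transformation shown here is an instance of the usual textbook scheme for immediate left recursion (A → Aα | β becomes A → βA’, A’ → αA’ | ε). A minimal Python sketch follows; the function name and the representation of ε as an empty list are assumptions, and indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A → A α | β as A → β A', A' → α A' | ε.

    Right-hand sides are lists of symbols; ε is the empty list (assumed
    convention). Returns a dict with the rewritten rules.
    """
    # Tails α of the left-recursive alternatives A → A α.
    recursive_tails = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # Alternatives β that do not start with A.
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not recursive_tails:
        return {nonterminal: alternatives}   # nothing to do
    new_nonterminal = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nonterminal] for beta in others],
        new_nonterminal: [alpha + [new_nonterminal] for alpha in recursive_tails] + [[]],
    }


RESULT = eliminate_immediate_left_recursion("E", [["E", "+", "n"], ["n"]])
```

Applied to E → E + n | n, RESULT holds E → n E' and E' → + n E' | ε, the grammar on this slide.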
FIRST
FIRST(α) := { a ∈ Σ_T | α ⇒* aβ } ∪ { ε }   (the { ε } part only when α ⇒* ε)
FIRST: example with ε
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | n

First(E) = First(TE’)
First(E’) = { +, ε }
First(T) = First(FT’)
First(T’) = { *, ε }
First(F) = { (, n }

      FIRST
E     ( n
E’    + ε
T     ( n
T’    * ε
F     ( n
FIRST of a string: example
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | n

      FIRST
E     ( n
E’    + ε
T     ( n
T’    * ε
F     ( n

First(E’F) = { +, (, n }
First(E’T’F) = { +, *, (, n }
First(E’T’T’) = { +, *, ε }
FIRST
FIRST(α) := { a ∈ Σ_T | α ⇒* aβ } ∪ { ε }   (the { ε } part only when α ⇒* ε)
• FIRST(a) = { a }
• FIRST(A) = ∪ FIRST(α), taken over all rules A → α ∈ P
• FIRST(Aα) = FIRST(A) when A does not derive ε; (FIRST(A) − { ε }) ∪ FIRST(α) when A ⇒* ε
• ε ∈ FIRST(X1 X2 … Xp) iff ε ∈ FIRST(Xi) for every i
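The four rules can be sketched in Python as a fixed-point computation over the expression grammar used in the examples. The names EPS, GRAMMAR, first_of_string and compute_first are hypothetical, ε-rules are written as empty lists, and repeating the pass until no set grows is an assumption of this sketch rather than something the slides spell out.

```python
EPS = "ε"

# Expression grammar of the examples; an empty list stands for an ε-rule.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["n"]],
}


def first_of_string(symbols, first):
    """FIRST(X1 X2 … Xp), given the FIRST sets of the nonterminals."""
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # FIRST(a) = { a } for terminals
        result |= fx - {EPS}
        if EPS not in fx:                      # Xi cannot vanish: stop here
            return result
    result.add(EPS)                            # every Xi can derive ε
    return result


def compute_first(grammar):
    """FIRST(A) for every nonterminal, iterating until nothing changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:           # union over all rules A → α
                new = first_of_string(rhs, first)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

With these definitions, first_of_string(["E'", "T'", "T'"], FIRST) reproduces the string example First(E’T’T’) = { +, *, ε }.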
Remember
program → program id R ;
R → ( par-list ) | ε
• Input program id ; $: after matching program and id, R → ε is applied and ; is matched. No more input.
• Input program id id: if R → ε is applied whatever the lookahead is, the ERROR only shows up afterwards, when ; is expected and id is found.
• "Marking errors as soon as possible": the ERROR should be marked at R itself, because the lookahead id can neither start ( par-list ) nor follow R.
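ε-rules
Choose rule A → ε when a (the lookahead) can appear following A in a sentential form:
FOLLOW(A) := { a ∈ Σ_T | S ⇒* αAaβ } ∪ { $ }   (the { $ } part only when S ⇒* αA)
[Diagrams: parse trees rooted at S in which a appears immediately after A: in the same right-hand side (C → αAaβ), as the first symbol derived from the next nonterminal B, or after an ancestor C when whatever stands to the right of A derives ε.]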
Follow
• Right-hand sides αAβ with β ≠ ε: add First(β) − {ε} to Follow(A)
• For every rule B → αA: add Follow(B) to Follow(A)
• For every rule B → αAβ with β ⇒* ε: add Follow(B) to Follow(A)
• Add $ to Follow(Start symbol)
Follow: algorithm
0. Add $ to Follow(Start symbol)
1. Right-hand sides αAβ with β ≠ ε: add First(β) − {ε} to Follow(A)
2. For every rule B → αA, or B → αAβ with β ⇒* ε: add Follow(B) to Follow(A)
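Steps 0 to 2 can be sketched in Python for the expression grammar of the examples. All names (EPS, END, GRAMMAR, compute_first, compute_follow) are hypothetical, ε-rules are written as empty lists, and the sketch repeats steps 1 and 2 until no set grows, which is an assumption about how the steps are meant to be iterated.

```python
EPS, END = "ε", "$"

# Expression grammar of the examples; an empty list stands for an ε-rule.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["n"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of symbols; contains ε if the whole string can vanish."""
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)                         # step 0
    changed = True
    while changed:
        changed = False
        for b, alternatives in grammar.items():
            for rhs in alternatives:
                for i, a in enumerate(rhs):
                    if a not in grammar:           # only nonterminals have FOLLOW
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}       # step 1: First(β) − {ε}
                    if EPS in beta_first:          # step 2: β is empty or β ⇒* ε
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
```

Treating an empty β as a string whose FIRST is { ε } lets one test cover both cases of step 2 (B → αA and B → αAβ with β ⇒* ε).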
FOLLOW: example
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | n

      FIRST
E     ( n
E’    + ε
T     ( n
T’    * ε
F     ( n

0) Add $ to Follow(Start symbol): Follow(E) gets $
1) Right-hand sides αAβ with β ≠ ε: add First(β) − {ε} to Follow(A)
• T E’ and + T E’: T is followed by E’, so Follow(T) gets First(E’) − {ε} = { + }
• F T’ and * F T’: F is followed by T’, so Follow(F) gets First(T’) − {ε} = { * }
• ( E ): E is followed by ), so Follow(E) gets )
After steps 0 and 1: Follow(E) = { $, ) }, Follow(T) = { + }, Follow(F) = { * }; Follow(E’) and Follow(T’) are still empty.
2) For every rule B → αA, or B → αAβ with β ⇒* ε: add Follow(B) to Follow(A)
• E → T E’: as B → αA, add Follow(E) to Follow(E’); as B → αAβ with β = E’ ⇒* ε, add Follow(E) to Follow(T)
• E’ → + T E’: as B → αA, add Follow(E’) to Follow(E’) (nothing new); as B → αAβ, add Follow(E’) to Follow(T)
• T → F T’: as B → αA, add Follow(T) to Follow(T’); as B → αAβ with β = T’ ⇒* ε, add Follow(T) to Follow(F)
• T’ → * F T’: as B → αA, add Follow(T’) to Follow(T’) (nothing new); as B → αAβ, add Follow(T’) to Follow(F)