COMPILER DESIGN
MODULE II
• Role of the Parser
• Context-Free Grammars
• Top-Down Parsing
• Bottom-Up Parsing
• Operator Precedence Parsing
• LR Parsers
• SLR
• Canonical LR
• LALR
• Parser Generator
THE ROLE OF THE PARSER
[Figure: block diagram. Labels: source program, Lexical Analyzer, parse tree, Symbol table]
The Role of the Parser
• The syntax analyzer is also known as the parser.
• The syntax analyzer gets a stream of tokens from the lexical analyzer and creates the syntactic structure of the given source program.
• This syntactic structure is usually a parse tree.
• The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation to describe CFGs.
• The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar.
  – If it does, the parser creates the parse tree of that program.
  – Otherwise, the parser reports error messages.
The Role of the Parser
• We categorize parsers into two groups:
  – Top-down parser: the parse tree is created from top to bottom, starting from the root.
  – Bottom-up parser: the parse tree is created from bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right, one symbol at a time.
• Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:
  – LL for top-down parsing
  – LR for bottom-up parsing
Syntax error handler
Examples of errors in different phases:
• Lexical: misspelling of an identifier, keyword, or operator
• Syntactic: arithmetic expression with unbalanced parentheses
• Semantic: operator applied to an incompatible operand
• Logical: infinite recursive call
Goals of the error handler in a parser:
• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly.
• It should not significantly slow down the processing of correct programs.
Error recovery strategies
• Four types of error recovery strategies:
  – Panic mode
  – Phrase level
  – Error productions
  – Global correction
• Panic mode recovery:
  – On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
  – The synchronizing tokens are usually delimiters such as semicolon or end.
  – It may skip a considerable amount of input without checking it for additional errors, but it has the advantage of simplicity.
  – It is guaranteed not to go into an infinite loop.
Error recovery strategies
• Phrase-level recovery
  – On discovering an error, the parser performs local correction on the remaining input.
  – It may replace a prefix of the remaining input by some string that allows the parser to continue.
  – Typical local corrections are replacing a comma by a semicolon, deleting an extra semicolon, or inserting a missing semicolon.
• Error productions
  – Augment the grammar with productions that generate the erroneous constructs.
  – The grammar augmented with these error productions is used to construct the parser.
  – If an error production is used by the parser, it generates error diagnostics that indicate the erroneous construct recognized in the input.
Error recovery strategies
• Global correction
  – Algorithms are used to choose a minimal sequence of changes that yields a globally least-cost correction.
  – Given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.
  – This technique is the most costly in terms of time and space.
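The panic-mode strategy described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy statement form `id = num ;`, the synchronizing set `{";", "end"}`, and the names `skip_to_sync` and `parse_statements` are hypothetical and do not come from the slides.

```python
SYNC_TOKENS = {";", "end"}  # assumed synchronizing set (delimiters)

def skip_to_sync(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens one at a time until a synchronizing token or end of input."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return pos

def parse_statements(tokens):
    """Toy parser: every statement must be  id = num ;  (hypothetical grammar).

    Returns a list of (statement_start, error_position) pairs.
    """
    expected = ["id", "=", "num", ";"]
    errors, pos = [], 0
    while pos < len(tokens):
        start = pos
        for want in expected:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
            else:
                errors.append((start, pos))
                pos = skip_to_sync(tokens, pos)
                if pos < len(tokens):
                    pos += 1  # consume the synchronizing token: guarantees progress
                break
    return errors

tokens = ["id", "=", "num", ";", "id", "=", "=", "num", ";", "id", "=", "num", ";"]
print(parse_statements(tokens))
```

Because the synchronizing token itself is always consumed after an error, each recovery step moves forward in the input, which is why this scheme cannot loop forever. The third statement is still parsed normally after the error in the second one.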
Context-Free Grammars
• Inherently recursive structures of a programming language are defined by a context-free grammar.
• In a context-free grammar, we have:
  – A finite set of terminals (the set of tokens)
  – A finite set of non-terminals (syntactic variables)
  – A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string)
  – A start symbol (one of the non-terminal symbols)
• Context-free grammar, G = (V, T, S, P), where V is the set of non-terminals, T the set of terminals, S the start symbol, and P the set of productions.
Context-Free Grammars
Example:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
• Terminals: id + - * / ↑ ( )
• Non-terminals: expr, op
• Start symbol: expr
Notational Conventions
• Terminals
  – Lowercase letters, operator symbols, punctuation symbols, the digits, if, id, etc.
• Non-terminals
  – Uppercase letters, the start symbol
• Grammar symbols
  – Either non-terminals or terminals
Example:
E → E A E | ( E ) | - E | id
A → + | - | * | /
• E and A are non-terminals
• E is the start symbol
• The others are terminals
Derivations
• E ⇒ E+E
  – E+E derives from E
  – we can replace E by E+E
• E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.
• In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
• α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)
• ⇒ : derives in one step
• ⇒* : derives in zero or more steps
• ⇒+ : derives in one or more steps
CFG - Terminology
• L(G) is the language of G (the language generated by G), which is a set of sentences.
• A sentence of L(G) is a string of terminal symbols of G.
• If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G.
• If G is a context-free grammar, L(G) is a context-free language.
• Two grammars are equivalent if they produce the same language.
• S ⇒* α
  – If α contains non-terminals, it is called a sentential form of G.
  – If α does not contain non-terminals, it is called a sentence of G.
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement.
• If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation.
• If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation (canonical derivation).
Left-Most and Right-Most Derivations
Left-most derivation:
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-most derivation:
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• We will see that top-down parsers try to find the left-most derivation of the given source program.
• We will see that bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.
[Figure: the parse tree built step by step for the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]
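Ambiguity
• A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar.
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees for id+id*id. In the first, the root is E + E and the right operand expands to E * E; in the second, the root is E * E and the left operand expands to E + E.]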
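To make the ambiguity concrete, the following sketch counts the parse trees of a token string. It assumes the grammar E → E+E | E*E | id, which is the grammar implied by the two derivations above; the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for the assumed grammar E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence shows that the grammar is ambiguous. For id+id*id the two trees correspond to choosing + or * as the root operator.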
Ambiguity
• For most parsers, the grammar must be unambiguous.
• Unambiguous grammar ⇒ unique selection of the parse tree for a sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
• An unambiguous grammar should be written to eliminate the ambiguity.
• We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) and disambiguate the grammar so that it is restricted to this choice.
Ambiguity – "dangling else"
if E1 then if E2 then S1 else S2
[Figure: two parse trees for this statement. In tree 1, the else (with S2) belongs to the outer "if E1"; in tree 2, the else belongs to the inner "if E2".]
Ambiguity
• We prefer the second parse tree (else matches the closest if).
• "Match each else with the closest previous unmatched then."
• So, we have to disambiguate our grammar to reflect this choice.
• The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
Left Recursion
• A grammar is left recursive if it has a non-terminal A such that A ⇒+ Aα for some string α.
• A grammar is right recursive if it has a non-terminal A such that A ⇒+ αA for some string α.
• Top-down parsing techniques cannot handle left-recursive grammars, so we must eliminate left recursion.
• The left recursion may appear in a single step of the derivation (immediate left recursion), or may appear in more than one step of the derivation.
Immediate Left-Recursion
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn    where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1A’ | ... | βnA’
A’ → α1A’ | ... | αmA’ | ε    (an equivalent grammar)
Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
⇓ eliminate immediate left recursion
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → id | (E)
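The transformation above can be written as a short function. This is a minimal sketch, not a full left-recursion remover: it handles only immediate left recursion for one non-terminal, represents each alternative as a list of symbols (the empty list stands for ε), and names the new non-terminal by appending a straight apostrophe. The function name is hypothetical.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without immediate left recursion.

    alternatives: list of right-hand sides, each a list of symbols; [] is epsilon.
    Returns a dict mapping each resulting non-terminal to its alternatives.
    """
    new_nt = nonterminal + "'"
    # alphas: the tails of the left-recursive alternatives (A alpha)
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # betas: the alternatives that do not start with A
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to do
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E and T from the example, this yields E → TE’, E’ → +TE’ | ε and T → FT’, T’ → *FT’ | ε; F has no left-recursive alternative and is returned unchanged.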
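Left-Factoring
• A predictive parser (a top-down parser without backtracking) requires the grammar to be left-factored.
• In general, for A → αβ1 | αβ2, where α is non-empty, we cannot tell from the first symbols of the input whether to expand A to αβ1 or to αβ2.
• But if we rewrite the grammar as follows:
A → αA’
A’ → β1 | β2
we can immediately expand A to αA’.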
Left-Factoring -- Algorithm
• For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn
Left-Factoring – Example 1
S → iEtS | iEtSeS | a
E → b
⇓
S → iEtSS’ | a
S’ → eS | ε
E → b
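The left-factoring step can be sketched as follows. The representation (a dict from non-terminal to a list of alternatives, each a list of symbols, with the empty list for ε) and the names `common_prefix` and `left_factor` are assumptions made for this illustration. Alternatives are grouped by their first symbol, and the longest prefix shared by a whole group is factored out.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(grammar):
    """Return a left-factored copy of the grammar."""
    grammar = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    work = list(grammar)
    while work:
        nt = work.pop()
        groups = {}
        for alt in grammar[nt]:
            groups.setdefault(alt[0] if alt else None, []).append(alt)
        for first_symbol, group in groups.items():
            if first_symbol is None or len(group) < 2:
                continue
            alpha = common_prefix(group)
            new_nt = nt + "'"
            while new_nt in grammar:          # keep new names unique
                new_nt += "'"
            rest = [alt for alt in grammar[nt] if alt not in group]
            grammar[nt] = [alpha + [new_nt]] + rest
            grammar[new_nt] = [alt[len(alpha):] for alt in group]
            work += [nt, new_nt]              # both may need further factoring
            break
    return grammar

example = {
    "S": [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]],
    "E": [["b"]],
}
print(left_factor(example))
```

For Example 1 this produces S → iEtSS’ | a and S’ → ε | eS, the same grammar as on the slide up to the order of the alternatives.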
Top-Down Parsing
• The parse tree is created from top to bottom.
• Top-down parser
  – Recursive-descent parsing
      – Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
      – It is a general parsing technique, but not widely used.
      – Not efficient
  – Predictive parsing
      – No backtracking
      – Efficient
      – Needs a special form of grammars (LL(1) grammars).
      – Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.
      – The non-recursive predictive parser is also known as the LL(1) parser.
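Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
S → aBc
B → bc | b
input: abc
[Figure: two parse trees for the input abc. The first attempt expands B → bc, which fails, so the parser backtracks; the second attempt expands B → b and succeeds.]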
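The backtracking behaviour on this grammar can be reproduced with a small generator-based recognizer. This is a sketch under the assumption that each token is a single character; the names `GRAMMAR`, `derive` and `accepts` are hypothetical. It would not terminate on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],   # alternatives are tried in this order
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each alternative; if the rest fails, the loop
        # simply moves on to the next alternative (this is the backtracking).
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif text.startswith(head, pos):
        # Terminal: it must match the input at the current position.
        yield from derive(rest, text, pos + len(head))

def accepts(text):
    return any(end == len(text) for end in derive(["S"], text, 0))

print(accepts("abc"))
```

On the input abc, the first choice B → bc consumes "bc", after which the final c of S → aBc cannot be matched; the recognizer then falls back to B → b, and the parse succeeds.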
Predictive Parser
• When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.
A → α1 | ... | αn        input: ... a .......  (a is the current token)
Predictive Parser (example)
stmt → if ...... | while ...... | begin ...... | for ......
• When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token.
• When we are trying to rewrite the non-terminal stmt, if the current token is if, we have to choose the first production rule.
• We eliminate the left recursion in the grammar, and left-factor it.
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure.
Ex: A → aBb    (this is the only production rule for A)
proc A {
    – match the current token with a, and move to the next token;
    – call B;
    – match the current token with b, and move to the next token;
}
Recursive Predictive Parsing
A → aBb | bAB
proc A {
    case of the current token {
        'a': – match the current token with a, and move to the next token;
             – call B;
             – match the current token with b, and move to the next token;
        'b': – match the current token with b, and move to the next token;
             – call A;
             – call B;
    }
}
Recursive Predictive Parsing
• When to apply ε-productions.
A → aA | bB | ε
• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the FOLLOW set of A (the terminals that can follow A in the sentential forms).
Recursive Predictive Parsing (Example)
A → aBe | cBd | C
B → bB | ε
C → f
proc A {
    case of the current token {
        a: – match the current token with a, and move to the next token;
           – call B;
           – match the current token with e, and move to the next token;
        c: – match the current token with c, and move to the next token;
           – call B;
           – match the current token with d, and move to the next token;
        f: – call C        (f is the first set of C)
    }
}
proc B {
    case of the current token {
        b: – match the current token with b, and move to the next token;
           – call B
        e, d: do nothing   (e and d are the follow set of B)
    }
}
proc C {
    match the current token with f, and move to the next token;
}
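The three procedures above translate almost line by line into a small class. This is a sketch assuming single-character tokens and an end marker $; the class name `Parser`, the exception `ParseError`, and the helper `parse` are hypothetical names chosen for the illustration.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # one character per token, $ marks the end
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):                           # A -> aBe | cBd | C
        if self.current == "a":
            self.match("a"); self.B(); self.match("e")
        elif self.current == "c":
            self.match("c"); self.B(); self.match("d")
        elif self.current == "f":          # f is in FIRST(C)
            self.C()
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def B(self):                           # B -> bB | epsilon
        if self.current == "b":
            self.match("b"); self.B()
        elif self.current in ("e", "d"):   # FOLLOW(B): apply B -> epsilon
            pass
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def C(self):                           # C -> f
        self.match("f")

def parse(text):
    parser = Parser(text)
    parser.A()
    parser.match("$")                      # the whole input must be consumed
    return True

print(parse("abbe"), parse("cd"), parse("f"))
```

No backtracking occurs: in every procedure the current token alone selects the alternative, and B takes its ε-production only when the token is in its follow set.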
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-recursive predictive parsing is table-driven.
• It is a top-down parser.
• It is also known as the LL(1) parser.
• In LL(1):
  – the first "L" stands for scanning the input from left to right,
  – the second "L" stands for producing a leftmost derivation,
  – the "1" stands for one input symbol of lookahead at each step.
• It uses a stack explicitly.
• In a non-recursive predictive parser, the production to apply is looked up in the parsing table.
Non-Recursive Predictive Parsing
[Figure: model of a table-driven predictive parser. Labels: INPUT, STACK, Predictive Parsing Program, Parsing Table M, OUTPUT]
LL(1) Parser
• Input buffer
  – The input string to be parsed. The end of the string is marked with a special symbol $.
• Output
  – A production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
• Stack
  – Contains the grammar symbols.
  – At the bottom of the stack, there is a special end marker symbol $.
  – Initially the stack contains only the symbol $ and the start symbol S, i.e. $S is the initial stack.
  – When the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
• Parsing table
  – A two-dimensional array M[A,a]
  – Each row (A) is a non-terminal symbol.
  – Each column (a) is a terminal symbol or the special symbol $.
  – Each entry holds a production rule.
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
• There are four possible parser actions:
  1. If X = a = $: the parser halts and announces successful completion of the parsing.
  2. If X = a ≠ $: the parser pops X from the stack and advances the input pointer to the next input symbol.
  3. If X is a non-terminal: the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
  4. None of the above: error
     – All empty entries in the parsing table are errors.
     – If X is a terminal symbol different from a, this is also an error case.
Non-Recursive Predictive Parsing program
Input: A string w and a parsing table M for grammar G
Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication
Method: Initially, the parser is in a configuration in which it has $S on the stack, with S (the start symbol of G) on top, and w$ in the input buffer. The program that uses the parsing table M to produce a parse for the input is the following:
Algorithm:
set ip to point to the first symbol of w$;
repeat
    let X be the top of the stack and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else    /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 … Yk then begin
            pop X from the stack;
            push Yk, …, Y2, Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 … Yk
        end
        else error()
until X = $
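LL(1) Parser – Example 1
S → aBa
B → bB | ε
Input: abba
LL(1) parsing table:
        a           b           $
S       S → aBa
B       B → ε       B → bB
stack       input       output
$S          abba$       S → aBa
$aBa        abba$
$aB         bba$        B → bB
$aBb        bba$
$aB         ba$         B → bB
$aBb        ba$
$aB         a$          B → ε
$a          a$
$           $           accept, successful completion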
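LL(1) Parser – Example 1
Outputs: S → aBa, B → bB, B → bB, B → ε
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
[Figure: parse tree. S has children a, B, a; that B has children b, B; the inner B has children b, B; the innermost B derives ε.]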
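The table-driven algorithm can be sketched as follows and run on Example 1. The assumptions are labeled in the code: tokens are single characters, the parsing table is a dict keyed by (non-terminal, terminal), and its three entries were worked out by hand from the grammar S → aBa, B → bB | ε rather than generated. The names `ll1_parse` and `TABLE` are hypothetical.

```python
# Hand-built LL(1) table for S -> aBa, B -> bB | epsilon (an assumption of this
# sketch); missing keys are the empty (error) entries, [] is an epsilon body.
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [],
}
NONTERMINALS = {"S", "B"}

def ll1_parse(table, start, nonterminals, text):
    """Return the productions of a leftmost derivation, or raise ValueError."""
    tokens = list(text) + ["$"]
    stack = ["$", start]              # $ at the bottom, start symbol on top
    ip, output = 0, []
    while True:
        top, a = stack[-1], tokens[ip]
        if top == "$" and a == "$":   # action 1: successful completion
            return output
        if top not in nonterminals:   # top is a terminal or $
            if top != a:              # action 4: error
                raise ValueError(f"expected {top!r}, found {a!r}")
            stack.pop()               # action 2: match and advance
            ip += 1
        else:                         # action 3: expand using the table
            body = table.get((top, a))
            if body is None:
                raise ValueError(f"no table entry for ({top}, {a})")
            stack.pop()
            stack.extend(reversed(body))   # push Yk ... Y1, leaving Y1 on top
            output.append((top, body))

for head, body in ll1_parse(TABLE, "S", NONTERMINALS, "abba"):
    print(head, "->", " ".join(body) or "ε")
```

For the input abba, the productions come out in the order S → aBa, B → bB, B → bB, B → ε, which is the leftmost derivation listed in Example 1.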
LL(1) Parser – Example 2
Original grammar:
E → E+T | T
T → T*F | F
F → id | (E)
After eliminating left recursion:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Input: id+id
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.
• FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols.
  – If α derives ε, then ε is also in FIRST(α).
• FOLLOW(A) is the set of terminals that occur immediately after (follow) the non-terminal A in the strings derived from the start symbol.
  – A terminal a is in FOLLOW(A) if S ⇒* αAaβ
  – $ is in FOLLOW(A) if S ⇒* αA
Compute FIRST for Any String X
• If X is a terminal symbol: FIRST(X) = {X}
• If X is a non-terminal symbol and X → ε is a production rule: ε is in FIRST(X).
• If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule:
  – if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
  – if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).
FIRST Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FIRST(F) = {(, id}          FIRST(TE’) = {(, id}
FIRST(T’) = {*, ε}          FIRST(+TE’) = {+}
FIRST(T) = {(, id}          FIRST(ε) = {ε}
FIRST(E’) = {+, ε}          FIRST(FT’) = {(, id}
FIRST(E) = {(, id}          FIRST(*FT’) = {*}
                            FIRST((E)) = {(}
                            FIRST(id) = {id}
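The FIRST sets above can be checked with a short fixed-point computation. This is a sketch with assumed conventions: the grammar is a dict from non-terminal to alternatives, the empty list stands for an ε-production, any symbol that is not a key of the dict is treated as a terminal, and primes are written with a straight apostrophe. The names `first_of_string` and `compute_first` are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # this symbol cannot vanish, so stop here
    result.add(EPS)                # every symbol can vanish (or string is empty)
    return result

def compute_first(grammar):
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

Once the sets are stable, `first_of_string` also gives FIRST for right-hand sides such as TE’ or +TE’, which is what the right-hand column of the example lists.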
Compute FOLLOW (for non-terminals)
• If S is the start symbol: $ is in FOLLOW(S).
• If A → αBβ is a production rule: everything in FIRST(β) except ε is in FOLLOW(B).
• If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)): everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any FOLLOW set.
FOLLOW Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
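The FOLLOW sets above can be reproduced by applying the three rules until nothing changes. The sketch below is self-contained, so it also computes FIRST. Its conventions are assumptions of the illustration: the grammar is a dict of alternatives, the empty list stands for ε, symbols that are not keys are terminals, and primes are written with a straight apostrophe. All function names are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:       # FOLLOW is defined for non-terminals only
                        continue
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}     # rule 2: FIRST(beta) without epsilon
                    if EPS in beta_first:        # rule 3: beta is empty or can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```

Because `first_of_string` returns {ε} for an empty string, the case A → αB (nothing after B) is handled by the same branch as the case where β can derive ε.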