310 likes | 407 Views
BNF. BNF. BNF stands for either Backus-Naur Form or Backus Normal Form BNF is a metalanguage used to describe the grammar of a programming language BNF is formal and precise BNF is a notation for context-free grammars BNF is essential in compiler construction. BNF.
E N D
BNF • BNF stands for either Backus-Naur Form or Backus Normal Form • BNF is a metalanguage used to describe the grammar of a programming language • BNF is formal and precise • BNF is a notation for context-free grammars • BNF is essential in compiler construction
BNF • < > indicate a nonterminal that needs to be further expanded, e.g. <variable> • Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, ( • The symbol ::= means is defined as • The symbol | means or; it separates alternatives, e.g. <addop> ::= + | -
BNF uses recursion • <integer> ::= <digit> | <integer> <digit>or<integer> ::= <digit> | <digit> <integer> • Recursion is all that is needed (at least, in a formal sense) • "Extended BNF" allows repetition as well as recursion • Repetition is usually better when using BNF to construct a compiler
BNF Examples I • <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • <if statement> ::= if ( <condition> ) <statement> | if ( <condition> ) <statement> else <statement>
BNF Examples II • <unsigned integer> ::= <digit> | <unsigned integer> <digit> • <integer> ::= <unsigned integer> | + <unsigned integer> | - <unsigned integer>
BNF Examples III • <identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit> • <block> ::= { <statement list> } • <statement list> ::= <statement> | <statement list> <statement>
BNF Examples IV • <statement> ::= <block> | <assignment statement> | <break statement> | <continue statement> | <do statement> | <for loop> | <goto statement> | <if statement> | . . .
Extended BNF • The following are pretty standard: • [ ] enclose an optional part of the rule • Example:<if statement> ::= if ( <condition> ) <statement> [ else <statement> ] • { } mean the enclosed can be repeated any number of times (including zero) • Example:<parameter list> ::= ( ) | ( { <parameter> , } <parameter> )
Parser • Parser works on a stream of tokens. • The smallest item is a token. token source program parse tree get next token
E E E - E - E - E ( E ) ( E ) E + E E E - E - E ( E ) ( E ) E E + E + E id id id Parse Tree • Inner nodes of a parse tree are non-terminal symbols. • The leaves of a parse tree are terminal symbols. • A parse tree can be seen as a graphical representation of a derivation. • Example: • E E + E | E – E | E * E | E / E | - E • E ( E ) • E id E -E -(E) -(E+E) -(id+E) -(id+id)
Two groups of parser • We categorize the parsers into two groups: • Top-Down Parser • the parse tree is created top to bottom, starting from the root. • Bottom-Up Parser • the parse is created bottom to top; starting from the leaves • Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). • Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars. • LL for top-down parsing • LR for bottom-up parsing
Left-Most and Right-Most Derivations Left-Most Derivation E -E -(E) -(E+E) -(id+E) -(id+id) Right-Most Derivation E -E -(E) -(E+E) -(E+id) -(id+id) lm lm lm lm lm rm rm rm rm rm
Ambiguity rules • For the most parsers, the grammar must be unambiguous. • We should eliminate the ambiguity in the grammar during the design phase of the compiler. • An unambiguous grammar should be written to eliminate the ambiguity. • We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar to restrict to this choice.
E E + E id * E E id id E * E E E + E id id id Ambiguity • A grammar produces more than one parse tree for a sentence is • called as an ambiguous grammar. • Example: • E E + E | E * E • E id • E E+E • id+E • id+E*E • id+id*E • id+id*id • E E*E • E+E*E • id+E*E • id+id*E • id+id*id
Unambiguous grammar unique selection of the parse tree for the grammar(Sentence).
Which is Ambiguity? BNF : stmt if expr then stmt | if expr then stmt else stmt | otherstmts if E1then if E2then S1else S2 1? 2?
Ambiguity • We prefer the second parse tree (else matches with closest if). • So, we have to disambiguate our grammar to reflect this choice. • The disambiguous grammar will be: stmt matchedstmt | unmatchedstmt matchedstmt if expr then matchedstmt else matchedstmt | otherstmts unmatchedstmt if expr then stmt | if expr then matchedstmt else unmatchedstmt
Operator Precedence- Ambiguity • Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associatively rules. E E+E | E*E | E^E | id | (E) disambiguate the grammar precedence: ^ (right to left) * (left to right) + (left to right)
Left Recursion • A grammar is left recursive if it has a non-terminal A such that there is a derivation. A A for some string • Top-down parsing techniques cannot handle left-recursive grammars. • So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive. • The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation. +
Immediate Left-Recursion A A | where does not start with A eliminate immediate left recursion A A’ A’ A’ | an equivalent grammar
In general A A 1 | ... | A m | 1 | ... | n where 1 ... n do not start with A eliminate immediate left recursion A 1 A’ | ... | n A’ A’ 1 A’ | ... | m A’ | an equivalent grammar
Immediate Left-Recursion E E+T | T T T*F | F F id | (E)
After eliminate immediate left recursion eliminate immediate left recursion E T E’ E’ +T E’ | T F T’ T’ *F T’ | F id | (E)
Creating a top-down parser Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting form the root and creating the nodes of the parse tree in preorder. An example follows.
Top-down parsing A top-down parsing program consists of a set of procedures, one for each non-terminal. Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string.
Creating a top-down parser (Cont.) Given the grammar : E → TE’ E’ → +TE’ | λ T → FT’ T’ → *FT’ | λ F → (E) | id The input: id + id * id