Week 2 – Lecture 1 Compiler Construction • Introduction to Parsing • Recursive Grammars • Derivations and parse trees • Ambiguous Grammars • Overview of Top Down Parsing
Syntax Analysis • aka Parsing • Grouping tokens together into larger structures • Analogous to lexical analysis, which groups characters into tokens • Input: • Tokens, the output of the lexical analyzer • Output: • A structured representation of the original program
Parsing Fundamentals • Source program: • 3 + 4 • After Lexical Analysis: ???
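To illustrate what the lexical analyzer might hand to the parser for the source program 3 + 4, here is a small regex-based scanner in Python. This is a sketch only: the token names (`number`, `plus`, `times`, ...) and the `tokenize` function are illustrative assumptions, not part of the course material.

```python
import re

# Hypothetical token names paired with the regular expression for each one.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("plus", r"\+"),
    ("minus", r"-"),
    ("times", r"\*"),
    ("divide", r"/"),
    ("skip", r"\s+"),  # whitespace is matched but not emitted
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token kind, lexeme) pairs for the source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = PATTERN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("3 + 4"))
```

The parser then works on the token kinds (number plus number) rather than on the characters.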
Parsing • Expression -> number plus number • Similar to regular definitions: • Concatenation • Choice • Expression -> number Operator number • Operator -> + | - | * | / • Repetition is done differently: by recursion rather than by the * operator
BNF Grammar • Expression -> number Operator number • Operator -> + | - | * | / • The structure on the left is defined to consist of the choices on the right-hand side • Meta-symbols: -> and | • Different conventions for writing BNF grammars: • <expression> ::= number <operator> number • Expression -> number Operator number
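Because the two conventions differ only in notation, one grammar can be stored once and printed either way. The sketch below is a hypothetical illustration: the dictionary layout and the `render` helper are assumptions, with non-terminals as keys and each alternative as a list of symbols.

```python
# Each non-terminal maps to a list of alternatives; each alternative is a list of symbols.
GRAMMAR = {
    "expression": [["number", "operator", "number"]],
    "operator": [["+"], ["-"], ["*"], ["/"]],
}

def render(grammar, arrow, name):
    """Render every rule using the given arrow meta-symbol; `name` formats non-terminals."""
    lines = []
    for lhs, alternatives in grammar.items():
        # Non-terminals are formatted; terminals (symbols with no rule) are left as-is.
        choices = [" ".join(name(s) if s in grammar else s for s in alt) for alt in alternatives]
        lines.append(f"{name(lhs)} {arrow} " + " | ".join(choices))
    return lines

bnf = render(GRAMMAR, "::=", lambda s: f"<{s}>")
arrows = render(GRAMMAR, "->", str.capitalize)
print("\n".join(bnf + arrows))
```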
Derivations • Derivation: • A sequence of replacements of structure names by choices on the RHS of grammar rules • Begins: with a structure name • Ends: with a string of token symbols • In each step, one replacement is made • Exp -> Exp Op Exp | number • Op -> + | - | * | /
Example Derivation • Example: number * number + number • Note the different arrows: • => applies a grammar rule in a derivation step • -> is used to define grammar rules • Non-terminals: Exp, Op • Terminals: number, *, + • Terminals are so called because they terminate the derivation
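A derivation can be replayed mechanically: at each step, pick a rule and replace one non-terminal. The sketch below always replaces the leftmost non-terminal; the `leftmost_derivation` helper and the (non-terminal, alternative index) encoding of each step are illustrative assumptions.

```python
# Grammar from the slide: each alternative is a list of symbols.
GRAMMAR = {
    "Exp": [["Exp", "Op", "Exp"], ["number"]],
    "Op": [["+"], ["-"], ["*"], ["/"]],
}

def leftmost_derivation(start, choices):
    """Apply each (non-terminal, alternative index) to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for nonterminal, alt in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        if form[i] != nonterminal:
            raise ValueError(f"leftmost non-terminal is {form[i]}, not {nonterminal}")
        form = form[:i] + GRAMMAR[nonterminal][alt] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation("Exp", [
    ("Exp", 0),  # Exp -> Exp Op Exp
    ("Exp", 1),  # Exp -> number
    ("Op", 2),   # Op  -> *
    ("Exp", 0),  # Exp -> Exp Op Exp
    ("Exp", 1),  # Exp -> number
    ("Op", 0),   # Op  -> +
    ("Exp", 1),  # Exp -> number
])
print("\n=> ".join(steps))
```

This particular sequence of choices groups the sentence as number * (number + number). A different sequence reaches the same string with the other grouping, which is the ambiguity examined later.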
Derivations (2) • E -> ( E ) ??? • E -> ( E ) | a • What sentences does this grammar generate? • An example derivation: • Note that this (balanced parentheses) is what we couldn’t achieve with regular definitions • See p. 96 in the textbook
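A sketch of both directions for E -> ( E ) | a, using hypothetical helper names: `derive` applies the general case n times and then the terminating case, and `in_language` undoes one rule at a time. The recognizer must pair each ( with a ) at arbitrary depth, which is the counting a regular definition cannot express.

```python
def derive(n):
    """Apply E -> ( E ) n times, then E -> a, returning every sentential form."""
    forms = ["E"]
    for _ in range(n):
        forms.append(forms[-1].replace("E", "(E)"))
    forms.append(forms[-1].replace("E", "a"))
    return forms

def in_language(s):
    """Recognize E -> ( E ) | a by stripping one matching pair of parentheses at a time."""
    if s == "a":
        return True  # terminating case: E -> a
    # General case: E -> ( E ), so strip the outer pair and recurse on the inside.
    return len(s) >= 3 and s[0] == "(" and s[-1] == ")" and in_language(s[1:-1])

print(" => ".join(derive(2)))
```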
Recursive Grammars • E -> ( E ) | a • is recursive • E -> ( E ) is the general case • E -> a is the terminating case • We have no * operator in context-free grammars • Repetition = recursion • E -> E α | β • derives β, βα, βαα, βααα, … • i.e. all strings beginning with β followed by zero or more repetitions of α • β α*
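To see that the recursive rule E -> E α | β does the work of β α*, the sketch below recognizes the rule directly and compares it with the regular expression. As an assumption for illustration, α is the character 'a' and β is the character 'b'.

```python
import re

def derives(s):
    """Recognize E -> E α | β with α = 'a' and β = 'b' (illustrative choices)."""
    if s == "b":
        return True  # E -> β
    # E -> E α: the string must end in α, and the prefix must itself derive from E.
    return s.endswith("a") and derives(s[:-1])

for s in ["b", "baa", "ab"]:
    print(s, derives(s), bool(re.fullmatch(r"ba*", s)))
```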
Recursive Grammars (2) • a+ (regular expression) • E -> E a | a (1) • or • E -> a E | a (2) • Two different grammars can derive the same language: (1) is left recursive, (2) is right recursive • a* • implies we need the empty production • E -> E a | ε
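One way to check that the left- and right-recursive grammars generate the same language is to enumerate every sentence up to some length. The sketch below does this by breadth-first leftmost expansion; the `language` helper is a hypothetical illustration, and it relies on terminals never disappearing during a derivation, so forms with too many terminals can be pruned. An empty tuple stands for the ε alternative.

```python
from collections import deque

def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    results, seen = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if form in seen:
            continue
        seen.add(form)
        # Terminals are never removed later, so too many already means a dead end.
        if sum(1 for s in form if s not in grammar) > max_len:
            continue
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:  # no non-terminals left: a sentence
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost non-terminal
            queue.append(form[:idx] + rhs + form[idx + 1:])
    return results

left = {"E": [("E", "a"), ("a",)]}   # (1) E -> E a | a
right = {"E": [("a", "E"), ("a",)]}  # (2) E -> a E | a
star = {"E": [("E", "a"), ()]}       # E -> E a | ε
print(sorted(language(left, "E", 3)), sorted(language(star, "E", 3)))
```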
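Recursive Grammars (3) • Require recursive data structures • trees • Parse Trees • Exp -> Exp Op Exp | number • Op -> + | - | * | / • Figure: parse tree for number * number: root exp with children exp, op, exp over the leaves number, *, number; interior nodes numbered 1–4 in the order they are expanded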
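A parse tree is a recursive data structure: every node holds a label and a list of subtrees. A minimal sketch follows, assuming a hypothetical `Node` class in which interior nodes carry non-terminals and leaves carry terminals. Reading the leaves left to right recovers the parsed sentence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A parse-tree node: interior nodes carry a non-terminal, leaves carry a terminal."""
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Yield the terminals at the leaves, left to right."""
        if not self.children:
            yield self.label
        for child in self.children:
            yield from child.leaves()

# Parse tree for number * number using Exp -> Exp Op Exp | number and Op -> *
tree = Node("exp", [
    Node("exp", [Node("number")]),
    Node("op", [Node("*")]),
    Node("exp", [Node("number")]),
])
print(list(tree.leaves()))
```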
Parse Trees & Derivations • Leaves = terminals • Interior nodes = non-terminals • If we replace the non-terminals right to left: • a rightmost derivation • the order in which interior nodes are expanded is a reverse post-order traversal of the tree • If we replace them left to right: • a leftmost derivation • the expansion order is a pre-order traversal • Parse trees encode information about the derivation process
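The correspondence can be checked on the tree for number * number. The leftmost derivation Exp => Exp Op Exp => number Op Exp => number * Exp => number * number expands the root, then the left exp, op, and right exp. The rightmost derivation Exp => Exp Op Exp => Exp Op number => Exp * number => number * number expands the root, then the right exp, op, and left exp. In the sketch below, nodes are (label, children) tuples, and the tags `#root`, `#left`, and `#right` are hypothetical names that let the two exp children be told apart.

```python
TREE = ("exp#root", [
    ("exp#left", [("number", [])]),
    ("op", [("*", [])]),
    ("exp#right", [("number", [])]),
])

def preorder(node):
    """Interior nodes in pre-order: the expansion order of a leftmost derivation."""
    label, children = node
    if not children:
        return []
    order = [label]
    for child in children:
        order += preorder(child)
    return order

def reverse_postorder(node):
    """Interior nodes in reverse post-order: the expansion order of a rightmost derivation."""
    label, children = node
    if not children:
        return []
    order = [label]
    for child in reversed(children):  # reversing post-order visits right subtrees first
        order += reverse_postorder(child)
    return order

print(preorder(TREE))
print(reverse_postorder(TREE))
```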
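Abstract Syntax Trees • Parse trees contain surplus information • Token sequence: number + number (the source 3 + 4) • Figure: the parse tree (exp with children exp, op, exp over 3, +, 4) beside the abstract syntax tree (+ with children 3 and 4) • The abstract syntax tree is all the information we actually need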
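A sketch of collapsing the parse tree for 3 + 4 into its abstract syntax tree follows. The tuple encodings and the `to_ast` and `evaluate` helpers are assumptions for illustration. Chain nodes such as exp -> number and op -> + are dropped, leaving only the operator and its operands, which is exactly what a later evaluation or code-generation step needs.

```python
import operator

# Parse tree for 3 + 4: interior node = (label, [children]); leaf token = (kind, lexeme).
PARSE_TREE = ("exp", [
    ("exp", [("number", "3")]),
    ("op", [("+", "+")]),
    ("exp", [("number", "4")]),
])

def to_ast(node):
    """Drop chain nodes and keep only (operator, left, right) and integer operands."""
    label, body = node
    if isinstance(body, str):  # leaf token
        return int(body) if label == "number" else body
    children = [to_ast(child) for child in body]
    if len(children) == 3:  # exp -> exp op exp
        left, op, right = children
        return (op, left, right)
    if len(children) == 1:  # exp -> number, op -> +: pass the child through
        return children[0]
    raise ValueError(f"unexpected node {label!r}")

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(ast):
    if isinstance(ast, int):
        return ast
    op, left, right = ast
    return OPS[op](evaluate(left), evaluate(right))

ast = to_ast(PARSE_TREE)
print(ast, "=", evaluate(ast))
```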
An exercise • Consider the grammar • lexp -> number | ( op lexp-seq ) • op -> + | - | * • lexp-seq -> lexp-seq lexp | lexp • What are the terminals, non-terminals and start symbol? • Find leftmost and rightmost derivations and parse trees for the following sentences: • (+ 4) • (+ 4 (* 5 6 7))
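To check candidate answers, a small parser for the lexp grammar can confirm that both sentences are well formed and show how they group. This is a sketch under stated assumptions: the `parse` function and its nested-list output are hypothetical, and it does not produce derivations. The left-recursive rule lexp-seq -> lexp-seq lexp | lexp is written as a loop that reads one or more lexps.

```python
import re

def parse(text):
    """Parse one lexp; numbers become ints and (op ...) becomes [op, item, ...]."""
    tokens = re.findall(r"\d+|[()+\-*]", text)
    pos = 0

    def lexp():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok.isdigit():  # lexp -> number
            pos += 1
            return int(tok)
        if tok == "(":  # lexp -> ( op lexp-seq )
            pos += 1
            if pos >= len(tokens) or tokens[pos] not in ("+", "-", "*"):
                raise SyntaxError("expected an operator after '('")
            op = tokens[pos]
            pos += 1
            items = [lexp()]  # lexp-seq: one or more lexps
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(lexp())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1
            return [op, *items]
        raise SyntaxError(f"unexpected token {tok!r}")

    result = lexp()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after the expression")
    return result

print(parse("(+ 4 (* 5 6 7))"))
```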
Parsing the token sequence id + id * id • E -> E + E | E * E | ( E ) | - E | id
Ambiguous Grammars • A grammar that generates some string with two distinct parse trees is called an ambiguous grammar • 2+3*4 = 2 + (3*4) = 14 • 2+3*4 = (2+3) * 4 = 20 • Our experience of maths says interpretation 1 is correct, but the grammar does not express this: • E -> E + E | E * E | ( E ) | - E | id
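Ambiguity can be measured by counting parse trees. The sketch below counts the trees that derive each token span from E in the grammar above, memoizing results by span. The function name and the whitespace-separated token input are assumptions. For id + id * id it finds exactly the two trees listed above.

```python
from functools import lru_cache

def count_parse_trees(source):
    """Count parse trees of source under E -> E + E | E * E | ( E ) | - E | id."""
    tokens = tuple(source.split())

    @lru_cache(maxsize=None)
    def count(i, j):  # number of trees deriving tokens[i:j] from E
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1  # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)  # E -> ( E )
        if j - i >= 2 and tokens[i] == "-":
            total += count(i + 1, j)  # E -> - E
        for k in range(i + 1, j - 1):  # E -> E op E with the operator at position k
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id"))
```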
Removing Ambiguity • Two methods • 1. Disambiguating rules • Pro: leaves the grammar unchanged • Con: the grammar is no longer the sole source of syntactic knowledge • 2. Rewrite the grammar • Use knowledge of the meaning that we want later, in the translation into object code, to guide how the grammar is altered
Precedence • 2+3*4 • * binds more tightly than +, so the 3 groups with the 4 rather than with the 2 • E -> E addop E | term • addop -> + | - • term -> term * term | factor • factor -> ( E ) | number | id • Operators of equal precedence are grouped together at the same ‘level’ of the grammar, giving a ‘precedence cascade’
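The precedence cascade maps directly onto one function per level. The sketch below is an illustrative evaluator, not the course's parser. It assumes integer operands only (no id), and it writes each recursive level as a loop that groups operators left to right, in the spirit of E -> E addop term rather than the slide's E -> E addop E. Because expr only reaches * through term, 2+3*4 evaluates as 2 + (3*4).

```python
import re

def evaluate(source):
    """Evaluate +, -, * and parentheses using the levels E / term / factor."""
    tokens = re.findall(r"\d+|[-+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():  # lowest precedence level: addop
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():  # next level: *
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():  # highest level: ( E ) | number
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("2+3*4"), evaluate("(2+3)*4"))
```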
Associativity • 45-10-5: 30 or 40? • Subtraction is left associative, evaluated left to right: (45-10)-5 = 30 • E -> E addop E | term • does not tell us how to split up 45-10-5 • E -> E addop term | term • forces left associativity via left recursion • Precedence and associativity remove the ambiguity of arithmetic expressions • Which is what our maths teachers spent years telling us!
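The sketch below makes the effect of recursion direction concrete. The left-recursive rule E -> E addop term builds the tree from the left, like a loop folding from the left. The right-recursive rule E -> term addop E, which is an assumed contrast and not on the slide, builds it from the right. The operand/operator list input is also an illustrative assumption.

```python
def left_assoc(nums, ops):
    """E -> E addop term | term: ((45 - 10) - 5)."""
    value = nums[0]
    for op, n in zip(ops, nums[1:]):
        value = value - n if op == "-" else value + n
    return value

def right_assoc(nums, ops):
    """E -> term addop E | term (contrast only): (45 - (10 - 5))."""
    if not ops:
        return nums[0]
    rest = right_assoc(nums[1:], ops[1:])
    return nums[0] - rest if ops[0] == "-" else nums[0] + rest

print(left_assoc([45, 10, 5], ["-", "-"]), right_assoc([45, 10, 5], ["-", "-"]))
```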
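Ambiguous Grammars: the Dangling Else • Statement -> If-statement | other • If-statement -> if ( Exp ) Statement | if ( Exp ) Statement else Statement • Exp -> 0 | 1 • Parse: if (0) if (1) other else other • The else can be attached to either if, so this string has two parse trees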
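Removing Ambiguity • Statement -> Matched-stmt | Unmatched-stmt • Matched-stmt -> if ( Exp ) Matched-stmt else Matched-stmt | other • Unmatched-stmt -> if ( Exp ) Statement | if ( Exp ) Matched-stmt else Unmatched-stmt • Only a matched statement may appear before an else, so each else is attached to the nearest unmatched if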
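To confirm that the rewrite removes the ambiguity, the sketch below counts parse trees for if (0) if (1) other else other under both statement grammars. The `count_parses` helper is a hypothetical general-purpose counter. It assumes no ε-productions and no cycles of unit rules, which holds for both grammars here, so every symbol derives at least one token.

```python
from functools import lru_cache

def count_parses(grammar, start, tokens):
    """Count parse trees of tokens from start; grammar maps non-terminals to tuples of symbols."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_sym(sym, i, j):
        if sym not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        total = 0
        # first covers tokens[i:k]; each remaining symbol needs at least one token.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_sym(first, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    return count_sym(start, 0, len(tokens))

EXP = [("0",), ("1",)]
AMBIGUOUS = {
    "Statement": [("If-statement",), ("other",)],
    "If-statement": [("if", "(", "Exp", ")", "Statement"),
                     ("if", "(", "Exp", ")", "Statement", "else", "Statement")],
    "Exp": EXP,
}
UNAMBIGUOUS = {
    "Statement": [("Matched-stmt",), ("Unmatched-stmt",)],
    "Matched-stmt": [("if", "(", "Exp", ")", "Matched-stmt", "else", "Matched-stmt"), ("other",)],
    "Unmatched-stmt": [("if", "(", "Exp", ")", "Statement"),
                       ("if", "(", "Exp", ")", "Matched-stmt", "else", "Unmatched-stmt")],
    "Exp": EXP,
}
sentence = "if ( 0 ) if ( 1 ) other else other".split()
print(count_parses(AMBIGUOUS, "Statement", sentence), count_parses(UNAMBIGUOUS, "Statement", sentence))
```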
Top Down Parsing • Start parsing from the start symbol and end up with a match for the sentence being parsed • Predictive parsing • non-backtracking • parses a category of grammars called LL(1), which are: • unambiguous • free of left recursion
Top Down Parsing • Table Driven Predictive Parsing • Recursive Descent Predictive Parsing • E -> TE’ • E’ -> +TE’ | ε • T -> FT’ • T’ -> *FT’ | ε • F -> (E) | id • Note that this grammar has no left recursion, is unambiguous, and gives the correct precedence to arithmetic operators
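In recursive descent predictive parsing, each non-terminal becomes one procedure, and the next token decides which alternative to use. The following recognizer for the grammar above is a minimal sketch: the function names are assumptions, E’ and T’ are spelled `E_prime` and `T_prime`, and tokens are given as a list. The ε alternative is taken whenever the lookahead is not + (or *). Any genuine error then surfaces at the next `match`.

```python
def parse(tokens):
    """Recognize E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id."""
    tokens = list(tokens) + ["$"]  # $ marks the end of the input
    pos = 0

    def look():
        return tokens[pos]

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def E():
        T()
        E_prime()

    def E_prime():
        if look() == "+":  # E' -> + T E'
            match("+")
            T()
            E_prime()
        # otherwise E' -> ε: consume nothing

    def T():
        F()
        T_prime()

    def T_prime():
        if look() == "*":  # T' -> * F T'
            match("*")
            F()
            T_prime()
        # otherwise T' -> ε

    def F():
        if look() == "(":  # F -> ( E )
            match("(")
            E()
            match(")")
        else:  # F -> id
            match("id")

    E()
    match("$")  # the whole input must be consumed
    return True

print(parse("id + id * id".split()))
```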
Table Driven Predictive Parsing • Figure: model of a table-driven predictive parser, showing the input buffer (e.g. a + b $), the stack (X Y Z $), the predictive parsing program, its parsing table, and the output
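Table Driven Predictive Parsing • Parsing table (rows: non-terminal; columns: input symbol; blank entries are errors)
Non-terminal | id | + | * | ( | ) | $
E | E -> TE’ | | | E -> TE’ | |
E’ | | E’ -> +TE’ | | | E’ -> ε | E’ -> ε
T | T -> FT’ | | | T -> FT’ | |
T’ | | T’ -> ε | T’ -> *FT’ | | T’ -> ε | T’ -> ε
F | F -> id | | | F -> (E) | |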
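The table drives a simple loop. Pop the top of the stack. If it is a terminal or $, it must match the input. If it is a non-terminal, look up the table entry for the current input symbol and push that right-hand side in reverse. The sketch below encodes the table above as a Python dictionary, an assumed representation in which an empty tuple stands for ε. It returns the productions in the order they are used, which is the order of a leftmost derivation.

```python
# Parsing table M[non-terminal, lookahead] -> right-hand side; () stands for ε.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the productions used, in leftmost-derivation order; raise SyntaxError on error."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]  # the top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top not in NONTERMINALS:  # terminal or $: must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            pos += 1
            continue
        rhs = TABLE.get((top, lookahead))
        if rhs is None:  # blank table entry
            raise SyntaxError(f"no rule for {top} on {lookahead!r}")
        output.append(f"{top} -> {' '.join(rhs) if rhs else 'ε'}")
        stack.extend(reversed(rhs))  # push so the first RHS symbol ends up on top
    return output

for production in ll1_parse("id + id * id".split()):
    print(production)
```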
Table Driven Predictive Parsing • Parse id + id * id: give the leftmost derivation and parse tree using the grammar • E -> TE’ • E’ -> +TE’ | ε • T -> FT’ • T’ -> *FT’ | ε • F -> (E) | id
Predictive Parsing Table • Now parse id + id * id using the parsing table
First and Follow Sets • FIRST and FOLLOW sets tell us when it is appropriate to put the right-hand side of a production on the stack, i.e. for which input symbols • E -> TE’ • E’ -> +TE’ | ε • T -> FT’ • T’ -> *FT’ | ε • F -> (E) | id • Input: id + id * id
First Sets • If X is a terminal, then FIRST(X) is {X} • If X -> ε is a production, then add ε to FIRST(X) • If X is a non-terminal and X -> Y1Y2…Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
FIRST sets • E -> TE’ • E’ -> +TE’ | ε • T -> FT’ • T’ -> *FT’ | ε • F -> (E) | id
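The three FIRST-set rules can be applied repeatedly until no set changes, which is a standard fixed-point formulation. The sketch below does this for the expression grammar. The dictionary encoding is an assumption, with an empty list for an ε alternative and the string "ε" as the ε marker inside the sets.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        return first[symbol] if symbol in grammar else {symbol}  # rule 1: a terminal is its own FIRST

    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:  # rule 3: scan Y1 Y2 ... Yk
                    first[nt] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        all_nullable = False
                        break
                if all_nullable:  # rule 2 (X -> ε) and the last part of rule 3
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

for nt, symbols in first_sets(GRAMMAR).items():
    print(f"FIRST({nt}) = {sorted(symbols)}")
```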