650 likes | 803 Views
Lecture 3: Parsing. CS 540 George Mason University. Static Analysis - Parsing. Syntatic/semantic structure. Syntatic structure. tokens. Scanner (lexical analysis). Parser (syntax analysis). Semantic Analysis (IC generator). Code Generator. Source language. Target
E N D
Lecture 3: Parsing CS 540 George Mason University
Static Analysis - Parsing Syntatic/semantic structure Syntatic structure tokens Scanner (lexical analysis) Parser (syntax analysis) Semantic Analysis (IC generator) Code Generator Source language Target language Code Optimizer • Syntax described formally • Tokens organized into syntax tree that describes structure • Error checking Symbol Table CS 540 Spring 2009 GMU
Static Analysis - Parsing We can use context free grammars to specify the syntax of programming languages. CS 540 Spring 2009 GMU
Context Free Grammars Definition: A context free grammar is a formal model that consists of: • A set of tokens or terminal symbols Vt • A set of nonterminal symbols Vn • Start symbol S (in Vn) • Finite set of productions of form: A R1…Rn (n >= 0) where A in Vn, R in Vn Vt CS 540 Spring 2009 GMU
CFG Examples Indicates a production Vt = {+,-,0..9}, Vn = {L,D}, s = {L} L L + D | L – D | D D 0 | … | 9 Vt={(,)}, Vn = {L}, s = {L} L ( L ) L L e Shorthand for multiple productions CS 540 Spring 2009 GMU
Languages CS 540 Spring 2009 GMU
Any regular language can be expressed using a CFG Starting with a NFA: • For each state Si in the NFA • Create non-terminal Ai • If transition (Si,a) = Sk, create production Ai a Ak • If transition (Si,e) = Sk, create production Ai Ak • If Si is a final state, create production Ai e • If Si is the NFA start state, s = Ai • What does the existence of this algorithm tell us about the relationship between regular and context free languages? CS 540 Spring 2009 GMU
NFA to CFG Example ab*a a a 1 2 3 b A1 a A2 A2 b A2 A2 a A3 A3 e CS 540 Spring 2009 GMU
Writing Grammars When writing a grammar (or RE) for some language, the following must be true: • All strings generated are in the language. • Your grammar produces all strings in the language. CS 540 Spring 2009 GMU
Try these: • Integers divisible by 2 • Legal postfix expressions • Floating point numbers with no extra zeros • Strings of 0,1 where there are more 0 than 1 CS 540 Spring 2009 GMU
Parsing • The task of parsing is figuring out what the parse tree looks like for a given input and language. • If a string is in the given language, a parse tree must exist. • However, just because a parse tree exists for some string in a given language doesn’t mean a given algorithm can find it. CS 540 Spring 2009 GMU
Parse Trees The parse tree for some string in a language that is defined by the grammar G as follows: • The root is the start symbol of G • The leaves are terminals or e. When visited from left to right, the leaves form the input string • The interior nodes are non-terminals of G • For every non-terminal A in the tree with children B1 … Bk, there is some production A B1… Bk CS 540 Spring 2009 GMU
Parse Tree for (())() L ( L ) L L e L L L ) ( ) ( L ) L ( L L e e e e CS 540 Spring 2009 GMU
Parse Tree for (())() L ( L ) L L e L L L ) ( ) ( L ) L ( L L e e e e CS 540 Spring 2009 GMU
Parsing • Parsing algorithms are based on the idea of derivations. • General Algorithms: • LL (top down) • LR (bottom up) CS 540 Spring 2009 GMU
Single Step Derivation Definition: Given a A b (with a,b in (Vn Vt)*) and a production A g, a A b a gb is a single step derivation. Examples: L + D L – D + D L L - D ( L ) ( L ) ( ( L ) L ) ( L ) L (L) L • Greek letters (a,b,c,…) • denote a (possibly empty) • sequence of terminals and • non-terminals. CS 540 Spring 2009 GMU
Derivations Definition: A sequence of the form: w0 w1 … wn is a derivation of wn from w0(w0* wn) L production L ( L ) L ( L ) L production L e ( ) L production L e ( ) L * () If wi has non-terminal symbols, it is referred to as sentential form. CS 540 Spring 2009 GMU
L * (( )) ( ) CS 540 Spring 2009 GMU
L(G), the language generated by grammar G is {w in Vt*: s * w, for start symbol s} • We’ve just shown that both () and (())() are in L(G) for the previous grammar. CS 540 Spring 2009 GMU
Leftmost Derivations • A derivation where the leftmost nonterminal is always chosen • If a string is in a given language (i.e. a derivation exists), then a leftmost derivation must exist • Rightmost derivation defined as you would expect CS 540 Spring 2009 GMU
Leftmost Derivation for (())() CS 540 Spring 2009 GMU
Rightmost Derivation for (())() CS 540 Spring 2009 GMU
Ambiguity Two (or more) parse trees or leftmost derivations for some string in the language E E + E E E – E E 0 | … | 9 E E E + E E - E 4 2 E + E E - E 2 – 3 + 4 3 4 2 3 CS 540 Spring 2009 GMU
Two leftmost derivations EE + E E E – E E – E + E 2 – E 2 – E + E 2 – E + E 2 – 3 + E 2 – 3 + E 2 – 3 + 4 2 – 3 + 4 CS 540 Spring 2009 GMU
An ambiguous grammar can sometimes be made unambiguous: E E + T | E – T | T T 0 | … | 9 • Precedence can be specified as well: E E + T | E – T | T T T * F | T / F | F F ( E ) | 0 | …| 9 enforces the correct associativity CS 540 Spring 2009 GMU
Input: begin simplestmt; simplestmt; end P P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Top Down (LL) Parsing P P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Top Down (LL) Parsing P P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Top Down (LL) Parsing P P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Top Down (LL) Parsing P P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS SS S S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Top Down (LL) Parsing P P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS SS S S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Top Down (LL) Parsing 1 P P begin SS end SS S ; SS SS e S simplestmt S begin SS end 2 SS 4 SS 3 SS 6 S 5 S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Bottomup (LR) Parsing P begin SS end SS S ; SS SS e S simplestmt S begin SS end S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Bottomup (LR) Parsing P begin SS end SS S ; SS SS e S simplestmt S begin SS end S S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Bottomup (LR) Parsing P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Bottomup (LR) Parsing P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Bottomup (LR) Parsing P begin SS end SS S ; SS SS e S simplestmt S begin SS end SS SS SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Bottomup (LR) Parsing 6 P P begin SS end SS S ; SS SS e S simplestmt S begin SS end 5 SS 4 SS 1 SS 3 S 2 S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU
Parsing • General Algorithms: • LL (top down) • LR (bottom up) • Both algorithms are driven by the input grammar and the input to be parsed • Two important sets: FIRST and FOLLOW CS 540 Spring 2009 GMU
FIRST Sets FIRST(a) is the set of all terminal symbols that can begin some sentential form in a derivation that starts with a a … a b • FIRST(a) = {a in Vt | a * ab } { e } if a*e • Example: S simple | begin S end FIRST(S) = {simple, begin} CS 540 Spring 2009 GMU
To compute FIRST across strings of terminals and non-terminals: FIRST(e) = { e } FIRST(Aa) = A if A is a terminal FIRST(A) FIRST(a) if A *e FIRST(A) otherwise CS 540 Spring 2009 GMU
Computing FIRST sets Initially FIRST(A) (for all A) is empty • For productions A c b, where c in Vt Add { c } to FIRST(A) • For productions A e Add { e } to FIRST(A) • For productions A a B b, where a*e and NOT (B *e) Add FIRST(aB) to FIRST(A) • For productions A a, where a*e Add FIRST(a) and { e } to FIRST(A) same thing as e FIRST(B) CS 540 Spring 2009 GMU
S a S e S B B b B e B C C c C e C d FIRST(C) = {c,d} FIRST(B) = FIRST(S) = Example 1 Start with the ‘simplest’ non-terminal CS 540 Spring 2009 GMU
S a S e S B B b B e B C C c C e C d FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = Example 1 Now that we know FIRST(C) … CS 540 Spring 2009 GMU
S a S e S B B b B e B C C c C e C d FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = {a,b,c,d} Example 1 CS 540 Spring 2009 GMU
P i | c | n T S Q P | a S | d S c S T R b |e S e | R n | e T R S q FIRST(P) = {i,c,n} FIRST(Q) = FIRST(R) = {b,e} FIRST(S) = FIRST(T) = Example 2 CS 540 Spring 2009 GMU
P i | c | n T S Q P | a S | d S c S T R b | e S e | R n | e T R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = FIRST(T) = Example 2 CS 540 Spring 2009 GMU
P i | c | n T S Q P | a S | d S c S T R b | e S e | R n | e T R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = {e,b,n,e} FIRST(T) = Example 2 Note: S R n n because R * e CS 540 Spring 2009 GMU
P i | c | n T S Q P | a S | d S c S T R b | e S e | R n | e T R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = {e,b,n,e} FIRST(T) = {b,c,n,q} Example 2 Note: T R S q S q q because both R and S * e CS 540 Spring 2009 GMU
S a S e | S T S T R S e | Q R r S r | e Q S T | e FIRST(S) = FIRST(R) = FIRST(T) = FIRST(Q) = Example 3 CS 540 Spring 2009 GMU