390 likes | 514 Views
Contents. Introduction A Simple Compiler Scanning – Theory and Practice Grammars and Parsing LL(1) Parsing LR Parsing Lex and yacc Semantic Processing Symbol Tables Run-time Storage Organization Code Generation and Local Code Optimization Global Optimization.
E N D
Contents • Introduction • A Simple Compiler • Scanning – Theory and Practice • Grammars and Parsing • LL(1) Parsing • LR Parsing • Lex and yacc • Semantic Processing • Symbol Tables • Run-time Storage Organization • Code Generation and Local Code Optimization • Global Optimization
Outline • Context-Free Grammars • Errors in Context-Free Grammars • Transforming Extended BNF Grammars • Parsers and Recognizers • Grammar Analysis Algorithms
Context-Free Grammars: Concepts and Notation • A context-free grammar G = (Vt, Vn, S, P) • A finite terminal vocabularyVt • The token set produced by scanner • A finite set of nonterminal vacabularyVn • Intermediate symbols • A start symbolSVn that starts all derivations • Also called goal symbol • P, a finite set of productions (rewriting rules) of the form AX1X2Xm • AVn, Xi Vn∪ Vt, 1i m • A is a valid production
Context-Free Grammars: Concepts and Notation (Cont’d.) • Other notations • The vocabulary V of a CFG is the set of terminal and nonterminal symbols • V= Vn∪Vt • L(G), the set of strings derivable from S comprise • Context-free language of grammar G • Notational conventions • a, b, c, denote symbols in Vt • A, B, C, denote symbols in Vn • U, V, W, denote symbols in V • , , , denote strings in V* • u, v, w, denote strings in Vt*
Context-Free Grammars: Concepts and Notation (Cont’d.) • Derivation • One step derivation • If A, then A • One-step derivation • One or more steps derivation + • Zero or more steps derivation * • If S *, then is said to be sentential form of the CFG. • SF(G) is the set of sentential forms of grammar G. • L(G) = {x Vt*| S+ x} • L(G) = SF(G) ∩Vt*; that is, the language of G is simply those sentential forms of G that are terminal strings.
EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail G0 Context-Free Grammars: Concepts and Notation (Cont’d.) • Left-most derivation, a top-down parsers • lm, lm+, lm* • A sentential form produced via a leftmost derivation sequence is called a left sentential form. • E.g. of leftmost derivation of F(V+V) E lm Prefix(E) lm F(E) lm F(V Tail) lm F(V+E) lm F(V+V Tail) lm F(V+V)
EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail G0 Context-Free Grammars: Concepts and Notation (Cont’d.) • Right-most derivation (canonical derivation) • rm, rm+, rm* • Bottom-up parsers • A sentential form produced via a rightmost derivation sequence is called a right sentential form. • E.g. of rightmost derivation of F(V+V) E rm Prefix(E) rm Prefix(V Tail) rm Prefix(V+E) rm Prefix(V+V Tail) rm Prefix(V+V) rm F(V+V) Same # of steps, but different order
Context-Free Grammars: Concepts and Notation (Cont’d.) • A parse tree • Rooted by the start symbol • Its leaves are grammar symbols or
Context-Free Grammars: Concepts and Notation (Cont’d.) • Aphase of a sentential form is a sequence of symbols descended from a single nonterminal in the parse tree. • Simple or prime phrase • A simple phrase is a sequence of symbols directly derived form a nonterminal. • The handle of a sentential form is the left-most simple phrase.
Errors in Context-Free Grammars • CFGs are a definitional mechanism. They may have errors, just as programs may. • Flawed CFG • Useless nonterminals • Unreachable • Derive no terminal string SA|B Aa BBb Cc Nonterminal C cannot be reached form S Nonterminal B derives no terminal string Do exercise 7. S is the start symbol.
Errors in Context-Free Grammars (Cont’d.) • Ambiguous • Grammars that allow different parse trees for the same terminal string • It is impossible to decide whether a given CFG is ambiguous
Errors in Context-Free Grammars (Cont’d.) • It is impossible to decide whether a given CFG is ambiguous • For certain grammar classes, we can prove that constituent grammars are unambiguous • Wrong language • A general comparison algorithm applicable to all CFGs is known to be impossible
Transforming Extended BNF Grammars • Extended BNF BNF • Extended BNF allows • Square bracket [] • Optional list {}
Parsers and Recognizers • Recognizer • An algorithm that does Boolean-valued test • Is this input syntactically valid? • Parser • Answers more general questions • Is this input valid? • And, if it is, what is its structure (parse tree)?
Parsers and Recognizers (Cont’d.) • Two general approaches to parsing • Top-down parser • Expanding the parse tree (via predictions) in a depth-first manner • Preorder traversal of the parse tree • Predictive in nature • lm • LL parser, recursive descent
Parsers and Recognizers (Cont’d.) • Bottom-up parser • Beginning at its bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves • Postorder traversal of the parse tree • rm • LR parser, shift-reduce parser
Parsers and Recognizers (Cont’d.) To parse begin SimpleStmt; SimpleStmt; end$
Parsers and Recognizers (Cont’d.) • Naming of parsing techniques • Top-down • LL • Bottom-up • LR The way to parse token sequence L: Leftmost R: Righmost
Grammar Analysis Algorithms • Goal of this section: • Discuss a number of important analysis algorithms for Grammars
Grammar Analysis Algorithms (Cont’d.) • The data structure of a grammar G
Grammar Analysis Algorithms (Cont’d.) • What nonterminals can derive ? A BCD BC B • An iterative marking algorithm
Grammar Analysis Algorithms (Cont’d.) • First() • The set of all the terminal symbols that can begin a sentential form derivable from • If is the right-hand side of a production, then First() contains terminal symbols that begin strings derivable from First()={aVt| * a} {if * then {} else }
根據定義, FIRST(X) 集合之計算可依下列三步驟而得: • 1. If XT, then FIRST(X) = {X}. • 2. If XN, X→, then add to FIRST(X). • 3. If XN, and X → Y1 Y2 . . . Yn, then add all non- elements of FIRST(Y1) to FIRST(X),if FIRST(Y1), then add all non- elements of FIRST(Y2) to FIRST(X), ..., if FIRST(Yn), then add to FIRST(X).
文法 G 定義如下: E TE’ E’ +TE’ | T FT’ T’ *FT’ | F (E) | id 則其 FIRST 求解如下: FIRST E ( id E’ + T ( id T’ * F ( id
Follow(A) • A is any nonterminal • Follow(A) is the set of terminals that may follow A in some sentential form Follow(A)={aVt|S* Aa } {if S + A then {} else }
根據定義, FOLLOW(X)集合之計算可依下列三步驟而得: • 1. Put $ into FOLLOW(S). • 2. For each A B, add all non- elements of FIRST() to FOLLOW(B). • 3. For each A B or A B, where FIRST(), add all of FOLLOW(A) to FOLLOW(B).
文法 G 定義如下: E TE’ E’ +TE’ | T FT’ T’ *FT’ | F (E) | id 則其 FIRST 求解如下: FIRST F ( id T’ * T ( id E’ + E ( id FOLLOW之求解: FOLLOW E $ ) E’ $ ) T + $ ) T’ + $ ) F * + $ )
Grammar Analysis Algorithms (Cont’d.) • Definition of C data structures and subroutines • first_set[X] • contains terminal symbols and • X is any single vocabulary symbol • follow_set[A] • contains terminal symbols and • A is a nonterminal symbol
It is a subroutine of fill_first_set()
EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail G0 The execution of fill_first_set() using grammar G0
EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail G0 The execution of fill_follow_set() using grammar G0 $,) ( $,)
More examples The execution of fill_first_set() S aSe S B B bBe B C C cCe C d The execution of fill_follow_set() $,e $,e $,e
More examples The execution of fill_first_set() S ABc A a A B b B The execution of fill_follow_set() $ b,c c