590 likes | 880 Views
Compiler Construction Principles & Implementation Techniques. Dr. Ying JIN Associate Professor Oct. 2007. Role of Parsing in a Compiler. Target Code Generation. Lexical Analysis scanning. Intermediate Code Optimization. Syntax Analysis Parsing. Intermediate Code Generation.
E N D
Compiler Construction Principles & Implementation Techniques Dr. Ying JIN Associate Professor Oct. 2007
Role of Parsing in a Compiler Target Code Generation Lexical Analysis scanning Intermediate Code Optimization Syntax Analysis Parsing Intermediate Code Generation Semantic Analysis analysis/front end synthesis/back end
What will be introduced • About Parsing • General information about a Parser (parsing) • Functional requirement (input, output, main function) • Process • General techniques in developing a scanner • How to define the syntax of a programming language? • Why not regular expressions? (not enough) • Context free grammar (上下文无关文法) • How to implement a parser with respect to its definition of syntax? • Top-down Parsing • Bottom-up Parsing
Knowledge Relation Graph Top-down using Syntax definition Context Free Grammar implement basing on Bottom-up Develop a Parser
§ 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
3.1 The Parsing Process • Function of a Parser • Input : Token / TokenList • Output: internal representation of syntactic structure • Process • Read tokens • Establish syntactic structure – parse tree/syntax tree, according to the syntax definition (context free grammar); • Check syntactical errors
Syntactic Structures • Rules for describing the structure of a well-defined program • (1) Program • (2) Declaration • - Constant declaration • - Type declaration • - Variable Declaration • - Procedure/function declaration • (3) Body • (4) Statements (assignment, conditional, loop, function call) • (5) Expressions (arithmetic, logical, boolean)
Syntax Errors • Different Types of Syntax Errors • Following token for a syntactic structure is wrong (后继单词错) • Identifier/constant error (标识符或者常量错) • Keyword error(关键字错) • Start token for a syntactic structure is wrong(开始单词错) • Unbalanced parentheses(括号配对错)
Syntax Errors 后继单词错 int GetMax( int x ; int y) { if (x>y) {return x } else return y; } } vod main() { real 10, y; 10 + ; GetMax( x, y) ; } 括号配对错 关键字错,开始单词错 标识符错 开始单词错
Dealing with Errors • Once an error is detected, how to deal with it? • Quit immediately, not practical • Error recovery • Error repair • Error correction • There is no “perfect” way to do it!
Different Types of Parsing Methods • Universal parsing methods for any grammars • Cocke-Younger-Kasami algorithm • Earley’s algorithm • Top-down parsing methods (limited grammars)- predictive • Recursive descendent parsing (递归下降法) • LL(k) -- k=1 • Bottom-up parsing methods (limited grammars) – shift-reduce • SLR(k) • LR(k) • LALR(k) • Operator-precedence parsing (简单优先关系法) inefficient k=1
§ 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
3.2 Context-free Grammars • What is a Grammar? • Chomsky classification of Grammars; • Context Free Grammar (some concepts)
What is a Grammar? • To define syntactic structure? • A grammar G is a quadruple (VT,VN,S,P) • VT is a finite set of terminal symbols(有限的终极符集合) • VN是is a finite set of non-terminal symbols(有限的非终极符集合) • S is start symbol,S VN • P is a set of production rules(产生式的集合), each production rule has following form: ,where , (VTVN )*
文法的分类 • O型文法: 也称为短语文法,其产生式具有形式: →,其中,(VTVN)*,并且至少含一个非终极符; • 1型文法: 也称为上下文有关文法。它是0型文法的特例,即要求|| || (S→除外,但S不得出现于产生式右部) • 2型文法: 也称为上下文无关文法。它是1型文法的特例,即要求产生式左部是一个非终极符: A→ • 3型文法: 也称为正则文法。它是2型文法的特例,即其产生式的右部至多有两个符号,而且具有下面形式之一: A →a ,A →a B, 其中A,BVN ,aVT
文法的分类 • 描述能力 O型文法 > 1型文法> 2型文法> 3型文法 • 对应自动机 • O型文法:图灵机 • 1型文法:线性有界自动机 • 2型文法:下推自动机 • 3型文法:有限自动机
Context Free Grammar (CFG) • 定义为四元组(VT,VN,S,P) • VT是有限的终极符集合 • VN是有限的非终极符集合 • S是开始符,S VN • P是产生式的集合,且具有下面的形式: AX1X2…Xn 其中AVN,Xi(VTVN) ,右部可空。
Example VT = {i, n, +, *, (, )} P: E T E E + T T F T T * F F (E) F i F n VN = {E, T, F } S = E
Derivation (直接推导): if there is a production A, we can have A , where represents one step derivation (用 A →一步推导). We can say that is derived from A; • 的含义是,使用一条规则,代替左边的某个符号,产生右端的符号串
+ : represents one or more steps derivation (通过一步或多步可推导出 ) • * : 表示 通过0步或多步可推导出
Example P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n • (E) (E+T) • E + (i+n) • E+E+T * E+i+F
G = (VT,VN, S, P) • 句型:if S * ,则称符号串为G的句型。我们用SF(G)表示文法G的所有句型的集合; • Sentence(句子):只包含终极符的句型被称为G的句子 • Language(语言): L( G ) = { u | S + u , u VT* } the set of all sentences of G;
Example P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n • 句型: E, T, E+T, F, T*F, i, n, (E), ……. • 句子: i, n, (i), (n), i+i, i+n, …… • 语言:{i, n, (i), (n), i+i, i+n , ……}
Leftmost(rightmost) derivation 最左(右)推导: 如果进行推导时选择的是句型中的最左(右)非终极符,则称这种推导为最左(右)推导,并用符号lm(r m)表示最左(右)推导。 • 左(右)句型: 用最左推导方式导出的句型,称为左句型,而用最右推导方式导出的句型,称为右句型(规范句型)。 • conclusion: each sentence has its rightmost or leftmost derivation(但对句型此结论不成立)
Example P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n • 最左推导: i+T*n +F lm i + F*n + F • 最右推导: i+T*n +Flm i + T*n +i • 左句型: i + i*F • 右句型: E+(i) • 特例: i + (T*i)
Some Notes • CFG (VT,VN, S, P) will be used to define syntactic structure of a programming language; • Normally VT will be set of tokens of the programming language; • one terminal symbol might be one token, one token type, or one symbol representing certain structure; • Non-terminal symbols act as intermediate representation of certain structure; • Productions are rules on how to derive syntactic structure;
Example(1) • Arithmetic Expressions P: Exp Exp + Exp Exp Exp – Exp Exp Exp * Exp Exp Exp / Exp Exp (Exp) Exp id Exp num VT = {id, num, (, ), +, -, *, /} VN = {Exp} S = Exp
Example(2) • Program VT = {VarDec, TypeDec, ConstDec, MainFun, FunDec } VN = {Program, Dec, Decs} S = Program P: Program Decs MainFun Decs Dec Dec VarDec Dec ConstDec Dec FunDec Dec TypeDec Decs Dec Decs Dec Decs
Example(3) • Variable Declaration VT = {Type, id, ; , , } VN = {VarDec, VarDef, VarDefList, idList} S = VarDec P: VarDec VarDec VarDef ; VarDec VarDef; VarDec P: 变量声明 变量声明 一个变量定义; 变量声明 一个变量定义 ; 变量声明 变量定义 类型 标识符序列 标识符序列 一个标识符 标识符序列 一个标识符 , 标识符序列 VarDef Type idList idList id idList id, idList
Extended BNF (扩展巴克斯范式) • Extended Notations • Optional | A | | A A A • Repetition * or {} A A | (left recursive) A * A A | (right recursive) A *
Some algorithms on Grammar Transformation • 文法等价变化: L(G1) = L(G2) • 增补文法: the start symbol does not appear in the right part of any productions; (定理2.1) • Z S
Some algorithms on Grammar Transformation • 文法G中除了S以外的空产生式(A)可以消除; (定理2.2) • 找出所有可以推导出的非终极符,记为S; • 对于所有产生式 • A X1 X2 …… Xi-1XiXi+1……Xn • Xi S • 增加A X1 X2 …… Xi-1Xi+1……Xn • 直到没有新的产生式产生 • 删除对应空产生式, 除了S
Example SA a BB A BB Sa BB S | A a BB A B B | a B |b S a BB A B S a B S A a B SA a B SA a {S, B, A} Sa B S aB S a S a S Aa
Some algorithms on Grammar Transformation • 消除无用产生式(定理2.3) • 无用产生式: A , A不出现在任何句型中; • How to find such non-terminal symbols? • 生成会出现在句型中的非终极符集合SS • SS = {S} • Si SS, • Si 1 ,……, Si n • 把1 ,……, n中的所有非终极符加入SS中; • 重复直到没有新的加入; • 不属于SS的非终极符,它们的产生式是无用产生式,删除掉;
Example S | A a BB A B B | a B |b C c {S} {S, A, B} {S, A, B} C c是无用产生式
Some algorithms on Grammar Transformation • 消除特型产生式: (定理2.4) • 特型产生式: A B • Algorithm • 对每个非终极符A,构造 SA = {B| A+B, BVN} • 如果C SA,而且C 不是特型产生式,则增加 A ; • 删除特型产生式; • 删除无用产生式;
Example S: {A,B} S a S b • S | A • A B | a • B b A: {B} A b • S | a | b • A a | b • B b
Some algorithms on Grammar Transformation • 消除公共前缀(left factoring) • 公共前缀 • A 1 | … | n| 1|…| m • 提取公因子 • A A’| 1|…| m • A’ 1 | … | n
Some algorithms on Grammar Transformation • 消除左递归(left recursion) • 直接左递归:A A1 | … | A n| 1|…| m • 消除方法: A 1A’|…| mA’ A 1A’ | … | nA’|
Example P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n E T E’ E’ +T E’ | E E + T | T = + T 1 = T
Some algorithms on Grammar Transformation • 消除左递归(left recursion) • 间接左递归: • 消除方法: • Pre-conditions • Algorithm S A b A S a | b 1:S 2:A A Aba | b A bA’ A’ baA’ |
§ 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
3.3 Parse Trees & Abstract Syntax Tree • Problem: Derivation(推导) is a way to construct a sentence from the start symbol; • Many derivation for the same sentence; • Not uniquely represent the structure of the sentence • Parse tree: one way to represent the structure of a sentence;
Example sentence : i + i * n P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n Several derivations for it Parse Tree
Parse Tree • A labeled tree for a CFG • The root must be labeled with the start symbol; • Each node has a symbol associated with it; • Each leaf must be labeled with a terminal symbol ; • For each node which is associated with a non-terminal symbol A, has n sons, from left to right they are associated with symbols B1, …, Bn, then there must be a production A B1 … Bn
Abstract Syntax Tree • Problem with Parse tree • Includes much more than necessary nodes • Abstract Syntax Tree • contains only those nodes necessary for compilation sentence : i + i * n
§ 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
3.4 Ambiguous Grammar • 二义性文法 • For a Grammar G, if there exists one sentence which has more than one parse tree, G is called ambiguous Grammar;
Example P: Exp Exp + Exp Exp Exp – Exp Exp Exp * Exp Exp Exp / Exp Exp (Exp) Exp id Exp num id + id + id
§ 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)