510 likes | 797 Views
Chapter 9. Compilers and Language Translation. The Compilation Process. Phase I: Lexical analysis Phase II: Parsing Phase III: Semantics and code generation Phase IV: Code Optimization. Introduction. High-level languages are more difficult to “ translate ” than assembly languages.
E N D
Chapter 9 Compilers and Language Translation
The Compilation Process • Phase I: Lexical analysis • Phase II: Parsing • Phase III: Semantics and code generation • Phase IV: Code Optimization
Introduction • High-level languages are more difficult to “translate” than assembly languages. • Assembly language and machine language are related 1-to-1. • The relationship between a high-level language and machine language is 1-to-many.
Compiler • The piece of software that translates high-level programming language codes into machine language codes. • Two distinct goals of compiler: • Correctness • Efficient and conciseExample: 2x0+2x1+…+2x50000
The Compilation Process Scanner Parser Code Generator Optimizer Objectfile
Lexical Analysis • The compiler examines the individual characters in the source program and groups them into syntactical units, called tokens, that will be analyzed in succeeding stages. • Analogous to grouping letters into words prior to analyzing text.
Parsing • During this stage the sequence of tokens formed by the scanner is checked to see whether it is syntactically correct according to the rules of the programming language. • Equivalent to checking whether the words in the text form grammatically correct sentences.
Semantic Analysis and Code Generation • If the high-level language statement is structurally correct, then the compiler analyzes its meaning and generates the proper sequence of machine language instructions to carry out these actions.
Code Optimization • The compiler takes the generated code and see whether it can be made more efficient, either by making it run faster, or having it occupy less memory.
Phase I: Lexical Analysis • Scanner, or lexical analyzer, groups input characters into tokens. • Example:a = b + 319 - delta; • The scanner discards nonessential characters, such as blanks and tabs, and the group the remaining characters into high-level syntactic symbols such as symbols, numbers, and operators.
Token Classifications • Token type Classification number symbol 1 number 2 • Others: =(3),+(4),-(5),;(6); ==(7), if(8), else (9), ( 10, ) 11
Phase II: Parsing • During the parsing phase, a compiler determines whether the tokens recognized by the scanner fit together in a grammatically meaningful way. • Analogous to the operation of “diagramming a sentence”.
Example • To prove the sequence of words: The man bit the dog is a correctly formed sentence.
Another Example The man bit the
Programming Language Example • Statement: a = b + c
Parse Tree • The structure shown in the previous example is called a parse tree. • It starts from the individual tokens a,=,b,+,c and show how these tokens can be grouped together into predefined grammatical categories such as <symbol>, <addition operator> and <expression> until the desired goal is reached. (in this case, <assignment statement>)
Grammars, Languages and BNF • How does a parser know how to construct the parse tree? • The parser must be given a formal description of the syntax, the grammatical structure, of the language that it is going to analyze. • Most widely used notation for representing the syntax of programming language is called BNF, an acronym for Backus-Naur form.
BNF • The syntax of a language is specified as a set of rules, also called productions. • The entire collection of rules is called a grammar. • BRN rule:left-hand side::=“definition”
BNF Example • <assignment statement>::=<symbol>=<expression> • The rule says that the syntactical construct called <assignment statement> is defined as a <symbol> followed by the token = followed by the syntactical construct called <expression>
Terminal/Nonterminals • BNF uses two types of objects on the right hand side of a productions: • Terminals: actual tokens of the language recognized and returned by a scanner. • Nonterminals: an intermediate grammatical category used to help explain and organize the language.
Goal Symbol • The goal symbol is the highest-level nonterminal. • When goal symbol has been produced, the parser has finished building the tree, and the statements have been successfully parsed. • The collection of all statements that can be successfully parsed is called the language defined by a grammar.
Meta-symbols • Meta-symbol: used to describe the characteristics of another language. • BNF has five meta-symbols: < > ::= | :OR, Ex:<digit>:=0|1|2|3|4|5|6|7|8|9 L : null string Ex:<signed integer>:= <sign><number> <sign>:= +|-|L
Fundamental Rule of Parsing • If, by repeated applications of the rules of the grammar, a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the language.
Example • A three-rule grammar • <sentence>::=<noun><verb> • <noun>::= bees|dogs • <verb>::=buzz|bite • Example 1: Dogs bite. • Example 2: Bees dogs.
Another Example • Grammar for a simplified assignment statement • <assignment statement>::=<variable>=<expression> • <expression>::=<variable>|<variable>+<variable> • <variable>::= x|y|z
How to parse? • The process of parser is a complex sequence of applying rules, building grammatical constructs, seeing whether things are moving toward the correct answer (the goal symbol). If not, “undo” the rule just applied and try another. • Look-ahead parsing algorithm: “looking down the road” a few tokens to see what would happen if a certain choice were made.
Example Not possible to build a parse tree with the grammar.
Major Challenge • Design a grammar that: • Includes every valid statement that we want to be in the language • Excludes every invalid statement that we do not want to be in the language
Assignment Statement (2nd try) • <assignment statement>::=<variable>=<expression> • <expression>::=<variable>|<expression>+<expression> (recursive definition) • <variable>::= x|y|z
Validity vs. Ambiguity • It is possible to construct two parse trees of x=x+y+z using the 2nd grammar. Two different meanings. • X=(x+y)+z x=x+(y+z)
Phase III: Semantics and Code Generation • <sentence>::=<noun><verb> • <noun>::= bees|dogs • <verb>::=buzz|bite • Possible combinations: • Dogs bite. • Dogs bark. • Bees bite. • Bees bark. • Not all combinations make sense.
Semantics and Code Generation • A compiler examines the semantics of a programming language statement. It analyzes the meaning of the tokens and tries to understand the actions they perform. • If the statement is meaningless, it is semantically rejected. Otherwise it is translated into machine language.
Example • The statementsum=a+b; is syntactically correct. • But what if the variables are defined as follows: char a; double b; int sum;
Semantic Records • Each nonterminal symbol is associated with a semantic record, a data structure that stores information about a nonterminal, such as the actual name of the object and its data type.
Semantic Records (II) • Grows gradually.
Two-Stage Process • Semantic analysis: a pass over the parse tree to determine whether all branches of the tree are semantically valid. • Code generation: the compiler makes a 2nd pass over the parse tree to produce the translated code.
Code Optimization • To make the code more efficient: • Local optimization • Global optimization • Different from programmer optimization with compiler tools such as: • Visual development environments • On-line debuggers • Reusable code libraries
Local Optimization • Look at a very small block of instructions and try to improve it. • Possible approaches • Constant evaluation: x=1+1; • Strength reduction: x=x*2; • Eliminating unnecessary operations