Compilers and Language Translation Gordon College
What’s a compiler?
• Computers understand only machine language
• Therefore, high-level language instructions must be translated into machine language prior to execution
This is a program: 10000010010110100100101……
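What’s a compiler?
• Compiler: A piece of system software that translates high-level languages into machine language
program.c:
    #include <stdio.h>

    int main(void) {
        char c = ' ';
        /* Read one non-blank character at a time until 'x' or end of input. */
        while (scanf(" %c", &c) == 1 && c != 'x') {
            if (c == 'a' || c == 'e' || c == 'i')
                printf("Congrats!\n");
            else
                printf("You Loser!\n");
        }
        return 0;
    }
Compile: gcc -o prog program.c
[Figure: program.c -> Compiler -> prog (10000010010110100100101……)]
Sample output: Congrats!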
Assembler (a kind of compiler)
Assembly: LOAD X
(opcode table) (symbol table)
Machine language: 0101 0000 0000 1001
• One-to-one translation
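The one-to-one translation can be sketched as two table lookups. This is a minimal illustration, not a real assembler: the 4-bit opcodes are the ones shown in the slides, the address 9 for X matches the bit pattern above, and the 4-bit opcode plus 12-bit address word layout, the table contents, and the function name `assemble` are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Opcode table: the 4-bit opcodes that appear in the slides. */
struct opcode_entry { const char *mnemonic; uint16_t opcode; };
static const struct opcode_entry OPCODE_TABLE[] = {
    {"STORE",    0x4},  /* 0100 */
    {"LOAD",     0x5},  /* 0101 */
    {"SUBTRACT", 0x6},  /* 0110 */
    {"ADD",      0x7},  /* 0111 */
};

/* Symbol table: X at address 9 matches the slide; the table itself is hypothetical. */
struct symbol_entry { const char *name; uint16_t address; };
static const struct symbol_entry SYMBOL_TABLE[] = {
    {"X", 9},
};

/* Translate one assembly instruction into exactly one 16-bit machine word
 * (assumed layout: 4-bit opcode, 12-bit address).
 * Returns 0 on success, -1 if the mnemonic or the symbol is unknown. */
int assemble(const char *mnemonic, const char *operand, uint16_t *word)
{
    size_t i, j;
    for (i = 0; i < sizeof OPCODE_TABLE / sizeof OPCODE_TABLE[0]; i++) {
        if (strcmp(mnemonic, OPCODE_TABLE[i].mnemonic) != 0)
            continue;
        for (j = 0; j < sizeof SYMBOL_TABLE / sizeof SYMBOL_TABLE[0]; j++) {
            if (strcmp(operand, SYMBOL_TABLE[j].name) == 0) {
                *word = (uint16_t)((OPCODE_TABLE[i].opcode << 12) |
                                   (SYMBOL_TABLE[j].address & 0x0FFF));
                return 0;
            }
        }
        return -1;  /* known mnemonic, unknown symbol */
    }
    return -1;      /* unknown mnemonic */
}
```

Calling `assemble("LOAD", "X", &word)` stores 0x5009 in `word`, which is the bit pattern 0101 0000 0000 1001 from the slide.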
Compiler (high-level language translator)
a = b + c - d;
    LOAD B        0101 00001110001
    ADD C         0111 00001110010
    SUBTRACT D    0110 00001110011
    STORE A       0100 00001110100
Machine language: 0101 00001110001 0111 00001110010…….
• One-to-many translation
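To make the one-to-many idea concrete, here is a small sketch that expands a single assignment into several assembly instructions. It assumes a deliberately tiny input form (single-letter variables joined by + and -, evaluated left to right) and a hypothetical function name `generate_assignment`; a real compiler works from a parse tree rather than from two strings.

```c
#include <stdio.h>
#include <string.h>

/* Generate assembly text for: target = operands[0] operators[0] operands[1] ...
 * Example: target 'A', operands "BCD", operators "+-" means A = B + C - D.
 * Returns 0 on success, -1 on malformed input, an unsupported operator,
 * or an output buffer that is too small. */
int generate_assignment(char target, const char *operands,
                        const char *operators, char *out, size_t size)
{
    size_t used = 0, i;
    int n;

    /* There must be exactly one more operand than operators. */
    if (strlen(operands) != strlen(operators) + 1)
        return -1;

    for (i = 0; operands[i] != '\0'; i++) {
        const char *mnemonic;
        if (i == 0)
            mnemonic = "LOAD";          /* first operand goes into the register */
        else if (operators[i - 1] == '+')
            mnemonic = "ADD";
        else if (operators[i - 1] == '-')
            mnemonic = "SUBTRACT";
        else
            return -1;
        n = snprintf(out + used, size - used, "%s %c\n", mnemonic, operands[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    /* Finally store the register into the target variable. */
    n = snprintf(out + used, size - used, "STORE %c\n", target);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return 0;
}
```

With target 'A', operands "BCD" and operators "+-", the buffer receives the four instructions LOAD B, ADD C, SUBTRACT D, STORE A, one per line.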
Goals of a compiler
• Code produced must be correct
A = (B+C)-(D+E);
Possible translation:
    LOAD B
    ADD C
    STORE B
    LOAD D
    ADD E
    STORE D
    LOAD B
    SUBTRACT D
    STORE A
Is this correct? No. STORE B and STORE D change the values of variables B and D, which the high-level language statement does not intend.
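The side effect is easy to see by simulating both translations. The sketch below models the machine as one register plus named memory cells; the `struct memory` layout, the TEMP cell, and the "correct" instruction sequence are assumptions for illustration (the slide shows only the incorrect translation).

```c
/* Memory cells for the variables in A = (B+C)-(D+E), plus one temporary. */
struct memory { int a, b, c, d, e, temp; };

/* The translation from the slide: it reuses B and D as scratch space. */
void run_incorrect(struct memory *m)
{
    int r;            /* the single register */
    r = m->b;         /* LOAD B */
    r += m->c;        /* ADD C */
    m->b = r;         /* STORE B      (overwrites B) */
    r = m->d;         /* LOAD D */
    r += m->e;        /* ADD E */
    m->d = r;         /* STORE D      (overwrites D) */
    r = m->b;         /* LOAD B */
    r -= m->d;        /* SUBTRACT D */
    m->a = r;         /* STORE A */
}

/* One possible correct translation: D+E goes to a compiler-created
 * temporary, so only A (and TEMP) change. */
void run_correct(struct memory *m)
{
    int r;
    r = m->d;         /* LOAD D */
    r += m->e;        /* ADD E */
    m->temp = r;      /* STORE TEMP */
    r = m->b;         /* LOAD B */
    r += m->c;        /* ADD C */
    r -= m->temp;     /* SUBTRACT TEMP */
    m->a = r;         /* STORE A */
}
```

Starting from B=1, C=2, D=3, E=4, both versions leave A equal to -4, but `run_incorrect` also leaves B equal to 3 and D equal to 7, while `run_correct` leaves B and D untouched.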
Goals of a compiler
• Code produced should be reasonably efficient and concise
Compute the sum 2x_1 + 2x_2 + 2x_3 + 2x_4 + … + 2x_50000
    sum = 0.0;
    for (i = 0; i < 50000; i++) {
        sum = sum + (2.0 * x[i]);
    }
Optimizing compiler:
    sum = 0.0;
    for (i = 0; i < 50000; i++) {
        sum = sum + x[i];
    }
    sum = sum * 2.0;
49,999 fewer multiplications
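The two loops can be compared directly. This sketch applies the transformation by hand; the function names are hypothetical, and whether a particular compiler performs this rewrite on floating-point code automatically is not something the slides state.

```c
#include <stddef.h>

/* Original form: one multiplication per element (n multiplications). */
double sum_doubled_naive(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum = sum + (2.0 * x[i]);
    return sum;
}

/* Optimized form: the multiplication is moved out of the loop,
 * so it runs once instead of n times. */
double sum_doubled_hoisted(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum = sum + x[i];
    return sum * 2.0;
}
```

For n = 50000 the first version performs 50,000 multiplications and the second performs 1, which is where the saving of 49,999 comes from.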
The Compilation Process
• Phase I: Lexical analysis
  - Compiler examines the individual characters in the source program and groups them into syntactical units called tokens
• Phase II: Parsing
  - The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct
[Figure: Source code -> Scanner -> Groups of tokens -> Parser -> correct / not correct]
The Compilation Process
• Phase III: Semantic analysis and code generation
  - The compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actions
[Figure: Groups of tokens -> Code Generator -> Machine language]
The Compilation Process
• Phase IV: Code optimization
  - The compiler takes the generated code and sees whether it can be made more efficient
[Figure: Machine language -> Code Optimizer -> Machine language]
The Compilation Process
• Source program
  - Original high-level language program
• Object program
  - Machine language translation of the source program
Phase I: Lexical Analysis
• Lexical analyzer
  - The program that performs lexical analysis
  - More commonly called a scanner
• Job of lexical analyzer
  - Group input characters into tokens
  - Tokens: Syntactical units that are treated as single, indivisible entities for the purposes of translation
  - Classify tokens according to their type
Phase I: Lexical Analysis
Program statement: sum = sum + a[i];
Digital perspective: tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;
Tokenized: sum, =, sum, +, a, [, i, ], ;
Phase I: Lexical Analysis
Typical Token Classifications
    TOKEN TYPE    CLASSIFICATION NUMBER
    Symbol        1
    Number        2
    =             3
    +             4
    -             5
    ;             6
    ==            7
    If            8
    Else          9
    (             10
    )             11
    [             12
    ]             13
    …
Phase I: Lexical Analysis
• Lexical Analysis Process
  1. Discard blanks, tabs, etc.; look for the beginning of a token
  2. Put characters together
  3. Repeat step 2 until end of token
  4. Classify and save token
  5. Repeat steps 1-4 until end of statement
  6. Repeat steps 1-5 until end of source code
[Figure: sum=sum+a[i]; -> Scanner -> sum 1, = 3, sum 1, + 4, a 1, [ 12, i 1, ] 13, ; 6]
Phase I: Lexical Analysis
• Input to a scanner
  - A high-level language statement from the source program
• Scanner’s output
  - A list of all the tokens in that statement
  - The classification number of each token found
[Figure: sum=sum+a[i]; -> Scanner -> sum 1, = 3, sum 1, + 4, a 1, [ 12, i 1, ] 13, ; 6]
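A scanner along these lines can be sketched in a few dozen lines of C. The classification numbers follow the table of typical token classifications in the slides; everything else (the `struct token` layout, the 31-character token limit, treating `if` and `else` as lowercase keywords, and the function name `scan`) is an assumption made for the example.

```c
#include <ctype.h>
#include <string.h>

#define MAX_TEXT 32

struct token { char text[MAX_TEXT]; int type; };

/* Scan the text in `src` into `out` (capacity `max` tokens).
 * Returns the number of tokens, or -1 on an unrecognized character,
 * an over-long token, or too many tokens. */
int scan(const char *src, struct token *out, int max)
{
    int count = 0;
    while (*src != '\0') {
        size_t len = 0;
        int type = 0;

        /* Step 1: discard blanks, tabs, etc. */
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (count == max) return -1;

        /* Steps 2-3: put characters together until the end of the token. */
        if (isalpha((unsigned char)*src)) {
            while (isalnum((unsigned char)src[len])) len++;
            type = 1;                                   /* Symbol */
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)src[len])) len++;
            type = 2;                                   /* Number */
        } else if (src[0] == '=' && src[1] == '=') {
            len = 2; type = 7;                          /* == */
        } else {
            len = 1;
            switch (*src) {
            case '=': type = 3;  break;
            case '+': type = 4;  break;
            case '-': type = 5;  break;
            case ';': type = 6;  break;
            case '(': type = 10; break;
            case ')': type = 11; break;
            case '[': type = 12; break;
            case ']': type = 13; break;
            default:  return -1;
            }
        }
        if (len >= MAX_TEXT) return -1;

        /* Step 4: classify and save the token. */
        memcpy(out[count].text, src, len);
        out[count].text[len] = '\0';
        if (type == 1 && strcmp(out[count].text, "if") == 0) type = 8;
        else if (type == 1 && strcmp(out[count].text, "else") == 0) type = 9;
        out[count].type = type;
        count++;
        src += len;     /* Steps 5-6: continue with the rest of the input. */
    }
    return count;
}
```

Scanning the statement `sum = sum + a[i];` with this sketch yields nine tokens: sum 1, = 3, sum 1, + 4, a 1, [ 12, i 1, ] 13, ; 6.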
Phase II: Parsing
• Parsing phase
  - A compiler determines whether the tokens recognized by the scanner are a syntactically legal statement
  - Performed by a parser
Phase II: Parsing
• Output of a parser
  - A parse tree, if such a tree exists
  - An error message, if a parse tree cannot be constructed
• Successful construction of a parse tree is proof that the statement is correctly formed
Example
• High-level language statement: a = b + c
Grammars, Languages, and BNF
• Syntax
  - The grammatical structure of the language
  - The parser must be given the syntax of the language
• BNF (Backus-Naur Form)
  - Most widely used notation for representing the syntax of a programming language
literal_expression ::= integer_literal | float_literal | string | character
Grammars, Languages, and BNF
• In BNF
  - The syntax of a language is specified as a set of rules (also called productions)
• A grammar
  - The entire collection of rules for a language
• Structure of an individual BNF rule
left-hand side ::= “definition”
Grammars, Languages, and BNF
• BNF rules use two types of objects on the right-hand side of a production
• Terminals
  - The actual tokens of the language
  - Never appear on the left-hand side of a BNF rule
• Nonterminals
  - Intermediate grammatical categories used to help explain and organize the language
  - Must appear on the left-hand side of one or more rules
Grammars, Languages, and BNF
• Goal symbol
  - The highest-level nonterminal
  - The nonterminal object that the parser is trying to produce as it builds the parse tree
• All nonterminals are written inside angle brackets
BNF Example
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
Identify the following: goal symbol, terminals, nonterminals, an individual rule
Is this a legal postal address?
    Steve Moses Sr.
    215 Rose Ave.
    Everywhere, NC 43563
Parsing Concepts and Techniques
• Fundamental rule of parsing:
  - By repeated applications of the rules of the grammar:
    If the parser can convert the sequence of input tokens into the goal symbol,
      the sequence of tokens is a syntactically valid statement of the language
    else
      the sequence of tokens is not a syntactically valid statement of the language
Parsing Example
Is the following http address legal: http://www.csm.astate.edu/~rossa/cs3543/bnf.html
<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]
<hostport> ::= <host> [ : <port> ]
<host> ::= <hostname> | <hostnumber>
<hostname> ::= <ialpha> [ . <hostname> ]
<hostnumber> ::= <digits> . <digits> . <digits> . <digits>
<port> ::= <digits>
<path> ::= <void> | <xpalphas> [ / <path> ]
<search> ::= <xalphas> [ + <search> ]
<xalpha> ::= <alpha> | <digit> | <safe> | <extra> | <escape>
<xalphas> ::= <xalpha> [ <xalphas> ]
<xpalpha> ::= <xalpha> | +
<xpalphas> ::= <xpalpha> [ <xpalphas> ]
<ialpha> ::= <alpha> [ <xalphas> ]
<alpha> ::= a | b | … | z | A | B | … | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<safe> ::= $ | - | _ | @ | . | & | ~
<extra> ::= ! | * | " | ' | ( | ) | : | ; | , | <space>
<escape> ::= % <hex> <hex>
<hex> ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F
<digits> ::= <digit> [ <digits> ]
<void> ::=
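As a small worked case, the two rules `<hostnumber>` and `<digits>` from the grammar above can be turned into a recognizer with one function per nonterminal. This is a sketch of a top-down (recursive-descent style) check, which is a different technique from reducing tokens up to the goal symbol, and it treats `<hostnumber>` as the goal symbol; the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* <digits> ::= <digit> [ <digits> ]
 * Consumes one or more digits. Returns a pointer just past them,
 * or NULL if the input does not start with a digit. */
static const char *match_digits(const char *s)
{
    if (!isdigit((unsigned char)*s))
        return NULL;
    while (isdigit((unsigned char)*s))
        s++;
    return s;
}

/* <hostnumber> ::= <digits> . <digits> . <digits> . <digits>
 * Returns 1 if the whole string is a <hostnumber>, 0 otherwise. */
int is_hostnumber(const char *s)
{
    int part;
    for (part = 0; part < 4; part++) {
        s = match_digits(s);
        if (s == NULL)
            return 0;
        if (part < 3) {            /* a '.' separates the four parts */
            if (*s != '.')
                return 0;
            s++;
        }
    }
    return *s == '\0';             /* nothing may follow the fourth part */
}
```

Because the grammar only requires digits, this check accepts strings such as 999.1.1.1; any range restriction on the numbers would have to be enforced outside the grammar.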
Parsing Concepts and Techniques
• Look-ahead parsing algorithms: intelligent parsers
• One of the biggest problems in building a compiler is designing a grammar that:
  - Includes every valid statement that we want to be in the language
  - Excludes every invalid statement that we do not want to be in the language
Parsing Concepts and Techniques
• Another problem in constructing a compiler: Designing a grammar that is not ambiguous
• An ambiguous grammar allows the construction of two or more distinct parse trees for the same statement
• NOT GOOD: multiple interpretations
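To see why multiple parse trees matter, consider a hypothetical rule (not from the slides) such as `<expression> ::= <expression> - <expression> | <variable>`. It allows two parse trees for `a - b - c`, and the two trees compute different values. The functions below simply evaluate each grouping.

```c
/* Parse tree 1 groups the left subtraction first: (a - b) - c */
int eval_left_grouped(int a, int b, int c)
{
    return (a - b) - c;
}

/* Parse tree 2 groups the right subtraction first: a - (b - c) */
int eval_right_grouped(int a, int b, int c)
{
    return a - (b - c);
}
```

With a = 10, b = 4, c = 3 the first grouping gives 3 and the second gives 9, so the same statement would have two different meanings.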
Phase III: Semantics and Code Generation
• Semantic analysis
  - The compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically valid
    If they are valid,
      the compiler can generate machine language instructions
    else
      there is a semantic error; machine language instructions are not generated
Phase III: Semantics and Code Generation
• Semantic analysis
  - Syntactically correct, but semantically incorrect
Example:
    int a;
    double sum;
    char b;
    sum = a + b;    (data type mismatch)
Semantic records:
    a      integer
    sum    double
    b      char
Phase III: Semantics and Code Generation
• Semantic analysis
[Figure: parse tree for <expression> + <expression>. The semantic record of the left <expression> is (a, integer) and of the right <expression> is (b, char); the semantic record of the resulting <expression> is (temp, ?).]
Phase III: Semantics and Code Generation
• Semantic analysis
[Figure: parse tree for <expression> + <expression>. The semantic record of the left <expression> is (a, integer) and of the right <expression> is (b, integer); the semantic record of the resulting <expression> is (temp, integer).]
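The check on the two semantic records can be sketched as a single function. It assumes the simplified rule implied by the two parse-tree examples (both operands of + must have the same type, and the result takes that type); real languages usually add conversion rules, and the type and function names here are hypothetical.

```c
enum data_type { TYPE_INTEGER, TYPE_DOUBLE, TYPE_CHAR, TYPE_ERROR };

struct semantic_record { const char *name; enum data_type type; };

/* Build the semantic record for <expression> + <expression>.
 * In this simplified model the operand types must match; otherwise the
 * result is marked TYPE_ERROR and no code should be generated. */
struct semantic_record check_addition(struct semantic_record left,
                                      struct semantic_record right)
{
    struct semantic_record result;
    result.name = "temp";
    if (left.type == right.type && left.type != TYPE_ERROR)
        result.type = left.type;
    else
        result.type = TYPE_ERROR;
    return result;
}
```

Combining (a, integer) with (b, char) produces a temp record of type TYPE_ERROR, while combining (a, integer) with (b, integer) produces (temp, integer).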
Phase III: Semantics and Code Generation
• Code generation
  - Compiler makes a second pass over the parse tree to produce the translated code
Phase IV: Code Optimization
• Two types of optimization
  - Local
  - Global
• Local optimization
  - The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block
  - Relatively easy; included as part of most compilers
Phase IV: Code Optimization
• Examples of possible local optimizations
  - Constant evaluation: x = 1 + 1 ---> x = 2
  - Strength reduction: x = x * 2 ---> x = x + x
  - Eliminating unnecessary operations
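The first two local optimizations can be sketched on a tiny representation of one binary operation. The `struct binary_op` layout and both function names are assumptions for illustration; a production compiler applies these rewrites to its own intermediate code.

```c
enum operand_kind { CONSTANT, VARIABLE };

struct operand { enum operand_kind kind; int value; char name; };

struct binary_op { char op; struct operand left, right; };

/* Constant evaluation: if both operands are constants, compute the
 * result at compile time. Returns 1 and stores the value in *result
 * if the operation was folded, 0 otherwise. */
int fold_constants(const struct binary_op *e, int *result)
{
    if (e->left.kind != CONSTANT || e->right.kind != CONSTANT)
        return 0;
    switch (e->op) {
    case '+': *result = e->left.value + e->right.value; return 1;
    case '-': *result = e->left.value - e->right.value; return 1;
    case '*': *result = e->left.value * e->right.value; return 1;
    default:  return 0;
    }
}

/* Strength reduction: rewrite v * 2 as v + v in place.
 * Returns 1 if the operation was rewritten, 0 otherwise. */
int reduce_strength(struct binary_op *e)
{
    if (e->op == '*' && e->left.kind == VARIABLE &&
        e->right.kind == CONSTANT && e->right.value == 2) {
        e->op = '+';
        e->right = e->left;    /* second operand becomes the same variable */
        return 1;
    }
    return 0;
}
```

Applied to 1 + 1, `fold_constants` reports the value 2; applied to x * 2, `reduce_strength` rewrites the operation to x + x.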
Phase IV: Code Optimization
• Global optimization
  - The compiler looks at large segments of the program to decide how to improve performance
  - Much more difficult; usually omitted from all but the most sophisticated and expensive production-level “optimizing compilers”
• Optimization cannot make an inefficient algorithm efficient; it “only makes an efficient algorithm more efficient”
Summary
• A compiler is a piece of system software that translates high-level languages into machine language
• Goals of a compiler: Correctness and the production of efficient and concise code
• Source program: High-level language program
Summary
• Object program: The machine language translation of the source program
• Phases of the compilation process
  - Phase I: Lexical analysis
  - Phase II: Parsing
  - Phase III: Semantic analysis and code generation
  - Phase IV: Code optimization