What is a compiler?
• A program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• Traditionally, the source language is a high-level language and the target language is a low-level language (machine code).
[Figure: source program → compiler → target program; the compiler also reports error messages]
Why do we need to learn how a compiler works?
• It is very unlikely that any of us will write another C/C++/Java compiler.
• But …
• Parsing a small language comes up quite often: scripting languages such as Python, parsing a set of commands for a mail server, etc.
• For systems people, compiler knowledge is a basic requirement.
• A new processor is fast because it runs compiler-generated code fast.
• If you implement a new operating system, you need to know how the compiler interacts with the OS. An OS is fast because it runs compiler-generated code fast.
• New architectures keep coming: new compiler optimizations are always needed.
Knowledge required to write a good compiler:
• Formal languages (lexical analysis and parsing)
• Programming languages (activation records, scope, type checking, etc.)
• Algorithms (optimization, register allocation, etc.)
• Graph theory (optimization, data-flow analysis)
• Computer architecture (target language, machine-dependent optimizations)
• Software engineering (large amount of code)
A typical compilation process
[Figure: source program with macros → preprocessor → source program → compiler → target assembly program → assembler → relocatable machine code → loader/link-editor → absolute machine code]
Try gcc with the -v, -E, and -S flags on linprog.
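Compiler phases:
[Figure: source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator. The phases are grouped into a front end and a back end, with the symbol table manager and the error handler drawn alongside them.]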
Compiler phases:
• The source program is a stream of characters: pos = init+rate*60
• Lexical analysis: groups characters into indivisible units, called tokens, and generates a token stream: id1 = id2 + id3 * const
• The information about the identifiers must be stored somewhere (the symbol table).
• Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.
• Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).
[Figure: syntax tree for the assignment, with = at the root and children id1 and +; + has children id2 and *; * has children id3 and 60]
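The first two phases can be sketched for the pos = init+rate*60 example. This is only a minimal illustration, not how a production compiler is built: the token patterns, the function names tokenize and parse, and the nested-tuple tree format are all assumptions made for this sketch, and the grammar covers only a single assignment with + and *.

```python
import re

# Hypothetical token patterns for this tiny example; a real language needs many more.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("CONST", r"\d+"),
    ("OP", r"[=+*]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Lexical analysis: return (tokens, symbol_table) for one assignment."""
    symbol_table = {}  # identifier name -> 1-based index, in order of first use
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "ID":
            index = symbol_table.setdefault(text, len(symbol_table) + 1)
            tokens.append(("id", index))
        elif kind == "CONST":
            tokens.append(("const", int(text)))
        else:
            tokens.append(("op", text))
    return tokens, symbol_table


def parse(tokens):
    """Syntax analysis: build a nested-tuple tree; * binds tighter than +."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None or tok[0] not in ("id", "const"):
            raise SyntaxError(f"expected an operand, got {tok}")
        return advance()

    def term():
        node = factor()
        while peek() == ("op", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("+", node, term())
        return node

    target = advance() if peek() and peek()[0] == "id" else None
    if target is None or peek() != ("op", "="):
        raise SyntaxError("expected 'id = expression'")
    advance()  # consume '='
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()}")
    return tree
```

Calling tokenize("pos = init+rate*60") yields the token stream id1 = id2 + id3 * const together with a symbol table mapping pos, init and rate to 1, 2 and 3. Passing those tokens to parse gives a tree with = at the root and the multiplication nested under the addition.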
Compiler phases:
• Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is three-address code: dst = op1 op op2
• The three-address code for the assignment statement:
    temp1 = 60
    temp2 = id3 * temp1
    temp3 = id2 + temp2
    id1 = temp3
• Code optimization: produces better but semantically equivalent code:
    temp1 = id3 * 60.0
    id1 = id2 + temp1
• Code generation: generates assembly:
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
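As a rough sketch of intermediate code generation and one simple optimization, the following walks a syntax tree for id1 = id2 + id3 * 60 and emits three-address code. The names gen_three_address, optimize and format_code, the tuple-based tree and the (dst, op, arg1, arg2) instruction format are assumptions made for this illustration. The optimizer shown only propagates constants and removes the final copy. It does not renumber temporaries or model the integer-to-real conversion.

```python
def gen_three_address(tree):
    """Flatten ('=', target, expr) into a list of (dst, op, arg1, arg2) tuples."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):           # identifier leaf such as "id2"
            return node
        if isinstance(node, (int, float)):  # constant leaf: load it into a temporary
            temp = new_temp()
            code.append((temp, "copy", node, None))
            return temp
        op, left, right = node              # interior node: (op, left, right)
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append((temp, op, left_name, right_name))
        return temp

    _, target, expr = tree
    code.append((target, "copy", walk(expr), None))
    return code


def optimize(code):
    """Propagate constant temporaries and fold the trailing copy.

    Assumes every temporary is used exactly once, which holds for the
    generator above but not for three-address code in general.
    """
    constants = {}
    result = []
    for dst, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "copy" and isinstance(a, (int, float)) and dst.startswith("temp"):
            constants[dst] = a              # remember tempN = constant, emit nothing
            continue
        if op == "copy" and result and result[-1][0] == a:
            # dst = tempN directly after tempN = ...: write the result into dst
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dst, prev_op, prev_a, prev_b))
            continue
        result.append((dst, op, a, b))
    return result


def format_code(code):
    """Render instructions as 'dst = a op b' or 'dst = a' strings."""
    return [f"{d} = {a}" if op == "copy" else f"{d} = {a} {op} {b}"
            for d, op, a, b in code]


tree = ("=", "id1", ("+", "id2", ("*", "id3", 60)))
unoptimized = gen_three_address(tree)
print(format_code(unoptimized))
# ['temp1 = 60', 'temp2 = id3 * temp1', 'temp3 = id2 + temp2', 'id1 = temp3']
print(format_code(optimize(unoptimized)))
# ['temp2 = id3 * 60', 'id1 = id2 + temp2']
```

The optimized result has the same shape as the two-instruction version on the slide, apart from the temporary's name and the constant staying an integer.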
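The use of intermediate code can also reduce the complexity of compilers:
• m front ends and n back ends share a common intermediate code, so m + n components cover all m × n source/target combinations.
[Figure: front ends 1, 2, …, m feed a common IC, which feeds back ends 1, 2, …, n]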
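The sharing idea can be illustrated with a toy setup in which every front end emits the same intermediate form and every back end consumes it. Everything here is hypothetical: the two miniature source syntaxes, the two imaginary targets and the (op, left, right) intermediate code are invented for this sketch.

```python
from itertools import product

MNEMONIC = {"+": "ADD", "*": "MUL"}


# Two hypothetical front ends: each parses its own syntax for one binary
# operation and emits the same intermediate code, an (op, left, right) tuple.
def infix_front_end(source):       # e.g. "a + b"
    left, op, right = source.split()
    return (op, left, right)


def prefix_front_end(source):      # e.g. "+ a b"
    op, left, right = source.split()
    return (op, left, right)


# Two hypothetical back ends: each turns the shared intermediate code into
# instructions for an imaginary stack machine or register machine.
def stack_back_end(ic):
    op, left, right = ic
    return [f"PUSH {left}", f"PUSH {right}", MNEMONIC[op]]


def register_back_end(ic):
    op, left, right = ic
    return [f"MOV {left}, R1", f"{MNEMONIC[op]} {right}, R1"]


FRONT_ENDS = {"infix": infix_front_end, "prefix": prefix_front_end}
BACK_ENDS = {"stack": stack_back_end, "register": register_back_end}


def make_compiler(front, back):
    """Compose one front end with one back end through the shared IC."""
    return lambda source: BACK_ENDS[back](FRONT_ENDS[front](source))


# m + n = 4 components written by hand, m * n = 4 compilers obtained.
COMPILERS = {(f, b): make_compiler(f, b) for f, b in product(FRONT_ENDS, BACK_ENDS)}
```

With two front ends and two back ends the saving is not visible yet, but adding a third source language requires one new front end rather than one new compiler per target.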
Compiler construction tools:
• Scanner generators (lex)
• Parser generators (yacc)
• Syntax-directed translation engines
• Automatic code generators
• Data-flow engines