System Programming and Administration Lecture 4 Compiler: Overview of the compilation process
Outline • Compiler • Functions of compiler • Compilation process • Phases of compilation • Incremental compiler
Compiler • A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). [Diagram: source code → Compiler → target code; errors are reported]
Why? • The most common reason for wanting to transform source code is to create an executable program.
Basic functions • Scanning: • Scan the characters of a program, analyze them (according to rules called lexical rules), and identify each token. Also called lexical analysis. The part of a compiler performing this function is called the scanner. • Parsing: • Pass through the sequence of tokens, parse them (according to rules called grammars), and recognize each statement. Also called syntactic analysis. The part of a compiler performing this function is called the parser. • (Object) code generation: • Each statement has a meaning (semantics); for each parsed statement, generate code according to its meaning. Also called semantic analysis. The part of a compiler performing this function is called the code generator.
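The three functions can be seen end to end on a single statement. Below is a minimal sketch in Python (the function names and the token pattern are hypothetical, not from a real compiler) that scans, parses, and generates stack-machine code for the assignment x := 1 + 2:

import re

def scan(source):
    # Lexical analysis: break the character stream into tokens.
    return re.findall(r":=|[+*/-]|\d+|[A-Za-z_]\w*", source)

def parse(tokens):
    # Syntactic analysis: recognize the statement form "id := literal + literal".
    target, assign, left, op, right = tokens
    assert assign == ":=" and op == "+"
    return ("assign", target, ("+", left, right))

def generate(tree):
    # Code generation: emit stack-machine code for the parsed statement.
    _, target, (op, left, right) = tree
    return [f"push {left}", f"push {right}", "add", f"pop {target}"]

print(generate(parse(scan("x := 1 + 2"))))
# ['push 1', 'push 2', 'add', 'pop x']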
The Analysis-Synthesis Model of Compilation • There are two parts to compilation: • Analysis determines the operations implied by the source program, recording them in a tree structure. • Synthesis takes the tree structure and translates the operations therein into the target program.
Compiler parts A compiler consists of three main parts: • frontend, • middle-end, • backend.
Frontend [Diagram: source → Scanner → tokens → Parser → IR] • The frontend checks whether the program is correctly written in terms of the programming language's syntax and semantics: • legal and illegal programs are recognized; • errors, if any, are reported in a useful way; type checking is also performed. • The frontend generates IR (intermediate representation) for the middle-end. • Split into two parts: • Scanner: responsible for converting the character stream to a token stream; also strips out white space and comments. • Parser: reads the token stream; generates IR.
Middle-end • Middle-end is where the optimizations for performance take place. • Typical transformations for optimization are • Removal of useless or unreachable code, • Discovering and propagating constant values • Relocation of computation to a less frequently executed place • Middle-end generates IR for the following backend. • Most optimization efforts are focused on this part.
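As an illustration of the constant work mentioned above, here is a minimal sketch (the three-address tuple format is an assumption, not a standard) of discovering and propagating constant values through a toy IR:

def fold_constants(ir):
    env = {}                                  # variable -> known constant
    out = []
    for dest, op, a, b in ir:
        a, b = env.get(a, a), env.get(b, b)   # propagate known constants
        if op == "add" and isinstance(a, int) and isinstance(b, int):
            env[dest] = a + b                 # fold the addition at compile time
            out.append((dest, "const", a + b, None))
        else:
            out.append((dest, op, a, b))
    return out

# t1 := 2 + 3 ; t2 := t1 + x   becomes   t1 := 5 ; t2 := 5 + x
print(fold_constants([("t1", "add", 2, 3), ("t2", "add", "t1", "x")]))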
Backend • The backend is responsible for translating the IR into the target assembly code. • Target instruction(s) are chosen for each IR instruction. • Registers are allocated to the program's variables. • The backend utilizes the hardware by figuring out how to keep parallel functional units busy, how to fill delay slots, and so on.
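A minimal sketch of what instruction selection could look like, using the 68k-style mnemonics that appear in the Code Generation slides later (the IR tuple format and the fixed register d1 are assumptions):

def select_instructions(ir):
    out = []
    for dest, op, a, b in ir:
        if op == "const":
            out.append(f"movl #{a},{dest}")
        elif op == "add":                     # one IR add -> movl/addl/movl
            out.append(f"movl {a},d1")
            out.append(f"addl {b},d1")        # d1 := a + b
            out.append(f"movl d1,{dest}")
    return out

print(select_instructions([("sum", "add", "A", "B")]))
# ['movl A,d1', 'addl B,d1', 'movl d1,sum']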
Analysis consists of 3 phases • Linear analysis (lexical analysis) • Hierarchical analysis (syntax analysis) • Semantic analysis
Cross compiler • A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is run. • Cross compiler tools are used to generate executables for embedded systems or for multiple platforms. • They are used to compile for platforms on which it is not feasible to do the compiling, such as microcontrollers that don't run an operating system.
Incremental compiler • The term incremental compiler may refer to two different types of compiler: • Imperative programming • Interactive programming
Incremental compiler • In imperative programming and software development, an incremental compiler is one that, when invoked, takes only the changes to a known set of source files and updates any corresponding output files (in the compiler's target language, often bytecode) that may already exist from previous compilations. • By effectively building upon previously compiled output files, the incremental compiler avoids the wasteful recompilation of entire source files where most of the code remains unchanged. • For most incremental compilers, compiling a program with small changes to its source code is usually near instantaneous. • It can be said that an incremental compiler reduces the granularity of a language's traditional compilation units while maintaining the language's semantics, so that the compiler can append and replace smaller parts.
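A minimal sketch of this idea, comparing timestamps so that only changed sources are recompiled (compile_one is a hypothetical stand-in for invoking the real compiler):

import os

def needs_rebuild(src, out):
    # The output is stale if it is missing or older than its source.
    return not os.path.exists(out) or os.path.getmtime(src) > os.path.getmtime(out)

def incremental_build(sources, compile_one):
    for src in sources:
        out = src.rsplit(".", 1)[0] + ".o"
        if needs_rebuild(src, out):
            compile_one(src, out)   # only the changed units are recompiled
        # unchanged units keep their previously compiled output files

A full incremental compiler additionally tracks dependencies between units, so that a change in one file also triggers recompilation of the files that use it.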
Incremental compiler • In the interactive programming paradigm, and particularly in Prolog-related literature, an incremental compiler refers to a compiler that is actually part of the runtime system of the source language. • The compiler can be invoked at runtime on some source code or data structure managed by the program, producing a new compiled program fragment that is then immediately available for use by the runtime system. • This scheme allows for a degree of self-modifying code and requires metaprogramming language features. • The ability to add, replace, and remove code while running is known as hot swapping.
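Python's built-in compile() and exec() give a concrete, if simple, taste of this: the compiler is part of the runtime system, and a freshly compiled fragment is immediately available to the running program:

source = "def area(r):\n    return 3.14159 * r * r\n"
fragment = compile(source, "<runtime>", "exec")   # compile at runtime
namespace = {}
exec(fragment, namespace)                         # install the new code
print(namespace["area"](2.0))                     # immediately callable: 12.56636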
Phases of a Compiler [Diagram: Source Program → (1) Lexical Analyzer → (2) Syntax Analyzer → (3) Semantic Analyzer → (4) Intermediate Code Generator → (5) Code Optimizer → (6) Code Generator → Target Program, with the Symbol-table Manager and the Error Handler interacting with every phase]
Phases of the Compilation Process • Lexical analysis (scanning): the source text is broken into tokens. • Syntactic analysis (parsing): tokens are combined to form syntactic structures, typically represented by a parse tree. The parser may be replaced by a syntax-directed editor, which directly generates a parse tree as a product of editing. • Semantic analysis: intermediate code is generated for each syntactic structure. Type checking is performed in this phase. Complicated features such as generic declarations and operator overloading (as in Ada and C++) are also processed. • Machine-independent optimization: intermediate code is optimized to improve efficiency. • Code generation: intermediate code is translated to relocatable object code for the target machine. • Machine-dependent optimization: the machine code is optimized.
Lexical Analysis • Function: the easiest analysis; scan the program to be compiled, recognizing and identifying the tokens that make up the source statements. Tokens are the basic building blocks. • Example: in Position := initial + rate * 60 ; every element is a token; blanks, line breaks, etc. are scanned out.
Scanner Example • Input text: // this statement does very little if (x >= y) y = 42; • Token stream: IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON • Note: tokens are atomic items, not character strings.
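A minimal sketch of a scanner that reproduces the token stream above; the token names follow the slide, while the regular expressions and the scanning approach are assumptions:

import re

TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),     # stripped out, never emitted
    ("WS",      r"\s+"),          # white space, likewise
    ("IF",      r"\bif\b"),
    ("LPAREN",  r"\("), ("RPAREN", r"\)"),
    ("GEQ",     r">="), ("BECOMES", r"="), ("SCOLON", r";"),
    ("INT",     r"\d+"), ("ID",    r"[A-Za-z_]\w*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind in ("COMMENT", "WS"):
            continue              # comments and white space never become tokens
        yield f"{kind}({m.group()})" if kind in ("ID", "INT") else kind

print(list(scan("// this statement does very little\nif (x >= y) y = 42;")))
# ['IF', 'LPAREN', 'ID(x)', 'GEQ', 'ID(y)', 'RPAREN',
#  'ID(y)', 'BECOMES', 'INT(42)', 'SCOLON']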
Parser Example • Token stream input: IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON • Abstract syntax tree: an ifStmt node whose condition is >= applied to ID(x) and ID(y), and whose body is an assign of INT(42) to ID(y).
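A minimal sketch (the node classes are hypothetical) of how the parser might represent that tree in memory:

from dataclasses import dataclass

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    target: str
    value: object

@dataclass
class IfStmt:
    cond: BinOp
    body: Assign

# ifStmt( >=(ID(x), ID(y)), assign(ID(y), INT(42)) )
tree = IfStmt(cond=BinOp(">=", "ID(x)", "ID(y)"),
              body=Assign("ID(y)", "INT(42)"))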
Examples of Token • Token: a sequence of characters to be treated as a single unit. • Tokens can be keywords, operators, identifiers, integers, floating-point numbers, character strings, etc. • Each token is usually represented by some fixed-length code, such as an integer, rather than as a variable-length character string: a token type plus a token specifier (value). • Examples of tokens: – Reserved words (e.g. begin, end, struct, if, etc.) – Keywords (integer, true, etc.) – Operators (+, &&, ++, etc.) – Identifiers (variable names, procedure names, parameter names) – Literal constants (numeric, string, character constants, etc.) – Punctuation marks (:, ;, etc.)
Syntactic Analysis • The source statements written by programmers are recognized as the language constructs described by the grammar. • Builds the parse tree for the statements being translated. • Bottom-up and top-down techniques: • Bottom-up: build the leaves of the tree first, matching the statements, and then combine them into higher-level nodes until the root is reached. • Top-down: begin from the root, i.e., the rule of the grammar specifying the goal of the analysis, and construct the tree so that the leaves match the statements being analyzed.
Code generation • Generate object code in the form of machine code directly or assembly language. • A basic technique: • Associate each rule (or an alternative rule) of the grammar with a routine, which translates the construct into object code according to its meaning/semantics. • Called semantic routine or code-generation routine. • Possibly generate an intermediate form so that optimization can be done to get more efficient code. • Data structures needed: • A list, or a queue, first-in-first-out, also a LISTCOUNT variable • A stack, first-in-last-out. • S(token): specifier of a token, i.e., a pointer to the symbol table or the integer value. • LOCCTR: location counter, indicating the next available address.
Example • Example program:
read A
read B
sum := A + B
write sum
write sum / 2
Lexical Analysis • Tokens:
id = letter ( letter | digit )*   [except "read" and "write"]
literal = digit digit*
":=", "+", "-", "*", "/", "(", ")"
$$$   [end of file]
Syntax Analysis • Grammar in EBNF:
<pgm> -> <stmt list> $$$
<stmt list> -> <stmt list> <stmt> | ε
<stmt> -> id := <expr> | read id | write <expr>
<expr> -> <term> | <expr> <add op> <term>
<term> -> <factor> | <term> <mult op> <factor>
<factor> -> ( <expr> ) | id | literal
<add op> -> + | -
<mult op> -> * | /
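A minimal sketch of a top-down (recursive-descent) parser for the expression part of this grammar. Note that the left-recursive <expr> and <term> rules are rewritten as loops, since a top-down parser cannot expand left recursion directly; the tuple-based tree format is an assumption:

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$$$"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):          # <expr> -> <term> { <add op> <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.advance(), node, self.term())
        return node

    def term(self):          # <term> -> <factor> { <mult op> <factor> }
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.advance(), node, self.factor())
        return node

    def factor(self):        # <factor> -> ( <expr> ) | id | literal
        if self.peek() == "(":
            self.advance()
            node = self.expr()
            assert self.advance() == ")"
            return node
        return self.advance()

print(Parser(["A", "+", "B", "*", "2"]).expr())
# ('+', 'A', ('*', 'B', '2'))  -- * binds tighter than +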
Code Generation • Intermediate code:
read
pop A
read
pop B
push A
push B
add
pop sum
push sum
write
push sum
push 2
div
write
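To make the semantics of this intermediate code concrete, here is a minimal sketch of a stack machine that executes it (taking reads from a supplied list and printing on write are both assumptions of the sketch):

def run(code, inputs):
    stack, memory, inputs = [], {}, list(inputs)
    for line in code:
        op, *args = line.split()
        if op == "read":
            stack.append(inputs.pop(0))       # read pushes the next input value
        elif op == "pop":
            memory[args[0]] = stack.pop()     # pop stores the top into a variable
        elif op == "push":
            a = args[0]
            stack.append(int(a) if a.isdigit() else memory[a])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "div":
            b, a = stack.pop(), stack.pop()
            stack.append(a // b)
        elif op == "write":
            print(stack.pop())                # write pops and prints the top

code = ["read", "pop A", "read", "pop B", "push A", "push B", "add",
        "pop sum", "push sum", "write", "push sum", "push 2", "div", "write"]
run(code, [3, 5])   # prints 8, then 4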
Code Generation • Target code:
        .data
A:      .long 0
B:      .long 0
sum:    .long 0
        .text
main:   jsr read
        movl d0,d1
        movl d1,A
        jsr read
        movl d0,d1
        movl d1,B
        movl A,d1
Code Generation (continued)
        movl B,d2
        addl d1,d2        | d2 := A + B
        movl d2,sum
        movl sum,d1
        movl d1,d0
        jsr write
        movl sum,d1
        movl #2,d2
        divsl d2,d1       | d1 := sum / 2
        movl d1,d0
        jsr write
Any questions? • What are the phases of compilation? Explain them. • What is an incremental compiler?