
薛智文 cwhsueh@csie.ntu.tw csie.ntu.tw/~cwhsueh/ 98 Spring



Presentation Transcript


  1. Compiler TH 234, DTH 103 薛智文 cwhsueh@csie.ntu.edu.tw http://www.csie.ntu.edu.tw/~cwhsueh/ 98 Spring


  3. Why Study Compilers?
     • Excellent software-engineering example: theory meets practice.
     • Essential software tool.
     • Influences hardware design, e.g., RISC, VLIW.
     • Tools (mostly “optimization”) for enhancing software reliability and security.

  4. Compilers & Architecture
     • Modern architectures have very complex structures, especially opportunities for parallel execution.
     • Sequential programs can make effective use of these features only via an optimizing compiler.
     • Hardware question: if we implemented this, could a compiler use it?

  5. Software Reliability
     • Optimization technology (data-flow analysis) is also used to detect:
        • Lock/unlock errors.
        • Buffers that are not range-checked.
        • Memory leaks.
        • SQL injection bugs.

  6. What Does This Course Offer?
     • Compiler methodology for both compiler implementation and related applications.
        • Theoretical framework.
        • Key algorithms.
        • Hands-on experience.
     • Non-goal: building a complete optimizing compiler.

  7. Course Outline
     • Part 1: Introduction.
     • Part 2: Scanner.
     • Part 3: Parser.
     • Part 4: Syntax-Directed Translation.
     • Part 5: Symbol Table.
     • Part 6: Intermediate Code Generation.
     • Part 7: Run Time Storage Organization.
     • Part 8: Optimization.
     • Part 9: How to Write a Compiler.
     • Part 10: A Simple Code Generation (PSEUDO) Example.

  8. Introduction
     • Figure: source program → Compiler → target program; then input → target program → output.
     • A compiler is one kind of language processor.

  9. What is a Compiler?
     • Figure: source program → Compiler → target program; then input → target program → output.
     • Definitions:
        • A software system that translates a description of computations into a program executable by a computer.
        • Source and target must be equivalent!
     • Compiler writing spans:
        • programming languages;
        • machine architecture;
        • language theory;
        • algorithms and data structures;
        • software engineering.
     • History:
        • 1950s: the first FORTRAN compiler took 18 man-years;
        • now: using software tools, a compiler can be written in a few months as a student project.

  10. An Interpreter
     • Figure: source program and input → Interpreter → output.

  11. A Hybrid Compiler
     • Figure: source program → Translator → intermediate program; then intermediate program and input → Virtual Machine → output.

  12. A Language-Processing System
     • Figure: source program → Preprocessor → modified source program → Compiler → target assembly program → Assembler → relocatable machine code → Linker/Loader → target machine code.
     • The Linker/Loader also takes library files and relocatable object files as input.

  13. Applications
     • Computer language compilers.
     • Translators: from one format to another.
        • query interpreter
        • text formatter
        • silicon compiler
        • infix notation → postfix notation, e.g., 3 + 5 - 6 * 6 → 3 5 + 6 6 * -
        • pretty printers
        • · · ·
     • Software productivity tools.
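The infix-to-postfix translation on the slide above can be sketched with an operator stack (the shunting-yard technique). This is a minimal illustration, not the course's own code; the function name `infix_to_postfix`, the precedence table, and the assumption that tokens arrive already separated by blanks are all choices made for this example.

```python
# Precedence of the binary operators handled by this sketch (all left-associative).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    """Convert a list of infix tokens into a list of postfix tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Emit stacked operators of higher or equal precedence first.
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operands go straight to the output
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output

print(" ".join(infix_to_postfix("3 + 5 - 6 * 6".split())))
```

For the slide's expression this prints `3 5 + 6 6 * -`: the `*` stays on the stack until both of its operands have been emitted, which is how precedence survives the loss of parentheses.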

  14. Relations with Computational Theory
     • Computational theory:
        • a set of grammar rules ≡ the definition of a particular machine;
        • this is also equivalent to the language recognized by this machine;
        • a type of machines: a family of machines with a given set of operations, or capabilities;
        • power of a type of machines ≡ the set of languages that can be recognized by this type of machines.

  15. A Language-Processing System
     • Figure: source program → Preprocessor → modified source program → Compiler → target assembly program → Assembler → relocatable machine code → Linker/Loader → target machine code.
     • The Linker/Loader also takes library files and relocatable object files as input.

  16. Phases of a Compiler
     • Pipeline (each phase consumes the output of the previous one):
        character stream
        → Lexical Analyzer (scanner) → token stream
        → Syntax Analyzer (parser) → abstract-syntax tree
        → Semantic Analyzer → annotated abstract-syntax tree
        → Intermediate Code Generator → intermediate representation
        → Machine-Independent Code Optimizer → optimized intermediate representation
        → Code Generator → relocatable machine code
        → Machine-Dependent Code Optimizer → target-machine code
     • The phases are grouped into a front-end and a back-end.
     • Error Handling and the Symbol Table are shared by the phases.

  17. Lexical Analyzer (Scanner)
     • Actions:
        • Reads characters from the source program;
        • Groups characters into lexemes, i.e., sequences of characters that “go together”, following a given pattern;
        • Each lexeme corresponds to a token;
        • The scanner returns the next token, plus maybe some additional information, to the parser;
        • The scanner may also discover lexical errors, i.e., erroneous characters.
     • The definitions of what a lexeme, a token, or a bad character is depend on the definition of the source language.

  18. Scanner Example for C
     • C statement: position = initial + rate * 60;
        (Lexeme)  position  =       initial  +     rate    *      60    ;
                  <id,1>    <=>     <id,2>   <+>   <id,3>  <*>    <60>  <;>
        (Token)   ID        ASSIGN  ID       PLUS  ID      TIMES  INT   SEMI-COL
     • The number in each <id,n> refers to an entry in the Symbol Table.
     • An arbitrary number of blanks (white spaces) may appear between lexemes.
     • Erroneous sequences of characters, that are not parts of comments, for the C language:
        • control characters
        • @
        • 2abc
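A scanner for the statement above can be sketched with regular expressions. This is a minimal illustration that assumes a tiny subset of C; the function name `scan`, the pattern list, and the choice to report errors with `ValueError` are assumptions made for this example, while the token names follow the slide.

```python
import re

# (token name, pattern) pairs, tried in order at each position.
TOKEN_SPEC = [(name, re.compile(pattern)) for name, pattern in [
    ("ERROR",    r"\d+[A-Za-z_]\w*"),  # digits running into letters, e.g. 2abc
    ("INT",      r"\d+"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("ASSIGN",   r"="),
    ("PLUS",     r"\+"),
    ("TIMES",    r"\*"),
    ("SEMI-COL", r";"),
    ("SKIP",     r"\s+"),              # blanks between lexemes
    ("ERROR",    r"."),                # any other character, e.g. @
]]

def scan(source):
    """Return (tokens, symbol_table) for a tiny subset of C."""
    tokens, symbol_table = [], []
    pos = 0
    while pos < len(source):
        # The first pattern that matches at the current position wins.
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                break
        lexeme = match.group()
        pos = match.end()
        if name == "SKIP":
            continue
        if name == "ERROR":
            raise ValueError(f"lexical error at offset {match.start()}: {lexeme!r}")
        if name == "ID":
            # Identifiers are entered once; the token carries the 1-based entry number.
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("ID", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((name, lexeme))
    return tokens, symbol_table

tokens, table = scan("position = initial + rate* 60;")
print(tokens)
print(table)
```

The missing blank in `rate* 60` makes no difference to the token stream, and `scan("2abc")` or `scan("a @ b")` raises an error, mirroring the erroneous sequences listed on the slide.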

  19. Syntax Analyzer (Parser)
     • Actions:
        • Group tokens into grammatical phrases, to discover the underlying structure of the source program.
        • Find syntax errors, e.g., the following C source line:
             (Lexeme)  index  =       12   *      ;
             (Token)   ID     ASSIGN  INT  TIMES  SEMI-COL
          Every token is legal, but the sequence is erroneous!
        • May find some static semantic errors, e.g., use of undeclared variables or multiply declared variables.
        • May generate code, or build some intermediate representation of the source program, such as an abstract-syntax tree.

  20. Parser Example for C
     • Source code: position = initial + rate * 60
        <id,1> <=> <id,2> <+> <id,3> <*> <60>
     • Abstract-syntax tree, written root first: = ( <id,1>, + ( <id,2>, * ( <id,3>, 60 ) ) )
        • interior nodes of the tree are OPERATORS;
        • a node’s children are its OPERANDS;
        • each subtree forms a logical unit;
        • the subtree with * at its root shows that * has higher precedence than +: the operation “rate * 60” must be performed as a unit, not “initial + rate”.
     • Where is “;”?
     • The <id,n> leaves refer to entries in the Symbol Table.
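One way to obtain such a tree is a recursive-descent parser with one function per precedence level. The sketch below is only an illustration under stated assumptions: the grammar (`ID = expr ;` with `+` and `*` only), the function name `parse_assignment`, and the nested-tuple tree representation are all invented for this example.

```python
def parse_assignment(tokens):
    """Parse `ID = expr ;` and return the AST as nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        # factor -> INT | ID
        tok = advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok is not None and tok.isidentifier():
            return tok
        raise SyntaxError(f"expected an operand, found {tok!r}")

    def term():
        # term -> factor ('*' factor)*   (binds tighter than '+')
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target is None or not target.isidentifier():
        raise SyntaxError(f"expected an identifier, found {target!r}")
    if advance() != "=":
        raise SyntaxError("expected '=' after the target")
    tree = ("=", target, expr())
    if advance() != ";":
        raise SyntaxError("expected ';' at the end of the statement")
    return tree

print(parse_assignment("position = initial + rate * 60 ;".split()))
```

In this sketch the `;` is checked and then dropped, so it never appears in the tree. Feeding it the tokens of `index = 12 * ;` raises a `SyntaxError` because `factor` finds `;` where an operand is required.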

  21. Semantic Analyzer
     • Actions:
        • Check for more static semantic errors, e.g., type errors.
        • May annotate and/or change the abstract syntax tree.
     • Tree before: = ( <id,1>, + ( <id,2>, * ( <id,3>, 60 ) ) )
     • Tree after:  = ( <id,1>, + ( <id,2>, * ( <id,3>, int_to_float ( 60 ) ) ) )
     • The <id,n> leaves refer to entries in the Symbol Table.
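The tree rewrite above can be sketched as a bottom-up type check that inserts a conversion node wherever an int meets a float. This is an illustrative sketch only: the names `check` and `SemanticError`, the nested-tuple tree, and the assumption that all three variables are declared as `float` are not taken from the slides.

```python
class SemanticError(Exception):
    """Raised for static semantic errors such as undeclared variables."""

def check(node, types):
    """Return (annotated_node, type) for an AST of nested tuples."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, str):
        if node not in types:
            raise SemanticError(f"undeclared variable {node!r}")
        return node, types[node]
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if op == "=":
        # Assignment: widen the right-hand side to the target's type if needed.
        if left_type == "float" and right_type == "int":
            right = ("int_to_float", right)
        elif left_type != right_type:
            raise SemanticError(f"cannot assign {right_type} to {left_type}")
        return (op, left, right), left_type
    if left_type == right_type:
        return (op, left, right), left_type
    # Mixed int/float arithmetic: convert the int operand.
    if left_type == "int":
        left = ("int_to_float", left)
    else:
        right = ("int_to_float", right)
    return (op, left, right), "float"

# Assumed declarations; the slides do not give the variables' types explicitly.
types = {"position": "float", "initial": "float", "rate": "float"}
tree = ("=", "position", ("+", "initial", ("*", "rate", 60)))
print(check(tree, types)[0])
```

The same walk reports the use of an undeclared variable, one of the static semantic errors mentioned earlier, because every identifier leaf is looked up in `types`.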

  22. Intermediate Code Generator
     • Actions: translate from abstract-syntax trees to intermediate codes.
     • Input tree: = ( <id,1>, + ( <id,2>, * ( <id,3>, int_to_float ( 60 ) ) ) )
     • One choice for intermediate code is 3-address code:
        • Each statement contains
           • at most 3 operands;
           • in addition to “=”, i.e., assignment, at most one operator.
        • An “easy” and “universal” format that can be translated into most assembly languages.
     • Result:
        t1 = int_to_float(60)
        t2 = id3 * t1
        t3 = id2 + t2
        id1 = t3
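The translation from the annotated tree to 3-address code can be sketched as a post-order walk that allocates a fresh temporary for every interior node. The function name `gen_three_address` and the nested-tuple tree are assumptions made for this example.

```python
def gen_three_address(tree):
    """Translate an assignment AST (nested tuples) into a list of 3-address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        # Leaves (identifiers, constants) are used directly as operands.
        if not isinstance(node, tuple):
            return str(node)
        if node[0] == "int_to_float":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = int_to_float({arg})")
            return temp
        op, left, right = node
        left_name = walk(left)    # operands are evaluated before the operator
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    result = walk(value)
    code.append(f"{target} = {result}")
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("int_to_float", 60))))
print("\n".join(gen_three_address(tree)))
```

Each emitted statement has at most one operator besides the assignment, which is the property that makes the format easy to map onto assembly instructions.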

  23. Optimizer
     • Improve the efficiency of intermediate code.
     • Goal may be to make code run faster, and/or to use the fewest registers · · ·
     • Current trends:
        • to obtain smaller, but maybe slower, equivalent code for embedded systems;
        • to reduce power consumption.
     • Before:
        t1 = int_to_float(60)
        t2 = id3 * t1
        t3 = id2 + t2
        id1 = t3
     • After:
        t1 = id3 * 60.0
        id1 = id2 + t1
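The before/after pair above can be reproduced with three small passes: fold `int_to_float` on a constant, merge a trailing copy into the statement that produced its source, and renumber the temporaries. This is a sketch under stated assumptions, not the optimizer from the course: the `(dest, op, arg1, arg2)` tuple format and the name `optimize` are invented here, and the copy-merging pass is only valid when the merged temporary is not used again later.

```python
import re

def optimize(code):
    """Optimize a list of (dest, op, arg1, arg2) three-address instructions."""
    # Pass 1: evaluate int_to_float on integer constants at compile time.
    constants, folded = {}, []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "int_to_float" and isinstance(a, int):
            constants[dest] = float(a)
            continue  # the conversion instruction disappears
        folded.append((dest, op, a, b))

    # Pass 2: turn "t = x op y; v = t" into "v = x op y".
    # Assumption: t is a single-use temporary.
    merged = []
    for dest, op, a, b in folded:
        if op == "copy" and merged and merged[-1][0] == a:
            _, prev_op, prev_a, prev_b = merged.pop()
            merged.append((dest, prev_op, prev_a, prev_b))
        else:
            merged.append((dest, op, a, b))

    # Pass 3: renumber the surviving temporaries t1, t2, ...
    names, result = {}, []
    for dest, op, a, b in merged:
        a, b = names.get(a, a), names.get(b, b)
        if re.fullmatch(r"t\d+", dest):
            names[dest] = f"t{len(names) + 1}"
            dest = names[dest]
        result.append((dest, op, a, b))
    return result

before = [
    ("t1", "int_to_float", 60, None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instruction in optimize(before):
    print(instruction)
```

The two surviving tuples correspond to `t1 = id3 * 60.0` and `id1 = id2 + t1`.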

  24. Code Generation
     • A compiler may generate
        • pure machine codes (machine dependent assembly language) directly, which is rare now;
        • virtual machine code.
     • Example:
        • PASCAL → compiler → P-code → interpreter → execution
        • Execution is roughly 4 times slower than running directly generated machine code.
     • Advantages:
        • simplify the job of a compiler;
        • decrease the size of the generated code: 1/3 for P-code;
        • can be run easily on a variety of platforms:
           • P-machine is an ideal general machine whose interpreter can be written easily;
           • divide and conquer;
           • recent example: JAVA and Byte-code.
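Why an interpreter for virtual machine code "can be written easily" is visible in a toy stack machine. The instruction set below (`load`, `push`, `add`, `mul`, `store`) is hypothetical and chosen for this illustration; it is neither actual P-code nor Java bytecode, and the function name `run` and the sample values are likewise assumptions.

```python
def run(program, memory):
    """Interpret a list of stack-machine instructions, updating `memory` in place."""
    stack = []
    for instruction in program:
        op = instruction[0]
        if op == "push":        # push a constant
            stack.append(instruction[1])
        elif op == "load":      # push the value of a variable
            stack.append(memory[instruction[1]])
        elif op == "store":     # pop into a variable
            memory[instruction[1]] = stack.pop()
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return memory

# position = initial + rate * 60, compiled by hand for the toy machine.
program = [
    ("load", "initial"),
    ("load", "rate"),
    ("push", 60.0),
    ("mul",),
    ("add",),
    ("store", "position"),
]
memory = run(program, {"initial": 10.0, "rate": 2.0})
print(memory["position"])
```

The whole machine is one loop over a handful of cases, so porting it to a new platform means rewriting only this loop, not the compiler that produced `program`.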

  25. Code Generation Example
     • Intermediate code:
        t1 = id3 * 60.0
        id1 = id2 + t1
     • Generated code:
        LDF R2, id3
        MULF R2, R2, #60.0
        LDF R1, id2
        ADDF R1, R1, R2
        STF id1, R1
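The mapping from the two intermediate statements to the five instructions above can be sketched as follows. The sketch assumes that register allocation has already been done and is passed in as `reg_for`; that parameter, the `(dest, op, arg1, arg2)` tuple format, and the function name `generate` are assumptions for this example, and only the mnemonics come from the slide. It handles just the operand shapes that occur in this example.

```python
import re

OPCODES = {"+": "ADDF", "*": "MULF"}

def generate(code, reg_for):
    """Emit assembly for (dest, op, arg1, arg2) instructions with float operands."""
    asm = []
    temp_reg = {}  # temporaries that currently live in a register
    for dest, op, a, b in code:
        reg = reg_for[dest]
        # First operand: reuse a temporary's register, otherwise load from memory.
        if a in temp_reg:
            first = temp_reg[a]
        else:
            asm.append(f"LDF {reg}, {a}")
            first = reg
        # Second operand: a temporary's register or an immediate constant.
        if b in temp_reg:
            second = temp_reg[b]
        elif isinstance(b, float):
            second = f"#{b}"
        else:
            raise ValueError(f"unsupported second operand {b!r} in this sketch")
        asm.append(f"{OPCODES[op]} {reg}, {first}, {second}")
        if re.fullmatch(r"t\d+", dest):
            temp_reg[dest] = reg             # temporaries stay in registers
        else:
            asm.append(f"STF {dest}, {reg}")  # variables are written back to memory
    return asm

code = [("t1", "*", "id3", 60.0), ("id1", "+", "id2", "t1")]
print("\n".join(generate(code, {"t1": "R2", "id1": "R1"})))
```

With the register assignment `t1 → R2`, `id1 → R1` the output matches the slide line for line; a different assignment would give equivalent code with other register numbers.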

  26. Practical Considerations (1/2)
     • Preprocessing phase:
        • macro substitution:
           • #define MAXC 10
        • rational preprocessing: add new features for old languages.
           • BASIC
           • C → C++
        • compiler directives:
           • #include <stdio.h>
        • non-standard language extensions.
           • adding parallel primitives
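Macro substitution of the `#define MAXC 10` kind can be sketched as a line-by-line text rewrite. This is a deliberately simplified illustration, not how a real C preprocessor works: it handles only object-like macros, does not rescan replacements, and ignores string literals and comments. The function name `preprocess` is an assumption for this example.

```python
import re

def preprocess(source):
    """Expand simple object-like #define macros in C-like source text."""
    macros, output = {}, []
    for line in source.splitlines():
        definition = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if definition:
            # Record the macro and drop the directive from the output.
            macros[definition.group(1)] = definition.group(2).strip()
            continue
        # Replace whole identifiers only, so MAXCOUNT is not touched by MAXC.
        line = re.sub(r"\b[A-Za-z_]\w*\b",
                      lambda m: macros.get(m.group(), m.group()),
                      line)
        output.append(line)
    return "\n".join(output)

print(preprocess("#define MAXC 10\nint buf[MAXC];\nint MAXCOUNT;"))
```

The compiler proper then sees `int buf[10];` and never learns that `MAXC` existed, which is why this phase sits in front of the scanner.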

  27. Practical Considerations (2/2)
     • Passes of compiling
        • A pass reads an input file and writes an output file.
        • The first pass reads the text file once.
        • The compiler may need to read the text one more time for any forward-referenced objects, i.e., anything that is used before its declaration.
        • Example: C language
             goto error_handling;
             · · ·
             error_handling:
             · · ·

  28. Reduce Number of Passes
     • Each pass takes I/O time.
     • Back-patching: leave a blank slot for missing information, and fill in the empty slot when the information becomes available.
     • Example: C language
        • when a label is used
           • if it is not defined before, save a trace into the to-be-processed table
              • label name corresponds to LABEL TABLE[i]
              • code generated: GOTO LABEL TABLE[i]
        • when a label is defined
           • check known labels for redefined labels
           • if it is not used before, save a trace into the to-be-processed table
           • if it is used before, then find its trace and fill the current address into the trace
     • Time and space trade-off!
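Back-patching can be sketched in a single pass over a statement list. The sketch simplifies the scheme above: it patches the blank slot inside the `GOTO` instruction directly instead of going through a label table entry. The statement format, the name `assemble`, and the use of list indices as addresses are assumptions made for this example.

```python
def assemble(statements):
    """Resolve goto targets in one pass, back-patching forward references."""
    code = []      # generated instructions; a forward GOTO starts with target None
    defined = {}   # label -> address of the instruction that follows it
    pending = {}   # label -> indices of GOTOs waiting for it (to-be-processed table)
    for kind, arg in statements:
        if kind == "label":
            if arg in defined:
                raise ValueError(f"label {arg!r} redefined")
            defined[arg] = len(code)
            # Fill the current address into every slot that was left blank.
            for index in pending.pop(arg, []):
                code[index] = ("GOTO", defined[arg])
        elif kind == "goto":
            if arg in defined:
                code.append(("GOTO", defined[arg]))
            else:
                pending.setdefault(arg, []).append(len(code))
                code.append(("GOTO", None))  # blank slot, patched later
        else:
            code.append((kind, arg))
    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return code

statements = [
    ("goto", "error_handling"),   # forward reference
    ("op", "a"),
    ("op", "b"),
    ("label", "error_handling"),
    ("op", "c"),
    ("goto", "error_handling"),   # backward reference
]
for address, instruction in enumerate(assemble(statements)):
    print(address, instruction)
```

The `pending` table is the space paid to avoid a second pass over the source: it grows with the number of unresolved forward references and is emptied as labels are defined.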
