Concordia University Department of Computer Science

Presentation Transcript


  1. Concordia University, Department of Computer Science • COMP 442/6421 Compiler Design • Joey Paquet, 2000, 2002, 2007, 2008

  2. Course Description • Instructor • Name: Dr. Joey Paquet • Office: EV-3-221 • Phone: 7831 • e-mail: paquet@cse.concordia.ca • Web: www.cse.concordia.ca/~paquet

  3. Course Description • Topic • Compiler organization and implementation. • Lexical, syntax and semantic analysis. Code generation. • Outline • Design and implementation of a simple compiler. • Lectures related to the project.

  4. Course Description • Grading • Assignments (4): 40% • Final Examination: 30% • Final Project: 30% • Late assignment penalty: 50% per working day • Assignments and project are graded on: Correctness, Completeness, Design, Style, Documentation.

  5. Project Description • Design and coding of a simple compiler • Individual work • Divided into four assignments • Final project is graded at the end of the semester, during a final demonstration • Testing is VERY important and up to you

  6. Project Description • A complete compiler is a fairly complex and large program: from 10,000 to 1,000,000 lines of code. • Programming one will push you beyond your limits. • It uses most of the elements of the theoretical foundations of Computer Science. • It will probably be the most complex program you have ever written.

  7. Introduction to Compilation • A compiler is a translation system. • It translates programs written in a high-level language into a lower-level language, generally machine (binary) language. • Diagram: source code -> compiler -> target code; more generally, source language -> translator -> target language.

  8. Introduction to Compilation • The only language that the processor understands is binary. • Example instruction word: 000100000100111111, made of four fields: a: register addition (from a symbol table); b: first operand (R1); c: second operand (R3); d: third operand (R15).

  9. Introduction to Compilation • Assembly language is the first higher-level programming language. • 000100000100111111 <=> Add R1,R3,R15 • There is a one-to-one correspondence between lines of assembly code and machine code instructions. • An op-code table is sufficient to translate assembly language into machine code.
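
The table-driven translation described on this slide can be sketched in a few lines of Python. This is only an illustration, not the encoding shown on the slides: the op-code values, the fixed 4-bit fields, and the names `OPCODES` and `assemble` are all assumptions made for the example.

```python
# Hypothetical op-code table: mnemonic -> 4-bit op-code (values are made up).
OPCODES = {"Add": 0b0001, "Sub": 0b0010, "Mul": 0b0011}


def assemble(line: str) -> str:
    """Translate one line such as 'Add R1,R3,R15' into a bit string.

    Assumed instruction format: a 4-bit op-code followed by three
    4-bit register numbers (R0..R15).
    """
    mnemonic, operands = line.split(maxsplit=1)
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    registers = [int(reg.strip().lstrip("Rr")) for reg in operands.split(",")]
    if len(registers) != 3 or any(not 0 <= r <= 15 for r in registers):
        raise ValueError(f"expected three registers R0..R15: {line}")
    fields = [OPCODES[mnemonic], *registers]
    # One table lookup per line: this is the one-to-one correspondence.
    return "".join(format(field, "04b") for field in fields)


print(assemble("Add R1,R3,R15"))
```

Because each assembly line maps to exactly one instruction word, no parsing beyond splitting the line into fields is needed; that is what separates an assembler from a compiler.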

  10. Introduction to Compilation • Compared to binary, assembly language greatly improved the productivity of programmers. Why? • Though a great improvement, it is not ideal: • Not easy to write • Even less easy to read and understand • Extremely architecture-dependent

  11. Introduction to Compilation • A compiler translates a given high-level language into assembler or machine code. • Example: the statement X=Y+Z; becomes the assembly instructions L 3,Y (load working register with Y), A 3,Z (add Z to working register), ST 3,X (store the result in X), which correspond to the machine code words 00001001001011, 00010010010101, 00100100101001.

  12. FORTRAN: The first compiler • The problems with assembly led to the development of the first compiler: FORTRAN. • Stands for FORmula TRANslation. • Developed between 1954 and 1957 at IBM by a team led by John Backus. • This was an incredible feat, as the theory of compilation was not available at the time.

  13. Paving the road • In parallel, Noam Chomsky was investigating the structure of natural languages. • His studies led the way to the classification of languages according to their complexity (aka the Chomsky hierarchy). • This was used by various theoreticians in the 1960s and early 1970s to design a fairly complete set of solutions to the parsing problem. • These solutions have been used ever since. • As the parsing solutions became well understood, efforts were devoted to the development of parser generators. • The most commonly known is YACC (Yet Another Compiler Compiler). • Developed by Steve Johnson in 1975 for the Unix system.

  14. Compilation vs. Interpretation • A compiler translates high-level instructions into machine code. An interpreter uses the computer to execute the program directly, statement by statement. • Advantage of interpretation: immediate response • Drawbacks of interpretation: inefficient with loops (the same statements are re-analyzed on every iteration), restricted to single-file programs.

  15. Compiler’s Environment • Building an executable from multiple files • Diagram: source code -> compiler -> object code -> linker -> executable code; the linker also takes compiled modules and run-time libraries as input.

  16. Phases of a Compiler • Front-end: source code -> lexical analysis -> token stream -> syntactic analysis -> syntax tree -> semantic analysis -> annotated tree -> high-level optimization -> intermediate code • Back-end: intermediate code -> target code generation -> target code -> low-level optimization -> optimized target code

  17. Lexical analysis • Transforms the initial stream of characters into a stream of tokens • keywords: while, to, do, int, main • identifiers: i, max, total, i1, i2 • literals: 123, 12.34, “Hello” • operators: +, *, and, >, < • punctuation: {, }, [, ], ;
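
As a rough illustration of turning a character stream into a token stream, here is a small regular-expression lexer in Python. It is a sketch, not the lexer expected for the course project: the token category names follow the slide, but the exact patterns, the inclusion of `=` as an operator, the use of straight double quotes for strings, and the names `TOKEN_SPEC` and `tokenize` are assumptions.

```python
import re

KEYWORDS = {"while", "to", "do", "int", "main"}
WORD_OPERATORS = {"and"}  # operators spelled like identifiers

# Order matters: earlier patterns win when several could match.
TOKEN_SPEC = [
    ("literal", r'\d+\.\d+|\d+|"[^"]*"'),
    ("word", r"[A-Za-z_]\w*"),
    ("operator", r"[+*<>=]"),
    ("punctuation", r"[{}\[\];]"),
    ("skip", r"\s+"),
    ("error", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, lexeme) pairs for the given source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue  # whitespace separates tokens but is not a token
        if kind == "error":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "word":
            # Words are classified after matching, by table lookup.
            if text in KEYWORDS:
                kind = "keyword"
            elif text in WORD_OPERATORS:
                kind = "operator"
            else:
                kind = "identifier"
        tokens.append((kind, text))
    return tokens


for token in tokenize("while i1 < 123 do total = total + 12.34;"):
    print(token)
```

Matching every word with one pattern and classifying it afterwards keeps the keyword list in a table rather than in the regular expressions, which makes it easy to extend.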

  18. Syntactic analysis • Attempts to build a valid parse tree from the grammatical description of the language. • Example: Distance = rate * time; • Parse tree (figure): S derives id = E ; the top E derives E * id, and the inner E derives id.
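
The parse tree for this example can be built by a very small hand-written parser. The sketch below assumes the grammar S -> id = E ; with E -> E * id | id, which is read off the figure rather than stated in the slides, and it takes an already tokenized input as a list of strings. The function name `parse_statement` and the tuple-based tree shape are illustrative choices.

```python
def parse_statement(tokens):
    """Parse S -> id = E ;  with  E -> E * id | id  and return a parse tree."""
    pos = 0

    def take(expected):
        """Consume one token; `expected` is 'id' or a literal symbol."""
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"expected {expected}, found end of input")
        token = tokens[pos]
        ok = token.isidentifier() if expected == "id" else token == expected
        if not ok:
            raise SyntaxError(f"expected {expected}, found {token!r}")
        pos += 1
        return token

    target = take("id")
    take("=")
    # The left-recursive rule E -> E * id is handled with a loop, which
    # still produces the left-nested tree that the grammar describes.
    tree = ("E", ("id", take("id")))
    while pos < len(tokens) and tokens[pos] == "*":
        take("*")
        tree = ("E", tree, "*", ("id", take("id")))
    take(";")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("S", ("id", target), "=", tree, ";")


print(parse_statement(["Distance", "=", "rate", "*", "time", ";"]))
```

A syntactically invalid input, such as a statement with a missing operand, makes `take` raise `SyntaxError`, which is the "attempts to" part of the slide: a tree is only returned when the input matches the grammar.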

  19. Semantic Analysis • The semantics of a program is its meaning. • It is possible to have a syntactically valid program that does not have any meaning. • Semantic analysis has two parts: • Semantic checking: Validating the semantics of a syntactically valid program and gathering information about the meaning of its constituents (attributes). • Semantic translation: Giving a meaning to a program using a pre-established language, typically a syntax tree decorated with attributes. This is often called an intermediate representation.

  20. Semantic Translation: example • Breaks the statements into small pieces corresponding roughly to machine instructions. • Example: x = a*y+z; is translated into t1 = a*y; t2 = t1+z; x = t2;
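
One way to obtain such a sequence is a post-order walk of the expression tree that emits one instruction per operator. The following Python sketch assumes the expression is given as nested tuples `(operator, left, right)` with variable names as strings; the function name `translate_assignment` and the `t1`, `t2`, ... naming of temporaries are assumptions chosen to match the slide.

```python
def translate_assignment(target, expr):
    """Return three-address-style statements for `target = expr`."""
    code = []
    counter = 0

    def emit(node):
        """Emit code for a subtree and return the name holding its value."""
        nonlocal counter
        if isinstance(node, str):
            return node  # a plain variable needs no code
        op, left, right = node
        left_name = emit(left)    # operands first (post-order),
        right_name = emit(right)  # then the operator itself
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name}{op}{right_name};")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result};")
    return code


# x = a*y+z; with the multiplication nested under the addition
for statement in translate_assignment("x", ("+", ("*", "a", "y"), "z")):
    print(statement)
```

Each emitted statement has at most one operator, which is what makes it correspond roughly to a single machine instruction.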

  21. High-Level Optimization • The generated intermediate representation is often inefficient because of bad structure or redundancy. • This kind of optimization is not bound to the target machine’s architecture. • Example: t1 = a*y; t2 = t1+z; x = t2; is optimized into t1 = a*y; x = t1+z;
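
The redundancy removed in this example is a temporary that is computed and then immediately copied. A minimal Python sketch of that single rewrite is shown below; it is not a general optimizer. Instructions are assumed to be tuples `(dest, op, left, right)`, a copy is written `(dest, "=", src, None)`, and temporaries are assumed to be named `t` followed by digits. The names `is_temp` and `fold_copies` are hypothetical.

```python
def is_temp(name):
    """Assumed naming convention: temporaries are t1, t2, ..."""
    return name[:1] == "t" and name[1:].isdigit()


def fold_copies(code):
    """Merge `t = a op b; x = t;` into `x = a op b;` when t is unused afterwards."""
    optimized = []
    i = 0
    while i < len(code):
        dest, op, left, right = code[i]
        if i + 1 < len(code):
            copy_dest, copy_op, copy_src, _ = code[i + 1]
            # Conservative check: any later mention of the temporary blocks the rewrite.
            used_later = any(dest in (l, r) for _, _, l, r in code[i + 2:])
            if (copy_op == "=" and copy_src == dest and op != "="
                    and is_temp(dest) and not used_later):
                optimized.append((copy_dest, op, left, right))
                i += 2  # both instructions are replaced by one
                continue
        optimized.append(code[i])
        i += 1
    return optimized


before = [("t1", "*", "a", "y"), ("t2", "+", "t1", "z"), ("x", "=", "t2", None)]
for instruction in fold_copies(before):
    print(instruction)
```

Nothing in the rewrite refers to registers or instruction sets, which is why this kind of optimization can be done before a target machine is chosen.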

  22. Target Code Generation • Translates the optimized intermediate representation into the target code (normally machine language or assembler). • Example: t1 = a*y; x = t1+z; is translated into LE 4,a (a in register 4), ME 4,y (multiply by y), AE 4,z (add z), STE 4,x (store register 4 in x).
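
A naive generator for this example can be written as a loop over the intermediate instructions. The Python sketch below reuses the mnemonics that appear on the slide (LE, ME, AE, STE) and nothing else; the tuple format `(dest, op, left, right)`, the single working register, and the names `MNEMONICS`, `is_temp` and `generate` are assumptions. It only handles the case where a temporary is consumed as the left operand of the very next instruction.

```python
# Operator -> mnemonic, limited to the two operations shown on the slide.
MNEMONICS = {"*": "ME", "+": "AE"}


def is_temp(name):
    """Assumed naming convention: temporaries are t1, t2, ..."""
    return name[:1] == "t" and name[1:].isdigit()


def generate(code, register=4):
    """Translate (dest, op, left, right) tuples into single-register target code."""
    target = []
    in_register = None  # name whose value currently sits in the register
    for dest, op, left, right in code:
        if left != in_register:
            target.append(f"LE {register},{left}")  # load the left operand
        target.append(f"{MNEMONICS[op]} {register},{right}")
        in_register = dest
        if not is_temp(dest):
            # Only program variables are written back to memory;
            # temporaries are assumed to live in the register.
            target.append(f"STE {register},{dest}")
    return target


for line in generate([("t1", "*", "a", "y"), ("x", "+", "t1", "z")]):
    print(line)
```

Keeping `t1` in the register avoids a store and a reload, which is why the four-instruction sequence on the slide has no reference to `t1` at all.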

  23. Passes, Front End and Back End • A pass consists of reading a high-level version of the program and writing a new lower-level version. • Several passes are often needed: • To resolve forward references • To limit the memory used by the different phases.

  24. Low-Level Optimization • The generated target code is analyzed for inefficiencies such as dead code or code redundancy. • Care is taken to exploit the CPU’s capabilities as much as possible. • This phase is heavily architecture-dependent. • A lot of research is still being done in this very complex area.

  25. Passes, Front End and Back End • The front-end is composed of: lexical, syntactic and semantic analysis, and high-level optimization. • In most compilers, most of the front-end is driven by the syntactic analyzer. • It calls the lexical analyzer for tokens and generates an abstract syntax tree when syntactic elements are recognized. • The generated tree (or other intermediate representation) is then analyzed and optimized in a separate process. • The front-end has little or no concern with the target machine.

  26. Passes, Front End and Back End • The back-end is composed of: code generation and low-level optimization. • Uses the intermediate representation generated by the front-end to generate target machine code. • Heavily dependent on the target machine. • Independent of the programming language compiled.

  27. System Support • Symbol table • Central repository of identifiers (variable or function names) used in the compiled program. • Contains information such as the data type, or the value in the case of constants. • Used to identify undeclared or multiply declared identifiers, as well as type mismatches. • Provides temporary variables for intermediate code generation.
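
The four roles listed for the symbol table map naturally onto a small dictionary-based class. The Python sketch below is illustrative only: the class and method names (`SymbolTable`, `declare`, `lookup`, `check_assignment`, `new_temp`), the `SemanticError` exception, and the flat, scope-free design are assumptions, not the structure required for the project.

```python
class SemanticError(Exception):
    """Raised for undeclared or multiply declared identifiers and type mismatches."""


class SymbolTable:
    def __init__(self):
        self._entries = {}  # identifier -> {"type": ..., "value": ...}
        self._temp_count = 0

    def declare(self, name, data_type, value=None):
        """Record an identifier; `value` is only meaningful for constants."""
        if name in self._entries:
            raise SemanticError(f"multiply declared identifier: {name}")
        self._entries[name] = {"type": data_type, "value": value}

    def lookup(self, name):
        if name not in self._entries:
            raise SemanticError(f"undeclared identifier: {name}")
        return self._entries[name]

    def check_assignment(self, target, source):
        """Report a type mismatch between the two sides of an assignment."""
        target_type = self.lookup(target)["type"]
        source_type = self.lookup(source)["type"]
        if target_type != source_type:
            raise SemanticError(f"type mismatch: {target_type} = {source_type}")

    def new_temp(self, data_type):
        """Create a fresh temporary for intermediate code generation."""
        while True:
            self._temp_count += 1
            name = f"t{self._temp_count}"
            if name not in self._entries:  # skip names the program already uses
                break
        self.declare(name, data_type)
        return name


table = SymbolTable()
table.declare("rate", "float")
table.declare("max", "int", value=100)
print(table.lookup("max"), table.new_temp("float"))
```

A real compiler would also need nested scopes, which this flat table deliberately leaves out.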

  28. System Support • Error handling procedures • Implement the compiler’s response to errors in the code it is compiling. • Provide useful insight to the user about where the error is and what it is. • Should find all errors in the whole program. • Can attempt to correct some errors and only give a warning.

  29. System Support • Run-time system • Some programming language concepts create the need for dynamic memory allocation. What are they? • The running program must then be able to manage its own memory use. • Some will require a stack, others a heap. These are managed by the run-time system.

  30. Writing of Early Compilers • The first C compiler • Step 1: minimal C compiler source -> assembler -> executable C compiler (minimal) • Step 2: full C compiler source -> C compiler (minimal) -> executable C compiler (full)

  31. Writing Cross-Compilers • A Unix-Macintosh C cross-compiler • Step 1: Mac C compiler source code in Unix C -> Unix C compiler -> Mac C compiler usable on Unix • Step 2: Mac C compiler source code in Unix C -> Mac C compiler usable on Unix -> Mac C compiler usable on Mac

  32. Writing Retargetable Compilers • Two methods: • Make a strict distinction between front-end and back-end, then use different back-ends. • Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code. That is what we do in the project.

  33. Summary • The first compiler was the assembler, a one-to-one direct translator. • Complex compilers were written incrementally, first using assemblers. • All compilation techniques have been well known since the 1960s and early 1970s.

  34. Summary • The compilation process is divided into phases. • The input of a phase is the output of the previous phase. • It can be seen as a pipeline, where the phases are filters that successively transform the input program into an executable.
