Concordia University
Department of Computer Science
COMP 442/6421 Compiler Design
Joey Paquet, 2000, 2002, 2007, 2008
Course Description
• Instructor
  • Name: Dr. Joey Paquet
  • Office: EV-3-221
  • Phone: 7831
  • e-mail: paquet@cse.concordia.ca
  • Web: www.cse.concordia.ca/~paquet
Course Description
• Topic
  • Compiler organization and implementation.
  • Lexical, syntax and semantic analysis. Code generation.
• Outline
  • Design and implementation of a simple compiler.
  • Lectures related to the project.
Course Description
• Grading
  • Assignments (4): 40%
  • Final Examination: 30%
  • Final Project: 30%
  • Late assignment penalty: 50% per working day
  • Assignments and project are graded on: Correctness, Completeness, Design, Style, Documentation.
Project Description
• Design and coding of a simple compiler
• Individual work
• Divided into four assignments
• Final project is graded at the end of the semester, during a final demonstration
• Testing is VERY important and up to you
Project Description
• A complete compiler is a fairly complex and large program: from 10,000 to 1,000,000 lines of code.
• Programming one will push you beyond your limits.
• It uses most of the elements of the theoretical foundations of Computer Science.
• It will probably be the most complex program you have ever written.
Introduction to Compilation
• A compiler is a translation system.
• It translates programs written in a high-level language into a lower-level language, generally machine (binary) language.
• Figure: source code → compiler → target code (source language → translator → target language)
Introduction to Compilation
• The only language that the processor understands is binary.
• Example instruction: 000100 0001 0011 1111
  • a (000100): op-code for register addition, looked up in a table
  • b (0001): first operand (R1)
  • c (0011): second operand (R3)
  • d (1111): third operand (R15)
Introduction to Compilation
• Assembly language is the first higher-level programming language.
• 000100000100111111 <=> Add R1,R3,R15
• There is a one-to-one correspondence between lines of assembly code and machine code instructions.
• An op-code table is sufficient to translate assembly language into machine code.
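The one-to-one translation can be illustrated with a minimal sketch. Only the Add op-code (000100) and the example instruction come from the slides; the 4-bit register fields and the `assemble` function name are assumptions made for illustration.

```python
# Hypothetical op-code table: only the Add encoding (000100) appears in the slides.
OPCODES = {"Add": "000100"}


def assemble(line: str) -> str:
    """Translate one assembly line, e.g. 'Add R1,R3,R15', into a bit string."""
    mnemonic, operands = line.split(maxsplit=1)
    bits = OPCODES[mnemonic]  # table lookup replaces the mnemonic
    for register in operands.split(","):
        number = int(register.strip().lstrip("R"))
        bits += format(number, "04b")  # assumed: 4 bits per register operand
    return bits


print(assemble("Add R1,R3,R15"))  # prints 000100000100111111
```

Each assembly line maps to exactly one machine word, which is why a table lookup plus operand encoding is enough here.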
Introduction to Compilation
• Compared to binary, assembly language greatly improved the productivity of programmers. Why?
• Though a great improvement, it is not ideal:
  • Not easy to write
  • Even less easy to read and understand
  • Extremely architecture-dependent
Introduction to Compilation
• A compiler translates a given high-level language into assembler or machine code.
• Example:
  High-level statement:
    X=Y+Z;
  Assembler:
    L  3,Y    Load working register with Y
    A  3,Z    Add Z to working register
    ST 3,X    Store the result in X
  Machine code:
    00001001001011
    00010010010101
    00100100101001
FORTRAN: The first compiler
• The problems with assembly led to the development of the first widely used high-level language compiler: FORTRAN.
• Stands for FORmula TRANslation.
• Developed between 1954 and 1957 at IBM by a team led by John Backus.
• This was an incredible feat, as the theory of compilation was not available at the time.
Paving the road
• In parallel, Noam Chomsky was investigating the structure of natural languages.
• His studies led the way to the classification of languages according to their complexity (aka the Chomsky hierarchy).
• This was used by various theoreticians in the 1960s and early 1970s to design a fairly complete set of solutions to the parsing problem.
• These solutions have been used ever since.
• As the parsing solutions became well understood, efforts were devoted to the development of parser generators.
• The most commonly known is YACC (Yet Another Compiler-Compiler).
• Developed by Steve Johnson in 1975 for the Unix system.
Compilation vs. Interpretation
• A compiler translates high-level instructions into machine code.
• An interpreter uses the computer to execute the program directly, statement by statement.
  • Advantage: immediate response
  • Drawbacks: inefficient with loops (a simple interpreter re-analyzes each statement on every iteration), typically restricted to single-file programs.
Compiler’s Environment
• Building an executable from multiple files
• Figure: source code → compiler → object code → linker → executable code; the linker also takes run-time libraries and compiled modules as input.
Phases of a Compiler
• Front-end:
  • source code → lexical analysis → token stream
  • token stream → syntactic analysis → syntax tree
  • syntax tree → semantic analysis → annotated tree
  • annotated tree → high-level optimization → intermediate code
• Back-end:
  • intermediate code → target code generation → target code
  • target code → low-level optimization → optimized target code
Lexical analysis
• Transforms the initial stream of characters into a stream of tokens
  • keywords: while, to, do, int, main
  • identifiers: i, max, total, i1, i2
  • literals: 123, 12.34, “Hello”
  • operators: +, *, and, >, <
  • punctuation: {, }, [, ], ;
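A minimal tokenizer sketch for the token classes listed above. The regular expressions, the `tokenize` name and the (kind, text) tuple format are assumptions for illustration, not the lexer required by the project.

```python
import re

# Keyword and operator sets are taken from the slide; everything else is assumed.
KEYWORDS = {"while", "to", "do", "int", "main"}
WORD_OPERATORS = {"and"}

TOKEN_SPEC = [
    ("literal", r'\d+\.\d+|\d+|"[^"]*"'),  # 12.34, 123, "Hello"
    ("word", r"[A-Za-z_]\w*"),             # keyword, identifier or word operator
    ("operator", r"[+*<>]"),
    ("punctuation", r"[{}\[\];]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a string of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "word":
            if text in KEYWORDS:
                kind = "keyword"
            elif text in WORD_OPERATORS:
                kind = "operator"
            else:
                kind = "identifier"
        tokens.append((kind, text))
    return tokens


print(tokenize("while i1 < max do { total + 12.34 ; }"))
```

Words are matched with a single pattern and classified afterwards, so a keyword table decides between keyword and identifier instead of one regular expression per keyword.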
Syntactic analysis
• Attempts to build a valid parse tree from the grammatical description of the language.
• Example: parse tree for Distance = rate * time;
  S
    id (Distance)
    =
    E
      E
        id (rate)
      *
      id (time)
    ;
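The parse tree for this example can be built with a small hand-written parser. The grammar S → id = E ; with E → E * id | id is inferred from the tree on the slide, and the nested-tuple tree format and function names are assumptions made for this sketch.

```python
def classify(token):
    """Map a raw token to its grammar symbol: identifiers become 'id'."""
    return "id" if token.isidentifier() else token


def parse_statement(tokens):
    """Parse S -> id = E ;  with  E -> E * id | id  and return a parse tree."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or classify(tokens[pos]) != kind:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        pos += 1
        return tokens[pos - 1]

    target = expect("id")
    expect("=")
    # E -> E * id is left-recursive, so it is parsed with a loop
    # that keeps growing the tree on its left side.
    tree = ("E", ("id", expect("id")))
    while pos < len(tokens) and tokens[pos] == "*":
        expect("*")
        tree = ("E", tree, "*", ("id", expect("id")))
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return ("S", ("id", target), "=", tree, ";")


print(parse_statement("Distance = rate * time ;".split()))
```

A statement that does not match the grammar, such as `Distance = * time ;`, raises a SyntaxError instead of producing a tree.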
Semantic Analysis
• The semantics of a program is its meaning.
• It is possible to have a syntactically valid program that does not have any meaning.
• Semantic analysis has two parts:
  • Semantic checking: Validating the semantics of a syntactically valid program and gathering information about the meaning of its constituents (attributes).
  • Semantic translation: Giving a meaning to a program using a pre-established language, typically a syntax tree decorated with attributes. This is often called an intermediate representation.
Semantic Translation: example
• Breaks the statements into small pieces corresponding roughly to machine instructions.
• Example:
  Source statement:
    x = a*y+z;
  Intermediate code:
    t1 = a*y;
    t2 = t1+z;
    x = t2;
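A sketch of how such intermediate code might be produced from a syntax tree. As a shortcut, it borrows Python's own `ast` module as a stand-in for the lexical and syntactic phases, which is an assumption of this example; the `to_three_address` name and the t1, t2 naming scheme are illustrative.

```python
import ast


def to_three_address(statement):
    """Lower an assignment such as 'x = a*y+z' into one small step per operator."""
    assign = ast.parse(statement).body[0]
    operators = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    code = []

    def lower(node):
        """Emit code for a subtree and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in operators:
            left = lower(node.left)
            right = lower(node.right)
            temp = f"t{len(code) + 1}"  # one new temporary per emitted operation
            code.append(f"{temp} = {left}{operators[type(node.op)]}{right};")
            return temp
        raise NotImplementedError(ast.dump(node))

    result = lower(assign.value)
    code.append(f"{assign.targets[0].id} = {result};")
    return code


for line in to_three_address("x = a*y+z"):
    print(line)
```

The tree is walked bottom-up, so operands are computed into temporaries before the operation that uses them.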
High-Level Optimization
• The generated intermediate representation is often inefficient because of bad structure or redundancy.
• This kind of optimization is not bound to the target machine’s architecture.
• Example:
  Before:
    t1 = a*y;
    t2 = t1+z;
    x = t2;
  After:
    t1 = a*y;
    x = t1+z;
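One way to obtain this particular improvement is to fold a trailing copy into the instruction that produced the temporary. The sketch below assumes instructions are (dest, op, left, right) tuples and that temporaries are exactly the names t1, t2, and so on; both are conventions invented for this example.

```python
import re


def remove_redundant_copies(code):
    """Rewrite 't = a op b; x = t' as 'x = a op b' when t is not read afterwards."""
    result = []
    for index, (dest, op, left, right) in enumerate(code):
        previous = result[-1] if result else None
        # Is the copied name still needed by any later instruction?
        read_later = any(left in (l, r) for _, _, l, r in code[index + 1:])
        if (op == "copy" and previous is not None and previous[0] == left
                and re.fullmatch(r"t\d+", left) and not read_later):
            # Let the previous instruction write directly into dest.
            result[-1] = (dest, *previous[1:])
        else:
            result.append((dest, op, left, right))
    return result


before = [("t1", "*", "a", "y"), ("t2", "+", "t1", "z"), ("x", "copy", "t2", None)]
print(remove_redundant_copies(before))
```

Nothing in this transformation refers to registers or instruction sets, which matches the claim that high-level optimization is independent of the target machine.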
Target Code Generation
• Translates the optimized intermediate representation into the target code (normally machine language or assembler).
• Example:
  Intermediate code:
    t1 = a*y;
    x = t1+z;
  Target code:
    LE  4,a    a in register 4
    ME  4,y    multiply by y
    AE  4,z    add z
    STE 4,x    store register 4 in x
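A deliberately naive sketch of this translation for a single working register. The mnemonics LE, ME, AE and STE are taken from the slide; the tuple format, the `generate_target` name and the restriction that every temporary is consumed by the very next instruction are assumptions of this example.

```python
import re

MNEMONICS = {"+": "AE", "*": "ME"}  # add and multiply, as on the slide


def is_temp(name):
    """Assumed convention: temporaries are named t1, t2, ..."""
    return re.fullmatch(r"t\d+", name) is not None


def generate_target(code, register=4):
    """Translate (dest, op, left, right) tuples into single-register target code."""
    target = []
    in_register = None  # name whose value currently sits in the register
    for dest, op, left, right in code:
        if is_temp(right) or (is_temp(left) and left != in_register):
            raise NotImplementedError("this sketch never spills temporaries to memory")
        if left != in_register:
            target.append(f"LE {register},{left}")   # load left operand
        target.append(f"{MNEMONICS[op]} {register},{right}")
        if not is_temp(dest):
            target.append(f"STE {register},{dest}")  # only real variables are stored
        in_register = dest
    return target


for line in generate_target([("t1", "*", "a", "y"), ("x", "+", "t1", "z")]):
    print(line)
```

Because t1 is still in the register when the addition is generated, no load or store is emitted for it. A real code generator needs register allocation to handle the cases this sketch rejects.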
Passes, Front End and Back End
• A pass consists of reading a high-level version of the program and writing a new lower-level version.
• Several passes are often needed:
  • To resolve forward references
  • To limit the memory used by the different phases.
Low-Level Optimization
• The generated target code is analyzed for inefficiencies such as dead code or code redundancy.
• Care is taken to exploit as much as possible the CPU’s capabilities.
• This phase is heavily architecture dependent.
• Lots of research is still done in this very complex area.
Passes, Front End and Back End
• The front-end is composed of: Lexical, Syntactic, Semantic analysis and High-level optimization.
• In most compilers, most of the front-end is driven by the Syntactic analyzer.
• It calls the Lexical analyzer for tokens and generates an abstract syntax tree when syntactic elements are recognized.
• The generated tree (or other intermediate representation) is then analyzed and optimized in a separate process.
• It has little or no concern with the target machine.
Passes, Front End and Back End
• The back-end is composed of: Code generation and low-level optimization.
• Uses the intermediate representation generated by the front-end to generate target machine code.
• Heavily dependent on the target machine.
• Independent of the programming language compiled.
System Support
• Symbol table
  • Central repository of identifiers (variable or function names) used in the compiled program.
  • Contains information such as the data type or value in the case of constants.
  • Used to identify undeclared or multiply declared identifiers, as well as type mismatches.
  • Provides temporary variables for intermediate code generation.
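A minimal sketch of a symbol table covering the four roles listed above. The class layout, method names and error messages are assumptions made for illustration; a real compiler would also need scopes and richer type information.

```python
class SymbolTable:
    """Single-scope symbol table mapping identifier names to data types."""

    def __init__(self):
        self.entries = {}   # name -> data type
        self.errors = []    # semantic errors collected so far
        self.temp_count = 0

    def declare(self, name, data_type):
        if name in self.entries:
            self.errors.append(f"multiply declared identifier: {name}")
        else:
            self.entries[name] = data_type

    def lookup(self, name):
        if name not in self.entries:
            self.errors.append(f"undeclared identifier: {name}")
            return None
        return self.entries[name]

    def check_assignment(self, target, source):
        """Report a type mismatch when both sides are declared with different types."""
        target_type, source_type = self.lookup(target), self.lookup(source)
        if None not in (target_type, source_type) and target_type != source_type:
            self.errors.append(
                f"type mismatch: {target} is {target_type}, {source} is {source_type}")

    def new_temp(self, data_type):
        """Create a fresh temporary for intermediate code generation."""
        self.temp_count += 1
        name = f"t{self.temp_count}"
        self.entries[name] = data_type
        return name


table = SymbolTable()
table.declare("total", "int")
table.declare("rate", "float")
table.declare("total", "int")             # multiply declared
table.check_assignment("total", "rate")   # type mismatch
table.lookup("time")                      # undeclared
print(table.errors)
```

Errors are collected rather than raised, so the compiler can keep going and report several problems in one run.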
System Support
• Error handling procedures
  • Implement the compiler’s response to errors in the code it is compiling.
  • Provide useful insight to the user about where the error is and what it is.
  • Should find all errors in the whole program.
  • Can attempt to correct some errors and only give a warning.
System Support
• Run-time system
  • Some programming language concepts raise the need for dynamic memory allocation. What are they?
  • The running program must then be able to manage its own memory use.
  • Some will require a stack, others a heap. These are managed by the run-time system.
Writing of Early Compilers
• The first C compiler
• Figure (two steps):
  • minimal C compiler source → assembler → executable C compiler (minimal)
  • full C compiler source → C compiler (minimal) → executable C compiler (full)
Writing Cross-Compilers
• A Unix-Macintosh C cross compiler
• Figure (two steps):
  • Mac C compiler source code in Unix C → Unix C compiler → Mac C compiler usable on Unix
  • Mac C compiler source code in Unix C → Mac C compiler usable on Unix → Mac C compiler usable on Mac
Writing Retargetable Compilers
• Two methods:
  • Make a strict distinction between front-end and back-end, then use different back-ends.
  • Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code. That is what we do in the project.
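To illustrate the second method, here is a toy interpreter for a stack-based virtual machine. The instruction set (push, add, mul, store) and the `run` function are invented for this sketch and are not the virtual machine used in the project.

```python
def run(program, memory):
    """Execute virtual machine instructions against a dict of variable values."""
    stack = []
    for instruction, *args in program:
        if instruction == "push":
            stack.append(memory[args[0]])      # push the value of a variable
        elif instruction == "store":
            memory[args[0]] = stack.pop()      # pop the result into a variable
        elif instruction in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instruction == "add" else left * right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory


# Virtual machine code a compiler might emit for x = a*y+z
program = [("push", "a"), ("push", "y"), ("mul",), ("push", "z"), ("add",), ("store", "x")]
print(run(program, {"a": 2, "y": 3, "z": 4})["x"])  # prints 10
```

The compiler only needs to target this one instruction set; porting to a new machine then means rewriting the small interpreter rather than the whole back-end.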
Summary
• The first compiler was the assembler, a one-to-one direct translator.
• Complex compilers were written incrementally, first using assemblers.
• The core compilation techniques have been well known since the 1960s and early 1970s.
Summary
• The compilation process is divided into phases.
• The input of a phase is the output of the previous phase.
• It can be seen as a pipeline, where the phases are filters that successively transform the input program into an executable.