CSC 415: Translators and Compilers Dr. Chuck Lillie
Course Outline • Major Programming Project • Project Definition and Planning • Implementation • Weekly Status Reports • Project Presentation • Translators and Compilers • Language Processors • Compilation • Syntactic Analysis • Contextual Analysis • Run-Time Organization • Code Generation • Interpretation
Project • Implement a Compiler for the Programming Language Triangle • Appendix B: Informal Specification of the Programming Language Triangle • Appendix D: Class Diagrams for the Triangle Compiler • Present Project Plan • What and How • Weekly Status Reports • Work accomplished during the reporting period • Deliverable progress, as a percentage of completion • Problem areas • Planned activities for the next reporting period
Chapter 1: Introduction to Programming Languages • Programming Language: A formal notation for expressing algorithms. • Programming Language Processors: Tools to enter, edit, translate, and interpret programs on machines. • Machine Code: Basic machine instructions • Keep track of exact address of each data item and each instruction • Encode each instruction as a bit string • Assembly Language: Symbolic names for operations, registers, and addresses.
Programming Languages • High Level Languages: Notation similar to familiar mathematical notation • Expressions: +, -, *, / • Data Types: truth values, characters, integers, records, arrays • Control Structures: if, case, while, for • Declarations: constant values, variables, procedures, functions, types • Abstraction: separates what is to be performed from how it is to be performed • Encapsulation (or data abstraction): group together related declarations and selectively hide some
Programming Language Processors • Any system that manipulates programs expressed in some particular programming language • Editors: enter, modify, and save program text • Translators and Compilers: A translator translates text from one language to another. A compiler translates a program from a high-level language to a low-level language, preparing it to be run on a machine • Checks program for syntactic and contextual errors • Interpreters: Run a program without compilation • Command languages • Database query languages
Programming Language Specifications • Syntax • Form of the program • Defines symbols • How phrases are composed • Contextual constraints • Scope: determine scope of each declaration • Type: • Semantics • Meaning of the program
Representation • Syntax • Backus-Naur Form (BNF): context-free grammar • Terminal symbols (>=, while, ;) • Non-terminal symbols (Program, Command, Expression, Declaration) • Start symbol (Program) • Production rules (defines how phrases are composed from terminals and sub-phrases) • N::=a|b|…. • Syntax Tree • Used to define language in terms of strings and terminal symbols
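As an illustration of how terminals, non-terminals, a start symbol, and production rules fit together, a grammar can be written down directly as data. The sketch below is only an assumption-laden example: the two-rule grammar and the names `GRAMMAR`, `START`, `terminals`, `ends`, and `is_sentence` are hypothetical and are not taken from the Triangle specification.

```python
# Hypothetical mini-grammar in BNF style (N ::= a | b | ...).
# Keys are non-terminals; every other symbol is a terminal.
GRAMMAR = {
    "Expression": [["Term"], ["Term", "+", "Expression"]],
    "Term": [["x"], ["y"], ["(", "Expression", ")"]],
}
START = "Expression"


def terminals(grammar):
    """Collect the symbols that never appear on the left of a rule."""
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    return symbols - set(grammar)


def ends(grammar, symbol, tokens, pos):
    """Positions where a phrase of kind `symbol` starting at `pos` can end."""
    if symbol not in grammar:  # terminal: must match the next token exactly
        if pos < len(tokens) and tokens[pos] == symbol:
            return {pos + 1}
        return set()
    result = set()
    for alternative in grammar[symbol]:
        positions = {pos}
        for part in alternative:
            positions = {e for p in positions
                         for e in ends(grammar, part, tokens, p)}
        result |= positions
    return result


def is_sentence(grammar, start, tokens):
    """True if the whole token list can be derived from the start symbol."""
    return len(tokens) in ends(grammar, start, tokens, 0)
```

For example, `is_sentence(GRAMMAR, START, ["x", "+", "(", "y", ")"])` succeeds, while `["x", "+"]` is rejected. This brute-force recognizer only works because the sample grammar has no left recursion.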
Representation • Semantics • Abstract Syntax • Concentrate on phrase structure alone • Abstract Syntax Tree
Contextual Constraints • Scope • Binding • Static: determined by language processor • Dynamic: determined at run-time • Type • Statically typed: language processor can detect all type errors • Dynamically typed: type errors cannot be detected until run-time • We will assume static binding and static typing
Semantics • Concerned with meaning of program • Behavior when run • Usually specified informally • Declarative sentences • Could include side effects • Correspond to production rules
Chapter 2: Language Processors • Translators and Compilers • Interpreters • Real and Abstract Machines • Interpretive Compilers • Portable Compilers • Bootstrapping • Case Study: The Triangle Language Processor
Translators & Compilers • Translator: a program that accepts any text expressed in one language (the translator’s source language), and generates a semantically-equivalent text expressed in another language (its target language) • Chinese-into-English • Java-into-C • Java-into-x86 • x86 assembler
Translators & Compilers • Assembler: translates from an assembly language into the corresponding machine code • Generates one machine code instruction per source instruction • Compiler: translates from a high-level language into a low-level language • Generates several machine-code instructions per source command.
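The one-instruction-in, one-instruction-out behaviour of an assembler can be sketched in a few lines. Everything specific here is an assumption made for illustration: the four mnemonics, their opcode numbers, the 16-bit word layout (opcode in the high byte, address in the low byte), and the function name `assemble` do not describe any real machine.

```python
# Hypothetical instruction set: mnemonic -> opcode number.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}


def assemble(source):
    """Translate assembly text into a list of machine words.

    Each source line yields exactly one word. Symbolic data names are
    replaced by addresses handed out in order of first use.
    """
    addresses = {}
    words = []
    for line in source.strip().splitlines():
        mnemonic, *operands = line.split()
        address = 0
        if operands:
            # Reuse the address of a known name, or allocate the next one.
            address = addresses.setdefault(operands[0], len(addresses))
        words.append((OPCODES[mnemonic] << 8) | address)
    return words
```

With this encoding, the four-line program `LOAD x`, `ADD y`, `STORE x`, `HALT` assembles to four words, one per source instruction. A compiler, by contrast, would typically emit several such words for a single high-level command.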
Translators & Compilers • Disassembler: translates a machine code into the corresponding assembly language • Decompiler: translates a low-level language into a high-level language Question: Why would you want a disassembler or decompiler?
Translators & Compilers • Source Program: the source language text • Object Program: the target language text • Object program semantically equivalent to source program • If source program is well-formed [Figure: Source Program, Compiler (Syntax Check, Context Constraints, Semantic Analysis, Generate Object Code), Object Program]
Translators & Compilers • Why would you want a: • Java-into-C translator • C-into-Java translator • Assembly-language-into-Pascal decompiler
Translators & Compilers • Program: P = Program Name, L = Implementation Language • Machine: M = Target Machine • For a program to run on a machine, L must equal M, that is, the implementation language must be the same as the machine language • Translator: S = Source Language, T = Target Language, L = Translator’s Implementation Language • An S-into-T translator is itself a program, so it runs only on a machine whose machine language is L [Figure: tombstone diagrams for a program, a machine, and a translator]
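Translators & Compilers • Translating a source program P • Expressed in language S, • Using an S-into-T translator • Running on machine M • Produces an object program P expressed in language T [Figure: tombstone diagram of P being translated from S into T on machine M]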
Translators & Compilers • Translating a source program sort • Expressed in language Java, • Using a Java-into-x86 translator • Running on an x86 machine • The object program is running on the same machine as the compiler [Figure: tombstone diagram of sort compiled from Java into x86 and run on x86]
Translators & Compilers • Translating a source program sort • Expressed in language Java, • Using a Java-into-PPC translator • Running on an x86 machine • Downloaded to a PPC machine • Cross Compiler: The object program is running on a different machine than the compiler [Figure: tombstone diagram of sort compiled from Java into PPC on x86, then downloaded to PPC]
Translators & Compilers • Translating a source program sort • Expressed in language Java, • Using a Java-into-C translator • Running on an x86 machine • Then translating the C program • Using a C-into-x86 compiler • Running on an x86 machine • Into x86 object program • Two-stage Compiler: The source program is translated to another language before being translated into the object program [Figure: tombstone diagram of sort translated from Java into C, then from C into x86]
Translators & Compilers • Translator Rules • Can run on machine M only if it is expressed in machine code M • Source program must be expressed in translator’s source language S • Object program is expressed in the translator’s target language T • Object program is semantically equivalent to the source program
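These rules are mechanical enough to check in code. The following is a minimal sketch, assuming a representation that is not part of the course material: the classes `Program` and `Translator` and the function `translate` are hypothetical names, and languages and machines are identified by plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    language: str  # language the program text is expressed in


@dataclass(frozen=True)
class Translator:
    source: str    # S: source language
    target: str    # T: target language
    language: str  # L: language the translator itself is expressed in


def translate(program, translator, machine):
    """Apply the translator rules and return the object program."""
    if translator.language != machine:
        raise ValueError("translator is not expressed in machine code "
                         + machine)
    if program.language != translator.source:
        raise ValueError("program is not expressed in source language "
                         + translator.source)
    # Same program name, now expressed in the target language.
    return Program(program.name, translator.target)
```

A cross compiler falls out naturally: `translate(Program("sort", "Java"), Translator("Java", "PPC", "x86"), "x86")` yields a PPC program even though the translation ran on x86. A two-stage compiler is simply two calls to `translate` in sequence.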
Interpreters • Accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately • Does not translate the source program into object code prior to execution
Interpreters • Source program starts to run as soon as the first instruction is analyzed [Figure: the Interpreter takes the Source Program and repeats Fetch Instruction, Analyze Instruction, Execute Instruction until Program Complete]
Interpreters • When to Use Interpretation • Interactive mode – want to see results of instruction before entering next instruction • Only use program once • Each instruction expected to be executed only once • Instructions have simple formats • Disadvantages • Slow: up to 100 times slower than in machine code
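The fetch, analyze, execute cycle of an interpreter can be sketched for a toy command language. The three instructions (`set`, `add`, `print`) and the function name `interpret` are assumptions invented for this example; they were chosen because simple instruction formats are exactly the case where interpretation works well.

```python
def interpret(source):
    """Run a program one instruction at a time, with no translation step.

    Returns the list of printed values so the result is easy to inspect.
    """
    variables = {}
    output = []
    for line in source.strip().splitlines():  # fetch the next instruction
        op, *args = line.split()              # analyze it
        if op == "set":                       # execute it
            variables[args[0]] = int(args[1])
        elif op == "add":
            variables[args[0]] += int(args[1])
        elif op == "print":
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown instruction: {line!r}")
    return output
```

Note that every instruction is analyzed again each time it is executed, which is one source of the slowdown relative to machine code when instructions run many times.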
Interpreters • Examples • Basic • Lisp • Unix Command Language (shell) • SQL
Interpreters • S interpreter expressed in language L • Program P expressed in language S, using Interpreter S, running on machine M • Example: program graph written in Basic, running on a Basic interpreter, executed on an x86 machine [Figure: tombstone diagrams of an S interpreter and of graph running on a Basic interpreter on x86]
Real and Abstract Machines • Hardware emulation: Using software to execute one set of machine code on another machine • Can measure everything about the new machine except its speed • Abstract machine: emulator • Real machine: actual hardware An abstract machine is functionally equivalent to a real machine if they both implement the same language L
Real and Abstract Machines • New Machine Instruction (nmi) interpreter written in C • Compiler to translate C program into M machine code • The nmi interpreter is translated into machine code M using the C compiler • Result: nmi interpreter expressed in machine code M [Figure: tombstone diagrams of the nmi interpreter compiled from C into M, then running program P on M]
Interpretive Compilers • Combination of compiler and interpreter • Translate source program into an intermediate language • It is intermediate in level between the source language and ordinary machine code • Its instructions have simple formats, and therefore can be analyzed easily and quickly • Translation from the source language into the intermediate language is easy and fast • An interpretive compiler combines fast compilation with tolerable running speed
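The two halves of an interpretive compiler can be sketched together. This is a toy under stated assumptions: source expressions are given as nested tuples such as `("+", 1, ("*", 2, 3))`, the intermediate language is a hypothetical three-instruction stack code (`PUSH`, `ADD`, `MUL`), and the names `compile_expr` and `run` are invented here.

```python
# Hypothetical mapping from source operators to intermediate instructions.
INSTRUCTION_FOR = {"+": "ADD", "*": "MUL"}


def compile_expr(expr):
    """Translate a nested-tuple expression into flat stack code."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    # Operands first, operator last: the order a stack machine needs.
    return compile_expr(left) + compile_expr(right) + [(INSTRUCTION_FOR[op],)]


def run(code):
    """Interpret the stack code; each instruction has a trivial format."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right = stack.pop()
            left = stack.pop()
            if instruction[0] == "ADD":
                stack.append(left + right)
            else:
                stack.append(left * right)
    return stack.pop()
```

The translation step is a single cheap tree walk, and the interpreter never has to re-analyze nested source text, which is the trade-off the slide describes.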
Interpretive Compilers • Java-into-JVM translator running on machine M • JVM code interpreter running on machine M • A Java program P is first translated into JVM-code, and then the JVM-code object program is interpreted [Figure: tombstone diagrams of P translated from Java into JVM-code, then run on the JVM interpreter on M]
Portable Compilers • A program is portable if it can be compiled and run on any machine, without change • A portable program is more valuable than an unportable one, because its development cost can be spread over more copies • Portability is measured by the proportion of code that remains unchanged when it is moved to a dissimilar machine • Language affects portability • Assembly language: 0% portable • High level language: approaches 100% portability
Portable Compilers • Language Processors • Valuable and widely used programs • Typically written in high-level language • Pascal, C, Java • Part of language processor is machine dependent • Code generation part • Language processor is only about 50% portable • Compiler that generates intermediate code is more portable than a compiler that generates machine code
Portable Compilers • Rewrite the JVM interpreter from Java into C • Note: a C-into-M compiler exists [Figure: tombstone diagrams of a Java-into-JVM compiler kit, with the JVM interpreter rewritten in C and compiled into M]
Bootstrapping • The language processor is used to process itself • Implementation language is the source language • Bootstrapping a portable compiler • A portable compiler can be bootstrapped to make a true compiler (one that generates machine code) by writing an intermediate-language-into-machine-code translator • Full bootstrap • Writing the compiler in itself • Using the latest version to upgrade the next version • Half bootstrap • Compiler expressed in itself but targeted for another machine • Bootstrapping to improve efficiency • Upgrade the compiler to optimize code generation as well as to improve compile efficiency
Bootstrapping • Bootstrap an interpretive compiler to generate machine code • First, write a JVM-code-into-M translator in Java • Next, compile the translator using the existing interpretive compiler • Use the translator to translate itself • Translate the Java-into-JVM-code translator into machine code • Result: two-stage Java-into-M compiler [Figure: tombstone diagrams for each bootstrapping step]
Bootstrapping • Full bootstrap • Write Ada-S compiler in C • Convert the C version of Ada-S into Ada-S version of Ada-S • Extend Ada-S compiler to (full) Ada compiler [Figure: tombstone diagrams for compiler versions v1, v2, and v3]
Bootstrapping • Half bootstrap [Figure: tombstone diagrams of an Ada compiler on machines HM and TM]
Bootstrapping • Bootstrap to improve efficiency [Figure: tombstone diagrams of Ada compiler versions v1 and v2 with targets Ms and Mf]
Chapter 3: Compilation • Phases • Syntactic Analysis • Contextual Analysis • Code Generation • Passes • Multi-pass Compilation • One-pass Compilation • Compiler Design Issues • Case Study: The Triangle Compiler
Phases • Syntactic Analysis • The source program is parsed to check whether it conforms to the source language’s syntax, and to determine its phrase structure • Contextual Analysis • The parsed program is analyzed to check whether it conforms to the source language's contextual constraints • Code Generation • The checked program is translated to an object program, in accordance with the semantics of the source and target languages
Phases • Source Program → Syntactic Analysis → AST → Contextual Analysis → Decorated AST → Code Generation → Object Program • Syntactic Analysis and Contextual Analysis can each produce an Error Report
Syntactic Analysis • To determine the source program’s phrase structure • Parsing • Contextual analysis and code generation must know how the program is composed • Commands, expressions, declarations, … • Check for conformance to the source language’s syntax • Construct suitable representation of its phrase structure (AST) • AST • Terminal nodes corresponding to identifiers, literals, and operators • Sub trees representing the phrases of the source program • Blanks and comments not in AST (no meaning) • Punctuation and brackets not in AST (only separate and enclose)
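To make the AST idea concrete, here is a minimal parsing sketch for arithmetic expressions only. The mini-grammar, the tuple node shapes (`"BinaryExpression"`, `"Identifier"`, `"IntegerLiteral"`), and the function names are assumptions for illustration, not the Triangle compiler's classes. All operators are treated as left-associative with equal precedence, purely to keep the sketch short.

```python
import re

OPERATORS = {"+", "-", "*", "/"}


def tokenize(text):
    # Blanks match nothing here, so they never reach the parser or the AST.
    return re.findall(r"[A-Za-z]\w*|\d+|[-+*/()]", text)


def parse_primary(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        tree, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1  # brackets only enclose: no AST node for them
    if token.isdigit():
        return ("IntegerLiteral", token), pos + 1
    if token[0].isalpha():
        return ("Identifier", token), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")


def parse_expression(tokens, pos=0):
    tree, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in OPERATORS:
        operator = tokens[pos]
        right, pos = parse_primary(tokens, pos + 1)
        tree = ("BinaryExpression", tree, operator, right)
    return tree, pos


def parse(text):
    """Check syntax and return the AST, or raise SyntaxError."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Parsing `(a + 1) * b` gives a tree whose leaves are the identifiers, the literal, and the operators, while the blanks and the brackets have disappeared; the nesting of the tuples alone records the phrase structure.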
Contextual Analysis • Analyzes the parsed program • Scope rules • Type rules • Produces decorated AST • AST with information gathered during contextual analysis • Each applied occurrence of an identifier is linked to the corresponding declaration • Each expression is decorated by its type T
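Decoration can be sketched as a tree walk that writes extra fields into each node. The dictionary-based node layout, the field names `decl` and `type`, the two type names, and the function `check` are all assumptions made for this example; a real compiler would use proper AST classes and a scoped identification table.

```python
COMPARISONS = {"<", ">", "="}


def check(node, declarations):
    """Decorate `node` in place and return its type.

    `declarations` maps identifier names to declaration records, standing
    in for the result of applying the scope rules.
    """
    kind = node["kind"]
    if kind == "IntegerLiteral":
        node["type"] = "Integer"
    elif kind == "Identifier":
        if node["name"] not in declarations:
            raise NameError(f"{node['name']} is not declared")
        # Link the applied occurrence to its declaration.
        node["decl"] = declarations[node["name"]]
        node["type"] = node["decl"]["type"]
    elif kind == "BinaryExpression":
        left = check(node["left"], declarations)
        right = check(node["right"], declarations)
        if left != "Integer" or right != "Integer":
            raise TypeError(f"operator {node['op']} needs Integer operands")
        node["type"] = "Boolean" if node["op"] in COMPARISONS else "Integer"
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return node["type"]
```

After `check` succeeds, every expression node carries a `type` field and every identifier node carries a `decl` link, so a later code generation pass would not need to look anything up again.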
Code Generation • The final translation of the checked program to an object program • After syntactic and contextual analysis is completed • Treatment of identifiers • Constants • Binds identifier to value • Replace each occurrence of identifier with value • Variables • Binds identifier to some memory address • Replace each occurrence of identifier by address • Target language • Assembly language • Machine code
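The different treatment of constants and variables can be shown with a small sketch. The declaration tuples, the function names `allocate` and `load_identifier`, and the mnemonics `LOADL` (load a literal value) and `LOAD` (load from an address) are assumptions chosen for illustration; they do not claim to reproduce any particular target machine's instruction set.

```python
def allocate(declarations):
    """Bind each identifier to a value (constant) or an address (variable)."""
    table = {}
    next_address = 0
    for declaration in declarations:
        if declaration[0] == "const":
            _, name, value = declaration
            table[name] = ("value", value)
        else:
            _, name = declaration
            table[name] = ("address", next_address)
            next_address += 1  # each variable occupies one word here
    return table


def load_identifier(name, table):
    """Emit the instruction that fetches the identifier's value."""
    binding, operand = table[name]
    if binding == "value":
        return f"LOADL {operand}"  # the constant's value replaces its name
    return f"LOAD {operand}"       # the variable's address replaces its name
```

For the declarations `const max ~ 10`, `var n`, `var m` (written here as `("const", "max", 10)`, `("var", "n")`, `("var", "m")`), no identifier survives into the generated instructions: only the value 10 and the addresses 0 and 1 remain.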
Passes • Multi-pass compilation • Traverses the program or AST several times • One-pass compilation • Single traversal of the program • Contextual analysis and code generation are performed ‘on the fly’ during syntactic analysis [Figure: two structure diagrams, each built from Compiler Driver, Syntactic Analyzer, Contextual Analyzer, and Code Generator]
Compiler Design Issues • Speed • Compiler run time • Space • Storage: size of compiler + files generated • Modularity • Multi-pass compiler more modular than one-pass compiler • Flexibility • Multi-pass compiler is more flexible because it generates an AST that can be traversed in any order by the other phases • Semantics-preserving transformations • To optimize code – must have multi-pass compiler • Source language properties • May restrict compiler choice – some language constructs may require multi-pass compilers
Chapter 4: Syntactic Analysis • Sub-phases of Syntactic Analysis • Grammars Revisited • Parsing • Abstract Syntax Trees • Scanning • Case Study: Syntactic Analysis in the Triangle Compiler
Structure of a Compiler • Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code • Symbol Table
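The first box in this structure, the lexical analyzer, can be sketched with regular expressions. The token kinds, the keyword set, the `!` comment convention, and the function name `scan` are assumptions made for this example rather than a definition of any real language's lexicon.

```python
import re

# Hypothetical token kinds, tried in this order at each position.
TOKEN_SPEC = [
    ("INTLITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR", r":=|[-+*/<>=]"),
    ("SKIP", r"\s+|!.*"),  # blanks, and comments running to end of line
]
KEYWORDS = {"if", "then", "else", "while", "do", "let", "in"}
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(text):
    """Turn source text into a list of (kind, spelling) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
        kind = match.lastgroup
        spelling = match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # separators carry no meaning for the parser
        if kind == "IDENTIFIER" and spelling in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, spelling))
    return tokens
```

The parser then works on the token list instead of raw characters, which is what lets it ignore spacing and comments entirely.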