Week 1 – Lecture 1 Compiler Construction • Introduction • The Textbook • Assessment • Overview
The Big Picture • In this course we will be constructing a compiler! • Moving from a High Level Language to a Low Level Language • Compilers are complex programs • > 10,000 lines of code • Integrate aspects from many different areas of CS • Formal language theory, algorithms, data structures, HLL & LLL (obviously), user interaction (error reporting)
What is a compiler? • A specialization of a language translator: Source (L1) → Target (L2) • Usually in CS: • the Source is a high level programming language • the Target is machine code for a microprocessor • e.g. C → x86 processor
Applications of Compiler Techniques • Potential Source languages include: • Natural languages (English, French, …) • Circuit layout languages • Mark-up languages (HTML, XML, …) • Command line languages (SQL interface) • Potential Target languages include: • Natural languages • Printer drivers • Mark-up languages • e.g. HTML to RTF converter • Could involve many of the aspects we will cover in compiler construction
Compilers for Programming Languages • If we had 1 compiler for each {Source,Target} pair then we would have a lot of compilers! • Source Languages: Pascal, Sather, C++, C, C#, Java, Prolog, Fortran, Haskell, Lisp • Target Languages: x86 (MMX), ARM, AMD K6, SPARC, PowerPC 750 (G3), JVM
Modularity for Code Generation • One Source → Intermediate Representation → several Targets (x86, ARM, G4) • Compiler portability (man gcc lists the different target machines)
Modularity for Source Languages? • Several Sources (C, Java, Prolog) → Intermediate Representation → Targets • Typically compilers only compile one source language, but the techniques used are very similar and are shared across different compilers
Typical Compiler • Source → Front-end (Analysis) → Intermediate Representation → Back-end (Synthesis) → Target • The Intermediate Representation is independent of Source and Target languages • Course timeline marked on the diagram: 'now' and 'week 6' • Ideally: • For a new Source language we can add a new front-end to an existing back-end • For a new Target language we can add a new back-end to an existing front-end
Front End • Knowledge about the source language • Lexical structure (tokens) • Syntax • Programming constructs • Conditionals, iteration etc • Semantics • Type checking • Error-reporting • UI component • Often basic (and unhelpful!) • May vary if part of an IDE or standalone • Front-end components: Source program → Lexical analyser → Syntax analyser → Semantic analyser, supported by the Symbol table and Error Handler
Lexical Analysis • Lexical tasks the compiler has to perform: • group together the 3 characters 'max' to form the single variable identifier max • group together the 2 characters '<=' to form the single relational operator <= (less than or equal to) • Example program:
int max = 20, x;
read(x);
if ( x <= max ) print('ok'); else print('too big');
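The grouping described above can be sketched in C. This is only a minimal hand-written illustration, not the lexer used in the course (which is generated with lex / flex); the names `TokenKind`, `Token` and `next_token` are hypothetical, and identifiers longer than 31 characters are simply split.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not taken from the lecture. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_LE, TOK_OTHER, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* the characters grouped into this token */
} Token;

/* Read one token starting at *src and advance *src past it. */
Token next_token(const char **src) {
    Token t = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;            /* skip white space */

    if (*p == '\0') {
        /* end of input: t is already TOK_END */
    } else if (isalpha((unsigned char)*p)) {           /* e.g. m, a, x -> max */
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) && n < sizeof t.text - 1)
            t.text[n++] = *p++;
    } else if (isdigit((unsigned char)*p)) {           /* e.g. 2, 0 -> 20 */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p) && n < sizeof t.text - 1)
            t.text[n++] = *p++;
    } else if (p[0] == '<' && p[1] == '=') {           /* <, = -> <= */
        t.kind = TOK_LE;
        t.text[n++] = *p++;
        t.text[n++] = *p++;
    } else {                                           /* any other single character */
        t.kind = TOK_OTHER;
        t.text[n++] = *p++;
    }
    t.text[n] = '\0';
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on the text `x <= max` would yield an identifier, the single operator `<=`, another identifier, and then the end marker.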
Syntactic Analysis • Recognise the if .. then … else structure • Group the x <= max into a single expression with a relational operator • Recognise the format of the variable declaration list • Such that x is correctly declared to be an int • Loops, program blocks (begin…end) • Arithmetic expressions, etc
Semantic analysis • Check that x<=max is a sensible thing to do • If x was a boolean and max a string then we would have a type error • Check that the ‘20’ is in fact an integer and so can be assigned to an int • And also (can be split over several phases) • Keep a note of all the variables used so we make sure they all refer to the same value (in memory)
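A rough C sketch of these two checks follows: a note of the declared variables, and a test that x <= max is sensible. The fixed-size table, the names `declare`, `lookup` and `check_less_equal`, and the rule that both operands must be int are all assumptions made for illustration.

```c
#include <string.h>

typedef enum { TYPE_INT, TYPE_BOOL, TYPE_STRING, TYPE_ERROR } Type;

/* A tiny fixed-size symbol table (hypothetical layout for this sketch). */
#define MAX_SYMBOLS 16
typedef struct { const char *name; Type type; } Symbol;
static Symbol symbols[MAX_SYMBOLS];
static int symbol_count = 0;

/* Record a declared variable; returns 0 if the table is full. */
int declare(const char *name, Type type) {
    if (symbol_count >= MAX_SYMBOLS) return 0;
    symbols[symbol_count].name = name;
    symbols[symbol_count].type = type;
    symbol_count++;
    return 1;
}

/* Type of a declared variable, or TYPE_ERROR if it was never declared. */
Type lookup(const char *name) {
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].type;
    return TYPE_ERROR;
}

/* Result type of "left <= right". Assumed rule: both operands must be int. */
Type check_less_equal(Type left, Type right) {
    if (left == TYPE_INT && right == TYPE_INT)
        return TYPE_BOOL;
    return TYPE_ERROR;   /* e.g. a boolean compared with a string */
}
```

After declaring x and max as int, `check_less_equal(lookup("x"), lookup("max"))` gives a boolean result; with a boolean x and a string max it reports a type error.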
Data Structures • Stream of text as the source file • Group together text into larger units from a limited set • Nearly all programming constructs can be represented as tree structures • e.g. If statement → if, Boolean expression, statement, else, statement • Boolean expression → expression, Relational operator, expression
Data Structures • Lexical Analyzer • Stream of tokens (enumerated type) • NUMBER OPERATOR NUMBER • Syntax Analyzer / Parser • Tree of program structure • e.g. program → assignment, if_statement, while_loop, output_statement
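One possible C representation of the tree of program structure is sketched below. The node kinds and the fixed limit of three children per node are assumptions chosen to fit an if statement (condition, then-branch, else-branch); a real parser would likely use a richer node type.

```c
#include <stdlib.h>

/* Hypothetical node kinds, enough for: if (x <= max) print(...); else print(...); */
typedef enum { NODE_IF, NODE_RELOP, NODE_IDENT, NODE_PRINT } NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *child[3];   /* unused slots are NULL */
} Node;

/* Allocate a node with up to three children; returns NULL if out of memory. */
Node *make_node(NodeKind kind, Node *c0, Node *c1, Node *c2) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->kind = kind;
    n->child[0] = c0;
    n->child[1] = c1;
    n->child[2] = c2;
    return n;
}

/* Count the nodes in a tree by visiting every child recursively. */
int count_nodes(const Node *n) {
    if (n == NULL) return 0;
    int total = 1;
    for (int i = 0; i < 3; i++)
        total += count_nodes(n->child[i]);
    return total;
}

/* Release a whole tree, children first. */
void free_tree(Node *n) {
    if (n == NULL) return;
    for (int i = 0; i < 3; i++)
        free_tree(n->child[i]);
    free(n);
}
```

Later phases (semantic analysis, code generation) can then be written as recursive walks over this tree, in the same shape as `count_nodes`.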
Back-end • Knowledge about target processor / virtual machine • Instruction set • 'costs' of different: • op-codes • instructions • Registers • Memory • Back-end components: Semantic analyser → Intermediate code generator → Code optimiser → Code generator, supported by the Symbol table manager and Error handler
Putting it together • A language-processing system: Skeletal source program → preprocessor → Source program → compiler → Target assembly program → assembler → Relocatable machine code → Loader / link-editor → Absolute machine code • Compiler: Source program → Lexical analyser → Syntax analyser → Semantic analyser → Intermediate code generator → Code optimiser → Code generator → Target assembly program, with the Symbol table and Error Handler used by the phases
Grammars • We define/describe HL languages with grammars • A Grammar consists of: • T, set of Terminals • N, set of Non-terminals • $N \cap T = \emptyset$ • P, set of Productions • $\alpha \to \beta$ • Where $\alpha$ and $\beta$ are members of $(T \cup N)^*$ • S, special member of N, the Start symbol • G = {T, N, P, S}
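As a small worked example of the four components, consider a grammar for sums of numbers. The terminal names num and + and the non-terminal E are chosen here purely for illustration.

```latex
\begin{aligned}
T &= \{\texttt{num},\ \texttt{+}\} \\
N &= \{E\} \\
P &= \{\, E \to E\ \texttt{+}\ \texttt{num},\quad E \to \texttt{num} \,\} \\
S &= E
\end{aligned}
```

Starting from S, the derivation $E \Rightarrow E\ \texttt{+}\ \texttt{num} \Rightarrow \texttt{num}\ \texttt{+}\ \texttt{num}$ shows that "num + num" belongs to the language of this grammar.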
Chomsky's Grammar Hierarchy • Type 0: Unrestricted Grammar • Type 1: Context-Sensitive Grammar • Type 2: Context Free Grammar • Type 3: Regular Grammar
Grammars • Type 0 (unrestricted) • $\alpha \to \beta$ • $\alpha$ and $\beta$ are unrestricted sequences, $\alpha$ is not null • languages formed from Type 0 grammars can be recognised by non-deterministic Turing machines • Type 1 (context sensitive) • $\alpha A \beta \to \alpha B \beta$ • A becomes B in the context of $\alpha$ … $\beta$ • Complex for computer analysis
Grammars • Type 2 (context free) • $A \to \alpha$ • A is a Non-terminal • $\alpha$ is a member of $(T \cup N)^*$ (can be empty) • Equivalent to a push-down automaton • Type 3 (regular) • $A \to wB$, $A \to w$ (right linear) • w is a string of Terminals • A and B are Non-Terminals • Finite state automata
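To make the link between a regular grammar and a finite state automaton concrete, here is a C sketch for the right-linear grammar Num → d Num, Num → d, where d is any digit. The grammar, the state names and the function `is_number` are illustrative assumptions, not part of the lecture.

```c
#include <ctype.h>

/* States of the automaton (hypothetical names for this sketch). */
typedef enum { STATE_START, STATE_IN_NUMBER, STATE_REJECT } State;

/* Returns 1 if s is a non-empty string of digits, 0 otherwise. */
int is_number(const char *s) {
    State state = STATE_START;
    /* One transition per input character; stop early once rejected. */
    for (; *s != '\0' && state != STATE_REJECT; s++)
        state = isdigit((unsigned char)*s) ? STATE_IN_NUMBER : STATE_REJECT;
    /* STATE_IN_NUMBER is the only accepting state. */
    return state == STATE_IN_NUMBER;
}
```

No stack or other memory beyond the current state is needed, which is what separates Type 3 from Type 2 recognisers.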
In a compiler • Use the minimum complexity grammars that let us successfully cope with HL programming languages (and process them efficiently) • Regular grammars (= regular expressions) in the Lexical Analysis phase • 'recognise the words' • Context-free grammars in the Syntax Analysis phase • 'recognise the phrases' • define our HLL as a grammar based on the output of the Lexical Analysis • Deal with context sensitivity in the Semantic Analysis phase
Overall Front-End View • Source program (text file) → Lexical Analyser (Flex, regular grammar) → tokens → Syntax Analyser (Bison, context-free grammar) → Tree structure → Semantic Analyser → Type-safe tree structure → Intermediate Representation (tree / linearized tree) → Back-end
The Textbook • Compilers: Principles, Techniques, and Tools • Aho, Sethi & Ullman • Addison-Wesley • 'The Dragon Book'
Assessment • Building a compiler for a new language • Front-end • Lexical analysis • Parsing • Back end • Generating assembler code • Some formal and some practical • Formal more at the front-end
Programming & Tools • Lexical analysis generator – lex / flex • Parser generator – yacc / bison • C / C++ • To implement the remainder of the compiler • Unix environment • make files will be useful for coordinating lex and yacc
Instant Compilation • Consider the program:
main() { int a = 3; a = a + 1; }
• Given a reasonably sensible assembly language, a hand-compilation might be:
LDA #3
STA 1
LDA 1
ADD a, #1
STA 1
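& an Instant Compiler could look like …
switch (source_code_construct) {
case INT_DEC:                         /* int a = <value> */
    printf("LDA #%d\n", INT.value);   /* load the constant */
    printf("STA 1\n");                /* store it at address 1 */
    break;
case INT_ADD:                         /* a = a + <value> */
    printf("LDA 1\n");
    printf("ADD a,#%d\n", ADD.value);
    printf("STA 1\n");
    break;
} /* end switch */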
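A self-contained C version of this idea is sketched below. Writing into a caller-supplied buffer, passing the value as a plain int (instead of INT.value / ADD.value), and the name `instant_compile` are assumptions made so the fragment compiles on its own.

```c
#include <stdio.h>

/* The two constructs the instant compiler knows about. */
typedef enum { INT_DEC, INT_ADD } Construct;

/*
 * Write the assembly for one construct into out (at most size bytes).
 * Returns what snprintf returns, or -1 for an unknown construct.
 * The single variable is hard-wired to memory address 1, as on the slide.
 */
int instant_compile(Construct construct, int value, char *out, size_t size) {
    switch (construct) {
    case INT_DEC:   /* int a = value; */
        return snprintf(out, size, "LDA #%d\nSTA 1\n", value);
    case INT_ADD:   /* a = a + value; */
        return snprintf(out, size, "LDA 1\nADD a,#%d\nSTA 1\n", value);
    }
    return -1;
}
```

Calling it with (INT_DEC, 3) and then (INT_ADD, 1) reproduces the five hand-compiled instructions for the example program, and it has exactly the limitations listed next.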
The Problems … • Not efficient: the five generated instructions could be reduced to LDA #4; STA 1 • Only works for 1 variable • Only works at one location in memory • (usually let assembler deal with symbolic addresses) • Only has 2 programming constructs! • Not even slightly portable: • 1 instruction set & 1 source language
More problems… • No error reporting • type checking? • Assumes: • Program is correct • Recognition of programming language constructs • int a = 3 → INT_DEC • Access to values • INT.value, ADD.value • 1:1 relationship between integers and memory locations
Solutions • We can view compilers as a solution to all of these problems • E.g. • Only compile correct programs to object code • Recognise all constructs in the language • Improve the efficiency of code • Execution speed • Memory usage • Meaningful error messages to the user • Cope with different target architectures
Why are compilers called compilers? • In early compilers one of the main tasks was connecting the object program to • standard library functions, I/O devices • collecting information from different sources (e.g. libraries) • OS and processor dependent • This is now performed by 'linkers' • Compile: 'construct by collecting from different sources'