Compiler Construction Vana Doufexi
Administrative info • Instructor • Name: Vana Doufexi • E-mail: vdoufexi@cs.northwestern.edu • Office: Ford Building, #2-229 • Hours: E-mail to set up appointment • Teaching Assistant • TBA
Administrative info • Course webpage • http://www.cs.northwestern.edu/academics/courses/322 • contains: • news • staff information • lecture notes & other handouts • homeworks & manuals • policies, grades • newsgroup info • useful links • Newsgroup • Name: cs.322 • nntp: news.cs.northwestern.edu
What is a compiler • A program that reads a program written in some language and translates it into a program written in some other language • Modula-2 to C • Java to bytecodes • COOL to MIPS code • How was the first compiler created?
Why study compilers? • Application of a wide range of theoretical techniques • Data Structures • Theory of Computation • Algorithms • Computer Architecture • Good SW engineering experience • Better understanding of programming languages
Features of compilers • Correctness • preserve the meaning of the code • Speed of target code • Speed of compilation • Good error reporting/handling • Cooperation with the debugger • Support for separate compilation
Compiler structure • Use an intermediate representation (IR) • Why? • [Figure: source code → Front End → IR → Back End → target code]
Compiler Structure • Front end • Recognize legal/illegal programs • report/handle errors • Generate IR • The process can be automated • Back end • Translate IR into target code • instruction selection • register allocation • instruction scheduling • many of these problems are NP-complete, so use approximations
Compiler Structure • Optimization • goals • improve running time of generated code • improve space, power consumption, etc. • how? • perform a number of transformations on the IR • multiple passes • important: preserve meaning of code
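As a rough illustration of a meaning-preserving transformation on the IR, the sketch below performs one pass of constant folding. The tuple-based tree format and the name `fold` are assumptions made for this example, not something the slides define.

```python
# Hypothetical IR for this sketch: nested tuples such as
#   ("num", 3), ("var", "x"), ("add", left, right), ("mul", left, right)

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    kind = node[0]
    if kind in ("num", "var"):
        return node  # leaves are already as simple as they can be
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known at compile time, so compute the result now.
        if kind == "add":
            return ("num", left[1] + right[1])
        if kind == "mul":
            return ("num", left[1] * right[1])
    return (kind, left, right)


tree = ("add", ("var", "x"), ("mul", ("num", 2), ("num", 3)))
folded = fold(tree)  # x + (2 * 3) becomes x + 6
```

The folded tree computes the same value as the original for every `x`, which is the "preserve meaning" requirement; a real optimizer would run many such passes over the IR.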
The Front End • Scanning (a.k.a. lexical analysis) • recognize "words" (tokens) • Parsing (a.k.a. syntax analysis) • check syntax • Semantic analysis • examine meaning (e.g. type checking) • Other issues: • symbol table (to keep track of identifiers) • error detection/reporting/recovery
The Scanner • Its job: • given a character stream, recognize words (tokens) • e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER • collect identifier information • e.g. IDENTIFIER corresponds to a lexeme (the actual word x) and its type (acquired from the declaration of x). • ignore white space and comments • report errors • Good news • the process can be automated
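The scanner's job can be sketched with a small hand-written tokenizer. This is a minimal sketch, assuming only the three token kinds from the `x = 1` example; the names `TOKEN_SPEC` and `scan` are hypothetical, and comments are not handled.

```python
import re

# Token kinds from the slide's example, plus white space and a catch-all.
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL",      r"="),
    ("SKIP",       r"[ \t\n]+"),   # white space is ignored
    ("ERROR",      r"."),          # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Turn a character stream into a list of (token kind, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


tokens = scan("x = 1")
```

Each token keeps its lexeme alongside its kind, so later phases can still see that the IDENTIFIER was the word `x`.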
The Parser • Its job: • Check and verify syntax based on specified syntax rules • e.g. IDENTIFIER LPAREN RPAREN make up an EXPRESSION. • Coming soon: how context-free grammars specify syntax • Report errors • Build IR • often a syntax tree • Good news • the process can be automated
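To make the parser's job concrete, here is a small recursive-descent sketch that checks a token sequence against syntax rules and builds a syntax tree. The grammar (`expression → term (PLUS term)*`, `term → INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN`), the `PLUS` token, and the class name `Parser` are assumptions chosen for illustration.

```python
class Parser:
    """Recursive-descent parser over a list of (token kind, lexeme) pairs."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        tree = self.expression()
        self.expect("EOF") if False else None  # placeholder branch never runs
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected {self.peek()} at token {self.pos}")
        return tree

    def expression(self):
        # expression -> term (PLUS term)*
        left = self.term()
        while self.peek() == "PLUS":
            self.expect("PLUS")
            left = ("add", left, self.term())
        return left

    def term(self):
        # term -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN
        if self.peek() == "INTEGER":
            return ("int", int(self.expect("INTEGER")[1]))
        name = self.expect("IDENTIFIER")[1]
        if self.peek() == "LPAREN":
            self.expect("LPAREN")
            self.expect("RPAREN")
            return ("call", name)
        return ("var", name)


call_plus_one = [("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")"),
                 ("PLUS", "+"), ("INTEGER", "1")]
tree = Parser(call_plus_one).parse()
```

A parser generator would produce equivalent code from the grammar alone, which is the sense in which the process can be automated.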
Semantic analysis • Its job: • Check the meaning of the program • e.g. In x=y, is y defined before being used? Are x and y declared? • e.g. In x=y, are the types of x and y such that you can assign one to the other? • Meaning may depend on context • Report errors
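The checks for `x=y` could look roughly like the sketch below. The symbol-table layout (a dict from names to type names), the set of already-defined names, and the widening rule from `int` to `float` are all assumptions made for this example.

```python
def assignable(target_type, source_type):
    """Hypothetical rule: same type, or widening an int into a float."""
    return target_type == source_type or (target_type, source_type) == ("float", "int")


def check_assignment(target, source, symbols, defined):
    """Return a list of error messages for the statement `target = source`."""
    errors = [f"{name} is not declared" for name in (target, source) if name not in symbols]
    if errors:
        return errors  # without declarations there are no types to compare
    if source not in defined:
        errors.append(f"{source} is used before being defined")
    if not assignable(symbols[target], symbols[source]):
        errors.append(f"cannot assign {symbols[source]} to {symbols[target]}")
    return errors


symbols = {"x": "float", "y": "int", "s": "string"}  # name -> declared type
defined = {"y"}                                      # names that already hold a value

ok = check_assignment("x", "y", symbols, defined)
bad = check_assignment("y", "s", symbols, defined)
```

Here `ok` is empty, while `bad` reports both a use before definition and a type mismatch.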
IRs • Graphical • e.g. parse tree, DAG • Linear • e.g. three-address code • Hybrid • e.g. linear for blocks of straight-line code, a graph to connect blocks • Low-level or high-level
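The difference between a graphical and a linear IR can be seen by lowering a small expression tree into three-address code. This is a sketch only; the tuple-based tree format, the temporary names `t1`, `t2`, and the function name `to_three_address` are assumptions.

```python
import itertools


def to_three_address(tree):
    """Flatten a tree of ("add"|"mul", left, right) nodes into three-address code."""
    code = []
    temps = (f"t{n}" for n in itertools.count(1))  # fresh temporaries t1, t2, ...

    def emit(node):
        kind = node[0]
        if kind in ("num", "var"):
            return str(node[1])  # leaves are used directly as operands
        left, right = emit(node[1]), emit(node[2])
        temp = next(temps)
        operator = {"add": "+", "mul": "*"}[kind]
        code.append(f"{temp} = {left} {operator} {right}")
        return temp

    result = emit(tree)
    return code, result


# Tree for a + b * 2
tree = ("add", ("var", "a"), ("mul", ("var", "b"), ("num", 2)))
code, result = to_three_address(tree)
```

Each emitted instruction has at most one operator and three names, which is where the term "three-address code" comes from.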
The scanning process • Main goal: recognize words • How? by recognizing patterns • e.g. an identifier is a sequence of letters or digits that starts with a letter. • Lexical patterns form a regular language • Regular languages are described using regular expressions (REs) • Can we create an automatic RE recognizer? • Yes! (Hold that thought)
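The identifier pattern above can be written directly as a regular expression. The sketch below uses Python's `re` module, restricted to ASCII letters and digits as an assumption; note that `re` is a backtracking matcher rather than the automaton-based recognizer the slides build toward.

```python
import re

# A letter, followed by any number of letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(word):
    """True if the whole word matches the identifier pattern."""
    return IDENTIFIER.fullmatch(word) is not None
```

For example, `is_identifier("x1")` holds while `is_identifier("1x")` does not, because the word must start with a letter.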
The scanning process • Definition: Regular expressions (over alphabet Σ) • ε is an RE denoting {ε} • If a ∈ Σ, then a is an RE denoting {a} • If r and s are REs, then • (r) is an RE denoting L(r) • r|s is an RE denoting L(r) ∪ L(s) • rs is an RE denoting L(r)L(s) • r* is an RE denoting the Kleene closure of L(r) • Property: REs are closed under many operations • This allows us to build complex REs.
The scanning process • Definition: Deterministic Finite Automaton • a five-tuple (Σ, S, δ, s0, F) where • Σ is the alphabet • S is the set of states • δ is the transition function (S × Σ → S) • s0 is the starting state • F is the set of final states (F ⊆ S) • Notation: • Use a transition diagram to describe a DFA • DFAs are equivalent to REs • Hey! We just came up with a recognizer!
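A DFA translates almost directly into a table-driven recognizer. The sketch below encodes a two-state DFA for identifiers (a letter followed by letters or digits); the state names, the character classes, and the convention that a missing table entry means "reject" are assumptions made for this example.

```python
def char_class(ch):
    """Map a character to an alphabet symbol of the DFA."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


START, IN_ID = "s0", "s1"

# Transition function as a table: (state, symbol) -> next state.
# Missing entries stand for an implicit dead state.
DELTA = {
    (START, "letter"): IN_ID,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
}
FINAL = {IN_ID}


def accepts(word):
    """Run the DFA over the word and report whether it ends in a final state."""
    state = START
    for ch in word:
        state = DELTA.get((state, char_class(ch)))
        if state is None:
            return False  # no transition defined: the word is rejected
    return state in FINAL
```

The loop does a single table lookup per input character, which is why DFA-based scanners are fast.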
The scanning process • Goal: automate the process • Idea: • Start with an RE • Build a DFA • How? • We can build a non-deterministic finite automaton (Thompson's construction) • Convert that to a deterministic one (Subset construction) • Minimize the DFA (Hopcroft's algorithm) • Implement it • Existing scanner generator: flex
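The NFA-to-DFA step can be sketched as follows. This is a minimal version of the subset construction, assuming the NFA is given as a dict from `(state, symbol)` to a set of states, with `None` standing for an ε-move; the example NFA for `a*b` is hand-built rather than produced by Thompson's construction, and no minimization is performed.

```python
EPSILON = None  # symbol used for epsilon-moves in this sketch


def epsilon_closure(states, nfa):
    """All NFA states reachable from `states` using only epsilon-moves."""
    stack, closure = list(states), set(states)
    while stack:
        state = stack.pop()
        for nxt in nfa.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    dfa_start = epsilon_closure({start}, nfa)
    dfa, seen, worklist = {}, {dfa_start}, [dfa_start]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            moved = {t for s in current for t in nfa.get((s, symbol), ())}
            if not moved:
                continue  # no transition on this symbol
            target = epsilon_closure(moved, nfa)
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_finals = {state for state in seen if state & finals}
    return dfa, dfa_start, dfa_finals


def run(dfa, start, finals, word):
    """Simulate the resulting DFA on a word."""
    state = start
    for symbol in word:
        state = dfa.get((state, symbol))
        if state is None:
            return False
    return state in finals


# Hand-built NFA for a*b: state 0 loops on a, an epsilon-move leads to 1, and b leads to 2.
nfa = {(0, "a"): {0}, (0, EPSILON): {1}, (1, "b"): {2}}
dfa, dfa_start, dfa_finals = subset_construction(nfa, 0, {2}, "ab")
```

For this NFA the construction yields two DFA states, `{0, 1}` and `{2}`; a generator such as flex performs these steps (plus minimization and code emission) automatically.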