CS 320: Compiling Techniques David Walker
People • David Walker (Professor) • 412 Computer Science Building • dpw@cs.princeton.edu • office hours: after each class • Dan Dantas (TA) • 417 Computer Science Building • ddantas@cs.princeton.edu • office hours: Mondays 2-3 PM
Information • Web site: • www.cs.princeton.edu/courses/archive/spring04/cos320/index.htm • Mailing list:
Books • Modern Compiler Implementation in ML • Andrew Appel • required • Elements of ML Programming • Jeffrey D. Ullman • also: online references; see Web site
Assignment 0 • Write your name and other information on the sheet circulating • Find, skim and bookmark the course web pages • Subscribe to course e-mail list • Begin assignment 1 • Read chapter 1 Appel • Figure out how to run & use SML • Due next Thursday 12
What is a compiler? • A compiler is a program that translates a program written in a source language into an equivalent program in a target language
What is a compiler? • C program: while (i > 3) { a[i] = b[i]; i++; } • compiler does this • assembly program: mov eax, ebx; add eax, 1; cmp eax, 3; jcc eax, edx
What is a compiler? • Java program: class foo { int bar; ... } • compiler does this • C program: struct foo { int bar; ... }
What is a compiler? • Java program: class foo { int bar; ... } • compiler does this • Java virtual machine program: ...
What is a compiler? • LaTeX program: \newcommand{ .... } • compiler does this • TeX program: \sfd\sf\fadg
What is a compiler? • TeX program: \newcommand{ .... } • compiler does this • PostScript program: \sfd\sf\fadg
What is a compiler? • Other places: • Web scripts are compiled into HTML • assembly language is compiled into machine language • hardware description language is compiled into a hardware circuit • ...
Compilers are complex • front-end: text file to abstract syntax (lexing; parsing) • middle-end: abstract syntax to intermediate form (IR) (analysis; optimizations; data layout) • back-end: IR to machine code (code generation; register allocation)
Course project • front-end: Tiger Source Language (simple imperative language) • middle-end: Instruction Trees as intermediate form (IR) (type checking; data layout on the stack) • back-end: Code Generation (instruction selection algorithms; register allocation via graph coloring)
Standard ML • Standard ML is a general-purpose functional language that is particularly well suited to building compilers • Support for • Complex data structures (abstract syntax, compiler intermediate forms) • Automatic memory management (garbage collection), as in Java • Large projects with many modules • Advanced type system for error detection
Introduction to ML • You will be responsible for learning ML on your own. • Today I will cover some basics • Resources: • Jeffrey Ullman “Elements of ML Programming” • Robert Harper’s “an introduction to ML” • See course webpage for pointers and info about how to get the software
Intro to ML • Highlights • Data Structures for compilers • Data type definitions • Pattern matching • Strongly-typed language • Every expression has a type • Certain errors cannot occur • Polymorphic types provide flexibility • Flexible Module System • Abstract Types • Higher-order modules (functors)
Intro to ML • Interactive Language • Type in expressions • Evaluate and print type and result • Compiler as well • High-level programming features • Data types • Pattern matching • Exceptions • Mutable data discouraged
Preliminaries • Read – Eval – Print – Loop
- 3 + 2;
> 5 : int
- it + 7;
> 12 : int
- it - 3;
> 9 : int
- 4 + true;
stdIn:17.1-17.9 Error: operator and operand don't agree [literal]
  operator domain: int * int
  operand:         int * bool
  in expression:
    4 + true
Preliminaries • Read – Eval – Print – Loop - 3 div 0; Failure : Div - run-time error
Basic Values
- ();
> () : unit        => like "void" in C (sort of); the uninteresting value/type
- true;
> true : bool
- false;
> false : bool
- if it then 3+2 else 7;        => the "else" clause is always necessary
> 7 : int
- false andalso loop_Forever;        => andalso, orelse: short-circuit evaluation
> false : bool
Basic Values
Integers
- 3 + 2;
> 5 : int
- 3 + (if not true then 5 else 7);        => no distinction between expressions and statements
> 10 : int
Strings
- "Dave" ^ " " ^ "Walker";
> "Dave Walker" : string
- print "foo\n";
foo
> () : unit
Reals
- 3.14;
> 3.14 : real
Using SML/NJ • Interactive mode is a good way to start learning and to debug programs, but… • Type a series of declarations into a ".sml" file • - use "foo.sml"; [opening foo.sml] … list of declarations with their types
Larger Projects • SML/NJ has its own built-in interactive "make" • Pros: • It automatically does the dependency analysis for you • No crazy makefile syntax to learn • Cons: • May be more difficult to interact with other languages or tools
Compilation Manager • files: a.sig, b.sml, c.sml • sources.cm contains: Group is a.sig b.sml c.sml • % sml • - OS.FileSys.chDir "~/courses/510/a2"; • - CM.make(); looks for "sources.cm", analyzes dependencies • [compiling…] compiles files in group • [wrote…] saves binaries in ./CM/ • - CM.make'"myproj/"(); specify directory
What is next? • ML has a rich set of structured values • Tuples: (17, true, "stuff") • Records: {name = "Dave", ssn = 332177} • Lists: 3::4::5::nil or [3,4]@[5] • Datatypes • Functions • And more! • Rather than list all the details, we will write a couple of programs
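The structured values listed above can be tried directly at the prompt. The following is a minimal sketch rather than course material: the names t, r, xs, ys, who, sum and total are made up for illustration.

```sml
(* A tuple, a record, and two ways of building the same list. *)
val t = (17, true, "stuff")              (* int * bool * string *)
val r = {name = "Dave", ssn = 332177}    (* {name : string, ssn : int} *)
val xs = 3 :: 4 :: 5 :: nil
val ys = [3, 4] @ [5]

(* Components are extracted by pattern matching or with a field selector. *)
val (n, b, s) = t
val who = #name r

(* A recursive function over lists, defined by cases on the list structure. *)
fun sum [] = 0
  | sum (x :: rest) = x + sum rest

val total = sum xs
```

Both xs and ys denote the list [3,4,5], so comparing them with = yields true.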
An interpreter • Interpreters are usually implemented as a series of transformers: • stream of characters -> [lexing/parsing] -> abstract syntax -> [evaluate] -> abstract value -> [print] -> stream of characters
A little language (LL) • An arithmetic expression e is one of • a boolean value • an if expression (if e1 then e2 else e3) • an integer • an add operation (e1 + e2) • a test for zero (isZero e)
LL abstract syntax in ML
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term
-- constructors are capitalized
-- constructors can take a single argument of a particular type
-- term * term * term is the type of a tuple (another e.g.: string * char)
-- the vertical bar separates alternatives
LL abstract syntax in ML • Add (Num 2, Num 3) represents the expression "2 + 3" • [Tree diagram: an Add node whose two children are Num 2 and Num 3]
LL abstract syntax in ML • If (Bool true, Num 0, Add (Num 2, Num 3)) represents "if true then 0 else 2 + 3" • [Tree diagram: an If node with children Bool true, Num 0, and an Add node whose children are Num 2 and Num 3]
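To make the tree reading of these values concrete, here is a small sketch that builds the two example terms and counts their nodes. The function nodeCount and the names e1 and e2 are hypothetical helpers introduced only for this illustration.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* The two example terms from the slides. *)
val e1 = Add (Num 2, Num 3)
val e2 = If (Bool true, Num 0, e1)

(* nodeCount (hypothetical): one case per constructor, recursing on subterms. *)
fun nodeCount t =
  case t of
      Bool _ => 1
    | Num _ => 1
    | If (a, b, c) => 1 + nodeCount a + nodeCount b + nodeCount c
    | Add (a, b) => 1 + nodeCount a + nodeCount b
    | IsZero a => 1 + nodeCount a

val n1 = nodeCount e1
val n2 = nodeCount e2
```

Each constructor application corresponds to one node of the tree, so e1 has three nodes and e2 has six.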
Function declarations
fun isValue t =
  case t of
    Num n => true
  | Bool b => true
  | _ => false
-- isValue is the function name; t is the function parameter
-- the default pattern _ matches anything
What is the type of the parameter t? Of the function?
fun isValue t =
  case t of
    Num n => true
  | Bool b => true
  | _ => false
What is the type of the parameter t? Of the function?
fun isValue (t:term) : bool =
  case t of
    Num n => true
  | Bool b => true
  | _ => false
val isValue : term -> bool
ML does type inference => you need not annotate functions yourself (but it can be helpful)
A type error
fun isValue t =
  case t of
    Num _ => 1
  | _ => false
ex.sml:22.3-24.15 Error: types of rules don't agree [literal]
  earlier rule(s): term -> int
  this rule: term -> bool
  in rule:
    _ => false
A type error Actually, ML will give you several errors in a row: ex.sml:22.3-25.15 Error: types of rules don't agree [literal] earlier rule(s): term -> int this rule: term -> bool in rule: Successor t2 => true ex.sml:22.3-25.15 Error: types of rules don't agree [literal] earlier rule(s): term -> int this rule: term -> bool in rule: _ => false
A very subtle error
fun isValue t =
  case t of
    num => true
  | _ => false
The code above type checks. But when we test it, the function always returns true. What has gone wrong?
A very subtle error • fun isValue t = case t of num => true | _ => false • num is not capitalized (and has no argument) • ML treats it like a variable pattern (matches anything!)
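The difference between a variable pattern and a constructor pattern can be seen side by side. This is a sketch under stated assumptions: alwaysTrue and isNum are hypothetical names, and the buggy version is written with a single rule because some compilers may reject or warn about the unreachable second rule as a redundant match.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* Pitfall: lowercase num is a fresh variable, so this rule matches every term. *)
fun alwaysTrue t =
  case t of
      num => true

(* Intended: Num _ is a constructor pattern and matches only integer terms. *)
fun isNum t =
  case t of
      Num _ => true
    | _ => false

val r1 = alwaysTrue (Bool false)
val r2 = isNum (Bool false)
val r3 = isNum (Num 7)
```

Here r1 is true even though the argument is not a number, while r2 is false and r3 is true.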
Exceptions
exception Error of string
fun debug s : unit = raise (Error s)
in SML interpreter:
- debug "hello";
uncaught exception Error
  raised at: ex.sml:15.28-15.35
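An exception does not have to reach the top level: the handle construct catches it and pattern matches on the value it carries. The sketch below reuses Error and debug from the slide; the binding msg and the result strings are made up for illustration.

```sml
exception Error of string

fun debug (s : string) : unit = raise (Error s)

(* handle intercepts the exception raised inside the parenthesized expression
   and binds the carried string to s. *)
val msg =
  (debug "hello"; "no exception")
  handle Error s => "caught: " ^ s
```

Both branches must have the same type (string here), just like the arms of a case expression.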
Evaluator
fun isValue t = ...
exception NoRule
fun eval t =
  case t of
    (Bool _ | Num _) => t
  | ...
Evaluator
...
fun eval t =
  case t of
    (Bool _ | Num _) => t
  | If (t1,t2,t3) =>
      let val v = eval t1 in
        case v of
          Bool b => if b then (eval t2) else (eval t3)
        | _ => raise NoRule
      end
-- a let expression remembers temporary results
Evaluator
exception NoRule
fun eval1 t =
  case t of
    (Bool _ | Num _) => ...
  | ...
  | Add (t1,t2) =>
      (case (eval1 t1, eval1 t2) of
         (Num n1, Num n2) => Num (n1 + n2)
       | (_,_) => raise NoRule)
Finishing the Evaluator
fun eval1 t =
  case t of
    ...
  | ...
  | Add (t1,t2) => ...
  | IsZero t => ...
-- be sure your case is exhaustive
Finishing the Evaluator
fun eval1 t =
  case t of
    ...
  | ...
  | Add (t1,t2) => ...
What if we forgot a case?
ex.sml:25.2-35.12 Warning: match nonexhaustive
  (Bool _ | Zero) => ...
  If (t1,t2,t3) => ...
  Add (t1,t2) => ...
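Putting the pieces together, one possible complete evaluator might look like the sketch below. It is an illustration, not the course's reference solution: the slides elide the IsZero case, so the rule used here (evaluate the argument and compare it with 0) is an assumption. The function is called eval, and inner case expressions are parenthesized so that later rules are not absorbed into them.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

exception NoRule

fun eval t =
  case t of
      Bool _ => t
    | Num _ => t
    | If (t1, t2, t3) =>
        (case eval t1 of
             Bool b => if b then eval t2 else eval t3
           | _ => raise NoRule)
    | Add (t1, t2) =>
        (case (eval t1, eval t2) of
             (Num n1, Num n2) => Num (n1 + n2)
           | _ => raise NoRule)
    | IsZero t1 =>
        (* Assumed semantics: IsZero yields Bool true exactly when t1 evaluates to Num 0. *)
        (case eval t1 of
             Num n => Bool (n = 0)
           | _ => raise NoRule)

val v1 = eval (If (Bool true, Num 0, Add (Num 2, Num 3)))
val v2 = eval (IsZero (Add (Num 2, Num ~2)))
val v3 = (eval (Add (Num 1, Bool true)); "ok") handle NoRule => "stuck"
```

Because every constructor of term has a rule, the compiler has no nonexhaustive match to report for the outer case; ill-formed programs such as adding a number to a boolean raise NoRule instead.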