Types, Type Inference and Unification

Mooly Sagiv. Slides by Kathleen Fisher and John Mitchell. Cornell CS 6110

Summary (Functional Programming) • Lambda Calculus • Basic ML • Advanced ML: Modules, References, Side-effects • Closures and Scopes • Type Inference and Type Checking

Outline • General discussion of types • What is a type? • Compile-time versus run-time checking • Conservative program analysis • Type inference • Discuss algorithm and examples • Illustrative example of static analysis algorithm • Polymorphism • Uniform versus non-uniform implementations

Language Goals and Trade-offs • Thoughts to keep in mind • What features are convenient for the programmer? • What other features do they prevent? • What are the design tradeoffs? • Easy to write but harder to read? • Easy to write but poorer error messages? • What are the implementation costs? [Figure: a programming language and its stakeholders: architect, programmer, Q/A tester, compiler and runtime environment, diagnostic tools]

What is a type? • A type is a collection of computable values that share some structural property. • Examples: int; string; int -> bool; (int -> int) -> bool; [a] -> a; [a] * a -> [a] • Non-examples: {3, True, \x->x}; even integers; {f : int -> int | x > 3 => f(x) > x * (x+1)} • The distinction between sets of values that are types and sets that are not types is language dependent

Advantages of Types • Program organization and documentation • Separate types for separate concepts • Represent concepts from problem domain • Document intended use of declared identifiers • Types can be checked, unlike program comments • Identify and prevent errors • Compile-time or run-time checking can prevent meaningless computations such as 3 + true – “Bill” • Support optimization • Example: short integers require fewer bits • Access components of structures by known offset

What is a type error? • Whatever the compiler/interpreter says it is? • Something to do with bad bit sequences? • Floating point representation has specific form • An integer may not be a valid float • Something about programmer intent and use? • A type error occurs when a value is used in a way that is inconsistent with its definition • Example: declare as character, use as integer

Type errors are language dependent • Array out of bounds access • C/C++: run-time errors • OCaml/Java: dynamic type errors • Null pointer dereference • C/C++: run-time errors • OCaml: pointers are hidden inside datatypes • Null pointer dereferences would be incorrect use of these datatypes, therefore static type errors

Compile-time vs Run-time Checking • JavaScript and Lisp use run-time type checking • f(x): make sure f is a function before calling f • OCaml and Java use compile-time type checking • f(x): must have f : A -> B and x : A • Basic tradeoff • Both kinds of checking prevent type errors • Run-time checking slows down execution • Compile-time checking restricts program flexibility • JavaScript array: elements can have different types • OCaml list: all elements must have the same type • Which gives better programmer diagnostics?

Expressiveness • In JavaScript, we can write a function like: function f(x) { return x < 10 ? x : x(); } • Some uses will produce a type error, some will not • Static typing is always conservative: if (complicated-boolean-expression) then f(5); else f(15); • Cannot decide at compile time whether a run-time error will occur!

Type Safety • A type-safe programming language protects its own abstractions • Type-safe programs cannot go wrong • No run-time errors • But exceptions are fine • The small-step semantics cannot get stuck • Type safety is proven at language design time

Relative Type-Safety of Languages • Not safe: BCPL family, including C and C++ • Casts, unions, pointer arithmetic • Almost safe: Algol family, Pascal, Ada • Dangling pointers • Allocate a pointer p to an integer, deallocate the memory referenced by p, then later use the value pointed to by p • Hard to make languages with explicit deallocation of memory fully type-safe • Safe: Lisp, Smalltalk, ML, Haskell, Java, JavaScript • Dynamically typed: Lisp, Smalltalk, JavaScript • Statically typed: OCaml, Haskell, Java, Rust If code accesses data, it is handled with the type associated with the creation and previous manipulation of that data

Type Checking vs Type Inference • Standard type checking: • Examine body of each function • Use declared types to check agreement • Type inference: • Examine code without type information • Infer the most general types that could have been declared • Example: int f(int x) { return x+1; }; int g(int y) { return f(y+1)*2; }; • ML and Haskell are designed to make type inference feasible

The Type Inference Problem • Input: A program without types (e.g., Lambda calculus) • Output: A program with type for every expression (e.g., typed Lambda calculus) • Every expression is annotated with its most general type

Why study type inference? • Types and type checking • Improved steadily since Algol 60 • Eliminated sources of unsoundness • Become substantially more expressive • Important for modularity, reliability and compilation • Type inference • Reduces syntactic overhead of expressive types • Guaranteed to produce most general type • Widely regarded as important language innovation • Illustrative example of a flow-insensitive static analysis algorithm

History • Original type inference algorithm • Invented by Haskell Curry and Robert Feys for the simply typed lambda calculus in 1958 • In 1969, Hindley extended the algorithm to a richer language and proved it always produced the most general type • In 1978, Milner independently developed an equivalent algorithm, called algorithm W, during his work designing ML • In 1982, Damas proved the algorithm was complete • Currently used in many languages: ML, Ada, Haskell, C# 3.0, F#, Visual Basic .Net 9.0. There have been plans for Fortress, Perl 6, C++0x, …

Type Inference: Basic Idea • Example: fun x -> 2 + x • What is the type of the expression? • + has type: int -> int -> int • 2 has type: int • Since we are applying + to x we need x : int • Therefore fun x -> 2 + x has type int -> int • Result: - : int -> int = <fun>

Imperative Example • x := b[z] • a[b[y]] := x

Type Inference: Basic Idea • Example: fun f -> f 3 • What is the type of the expression? • 3 has type: int • Since we are applying f to 3 we need f : int -> a, and the result is of type a • Therefore fun f -> f 3 has type (int -> a) -> a • Result: - : (int -> a) -> a = <fun>

Type Inference: Basic Idea • Example: fun f -> f (f 3) • What is the type of the expression? • Result: - : (int -> int) -> int = <fun>

Type Inference: Basic Idea • Example: fun f -> f (f “hi”) • What is the type of the expression? • Result: - : (string -> string) -> string = <fun>

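Type Inference: Basic Idea • Example: fun f -> f (f 3, f 4) • What is the type of the expression? • Type error: f is applied to an int (in f 3 and f 4) and to a pair, and int cannot be unified with a pair type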
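The basic-idea examples can be tried directly in OCaml. The following is a minimal sketch: the names add2, apply3, twice3 and twice_hi are made up for illustration, and each annotated `let _` line compiles only if the inferred type is compatible with the type derived on the slides.

```ocaml
(* Hypothetical names; each definition mirrors one slide example. *)
let add2 = fun x -> 2 + x           (* int -> int *)
let apply3 = fun f -> f 3           (* (int -> 'a) -> 'a *)
let twice3 = fun f -> f (f 3)       (* (int -> int) -> int *)
let twice_hi = fun f -> f (f "hi")  (* (string -> string) -> string *)

(* The compiler checks each annotation against the inferred type. *)
let _ : int -> int = add2
let _ : (int -> bool) -> bool = apply3      (* 'a instantiated to bool *)
let _ : (int -> string) -> string = apply3  (* 'a instantiated to string *)
let _ : (int -> int) -> int = twice3
let _ : (string -> string) -> string = twice_hi

(* fun f -> f (f 3, f 4) is rejected by the type checker:
   f would need both an int and a pair as its argument. *)
```

Because apply3 is accepted at two different instances, its result type really is an unconstrained variable rather than a fixed type.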

Type Inference: Complex Example • let square = λz. z * z in λf. λx. λy. if (f x y) then (f (square x) y) else (f x (f x y)) • * : int -> (int -> int) • z : int • square : int -> int • f : a -> (b -> bool), x : a, y : b • a = int, b = bool • Result: (int -> bool -> bool) -> int -> bool -> bool
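The same term can be written in OCaml to confirm the derivation. This is only a sketch; `complex` is a hypothetical name for the anonymous function on the slide.

```ocaml
let square = fun z -> z * z   (* int -> int *)

(* `complex` is a hypothetical name for the slide's anonymous function. *)
let complex = fun f x y ->
  if f x y then f (square x) y else f x (f x y)

(* Compiles only if the inferred type matches the one derived above. *)
let _ : (int -> bool -> bool) -> int -> bool -> bool = complex
```

The use of square x forces x : int, and passing f x y as the last argument of f forces y : bool.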

Unification • Unifies two terms • Used for pattern matching and type inference • Simple examples • int * x and y * (bool * bool) are unifiable for y = int and x = (bool * bool) • int * int and int * bool are not unifiable

Substitution • Types: <type> ::= int | float | bool | … | <type> -> <type> | <type> * <type> | variable • Terms: <term> ::= constant | variable | f(<term>, …, <term>) • The essential task of unification is to find a substitution that makes the two terms equal • Applying a substitution: f(x, h(x, y)) {x ↦ g(y), y ↦ z} = f(g(y), h(g(y), z)) • The terms t1 and t2 are unifiable if there exists a substitution S such that t1 S = t2 S • Example: t1 = f(x, g(y)), t2 = f(g(z), w)

Most General Unifiers (mgu) • A unifier for two given terms may not exist • For example, x and f(x) cannot be unified • There may be several unifiers • Example: t1 = f(x, g(y)), t2 = f(g(z), w) • S = {x ↦ g(z), w ↦ g(y)} • S’ = {x ↦ g(f(a, b)), y ↦ f(b, a), z ↦ f(a, b), w ↦ g(f(b, a))} • When a unifier exists, there is always a most general unifier (mgu) that is unique up to renaming • S is the most general unifier of t1 and t2 if • It is a unifier of t1 and t2 • Every other unifier S’ of t1 and t2 can be obtained by refining S (applying a further substitution after S) • mgu can be efficiently computed • mgu(f(x, g(y)), f(g(z), w)) = {x ↦ g(z), w ↦ g(y)} • mgu({y ↦ g(w)}, f(x, g(y)), f(g(z), w)) = {y ↦ g(w), x ↦ g(z), w ↦ g(g(w))}
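Unification of first-order terms can be sketched in a few lines of OCaml. The code below is a simple textbook-style version with an occurs check, not the efficient algorithm the slide alludes to. The names term, subst, apply, occurs and unify are assumptions made for this example, and the substitution is stored as an association list whose bindings are resolved on lookup.

```ocaml
(* Terms: variables and function symbols applied to arguments.
   A constant is a function symbol with no arguments. *)
type term =
  | Var of string
  | App of string * term list

(* A substitution as an association list; bindings may mention
   other bound variables and are resolved recursively by apply. *)
type subst = (string * term) list

let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x ->
      (match List.assoc_opt x s with
       | Some u -> apply s u
       | None -> t)
  | App (f, args) -> App (f, List.map (apply s) args)

(* Occurs check: does variable x appear in t? *)
let rec occurs (x : string) (t : term) : bool =
  match t with
  | Var y -> x = y
  | App (_, args) -> List.exists (occurs x) args

(* Extend s to a unifier of t1 and t2, or return None. *)
let rec unify (s : subst) (t1 : term) (t2 : term) : subst option =
  match apply s t1, apply s t2 with
  | Var x, Var y when x = y -> Some s
  | Var x, t | t, Var x ->
      if occurs x t then None else Some ((x, t) :: s)
  | App (f, fs), App (g, gs) ->
      if f <> g || List.length fs <> List.length gs then None
      else
        List.fold_left2
          (fun acc a b ->
            match acc with
            | None -> None
            | Some s' -> unify s' a b)
          (Some s) fs gs

(* The slide's example: t1 = f(x, g(y)), t2 = f(g(z), w). *)
let t1 = App ("f", [Var "x"; App ("g", [Var "y"])])
let t2 = App ("f", [App ("g", [Var "z"]); Var "w"])
let mgu_example = unify [] t1 t2
```

On the slide's example this sketch binds x to g(z) and w to g(y), and it rejects unifying x with f(x) because of the occurs check.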

Type Inference with mgu • Example: fun f -> f (f “hi”) • What is the type of the expression? • Result: - : (string -> string) -> string = <fun> • Annotated term: λf:T1. apply(f:T1, apply(f:T1, “hi”:string):T2):T3 • mgu(T1, string -> T2) = {T1 ↦ string -> T2} = S • mgu(S, T1, T2 -> T3) = {T1 ↦ string -> T2, T2 ↦ string, T3 ↦ string}

Type Inference Algorithm • Parse program to build parse tree • Assign type variables to nodes in tree • Generate constraints: • From environment: literals (2), built-in operators (+), known functions (tail) • From form of parse tree: e.g., application and abstraction nodes • Solve constraints using unification • Determine types of top-level declarations

Step 1: Parse Program • Parse program text to construct parse tree • Example: let f x = 2 + x • Infix operators are converted to curried function application during parsing (not necessary): 2 + x becomes (+) 2 x

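Step 2: Assign type variables to nodes • Example: let f x = 2 + x • Variables are given the same type as their binding occurrence • [Parse tree: f : t_0 with parameter x : t_1 and body @ : t_6; the body applies the node @ : t_4 to x : t_1; the node @ : t_4 applies (+) : t_2 to 2 : t_3]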

Step 3: Add Constraints • Example: let f x = 2 + x • t_0 = t_1 -> t_6 • t_4 = t_1 -> t_6 • t_2 = t_3 -> t_4 • t_2 = int -> (int -> int) • t_3 = int

Step 4: Solve Constraints • Start: t_0 = t_1 -> t_6, t_4 = t_1 -> t_6, t_2 = t_3 -> t_4, t_2 = int -> (int -> int), t_3 = int • Unify the two equations for t_2: t_3 -> t_4 = int -> (int -> int), so t_3 = int and t_4 = int -> int • Now: t_0 = t_1 -> t_6, t_4 = t_1 -> t_6, t_4 = int -> int, t_2 = int -> (int -> int), t_3 = int • Unify the two equations for t_4: t_1 -> t_6 = int -> int, so t_1 = int and t_6 = int • Result: t_0 = int -> int, t_1 = int, t_6 = int, t_4 = int -> int, t_2 = int -> (int -> int), t_3 = int

Step 5: Determine type of declaration • t_0 = int -> int • t_1 = int • t_6 = int • t_4 = int -> int • t_2 = int -> int -> int • t_3 = int • let f x = 2 + x • val f : int -> int = <fun>

Constraints from Application Nodes • Function application f x (apply f to x) • Type of f (t_0 in figure) must be domain -> range • Domain of f must be type of argument x (t_1 in fig) • Range of f must be result of application (t_2 in fig) • Constraint: t_0 = t_1 -> t_2

Constraints from Abstractions • Function declaration: let f x = e • Type of f (t_0 in figure) must be domain -> range • Domain is type of abstracted variable x (t_1 in fig) • Range is type of function body e (t_2 in fig) • Constraint: t_0 = t_1 -> t_2
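The steps of the algorithm (assign type variables, generate constraints from application and abstraction nodes, solve by unification) can be sketched for a tiny lambda language. This is an illustrative sketch under stated assumptions, not the implementation used by any ML compiler: the types ty and expr and the functions fresh, gen, solve and infer are made-up names, the only built-in is (+), and an unbound variable simply raises Not_found.

```ocaml
type ty = TInt | TVar of int | TArrow of ty * ty

type expr =
  | Lit of int
  | Var of string
  | Lam of string * expr   (* fun x -> e *)
  | App of expr * expr     (* e1 e2 *)

(* Step 2: fresh type variables for nodes. *)
let counter = ref 0
let fresh () = incr counter; TVar !counter

(* Step 3: return the type of a node and the constraints below it. *)
let rec gen env e : ty * (ty * ty) list =
  match e with
  | Lit _ -> (TInt, [])
  | Var x -> (List.assoc x env, [])   (* raises Not_found if unbound *)
  | Lam (x, body) ->
      let tx = fresh () in
      let tb, cs = gen ((x, tx) :: env) body in
      (TArrow (tx, tb), cs)
  | App (f, a) ->
      let tf, c1 = gen env f in
      let ta, c2 = gen env a in
      let tr = fresh () in
      (* application rule: type of f = type of argument -> result *)
      (tr, (tf, TArrow (ta, tr)) :: (c1 @ c2))

(* Apply the bindings found so far to a type. *)
let rec subst_ty s t =
  match t with
  | TInt -> TInt
  | TVar n ->
      (match List.assoc_opt n s with
       | Some u -> subst_ty s u
       | None -> t)
  | TArrow (a, b) -> TArrow (subst_ty s a, subst_ty s b)

let rec occurs n = function
  | TInt -> false
  | TVar m -> n = m
  | TArrow (a, b) -> occurs n a || occurs n b

(* Step 4: solve the constraints by unification. *)
let rec solve s = function
  | [] -> Some s
  | (t1, t2) :: rest ->
      (match subst_ty s t1, subst_ty s t2 with
       | TInt, TInt -> solve s rest
       | TVar n, TVar m when n = m -> solve s rest
       | TVar n, t | t, TVar n ->
           if occurs n t then None else solve ((n, t) :: s) rest
       | TArrow (a1, b1), TArrow (a2, b2) ->
           solve s ((a1, a2) :: (b1, b2) :: rest)
       | _ -> None)

(* Step 5: type of the whole expression, or None on a type error. *)
let infer e =
  let env = [ ("+", TArrow (TInt, TArrow (TInt, TInt))) ] in
  let t, cs = gen env e in
  match solve [] cs with
  | Some s -> Some (subst_ty s t)
  | None -> None

(* fun x -> (+) 2 x  and  fun g -> g 2 *)
let plus2 = Lam ("x", App (App (Var "+", Lit 2), Var "x"))
let apply2 = Lam ("g", App (Var "g", Lit 2))
```

Under these assumptions, infer plus2 yields int -> int, and infer apply2 yields a type of the shape (int -> t) -> t where t is a type variable left unconstrained.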

Inferring Polymorphic Types • Example: let f g = g 2 • Result: val f : (int -> t_4) -> t_4 = <fun> • Step 1: Build Parse Tree

Inferring Polymorphic Types • Example: let f g = g 2 • Result: val f : (int -> t_4) -> t_4 = <fun> • Step 2: Assign type variables

Inferring Polymorphic Types • Example: let f g = g 2 • Result: val f : (int -> t_4) -> t_4 = <fun> • Step 3: Generate constraints • t_0 = t_1 -> t_4 • t_1 = t_3 -> t_4 • t_3 = int

Inferring Polymorphic Types • Example: let f g = g 2 • Result: val f : (int -> t_4) -> t_4 = <fun> • Step 4: Solve constraints • Before: t_0 = t_1 -> t_4, t_1 = t_3 -> t_4, t_3 = int • After: t_0 = (int -> t_4) -> t_4, t_1 = int -> t_4, t_3 = int

Inferring Polymorphic Types • Example: let f g = g 2 • Result: val f : (int -> t_4) -> t_4 = <fun> • Step 5: Determine type of top-level declaration • t_0 = (int -> t_4) -> t_4, t_1 = int -> t_4, t_3 = int • Unconstrained type variables become polymorphic types

Using Polymorphic Functions • Function: let f g = g 2 gives val f : (int -> t_4) -> t_4 = <fun> • Possible applications: • let add x = 2 + x gives val add : int -> int = <fun> • f add gives - : int = 4 • let isEven x = x mod 2 = 0 gives val isEven : int -> bool = <fun> • f isEven gives - : bool = true

Recognizing Type Errors • Function: let f g = g 2 gives val f : (int -> t_4) -> t_4 = <fun> • Incorrect use: let not x = if x then false else true gives val not : bool -> bool = <fun>, then f not • Type error: cannot unify bool -> bool and int -> t • Error: operator and operand don’t agree; operator domain: int -> a; operand: bool -> bool
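The polymorphic function and its uses can be checked in OCaml. This is a small sketch: is_even and negate are illustrative names (negate stands in for the slide's not, to avoid shadowing the built-in).

```ocaml
let f g = g 2                               (* (int -> 'a) -> 'a *)
let add x = 2 + x                           (* int -> int *)
let is_even x = x mod 2 = 0                 (* int -> bool *)
let negate x = if x then false else true    (* bool -> bool *)

let r1 = f add        (* 'a instantiated to int *)
let r2 = f is_even    (* 'a instantiated to bool *)

(* f negate does not type-check:
   bool -> bool cannot be unified with int -> 'a. *)
```

The same f is used at two different result types, which is possible only because t_4 was left unconstrained.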

Another Example • Example: let f (g,x) = g (g x) • Result: val f : ((t_8 -> t_8) * t_8) -> t_8 • Step 1: Build Parse Tree

Another Example • Example: let f (g,x) = g (g x) • Result: val f : ((t_8 -> t_8) * t_8) -> t_8 • Step 2: Assign type variables

Another Example • Example: let f (g,x) = g (g x) • Result: val f : ((t_8 -> t_8) * t_8) -> t_8 • Step 3: Generate constraints • t_0 = t_3 -> t_8 • t_3 = t_1 * t_2 • t_1 = t_7 -> t_8 • t_1 = t_2 -> t_7

Another Example • Example: let f (g,x) = g (g x) • Result: val f : ((t_8 -> t_8) * t_8) -> t_8 • Step 4: Solve constraints • Before: t_0 = t_3 -> t_8, t_3 = t_1 * t_2, t_1 = t_7 -> t_8, t_1 = t_2 -> t_7 • The two equations for t_1 give t_7 = t_2 and t_8 = t_7 • After: t_0 = ((t_8 -> t_8) * t_8) -> t_8

Another Example • Example: let f (g,x) = g (g x) • Result: val f : ((t_8 -> t_8) * t_8) -> t_8 • Step 5: Determine type of f • t_0 = t_3 -> t_8, t_3 = t_1 * t_2, t_1 = t_7 -> t_8, t_1 = t_2 -> t_7 • t_0 = ((t_8 -> t_8) * t_8) -> t_8
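As a quick check of this result, the same declaration can be given to OCaml. The annotated `let _` lines are an illustrative device: they compile only if the inferred type can be instantiated as written.

```ocaml
let f (g, x) = g (g x)   (* ('a -> 'a) * 'a -> 'a *)

(* Two instances of the polymorphic type. *)
let _ : (int -> int) * int -> int = f
let _ : (string -> string) * string -> string = f
```

Because g is applied to its own result, its domain and range are forced to be the same type, which is why a single variable t_8 appears everywhere.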

Pattern Matching • Example: let isempty l = match l with | [] -> true | _ -> false • Matching with multiple cases • Infer type of each case • First case: [t_1] -> bool • Second case: t_2 -> bool • Combine by unification of the types of the cases • Result: val isempty : [t_1] -> bool = <fun>

Bad Pattern Matching • Example: let isempty l = match l with | [] -> true | _ -> 0 • Matching with multiple cases • Infer type of each case • First case: [t_1] -> bool • Second case: t_2 -> int • Combine by unification of the types of the cases • Type Error: cannot unify bool and int
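Both pattern-matching slides can be reproduced in OCaml. In this short sketch the annotated `let _` lines are illustrative checks that the inferred type is polymorphic in the element type.

```ocaml
let isempty l =
  match l with
  | [] -> true
  | _ -> false          (* both branches have type bool *)

(* The element type is unconstrained, so both instances are accepted. *)
let _ : int list -> bool = isempty
let _ : string list -> bool = isempty

(* Replacing false by 0 in the second branch is rejected:
   the branch types bool and int cannot be unified. *)
```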

Recursion • Example: let rec concat a b = match a with | [] -> b | x::xs -> x :: concat xs b • To handle recursion, introduce type variables for the function: concat : t_1 -> t_2 -> t_3 • Use these types to conclude the type of the body: • Pattern matching first case: [t_4] -> t_5 -> t_5 • unify [t_4] with t_1, t_5 with t_2, t_5 with t_3 • t_1 = [t_4] and t_2 = t_3 = t_5 • Pattern matching second case: [t_6] -> t_7 -> [t_6] • unify [t_6] with t_1, t_7 with t_2, [t_6] with t_3 • Together: t_6 = t_4 and t_5 = t_7 = [t_4], so concat : [t_4] -> [t_4] -> [t_4]
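The recursive definition can be checked the same way. This is a minimal sketch in which the annotated `let _` line compiles only if the inferred type has list arguments and a list result over one shared element type.

```ocaml
let rec concat a b =
  match a with
  | [] -> b                        (* result has the type of b *)
  | x :: xs -> x :: concat xs b    (* result is a list of x's type *)

let _ : int list -> int list -> int list = concat
```

The first case ties the result type to b, and the second ties it to the element type of a, so all three lists share one element type.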
