Lecture #12, Feb. 21, 2007
• Basic Types
• Constructed Types
• Representing types as data
• Describing type systems as rules
• Type rules for ML
• Type equality
• Type coercions
• Subtyping
• Purpose of type systems
• Kinds of type systems
• Primitive types
• Constructed types
• Type checking
• Attribute grammars
• Inherited attributes
• Synthesized attributes
• Adding attributes to trees
• Programs for computing attribute computations
Assignments
• Reading
  • Chapter 4, Sections 4.3 (Attribute Grammars) and 4.4 (Ad hoc syntax-directed translation)
  • Pages 171-200
• Quiz on all of Chapter 4 read so far (4.1-4.4) on Monday.
Type Checking
• Type checking assigns a consistent type to every expression and statement.
• Generally there are two kinds of types:
  • Basic types: int, real, bool, string, etc.
  • Constructed types: array, list, products (pairs, triples, etc.), pointers, records, functions. These contain instances of other types.
• In ML, types are quite simple. Constructed types include functions (int -> string), tuples (int*bool*string), lists (int list), etc.
• In mini Java, types are more complex because of classes and inheritance.
Representing Types
    datatype MLtype
      = Unit | Int | Char | Bool
      . . .
      | Product of MLtype list
      | Arrow of (MLtype * MLtype);
• Product and arrow types are the only constructed types in this example.
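Because types are ordinary data, ordinary functions can inspect them. The sketch below is only an illustration: it copies the constructors visible on the slide (the elided ones are left out), and showType is a hypothetical helper that prints a type in ML-like concrete syntax.

```sml
(* Assumed copy of the slide's datatype, limited to the constructors shown. *)
datatype MLtype
  = Unit | Int | Char | Bool
  | Product of MLtype list
  | Arrow of MLtype * MLtype

(* showType is a hypothetical helper: render a type as a string.
   Constructed types are fully parenthesized to keep the code short. *)
fun showType Unit = "unit"
  | showType Int = "int"
  | showType Char = "char"
  | showType Bool = "bool"
  | showType (Product ts) =
      "(" ^ String.concatWith " * " (map showType ts) ^ ")"
  | showType (Arrow (d, r)) =
      "(" ^ showType d ^ " -> " ^ showType r ^ ")"

(* The type int * bool -> char, built as a value. *)
val example = Arrow (Product [Int, Bool], Char)
val shown = showType example
```

Here shown is the string "((int * bool) -> char)". The same recursion pattern (one clause per constructor) is what a type equality test or a type checker over this representation would use.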
Describing Type Systems
• A type system gives rules for assigning types to expressions and statements based upon the types of their sub-expressions and sub-statements.
• A standard way to talk about type systems is to use an inference notation.
• Let S (or some other symbol) stand for a mapping from names to types.
• Then rules are of the form:

    S |- x     S |- w
    -----------------     (usually x & w are "sub-pieces" of y)
         S |- y

• Which is read as: "To show S derives y, show S derives x and S derives w."
Simple Rules
• We always have simple rules, such as:

    (S x) = t
    ----------        and        S |- 5 : int
    S |- x : t

• To show S derives that x (a variable) has type t, show that the mapping S applied to x is t.
• The integer 5 has type int regardless of what is derivable from S; that is why there is nothing above the line.
• Think of S as a table. In this table we look up the types of simple objects like variables. So (S x) = t means that when we look up the type of x we find the type t.
• For primitives, like constants, we just know their types.
Complex Rules
• To implement rules like this we would use an attribute computation.
• Note that the mapping S might change as we move around the program.
  • Declarations add to S, giving types to new variables.
  • Exiting a local scope means removing things from S.
• S is usually implemented as an inherited attribute. The types that we derive annotate the tree and are synthesized attributes.
Rules for ML expressions

    (S x) = t
    ----------        (where x is a variable)
    S |- x : t

    S |- n : int      (where n is an integer constant like 5 or 23)

    S |- c : char     (where c is a character constant like #"a")

    S |- b : bool     (where b is a boolean like true or false)
ML expression types (cont)

    S |- x : a     S |- f : a -> t
    ------------------------------
    S |- f x : t

Note how the domain of the function must have the same type as the actual argument.

    S |- x : t1     S |- y : t2     (S <+>) = t1 * t2 -> t3
    -------------------------------------------------------     where <+> is a binary operator like + or *
    S |- x <+> y : t3
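The expression rules above can be read directly as clauses of a recursive function. The following is a minimal sketch under stated assumptions, not the course's implementation: the names ty, exp, TypeError, lookup and typeOf are hypothetical, and the table S is modeled as an association list from names to types.

```sml
datatype ty = TInt | TChar | TBool | TArrow of ty * ty | TProd of ty list

datatype exp
  = Var of string
  | IntC of int
  | CharC of char
  | BoolC of bool
  | App of exp * exp              (* f x *)
  | Binop of exp * string * exp   (* x <+> y *)

exception TypeError of string

(* (S x) = t : look a name up in the table S. *)
fun lookup (x, []) = raise TypeError ("unbound name " ^ x)
  | lookup (x, (y, t) :: rest) = if x = y then t else lookup (x, rest)

fun typeOf S e =
  case e of
    Var x => lookup (x, S)
  | IntC _ => TInt
  | CharC _ => TChar
  | BoolC _ => TBool
  | App (f, x) =>
      (* The domain of f must equal the type of the actual argument. *)
      (case (typeOf S f, typeOf S x) of
         (TArrow (a, t), a') =>
           if a = a' then t else raise TypeError "argument type mismatch"
       | _ => raise TypeError "applying a non-function")
  | Binop (x, oper, y) =>
      (* The operator's type t1 * t2 -> t3 is looked up in S. *)
      (case lookup (oper, S) of
         TArrow (TProd [t1, t2], t3) =>
           if typeOf S x = t1 andalso typeOf S y = t2 then t3
           else raise TypeError ("bad operands for " ^ oper)
       | _ => raise TypeError (oper ^ " is not a binary operator"))

val S0 = [("+", TArrow (TProd [TInt, TInt], TInt)),
          ("isZero", TArrow (TInt, TBool)),
          ("n", TInt)]
val t1 = typeOf S0 (Binop (Var "n", "+", IntC 5))
val t2 = typeOf S0 (App (Var "isZero", Var "n"))
```

With this table, t1 is TInt and t2 is TBool. A repeated type variable in a rule (the a in the application rule) turns into an equality test in the code.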
ML statement types
• Statements in ML are semicolon-separated expressions inside parentheses.
  • E.g. (print x; x + 1)
• The expressions before the last are executed only for their side effects. They can have any type. The type of the last expression is the type of the statement.

    S |- ei : ai     S |- en : t
    ----------------------------     (one premise S |- ei : ai for each i < n)
    S |- (e1; … ;en) : t
Assignment and If expressions

    S |- x : t ref     S |- y : t
    -----------------------------
    S |- x := y : unit

    S |- x : bool     S |- s1 : t     S |- s2 : t
    ---------------------------------------------
    S |- if x then s1 else s2 : t
ML while stmt

    S |- e : bool     S |- s : a
    ----------------------------
    S |- while e do s : unit
ML anonymous function types

    S+(x,a) |- e : b
    --------------------------
    S |- (fn x => e) : a -> b

S+(x,a) means add the mapping of variable x to the type a to the table S. If S already has a mapping for x, then overwrite it with a.
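The statement, assignment, if, while and anonymous function rules can be sketched the same way. This is an illustration under assumptions, not the lecture's code: all names are hypothetical, the function parameter carries an explicit type annotation (so no inference is needed), and an empty sequence is given type unit. S+(x,a) is modeled by consing (x, a) onto the front of an association list, so a newer binding hides an older one.

```sml
datatype ty = TUnit | TInt | TBool | TRef of ty | TArrow of ty * ty

datatype exp
  = Var of string
  | IntC of int
  | BoolC of bool
  | Seq of exp list              (* (e1; ...; en) *)
  | Assign of exp * exp          (* x := y *)
  | If of exp * exp * exp
  | While of exp * exp
  | Fn of string * ty * exp      (* fn (x : a) => e; the annotation is an assumption *)

exception TypeError of string

fun lookup (x, []) = raise TypeError ("unbound name " ^ x)
  | lookup (x, (y, t) :: rest) = if x = y then t else lookup (x, rest)

fun typeOf S e =
  case e of
    Var x => lookup (x, S)
  | IntC _ => TInt
  | BoolC _ => TBool
  | Seq [] => TUnit                          (* assumption: empty sequence is unit *)
  | Seq [last] => typeOf S last              (* type of the last expression *)
  | Seq (first :: rest) =>
      (typeOf S first; typeOf S (Seq rest))  (* any type is fine, but it must type check *)
  | Assign (x, y) =>
      (case typeOf S x of
         TRef t => if typeOf S y = t then TUnit
                   else raise TypeError "assigned value has the wrong type"
       | _ => raise TypeError "assignment target is not a ref")
  | If (c, s1, s2) =>
      let val t1 = typeOf S s1
          val t2 = typeOf S s2
      in
        if typeOf S c <> TBool then raise TypeError "condition is not bool"
        else if t1 = t2 then t1
        else raise TypeError "branches have different types"
      end
  | While (c, body) =>
      if typeOf S c = TBool then (typeOf S body; TUnit)
      else raise TypeError "loop condition is not bool"
  | Fn (x, a, body) => TArrow (a, typeOf ((x, a) :: S) body)   (* S+(x,a) *)

val S0 = [("r", TRef TInt), ("x", TBool)]
val tFn = typeOf S0 (Fn ("x", TInt, Seq [Assign (Var "r", Var "x"), Var "x"]))
val tWhile = typeOf S0 (While (Var "x", Assign (Var "r", IntC 0)))
```

Inside the function body x has type int even though S0 maps x to bool, so tFn is TArrow (TInt, TInt); tWhile is TUnit.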
Implementing the Rules
• To implement the rules, use an inductive function which takes an expression and returns an MLtype.
  • Any error that occurs indicates the expression can't be well typed.
• The mapping S is an inherited attribute.
  • That means it changes as we move around the program.
• If a type appears more than once in any rule, it must be the same for all occurrences.
  • This requires that we check that two types are equal.
  • In a language without polymorphic types, this is simple. The function on the next slide illustrates structural equality for types.
Type Equality
    fun typeeq (x,y) =
      case (x,y) of
        (Unit,Unit) => true
      | (Int,Int) => true
      | (Char,Char) => true
      | (Bool,Bool) => true
      | (Arrow(d1,r1),Arrow(d2,r2)) => typeeq(d1,d2) andalso typeeq(r1,r2)
      | (Product(ss),Product(ts)) => (listeq ss ts)
      | (_,_) => false
    and listeq (x::xs) (y::ys) = typeeq(x,y) andalso listeq xs ys
      | listeq [] [] = true
      | listeq _ _ = false
Note we need mutually recursive functions.
Type Equality in more complicated type systems
• For more complicated type systems, type equality can be quite difficult without some simplifying rules.
• For example, any type system that allows names for types, or recursive type definitions, may not be able to use structural equality.
• Why? A system with names says: before comparing for equality, substitute out each name.
  • What if a name has a recursive definition?
• Names are important in real systems because they allow recursive definitions, but they make equality hard to test. Such systems often have two ways of declaring types, and use name equality.
  • I.e. don't substitute a definition for a name.
  • Two types with different names could have identical definitions, but not be equal.
Example
    type intlist = pointer(variant record
                             tag nil {},
                             tag cons { car : int; cdr : intlist });

    datatype 'a list = nil | cons of 'a * 'a list;
• If we tried to compare two recursive things for equality using structural equality we could get into an infinite loop.
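Name equality avoids the loop by never expanding a definition. The sketch below is an assumption-laden illustration: the ty datatype, the TName constructor and nameEq are hypothetical, and record fields are compared in order.

```sml
datatype ty
  = TInt
  | TPointer of ty
  | TRecord of (string * ty) list
  | TName of string                (* reference to a declared type name *)

(* Name equality: a name equals only the same name. Definitions are
   never substituted, so the comparison terminates even when the
   declared types are recursive. *)
fun nameEq (TInt, TInt) = true
  | nameEq (TName a, TName b) = (a = b)
  | nameEq (TPointer a, TPointer b) = nameEq (a, b)
  | nameEq (TRecord fs, TRecord gs) = fieldsEq (fs, gs)
  | nameEq _ = false
and fieldsEq ([], []) = true
  | fieldsEq ((f, s) :: fs, (g, t) :: gs) =
      f = g andalso nameEq (s, t) andalso fieldsEq (fs, gs)
  | fieldsEq _ = false

(* Two recursive definitions with the same shape but different names. *)
val intlistDef  = TPointer (TRecord [("car", TInt), ("cdr", TName "intlist")])
val intlist2Def = TPointer (TRecord [("car", TInt), ("cdr", TName "intlist2")])

val sameName  = nameEq (TName "intlist", TName "intlist")
val diffName  = nameEq (TName "intlist", TName "intlist2")
val defsEqual = nameEq (intlistDef, intlist2Def)
```

sameName is true, while diffName and defsEqual are both false: the two definitions have identical structure, but their cdr fields mention different names.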
Type Coercions
• Using type systems to infer coercions.
• Sometimes we would like operators to be overloaded, so we have to infer which one to use.
• Type checking not only annotates the tree, it might insert things as well.

    typecheck: (string -> Pascaltype) -> (string Exp) -> (string Exp')

  where Exp' is a type similar to Exp but with room for type annotations.
Example Algorithm
    fun typecheck S x =
      case x of
        ...
      | (Binop(oper,x,y)) =>
          let val x' = typecheck S x
              val y' = typecheck S y
          in case (tagof x', tagof y') of
               (Int,Int)   => Binop'(Int, intversion oper, x', y')
             | (Real,Real) => Binop'(Real, realversion oper, x', y')
             | (Int,Real)  => Binop'(Real, realversion oper, int2real x', y')
             | (Real,Int)  => Binop'(Real, realversion oper, x', int2real y')
          end
        ...
Subtyping
• In some languages there is subtyping.
  • For example, the type 3 .. 12 is a subtype of the type Int.
• A function that expects an Int could take an element whose type is a subtype of Int. This is called subsumption.
• Such rules might be expressed as:

    S |- xi : si  &  (si <= ti)     S |- P : t1 * ... * tn -> Void
    ---------------------------------------------------------------
    S |- P(x1, ... ,xn) : Void

• The type system would have to be able to check the <= relationship between types, just as we computed type equality.
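The <= check can be written much like the equality check. The following sketch assumes a tiny type language with only Int, integer subranges and Void; the names ty, subtype and checkCall are hypothetical.

```sml
datatype ty = TInt | TRange of int * int | TVoid

exception TypeError of string

(* subtype (s, t) holds when every value of s is also a value of t. *)
fun subtype (TInt, TInt) = true
  | subtype (TVoid, TVoid) = true
  | subtype (TRange _, TInt) = true
  | subtype (TRange (lo1, hi1), TRange (lo2, hi2)) =
      lo2 <= lo1 andalso hi1 <= hi2
  | subtype _ = false

(* Procedure call rule: each actual type si must satisfy si <= ti,
   where ti is the corresponding formal parameter type. *)
fun checkCall (formals, actuals) =
  if length formals = length actuals
     andalso ListPair.all subtype (actuals, formals)
  then TVoid
  else raise TypeError "argument is not a subtype of the parameter type"

val okCall = checkCall ([TInt, TRange (0, 100)], [TRange (3, 12), TRange (3, 12)])
val intIsNotRange = subtype (TInt, TRange (3, 12))
```

okCall is TVoid because 3 .. 12 fits both Int and 0 .. 100; intIsNotRange is false because subsumption only goes from the smaller type to the larger one.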
Types
• Purpose
  • Types describe both the form and behavior of valid programs
• Safety
  • Bad things do not happen
• Expressiveness
  • Operator overloading
  • Context-sensitive meaning
    • Can't be expressed in the syntax of the language without richer kinds of grammars.
• Efficiency
  • By choosing implementations that depend on type information, the most efficient ones can be used
• Representation information
  • Types often express knowledge about how a value is represented
    • How much space it takes up
    • Whether it is a pointer
Kinds of Type Systems
• Untyped
  • No type information is carried by the data
  • No type checking is done; any kind of type mistake causes a run-time error. The error is often mysterious
    • Core dump
  • Examples: parts of C, assembly language
• Dynamically typed
  • Data carries type information
    • Tags
    • Pointers into discrete ranges
  • Operations test the type information before running
  • Type errors are caught at run time, and often tell exactly what went wrong
  • E.g. Lisp, Basic, certain parts of the class hierarchy in Java
• Statically typed
  • Data may or may not have type information
  • Errors are caught by disallowing programs that don't type check
  • E.g. ML, Haskell, parts of Java.
Basic Types
• Numbers
  • Int
  • Int64
  • Unsigned
  • . . .
• Characters
  • Traditionally 8 bit
  • Usually 16 or 32 bit with the advent of Unicode and extended character sets.
• Booleans
  • Sometimes 1 bit
  • Often the same representation as int
Constructed Types
• Arrays
  • Homogeneous
  • Contiguous
• Strings
  • Usually special syntax for string constants
  • Sometimes arrays
  • Sometimes linked list representation
• Enumerated types
  • Finite number of elements
  • In ML
    • datatype Color = Red | Blue | Green | Yellow | Purple | Orange
Products
• Products are heterogeneous aggregates
  • Structures
  • Tuples
  • Records
• Sometimes they have named fields, sometimes they use pattern matching or integer indexing
  • person.age
  • fun f (name,age,address) = . . .
  • person.#1
Pointers
• Pointers are a low-level (implementation-level) mechanism to describe indirection.
• In some languages, notably C, one can do pointer arithmetic.
• Advantages
  • Uniform representation size. All pointers have the same size.
  • Sharing (change what is pointed to, and all other pointers observe the change).
• Disadvantages
  • Null or dangling pointers
  • Sometimes hard to know the type of the thing pointed to
Unions
• When an element of a type can take on one of a number of different forms
  • Variants
  • Unions
  • Datatypes in ML
  • Classes in Java
Type Checking
• Tries to catch errors where data is used in a manner inconsistent with its definition
  • Operators take specific types as operands
  • Functions take specific types as arguments
  • Only pointer types can be dereferenced
  • Only union types can be "cased" over
Type Checking
• Typing is directed by the syntax, or the structure of the program.
  • Usually performed by walking the abstract syntax tree.
• Type checking attaches a type to every sub-expression (or piece of syntax)
  • This type is always given in terms of the types of the sub-expressions
• Often we think of this type as being an attribute of each syntax node.
• Attribute grammars provide a natural way of describing this.
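Examples
• Grammar
    E -> E < E
    E -> E andalso E
    E -> number
• Abstract Type
    datatype exp = And of exp * exp
                 | Less of exp * exp
                 | Num of int
[Figure: syntax tree for an andalso expression whose left operand is 3 < 2, with each node labeled by its type: the leaves 3 and 2 are int; the < node, the other operand, and the andalso root are bool.]
• attributes flowing up the tree are "synthesized"
• attributes flowing down the tree are "inherited"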
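For this small grammar the type is a purely synthesized attribute, so a checker needs no extra parameter. A minimal sketch follows, assuming the abstract type on the slide; ty, TypeError and typeOf are hypothetical names.

```sml
datatype exp = And of exp * exp | Less of exp * exp | Num of int
datatype ty = TInt | TBool

exception TypeError of string

(* The type of a node is computed only from the types of its children,
   so information flows up the tree. *)
fun typeOf (Num _) = TInt
  | typeOf (Less (a, b)) =
      if typeOf a = TInt andalso typeOf b = TInt then TBool
      else raise TypeError "< expects two ints"
  | typeOf (And (a, b)) =
      if typeOf a = TBool andalso typeOf b = TBool then TBool
      else raise TypeError "andalso expects two bools"

val good = typeOf (And (Less (Num 3, Num 2), Less (Num 1, Num 4)))
```

good is TBool. An expression such as Less (Less (Num 1, Num 2), Num 3) raises TypeError, because the left operand of the outer < synthesizes bool rather than int.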
Example 2: synthesized attributes
• Meaning = code to compute E (represented as a (string * string))
    fun mean x =
      case x of
        Plus(x,y) =>
          let val (namex,codex) = (mean x)
              val (namey,codey) = (mean y)
              val new = newtemp()
          in (new, codex ^ codey ^ (new ^ " = " ^ namex ^ " + " ^ namey)) end
      | Times(x,y) =>
          let val (namex,codex) = (mean x)
              val (namey,codey) = (mean y)
              val new = newtemp()
          in (new, codex ^ codey ^ (new ^ " = " ^ namex ^ " * " ^ namey)) end
      | Num n =>
          let val new = newtemp()
          in (new, new ^ " = " ^ (int2str n)) end
Example 2 (cont)
Tree for 5 + 3 * 2, each node shown with its synthesized (name, code) pair:
    +    ("T5", "T1 = 5 T2 = 3 T3 = 2 T4 = T2 * T3 T5 = T1 + T4")
      5    ("T1", "T1 = 5")
      *    ("T4", "T2 = 3 T3 = 2 T4 = T2 * T3")
        3    ("T2", "T2 = 3")
        2    ("T3", "T3 = 2")
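The computation in Example 2 can be made self-contained as follows. This is a sketch, not the lecture's code: newtemp is assumed to be a counter that hands out T1, T2, ..., Int.toString stands in for int2str, the shared Plus/Times logic is factored into a hypothetical helper binary, and a newline is appended after each generated instruction so the pieces stay separated.

```sml
datatype exp = Plus of exp * exp | Times of exp * exp | Num of int

(* Assumed temporary generator: T1, T2, T3, ... *)
val counter = ref 0
fun newtemp () = (counter := !counter + 1; "T" ^ Int.toString (!counter))

(* binary: code for both operands, then "new = namex oper namey". *)
fun binary (oper, x, y) =
      let val (namex, codex) = mean x
          val (namey, codey) = mean y
          val new = newtemp ()
      in (new, codex ^ codey ^ new ^ " = " ^ namex ^ oper ^ namey ^ "\n") end
(* mean returns the synthesized pair (name holding the result, code). *)
and mean (Plus (x, y)) = binary (" + ", x, y)
  | mean (Times (x, y)) = binary (" * ", x, y)
  | mean (Num n) =
      let val new = newtemp ()
      in (new, new ^ " = " ^ Int.toString n ^ "\n") end

val (result, code) = mean (Plus (Num 5, Times (Num 3, Num 2)))
```

Starting from a fresh counter, result is "T5" and code holds the five lines T1 = 5, T2 = 3, T3 = 2, T4 = T2 * T3, T5 = T1 + T4, matching the annotations at the root of the tree. The numbering depends on the left operand being translated before the right one.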
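Inherited attributes
    int f(int s, bool t)
    { int temp;
      temp = 0;
      if (t) return s;
      return (temp+3);
    }
[Figure: syntax tree for f. The root Fdecl has a Type (int), a Name (f), Args (a Decl for int s and a Decl for bool t, each with a Type and a Name) and a Body (a Decl for int temp followed by the Stmts). The inherited table grows on the way down the tree: (int,f) at the declaration, (int,f)(int,s)(bool,t) on entering the body, and (int,f)(int,s)(bool,t)(int,temp) after the declaration of temp.]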
Attribute Grammar Computations
• Decorating the syntax tree.
• Computing synthesized attributes proceeds from the leaves to the root.
  • Synthesized computations are implemented by an inductive function, where the value computed (returned) is the synthesized attribute.
• Computing inherited attributes passes information from the root to the leaves.
  • Inherited computations are implemented by an inductive function with an extra parameter.
Example: Synthesized
    datatype exp
      = Int of int
      | Real of real
      | Op of exp * string * exp;

    datatype value = I of int | R of real;

    exception mix_matched_type

    fun operate x s y =
      case (x,s,y) of
        (I n,"+",I m) => I (n+m)
      | (I n,"*",I m) => I (n*m)
      | (R n,"+",R m) => R (n+m)
      | (R n,"*",R m) => R (n*m)
      | _ => raise mix_matched_type
Example (cont.)
    fun translate e =
      case e of
        Int n => I n
      | Real r => R r
      | Op(x,s,y) =>
          let val xv = translate x
              val yv = translate y
          in operate xv s yv end
• Note that information flows up the tree.
• We recursively translate the children before we translate a node.
• This causes a bottom-up flow.
Explicitly Annotating the tree
• If we want to build a tree which has explicit annotations we need to define a type which has "room" for the annotations.
• Use polymorphism to encode a type with "room" for an annotation at each node (the 'a in the types below).
    datatype 'a Exp
      = Int' of 'a * int
      | Real' of 'a * real
      | Op' of 'a * ('a Exp) * string * ('a Exp);

    fun getattr e =
      case e of
        Int'(x,n) => x
      | Real'(x,r) => x
      | Op'(x,a,s,b) => x;
Explicit Annotation Example
    fun translate e =
      case e of
        Int'(_,n) => Int'(I n,n)
      | Real'(_,r) => Real'(R r,r)
      | Op'(_,x,s,y) =>
          let val xv = translate x
              val yv = translate y
          in Op'(operate (getattr xv) s (getattr yv), xv,s,yv) end
• Note we ignore the attribute (use the wild card pattern) on the way down, and rebuild the tree with the correct attribute on the way up.
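Put together, the annotated-tree computation can be run end to end. The sketch below restates the definitions from the preceding slides in valid SML so that it stands alone; the function is named annotate here (a hypothetical name), and the unit-tagged input tree is an assumed example.

```sml
datatype 'a Exp
  = Int' of 'a * int
  | Real' of 'a * real
  | Op' of 'a * 'a Exp * string * 'a Exp

datatype value = I of int | R of real
exception mix_matched_type

fun operate (I n) "+" (I m) = I (n + m)
  | operate (I n) "*" (I m) = I (n * m)
  | operate (R n) "+" (R m) = R (n + m)
  | operate (R n) "*" (R m) = R (n * m)
  | operate _ _ _ = raise mix_matched_type

fun getattr (Int' (x, _)) = x
  | getattr (Real' (x, _)) = x
  | getattr (Op' (x, _, _, _)) = x

(* annotate : 'a Exp -> value Exp
   The incoming annotation is ignored; every node of the result
   is tagged with the value of the subtree rooted there. *)
fun annotate (Int' (_, n)) = Int' (I n, n)
  | annotate (Real' (_, r)) = Real' (R r, r)
  | annotate (Op' (_, x, s, y)) =
      let val xv = annotate x
          val yv = annotate y
      in Op' (operate (getattr xv) s (getattr yv), xv, s, yv) end

(* 5 + 3 * 2, with unit as the placeholder annotation. *)
val input = Op' ((), Int' ((), 5), "+", Op' ((), Int' ((), 3), "*", Int' ((), 2)))
val output = annotate input
val I total = getattr output
```

The root of output is tagged I 11, so total is 11, and the inner multiplication node is tagged I 6. Using unit for the input annotation makes it explicit that nothing useful is stored there before the computation runs.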
Inherited Attributes
• Consider an expression language with declarations and implicit coercions.
• Grammar:
    E -> id
    E -> ( E )
    E -> var id : Type in E
    E -> E op E
• datatype:
    datatype exp
      = Var of string
      | I2R of exp                (* explicit coercion *)
      | DeclI of string * exp
      | DeclR of string * exp
      | Op of exp * string * exp;
• Example:
    var x : int in
    var y : real in (x + y) * x
  x needs to be coerced to a real.
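Inherited Attribute Tree
    var x : int in var y : real in (x + y) * x
[Figure: two syntax trees for this program. "We have": var x int, containing var y real, containing * whose children are + (with children x and y) and x. "We want": the same tree with a Real coercion node inserted above each occurrence of x.]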
Inherited Computation
Each node is shown with the table it inherits:
    var x int            [ ]
      var y real           [ (x, int) ]
        *                    [ (y, real), (x, int) ]
          +                    [ (y, real), (x, int) ]
            x                    [ (y, real), (x, int) ]
            y                    [ (y, real), (x, int) ]
          x                    [ (y, real), (x, int) ]
Simultaneous Synthesized Computation
Each node inherits the table [ (y, real), (x, int) ] and synthesizes a (type, expression) pair:
    *    real, Op(Op(I2R(Var(x)), "+", Var(y)), "*", I2R(Var(x)))
      +    real, Op(I2R(Var(x)), "+", Var(y))
        x    int, Var(x)
        y    real, Var(y)
      x    int, Var(x)
The Computation
• The computation uses an extra parameter of type list(string * type) as the inherited attribute.
• It returns a type * exp as the synthesized attribute.
• We need a modified Op constructor that uses the type info to add the explicit I2R annotations.
    fun AnnOp ( (t1,e1),oper,(t2,e2) ) =
      case (t1,t2) of
        (int,int)   => (int, Op(e1,oper,e2))
      | (real,int)  => (real, Op(e1,oper,I2R e2))
      | (int,real)  => (real, Op(I2R e1,oper, e2))
      | (real,real) => (real, Op(e1,oper,e2))
Algorithm
    fun translate e types =
      case e of
        Var s => (lookup s types, Var s)
      | DeclI(s,e) =>
          let val (t,e') = translate e ( (s,int) :: types )
          in (t, DeclI(s,e')) end
      | DeclR(s,e) =>
          let val (t,e') = translate e ( (s,real) :: types )
          in (t, DeclR(s,e')) end
      | Op(x,s,y) =>
          let val xv = translate x types
              val yv = translate y types
          in AnnOp(xv,s,yv) end
• Note how the extra parameter types is used to add information and pass it "down" the tree.
• Each declaration case rebuilds the declaration around the translated body, so translate always returns a type * exp pair.
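A complete version of this computation might look like the sketch below. Several pieces are assumptions made so the example stands alone: the constructors TInt and TReal stand in for the slide's int and real type tags, lookup and the Unbound exception are hypothetical, annOp is the AnnOp helper under a lowercase name, and the I2R case is added only so the case analysis is exhaustive.

```sml
datatype ty = TInt | TReal

datatype exp
  = Var of string
  | I2R of exp                 (* explicit coercion *)
  | DeclI of string * exp
  | DeclR of string * exp
  | Op of exp * string * exp

exception Unbound of string

(* Assumed helper: find the type of a name in the inherited table. *)
fun lookup s [] = raise Unbound s
  | lookup s ((name, t) :: rest) = if s = name then t else lookup s rest

(* Insert I2R wherever an int operand meets a real operand. *)
fun annOp ((TInt, e1), oper, (TInt, e2)) = (TInt, Op (e1, oper, e2))
  | annOp ((TReal, e1), oper, (TInt, e2)) = (TReal, Op (e1, oper, I2R e2))
  | annOp ((TInt, e1), oper, (TReal, e2)) = (TReal, Op (I2R e1, oper, e2))
  | annOp ((TReal, e1), oper, (TReal, e2)) = (TReal, Op (e1, oper, e2))

(* translate : exp -> (string * ty) list -> ty * exp
   types is the inherited attribute; the result pair is synthesized. *)
fun translate e types =
  case e of
    Var s => (lookup s types, Var s)
  | I2R x => let val (_, x') = translate x types in (TReal, I2R x') end
  | DeclI (s, body) =>
      let val (t, body') = translate body ((s, TInt) :: types)
      in (t, DeclI (s, body')) end
  | DeclR (s, body) =>
      let val (t, body') = translate body ((s, TReal) :: types)
      in (t, DeclR (s, body')) end
  | Op (x, s, y) => annOp (translate x types, s, translate y types)

(* var x : int in var y : real in (x + y) * x *)
val program =
  DeclI ("x", DeclR ("y", Op (Op (Var "x", "+", Var "y"), "*", Var "x")))
val (t, annotated) = translate program []
```

For this program t is TReal, and annotated wraps both occurrences of x in I2R: the one inside x + y because y is real, and the right operand of * because the sum is real.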
Overview
• The key to writing successful attribute computations is thinking ahead.
• First identify what the synthesized attributes are. Think of a type that will represent these. This is the return type of the computation.
• Second identify what the inherited attributes are. For each one there will be an extra parameter.
• The final type will be something like:
    syntaxtree -> inh1 -> inh2 -> (syn1 * syn2)
Tagging the tree
• Sometimes the synthesized attribute needs to be added to the tree rather than be returned.
• Again, thinking ahead is important.
  • The tree needs room for the extra attribute.
  • The tree used as input doesn't contain any interesting values as the attribute. The attributes in the input are usually ignored.
  • Rather than return the synthesized attribute, the program returns a new tree.
• The final type will be something like:
    syntaxtree a -> inh1 -> inh2 -> syntaxtree (syn1 * syn2)
• Note the tag is originally some type that we ignore, but the output is a new tree with a filled-in attribute (or several attributes as a tuple).
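As an illustration of that final shape, the sketch below tags every node of a small expression tree with its type, using one inherited attribute (the table of declared names) and one synthesized attribute (the type stored in the tree). All names here (tree, tag, tagTypes, TInt, TReal) are hypothetical, and the rule that an operator is real unless both operands are int is an assumption made for the example.

```sml
datatype ty = TInt | TReal

datatype 'a tree
  = Var of 'a * string
  | Lit of 'a * int
  | Decl of 'a * string * ty * 'a tree     (* var x : ty in body *)
  | Op of 'a * 'a tree * string * 'a tree

exception Unbound of string

fun lookup s [] = raise Unbound s
  | lookup s ((n, t) :: rest) = if s = n then t else lookup s rest

(* Read the annotation stored at the root of a tree. *)
fun tag (Var (a, _)) = a
  | tag (Lit (a, _)) = a
  | tag (Decl (a, _, _, _)) = a
  | tag (Op (a, _, _, _)) = a

(* tagTypes : 'a tree -> (string * ty) list -> ty tree
   The old tags are ignored; env is passed down, types come back
   up inside the rebuilt tree. *)
fun tagTypes tree env =
  case tree of
    Var (_, s) => Var (lookup s env, s)
  | Lit (_, n) => Lit (TInt, n)
  | Decl (_, s, t, body) =>
      let val body' = tagTypes body ((s, t) :: env)
      in Decl (tag body', s, t, body') end
  | Op (_, x, oper, y) =>
      let val x' = tagTypes x env
          val y' = tagTypes y env
          val t = if tag x' = TInt andalso tag y' = TInt then TInt else TReal
      in Op (t, x', oper, y') end

(* var x : real in x + 1, with unit as the ignored input tag. *)
val tagged =
  tagTypes (Decl ((), "x", TReal, Op ((), Var ((), "x"), "+", Lit ((), 1)))) []
```

The result has the same shape as the input, but the unit tags are replaced by types: the Op node and the Decl node are tagged TReal, the variable TReal and the literal TInt.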