720 likes | 2.54k Views
Kanat Bolazar March 23, 2010. Compiler Design 16. Type Checking. Type Checking. The general topic of type checking includes two parts Type synthesis – assigning a type to each expression in the language
E N D
Kanat Bolazar March 23, 2010 Compiler Design16. Type Checking
Type Checking • The general topic of type checking includes two parts • Type synthesis – assigning a type to each expression in the language • Type checking – making sure that these types are used in contexts where they are legal, catching type-related errors • Strongly typed languages are ones in which every expression can be assigned an unambiguous type • Weakly typed languages could have run-time errors due to type incompatibility • Statically typed languages vs. dynamically type languages • Capable of being checked at compile time vs. not • Static type checking vs. dynamic type checking • Actually check types at compile time vs. run time • Run time type checking is less efficient because type information must be stored
Dynamic Typing Example: Duck Typing • Rule: "If it walks like a duck, quacks like a duck, call it a duck" • No need to declare ahead of time as subtype of duck • Just define the operations. Python example: • class Person: • def quack(self): • print "The person imitates a duck." • def in_the_forest(duck): • duck.quack() • def game(): • john = Person() • in_the_forest(john)
Base Types • Numbers • integer • C specifies length in relative terms, short and long; OS and machine-dependent; makes porting programs to other OS and machine harder. • Java specifies specific lengths: byte (8), short (16), int (32) and long (64) bits respectively. • floating point numbers • Many languages have two sizes • Can use IEEE representation standards • Java float and double are 32 and 64 bits • Characters • Single letter, digit, symbol, etc; used to be 8 bit ASCII standard • Now can also be a 16 bit Unicode • Booleans • Two values: true and false • C uses the 0 and 1 subrange of integers
Java Example: Strings are Objects • Some languages have strings as a base type with catenation operators • In Java, strings are objects of class String: • "this".length() returns 4 • In Java, variables of an object type hold reference only: • String a = "this"; • String b = a; // reference to same string object, ref count = 2 • if (a == b) ... // reference comparison: returns true • This last check is not how you want to check String comparison; reference may not be same but value equal: • if (a.equals(b)) ... // true if a and b have same string value • if (a == b) ... // true only if a and b point to same object in heap
Compound Types: Arrays and Strings • Arrays – aggregate of values of the same type • Arrays have a base type for elements, may have an indexing range for each dimension int a [100][25], in C • If the indexing range is known, then the compiler can compute space allocation, otherwise indexing is relative and space is allocated by a run-time allocator • The main operation is indexing; some languages allow whole array operations • Strings – sequence of characters • Also can have bit strings • C treats strings as arrays of characters • Strings can have comparison operators that use lexicographic order “fee” < “fie”
More Compound Types • Records or Structures – components may have different types and may be indexed by names struct { double r; int i; } • Representation as ordered product of elements • Variant records or Unions – a component may be one of a choice of types union { double r; int i; } • Representation can save space for the largest element and may have a tag to indicate which type the value is • Take care not to have run-time (value) errors
Other Types • Enumerated types – the programmer can create a type name for a specific set of constant values enum WeekDay {Sunday, Monday, … Saturday} • Representation as for a small set • Pointers – abstraction of addresses • Can create a reference to, or derefence an object • Distinguish between “pointer to integer” and “pointer to boolean”, etc. • C allows arithmetic on pointers • Void • Classes – may or may not create new types • Classes can be represented by an extended type of record for data and methods • But are not just types since they also have features such as inheritance
Function Types, New Type Names • Procedure and function types are sometimes called signatures • Give number and types of parameters • May include parameter-passing information such as by value or by reference • Give type of result (or indicate no result) strlength : String -> unsigned int • Type declarations or type definitions allow the programmer to assign new type names to a type expression • These names should also be stored in the symbol table • Will have scope and other attributes • Definitions may be recursive in some languages • If so, size of object will be unknown
Representing Types • Types can be represented as expressions, or quite often as trees: array(9) function type arglist result type argn type arg1 type
Type Equivalence • Name equivalence – in this kind of equivalence, two types must have the same name • Assumes that the programmer will introduce new names exactly when they want types to be different if you say t1 = int and t2 = int, t1 and t2 are different • Structural equivalence – two objects are interchangeable if their types have the same fields with equivalent types. int x[10][10] and int y[10][10]x and y have equivalent types, the 10 by 10 arrays • More complex situations arise in structural equivalence of other compound types, e.g. may have mutually recursive type definitions • Type checking rules extend this to a notion of type compatibility
Type Synthesis and Type Checking • Assigning types to language expressions (type synthesis) can be done by a traversal of the abstract syntax tree • At each node, a type checking rule will say which types are allowed • Description here is for languages with type declarations • Constants are assigned a type • If there is not a known type, then there will be a set of possibles • Variables are looked up in the symbol table • Note that we are assuming an L-attributed grammar so that declarations are processed first • Assignment • The type of the assignable entity on the left must be the typeEqual to the type of the value on the right
Types for Expressions • Arithmetic and other operators have result types defined in terms of the types of the subnodes in the tree • Statements have substructures that need to be checked for type correctness • Condition of if and while statements must have type boolean • Array reference • Suppose we have exp1 -> exp2[exp3], then an adhoc SDT:if (isArrayType(exp2.type)) and typeEqual(exp3.type, integer) then exp1.type = exp2.type.basetype // get the basetype child else type-error(exp1) • Function calls have similar rules to check the signature of the function name the parameters
Issues for typeEqual • This is sometimes called type compatibility • Overloading • Arithmetic operators 2 + 3 means integer addition 2.0 + 3.0 means floating pt addition • Language may have an arithmetic operator table, telling which type of operator will be used based on the expressions Type of: a b a+b int int int int float float int double double float float float float double double double double double
Overloading Functions • Can declare the same function (or method) name with different numbers and types of parameters int max(int x,y) double max(double x,y) • Java and C++ allow such overloaded declarations • Need to augment the symbol table functionality to allow for the name to have multiple signatures • The lookup procedure is always give a name to look up – we can add a typelist argument and the lookup can tell us if there is a function declared with that signature • Or the lookup procedure is given the name to look up – and in the case of a method, it can return sets of allowable signatures
Conversion and Coercion • The typeEqual comparison is commonly extend to allow arithmetic expressions of mixed type and for other cases of types which are compatible, but not equal • If mixed types are allowed in an arithmetic expression, then a conversion should be inserted (into the AST or the code) 2 * 3.14 becomes code like t1 = float ( 2) t2 = t1 * 3.14 • Conversion from one type to another is said to be implicit it if is done automatically by the compiler, and is also called coercion. • Conversion is said to be explicit if the programmer must write the conversion • Called casts in C and Java languages
Widening and Narrowing • The rules for Java type conversion distinguishes between • Widening conversions which preserve information • Narrowing conversion which can lose information • Conversions between primitive types in Java:(there is also widening and narrowing for references, i.e. objects) • Widening Narrowing (usually with a cast) double double float float long long int int short byte char char short byte
Generating Type Conversions • For an arithmetic expressions e1 e2 op e3, the algorithm can generally use widening, to possibly a third type that is greater than both the types of e2 and e3 in the widening tree: Let the new type of e1 be the max of e2.type and e3.type Generate a widening conversion of e1 if necessary Generate a widening conversion of e2 if necessary Set the type (and later the code) of e1 • Type conversions and coercions also apply to assignment if r is a double and i is an int: allow r = i; • C also allows i = r, with the corresponding loss of information • For classes, we have the subtype principle, objects of subclasses can be assigned to objects of superclasses (with the corresponding loss of information), but not vice versa
Continuing Type Synthesis and Checking Rules • Expressions: the two expressions involved with boolean operators, such as &&, must both be boolean • Functions: the type of each actual parameter must be typeEqual to its formal parameter • Classes: • if specified, the parent of the class must be a properly declared class • If a class says that it implements an interface, then all methods of the interface must be implemented
Polymorphic Typing • A language is polymorphic is language constructs can have more than one type procedure swap(anytype x, y)where anytype is considered to be a type variable • Polymorphic functions have type patterns or type schemes, instead of actual type expressions • The type checker must check that the types of the actual parameters fit the pattern • Technically, the type checker must find a substitution of actual types for type variables that satisfies the type equivalence between the formal type pattern and the actual type expression • In complex cases with recursion, may need to do unification to solve the substitution problem • Most notably in the language ML • Java and C# allow polymorphic type constructors, called generics
References • Original slides by Nancy McCracken. • Keith Cooper and Linda Torczon, Engineering a Compiler, Elsevier, 2004. • Kenneth C. Louden, Compiler Construction: Principles and Practices, PWS Publishing, 1997. • Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools. Addison-Wesley, 2006. (The purple dragon book) • Charles Fischer and Richard LeBlanc, Jr., Crafting a Compiler with C, Benjamin Cummings, 1991.