Modern Compiler Design T8 – Semantic Analysis: Type Checking Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University gretay@post.tau.ac.il http://www.cs.tau.ac.il/~gretay
Example • “drive” + “drink” (String + String): string (in Java); ERROR in Cool ? • 12 + “Rumster” (Int + String): ERROR
Today • Goals • Implementing Type Checking for Cool • Cool Type Rules • Coming up: MIPS tutorial • Compiler pipeline (figure): COOL code (txt) → Lexical Analysis → Syntax Analysis (Parsing) → AST → Symbol Table etc. → Inter. Rep. (IR) → Code Gen. → Executable code (exe)
Inheritance Graph • What do we put in it? • all predefined classes: Int, String, Bool, Object, IO • all user-defined classes • Why do we put it there ? • to be able to check that something is a type • to be able to compare types (for type checking) • Implementation • mapping class names to nodes in the graph • node points to its superclass node
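The implementation bullets above (class names mapped to nodes, each node pointing to its superclass) can be sketched as a plain dictionary. This is a minimal illustration, not the course's code: `build_graph`, `is_type` and `ancestors` are hypothetical names, and a node is reduced to just its superclass name.

```python
# Hypothetical sketch: a node is represented only by its superclass name.
PREDEFINED = {"Object": None, "Int": "Object", "String": "Object",
              "Bool": "Object", "IO": "Object"}

def build_graph(user_classes):
    """Map every class name to its superclass name (None for Object)."""
    graph = dict(PREDEFINED)
    for name, parent in user_classes.items():
        if name in graph:
            raise ValueError(f"class {name} is redefined")
        graph[name] = parent
    for name, parent in graph.items():
        if parent is not None and parent not in graph:
            raise ValueError(f"class {name} inherits from undefined class {parent}")
    return graph

def is_type(graph, name):
    """Check that something is a type: it must be a node of the graph."""
    return name in graph

def ancestors(graph, name):
    """Return [name, parent, ..., Object]; raise if the chain loops."""
    chain = []
    while name is not None:
        if name in chain:
            raise ValueError(f"inheritance cycle through {name}")
        chain.append(name)
        name = graph[name]
    return chain
```

Walking the superclass pointers is enough both to compare types and to detect a cyclic hierarchy during graph checking.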
Major Semantic Tasks • Inheritance graph construction and checking • Symbol table construction • construct symbol table for features using inheritance tree • assign enclosing scope for each AST node • Scope checking • check program using symbol tables • Check for Main class and main method • Type checking • uses inheritance graph and symbol tables • assign type for each AST node
Semantic Checks • Scope rules • identifiers are defined • no multiple definition of same identifier • local variables are defined before used • … • Type rules • which types can be combined with certain operator • assignment of expression to variable • formal and actual parameters of a method call • …
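Notation for Inference Rules • By tradition inference rules are written with the hypotheses above the line and the conclusion below it:
$$\frac{\text{Hypothesis}_1 \quad \text{Hypothesis}_2 \quad \ldots \quad \text{Hypothesis}_n}{\text{Conclusion}}$$
• $\vdash$ means “it is provable that ...” • Cool type rules have hypotheses (conditions) and a conclusion of the form $O,M,C \vdash e : T$, labelled with the rule name [NAME]; $O,M,C$ is the type environment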
Type Checking - Implementation • Single traversal over AST • Types passed up the tree • Type environment passed down the tree
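Example: (45 + x) < (~32) • Symbol tables (figure): globals holds Foo (kind: class); Foo holds test (kind: Int->Int) and x (kind: Int); test holds c (kind: Int) • AST (figure): lt over plus (int_const val=45, object name=x) and neg (int_const val=32) • lookup(x) gives O(x) = Int, so $O,M,C \vdash x : \text{Int}$; also $O,M,C \vdash \text{int\_const} : \text{Int}$
Example: (45 + x) < (~32) • Rules used:
$$\frac{O,M,C \vdash E_1 : \text{Int} \quad O,M,C \vdash E_2 : \text{Int}}{O,M,C \vdash E_1 + E_2 : \text{Int}} \qquad \frac{O,M,C \vdash E_1 : \text{Int}}{O,M,C \vdash {\sim}E_1 : \text{Int}} \qquad \frac{O,M,C \vdash E_1 : \text{Int} \quad O,M,C \vdash E_2 : \text{Int}}{O,M,C \vdash E_1 < E_2 : \text{Bool}}$$
$$\frac{O(x) = \text{Int}}{O,M,C \vdash x : \text{Int}} \qquad \frac{}{O,M,C \vdash \text{int\_const} : \text{Int}}$$
• Annotated AST (figure): lt : Bool, with children plus : Int (int_const val=45 : Int, object name=x : Int) and neg : Int (int_const val=32 : Int)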
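The example can be replayed as a single bottom-up traversal in which each node returns its type and the object environment O is passed down. The sketch below is only an illustration: the node classes, `type_of` and `CoolTypeError` are hypothetical names, and O is a plain dictionary.

```python
from dataclasses import dataclass

class CoolTypeError(Exception):
    """Raised when no rule applies (hypothetical name)."""

@dataclass
class IntConst:
    val: int

@dataclass
class Obj:
    name: str

@dataclass
class Plus:
    e1: object
    e2: object

@dataclass
class Neg:
    e1: object

@dataclass
class Lt:
    e1: object
    e2: object

def require_int(node, O):
    """Hypothesis 'O,M,C |- node : Int' of the arithmetic rules."""
    t = type_of(node, O)
    if t != "Int":
        raise CoolTypeError(f"expected Int, got {t}")

def type_of(node, O):
    """Return the type of node; O maps object identifiers to types."""
    if isinstance(node, IntConst):
        return "Int"
    if isinstance(node, Obj):
        if node.name not in O:
            raise CoolTypeError(f"undefined identifier {node.name}")
        return O[node.name]
    if isinstance(node, Neg):
        require_int(node.e1, O)
        return "Int"
    if isinstance(node, (Plus, Lt)):
        require_int(node.e1, O)
        require_int(node.e2, O)
        return "Int" if isinstance(node, Plus) else "Bool"
    raise CoolTypeError(f"unknown node {node!r}")

# (45 + x) < (~32) with O(x) = Int
expr = Lt(Plus(IntConst(45), Obj("x")), Neg(IntConst(32)))
result = type_of(expr, {"x": "Int"})
```

Here `result` is the type assigned to the root `lt` node; changing O(x) to String makes the `plus` rule fail.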
Cool Types • How are types represented? • AbstractSymbol • compare types by comparing AbstractSymbols • When are types created? • during lexing/parsing • predefined types
Cool Types • The types are • Class Names • SELF_TYPE • The user declares types for identifiers • The semantic analysis infers types for expressions • Every expression has a unique type • A language’s type system specifies which operations are valid for which types • The goal of type checking is to ensure that operations are used with the correct types • Enforces intended interpretation of values • Cool is statically typed: type checking during compilation
Cool Types Environment • Judgments have the form $O, M, K \vdash e : T$, where $O, M, K$ is the type environment • O gives types to free identifiers in the current scope • M gives information about the formal parameters and return type of methods • K is the class in which an expression e appears • Why separate object/methods? • In Cool, the method and object identifiers live in different name spaces
Cool Types Environment $O, M, K \vdash e : T$ • O --- mapping object identifiers to types • symbol table - current scope • O(x) = T • M --- mapping methods to method signatures • M(C, f) = (A, B, D) • means there is a method f(a:A, b:B): D defined in class C (or its ancestor) • K --- the class in which expression e appears • needed when SELF_TYPE is involved
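A rough sketch of the two mappings, under the assumption that O is a stack of scopes and M is resolved by walking up the inheritance chain. `PARENT`, `DECLARED` and `lookup_method` are made-up names, and the tiny class hierarchy is invented for the example.

```python
from collections import ChainMap

# Assumed toy hierarchy: B inherits A, A inherits Object.
PARENT = {"Object": None, "A": "Object", "B": "A"}

# Methods as declared in each class: (class, method) -> (formal types..., return type)
DECLARED = {("A", "f"): ("Int", "Bool", "A")}

def lookup_method(cls, name):
    """M(cls, name): search cls and then its ancestors; None if absent."""
    while cls is not None:
        if (cls, name) in DECLARED:
            return DECLARED[(cls, name)]
        cls = PARENT[cls]
    return None

# O for the current scope; entering a nested scope plays the role of O[T/x].
O = ChainMap({"x": "Int"})
inner = O.new_child({"x": "String"})
```

Keeping O and M in separate structures mirrors the fact that object and method identifiers live in different name spaces.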
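Type rules
$$\frac{O(x) = T}{O,M,C \vdash x : T} \qquad \frac{}{O,M,C \vdash \text{int\_const} : \text{Int}} \qquad \frac{O,M,K \vdash e_1 : \text{Int} \quad O,M,K \vdash e_2 : \text{Int}}{O,M,K \vdash e_1 + e_2 : \text{Int}}$$
• Rules give templates describing how to type expressions • By filling the templates, we can produce complete typings for expressions
Example: 1 + 2 • Templates:
$$\frac{O,M,C \vdash E_1 : \text{Int} \quad O,M,C \vdash E_2 : \text{Int}}{O,M,C \vdash E_1 + E_2 : \text{Int}}\ [\text{Add}] \qquad \frac{}{O,M,C \vdash \text{int\_const} : \text{Int}}\ [\text{Int}]$$
• Inference:
$$\frac{\dfrac{1 \text{ is an integer}}{O,M,C \vdash 1 : \text{Int}}\ [\text{Int}] \qquad \dfrac{2 \text{ is an integer}}{O,M,C \vdash 2 : \text{Int}}\ [\text{Int}]}{O,M,C \vdash 1 + 2 : \text{Int}}\ [\text{Add}]$$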
Subtyping • Define a relation ≤ on classes • X ≤ X • X ≤ Y if X inherits from Y • X ≤ Z if X ≤ Y and Y ≤ Z • Examples (figure: inheritance tree over Object, IO, D, E, C, A, B): A ≤ C, B ≤ Object, E ≰ D and D ≰ E
Least Upper Bounds • Z is the least upper bound of X and Y, lub(X, Y) = Z, if • X ≤ Z and Y ≤ Z (Z is an upper bound) • X ≤ Z’ and Y ≤ Z’ imply Z ≤ Z’ (Z is the least upper bound)
Least Upper Bounds • In Cool, the least upper bound of two types is their least common ancestor in the inheritance tree • Examples (figure: inheritance tree over Object, IO, D, E, C, A, B): lub(A,B) = C • lub(C,D) = D • lub(C,E) = Object
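Subtyping and lub both reduce to walking superclass pointers. The sketch below assumes one parent map that is consistent with the examples on the slides (A and B under C, C under D, and IO, D, E directly under Object); the exact shape of the figure is not recoverable, so treat `PARENT` as an assumption.

```python
# Assumed hierarchy, chosen to agree with the lub examples above.
PARENT = {"Object": None, "IO": "Object", "D": "Object", "E": "Object",
          "C": "D", "A": "C", "B": "C"}

def ancestors(t):
    """Return [t, parent of t, ..., Object]."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain

def is_subtype(x, y):
    """x <= y: y is x itself or one of its ancestors."""
    return y in ancestors(x)

def lub(x, y):
    """Least common ancestor: first ancestor of x that is also above y."""
    above_y = set(ancestors(y))
    for t in ancestors(x):
        if t in above_y:
            return t
```

Because every chain ends at Object, the loop in `lub` always finds a result in a well-formed (acyclic) hierarchy.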
Dynamic and Static Types in Cool • A variable of static type A can hold the value of static type B if B ≤ A
class A { ... }
class B inherits A { ... }
class Main {
  x: A ← new A ; -- static type of x is A; dynamic type of x is A
  ...
  x ← new B ; -- dynamic type of x is B
}
Dynamic and Static Types • The dynamic type of an object is the class that is used in the new expression • a runtime notion • even languages that are statically typed have dynamic types • The static type of an expression captures all the dynamic types that the expression could have • a compile-time notion
Static vs. Dynamic Typing • Static typing proponents say: • Static checking catches many programming errors • Prove properties of your code • Avoids the overhead of runtime type checks • C, Java, Cool, ML • Dynamic typing proponents say: • Static type systems are restrictive • Rapid prototyping difficult with type systems • Complicates the programming language and the compiler • Compiler optimizations can hide costs • machine code • In practice, most code is written in statically typed languages with escape mechanisms • Unsafe casts in C, Java • union in C
Soundness • A type system is sound if $\forall e.\ \text{dynamic\_type}(e) \le \text{static\_type}(e)$ • whenever the inferred type of e is T (denoted by $O,M,C \vdash e : T$) • then e evaluates to a value of type T in all executions of the program • We only want sound rules • But some sound rules are better than others
Assign
$$\frac{O(x) = T_0 \quad O,M,K \vdash e_1 : T_1 \quad T_1 \le T_0}{O,M,K \vdash x \leftarrow e_1 : T_1}\ [\text{Assign}]$$
class A { foo() : A { … } }; class B inherits A { }; ...
let x:B in x ← (new B).foo(); -- ERROR
let x:A in x ← (new B).foo(); -- OK
let x:Object in x ← (new B).foo(); -- OK
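The three `let` examples differ only in the declared type of x, while the right-hand side `(new B).foo()` always has static type A. A small sketch of the [Assign] check, with hypothetical helper names and the two-class hierarchy from the slide:

```python
class CoolTypeError(Exception):
    """Raised when a rule's hypothesis fails (hypothetical name)."""

# Hierarchy from the slide: B inherits A.
PARENT = {"Object": None, "A": "Object", "B": "A"}

def is_subtype(x, y):
    """x <= y by walking up from x."""
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False

def check_assign(O, x, t1):
    """Type of 'x <- e1' where e1 : t1; requires t1 <= O(x)."""
    t0 = O[x]
    if not is_subtype(t1, t0):
        raise CoolTypeError(f"{t1} does not conform to {t0}")
    return t1  # the assignment has the type of e1, not of x
```

With `t1 = "A"`, the check passes for `O = {"x": "A"}` and `O = {"x": "Object"}` and fails for `O = {"x": "B"}`, matching the OK/ERROR annotations.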
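Let Rule with Initialization • Weak rule:
$$\frac{O,M,K \vdash e_0 : T \quad O[T/x],M,K \vdash e_1 : T_1}{O,M,K \vdash \text{let } x : T \leftarrow e_0 \text{ in } e_1 : T_1}$$
• Rule with subtyping:
$$\frac{O,M,K \vdash e_0 : T \quad O[T_0/x],M,K \vdash e_1 : T_1 \quad T \le T_0}{O,M,K \vdash \text{let } x : T_0 \leftarrow e_0 \text{ in } e_1 : T_1}$$
• Both rules are sound but more programs typecheck with the second one • uses subtyping
class A { foo():C { … } }; class B inherits A { }; ... let x:A ← new B in x.foo();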
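The difference between the two rules is a single comparison: equality of the initializer type and the declared type versus conformance. The sketch below makes that switch explicit; `check_let` and its `use_subtyping` flag are illustrative inventions, and placing C directly under Object is an assumption since the slide does not say where C sits.

```python
class CoolTypeError(Exception):
    """Raised when a rule's hypothesis fails (hypothetical name)."""

# B inherits A as on the slide; C's parent is assumed.
PARENT = {"Object": None, "A": "Object", "B": "A", "C": "Object"}

def is_subtype(x, y):
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False

def check_let(O, x, declared, init_type, type_of_body, use_subtyping=True):
    """Type 'let x : declared <- e0 in e1' where e0 : init_type.

    type_of_body is a callback that types e1 in the extended environment.
    """
    if use_subtyping:
        ok = is_subtype(init_type, declared)   # second rule: T <= T0
    else:
        ok = init_type == declared             # weak rule: same type only
    if not ok:
        raise CoolTypeError(f"initializer {init_type} vs declared {declared}")
    return type_of_body({**O, x: declared})    # O[T0/x]

# Body 'x.foo()' where foo() : C is defined in A.
body = lambda env: "C" if is_subtype(env["x"], "A") else "Object"
```

For `let x:A ← new B in x.foo()`, the rule with subtyping accepts the program and the weak rule rejects it.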
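If-Then-Else
$$\frac{O,M,K \vdash e_0 : \text{Bool} \quad O,M,K \vdash e_1 : T_1 \quad O,M,K \vdash e_2 : T_2}{O,M,K \vdash \text{if } e_0 \text{ then } e_1 \text{ else } e_2 \text{ fi} : \text{lub}(T_1, T_2)}$$
(figure: inheritance tree over Object, IO, D, E, C, A, B)
foo(a:A, b:B, c:C, e:E) : D { if (a < b) then e else c fi }
• lub(E,C) = Object • Is Object ≤ D ? ERROR
Case
$$\frac{O,M,K \vdash e : T \quad O[X/x],M,K \vdash e_1 : E \quad O[Y/y],M,K \vdash e_2 : F \quad O[Z/z],M,K \vdash e_3 : G}{O,M,K \vdash \text{case } e \text{ of } x : X \Rightarrow e_1;\ y : Y \Rightarrow e_2;\ z : Z \Rightarrow e_3 \text{ esac} : \text{lub}(E,F,G)}$$
(figure: inheritance tree over Object, IO, D, E, C, A, B, with IO highlighted)
foo(d:D) : D { case d of x : IO => let a:A ← (new A) in x; y : E => (new B); z : C => z; esac };
• lub(IO,B,C) = Object and Object ≰ D: ERROR
Case
$$\frac{O,M,K \vdash e : T \quad O[X/x],M,K \vdash e_1 : E \quad O[Y/y],M,K \vdash e_2 : F \quad O[Z/z],M,K \vdash e_3 : G}{O,M,K \vdash \text{case } e \text{ of } x : X \Rightarrow e_1;\ y : Y \Rightarrow e_2;\ z : Z \Rightarrow e_3 \text{ esac} : \text{lub}(E,F,G)}$$
(figure: inheritance tree over Object, IO, D, E, C, A, B, with A highlighted)
foo(d:D) : D { case d of x : IO => let a:A ← (new A) in a; y : E => (new B); z : C => z; esac };
• lub(A,B,C) = C and C ≤ D: OK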
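The type of a `case` is the lub of all its branch types, which can be computed by folding a binary lub over the list; a two-branch `if` is the same computation with two entries. The sketch assumes a parent map consistent with the lub results quoted on the slides, and `case_type` and `body_conforms` are hypothetical helpers.

```python
from functools import reduce

# Assumed hierarchy, consistent with the lub results on the slides.
PARENT = {"Object": None, "IO": "Object", "D": "Object", "E": "Object",
          "C": "D", "A": "C", "B": "C"}

def ancestors(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain

def is_subtype(x, y):
    return y in ancestors(x)

def lub(x, y):
    above_y = set(ancestors(y))
    for t in ancestors(x):
        if t in above_y:
            return t

def case_type(branch_types):
    """lub(E, F, G, ...) computed pairwise from left to right."""
    return reduce(lub, branch_types)

def body_conforms(body_type, declared_return):
    """A method body must conform to the declared return type."""
    return is_subtype(body_type, declared_return)
```

For the two `foo(d:D) : D` bodies, the branch types are `["IO", "B", "C"]` and `["A", "B", "C"]`; only the second yields a type that conforms to D.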
SELF_TYPE Example • What can be a dynamic type of object returned by foo() ? • any subtype of A
class A { foo() : A { self } ; }; class B inherits A { ... }; class Main { B x ← (new B).foo(); }; -- ERROR
class A { foo() : SELF_TYPE { self } ; }; class B inherits A { ... }; class Main { B x ← (new B).foo(); }; -- OK
SELF_TYPE • Research idea • Helps type checker to accept more correct programs: $O,M,K \vdash$ (new A).foo() : A and $O,M,K \vdash$ (new B).foo() : B • SELF_TYPE is NOT a dynamic type • Meaning of SELF_TYPE depends on where it appears textually
Where can SELF_TYPE appear ? • Parser checks that SELF_TYPE appears only where a type is expected • How ? • But SELF_TYPE is not allowed everywhere a type can appear
Where can SELF_TYPE appear ? • class T1 inherits T2 { … } • T1, T2 cannot be SELF_TYPE • x : SELF_TYPE • attribute • let • case • new SELF_TYPE • creates an object of the same type as self • foo@T(e) • T cannot be SELF_TYPE • foo(x:T1):T2 {…} • only T2 can be SELF_TYPE
new Example
class A { foo() : A { new SELF_TYPE }; }; class B inherits A { … } …
(new A).foo(); -- creates A object
(new B).foo(); -- creates B object
Subtyping for SELF_TYPE • $\text{SELF\_TYPE}_C \le \text{SELF\_TYPE}_C$ • $\text{SELF\_TYPE}_C \le C$ • It is always safe to replace $\text{SELF\_TYPE}_C$ with C • $\text{SELF\_TYPE}_C \le T$ if $C \le T$ • $T \le \text{SELF\_TYPE}_C$ is always false • because $\text{SELF\_TYPE}_C$ can denote any subtype of C
lub(T,T’) for SELF_TYPE • $\text{lub}(\text{SELF\_TYPE}_C, \text{SELF\_TYPE}_C) = \text{SELF\_TYPE}_C$ • $\text{lub}(T, \text{SELF\_TYPE}_C) = \text{lub}(T, C)$ • the best we can do
New Rules
$$\frac{}{O,M,K \vdash \text{self} : \text{SELF\_TYPE}_K} \qquad \frac{}{O,M,K \vdash \text{new SELF\_TYPE} : \text{SELF\_TYPE}_K}$$
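These extensions can be layered over the ordinary class-name relation by giving SELF_TYPE its own representation. In the sketch below, `SelfType` is a hypothetical wrapper that records the class C in which SELF_TYPE occurs, and the two-class hierarchy is assumed for the example.

```python
from dataclasses import dataclass

# Assumed toy hierarchy: B inherits A.
PARENT = {"Object": None, "A": "Object", "B": "A"}

@dataclass(frozen=True)
class SelfType:
    cls: str  # the class C of SELF_TYPE_C

def ancestors(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain

def is_subtype(x, y):
    if isinstance(y, SelfType):
        # Only SELF_TYPE_C <= SELF_TYPE_C holds; T <= SELF_TYPE_C is false.
        return x == y
    if isinstance(x, SelfType):
        x = x.cls  # SELF_TYPE_C <= T if C <= T
    return y in ancestors(x)

def lub(x, y):
    if isinstance(x, SelfType) and x == y:
        return x
    # Otherwise replace SELF_TYPE_C by C and take the ordinary lub.
    x = x.cls if isinstance(x, SelfType) else x
    y = y.cls if isinstance(y, SelfType) else y
    above_y = set(ancestors(y))
    for t in ancestors(x):
        if t in above_y:
            return t
```

Replacing SELF_TYPE by its class loses precision but never soundness, which is why the mixed lub case is described as the best we can do.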
Other rules • A use of SELF_TYPE refers to any subtype of the current class • Except in dispatch • because the method return type of SELF_TYPE might have nothing to do with the current class
Dispatch • Method declared with return type SELF_TYPE:
$$\frac{O,M,K \vdash c : C \quad O,M,K \vdash a : A \quad O,M,K \vdash b : B \quad M(C, foo) = (A_1, B_1, \text{SELF\_TYPE}) \quad A \le A_1,\ B \le B_1}{O,M,K \vdash c.foo(a, b) : C}$$
• Method declared with any other return type:
$$\frac{O,M,K \vdash c : C \quad O,M,K \vdash a : A \quad O,M,K \vdash b : B \quad M(C, foo) = (A_1, B_1, D_1) \quad A \le A_1,\ B \le B_1,\ D_1 \ne \text{SELF\_TYPE}}{O,M,K \vdash c.foo(a, b) : D_1}$$
• which class is used to find the declaration of foo() ?
Example
class A { foo() : A { self }; }; class B inherits A { … } …
(new A).foo(); -- returns A object
(new B).foo(); -- returns B object
Static Dispatch
$$\frac{O,M,K \vdash c : C \quad O,M,K \vdash a : A \quad O,M,K \vdash b : B \quad M(C, f) = (A_1, B_1, D_1) \quad A \le A_1,\ B \le B_1,\ C \le C_1,\ D_1 \ne \text{SELF\_TYPE}}{O,M,K \vdash c.f@C_1(a, b) : D_1}$$
• if we dispatch a method returning SELF_TYPE in class C1, do we get back C1 ? • No. SELF_TYPE is the type of self, which may be a subtype of the class in which the method appears
$$\frac{O,M,K \vdash c : C \quad O,M,K \vdash a : A \quad O,M,K \vdash b : B \quad M(C, f) = (A_1, B_1, \text{SELF\_TYPE}) \quad A \le A_1,\ B \le B_1,\ C \le C_1}{O,M,K \vdash c.f@C_1(a, b) : C}$$
SELF_TYPE Example
class A { deligate : B; callMe() : ??? { deligate.callMe(); } ; }; class B { callMe() : SELF_TYPE { self }; }; class Main { A a ← (new A).callMe(); };
• declaring ??? as SELF_TYPE: ERROR
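Both dispatch rules can be folded into one check that substitutes the receiver's static type when the declared return type is SELF_TYPE. The sketch assumes the receiver's static type is a class name, as in the rules above; `METHODS`, `lookup` and `check_dispatch` are illustrative names, and the `bar` method is invented to exercise argument checking.

```python
class CoolTypeError(Exception):
    """Raised when a rule's hypothesis fails (hypothetical name)."""

# Assumed toy hierarchy and method table: (class, method) -> (formals, return)
PARENT = {"Object": None, "A": "Object", "B": "A"}
METHODS = {("A", "foo"): ((), "SELF_TYPE"),
           ("A", "bar"): (("A",), "Object")}

def is_subtype(x, y):
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False

def lookup(cls, name):
    """M(cls, name), searching cls and then its ancestors."""
    while cls is not None:
        if (cls, name) in METHODS:
            return METHODS[(cls, name)]
        cls = PARENT[cls]
    return None

def check_dispatch(recv_type, name, arg_types):
    """Type of 'c.name(args)' where c : recv_type (a class name)."""
    sig = lookup(recv_type, name)
    if sig is None:
        raise CoolTypeError(f"no method {name} in {recv_type}")
    formals, ret = sig
    if len(formals) != len(arg_types):
        raise CoolTypeError("wrong number of arguments")
    for actual, formal in zip(arg_types, formals):
        if not is_subtype(actual, formal):
            raise CoolTypeError(f"{actual} does not conform to {formal}")
    # SELF_TYPE in the signature means the static type of the receiver.
    return recv_type if ret == "SELF_TYPE" else ret
```

The declaration is looked up starting from the receiver's static class, yet a SELF_TYPE result is reported as the receiver's class rather than the class that declares the method.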
Error Handling • Error detection is easy • Error recovery: what type is assigned to an expression with no legitimate type ? • influences type of enclosing expressions • cascading errors: let y : Int ← x + 2 in y + 3 • Better solution: special type No_Type • inheritance graph can be cyclic
Summary • Type is a set of values • Type rules are logical rules of inference • Soundness of (static) type systems • for every expression e, for every value v of e at runtime, $v \in \text{values\_of}(\text{static\_type}(e))$ • static_type(e) may actually describe more values • can reject correct programs • Becomes more complicated with subtyping (inheritance)
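One way to read the No_Type suggestion is that the special type conforms to every other type, so an ill-typed subexpression is reported once and never again by its parents. The sketch below follows that assumption; the helper names and the error list are made up, and subtyping between ordinary classes is left out for brevity.

```python
NO_TYPE = "No_Type"  # assumed to conform to every type

def conforms(t, expected):
    """No_Type is accepted anywhere; otherwise require an exact match."""
    return t == NO_TYPE or t == expected

def type_of_identifier(O, name, errors):
    if name not in O:
        errors.append(f"undefined identifier {name}")
        return NO_TYPE  # recover instead of aborting
    return O[name]

def type_of_plus(t1, t2, errors):
    if not (conforms(t1, "Int") and conforms(t2, "Int")):
        errors.append("non-Int operand of +")
    return "Int"  # '+' yields Int whenever it is well typed

# let y : Int <- x + 2 in y + 3, with x undefined
errors = []
init = type_of_plus(type_of_identifier({}, "x", errors), "Int", errors)
body = type_of_plus(type_of_identifier({"y": "Int"}, "y", errors), "Int", errors)
```

Only the undefined `x` is reported; the enclosing `+` and the `let` body produce no follow-up errors.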
Major Semantic Tasks • Inheritance graph construction and checking • Symbol table construction • construct symbol table for features using inheritance tree • assign enclosing scope for each AST node • Scope checking • check program using symbol tables • Check for Main class and main method • Type checking • uses inheritance graph and symbol tables • assign type for each AST node • Remaining Semantic Checks • checks that require type + symbol table information combined • Disclaimer: This is a naïve phase ordering. Phases could be mixed, combined, optimized, re-ordered etc.