Chap 6: Type Checking/Semantic Analysis

Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818
Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre

Overview • Type Checking and Semantic Analysis are Critical Features of Compilers and Compilation • Passing a Syntax Check (Parsing) is not Sufficient • Type Checking Provides Vital Input • Software Engineers are Assisted in the Debugging Process • We’ll Focus on Classical Type Checking Issues • Background and Motivation • Type Analysis • The Notion of a Type System • Examining a Simple Type Checker • Other Key Typing Concepts • Concluding Remarks/Looking Ahead

Background and Motivation • Recall....

Background and Motivation • What we have achieved • All the “words” (Tokens) are known • The tree is syntactically correct • What we still do not know... • Does the program make sense? • What we will not try to find out • Is the program correct? • This is Impossible in General! • Our Concern: • Does it Compile? • Are all Semantic Errors Removed? • Do all Types and their Usage Make Sense?

Background and Motivation • The program makes “sense” • Programmer’s intent is clear • Program is semantically unambiguous • Data-wise • We know what each name denotes • We know how to represent everything • Flow-wise • We know how to execute all the statements • Structure-wise • Nothing is missing • Nothing is multiply defined • The program is correct • It will produce the expected output

Tasks To Perform • Scope Analysis • Figure out what each name refers to • Understand where Scope Exists (See Chapter 7) • Type Analysis • Figure out the type of each name • Names are functions, variables, types, etc. • Completeness Analysis • Check that everything is defined • Check that nothing is multiply defined

Output? • What does the analysis produce? • Data structures “on the side” • To describe the types (resolve the types) • To describe what each name denotes (resolve the scopes) • A Decorated tree • Add annotations in the tree • Possibly.... Semantic Errors!

Pictorially

Type Analysis • Purpose • Find the type of every construction • Local variables • Actuals for calls • Formals of method calls • Objects • Methods • Expressions • Rationale • Types are useful to catch bugs!

Type Analysis • Why Bother? • “A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.” (B. C. Pierce, Types and Programming Languages)

Uses • Many! • Error detection • Detect early • Detect automatically • Detect obvious and subtle flaws • Abstraction • The skeleton to provide modularity • Signature/structure/interface/class/ADT/.... • Documentation • Programs are easier to read • Language Safety guarantee • Memory layout • Bounds checking • Efficiency

How Does It Work? • Classify programs according to the kind of values computed • Figure (nested sets): Set of All Programs • Set of All Reasonable Programs • Set of All Type-Safe Programs • Set of All Correct Programs

How do we do this ? • Compute the type of every sentence • On the tree • With a tree traversal • Some information will flow up (synthesized) • Some information will flow down (inherited) • Questions to answer • What is a type ? • How do I create types ? • How do I compute types ?

Types... • Types form a language! • With • Terminals... • Non-terminals.... • And a grammar! • Alternatively • Types can be defined inductively • Base types (a.k.a. the terminals) • Inductive types (a.k.a. grammatical productions)

Base Types • What are the base types ? • int • float • double • char • void • bool • error

Inductive Type Definition • Purpose • Define a type in terms of other simple/smaller types • Example • array • pointer • reference • Pair (products in the book) • structure • function • methods • classes • ...

Relation to Grammar?
    Type      → array ( Type , Type )
              | pair ( Type , Type )
              | tuple ( Type+ )
              | struct ( FieldType+ )
              | fun ( Type ) : Type
              | method ( ClassType , Type ) : Type
              | pointer ( Type )
              | reference ( Type )
              | ClassType
              | BasicType
    ClassType → class ( name [ , Type ] )
    FieldType → name : Type
    BasicType → int | bool | char | float | double | void | error

Type Terms • What is that ? • It is a sentence in the type language • Example • int • pair(int,int) • tuple(int,bool,float) • array(int,int) • fun(int) : int • fun(tuple(int,char)) : int • class(“Foo”) • method(class(“Foo”), tuple(int,char)) : int

So... fun(tuple(int,char)) : int • If • We have Type Terms • We have a Type Language • Then • We can Parse it and Obtain.... • Type Trees!
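A minimal sketch of type trees in Python (not from the original slides): the hypothetical `TypeTerm` class below stores a constructor name such as `fun`, `tuple` or `int` plus its child terms, so base types are leaves and inductive types are interior nodes. Printing a tree reproduces the term syntax used above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TypeTerm:
    """One node of a type tree: a constructor name plus its child type terms.

    Base types (int, char, ...) are leaves with no children.
    For 'fun', children[0] is the argument type and children[1] the result type.
    """
    ctor: str
    children: Tuple["TypeTerm", ...] = ()

    def __str__(self) -> str:
        if not self.children:                 # base type: a leaf
            return self.ctor
        if self.ctor == "fun":                # fun(Type) : Type
            arg, res = self.children
            return f"fun({arg}) : {res}"
        inner = ",".join(str(c) for c in self.children)
        return f"{self.ctor}({inner})"


INT = TypeTerm("int")
CHAR = TypeTerm("char")

# The type term fun(tuple(int,char)) : int, built as a tree
f_type = TypeTerm("fun", (TypeTerm("tuple", (INT, CHAR)), INT))
print(f_type)  # fun(tuple(int,char)) : int
```

Because the nodes are frozen dataclasses, two trees built the same way compare equal with `==`, which is handy once types have to be compared.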

The Notion of a Type System • Logical Placement of Type Checker: Token Stream → Parser → Syntax Tree → Type Checker → Syntax Tree → Int. Code Generator → Interm. Repres. • Role of Type Checker is to Verify Semantic Contexts • Incompatible Operator/Operands • Existence of Flow-of-Control Info (Goto/Labels) • Uniqueness w.r.t. Variable Definition, Case Statement Labels/Ranges, etc. • Naming Checks Across Blocks (Begin/End) • Function Definition vs. Function Call • Type Checking can Occur as a “Side-Effect” of Parsing via a Judicious Use of Attribute Grammars! • Employ Type Synthesis

Example of type synthesis • Assume the program if (1+1 == 2) then 1 + 3 else 2 * 3
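This program can be typed purely by synthesis: each node's type is computed from its children's types, bottom-up. A minimal Python sketch, assuming a hypothetical tuple encoding of the AST and only the types `int`, `bool` and `error`:

```python
# Hypothetical AST encoding:
#   ("num", k)                 integer literal
#   ("op", op, left, right)    op in "+", "*", "=="
#   ("if", cond, then, else)
def synth(e):
    """Return the synthesized type of expression e: "int", "bool" or "error"."""
    kind = e[0]
    if kind == "num":
        return "int"
    if kind == "op":
        _, op, left, right = e
        tl, tr = synth(left), synth(right)    # children's types flow up first
        if op in ("+", "*"):
            return "int" if tl == tr == "int" else "error"
        if op == "==":
            return "bool" if tl == tr and tl != "error" else "error"
    if kind == "if":
        _, c, t, f = e
        tc, tt, tf = synth(c), synth(t), synth(f)
        return tt if tc == "bool" and tt == tf else "error"
    return "error"


# if (1+1 == 2) then 1 + 3 else 2 * 3
prog = ("if",
        ("op", "==", ("op", "+", ("num", 1), ("num", 1)), ("num", 2)),
        ("op", "+", ("num", 1), ("num", 3)),
        ("op", "*", ("num", 2), ("num", 3)))
print(synth(prog))  # int
```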

Yes... But.... • What about identifiers? • Key Idea • Types for identifiers are inherited attributes! • Inherited • From the definition • To the use site. • Example: int n; .... if (n == 0) then 1 else n

Example int n; .... if (n == 0) then 1 else n
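A sketch of the inherited part, again over a hypothetical tuple AST: the environment built from the declaration `int n;` is passed down to every use of `n`, while the resulting types are still synthesized on the way up. The names `check` and `env` are illustrative only.

```python
def check(e, env):
    """Type of expression e, given env: a dict from declared names to types.

    env is passed down the tree unchanged (inherited), so every use of a
    name sees the type from its declaration; result types flow up.
    """
    kind = e[0]
    if kind == "num":
        return "int"
    if kind == "id":
        return env.get(e[1], "error")        # undeclared name -> error
    if kind == "==":
        tl, tr = check(e[1], env), check(e[2], env)
        return "bool" if tl == tr != "error" else "error"
    if kind == "if":
        tc, tt, tf = (check(x, env) for x in e[1:])
        return tt if tc == "bool" and tt == tf else "error"
    return "error"


env = {"n": "int"}                            # from the declaration  int n;
prog = ("if", ("==", ("id", "n"), ("num", 0)), ("num", 1), ("id", "n"))
print(check(prog, env))                       # int
```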

The Notion of a Type System • Type System/Checker Based on: • Syntactic Language Constructs • The Notion of Types • Rules for Assigning Types to Language Constructs • Strength of Type Checking • Strong vs. Weak • Dynamic vs. Static • OODBSs/OOPLs Offer Many Variants • All Expressions in the Language MUST have an Associated Type • Basic (int, real, char, etc.) • Constructed (from basic and constructed types) • How are Type Expressions Defined and Constructed?

Type Expressions • A Basic Type is a Type Expression • Examples: Boolean, Integer, Char, Real • Note: TypeError is a Basic Type to Represent Errors • Type Expressions may be Named; a Type Name is also a Type Expression • A Type Constructor Applied to Type Expressions Results in a Type Expression • Array(I,T): I is an Integer Range, T is a Type Expr. • Product: T1 × T2 is a Type Expr. if T1, T2 are Type Exprs. • Record: Tuple of Field Names & Respective Types • Pointer(T): if T is a Type Expr., Pointer(T) is also

Type Expressions • A Type Constructor Applied to Type Expressions Results in a Type Expression (Continued) • Functions: • May be Mathematically Characterized with Domain Type D and Range Type R • F: D → R • int × int → int • char × char → pointer(int) • A Type Expression May Contain Variables whose Values are Type Expressions • Called Type Variables • We’ll Omit these from our Discussion …
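Under a hypothetical tuple encoding, where `("fun", D, R)` stands for D → R and `("product", T1, T2)` for T1 × T2, checking a call reduces to comparing the argument type with D. This is a sketch of the usual application rule, not part of the slides; `int_op` and `make_ptr` are made-up example functions.

```python
def apply_type(f_type, arg_type):
    """Type of the call F(arg) when F : D -> R.

    Function types are encoded as ("fun", D, R) and products D1 x D2 as
    ("product", D1, D2); anything that is not a function cannot be called.
    """
    if isinstance(f_type, tuple) and f_type[0] == "fun":
        _, domain, rng = f_type
        return rng if domain == arg_type else "type_error"
    return "type_error"


int_op = ("fun", ("product", "int", "int"), "int")                   # int × int → int
make_ptr = ("fun", ("product", "char", "char"), ("pointer", "int"))  # char × char → pointer(int)

print(apply_type(int_op, ("product", "int", "int")))     # int
print(apply_type(make_ptr, ("product", "int", "char")))  # type_error
```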

Key Issues for Type System • Classical Type System Approaches • Static Type Checking (Compile Time) • Dynamic Type Checking (Run Time) • How is each Handled in C? C++? Java? • Language Level Issues: • Sound Type System (ML) • No Dynamic Type Checking is Required • All Type Errors are Determined Statically • Strongly Typed Language (Java, Ada) • Compiler Guarantees no Type Errors During Execution • Weakly Typed Language (C, LISP) • Allows you to Break Rules at Runtime • What about Today’s Web-based Languages?

The Notion of a Type System • Type System: Rules Used by the Type Checker to Assign Types to Expressions and Verify Consistency • Type Systems are Language/Compiler Dependent • Different Versions of Pascal have Different Type Systems • Same Language Can have Multiple Levels of Type Systems (C Compiler vs. Lint in Unix) • Different Compilers for the Same Language May Implement Type Checking Differently • GNU C++ vs. MS Studio C++ • Sun Java vs. MS Java (until Sun forced it off the market) • What are the Key Issues?

First: Add Typing into Symbol Table
    P → D ; E
    D → D ; D
    D → id : T                { addtype(id.entry, T.type) }
    T → char                  { T.type := char }
    T → int                   { T.type := integer }
    T → array [ num ] of T1   { T.type := array(1..num.val, T1.type) }
    T → ↑ T1                  { T.type := pointer(T1.type) }
• Notes: • Assume Lexical Recognition of id (in the Lexical Analyzer) Adds id to the Symbol Table • Thus we Augment this Entry with T.type
    E → literal               { E.type := char }
    E → number                { E.type := integer }
    E → id                    { E.type := lookup(id.entry) }
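The translation scheme above can be mimicked with a dictionary as the symbol table. In this Python sketch the type expressions are encoded as tuples (`"char"`, `"integer"`, `("array", lo, hi, elem)`, `("pointer", elem)`), and the parsed type syntax (`("int",)`, `("array", num, T1)`, `("ptr", T1)`) is a hypothetical stand-in for the parser's output; `addtype` and `lookup` mirror the semantic actions.

```python
symtab = {}   # name -> type expression


def addtype(name, t):
    """Semantic action for D -> id : T, recording T.type in id's entry."""
    symtab[name] = t


def type_of_T(t_ast):
    """Synthesize T.type from a (hypothetical) parsed type expression."""
    kind = t_ast[0]
    if kind == "char":                    # T -> char
        return "char"
    if kind == "int":                     # T -> int
        return "integer"
    if kind == "array":                   # T -> array [ num ] of T1
        _, num, t1 = t_ast
        return ("array", 1, num, type_of_T(t1))
    if kind == "ptr":                     # T -> ↑ T1
        return ("pointer", type_of_T(t_ast[1]))
    raise ValueError(f"unknown type form: {kind}")


def lookup(name):
    """Semantic action for E -> id; undeclared names get type_error."""
    return symtab.get(name, "type_error")


addtype("x", type_of_T(("array", 10, ("int",))))   # x : array [10] of int
addtype("p", type_of_T(("ptr", ("char",))))        # p : ↑char
print(lookup("x"))  # ('array', 1, 10, 'integer')
print(lookup("p"))  # ('pointer', 'char')
```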

Remaining Typing More Complex • An Expression May be a Mod, Array-Index, or Pointer-Dereference Expression • Useful Extensions would Include a Boolean Type and Extending Expressions with Relational Ops, AND, OR, etc.
    E → E1 mod E2   { E.type := if E1.type = integer and E2.type = integer then integer else type_error }
    E → E1 [ E2 ]   { E.type := if E2.type = integer and E1.type = array(s, t) then t else type_error }
    E → E1 ↑        { E.type := if E1.type = pointer(t) then t else type_error }
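The three rules translate almost one-to-one into functions over the operands' types. This sketch assumes a hypothetical tuple encoding in which `("array", lo, hi, elem)` stands for array(s, t) and `("pointer", t)` for pointer(t); base types are plain strings.

```python
def type_mod(t1, t2):
    # E -> E1 mod E2: both operands must be integer
    return "integer" if t1 == "integer" and t2 == "integer" else "type_error"


def type_index(t1, t2):
    # E -> E1 [ E2 ]: E1 must be array(s, t) and E2 integer; the result is t
    if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
        return t1[-1]
    return "type_error"


def type_deref(t1):
    # E -> E1 ↑: E1 must be pointer(t); the result is t
    if isinstance(t1, tuple) and t1[0] == "pointer":
        return t1[1]
    return "type_error"


arr = ("array", 1, 10, "integer")
print(type_index(arr, "integer"))               # integer
print(type_deref(("pointer", arr)))             # ('array', 1, 10, 'integer')
print(type_mod("integer", "char"))              # type_error
```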

Extending Example to Statements • These Extensions are More Complex from a Type Checking Perspective • Right Now, only Individual Statements are Checked
    P → D ; S
    S → id := E          { S.type := if id.type = E.type then void else type_error }
    S → if E then S1     { S.type := if E.type = boolean then S1.type else type_error }
    S → while E do S1    { S.type := if E.type = boolean then S1.type else type_error }
    S → S1 ; S2          { S.type := if S1.type = void and S2.type = void then void else type_error }
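A Python sketch of the statement rules over a hypothetical tuple AST, with `env` mapping declared names to types. One assumption goes beyond the rules above: an assignment is also rejected when both sides are `type_error`, so that two errors never "match" and yield `void`.

```python
def expr_type(e, env):
    """Minimal expression typing: literals and identifiers only."""
    if e[0] == "num":
        return "integer"
    if e[0] == "bool":
        return "boolean"
    if e[0] == "id":
        return env.get(e[1], "type_error")
    return "type_error"


def stmt_type(s, env):
    kind = s[0]
    if kind == "assign":                        # S -> id := E
        t_id = env.get(s[1], "type_error")
        t_e = expr_type(s[2], env)
        # extra guard (an assumption): two type_errors must not yield void
        return "void" if t_id == t_e != "type_error" else "type_error"
    if kind in ("if", "while"):                 # S -> if E then S1 | while E do S1
        t_e, t_s1 = expr_type(s[1], env), stmt_type(s[2], env)
        return t_s1 if t_e == "boolean" else "type_error"
    if kind == "seq":                           # S -> S1 ; S2
        t1, t2 = stmt_type(s[1], env), stmt_type(s[2], env)
        return "void" if t1 == t2 == "void" else "type_error"
    return "type_error"


env = {"x": "integer", "b": "boolean"}
ok = ("seq", ("assign", "x", ("num", 1)),
             ("while", ("id", "b"), ("assign", "x", ("num", 2))))
bad = ("if", ("id", "x"), ("assign", "x", ("num", 0)))   # condition is not boolean
print(stmt_type(ok, env), stmt_type(bad, env))          # void type_error
```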

What are the Main Issues in Type Checking? • Type Equivalence: • Conditions under which Types are the Same • Tracking of Scoping – Nested Declarations • Type Compatibility • Conversion/Casting, Nonconverting Casts, Coercion • Type Inference • Determining the Type of a Complex Expression • Reviewing Remaining Concepts of Note • Overloading, Polymorphism, Generics • In OO Case: • Classes are the OO Version of a Type • Issues Need to Consider the Way we Program in OO • In Older Languages like C, these are Critical

Structural vs. Name Equivalence of Types • Two Types are “Structurally Equivalent” iff they are Equivalent Under the Following 3 Rules: • SE1: A Type Name is Structurally Equivalent to Itself • SE2: T1 and T2 are Structurally Equivalent if they are Formed by Applying the Same Type Constructor to Structurally Equivalent Types • SE3: After a Type Declaration type n = T, the Type Name n is Structurally Equivalent to T • SE3 is “Name Equivalence” • What Do Programming Languages Use? • C: All Three Rules • Pascal: Omits SE2 and Restricts SE3 so that a Type Name can only be Structurally Equivalent to Other Type Names

Type Equivalence • Structural equivalence: equivalent if built in the same way (same parts, same order) • Name equivalence: distinctly named types are always different • Structural equivalence questions • What parts constitute a structural difference? • Storage: record fields, array size • Naming of storage: field names, array indices • Field order • How to distinguish between intentional vs. incidental structural similarities? • An argument for name equivalence: “They’re different because the programmer said so; if they’re

Type Equivalence: Records and Arrays • Would record types with identical fields, but a different field order, be structurally equivalent?
    type PascalRec = record a : integer; b : integer end;
    val MLRec = { a = 1, b = 2 };
    val OtherRec = { b = 2, a = 1 };
• When are arrays with the same number of elements structurally equivalent?
    type str = array [1..10] of integer;
    type str = array [1..2 * 5] of integer;
    type str = array [0..9] of integer;

Consider Name Equivalence in Pascal • How are the Following Compared?
    type link = ↑cell;
    var next : link;
        last : link;
        p    : ↑cell;
        q, r : ↑cell;
• By Rules SE1, SE2, SE3, all are Equivalent! • However: • Some Implementations of Pascal • next, last – Equivalent • p, q, r – Equivalent • Other Implementations of Pascal • next, last – Equivalent • q, r – Equivalent • How is the Following Interpreted?
    type link = ↑cell;
         np   = ↑cell;
         npr  = ↑cell;
    var next : link;
        last : link;
        p    : np;
        q, r : npr;

What about Classes and Equivalence? • Are these SE1? SE2? Or SE3? • What Does Java Require?
    public class person {
        private String lastname, firstname;
        private String loginID;
        private String password;
    }
    public class user {
        private String lastname, firstname;
        private String loginID;
        private String password;
    }

Checking Structural Equivalence • Employ a Recursive Algorithm to Check SE2 • Algorithm Adaptable for Other Versions of SE • Structural Equivalence Means the Following are the Same: • X: array[1..10] of int; • Y: array[1..10] of int;
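A minimal recursive check for SE2 in Python, assuming type names have already been expanded to their definitions and that type expressions use a hypothetical encoding: base types are strings, constructed types are tuples headed by the constructor name. Array bounds are compared as already-evaluated constants, so `[1..2 * 5]` would match `[1..10]` after constant folding but `[0..9]` would not.

```python
def sequiv(s, t):
    """True if type expressions s and t are structurally equivalent (SE2).

    Base types are strings; constructed types are tuples such as
    ("array", lo, hi, elem), ("pointer", t), ("product", t1, t2), ("fun", d, r).
    """
    if isinstance(s, str) and isinstance(t, str):
        return s == t                                # same basic type
    if isinstance(s, tuple) and isinstance(t, tuple):
        if s[0] != t[0] or len(s) != len(t):
            return False                             # different constructors
        if s[0] == "array":                          # bounds are ints, not types
            _, lo1, hi1, e1 = s
            _, lo2, hi2, e2 = t
            return (lo1, hi1) == (lo2, hi2) and sequiv(e1, e2)
        # same constructor applied to structurally equivalent components
        return all(sequiv(a, b) for a, b in zip(s[1:], t[1:]))
    return False


X = ("array", 1, 10, "int")   # X: array[1..10] of int
Y = ("array", 1, 10, "int")   # Y: array[1..10] of int
print(sequiv(X, Y))                              # True
print(sequiv(X, ("array", 0, 9, "int")))         # False
```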

Alias Types and Name Equivalence • Alias types are types that purely consist of a different name for another type
    TYPE Stack_Element = INTEGER;
    TYPE Level = INTEGER;
    TYPE Celsius = REAL;
    TYPE Fahrenheit = REAL;
• Is Integer assignable to a Stack_Element? Levels? • Can a Celsius and Fahrenheit be assigned to each other? • Strict name equivalence: aliased types are distinct • Loose name equivalence: aliased types are equivalent • Ada allows additional explicit equivalence control:
    subtype Stack_Element is Integer;   -- subtype: compatible with Integer
    type Celsius is new Float;          -- derived type: a new, distinct type
    type Fahrenheit is new Float;       -- distinct from Float and from Celsius
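The difference between strict and loose name equivalence can be shown with a hypothetical alias table built from the declarations above: strict equivalence compares names as written, while loose equivalence first follows alias links down to the underlying type.

```python
# Hypothetical alias table: each declared name maps to the type it aliases.
ALIASES = {
    "Stack_Element": "INTEGER",
    "Level": "INTEGER",
    "Celsius": "REAL",
    "Fahrenheit": "REAL",
}


def resolve(name):
    """Follow alias links down to the underlying (non-alias) type name."""
    while name in ALIASES:
        name = ALIASES[name]
    return name


def strict_equiv(a, b):
    # Strict name equivalence: aliased types are distinct
    return a == b


def loose_equiv(a, b):
    # Loose name equivalence: aliases of the same type are equivalent
    return resolve(a) == resolve(b)


print(strict_equiv("Celsius", "Fahrenheit"))  # False
print(loose_equiv("Celsius", "Fahrenheit"))   # True
```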

Why is Degree of Type Equivalence Critical? • Governs how Software Engineers Develop Code… Why? • SE2 Alone Doesn’t Promote Well Designed, Thought Out, Software … Why? • Impacts on Team-Oriented Software Development… How? • With SE2 Alone, Errors are Harder to Locate and Correct… Why? • Increases Compilation Time with SE2 Alone … Why?

Scoping • What is the problem? • Consider this example program
    class Foo {
        int n;
        Foo() { n = 0; }
        int run(int n) {
            int i;
            int j;
            i = 0;
            j = 0;
            while (i < n) {
                int n;
                n = i * 2;
                j = j + n;
                i = i + 1;
            }
            return j;
        }
    };

Resolving the Issue • Observation • Scopes are always properly nested • Each new definition could have a different type • Idea • Make the typing environment sensitive to scopes • New operations on typing env. • Entering a scope • Effect: New declarations shadow previous ones • Leaving a scope • Effect: Old declarations become current again • What are the Issues? • Activating and Tracking Scopes!

Scoping • The Scopes
    class Foo {                    // Class Scope: n, Foo, run
        int n;
        Foo() { n = 0; }
        int run(int n) {           // Method Scope: parameter n
            int i;                 // Body Scope: i, j
            int j;
            i = 0;
            j = 0;
            while (i < n) {
                int n;             // Block Scope: local n shadows parameter n
                n = i * 2;
                j = j + n;
                i = i + 1;
            }
            return j;
        }
    };
• Key point: Non-shadowed names remain visible

Handling Scopes • From a declarative standpoint • Introduce a new typing environment • Initially equal to the copy of the original • Then augmented with the new declarations • Discard environment when leaving the scope • From an implementation point of view • Environment directly accounts for scoping • How ? • Scope chaining!

Scope Chaining • Key Ideas • One scope = One hashtable • Scope chaining = A linked list of scopes • Abstract Data Type • Semantic Environment • pushScope • Add a new scope in front of the linked list • popScope • Remove the scope at the front of the list • lookup(name) • Search for an entry for name. If nothing in first scope, start scanning the subsequent scopes in the linked list.

Scope Chaining • Advantages • Updates are non-destructive • When we pop a scope, the previous list is unchanged since additions are only done in the top scope • The current list of scopes can be saved (when needed)
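The semantic environment ADT can be sketched in Python with one dict per scope and a parent link forming the chain. The methods `push_scope`, `pop_scope` and `lookup` are Python spellings of pushScope, popScope and lookup; `declare` is a hypothetical helper for adding an entry to the top scope. The usage walks the scopes of `Foo.run` from the scoping example.

```python
from typing import Dict, Optional


class Scope:
    """One scope = one hash table, plus a link to the enclosing scope."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.table: Dict[str, str] = {}
        self.parent = parent


class SemanticEnv:
    """Scope chain: a linked list of scopes, innermost scope at the front."""

    def __init__(self) -> None:
        self.top: Optional[Scope] = None

    def push_scope(self) -> None:
        # Add a new scope in front of the linked list
        self.top = Scope(self.top)

    def pop_scope(self) -> None:
        # Drop the front scope; the scopes behind it are not modified
        if self.top is None:
            raise RuntimeError("no scope to pop")
        self.top = self.top.parent

    def declare(self, name: str, typ: str) -> None:
        # Declarations are only ever added to the top scope
        if self.top is None:
            raise RuntimeError("declaration outside any scope")
        self.top.table[name] = typ

    def lookup(self, name: str) -> Optional[str]:
        # Try the innermost scope first, then scan the enclosing ones
        scope = self.top
        while scope is not None:
            if name in scope.table:
                return scope.table[name]
            scope = scope.parent
        return None


env = SemanticEnv()
env.push_scope()
env.declare("n", "int field")        # class scope
env.push_scope()
env.declare("n", "int parameter")    # method scope
env.push_scope()
env.declare("i", "int")              # body scope
env.declare("j", "int")
env.push_scope()
env.declare("n", "int local")        # block scope (inside the while)
print(env.lookup("n"), env.lookup("j"))  # int local int
env.pop_scope()                          # leave the while-block
print(env.lookup("n"))                   # int parameter
```

Since a pop only moves `top` back along the chain, a reference to an earlier `top` stays valid, which is what makes saving the current list of scopes cheap.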

Entering & Leaving Scopes • Easy to find out... • Use the tree structure! • Entering scope • When entering a class • When entering a method • When entering a block • Leaving scope • End of class • End of method • End of block • We’ll Revisit in Chapter 7 on Runtime Environment!

Type Conversion • Certain contexts in certain languages may require exact matches with respect to types: • aVar := anExpression • value1 + value2 • foo(arg1, arg2, arg3, … , argN) • Type conversion seeks to follow these exact match rules while allowing programmers some flexibility in the values used • Using structurally-equivalent types in a name-equivalent language • Types whose value ranges may be distinct but intersect (e.g. subranges) • Distinct types with sensible/meaningful corresponding values (e.g. integers and floats)

Type Conversion • Refers to the Conversion Between Different Types to Carry out Some Action in a Program • Often Abused within a Programming Language (C) • Typically Used in Arithmetic/Boolean Expressions • r := i + r; (Pascal) • f = i + c; (C) • Two Kinds of Conversion: • Implicit: Automatically done by Compiler (Coercion) • Explicit: Type-Casts: Programmer Initiated (Ord, Chr, Trunc) • If X is a real array, which works faster? Why? • for I:=1 to N do X[I] := 1; • for I:=1 to N do X[I] := 1.0; • A Good Optimizing Compiler will Convert the Constant 1 in the 1st Option to 1.0 at Compile Time, Avoiding a Conversion on Every Iteration!
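An implicit conversion can be made explicit by the type checker: for a mixed `integer`/`real` operation it reports the result type and which operand needs converting. This Python sketch assumes only those two numeric types; `inttoreal` is a hypothetical name for the conversion operator, and `coerce_literal` shows the compile-time conversion of a constant from the `X[I] := 1` example.

```python
def arith_result(t1, t2):
    """Result type of E1 op E2 for an arithmetic op, plus implicit conversions.

    Assumes just two numeric types, 'integer' and 'real'; when they are mixed,
    the integer operand is widened to real (a coercion).
    """
    if t1 == t2 and t1 in ("integer", "real"):
        return t1, []
    if {t1, t2} == {"integer", "real"}:
        side = "E1" if t1 == "integer" else "E2"
        return "real", [f"inttoreal({side})"]
    return "type_error", []


def coerce_literal(value, target):
    """Convert an integer literal at compile time when a real is expected,
    so no run-time conversion needs to be emitted."""
    if target == "real" and isinstance(value, int):
        return float(value)
    return value


print(arith_result("integer", "real"))  # ('real', ['inttoreal(E1)'])
print(coerce_literal(1, "real"))        # 1.0
```

For `r := i + r`, the first call corresponds to `i + r`: the integer `i` is converted and the sum is real.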
