CSC321: Programming Languages

CSC321: Programming Languages Chapter 5: Types 5.1 Type Errors 5.2 Static and Dynamic Typing 5.3 Basic Types 5.4 NonBasic Types 5.5 Recursive Data Types 5.6 Functions as Types 5.7 Type Equivalence 5.8 Subtypes 5.9 Polymorphism and Generics 5.10 Programmer-Defined Types CSC321: Programming Languages

Types • A type is a collection of values and operations on those values. • Example: Integer type has values ..., -2, -1, 0, 1, 2, ... and operations +, -, *, /, <, ... • The Boolean type has values true and false and operations , , . • Computer types have a finite number of values due to fixed size allocation; problematic for numeric types. CSC321: Programming Languages

Type Problems • Exceptions: • Smalltalk uses unbounded fractions. • Haskell type Integer represents unbounded integers. • Floating point problems? • Even more problematic is fixed sized floating point numbers: • 0.2 is not exact in binary. • So 0.2 * 5 is not exactly 1.0 • Floating point is inconsistent with real numbers in mathematics. CSC321: Programming Languages

Purpose of Types • In the early languages, Fortran, Algol, Cobol, all of the types were built in. • If needed a type color, could use integers; but what does it mean to multiply two colors. • Purpose of types in programming languages is to provide ways of effectively modeling a problem solution. CSC321: Programming Languages

Data Interpretation • Machine data carries no type information. • Basically, just a sequence of bits. • Example: 0100 0000 0101 1000 0000 0000 0000 0000 • The floating point number 3.375 • The 32-bit integer 1,079,508,992 • Two 16-bit integers 16472 and 0 • Four ASCII characters: @ X NUL NUL CSC321: Programming Languages

Type Errors • A type error is any error that arises because an operation is attempted on a data type for which it is undefined. • Type errors are common in assembly language programming. • High level languages reduce the number of type errors. • A typesystem provides a basis for detecting type errors. CSC321: Programming Languages

Type Checking • A type system imposes constraints such as the values used in an addition must be numeric. • Cannot be expressed syntactically in EBNF. • Some languages perform type checking at compile time (e.g., C). • Other languages (e.g., Perl) perform type checking at run time. • Still others (e.g., Java) do both. CSC321: Programming Languages

Static and Dynamic Typing • A language is staticallytyped if the types of all variables are fixed when they are declared at compile time. • A language is dynamicallytyped if the type of a variable can vary at run time depending on the value assigned. • Can you give examples of each? CSC321: Programming Languages

Strongly Typing • A language is stronglytyped if its type system allows all type errors in a program to be detected either at compile time or at run time. • A strongly typed language can be either statically or dynamically typed. • Union types are a hole in the type system of many languages. • Most dynamically typed languages associate a type with each value. CSC321: Programming Languages

Basic Types • Terminology in use with current 32-bit computers: • Nibble: 4 bits • Byte: 8 bits • Half-word: 16 bits • Word: 32 bits • Double word: 64 bits • Quad word: 128 bits • In most languages, the numeric types are finite in size. So a + b may overflow the finite range. • Unlike mathematics: a+(b+c)(a+b)+c • Also in C-like languages, the equality and relational operators produce an int, not a Boolean CSC321: Programming Languages

Overloading • An operator or function is overloaded when its meaning varies depending on the types of its operands or arguments or result. • Java: a + b (ignoring size) • integer add • floating point add • string concatenation • Mixed mode: one operand an int, the other floating point, or one string and the other int CSC321: Programming Languages

Type Conversion • A type conversion is a narrowing conversion if the result type permits fewer bits, thus potentially losing information. • Otherwise it is termed a widening conversion. • Should languages ban implicit narrowing conversions? Why? • C++/Java: • Explicit narrowing conversion • Implicit widening conversion • How? CSC321: Programming Languages

Non-basic Types • Enumeration: • enum day {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday}; • enum day myDay = Wednesday; • In C/C++ the above values of this type are 0, ..., 6. • More powerful in Java: for (day d : day.values()) Sytem.out.println(d); CSC321: Programming Languages

Pointers • C, C++, Ada, Pascal • Java??? • Value is a memory address • Indirect referencing • Operator in C: * • Example: A Simple Linked List in C struct Node { int key; struct Node* next; }; struct Node* head; CSC321: Programming Languages

Pointer Operations • If T is a type and ref T is a pointer: & : T → ref T * : ref T → T • For an arbitrary variable x: *(&x) = x • Increment/decrement • String Copy void strcpy(char *p, char *q) { while (*p++ = *q++) ; } CSC321: Programming Languages

Problems with Pointers • Bane of reliable software development • Error-prone • Buffer overflow, memory leaks • Particularly troublesome in C • float sum(float a[ ],int n) { int i; float s = 0.0; for (i = 0; i<n; i++) s += a[i]; return s; • } • float sum(float *a,int n) { int i; float s = 0.0; for (i = 0; i<n; i++) s += *a++; return s; • } CSC321: Programming Languages

Dangling References • A dangling reference is a pointer to storage that is being used for another purpose; typically, the storage has been deallocated • Dangling reference may cause a program either crashing or giving the wrong results • Dangling reference may be caused by • Deallocation operations: a storage that is being pointed to by a pointer has been deallocated. • Procedure/function/method returns: the address of a local variable in a procedure is returned as the return value. Consider the following situation (C/C++): CSC321: Programming Languages

Arrays • A set of elements, but different languages have different ways to deal with arrays • Basically, an array has the following attributes: • Data type – the type of all elements • Dimension – number of dimensions: 1-dim, 2-dim, 3-dim, etc • Range of dimensions – lower and upper bound for each dimension • Subscripts – index. Most languages use integer, but others such as Pascal/Ada allow any discrete types CSC321: Programming Languages

Arrays Layout • 1-dim array: consecutive locations • High-dim array: • Row-major layout: consecutive locations arranged row by row e.g. a[3][3]: a[0,0], a[0,1], a[0,2], a[1,0], a[1,1], a[1,2], a[2,0], a[2,1], a[2,2] • Column-major layout: arranged column by column e.g. a[3][3]: a[0,0], a[1,0], a[2,0], a[0,1], a[1,1], a[2,1], a[0,2], a[1,2], a[2,2] CSC321: Programming Languages

Arrays Element Accessing • Element notation: • C/C++, Java, Algol: a[i][j][k] • Pascal: a[i, j, k] • Fortran: a(i, j, k), same as a function. Actually, Fortran treats accessing array elements as performing function evaluation • Pointer accessing: • C/C++ can use pointers to access array elements CSC321: Programming Languages

Arrays and Pointers • Equivalence between arrays and pointers: a = &a[0] • If either e1 or e2 is type: ref T: e1[e2]=*((e1)+(e2)) • Example: a is float[ ] and i int: a[i] = *(a + i) • double a[16]; • double *b; • b = &a[0]; // assign the base address of a to b • b++; // move b to the next location: a[1] • b + i; // move b to the next i-th location: a[i+1] CSC321: Programming Languages

Arrays Indexing • Index computation: Loc • Assume: base – the location of the first element low – the lower bound upp – the upper bound w – the length of one element • 1-dim: Loc(a[i]) = base + w*(i-low) = (base – low *w) + i*w = c + i*w, where c = base – low*w is a constant • 2-dim: Loc(a[i,j]) = base + w*[ r*(i-low1) + (j-low2) ] = base - w*( r*low1 + low2) + w*(r*i + j) where r = the number of rows above a[i,j], which is (upp2-low2 + 1) 3-dim: Loc(a[i,j,k]) = base + w*[r2*(r1*(i-low1) + (j-low2)) + (k-low3)] where r1 = (upp2-low2+1), r2 = (upp3-low3+1) ?? CSC321: Programming Languages

Arrays Binding • Name-size binding – bound evaluation: how many storage cells should be allocated to an array • Static binding – static evaluation: fixed size, determined at compile time, allocated at load time • Advantages: easy; efficient • Disadvantages: inflexible, waste storage • Dynamic binding – dynamic evaluation: changeable size, determined at run time, allocated dynamically • E.g. Java: Declare: double a[]; Create: a = new double[10]; a = new double[15]; • Extendable arrays: APL, SNOBOL, Perl • Java also has a kind of extendable array: Vector v = new Vector(5); CSC321: Programming Languages

Arrays Binding • Semi-dynamic binding – evaluation upon procedure entry: Ada, Pascal, C/C++, Java • The size (range of dimension) is a parameter. • Once the parameter is determined, the array size is fixed E.g. C/C++: double fun(int n) { double a[n]; …} Ada: declare m, n: Integer; begin declare type Matrix is array(Integer range<>, Integer range<>) of Real; a: Matrix(1..m, 1..n); begin … end … end CSC321: Programming Languages

Array Attributes • Some languages treat array as object/package, and maintain its attributes • E.g. Java: length --- a.length  the size of a • Ada: range: a’range Lower bound: a’first Upper bound: a’last • With these attributes, when an array is passed as a parameter, do not need to specify the number of elements in the array • However, some languages have to specify the length as a parameter • E.g. in C, array is passed by reference (call-by-reference) and does not have the length attribute. CSC321: Programming Languages

Strings • A special kind of array Now so fundamental, directly supported. • In C, a string is a 1D array with the string value terminated by a NUL character (value = 0). • In Java, Perl, Python, a string variable can hold an unbounded number of characters. • Libraries of string operations and functions. • Comparison, selection of substring, searching, matching (Most languages) • Replacement, appending, substring deletion (SNOBOL) • Concatenation: produce a new string • Java: String (static) and StringBuffer (dynamic) CSC321: Programming Languages

Structures/Records • Composed from simpler data items • Each data item (field) has a name • Data items may have different types (recall array, all elements must be in the same type) • Access to fields mostly uses .(dot) expression Pascal: type Date = record day: Integer; month: Integer; year: Integer; end var adate: Date; Access: adate.day C/C++: struct Date { int day; int month; int year; } C: struct Date adate; C++: Date adate; Access: adate.day CSC321: Programming Languages

Structures/Records • The order of fields in the record has no effect on the meaning • When a variable of record is declared, it is allocated storage • Record assignment is to assign all fields in a record • Collection of elements of different types • Used first in Cobol, PL/I • Absent from Fortran, Algol 60 • Common to Pascal-like, C-like languages • Omitted from Java as redundant CSC321: Programming Languages

Comparison of Arrays and Records • Component:all elements in an array have the same type, while components (fields) of a record may be different • Access:array elements are accessed by subscripts and element locations are determine at run time, while recordcomponents are accessed by dot expression and component names are determined at compile time • Assignment:array assignment is to assign the base location, while record assignment is to assign all components CSC321: Programming Languages

Unions • C: union • Pascal: case-variant record • Logically: multiple views of same storage • Useful in some systems applications type union = record case b : boolean of true : (i : integer); false : (r : real); end; var tagged : union; begin tagged := (b => false, r => 3.375); put(tagged.i); -- error end CSC321: Programming Languages

Simulated union type class Value extends Expression { // Value = int intValue | boolean boolValue Type type; int intValue; boolean boolValue; Value(int i) { intValue = i; type = new Type(Type.INTEGER); } Value(boolean b) { boolValue = b; type = new Type(Type.BOOLEAN); } } CSC321: Programming Languages

Recursive Data Type • Type definition refers to another type that refers to the current type definition • E.g. typelink = cell; // refer to cell typecell = record data: integer; next: link; // refer to link end; CSC321: Programming Languages

Dynamic Data Type • With dynamic data structures, you can not only operate on the data items in the structure, but also change the structure • Usually constructed with static data structures (such as arrays and records) and pointers • E.g. Linked lists, Trees • The basic components of these structures are records that contain pointers: nodes. Usually, each node has two parts: data items, and links (pointers/references) CSC321: Programming Languages

Functions as Types • Pascal example: • function newton(a, b: real; function f: real): real; • Know that f returns a real value, but the arguments to f are unspecified. • Java example • public interface RootSolvable { • double valueAt(double x); • } • public double Newton(double a, double b, RootSolvable f); CSC321: Programming Languages

Type Names and Equivalence • Type names • Each type may or may not have a name, such as int, Integer, Char, etc. are type names; while array [0..9] of integer is an anonymous type.Name equivalence • Type equivalence • Name equivalence • Structural equivalence CSC321: Programming Languages

Type Name Equivalence • Name equivalence: have the same name type • Pure name equivalence: exactly same name • Transitive name equivalence: • if S is equivalent to T which is equivalent to U, then S is equivalent to U • Type expression equivalence: • A type name is equivalent to itself. • Two type expressions are equivalent if they are formed by applying the same constructor to equivalent expressions. CSC321: Programming Languages

Type Structural Equivalence • Two type expressions are structurally equivalent if and only if they are equivalent under the following three rules • A type name is structurally equivalent to itself • Two types are structurally equivalent if they are formed by applying the same type constructor to structurally equivalent types • After a type declaration, typen = T, the type name n is structurally equivalent to T • Examples • integer is structurally equivalent to integer • Types S and T are structurally equivalent if they are declared as • type S = array [0..9] of integer; • type T = array [0..9] of integer; • Types S and integer are structurally equivalent if S is declared as: type S = integer; CSC321: Programming Languages

Type Equivalence struct complex { float re, im; }; struct polar { float x, y; }; struct { float re, im; } a, b; struct complex c, d; struct polar e; int f[5], g[10]; which are equivalent types: a, b, c, d, e, f, g? CSC321: Programming Languages

Type Equivalence Implementations • Pascal uses name equivalence, but ambiguous. • Modula-2 (the successor of Pascal) defines two types S and T to be compatible if • S and T are the same name, or • S = T is a type declaration: type S = T; or type T = S; or • One is a subrange of the other; or • Both are subranges of the same basic types • C uses structural equivalence for all types except records (struct) CSC321: Programming Languages

Subtypes • A subtype is a type that has certain constraints placed on its values or operations. • In Ada subtypes can be directly specified. subtype one_to_ten is Integer range 1 .. 10; type Day is (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday); subtype Weekend is Day range Saturday .. Sunday; type Salary is delta 0.01 digits 9 range 0.00 .. 9_999_999.99; subtype Author_Salary is Salary digits 5 range 0.0 .. 999.99; Integer i = new Integer(3); ... Number v = i; ... Integer x = (Integer) v; //Integer is a subclass of Number, // and therefore a subtype CSC321: Programming Languages

Polymorphism and Generics • A function or operation is polymorphic if it can be applied to any one of several related types and achieve the same result. • An advantage of polymorphism is that it enables code reuse. CSC321: Programming Languages

Polymorphism • Comes from Greek, and means having many forms • Example: overloaded built-in operators and functions + - * / == != ... • Ada, C++: define + - ... for new types • Java overloaded methods: number or type of parameters • Example: class PrintStream print, println defined for: boolean, char, int, long, float, double, char[ ] String, Object • Java: + also used for string concatenation • Java: instance variable, method: e.g., name, name( ) CSC321: Programming Languages

Generics • Ada generics: generic sort (Figs. 5.10 and 5.11) • parametric polymorphism • type binding delayed from code implementation to compile time • procedure sort is new generic_sort(integer); • Java generic classes • C/C++ templates CSC321: Programming Languages

Programmer-defined Types • Recall the definition of a type: • A set of values and a set of operations on those values. • Structures allow a definition of a representation • Problems: • Representation is not hidden • Type operations cannot be defined • Solutions: ADT – Abstract Data Types CSC321: Programming Languages

CSC321: Programming Languages