620 likes | 798 Views
Types. Aaron Bloomfield CS 415 Fall 2005. Why have Types?. Types provide context for operations a+b what kind of addition? pointer p = new object how much space? Limit valid set of operations int a = “string” know error before run time. Computer Hardware & Types. Remember CS216?
E N D
Types Aaron Bloomfield CS 415 Fall 2005
Why have Types? • Types provide context for operations • a+b what kind of addition? • pointer p = new object how much space? • Limit valid set of operations • int a = “string” know error before run time
Computer Hardware & Types • Remember CS216? • Bits in data are generally untyped • Bits interpreted as having a type, such as: • Instruction • Address • Floating-point number • Character • High-level language interprets bits
Type assignment • Language-dependent • a+b example again: • Ada: a and b declared with static types • Perl: $a and $b are dynamically typed • Scheme: checked at run-time • Smalltalk: checked at run-time • Bliss: completely untyped
Static vs. Dynamic Typing • First, what does strongly typed mean? • Prevents invalid operations on a type • Statically typed: strongly typed and type checks can be performed at compile time • Ada, Pascal: almost all type checking done at compile time, with just a few exceptions • C, C++, Java, Ocaml • Dynamically typed: type checking is delayed until run time • Scheme, Smalltalk, Perl
Defining/Declaring a Type • Fortran: variable’s type known from name • Letters I-N = integer type • Otherwise = real (floating-point) type • Scheme, Smalltalk: run-time typing • Most languages (such as C++): user explicitly declares type • int a, char b
Classifying types • A type is either one of a small group of built-in types or a user-defined composite • Built-in types include: • Integers • Characters: ASCII, Unicode • Boolean (not in C/C++) • Floating points (double, real, float, etc.) • Rarer types: • Complex (Scheme, Fortran) • Rational (Scheme) • Fixed point (Ada)
Number precision • Is the precision specified by the type in the language implementation? • Java, C, C++ Fortran: user specifies precision • Java: byte, short, int, long • Most other languages: implementation • Is the precision consistent across platforms? • Java: Yes • C/C++: No
Number precision • If user doesn’t specify type, code can lose portability due to differences in precision • In C/C++, precision is platform-dependent • Neat example in Haskell • 2 ^ 200 1606938044258990275541962092341162602522202993782792835301376 • Support for infinite precision types • Java provides BigInteger and BigDecimal for this purpose
Type Equivalence • Determining when the types of two values are the same struct student { string name; string address;} struct school { string name; string address;} • Are these the same? Should they be?
Structural Equivalence • The same components, put together the same way = same type • Algol-68, early Pascal, C (with exceptions) • And ML, somewhat • Straightforward and easy to implement • Definition varies from lang to lang • eg. Does the ORDER of the fields matter? • Back to example: are they the same? struct student { string name; string address;} struct school { string name; string address;} • Yes, they are (with structural equivalence)
Name Equivalence • More popular recently • A new name or definition = a new type • Java, current Pascal, Ada • Assume that: if programmer wrote two different definitions, then wanted two types • Is this a good or bad assumption to make? • Back to example: are they the same? struct student { string name; string address;} struct school { string name; string address;} • No, they are not (with name equivalence)
Problem! Aliases • Modula-2 example: TYPE celsius_temp = REAL; TYPE fahren_temp = REAL; VAR c : celsius_temp; f : fahren_temp; … f := c; • Modula has loose name equivalence, so this is okay • But normally it probably should be an error
Types of Name Equivalence • Strict: aliases are distinct types • Loose: aliases are equivalent types • Ada has both: • type test_score is integer; • type celsius_temp is new integer; type fahren_temp is new integer; • A derived type is incompatible with its parent type • now f := c will generate an error
Type Conversion • Conversion needed when certain types are expected but not received • a+b expecting either 2 ints or 2 floats • Switching between types may result in overflow • example: floating-point integer
Coercion • Can automatically force values to another type for the given context • a+b revisited: • Without coercion, if both aren’t the same type then there’s a compile-time error • With coercion, if either is floating-point, then addition is floating-point; otherwise int addition • Issue: Things are happening without you asking them to occur! Good/Bad?
Data Types Records (Structures) Variants (Unions) Arrays Sets
Records • Also known as ‘structs’ and ‘types’. • C struct resident { char initials[2]; int ss_number; bool married; }; • fields – the components of a record, usually referred to using dot notation.
Nesting Records • Most languages allow records to be nested within each other. • Pascal type two_chars = array [1..2] of char; type married_resident = record initials: two_chars; ss_number: integer; incomes: record husband_income: integer; wife_income: integer; end; end;
Memory Layout of Records • Fields are stored adjacently in memory. • Memory is allocated for records based on the order the fields are created. • Variables are aligned for easy reference. Optimized for space Optimized for memory alignment 4 bytes / 32 bits 4 bytes / 32 bits
Simplifying Deep Nesting • Modifying records with deep nesting can become bothersome. book[3].volume[7].issue[11].name := ‘Title’; book[3].volume[7].issue[11].cost := 199; book[3].volume[7].issue[11].in_print := TRUE; • Fortunately, this problem can be simplified. • In Pascal, keyword with “opens” a record. with book[3].volume[7].issue[11] do begin name := ‘Title’; cost := 199; in_print := TRUE; end;
Simplifying Deep Nesting • Modula-3 and C provide better methods for manipulation of deeply nested records. • Modula-3 assigns aliases to allow multiple openings with var1 = book[1].volume[6].issue[12], var2 = book[5].volume[2].issue[8] DO var1.name = var2.name; var2.cost = var1.cost; END; • C allows pointers to types • What could you write in C to mimic the code above?
Variant Records • variant records – provide two or more alternative fields. • discriminant – the field that determines which alternative fields to use. • Useful for when only one type of record can be valid at a given time.
Variant Records – Pascal Example type resident = record initials: array [1..2] of char; case married: boolean of true: ( husband_income: integer; wife_income: integer; ); false: ( income: real; ); id_number: integer; end; Case is TRUE Case is FALSE
Unions • A union is like a record • But the different fields take up the same space within memory union foo { int i; float f; char c[4]; } • Union size is 4 bytes!
Union example (from an assembler) union DisasmInst { #ifdef BIG_ENDIAN struct { unsigned char a, b, c, d; } chars; #else struct { unsigned char d, c, b, a; } chars; #endif int intv; unsigned unsv; struct { unsigned offset:16, rt:5, rs:5, op:6; } itype; struct { unsigned offset:26, op:6; } jtype; struct { unsigned function:6, sa:5, rd:5, rt:5, rs:5, op:6; } rtype; };
Another union example void CheckEndian() { union { char charword[4]; unsigned int intword; } check; check.charword[0] = 1; check.charword[1] = 2; check.charword[2] = 3; check.charword[3] = 4; #ifdef BIG_ENDIAN if (check.intword != 0x01020304) { /* big */ cout << "ERROR: Host machine is not Big Endian.\nExiting.\n"; exit (1); } #else #ifdef LITTLE_ENDIAN if (check.intword != 0x04030201) { /* little */ cout << "ERROR: Host machine is not Little Endian.\nExiting.\n"; exit (1); } #else cout << "ERROR: Host machine not defined as Big or Little Endian.\n"; cout << "Exiting.\n"; exit (1); #endif // LITTLE_ENDIAN #endif // BIG_ENDIAN }
Arrays • Group a homogenous type into indexed memory. • Language differences: A(3) vs. A[3]. • Brackets are preferred since parenthesis are typically used for functions/subroutines. • Subscripts are usually integers, though most languages support any discrete type.
Array Dimensions • C uses 0 -> (n-1) as the array bounds. • float values[10]; // ‘values’ goes from 0 -> 9 • Fortran uses 1 -> n as the array bounds. • real(10) values ! ‘values’ goes from 1 -> 10 • Some languages let the programmer define the array bounds. • var values: array [3..12] of real; (* ‘values’ goes from 3 -> 12 *)
Multidimensional Arrays • Two ways to make multidimensional arrays • Both examples from Ada • Construct specifically as multidimensional. matrix: array (1..10, 1..10) of real; -- Reference example: matrix(7, 2) • Looks nice, but has limited functionality. • Construct as being an array of arrays. matrix: array (1..10) of array (1..10) of real; -- Reference example: matrix(7)(2) • Allows us to take ‘slices’ of data.
Array Memory Allocation • An array’s “shape” (dimensions and bounds) determines how its memory is allocated. • The time at which the shape is determined also plays a role in determining allocation. • At least 5 different cases for determining memory allocation:
Array Memory Allocation • Global lifetime, static shape: • The array’s shape is known at compile time, and exists throughout the entire program. • Array can be allocated in static global memory. • int global_var[30]; void main() { }; • Local lifetime, static shape: • The array’s shape is known at compile time, but exists only as locally needed. • Array is allocated in subroutine’s stack frame. • void main() { int local_var[30]; }
Array Memory Allocation • Local lifetime, bound at elaboration time: • Array’s shape is not known at compile time, and exists only as locally needed. • Array is allocated in subroutine’s stack frame and divided into fixed-size and variable-sized parts. • main() { var_ptr = new int[size]; } • Arbitrary lifetime,bound at elaboration time: • Array is just references to objects. • Java does not allocate space; just makes a reference to either new or existing objects. • var_ptr = new int[size];
Array Memory Allocation • Arbitrary lifetime, dynamic shape • The array may shrink or grow as a result of program execution. • The array must be allocated from the heap. • Increasing size usually requires allocating new memory, copying from old memory, then de-allocating the old memory.
Memory Layout Options • Ordering of array elements can be accomplished in two ways: • row-major order – Elements travel across rows, then across columns. • column-major order – Elements travel across columns, then across rows. Row-major Column-major
day[0] S u n d a y M o n day[1] d a y T u e s d a day[2] y W e d n e s d a day[3] y T h u r s d a y day[4] F r i d a y S a day[5] t u r d a y day[6] Row Pointers vs. Contiguous Allocation • Row pointers – an array of pointers to an array. Creates a new dimension out of allocated memory. • Avoids allocating holes in memory. Array = 57 bytes Pointers = 28 bytes Total Space = 85 bytes
Row Pointers vs. Contiguous Allocation • Contiguous allocation - array where each element has a row of allocated space. • This is a true multi-dimensional array. • It is also a ragged array Array = 70 bytes
Array Address Calculation • Calculate the size of an element (1D) • Calculate the size of a row (2D) • row = element_size * (Uelement - Lelement + 1) • Calculate the size of a plane (3D) • plane = row_size * (Urows - Lrows + 1) • Calculate the size of a cube (4D) : :
A A(i) A(i, j) A(i, j, k) Memory Array Address Calculation • Address of a 3-dimenional array A(i, j, k) is: address of A + ((i - Lplane) * size of plane) + ((j - Lrow) * size of row) + ((k - Lelement) * size of element)
Sets • Introduced by Pascal, found in most recent languages as well. • Common implementation uses a bit vector to denote “is a member of”. • Example: U = {‘a’, ‘b’, …, ‘g’} A = {‘a’, ‘c’, ‘e’, ‘g’} = 1010101 • Hash tables needed for larger implementations. • Set of integers = (232 values) / 8 = 536,870,912 bytes
Enumerations • enumeration – set of named elements • Values are usually ordered, can compare enum weekday {sun,mon,tue,wed,thu,fri,sat} if (myVarToday > mon) { . . . } • Advantages • More readable code • Compiler can catch some errors • Is sun==0 and mon==1? • C/C++: yes; Pascal: no • Can also choose ordering in C enum weekday {mon=0,tue=1,wed=2…}
Lists • list – the empty list or a pair consisting of an object (list or atom) and another list (a . (b . (c . (d . nil)))) • improper list – list whose final pair contains two elements, as opposed to the empty list (a . (b . (c . d))) • basic operations: cons, car, cdr, append • list comprehensions (e.g. Miranda and Haskell) [i * i | i <- [1..100]; i mod 2 = 1]
Recursive Types • recursive type - type whose objects may contain references to other objects of the same type • Most are records (consisting of reference objects and other “data” objects) • Used for linked data structures: lists, trees struct Node { Node *left, *right; int data; }
Recursive Types • In reference model of variables (e.g. Lisp, Java), recursive type needs no special support. Every variable is a reference anyway. • In value model of variables (e.g. Pascal, C), need pointers.
Value vs. Reference • Functional languages • almost always reference model • Imperative languages • value model (e.g. C) • reference model (e.g. Smalltalk) • implementation approach: use actual values for immutable objects • combination (e.g. Java)
Pointers • pointer – a variable whose value is a reference to some object • pointer use may be restricted or not • only allow pointers to point to heap (e.g. Pascal) • allow “address of” operator (e.g. ampersand in C) • pointers not equivalent to addresses! • how reclaim heap space? • explicit (programmer’s duty) • garbage collection (language implementation’s duty)
Value Model – More on C • Pointers and single dimensional arrays interchangeable, though space allocation at declaration different int a[10]; int *b; • For subroutines, pointer to array is passed, not full array • Pointer arithmetic • <Pointer, Integer> addition int a[10]; int n; n = *(a+3); • <Pointer, Pointer> subtraction and comparison int a[10]; int * x = a + 4; int * y = a + 7; int closer_to_front = x < y;