Types


Definition
• A type is a set V of values, and a set O of operations on V.
• Examples from C++:
  • The int type:
    V = {INT_MIN, ..., -1, 0, 1, ..., INT_MAX-1, INT_MAX}
    O = {<<, >>, +, -, *, /, %, =, ++, --, ...}
  • The char type:
    V = {NUL, ..., '0', ..., '9', ..., 'A', ..., 'Z', ..., 'a', ..., 'z', DEL}
    O = {<<, >>, =, ++, --, isupper(), islower(), toupper(), ...}
  • The string type:
    V = {"", "A", "B", "C", ..., "AA", "AB", "AC", ..., "AAA", ...}
    O = {<<, >>, +, +=, [], find(), substr(), ...}

Primitive Data Types
• Nearly all languages provide a set of primitive data types, e.g.:

  Name   V                  C++      Ada         Lisp
  bool   false, true        bool     boolean     boolean
  char   the set of chars   char     character   character
  int    the integers       int      integer     integer
  real   the reals          double   float       real

Integer
• Most common primitive data type
• Often several sizes are supported
• Usually supported directly by hardware
• In signed integers, the leftmost bit represents the sign
• Python's arbitrary-precision integers (long in Python 2, int in Python 3) are not supported directly by hardware
• Representations of negative numbers:
  • Sign-magnitude
  • Two's complement
  • Ones' complement
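The three negative-number representations can be compared side by side. This is a small C++ sketch assuming 8-bit integers; the function names are hypothetical helpers, not library functions.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Sign-magnitude: the leftmost bit is the sign, the other 7 bits hold |n|.
std::string signMagnitude(int n) {
    assert(n >= -127 && n <= 127);  // 8-bit sign-magnitude range
    unsigned magnitude = static_cast<unsigned>(n < 0 ? -n : n);
    unsigned bits = (n < 0 ? 0x80u : 0u) | magnitude;
    return std::bitset<8>(bits).to_string();
}

// Ones' complement: a negative value is the bitwise NOT of its magnitude.
std::string onesComplement(int n) {
    assert(n >= -127 && n <= 127);  // 8-bit ones' complement range
    unsigned magnitude = static_cast<unsigned>(n < 0 ? -n : n);
    unsigned bits = n < 0 ? (~magnitude & 0xFFu) : magnitude;
    return std::bitset<8>(bits).to_string();
}

// Two's complement: ones' complement plus one. Converting to an unsigned
// 8-bit type yields exactly this pattern, since the conversion is modulo 2^8.
std::string twosComplement(int n) {
    assert(n >= -128 && n <= 127);  // 8-bit two's complement range
    return std::bitset<8>(static_cast<std::uint8_t>(n)).to_string();
}
```

For -5 these give 10000101, 11111010 and 11111011 respectively. Sign-magnitude and ones' complement both have two zeros (+0 and -0), while two's complement has only one, which is one reason hardware usually prefers it.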

Floating Point
• Models real numbers, but the representations are often only approximations
  • e.g. representing π or e exactly would require infinite space on any system
• Worse, many simple base-10 decimals cannot be represented exactly in base 2
  • e.g. 0.1 is 0.0001100110011... in binary (a repeating fraction)
• Arithmetic operations result in loss of accuracy
  • Rounding and truncation errors
• Languages for scientific use support at least two representations: float and double
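The rounding error described above shows up in ordinary comparisons. This minimal sketch assumes IEEE 754 float and double, as on mainstream hardware; the tolerance in nearlyEqual() is an arbitrary illustrative choice.

```cpp
#include <cmath>

// 0.1, 0.2 and 0.3 are each rounded to the nearest double, and the
// rounded sum of the first two is not the double nearest to 0.3.
bool sumIsExactlyPointThree() {
    return 0.1 + 0.2 == 0.3;   // false with IEEE 754 doubles
}

// The usual workaround: compare within a tolerance instead of exactly.
bool nearlyEqual(double a, double b, double tolerance = 1e-9) {
    return std::fabs(a - b) <= tolerance;
}

// float keeps fewer fraction bits than double (23 vs 52), so the two
// approximations of 0.1 differ.
bool floatAndDoubleAgreeOnPointOne() {
    return static_cast<double>(0.1f) == 0.1;   // false
}
```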

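Floating-Point Representations
• IEEE floating-point standard 754
  • Single precision (float): 1 sign bit, 8 exponent bits, 23 fraction bits
  • Double precision (double): 1 sign bit, 11 exponent bits, 52 fraction bits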

Decimal to Binary
• Conversion splits the integer and fractional parts and converts each separately to binary
  • e.g. $3.375_{10}$
    • $3_{10}$ is $11_2$
    • $0.375_{10}$ is $0.011_2$ ($0.25 + 0.125 = 2^{-2} + 2^{-3}$)
    • $3.375_{10} = 11.011_2 = 1.1011_2 \times 2^1$
• The leading 1 is implicit in the representation, so only the fraction bits after it are stored
• The exponent uses a bias of 127, that is, 127 is added to the true exponent: $1 + 127 = 128 = 10000000_2$
• Thus, $3.375_{10}$ is stored as (sign | exponent | 23-bit fraction):
  0 1000 0000 101 1000 0000 0000 0000 0000
• Note: zero is represented as all zeros
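The hand conversion above can be checked by reading a float's bits directly. This sketch assumes float is IEEE 754 single precision (true on mainstream platforms, though not required by the C++ standard) and uses std::memcpy, the portable pre-C++20 alternative to std::bit_cast.

```cpp
#include <cstdint>
#include <cstring>

// Copy a float's storage into a 32-bit unsigned integer.
std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Split the pattern into its fields: 1 sign bit, 8 exponent bits, 23 fraction bits.
std::uint32_t signField(std::uint32_t bits)     { return bits >> 31; }
std::uint32_t exponentField(std::uint32_t bits) { return (bits >> 23) & 0xFFu; }
std::uint32_t fractionField(std::uint32_t bits) { return bits & 0x7FFFFFu; }
```

For 3.375f the pattern is 0x40580000: sign 0, biased exponent 128 (true exponent 1), and fraction 0x580000, which is 101 1000 0000 0000 0000 0000 in binary.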

Boolean Types
• Perhaps the simplest of all
• Range of only two elements: false and true (commonly represented as 0 and 1)
• Introduced in ALGOL 60, and included in most general-purpose languages designed since
• Could be implemented in a single bit, but often implemented as a byte, the smallest efficiently addressable unit of memory

Character Types
• Stored as numeric encodings
• Most common encoding: ASCII
  • Uses 0 to 127 to code 128 different characters
• ISO 8859-1 (Latin-1) is another character code, using 8 bits
  • Allows 256 different characters
• Unicode, begun in 1991, was originally a 16-bit character code
  • In 2000, a 32-bit version was also specified
  • Java was the first widely used language to use Unicode
  • JavaScript, Python, Perl and C# have followed
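Because characters are stored as numbers, C++ lets you inspect and compute with their codes. This sketch assumes an ASCII-compatible execution character set (nearly universal, but not guaranteed); toLowerAscii() is a hypothetical helper, and std::tolower is the general library tool.

```cpp
// The numeric code of a character, read as an unsigned byte.
int codeOf(char c) { return static_cast<unsigned char>(c); }

// In ASCII the upper-case letters are contiguous and lie 32 positions
// below the lower-case ones, so case conversion is simple arithmetic.
char toLowerAscii(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}
```

With ASCII, codeOf('A') is 65 and codeOf('a') is 97.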

Character String Types
• Values consist of sequences of characters
• Design issues:
  • Should strings be character arrays, or a primitive type?
  • Static or dynamic length?
• Common operations:
  • Assignment
  • Catenation
  • Substring reference
  • Comparison
  • Pattern matching
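The common operations listed above map directly onto std::string. This is an illustrative sketch only; find() locates a fixed substring, which is the simplest form of pattern matching (the <regex> header covers general patterns).

```cpp
#include <string>

std::string demoOperations() {
    std::string greeting = "Hello";            // assignment
    std::string whole = greeting + ", world";  // catenation
    std::string part = whole.substr(7, 5);     // substring reference: "world"
    bool ordered = greeting < whole;           // lexicographic comparison: true
    auto pos = whole.find("wor");              // substring search: position 7
    return part + (ordered ? "!" : "?") + std::to_string(pos);
}
```

Calling demoOperations() returns "world!7".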

Strings in Various Languages
• Historically, neither Fortran nor ALGOL 60 had support for strings
• COBOL had statically sized strings
• In C (and for C-style strings in C++), strings are null-terminated arrays of chars
  • Inherently unsafe: library functions such as strcpy do no bounds checking
• Ada strings are fixed size
• Fortran 95, Perl, Java, and Python have strings as a built-in type

String Length Options
• Static length
  • The length of the string is set when it is created and cannot be changed
  • e.g. Python, Ruby
• Limited dynamic length
  • Length can vary up to a declared maximum
  • An end-of-string character marks the current end
  • e.g. C
• Dynamic length
  • e.g. JavaScript, Perl, standard C++ library
  • Advantage: maximum flexibility
  • Disadvantage: overhead of dynamic storage allocation and deallocation
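The difference between limited dynamic and dynamic length can be shown in a few lines of C++. This sketch contrasts a C-style buffer, where capacity is fixed and the current length is found by scanning for '\0', with std::string, which reallocates as needed.

```cpp
#include <cstring>
#include <string>

// Limited dynamic length: capacity 16 is fixed at declaration; the current
// length varies up to that maximum and is marked by the '\0' terminator.
std::size_t cStringLength() {
    char buffer[16] = "hi";          // current length 2
    std::strcat(buffer, " there");   // strcat does not check capacity: overflow is undefined behavior
    return std::strlen(buffer);      // 8
}

// Dynamic length: std::string grows with no declared maximum,
// at the cost of dynamic allocation behind the scenes.
std::size_t cppStringLength() {
    std::string s = "hi";
    for (int i = 0; i < 100; ++i) s += " there";
    return s.size();                 // 2 + 100 * 6 = 602
}
```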

Creating New Types
• Given the fundamental types, new types can be created via type constructors.
• Each constructor has 3 components:
  • The syntax used to denote that constructor;
  • The set of elements produced by that constructor; and
  • The operations associated with that constructor.
• Three constructors: Product, Function, and Kleene closure

Constructor 1: Product
The product constructor is the basis for aggregates.
• The product of two sets A and B is denoted $A \times B$.
• $A \times B$ consists of all ordered pairs $(a, b)$ with $a \in A$, $b \in B$.
• $A \times B \times C$ consists of all ordered triples $(a, b, c)$ with $a \in A$, $b \in B$, $c \in C$.
• $A \times B \times \cdots \times N$ consists of all ordered n-tuples $(a, b, \ldots, n)$ with $a \in A$, $b \in B$, $\ldots$, $n \in N$.
Example: if char is the 128-character ASCII set, then bool × char has 2 × 128 = 256 elements:
  { ..., (true, 'A'), (false, 'A'), (true, 'B'), (false, 'B'), ... }
• The operations associated with product are the projection operations:
  • first, applied to an n-tuple $(s_1, s_2, \ldots, s_n)$, returns $s_1$.
  • second, applied to an n-tuple $(s_1, s_2, \ldots, s_n)$, returns $s_2$.
  • ...
  • nth, applied to an n-tuple $(s_1, s_2, \ldots, s_n)$, returns $s_n$.

Product Example: C++ Structs
  struct Student {
      int id;
      double gpa;
      char gender;
  };
  Student aStudent;
Formally, a Student consists of: int × real × char
Formally, a particular Student:
  aStudent.id = 12345;
  aStudent.gpa = 3.75;
  aStudent.gender = 'F';
is the 3-tuple: (12345, 3.75, 'F').
The C++ "dot operator" is a projection operation:
  cout << aStudent.id       // extract id
       << aStudent.gpa      // extract gpa
       << aStudent.gender   // extract gender
       << endl;
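C++'s std::tuple makes the product view even more literal: it is a generic int × double × char with positional projections. This is a sketch for comparison; the StudentTuple alias and the accessor names are hypothetical.

```cpp
#include <tuple>

// A Student as an element of int x real x char, without a named struct.
using StudentTuple = std::tuple<int, double, char>;

StudentTuple makeStudent() { return StudentTuple{12345, 3.75, 'F'}; }

// std::get<N> plays the role of the projection operations first, second, third.
int    idOf(const StudentTuple& s)     { return std::get<0>(s); }
double gpaOf(const StudentTuple& s)    { return std::get<1>(s); }
char   genderOf(const StudentTuple& s) { return std::get<2>(s); }
```

The struct version is usually preferable in real code, because named fields document themselves. The tuple version shows that the dot operator and std::get<N> perform the same kind of projection.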

Constructor 2: Function
The function constructor is the basis for subprograms.
• The set of all functions from a set A to a set B is denoted $(A) \to B$.
• A particular function f mapping A to B is denoted $f : (A) \to B$.
Examples:
• The set (char) → bool contains all functions that map char values into bool values. Some C examples (strictly, these take and return int, with nonzero meaning true) include:
  isupper('A') → true     islower('A') → false
  isalpha('A') → true     isdigit('A') → false
  isalnum('A') → true     isspace('A') → false
• The set (char) → char contains all functions that map char values into char values. Some C examples include:
  tolower('A') → 'a'      toupper('a') → 'A'
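In C++, a function-pointer type such as bool (*)(char) has as its values functions drawn from (char) → bool. This sketch wraps the C classification functions so they really have that type. The wrappers and countTrue() are hypothetical helpers; the unsigned char cast avoids undefined behavior for negative char values.

```cpp
#include <cctype>

bool isUpper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }
bool isLower(char c) { return std::islower(static_cast<unsigned char>(c)) != 0; }
bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Each value of this type is some member of (char) -> bool.
using CharPredicate = bool (*)(char);

// Apply every predicate in an array to c and count how many return true.
int countTrue(const CharPredicate preds[], int n, char c) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (preds[i](c)) ++count;
    return count;
}
```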

Function and Product
What does this set contain? (int × int) → int
• All functions that map pairs of integers into an integer. Examples?
  +((2, 3)) → 5     -((2, 3)) → -1
  *((2, 3)) → 6     /((2, 3)) → 0   (integer division truncates)
Suppose we define an aggregate named IntPair:
  struct IntPair { int a, b; };
and then define a function named Add():
  int Add(IntPair ip) { return ip.a + ip.b; }
Add() is a member of the set: (int × int) → int
• The function constructor lets us create new operations for a language.
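Here is a compilable version of the slide's Add(), together with the other arithmetic examples written as further members of (int × int) → int. The lambdas and their names are illustrative additions, stored in std::function objects so that their shared type is spelled out.

```cpp
#include <functional>

struct IntPair { int a, b; };

// Add takes one product-typed argument, so it is a member of (int x int) -> int.
int Add(IntPair ip) { return ip.a + ip.b; }

// Other members of the same set. Integer division truncates, so 2 / 3 is 0.
const std::function<int(IntPair)> subtract = [](IntPair p) { return p.a - p.b; };
const std::function<int(IntPair)> multiply = [](IntPair p) { return p.a * p.b; };
const std::function<int(IntPair)> divide   = [](IntPair p) { return p.a / p.b; };
```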

Function Arity
Product serves to denote an aggregate or an argument list.
• What does this set contain? (int × int) → bool
• All functions that map pairs of integers into a boolean. Examples?
  ==((2, 3)) → false    !=((2, 3)) → true
  <((2, 3)) → true      >((2, 3)) → false
• Definition:
  • The number of operands an operation requires is its arity.
  • Operations with 1 operand are unary operations, with arity 1.
  • Operations with 2 operands are binary operations, with arity 2.
  • Operations with 3 operands are ternary operations, with arity 3.
  • ...
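Arity is visible in a C++ function's type, so the compiler can count it. This sketch uses a hypothetical Arity template that counts the parameter types of a plain function type; lessThan(), negate() and clampTo() are illustrative examples of arity 2, 1 and 3.

```cpp
// Primary template left undefined; only plain function types are handled.
template <typename F> struct Arity;

template <typename R, typename... Args>
struct Arity<R(Args...)> {
    static constexpr int value = sizeof...(Args);  // number of operands
};

bool lessThan(int x, int y) { return x < y; }      // binary:  (int x int) -> bool
bool negate(bool b) { return !b; }                  // unary:   (bool) -> bool
int clampTo(int v, int lo, int hi) {                // ternary: (int x int x int) -> int
    return v < lo ? lo : (v > hi ? hi : v);
}
```

For example, Arity<decltype(lessThan)>::value is 2.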

Example: Ternary Operation
The C/C++ conditional expression has the form:
  <expr>0 ? <expr>1 : <expr>2
producing <expr>1 if <expr>0 is true, and producing <expr>2 if <expr>0 is false.
Here is a simple minimum() function using it:
  int minimum(int first, int second) {
      return (first < second) ? first : second;
  }
The C/C++ conditional expression is a ternary operation, which in this case is a member of the set:
  ?: (bool × int × int) → int
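The slide's minimum() compiles as written. Adding an explicit three-argument wrapper makes the (bool × int × int) → int shape of ?: visible. conditional() is a hypothetical helper. Unlike the real operator, both branch values are computed before the call, because function arguments are always evaluated.

```cpp
int minimum(int first, int second) {
    return (first < second) ? first : second;
}

// A function with the same signature as ?: on ints.
int conditional(bool test, int ifTrue, int ifFalse) {
    return test ? ifTrue : ifFalse;
}
```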

Operator Positioning
Operators are also categorized by their position relative to their operands:
• Infix operators appear between their operands: 1 + 2
• Prefix operators appear before their operands: + 1 2
• Postfix operators appear after their operands: 1 2 +
  * + 2 3 - 4 2   ≡   (2 + 3) * (4 - 2)   ≡   2 3 + 4 2 - *
Prefix, infix, and postfix notation are different conventions for the same thing; a language may choose any of them:

  C++ Expr    Category         Value          Lisp Expr       Category         Value
  x < y       binary, infix    true, false    (< x y)         binary, prefix   true, false
  ++x         unary, prefix    x+1            (incf x)        unary, prefix    x+1
  11 + 12     binary, infix    23             (+ 11 12)       binary, prefix   23
  !flag       unary, prefix    neg. of flag   (not flag)      unary, prefix    neg. of flag
  cout << x   binary, infix    cout           (princ x str)   binary, prefix   x
  x++         unary, postfix   x              (none)
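Postfix notation needs no parentheses because a stack resolves the order of evaluation. This sketch evaluates space-separated postfix integer expressions such as "2 3 + 4 2 - *". Only + - * / are supported, and division by zero is not checked.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

int evalPostfix(const std::string& expr) {
    std::istringstream in(expr);
    std::stack<int> operands;
    std::string token;
    while (in >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            if (operands.size() < 2) throw std::invalid_argument("missing operand");
            int right = operands.top(); operands.pop();   // pushed last, so popped first
            int left = operands.top(); operands.pop();
            if (token == "+") operands.push(left + right);
            else if (token == "-") operands.push(left - right);
            else if (token == "*") operands.push(left * right);
            else operands.push(left / right);
        } else {
            operands.push(std::stoi(token));   // throws std::invalid_argument if not a number
        }
    }
    if (operands.size() != 1) throw std::invalid_argument("malformed expression");
    return operands.top();
}
```

evalPostfix("2 3 + 4 2 - *") yields 10, the same value as the infix (2 + 3) * (4 - 2).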

Constructor 3: Kleene Closure
Kleene closure is the basis for representing sequences.
• The Kleene closure of a set A is denoted $A^*$.
• The Kleene closure of a set is the set of all tuples (of any finite length, including the empty tuple) that can be formed using elements of that set.
Example: the Kleene closure of bool, bool*, is the infinite set:
  { (), (false), (true), (false, false), (false, true), (true, false), (true, true), (false, false, false), ... }
• For a tuple $t \in A^*$, the operations include:
  null: (A*) → bool     null(()) → true, null((false)) → false, null((true)) → false
  first: (A*) → A       first((true, false)) → true, first((false, true)) → false
  rest: (A*) → A*       rest((true, true, false)) → (true, false), rest((false, true, true)) → (true, true)
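Since bool* is infinite, a program can only list a finite prefix of it. This sketch builds every tuple of length 0 through maxLen by extending each shorter tuple with false and then true. The result follows the order of the set written above, and there are 2^0 + 2^1 + ... + 2^maxLen tuples in total.

```cpp
#include <vector>

std::vector<std::vector<bool>> boolTuplesUpTo(int maxLen) {
    std::vector<std::vector<bool>> result(1);     // start with the empty tuple ()
    std::vector<std::vector<bool>> previous(1);   // tuples of the previous length
    for (int len = 1; len <= maxLen; ++len) {
        std::vector<std::vector<bool>> current;
        for (const auto& t : previous)
            for (bool b : {false, true}) {
                auto extended = t;                // copy, then append one element
                extended.push_back(b);
                current.push_back(extended);
            }
        result.insert(result.end(), current.begin(), current.end());
        previous = current;
    }
    return result;
}
```

boolTuplesUpTo(2) contains the 7 tuples (), (false), (true), (false, false), (false, true), (true, false), (true, true).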

Kleene Closure Example
If char is the set of ASCII characters, what is char*?
• The infinite set of all tuples formed from ASCII characters (AKA the set of all character strings).
The C/C++ notation "Hello" is just a different syntax for:
  ( 'H', 'e', 'l', 'l', 'o' )
(in memory, C and C++ also store a terminating '\0' after the last character).
Thus, int* (here Kleene-closure notation, not the C++ pointer type) denotes a sequence (array, list, ...) of integers:
  int intStaticArray[32];
  int * intDynamicArray = new int[n];
  vector<int> intVec;
  list<int> intList;
real* denotes a sequence (array, list, ...) of reals; and so on.
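These different representations of int* can share generic code that depends only on walking the sequence. This sketch uses a hypothetical sumOf() template that works on a built-in array, a vector and a list alike. A raw new int[n] pointer is left out because it carries no length, so it would need a separate (pointer, length) pair.

```cpp
#include <iterator>
#include <list>
#include <numeric>
#include <vector>

// Works for any sequence with begin/end iteration over ints.
template <typename Sequence>
int sumOf(const Sequence& seq) {
    return std::accumulate(std::begin(seq), std::end(seq), 0);
}
```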

Sequence Operations
Sequence operations can be built via null(), first(), and rest().
• An output operation can be defined like this (pseudocode):
  void print(ostream & out, int* a) {
      if ( !null(a) ) {
          out << first(a) << ' ';
          print(out, rest(a));
      }
  }
• A subscript operation can be defined like this (pseudocode):
  int & operator[](int* a, int i) {
      if (i > 0)
          return operator[](rest(a), i-1);
      else
          return first(a);
  }
In Lisp, first is called car and rest is called cdr.
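To make the pseudocode runnable, this sketch represents a sequence as a hypothetical IntSeq view: a pointer plus a length, since a bare int* does not know where the sequence ends. null, first and rest are defined on it, and print and at (standing in for operator[], which C++ only allows as a member) are built from nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

struct IntSeq {
    const int* data;
    std::size_t size;
};

bool null(IntSeq s)   { return s.size == 0; }
int first(IntSeq s)   { assert(!null(s)); return s.data[0]; }
IntSeq rest(IntSeq s) { assert(!null(s)); return IntSeq{s.data + 1, s.size - 1}; }

// Output operation: print the first element, then recurse on the rest.
void print(std::ostream& out, IntSeq s) {
    if (!null(s)) {
        out << first(s) << ' ';
        print(out, rest(s));
    }
}

// Subscript operation: step down the sequence i times, then take first.
int at(IntSeq s, std::size_t i) {
    return (i > 0) ? at(rest(s), i - 1) : first(s);
}

// Convenience wrapper that captures print's output in a string.
std::string toString(IntSeq s) {
    std::ostringstream out;
    print(out, s);
    return out.str();
}
```

Subscripting this way takes time proportional to i, which is how a linked list behaves. Arrays instead use a direct address computation.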

Practise Using Constructors
Give formal descriptions for:
• The logical and operation (&&):
  • How many operands does it take? 2
  • What types are its operands? bool, bool
  • What type of value does it produce? bool
  • So && is a member of (bool × bool) → bool
• The C++/STL substring operation (str.substr(i, n)):
  • How many operands does it take? 3
  • What types are its operands? string, int, int (strictly, the two indices are std::string::size_type)
  • What type of value does it produce? string
  • So substr() is a member of (string × int × int) → string
• The logical negation operation (!):
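The two worked answers can be written down as C++ values of the corresponding function types. In this sketch, andOp and substring are hypothetical names. std::logical_and is the standard function object for &&, but calling it evaluates both operands, so it does not short-circuit.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// && as a value of type (bool x bool) -> bool.
const std::function<bool(bool, bool)> andOp = std::logical_and<bool>();

// substr() recast as a free function of type (string x index x index) -> string.
std::string substring(const std::string& str, std::size_t i, std::size_t n) {
    return str.substr(i, n);
}
```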

Practise
C++ record:
  struct Student {
      int myID;
      string myName;
      bool iAmFullTime;
      double myGPA;
  };
• An accessor method:
  struct Student {
      int myID;
      int id() const;
      string myName;
      bool iAmFullTime;
      double myGPA;
  };
• How does this affect our Student description?

More Practise
• A "completely functional" class:
  class Student {
  public:
      Student();
      Student(int, string, bool, double);
      int id() const;
      string name() const;
      bool fullTime() const;
      double gpa() const;
      void read(istream &);
      void print(ostream &) const;
  private:
      int myID;
      string myName;
      bool iAmFullTime;
      double myGPA;
  };

Summary of Constructors
• The product constructor allows us to add record types
  • A record is an aggregate of values of possibly different types, with each value accessible via a name (i.e. projection)
• The Kleene closure constructor allows us to add sequence types
  • A sequence is an aggregate of values of the same type
  • e.g. arrays (adjacent memory), lists (possibly non-adjacent)
• The function constructor allows us to add operations
  • Built from available operations, e.g. projection, first(), rest()

Ordinal Types
• A type in which the range of possible values can be easily associated with the set of positive integers
  • e.g. integer, char, boolean
• Two user-defined ordinal types are often supported:
  • Enumerations
  • Subranges

Modeling Real-World Values
Suppose we want to model the seven "ROY G BIV" colors.
One approach:
  const int RED=0, ORANGE=1, YELLOW=2, GREEN=3, BLUE=4, INDIGO=5, VIOLET=6;
  int aColor = BLUE;
This approach requires the human to map colors to integers. Instead:
  enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };
  Color aColor = BLUE;
Most imperative languages support such enumerations, e.g. Ada:
  type Color is ( RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET );
  aColor : Color := BLUE;
An enumeration is a type whose values are explicitly listed.

Enumerations: Compiler-Side
An enumeration's values must be valid identifiers:
  <enumeration-type> ::= enum identifier { <id-list> } ;
and the compiler treats a declaration:
  enum NewType { id0, id1, id2, ..., idN-1 };
as being (approximately) equivalent to:
  const int id0=0, id1=1, id2=2, ..., idN-1=N-1;
Thus, after processing
  enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };
so far as the compiler is concerned:
  RED==0 && ORANGE==1 && YELLOW==2 && ... && VIOLET==6
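The compiler-side numbering can be checked at compile time, and it also lets an unscoped enum index an array directly. This is a small sketch; colorName() is a hypothetical helper.

```cpp
#include <string>

enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };

// Enumerators are numbered from 0 unless given explicit values.
static_assert(RED == 0 && ORANGE == 1 && VIOLET == 6, "enumerators count from 0");

// An unscoped enum converts implicitly to int, so it can serve as an index.
std::string colorName(Color c) {
    static const char* const names[] = {
        "red", "orange", "yellow", "green", "blue", "indigo", "violet"};
    return names[c];
}
```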

Enumerations: User Side
Enumerations thus provide an automatic means of mapping:
  (identifier) → int
whose chief benefit is better program readability:
  enum ElementName { HYDROGEN, HELIUM, ... };
  ElementName anElement;
  // ...
  switch (anElement) {
      case HYDROGEN: atomicNumber = 1; break;
      case HELIUM:   atomicNumber = 2; break;
      ...
  }
Enumerations allow real-world 'values' to be represented using real-world names, instead of (arbitrary) integers.
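This is a compilable version of the switch, restricted to the two elements the slide names (its "..." is not filled in). atomicNumberOf() is a hypothetical wrapper. The enumerator values themselves (HYDROGEN is 0, HELIUM is 1) are arbitrary, so the real-world number has to be supplied by the mapping.

```cpp
enum ElementName { HYDROGEN, HELIUM };

int atomicNumberOf(ElementName anElement) {
    int atomicNumber = 0;
    switch (anElement) {
        case HYDROGEN: atomicNumber = 1; break;
        case HELIUM:   atomicNumber = 2; break;
    }
    return atomicNumber;
}
```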

Enumerations and Smalltalk
OO purists replace enums with class hierarchies:
  (Figure: class Color with subclasses Red, Orange, ..., Indigo, Violet; class Element with subclasses Hydrogen, Helium, ..., E112, E113)
This permits the creation of real-world objects:
  aColor := Blue new.        "Smalltalk"
  anElement := Helium new.   "Smalltalk"
as opposed to the real-world values provided by an enumeration.
For this reason, "pure" OO languages like Smalltalk don't provide an enumeration mechanism.
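The same class-hierarchy idea carries over to C++, where each "value" becomes an object that can carry its own behavior. This sketch shows only two colors, and the makeBlue() factory is hypothetical. std::unique_ptr is constructed directly so that the code also compiles as C++11.

```cpp
#include <memory>
#include <string>

class Color {
public:
    virtual ~Color() = default;
    virtual std::string name() const = 0;   // behavior supplied by each subclass
};

class Red : public Color {
public:
    std::string name() const override { return "red"; }
};

class Blue : public Color {
public:
    std::string name() const override { return "blue"; }
};

// Roughly the C++ counterpart of Smalltalk's "Blue new".
std::unique_ptr<Color> makeBlue() { return std::unique_ptr<Color>(new Blue()); }
```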

Subrange
• A type whose values are a subset of an existing type
  -- Ada
  subtype TestScore is Integer range 0..100;
  subtype CapitalLetter is Character range 'A'..'Z';
  type DaysOfWeek is (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
  subtype WeekDay is DaysOfWeek range Monday..Friday;
If a subrange variable is declared:
  today : WeekDay;
and assigned an invalid value:
  today := Saturday;
then an exception (Constraint_Error) occurs that, if not caught, halts the program.
This is an essential feature for life-critical systems.
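C++ has no subrange types, but a small class template can imitate Ada's run-time check. This is a sketch under that assumption: the Subrange template is hypothetical, and it throws std::out_of_range where Ada would raise Constraint_Error.

```cpp
#include <stdexcept>

template <int Low, int High>
class Subrange {
public:
    explicit Subrange(int v) : value_(check(v)) {}
    // The check runs before the store, so a rejected value leaves the old one intact.
    Subrange& operator=(int v) { value_ = check(v); return *this; }
    int value() const { return value_; }
private:
    static int check(int v) {
        if (v < Low || v > High) throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

// Like: subtype TestScore is Integer range 0..100;
using TestScore = Subrange<0, 100>;
```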

Array Types
• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.

Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions in element references range checked?
• When are subscript ranges bound?
• When does allocation take place?
• What is the maximum number of subscripts?
• Can array objects be initialized?
• Are any kind of slices allowed?

Array Indexing
• Indexing (or subscripting) is a mapping from indices to elements:
  array_name(index_value_list) → an element
• Index syntax:
  • FORTRAN, PL/I, and Ada use parentheses
    • Ada explicitly uses parentheses to show uniformity between array references and function calls, because both are mappings
  • Most other languages use brackets

Array Index (Subscript) Types
• FORTRAN, C: integer only
• Pascal: any ordinal type (integer, Boolean, char, enumeration)
• Ada: integer or enumeration (includes Boolean and char)
• Java: integer types only
• C, C++, Perl, and Fortran do not specify range checking
• Java, ML, and C# specify range checking
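C++ offers both range-checking policies on the same container. std::vector's operator[] is unchecked, so an out-of-range index is undefined behavior, while at() checks the index and throws std::out_of_range. atRejects() is a hypothetical helper that reports which indices at() refuses.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

bool atRejects(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);       // checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```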

Subscript Binding and Array Categories
• Three choices to make:
  • Type of binding to subscript ranges
  • Time of binding to storage
  • Location of storage
• Static: subscript ranges are statically bound and storage allocation is static (before run time)
  • Advantage: efficiency (no dynamic allocation)

Subscript Binding and Array Categories (continued)
• Fixed stack-dynamic: subscript ranges are statically bound, but the allocation is done at declaration time (during execution)
  • Advantage: space efficiency
• Stack-dynamic: subscript ranges are dynamically bound and the storage allocation is also dynamic (done at run time)
  • Advantage: flexibility (the size of an array need not be known until the array is to be used)

Subscript Binding and Array Categories (continued)
• Fixed heap-dynamic: similar to fixed stack-dynamic: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated from the heap, not the stack)
• Heap-dynamic: binding of subscript ranges and storage allocation is dynamic and can change any number of times
  • Advantage: flexibility (arrays can grow or shrink during program execution)

Examples
• C and C++ arrays that include the static modifier are static
• C and C++ local arrays without the static modifier are fixed stack-dynamic
• Ada arrays can be stack-dynamic
• C and C++ provide fixed heap-dynamic arrays
• C# includes a second array class, ArrayList, that provides heap-dynamic arrays
• Perl and JavaScript support heap-dynamic arrays
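Most of these categories can be shown in one C++ sketch. Standard C++ has no variable-length arrays, so the stack-dynamic category is omitted. std::vector stands in for heap-dynamic arrays. The function names are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Static: storage exists before the function first runs and persists across calls.
int countCalls() {
    static int calls[1] = {0};
    return ++calls[0];
}

// Fixed stack-dynamic: size fixed at compile time, storage allocated on each entry.
int sumFixed() {
    int local[4] = {1, 2, 3, 4};
    return local[0] + local[1] + local[2] + local[3];
}

// Fixed heap-dynamic: size chosen at run time, then fixed until deallocation.
int sumHeap(std::size_t n) {
    std::unique_ptr<int[]> data(new int[n]);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<int>(i);
        total += data[i];
    }
    return total;
}

// Heap-dynamic: the vector grows and shrinks as often as needed.
std::size_t growAndShrink() {
    std::vector<int> v;
    for (int i = 0; i < 10; ++i) v.push_back(i);
    v.resize(3);
    return v.size();
}
```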

Array Initialization
• Some languages allow initialization at the time of storage allocation
  • C, C++, Java example (C# writes int[] list = {4, 5, 7, 83};):
    int list [] = {4, 5, 7, 83};
  • Character strings in C and C++:
    char name [] = "freddie";
  • Java initialization of String objects:
    String[] names = {"Bob", "Jake", "Joe"};
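When an initializer is given and the size is omitted, the compiler sizes the array by counting the initializers, and a string literal also contributes its '\0'. This sketch checks that with the slide's declarations, placed in a hypothetical namespace to avoid clashing with other names.

```cpp
#include <cstring>
#include <string>
#include <vector>

namespace init_demo {
    int list[] = {4, 5, 7, 83};
    char name[] = "freddie";          // 7 letters plus the terminating '\0'

    const std::size_t listLength = sizeof(list) / sizeof(list[0]);   // 4 elements
    const std::size_t nameBytes  = sizeof(name);                      // 8 bytes
    const std::size_t nameLength = std::strlen(name);                 // 7 characters before '\0'

    // C++ counterpart of the Java String[] initialization.
    const std::vector<std::string> names = {"Bob", "Jake", "Joe"};
}
```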

More Examples
• Ada
  List : array (1..5) of Integer := (1, 3, 5, 7, 9);
  Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
• Python list comprehensions
  [expression for iterate_var in array if condition]
  [x * x for x in range(12) if x % 3 == 0]
  produces [0, 9, 36, 81]

Array Operations
• Ada allows array assignment and also catenation (&)
• Fortran provides elemental operations
  • These operate between pairs of array elements
  • For example, the + operator between two arrays results in an array of the sums of the element pairs of the two arrays
• APL provides the most powerful array processing operations for vectors and matrices, as well as unary operators (for example, to reverse column elements)
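C++'s std::valarray provides Fortran-style elemental operations: its arithmetic operators act on corresponding element pairs. elementwiseSum() is a hypothetical wrapper. Both operands must have the same size, because mismatched lengths are undefined behavior for valarray.

```cpp
#include <valarray>

std::valarray<int> elementwiseSum(const std::valarray<int>& a,
                                  const std::valarray<int>& b) {
    return a + b;   // element i of the result is a[i] + b[i]
}
```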

Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensioned array in which all of the rows have the same number of elements and all columns have the same number of elements
  • e.g. myArray[3,7]
• A jagged array has rows with varying numbers of elements
  • Possible when multi-dimensioned arrays actually appear as arrays of arrays
  • e.g. myArray[3][7]
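In C++, a vector of vectors gives a jagged array, because each row is an independent vector with its own length. Built-in multidimensional arrays are also arrays of arrays, but their type fixes every row's length, so they are rectangular. makeTriangle() is a hypothetical example.

```cpp
#include <cstddef>
#include <vector>

// Jagged: row r has r + 1 elements, each equal to r.
std::vector<std::vector<int>> makeTriangle(std::size_t rows) {
    std::vector<std::vector<int>> tri;
    for (std::size_t r = 0; r < rows; ++r)
        tri.push_back(std::vector<int>(r + 1, static_cast<int>(r)));
    return tri;
}

// Rectangular: 3 rows of exactly 7 ints, stored contiguously.
using Rect3x7 = int[3][7];
```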

Slices
• A slice is some substructure of an array; it is nothing more than a referencing mechanism
• Slices are only useful in languages that have array operations

Slice Examples
• Fortran 95
  Integer, Dimension (10) :: Vector
  Integer, Dimension (3, 3) :: Mat
  Integer, Dimension (3, 3, 4) :: Cube
  Vector (3:6) is a four-element array
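std::valarray supports slices through std::slice(start, size, stride). In this sketch, Fortran's 1-based Vector(3:6) becomes 0-based positions 2..5, and a matrix column is taken with stride 3, assuming row-major storage (Fortran itself stores column-major). The const operator[] used here copies the selected elements. The non-const overload returns a std::slice_array that refers back into the original, which is closer to "a referencing mechanism".

```cpp
#include <cstddef>
#include <valarray>

// Elements 3..6 (1-based) of a 10-element vector.
std::valarray<int> middleFour(const std::valarray<int>& vector10) {
    return vector10[std::slice(2, 4, 1)];
}

// Column c of a 3x3 matrix stored row by row.
std::valarray<int> column(const std::valarray<int>& mat3x3, std::size_t c) {
    return mat3x3[std::slice(c, 3, 3)];
}
```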

Slices Examples in Fortran 95

Implementation of Arrays
• The access function maps subscript expressions to an address in the array
• Access function for single-dimensioned arrays:
  $\text{address}(\text{list}[k]) = \text{address}(\text{list}[\text{lower\_bound}]) + (k - \text{lower\_bound}) \times \text{element\_size}$
  $\phantom{\text{address}(\text{list}[k])} = (\text{arrayBaseAddress} - \text{firstIndex} \times \text{elementSize}) + k \times \text{elementSize}$
  where arrayBaseAddress = address(list[lower_bound]) and firstIndex = lower_bound
• At issue: there is an efficiency-vs-convenience tradeoff:
  • Accesses to 0-relative arrays require two fewer operations:
    $(\text{arrayBaseAddress} - 0 \times \text{elementSize}) + i \times \text{elementSize} = \text{arrayBaseAddress} + i \times \text{elementSize}$
  • Programmer-specified index values can be pretty convenient:
    type LetterCounter is array(CapitalLetter) of Integer;
    type DailySales is array(WeekDay) of Float;
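The access function can be checked against real addresses. This sketch computes addresses as integers and compares them with the addresses C++ gives for the same elements. The function names are hypothetical. Converting pointers to std::uintptr_t is implementation-defined, but it behaves this way on ordinary flat-memory platforms. virtualOrigin() precomputes the constant part (arrayBaseAddress - firstIndex × elementSize) once, so each access costs only the multiply and add.

```cpp
#include <cstddef>
#include <cstdint>

// address(list[k]) = address(list[lower_bound]) + (k - lower_bound) * element_size
std::uintptr_t elementAddress(std::uintptr_t addressOfFirst, long k,
                              long lowerBound, std::size_t elementSize) {
    return addressOfFirst + static_cast<std::uintptr_t>(k - lowerBound) * elementSize;
}

// arrayBaseAddress - firstIndex * elementSize, computed once per array.
std::uintptr_t virtualOrigin(std::uintptr_t addressOfFirst, long lowerBound,
                             std::size_t elementSize) {
    return addressOfFirst - static_cast<std::uintptr_t>(lowerBound) * elementSize;
}

// Per-access cost with the constant folded in: one multiply and one add.
std::uintptr_t elementAddressFolded(std::uintptr_t origin, long k, std::size_t elementSize) {
    return origin + static_cast<std::uintptr_t>(k) * elementSize;
}
```

With lowerBound = 1, index 8 maps to the eighth element, which C++ itself calls a[7].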

Accessing Multi-dimensioned Arrays
• Two common ways:
  • Row major order (by rows): used in most languages
  • Column major order (by columns): used in Fortran
• Efficiency issue: traversing an array in the order it is stored gives sequential memory accesses, which are faster
• For each dimension of an array, one add and one multiply instruction are required for the access function
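For an m × n array with 0-based indices, the two orders give different element offsets: i × n + j for row major and j × m + i for column major. This sketch writes both as hypothetical helpers. C++ built-in arrays are row major, and the test compares the row-major formula with the actual byte offset of grid[2][1] in a 3 × 4 array.

```cpp
#include <cstddef>

// Row major: skip i full rows of n elements, then j more.
std::size_t rowMajorOffset(std::size_t i, std::size_t j, std::size_t n) {
    return i * n + j;
}

// Column major (Fortran order): skip j full columns of m elements, then i more.
std::size_t colMajorOffset(std::size_t i, std::size_t j, std::size_t m) {
    return j * m + i;
}
```

Each dimension beyond the first contributes one multiply and one add, which matches the cost noted above.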
