Types
Definition
• A type is a set V of values and a set O of operations on V.
• Examples from C++:
  • The int type:
    V = {INT_MIN, ..., -1, 0, 1, ..., INT_MAX-1, INT_MAX}
    O = {<<, >>, +, -, *, /, %, =, ++, --, ...}
  • The char type:
    V = {NUL, ..., '0', ..., '9', ..., 'A', ..., 'Z', ..., 'a', ..., 'z', DEL}
    O = {<<, >>, =, ++, --, isupper(), islower(), toupper(), ...}
  • The string type:
    V = {"", "A", "B", "C", ..., "AA", "AB", "AC", ..., "AAA", ...}
    O = {<<, >>, +, +=, [], find(), substr(), ...}
Primitive Data Types
• Nearly all languages provide a set of primitive data types, e.g.:

    Name   V                  C++      Ada        Lisp
    bool   {false, true}      bool     Boolean    boolean
    char   the set of chars   char     Character  character
    int    the integers       int      Integer    integer
    real   the reals          double   Float      real
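Integer
• The most common primitive data type
• Often several sizes are supported (e.g. short, int, long, and long long in C++)
• Usually supported directly by hardware
  • Exception: Python's long (arbitrary-precision) integer type is not supported directly by hardware
• In signed integers, the leftmost bit represents the sign
• Representations of negative numbers:
  • Sign-magnitude
  • Two's complement (used by nearly all current hardware)
  • Ones' complement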
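Below is a small C++ sketch of two's complement. It assumes 8-bit signed integers stored in two's complement, which C++20 mandates and essentially all current hardware uses. The helpers `bitsOf` and `negateTwosComplement` are hypothetical names used only for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: the 8-bit pattern used to store v.
// Converting to uint8_t keeps v modulo 256, which is exactly its
// two's-complement bit pattern.
std::string bitsOf(std::int8_t v) {
    return std::bitset<8>(static_cast<std::uint8_t>(v)).to_string();
}

// Hypothetical helper: in two's complement, -x is formed by inverting
// every bit of x and then adding 1 (all arithmetic done modulo 256).
std::int8_t negateTwosComplement(std::int8_t x) {
    auto inverted = static_cast<std::uint8_t>(~static_cast<std::uint8_t>(x));
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(inverted + 1));
}
```

For example, bitsOf(-5) gives "11111011". That is the pattern of 5 (00000101) with every bit inverted, plus one. Negating -128 yields -128 again, because +128 does not fit in 8 bits.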
Floating Point
• Models real numbers, but the representations are often only approximations
  • e.g. π or e would require infinite space to represent exactly
  • Worse, many simple base-10 fractions cannot be represented exactly in base 2
    • e.g. 0.1 is 0.0001100110011... in binary (the 0011 pattern repeats forever)
• Arithmetic operations can lose accuracy
  • Rounding and truncation errors
• Languages for scientific use support at least two representations: float and double
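Here is a minimal C++ illustration of representation error. It assumes IEEE 754 float and double, as on mainstream platforms, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// 0.1, 0.2 and 0.3 have no finite base-2 representation, so each literal is
// rounded to the nearest double, and the rounding errors do not cancel.
bool sumIsExactlyPointThree() {
    return 0.1 + 0.2 == 0.3;
}

// float keeps fewer significant bits than double, so the two types round
// 0.1 to different values.
bool floatAndDoubleAgreeOnPointOne() {
    return static_cast<double>(0.1f) == 0.1;
}

// A common workaround: compare with a tolerance instead of ==.
bool nearlyEqual(double a, double b, double tolerance = 1e-9) {
    return std::fabs(a - b) <= tolerance;
}
```

The tolerance 1e-9 in `nearlyEqual` is an arbitrary choice for this sketch. A suitable tolerance depends on the magnitudes being compared.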
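Floating-Point Representations
• IEEE floating-point standard 754
  • Single precision (typically C/C++ float): 1 sign bit, 8 exponent bits, 23 fraction bits
  • Double precision (typically C/C++ double): 1 sign bit, 11 exponent bits, 52 fraction bits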
Decimal to Binary
• Conversion splits the number into its integer and fractional parts and converts each part to binary separately
• e.g. 3.375₁₀
  • 3 is 11₂
  • 0.375 is 0.011₂ (0.25 + 0.125 = 2⁻² + 2⁻³)
  • 3.375₁₀ = 11.011₂ = 1.1011₂ × 2¹
• The leading 1 is implicit (not stored) in the representation
• The exponent uses a bias of 127: 127 is added to the true exponent, so the exponent 1 is stored as 128 = 1000 0000₂
• Thus, 3.375₁₀ in single precision is:
    0 | 1000 0000 | 101 1000 0000 0000 0000 0000
    (sign | exponent | fraction)
• Note: zero (+0.0) is represented as all zeros
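The encoding of 3.375 can be checked by copying a float's bytes into a 32-bit unsigned integer and masking out the three fields. This sketch assumes float is IEEE 754 single precision. `rawBits`, `FloatFields`, and `decompose` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 single-precision value.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits; the leading 1 is implicit
};

std::uint32_t rawBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to reinterpret the bytes
    return bits;
}

FloatFields decompose(float f) {
    std::uint32_t bits = rawBits(f);
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

rawBits(3.375f) should be 0x40580000, which is the bit string above written in hexadecimal. Note that -0.0f differs from +0.0f only in the sign bit.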
Boolean Types
• Perhaps the simplest of all types
• Only two values: 0 (for false) and 1 (for true)
• Introduced in ALGOL 60 and included in most general-purpose languages since
• Could be implemented as a single bit, but is often implemented as a byte, the smallest efficiently addressable unit of memory
Character Types
• Stored as numeric encodings
• Most common encoding: ASCII
  • Uses the codes 0 to 127 for 128 different characters
• ISO 8859-1 (Latin-1) is another, 8-bit, character code
  • Allows 256 different characters
• Unicode began in 1991 as a 16-bit character code
  • In 2000, a 32-bit version was also specified
• Java was the first widely used language to use Unicode
  • JavaScript, Python, Perl and C# have followed
Character String Types
• Values consist of sequences of characters
• Design issues:
  • Strings as character arrays, or as a primitive type?
  • Static or dynamic length?
• Common operations:
  • Assignment
  • Catenation
  • Substring reference
  • Comparison
  • Pattern matching
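Strings in Various Languages
• Historically, neither Fortran nor ALGOL 60 supported strings
• COBOL had statically sized strings
• C strings (and C-style strings in C++) are arrays of chars terminated by a null character; C++ also provides the std::string class
  • Inherently unsafe: nothing prevents writing past the end of the array
• Ada strings are fixed size
• Fortran 95, Perl, Java, and Python have strings as a built-in type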
String Length Options
• Static length
  • The length is set when the string is created and cannot be changed, e.g. Python, Ruby
• Limited dynamic length
  • The length can vary up to a declared maximum
  • The end of the string is marked with a special character
  • e.g. C
• Dynamic length
  • e.g. JavaScript, Perl, the standard C++ library
  • Advantage: maximum flexibility
  • Disadvantage: overhead of dynamic storage allocation and deallocation
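The limited-dynamic and dynamic options can be contrasted in C++. A C-style char array has a fixed capacity and marks the current end with '\0', while std::string grows as needed. `appendCStyle` and `dynamicString` are hypothetical helpers, and the 16-character buffer is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Limited dynamic length (C style): capacity is fixed at 16 chars, the
// current length is marked by '\0', so at most 15 characters fit.
std::size_t appendCStyle(const char* suffix) {
    char buffer[16] = "Hi";  // capacity 16, current length 2
    // Append only as many characters as still fit, leaving room for '\0'.
    std::strncat(buffer, suffix, sizeof buffer - std::strlen(buffer) - 1);
    return std::strlen(buffer);  // counts characters up to '\0'
}

// Dynamic length (std::string): no declared maximum; the library allocates
// and frees heap storage behind the scenes as the string grows.
std::string dynamicString() {
    std::string s = "Hi";
    for (int i = 0; i < 10; ++i) s += " there";
    return s;
}
```

appendCStyle(" there") returns 8. A suffix that does not fit is truncated, so the length stops at 15 (16 minus the slot reserved for '\0').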
Creating New Types
• Given the fundamental types, new types can be created via type constructors.
• Each constructor has 3 components:
  • The syntax used to denote that constructor;
  • The set of elements produced by that constructor; and
  • The operations associated with that constructor.
• Three constructors: Product, Function, and Kleene Closure
Constructor 1: Product
The product constructor is the basis for aggregates.
• The product of two sets A and B is denoted A × B.
• A × B consists of all ordered pairs (a, b) with a ∈ A, b ∈ B.
  A × B × C consists of all ordered triples (a, b, c) with a ∈ A, b ∈ B, c ∈ C.
  A × B × … × N consists of all ordered n-tuples (a, b, …, n) with a ∈ A, b ∈ B, …, n ∈ N.
• Example: if char is the 128-character ASCII set, the set bool × char has 2 × 128 = 256 elements:
    { …, (true, 'A'), (false, 'A'), (true, 'B'), (false, 'B'), … }
• The operations associated with product are the projection operations:
  • first, applied to an n-tuple (s1, s2, …, sn), returns s1.
  • second, applied to an n-tuple (s1, s2, …, sn), returns s2.
  • nth, applied to an n-tuple (s1, s2, …, sn), returns sn.
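In C++, std::pair and std::tuple are ready-made product types, and their member access works as projection operations. The aliases `BoolChar` and `Triple` and the function `productSize` are hypothetical. The 128-element char set assumes ASCII.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// bool × char: .first and .second are the first and second projections.
using BoolChar = std::pair<bool, char>;

// int × double × char: std::get<N> projects onto component N (0-based).
using Triple = std::tuple<int, double, char>;

// The size of A × B is the size of A times the size of B.
constexpr long productSize(long sizeA, long sizeB) { return sizeA * sizeB; }

// 2 bools times 128 ASCII chars gives 256 pairs.
static_assert(productSize(2, 128) == 256, "bool x char over ASCII");
```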
Product Example: C++ Structs
    struct Student {
        int id;
        double gpa;
        char gender;
    };
    Student aStudent;
Formally, a Student is an element of int × real × char.
Formally, a particular Student:
    aStudent.id = 12345;
    aStudent.gpa = 3.75;
    aStudent.gender = 'F';
is the 3-tuple (12345, 3.75, 'F').
The C++ "dot operator" is a projection operation:
    cout << aStudent.id      // extract id
         << aStudent.gpa     // extract gpa
         << aStudent.gender  // extract gender
         << endl;
Constructor 2: Function
The function constructor is the basis for subprograms.
• The set of all functions from a set A to a set B is denoted (A) → B.
• A particular function f mapping A to B is denoted f: (A) → B.
Examples:
• The set (char) → bool contains all functions that map char values into bool values. Some C examples (in C these return a nonzero int for true and 0 for false):
    isupper('A') → true    islower('A') → false
    isalpha('A') → true    isdigit('A') → false
    isalnum('A') → true    isspace('A') → false
• The set (char) → char contains all functions that map char values into char values. Some C examples:
    tolower('A') → 'a'    toupper('a') → 'A'
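One way to name the set (char) → bool in C++ is std::function<bool(char)>: any callable with that signature is a member. This sketch wraps the <cctype> classifiers, which take and return int, so that they really produce bool. The names `CharToBool`, `CharToChar`, `isUpperFn`, `isDigitFn`, and `toLowerFn` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <functional>

using CharToBool = std::function<bool(char)>;  // the set (char) -> bool
using CharToChar = std::function<char(char)>;  // the set (char) -> char

// The cast to unsigned char avoids undefined behavior for negative chars.
const CharToBool isUpperFn = [](char c) {
    return std::isupper(static_cast<unsigned char>(c)) != 0;
};
const CharToBool isDigitFn = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
};
const CharToChar toLowerFn = [](char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
};
```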
Function and Product
What does this set contain?
    (int × int) → int
• All functions that map pairs of integers into an integer. Examples?
    +((2, 3)) → 5    -((2, 3)) → -1    *((2, 3)) → 6    /((2, 3)) → 0 (integer division)
Suppose we define an aggregate named IntPair:
    struct IntPair { int a, b; };
and then define a function named Add():
    int Add(IntPair ip) { return ip.a + ip.b; }
Add() is a member of the set (int × int) → int.
• The function constructor lets us create new operations for a language.
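Every member of (int × int) → int has the same signature, so the operators can sit side by side in one table. This sketch reuses the slide's IntPair and Add(). The map `kOps` and the alias `IntPairToInt` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>

struct IntPair { int a, b; };

// Any member of (int × int) -> int fits this one std::function type.
using IntPairToInt = std::function<int(IntPair)>;

int Add(IntPair ip) { return ip.a + ip.b; }

// Hypothetical operator table, keyed by operator symbol.
const std::map<char, IntPairToInt> kOps = {
    {'+', Add},
    {'-', [](IntPair ip) { return ip.a - ip.b; }},
    {'*', [](IntPair ip) { return ip.a * ip.b; }},
    {'/', [](IntPair ip) { return ip.a / ip.b; }},  // integer division truncates
};
```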
Function Arity
Product serves to denote an aggregate or an argument list.
• What does this set contain?
    (int × int) → bool
• All functions that map pairs of integers into a boolean. Examples?
    ==((2, 3)) → false    !=((2, 3)) → true    <((2, 3)) → true    >((2, 3)) → false
• Definition:
  • The number of operands an operation requires is its arity.
  • Operations with 1 operand are unary operations, with arity 1.
  • Operations with 2 operands are binary operations, with arity 2.
  • Operations with 3 operands are ternary operations, with arity 3.
  • ...
Example: Ternary Operation
The C/C++ conditional expression has the form:
    <expr>0 ? <expr>1 : <expr>2
producing <expr>1 if <expr>0 is true, and <expr>2 if <expr>0 is false.
Here is a simple minimum() function using it:
    int minimum(int first, int second) {
        return (first < second) ? first : second;
    }
The C/C++ conditional expression is a ternary operation, which in this case is a member of the set:
    ?: ∈ (bool × int × int) → int
Operator Positioning
Operators are also categorized by their position relative to their operands:
• Infix operators appear between their operands: 1 + 2
• Prefix operators appear before their operands: + 1 2
• Postfix operators appear after their operands: 1 2 +
The same expression in each notation:
    Infix:    (2 + 3) * (4 - 2)
    Prefix:   * + 2 3 - 4 2
    Postfix:  2 3 + 4 2 - *
Prefix, infix, and postfix notation are different conventions for the same thing; a language may choose any of them:

    C++ Expr    Category        Value          Lisp Expr       Category        Value
    x < y       binary, infix   true, false    (< x y)         binary, prefix  true, false
    ++x         unary, prefix   x+1            (incf x)        unary, prefix   x+1
    11 + 12     binary, infix   23             (+ 11 12)       binary, prefix  23
    !flag       unary, prefix   neg. of flag   (not flag)      unary, prefix   neg. of flag
    cout << x   binary, infix   cout           (princ x str)   binary, prefix  x
    x++         unary, postfix  x              None
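Postfix notation needs neither parentheses nor precedence rules, which is why it is easy to evaluate with a stack. This is a minimal C++ sketch. It assumes space-separated integer tokens and only the four binary operators, and `evalPostfix` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluate a postfix expression: numbers are pushed, and each binary
// operator pops its two operands and pushes the result.
int evalPostfix(const std::string& expr) {
    std::istringstream in(expr);
    std::stack<int> operands;
    std::string token;
    while (in >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            if (operands.size() < 2) throw std::invalid_argument("missing operand");
            int right = operands.top(); operands.pop();
            int left = operands.top(); operands.pop();
            switch (token[0]) {
                case '+': operands.push(left + right); break;
                case '-': operands.push(left - right); break;
                case '*': operands.push(left * right); break;
                default:  operands.push(left / right); break;
            }
        } else {
            operands.push(std::stoi(token));  // throws on non-numeric tokens
        }
    }
    if (operands.size() != 1) throw std::invalid_argument("malformed expression");
    return operands.top();
}
```

For example, evalPostfix("2 3 + 4 2 - *") returns 10, the same value as the infix (2 + 3) * (4 - 2).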
Constructor 3: Kleene Closure
Kleene Closure is the basis for representing sequences.
• The Kleene Closure of a set A is denoted A*.
• The Kleene Closure of a set is the set of all tuples, of any finite length including 0, that can be formed using elements of that set.
Example: the Kleene Closure of bool, written bool*, is the infinite set:
    { (), (false), (true), (false, false), (false, true), (true, false), (true, true), (false, false, false), … }
• For a tuple t ∈ A*, the operations include:
    null: (A*) → bool    null(()) → true    null((false)) → false    null((true)) → false
    first: (A*) → A      first((true, false)) → true    first((false, true)) → false
    rest: (A*) → A*      rest((true, true, false)) → (true, false)    rest((false, true, true)) → (true, true)
Kleene Closure Example
If char is the set of ASCII characters, what is char*?
• The infinite set of all tuples formed from ASCII characters (AKA the set of all character strings).
The C/C++ notation:
    "Hello"
is just a different syntax for:
    ( 'H', 'e', 'l', 'l', 'o' )
(in memory, C and C++ also store a terminating '\0' after the last character).
Thus, int* denotes a sequence (array, list, …) of integers (here * is the Kleene star, not the C++ pointer declarator):
    int intStaticArray[32];
    int * intDynamicArray = new int[n];
    vector<int> intVec;
    list<int> intList;
real* denotes a sequence (array, list, …) of reals; and so on.
Sequence Operations
Sequence operations can be built via null(), first(), and rest().
• An output operation can be defined like this (pseudocode):
    void print(ostream & out, int * a) {
        if ( !null(a) ) {
            out << first(a) << ' ';
            print(out, rest(a));
        }
    }
• A subscript operation can be defined like this (pseudocode):
    int & operator[](int * a, int i) {
        if (i > 0)
            return operator[](rest(a), i-1);
        else
            return first(a);
    }
In Lisp, first is called car and rest is called cdr.
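The pseudocode can be made runnable by giving int sequences a concrete representation. This sketch assumes a hypothetical `IntSeq` view made of a pointer plus a length, and `null`, `first`, and `rest` follow the slide's definitions. `nth` stands in for operator[], because C++ requires operator[] to be a member function. `toString` and `kSample` are also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sequence view: pointer to the first element plus a length.
struct IntSeq {
    const int* data;
    std::size_t size;
};

bool null(IntSeq s) { return s.size == 0; }

int first(IntSeq s) {
    if (null(s)) throw std::out_of_range("first of empty sequence");
    return s.data[0];
}

IntSeq rest(IntSeq s) {
    if (null(s)) throw std::out_of_range("rest of empty sequence");
    return {s.data + 1, s.size - 1};
}

// Output built only from null/first/rest, as in the pseudocode.
void print(std::ostream& out, IntSeq s) {
    if (!null(s)) {
        out << first(s) << ' ';
        print(out, rest(s));
    }
}

// Subscript built the same way: drop i elements, then take the first.
int nth(IntSeq s, std::size_t i) {
    return i > 0 ? nth(rest(s), i - 1) : first(s);
}

std::string toString(IntSeq s) {
    std::ostringstream os;
    print(os, s);
    return os.str();
}

// Example data for trying the operations.
const int kSample[] = {10, 20, 30, 40};
```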
Practise Using Constructors
Give formal descriptions for:
• The logical and operation (&&):
  • How many operands does it take? 2
  • What types are its operands? bool, bool
  • What type of value does it produce? bool
  So && is a member of (bool × bool) → bool
• The C++/STL substring operation ( str.substr(i, n) ):
  • How many operands does it take? 3
  • What types are its operands? string, int, int
  • What type of value does it produce? string
  So substr() is a member of (string × int × int) → string
• The logical negation operation (!):
Practise
• C++ record:
    struct Student {
        int myID;
        string myName;
        bool iAmFullTime;
        double myGPA;
    };
• An accessor method:
    struct Student {
        int myID;
        int id() const;
        string myName;
        bool iAmFullTime;
        double myGPA;
    };
• How does this affect our Student description?
More Practise
• A "completely functional" class:
    class Student {
    public:
        Student();
        Student(int, string, bool, double);
        int id() const;
        string name() const;
        bool fullTime() const;
        double gpa() const;
        void read(istream &);
        void print(ostream &) const;
    private:
        int myID;
        string myName;
        bool iAmFullTime;
        double myGPA;
    };
Summary of Constructors
• The product constructor allows us to add record types
  • A record is an aggregate of values of possibly different types, each value accessible via a name (i.e. projection)
• The Kleene closure constructor allows us to add sequence types
  • A sequence is an aggregate of values of the same type
  • e.g. arrays (adjacent memory), lists (possibly non-adjacent)
• The function constructor allows us to add operations
  • Built from available operations, e.g. projection, first(), rest()
Ordinal Types
• A type in which the range of possible values can be easily associated with the set of positive integers
  • e.g. integer, char, boolean
• Two user-defined ordinal types are often supported:
  • Enumerations
  • Subranges
Modeling Real-World Values
Suppose we want to model the seven "ROY G BIV" colors. One approach:
    const int RED=0, ORANGE=1, YELLOW=2, GREEN=3, BLUE=4, INDIGO=5, VIOLET=6;
    int aColor = BLUE;
This approach requires the human to map colors to integers. Instead:
    enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };
    Color aColor = BLUE;
An enumeration is a type whose values are explicitly listed. Most imperative languages support such enumerations, e.g. Ada:
    type Color is ( RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET );
    aColor : Color := BLUE;
Enumerations: Compiler Side
An enumeration's values must be valid identifiers:
    <enumeration-type> ::= enum identifier { <id-list> } ;
and the compiler treats a declaration:
    enum NewType { id0, id1, id2, …, idN-1 };
as being (approximately) equivalent to:
    const int id0=0, id1=1, id2=2, …, idN-1=N-1;
Thus, after processing
    enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };
so far as the compiler is concerned:
    RED==0 && ORANGE==1 && YELLOW==2 && … && VIOLET==6
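The numbering can be observed directly in C++. In this short sketch, the scoped enum `Weekday` and the explicit starting value in `Element` are illustrations added here, not part of the slide's example.

```cpp
#include <cassert>

// Plain (unscoped) enum: enumerators are numbered from 0 and convert
// implicitly to int.
enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET };

// Scoped enum (C++11): same numbering, but no implicit conversion to int,
// so its values cannot be mixed with plain integers by accident.
enum class Weekday { Monday, Tuesday, Wednesday, Thursday, Friday };

// An explicit value may be given; later enumerators continue from it.
enum Element { HYDROGEN = 1, HELIUM, LITHIUM };
```

With HYDROGEN = 1, each enumerator's value equals its atomic number. A switch that maps elements to atomic numbers could then be replaced by a conversion to int. Whether that is clearer is a design choice, since it ties the enumeration's numbering to the meaning of the values.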
Enumerations: User Side
Enumerations thus provide an automatic means of mapping:
    (identifier) → int
whose chief benefit is better program readability:
    enum ElementName { HYDROGEN, HELIUM, … };
    ElementName anElement;
    // ...
    switch (anElement) {
        case HYDROGEN: atomicNumber = 1; break;
        case HELIUM:   atomicNumber = 2; break;
        …
    }
Enumerations allow real-world 'values' to be represented using real-world names, instead of (arbitrary) integers.
Enumerations and Smalltalk
OO purists replace enums with class hierarchies:
[Figure: class hierarchies Color (subclasses Red, Orange, …, Indigo, Violet) and Element (subclasses Hydrogen, Helium, …, E112, E113)]
This permits the creation of real-world objects:
    aColor := Blue new.       "Smalltalk"
    anElement := Helium new.  "Smalltalk"
as opposed to the real-world values provided by an enumeration.
For this reason, "pure" OO languages like Smalltalk don't provide an enumeration mechanism.
Subrange
• A type whose values are a subset of an existing type
    -- Ada
    subtype TestScore is Integer range 0..100;
    subtype CapitalLetter is Character range 'A'..'Z';
    type DaysOfWeek is (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
    subtype WeekDay is DaysOfWeek range Monday..Friday;
If a subrange variable is declared:
    Today : WeekDay;
and assigned an invalid value:
    Today := Saturday;
then a Constraint_Error exception is raised, which, if not handled, terminates the program.
This is an essential feature for life-critical systems.
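C++ has no built-in subrange types, but the run-time check that Ada performs can be approximated with a small class. This is a hypothetical sketch (`Subrange`, `TestScore`). It throws std::out_of_range where Ada raises Constraint_Error, and it does not reproduce Ada's compile-time diagnostics for static out-of-range values.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for "subtype TestScore is Integer range 0..100;":
// every construction and assignment is range-checked.
template <int Low, int High>
class Subrange {
    static_assert(Low <= High, "empty range");
public:
    explicit Subrange(int v) : value_(check(v)) {}
    Subrange& operator=(int v) {
        value_ = check(v);  // on failure, throws before value_ changes
        return *this;
    }
    int value() const { return value_; }
private:
    static int check(int v) {
        if (v < Low || v > High) throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

using TestScore = Subrange<0, 100>;
```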
Array Types • An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions in element references range checked?
• When are subscript ranges bound?
• When does allocation take place?
• What is the maximum number of subscripts?
• Can array objects be initialized?
• Are any kinds of slices allowed?
Array Indexing
• Indexing (or subscripting) is a mapping from indices to elements:
    array_name(index_value_list) → an element
• Index syntax:
  • FORTRAN, PL/I, and Ada use parentheses
    • Ada explicitly uses parentheses to show the uniformity between array references and function calls, because both are mappings
  • Most other languages use brackets
Array Index (Subscript) Types
• FORTRAN, C: integer only
• Pascal: any ordinal type (integer, Boolean, char, enumeration)
• Ada: integer or enumeration (includes Boolean and char)
• Java: integer types only
• C, C++, Perl, and Fortran do not specify range checking
• Java, ML, and C# specify range checking
Subscript Binding and Array Categories
• Three choices to make:
  • Type of binding to subscript ranges
  • Time of binding to storage
  • Location of storage
• Static: subscript ranges are statically bound and storage allocation is static (before run time)
  • Advantage: efficiency (no dynamic allocation)
Subscript Binding and Array Categories (continued)
• Fixed stack-dynamic: subscript ranges are statically bound, but allocation is done when the declaration is reached (during execution)
  • Advantage: space efficiency
• Stack-dynamic: subscript ranges are dynamically bound and storage allocation is also dynamic (done at run time); once bound, both remain fixed for the lifetime of the variable
  • Advantage: flexibility (the size of an array need not be known until the array is to be used)
Subscript Binding and Array Categories (continued)
• Fixed heap-dynamic: similar to fixed stack-dynamic; storage binding is dynamic but fixed after allocation (i.e., binding is done when requested, and storage is allocated from the heap, not the stack)
• Heap-dynamic: binding of subscript ranges and storage allocation is dynamic and can change any number of times
  • Advantage: flexibility (arrays can grow or shrink during program execution)
Examples
• C and C++ arrays that include the static modifier are static
• C and C++ arrays declared in functions without the static modifier are fixed stack-dynamic
• Ada arrays can be stack-dynamic
• C and C++ provide fixed heap-dynamic arrays (via malloc/free and new/delete)
• C# includes a second array class, ArrayList, that provides heap-dynamic arrays
• Perl and JavaScript support heap-dynamic arrays
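The C/C++ categories above can be seen side by side in one sketch. The function names are hypothetical, and std::vector is used as the C++ example of a heap-dynamic array.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Static: the array exists before run time and keeps its contents between
// calls because of the static modifier.
int countCalls() {
    static int count[1];  // zero-initialized once
    return ++count[0];
}

// Fixed stack-dynamic: the size (5) is fixed at compile time, but storage
// is allocated on the stack each time the function is entered.
int sumOfSquares() {
    int squares[5];
    int total = 0;
    for (int i = 0; i < 5; ++i) {
        squares[i] = i * i;
        total += squares[i];
    }
    return total;
}

// Fixed heap-dynamic: the size n is chosen at run time and then stays
// fixed; storage comes from the heap (unique_ptr releases it with delete[]).
long sumUpTo(std::size_t n) {
    std::unique_ptr<int[]> values(new int[n]);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = static_cast<int>(i);
        total += values[i];
    }
    return total;
}

// Heap-dynamic: a std::vector can grow and shrink any number of times.
std::size_t growThenShrink() {
    std::vector<int> v;
    for (int i = 0; i < 10; ++i) v.push_back(i);  // grows on demand
    v.resize(3);                                   // shrinks
    return v.size();
}
```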
Array Initialization
• Some languages allow initialization at the time of storage allocation
• C, C++, Java example (C# requires the form int[] list = {4, 5, 7, 83};):
    int list [] = {4, 5, 7, 83};
• Character strings in C and C++ (the array gets 8 elements: 7 letters plus the terminating '\0'):
    char name [] = "freddie";
• Java initialization of String objects:
    String[] names = {"Bob", "Jake", "Joe"};
More Examples
• Ada:
    List : array (1..5) of Integer := (1, 3, 5, 7, 9);
    Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
• Python list comprehensions:
    [expression for iterate_var in array if condition]
    [x * x for x in range(12) if x % 3 == 0]
    # produces [0, 9, 36, 81]
Array Operations
• Ada allows array assignment and also catenation (&)
• Fortran provides elemental operations
  • Operations between pairs of corresponding array elements
  • For example, the + operator between two arrays results in an array of the sums of the element pairs of the two arrays
• APL provides the most powerful array processing operations for vectors and matrices, as well as unary operators (for example, to reverse column elements)
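C++'s std::valarray offers elemental operations similar to Fortran's: arithmetic operators apply to each pair of corresponding elements. This is a brief sketch, and `elementwiseSum`, `scaled`, and `sameElements` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <valarray>

// a + b sums corresponding element pairs, like Fortran's elemental +.
std::valarray<int> elementwiseSum(const std::valarray<int>& a, const std::valarray<int>& b) {
    assert(a.size() == b.size());  // valarray arithmetic requires equal sizes
    return a + b;
}

// An array-scalar operation: * is applied to every element.
std::valarray<int> scaled(const std::valarray<int>& a, int factor) {
    return a * factor;
}

// Element-by-element comparison (valarray's == itself yields a valarray<bool>).
bool sameElements(const std::valarray<int>& x, const std::valarray<int>& y) {
    return x.size() == y.size() && std::equal(std::begin(x), std::end(x), std::begin(y));
}
```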
Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensional array in which all rows have the same number of elements and all columns have the same number of elements
  • e.g. myArray[3,7] (C# syntax)
• A jagged matrix has rows with varying numbers of elements
  • Possible when multi-dimensional arrays actually appear as arrays of arrays
  • e.g. myArray[3][7]
Slices • A slice is some substructure of an array; nothing more than a referencing mechanism • Slices are only useful in languages that have array operations
Slice Examples
• Fortran 95:
    Integer, Dimension (10) :: Vector
    Integer, Dimension (3, 3) :: Mat
    Integer, Dimension (3, 3, 4) :: Cube
• Vector (3:6) is a four-element array (elements 3 through 6 of Vector)
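std::valarray also supports slices through std::slice(start, length, stride). On a non-const valarray the result is a std::slice_array that refers into the original storage. This sketch assumes 0-based C++ indices, so Fortran's Vector(3:6) starts at index 2, and a 3 × 3 matrix stored in row-major order. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Fortran's Vector(3:6) in 0-based terms: start at index 2, take 4
// elements, stride 1.
std::valarray<int> elementsThreeToSix(const std::valarray<int>& v) {
    return v[std::slice(2, 4, 1)];
}

// Column c (0-based) of a rows × cols matrix stored row by row: start at c
// and step over one whole row (cols elements) each time.
std::valarray<int> column(const std::valarray<int>& mat, std::size_t rows,
                          std::size_t cols, std::size_t c) {
    return mat[std::slice(c, rows, cols)];
}
```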
Implementation of Arrays
• The access function maps subscript expressions to an address in the array
• Access function for single-dimensioned arrays:
    address(list[k]) = address(list[lower_bound]) + (k - lower_bound) × element_size
                     = (arrayBaseAddress - lower_bound × element_size) + k × element_size
• At issue: there is an efficiency-vs-convenience tradeoff:
  • Accesses to 0-relative arrays require two fewer operations (the multiplication by lower_bound and the subtraction):
      (arrayBaseAddress - 0 × element_size) + i × element_size = arrayBaseAddress + i × element_size
  • Programmer-specified index values can be pretty convenient:
      type LetterCounter is array (CapitalLetter) of Integer;
      type DailySales is array (WeekDay) of Float;
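The access function can be checked numerically. This sketch treats addresses as plain integers (std::uintptr_t). `elementAddress`, `elementAddressFolded`, and `matchesBuiltin` are hypothetical names. The folded version shows why the term (arrayBaseAddress - lower_bound × element_size) can be computed once per array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// address(list[k]) = base + (k - lowerBound) * elementSize
std::uintptr_t elementAddress(std::uintptr_t base, long lowerBound, long k,
                              std::size_t elementSize) {
    return base + static_cast<std::uintptr_t>(k - lowerBound) * elementSize;
}

// Same value, rearranged: 'origin' does not depend on k, so it can be
// computed once. Unsigned arithmetic wraps modulo 2^N, so the result is
// correct even when origin itself would be "negative".
std::uintptr_t elementAddressFolded(std::uintptr_t base, long lowerBound, long k,
                                    std::size_t elementSize) {
    std::uintptr_t origin = base - static_cast<std::uintptr_t>(lowerBound) * elementSize;
    return origin + static_cast<std::uintptr_t>(k) * elementSize;
}

// Compare with the compiler's own access function for a 0-based C++ array.
bool matchesBuiltin() {
    double arr[8];
    auto base = reinterpret_cast<std::uintptr_t>(&arr[0]);
    for (long k = 0; k < 8; ++k)
        if (elementAddress(base, 0, k, sizeof(double)) !=
            reinterpret_cast<std::uintptr_t>(&arr[k]))
            return false;
    return true;
}
```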
Accessing Multi-dimensioned Arrays
• Two common ways:
  • Row-major order (by rows): used in most languages
  • Column-major order (by columns): used in Fortran
• Efficiency issue: sequential memory accesses are faster, so traversing an array in the order it is stored is faster
• For each dimension of an array, one add and one multiply instruction are required for the access function
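For a two-dimensional array with rows × cols elements and 0-based indices, the two storage orders give the element offsets below. The function names are hypothetical. Multiply an offset by the element size to get a byte offset.

```cpp
#include <cassert>
#include <cstddef>

// Row-major (most languages): rows are stored one after another, so moving
// to the next row skips 'cols' elements. One multiply and one add.
std::size_t rowMajorOffset(std::size_t i, std::size_t j,
                           std::size_t /*rows*/, std::size_t cols) {
    return i * cols + j;
}

// Column-major (Fortran): columns are stored one after another, so moving
// to the next column skips 'rows' elements.
std::size_t colMajorOffset(std::size_t i, std::size_t j,
                           std::size_t rows, std::size_t /*cols*/) {
    return j * rows + i;
}
```

In a 3 × 4 array, element (1, 2) is at offset 6 in row-major order but at offset 7 in column-major order. Both orders place the first element at offset 0 and the last element at offset 11.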