Chapter 6

Chapter 6 Data Types

Contents Introduction Primitive Data Types Character String Types User-Defined Ordinal Types Array Types Associative Arrays Record Types Union Types Pointer and Reference Types

6.1 Introduction • Data type: A data type defines a collection of data values and a set of predefined operations on those values An object represents an instance of a user-defined (abstract data) type • Design issues for all data types • What operations are defined and how are they specified. • It is convenient, both logically and concretely, to think of variables in terms of descriptors. • A descriptor is the collection of the attributes of a variable • A descriptor is used for type checking, allocation and, deallocation • Static attributes need only be available at compile-time; dynamic attributes need to be available at run-time.

6.2 Primitive data types • Primitive data types are those that are not defined in terms of other data types • Most primitive types are abstractions for underlying hardware data types. • Common primitive types: • Numeric types Early PLs had only numeric primitive types, and still play a central role among the collections of types supported by contemporary languages. • Integers • Almost always an exact reflection of the hardware, so the mapping is trivial. • For example, C, Ada, java .. allows these: short integer, integer and long integer. • An integer is represented by a string of bits, with the leftmost representing the sign bit.

6.2 Primitive data types (cont.) • Floating point numbers • Model real numbers but only as approximations • languages for scientific use support at least two floating-point types; sometimes more. • usually exactly like the hardware, but not always; some languages allow accuracy specs in code e.g. (Ada) IEEE (The Institute of Electrical and Electronics Engineers) floating-point formats: (a) Single precision, (b) Double precision

6.2 Primitive data types (contd.) • Decimal • for business applications (money) • store a fixed number of decimal digits (coded) • advantage: accuracy • disadvantages: limited range, wastes memory • Boolean types • The range of values has only two elements TRUE or FALSE • Booleans types are often used to represent switches or flags in programs • advantage: readability • Character types • stored as numeric codings (usually ASCII but Unicode has appeared as an alternative) • A new 16-bit character set named Unicode had been developed as an alternative. Java is the first to use Unicode

6.3 Character String Types • Character string type is one in which the values consist of sequences of characters • Design issues with the string types • Should strings be simply a special kind of character array or a primitive type? • Should strings have static or dynamic length? • String Operations • Assignment ( Java: str1 = str2;) (C: strcpy(pstr1, pstr2); • Comparison (=, >, etc.) BASIC: str1 < str2 • Concatenation, C: strcat (str1,str2), (Java : str2 + str3;) • Substring reference • Pattern matching, C: strcmp(str1,str2);

6.3 Character String Types contd. • Examples • C and C++ • not primitive • use char arrays and a library of functions that provide operations • Java : String class (not arrays of char) • objects are immutable • StringBuffer is a class for changeable string objects • String length options • Static – Python, Java’s String class, C++ standard class library, Ruby’s built-in String class, and the .NET class library in C# and F#. • limited dynamic length – C and C++ ( up to a max length indicated by a null character) • dynamic –Perl, JavaScript

6.3 Character String Types (cont.) • Implementation • static length - compile-time descriptor • limited dynamic length - may need a run-time descriptor for length (but not in C and C++ because the end of a string is marked with the null character) • dynamic length - need run-time descriptor; allocation/deallocation is the biggest implementation problem • Fig (a) Compile-time descriptor for static strings; Fig (b) Run-time descriptor for limited dynamic strings Run-time descriptor for limited dynamic strings Compile – time descriptor for static strings

6.4 User-defined Ordinal types • An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers • Design issue: should a symbolic constant be allowed to be in more than one type definition? Examples • Java does not include an enumeration type, but provides the Enumeration interface • C# example enum days {mon, tue, wed, thu, fri, sat, sun}; • Evaluation of enumeration types • aid to readability e.g. no need to code a color as number. • aid to reliability e.g. compiler can check

6.5 Arrays • An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element. • Design Issues • What types are legal for subscripts? • Are subscripting expressions in element references range checked? • When are subscript ranges bound? • When does allocation take place? • Are ragged or rectangular multidimensioned arrays allowed, or both? • Can array objects be initialized? • Are any kind of slices allowed? • Indexing is a mapping from indices to elements • map(array_name, index_value_list)  an element

6.5 Arrays (continued) • Index Syntax • FORTRAN, PL/I, Ada use parentheses • Most other languages use brackets • Subscript Types: • FORTRAN, C - integer only • Java - integer types only • Categories of arrays: • Fixed stack dynamic - range of subscripts is statically bound, but storage is bound at elaboration time • e.g. Most Java locals, and C locals that are not static • Advantage: space efficiency

6.5 Arrays (continued) • Stack-dynamic - range and storage are dynamic, but fixed from then on for the variable’s lifetime • e.g. Ada declare blocks declare STUFF : array (1..N) of FLOAT; begin ... end; • Advantage: flexibility - size need not be known until the array is about to be used • Heap-dynamic - subscript range and storage bindings are dynamic and not fixed • In APL, Perl, and JavaScript, arrays grow and shrink as needed • In Java, all arrays are objects (heap-dynamic)

Array Initialization • Some languages allow initialization at the time of storage allocation • C, C++, Java, C# example int list [] = {4, 5, 7, 83} ; • Character strings in C and C++ char name [] = “freddie”; • Arrays of strings in C and C++ char *names [] = {“Bob”, “Jake”, “Joe”}; • Java initialization of String objects String[] names = {“Bob”, “Jake”, “Joe”};

6.6 Associative Arrays • An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys • Also known as Hash tables • Index by key (part of data) rather than value • Store both key and value (take more space) • Best when access is by data rather than index • Examples: • Lisp alist: • ((key1 . data1) (key2 . data2) (key3 . data3) • Design Issues • What is the form of references to elements? • Is the size static or dynamic? • Structure and Operations in Perl • Names begin with % • Literals are delimited by parentheses, e.g., %hi_temps = ("Monday" => 77, "Tuesday" => 79,…); • Subscripting is done using braces and keys, e.g., $hi_temps{"Wednesday"} = 83; • Elements can be removed with delete, e.g., delete $hi_temps{"Tuesday"};

6.7 Records • A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names • Design Issues • What is the form of references? (Calling format: OFF, .) • What unit operations are defined? (Assignment, equality, assign corresponding filed) • Implementation method • Simple and efficient, because field name references are literals bound at compile-time. • Use offsets to determine address. • Record Definition Syntax • COBOL uses level numbers to show nested records; others use recursive definitions • Record Field References • COBOLfield_name OF record_name_1 OF ... OF record_name_n • Others (dot notation)record_name_1.record_name_2. ... .record_name_n.field_name

6.7 Records (continued) • Record Operations • Assignment • Pascal, Ada, and C allow it if the types are identical • In Ada, the RHS can be an aggregate constant • Initialization • Allowed in Ada, using an aggregate constant • Comparison • In Ada, = and /=; one operand can be an aggregate constant • MOVE CORRESPONDING • In COBOL - it moves all fields in the source record to fields with the same names in the destination record • Useful operation in data processing application, where input records are moved to output files after same modification.

6.7 Records (continued) A compile-time descriptor for a record • Comparing records and arrays • Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static) • Dynamic subscripts could be used with record field access, but it would disallow type checking and it would be much slower.

6.8 Unions • A union is a type whose variables are allowed to store different type values at different times during execution. • Implementation: • Allocate for largest variant • Discriminated unions include tag field to indicate type • Example: • Table of symbols and values • Each value may be int, real, or string • Design Issues for unions • Should type checking be required? Note that any such type checking must be dynamic. • Should unions be integrated with records? • Examples: • FORTRAN, C and C++ - free unions (no tags) • Not part of their records • No type checking of references Java has neither records nor unions

6.10 Pointers • A pointer type is a type in which the range of values consists of memory addresses and a special value, nil (or null) • Uses • Addressing flexibility (support indirect addressing) • Dynamic storage management (scoping) • Design Issues • What is the scope and lifetime of pointer variables? • What is the lifetime of heap-dynamic variables? • Are pointers restricted to pointing at a particular type? • Are pointers used for dynamic storage management, indirect addressing, or both? • Should a language support pointer types, reference types, or both? Note: heap dynamic variables have no name and must be referenced by pointer variable.

6.10 Pointers (continued) • Fundamental Pointer Operations: • Assignment of an address to a pointer (first binding) • References (explicit versus implicit dereferencing) • The assignment operation j = *ptr (second binding)

6.10 Pointers (continued) • Problems with pointers • Dangling pointers (dangerous) • A pointer points to a heap-dynamic variable that has been deallocated • Creating one (with explicit deallocation): • Set a second pointer to the value of the first pointer • Deallocate the heap-dynamic variable, using the first pointer • Lost Heap-Dynamic Variables (wasteful) • A heap-dynamic variable that is no longer referenced by any program pointer • Creating one: • Pointer p1 is set to point to a newly created heap-dynamic variable • p1 is later set to point to another newly created heap-dynamic variable • The process of losing heap-dynamic variables is called memory leakage

6.10 Pointers (continued) • C and C++ • Used for dynamic storage management and addressing • Explicit dereferencing (*value and &  address) and address-of operator • Can do address arithmetic in restricted forms e.g. float stuff[100]; float *p; p = stuff; *(p+5) is equivalent to stuff[5] and p[5] *(p+i) is equivalent to stuff[i] and p[i] • Domain type need not be fixed (void * ) • void * - Can point to any type and can be usefull for transferring memory from one place to another.

6.10 Pointers (continued) • C++ Reference Types • Constant pointers that are implicitly dereferenced • Example: int result = 0; int &ref_result = result; …… ref_result = 100; In this code segment, result and ref_result are aliases. • Advantages of both pass-by-reference and pass-by-value • Java has no pointer type, but only a reference type. • No pointer arithmetic • Can only point at objects (which are all on the heap) • No explicit deallocator • Means there can be no dangling references

6.10 Pointers (continued) • Evaluation of pointers • Dangling pointers and dangling objects are problems, as is heap management • Pointers are like goto's • they widen the range of cells that can be accessed by a variable • Pointers or references are necessary for dynamic data structures • so we can't design a language without them

Dealing with Lost Objects • The lost object problem can be solved if the language implements automatic storage management. (Java and Lisp) • Two approaches for garbage collection : • Reference counting (“eager” approach): • Object maintains a counter of how many pointers reference it, when counter is decremented to zero, the object is deallocated. • Reference counting incurs significant overhead on each pointer assignment, but the overhead is distributed throughout the session. • Mark-sweep (“lazy” approach): • Wait until all storage is allocated, then collect the garbage

Summary • The data types of a language are a large part of what determines that language’s style and usefulness • The primitive data types of most imperative languages include numeric, character, and Boolean types • The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs • Arrays and records are included in most languages • Pointers are used for addressing flexibility and to control dynamic storage management

Chapter 6

Chapter 6

Presentation Transcript

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

CHAPTER 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

CHAPTER 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6

Chapter 6