340 likes | 1.28k Views
Data Types. Primitive Data Types Character String Types User-Defined Ordinal Types Array Types Associative Arrays Record Types Union Types Pointer Types Ref: Chapter 6 in text. Memory Layout: Overview. Text Data Heap Stack. Text : code constant data Data :
E N D
Data Types • Primitive Data Types • Character String Types • User-Defined Ordinal Types • Array Types • Associative Arrays • Record Types • Union Types • Pointer Types • Ref: Chapter 6 in text
Memory Layout: Overview Text Data Heap Stack • Text: • code • constant data • Data: • initialized global & static variables • global & static variables • 0 initialized or un-initialized (blank) • Heap • dynamic memory • Stack • dynamic • local variables • subroutine info 0 xxxxxxxx
Data Types • Data type • a collection of data values/objects, and • a set of predefined operations on these objects • Evolution of data types: • FORTRAN I (1957) - INTEGER, REAL, arrays • Ada (1983) - User created unique types enforced by the system • Design issues for all data types: • Syntax of references to variables • Which operations are defined and how to specify them • Mapping to computer representation • Memory • Bit pattern
Primitive Data Types Not defined in terms of other data types • Integers • Floating point types • Decimal • Character • Boolean • in some PLs: Character Strings
Integers • Almost always an exact reflection of hardware • the mapping is trivial • Up to 8 different integer types in a PL • size • unsigned • signed: • negative numbers: • 2-complement? -> two bit patterns for 0! • => 2-complement - 1 Value Sign word_size - 1 bits 1 bit
Floating Point • Model real numbers • Only approximations • Fractional part: c12-1+c22-2+…+cn2-n • Can't be exact, e.g. 0.1, etc. • Mostly at least two floating-point types • sometimes more • Usually reflects exactly the hardware • but not always • IEEE Standard
IEEE Floating Point Format • Single precision • Double precision • NaN: not a number Exponent Exponent Fraction Fraction Sign Sign 52 bits 23 bits 8 bits 11 bits 1 bit 1 bit
Decimal • Base 10 • Business applications (usually money) • Stored as fixed number of decimal digits • Encoded, not as floating point. • Advantage • Accuracy • No rounding errors, no exponent • Binary representation can’t do this • Disadvantages • Limited range • Needs more memory • Example: BCD (binary coded decimal) • Uses 4 bits per decimal digit • as much space as hexadecimal)
Character • Stored as numeric codes • ASCII • weird order • EBCDIC • Unicode • mostly 16 bit, not for ASCII • Arabic, Thai, Korean, Katakana, Hiragana, Chinese (some) • Often compatible with integer • the same operations
Boolean • true / false • Could be implemented as bits • Pascal: packed array of boolean • Often as bytes or words • Java: 32 bits word • Operations: • not (unary), and, or, =, ≠, • xor (often), >, <, ≥, ≤, (false < true) • Sometimes compatible with integer • C, C++ • not in Java, Pascal • Advantage • readability
String Types • Sequences of characters • Design issues: • A primitivetypeor just a special array? • Is the length of objects staticordynamic? • Operations: • Assignment • Comparison (=, >, etc.) • Concatenation (+) • Substring reference • Pattern matching • Regular expressions
String in PLs • Pascal • packed array of char – not primitive type; • Only assignment and comparison • Ada, FORTRAN 90, and BASIC • Somewhat primitive • Assignment, comparison, concatenation, substring S := S1 & S2 (concatenation) S(2..4) (substring reference) • FORTRAN has an intrinsic for pattern matching e.g. (Ada) • C and C++ • Not primitive • Use char arrays • Library of functions provide operations
String in PLs (cont.) • Java • Somewhat primitive: String class (not arrays of char) • Concatenation with + • Objects cannot be changed (immutable) • StringBuffer is a class for changeable string objects • Rich libraries provide operations, regular expressions • Perl and JavaScript • Patterns are defined in terms of regular expressions • Very powerful, e.g., /[A-Za-z][A-Za-z\d]+/ • SNOBOL4 (string manipulation language) • Primitive string type • Many operations, including elaborate pattern matching predefined
String Length: Static vs. Dynamic • Static • Implementation: compile-time descriptor • FORTRAN 77, Ada, COBOL, Pascal, Java • e.g. (FORTRAN 90) • CHARACTER (LEN = 15) NAME; • Limited Dynamic Length • Implementation: may need a run-time descriptor • not in C and C++ • a null character indicates actual length • Dynamic • Implementation: needs run-time descriptor • allocation/deallocation is the problem • SNOBOL4, Perl, JavaScript, Common Lisp
Character String Types Compile-time descriptor for static strings Run-time descriptor for limited dynamic strings
String Type Evaluation • Advantages • Writability • Readability • Static length primitive type • Inexpensive to provide, why not? • Dynamic length • Flexibility vs. cost to provide
User-Defined Ordinal Types • Ordinal type • The range of values is mapped onto the set of positive integers • Enumeration Types • User enumeratesall possible values as symbolic constants • Represented with ordinal numbers • Design Issues • Can symbolic constants be in more than one type definition? • hair = (red, brown, blonde); • cat = (brown, striped, black); • Can they be read/written as symbols? • Allowed as array indices? • Are subranges supported?
User-Defined Ordinal Type in PLs • Pascal • can't reuse values, no input or output of values • can be used as array indices and case selectors • can be compared • C and C++ • like Pascal; can be input and output as integers • Ada • constants can be reused (overloaded literals) • used as in Pascal; can be input and output
User-Defined Ordinal Type in Java • enum types • Rigorous type checking • Since JDK 1.5 (Java 5) • Not only constants • As classes, can include • Fields (incl. static final, i.e. constants) • Methods • Constructors (with parameters) • Subclass of Object, • i.e. toString() is supported (and can be overwritten!) • Sets of constants are supported! • Compiled into its own *.class file • The Rolls Royce of enumeration types!
Evaluation of User-Defined Ordinal Types • Readability • e.g., no need to code a color as a number • Reliability • compiler can check • e.g., for ranges of values • suppose you have 7 colors coded as integers (1..7), then • 9 is a legal color because it is legal integer! • colors can be added ?! • Operations specified • e.g., colors can’t be added • Support can be extensive, e.g. Java
Subrange Types • An ordered contiguous subsequence of an ordinal type • Design Issue: How can they be used? • Pascal • Behave as their parent types, can be used as • Indices in for-loops • Array indices • e.g., type pos = 0..MAXINT; • Ada • Not new types, just constrained existing types • They are compatible • Can be used as in Pascal • Can be used as case constants • e.g., subtype POS_TYPE is INTEGER range 0..INTEGER'LAST;
Subrange Types: Implementation & Evaluation • Implemented • As parent types • With code to restrict assignments to variables • Code inserted by the compiler • Advantages • Readability • Reliability • restricted ranges improve error detection
Arrays • An ordered collection of homogeneousdata elements • An element is identified by its index • Position relative to the first element • Design issues • What types are legalfor subscripts? • Is the range checked on subscript expressions in references? • When does binding of subscript ranges happen? • When does allocation take place? • What is the maximumnumber of subscripts? • Can array objects be initialized? • Are any kind of slicesallowed?
Array Indexing • Mapping from indices to elements • map (array_name, index_value_list) → elements • Index Syntax • FORTRAN, PL/I, Ada: use parentheses • Most other languages use brackets • Subscript types • FORTRAN, C: integer only • Pascal: any ordinal type (integer, boolean, char, enum) • Ada: integer or enum (includes boolean and char) • Java: integer types only
Static and Fixed Stack Dynamic Arrays Based on subscript binding and binding to storage • Static • Static range of subscripts and storage bindings • e.g. FORTRAN 77, some arrays in Ada • Advantage: • time efficiency (no allocation or deallocation) • Fixed stack dynamic • Static range of subscripts, but • Storage is bound at elaboration time • e.g. Most Java locals, and C non-static locals • Advantage • space efficiency
Dynamic Arrays • Stack-dynamic • Subscript range and storage bindings are dynamic • But fixed for the variable’s lifetime • Advantage • flexibility - size need not be known until the array is used • Heap-dynamic • Subscript range and storage bindings are dynamic • Not fixed • e.g. declare in FORTRAN 90 a dynamic 2-dim array MAT INTEGER, ALLOCATABLE, ARRAY (:,:) :: MAT • APL, Perl, JavaScript: arrays grow, shrink as needed • Java: arrays are heap-dynamic objects
Array Dimensions and Initialization • Number of subscripts / dimension • FORTRAN I allowed up to three • FORTRAN 77 allows up to seven • Other PLs: no limit • Array Initialization • A list of values in the order of increasing subscripts • FORTRAN: DATA statement, or in / ... / • C, C++, Java: values in braces int stuff [] = {2, 4, 6, 8}; • Ada positions for the values can be specified SCORE : array (1..14, 1..2) := (1 => (24, 10), 2 => (10, 7), 3 => (12, 30), others => (0, 0)); • Pascal: array initialization not allowed
Built-in Array Operations • APL • many (text p. 240-241) • Ada • Assignment • to an aggregate constant • to an array name • Concatenation for single-dimensioned arrays • Relationaloperators(= and != only) • FORTRAN 90 • A wide variety of intrinsicoperations, e.g. • matrix multiplication • vectordot product
Array Slices • A slice is some substructureof an array • only a referencing mechanism • Only useful in PLs with array operations • FORTRAN 90: INTEGER MAT (1:4, 1:4) MAT(1:4, 1) - the first column MAT(2, 1:4) - the second row 2. Ada: single-dimensioned arrays only LIST(4..10)
Implementation of Arrays • Access function • Maps subscript expressions to an address in memory • Row major order • By rows • Suppose array with n*m elements with subscripts starting at 0 • i.e. indices (0,0) (0,1) (0,2) … (m-1, n-2) (m-1,n-1) • then A(i,j) = A(0,0) + (i * n + j) * element_size • Column major order • By columns • Suppose array with n*m elements with subscripts starting at 0 • i.e. indices (0,0) (1,0) (2,0) … (m-2, n-1) (m-1,n-1) • then A(i,j) = A(0,0) + (j * m + i) * element_size
Memory Access Function Row major:Ai,j = A0,0+(i *n+j)*el_size Column major:Ai,j = A0,0+(j*m+i)*el_size i*n j i j*m
Associative Arrays • Associative array • An unordered collection of data elements • Indexed by keys • Implemented using hash tables • Design Issues • What is the formof references to elements? • Is the size staticordynamic?
Associative Arrays in Perl • Structure and Operations in Perl • Names begin with % • e.g., declaration with initialization %hi_temps = ("Monday"=>77, "Tuesday"=>79,…); • Subscripting uses braces and keys • e.g., access using key $hi_temps{"Wednesday"} = 83; • Elements can be removed with delete • e.g., delete using key delete $hi_temps{"Tuesday"};