Exploring Data Types & Roots: Primitive, Numeric, Character, & Set Types

Chapter 6 Data Types Introduction: • This chapter first introduces the concept of data type and the characteristics of the common primitive data types. • Then the design of enumeration and subrange types are discussed. • Next ,structured data types ,specifically : • Arrays ,records ,and unions.

Chapter 6 Data Types Introduction: • Set types are then discussed, followed by an in-depth look at pointers. • The design issues are stated and design choices made by the designer(s) of the important languages are explained. • Implementation methods for data types often have significant impact on their design.

Chapter 6 Data Types • Data types roots is 50 years old. • FORTRAN I (1956) - INTEGER, REAL, arrays … • COBOL allow programmers to specify the accuracy of decimal data values. • PL/I extends this to integer and floating points • ALGOL provides few user-defined data types

User defined data types improve readability through the use of meaningful names for types. • User defined data types improve readability and modifiability. • Ada (1983) - User can create a unique type for every category of variables in the problem space and have the system enforce the types • Abstract data type was simulated in Ada83 • Abstract data type is the use of a type separated from the representation and set of operations on values of that type.

Def: A descriptor is the collection of the attributes of a variable. • In an implementation, a descriptor is a collection of memory cells that store variable attributes.

Primitive Data Types • Those not defined in terms of other data types • Numeric • Character string • User-defined ordinal • Arrays • Associative arrays • Record • Union • Set • Pointers

Numeric Types: • Integer • Floating Point • Decimal • Boolean Type • Character Type

Integer • Almost always an exact reflection of the hardware, so the mapping is trivial • There may be as many as eight different integer types in a language: integer, long integer, short integer, unsigned int, unsigned long int,... • A negative integer could be stored in sign-magnitude notation, in which the sign bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number.

Floating Point • Model real numbers, but only as approximations (like pi). • Languages for scientific use support at least two floating-point types; float and double, sometimes more • Usually exactly like the hardware, but not always; some languages allow accuracy specs in code e.g. (Ada) type SPEED is digits 7 range 0.0..1000.0; type VOLTAGE is delta 0.1 range -12.0..24.0;

1 8 bits 23 bits Exponent Fraction Sign bit 1 11 bits 52 bits Exponent Fraction IEEE floating point format for single and double precision

Decimal • For business applications (money). Used in COBOL. • Store a fixed number of decimal digits (coded) using BCD system. • Advantage: accuracy • Disadvantages: limited range, wastes memory

Boolean: • Boolean type are perhaps the simplest of the all types. • Their range of values has only two elements, one for true and one for false. • Could be implemented as bits, but often as bytes • C++ has Boolean type, but it allows expressions to be used as if they were expressions. • Advantage: readability

Character type • Are stored in computers as numeric coding. The most common one is ASCII. • ASCII supports up to 128/256 characters. • A new code called Unicode has been developed • Unicode is 16-bit code. • Java uses Unicode. • Unicode includes the characters from most of the World’s natural languages. • Example: • Unicode includes the Cyrillic alphabet, as used in Serbia, and Thai digits.

Character String Types • A character string type is one in which values are sequences of characters Design issues: • Is it a primitive type or just a special kind of array? • Is the length of objects static or dynamic? Operations: • Assignment • Comparison (=, >, etc.) • Catenation • Substring reference for example Name1(2:4) • Pattern matching

If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language. This approach is done in C, C++, PASCAL, and ADA.

Examples: • Pascal • Not primitive; assignment and comparison only (of packed arrays) • Ada, FORTRAN 77, FORTRAN 90 and BASIC • treats strings as a primitive type. • Assignment, comparison, catenation, substring reference • e.g. (Ada) N := N1 & N2 (catenation) N(2..4) (substring reference)

Examples • C and C++ • Not primitive • Use char arrays and a library of functions that provide operations ( strcpy, strcat,…) • SNOBOL4 (a string manipulation language) • Primitive • Many operations, including elaborate pattern matching • e.g. (SNOBOL4) LETTER = ‘ABCDEFGHIGKLMNOPQRSTUVWXYZ’ ALI = BREAK(LETTER) SPAN ( LETTER).WORD TEXT WORD

Perl • Patterns are defined in terms of regular expressions • A very powerful facility! • e.g., /[A-Za-z][A-Za-z\d]+/ • Java • String class (not arrays of char)

String Length Options • There are several design choices regarding the length of string variable: • Static and specified in the declaration. • FORTRAN 90, Ada, COBOL e.g. (FORTRAN 90) CHARACTER (LEN = 15) NAME1,nmae2; • static variables are always full. If a shorter string is assigned to a string variable, the empty character are set to blank. • Limited Dynamic Length :is to allow strings to have varying length up to a declared and fixed maximum set by the variable definition. • Example: C and C++ • such string variables can store any number • of characters between Zero and the maximum.

String Length Options • Dynamic • is to allow strings to have varying length with no maximum ,as SNOBOL4, Perl and javaScript. • This option requires the overhead of dynamic storage allocation and deal location, but provides maximum flexibility.

Evaluation (of character string types): • Aid to writability • As a primitive type with static length, they are inexpensive to provide--why not have them? • Dynamic length is nice, but is it worth the expense?

Implementation: • Character string types are sometimes support directly in hardware, but in most cases software is used to implement string storage, retrieval, and Manipulation. • When character string types are represented as Character arrays, the length often supplies few operations. • A descriptor for a static character string type, Which is only required during compilation, has Three fields.

Static string Length Address ( Name) ( First character) Compile time descriptor for static strings

Limited dynamic length • Limited dynamic length - may need a run-time descriptor for length (but not in C and C++). • In C and C++, the end of a string is marked with the null character. • They do not need the maximum length because index values in array references are not range-checked in C and C++.

Limited dynamic string Maximum Length Current Length Address Run time descriptor for limited dynamic strings

Dynamic length • Dynamic length • Requires more complex storage management. • The length of a string, and therefore the storage to which it is bound , must grow and shrink dynamically. • There are two possible approaches to the dynamic allocation problem:

Linked list - Requires more storage - Allocation and deallocation process are simple Solution to dynamic allocation problem Adjacent storage cells - Faster - Requires less storage - The allocation process is slower

Ordinal Types (user defined) • An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers • Enumeration Types - one in which the user enumerates all of the possible values, which are symbolic constants • example: type days is ( Mon, Tue, We, Thu, Fri, Sat, Sun) • Design Issue: Should a symbolic constant be allowed to be in more than one type definition?

Examples: • Pascal - cannot reuse constants; they can be used for array subscripts, for variables, case selectors; NO input or output; can be compared type colortype = { red, blue, green, yellow} var ahmed : colortype; …… ahmed :=blue; if ahmed > red…

Examples: • Ada - constants can be reused (overloaded literals); disambiguate with context or type_name ‘ (one of them); can be used as in Pascal; CAN be input and output type letter is ( ‘A’,’B’,’C’,’D’,……’z’) type vowel is (‘A’,’B’,C’,’D’,……’Z’); for letter in ‘A’..’N’ LOOP <-- ambiguous Now we remove the ambiguity for letter in vowel’(‘a’).. Vowels’(‘u’) loop

Examples: C and C++ - like Pascal, except they can be input and output as integers Java does not include an enumeration type

Evaluation (of enumeration types): a. Aid to readability--e.g. no need to code a color as a number b. Aid to reliability--e.g. compiler can check operations and ranges of values

2. Subrange Type - an ordered contiguous subsequence of an ordinal type Design Issue: How can they be used? Examples: Pascal - Subrange types behave as their parent types; can be used as for variables and array indices e.g. type pos = 0 .. MAXINT;

Ada - Subtypes are not new types, just constrained existing types (so they are compatible); can be used as in Pascal, plus case constants e.g. subtype POS_TYPE is INTEGER range 0 ..INTEGER'LAST;

Confusing point Type ahmed is new INTEGER range 1..100; - inherit the value range and operations of integer, but are not compatible. subtype ALI is INTEGER range 1..100; inherit the value range and operations of integer, and compatible.( subtype is not a new type ).

Evaluation of enumeration types: - Aid to readability - Reliability - restricted ranges add error detection

- Enumeration types are implemented as integers - Subrange types are the parent types with code inserted (by the compiler) to restrict assignments to subrange variables

Arrays An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.

Design Issues: 1. What types are legal for subscripts? 2. Are subscripting expressions in element references range checked? 3. When are subscript ranges bound? 4. When does allocation take place? 5. What is the maximum number of subscripts? 6. Can array objects be initialized? 7. Are any kind of slices allowed?

Indexing is a mapping from indices to elements map(array_name, index_value_list)  an element Syntax - FORTRAN, PL/I, Ada use parentheses - Most others use brackets

In Fortran we can have: SUM = SUM + B(I) B(I) may be used to refer to array or to a subprogram call. This is frustrating to the reader In Ada, the compiler has access to information about previously compiled programs

Subscript Types: FORTRAN, C - int only Pascal - any ordinal type (int, boolean, char, enum) Ada - int or enum (includes boolean and char) Java - integer types only

Four Categories of Arrays(based on subscript binding and binding to storage) 1. Static – is one in which the subscript ranges are statically bound and storage allocation is static( done before run time ). range of subscripts and storage bindings are static . e.g. FORTRAN 77, some arrays in Ada Advantage: execution efficiency (no allocation or deallocation) 2. Fixed stack dynamic – is one in which the subscript ranges are statically bound, but the allocation is done at declartion elaboration time.( during execution ) e.g. Pascal locals and, C locals that are not static Advantage: space efficiency

3. Stack-dynamic – is one in which the subscript ranges are dynamically bound and the storage allocation is dynamic.( done during run time ). e.g. Ada declare blocks GET(N); declare STUFF : array (1..N) of FLOAT; begin ... end; Advantage: flexibility - size need not be known until the array is about to be used

4. Heap-dynamic - subscript range and storage bindings are dynamic and not fixed e.g. (FORTRAN 90) INTEGER, ALLOCATABLE, ARRAY (:,:) :: MAT (Declares MAT to be a dynamic 2-dim array) ALLOCATE (MAT (10, NUMBER_OF_COLS)) (Allocates MAT to have 10 rows and NUMBER_OF_COLS columns) DEALLOCATE MAT (Deallocates MAT’s storage) - In APL & Perl, arrays grow and shrink as needed - In Java, all arrays are objects (heap-dynamic)

C and C++ provide dynamic arrays through malloc and free in C and new and delete in C++

Number of subscripts - FORTRAN I allowed up to three - FORTRAN 77 allows up to seven - C, C++, and Java allow just one, but elements can be arrays. - Others - no limit

Array Initialization - Usually just a list of values that are put in the array in the order in which the array elements are stored in memory

Examples: 1. FORTRAN - uses the DATA statement, or put the values in / ... / on the declaration 2. C and C++ - put the values in braces; can let the compiler count them e.g. int stuff [] = {2, 4, 6, 8}; 3. Ada - positions for the values can be specified e.g. SCORE : array (1..14, 1..2) := (1 => (24, 10), 2 => (10, 7), 3 =>(12, 30), others => (0, 0));

Array Initialization (continued) 4. Pascal and Modula-2 do not allow array initialization in the declaration Sections of programs.

Exploring Data Types & Roots: Primitive, Numeric, Character, & Set Types

Exploring Data Types & Roots: Primitive, Numeric, Character, & Set Types

Presentation Transcript

Chapter 4 Fundamental Data Types

Chapter 6: Data types

Chapter 5: Elementary Data Types

Chapter 6 Data Types

Chapter 6 Data Types

Chapter 6 Types of Learning

Chapter 2 – Data Types

Chapter 2 - Data Types

Chapter 6 Data Types

Chapter 7: Data Types

Chapter Four Data Types

Chapter 7:: Data Types

Chapter 5 – Data Types

Chapter 6 ]Types of Arguments

Data Types Chapter 6

Chapter 6: Data Types

Data Types Chapter 6

Data Types Chapter 5

Chapter 6 Fundamental Types

Chapter 2 – Data Types