
ICS 313: Programming Language Theory


Presentation Transcript


  1. ICS 313: Programming Language Theory Module 06: Data Types

  2. Objectives • To understand basic issues in the design and implementation of typical data types.

  3. Central Goal of Typed Data • To model the real-world problem space as closely and efficiently as possible. • Evolution: • Fortran-I: Numeric and Array; floating point modeled with ints • PL/I: everything for everyone • Algol: few basic types and user definitions • Simula, Java: Entities modeled with classes • Evolutionary progression: • Association of data with functions. • Abstractions for maintaining/assessing interdependencies automatically.

  4. Primitive and Structured Data Types • Primitive Data Types: • Data types not defined by other types. • Reflect hardware support directly or with minor software support. • Examples: integers, floats, strings, etc. • Structured Data Types: • Primitive types + “type constructors”

  5. Built In Primitives (These are the options: they aren’t all built in to all languages)

  6. Numeric Types: Integer • Only primitive data type found in early languages (except Lisp) • Integer: • Different sizes possible: 1-8 bytes • Arbitrarily large in Lisp • Representation: string of bits • leftmost can represent the sign • two’s complement better for computer math • Direct support in hardware.

  7. Numeric Types: Floating Point • Floating Point: • Approximations for real numbers. • Typically stored in binary (base 2) • means 0.1 cannot be represented exactly! • Two levels of accuracy • Real (typically four bytes, 1/8/23) • Double (typically eight bytes, 1/11/52) • Representation (IEEE Standard): • Sign: 1 bit • Exponent: 8 or 11 bits (range) • Fraction: 23 or 52 bits (precision, and range) • Try this in Python: x = 3.4, then x
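
The 1/11/52 layout of a double can be inspected directly. The following is a minimal sketch, not part of the original slides; the helper name `float_fields` is made up for illustration, and it assumes Python floats are 64-bit IEEE doubles (true on mainstream platforms).

```python
import struct

def float_fields(x):
    """Split a 64-bit IEEE double into (sign, exponent, fraction)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)    # 52 bits
    return sign, exponent, fraction

print(float_fields(1.0))    # (0, 1023, 0)
print(float_fields(0.1))    # nonzero fraction: 0.1 is only approximated
print(0.1 + 0.2 == 0.3)     # False, because each operand is an approximation
```

The nonzero fraction for 0.1 is the repeating binary expansion cut off at 52 bits, which is why sums of decimal fractions can miss their expected value.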

  8. Numeric Types: Decimal • Decimal types: • Store a fixed number of decimal digits with decimal point in fixed position. • Essential for business data processing, where decimal amounts must be exact. • Business machines (mainframes) have hardware support. • Others implement decimal using integers and software. • Representation: • 1-2 digits encoded into each byte. • Example: 9352.14 in three bytes: 1001 0011 0101 0010 . 0001 0100 (the position of the decimal point comes from the type, it is not stored)
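
The two-digits-per-byte encoding can be sketched in a few lines. This is an illustration only; `pack_bcd` is a hypothetical helper, and real decimal types also record a sign and the position of the decimal point, which this sketch ignores.

```python
def pack_bcd(digits):
    """Pack a string of decimal digits two per byte (packed BCD)."""
    if len(digits) % 2:
        digits = "0" + digits            # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

packed = pack_bcd("935214")              # the digits of 9352.14
print(" ".join(f"{b:08b}" for b in packed))
# 10010011 01010010 00010100
```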

  9. Numeric Types: Boolean • Simplest type: two values (true and false). • Requires only one bit to implement. • Typically implemented as a byte. • Included in most languages since Algol. • Some languages do not have a Boolean type, or let other values stand in for one: • In C (before C99 added _Bool; C++ has bool but keeps the same conversion): • 0 is false, all other numeric values true. • Lisp: • “nil” false, all others true. • Python extends these conventions to many primitives: • 0, '', (), [], {} are all false. • Scheme uses a mixture of Boolean and others: • #f false, #t and all others true
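
Python's convention is easy to check interactively. A small sketch, using only the values listed on the slide plus a few non-empty counterparts chosen for contrast:

```python
# Values the slide lists as false in Python.
falsy = [0, "", (), [], {}]
# Non-zero or non-empty counterparts (chosen here for contrast).
truthy = [1, "0", (0,), [0], {"k": 0}]

print([bool(v) for v in falsy])     # every entry is False
print([bool(v) for v in truthy])    # every entry is True
```

Note that the string "0" and the list [0] are true: emptiness decides the result, not the contents.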

  10. Characters and Strings • Character type: • Stored as numeric encodings • ASCII popular, but limited to 128 characters (7 bits, codes 0-127). • Unicode used in Java, ASCII superset, covers most natural language characters. • Character strings • Design issues • Are strings a primitive or structured data type (i.e. array of chars)? • Are strings fixed or variable length?

  11. String Operations • Required operations: • equality, concat, <, >, substring, etc. • Pascal, Ada: • strings are predefined as array of chars. • built-in string operations • Index slice (like immutable strings in Python: s[3:7]) • C, C++: • strings are implemented as an array of chars with special null character terminator. • library package provides string operations. • Scheme: Strings are primitive constants • Java: String immutable, StringBuffer mutable

  12. Pattern Matching • Built in (PERL, SNOBOL4, ICON) versus Library (Python, …) • Examples (PERL, Python): • What do these represent? • /[A-Za-z][A-Za-z\d]*/ • /\d+\.?\d*|\.\d+/
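
One way to explore the two patterns is to run them through Python's `re` library. The sketch below is an aid for experimenting, not the slide author's answer; the names `IDENTIFIER`, `NUMBER`, and `matches` are assumptions made for this example, and `fullmatch` is used so that the whole string must fit the pattern.

```python
import re

# The two patterns from the slide, written as Python raw strings.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z\d]*")
NUMBER = re.compile(r"\d+\.?\d*|\.\d+")

def matches(pattern, text):
    """True if the entire text fits the pattern."""
    return pattern.fullmatch(text) is not None

for text in ["x1", "1x", "3.14", "42", ".5", "5.", "."]:
    print(f"{text!r:7} first: {matches(IDENTIFIER, text)!s:5} "
          f"second: {matches(NUMBER, text)}")
```

The first pattern behaves like a typical identifier rule (a letter followed by letters or digits) and the second like an unsigned numeric literal that needs at least one digit.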

  13. String Length Choices • Static: • Length specified in declaration of string. • Example: Fortran. (Blank fill) • Limited dynamic: • Maximum length specified in declaration. • Example: C, C++ (null character ends) • Dynamic: • No length specified; shrinks/grows as needed. • Example: Common Lisp, Java

  14. String Implementation • Static descriptor (used only at compile time): Length, Address • Limited dynamic descriptor (run time, except in C, which uses null and doesn’t check): Max Len., Curr Len., Address • Dynamic descriptor (used at run time): CurLen, Address • In each case Address points to the stored characters, e.g. “The String”

  15. String in memory • How is this (“The String”) allocated? • Linked list: • Faster allocation/deallocation as size changes • Slower operations • More memory • Contiguous storage • Slower reallocation during growth • Faster operations

  16. User Defined Primitives

  17. Enumeration Types • All possible (symbolic) values are explicitly stated in the type declaration: • type WEEKEND = (Sat, Sun); • type DAY = (Mon, Tue, Wed, Thu, Fri, Sat, Sun); • Of what type is ‘Sat’? (overloaded literal) • Pascal, C don’t allow it • Ada allows it • Advantages of enumerated types over “numeric encoding” (i.e. Sat = 1, Sun = 2): • Provides greatly increased readability. • Prevents use of inappropriate operations or values • Implemented w/Integers, range checks
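
The readability and safety advantages can be seen with Python's `enum` library (a library feature, not a built-in primitive as in Pascal or Ada). A minimal sketch; the names `Day`, `WEEKEND`, and `is_weekend` are chosen for this example.

```python
from enum import Enum

class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7

WEEKEND = {Day.SAT, Day.SUN}

def is_weekend(day):
    return day in WEEKEND

print(is_weekend(Day.SAT))

# Inappropriate values and operations are rejected instead of silently
# producing a meaningless integer.
try:
    Day(8)
except ValueError as err:
    print("rejected value:", err)
try:
    Day.SAT + 1
except TypeError as err:
    print("rejected operation:", err)
```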

  18. Subrange Types • Subsequence of ordinal, e.g.: • Pascal: index = 1..100 • Python: for x in range(10) • Subtype: • Restricted range of type • Compatible with parent • Derived type: • Also restricted range, but not compatible • Good for readability and reliability • Implemented like parent with range checks
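
"Implemented like parent with range checks" can be imitated in a language without subrange types. The class below is a hypothetical sketch (the name `Subrange` and the default bounds 1..100 are assumptions echoing the Pascal example): the value is stored as an ordinary integer and checked whenever a new value is produced.

```python
class Subrange:
    """An integer restricted to low..high, checked at run time."""

    def __init__(self, value, low=1, high=100):
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}..{high}")
        self.value, self.low, self.high = value, low, high

    def __int__(self):
        return self.value

    def __add__(self, other):
        # Arithmetic is done on the parent type; the result is re-checked.
        return Subrange(self.value + int(other), self.low, self.high)

index = Subrange(40) + 2
print(int(index))

try:
    Subrange(99) + 5
except ValueError as err:
    print("range check failed:", err)
```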

  19. Structured Types Most of these are built in types, although in the case of records (structures) and pointers the programmer then uses them to define specialized types

  20. Arrays • A homogeneous aggregate of typed data elements with elements identified by position. • Issues: • Syntax: A(i), A[i] • Subscript types: allow any ordinal type? • Definition: A[DAY] • Use: A[Mon] • Range checked?

  21. Array Categories • Static arrays: • Subscript ranges (and data element types) are statically bound and storage allocation done at compile-time • FORTRAN up to 77 • Most time efficient, can waste memory • Fixed stack-dynamic arrays: • Subscript ranges/element types statically bound but allocation done at run time. • Supports re-use of large array spaces. • Pascal, C

  22. Array Categories (cont.) • Stack-dynamic arrays: • Subscript ranges bound and storage allocated at run-time, but constant for lifetime of variable. • Heap-dynamic arrays: • Subscript ranges bound and storage allocated at run-time and can change. • Allows greatest flexibility (array can grow or shrink.) • Java Vector • Least efficient.

  23. Array Operations • Operate on array as unit. • Some languages provide no array operations. • Examples of operations: • Assignment • Concatenation • Relational operations • Pair-wise +, -, *, / • Operations on Slices (FORTRAN90, Python) • APL is the most radical programming language for array processing • Array reversal, transposition, inversion

  24. Array Implementation • For 1-based, 1-dimensional array: • address(A[k]) = (address(A[1]) - element_size) + (k * element_size) • Issue: when is array element address computed? • Static arrays: • element_size and address(A[1]) are known at compile time, so (address(A[1]) - element_size) folds into a constant. • Run-time computation: • address(A[k]) = constant + (k * element_size) • Other array types require lookup of address(A[1]) at run-time.
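
The address formula for a 1-based array can be checked with plain integer arithmetic. In this sketch the function name and the sample base address 1000 are made up for illustration; `constant` is the part a compiler could precompute for a static array.

```python
def element_address(base, k, element_size):
    """Address of A[k] for a 1-based array whose first element is at base."""
    constant = base - element_size      # computable once, ahead of time
    return constant + k * element_size  # the only work left per access

# A hypothetical array of 4-byte elements starting at address 1000.
for k in (1, 2, 3):
    print(k, element_address(1000, k, 4))
```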

  25. Multidimensional Arrays • Map to linear memory: • Row-major storage (most languages): • lowest value of first subscript stored first • a b c d e f g h i • Column-major storage (FORTRAN): • lowest value of last subscript stored first • a d g b e h c f i • For 1-based, 2-dimensional array in row-major order: • address(A[i,j]) = address(A[1,1]) + (((i - 1) * n) + (j - 1)) * elementSize, where n is the number of columns • Why should a programmer care? • Large arrays may cross page boundaries in virtual memory • Access cells in the wrong order and you create a lot of swapping
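
Both storage orders can be reproduced for the 3 x 3 example on the slide. A sketch with 1-based subscripts to match the formula; the function names are chosen for this illustration, and offsets are counted in elements rather than bytes.

```python
def row_major_offset(i, j, n_cols):
    """Element offset of A[i,j] when rows are stored one after another."""
    return (i - 1) * n_cols + (j - 1)

def col_major_offset(i, j, n_rows):
    """Element offset of A[i,j] when columns are stored one after another."""
    return (j - 1) * n_rows + (i - 1)

def linearize(matrix, column_major=False):
    """Lay a 2-D list out in linear memory and return it as a string."""
    rows, cols = len(matrix), len(matrix[0])
    flat = [None] * (rows * cols)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if column_major:
                offset = col_major_offset(i, j, rows)
            else:
                offset = row_major_offset(i, j, cols)
            flat[offset] = matrix[i - 1][j - 1]
    return "".join(flat)

matrix = [["a", "b", "c"],
          ["d", "e", "f"],
          ["g", "h", "i"]]
print(linearize(matrix))                     # abcdefghi
print(linearize(matrix, column_major=True))  # adgbehcfi
```

Walking a row-major array column by column touches offsets that are n elements apart, which is the access pattern behind the paging concern.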

  26. Associative Arrays • Often implemented as hash tables • Index by key (part of data) rather than by position • Store both key and value (take more space) • Best when access is by data rather than index • Examples: • Lisp alist: • ((key1 . data1) (key2 . data2) (key3 . data3)) • Python Dictionary: • {key1 : data1, key2 : data2, key3 : data3} • Java: • java.util.Hashtable
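
The alist and dictionary examples store the same key/value pairs but look them up differently. A rough sketch: the list of tuples stands in for a Lisp alist, and `assoc` is a hypothetical helper named after the Lisp function it imitates.

```python
# A list of (key, value) pairs, standing in for a Lisp alist.
alist = [("key1", "data1"), ("key2", "data2"), ("key3", "data3")]

def assoc(key, pairs):
    """Linear search through the pairs, as an alist lookup would do."""
    for k, v in pairs:
        if k == key:
            return v
    return None

# The same data in a Python dictionary, which is hash-based.
table = dict(alist)

print(assoc("key2", alist))
print(table["key2"])
```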

  27. Sets • Useful to shorten booleans: • If x in set … • Implemented as primitive only in Pascal • Stored as bitstring in one word • Implementation dependent limit on size • Efficient intersection, union, equality • Some languages supply set operations applied to lists (Common Lisp, Prolog). • Java provides interface java.util.Set
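
The bitstring representation can be sketched with an ordinary integer: each possible member gets one bit position, and the set operations become single bitwise operations. The universe of day names and the helper names below are assumptions made for the example.

```python
UNIVERSE = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def to_bits(members):
    """Encode a collection of names as a bitstring (one bit per name)."""
    bits = 0
    for name in members:
        bits |= 1 << UNIVERSE.index(name)
    return bits

def from_bits(bits):
    """Decode a bitstring back into a list of names."""
    return [name for pos, name in enumerate(UNIVERSE) if bits & (1 << pos)]

weekend = to_bits(["Sat", "Sun"])
late_week = to_bits(["Fri", "Sat"])

print(from_bits(weekend & late_week))        # intersection: ['Sat']
print(from_bits(weekend | late_week))        # union: ['Fri', 'Sat', 'Sun']
print(bool(weekend & to_bits(["Sun"])))      # membership test: True
```

The size limit mentioned on the slide follows from this layout: the universe cannot have more members than the word has bits.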

  28. Record types • A heterogeneous aggregate of typed data elements with elements identified by name. • Operations: • assignment • equality • assign corresponding fields. • Implementation: • Simple and efficient, because field name references are literals bound at compile-time. • Use offsets to determine address.

  29. Record types • Examples: • COBOL Records: • NAME OF EMPLOYEE • MOVE CORRESPONDING EMPLOYEE TO REPORT • Pascal Records: • employee.name • with employee do … name := … • Ada also has records, uses dot notation • Common Lisp “Structures”: • (employee-name …) • C also has structures, uses dot notation • Java: use Classes instead

  30. Union types • Allow different types of values to be stored at different times during execution. • Often used in records (e.g., Pascal record variants) • Example: • Table of symbols and values • Each value may be int, real, or string. • Which would you prefer? • A record with fields symbol, int_value, real_value, string_value • A record with fields symbol and value, where value is sized for the largest variant (max) • Implementation: • Allocate for largest variant • Discriminated unions include tag field to indicate type
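
The role of the tag field in a discriminated union can be imitated in Python. This is only a sketch of the checking behavior: Python objects do not share storage the way a real union does, and the class name `SymbolEntry` and the tag strings are assumptions made for the symbol-table example.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class SymbolEntry:
    symbol: str
    tag: str                        # "int", "real", or "string"
    value: Union[int, float, str]   # one slot shared by all variants

    def __post_init__(self):
        # The tag makes a run-time check possible: the stored value must
        # have the type that the tag claims.
        expected = {"int": int, "real": float, "string": str}[self.tag]
        if type(self.value) is not expected:
            raise TypeError(f"{self.symbol}: tag {self.tag!r} does not "
                            f"match value {self.value!r}")

table = [SymbolEntry("count", "int", 3),
         SymbolEntry("ratio", "real", 0.5),
         SymbolEntry("label", "string", "total")]
print([entry.value for entry in table])

try:
    SymbolEntry("count", "int", "3")
except TypeError as err:
    print("rejected:", err)
```

Without the tag (a free union) nothing records which variant is current, so no such check can be made.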

  31. Union Type Evaluation • Advantages: • Union types provide storage efficiency. • Get around overly restrictive type system • Pointer arithmetic in language that does not support it directly (access pointer as if int) • Disadvantages: • Are more difficult to type check. • May require run-time type checking. • May lead to lack of any type checking. • Unnecessary in object-oriented languages like Java (why?) and functional languages (like ML) that support polymorphism and compile-time type checking.

  32. Pointer Types • A pointer variable’s value is a memory address or one distinguished value (nil). • Pointers provide two capabilities: • Support indirect addressing. • Enable dynamic memory management. • Note: heap dynamic variables have no name and must be referenced by pointer variables. • Diagram: foo_ptr holds the address FF03; location FF03 holds “The String”. • Will give example of binary trees in FORTRAN and pointers

  33. Fundamental Pointer Operations • Assignment: • Sets pointer variable to address of an object. • Direct addressing: assignment done implicitly during variable initialization. • Indirect addressing: requires an operator that takes an object and returns its address. (ptr = &object in C) • (Reference:) • Occurrence of pointer variable indicates its own address, just as with other variables (ptr). • Dereference: • Occurrence of pointer variable indicates address of object whose address is the value of the pointer variable. (*ptr in C)

  34. Pointer Examples • Let’s diagram this C: • int *ptr; • int i, j; • i = 3; • ptr = &i; • *ptr = 4; • j = *ptr; // compare to j = ptr • Pointer Arithmetic in C • double a[10]; • double *dptr; • int index = 3; • dptr = a; // assigns address(a[0]) • dptr = dptr + index; // advances by index elements (index * sizeof(double) bytes), so dptr now points to a[3] • Pointers to Records: • (*ptr).name is same as ptr->name in C • ptr^.name in Pascal

  35. Pointer Problems • Type checking: • If a pointer is allowed to point to more than one type of object, then static type checking is no longer possible (as in C, Lisp).

  36. Dealing with Type Checking • Solution: • Force all pointers to be typed (in terms of the object to which they are dereferenced) • Example: FORTRAN90 • Limits prime use of pointers: • Polymorphism (void * in C)

  37. Pointer Problems (cont.) • Dangling Pointers: • When a pointer points to an object, but the object has been deallocated. • Can occur when: • The object goes out of scope but the pointer does not. A contrived example … ptr1 = &ptr2 call foo(ptr1) in which *ptr1 = localObject after return, try *ptr2 • The object is explicitly deallocated ptr1 = new Object(); ptr2 = ptr1; destroy(ptr1); *ptr2 …

  38. Dealing with Dangling Pointers • Four strategies: • Disallow (in language) explicit deallocation. • Ignore (in compiler) explicit deallocation. • Then pointers will never point to nothing (but space will never be reclaimed). • Allow deallocation, reset other pointers. • Incurs run-time overhead. • Tombstones • Locks and keys • Allow deallocation and trust the programmer. • Efficient but allows dangling pointers.

  39. Pointer Problems (cont.) • Lost objects (garbage): • When all pointers to a dynamic variable are removed, so that the variable’s value can no longer be referenced but the space is still allocated. ptr1 = new Object(); … ptr1 = new Object(); • Common when beginners think that every declaration needs a value. • Results in “memory leaks” (memory fills up)

  40. Dealing with Lost Objects • The lost object problem can be solved if the language implements automatic storage management. (Java and Lisp) • Two approaches: • Reference counting (“eager” approach): • Object maintains a counter of how many pointers reference it, when counter is decremented to zero, the object is deallocated. • Reference counting incurs significant overhead on each pointer assignment, but the overhead is distributed throughout the session.
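
Reference counting can be simulated in a few lines. The sketch below is a toy model, not how any particular runtime does it: the `heap` set stands in for allocated storage, and `Counted` and `assign` are hypothetical names. It shows where the per-assignment overhead comes from.

```python
class Counted:
    """A heap object that tracks how many pointers reference it."""

    def __init__(self, name, heap):
        self.name = name
        self.count = 0
        self.heap = heap
        heap.add(name)              # "allocate" the object

def assign(variables, var, obj):
    """Point variable var at obj (or None), updating reference counts."""
    old = variables.get(var)
    if obj is not None:
        obj.count += 1              # count the new reference first
    if old is not None:
        old.count -= 1
        if old.count == 0:
            old.heap.discard(old.name)   # eager deallocation at zero
    variables[var] = obj

heap, variables = set(), {}
a = Counted("A", heap)
assign(variables, "ptr1", a)
assign(variables, "ptr2", a)
assign(variables, "ptr1", None)
print("A" in heap)    # True: ptr2 still references A
assign(variables, "ptr2", None)
print("A" in heap)    # False: the last reference is gone
```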

  41. Dealing with Lost Objects (cont.) • Garbage collection (“lazy” approach): • Wait until all storage is allocated, then collect the garbage • Mark and Sweep GC: • Mark all objects in heap as garbage. • Follow all pointers through heap and reset mark on all objects encountered. • Deallocate all remaining marked objects. • Problems with Mark and Sweep GC: • Causes the system to “halt” during GC. • Most time-consuming when you really need it. • “Ephemeral” GC overcomes these problems. • Runs before you need it • Generations according to object age (so only part of memory is searched)
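
The three mark-and-sweep steps can be sketched on a toy heap. Here the heap is modeled as a dictionary from object name to the names it points to, and `roots` are the names reachable from program variables; both are assumptions of this illustration rather than a real collector's data structures.

```python
def mark_and_sweep(heap, roots):
    """Collect unreachable objects; returns the set of names deallocated."""
    # Step 1: mark every object in the heap as garbage.
    garbage = set(heap)
    # Step 2: follow all pointers from the roots, resetting the mark on
    # every object encountered.
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in garbage:
            garbage.discard(name)
            stack.extend(heap[name])
    # Step 3: deallocate everything that is still marked.
    for name in garbage:
        del heap[name]
    return garbage

heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
collected = mark_and_sweep(heap, roots=["a"])
print(sorted(collected))    # ['c', 'd']
print(sorted(heap))         # ['a', 'b']
```

Objects c and d point at each other, so a pure reference-counting scheme would never see their counts reach zero; tracing from the roots reclaims them.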

  42. Pointer Commentary • “Their introduction into high-level languages has been a step backward from which we may never recover.” (C. A. R. Hoare, 1973). • “Pointers are thought by many to be essential in imperative languages.” (R. Sebesta, 1996) • “Java has no pointer data type.” (P. Johnson, 1999) • “… it remains to be seen …” (R. Sebesta, 2002) • Java References • Assignment to (heap dynamic) objects (class instances) • No dereferencing, so no dangling pointers • Runtime system manages memory, so no lost objects • No pointer arithmetic: meaningless

  43. End of module 06
