Topic 4 - Types: Checking and Safety • Dr. William A. Maniatty, Assistant Prof., Dept. of Computer Science, University at Albany • CSI 511 Programming Languages and Systems Concepts • Fall 2002, Monday/Wednesday 2:30-3:50, LI 99
Introduction to Types • Types describe: • The data's range of values • The kinds of operations permitted on the data • Types provide a context for operations • So the meaning of a + b depends on • The type of a • The type of b • Types also affect code generation and memory layout.
Type Systems • A type system consists of: • A mechanism for associating types with language constructs • A set of rules/tools that use type information • Type Inference - Figure out what type something is • Type Equivalence - Are two types the same? • Type Compatibility - Can a value of one type be used where another type is expected? • Good languages have orthogonal typing
Scalar Data: Base Types • Scalar Types - Hold one value at a time • Base Types - Supported directly by the language • Boolean • Real • Fixed Point • Floating Point (Single or Double Precision) • Integer
Some Notation • Types can be categorized as • Scalar or Aggregate (Nonscalar) • Base Type or a Derived Type (Not a Base Type) • Scott uses composite type to refer to types that are either: • Derived or • Aggregate
Scalar Data: Derived Types • Derived Scalar Types - Hold one value at a time, are user defined • Subrange Types - as in Pascal • Pointers • Enumerated Types - like C/C++ enum • Bitfields - found in C/C++ • Language designers pick which of these features to support
Pointers and Dynamic Memory Allocation • Pointers are often used for dynamically allocated memory • When you use new in C++ or Pascal • Or malloc in C • Deallocation can be explicit or implicit • Explicit - C free(), Pascal dispose, C++ delete, Modula-2 • Implicit - Garbage collection based: Lisp, most scripting languages, APL, Java, Eiffel.
Type Safety of Pointers • Dereferencing a pointer that has been cast to a different type is type unsafe (the layout changes, but the bit pattern/format stays the same). • Suppose in C we have int x; and then read it using float y = *((float *) &x); • The bits of x are reinterpreted as a float, not converted • C and C++ cannot prevent this, placing the burden on the programmer.
Run Time Pointer Issues • Uninitialized or misdirected pointers are a common source of woe. • Several pointers may alias (refer to) the same memory location. • Dangling references point to deallocated memory. • Garbage is memory that is allocated but unreferenced. • Garbage collection is implicit deallocation.
Preventing Dangling References • Tombstones use an additional (hidden) level of indirection.
Locks and Keys • Locks and keys use metadata stored at both the pointer and the referenced object.
Garbage Collection and Aliasing • Aliasing makes garbage collection hard. • If a still-referenced object is mistakenly labeled as garbage, there may be dangling references. • Garbage collection requires special run-time support. • All methods require additional metadata, stored either with the pointer or in the dynamically allocated memory.
Mark and Sweep Garbage Collection • Dynamically allocated objects have metadata to indicate whether an object is garbage. • Initially mark each allocated object as garbage • For each pointer reachable from the program's variables (including pointers inside objects already marked) • Mark the data object it points to as not garbage • Deallocate objects still marked as garbage.
Reference Counting • Each dynamically allocated object has metadata indicating how many pointers are directed at it. • When a pointer is directed at an object, increment its reference count • When a pointer is deallocated or directed away from an object, decrement that object's reference count. • Deallocate the object if its reference count = 0.
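Circular References and Reference Counts • Circularly linked structures may not be properly reclaimed using reference counts • Each object in the cycle is still pointed to by another member of the cycle, so its count never goes to zero, even when nothing outside the cycle refers to it.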
Aggregate Types • Aggregate Types - Are nonscalar; each data object holds multiple values • Arrays - hold lists of values • Records - contain heterogeneous values • Variant Records - have overlapping fields • The overlapping fields all share a common base address • Systems objects (e.g. File) • File - A sequence of (possibly heterogeneously typed) records.
Data Layout in Records • How does a compiler allocate space for fields? • Typically in the order in which they are declared (needed for systems programming) • Many computer architectures raise errors or run slower if data objects are not aligned on word boundaries (frequently 32 bits). • To avoid this, compiler writers align data with these boundaries by inserting unused "padding" bytes • In Pascal, padding is optional: a record declared packed may be stored without it.
A Record Example • Consider the source code for the following C struct and the equivalent Pascal record. What is the data layout?
Data Layout of the Example Record • If we assume 32 bit word data alignment the data layout looks like:
Variant Records/Unions • Variant Records have fields starting on the same base address. • Variant records are called unions in C/C++. • Some languages (Pascal, Modula 2, Ada) use a field to indicate which layout to use. • This Special field is called a discriminant. • Others like C rely on distinct field names.
Variant Records and Type Safety • Variant records introduce serious type safety concerns. • Data can be stored under one field and retrieved using another. • Compile time checking is hard or infeasible, especially with separate compilation! • Run time checking is often too expensive. • When retrieved through another field, the layout differs, but the bit pattern (data format) is still the same!
Variant record data layout • Consider the following Ada source code for a record containing a variant record as a field. • Can you give the data layout?
Variant record data layout • The discriminant is allocated space.
Arrays and Data Layout • Arrays are often packed • Records stored in arrays are often padded to 32 bit word boundaries by compilers. • Arrays can have one or more dimensions. • Arrays can be very large (leading to swap space use). • When traversing an array, data access is fastest using contiguous ascending addresses.
Strings • Strings are arrays of characters. • A character array might be larger than needed if the amount of text stored is not known at compile time. • How can a programmer know where the last significant character is in that case? • Store the index of the last character used (in Pascal you have to do this). • Use a sentinel character to signal the end of the string (C and C++ use null-terminated strings, ending in the character '\0').
Multi-Dimensional Array Syntax • Consider a 2 dimensional array of floating point numbers. • In Pascal something like • var A: array [1..10] of array [1..10] of real; • In C/C++ something like: • double A[10][10]; • In Java something like: • double[][] A = new double[10][10]; • In Fortran something like: • real, dimension(10,10) :: A
Multi-Dimensional Array Layout • Many languages store a multidimensional array in one contiguous block of storage, using one of two simple layouts: • Column Major - The leftmost index varies fastest (used by Fortran) • So A(2,3) is next to A(3,3) but not A(2,4) • Row Major - The rightmost index varies fastest (C, C++, Pascal). • So A(2,3) is next to A(2,4) but not A(3,3).
Multi-Dimensional Array Layout Issues • However, other alternatives exist: • Allocating an array of pointers, each pointing to a separately allocated row array.
Support for Large Multidimensional Arrays • When large multidimensional arrays are used (e.g. in linear algebra applications) • Fast address computations are important! • Locality of reference matters: fetching uncached (or swapped-out) data is slow. • Access patterns may be known by the programmer. • Programmers may commonly access certain parts of an array.
Array Slices • Fortran 90 allows aliasing of a subset of array locations using slices (a notational convenience).
Binding Time of Types • Types can be determined • At Compile Time - Static Type Systems • How much checking is done depends on the language specification • At Run Time - Dynamic Type Systems • Used for interpreted/scripting languages • Type errors surface only at run time, so they can be hard to detect • Often variables/identifiers need not be declared
Type Checking • Type checking requires determining • Type Equivalence • Are two types identical? • Type Compatibility • Are the operands/result appropriate for the operation? • Type Inference • What type is a data object?
Type Conversion • Type information specifies • Layout - Memory management • Format - What does the bit pattern mean? • e.g. 32-bit integers and floating-point numbers use different formats • Type casting converts between types • Coercion - Converts layout and format • Nonconverting Type Casts - Convert layout only (e.g. using different fields in a C union).
Type Equivalence Revisited • Equivalence - Checks whether two types are the same • Forms of type equivalence • Name Equivalence - feels stricter (as in Pascal) • Structural Equivalence - largely as in C
Name Equivalence • Two types are name equivalent if • They have the same name • They are declared in the same scope
Structural Equivalence • Two types are structurally equivalent if • Every type is structurally equivalent to itself • Structurally equivalent types have the same type constructors applied to structurally equivalent types. • Type renaming does not affect equivalence. • e.g. typedef int my_type; makes my_type just another name for int.
Type Graphs • Types can be represented using a digraph • Types are nodes • Base types at the leaves • Type constructors form arcs • Below I omit field and type renaming
Type Equivalence Revisited • Name Equivalence • Do array index ranges need to match exactly? • If so, how can general-purpose functions accept arrays of different sizes? • Conformant Arrays - Formal parameters pass the array index bounds (Pascal). • Structural Equivalence • Two types are structurally equivalent if they have equivalent type graphs
Introduction to ML • ML stands for "Meta Language" • Many variants exist, including OCaml • Objective Caml • ML is compiled (but one declaration at a time) • So it feels interactive/interpreted • Why Bother with ML Now? • ML has a sophisticated type inference system • Explore the functional language paradigm
ML Type Inference Rules • All occurrences of a variable name within a scope have the same type. • Conditions of if ... then ... else if ... then ... constructs must be boolean (inferred from context). • Function types have the form 'a -> 'b • Tuples can be used to aggregate parameters. • Function arguments must match the parameter types in the function definition.
An ML Example • ML can infer unspecified types if sufficient information permits.
How is ML's type inference computed? • ML uses unification • Unification is widely used in declarative languages (especially Prolog) • Given a sequence of clauses describing the system, try to resolve the unknowns