Exploring Types in Programming Languages

Learn about different types in programming languages, including scalar, aggregate, derived, and more. Understand type safety of pointers and memory management concepts.

richardryan

Presentation Transcript


  1. Topic 4 - Types: Checking and Safety • Dr. William A. Maniatty, Assistant Prof., Dept. of Computer Science, University At Albany • CSI 511 Programming Languages and Systems Concepts • Fall 2002, Monday Wednesday 2:30-3:50, LI 99

  2. Introduction to Types • Types describe: • The data's range of values • The kinds of operations permitted on the data • Types provide a context for operations • So the meaning of a + b depends on • The type of a • The type of b • Impacts code generation and memory layout.
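
To make the point about a + b concrete, here is a minimal C sketch. The function names are hypothetical and chosen only for illustration. The same + operator performs an integer add, a floating-point add after converting one operand, or a scaled address computation, depending on the operand types.

```c
#include <stddef.h>

/* int + int: a plain integer addition. */
int add_ints(int a, int b) { return a + b; }

/* double + int: b is converted to double, then a floating-point add is done. */
double add_mixed(double a, int b) { return a + b; }

/* pointer + int: the address advances by b * sizeof(int) bytes.
   The caller must keep a + b inside (or one past) the same array. */
ptrdiff_t byte_step(const int *a, int b) {
    return (const char *)(a + b) - (const char *)a;
}
```

Calling byte_step on an int array with b = 3 yields 3 * sizeof(int), which is how the operand type also affects code generation and memory layout.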

  3. Type Systems • A Type system consists of: • A mechanism for associating types with language constructs • A set of rules/tools using type information • Type Inference - Figure out what type something is • Type Equivalence - Are two types the same? • Type Compatibility - Can we use the types interchangeably? • Good Languages have Orthogonal Typing

  4. Scalar Data: Base Types • Scalar Types - Hold one value at a time • Base Types - Language Supported • Boolean • Real • Fixed Point • Floating Point (Single or Double Precision) • Integer

  5. Some Notation • Types can be categorized as • Scalar or Aggregate (Nonscalar) • Base Type or a Derived Type (Not a Base Type) • Scott uses composite type to refer to types that are either: • Derived or • Aggregate

  6. Scalar Data: Derived Types • Derived Scalar Types - Hold one value at a time, are user defined • Subrange Types - as per Pascal • Pointers • Enumerated Types - Like C/C++ enum • Bitfields - Found in C/C++ • Language designers pick which of these features to support

  7. Pointers and Dynamic Memory Allocation • Pointers are often used for dynamically allocated memory • When you use new in C++ or Pascal • Or malloc in C • Deallocation can be explicit or implicit • Explicit - C free(), Pascal, Modula2, C++ • Implicit - Garbage collection based, LISP, Most scripting languages, APL, Java, Eiffel.

  8. Type Safety of Pointers • Dereferencing type-converted pointers is type unsafe (the layout changes, but the bit pattern/format stays the same). • Suppose in C we have int x; and then reference it using float y = *((float *) &x); • C and C++ can't control this problem, which places the burden on the programmer.
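
The difference between reinterpreting bits and converting a value can be sketched in C as follows. This is an illustration under assumptions: float is taken to be a 32-bit IEEE-754 value, the function names are hypothetical, and memcpy is swapped in for the cast on the slide because dereferencing the converted pointer directly has undefined behavior under C's aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the raw bits of a 32-bit integer as a float.
   No value conversion happens: the bit pattern stays the same. */
float bits_as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* copies the bytes unchanged */
    return f;
}

/* Ordinary conversion: the value is preserved, the bit pattern changes. */
float value_as_float(uint32_t n) { return (float)n; }
```

Under the stated assumption, the integer 1 converts to 1.0f, while its bit pattern read as a float is a tiny denormal number, which is exactly the kind of surprise the slide warns about.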

  9. Run Time Pointer Issues • Uninitialized or misdirected pointers are a common source of woe. • Pointers may alias a memory location. • Dangling references point to deallocated memory. • Garbage is memory that is allocated but unreferenced. • Garbage collection is implicit deallocation.

  10. Preventing Dangling References • Tombstones use an additional (hidden) level of indirection.
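
A tombstone scheme might look roughly like the C sketch below. The type and function names (Tombstone, ts_alloc, ts_free, ts_deref) are hypothetical, and the sketch never reclaims the tombstone itself, which is one of the known costs of the approach.

```c
#include <stdlib.h>

/* The hidden cell that sits between program pointers and the object. */
typedef struct { void *target; } Tombstone;

/* Allocate an object plus its tombstone. The program holds Tombstone
   pointers only, never the raw object address. */
Tombstone *ts_alloc(size_t size) {
    Tombstone *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->target = malloc(size);
    if (!t->target) { free(t); return NULL; }
    return t;
}

/* Free the object but keep the tombstone, so every alias can see
   that the object is gone. */
void ts_free(Tombstone *t) {
    free(t->target);
    t->target = NULL;
}

/* Checked dereference: yields NULL for a dangling reference. */
void *ts_deref(const Tombstone *t) { return t->target; }
```

Because all aliases share one tombstone, freeing through any of them invalidates the rest, at the price of an extra memory access on every dereference.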

  11. Locks and Keys • Locks and keys use metadata stored at both the pointer and the referenced object.
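
One way to picture locks and keys is the C sketch below. All names are hypothetical, and the sketch simplifies in one important way that is an assumption of this example: releasing only invalidates the lock and leaves the storage allocated, so that the validity check never reads freed memory.

```c
#include <stdlib.h>

/* Heap object with its "lock". A lock of 0 means "released". */
typedef struct { unsigned lock; int payload; } Cell;

/* A pointer together with its "key"; valid only while key == lock. */
typedef struct { Cell *addr; unsigned key; } KeyedPtr;

static unsigned next_lock = 1;   /* hypothetical source of fresh lock values */

KeyedPtr lk_alloc(int value) {
    KeyedPtr p = { malloc(sizeof(Cell)), 0 };
    if (p.addr) {
        p.addr->lock = p.key = next_lock++;
        p.addr->payload = value;
    }
    return p;
}

/* Invalidate the lock; every copy of the pointer now holds a stale key. */
void lk_release(KeyedPtr p) { if (p.addr) p.addr->lock = 0; }

/* A dereference would be allowed only when this returns nonzero. */
int lk_valid(KeyedPtr p) { return p.addr && p.addr->lock == p.key; }
```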

  12. Garbage Collection and Aliasing • Aliasing makes garbage collection hard. • If a referenced site is mistakenly labeled as garbage, there may be dangling references. • Garbage collection requires special run time support. • All methods require additional metadata either stored with the pointer or stored in the dynamically allocated memory.

  13. Mark and Sweep Garbage Collection • Dynamically allocated objects have metadata to indicate if an object is garbage. • Initially mark each allocated object as garbage • For each pointer the program can still reach • Mark the data object it points to as not garbage • Deallocate objects still marked as garbage.
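
The steps above can be sketched in C over a small fixed pool. This is a simplified illustration, not a production collector: the pool size, the single outgoing pointer per object, and all names are assumptions made for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

typedef struct Obj {
    bool in_use;       /* currently allocated? */
    bool marked;       /* metadata: true means "not garbage" */
    struct Obj *ref;   /* at most one outgoing pointer, for brevity */
} Obj;

static Obj pool[POOL_SIZE];

/* Allocate an object from the pool that points at ref (may be NULL). */
Obj *gc_new(Obj *ref) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i] = (Obj){ .in_use = true, .marked = false, .ref = ref };
            return &pool[i];
        }
    }
    return NULL;   /* pool exhausted */
}

/* Mark phase for one root: follow pointers, stopping at marked objects. */
static void mark(Obj *o) {
    while (o && !o->marked) {
        o->marked = true;
        o = o->ref;
    }
}

/* Returns the number of objects reclaimed. */
size_t gc_collect(Obj **roots, size_t nroots) {
    size_t freed = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) pool[i].marked = false; /* assume garbage */
    for (size_t i = 0; i < nroots; i++) mark(roots[i]);            /* mark reachable */
    for (size_t i = 0; i < POOL_SIZE; i++) {                       /* sweep the rest */
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```

Since marking stops at objects that are already marked, this sketch terminates on circular structures as well.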

  14. Reference Counting • Dynamically allocated objects have metadata to indicate how many pointers are directed at it. • When a pointer is directed at an object, increment its reference count • When a pointer is deallocated or directed away from an object, decrement that object's reference count. • Deallocate object if reference count = 0.
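
A minimal reference-counting sketch in C follows. The names RcObj, rc_new, rc_retain, and rc_release are hypothetical, and the programmer calls retain and release by hand here, whereas a language runtime would insert them automatically.

```c
#include <stdlib.h>

typedef struct {
    int refcount;   /* how many pointers are directed at this object */
    int value;
} RcObj;

/* A new object starts with one reference: the pointer returned here. */
RcObj *rc_new(int value) {
    RcObj *o = malloc(sizeof *o);
    if (o) { o->refcount = 1; o->value = value; }
    return o;
}

/* A pointer is directed at the object: increment the count. */
RcObj *rc_retain(RcObj *o) {
    if (o) o->refcount++;
    return o;
}

/* A pointer goes away: decrement, and deallocate at zero.
   Returns 1 if the object was deallocated, 0 otherwise. */
int rc_release(RcObj *o) {
    if (o && --o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```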

  15. Circular References and Reference Counts • Circularly linked structures may not be properly reclaimed using reference counts • Since the count will not go to zero.
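
The problem can be reproduced with a two-node cycle. In the hedged C sketch below (hypothetical names, and node_link assumes the source node has no successor yet), each node keeps a counted reference to the other, so releasing both external pointers leaves both counts at 1 and neither node is ever freed.

```c
#include <stdlib.h>

typedef struct Node {
    int refcount;        /* number of pointers directed at this node */
    struct Node *next;   /* counted reference to another node */
} Node;

Node *node_new(void) {
    Node *n = malloc(sizeof *n);
    if (n) { n->refcount = 1; n->next = NULL; }   /* 1 = the caller's pointer */
    return n;
}

/* Point from->next at to. Assumes from->next was NULL. */
void node_link(Node *from, Node *to) {
    to->refcount++;
    from->next = to;
}

/* Drop one reference; at zero, release the successor and free the node. */
void node_release(Node *n) {
    if (n && --n->refcount == 0) {
        node_release(n->next);
        free(n);
    }
}
```

After node_link(a, b) and node_link(b, a), calling node_release on both a and b drops each count from 2 to 1, and the cycle is unreachable yet never reclaimed.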

  16. Aggregate Types • Aggregate Types - Are nonscalar, each data object holds multiple values • Arrays - Hold lists of values • Records - Contain heterogeneous values • Variant Records - Have overlapping fields • They all have a common base address • Systems objects (e.g. File) • File - A sequence of (possibly heterogeneously typed) records.

  17. Data Layout in records • How does a compiler allocate space for fields? • Typically in the order in which they are declared (needed for systems programming) • Computer architectures often have errors or performance degradation if data objects are not on word boundaries (frequently 32 bits). • To avoid this, compiler writers attempt to align data with boundaries by "padding" the data • In Pascal, if padding is optional, data can be packed.
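
Padding can be observed directly with sizeof and offsetof. The struct below is a hypothetical example (not the record from the slides), and the exact numbers are implementation-defined, so the sketch only exposes them rather than assuming particular values.

```c
#include <stddef.h>

struct Sample {    /* hypothetical record */
    char tag;      /* 1 byte, typically followed by padding */
    int  count;    /* typically placed on a 4-byte boundary */
    char flag;     /* typically followed by trailing padding */
};

/* Total size of the record, including any padding the compiler added. */
size_t sample_size(void) { return sizeof(struct Sample); }

/* Byte offset of count from the start of the record. */
size_t sample_count_offset(void) { return offsetof(struct Sample, count); }

/* Bytes needed if the fields were packed with no padding at all. */
size_t sample_payload(void) { return sizeof(char) + sizeof(int) + sizeof(char); }
```

On many 32-bit and 64-bit platforms sample_size() is 12 while sample_payload() is 6; the difference is the padding the slide describes.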

  18. A Record Example • Consider the source code for the following C struct and equivalent Pascal record. What is the data layout?

  19. Data Layout of the Example Record • If we assume 32 bit word data alignment the data layout looks like:

  20. Variant Records/Unions • Variant Records have fields starting on the same base address. • Variant records are called unions in C/C++. • Some languages (Pascal, Modula 2, Ada) use a field to indicate which layout to use. • This Special field is called a discriminant. • Others like C rely on distinct field names.
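
C has no built-in discriminant, but programmers often add one by hand. The sketch below is a common idiom rather than anything the language enforces, and the Shape type and its fields are hypothetical: a union is wrapped in a struct together with a tag field that says which layout is in use.

```c
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } ShapeKind;

typedef struct {
    ShapeKind kind;                       /* the discriminant */
    union {                               /* both variants share one base address */
        struct { double radius; } circle;
        struct { double w, h; } rect;
    } u;
} Shape;

/* Pick the layout to read based on the discriminant. */
double shape_area(const Shape *s) {
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->u.circle.radius * s->u.circle.radius;
    case SHAPE_RECT:
        return s->u.rect.w * s->u.rect.h;
    }
    return 0.0;   /* unknown discriminant */
}
```

Nothing stops code from setting kind to SHAPE_RECT and then reading u.circle, so the check is only as reliable as the programmer's discipline.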

  21. Variant records and Type Safety • Variant Records introduce serious type safety concerns. • Data can be stored under one field and retrieved using another. • Compile time checking hard/infeasible, especially with separate compilation! • Run time checking too expensive. • When retrieved, the layout will differ, but the bit pattern (data format) is still the same!

  22. Variant record data layout • Consider the following Ada source code for a record containing a variant record as a field. • Can you give the data layout?

  23. Variant record data layout • The discriminant is allocated space.

  24. Arrays and Data Layout • Arrays are often packed • Records stored in arrays are often padded to 32 bit word boundaries by compilers. • Arrays can have one or more dimensions. • Arrays can be very large (leading to swap space use). • When traversing an array, data access is fastest using contiguous ascending addresses.

  25. Strings • Strings are arrays of characters. • A character array might be larger than needed, if the amount of text stored is not known at compile time. • How can a programmer know where the last significant character is in that case? • Store the index of the last character used (in Pascal you have to do this). • Use a sentinel character to signal the end of the string (C and C++ use NUL-terminated strings).
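
Both conventions can be shown side by side in C. The CountedStr type, the capacity CAP, and the function names are hypothetical; the counted form mimics the Pascal-style approach, while sentinel_len does what the standard strlen does.

```c
#include <stddef.h>
#include <string.h>

#define CAP 16   /* hypothetical fixed buffer capacity */

/* Counted string: the length is stored next to the buffer. */
typedef struct { size_t len; char data[CAP]; } CountedStr;

/* Build a counted string from C text, truncating to CAP characters. */
CountedStr counted_from(const char *s) {
    CountedStr c = { 0, {0} };
    size_t n = strlen(s);
    if (n > CAP) n = CAP;
    memcpy(c.data, s, n);
    c.len = n;
    return c;
}

/* Sentinel string: scan until the terminating '\0' is found. */
size_t sentinel_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}
```

The counted form finds the length in constant time but limits it to what the length field and buffer can hold; the sentinel form needs a linear scan and cannot store the sentinel character itself.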

  26. Multi-Dimensional Array Syntax • Consider a 2 dimensional array of floating point numbers. • In Pascal something like • var A: array[1..10] of array[1..10] of real; • In C/C++, something like: • double A[10][10]; • In Java, something like: • double[][] A = new double[10][10]; • In Fortran something like: • real, dimension(10,10) :: A

  27. Multi-Dimensional Array Layout • Many languages support simple forms of multi dimensional array layout, and support contiguous storage allocation. • Column Major - The indices that vary fastest are the leftmost (used by Fortran) • So A(2,3) is next to A(3,3) but not A(2,4) • Row Major - The indices that vary fastest are the rightmost (C, C++, Pascal). • So A(2,3) is next to A(2,4) but not A(3,3).
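
The two layouts differ only in the address formula. The C sketch below uses hypothetical function names and 0-based indices, so the slide's A(2,3) corresponds to (1,2) here; it computes the element offset of (i, j) in an nrows by ncols array under each scheme.

```c
#include <stddef.h>

/* Row major: the rightmost index varies fastest (C, C++, Pascal). */
size_t row_major(size_t i, size_t j, size_t nrows, size_t ncols) {
    (void)nrows;               /* not needed for this formula */
    return i * ncols + j;
}

/* Column major: the leftmost index varies fastest (Fortran). */
size_t col_major(size_t i, size_t j, size_t nrows, size_t ncols) {
    (void)ncols;               /* not needed for this formula */
    return j * nrows + i;
}
```

For a 10 by 10 array, stepping the first index moves 1 element in column-major order but 10 elements in row-major order, which is why the loop nesting order affects locality.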

  28. Multi-Dimensional Array Layout Issues • However other alternatives exist: • Allocating an array of pointers, where each pointer is indexed first and resolves to a separately allocated array.
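
The pointer-based alternative might be sketched in C as follows. The names rows_alloc and rows_free are hypothetical, and each row is allocated separately here, so rows are not guaranteed to be contiguous with one another.

```c
#include <stdlib.h>

/* Allocate nrows row arrays of ncols doubles each, zero-filled,
   plus the array of row pointers. Returns NULL on failure. */
double **rows_alloc(size_t nrows, size_t ncols) {
    double **a = malloc(nrows * sizeof *a);
    if (!a) return NULL;
    for (size_t i = 0; i < nrows; i++) {
        a[i] = calloc(ncols, sizeof **a);
        if (!a[i]) {                       /* undo the rows allocated so far */
            while (i > 0) free(a[--i]);
            free(a);
            return NULL;
        }
    }
    return a;
}

/* Release every row, then the array of row pointers. */
void rows_free(double **a, size_t nrows) {
    if (!a) return;
    for (size_t i = 0; i < nrows; i++) free(a[i]);
    free(a);
}
```

Element access still reads a[i][j], but it now costs two memory references instead of one multiply and add, and in exchange rows may have different lengths.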

  29. Support for Large Multidimensional Arrays • When large multi dimensional arrays are used (e.g. in linear algebra applications) • Fast address computations are important! • Locality of reference matters, fetching uncached (or swapped out) data is too slow. • Access patterns may be known by the programmer. • Programmers may commonly access certain parts of an array.

  30. Array Slices • Fortran 90 allows aliasing of a subset of array locations using slices (a notational convenience).

  31. Binding time of Types • Types can be determined • At Compile Time - Static Type Systems • How much checking depends on the language specification • At Run Time - Dynamic Type Systems • Used for interpreted/scripting languages • Type errors can be hard to detect • Often don't need to declare variables/identifiers

  32. Type Checking • Type Checking requires determining • Type Equivalence • Are two types identical? • Type Compatibility • Are the operands/result appropriate for the operation? • Type Inference • What type is a data object?

  33. Type Conversion • Type Information Specifies • Layout - Memory management • Format - What does the bit pattern mean? • e.g. a 32 bit integer and a 32 bit floating point number have different formats • Type Casting converts between types • Coercion - Converts layout and format • Nonconverting Type Casts - Convert layout only (e.g. using different fields in a C union).

  34. Type Equivalence Revisited • Equivalence - Checks whether 2 types are the same • Forms of type equivalence • Name Equivalence - feels stricter (à la Pascal) • Structural Equivalence - like C

  35. Name Equivalence • 2 types are name equivalent if • They have the same name • They have the same scope

  36. Structural Equivalence • 2 types are Structurally Equivalent if • Every type is structurally equivalent to itself • Structurally equivalent types have the same type constructors applied to structurally equivalent types. • Type renaming does not impact equivalence. • e.g. typedef int my_type; my_type is an int.

  37. Type Graphs • Types can be represented using a digraph • Types are nodes • Base types at the leaves • Type constructors form arcs • Below I omit field and type renaming

  38. Type Equivalence Revisited • Name Equivalence • Do array indices need to exactly match? • If so, how can general purpose functions be supported? • Conformant Arrays - Formal parameters pass the array index bounds (Pascal). • Structural Equivalence • 2 Types are structurally equivalent if they have equivalent type graphs
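
Comparing type graphs can be sketched as a recursive walk. The representation below (the Type struct, the Ctor enum, a limit of four component types) is a hypothetical simplification, and the sketch assumes acyclic graphs: recursive types would need a visited set to terminate.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_INT, T_REAL, T_POINTER, T_ARRAY, T_RECORD } Ctor;

typedef struct Type {
    Ctor ctor;                   /* base type or type constructor */
    size_t length;               /* element count, used by arrays only */
    size_t nkids;                /* number of component types (arcs) */
    const struct Type *kids[4];  /* pointee, element, or field types */
} Type;

/* Two types are structurally equivalent when they apply the same
   constructor to structurally equivalent component types. */
bool struct_equiv(const Type *a, const Type *b) {
    if (a == b) return true;     /* every type is equivalent to itself */
    if (a->ctor != b->ctor || a->nkids != b->nkids) return false;
    if (a->ctor == T_ARRAY && a->length != b->length) return false;
    for (size_t i = 0; i < a->nkids; i++) {
        if (!struct_equiv(a->kids[i], b->kids[i])) return false;
    }
    return true;
}
```

Whether array lengths should take part in the comparison is a language design choice; this sketch treats them as part of the type.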

  39. Introduction to ML • ML refers to "Meta Language" • Many variants exist, including OCaml • Objective Caml • ML is compiled (but one line at a time) • So it feels interactive/interpreted • Why Bother with ML Now? • ML has a sophisticated type inference system • Explore the functional language paradigm

  40. ML Type Inference Rules • All occurrences of a variable name within a scope have the same type. • Predicates of if ... then ... else if ... then ... constructs must be boolean (use context). • Function types are a -> b • Tuples can be used to aggregate parameters. • Function Parameters must match the type in the function definition.

  41. An ML Example • ML can infer unspecified types if sufficient information permits.

  42. How is ML's type inference computed? • ML uses unification • Unification is widely used in declarative languages (especially Prolog) • Given a sequence of clauses describing the system, try to resolve unknowns
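
To give a feel for unification, here is a small C sketch over type terms. It is an illustrative toy, not ML's actual implementation: the Term representation and all names are hypothetical, only int, bool, and function types are modeled, and a failed unification may leave some variables bound.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_VAR, K_INT, K_BOOL, K_FUN } Kind;

typedef struct Term {
    Kind kind;
    struct Term *binding;     /* K_VAR: bound term, or NULL if still unknown */
    struct Term *arg, *res;   /* K_FUN: argument and result types */
} Term;

/* Follow variable bindings until an unbound variable or a non-variable. */
Term *resolve(Term *t) {
    while (t->kind == K_VAR && t->binding) t = t->binding;
    return t;
}

/* Occurs check: does variable v appear inside term t? */
static bool occurs(Term *v, Term *t) {
    t = resolve(t);
    if (t == v) return true;
    return t->kind == K_FUN && (occurs(v, t->arg) || occurs(v, t->res));
}

/* Try to make a and b equal by binding unknowns. */
bool unify(Term *a, Term *b) {
    a = resolve(a);
    b = resolve(b);
    if (a == b) return true;
    if (a->kind == K_VAR) {
        if (occurs(a, b)) return false;   /* would build an infinite type */
        a->binding = b;
        return true;
    }
    if (b->kind == K_VAR) return unify(b, a);
    if (a->kind != b->kind) return false; /* constructor clash */
    if (a->kind == K_FUN) {
        return unify(a->arg, b->arg) && unify(a->res, b->res);
    }
    return true;                          /* same base type */
}
```

Unifying x -> bool with int -> y resolves both unknowns at once: x becomes int and y becomes bool.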
