You can write, but can you type? Gary Marsden Semester 2 – 2001
Why bother • Really, there is only one data type – ones and zeros. Anything else is an artificial creation • However, artificial data types improve the correctness of programs • Some programmers, however, live by • “Strong types are for weak minds” • Definition: A data type is a set of values, together with a set of operations on those values.
Declarations • Type is given to a value through the declaration of a variable • By declaring a variable, we are associating an identifier and a type with a value in a memory location • Technically, this is known as binding • N.B. There are usually rules about acceptable identifier names (no punctuation or reserved words)
When is type allocated • There are many aspects to binding, the most important of which concerns when information is bound to a variable • There are two kinds of binding time • Static binding or Early binding, when information is determined at compilation • Dynamic binding or Late binding, when information is bound at run time • We will now look at the implications of these and the information which may be bound
Name & declaration binding • Variables are usually introduced in a program by a declaration • This will require us to look at • the connection (or bind) between type information and a declaration and also • the connection (or bind) between the use of a named variable and its declaration
Static typing – binding type information early • If type information is bound to a variable at compile time (static binding), this has a number of consequences • The compiler can check type consistency, ensuring all executables are type correct • No type errors – fewer crashes • No run time checking – faster code • The compiler can allocate storage (the size for each type is known) resulting in more efficient programs • Statically typed languages provide strong typing
Problems of static typing • Is usually very restrictive (Pascal) • loss of expressive power through premature commitment • Also no generic procedures such as ‘sort’ • Could flout type rules and force type conversions (coercion) • this is known as weak typing (like C) and leads to errors • Is there an alternative?
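As a rough illustration of the "flout the type rules" route, the sketch below uses the standard C library routine `qsort`, which is generic only because it works through untyped `void *` pointers. The names `compare_ints`, `sort_ints` and `truncate_to_int` are made up for this example.

```c
#include <stdlib.h>

/* Comparator for qsort: the void pointers carry no type information,
   so the casts below are unchecked promises made by the programmer. */
static int compare_ints(const void *a, const void *b)
{
    int lhs = *(const int *)a;
    int rhs = *(const int *)b;
    return (lhs > rhs) - (lhs < rhs);
}

/* Sort an int array in place using the generic library routine. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}

/* Implicit coercion: the fractional part is silently discarded. */
int truncate_to_int(double value)
{
    int result = value;
    return result;
}
```

If `compare_ints` were handed to `qsort` along with an array of `double`, the compiler would not object. That is the kind of error weak typing lets through.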
Dynamic typing – binding type information late • We can increase flexibility in our typing, and still provide strong typing, by not checking type information until run time • This is called ensuring type consistency • This makes prototyping solutions easier • It also allows us to create generic procedures (important in OO for polymorphism and overloading), but… • results in slower programs (run time checking)
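One way to picture run-time type checking is to give every value a tag and test it before each operation. The sketch below imitates that in C. The `Value` type and the functions `make_int`, `make_real` and `add_values` are hypothetical, and a real dynamically typed language does this work inside its run-time system rather than in user code.

```c
#include <stdbool.h>

/* Each value carries its own type tag, checked at run time. */
typedef enum { TYPE_INT, TYPE_REAL } TypeTag;

typedef struct {
    TypeTag tag;
    union {
        long   as_int;
        double as_real;
    } data;
} Value;

Value make_int(long n)    { Value v; v.tag = TYPE_INT;  v.data.as_int  = n; return v; }
Value make_real(double r) { Value v; v.tag = TYPE_REAL; v.data.as_real = r; return v; }

/* Adds two values only if their run-time tags agree.
   Returns false (a run-time type error) on a mismatch. */
bool add_values(Value a, Value b, Value *result)
{
    if (a.tag != b.tag) {
        return false;
    }
    if (a.tag == TYPE_INT) {
        *result = make_int(a.data.as_int + b.data.as_int);
    } else {
        *result = make_real(a.data.as_real + b.data.as_real);
    }
    return true;
}
```

The tag comparison in `add_values` is the run-time check that makes such programs slower than their statically typed equivalents.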
Dynamic weak typing • Some dynamically typed languages don’t report errors if there is a type mismatch at run time • Instead they automatically coerce values so that they match • Smalltalk will coerce to strings, so if there are a lot of mismatches, every variable will eventually end up as a string
Static vs. Dynamic typing • Static • inconsistencies found at compile time • executed programs are type consistent • increases execution efficiency • easy to read programs • Dynamic • Much more flexible – better prototyping
Scope • We now look at the second sort of information binding – that between a declaration and use of a variable • This introduces the idea of scope; i.e. the area of a program where a particular declaration is valid
Static scope • ‘main’ and ‘inn’ are both examples of blocks • Variables declared within a block are local to the block (local variables) • Variables declared in enclosing blocks are visible to inner blocks; variables in the inner block are not visible outside • Variable identifiers are bound to the most local declaration
    float x,y;
    void inn() { int y,z; y = 34; }
    void main() { x = 3.141; y = x; }
Static scope in practice • The program works because the ‘y’ in ‘main’ binds to the global (float) declaration, not the inner one • The assignment ‘y = 34’ in ‘inn’ does not cause a type violation, as it refers to the closest (integer) declaration of ‘y’
    float x,y;
    void inn() { int y,z; y = 34; }
    void main() { x = 3.141; y = x; }
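The slide's listing can be written as compilable C roughly as follows. The function `outer` is a stand-in for the slide's ‘main’, renamed here so the fragment can be called from a test program. That renaming is an assumption of this sketch, not part of the original.

```c
float x, y;            /* global declarations */

void inn(void)
{
    int y, z;          /* local y hides the global float y */
    y = 34;            /* binds to the local int y */
    z = y;             /* z is local too; neither is visible outside inn */
    (void)z;
}

void outer(void)       /* plays the role of 'main' on the slide */
{
    x = 3.141f;
    y = x;             /* binds to the global float y */
}
```

Calling `inn` leaves the global `y` untouched, because the assignment inside it refers only to the local integer.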
Dynamic scope • With dynamic scoping, lexical structure is ignored and a name is bound to the most recent declaration still active at run time • The call to ‘A’ made from ‘B’ therefore prints ‘2.0’; the later call from ‘main’ prints ‘1’ • With static scoping, both calls would have printed ‘1’ • Dynamic scoping makes analysing listings complex • The listing is C-like pseudocode; C itself is statically scoped
    int x;
    void A() { printf(x); }
    void B() { float x = 2.0; A(); }
    void main() { x = 1; B(); A(); }
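C is statically scoped, so dynamic scope can only be imitated. The sketch below does so with an explicit chain of bindings for the name ‘x’: entering a block pushes a binding and leaving it pops the binding again. The `Binding` type, `current_x` and `run_example` are invented for this illustration.

```c
#include <stddef.h>

/* One binding of the name 'x'; 'previous' points at the binding it hides. */
typedef struct Binding {
    double value;
    struct Binding *previous;
} Binding;

static Binding *current_x = NULL;   /* most recent live binding of 'x' */

/* A() looks up 'x' dynamically: whatever binding is most recent. */
double A(void)
{
    return current_x->value;
}

/* B() declares its own 'x' for the duration of the call. */
double B(void)
{
    Binding local = { 2.0, current_x };
    current_x = &local;              /* enter scope: push the new binding */
    double seen = A();
    current_x = local.previous;      /* leave scope: pop it again */
    return seen;
}

/* Mirrors 'main' on the slide: records what each call to A observes. */
void run_example(double *seen_via_B, double *seen_directly)
{
    Binding global = { 1.0, NULL };
    current_x = &global;
    *seen_via_B = B();
    *seen_directly = A();
    current_x = NULL;
}
```

The same function `A` sees a different ‘x’ depending on who called it, which is why listings become hard to analyse.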
Allocating types • Typing is usually explicit, requiring the programmer to allocate types to constants, variables, operators and functions • Improves readability • unless using FORTRAN, which uses the first letter of a variable name to denote type • Implicit typing can be performed by type inference systems
Summary • Languages which guarantee type consistency for all types are usually referred to as strongly typed • static => strong • strong > static (could be dynamic with type consistency check) • Languages are usually statically typed and scoped • Static typing usually has explicit declarations • Dynamic typing has explicit and implicit (more usual) • Static, explicit typing is most common • Dynamic scoping is complex and very uncommon
Declaration – reference bindings • This refers to the lifetime or extent of a variable • i.e. how long storage space is bound to a variable identifier • In most block structured languages storage is bound on block entry and reclaimed on block exit • previous values of local variables are therefore lost • hence ‘static’ variables (how should they be initialised?) • Dynamic structures, which are explicitly allocated, must be explicitly released or be garbage collected
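In C, the difference between an ordinary local and a ‘static’ local can be sketched as below. The function names are made up for the example. C's answer to the initialisation question is that a static local is initialised once, before the program starts running, and defaults to zero if no initialiser is given.

```c
/* An ordinary local: storage is bound on entry and released on exit,
   so the count starts from zero on every call. */
int count_with_auto(void)
{
    int calls = 0;
    calls++;
    return calls;
}

/* A static local: storage is bound once for the whole run and is
   initialised only once, so the value survives between calls. */
int count_with_static(void)
{
    static int calls = 0;
    calls++;
    return calls;
}
```

Successive calls to `count_with_auto` always return 1, whereas `count_with_static` returns 1, 2, 3 and so on.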
Reference – Value binding • In an assignment statement such as • x = x + 5; • The ‘x’ on the left refers to a location in which to put a value and the ‘x’ on the right refers to an actual value • We need to de-reference the ‘x’ on the right to get its value • Without assignment, names can be directly bound to values (not a reference) as in functional languages
Constants • Here, there is a direct binding of name to value (no reference) as the value cannot change • Some languages (e.g. ANSI C) require constants to be given a type • Other languages (K&R C) use macro pre-processors to lexically replace the constants before compilation
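The two styles of constant can be put side by side in C. This is a small sketch: the names `MAX_ITEMS_MACRO`, `max_items_typed` and `total_capacity` are invented for the example.

```c
/* Macro constant: the pre-processor substitutes the text 10 before
   compilation, so the compiler never sees the name or a type for it. */
#define MAX_ITEMS_MACRO 10

/* Typed constant: the compiler knows this is an int and rejects
   ordinary assignments to it. */
static const int max_items_typed = 10;

int total_capacity(void)
{
    return MAX_ITEMS_MACRO + max_items_typed;
}
```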
Basic types • Almost every language has basic predefined types which are used to construct other types • These come in two flavours • Simple: integer, character, real… • Enumerated: these are not predefined; they are used to create types with a particular set of values • type colours = (red,green,blue); (Pascal enumeration) • type digit = 0..9; (Pascal subrange) • Such declarations are usually restricted to ordinal types (which have a discrete range)
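For comparison, a C version of the ‘colours’ type might look like the sketch below. The helper functions `next_colour` and `is_digit_value` are hypothetical. C has no subrange types, so the 0..9 restriction has to be tested by hand.

```c
/* Counterpart of the Pascal 'colours' type: three named values. */
enum Colour { RED, GREEN, BLUE };

/* Enumerated values are ordinal, so "the next value" is well defined.
   Wraps from BLUE back to RED. */
enum Colour next_colour(enum Colour c)
{
    return (c == BLUE) ? RED : (enum Colour)(c + 1);
}

/* C has no subrange types, so a range such as 0..9 needs a run-time test. */
int is_digit_value(int n)
{
    return n >= 0 && n <= 9;
}
```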
Complex types • These are really data structures. The language must provide type construction mechanisms to permit user defined data structures • The most common of these is the record which allows the creation of a heterogeneous data structure with named fields • Another flavour of this is the variant record, where different records of the same type may contain different fields • a concession to memory
Record Examples
    TYPE Ibr = RECORD
      I: integer;
      B: boolean;
      R: real;
    END;
Variant Record
    TYPE IoR = RECORD
      CASE IsInt: BOOLEAN OF
        TRUE: I: integer |
        FALSE: R: real
      END;
    END;
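The nearest C equivalents are a `struct` for the plain record and a `struct` holding a tag plus a `union` for the variant record. The sketch below keeps the slide's type names. The member name `u` and the helper `ior_as_double` are assumptions added for illustration.

```c
#include <stdbool.h>

/* Plain record: every value has all three fields. */
struct Ibr {
    int    i;
    bool   b;
    double r;
};

/* Variant record: the tag says which member of the union is valid.
   The members share storage, which is the "concession to memory". */
struct IoR {
    bool is_int;
    union {
        int    i;
        double r;
    } u;
};

/* Reads the variant through its tag, converting to double either way. */
double ior_as_double(struct IoR v)
{
    return v.is_int ? (double)v.u.i : v.u.r;
}
```

Unlike the variant record above, C does not tie the tag to the union: nothing stops a program reading `u.r` while `is_int` is true.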
Pointers • One special primitive type we did not mention is that of a pointer • Pointers hold memory addresses and are therefore frequently used with records (which are basically a hunk of memory)
Pointer Types • Pointers first appeared in PL/1, where they could hold addresses of any type • Later languages attach types to pointers, as in C • int ci, *cipoint; • ci = 34; cipoint = &ci; • C permits “aliasing”, where two identifiers can affect the same piece of memory
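Aliasing • [Figure: a memory cell holding 34 is referenced by ‘ci’; ‘cipoint’ holds the address of that cell, so both refer to the same memory]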
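The aliasing on the slide can be tried out with a few lines of C. The wrapper function `alias_demo` and the write of 35 are additions for this sketch. The declarations are the slide's own.

```c
/* Writes through the pointer and reports what the variable now holds. */
int alias_demo(void)
{
    int ci, *cipoint;   /* an int and a pointer to int */

    ci = 34;
    cipoint = &ci;      /* cipoint now holds the address of ci */
    *cipoint = 35;      /* changes ci without naming it */
    return ci;
}
```

The function returns 35 even though no statement assigns 35 to `ci` by name.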
Anti-aliasing • In an effort to stamp out aliasing, Pascal and others introduced pointer types as follows: • TYPE intptr = ^integer; • VAR ip, aip: intptr; i: integer; • As Pascal has no ‘&’ operator, ‘ip’ and ‘aip’ can never point to ‘i’ • The integers they point to are created dynamically with ‘new(ip);’ whereupon values can be set • ip^ := 17; { ip^ is an integer, ip is an integer pointer }
Pointer memory • The “new” in Pascal automatically allocates the right amount of memory • C requires explicit allocation • cipoint = (int *)malloc(sizeof(int)); • but Pascal still doesn’t solve aliasing: • aip := ip;
Aliasing in Pascal • [Figure: ‘ip’ and ‘aip’ both refer to the same memory cell, which holds 17]
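The same two-pointer situation, written in C with explicit allocation, might look like the sketch below. The function name `heap_alias_demo` and the second value 18 are made up. The pointer names follow the Pascal example.

```c
#include <stdlib.h>

/* Returns the value seen through 'ip' after writing through 'aip',
   or -1 if the allocation fails. */
int heap_alias_demo(void)
{
    int *ip = malloc(sizeof *ip);   /* C: size is stated explicitly */
    if (ip == NULL) {
        return -1;
    }
    int *aip = ip;                  /* second name for the same cell */

    *ip = 17;
    *aip = 18;                      /* also changes *ip */
    int seen = *ip;

    free(ip);                       /* explicit release... */
    ip = NULL;                      /* ...after which aip would dangle, */
    aip = NULL;                     /* so both aliases are cleared */
    return seen;
}
```

Forgetting to clear `aip` after the `free` would leave a pointer to memory that is no longer owned, which is the usual hazard of combining aliasing with explicit release.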
Lifetime • This brings us to another interesting area in the life of a variable – how long it persists • Usually variables are allocated on block entry and de-allocated on block exit • Dynamically allocated memory exists whilst a reference to that memory exists • When no reference exists, the memory can be reclaimed (it is garbage)
Garbage collection • Garbage collection is a big area of research • Explicit collection has problems • Programmers forget to do it • Can cause dangling references • Automatic also has problems • complicated to determine what is garbage • run-time speed hit
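Reference counting is one simple automatic scheme, sketched below. It is chosen here purely for illustration, since the slides do not name a particular technique. Each cell records how many references to it exist and is freed when the count reaches zero. The `Cell` type, its functions and the `cells_freed` counter are all hypothetical.

```c
#include <stdlib.h>

typedef struct {
    int refs;     /* number of live references to this cell */
    int value;
} Cell;

int cells_freed = 0;   /* counter kept for illustration only */

/* Allocates a cell with one reference; returns NULL on failure. */
Cell *cell_new(int value)
{
    Cell *c = malloc(sizeof *c);
    if (c != NULL) {
        c->refs = 1;
        c->value = value;
    }
    return c;
}

/* Records an additional reference to the cell. */
Cell *cell_retain(Cell *c)
{
    c->refs++;
    return c;
}

/* Drops one reference; the cell is garbage once none remain. */
void cell_release(Cell *c)
{
    if (--c->refs == 0) {
        free(c);
        cells_freed++;
    }
}
```

The extra bookkeeping on every retain and release is a small instance of the run-time cost mentioned above. Plain reference counting also cannot reclaim cells that refer to each other in a cycle.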
Persistent systems • There is a group of persistent languages (e.g. Napier) where variables can live forever • To make a variable persist, it is attached to the “root of persistence” • This has benefits – no explicit saving or serializing • Catch is that you have a huge object store to shuffle around if you want to share your code