Chapter 5 Names, Bindings, Type Checking, and Scopes
Chapter 5 Topics • Introduction • Names • Variables • The Concept of Binding • Type Checking • Strong Typing • Type Compatibility • Scope and Lifetime • Referencing Environments • Named Constants
5.1 Introduction • Imperative languages are abstractions of the von Neumann architecture • Memory • Processor • Variables are characterized by attributes • Type is the most important attribute • To design a type, one must consider scope, lifetime, type checking, initialization, and type compatibility
5.2 Names • Associated with labels, subprograms, formal parameters, etc. • Design issues for names: • Maximum length? • Are connector characters allowed? • Are names case sensitive? • Are special words reserved words or keywords?
Names (continued) • Length • If too short, they cannot be connotative • Language examples: • FORTRAN I: maximum 6 • COBOL: maximum 30 • FORTRAN 90 and ANSI C: maximum 31 • Ada and Java: no limit, and all are significant • C++: no limit, but implementers often impose one
Names (continued) • Forms • A letter followed by a string consisting of letters, digits, and underscore characters (_). • In C-based languages, the underscore has largely been replaced by the “camel” notation (e.g. “myStack”).
Names (continued) • Case sensitivity • C, C++, and Java names are case sensitive; names in many other languages are not. e.g. in C++, rose, ROSE, Rose are different names • Disadvantage: readability (names that look alike are different) • worse in C++ and Java because predefined names are mixed case (e.g. IndexOutOfBoundsException)
Names (continued) • Special words • An aid to readability • A keyword is a word that is special only in certain contexts, e.g., in Fortran • Real Apple (Real is followed by a name, so here Real is a keyword naming a data type) • Real = 3.4 (here Real is a variable) • A reserved word is a special word that cannot be used as a user-defined name • E.g. in Java – int, class, public, void, while … (String and main are ordinary identifiers, not reserved words) • Predefined names – names defined in other program units, such as Java packages, and C and C++ libraries. Can be made visible to a program only if explicitly imported. Once imported, they cannot be redefined.
5.3 Variables • A variable is an abstraction of a memory cell • Variables can be characterized as a sextuple of attributes: • Name • Address • Value • Type • Lifetime • Scope
Variables Attributes • Name - not all variables have one, e.g. the integer that a pointer points to (a heap-dynamic variable) has no name of its own • Address - the memory address with which it is associated • Variables with the same name may have different addresses at different times during execution e.g. void sub(){ int count; … while (…) { int count; count++; } }
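The point that one name can denote different memory cells can be checked directly. The following is a minimal C++ sketch, not taken from the slides; the function name `inner_and_outer_differ` is hypothetical. It compares the addresses of an outer `count` and an inner `count` while both are alive.

```cpp
// Returns true when the inner and outer `count` are different variables
// that occupy different memory cells.
bool inner_and_outer_differ() {
    int count = 0;                       // outer variable
    const int* outer_addr = &count;      // remember where the outer count lives
    bool differ = false;
    {
        int count = 10;                  // a second variable with the same name
        differ = (&count != outer_addr); // both are alive here, so addresses differ
        ++count;                         // changes only the inner count
    }
    return differ && count == 0;         // the outer count was never modified
}
```

Within the inner block the name `count` refers to the inner variable, so the increment leaves the outer one untouched.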
Variables Attributes • If two variable names can be used to access the same memory location, they are called aliases • E.g. x = max(a, b); … int max(int m, int n){ … } (m and a, and n and b, are aliases only if the parameters are passed by reference) • Aliases are created via pointers, reference variables, C and C++ unions • Two pointer variables are aliases when they point to the same memory location • Aliasing allows a variable to have its value changed by an assignment to a different variable. It is harmful to readability (program readers must remember all of the aliases)
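A short C++ sketch of aliasing through a reference variable and a pointer may help here. The function name `write_through_alias` and the variable names are made up for illustration.

```cpp
// Writes through two aliases and reads the result through the original name.
int write_through_alias() {
    int total = 1;
    int& ref = total;   // reference variable: a second name for the same cell
    int* ptr = &total;  // pointer: *ptr denotes the same cell as total
    ref = 5;            // changes total without mentioning it
    *ptr += 2;          // changes total again
    return total;       // 7
}
```

A reader who looks only at assignments to `total` would miss both changes, which is the readability cost the slide describes.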
Variables Attributes (continued) • Type - determines the range of values of variables and the set of operations that are defined for values of that type; • E.g. Java int: -2147483648 to 2147483647 • Value - the contents of the memory cells with which the variable is associated • Memory cells – abstract memory cells • E.g. a floating point value may occupy 4 physical bytes, but we think of it as occupying a single abstract memory cell
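The range of a 32-bit two's complement integer, which is what Java's int is, can be read off in C++ as a rough cross-check. This sketch assumes `std::int32_t` as a stand-in for Java's int; the constant names `kMin` and `kMax` are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Bounds of a 32-bit two's complement integer (stand-in for Java's int).
constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min(); // -2147483648
constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max(); //  2147483647
```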
5.4 The Concept of Binding • The l-value of a variable is its address, because that is what is required when a variable appears on the left side of an assignment statement. • The r-value of a variable is its value • To access the r-value, the l-value must be determined first • A binding is an association, such as between an attribute and an entity, or between an operation and a symbol • Binding time is the time at which a binding takes place.
Possible Binding Times • Language design time -- bind operator symbols to operations, e.g. * means multiplication • Language implementation time-- bind floating point type to a representation • Compile time -- bind a variable to a type in C or Java • Load time -- bind a FORTRAN 77 variable to a memory cell (or a C static variable) • Runtime -- bind a nonstatic local variable to a memory cell
Possible Binding Times • count = count + 5; • The type of count is bound at compile time • The set of possible values of count is bound at compiler design time • The meaning of the operator symbol + is bound at compile time, when the types of its operands have been determined • The internal representation of the literal 5 is bound at compiler design time • The value of count is bound at execution time with this statement
Static and Dynamic Binding • A binding is static if it first occurs before run time and remains unchanged throughout program execution. • A binding is dynamic if it first occurs during execution or can change during execution of the program
Type Binding • Before a variable can be referenced, it must be bound to a data type. • How is a type specified? • When does the binding take place?
Variable Declarations • explicit declaration: a program statement that lists variable names and specifies that they are a particular type • implicit declaration: a default mechanism for specifying types of variables (the first appearance of the variable in the program) • Most languages require explicit declaration • FORTRAN, PL/I, BASIC, JavaScript, vbScript, and Perl provide implicit declarations
Variable Declarations • e.g. vbScript dim a, b a = 10 b = "Hello!" … function m(){ … m = result } • Advantage: writability, convenient in programming • Disadvantage: reliability e.g. variables that are accidentally left undeclared by the programmer are given default types and unexpected attributes
Variable Declarations • Less of a problem in Perl, where the first character of a name fixes its kind • If a name begins with • $: it is a scalar, which can store either a string or a numeric value • @: it is an array • %: it is a hash structure
Dynamic Type Binding • A variable is bound to a type when it is assigned a value in an assignment statement • (JavaScript, vbScript and PHP) • e.g., JavaScript list = [2, 4.33, 6, 8]; list = 17.3; • Advantage: flexibility (generic program units) • Disadvantages: • High cost (dynamic type checking and interpretation) • Type error detection by the compiler is difficult • Usually implemented using pure interpreters
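The run-time cost of dynamic type binding comes from carrying a type tag with every value and checking it on use. The C++17 sketch below imitates the JavaScript example with `std::variant`; it is only an analogy, since C++ itself binds types statically. The names `Dynamic`, `current_type`, and `rebind_demo` are hypothetical.

```cpp
#include <variant>
#include <vector>

// A value that is currently either an array of numbers or a single number.
using Dynamic = std::variant<std::vector<double>, double>;

// The type currently bound to the value is only known at run time.
const char* current_type(const Dynamic& v) {
    return std::holds_alternative<double>(v) ? "number" : "array";
}

// Mirrors: list = [2, 4.33, 6, 8]; list = 17.3;
Dynamic rebind_demo() {
    Dynamic list = std::vector<double>{2, 4.33, 6, 8}; // bound to the array type
    list = 17.3;                                       // rebound to a number
    return list;
}
```

Every use of such a value has to consult the tag first, which is the kind of dynamic type check the slide lists as a cost.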
Type Inference • Rather than by assignment statement, types are determined from the context of the reference (ML, Miranda, and Haskell) • E.g. fun circumf(r) = 3.14159 *r*r; (r and the result are inferred to be floating point) • fun time10(x) = 10 * x; (x and the result are inferred to be int)
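C++ offers a limited analogue of this idea: a function declared with `auto` has its return type deduced from the return expression. The sketch below is not ML-style inference, because the parameter types still have to be declared; the name `times10` is a hypothetical variant of the slide's example.

```cpp
#include <type_traits>

// Return types are deduced by the compiler from the return expressions.
auto circumf(double r) { return 3.14159 * r * r; } // deduced as double
auto times10(int x) { return 10 * x; }             // deduced as int

// Compile-time checks that the deduced types are what we expect.
static_assert(std::is_same<decltype(circumf(1.0)), double>::value, "double expected");
static_assert(std::is_same<decltype(times10(1)), int>::value, "int expected");
```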
Storage Bindings & Lifetime • Allocation - getting a cell from some pool of available cells • De-allocation - putting a cell back into the pool • The lifetime of a variable is the time during which it is bound to a particular memory cell • Scalar (unstructured) variables can be separated into four categories: static, stack-dynamic, explicit heap-dynamic and implicit heap-dynamic • Storage allocation - http://lambda.uta.edu/cse5317/notes/node33.html
Categories of Variables by Lifetimes • Static • bound to memory cells before execution begins and remain bound to the same memory cells throughout execution, e.g., all FORTRAN 77 variables, C static variables, global variables • Advantages: • efficiency (direct addressing, see supplementary material- memory addressing “http://www.cs.helsinki.fi/u/kerola/tito/koksi_doc/memaddr.html”) • history-sensitive subprogram support • Disadvantages: lack of flexibility (no recursion if all variables are static); storage cannot be shared among variables
Categories of Variables by Lifetimes • C and C++ static variables (see supplementary material “Using the Static Keyword-http://www.cprogramming.com/tutorial/statickeyword.html”)
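History sensitivity is easiest to see with a static local variable. In this small C++ sketch the function name `next_id` is hypothetical; the variable `counter` keeps its value between calls because it is bound to storage for the whole run.

```cpp
// Returns 1, 2, 3, ... on successive calls.
int next_id() {
    static int counter = 0; // initialized once; survives across calls
    return ++counter;
}
```

Without `static`, `counter` would be stack-dynamic and every call would return 1.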
Categories of Variables by Lifetimes • Stack-dynamic • Storage bindings are created for variables when their declaration statements are elaborated (i.e., when execution reaches them at run time). • Types are statically bound • Allocated from the run-time stack • E.g. local variables • In Java, C++ and C#, variables defined in methods are by default stack-dynamic • Advantages: allows recursion; conserves storage • Disadvantages: • Overhead of allocation and de-allocation • Subprograms cannot be history-sensitive • Inefficient references (indirect addressing)
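Recursion works because each call gets fresh stack-dynamic variables. A minimal C++ sketch, with `factorial` chosen as a familiar illustration rather than taken from the slides:

```cpp
// Each activation has its own copies of `n` and `result` on the run-time stack.
unsigned long long factorial(unsigned int n) {
    unsigned long long result = 1;  // new storage for every call
    if (n > 1) {
        result = n * factorial(n - 1);
    }
    return result;
}
```

If `result` were static, all activations would share one cell and the function would also remember state between unrelated calls.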
Categories of Variables by Lifetimes • Explicit heap-dynamic • Allocated and de-allocated by explicit directives, specified by the programmer, which take effect during execution • The heap is a collection of storage cells whose organization is highly disorganized because of the unpredictability of its use • Referenced only through pointers or reference variables, e.g. dynamic objects in C++ (via new and delete), all objects in Java • Has two variables associated with it: a pointer or reference variable through which the heap-dynamic variable can be accessed, and the heap-dynamic variable itself e.g. int *intnode; //create a pointer that points to an integer intnode = new int; //create the heap-dynamic variable delete intnode; //de-allocate the heap-dynamic variable
Categories of Variables by Lifetimes • Java: all data except the primitive scalars are objects. Java objects are explicit heap-dynamic and are accessed through reference variables. Implicit garbage collection is used • C#: has both explicit heap-dynamic and stack-dynamic objects, all of which are implicitly de-allocated. Also supports C++ style pointers to interoperate with C and C++ components. • Advantage: provides for dynamic storage management • Disadvantage: inefficient and unreliable
Categories of Variables by Lifetimes • Implicit heap-dynamic • Bound to heap storage only when they are assigned values • all variables in APL; all strings and arrays in Perl and JavaScript • E.g. list = [3.5, 4.1] list = 47 • Advantage: flexibility • Disadvantages: • Inefficient, because all attributes are dynamic • Loss of error detection
5.5 Type Checking • Type checking is the activity of ensuring that the operands of an operator are of compatible types • Generalize the concept of operands and operators to include subprograms and assignments • E.g. (subprogram call) int a, b, x; a = 4; b = 5; x = max(4,5); … float max(float m, float n){ … } • E.g. (assignment) string result; int a, b; a = 4; b = 5; result = a + b;
Type Checking • A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type • This automatic conversion is called a coercion. e.g. int a = 2; float b = 1.5; b = a + b; a = a + b; • A type error is the application of an operator to an operand of an inappropriate type
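The coercion example can be traced concretely in C++. In this sketch the function name `coercion_demo` is hypothetical; the two assignments are the ones from the slide, and both conversions are inserted by the compiler.

```cpp
#include <utility>

// Returns the final values of a and b after the two mixed-mode assignments.
std::pair<int, float> coercion_demo() {
    int a = 2;
    float b = 1.5f;
    b = a + b;  // a is coerced to float; b becomes 3.5
    a = a + b;  // the sum 5.5 is computed as float, then coerced to int: 5
    return {a, b};
}
```

The second assignment silently drops the fractional part, which is why heavy use of coercion weakens error detection.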
Type Checking • Static type checking – consistency can be checked before the program is run. All variables must be given a type • Dynamic type checking – type checking at run time. Dynamic type binding. • If all type bindings are static, nearly all type checking can be static • If type bindings are dynamic, type checking must be dynamic (e.g. JavaScript) • Advantage of static type checking – potential problems can be identified earlier • Advantage of dynamic type checking - flexibility
5.6 Strong Typing • A programming language is strongly typed if type errors are always detected, either at compile time or run time • Fortran 95: not strongly typed. The use of EQUIVALENCE allows a variable of one type to refer to a value of a different type • Ada: nearly strongly typed. Allows programmers to suspend type checking for a particular type conversion • C, C++: not strongly typed. Union types are not type checked • Java, C#: strongly typed • Languages with a great deal of coercion, like Fortran, C and C++, are significantly less reliable than those with little coercion, such as Ada, Java and C#.
5.7 Type Compatibility • Name type compatibility means the two variables have compatible types if they are in either the same declaration or in declarations that use the same type name e.g. int a, b; and int a; int b;
Type Compatibility • Easy to implement but highly restrictive: • Subranges of integer types are not compatible with integer types e.g. type Indextype is 1..100; count: Integer; index: Indextype; (count and index are not compatible) • Formal parameters must be the same type as their corresponding actual parameters. If a structured type is passed among subprograms through parameters, such a type must be defined only once, globally. A subprogram cannot state the type of such formal parameters in local terms
Type Compatibility • Structure type compatibility means that two variables have compatible types if their types have identical structures e.g. Apple{ string color; float weight; string type; } Pear{ string color; float weight; string type; } Apple a = new Apple(); Pear p = new Pear(); (a and p have compatible types under structure type compatibility) • More flexible, but harder to implement. Entire structures of the two types must be compared. The comparison is not always simple
Type Compatibility • Consider the problem of two structured types: • Are two record types compatible if they are structurally the same but use different field names? • Are two array types compatible if they are the same except that the subscripts are different? (e.g. [1..10] and [0..9]) • Are two enumeration types compatible if their components are spelled differently? • With structural type compatibility, you cannot differentiate between types of the same structure e.g. type celsius is new Float; type fahrenheit is new Float; The two types are compatible in structure, although they should not be treated as compatible.
Type Compatibility • Ada: uses name type compatibility, but provides two type constructs, subtypes and derived types. • A derived type is a new type that is based on some previously defined type with which it is incompatible, although it may have identical structure e.g. type celsius is new Float; type fahrenheit is new Float; Both are derived from Float, and are incompatible with each other and with Float • A subtype is a possibly range-constrained version of an existing type. Compatible with its parent type. e.g. subtype Small_type is Integer range 0..99; variables of Small_type are compatible with Integer
Type Compatibility • C: uses structure type compatibility for all types except structures and unions • C++ uses name type compatibility
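Name type compatibility in C++ can be used to get the effect of Ada's derived types. The sketch below wraps a `double` in two distinct struct types; the names `Celsius`, `Fahrenheit`, and `to_fahrenheit` are hypothetical, chosen to echo the celsius and fahrenheit example.

```cpp
#include <type_traits>

// Identical structure, different names: the two types are incompatible in C++.
struct Celsius { double degrees; };
struct Fahrenheit { double degrees; };

// An explicit conversion is the only way to get from one type to the other.
Fahrenheit to_fahrenheit(Celsius c) {
    return Fahrenheit{c.degrees * 9.0 / 5.0 + 32.0};
}

// A Celsius value is not implicitly usable where a Fahrenheit is expected.
static_assert(!std::is_convertible<Celsius, Fahrenheit>::value,
              "structurally identical types are still distinct");
```

Under structure type compatibility the two structs would be interchangeable, and mixing temperature scales by accident would go undetected.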
5.8 Scope • The scope of a variable is the range of statements over which it is visible • A variable is visible in a statement if it can be referenced in that statement • The nonlocal variables of a program unit are those that are visible but not declared there
Scope • Static Scope • Binding names to nonlocal variables • The scope of a variable can be determined prior to execution • Two categories of static-scoped languages • Subprograms can be nested (Ada, JavaScript, PHP) • Subprograms cannot be nested (C-based languages)
Scope • To decide the scope, the reader must find the declaration of the variable • Search process: search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name, e.g. find the scope of X in Sub1 Procedure Big is X: Integer; procedure Sub1 is begin … X … end; procedure Sub2 is X: Integer; begin … end; begin … end
Scope • Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest static ancestor is called a static parent • Variables can be hidden from a unit by having a "closer" variable with the same name. e.g. Legal in C and C++, illegal in Java and C# void sub(){ int count; while (…){ int count; count++; } }
Scope • In Ada, hidden variables from ancestor scopes can be accessed with selective references. E.g. Big.X • C-based languages do not allow subprograms to be nested inside other subprogram definitions, but they have global variables. These variables are declared outside any subprogram definition. Local variables can hide these globals. In C++, such hidden variables can be accessed using the scope operator ::, e.g. ::variableName for a hidden global (class_name::variableName for a class member)
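A short C++ sketch of a local variable hiding a global, and of the scope operator reaching the hidden global. The names `level` and `hidden_global_demo` are hypothetical.

```cpp
int level = 100;  // global variable

int hidden_global_demo() {
    int level = 1;      // local variable hides the global of the same name
    ::level += level;   // ::level is the global; plain level is the local
    return ::level;     // 101 on the first call
}
```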
Blocks • A method of creating a section of code to have its own local variables whose scope is minimized. Such a section of code is called a block. • Variables are stack dynamic, so they have their storage allocated when the section is entered and de-allocated when the section is exited. • C-based languages allow any compound statement (a statement sequence surrounded by matched braces) to have declarations and thus define a new scope. • Scopes created by blocks are treated exactly like those created by subprograms. References to variables in a block that are not declared there are connected to declarations by searching enclosing scopes.
Blocks • Examples: C and C++: for (...) { int index; ... } Ada: declare LCL : FLOAT; begin ... end • C++ allows variable definitions to appear anywhere in functions. When a definition appears at a position other than the beginning of a function, but not within a block, that variable’s scope is from its definition to the end of the function. • In C, all data declarations in a function but not in blocks within the function must appear at the beginning of the function. • In classes of C++, Java and C#, instance variables’ scope is the whole class. The scope of a variable defined in a method starts at the definition.
Evaluation of Static Scoping • Static scoping provides nonlocal access, but also has problems. • Assume MAIN calls A and B; A calls C and D; B calls A and E [Figure: nesting of subprograms MAIN, A, B, C, D, and E]
Static Scope Example [Figure: two call graphs over MAIN, A, B, C, D, and E; panel titles: Desired Calls, Potential Calls]
Static Scope (continued) • Suppose the spec is changed so that E must now access some data in D • Solutions: • Put E in D (but then E can no longer access B) • Move the data from D that E needs to MAIN (but then all procedures can access them - possibility of incorrect access) • Overall: static scoping often encourages many global variables
Dynamic Scope • Based on calling sequences of program units, not their textual layout • Determined at run time • References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point
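C++ is statically scoped, so dynamic scoping can only be simulated. The sketch below keeps an explicit chain of activation frames and resolves a name by searching back through it, most recent call first. All names here (`Frame`, `call_chain`, `lookup`, `sub1`, `sub2`, `big_calls_sub1`, `big_calls_sub2`) are hypothetical and loosely follow the Big, Sub1, Sub2 example.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One frame per active subprogram call, holding that call's local variables.
using Frame = std::map<std::string, int>;
std::vector<Frame> call_chain;

// Dynamic-scope lookup: search from the most recent call back to the oldest.
int lookup(const std::string& name) {
    for (auto it = call_chain.rbegin(); it != call_chain.rend(); ++it) {
        auto found = it->find(name);
        if (found != it->end()) return found->second;
    }
    throw std::out_of_range("unbound name: " + name);
}

int sub1() {                                // references x, declares nothing
    call_chain.push_back(Frame{});
    int value = lookup("x");
    call_chain.pop_back();
    return value;
}

int sub2() {                                // declares its own x, then calls sub1
    call_chain.push_back(Frame{{"x", 2}});
    int value = sub1();
    call_chain.pop_back();
    return value;
}

int big_calls_sub1() {                      // Big declares x = 1, calls sub1 directly
    call_chain.push_back(Frame{{"x", 1}});
    int value = sub1();
    call_chain.pop_back();
    return value;
}

int big_calls_sub2() {                      // Big declares x = 1, calls sub1 via sub2
    call_chain.push_back(Frame{{"x", 1}});
    int value = sub2();
    call_chain.pop_back();
    return value;
}
```

The same reference to x in `sub1` resolves to Big's x or to Sub2's x depending on who called it. Under static scoping it would always be Big's x.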