520 likes | 673 Views
Scope binding and Data types. Kun-Yuan Hsieh kyshieh@pllab.cs.nthu.edu.tw Programming Language Lab., NTHU. Names. Identify labels, variables, subprograms, parameters, etc. Name length Earliest PL: 1 FORTRAN I: max 6 CORBOL: max 30 FORTRAN 90 & ANSI C: max 31
E N D
Scope binding and Data types Kun-Yuan Hsieh kyshieh@pllab.cs.nthu.edu.tw Programming Language Lab., NTHU PLLab, NTHU,Cs2403 Programming Languages
Names • Identify labels, variables, subprograms, parameters, etc. • Name length • Earliest PL: 1 • FORTRAN I: max 6 • CORBOL: max 30 • FORTRAN 90 & ANSI C: max 31 • Java, C++: no limit, implementers often impose one • Case sensitive (e.g. StrLeng != strleng) • C, C++, and Java are • Problems: poor readability & writability PLLab, NTHU,Cs2403 Programming Languages
Names (con’t) • Keyword: • word that is special only in certain contexts (in FORTRAN) (in FORTRAN) REAL APPLE REAL INTEGER REAL = 3.4 INTEGER REAL • Reserved word: • word that cannot be used as a name (in C) for, while, if, else • Predefined words: • words that have been defined by public libraries (in C) printf, scan, createfile PLLab, NTHU,Cs2403 Programming Languages
Variable (var) • A var is an abstraction of a memory cell • A var can be viewed as below:(Name, Address, Value, Type, Lifetime, Scope) • Name • Referred to as identifier (id) • Address • The memory address with which it is associated • May have different address at different times during execution (e.g. local vars in a recursive subprogram) • May have different address at different places in a program (e.g. count defined in sub1 and sub2 PLLab, NTHU,Cs2403 Programming Languages
Variable • Alias • More than one variables are used to access the same memory location • (in C++) &p = d ; (p and d are aliases) • Type • Determines the range of its values, and the set of ops defined • (in C) unsigned int count ; (range: 0 – 65535) • Value • The contents of the memory cell associated with the name PLLab, NTHU,Cs2403 Programming Languages
Binding • Binding is an association between two attributes • Ex: int count ; count = count + 5; • Type of count: bound at compile time • Value range of count: bound at compiler design time • Value of count: bound at execution time • Definition of + op: bound at language design time PLLab, NTHU,Cs2403 Programming Languages
Static & dynamic binding • Static binding: it first occurs before run time and remains unchanged in execution • Dynamic binding: it first occurs during run time or can change in execution • Discussed issues: type & storage • Type binding • How is type specified? • When does the binding take place? • Storage binding • When does the allocation take place? PLLab, NTHU,Cs2403 Programming Languages
Static type binding • Static type binding: explicit & implicit declaration • Explicit declaration • (in C) int A, B ; double C ; • Implicit declaration • (in FORTRAN) name begins with I, J, K, L, M, N are INTEGER, others are REAL • (C/C++) declaration & definition • declaration specifies types but do not cause allocation • external declaration is used for type checking • definition specifies types also causes allocation • only declares in header files but defines in C files PLLab, NTHU,Cs2403 Programming Languages
Dynamic type binding • Variable is bound to a type when assigned a value in an assignment statement • (in JavaScript) list = [10.2, 3.5] list = count • Advantage: generic programming • process a list of data with any type • Disadvantage: poor error detection • i = x (int) and i = y (double)!!! • Disadvantage: large cost • storage of a variable must be of varying size PLLab, NTHU,Cs2403 Programming Languages
Storage binding • The lifetime of a variable is the time during which it is bound to a specific memory location • static variables • stack-dynamic variables • explicit heap-dynamic variables • implicit heap-dynamic variables PLLab, NTHU,Cs2403 Programming Languages
Static variables • Vars that bound to memory cells before execution begins and unbound when execution terminates • static storage binding • e.g. global variables • (in C) static int count ; // history sensitive • Advantages: • code efficiency by direct addressing • no allocation/deallocation during runtime • Disadvantages: • less flexibility : cannot support recursive • no storage sharing PLLab, NTHU,Cs2403 Programming Languages
Stack-dynamic variables • Vars whose storage bindings are created when declarations are executed • dynamic storage binding • static type binding • allocated from the run-time stack • Advantages: • storage sharing, support recursive • Disadvantages: • overhead of allocation / deallocation PLLab, NTHU,Cs2403 Programming Languages
Example I #include <stdio.h> int count; main( ) { int i ; for (i=0; i<=10; i++) { test( ) ; } } test( ) { int i ; static int count = 0; count = count + 1 ; } virtual address space test::i main::i run-time stack ptr static var ptr test::count count PLLab, NTHU,Cs2403 Programming Languages
Example I #include <stdio.h> int count; main( ) { int i ; for (i=0; i<=10; i++) { test( ) ; } } test( ) { int i ; static int count = 0; count = count + 1 ; } PLLab, NTHU,Cs2403 Programming Languages
Explicit heap-dynamic variables • Vars that are allocated and deallocated by explicit statements during run-time • Dynamic storage binding • Referenced through pointers or references • (in C) malloc( ) and free( ) (in C++) new and delete • Memory leak is a major cause of SW bugs • Heap: a section of virtual address space reserved for dynamic memory allocation • heap fragmentation: a major cause of SW perf bugs • Java objects are explicit heap-dynamic vars • implicit garbage collection (no free or delete) PLLab, NTHU,Cs2403 Programming Languages
Example II #include <stdio.h> main( ) { int i ; test( ) ; } test( ) { int i ; char *space; static int count = 0; space = (char *) malloc(64); // memory leak } virtual address space heap test::space test::i run-time stack ptr main::i static var ptr test::count PLLab, NTHU,Cs2403 Programming Languages
Example II #include <stdio.h> main( ) { int i ; test( ) ; } test( ) { int i ; char *space; static int count = 0; space = (char *) malloc(64); free(space); } virtual address space heap test::i run-time stack ptr main::i static var ptr test::count PLLab, NTHU,Cs2403 Programming Languages
Implicit heap-dynamic variables • Vars that are bound to heap storage only when they are assigned values • dynamic storage binding • Strings and arrays in Perl and JavaScript @names = { “door”, “windows”, “table”}; $names[3] = “bed”; • Advantages: • highest degree of flexibility • Disadvantages: • huge run-time overhead • some loss of error detection by compiler PLLab, NTHU,Cs2403 Programming Languages
Type checking • The activity of ensuring that the operands of an operator are of compatible types • (int) A = (int) B + (real) C • int + real okay for the + operator • (int + real) converts to (real + real) type coercion • int = real type error • Static type checking: done in compile time • Dynamic type checking: done in run time PLLab, NTHU,Cs2403 Programming Languages
Strong typing • Strong-typed: if type errors are always detected • Yes (Ada, Java), No (Pascal, C, C++) • Example: union in C: type error cannot be detected typedef union { float A; int B; } Sample; Sample data; main( ) { float R1 = 3.231; int N1; data.A = R1; ….. N1 = data.B; // N1 = 1078905012 } PLLab, NTHU,Cs2403 Programming Languages
Type compatibility • Name type compatibility: two variables declared with the same type name • Easy to implement but less flexibility type days = integer ; type months= integer ; days N1; months N2; • N1 and N2 are not compatible • Structured type compatibility: two variables have identical structures • More flexible, but harder to implement • N1 and N2 above are compatible PLLab, NTHU,Cs2403 Programming Languages
Scope • The range of statements over which it is visible (it can be referenced) • Static scope: determined in compile time • Search: based on declaration sequence • Dynamic scope: determined in run-time • Search: based on the call stack sequence PLLab, NTHU,Cs2403 Programming Languages
Scope example Proc A reference environment Static scope: A main Dynamic scope: A B main proc main var X1, X2; proc A var X1, X2; end proc B var X1, X2; call A; end call B; end PLLab, NTHU,Cs2403 Programming Languages
Static & dynamic scope Static scope: (31) x = big’s x (4) x = big’s x (2) x = sub2’s x (51) x = big’s x Dynamic scope (31) x = big’s x (4) x = big’s x (2) x = sub2’s x (51) x = sub2’s x Procedure big; var x : integer; procedure sub1; begin …x… (1) end; procedure sub2; var x : integer; begin …x… (2) sub1; end; begin sub1; (3) …x… (4) sub2; (5) end PLLab, NTHU,Cs2403 Programming Languages
Static vs. dynamic scope • Static scope • easier to read • reference statically resolved • fast execution • Dynamic scope • harder to read • reference dynamically resolved • slow execution PLLab, NTHU,Cs2403 Programming Languages
Block • Block: a method of creating static scopes inside program unit • Global reference uses :: operator (in C/C++) for (…) { int index; index = temp; …. temp = ::index; } PLLab, NTHU,Cs2403 Programming Languages
Lifetime & Scope • Scope and lifetime are closely related but different concepts • Example in C • static scope • count scope only in home( ) • count lifetime ends at the return of office( ) void office( ) { … } /* end of office */ void home( ) { int count; … office( ); } /* end of home */ PLLab, NTHU,Cs2403 Programming Languages
Named constants • Named constant: variable that is bound to a value only when it is bound to storage • value cannot be changed later • e.g. final in Java, const in C or C++ • readability & modifiability • different from #define (no storage allocated) • Ex: (in C++) void students( ) { const int count = 100; for (i = 0; i < count; i++) total += scores[i]; } PLLab, NTHU,Cs2403 Programming Languages
Data types PLLab, NTHU,Cs2403 Programming Languages
Primitive data types • Those are not defined in terms of other data types • Numeric types • Integer • Floating-point • Decimal • Boolean types • Character types PLLab, NTHU,Cs2403 Programming Languages
exp fraction (23) exp (11) fraction (52) Numeric Types • Integer • Several sizes • Byte, short, int, long… • Range limited • Floating point • Approximation to model real numbers IEEE float: IEEE double: PLLab, NTHU,Cs2403 Programming Languages
Numeric Types • Decimal • For business applications • Store a fixed number of decimal digits (coded) • Advantage: accuracy • Disadvantages: limited range, wastes memory • e.g. 924.1 in an 6-byte data w/ a 4-byte integer part 00 00 09 24 10 00 PLLab, NTHU,Cs2403 Programming Languages
Boolean & character types • Boolean • Could be implemented as bits, but often as bytes • Advantage: readability • Character • ASCII: one byte per char (0 – 127) • Unicode: 2 bytes per char (multi-language support) PLLab, NTHU,Cs2403 Programming Languages
Character String Types • Values are sequences of characters • C and C++ • Not primitive • Use char arrays and a library of functions that provide operations PLLab, NTHU,Cs2403 Programming Languages
User-defined ordinal types • Ordinal type: the range of possible values can be easily associated with positive integers • enumeration & subrange • Enumeration type: • (C++)enum colors {red, blue, yellow};colors mycolor = blue; • C and C++: are implicitly converted to integermycolor++ assigns green to mycolor • Readability: code is more meaningful • Reliability: compiler can check operations and ranges PLLab, NTHU,Cs2403 Programming Languages
Subrange types • A contiguous subsequence of an ordinal type • (Pascal) type uppercase = ‘A’ .. ‘Z’; index = 1..100; • Subrange types are different from derived types (Ada) type Cash is new INTEGER range 1..100; subtype Score is INTEGER range 1..100; • type Cash (derived) is not compatible with any INTEGER type • type Score (subrange) is compatible with any INTEGER type PLLab, NTHU,Cs2403 Programming Languages
Array types • An aggregate of homogeneous data elements • Indexing: mapping to an element • Subscript range checking: • C, C++, FORTRAN: no range checking • Pascal, Ada, and Java: yes • Subscript type: • FORTRAN, C, Java: integer only • Pascal: any ordinal type (int, char, boolean, enum) • # of subscripts: • FORTRAN I: 3 (max) • FORTRAN 77: 7 (max) • Others: no limit PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) #include <stdio.h> int data[10]; main( ) { … } • Static array • storage: statically bound • subscript range: statically bound • e.g. global arrays in C/C++ • Fixed stack-dynamic array • storage: dynamically bound • subscript range: statically bound • e.g. local arrays in C/C++ #include <stdio.h> int data[10]; main( ) { int bounds[20]; … } PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) (con’t) • Heap-dynamic array • both: dynamic bound • bound multiple times • e.g. (FORTRAN 90) INTEGER, ALLOCTABLE, ARRAY (:,:) :: MAT (MAT is 2-dim array) ALLOCATE(MAT(10, 5)) (MAT has 10 rows, 5 cols) DEALLOCATE MAT (deallocate MAT’s storage) • Stack-dynamic array • storage: dynamic bound • subscript range: dynamic • only bound once (Ada) GET(LEN) declare LIST : array (1..LEN) of integer begin … end; PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) (con’t) • Heap-dynamic array in Perl/JavaScript @home = {“couch”, “chair”, “table”, “stove” }; // $home[0] = “couch” // $home[1] = “chair” // $home[2] = “table” // $home[3] = “stove” $home[4] = “bed”; • Heap-dynamic array in C/C++ sample( ) { int bounds*, A; bound = (int *) malloc(40); A = bound[3]; free(bound); bound = (int *) malloc(80); A = bound[10]; free(bound); } PLLab, NTHU,Cs2403 Programming Languages
Array implementation • One-dim array address calculation • addr(a[j]) = addr(a[0]) + j * elementSize; • Ex: int bounds[10]; // one int = 4 bytes &bounds[0] = 1200 &bounds[4] = • Row-major allocation • addr(a[i,j]) = addr(a[0,0]) + ((i * n) + j) * elementSize; • ex:a[0, ] = 3 4 7 a[1, ] = 6 2 5 3, 4, 7, 6, 2, 5, 1, 3, 8 a[2, ] = 1 3 8 &a[0,0] = 1200 &a[1,2] = 1216 1220 PLLab, NTHU,Cs2403 Programming Languages
Column-major allocation • Column-major allocation • addr(a[i, j]) = addr(a[0,0]) + ((j * n) + i) * elementSize; • Ex: a[0, ] = 3 4 7 a[1, ] = 6 2 5 3, 6, 1, 4, 2, 3, 7, 5, 8 a[2, ] = 1 3 8 &a[0,0] = 1200 &a[1,2] = 1228 PLLab, NTHU,Cs2403 Programming Languages
Associative arrays • an unordered collection of data elements that are indexed by an equal # of keys • example in Perl %salaries = { “John” => 5000, “Tom” => 3500, “Mary” => 6000}; $salaries{“Perry”} = 6500; { add an entry into this array } tax = $salaries{“John”}; {tax is assigned with 5000} delete $salaries{“Tom”}; {remove the entry with key “Tom”} @salaries = (); {remove all entries in this array} PLLab, NTHU,Cs2403 Programming Languages
Record types • A heterogeneous aggregate of data elements identified by names • Ex: (Ada) EMPLOYEE_RECORD : record EMPLOYEE_NAME : record FIRST : STRING (1..20); LAST : STRING (1..20); end record; SALARY : FLOAT; end record PLLab, NTHU,Cs2403 Programming Languages
Record types (con’t) • C/C++: struct • Field reference • COBOL: field OF Name1 OF … OF NameN • Others: Name1.Name2. … .NameN.field • Pascal and Module-2 provide with clause with employee do begin name := ‘Bob’; age := 42; end • Implementation: field are stored in adjacent memory, and accessed through offset address PLLab, NTHU,Cs2403 Programming Languages
Array vs. Record • Array: same type of data • int data[100]; • A = data[35]; • &data[35] = 1340 • Record: diff type of data • struct { char name[32]; int age; float salary; } dd; • A = dd.age; • &dd.age = 1632 PLLab, NTHU,Cs2403 Programming Languages
Union types typedef union { float A; short B; } Sample; Sample data; main( ) { float R1 = 3.231; short N1; data.A = R1; N1 = data.B; // N1 = error number } • Whose variables are allowed to store diff type values at diff time • FORTRAN: EQUIVALENCE • C and C++: union (free union: free from type checking) B data A PLLab, NTHU,Cs2403 Programming Languages
Pointer type • Usage: indirect addressing & dynamic storage management • Operator: * (C, C++), access (Ada), ^ (Pascal) • Problems: • Dangling pointer: points to a heap-dynamic var that has been deallocated • Memory leakage (aka lost heap-dynamic var): an allocated var that is no longer accessible • These 2 problems cause a large part of bugs in software int *p1, *p2; p1 = (int) malloc(sizeof(int)); p2 = p1; delete p1; value = *p2; int *p1, *p2; p1 = (int) malloc(sizeof(int)); p2 = (int) malloc(sizeof(int)); p1 = (int) malloc(sizeof(int)); delete p1; PLLab, NTHU,Cs2403 Programming Languages
dynamic variable Solution to pointer problems • Tombstone approach • Each dynamic storage has a special cell, called tombstone, that points to this storage • Pointer(s) point to the tombstone • Deallocate: set tombstone to NULL • Disads: cost in time & space as tombstone is never deallocated and reused Heap Heap dynamic variable 0 tombstone dangling pointers PLLab, NTHU,Cs2403 Programming Languages
Solution to pointer problems • Locks-and-keys approach • Pointer value represented by (key, address) • Dynamic storage represented by (lock, storage) • Each access will first check if (key == lock), unmatched pair causes run-time error. • Deallocation: lock invalid value PLLab, NTHU,Cs2403 Programming Languages