A Knowledge Sharing Session on Unit IV: Tables (DSPS)
Syllabus:
Symbol Tables: Static and dynamic tree tables, AVL trees, AVL Tree Implementation, Algorithms and analysis of AVL Tree.
Hash Tables: Basic Concepts, Hash Function, Hashing methods, Collision resolution, Bucket hashing, Dynamic Hashing.
Tables | Unit IV of DSPS (SE-Comp)
Part I: Symbol Tables. Static and dynamic tree tables, AVL trees, AVL Tree Implementation, Algorithms and analysis of AVL Tree.
Part II: Hash Tables. Basic Concepts, Hash Function, Hashing methods, Collision resolution, Bucket hashing, Dynamic Hashing.
Symbol Table | Why Symbol Table
What does a compiler do?
• Lexical analysis: detects inputs with illegal tokens, e.g. main$ ();
• Parsing: detects inputs with ill-formed parse trees, e.g. missing semicolons
• Semantic analysis: the last “front end” phase; catches all remaining errors
Symbol Table | Why Symbol Table
Typical semantic errors
• Multiple declarations: a variable should be declared (in the same region) at most once.
• Undeclared variable: a variable should not be used before being declared.
• Type mismatch: the type of the left-hand side of an assignment should match the type of the right-hand side.
• Wrong arguments: methods should be called with the right number and types of arguments.
Symbol Table | Aim of Symbol Table
Purpose of the symbol table
• Keep track of the names declared in the program
• Names of variables, classes, fields, methods
Symbol Table | What the Symbol Table Stores
The symbol table associates a name with a set of attributes, e.g.:
• kind of name (variable, class, field, method, etc.)
• type (int, float, etc.)
• nesting level
• memory location (i.e., where it will be found at runtime)
Symbol Table | Symbol Table Revisited
In short:
• During lexical analysis: symbols are found and added to the symbol table.
• During syntactic analysis: information about each symbol is filled in.
• During semantic analysis: the table is used for type checking.
Symbol Table | Why Is the Symbol Table Important?
Questions the symbol table answers:
• Given an identifier, which name is it?
• What information is to be associated with a name? (Actual characters of the name, type, storage allocation info (number of bytes), line number where declared, lines where referenced, scope.)
• How do we access this information?
• How do we associate this information with a name?
Symbol Table | Reminder on Symbol Table
Note: a name can represent a
• Variable
• Type
• Constant
• Parameter
• Record
• Record field
• Procedure
• Array
• Label
• File
Symbol Table | Operations on Symbol Table
Operations on the symbol table
• determining whether a string has already been stored
• inserting an entry for a string
• deleting a string when it goes out of scope
This requires three functions:
1. lookup(s): returns the index of the entry for string s, or 0 if there is no entry
2. insert(s): adds a new entry for string s and returns its index
3. delete(s): deletes s from the table (or, typically, hides it)
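The three functions above can be sketched in C as follows. This is only a minimal illustration over an unsorted array: the names st_lookup, st_insert, st_delete, the capacity ST_MAX and the "hidden" flag are assumptions of the sketch, not part of the slides. Entry 0 is left unused so that index 0 can mean "no entry", and delete hides the entry rather than removing it.

```c
#include <string.h>

#define ST_MAX 64        /* capacity, assumed for this sketch */
#define ST_NAME_LEN 32   /* maximum stored name length, assumed */

struct st_entry {
    char name[ST_NAME_LEN];
    int hidden;          /* 1 once the name has gone out of scope */
};

/* Entry 0 is unused so that index 0 can mean "not found". */
static struct st_entry st_table[ST_MAX + 1];
static int st_count = 0;

/* lookup(s): index of the newest visible entry for s, or 0 if none.
   Searching newest-first lets an inner declaration shadow an outer one. */
int st_lookup(const char *s)
{
    for (int i = st_count; i >= 1; i--)
        if (!st_table[i].hidden && strcmp(st_table[i].name, s) == 0)
            return i;
    return 0;
}

/* insert(s): append a new entry and return its index (0 if the table is full). */
int st_insert(const char *s)
{
    if (st_count == ST_MAX)
        return 0;
    st_count++;
    strncpy(st_table[st_count].name, s, ST_NAME_LEN - 1);
    st_table[st_count].name[ST_NAME_LEN - 1] = '\0';
    st_table[st_count].hidden = 0;
    return st_count;
}

/* delete(s): hide the newest visible entry for s; returns its index or 0. */
int st_delete(const char *s)
{
    int i = st_lookup(s);
    if (i)
        st_table[i].hidden = 1;
    return i;
}
```

With this layout, inserting a, b and then a second a makes lookup("a") return the inner entry; after delete("a") the outer entry becomes visible again.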
Symbol Table | Symbol Table Examples
Example
01 PROGRAM Main
02 GLOBAL a,b
03 PROCEDURE P (PARAMETER x)
04 LOCAL a
05 BEGIN {P}
06 …a…
07 …b…
08 …x…
09 END {P}
10 BEGIN {Main}
11 Call P(a)
12 END {Main}
Symbol Table | Unsorted List
Entries for the example program Main, in order of declaration.
Lookup complexity: O(n), since the entries must be searched sequentially.
(Characteristics: Class, Scope. Other attributes: Declared, Referenced, Other.)

Name  Class      Scope  Declared  Referenced  Other
Main  Program    0      Line 1
a     Variable   0      Line 2    Line 11
b     Variable   0      Line 2    Line 7
P     Procedure  0      Line 3    Line 11     1, parameter, x
x     Parameter  1      Line 3    Line 8
a     Variable   1      Line 4    Line 6
Symbol Table | Sorted List
Entries for the example program Main, sorted by name.
Lookup complexity, worst case: O(log n) using binary search.
(Characteristics: Class, Scope. Other attributes: Declared, Referenced, Other.)

Name  Class      Scope  Declared  Referenced  Other
a     Variable   0      Line 2    Line 11
a     Variable   1      Line 4    Line 6
b     Variable   0      Line 2    Line 7
Main  Program    0      Line 1
P     Procedure  0      Line 3    Line 11     1, parameter, x
x     Parameter  1      Line 3    Line 8
Two issues: 1. Interface: how to use symbol tables 2. Implementation: how to implement it.
Basic Implementation Techniques
Considerations:
• Number of names
• Storage space
• Retrieval time
<1> Unordered list (linked list/array)
<2> Ordered list
• binary search on arrays
• expensive insertion
• (+) good for a fixed set of names (e.g. reserved words, assembly opcodes)
<3> Binary search tree
• On average, searching takes O(log(n)) time.
• However, names in programs are not chosen randomly.
<4> AVL
<5> Hash table: most common
• (+) constant time
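As an illustration of technique <2>, here is a small sketch of binary search over a fixed, sorted set of reserved words, using the standard C library function bsearch. The word list and the name reserved_index are assumptions made for the example only.

```c
#include <stdlib.h>
#include <string.h>

/* A fixed set of names, kept in sorted (strcmp) order. The list is illustrative. */
static const char *const reserved[] = {
    "do", "else", "for", "if", "int", "return", "while"
};
#define N_RESERVED (sizeof reserved / sizeof reserved[0])

/* Comparison callback for bsearch: key is a string, elem points to an array slot. */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the index of s in reserved[], or -1 if s is not a reserved word. */
int reserved_index(const char *s)
{
    const char *const *hit =
        bsearch(s, reserved, N_RESERVED, sizeof reserved[0], cmp_name);
    return hit ? (int)(hit - reserved) : -1;
}
```

Lookup takes O(log n) comparisons, but adding a word means keeping the array sorted, which is why this layout suits a set of names that never changes.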
Static Tree Table
• Used if the symbols are known in advance
• No insertion or deletion is allowed
• The cost of searching symbols of higher frequency should be small
• Examples: Huffman tree and OBST
Fig: Huffman tree (edges labelled 0 and 1; leaves a, b, c, d, e)
Fig: Optimal search tree when the frequencies of the symbols are specified (keys: if, do, Read, while)
Dynamic Tree Tables
• Symbols are inserted as and when they come
• Deletion is also possible
• Example: AVL tree
Fig: BST with keys 50, 32, 60, 20, 45, 55, 68
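To make the dynamic case concrete, the following is a hedged sketch of AVL insertion in C. The slides do not give an implementation, so the struct layout and the names avl_insert, rotate_left and rotate_right are assumptions; deletion is omitted, and the sketch simply aborts if memory runs out.

```c
#include <stdlib.h>

struct avl {
    int key;
    int height;               /* height of the subtree rooted here; a leaf has 1 */
    struct avl *left, *right;
};

static int height(const struct avl *n) { return n ? n->height : 0; }

static void fix_height(struct avl *n)
{
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Right rotation: the left child becomes the new subtree root. */
static struct avl *rotate_right(struct avl *y)
{
    struct avl *x = y->left;
    y->left = x->right;
    x->right = y;
    fix_height(y);
    fix_height(x);
    return x;
}

/* Left rotation: the right child becomes the new subtree root. */
static struct avl *rotate_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    fix_height(x);
    fix_height(y);
    return y;
}

/* Insert key and return the (possibly new) root of the subtree. */
struct avl *avl_insert(struct avl *n, int key)
{
    if (!n) {
        n = malloc(sizeof *n);
        if (!n)
            abort();          /* simplification for the sketch */
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < n->key)
        n->left = avl_insert(n->left, key);
    else if (key > n->key)
        n->right = avl_insert(n->right, key);
    else
        return n;             /* duplicate keys are ignored */

    fix_height(n);
    int balance = height(n->left) - height(n->right);
    if (balance > 1) {        /* left-heavy */
        if (key > n->left->key)
            n->left = rotate_left(n->left);     /* left-right case */
        return rotate_right(n);
    }
    if (balance < -1) {       /* right-heavy */
        if (key < n->right->key)
            n->right = rotate_right(n->right);  /* right-left case */
        return rotate_left(n);
    }
    return n;
}
```

Inserting 20, 32, 45, 50, 55, 60, 68 in ascending order, which would degenerate into a list in a plain BST, leaves a tree with 50 at the root, 32 and 60 as its children, and height 3.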
Hash Table | Motivation
Where is hashing used?
• docDict
• Databases
• Compilers
• Network routers and servers
• Substring search
• Cryptography
Hash Table | Why Hash Table
The problem: we have to store some records and perform the following operations:
• add a new record
• delete a record
• search for a record by key
Find a way to do these efficiently!
Hash Table | Unsorted Array
Use an array to store the records, in unsorted order.
1. add: add the record as the last entry; fast, O(1)
2. delete a target: slow at finding the target, fast at filling the hole (just take the last entry); O(n)
3. search: sequential search; slow, O(n)
Hash Table | Sorted Array
Use an array to store the records, keeping them in sorted order.
1. add: insert the record in its proper position; much record movement; slow, O(n)
2. delete a target: how to handle the hole after deletion? Much record movement; slow, O(n)
3. search: binary search; fast, O(log n)
Hash Table | Linked List
Store the records in a linked list (unsorted).
1. add: fast if one can insert a node anywhere; O(1)
2. delete a target: fast at disposing of the node, but slow at finding the target; O(n)
3. search: sequential search; slow, O(n)
(If we only use a linked list, we cannot use binary search even if the list is sorted.)
Hash Table | More Approaches
What is the solution then? These structures have better performance but are more complex:
• Hash table
• Tree (BST, Heap, …)
Hash Table | Array as table?

studid   name     score
0012345  sandy    81.5
0033333  bubli    90
0056789  david    56.8
...
9801010  peter    20
9802020  manali   100
...
9903030  tushar   73
9908080  Namrata  49
Hash Table | Array as table?
One naive way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345].

index    name   score
0
:
12345    andy   81.5
:
33333    betty  90
:
56789    david  56.8
:
9908080  bill   49
:
9999999
Hash Table | What's Wrong Then?
Consider this problem: we want to store 1,000 student records and search them by student id. Storing them in a huge array (index 0..9999999), with the student id used as the index, reserves ten million slots for only 1,000 records.
Hash Table | What's Wrong Then?
• Keys may not be nonnegative integers. Solution: prehash.
• Gigantic memory hog. Solution: direct hash table (reduce the universe of all keys to a reasonable size).
Hash Table | Direct Hashing Table
• Each slot, or position, corresponds to a key in U.
• If there’s an element x with key k, then T[k] contains a pointer to x.
• Otherwise, T[k] is empty, represented by NIL.
Hash Table | Direct Hashing Table
Store the records in a huge array where the index corresponds to the key.
• add: very fast, O(1)
• delete: very fast, O(1)
• search: very fast, O(1)
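A direct table like the one described above might look as follows in C. This is a sketch under stated assumptions: the universe size U_SIZE, the record layout and the function names da_insert, da_delete and da_search are invented for illustration, and the caller is expected to pass keys in the range 0..U_SIZE-1.

```c
#include <stddef.h>

#define U_SIZE 1000   /* size of the key universe U, assumed small here */

struct record {
    int key;
    const char *name;
    double score;
};

/* T[k] holds a pointer to the element with key k; NULL plays the role of NIL. */
static struct record *T[U_SIZE];

/* Each operation touches exactly one slot, hence O(1). */
void da_insert(struct record *x) { T[x->key] = x; }
void da_delete(int k)            { T[k] = NULL; }
struct record *da_search(int k)  { return T[k]; }
```

Every operation is a single array access, which is where the O(1) bounds come from; the cost is one slot per possible key, whether or not it is used.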
Hash Table | Hash function
function Hash(key: KeyType): integer;
Imagine that we have such a magic function Hash. It maps the keys (studid) of the 1000 records into the integers 0..999, one to one. No two different keys map to the same number.
H(‘0012345’) = 134
H(‘0033333’) = 67
H(‘0056789’) = 764
…
H(‘9908080’) = 3
Hash Table | Hash Table
To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).

index  studid   name   score
0
:
3      9908080  bill   49
:
67     0033333  betty  90
:
134    0012345  andy   81.5
:
764    0056789  david  56.8
:
999
Hash Table | Division Method
h(k) = k mod m
Ex: key mod size: 2201 mod 1000 = 201
Hash Table | Collision
A collision occurs when different keys map to the same index, i.e. h(k1) = h(k2) = i with k1 != k2.
Ex: 5 mod 11 and 27 mod 11 both give index 5.
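The division method and the collision example can be checked with a few lines of C. The function names h_div and collides are assumptions of this sketch; keys are taken to be non-negative and the table size m greater than zero.

```c
/* Division method: h(k) = k mod m, for a non-negative key k and table size m > 0. */
unsigned h_div(unsigned k, unsigned m)
{
    return k % m;
}

/* Two different keys collide when they map to the same index. */
int collides(unsigned k1, unsigned k2, unsigned m)
{
    return k1 != k2 && h_div(k1, m) == h_div(k2, m);
}
```

Here h_div(2201, 1000) is 201, and collides(5, 27, 11) is true because both keys land on index 5.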
Hashing
• Widely useful technique for implementing dictionaries
• Constant time per operation (on the average)
• Best case: O(1)
• Worst case: O(n)
Fig: a key is mapped by f() to the address (slot 0..5) of its record
Characteristics of a Hash Function
• Quick computation
• It should spread keys evenly: uniform distribution
• Avoid collisions (collision-free mappings are very rare cases, e.g. the birthday paradox)
Hash Functions
• Direct hashing
• Digit extraction
• Modulo-division method
• Mid-square method
• Folding method
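The slides only name these methods, so the sketch below fixes some details purely for illustration: digit extraction picks the 5th, 3rd and 1st decimal digits counted from the right, mid-square drops the two lowest digits of the square and keeps the next three, and folding adds 3-digit groups and keeps the last three digits of the sum. All three produce an address in 0..999; other textbooks choose the digits differently.

```c
/* Digit extraction: build the address from selected digits of the key.
   Assumption: use the 5th, 3rd and 1st decimal digits counted from the right. */
unsigned hash_digit_extract(unsigned long key)
{
    unsigned d1 = (unsigned)(key % 10);
    unsigned d3 = (unsigned)((key / 100) % 10);
    unsigned d5 = (unsigned)((key / 10000) % 10);
    return d5 * 100 + d3 * 10 + d1;
}

/* Mid-square: square the key and take digits from the middle of the square.
   Assumption: drop the two lowest digits, keep the next three. */
unsigned hash_mid_square(unsigned long key)
{
    unsigned long long sq = (unsigned long long)key * key;
    return (unsigned)((sq / 100) % 1000);
}

/* Folding (fold shift): split the key into 3-digit groups from the right,
   add the groups, and keep the last three digits of the sum. */
unsigned hash_folding(unsigned long key)
{
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return (unsigned)(sum % 1000);
}
```

For example, key 379452 gives 742 by digit extraction, 9452 squared is 89340304 so mid-square gives 403, and 123456789 folds to 123 + 456 + 789 = 1368, i.e. address 368.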
Hash Table | Collision Resolution
• Hashing with separate chaining (open hashing): unlimited space
• Hashing with open addressing (closed hashing)
Hash Table | Collision Resolution Strategies
• Separate chaining
• Open addressing
  • Linear probing (LP)
    • LP without chaining
    • LP with chaining (LPWC)
      • LPWC with replacement
      • LPWC without replacement
  • Quadratic probing
  • Double hashing
Hash Table | Chained Hash Table
One way to handle collisions is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
Fig: array of list heads indexed 0..HASHMAX; empty entries hold nil, and a list node stores a record such as key 9903030, name tom, score 73
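A chained table along these lines could be sketched in C as below. Only the key is stored per node to keep the example short; the table size HASHMAX = 11, the division hash and the names chain_insert, chain_search and chain_delete are assumptions, and keys are taken to be non-negative.

```c
#include <stdlib.h>

#define HASHMAX 11   /* number of list heads, assumed for the sketch */

struct node {
    int key;
    struct node *next;
};

/* Each array entry is the head of a list; NULL stands for nil. */
static struct node *chain[HASHMAX];

static int hash(int key) { return key % HASHMAX; }   /* assumes key >= 0 */

/* Insert at the head of the list for hash(key); 0 on success, -1 if out of memory. */
int chain_insert(int key)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    int i = hash(key);
    n->key = key;
    n->next = chain[i];
    chain[i] = n;
    return 0;
}

/* Walk only the list that the key hashes to. */
struct node *chain_search(int key)
{
    for (struct node *n = chain[hash(key)]; n; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Unlink and free the first node holding key; returns 1 if one was removed. */
int chain_delete(int key)
{
    struct node **pp = &chain[hash(key)];
    while (*pp) {
        if ((*pp)->key == key) {
            struct node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}
```

Keys 5 and 27 both hash to index 5 and simply share one list, so a collision costs a slightly longer walk rather than a failed insert.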
Hash Table | Rehashing
Rehashing is required:
• when the table is completely full
• with quadratic probing, when the table is half full
• when an insertion fails due to overflow
What happens:
• The table size is doubled after rehashing.
• The mod value is changed to the new size.
• It is very costly: a new table is created and every entry of the old table is reinserted using the new hash function.
Hash Table | Rehashing
Rehashing is more efficient when the load factor is >= 70%, where the load factor is l = h/t, h is the total number of mapped (occupied) locations and t is the total number of locations.
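The rehashing steps can be sketched as follows for a linear-probing table of integer keys. The struct layout, the function names and the decision to rehash just before an insert once h/t has reached 70% are assumptions of this sketch; the threshold is tested with integers (10h >= 7t) to avoid floating-point comparisons, and keys are taken to be non-negative.

```c
#include <stdlib.h>

struct table {
    int  *slot;    /* stored keys */
    char *used;    /* used[j] is 1 if slot j is occupied */
    int   size;    /* t: total locations */
    int   count;   /* h: mapped (occupied) locations */
};

/* Allocate an empty table; 0 on success, -1 if out of memory. */
int table_init(struct table *tb, int size)
{
    tb->slot = calloc((size_t)size, sizeof *tb->slot);
    tb->used = calloc((size_t)size, sizeof *tb->used);
    tb->size = size;
    tb->count = 0;
    if (!tb->slot || !tb->used) {
        free(tb->slot);
        free(tb->used);
        return -1;
    }
    return 0;
}

/* Linear probing from key % size; assumes at least one free slot. */
static void place(struct table *tb, int key)
{
    int j = key % tb->size;
    while (tb->used[j])
        j = (j + 1) % tb->size;
    tb->slot[j] = key;
    tb->used[j] = 1;
    tb->count++;
}

/* Double the size and reinsert every key using the new mod value. */
static int rehash(struct table *tb)
{
    struct table bigger;
    if (table_init(&bigger, 2 * tb->size) != 0)
        return -1;
    for (int j = 0; j < tb->size; j++)
        if (tb->used[j])
            place(&bigger, tb->slot[j]);
    free(tb->slot);
    free(tb->used);
    *tb = bigger;
    return 0;
}

/* Insert key, rehashing first if the load factor h/t has reached 70%. */
int table_insert(struct table *tb, int key)
{
    if (tb->count * 10 >= tb->size * 7 && rehash(tb) != 0)
        return -1;
    place(tb, key);
    return 0;
}

/* Search along the probe sequence until the key or an empty slot is found. */
int table_contains(const struct table *tb, int key)
{
    int j = key % tb->size;
    for (int i = 0; i < tb->size && tb->used[j]; i++) {
        if (tb->slot[j] == key)
            return 1;
        j = (j + 1) % tb->size;
    }
    return 0;
}
```

Starting from 10 locations, seven inserts bring the load factor to 70%; the eighth insert doubles the table to 20 locations and recomputes every key's position with mod 20.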
Types of Linear Probing (with chaining, with and without replacement)
Note: Try to solve all the examples taken in class on transparencies and on the board. You can also take them from the book.
Extendible Hashing
• All techniques so far are used for small data.
• When the data becomes bulky there will be too many disk accesses.
• In that case, use extendible hashing.
• It uses binary coding to map (disk) locations to binary values.
• Example: a hash table of size 4 with 4 slots:
• 00
• 01
• 10
• 11
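As a small illustration of the binary mapping, the function below picks one of the four slots 00, 01, 10, 11 from a hash value. Using the low-order bits, and the name dir_index, are assumptions of this sketch (some presentations use the high-order bits instead); depth is the number of bits in use and must be smaller than the width of unsigned.

```c
/* Slot index = the lowest `depth` bits of the hash value.
   With depth = 2 the possible results are 00, 01, 10 and 11. */
unsigned dir_index(unsigned hash_value, unsigned depth)
{
    return hash_value & ((1u << depth) - 1u);
}
```

For instance, 13 is 1101 in binary, so with depth 2 it maps to slot 01; raising depth to 3 would double the number of slots and map it to 101.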
Implementation
The following are some examples of how to create the structure and apply a hash function to it:
• Linear probing with store and search
• Double hashing
• Quadratic probing
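Linear Probe (store)

    #define MAX 10   /* table size; the value is assumed, the slide leaves MAX undefined */

    /* Store key in the first free slot found by linear probing.
       T[j] is 1 if slot j is occupied and 0 if it is free.
       The slide names this function search_LP, but it stores the key.
       Returns the slot index, or -1 if the table is full. */
    int store_LP(int hashtable[], int key, int T[])
    {
        int i, j;
        j = key % MAX;              /* mapped location */
        for (i = 0; i < MAX; i++) {
            if (T[j] == 0) {
                hashtable[j] = key;
                T[j] = 1;
                return j;
            }
            j = (j + 1) % MAX;      /* next location, in a circular way */
        }
        return -1;
    }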
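Search in LP
Only the condition checked inside the for loop changes:

    for (i = 0; i < MAX; i++) {
        if (T[j] == 1 && hashtable[j] == key) {
            return j;               /* key found at slot j */
        }
        j = (j + 1) % MAX;          /* next location, in a circular way */
    }
    return -1;                      /* key is not in the table */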
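Quadratic probing and double hashing are listed under Implementation without code, so here is one possible sketch that reuses the same hashtable[] and T[] arrays. The prime table size MAX = 11, the second hash 7 - key % 7 and the names probe_quadratic, probe_double and store_probe are assumptions, not taken from the slides; keys are taken to be non-negative.

```c
#define MAX 11   /* table size; a prime is assumed so the probe sequences spread well */

/* Quadratic probing: the i-th probe is (h(key) + i*i) mod MAX. */
int probe_quadratic(int key, int i)
{
    return (key % MAX + i * i) % MAX;
}

/* Double hashing: the i-th probe is (h1(key) + i*h2(key)) mod MAX.
   Assumed second hash: h2(key) = 7 - key % 7, which is never zero. */
int probe_double(int key, int i)
{
    return (key % MAX + i * (7 - key % 7)) % MAX;
}

/* Store key using the given probe sequence.
   T[j] is 1 if slot j is occupied and 0 if it is free.
   Returns the slot index, or -1 if no free slot was found in MAX probes. */
int store_probe(int hashtable[], int T[], int key, int (*probe)(int, int))
{
    for (int i = 0; i < MAX; i++) {
        int j = probe(key, i);
        if (T[j] == 0) {
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;
}
```

With quadratic probing, keys 5, 27 and 16 (all with home slot 5) end up in slots 5, 6 and 9. With double hashing, 5 goes to slot 5 and 16 then steps by 5 to slot 10. Unlike linear probing, quadratic probing can return -1 while free slots remain, which is one reason the slides call for rehashing once the table is half full.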