230 likes | 251 Views
Symbol Tables. Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc. Typical symbol table operations are Insert, Delete and Search It's a dictionary structure!. Symbol Tables.
E N D
Symbol Tables • Symbol tables are used by compilers to keep track of information about • variables • functions • class names • type names • temporary variables • etc. • Typical symbol table operations are Insert, Delete and Search • It's a dictionary structure!
Symbol Tables • What kind of information is usually stored in a symbol table? • type • storage class • size • scope • stack frame offset • register • We also need a way to keep track of reserved words.
Symbol Tables • Where is a symbol table stored? • array/linked list • simple, but linear lookup time • However, we may use a sorted array for reserved words, since they are generally few and known in advance. • balanced tree • O(lgn) lookup time • hash table • most common implementation • O(1) amortized time for dictionary operations
Hashing • Hash tables • use array of size m to store elements • given key k (the identifier name), use a function h to compute index h(k) for that key • collisions are possible • two keys hash into the same slot. • Hash functions • A good hash function • is easy to compute • avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)
Hashing • In the following slides: • k is a key • h(k) is the hash function • m is the size of the hash table • n is the number of keys in the hash table
Hashing • What makes a good hash function? • It is easy to compute • It minimizes collisions. • hash = to chop into small pieces (Merriam- Webster) =to chop any patterns in the keys so that the results are uniformly distributed (cs311)
Hashing • When the key is a string, we generally use the ASCII values of its characters in some way: • Examples for k = c1c2c3...cx • h(k) = (c1128x-1+c2128x-2+...+cx1280) mod m • h(k) = (c1+c2+...+cx) mod m • h(k) = (h1(c1)+h2(c2)+...hx(cx)) mod m, where each hi is an independent hash function.
Hash functions Truncation • Ignore part of the key and use the remaining part directly as the index. • Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digit could make the hash function. • Not a very good method : does not distribute keys uniformly
Hash functions Folding • Break up the key in parts and combine them in some way. • Example : if the keys are 9 digit numbers, break up a key into three 3-digit numbers and add them up.
Hash functions Middle square • Compute k*k and pick some digits from the resulting number. • Example : given a 9-digit key k, and a hash table of size 1000 pick three digits from the middle of the number k*k. • Works fairly well in practice if the keys do not have many leading or trailing zeroes.
Hash functions Division • h(k)=k mod m • Fast • Not all values of m are suitable for this. For example powers of 2 should be avoided because then k mod m is just the least significant digits of k • Good values for m are prime numbers .
Hash functions Multiplication • h(k)=m (k c- k c) , 0<c<1 • In English : • Multiply the key k by a constant c, 0<c<1 • Take the fractional part of kc • Multiply that by m • Take the floor of the result • The value of m does not make a difference • Some values of c work better than others • A good value is
Hash functions Multiplication • Example: Suppose the size of the table, m, is 1301. For k=1234, h(k)=850 For k=1235, h(k)=353 For k=1236, h(k)=115 For k=1237, h(k)=660 For k=1238, h(k)=164 For k=1239, h(k)=968 For k=1240, h(k)=471 nice distribution!
Hash functions Universal Hashing • Worst-case scenario: The chosen keys all hash to the same slot. This can be avoided if the hash function is not fixed: • Start with a collection of hash functions • Select one at random and use that. • Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.
Load factor • Given a hash table of size m, and n elements stored in it, we define theload factor of the table as=n/m • The load factor gives us an indication of how full the table is. • The possible values of the load factor depend on the method we use for resolving collisions.
Resolving collisions: Chaining • Chaining* • Put all the elements that collide in a chain (list) attached to the slot. • The hash table is an array of linked lists • The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1. * a.k.a. closed addressing
Resolving collisions: Chaining • Insert/Delete/Lookup in expected O(1) time • Keep the list doubly-linked to facilitate deletions • Worst case of lookup time is linear. • However, this assumes that the chains are kept small. • If the chains start becoming too long, the table must be enlarged and all the keys rehashed.
Resolving collisions: Chaining • Assumption: simple uniform hashing • any given key is equally likely to hash into any of the m slots • Analysis of unsuccessful search: • average time to search unsuccessfully for key k = the average time to search to the end of a chain. • The average length of a chain is . • Total (average) time required : (1+ )
Resolving collisions: Chaining • Analysis of successful search: • Expected number e of elements examined during a successful search for key k= one more than the expected number of elements examined when k was inserted. • it makes no difference whether we insert at the beginning or the end of the list. • Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the i th element was added:
Resolving collisions: Chaining Total time : (1+ )
Resolving collisions: Chaining • Both types of search take (1+ ) time on average. • If n=O(m), then =O(1) and the total time for Search is O(1) on average • Insert : O(1) in the worst case • Delete : O(1) in the worst case
Resolving collisions: Chaining • Storage for the elements may be allocated and deallocated within the hash table itself by linking all unused slots into a free list. • Insert: • if key k hashes into empty slot h(k), put it there and set a flag to indicate that this is the actual position where the element hashed. • if h(k) is not empty, and the element k1 it contains has its flag set, then use a slot off the free list to store k1. Its flag should be unset. • if h(k) is not empty, and the element k1 it contains has its flag unset, then move k1 to another slot and store k in h(k).