200 likes | 324 Views
CSC 2300 Data Structures & Algorithms. February 27, 2007 Chapter 5. Hashing. Today – Splay Trees and Hashing. Splay Trees – Delete a Node Hashing – Overview Hashing – Separate Chaining. Splay Tree – Delete a Node. How do we delete a node? First, access the node.
E N D
CSC 2300Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing
Today – Splay Trees and Hashing • Splay Trees – Delete a Node • Hashing – Overview • Hashing – Separate Chaining
Splay Tree – Delete a Node • How do we delete a node? • First, access the node. • This puts the node at the root. • After we delete the root, we get TL and TR. • Which node should we select as the new root? • The largest element in TL. • Find this element (easy), and rotate it to the root position of TL. • Will this new root of TL have a right child? • If not, what can we do?
Examples in Class • We will illustrate the idea with a few examples in class: http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm
Zig-zag and Zag-zig • The web site uses a different (and perhaps more intuitive) way to describe the actions of Zig-zag and Zag-zig. We will use the definitions in the text. • Zig-zag as in text: • The web site calls it zag-zig – we do a zag first, and then a zig.
Integer Input Keys • If the input keys are integers, then a good strategy is to compute key mod tableSize. • When may this strategy become a bad choice? • As an example, let tableSize=10. Can you suggest a sequence of input keys that will all be mapped to the same cell? • What is then a good choice for tableSize?
String Input Keys • Choose tableSize as a prime number – a good choice according to the previous slide. • Since memory is cheap, construct a large table. • Let tableSize=10,007. • Suppose that all keys are eight or fewer characters long. • Use this hash function: • What can happen to the hash table? • Hint. What is the maximum integer value of an ASCII character?
String Input Keys • So, we want hashVal to be large – greater than 10,007. • Multiplication may be more appropriate than addition. • Try this hash function: • This function uses the first three characters of the input string. • We can check that 26 x 272 > 10,007. • Why do we use 26 x 272 and not 263 ? • What may go wrong with distribution in this hash table?
String Input Keys • Include also the ten numbers 0 to 9. • Get this hash function: • This function uses Horner’s rule. What is it?
Horner’s Rule • Problem 2.14 in text. • Evaluate f(x) = anxn + an-1xn-1 + … + a2x2 + a1x + a0 • Code: poly = 0; for( i=n; i>=0; i-- ) poly = x * poly + a[i]; • Why does this algorithm work? • What is its running time?
Compromises • A hash function needs not be the best with respect to table distribution. • But it should be simple and reasonably fast. • If the keys are very long, the hash function will take too long to compute. • A common practice is not to use all the characters. • As an example, consider a complete street address. • The hash function may include only a couple of characters from the street address, and a couple of characters from the city name and the zip code. • The idea is that the time saved in computing the hash function will make up for a slightly less evenly distributed function.
Collision Resolution • If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it. • There are several methods for dealing with collision, we will discuss the simplest approach: separate chaining.
Separate Chaining • Keep a list of all elements that hash to the same value. • Example. The keys are the first 10 perfect squares and the hash function is hash(x) = x mod 10.
Discussion • Another data structure could be used to resolve the collisions; for example, binary search trees. • Why do we use linked lists instead? • We define the load factor, λ, of a hash table to be the ratio of the number of elements in the table to the table size. • The average length of a list is λ. • The effort to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list. • In an unsuccessful search, what is the average number of nodes to examine? • In a successful search, what is the average number of nodes to examine? • Why 1+(λ/2) and not (λ/2)?
Hashing Applet • In class, we will run some hashing examples using this web site: http://www.engin.umd.umich.edu/CIS/course.des/cis350/hashing/WEB/HashApplet.htm