Hashing

Hashing
• Main ch. 11.2-11.5
• Background:
  • We want to store a collection of data, and we want to add to, delete from, and search in the collection.
  • What is the average-case complexity of add, delete, and search if:
    • the collection is stored as an unsorted array?
    • the collection is stored as a sorted array?
    • the collection is stored in a binary search tree?

We Want to Do Better
• Hashing has good average-case behavior.
• Suppose you want to keep track of students via their student IDs.
  • If student IDs range from 0 to 99, this is easy: the ID itself can be used as an array index.
  • What if Social Security numbers (SS#s) are used instead?
• This type of data is known as sparse data: the possible keys span a huge range, but only a few of them are actually used.
• The SS# becomes a key for obtaining the data.
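For the easy case, where every ID lies in 0 to 99, the ID can index an array directly (a scheme usually called a direct-address table). The sketch below is illustrative only: the class name and the use of a plain `String` name in place of a full student record are assumptions.

```java
// Direct addressing (illustrative): works only because every ID is in 0..99.
class DirectAddressTable {
    private final String[] names = new String[100]; // one slot per possible ID

    void add(int id, String name) {
        names[id] = name; // the ID itself is the array index
    }

    String search(int id) {
        return names[id]; // null if no student has this ID
    }

    void delete(int id) {
        names[id] = null;
    }
}
```

Each operation touches a single array slot. The same idea breaks down for 9-digit SS#s, because the array would need one slot for each of the 10^9 possible numbers while only a handful would be used.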

Suppose
• We want to keep track of only a small number of students in an array of size 10, and we use SS#s as keys.
• We need a function that maps key values (SS#s) to array indices (integers between 0 and 9).
• Such a function is called a hash function.
• An example hash function is hash(ssn) = ssn % 10.
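A minimal Java sketch of this example hash function, assuming each SS# is stored as an `int` (the class name `SsnHash` is hypothetical):

```java
// Division hash for a table of size 10: maps an SS# to an index in 0..9.
class SsnHash {
    static final int TABLE_SIZE = 10;

    static int hash(int ssn) {
        // A 9-digit SS# (at most 999,999,999) fits in an int.
        return ssn % TABLE_SIZE; // keeps only the last digit of the SS#
    }
}
```

For example, `SsnHash.hash(123456789)` returns 9 and `SsnHash.hash(555123450)` returns 0.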

Choosing a Hashing Function
• For the previous example, we could instead have used hash(ssn) = the first digit of the ssn.
• We want a hash function that distributes the keys uniformly throughout the array. This is called uniform hashing.
• If you use a division hash function (the remainder of division by the table size), it is best to have a table size that is a prime number of the form 4k+3.
• See Main, p. 552 for other kinds of hash functions.
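Following the advice on division hashing, the hypothetical helper below looks for the smallest prime of the form 4k+3 that is at least a requested size, and hashes a key by taking the remainder. It is a sketch under that assumption, not a procedure taken from Main.

```java
// Hypothetical helpers for division hashing with a prime table size of form 4k+3.
class TableSize {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime p >= n with p % 4 == 3.
    static int nextPrime4kPlus3(int n) {
        int p = Math.max(n, 3);
        while (p % 4 != 3 || !isPrime(p)) {
            p++;
        }
        return p;
    }

    // Division hash: remainder of the key divided by the table size.
    static int divisionHash(int key, int tableSize) {
        return Math.floorMod(key, tableSize); // never negative, even for negative keys
    }
}
```

With this helper, a request for 10 slots yields 11, a request for 12 yields 19 (15 has the right form but is not prime), and a request for 100 yields 103.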

What Could Go Wrong?
• If possible, store an object with key value key in array[hash(key)].
• This is not always possible: you may want to add an object whose key value hashes to an index that’s already in use.
• This is called a collision.
• What is the big-oh time complexity of a hash table lookup (search) if there are no collisions?
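To make the idea of a collision concrete, here is an illustrative table that stores each key only at `array[hash(key)]` and refuses an insert when that slot already holds a different key. The class name and the `boolean` return convention are assumptions for this sketch.

```java
// Illustrative table with no collision handling: each key goes only to slot hash(key).
class NoProbingTable {
    private final Integer[] slots = new Integer[10]; // null marks an empty slot

    private int hash(int ssn) {
        return ssn % slots.length; // same as ssn % 10
    }

    // Returns false on a collision, since this table has no way to resolve it.
    boolean add(int ssn) {
        int i = hash(ssn);
        if (slots[i] != null && slots[i] != ssn) {
            return false; // collision: the slot already holds a different key
        }
        slots[i] = ssn;
        return true;
    }

    // Looks only at slot hash(ssn).
    boolean contains(int ssn) {
        Integer stored = slots[hash(ssn)];
        return stored != null && stored == ssn;
    }
}
```

With `hash(ssn) = ssn % 10`, the SS#s 123456789 and 987654329 both hash to index 9, so the second `add` returns `false`.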

Handling Collisions
• Linear probing:
  • Place the object in the next open spot, wrapping around to the start of the array if necessary.
  • How would you find an object in a hash table that uses linear probing?
  • How would you delete an object from a hash table that uses linear probing?
  • See the example in Main, pp. 550-551.
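Linear probing can be sketched in Java as follows, assuming non-negative `int` keys and a fixed table size; the class and method names are hypothetical. Search follows exactly the same probe sequence as insertion and stops early at an empty slot.

```java
// Minimal linear-probing table (sketch): assumes non-negative int keys.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int size) {
        slots = new Integer[size];
    }

    private int hash(int key) {
        return key % slots.length;
    }

    void insert(int key) {
        int i = hash(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key; // first open spot (or the key's existing slot)
                return;
            }
            i = (i + 1) % slots.length; // next spot, wrapping around
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the slot holding key, or -1 if it is absent.
    int indexOf(int key) {
        int i = hash(key);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i] == key) {
                return i;
            }
            i = (i + 1) % slots.length;
        }
        return -1; // reached an empty slot or checked every slot
    }
}
```

In a table of size 10, inserting 123456789 fills slot 9; inserting 987654329 collides there, wraps around, and lands in slot 0, which is where `indexOf` later finds it. This sketch covers only insertion and search.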

Linear Probing
• Performance isn’t all that great, and it gets worse as the table fills.
• Easy to implement.
• As the hash table gets fuller, larger and larger consecutive stretches of the array become filled. This is called clustering.
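The hypothetical demo below counts how many slots each linear-probing insertion examines, which makes clustering visible. It assumes `int` keys that are non-negative and a table that never fills.

```java
// Clustering demo (hypothetical): probes examined by each linear-probing insertion.
class ClusteringDemo {
    static int[] probeCounts(int[] keysToInsert, int tableSize) {
        Integer[] table = new Integer[tableSize];
        int[] probes = new int[keysToInsert.length];
        for (int n = 0; n < keysToInsert.length; n++) {
            int i = keysToInsert[n] % tableSize;
            int count = 1; // the home slot is always examined
            while (table[i] != null) { // assumes the table never fills up
                i = (i + 1) % tableSize;
                count++;
            }
            table[i] = keysToInsert[n];
            probes[n] = count;
        }
        return probes;
    }
}
```

For the keys 0, 10, 20, 1, 2 in a table of size 10, the counts are 1, 2, 3, 3, 3. Keys 1 and 2 have hash values that no other key shares, yet each still needs three probes because it lands inside the cluster built up by 0, 10, and 20.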

Double Hashing
• If there is a collision, hash the key again using a second hash function, and use the result as the step size when probing the array.
• Double hashing is also called rehashing.
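A sketch of double hashing in Java. The second hash function, `1 + key % (size - 2)`, is an assumption (one common choice); with a prime table size, every step value from 1 to size - 2 is relatively prime to the size, so the probe sequence can reach every slot. Keys are assumed non-negative.

```java
// Double hashing sketch: hash2 gives the step size between probes.
class DoubleHashingTable {
    private final Integer[] slots; // null marks an empty slot

    DoubleHashingTable(int primeSize) { // assumes a prime size of at least 3
        slots = new Integer[primeSize];
    }

    private int hash1(int key) {
        return key % slots.length; // home slot
    }

    private int hash2(int key) {
        return 1 + key % (slots.length - 2); // assumed step: never 0, less than size
    }

    // Returns the slot where the key was stored.
    int insert(int key) {
        int i = hash1(key);
        int step = hash2(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key;
                return i;
            }
            i = (i + step) % slots.length; // jump by this key's own step size
        }
        throw new IllegalStateException("table is full");
    }
}
```

With a table of size 11, the keys 5, 16, and 27 all have home slot 5. Key 5 takes it, key 16 (step 8) lands in slot 2, and key 27 (step 1) lands in slot 6, so keys that collide at first follow different probe sequences.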

Chained Hashing
• Each element in the array can hold a list of elements.
• Hash the key and put the object in the list at array[hash(key)].
• See the demo at: http://www2.ics.hawaii.edu/~richardy/project/hash/applet.html
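A minimal chained hash table for `int` keys, using a `java.util.LinkedList` for each array element; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Chained hashing sketch: each array slot holds a list of the keys that hash to it.
class ChainedHashTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>()); // start with an empty list per slot
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    void add(int key) {
        LinkedList<Integer> list = buckets.get(hash(key));
        if (!list.contains(key)) {
            list.add(key); // colliding keys simply share the same list
        }
    }

    boolean contains(int key) {
        return buckets.get(hash(key)).contains(key);
    }

    boolean remove(int key) {
        // Integer.valueOf selects remove(Object) rather than remove(int index).
        return buckets.get(hash(key)).remove(Integer.valueOf(key));
    }

    int listLength(int index) {
        return buckets.get(index).size();
    }
}
```

In a table of size 10, 123456789 and 987654329 both go into the list at index 9, and removing one leaves the other in place.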

A Hash Function for Names
    private int hashFunction(String name) {
        int hashValue = 0;
        char[] cName = name.toCharArray();
        // Add up the numeric codes of the characters in the name
        for (int j = 0; j < cName.length; j++) {
            hashValue += cName[j];
        }
        // Reduce the sum to a valid table index
        return hashValue % size;
    }
Note that size is previously defined as the size of the hash table.
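The method above depends on a `size` field defined elsewhere. A self-contained version is sketched below; the `NameHasher` class and its constructor are assumptions added only so the code can run on its own.

```java
// Self-contained version of the name hash (class and constructor are hypothetical).
class NameHasher {
    private final int size; // size of the hash table

    NameHasher(int size) {
        this.size = size;
    }

    int hashFunction(String name) {
        int hashValue = 0;
        for (char c : name.toCharArray()) {
            hashValue += c; // add the character's numeric (UTF-16) code
        }
        return hashValue % size;
    }
}
```

With a table size of 10, both "Amy" and "May" hash to 5 (each sums to 295). Because addition ignores character order, names that are anagrams of each other always collide under this function.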

Time Analysis
• The load factor of a hash table is defined as follows:
  $$\alpha = \frac{\text{number of elements in the table}}{\text{size of the table's array}}$$

Searching with Linear Probing
• In a non-full hash table with no removals, and using uniform hashing, the average number of table elements examined in a successful search is approximately the following, where α is the load factor:
  $$\frac{1}{2}\left(1 + \frac{1}{1 - \alpha}\right)$$

Searching with Double Hashing
• In a non-full hash table with no removals, and using uniform hashing, the average number of table elements examined in a successful search is approximately the following, where α is the load factor:
  $$\frac{-\ln(1 - \alpha)}{\alpha}$$

Searching with Chained Hashing
• In a non-full hash table, using uniform hashing, the average number of table elements examined in a successful search is approximately the following, where α is the load factor:
  $$1 + \frac{\alpha}{2}$$

Average Number of Elements Examined During a Search (Main p. 561)
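The standard approximations for a successful search are 1/2 (1 + 1/(1 - α)) for linear probing, -ln(1 - α)/α for double hashing, and 1 + α/2 for chained hashing, where α is the load factor (elements stored divided by the size of the table's array). The hypothetical class below simply evaluates these formulas so the growth at high load factors is easy to see; it does not reproduce Main's table.

```java
// Evaluates approximate average probes for a successful search (hypothetical helper).
class SearchCost {
    // Load factor: number of elements stored / size of the table's array.
    static double loadFactor(int elements, int tableSize) {
        return (double) elements / tableSize;
    }

    static double linearProbing(double alpha) { // requires alpha < 1
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    static double doubleHashing(double alpha) { // requires 0 < alpha < 1
        return -Math.log(1 - alpha) / alpha;
    }

    static double chainedHashing(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        double[] alphas = {0.5, 0.75, 0.9};
        for (double a : alphas) {
            System.out.printf("alpha=%.2f  linear=%.2f  double=%.2f  chained=%.2f%n",
                    a, linearProbing(a), doubleHashing(a), chainedHashing(a));
        }
    }
}
```

At α = 0.5 the formulas give 1.5 for linear probing, about 1.39 for double hashing, and 1.25 for chaining; at α = 0.9, linear probing rises to 5.5 while double hashing is about 2.56 and chaining 1.45.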
