CS5539 Data Structures and Algorithms Lecture 19 Hashing
Reading Watt and Brown: Chapter 12
Time Complexities
Operation   Key-indexed   Parallel       Single        BST               BST
            array         sorted array   linked list   (well balanced)   (ill balanced)
get         O(1)          O(log n)       O(n)          O(log n)          O(n)
remove      O(1)          O(n)           O(n)          O(log n)          O(n)
put         O(1)          O(n)           O(n)          O(log n)          O(n)
Perceived problems with key-indexed array
• Potential size: a large key range needs a lot of memory
• Keys may be strings, which cannot be used directly as an index
• Strings can be converted to integers using ASCII character codes
• Large numbers result
• Solution: map large numbers to small numbers
Implementation
Aim: obtain time complexity O(1) without restriction on key type
Hashing
• Gives superior performance
• O(1) performance, provided collisions are few, for the following operations:
  • get()
  • remove()
  • put()
Hashing
• The key field is changed into a small integer by applying a function to the key
• Hash function: the function used to transform the key into a small integer
• Hash value: the derived small integer, used as an index
Hash Table
• Hash table: a one-dimensional structure consisting of indexed buckets, where values are stored at the index determined by the hash value
• Every element of the hash table should be initialised to “empty”
Calculating a Hash Table Index from a Key
1. Create an integer hash code
  • Derived from the value of the key
  • Ideally a unique hash code for each key
2. Map the hash code on to the index range 0..size-1 of the table
  • Typically uses modulo arithmetic
  • index = hashcode(key) % size
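The two steps can be sketched in Python as follows. The names `hash_code` and `table_index` are illustrative, and the polynomial hash code (multiply by 31, then add each character code) is an assumption: one common choice, not necessarily the one intended by the slides.

```python
def hash_code(key: str) -> int:
    """Step 1: derive an integer hash code from the characters of the key."""
    code = 0
    for ch in key:
        code = 31 * code + ord(ch)  # 31 is an illustrative multiplier (assumption)
    return code


def table_index(key: str, size: int) -> int:
    """Step 2: map the hash code on to the index range 0..size-1."""
    return hash_code(key) % size


for word in ["cat", "dog", "zebra"]:
    print(word, hash_code(word), table_index(word, 10))
```

The hash code grows quickly with the length of the key; the modulo step is what brings it back into the range of valid table indexes.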
Graphical Representation of Hash Table
[Figure: buckets Hashtable[0], Hashtable[1], Hashtable[2], ..., Hashtable[n-2], Hashtable[n-1]; the symbol v represents an empty bucket]
Hashing to a Bucket
[Figure: buckets Hashtable[0] to Hashtable[n-1], with each key value stored in the bucket given by its hash value]
Hashfunction(keyvalue1) = 2
Hashfunction(keyvalue2) = n-1
Hashfunction(keyvalue3) = 0
Hashfunction(keyvalue4) = n-2
Simple Example 1
Use the alphabet position of the first letter of the word (start at 0). The hash table has 26 buckets.
Words: Cat, Dog, Elephant, Frog, Grasshopper, Hippopotamus, Horse, Cougar, Coyote, Zebra
Bucket 2: cat, cougar, coyote
Bucket 7: horse, hippopotamus
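A minimal Python sketch of this example. The function name `first_letter_hash` is illustrative, and each bucket is held as a list here only so that words sharing a bucket remain visible; that storage choice is an assumption.

```python
def first_letter_hash(word: str) -> int:
    """Alphabet position of the first letter, starting at 0 (a -> 0, z -> 25)."""
    return ord(word[0].lower()) - ord("a")


words = ["Cat", "Dog", "Elephant", "Frog", "Grasshopper",
         "Hippopotamus", "Horse", "Cougar", "Coyote", "Zebra"]

table = [[] for _ in range(26)]  # 26 buckets, one per letter
for w in words:
    table[first_letter_hash(w)].append(w)

# Show only the buckets that are in use
for index, bucket in enumerate(table):
    if bucket:
        print(index, bucket)
```

Ten words occupy only seven of the 26 buckets, because words with the same first letter always share a bucket.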
Simple Example 2
A hash function that adds up the values of the characters in the key: letters are given a value using their position in the alphabet (starting at 1, so S = 19); digits are given their integer value. Taking a table of size 10, the code S101 is converted as follows:
S = 19
1 = 1
0 = 0
1 = 1
TOTAL = 21
21 modulus 10 = 1, so the element should be placed at bucket 1
Find the Bucket Location for each of the Following (S = 19)
S101 bananas: bucket 1
S123 potatoes: bucket 5
S592 tomatoes: bucket 5
S199 plums: bucket 8
S102 apples: bucket 2
S213 pears: bucket 5
S541 peaches: bucket 9
Problem: several keys hash to the same location (a collision). Here S123, S592 and S213 all hash to bucket 5.
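The character-sum function from Simple Example 2 can be sketched in Python to check these bucket locations. The names `char_value` and `sum_hash` are illustrative, and the sketch assumes keys contain only ASCII letters and digits.

```python
def char_value(ch: str) -> int:
    """Letters map to their alphabet position (A = 1); digits to their integer value."""
    if ch.isdigit():
        return int(ch)
    return ord(ch.upper()) - ord("A") + 1


def sum_hash(key: str, size: int = 10) -> int:
    """Add up the character values and reduce modulo the table size."""
    return sum(char_value(ch) for ch in key) % size


codes = ["S101", "S123", "S592", "S199", "S102", "S213", "S541"]
buckets = {code: sum_hash(code) for code in codes}
print(buckets)
```

Because addition ignores the order of the characters, any two codes made of the same characters (such as S123 and S213) are guaranteed to collide under this function.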
A Hash Function
Hash(key) = (2 * int(key)) modulus 10
Cat: (2*(3+1+20)) % 10 = 48 % 10 = 8
Dog: (2*(4+15+7)) % 10 = 52 % 10 = 2
Elephant: (2*(5+12+5+16+8+1+14+20)) % 10 = 162 % 10 = 2
Frog: (2*(6+18+15+7)) % 10 = 92 % 10 = 2
Any problems with this function?
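One way to explore the question is to run the function and list every bucket it can reach. In this sketch, `letter_sum` stands in for int(key) as the sum of alphabet positions with a = 1, which matches the worked figures above; the function names are illustrative.

```python
def letter_sum(word: str) -> int:
    """int(key): sum of alphabet positions, with a = 1."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower())


def doubled_hash(word: str, size: int = 10) -> int:
    """Hash(key) = (2 * int(key)) modulus size."""
    return (2 * letter_sum(word)) % size


words = ["Cat", "Dog", "Elephant", "Frog"]
print({w: doubled_hash(w) for w in words})

# Every bucket index this function can produce for a table of size 10
reachable = sorted({(2 * n) % 10 for n in range(1000)})
print(reachable)
```

Since 2 * int(key) is always even and the table size 10 is also even, the remainder is always even: the odd-numbered buckets are never used, so the keys crowd into half of the table.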
Hash Function
• Perfect hash function: each distinct key produces a different value; very rare
• Collision: occurs when two keys hash to the same location
• Collisions are unavoidable when the number of keys > size of the hash table
• Collision avoidance: choose a hash function that places keys uniformly over the rows of the hash table
Collision Avoidance: Multiple Congruency Method
1. Change the key to an integer value: int(key)
2. Multiply this by a large prime number: primeNumber * int(key)
3. Divide the result obtained at step 2 by the size of the hash table and take the remainder: (primeNumber * int(key)) % TableSize
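The three steps can be sketched as below. The prime 7919 and the conversion in `int_of_key` (sum of character codes) are assumptions chosen only for illustration; the slides do not fix either.

```python
PRIME = 7919  # hypothetical choice of "large prime"; any large prime could be used


def int_of_key(key: str) -> int:
    """Step 1 (assumed conversion): sum of the character codes of the key."""
    return sum(ord(ch) for ch in key)


def multiplicative_hash(key: str, table_size: int) -> int:
    """Steps 2 and 3: multiply by the prime, then take the remainder."""
    return (PRIME * int_of_key(key)) % table_size


for word in ["Cat", "Dog", "Elephant", "Frog"]:
    print(word, multiplicative_hash(word, 10))
```

Unlike a multiplier of 2 with a table of size 10, a prime multiplier that shares no factor with the table size does not rule out any bucket.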
Hashing
• A hash function must
  • Be simple to compute
  • Distribute keys as evenly as possible
• Too many collisions cause degradation in performance
• The result may be 0, so hash tables are indexed from 0
Open Bucket Hash Table
Open-bucket hash table: a hash table in which each bucket is a storage location for a single data element. The result of transforming a key gives its home bucket.
Closed Bucket Hash Table: Chaining
Closed-bucket hash table: a hash table in which each bucket is a storage location for a collection of data elements.
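A minimal sketch of a closed-bucket table with chaining, in Python. The class name `ChainedHashTable`, the use of a Python list as each bucket's collection, and the hash function (sum of character codes modulo the table size) are all assumptions made for illustration.

```python
class ChainedHashTable:
    """Closed-bucket hash table: each bucket holds a list of (key, value) pairs."""

    def __init__(self, size: int = 10):
        self.buckets = [[] for _ in range(size)]  # every bucket starts empty

    def _index(self, key: str) -> int:
        # Assumed hash function: sum of character codes, modulo the table size
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key: str):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None  # key not present

    def remove(self, key: str) -> None:
        index = self._index(key)
        self.buckets[index] = [(k, v) for k, v in self.buckets[index] if k != key]


table = ChainedHashTable()
table.put("S123", "potatoes")
table.put("S213", "pears")  # same characters as S123, so the same bucket
print(table.get("S123"), table.get("S213"))
table.remove("S123")
print(table.get("S123"), table.get("S213"))
```

Both keys land in the same bucket, yet each can still be found and removed independently, because the bucket stores a collection rather than a single element. The cost is that each operation must search within the bucket, so long chains slow the table down.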
Summary
• Hashing is an efficient technique
• Care must be taken in choosing a hash function so that elements are spread as evenly as possible throughout the hash table
• Collisions are inevitable
• A strategy must be developed to handle collisions