170 likes | 306 Views
Hashing. Suppose we want to search for a data item in a huge data record tables How long will it take? It depends on the data structure (unsorted) linked list? (O(N)) (balanced) binary search tree (e.g., AVL tree)? (O(log 2 N)) Unsorted array (O(N))
E N D
Hashing • Suppose we want to search for a data item in a huge data record tables • How long will it take? • It depends on the data structure • (unsorted) linked list? (O(N)) • (balanced) binary search tree (e.g., AVL tree)? (O(log2N)) • Unsorted array (O(N)) • Sorted array, binary search (O(log2N)) • Can we do better than O(log n) (e.g., O(1))? • Yes, we can use “hash” functions
Hashing Basic Idea • Use hash function to map keys into positions in a hash table Ideally • If element e has key k and h is hash function, then e is stored in position h(k) of table • To search for e, compute h(k) to locate position. If no element, the table does not contain e.
Hashing (cont’d) • The general idea behind hashing is to directly map each data item into a position in a table (e.g., an array) using some function h • Complex data items must have a “key” in order to perform the hashing. Then, position = h(key) Data Name: Mike Height: .. Address: … h(“Mike”) =1 • With perfect hashing, every key is mapped to a different position. What is the implication of this in terms of T, the size of the array?
Some Hash Function Example • Division • Folding
Hash Function: Division • Hash Functions: Division • If the keys are integers then key%T is generally a good hash function
Example • The hash function was h(key) = key%10. • Inserting entries 20, 25, 34, 76, 59, and 81
Hash Division • What happens if more than one key hash to the same position? • In general, to avoid situations like this, table size T should be a prime number. • Collision methods, will be discussed later
Hash Function: Folding • A key is divided into several parts, and use a simple operation to combine them in a certain way.
Hash Function: Folding • If the keys are strings the hash function is some function of the characters in the strings. • One possibility is to simply add the ASCII values of the characters: h(str) = (sum(str(i)))%T • Another possibility is to convert the string into some number in some arbitrary base b (b also might be a prime number): Example h(ABC)=(65b0+66b1+67b2)%T
Folding Example int h(String x, int T) { int i, sum; for (sum=0, i=0; i<x.length(); i++) sum+= (int)x.charAt(i); return (sum%T); } • sums the ASCII values of the letters in the string • ASCII value for “A” =65; sum will be in range 650-900 for 10 upper-case letters; good when T around 100, for example • order of chars in string has no effect
Collision • What happens if more than one key hash to the same position? • The problem is called a “collision” • Solution #1: Open Addressing • Solution #2: “separate chaining” collision resolution approach, all data items that hash to the same position are kept in a linked list. So, to find the item that we are looking for, we have to search the linked list • Hash functions should try to achieve uniform coverage of the hash table, while minimizing collisions
Handling Collision • “Open addressing” resolves collisions by trying alternative slots in the hash table, until an empty cell is found. In general we try cells in the following order H1(key) = (h(key) + f(i))%T f(0) = 0
Example: Handling Collision-Open Addressing T =10, after 0, 1, 4, 19, 16, 25, 36, 59, 64 and 81 have been inserted using open addressing. The hash function h(key) = key %10 and f(i) =i
Handling Collision: Chaining • With “separate chaining” all data items that hash to the same location are kept in a linked list in that location. • So, each entry in the hash table can be a linked list of arbitrary length. • To find a piece of data we use the hash function to find the correct list, and then we search the list.
Chaining Example T =10, after 0, 1, 4, 19, 16, 25, 36, 59, 64 and 81 have been inserted
Store colliding elements in the same position in the table • When a bucket is full, the open addressing can be used.