
Lecture 9


loring

Presentation Transcript


  1. Lecture 9
     • Topics: Hashing
     • Reference: Introduction to Algorithms by Cormen et al., Chapter 12: Hash Tables
     Data Structure and Algorithm

  2. Introduction
     • A hash table is a generalization of the simpler notion of an ordinary array.
     • In the worst case, searching for an element in a hash table can take as long as searching an array or linked list, i.e. O(N) time.
     • Under reasonable assumptions, however, the expected time to search for an element in a hash table is O(1).

  3. Dictionary/Table
     • Keys: given a student ID, find the record (entry).

  4. Direct Addressing
     • Direct addressing is a simple technique that works well when the universe U of keys is reasonably small.
     • Let the universe be U = {0, 1, …, m-1}, where m is not too large.
     • We may assume that no two elements have the same key.
     • To represent the dynamic set, we use an array, or direct-address table, T[0..m-1], in which each position (slot) corresponds to a key in the universe U.

  5. Direct Addressing
     [Figure: direct-address table T with slots 0 to 9, showing the universe of keys U and the actual keys K; each occupied slot holds a key and its data]
     • Slot k points to an element in the set with key k.
     • If the set contains no element with key k, then T[k] = nil.

  6. Direct-address Table
     [Figure: direct-address table in which occupied slots point to entries and empty slots hold nil]
     • Direct addressing works well when the universe of keys is small, assuming each key corresponds to a unique slot.
     • Operations:
       Direct-Address-Search(T, k): return T[k]
       Direct-Address-Insert(T, x): T[key[x]] ← x
       Direct-Address-Delete(T, x): T[key[x]] ← nil
     • O(1) time for all operations.
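
The three direct-address operations above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the class name `DirectAddressTable` and the choice to store `(key, record)` pairs are assumptions, and `None` stands in for nil.

```python
class DirectAddressTable:
    """Direct-address table over the universe U = {0, 1, ..., m-1}."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of nil.
        self.slots = [None] * m

    def search(self, k):
        # Direct-Address-Search(T, k): return T[k]
        return self.slots[k]

    def insert(self, key, record):
        # Direct-Address-Insert(T, x): T[key[x]] <- x
        self.slots[key] = (key, record)

    def delete(self, key):
        # Direct-Address-Delete(T, x): T[key[x]] <- nil
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "record for key 3")
print(table.search(3))  # (3, 'record for key 3')
print(table.search(4))  # None
table.delete(3)
print(table.search(3))  # None
```

Each operation is a single array access, which is why all three run in O(1) time. The cost is that the array needs one slot for every key in U, whether or not that key is ever stored.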

  7. The Problem With Direct Addressing
     • If the universe U is large, storing a table T of size |U| may be impractical or even impossible.
     • Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted.
     • Solution: map keys to a smaller range 0..m-1.
     • This mapping is called a hash function.

  8. Hash Function
     [Figure: keys k1 to k5 from the actual key set K, a subset of the universe U, mapped by h into slots 0..m-1 of table T; h(k2) = h(k5)]
     • A hash function h maps the universe U of keys into the slots of a hash table T[0..m-1]: h : U → {0, 1, …, m-1}
     • But two keys may hash to the same slot: a collision.

  9. Next Problem
     • But two keys may hash to the same slot: a collision.
     [Figure: same mapping as on the previous slide, with k2 and k5 colliding at h(k2) = h(k5)]
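
A small example makes collisions concrete. The hash function `h(k, m) = k mod m`, the table size `m = 7`, and the sample keys below are all assumptions chosen for illustration; the slides do not fix any of them at this point.

```python
def h(k, m):
    # Hypothetical hash function for illustration: remainder of k divided by m.
    return k % m


m = 7
keys = [10, 22, 31, 4, 15]

# Group the keys by the slot they hash to.
slots = {}
for k in keys:
    slots.setdefault(h(k, m), []).append(k)

# Any slot that received more than one key is a collision.
collisions = {slot: ks for slot, ks in slots.items() if len(ks) > 1}
print(collisions)  # {3: [10, 31], 1: [22, 15]}
```

Because the universe U is larger than the number of slots m, some pair of keys must share a slot, so collisions can be made rarer but not eliminated.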

  10. Resolving Collisions
     • How can we solve the problem of collisions?
     • Solution 1: chaining
     • Solution 2: open addressing

  11. Chaining
     • Chaining puts elements that hash to the same slot in a linked list.
     [Figure: hash table T in which each occupied slot points to a linked list of the keys from K that hash to it; empty slots hold nil]

  12. Chaining (insert at the head)
     [Figure: animation frame; keys from K are inserted one at a time at the head of the list for their slot]

  13. Chaining (insert at the head)
     [Figure: next animation frame of the same insertion sequence]

  14. Chaining (insert at the head)
     [Figure: next animation frame of the same insertion sequence]

  15. Chaining (insert at the head)
     [Figure: next animation frame of the same insertion sequence]

  16. Chaining (insert at the head)
     [Figure: final animation frame, with all keys k1 to k8 stored in the chains of T]

  17. Operations
     • Direct-Hash-Search(T, k): search for an element with key k in the list T[h(k)] (running time is proportional to the length of the list).
     • Direct-Hash-Insert(T, x): insert x at the head of the list T[h(key[x])] (worst case O(1)).
     • Direct-Hash-Delete(T, x): delete x from the list T[h(key[x])] (running time is the same as for searching).
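
The chaining operations above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions: the names `Node` and `ChainedHashTable` are hypothetical, keys are non-negative integers, the hash function is `key mod m`, chains are singly linked, and `insert` assumes the key is not already present.

```python
class Node:
    """One element of a singly linked chain."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        # Each slot holds the head node of its chain, or None if empty.
        self.table = [None] * m

    def _h(self, key):
        # Assumed hash function: remainder of key divided by m.
        return key % self.m

    def insert(self, key, value):
        # Insert at the head of the chain: O(1) in the worst case.
        slot = self._h(key)
        self.table[slot] = Node(key, value, self.table[slot])

    def search(self, key):
        # Walk the chain: time proportional to the chain length.
        node = self.table[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key):
        # Find the node and its predecessor, then unlink it.
        slot = self._h(key)
        prev, node = None, self.table[slot]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False  # key was not in the table
        if prev is None:
            self.table[slot] = node.next
        else:
            prev.next = node.next
        return True


t = ChainedHashTable(7)
t.insert(10, "a")
t.insert(31, "b")           # 31 mod 7 == 10 mod 7 == 3, so both share slot 3
print(t.table[3].key)       # 31, the most recent insert sits at the head
print(t.search(10).value)   # a
```

With a singly linked chain, deletion has to locate the predecessor of the node, which is why it costs the same as a search.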

  18. Open Addressing
     • Basic idea (details in Section 12.4):
       • To insert: if the slot is full, try another slot, and so on, until an open slot is found (probing).
       • To search: follow the same sequence of probes that would be used when inserting the element.
         • If an element with the correct key is reached, return it.
         • If an empty (NULL) slot is reached, the element is not in the table.
     • Good for fixed sets (insertions but no deletions).
     • The table need not be much bigger than n, the number of elements stored.
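
The insert and search procedures described above can be sketched as follows. The slide does not say which probe sequence to use, so this sketch assumes linear probing (try slots h(k), h(k)+1, h(k)+2, … modulo m) with `h(k) = k mod m`; the class name `OpenAddressTable` is hypothetical.

```python
class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        # All elements live in the table itself; None marks an open slot.
        self.slots = [None] * m

    def _probe(self, key):
        # Assumed probe sequence (linear probing), visiting every slot once.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key, value):
        # Try slots along the probe sequence until an open one is found.
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, value)
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        # Follow the same probe sequence that insert would use.
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is None:
                return None       # open slot reached: key is not in the table
            if entry[0] == key:
                return entry[1]   # element with the correct key
        return None               # every slot probed without finding the key


t = OpenAddressTable(7)
print(t.insert(10, "a"))  # 3
print(t.insert(31, "b"))  # 4, because slot 3 is already taken
print(t.insert(17, "c"))  # 5
print(t.search(31))       # b
print(t.search(24))       # None
```

The sketch has no delete operation on purpose. Simply emptying a slot would cut the probe sequence of any key stored past it, so a later search could stop too early; this is why open addressing suits sets that only grow.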

  19. Choosing A Hash Function
     • Choosing the hash function well is crucial.
       • A bad hash function puts all elements in the same slot.
     • A good hash function:
       • Should distribute keys uniformly into slots
       • Should not depend on patterns in the data
     • Three popular methods:
       • Division method
       • Multiplication method
       • Universal hashing

  20. The Division Method
     • h(k) = k mod m
       • Hash k into a table with m slots, using the slot given by the remainder of k divided by m.
     • Elements with adjacent keys are hashed to different slots: good.
     • If the keys bear some relation to m (for example, if many keys share a factor with m): bad.
     • In practice: pick the table size m to be a prime number not too close to a power of 2 (or 10).
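
The effect of the table size on the division method can be checked with a short experiment. The key set (multiples of 4) and the two table sizes below are assumptions chosen for illustration; `slots_used` is a hypothetical helper that counts how many distinct slots the keys land in.

```python
def h(k, m):
    # Division method: remainder of k divided by m.
    return k % m


def slots_used(keys, m):
    # Number of distinct slots that the keys hash to.
    return len({h(k, m) for k in keys})


# 100 keys that share a pattern: every key is a multiple of 4.
keys = range(0, 400, 4)

print(slots_used(keys, 64))  # 16: m is a power of 2, so only 16 of 64 slots are hit
print(slots_used(keys, 97))  # 97: m is prime, so the keys spread over every slot
```

With m = 64 the keys and the table size share the factor 4, so three quarters of the table can never be used. With the prime m = 97 there is no common factor and the same keys reach all 97 slots.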
