Chapter 7

Chapter 7. Skip Lists and Hashing, Part 2: Hashing. Sorted linear lists: for the formula-based implementation, insert and delete each take O(n) comparisons and data moves, and search takes O(log n) comparisons; for the chained implementation, insert, delete, and search each take O(n) comparisons.


Presentation Transcript


  1. Chapter 7 Skip Lists and Hashing Part 2: Hashing

  2. Sorted Linear Lists • For formula-based implementation • Insert: O(n) comps & data moves • Delete: O(n) comps & data moves • Search: O(log(n)) comps • For chained implementation: • Insert: O(n) comps • Delete: O(n) comps • Search: O(n) comps

  3. Sorted Chain

  4. Dictionary • A dictionary is a collection of elements; each element has a field called the key. • The key is unique for each element. • Operations: • Insert an element with a specified key value • Search the dictionary for an element with a specified key value • Delete an element with a specified key value • The access mode for elements in a dictionary is random access (or direct access): any element may be retrieved by performing a search on its key.

  5. Dictionary

  6. Ideal hashing • Hash table: table used to store elements • Hash function: function that maps keys to positions: k => f(k) • Search for an element with key k: if position f(k) is not empty, the element is found; otherwise the search fails • Insert: position f(k) must be empty • Delete: position f(k) must not be empty

  7. Example: Student record dictionary • Use the student ID (a 6-digit number) as the key • ID range: 951000 to 952000 • f(k) = k - 951000 • Table size: 1001, i.e. ht[0..1000] • ht[i].key = 0 indicates an empty entry
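The student-record example can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the class name `IdealHashTable` and its method names are assumptions, and `None` marks an empty slot where the slide uses `key = 0`.

```python
class IdealHashTable:
    """Ideal hashing for student IDs 951000..952000: one slot per possible key."""

    BASE = 951000   # smallest legal key, so f(k) = k - BASE
    SIZE = 1001     # slots ht[0..1000]

    def __init__(self):
        # Assumption: None marks an empty slot (the slide uses key == 0).
        self.ht = [None] * self.SIZE

    def _f(self, key):
        """Hash function f(k) = k - 951000, with a range check."""
        if not self.BASE <= key < self.BASE + self.SIZE:
            raise KeyError(f"key {key} is outside the supported range")
        return key - self.BASE

    def insert(self, key, record):
        i = self._f(key)
        if self.ht[i] is not None:      # insert requires an empty position
            raise ValueError(f"duplicate key {key}")
        self.ht[i] = record

    def search(self, key):
        """Return the stored record, or None if position f(k) is empty."""
        return self.ht[self._f(key)]

    def delete(self, key):
        i = self._f(key)
        if self.ht[i] is None:          # delete requires a non-empty position
            raise KeyError(f"key {key} is not present")
        record, self.ht[i] = self.ht[i], None
        return record
```

Every operation touches exactly one slot, which is where the Θ(1) cost comes from; the price is a table as large as the key range.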

  8. Evaluation: Ideal Hashing • Initialize an empty dictionary: Θ(b), where b is the size of the table • Search, insert, and delete: Θ(1) • Property: 1 key <=> 1 position • Problem: the range of the keys may be very large, resulting in a very large hash table; e.g. if the key is a 9-digit integer (such as an SSN), the size of the table will be 10^9

  9. Hashing with linear open addressing • Used when the size of the hash table (D) is smaller than the key range • f(k) = k % D • Positions in the hash table are indexed 0..D-1 • Bucket: a position in a hash table • If key values are not of an integral type, they need to be converted first. • Collision: two keys k1 and k2 map into the same bucket, i.e. f(k1) = f(k2) • Home bucket: the position numbered f(k) is the home bucket for k • In general a bucket may contain space for more than one element. • An overflow occurs if there is no room in the home bucket for the new element. • If a bucket has space for only one element, collision and overflow are the same.

  10. Collision, overflow and linear open addressing • 80, 58, and 35 map into home bucket ht[3]. • In case of a collision, insert into the next available bucket in sequence.

  11. Search • To search for an element with key k, begin at bucket f(k) and continue through successive buckets, regarding the table as circular, until: • a bucket containing an element with key k is found (successful) • an empty bucket is reached (unsuccessful) • the search returns to the home bucket (unsuccessful)
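The probing rule for insert and search can be sketched as follows. This is a minimal illustration under stated assumptions: keys are non-negative integers, each bucket holds one key, `None` marks an empty bucket, and the helper names (`make_table`, `find_bucket`, `insert`, `search`) and the table size D = 11 are chosen for the example rather than taken from the slides.

```python
def make_table(D):
    """Create an empty table with D buckets (None = empty)."""
    return [None] * D


def find_bucket(ht, key):
    """Probe from the home bucket key % D, treating the table as circular.

    Returns the index of the bucket holding `key`, or of the first empty
    bucket on the probe path, or -1 if the probe returns to the home bucket.
    """
    D = len(ht)
    home = key % D
    i = home
    while True:
        if ht[i] is None or ht[i] == key:
            return i
        i = (i + 1) % D          # next bucket in sequence, wrapping around
        if i == home:
            return -1            # came back to the home bucket


def insert(ht, key):
    i = find_bucket(ht, key)
    if i == -1:
        raise OverflowError("table is full")
    if ht[i] == key:
        raise ValueError(f"duplicate key {key}")
    ht[i] = key


def search(ht, key):
    i = find_bucket(ht, key)
    return i != -1 and ht[i] == key


ht = make_table(11)
for k in (80, 40, 65, 58):
    insert(ht, k)
print(ht.index(80), ht.index(58))   # 80 sits in its home bucket 3; 58 overflows to 4
```

With D = 11 both 80 and 58 have home bucket 3, so the second one is placed in bucket 4, the next available bucket.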

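  12. Deletion • After a deletion, successive elements must be moved up until: • an empty bucket is reached, or • the scan returns to the bucket from which the deletion took place • To improve performance, use a NeverUsed field instead: it is true until the bucket first receives an element, and an unsuccessful search stops only at a bucket whose NeverUsed field is true. • Reorganization may be needed when many buckets have their NeverUsed field set to false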
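The NeverUsed idea can be sketched as below. This is an illustrative sketch, not the slides' class: the name `LinearProbingTable`, the parallel `never_used` list, and the use of `None` for a currently empty bucket are all assumptions. Deleting a key empties its bucket but leaves `never_used` false, so later searches keep probing past it.

```python
class LinearProbingTable:
    """Linear open addressing with a NeverUsed flag per bucket."""

    def __init__(self, D):
        self.D = D
        self.keys = [None] * D          # None = bucket currently empty
        self.never_used = [True] * D    # True until a key is first stored

    def _probe(self, key):
        """Yield the D bucket indices starting at the home bucket, circularly."""
        home = key % self.D
        for step in range(self.D):
            yield (home + step) % self.D

    def search(self, key):
        for i in self._probe(key):
            if self.keys[i] == key:
                return True
            if self.never_used[i]:      # nothing was ever stored past here
                return False
        return False                    # wrapped around the whole table

    def insert(self, key):
        if self.search(key):
            raise ValueError(f"duplicate key {key}")
        for i in self._probe(key):
            if self.keys[i] is None:    # reuse emptied buckets as well
                self.keys[i] = key
                self.never_used[i] = False
                return
        raise OverflowError("table is full")

    def delete(self, key):
        for i in self._probe(key):
            if self.keys[i] == key:
                self.keys[i] = None     # never_used[i] stays False on purpose
                return True
            if self.never_used[i]:
                return False
        return False
```

No elements are moved on deletion. The cost is that unsuccessful searches get longer as more buckets lose their NeverUsed status, which is why an occasional reorganization may be needed.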

  13. Class definition

  14. Constructor

  15. hSearch

  16. Search

  17. Insert

  18. Performance analysis • b - the number of buckets in the hash table, b = D • initialization - Θ(b) • worst-case insert and search - Θ(n), where n is the number of elements in the table • the worst case happens when all n keys have the same home bucket

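  19. Performance analysis (continued) Average performance • Let α = n/b denote the loading factor • Let Un and Sn be the average number of buckets examined during an unsuccessful and a successful search, respectively; then $U_n \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$ and $S_n \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$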

  20. Performance analysis (continued) • The performance of hashing with linear open addressing is superior: • when α = 0.5 (table half full), Un = 2.5 and Sn = 1.5 • when α = 0.9 (table 90% full), Un = 50.5 and Sn = 5.5
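The figures above agree with the standard approximations for linear probing, Un ≈ ½(1 + 1/(1-α)²) and Sn ≈ ½(1 + 1/(1-α)). A small sketch that evaluates them (the function name `linear_probing_costs` is an assumption for this example):

```python
def linear_probing_costs(alpha):
    """Approximate average buckets examined for linear open addressing.

    Returns (U, S): unsuccessful and successful search costs for a
    loading factor 0 <= alpha < 1.
    """
    if not 0 <= alpha < 1:
        raise ValueError("loading factor must satisfy 0 <= alpha < 1")
    u = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    s = 0.5 * (1 + 1 / (1 - alpha))
    return u, s


for alpha in (0.5, 0.9):
    u, s = linear_probing_costs(alpha)
    print(f"alpha={alpha}: U={u:.1f}, S={s:.1f}")
```

The unsuccessful-search cost grows with 1/(1-α)², so it degrades much faster than the successful-search cost as the table fills.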

  21. Determining D • D should be either a prime number or a number with no prime factors less than 20 • Two methods. First method: • begin with the largest possible value for b • then find the largest D (<= b) that is either prime or has no factors smaller than 20 • e.g., when b = 530, D = 23*23 = 529
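The first method amounts to a short downward search. A minimal sketch, with hypothetical helper names `is_suitable` and `largest_suitable_divisor`:

```python
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)   # all primes below 20


def is_suitable(d):
    """True if d is prime or has no prime factor smaller than 20.

    A prime of 23 or more has no small factor, so the single divisibility
    test covers both cases; primes below 20 are accepted explicitly.
    """
    if d < 2:
        return False
    if d in SMALL_PRIMES:
        return True
    return all(d % p != 0 for p in SMALL_PRIMES)


def largest_suitable_divisor(b):
    """Largest D <= b that satisfies the rule."""
    for d in range(b, 1, -1):
        if is_suitable(d):
            return d
    raise ValueError("b must be at least 2")


print(largest_suitable_divisor(530))   # 529 = 23 * 23
```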

  22. Determining D Second method: • determine the acceptable Un and Sn • estimate n • determine α • determine the smallest b for the above α • determine the smallest integer D >= b that is either prime or has no factor smaller than 20.

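  23. Determining D • n = 1000 • S ≤ 4 and U ≤ 50.5 • S = 4 ==> α = 6/7 • U = 50.5 ==> α = 0.9 • α = min(6/7, 0.9) = 6/7 • b = n/α = 7000/6, rounded up to 1167 • note: 23*51 = 1173, but 51 = 3*17, so 1173 has factors smaller than 20; 1167 through 1170 also have small factors, and 1171 is prime • ==> select D = b = 1171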
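The second method can be sketched by inverting the linear-probing approximations Sn ≈ ½(1 + 1/(1-α)) and Un ≈ ½(1 + 1/(1-α)²) for α and then searching upward for D. The function names (`max_alpha`, `choose_table_size`, `is_suitable`) are assumptions made for this example.

```python
import math

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)   # all primes below 20


def is_suitable(d):
    """True if d is prime or has no prime factor smaller than 20."""
    if d < 2:
        return False
    if d in SMALL_PRIMES:
        return True
    return all(d % p != 0 for p in SMALL_PRIMES)


def max_alpha(s_max, u_max):
    """Largest loading factor that keeps S <= s_max and U <= u_max.

    Both limits must be greater than 1 for the inversion to make sense.
    """
    alpha_s = 1 - 1 / (2 * s_max - 1)             # from S = (1 + 1/(1-a)) / 2
    alpha_u = 1 - 1 / math.sqrt(2 * u_max - 1)    # from U = (1 + 1/(1-a)^2) / 2
    return min(alpha_s, alpha_u)


def choose_table_size(n, s_max, u_max):
    """Return (alpha, b, D) for n expected elements and the given limits."""
    alpha = max_alpha(s_max, u_max)
    b = math.ceil(n / alpha)          # smallest bucket count for this alpha
    d = b
    while not is_suitable(d):         # smallest suitable D >= b
        d += 1
    return alpha, b, d


alpha, b, d = choose_table_size(1000, s_max=4, u_max=50.5)
print(round(alpha, 4), b, d)
```

For n = 1000, S ≤ 4 and U ≤ 50.5 this sketch yields α = 6/7, b = 1167 and D = 1171.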

  24. Hashing with Chains

  25. Implementations

  26. An improved implementation
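As a rough illustration of hashing with chains (not the slides' own implementation), the sketch below keeps one sorted chain per bucket. It assumes integer keys and uses Python lists with `bisect` in place of linked nodes; the class name `ChainedHashTable` is hypothetical.

```python
import bisect


class ChainedHashTable:
    """Hashing with chains: bucket k % D holds a sorted list of keys."""

    def __init__(self, D):
        self.D = D
        self.chains = [[] for _ in range(D)]   # one independent chain per bucket

    def _locate(self, key):
        """Return (chain, index) where key is, or where it would be inserted."""
        chain = self.chains[key % self.D]
        return chain, bisect.bisect_left(chain, key)

    def search(self, key):
        chain, i = self._locate(key)
        return i < len(chain) and chain[i] == key

    def insert(self, key):
        chain, i = self._locate(key)
        if i < len(chain) and chain[i] == key:
            raise ValueError(f"duplicate key {key}")
        chain.insert(i, key)                   # keeps the chain sorted

    def delete(self, key):
        chain, i = self._locate(key)
        if i < len(chain) and chain[i] == key:
            del chain[i]
            return True
        return False
```

Colliding keys simply share a chain, so an insertion never spills into another key's home bucket as it does with linear open addressing.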

  27. Comparison with Linear Open Addressing • Space complexity • Let s be the space (in bytes) required by an element • Let b and n denote the number of buckets and the number of elements, respectively • Linear open addressing: b(s+2) bytes (2 bytes per bucket for the entry in the empty array) • Chaining: 2b+2n+ns bytes • When n < bs/(s+2), chaining takes less space
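The break-even point follows from b(s+2) > 2b + 2n + ns, which simplifies to n < bs/(s+2). A small sketch that evaluates both sides, assuming the slide's 2-byte overheads (the function names are chosen for this example):

```python
def open_addressing_bytes(b, s):
    """Space for linear open addressing: b buckets of s + 2 bytes each."""
    return b * (s + 2)


def chaining_bytes(b, n, s):
    """Space for chaining: 2 bytes per bucket pointer, s + 2 bytes per node."""
    return 2 * b + 2 * n + n * s


b, s = 1000, 10                   # example values, not from the slides
threshold = b * s / (s + 2)       # about 833.3 elements
for n in (800, 900):
    print(n, chaining_bytes(b, n, s), open_addressing_bytes(b, s))
```

With b = 1000 and s = 10, chaining needs 11600 bytes for n = 800 (less than the 12000 bytes of open addressing) and 12800 bytes for n = 900 (more).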

  28. Search time complexity • Worst-case time complexity = n; it occurs when all elements map to the same bucket (equal to that of linear open addressing) • Average • the average length of a chain is α = n/b • average number of nodes examined in an unsuccessful search: if a chain has i nodes, the search may take 1, 2, 3, …, i examinations. Assuming equal probability, the average search time is $\frac{1}{i}\sum_{j=1}^{i} j = \frac{i+1}{2}$

  29. Search time complexity (continued) • If α = 0, Un = 0 • If α < 1, Un <= α • If α >= 1, $U_n \approx \frac{\alpha+1}{2}$

  30. Average time complexity for successful search • Need to know the expected distance of each of the n elements from the head of its chain • Without loss of generality, we assume elements are inserted into the chains in increasing order • When the ith element is inserted, the expected length of its chain is (i-1)/b, and the ith element is added at the end of the chain • A search for this element will require examination of 1+(i-1)/b nodes • Assuming the n elements are searched for with equal probability, $S_n = \frac{1}{n}\sum_{i=1}^{n}\left(1 + \frac{i-1}{b}\right) = 1 + \frac{n-1}{2b} \approx 1 + \frac{\alpha}{2}$

  31. Comparison with linear open addressing • The expected performance of chaining is superior, e.g., • when α=0.9 • Chaining: Un=0.9, Sn=1.45 • Linear open addressing: Un=50.5, Sn=5.5
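The comparison can be reproduced numerically. The sketch below assumes the approximations Un ≤ α for α < 1 (used here as the estimate), Un ≈ (1+α)/2 for α ≥ 1 and Sn ≈ 1 + α/2 for chaining, together with the usual linear-probing approximations; all function names are hypothetical.

```python
def chaining_costs(alpha):
    """Approximate (U, S) for hashing with sorted chains."""
    if alpha < 0:
        raise ValueError("loading factor must be non-negative")
    # For alpha < 1 the slide's bound U <= alpha is used as the estimate.
    u = alpha if alpha < 1 else (alpha + 1) / 2
    s = 1 + alpha / 2
    return u, s


def linear_probing_costs(alpha):
    """Approximate (U, S) for linear open addressing, 0 <= alpha < 1."""
    if not 0 <= alpha < 1:
        raise ValueError("loading factor must satisfy 0 <= alpha < 1")
    return 0.5 * (1 + 1 / (1 - alpha) ** 2), 0.5 * (1 + 1 / (1 - alpha))


alpha = 0.9
for name, costs in (("chaining", chaining_costs), ("linear", linear_probing_costs)):
    u, s = costs(alpha)
    print(f"{name}: U={u:.2f}, S={s:.2f}")
```

Unlike linear open addressing, the chaining estimates remain defined for α ≥ 1, since a chain can hold any number of elements.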

  32. Skip Lists

  33. [Figure: a sorted chain with head and tail nodes holding 20, 24, 30, 40, 60, 75, 80; the same chain with a pointer to the middle node added]

  34. [Figure: the same chain with pointers to every second node added]
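The extra forward pointers in the figures can be approximated with a randomized skip list, in which each new node is given a random number of levels instead of the exact "every second node" pattern. The sketch below is an illustration under that assumption; `SkipList`, `MAX_LEVEL`, and the `seed` parameter are names chosen here, not taken from the slides.

```python
import random


class SkipList:
    """Sorted set of keys with randomized multi-level forward pointers."""

    MAX_LEVEL = 16   # assumed cap on the number of levels

    class _Node:
        def __init__(self, key, level):
            self.key = key
            self.next = [None] * level   # next[l] = successor on level l

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.head = self._Node(None, self.MAX_LEVEL)   # head holds no key
        self.level = 1                                  # levels currently in use

    def _random_level(self):
        """Level k is chosen with probability 1/2**k (capped at MAX_LEVEL)."""
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, the last node whose key is smaller than `key`."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.level - 1, -1, -1):       # top level first
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            update[lvl] = node
        return update

    def search(self, key):
        node = self._predecessors(key)[0].next[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].next[0]
        if nxt is not None and nxt.key == key:
            raise ValueError(f"duplicate key {key}")
        level = self._random_level()
        self.level = max(self.level, level)
        node = self._Node(key, level)
        for lvl in range(level):                        # splice into each level
            node.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = node

    def __iter__(self):
        node = self.head.next[0]                        # level 0 is the full chain
        while node is not None:
            yield node.key
            node = node.next[0]
```

Level 0 is the ordinary sorted chain; higher levels skip over progressively more nodes, which is what lets a search move quickly toward its target before dropping down a level.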

  35. Skip List Implementation

  36. An application • Text compression • Compressor: file coding • Run-length coding: 1000 x's followed by 2000 y's => 1000x2000y • Space needed: 3002 bytes (including 2 bytes for string ends) => 12 bytes • Decompressor: decoding • LZW compression (Lempel, Ziv, and Welch)
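The run-length example can be sketched with a pair of helper functions. This is a minimal illustration that assumes the text itself contains no digit characters; `rle_encode` and `rle_decode` are names chosen for the example.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character by '<count><char>'."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))


def rle_decode(coded):
    """Inverse of rle_encode for text without digit characters."""
    out = []
    count = ""
    for ch in coded:
        if ch.isdigit():
            count += ch                  # accumulate a multi-digit run length
        else:
            out.append(ch * int(count))  # expand the run
            count = ""
    return "".join(out)


text = "x" * 1000 + "y" * 2000
coded = rle_encode(text)
print(coded, len(coded))                 # 1000x2000y 10
```

The coded string has 10 characters, which with the 2 bytes for string ends gives the 12 bytes quoted above.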
