
LECTURE 10

Lecture 10. Hash Tables. Material outside the examinable syllabus for the 2007-2008 academic year. Hashing, Lecture 10: hashing methods and their implementation, hash tables, scatter tables.

shamus

Presentation Transcript


  1. Lecture 10: Hash Tables. Material outside the examinable syllabus for the 2007-2008 academic year.

  2. Hashing, Lecture 10 • hashing methods and their implementation • hash tables • scatter tables The material for this lecture is from the book "Data Structures and Algorithms with Object-Oriented Design Patterns in Java", Bruno R. Preiss, John Wiley and Sons.

  3. Rationale for Hashing • many applications: information storage and retrieval • consider containers storing {key, value} pairs • hashing-based containers are suitable for applications in which • the basic operations [find(key), insert] are executed frequently • items are not required to be ordered • main advantage: find and insert are O(1) in the average case

  4. Hashing Methods Section 8.1-3 • basic idea • hash functions • characteristics • common methods • dealing with arbitrary keys

  5. Array-Based Implementation of the Container • for clarity of presentation, consider the case when the container contains keys only • an array will hold some number of items of a given set of keys K • the position of a key in the array is given by a hash function h(·) • in general |K| is large or even unbounded • the actual number of items stored in the container is typically much less than |K|, so use an array of size M

  6. Hash Functions • we need a function that maps every key to an array position: $h : K \to \{0, 1, \ldots, M-1\}$

  7. Characteristics of a Good Hash Function • good hash function • avoids collisions • spreads keys evenly in the array • its computation is fast

  8. Hashing Methods • division method: h(x) = x mod M • middle-square method, with $W = 2^w$ where $w$ is the word length • multiplication method • Fibonacci hashing
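The middle-square method has no implementation slide of its own, so here is a minimal sketch in the same style as the division method. It assumes the usual formulation $h(x) = \lfloor (M/W)(x^2 \bmod W) \rfloor$ with $M = 2^k$ and $W = 2^{32}$, so the formula reduces to a shift. The class name `MidSquareSketch` and the constants `K` and `W_BITS` are illustrative choices, not taken from the slides.

```java
// Middle-square hashing sketch: h(x) = floor((M / W) * (x^2 mod W)),
// assuming M = 2^K and W = 2^W_BITS (both values are assumptions).
final class MidSquareSketch {
    static final int K = 10;       // table size M = 2^10 = 1024
    static final int W_BITS = 32;  // word length of a Java int

    static int h(int x) {
        // int multiplication wraps around, which yields x^2 mod 2^32.
        // The unsigned shift keeps the top K bits of that word, i.e.
        // bits taken from the middle of the full 64-bit square.
        return (x * x) >>> (W_BITS - K);
    }
}
```

Because the result is the top K bits of a 32-bit word, it always lies in the range 0 to M-1, including for negative keys.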

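  9. Implementing the Division Method

     public class DivisionMethod {
         static final int M = 1031; // a prime

         // floorMod keeps the result in [0, M-1] for every int key,
         // including Integer.MIN_VALUE, for which Math.abs stays negative
         public static int h(int x) {
             return Math.floorMod(x, M);
         }
     }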

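  10. Implementing the Multiplication & Fibonacci Methods

      public class MultiplicationMethod {
          static final int k = 10; // M == 2^k == 1024
          static final int w = 32; // word length, W == 2^w
          // a is approximately W divided by the golden ratio (Fibonacci hashing)
          static final int a = (int) 2654435769L;

          public static int h(int x) {
              // top k bits of (a * x) mod W
              return (x * a) >>> (w - k);
          }
      }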

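  11. Dealing With Arbitrary Keys [illustration] • What if keys are not integers? • function f maps keys into non-negative integers: $f : K \to \{0, 1, 2, \ldots\}$ • function g maps non-negative integers into array positions: $g : \{0, 1, 2, \ldots\} \to \{0, 1, \ldots, M-1\}$ • hash function h is simply the composition $h = g \circ f$, that is, $h(x) = g(f(x))$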

  12. Hashing of Strings • view a character string, s, as a sequence of n characters, $\{s_0, s_1, \ldots, s_{n-1}\}$ • a simple (but very poor) hash function is the sum of the characters, $f(s) = \sum_{i=0}^{n-1} s_i$ • e.g., for all possible ASCII strings of length five, $0 \le f(s) \le 640$
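To see why such a simple function is poor, here is a minimal sketch that assumes it is the plain sum of character codes. The class name `CharSumHash` is hypothetical. Any two strings made of the same characters in a different order collide.

```java
// Character-sum hash sketch: f(s) = s_0 + s_1 + ... + s_(n-1).
// Assumes the "simple" function is a plain sum of character codes.
final class CharSumHash {
    static int f(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i); // add the code of each character
        }
        return sum;
    }
}
```

For example, `CharSumHash.f("stop")` and `CharSumHash.f("pots")` return the same value, because addition ignores character order. The narrow range of results for short strings also means that a large table would mostly go unused.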

  13. Better String Hash Function • suppose we know a priori that all strings have length four • let B = 256 • we can compute the 32-bit quantity like this: $f(s) = s_0 B^3 + s_1 B^2 + s_2 B + s_3$ • this function spreads out the values better

  14. Dealing with Arbitrary Length Strings • generalize the preceding approach: $f(s) = \sum_{i=0}^{n-1} B^{n-i-1} s_i$ • computes a unique integer for every possible string • unfortunately the range of f(s) is unbounded • a simple modification bounds the range: $f(s) = \left( \sum_{i=0}^{n-1} B^{n-i-1} s_i \right) \bmod W$
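The bounded polynomial hash can be evaluated with Horner's rule. The following is a minimal sketch, assuming B = 256 and W = 2^32, so that Java's wrapping `int` arithmetic performs the reduction modulo W implicitly. The class name `StringHashSketch` is hypothetical.

```java
// Polynomial string hash sketch, evaluated by Horner's rule:
// f(s) = (s_0 * B^(n-1) + ... + s_(n-2) * B + s_(n-1)) mod W.
// Assumes B = 256 and W = 2^32 (int overflow does the "mod W").
final class StringHashSketch {
    static final int B = 256;

    static int f(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiply the running value by B and add the next character.
            result = result * B + s.charAt(i);
        }
        return result;
    }
}
```

With these particular constants, multiplying by 256 shifts the running value left by eight bits, so for ASCII input only the last four characters influence the result: `f("xabcd")` equals `f("abcd")`. That weakness is one reason to look for a more elaborate function.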

  15. More Elaborate Hash Function

  16. Hash Tables Section 8.4 • hash tables • main difference: dealing with collisions • separate chaining hash tables [illustration] • running time analysis • worst case, average case • scatter tables [illustration] • chained, open addressing • running time analysis

  17. Hash Tables • a hash table is a searchable container that implements the HashTable interface • the AbstractHashTable class is an abstract class from which various implementations are derived – method getLength (returns M) – methods f, g, and h (where $h = g \circ f$) • the implementation shown uses the division method of hashing

  18. Separate Chaining Hash Table • uses separate chaining to resolve collisions • the hash table is implemented as an array of linked lists • the linked list into which an item is to be inserted is determined by hashing that item • the item is inserted into the linked list by appending it
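The structure described above can be sketched as follows. This is a simplified illustration for integer keys using the division method, not the ChainedHashTable class from the book. The class `ChainedTableSketch` and its method `chainLength` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Separate chaining sketch: an array (here a List) of M linked lists.
final class ChainedTableSketch {
    private final List<LinkedList<Integer>> buckets;

    ChainedTableSketch(int m) {
        buckets = new ArrayList<>(m);
        for (int i = 0; i < m; i++) {
            buckets.add(new LinkedList<>()); // O(M) construction
        }
    }

    // Division method; floorMod keeps the index non-negative.
    private int h(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Append the key to the chain selected by its hash value.
    void insert(int key) {
        buckets.get(h(key)).addLast(key);
    }

    // Search only the chain selected by the hash value.
    boolean find(int key) {
        return buckets.get(h(key)).contains(key);
    }

    // Remove one occurrence of the key; returns false if absent.
    boolean withdraw(int key) {
        return buckets.get(h(key)).remove(Integer.valueOf(key));
    }

    int chainLength(int index) {
        return buckets.get(index).size();
    }
}
```

With M = 7, the keys 3, 10 and 17 all hash to position 3 and end up in the same chain, so `find` and `withdraw` have to walk that chain while `insert` only appends to it.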

  19. Example: Keys and their Hash Values Prog 8.3(octal)

  20. ChainedHashTable Running Time (Worst Case) • constructor: O(M) • purge: O(M) • insert: T<hashCode> + O(1) • withdraw: T<hashCode> + O(n) • find: O(1) + T<hashCode> + n T<isEQ> + O(n)

  21. Average Case Analysis • consider we have a hash table of size M • let there be exactly n items in the hash table • the quantity λ = n/M is called the load factor • the load factor is the average length of a linked list!

  22. Average Running Times • unsuccessful search: O(1) + T<hashCode> + λ T<isEQ> + O(λ) • successful search: O(1) + T<hashCode> + ((λ+1)/2) T<isEQ> + O(λ) • if one could guarantee λ ≤ 1, and if T<hashCode> = O(1) and T<isEQ> = O(1), then all operations would be O(1) [average] • to guarantee this, the table must be resized as it fills
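One way to keep λ ≤ 1 is to grow the array before an insertion would push the item count above M. The following is a minimal sketch under that assumption, for integer keys with the division method. The doubling policy, the initial size of 4, and the class name `ResizingChainedSketch` are illustrative choices, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Chained table sketch that doubles M so the load factor stays <= 1.
final class ResizingChainedSketch {
    private List<LinkedList<Integer>> buckets = newBuckets(4);
    private int count = 0; // n, the number of stored items

    private static List<LinkedList<Integer>> newBuckets(int m) {
        List<LinkedList<Integer>> b = new ArrayList<>(m);
        for (int i = 0; i < m; i++) {
            b.add(new LinkedList<>());
        }
        return b;
    }

    private static LinkedList<Integer> chainFor(
            List<LinkedList<Integer>> b, int key) {
        return b.get(Math.floorMod(key, b.size()));
    }

    int length() { return buckets.size(); }  // M

    double loadFactor() { return (double) count / buckets.size(); }

    void insert(int key) {
        if (count + 1 > buckets.size()) {
            resize(2 * buckets.size()); // keep n <= M
        }
        chainFor(buckets, key).addLast(key);
        count++;
    }

    boolean find(int key) {
        return chainFor(buckets, key).contains(key);
    }

    // Rehash every stored key into a larger array of chains.
    private void resize(int newM) {
        List<LinkedList<Integer>> bigger = newBuckets(newM);
        for (LinkedList<Integer> chain : buckets) {
            for (int key : chain) {
                chainFor(bigger, key).addLast(key);
            }
        }
        buckets = bigger;
    }
}
```

Each resize costs O(n + M), but because the size doubles, that cost is spread over the insertions that preceded it.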
