Appendix I Hashing
Chapter Scope
• Hashing, conceptually
• Using hashes to solve problems
• Hash implementations
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
Hashing
• In hashing, elements are stored in a hash table, at a location determined by applying a hashing function to the value to be stored
• Each location in the table is called a cell or a bucket
Ideally
• In an ideal world, each value would hash to a unique address, a one-to-one mapping
• If that were the case, the time to access or store data in a hash table would be O(1)
• Two factors prevent this:
  • A less-than-perfect hash function
  • Limits on the size of the address space
Example
• Consider an array that holds 26 elements
• To store names, we create a simple hashing function that maps the first letter of each name to its own cell
• If every element mapped to a unique position, the access time would be independent of the number of elements stored, and all operations would be O(1)
• A function that maps each element to a unique position is called a perfect hashing function
Less than Perfect
• A collision occurs when two or more elements map to the same location, for example two names that begin with the same letter
• Collisions have to be resolved with a technique for storing multiple elements that map to the same bucket
• Even if a hashing function isn't perfect, a good one that distributes elements evenly can still give O(1) operations on average
Hash Table Size
• How large should the table be?
• With a dataset of size n and a perfect hashing function, we need a table of size n
• Without a perfect hashing function, a good guideline is to make the table 150% of the dataset size
• If we do not know the size of the dataset, we can rely on dynamic resizing: creating a larger hash table and transferring the elements to it
Dynamic Resizing
• Deciding when to resize is key
• One possibility is to resize when the table is full
• But the performance of a hash table degrades seriously as it becomes full
• A better approach is to use a load factor: the percentage of occupancy at which the table is resized
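The load-factor check can be sketched in a few lines of Java. The class name, the 0.75 threshold, and the growth rule are assumptions chosen for illustration; the slides do not prescribe specific values.

```java
// Hypothetical helper illustrating a load-factor check; not taken from the slides.
class LoadFactorCheck {
    // Assumed threshold: resize once more than 75% of the cells would be occupied.
    static final double LOAD_FACTOR = 0.75;

    // True when storing one more element would push occupancy past the load factor.
    static boolean needsResize(int count, int capacity) {
        return (count + 1) > capacity * LOAD_FACTOR;
    }

    // Assumed growth rule: roughly double the table (2n + 1 keeps the size odd).
    static int grownCapacity(int capacity) {
        return capacity * 2 + 1;
    }
}
```

With a capacity of 10, the seventh element still fits (7 is not greater than 7.5), but adding an eighth would trigger a resize before the table is actually full.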
Hashing Functions
• There are many good approaches to hashing functions
• The method used in the name example is extraction: part of an element's value or key is used to compute the location
Hashing Function Examples
• Extraction
  • Use only part of the element's value or key to compute the location at which to store the element
  • Example on page 1007: extract the first character of the value and calculate its offset from the letter 'A' to determine its location
  • 'A' maps to 0, 'B' maps to 1, etc.
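A minimal sketch of the first-letter extraction described above. The class and method names are hypothetical, and the code assumes a non-empty name that starts with an English letter.

```java
// Hypothetical illustration of extraction: only the first character is used.
class ExtractionHash {
    // Maps a name to an index 0..25 ('A' -> 0, 'B' -> 1, ...).
    // Assumes the name is non-empty and begins with a letter A-Z or a-z.
    static int hash(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        return first - 'A';
    }
}
```

Under these assumptions, "Ann" and "Alice" both hash to 0, which is exactly the kind of collision the earlier slide describes.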
Hashing Function Examples
• Division
  • Use the remainder of the key divided by some positive integer p as the index for the element: Hashcode(key) = Math.abs(key) % p
  • The result is in the range 0 to p-1
  • Use the table size as p so that every result is a valid index into the table
  • Example: with a key value of 79 and a table size of 43, Math.abs(79) % 43 yields 36
  • Using a prime number p as both the table size and the divisor has been found to give a better distribution of keys
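The division formula translates directly into Java. The class name is an assumption; the expression itself is the one given on the slide.

```java
// Hypothetical wrapper around the slide's formula: Math.abs(key) % p.
class DivisionHash {
    // Returns an index in the range 0..p-1 for a positive divisor p.
    // Caveat: Math.abs(Integer.MIN_VALUE) is still negative in Java, so this
    // sketch assumes the key is not Integer.MIN_VALUE.
    static int hash(int key, int p) {
        return Math.abs(key) % p;
    }
}
```

For the slide's example, `DivisionHash.hash(79, 43)` evaluates to 36.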
Hashing Function Examples
• Folding
  • The key is divided into parts, which are then combined to create the index
  • Each part is the same length as the desired index, except perhaps the last part
• Shift folding
  • The parts are added together to create the index
  • Key = 987-65-4321
  • 987 + 654 + 321 => 1962
  • Use extraction or division to reduce the sum to a smaller index
• Boundary folding
  • A slight variation of shift folding in which some of the parts are reversed before adding
  • Key = 987-65-4321
  • Parts 987, 654, 321 with the middle part reversed: 987 + 456 + 321 => 1764
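Both folding variants can be sketched on a string of digits. The names are hypothetical, and reversing every second part in `boundaryFold` is one assumed reading of "some of the parts are reversed"; it matches the slide's example, where only the middle part is reversed.

```java
// Hypothetical illustration of shift folding and boundary folding.
class FoldingHash {
    // Splits the digits into parts of partLength (the last may be shorter) and adds them.
    static int shiftFold(String digits, int partLength) {
        int sum = 0;
        for (int i = 0; i < digits.length(); i += partLength) {
            int end = Math.min(i + partLength, digits.length());
            sum += Integer.parseInt(digits.substring(i, end));
        }
        return sum;
    }

    // Same as shiftFold, but every second part is reversed before adding (assumed rule).
    static int boundaryFold(String digits, int partLength) {
        int sum = 0;
        boolean reverse = false;
        for (int i = 0; i < digits.length(); i += partLength) {
            int end = Math.min(i + partLength, digits.length());
            String part = digits.substring(i, end);
            if (reverse) {
                part = new StringBuilder(part).reverse().toString();
            }
            sum += Integer.parseInt(part);
            reverse = !reverse;
        }
        return sum;
    }
}
```

With the key 987654321 and a part length of 3, `shiftFold` gives 1962 and `boundaryFold` gives 1764, matching the slide. Either sum would still need extraction or division to fit a smaller table.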
Hashing Function Examples
• Mid-square method
  • The key is multiplied by itself, and then digits are extracted from the middle of the result
  • Example: key = 4321
  • 4321 * 4321 => 18671041
  • Assume we need a 3-digit index
  • Extract 671 or 710
  • It is important that the same digit positions be extracted each time
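A sketch of the mid-square method, with a hypothetical class name. The rule for locating "the middle" is an assumption: here the start position is `(length - count) / 2`, which selects 671 for the slide's key. Any fixed rule works as long as it is applied consistently.

```java
// Hypothetical illustration of the mid-square method.
class MidSquareHash {
    // Squares the key and extracts `count` digits from the middle of the result.
    // Assumes the squared value has at least `count` digits.
    static int hash(int key, int count) {
        String squared = Long.toString((long) key * key); // long avoids int overflow
        int start = (squared.length() - count) / 2;        // assumed "middle" rule
        return Integer.parseInt(squared.substring(start, start + count));
    }
}
```

For the key 4321, the square is 18671041 and this sketch extracts 671.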
Hashing Function Examples
• Radix transformation method
  • Transform the key into another numeric base
  • Then use the division method: divide the converted key by the table size and use the remainder as the index
  • Example: key = 23 in base 10
  • Convert to 32 in base 7
  • With a table size of 17: Hashcode(23) = Math.abs(32) % 17 => index of 15
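The radix transformation can be sketched with `Integer.toString(int, int)`, which renders a number in a given base. The class name is hypothetical, and the sketch assumes a base of at most 10 and keys small enough that the converted digits still fit in an `int`.

```java
// Hypothetical illustration of radix transformation followed by division.
class RadixHash {
    // Rewrites the key in the given base, reads those digits back as a decimal
    // number, and reduces it with the division method.
    // Assumes 2 <= radix <= 10 and a key small enough not to overflow an int.
    static int hash(int key, int radix, int tableSize) {
        String digits = Integer.toString(Math.abs(key), radix);
        int converted = Integer.parseInt(digits);
        return converted % tableSize;
    }
}
```

For the slide's example, 23 becomes "32" in base 7, and 32 % 17 gives index 15.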
Hashing Functions
• In the digit analysis method, the index is formed by extracting and then manipulating specific digits from the key
• If the key is 1234567, we might select the digits in positions 2 through 4, yielding 234
• The manipulation could then take many forms:
  • reversing the digits (432)
  • performing a circular shift (423)
  • swapping each pair of digits (324)
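The three manipulations listed above can be sketched as small string helpers. All names are hypothetical; positions are taken to be 1-based and inclusive, and the circular shift is assumed to rotate one place to the right, which is what produces 423 from 234.

```java
// Hypothetical illustration of digit analysis: select digits, then manipulate them.
class DigitAnalysisHash {
    // Selects the digits in positions from..to (1-based, inclusive).
    static String select(long key, int from, int to) {
        return Long.toString(key).substring(from - 1, to);
    }

    static String reverse(String digits) {
        return new StringBuilder(digits).reverse().toString();
    }

    // Circular shift one place to the right: the last digit moves to the front.
    static String rotateRight(String digits) {
        int last = digits.length() - 1;
        return digits.charAt(last) + digits.substring(0, last);
    }

    // Swaps each adjacent pair; a trailing unpaired digit stays where it is.
    static String swapPairs(String digits) {
        char[] c = digits.toCharArray();
        for (int i = 0; i + 1 < c.length; i += 2) {
            char tmp = c[i];
            c[i] = c[i + 1];
            c[i + 1] = tmp;
        }
        return new String(c);
    }
}
```

Starting from `select(1234567, 2, 4)`, which is "234", the helpers return "432", "423", and "324" respectively.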
Hashing Functions
• In the length-dependent method, the key and the length of the key are combined in some way to form either the index itself or an intermediate value
• If our key is 8765, we might multiply the first two digits by the length and then divide by the last digit: 87 * 4 / 5, which yields 69 with integer division
• If our table size is 43, we would then use the division method to yield an index of 26
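The length-dependent example works out as 87 * 4 = 348, then 348 / 5 = 69 (integer division), then 69 % 43 = 26. A minimal sketch, with a hypothetical class name and the assumptions that the key is positive, has at least two digits, and does not end in 0:

```java
// Hypothetical illustration of the length-dependent method from the slide's example.
class LengthDependentHash {
    // (first two digits * number of digits) / last digit, then the division method.
    // Assumes a positive key with at least two digits and a non-zero last digit.
    static int hash(int key, int tableSize) {
        String digits = Integer.toString(key);
        int firstTwo = Integer.parseInt(digits.substring(0, 2));
        int last = digits.charAt(digits.length() - 1) - '0';
        int intermediate = firstTwo * digits.length() / last; // integer division
        return intermediate % tableSize;
    }
}
```

`LengthDependentHash.hash(8765, 43)` evaluates to 26, as on the slide.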
Hashing Function Examples
• The java.lang.Object hashCode method
  • Returns an integer that is typically based on the memory location of the object
  • This is generally not useful as a hash of the object's value, but it ensures that all objects have a hashCode method
  • A class may override the inherited version of hashCode to provide its own
  • The String and Integer classes define their own hashCode methods
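A sketch of a class that overrides `hashCode`. The `Student` class and its fields are hypothetical. The code uses the standard `java.util.Objects.hash` helper and overrides `equals` alongside `hashCode`, since equal objects are required to report equal hash codes.

```java
import java.util.Objects;

// Hypothetical value class that supplies its own hashCode instead of Object's.
final class Student {
    private final String name;
    private final int id;

    Student(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        Student s = (Student) other;
        return id == s.id && name.equals(s.name);
    }

    // Combines the same fields that equals compares.
    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }
}
```

Two separately constructed `Student("Ann", 1)` objects are now equal and share a hash code, so hash-based collections treat them as the same element.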
Resolving Collisions
• As mentioned, without a perfect hashing function, collisions must be resolved
• There are several techniques for this as well
• Chaining
  • Treat the table as an array of linked lists
• Open addressing
  • linear probing
  • quadratic probing
  • double hashing
Chaining with Links or Overflow Area
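Chaining with links can be sketched as a table whose cells each hold a linked list. The `ChainedTable` class below is a hypothetical illustration, not the textbook's implementation; it stores strings and picks a bucket with `Math.floorMod` on `String.hashCode()`.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical chained hash table: every bucket is a linked list of elements.
class ChainedTable {
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    private LinkedList<String> bucketFor(String value) {
        return buckets.get(Math.floorMod(value.hashCode(), buckets.size()));
    }

    // Colliding elements are simply appended to the same bucket's list.
    void add(String value) {
        LinkedList<String> chain = bucketFor(value);
        if (!chain.contains(value)) {
            chain.add(value);
        }
    }

    boolean contains(String value) {
        return bucketFor(value).contains(value);
    }
}
```

A table of size 1 forces every element into the same chain, which shows that collisions cost lookup time but never lose data.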
Open Addressing
• The open addressing method looks for another unused position in the table
• The simplest approach is linear probing: if an element hashes to position p and that position is occupied, try position (p+1)%s, where s is the size of the table, and keep stepping until an open position is found
• One problem with linear probing is the development of clusters of occupied cells
• There are other approaches to open addressing:
  • quadratic probing
  • double hashing
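Linear probing can be sketched as follows. The class is a hypothetical illustration that stores integer keys, uses the division method for the starting position, and omits removal and resizing to stay short.

```java
// Hypothetical open-addressing table that resolves collisions by linear probing.
class LinearProbingTable {
    private final Integer[] cells; // null marks an unused position

    LinearProbingTable(int size) {
        cells = new Integer[size];
    }

    // Stores the key and returns the index used, or -1 if the table is full.
    int add(int key) {
        int s = cells.length;
        int p = Math.abs(key) % s;
        for (int i = 0; i < s; i++) {
            if (cells[p] == null || cells[p] == key) {
                cells[p] = key;
                return p;
            }
            p = (p + 1) % s; // occupied: try the next position, wrapping around
        }
        return -1;
    }

    // Follows the same probe sequence; an empty cell means the key is absent.
    boolean contains(int key) {
        int s = cells.length;
        int p = Math.abs(key) % s;
        for (int i = 0; i < s; i++) {
            if (cells[p] == null) return false;
            if (cells[p] == key) return true;
            p = (p + 1) % s;
        }
        return false;
    }
}
```

In a table of size 7, the keys 10, 17, and 24 all hash to position 3, so they land in positions 3, 4, and 5. That run of adjacent occupied cells is a small instance of the clustering problem noted above.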
Java Collections Hash Tables
• The Java Collections API provides seven implementations of hashing
• Three of these are:
  • Hashtable: key-value pairs; the oldest class; synchronized
  • HashMap: key-value pairs; unsynchronized; permits null keys and values
  • HashSet: unique values only; unsynchronized; permits a null element
• Note: the three classes above use the chaining method to resolve collisions
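A short usage sketch of two of these classes. The `CollectionsDemo` class and its method names are hypothetical; `HashMap`, `HashSet`, `Map.merge`, and `Arrays.asList` are standard library calls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo of HashMap (key-value pairs) and HashSet (unique values).
class CollectionsDemo {
    // Counts how many times each word occurs, using the word as the key.
    static Map<String, Integer> wordCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    // Collapses duplicates: a HashSet keeps only one copy of each value.
    static Set<String> distinct(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }
}
```

For the input `{"a", "b", "a"}`, `wordCounts` maps "a" to 2 and "b" to 1, and `distinct` returns a set of two elements.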