140 likes | 437 Views
Modified. Appendix E-A Hashing. Chapter Scope. Concept of hashing Hashing functions Collision handling Open addressing Buckets Chaining Deletions Performance. What is hashing?.
E N D
Modified Appendix E-A Hashing
Chapter Scope • Concept of hashing • Hashing functions • Collision handling • Open addressing • Buckets • Chaining • Deletions • Performance Java Software Structures, 4th Edition, Lewis/Chase
What is hashing? • Hashing is a scheme for storing and retrieving information by (key) value. Sometimes used to implement associative memory. • A hash function is used to map a value to a location; The value (and associated info) may be stored at that location or at least accessed via that location. • Very efficient for storing and retrieving • Used extensively in computing – software and hardware. Java Software Structures, 4th Edition, Lewis/Chase
Collisions • Ideally, the value being mapped would be stored at the mapped location in the location space, and this could be true for a perfect hashing function. • However, in most situations, multiple values will/may map to the same location (collisions) • So we have to have to have a strategy to handle collisions. • There are several popular collision handling strategies. Java Software Structures, 4th Edition, Lewis/Chase
Hash function • A hash function is a mapping from a value space to a location space. • The value space is any domain of values. Strings, ints, phone numbers, student IDs, … • The location space is normally a sequence of integers from 0 to N-1, where N is the size of the location space. The location space resembles a 1-dimentional array (like computer memory). Location space Value space
Characteristics of a good hash function • It should cover the entire location space • It should distribute the key values fairly evenly into the location space • Generally, 2 values that are “close together” in the value space should not be close together in the location space. Aside: Cryptographic hashing Java Software Structures, 4th Edition, Lewis/Chase
Division (remainder) hash function • Probably most commonly used method, either by itself or combined with another method. • If the location space is of size N, divide the value (somehow represented as an integer) by N and take the remainder as the result. • Choosing N to be a prime number improves the likelihood of the mapping distributing the values fairly evenly. Java Software Structures, 4th Edition, Lewis/Chase
Representing a value as an integer • We know that all information stored in computer storage is a string of bits. • Any string of bits can be interpreted as a binary integer. • So how to we make that interpretation? • Modern languages try to prevent us from changing out interpretation of a string of bits – strong typing. Java Software Structures, 4th Edition, Lewis/Chase
Char to int Java does give us a loophole for character data System.out.println( (int) "ABC".charAt(0)); displays 65 The data type char is an “integral” type, and can be automatically converted to an int or long. Int I = ‘B’; // sets I to 66; Java Software Structures, 4th Edition, Lewis/Chase
Also, can use bitwise and bit shift ops. • ~, &, |, ^, <<, >>, >>> inti = 'B'; System.out.println( i); //displays 66 System.out.println( (int) "ABC".charAt(0)); //displays 65 System.out.println( 'A' & 'B' ); //displays 64 System.out.println( 'A' | 'B' ); //displays 67 System.out.println( 'A' ^ 'B' ); //displays 3 System.out.println( ~'A' ); //displays -66 System.out.println( 'A' << 2 ); //displays 260 System.out.println( 'A' >>> 1 ); //displays 32 Java Software Structures, 4th Edition, Lewis/Chase
Folding • Divide the value into parts and then combine them. • Example: value is 234-56-9876 234 965 + 876 --------------- 2075 % N Java Software Structures, 4th Edition, Lewis/Chase
Other hash functions • Mid square -- Square the value, as a number, and take a portion out of the middle of that product. • Extraction involves using only a part of an element’s value or key to compute the location at which to store the element • Length dependent – use a portion of the value, then combine with the length of the value. Java Software Structures, 4th Edition, Lewis/Chase
Hashing Functions - Digit Analysis • In the digit analysis method, the index is formed by extracting, and then manipulating specific digits from the key • For example, if our key is 1234567, we might select the digits in positions 2 through 4 yielding 234 • The manipulation can then take many forms • Reversing the digits (432) • Performing a circular shift to the right (423) • Performing a circular shift to the left (342) • Swapping each pair of digits (324) • Alternately, these manipulations could be done on the bits Java Software Structures, 4th Edition, Lewis/Chase