Introduction to Hash Tables
Arrays
Remember! The array elements just hold references to the objects, not the objects themselves!
Consider this array of Student records:
[0] Hyun
[1] Chelsea
[2] David
[3] Abhinav
[4] Fiona
[5] Erik
[6] Bing
[7] Ina
[8] Jim
[9] Gheeta
Sequential access: good. Direct access to one particular student: bad.
I can easily loop through all the student records by using a for loop. But if I want to access only Jim’s record, and I don’t know its index, I have to start at 0 and loop through the array until I find it. With a big array this could be rather inefficient. Is there a better way?
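The sequential search described above can be sketched in Java as follows. This is only an illustration: the class and method names are made up for the example, the records are reduced to plain name strings, and the order of the names is taken from the array shown on the slide.

```java
class LinearSearchDemo {
    // Returns the index of the first element equal to target, or -1 if it is absent.
    static int indexOf(String[] names, String target) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(target)) {
                return i; // found after examining i + 1 elements
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Names only, as a stand-in for full Student records.
        String[] students = {"Hyun", "Chelsea", "David", "Abhinav", "Fiona",
                             "Erik", "Bing", "Ina", "Jim", "Gheeta"};
        System.out.println(indexOf(students, "Jim")); // prints 8
    }
}
```

Finding Jim takes nine comparisons here. In the worst case the loop visits every element, so the cost grows in step with the size of the array.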
Hash tables
Sequential access: bad. Direct access: good.
The student records are stored in an array. The place in the array where a particular student is held is determined by the hashing function.
The hashing function takes some value, e.g. a name or, as here, a student ID number, and translates it into an array index. So if we want to find Jim’s record, we just give his ID number to the hashing function and it tells us where his record is located.
[Figure: Jim’s student ID no. -> Hashing Function -> “6”]
[0] Bing
[1] David
[2] Ina
[3] Abhinav
[4] Erik
[5] Hyun
[6] Jim
[7] Fiona
[8] Gheeta
[9] Chelsea
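A minimal Java sketch of this idea is shown below. The slide does not say which hashing function is used at this point, nor what Jim’s ID number is. The example therefore assumes a simple remainder-based function and a hypothetical ID of 1006, chosen only because it maps to index 6 as in the figure.

```java
class DirectLookupDemo {
    static final int TABLE_SIZE = 10;

    // Assumed hashing function: the remainder after dividing the ID by the table size.
    static int hash(int studentId) {
        return studentId % TABLE_SIZE;
    }

    public static void main(String[] args) {
        String[] table = new String[TABLE_SIZE];

        int jimId = 1006;            // hypothetical ID, not taken from the slides
        table[hash(jimId)] = "Jim";  // store the record where the hash says

        // Lookup needs no loop: one hash computation, one array access.
        System.out.println(hash(jimId));        // prints 6
        System.out.println(table[hash(jimId)]); // prints Jim
    }
}
```

The same function is used for storing and for finding, which is what makes the direct access possible.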
Collisions
What happens if the hashing function gives the same array index for two different students? This happens, and it is called a collision. There are a number of ways of dealing with collisions, the details of which you don’t need to know. But what you do need to know is that the performance of hash tables degrades over time, as more records are added and collisions accumulate.
[Figure: Hiro’s student ID no. -> Hashing Function -> “6”. Collision! Index 6 already holds Jim.]
[0] Bing
[1] David
[2] Ina
[3] Abhinav
[4] Erik
[5] Hyun
[6] Jim
[7] Fiona
[8] Gheeta
[9] Chelsea
Collisions
[Figure: David’s student ID no. -> Hashing Function -> “1”. Erik’s student ID no. -> Hashing Function -> “4”. Hyun’s student ID no. -> Hashing Function -> “4”. Collision! Hyun goes into the next available index.]
[1] David
[4] Erik
[5] Hyun
(all other indexes are empty)
If there had already been a lot of records in the array when the collision happened, Hyun may have been pushed a long way down the array. Later, when we try to access Hyun’s record, the hashing function still gives us 4 as the place to find him. But he’s not there! So we have to do a sequential search from index 4 onwards through the array to find him. This is the reason that hash table performance degrades over time.
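The "next available index" strategy in this animation is commonly called linear probing. The Java sketch below is one possible way to write it, not the deck’s own implementation. The class name, the student ID values (1001, 1004, 2004) and the remainder-based hash are assumptions, chosen so that the IDs map to 1, 4 and 4 as in the figure. Wrapping around to index 0 at the end of the array is also an assumption that the slide does not cover.

```java
class ProbingTable {
    private final Integer[] ids;   // null marks an empty slot
    private final String[] names;

    ProbingTable(int size) {
        ids = new Integer[size];
        names = new String[size];
    }

    // Assumed hashing function: remainder of the ID divided by the table size.
    int hash(int id) {
        return id % ids.length;
    }

    // Stores the record in the first free slot at or after its hash index.
    // Returns the index used, or -1 if the table is full.
    int insert(int id, String name) {
        int start = hash(id);
        for (int step = 0; step < ids.length; step++) {
            int i = (start + step) % ids.length; // wrap around at the end
            if (ids[i] == null) {
                ids[i] = id;
                names[i] = name;
                return i;
            }
        }
        return -1;
    }

    // Searches sequentially from the hash index, as the slide describes.
    // Returns the name, or null if the ID is not in the table.
    String find(int id) {
        int start = hash(id);
        for (int step = 0; step < ids.length; step++) {
            int i = (start + step) % ids.length;
            if (ids[i] == null) {
                return null;       // reached an empty slot: the ID was never stored
            }
            if (ids[i] == id) {
                return names[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProbingTable table = new ProbingTable(10);
        System.out.println(table.insert(1001, "David")); // prints 1
        System.out.println(table.insert(1004, "Erik"));  // prints 4
        System.out.println(table.insert(2004, "Hyun"));  // prints 5 (index 4 is taken)
        System.out.println(table.find(2004));            // prints Hyun
    }
}
```

Looking up Hyun starts at index 4, finds Erik there, and has to step on to index 5. The more slots are filled, the longer such runs of steps can become. The sketch does not support deleting records, which would need extra care because `find` stops at the first empty slot.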
The Hashing Algorithm
The simplest way to translate the Student ID into an array index is to use the modulo operator (% in Java). The modulo operator returns the remainder of a division operation, for example 11 % 4 = 3.
Question: If we have an array of 10 elements, what do we need to mod our Student IDs by to be sure of getting some value from 0 to 9?
Answer: 10
Question: Let’s say we have an array of size N. Now what do we need to mod our Student IDs by?
Answer: N. For a non-negative ID, the remainder after dividing by N is always between 0 and N-1, which are exactly the valid indexes of an array of size N.
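A short Java sketch of the modulo approach follows. It checks which indexes the `%` operator can produce for a table of 10 slots. The method name `indexFor` and the range of test IDs are made up for the illustration, and the IDs are assumed to be non-negative, because `%` in Java can return a negative result for a negative left operand.

```java
class ModuloIndexDemo {
    // Maps a non-negative student ID to an index in a table of the given size.
    static int indexFor(int studentId, int tableSize) {
        return studentId % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(11 % 4); // prints 3, the example from the slide

        // Track the smallest and largest index produced for IDs 0..999.
        int tableSize = 10;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int id = 0; id < 1000; id++) {
            int index = indexFor(id, tableSize);
            min = Math.min(min, index);
            max = Math.max(max, index);
        }
        System.out.println(min + " to " + max); // prints 0 to 9
    }
}
```

Every result is a valid index for an array of 10 elements, whose slots are numbered 0 to 9.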