Tutorial 10 Hashing
Search Records • Traversal Search: O(n) • Binary Search: O(log n) • Can we do it in O(1)? • Hashing
Hashing • Put a record into one of many buckets in some way based on the key • When searching a record by key, identify the bucket and search in the bucket • Main concepts • hash table, bucket, slot • hash function: maps keys into buckets
Hashing • For a key k, h(k) is the home address (home bucket) • Two keys, k1 and k2, are said to be synonyms if h(k1)=h(k2) • Collision: the home bucket for a new record is already occupied by a record with a different key • Overflow: there is no space in the home bucket for the new record
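Example • 10 buckets, each with 1 slot • h("apple") = 5, h("watermelon") = 3, h("grapes") = 8, h("peach") = 7, h("kiwi") = 0, h("strawberry") = 9, h("mango") = 6, h("banana") = 2 • Resulting table: 0 kiwi, 1 (empty), 2 banana, 3 watermelon, 4 (empty), 5 apple, 6 mango, 7 peach, 8 grapes, 9 strawberry • Insert "orange" with h("orange") = 7: bucket 7 already holds "peach", so this is a collision and, with one slot per bucket, also an overflow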
Avoid collisions • T: the size of the key space • n: the number of records • b: the number of buckets; s: the number of slots per bucket • Key density: n/T • Loading density: a=n/(bs) • Method 1: use a table as large as the key space T to hold the n records • Method 2: design a mechanism to handle overflows
Hash functions • Objective • Fast to compute and minimizes the number of collisions • Perfect hash function • Uniform hash • Random: h(k)=i with probability 1/b • Division: h(k)=k%D • Mid-Square • Folding
Hash function: Division • h(k)=k % D • Choice of D • Should not be D=2^p or D=10^p • h(k) would then depend only on the lowest-order p bits (or decimal digits) of k • Should not be an even number • All even keys go to even buckets and odd keys go to odd buckets • Should be a prime number, or at least an odd number
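The effect of the choice of D can be checked directly. The following is a minimal sketch; the function name `division_hash` and the sample keys are illustrative assumptions, not part of the tutorial.

```python
def division_hash(k: int, d: int) -> int:
    """Division-method hash: the home bucket is k mod d."""
    return k % d


# Sample keys that are all multiples of 8 (chosen only for illustration).
keys = [8, 16, 24, 32, 40, 48]

# D = 2^3: only the lowest 3 bits of k matter, so every key lands in bucket 0.
buckets_pow2 = [division_hash(k, 8) for k in keys]

# D = 7 (prime): the same keys are spread over different buckets.
buckets_prime = [division_hash(k, 7) for k in keys]

# With an even D, the parity of the bucket always equals the parity of the key.
parity_preserved = all(division_hash(k, 10) % 2 == k % 2 for k in range(1000))
```

With D = 8 all six keys collide, while D = 7 gives each key its own bucket, which is the reason a prime D is recommended.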
Hash function: mid-square • Square the key and use an appropriate number of bits from the middle of the square • Example • A hash table with b=2^r buckets: take r bits from the middle of the square
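One possible reading of the mid-square method is sketched below. The fixed key width `key_bits` and the exact position of the "middle" window are assumptions made for this sketch; the tutorial does not specify them.

```python
def mid_square_hash(k: int, r: int, key_bits: int = 16) -> int:
    """Return r bits taken from the middle of k*k.

    Assumes keys fit in key_bits bits, so the square fits in 2*key_bits bits.
    """
    square = k * k
    # Number of low-order bits to discard so the r-bit window is centred.
    shift = (2 * key_bits - r) // 2
    return (square >> shift) & ((1 << r) - 1)


# A table with b = 2^4 = 16 buckets uses r = 4 middle bits.
bucket = mid_square_hash(1234, r=4)
```

Because the middle bits of the square depend on all bits of the key, nearby keys tend to land in different buckets.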
Hash function: folding • Partition a key k into several parts, each of the same length except possibly the last • The parts are added together to obtain the home bucket for k • Schemes • Shift folding • Folding at the boundaries • Example
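The two folding schemes can be sketched as follows. The key 12320324111220 and the part length of 3 decimal digits are illustrative choices, and the helper names are hypothetical.

```python
def split_digits(k: int, part_len: int) -> list[str]:
    """Split the decimal digits of k into parts of part_len digits (last may be shorter)."""
    s = str(k)
    return [s[i:i + part_len] for i in range(0, len(s), part_len)]


def shift_fold(k: int, part_len: int) -> int:
    """Shift folding: add the parts as they are."""
    return sum(int(p) for p in split_digits(k, part_len))


def boundary_fold(k: int, part_len: int) -> int:
    """Folding at the boundaries: reverse every second part before adding."""
    parts = split_digits(k, part_len)
    return sum(int(p[::-1]) if i % 2 else int(p) for i, p in enumerate(parts))


key = 12320324111220                 # parts: 123, 203, 241, 112, 20
shifted = shift_fold(key, 3)         # 123 + 203 + 241 + 112 + 20
folded = boundary_fold(key, 3)       # 123 + 302 + 241 + 211 + 20
```

In practice the sum would usually be reduced modulo the number of buckets so that it is a valid bucket index.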
Convert strings to integers • Add the ASCII codes of the characters • h("ABC") = 65+66+67 = 198 • Problem: every permutation of the characters has the same hash value • Key a is an array of characters of length n • h(a)= • h("ABC") = 65478
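The permutation problem, and one way around it, can be sketched as below. The position-weighted formula used here, a polynomial with multiplier 31 (the scheme used by Java's `String.hashCode`), is an assumption; the tutorial's own formula for h(a) is not reproduced in the text, so the value this sketch gives for "ABC" depends on that assumed formula.

```python
def additive_hash(s: str) -> int:
    """Sum of character codes: ignores the order of the characters."""
    return sum(ord(c) for c in s)


def polynomial_hash(s: str, base: int = 31) -> int:
    """Position-weighted hash (assumed formula): sum of s[i] * base^(n-1-i)."""
    h = 0
    for c in s:
        h = h * base + ord(c)
    return h


same_sum = additive_hash("ABC") == additive_hash("CBA")         # permutations collide
different = polynomial_hash("ABC") != polynomial_hash("CBA")    # order now matters
```

Weighting each character by its position makes "ABC" and "CBA" hash differently, which the plain sum cannot do.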
Overflow handling – Linear Probing • Compute h(k) • Examine the hash table buckets in the order ht[(h(k)+i) % b], for 0 ≤ i ≤ b-1, until one of the following happens • ht[(h(k)+i) % b] has a record whose key is k; k is found. • ht[(h(k)+i) % b] is empty; k is not in the table. • Return to ht[h(k)]; the table is full.
Linear Probing • Let a hash table have b=17 buckets • Let the hash function be h(k)=k%b • Consider inserting 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 • Linear probing tends to form clusters: blocks of contiguously occupied slots • The bigger a cluster is, the more likely it is to grow when a new key hashes into it • The larger the cluster, the slower the performance
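The insertion sequence above can be traced with a small sketch of linear probing. The class and method names are hypothetical, and each bucket is assumed to have a single slot.

```python
class LinearProbingTable:
    """Open-addressing hash table with h(k) = k % b and linear probing."""

    def __init__(self, b: int):
        self.b = b
        self.slots = [None] * b

    def _probe(self, k: int):
        """Yield bucket indices h(k), h(k)+1, ... wrapping around once."""
        home = k % self.b
        for i in range(self.b):
            yield (home + i) % self.b

    def insert(self, k: int) -> int:
        """Place k in the first empty bucket of its probe sequence; return the index."""
        for j in self._probe(k):
            if self.slots[j] is None or self.slots[j] == k:
                self.slots[j] = k
                return j
        raise OverflowError("table is full")

    def search(self, k: int):
        """Return the bucket index of k, or None if k is not in the table."""
        for j in self._probe(k):
            if self.slots[j] is None:
                return None          # empty bucket reached: k is absent
            if self.slots[j] == k:
                return j
        return None                  # back at the home bucket: table is full


table = LinearProbingTable(17)
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    table.insert(key)
```

After these insertions, buckets 11 through 16 and 0 through 2 form one contiguous cluster (wrapping around the end of the table): key 45 has home bucket 11 but is only placed in bucket 2 after nine probes.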
Exercise • Consider a hash function h(k)=k% D, where D is not given. We want to figure out what value of D is being used. We wish to achieve this using as few attempts as possible, where an attempt consists of supplying the function with k and observing h(k). Indicate how this may be achieved in the following two cases. D is known to be a prime number in the range [10,20].
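Solution • If D is a prime number in the range [10, 20], then D must be 11, 13, 17, or 19. We can test the hash function with a key whose remainder is different for each of these. For example, h(17) is 6, 4, 0, or 17 for D = 11, 13, 17, or 19 respectively, so a single attempt is enough.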
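A short search confirms that one well-chosen key separates all four candidates. This is a sketch; the function name `distinguishing_key` and the search limit are assumptions for illustration.

```python
def distinguishing_key(candidates, limit=1000):
    """Find the smallest k whose remainders modulo every candidate D are all distinct.

    Returns (k, {remainder: D}) or None if no such k is found below limit.
    """
    for k in range(limit):
        residues = [k % d for d in candidates]
        if len(set(residues)) == len(candidates):
            return k, dict(zip(residues, candidates))
    return None


candidates = [11, 13, 17, 19]        # primes in [10, 20]
k, remainder_to_d = distinguishing_key(candidates)
```

Supplying the returned k to the unknown hash function and looking up the observed h(k) in `remainder_to_d` identifies D in one attempt.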
Exercise • Given the records {2341, 4234, 2839, 430, 33, 397, 3920}, a hash table of size 7, and a hash function h(x)=x % 7, show the resulting table after inserting all records with linear probing
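Solution • 2341 % 7 = 3 • 4234 % 7 = 6 • 2839 % 7 = 4 • 430 % 7 = 3 • 33 % 7 = 5 • 397 % 7 = 5 • 3920 % 7 = 0 • Resulting table with linear probing (buckets 0 to 6): 33, 397, 3920, 2341, 2839, 430, 4234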
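The table asked for in the exercise can be built step by step. This sketch uses the record list exactly as given in the exercise statement; the function name `insert_all` is hypothetical.

```python
def insert_all(records, size):
    """Insert records into a table of the given size using h(x) = x % size and linear probing."""
    table = [None] * size
    for x in records:
        home = x % size
        for i in range(size):
            j = (home + i) % size
            if table[j] is None:     # first empty bucket in the probe sequence
                table[j] = x
                break
        else:
            raise OverflowError("table is full")
    return table


records = [2341, 4234, 2839, 430, 33, 397, 3920]
table = insert_all(records, 7)
```

Only 2341, 4234, and 2839 end up in their home buckets; the other four records are displaced by earlier insertions, and the table is completely full afterwards.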
Exercise • Suppose you could steal a system file with user names and hashed passwords and suppose you knew the hash function used for the passwords. Would this give you access to user accounts on the system? • Suppose you know someone’s login name and you know a password that is different from their password, but this other password has the same hash value as their password. Does that allow you to log in to their account?
Solution • It wouldn’t give you direct access. Even though you know the user names, you don’t know the passwords. You only know the hashed passwords. When you log in, you don’t enter the hash of the password. You have to enter the password itself, and you don’t know it. • Yes. The system doesn’t know their true password. It only knows the hash value of their password. So if you enter your password and it hashes to the same hash value, the system cannot tell the difference. The hashed password matches the one in the “password file”, so the system assumes you have the correct password and lets you in.
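Both answers can be illustrated with a toy login check. The hash function below is deliberately weak so that a collision is easy to produce; it, the user name, and the passwords are all assumptions for this sketch and bear no relation to how real systems hash passwords.

```python
def toy_hash(password: str) -> int:
    """Deliberately weak hash (sum of character codes), used only to force a collision."""
    return sum(ord(c) for c in password) % 1000


# The "password file" stores only hash values, never the passwords themselves.
password_file = {"alice": toy_hash("listen")}


def login(user: str, password: str) -> bool:
    """Hash the entered password and compare it with the stored hash value."""
    stored = password_file.get(user)
    return stored is not None and toy_hash(password) == stored


stolen_hash = str(password_file["alice"])
with_stolen_hash = login("alice", stolen_hash)   # entering the hash itself does not work
with_collision = login("alice", "silent")        # a different password with the same hash works
```

Entering the stolen hash value fails because the system hashes whatever is typed, while "silent" succeeds because it hashes to the same value as "listen".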