180 likes | 266 Views
Look-up problem. IP address. did we see the IP address before?. Hashing + chaining. use IP address as an index. linked list. index. hash function. IP address. How to choose a hash function?. depends on the distribution of the data. x [0,1]
E N D
Look-up problem IP address did we see the IP address before?
Hashing + chaining use IP address as an index linked list index hash function IP address
How to choose a hash function? depends on the distribution of the data x [0,1] x x.n interpret x as a number x x mod n IP-addresses n=256 bad n=257 good?
Universal hash functions choose a hash function “randomly” n = number of entries in the hash table U = the universe h: U {0,...,n-1} a hash function
Universal hash functions choose a hash function “randomly” n = number of entries in the hash table U = the universe h: U {0,...,n-1} a hash function a set of hash functions H is universal if x,y U and random h H P ( h(x) = h(y) ) 1/n
Universal hash functions a set of hash functions H is universal if x,y U and random h H P ( h(x) = h(y) ) 1/n For IP addresses choose a1,a2,a3,a4 {0,1,...,256} (x1,x2,x3,x4) a1x1+a2x2+a3x3+a4x4 mod 257
Perfect hashing Goal: worst-case O(1) search space used O(m) static set of elements
Perfect hashing Goal: worst-case O(1) search space used O(m) static set of elements n = m2 i.e., space used (m2) H = family of universal hash functions hash function h H with no collision
Perfect hashing Goal: worst-case O(1) search space used O(m) n = m H = family of universal hash functions x1,...,xn the number of elements that map to 1,2,...,n h H such that xi2 = O(m)
h H such that xi2 = O(m) Perfect hashing Goal: worst-case O(1) search space used O(m) n = m H = family of universal hash functions x1,...,xn the number of elements that map to 1,2,...,n secondary hash table of size xi2
Bloom filter Goal: store an m element subset of IP addresses IP address HASH HASH HASH 0 0 0 n-bits of storage
Bloom filter - insert INSERT(x) for i from 1 to k do A(hi(x)) 1 IP address HASH HASH HASH 1 1 1 n-bits of storage
Bloom filter – member MEMBER(x) for i from 1 to k do if A(hi(x))=0 then return FALSE return TRUE IP address HASH HASH HASH 1 1 1 n-bits of storage
Bloom filter – member MEMBER(x) for i from 1 to k do if A(hi(x))=0 then return FALSE return TRUE sometimes gives false positive answer error parameter: false positive probability
Bloom filter – analysis error parameter: false positive probability m = number of items to be stored n = number of bits of storage k = number of hash functions
Bloom filter – analysis error parameter: false positive probability m = number of items to be stored n = number of bits of storage k = number of hash functions p = fraction of the bits filled p e-km/n
Bloom filter – analysis error parameter: false positive probability m = number of items to be stored n = number of bits of storage k = number of hash functions p e-km/n p = fraction of the bits filled false positive probability (1-p)k
Bloom filter – analysis error parameter: false positive probability m = number of items to be stored n = number of bits of storage k = number of hash functions optimal k 0.7 m/n false positive rate 0.6185m/n