An Improved Construction for Counting Bloom Filters
Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese
Presented by: Sailesh Kumar
Bloom Filter • Store a set S = {x1, x2, x3, …, xn} from some universe U so that we can answer queries of the form: • Is x a member of S? • A Bloom filter answers such queries with: • A small amount of space, independent of element size • Constant query time • A nonzero false positive probability (it may wrongly report that x is in S, but it never misses a member) • An alternative to hashing with some interesting trade-offs
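[Figure: inserting X. Hash functions H1, H2, …, Hk map X to k positions of the m-bit array, and each of those bits is set to 1.]
[Figure: inserting Y. Y's k positions in the m-bit array are also set to 1.]
[Figure: querying X. All k bits at X's positions are 1, so the Bloom filter reports a match.]
[Figure: querying W, which was never inserted. All k bits at W's positions were already set by X and Y, so the filter reports a match (false positive).]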
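The insert and query steps in the figures can be sketched in a few lines. This is a minimal sketch, not the authors' code: the class name is hypothetical, and salted SHA-256 digests stand in for the k hash functions H1 … Hk.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability rather than space

    def _positions(self, item: str) -> list[int]:
        # Hypothetical choice: salted SHA-256 digests stand in for H1..Hk.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1  # set each of the k bits

    def __contains__(self, item: str) -> bool:
        # True if all k bits are set: either a member or a false positive.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=64, k=4)
bf.add("X")
bf.add("Y")
print("X" in bf, "Y" in bf)  # True True: inserted items are always found
print("W" in bf)  # normally False; True here would be a false positive
```

Because bits are only ever set, never cleared, a member can never be reported absent; the cost is that unrelated insertions can cover all k positions of a non-member.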
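How many Hash Functions? • k = number of hash functions • n = total number of elements • m = number of bits in the array • Objective: pick k to minimize the false positive probability (fpp) • After n insertions, a given bit is still 0 with probability $(1 - 1/m)^{kn} \approx e^{-kn/m}$, so $\text{fpp} \approx (1 - e^{-kn/m})^k$ • Minimizing this over k gives $k = (\ln 2)\, m/n$ • For the optimal k, half the bits are set, so $\text{fpp} \approx (1/2)^k = (0.6185)^{m/n}$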
How many Hash Functions? • Example: for $m/n = 8$, the optimal $k = 8 \ln 2 \approx 5.5$
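The optimal-k rule can be checked numerically. This sketch evaluates the standard approximation $(1 - e^{-kn/m})^k$ for m/n = 8; the function name `false_positive_rate` is hypothetical. Since k must be an integer in practice, it also compares the integers around 8 ln 2.

```python
import math


def false_positive_rate(bits_per_element: float, k: int) -> float:
    """Approximate fpp (1 - e^{-kn/m})^k, where bits_per_element = m/n."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


ratio = 8  # m/n
k_opt = ratio * math.log(2)
print(f"optimal real-valued k = {k_opt:.3f}")  # optimal real-valued k = 5.545

for k in range(1, 11):
    print(k, f"{false_positive_rate(ratio, k):.5f}")

best_k = min(range(1, 11), key=lambda k: false_positive_rate(ratio, k))
print("best integer k:", best_k)  # best integer k: 6
print(f"0.6185^(m/n) = {0.6185 ** ratio:.5f}")
```

For m/n = 8, k = 6 comes out marginally ahead of k = 5; both give an fpp of about 2.2%, close to the $(0.6185)^{m/n}$ estimate.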
Counting Bloom Filter • Bloom filters do not support deletes: resetting an element's bits to 0 may also clear bits shared with other elements, causing false negatives • Use a counting Bloom filter (CBF) • Use counters instead of bits in the array • On insert, increment the counters instead of setting bits • During a query, a counter > 0 is treated as a set bit
[Figure: counting Bloom filter. Inserting X increments the k counters at positions H1(X), …, Hk(X) of the m-counter array.]
[Figure: inserting Y increments its k counters; counters shared with X become 2.] Deletes are straightforward: just decrement the counters.
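Turning the bit array into a counter array is a small change. The sketch below is hypothetical (class name, SHA-256-based hashing), and it uses unbounded Python integers where the slides assume 4-bit counters.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: m counters instead of m bits."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # Python ints for simplicity; the slides assume 4-bit counters.
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Hypothetical choice: salted SHA-256 digests stand in for H1..Hk.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1  # increment instead of setting a bit

    def remove(self, item: str) -> None:
        # Only meaningful for items that were inserted; a false positive
        # would pass this check and corrupt the counters.
        if item not in self:
            raise KeyError(item)
        for p in self._positions(item):
            self.counters[p] -= 1

    def __contains__(self, item: str) -> bool:
        # A counter > 0 plays the role of a set bit.
        return all(self.counters[p] > 0 for p in self._positions(item))


cbf = CountingBloomFilter(m=64, k=4)
cbf.add("X")
cbf.add("Y")
cbf.remove("Y")
print("X" in cbf)  # True: deleting Y leaves X's counters intact
```

After the delete, the counters are exactly what they would be had only X been inserted, which is why shared positions no longer cause false negatives.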
Improved Counting Bloom Filter • 4-bit counters ensure, with very high probability, that counters do not overflow • This is a 4x increase in space compared to a Bloom filter • Goal: construct an alternative that is about 2 times more compact than a CBF • The construction is based on d-left hashing and fingerprinting • We first need to understand d-left hashing and fingerprinting
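Why 4 bits suffice can be seen with a union bound, assuming the optimal k = (ln 2) m/n. A given counter is Binomial(nk, 1/m), so Pr[c ≥ j] ≤ C(nk, j)/m^j ≤ (e·nk/(j·m))^j = (e ln 2 / j)^j, and a union bound over the m counters multiplies this by m. A 4-bit counter overflows once it would reach 16. The function name is hypothetical.

```python
import math


def overflow_bound_per_counter(j: int) -> float:
    """Upper bound on Pr[a given counter >= j] when k = (ln 2) m/n.

    Uses Pr[c >= j] <= C(nk, j) / m^j <= (e*n*k / (j*m))^j with n*k/m = ln 2.
    """
    return (math.e * math.log(2) / j) ** j


j = 16  # a 4-bit counter holds 0..15, so it overflows at 16
per_counter = overflow_bound_per_counter(j)
print(f"Pr[some counter >= 16] <= {per_counter:.3g} * m")
```

Even with a million counters, the bound is on the order of 10^-9, which is what "with very high probability" means here.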
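Fingerprinting • Temporarily assume that we have a perfect hash function h that maps the n elements of S to n distinct cells • Use some random function to compute c-bit fingerprints, $F: U \to [2^c]$, and store F(x) in cell h(x) • A query for y ∉ S matches only if F(y) equals the fingerprint stored in cell h(y), so the false positive probability is $2^{-c}$ • This is about 1.44x (that is, $1/\ln 2$) more compact than a Bloom filter with the same false positive probability • However, the perfect hash function h is not easy to compute • Use near-perfect hashing (d-left) instead • [Figure: h maps Elements 1 to 5 to distinct cells holding Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3)]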
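The space comparison follows from the fpp formula for an optimal Bloom filter, $\text{fpp} = 2^{-(\ln 2) m/n}$, versus $2^{-c}$ for c-bit fingerprints. This sketch ignores the space needed to store the perfect hash function h itself; the function names are hypothetical.

```python
import math


def bloom_bits_per_element(fpp: float) -> float:
    # Optimal Bloom filter: fpp = 2^{-(ln 2) m/n}  =>  m/n = log2(1/fpp) / ln 2
    return math.log2(1 / fpp) / math.log(2)


def fingerprint_bits_per_element(fpp: float) -> int:
    # Perfect hashing + c-bit fingerprints: fpp = 2^{-c}, so c = log2(1/fpp)
    return math.ceil(math.log2(1 / fpp))


for target in (2**-8, 2**-14):
    bloom = bloom_bits_per_element(target)
    fp = fingerprint_bits_per_element(target)
    print(f"fpp={target:.2e}: Bloom {bloom:.2f} bits, fingerprint {fp} bits, ratio {bloom / fp:.3f}")
```

For fpp values that are exact powers of two, the ratio is exactly 1/ln 2 ≈ 1.443.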
d-left hashing • Use d equal-sized tables • Use d different hash functions, one per table, to choose a bucket in each table • A bucket can store multiple elements • Store the element in the least loaded of its d buckets (break ties by choosing the leftmost table) • Interesting properties: • Very small maximum load, O(log log n) • Maximum load is close to the average load even for small d, such as 4 • About 80% space utilization with d = 4
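The insertion rule can be simulated directly. This is a minimal sketch assuming uniformly random bucket choices as a stand-in for d independent hash functions; the sizes (d = 4, average load 0.75 per bucket) are illustrative, and `d_left_insert` is a hypothetical name.

```python
import random


def d_left_insert(tables: list[list[int]], choices: list[int]) -> int:
    """Place one element; choices[i] is its bucket in table i.

    Each table is a list of bucket loads. Returns the index of the chosen table.
    """
    loads = [tables[i][b] for i, b in enumerate(choices)]
    best = loads.index(min(loads))  # index() finds the leftmost minimum: ties go left
    tables[best][choices[best]] += 1
    return best


d, buckets_per_table, n = 4, 1024, 3 * 1024  # average load 0.75 per bucket
rng = random.Random(1)
tables = [[0] * buckets_per_table for _ in range(d)]
for _ in range(n):
    # Random bucket choices stand in for d independent hash functions (assumption).
    d_left_insert(tables, [rng.randrange(buckets_per_table) for _ in range(d)])

max_load = max(max(t) for t in tables)
print("average load:", n / (d * buckets_per_table), "max load:", max_load)
```

The exact maximum load depends on the seed, but it stays a small constant just above the average, which is what lets every bucket be given a fixed number of cells.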
Improved Counting Bloom Filter • Use d-left hashing • d hash tables, each containing B buckets • Each bucket contains multiple cells; a cell stores a fingerprint and a small counter • To store an element, we compute its fingerprint • The fingerprint has two components: • Bucket index, in [1, B] • Remainder, in [1, R], stored explicitly using log2 R bits • A first approach: use a separate bucket index for each table but an identical remainder • Use the d-left insertion policy and augment each fingerprint with a counter; if a matching fingerprint is already present, increment its counter
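Improved Counting Bloom Filter • Example with d = 2 tables; H(·) gives a (bucket index, remainder) pair for each table • Element x: H(x) = (3, 7), (4, 7); we store x in the first table • Element y: H(y) = (1, 5), (5, 5); we store y in the first table • Element z: H(z) = (1, 7), (4, 7); bucket 1 of the first table already holds y, so we store z in the second table • [Figure: table 1 holds remainder 7 (x, bucket 3) and 5 (y, bucket 1); table 2 holds remainder 7 (z, bucket 4)] • Now, if we try to delete x, we do not know whether the remainder 7 in table 1 or the one in table 2 should be removed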
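The deletion ambiguity can be reproduced with the example's hash values. This sketch hard-codes those values for d = 2 tables; the names `H`, `tables`, and `insert` are hypothetical, and counters are omitted because no exact (bucket, remainder) match occurs among these three insertions.

```python
# The example's hash values: one (bucket, remainder) pair per table.
# The remainder is identical across tables (the naive scheme).
H = {
    "x": [(3, 7), (4, 7)],
    "y": [(1, 5), (5, 5)],
    "z": [(1, 7), (4, 7)],
}
d = 2
# tables[i][bucket] -> list of stored remainders
tables: list[dict[int, list[int]]] = [{} for _ in range(d)]


def insert(name: str) -> int:
    locs = H[name]
    loads = [len(tables[i].get(b, [])) for i, (b, _) in enumerate(locs)]
    i = loads.index(min(loads))  # d-left: least loaded bucket, ties to the left
    b, r = locs[i]
    tables[i].setdefault(b, []).append(r)
    return i


for name in ("x", "y", "z"):
    print(name, "-> table", insert(name) + 1)

# To delete x, look for x's remainder in x's bucket of every table.
matches = [i + 1 for i, (b, r) in enumerate(H["x"]) if r in tables[i].get(b, [])]
print("tables holding a copy that matches x:", matches)  # [1, 2]: ambiguous
```

This prints `x -> table 1`, `y -> table 1`, `z -> table 2`, and then reports matches in both tables: z's remainder sits exactly where x would have gone in table 2.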
Improved Counting Bloom Filter • Solve the problem by splitting the hash operation into 2 phases • 1st phase: compute a single true fingerprint f_x • 2nd phase: to obtain the d locations, apply permutations P1, …, Pd to f_x; each Pi(f_x) yields a bucket index and a remainder for table i • A permutation of a set is a one-to-one map of the set onto itself • This simple modification makes deletes work correctly
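Any family of permutations works for the argument; this sketch uses affine maps P(f) = (a·f + c) mod 2^t with odd a, which are bijections on [0, 2^t). That family, the toy bit widths, and all names here are assumptions chosen for illustration, not necessarily what the paper uses.

```python
from typing import Callable

BUCKET_BITS, REMAINDER_BITS = 3, 4  # toy sizes: B = 8 buckets, 4-bit remainders
F_BITS = BUCKET_BITS + REMAINDER_BITS  # width of the true fingerprint


def make_permutation(a: int, c: int) -> Callable[[int], int]:
    """P(f) = (a*f + c) mod 2^F_BITS, a bijection on [0, 2^F_BITS) when a is odd."""
    if a % 2 == 0:
        raise ValueError("multiplier must be odd")
    mask = (1 << F_BITS) - 1
    return lambda f: (a * f + c) & mask


def split(value: int) -> tuple[int, int]:
    """Split a permuted fingerprint into (bucket index, remainder), both 0-based."""
    return value >> REMAINDER_BITS, value & ((1 << REMAINDER_BITS) - 1)


# Hypothetical parameters for d = 4 tables
perms = [make_permutation(a, c) for a, c in [(5, 17), (13, 3), (29, 71), (101, 9)]]

f = 0b1011010  # phase 1: one true fingerprint for the element
for i, P in enumerate(perms, start=1):
    print(f"table {i}: (bucket, remainder) = {split(P(f))}")  # phase 2

# Each P_i is one-to-one, so a (bucket, remainder) pair in table i
# identifies the true fingerprint uniquely.
for P in perms:
    assert len({P(x) for x in range(1 << F_BITS)}) == 1 << F_BITS
```

Unlike the naive scheme, the remainders may now differ across tables; what matters is that each table's location determines f uniquely.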
Improved Counting Bloom Filter • Claim: when deleting an element of the set, only one remainder corresponding to the element will exist in the tables. • Proof: • Suppose not. Then there is some element x ∈ S whose remainder, stored in table j, is to be deleted, and at the same time another element y ∈ S stored in a table i ≠ j at x's location in that table, so that Pi(fx) = Pi(fy). • Since the Pi are permutations, fx = fy, so x and y share the same true fingerprint. • Suppose x was inserted before y. Then, when y was inserted, its location in table j, Pj(fy) = Pj(fx), already held x's remainder, so the counter associated with that remainder would have been incremented instead of a new copy being stored in table i, contradicting our assumption. The case where y was inserted first is symmetric.
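Putting the pieces together gives a toy d-left counting Bloom filter. This is a sketch under stated assumptions, not the authors' implementation: SHA-256 supplies the true fingerprint, the affine permutations and all sizes are hypothetical, and counters are unbounded Python integers rather than 2-bit fields.

```python
import hashlib


class DLeftCBF:
    """Toy d-left counting Bloom filter: one true fingerprint plus d permutations."""

    def __init__(self, d: int = 4, bucket_bits: int = 5, remainder_bits: int = 8, cells: int = 8) -> None:
        self.rb, self.cells = remainder_bits, cells
        self.mask = (1 << (bucket_bits + remainder_bits)) - 1
        # Hypothetical permutations P_i(f) = (a_i*f + c_i) mod 2^(bucket_bits+remainder_bits), a_i odd
        self.perms = [(2 * i + 3, 7 * i + 1) for i in range(d)]
        # tables[i][bucket] maps remainder -> counter (one cell per remainder)
        self.tables: list[list[dict[int, int]]] = [
            [{} for _ in range(1 << bucket_bits)] for _ in range(d)
        ]

    def _locations(self, item: str) -> list[tuple[int, int]]:
        digest = hashlib.sha256(item.encode()).digest()
        f = int.from_bytes(digest[:8], "big") & self.mask  # phase 1: true fingerprint
        locs = []
        for a, c in self.perms:  # phase 2: permute, then split into (bucket, remainder)
            v = (a * f + c) & self.mask
            locs.append((v >> self.rb, v & ((1 << self.rb) - 1)))
        return locs

    def insert(self, item: str) -> None:
        locs = self._locations(item)
        for i, (b, r) in enumerate(locs):
            if r in self.tables[i][b]:  # same true fingerprint already stored
                self.tables[i][b][r] += 1
                return
        loads = [len(self.tables[i][b]) for i, (b, _) in enumerate(locs)]
        i = loads.index(min(loads))  # d-left: least loaded, ties to the left
        if loads[i] >= self.cells:
            raise OverflowError("all candidate buckets are full")
        b, r = locs[i]
        self.tables[i][b][r] = 1

    def delete(self, item: str) -> None:
        for i, (b, r) in enumerate(self._locations(item)):
            bucket = self.tables[i][b]
            if r in bucket:  # by the claim, this is the element's only copy
                bucket[r] -= 1
                if bucket[r] == 0:
                    del bucket[r]
                return
        raise KeyError(item)

    def __contains__(self, item: str) -> bool:
        return any(r in self.tables[i][b] for i, (b, r) in enumerate(self._locations(item)))


dl = DLeftCBF()
items = [f"flow-{n}" for n in range(200)]
for it in items:
    dl.insert(it)
for it in items[:100]:
    dl.delete(it)
print(all(it in dl for it in items[100:]))  # True: no false negatives after deletes
```

Elements that happen to share a true fingerprint share one cell and its counter, so deleting one of them only decrements that counter; this is exactly the situation the claim covers.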
Simulation Results • Target is fpp < 0.002 • dlCBF configuration: • d = 4 tables with 2048 buckets each • Each bucket has 8 cells • Target load = 0.75 (6 items per bucket) • 14-bit remainder r per cell • 2-bit counter per cell to handle identical fingerprints • Total size of structure $= 4 \times 2048 \times 8$ cells $\times$ 16 bits $= 2^{20}$ bits; total items $= 3 \times 2^{14}$ • CBF configuration: • 13.5 counters per element (9 hash functions) • With 4-bit counters, $3 \times 2^{14}$ elements need about $2.5 \times 2^{20}$ bits, 2.5 times the dlCBF
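The configuration arithmetic can be checked in a few lines. The dlCBF fpp here is a rough estimate, assuming a query inspects d buckets holding about 6 remainders each and that each stored remainder matches a random query with probability $2^{-14}$; the CBF fpp uses the standard $(1 - e^{-kn/m})^k$ approximation with m/n = 13.5 and k = 9.

```python
import math

# dlCBF configuration from the slide
d, buckets, cells_per_bucket = 4, 2048, 8
remainder_bits, counter_bits = 14, 2
dl_bits = d * buckets * cells_per_bucket * (remainder_bits + counter_bits)
items = int(0.75 * d * buckets * cells_per_bucket)  # target load 0.75
print(dl_bits == 2**20, items == 3 * 2**14)  # True True

# Rough estimate: d buckets x ~6 remainders, each matching w.p. 2^-14
dl_fpp = d * 6 * 2.0**-remainder_bits
print(f"dlCBF fpp ~ {dl_fpp:.5f}")  # dlCBF fpp ~ 0.00146

# CBF: 13.5 four-bit counters per element, 9 hash functions
cbf_bits = 13.5 * 4 * items
cbf_fpp = (1 - math.exp(-9 / 13.5)) ** 9
print(f"CBF uses {cbf_bits / dl_bits:.2f}x the bits, fpp ~ {cbf_fpp:.5f}")
```

Both structures land under the 0.002 target, with the CBF needing roughly 2.5 times the space.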