Network Applications of Bloom Filters: A Survey • Andrei Broder and Michael Mitzenmacher • Presenter: Chen Qian • Original presenter: Hongkun Yang
Outline • Bloom Filter Overview • Standard Bloom Filters • Counting Bloom Filters • Historical Applications • Network Applications • Distributed Caching • P2P/Overlay Networks • Resource Routing • Conclusion
Standard Bloom Filters: Notations • S: the set of n elements {x1, x2, …, xn} • k independent hash functions h1, …, hk with range {1, …, m} • Assume: hash functions map each item in the universe to a random number uniformly distributed over the range {1, …, m} • e.g., MD5 • An array B of m bits, initially filled with 0s
Standard Bloom Filters: How It Works • Hash each xi in S k times. If hj(xi) = a, set B[a] = 1. • To check whether y is in S, check B at hj(y), j = 1, 2, …, k • If all k values are set to 1, y is assumed to be in S • If not, y is clearly not in S • No false negatives • Possible false positives
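Standard Bloom Filters: An Example • Initial state [Figure: bit array B with every bit set to 0]
Standard Bloom Filters: An Example • Insertion [Figure: x1 and x2 hashed into B; the bits at their positions are set to 1]
Standard Bloom Filters: An Example • Check [Figure: y1 and y2 hashed into B; the bits at their positions are examined]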
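The insert and check steps of a standard Bloom filter can be sketched in a few lines of Python. This is a minimal illustration, not the survey's implementation: the class name `BloomFilter`, the sizes m = 64 and k = 3, and the trick of simulating k hash functions by salting SHA-256 with the index j are all assumptions made for the example. Positions are 0-based here, whereas the slides use the range {1, …, m}.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array B and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # array B, initially filled with 0s

    def _positions(self, item):
        # Assumption: hash function j is SHA-256 salted with j, reduced mod m.
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Insertion: set the bit at each of the k hashed positions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Check: all k bits set means "assumed to be in S" (possibly a false
        # positive); any 0 bit means "definitely not in S".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
for x in ("x1", "x2"):
    bf.add(x)
print("x1" in bf, "x2" in bf)
```

Both inserted elements always test positive, because insertion sets exactly the bits that the check inspects.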
Overview • Burton Bloom introduced it in the 1970s • Randomized data structure • Represents a set to support membership queries • Dramatic space savings • Allows false positives
Bloom Filter Principle “Wherever a list or set is used, and space is at a premium, consider using a Bloom filter if the effect of false positives can be mitigated.” “Network Applications of Bloom Filters: A Survey”, A. Broder and M. Mitzenmacher
Standard Bloom Filters: False Positive Rate (1) • Pr[a given bit in B is 0] = $(1 - 1/m)^{kn} \approx e^{-kn/m} = p$ • The probability of a false positive is $(1 - (1 - 1/m)^{kn})^k \approx (1 - e^{-kn/m})^k = (1 - p)^k$ • Let $\rho$ be the proportion of 0 bits after all elements are inserted in the Bloom filter • Conditioned on $\rho$, the probability of a false positive is $(1 - \rho)^k$
Standard Bloom Filters: False Positive Rate (2) • The fraction of 0 bits is extremely concentrated around its expectation • Therefore, with high probability, the false positive probability is close to $(1 - e^{-kn/m})^k$
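To make the false positive estimate concrete, the short Python sketch below evaluates the standard expressions $(1 - (1 - 1/m)^{kn})^k$ and $(1 - e^{-kn/m})^k$ side by side. The function names and the sample sizes (m = 8000 bits, n = 1000 elements, k = 5) are assumptions chosen only for illustration.

```python
import math


def bit_zero_probability(m, n, k):
    # Probability that a given bit is still 0 after n insertions with k hashes.
    return (1 - 1 / m) ** (k * n)


def false_positive_estimate(m, n, k):
    # All k probed bits happen to be 1 for an element that is not in S.
    return (1 - bit_zero_probability(m, n, k)) ** k


def false_positive_approx(m, n, k):
    # Same quantity with (1 - 1/m)^(kn) replaced by e^(-kn/m).
    return (1 - math.exp(-k * n / m)) ** k


estimate = false_positive_estimate(8000, 1000, 5)
approx = false_positive_approx(8000, 1000, 5)
print(f"estimate={estimate:.5f} approx={approx:.5f}")
```

For these sizes the two values agree to several decimal places, at a little over 2%.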
Standard Bloom Filters: Optimal Number of Hash Functions (1) • Two competing forces: • More hash functions give more chances to find a 0 bit for an element that is not a member of S • Fewer hash functions increase the fraction of 0 bits in the array
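Standard Bloom Filters: Optimal Number of Hash Functions (2) • The false positive rate $f = (1 - e^{-kn/m})^k$ is minimized at $k = (m/n) \ln 2$ • At this k, half of the bits are 0 and $f = (1/2)^k \approx (0.6185)^{m/n}$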
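Standard Bloom Filters: Space Efficiency • A lower bound • Let $\epsilon$ be the false positive ratio, then $m \ge n \log_2(1/\epsilon)$ • The optimal case • The false positive rate for the optimal Bloom filter is $f = (1/2)^k = (1/2)^{(m/n) \ln 2}$ • Requiring $f \le \epsilon$ gives $m \ge n \log_2(1/\epsilon) / \ln 2 = n \log_2 e \cdot \log_2(1/\epsilon)$, a factor of $\log_2 e \approx 1.44$ above the lower bound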
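The trade-off between the number of hash functions and space can be checked numerically. The sketch below uses the standard results $k = (m/n) \ln 2$ for the optimal number of hash functions, $n \log_2(1/\epsilon)$ for the lower bound on bits, and $n \log_2(1/\epsilon) / \ln 2$ for an optimally configured Bloom filter. The function names and the sample values (m = 8000, n = 1000, $\epsilon$ = 1%) are assumptions for illustration.

```python
import math


def optimal_k(m, n):
    # Real-valued optimum; in practice k is rounded to a nearby integer.
    return (m / n) * math.log(2)


def false_positive_at(m, n, k):
    return (1 - math.exp(-k * n / m)) ** k


def bits_lower_bound(n, eps):
    # Bits needed by any structure with false positive ratio eps.
    return n * math.log2(1 / eps)


def bits_for_optimal_bloom(n, eps):
    # Bits needed by a Bloom filter that uses the optimal k.
    return n * math.log2(1 / eps) / math.log(2)


k_opt = optimal_k(8000, 1000)
f_opt = false_positive_at(8000, 1000, k_opt)
overhead = bits_for_optimal_bloom(1000, 0.01) / bits_lower_bound(1000, 0.01)
print(f"k_opt={k_opt:.3f} f_opt={f_opt:.5f} overhead={overhead:.3f}")
```

The overhead ratio does not depend on n or $\epsilon$: it is always $\log_2 e$, so an optimal Bloom filter uses about 44% more bits than the lower bound.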
Standard Bloom Filters: Operations (1) • Union • Build a Bloom filter representing the union of A and B by taking the OR of BF(A) and BF(B) • Shrinking a Bloom filter • Halve the size by taking the OR of the first and the second half of the Bloom filter • Increases the false positive rate • The intersection of two sets
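The union and halving operations can be sketched as follows. This is an illustration under stated assumptions: both filters share the same m and k, m is even, and each hash value is reduced mod m so that a position in the halved filter is simply the original position mod m/2. The helper names (`positions`, `build`, `query`, `union`, `halve`) are hypothetical.

```python
import hashlib


def positions(item, m, k):
    # Assumption: hash j is SHA-256 salted with j; the same 64-bit value is
    # reduced mod m, so halving m maps position p to p mod (m / 2).
    out = []
    for j in range(k):
        digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % m)
    return out


def build(items, m, k):
    bits = [0] * m
    for item in items:
        for pos in positions(item, m, k):
            bits[pos] = 1
    return bits


def query(bits, item, k):
    return all(bits[pos] for pos in positions(item, len(bits), k))


def union(a, b):
    # Bitwise OR of two filters built with the same m and hash functions.
    assert len(a) == len(b)
    return [x | y for x, y in zip(a, b)]


def halve(bits):
    # OR the first half with the second half; queries then use m / 2.
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


M, K = 64, 3
set_a, set_b = {"a1", "a2"}, {"b1", "b2"}
bf_union = union(build(set_a, M, K), build(set_b, M, K))
bf_small = halve(bf_union)
print(len(bf_union), len(bf_small))
```

The OR of the two filters is identical to a filter built directly from the union of the sets, and the halved filter still reports every inserted element as present, at the cost of a higher false positive rate.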
Counting Bloom Filters: Motivation • Standard Bloom filters • Easy to insert elements • Cannot perform deletion operations • Counting Bloom filters • Each entry is not a single bit but a small counter • Insert an element: increment the corresponding counters • Delete an element: decrement the corresponding counters
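Counting Bloom Filters: An Example • Initial state [Figure: counter array B with every counter at 0]
Counting Bloom Filters: An Example • Insertion [Figure: x1 and x2 hashed into B; the counters at their positions are incremented, and one counter reaches 2]
Counting Bloom Filters: An Example • Deletion [Figure: counter array B after a deletion, with x1 labeled]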
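A counting Bloom filter replaces each bit with a small counter so that deletion becomes possible. The sketch below is a minimal illustration: the class name, the sizes, and the salted SHA-256 hashing are assumptions, counters are plain Python integers rather than fixed-width fields, and the caller is assumed to delete only elements that were previously inserted.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter with m counters and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item):
        # Assumption: hash function j is SHA-256 salted with j, reduced mod m.
        out = []
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Assumption: item was added earlier; otherwise counters are corrupted.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=64, k=3)
cbf.add("x1")
cbf.add("x2")
cbf.remove("x2")
print("x1" in cbf)
```

After `x2` is removed, the counters are exactly what they would be if only `x1` had ever been inserted.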
Historical Applications • Dictionaries • Hyphenation programs • UNIX spell-checkers • Dictionary of unsuitable passwords • Databases • Semi-join operations • Differential files
Distributed Caching: Summary Cache • Motivation • Sharing of caches among Web proxies to reduce Web traffic and alleviate network bottlenecks • Directly sharing lists of URLs has too much overhead • Solution • Use Bloom filters to reduce network traffic • Use a counting Bloom filter to track cache contents • Broadcast the corresponding standard Bloom filter to other proxies
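The Summary Cache idea can be sketched as a proxy that tracks its own cache with counters and broadcasts only the derived bit array. Everything named here is hypothetical: the `ProxyCache` class, the `may_hold` helper, the sizes M and K, and the example URLs; the actual message format and update policy are not modeled.

```python
import hashlib

M, K = 256, 4  # hypothetical filter size and number of hash functions


def positions(url):
    # Assumption: hash function j is SHA-256 salted with j, reduced mod M.
    out = []
    for j in range(K):
        digest = hashlib.sha256(f"{j}:{url}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % M)
    return out


class ProxyCache:
    """Tracks cached URLs locally with a counting Bloom filter."""

    def __init__(self):
        self.counters = [0] * M

    def cache(self, url):
        for pos in positions(url):
            self.counters[pos] += 1

    def evict(self, url):
        # Assumption: only URLs that are currently cached are evicted.
        for pos in positions(url):
            self.counters[pos] -= 1

    def summary(self):
        # The standard Bloom filter broadcast to other proxies.
        return [1 if c > 0 else 0 for c in self.counters]


def may_hold(summary, url):
    # A peer consults the received summary before contacting the proxy.
    return all(summary[pos] for pos in positions(url))


proxy = ProxyCache()
proxy.cache("http://example.com/a")
proxy.cache("http://example.com/b")
proxy.evict("http://example.com/b")
shared = proxy.summary()
print(may_hold(shared, "http://example.com/a"))
```

A false positive in the summary only costs a wasted query to the peer proxy, which is why the scheme tolerates it.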
P2P/Overlay Networks: Content Delivery • Problem • Peer A has a set of items SA, peer B has SB, and B wants the useful items from A (SA - SB) • Solution • B sends A its Bloom filter BF(B) • A sends B the items that are not in SB according to BF(B) • Implications of false positives • Not all elements of SA - SB will be sent • A large fraction of SA - SB is sufficient (not necessarily the entire set)
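The exchange between the two peers can be sketched as below. The helper names, the filter sizes, and the item labels are assumptions for illustration; network transfer is replaced by ordinary function calls.

```python
import hashlib


def positions(item, m, k):
    # Assumption: hash function j is SHA-256 salted with j, reduced mod m.
    out = []
    for j in range(k):
        digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % m)
    return out


def build_filter(items, m, k):
    bits = [0] * m
    for item in items:
        for pos in positions(item, m, k):
            bits[pos] = 1
    return bits


def matches(bits, item, k):
    return all(bits[pos] for pos in positions(item, len(bits), k))


def items_to_send(s_a, bf_b, k):
    # Peer A keeps only the items that BF(B) says B does not have.
    return {x for x in s_a if not matches(bf_b, x, k)}


s_a = {"p1", "p2", "p3", "p4"}
s_b = {"p3", "p4", "p5"}
bf_b = build_filter(s_b, 128, 3)      # B sends this to A
sent = items_to_send(s_a, bf_b, 3)    # A sends these to B
print(sorted(sent))
```

Because a Bloom filter has no false negatives, A never resends an item B already holds; a false positive can only cause a useful item to be withheld.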
P2P/Overlay Networks: Efficient P2P Keyword Searching (1) • Problem • Peer A has a set of items SA, peer B has SB, and A wants to determine SA ∩ SB • Solution • A sends B its Bloom filter BF(A) • B sends A the items of SB that appear to be in SA according to BF(A) • A eliminates false positives and determines SA ∩ SB exactly • Fewer bits transmitted than A sending the entire set SA
P2P/Overlay Networks: Efficient P2P Keyword Searching (2) [Figure: (1) the client sends a request; (2) Server A sends BF(A) to Server B. SA = {1, 2, 3, 4}, SB = {3, 4, 5, 6}; the figure also shows the item lists {3, 4, 6} and {3, 4}]
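The intersection protocol can be sketched with the sets from the figure, SA = {1, 2, 3, 4} and SB = {3, 4, 5, 6}. The helper names and the filter parameters M and K are assumptions for illustration, and the final step assumes that the peer holding SA is the one that discards false positives, since it can check candidates against its own set.

```python
import hashlib

M, K = 64, 3  # hypothetical filter size and number of hash functions


def positions(item):
    # Assumption: hash function j is SHA-256 salted with j, reduced mod M.
    out = []
    for j in range(K):
        digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % M)
    return out


def build_filter(items):
    bits = [0] * M
    for item in items:
        for pos in positions(item):
            bits[pos] = 1
    return bits


def matches(bits, item):
    return all(bits[pos] for pos in positions(item))


s_a = {1, 2, 3, 4}
s_b = {3, 4, 5, 6}

bf_a = build_filter(s_a)                            # A sends BF(A) to B
candidates = {x for x in s_b if matches(bf_a, x)}   # B replies with matches
result = candidates & s_a                           # false positives removed
print(sorted(result))
```

The candidate set may contain extra items from SB, but it always contains the true intersection, so the final answer is exact.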
Resource Routing (1) • Network is in the form of a rooted tree • Nodes hold resources • Each node keeps Bloom filters representing • A unified list of resources that it holds or that are reachable through one of its children • Individual lists of resources for itself and each child • When receiving a request for a resource • Check the unified list to see whether the node or its descendants hold the resource • Yes: check the individual lists • No: forward the request up the tree toward the root
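The routing rule above can be sketched on a small tree. This is an illustration under several assumptions: filters are stored as Python integers used as bitmasks, each child's unified filter doubles as the parent's individual filter for that child, and a node confirms a hit against its real resource set so that a false positive only costs extra hops. The names `Node`, `search_down`, and `route`, the sizes M and K, and the resource labels are hypothetical.

```python
import hashlib

M, K = 1024, 3  # hypothetical filter size and number of hash functions


def positions(resource):
    # Assumption: hash function j is SHA-256 salted with j, reduced mod M.
    out = []
    for j in range(K):
        digest = hashlib.sha256(f"{j}:{resource}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % M)
    return out


def make_filter(resources):
    bits = 0
    for resource in resources:
        for pos in positions(resource):
            bits |= 1 << pos
    return bits


def matches(bits, resource):
    return all((bits >> pos) & 1 for pos in positions(resource))


class Node:
    def __init__(self, name, resources, children=()):
        self.name = name
        self.resources = set(resources)
        self.children = list(children)
        self.parent = None
        self.own = make_filter(self.resources)   # individual list for itself
        self.unified = self.own                  # own resources plus subtree
        for child in self.children:
            child.parent = self
            self.unified |= child.unified


def search_down(node, resource):
    # Unified list says "no": nothing in this subtree holds the resource.
    if not matches(node.unified, resource):
        return None
    # Individual lists: this node first, then each child's filter.
    if matches(node.own, resource) and resource in node.resources:
        return node
    for child in node.children:
        found = search_down(child, resource)
        if found is not None:
            return found
    return None  # the unified filter gave a false positive


def route(start, resource):
    # Try the local subtree, then forward the request toward the root.
    node = start
    while node is not None:
        found = search_down(node, resource)
        if found is not None:
            return found
        node = node.parent
    return None


c = Node("c", {"gamma"})
a = Node("a", {"alpha"})
b = Node("b", {"beta"}, [c])
root = Node("root", {"rho"}, [a, b])
print(route(a, "gamma").name)
```

A request that starts at `a` for `gamma` misses in `a`, moves up to `root`, and is then directed down through `b` to `c`. For simplicity the sketch re-examines the subtree it came from after moving up; a real design would skip it.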
Conclusion • Simple space-efficient representation of a set or a list that can handle membership queries • Applications in numerous networking problems • Bloom filter principle