Bloom Filters

Bloom Filters. Benoit Donnet, November 30th, 2006. Context: introduced in 1970 ([Bloom]); set membership problem; trade-off between space and computing complexity; lossy summary technique. Historical usage: spell checking ([McIlroy]), databases ([Bratbergsengen]).

Presentation Transcript


  1. Bloom Filters • Benoit Donnet • November 30th, 2006

  2. Context • Introduced in 1970 ([Bloom]) • Set membership problem • Trade-off between space and computing complexity • Lossy summary technique • Historical usage • Spell checking ([McIlroy]) • Databases ([Bratbergsengen])

  3. Content • Bloom filters • Extensions • Networking applications • Conclusion • References

  4. Bloom filters

  5. Construction

  6. Membership Query
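As an illustration of the construction and membership query steps, here is a minimal sketch of a standard Bloom filter. It assumes an m-bit vector and k hash functions, as in the later slides. The class name `BloomFilter`, the one-byte-per-bit storage, and the salted SHA-256 hashing are illustrative assumptions, not the scheme used in the talk.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit vector and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the
        # index of the hash function. Any k independent hashes would do.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Construction: set the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Query: a single 0 bit proves absence; all 1s means
        # "probably present" (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
print("alpha" in bf)
```

An element that was inserted is always reported as present, so the structure never produces false negatives.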

  7. False Positive • A Bloom filter can suffer from false positives • The filter returns a positive answer for some elements that do not belong to A • Can we evaluate a priori the impact of false positives on a Bloom filter?

  8. False Positive (2)

  9. False Positive (3)

  10. False Positive (4)
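The false positive rate can be estimated a priori with the standard approximation f ≈ (1 - e^(-kn/m))^k, where m is the number of bits, n the number of inserted elements, and k the number of hash functions. The approximation is minimized near k = (m/n) ln 2. The sketch below assumes independent, uniformly distributed hash functions. The function names are hypothetical.

```python
import math


def false_positive_rate(m, n, k):
    """Approximate false positive rate of an m-bit filter holding
    n elements with k hash functions (uniform hashing assumed)."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Integer number of hash functions closest to (m/n) * ln 2."""
    return max(1, round(m / n * math.log(2)))


m, n = 10_000, 1_000          # 10 bits per element
k = optimal_k(m, n)
print(k, false_positive_rate(m, n, k))
```

With 10 bits per element this gives k = 7 and a false positive rate a little under 1%.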

  11. Extensions

  12. Content • Compressed Bloom filters ([Mitzenmacher]) • Counting Bloom filters ([Fan et al.]) • Dynamic Bloom filters ([Guo et al.]) • Retouched Bloom filters ([Donnet et al. 06])

  13. Compressed BF • A Bloom filter can be used as a message exchanged by networked monitors • New performance metric • Bandwidth • Transmission size can be affected by compression • Compressed Bloom filters ([Mitzenmacher])

  14. Compressed BF (2) • Positive aspects: • Number of bits exchanged reduced • False positive rate reduced • Amount of computation per query reduced • Cost: • Internal memory increased • Compression/decompression process

  15. Compressed BF (3)

  16. Counting BF • The subset A is changing over time • Insertion • Deletion • How to perform deletion? • Counting Bloom filters ([Fan et al.])

  17. Counting BF (2)

  18. Counting BF (3) • Which size for the counter? • 4 bits per counter suffice for most applications • What happens in case of an overflow?
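A minimal sketch of a counting Bloom filter with 4-bit counters follows. It replaces each bit with a small counter so that deletion becomes a decrement. The overflow policy shown here, where a counter that reaches 15 saturates and is never decremented again, is one common choice and an assumption on our part; the class name and the hashing scheme are also illustrative.

```python
import hashlib


class CountingBloomFilter:
    MAX_COUNT = 15  # largest value a 4-bit counter can hold

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item):
        # Assumption: k positions from SHA-256 salted with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            if self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] += 1  # saturate instead of wrapping

    def remove(self, item):
        # Only call this for items that were actually inserted.
        for pos in self._positions(item):
            # A saturated counter has lost its true value, so leave it.
            if 0 < self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("a")
cbf.add("b")
cbf.remove("a")
print("b" in cbf)
```

Leaving saturated counters untouched keeps the filter free of false negatives, at the cost of a few positions that can never be cleared.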

  19. Dynamic BF • Statement: • During the execution of the application, |A| can exceed its original size n • Consequence: • The false positive rate is not maintained anymore • Solution? • Dynamic Bloom Filters ([Guo et al.])

  20. Dynamic BF (2) • It uses a matrix of s Bloom filters • Each Bloom filter uses m bits and k hash functions • It starts with s equal to 1 • A new Bloom filter (i.e., a new row in the matrix) is created when needed

  21. Dynamic BF (3) • How to insert an element? • Check for an active Bloom filter • If there is no active Bloom filter, create a new one • Add the element to the Bloom filter • How to query an element? • If all s Bloom filters return false, the element does not belong to the DBF • If at least one Bloom filter returns true, the element probably belongs to the DBF
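The insertion and query rules above can be sketched as follows. The sketch assumes that a filter is "active" while it holds fewer than `capacity` elements; the slides do not define the term, so this threshold, the class names, and the hashing scheme are illustrative assumptions.

```python
import hashlib


class _Filter:
    """One row of the matrix: an m-bit Bloom filter with an element count."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)
        self.count = 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class DynamicBloomFilter:
    def __init__(self, m, k, capacity):
        self.m, self.k, self.capacity = m, k, capacity
        self.filters = [_Filter(m, k)]  # s starts at 1

    def add(self, item):
        active = self.filters[-1]
        if active.count >= self.capacity:
            # No active filter left: create a new row in the matrix.
            active = _Filter(self.m, self.k)
            self.filters.append(active)
        active.add(item)

    def __contains__(self, item):
        # Absent only if every one of the s filters says so.
        return any(item in f for f in self.filters)


dbf = DynamicBloomFilter(m=256, k=3, capacity=2)
for name in ["a", "b", "c", "d", "e"]:
    dbf.add(name)
print(len(dbf.filters), "c" in dbf)
```

Because a query consults every row, the overall false positive rate grows with s even though each individual filter stays within its design load.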

  22. Dynamic BF (4)

  23. Retouched BF • Statement: • Some false positives might be more troublesome than others • Some applications might tolerate a small level of false negatives • Question: • Can we trade off false positives against false negatives? • Solution? • Retouched Bloom filters ([Donnet et al. 06])

  24. Retouched BF (2)

  25. Retouched BF (3) • What if we randomly reset s bits in the vector? • This eliminates the same proportion of false positives as the proportion of false negatives generated • This is called randomized bit clearing • The process of removing selected false positives is called selective clearing
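Randomized bit clearing can be sketched in a few lines. The sketch assumes the s bits are drawn uniformly from the positions currently set to 1 in the vector; the slide only says that s bits are reset at random, so this sampling rule and the function name are assumptions.

```python
import random


def randomized_bit_clearing(bits, s, rng):
    """Reset up to s randomly chosen set bits in `bits` (a bytearray
    holding one 0/1 value per position) and return their positions."""
    set_positions = [i for i, b in enumerate(bits) if b]
    cleared = rng.sample(set_positions, min(s, len(set_positions)))
    for pos in cleared:
        bits[pos] = 0
    return cleared


vector = bytearray([1, 0, 1, 1, 0, 1])
cleared = randomized_bit_clearing(vector, s=2, rng=random.Random(0))
print(sorted(cleared), list(vector))
```

Every element that hashed to a cleared position now tests negative, which removes some false positives and also turns some true members into false negatives.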

  26. Retouched BF (4)

  27. Networking Applications

  28. Distributed Caching • Proxies cooperate to exchange cache information • Instead of sharing URL lists, proxies broadcast Bloom filters ([Fan et al.]) • A Bloom filter represents a proxy’s cache content

  29. Multicast • A router maintains, for each multicast address, a list of associated interfaces/connections • Replace the list by a Bloom filter ([Grönvall]) • Parallelization possible • Deletion of an address can be achieved with a counting Bloom filter

  30. Measurement • Topology discovery at the IP interface level • Traceroute monitors exchange information about what was previously discovered • Doubletree • This shared information can be encoded as a Bloom filter • Communication cost reduction ([Donnet et al. 05])

  31. Conclusion • A Bloom filter • solves the set membership problem • can generate false positives • Extensions to the standard Bloom filter were presented • A few networking applications were discussed

  32. References • [Bloom]: Space/Time Trade-Offs in Hash Coding with Allowable Errors. In Communications of the ACM. vol. 13, n° 7. • [McIlroy]: Development of a Spelling List. In Transactions on Communications. vol. 30, n° 1. • [Mitzenmacher]: Compressed Bloom Filters. In Transactions on Networking. vol. 10, n° 5. • [Fan et al.]: Summary Cache: a Scalable Wide-Area Web Cache Sharing Protocol. In Transactions on Networking. vol. 8, n° 3. • [Guo et al.]: Theory and Network Applications of Dynamic Bloom Filters. In Proc. INFOCOM 2006.

  33. References (2) • [Bratbergsengen]: Hashing Methods and Relational Algebra Operations. In Proc. ICVLD 1984. • [Bruck et al.]: Weighted Bloom Filters. In Proc. ISIT 2006. • [Donnet et al. 06]: Retouched Bloom Filters: Allowing Networked Applications to Trade-Off Selected False Positives Against False Negatives. In Proc. CoNEXT 2006. • [Grönvall]: Scalable Multicast Forwarding. In Proc. ACM SIGCOMM 2001. Student Workshop. • [Donnet et al. 05]: Improved Algorithms for Network Topology Discovery. In Proc. PAM 2005.
