Bloom Filters. Benoit Donnet. November 30th, 2006.

  2. Context • Introduced in 1970 ([Bloom]) • Set membership problem • Trade-off between space and computational complexity • Lossy summary technique • Historical usage: • Spell checking ([McIlroy]) • Databases ([Bratbergsengen])

  3. Content • Bloom filters • Extensions • Networking applications • Conclusion • References

  4. Bloom filters

  5. Construction

  6. Membership Query
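
Construction and membership query can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the class name `BloomFilter`, the use of salted SHA-256 to obtain k hash functions, and the one-byte-per-bit storage are all assumptions made for readability. The symbols m (vector size in bits) and k (number of hash functions) follow the notation used later in the slides.

```python
import hashlib


class BloomFilter:
    """Bit vector of m bits probed by k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, chosen for readability

    def _positions(self, item: str):
        # Assumption: k hash functions are simulated by salting SHA-256
        # with the hash index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Construction: set the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Query: positive only if all k selected bits are set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
print("alpha" in bf)  # inserted elements are always reported as present
print("delta" in bf)  # usually absent; a positive answer would be a false positive
```

A negative answer is always correct, because an inserted element leaves all k of its bits set. Only positive answers can be wrong.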

  7. False Positive • A Bloom filter can suffer from false positives • The filter returns a positive answer for some elements that do not belong to A • Can we evaluate a priori the impact of false positives on a Bloom filter?
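
One common way to answer this question is the standard approximation p ≈ (1 − e^(−kn/m))^k, which assumes independent, uniformly distributed hash functions, with n elements stored in m bits using k hash functions. Under the same assumption the rate is minimized for k = (m/n) ln 2. The sketch below evaluates both expressions; the function names are hypothetical, and the formula is the textbook one, not necessarily the exact derivation shown on the following slides.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive probability for n elements, m bits, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued k minimizing the approximation above; round in practice."""
    return (m / n) * math.log(2)


m, n = 10_000, 1_000  # 10 bits per stored element
for k in (1, 4, 7, 10):
    print(k, false_positive_rate(m, n, k))
print("optimal k is close to", optimal_k(m, n))
```

With 10 bits per element, the rate first drops as k grows and then rises again once the vector becomes too full, which is why an optimal k exists.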

  8. False Positive (2)

  9. False Positive (3)

  10. False Positive (4)

  11. Extensions

  12. Content • Compressed Bloom filters ([Mitzenmacher]) • Counting Bloom filters ([Fan et al.]) • Dynamic Bloom filters ([Guo et al.]) • Retouched Bloom filters ([Donnet et al. 06])

  13. Compressed BF • A Bloom filter can be used as a message exchanged between networked monitors • New performance metric • Bandwidth • Transmission size can be affected by compression • Compressed Bloom filters ([Mitzenmacher])

  14. Compressed BF (2) • Positive aspects: • Number of bits exchanged is reduced • False positive rate is reduced • Amount of computation per query is reduced • Cost: • Internal memory usage is increased • Compression/decompression overhead
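
The trade-off listed above can be illustrated with a rough sketch. A filter tuned the classic way has about half of its bits set and is essentially incompressible, whereas a larger vector with fewer hash functions is sparse and compresses well. The code below only compares the two bit patterns; it uses `zlib` as a stand-in compressor and random bit vectors instead of real filters, both of which are assumptions made for illustration.

```python
import random
import zlib


def random_filter_bytes(m: int, ones_fraction: float, seed: int = 0) -> bytes:
    """Pack m bits, each set with probability ones_fraction, into bytes."""
    rng = random.Random(seed)
    out = bytearray(m // 8)
    for i in range(m):
        if rng.random() < ones_fraction:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


m = 80_000
dense = random_filter_bytes(m, 0.5)    # about half the bits set
sparse = random_filter_bytes(m, 0.05)  # few bits set

dense_sent = len(zlib.compress(dense, 9))
sparse_sent = len(zlib.compress(sparse, 9))
print("dense :", len(dense), "bytes in memory,", dense_sent, "bytes sent")
print("sparse:", len(sparse), "bytes in memory,", sparse_sent, "bytes sent")
```

Both vectors occupy the same internal memory, but the sparse one needs fewer bytes on the wire, at the price of a compression and decompression step.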

  15. Compressed BF (3)

  16. Counting BF • The subset A is changing over time • Insertion • Deletion • How to perform deletion? • Counting Bloom filters ([Fan et al.])

  17. Counting BF (2)

  18. Counting BF (3) • What size should each counter be? • 4 bits per counter are sufficient for most applications • What happens in case of an overflow?
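
A counting Bloom filter with 4-bit counters might be sketched as follows. The overflow policy used here, saturating at 15 and never decrementing a saturated counter, is one common choice and an assumption of this sketch, as are the class name and the salted SHA-256 hashing.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose cells are small counters, so deletion is possible."""

    MAX_COUNT = 15  # largest value a 4-bit counter can hold

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: k hash functions simulated by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            if self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] += 1  # saturate instead of wrapping around

    def remove(self, item: str) -> None:
        # Only meaningful for items that were actually inserted.
        for pos in self._positions(item):
            if 0 < self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] -= 1  # a saturated counter stays at 15

    def __contains__(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=256, k=3)
cbf.add("10.0.0.1")
cbf.add("10.0.0.2")
cbf.remove("10.0.0.1")
print("10.0.0.2" in cbf)  # still present after the other element is deleted
```

Leaving a saturated counter untouched avoids introducing false negatives: the true count is unknown once the counter has overflowed, so decrementing it could wrongly bring it to zero.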

  19. Dynamic BF • Statement: • During the execution of the application, |A| can exceed its original size n • Consequence: • The target false positive rate is no longer maintained • Solution? • Dynamic Bloom Filters ([Guo et al.])

  20. Dynamic BF (2) • It uses a matrix of s Bloom filters • Each Bloom filter uses m bits and k hash functions • It starts with s equal to 1 • A new Bloom filter (i.e., a new row in the matrix) is created when needed

  21. Dynamic BF (3) • How to insert an element? • Check for an active Bloom filter • If there is no active Bloom filter, create a new one • Add the element to the active Bloom filter • How to query an element? • If all s Bloom filters return false, the element does not belong to the DBF • If at least one Bloom filter returns true, the element probably belongs to the DBF
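
The insertion and query steps above can be sketched as follows. The sketch assumes that a Bloom filter is "active" while it holds fewer than `capacity` elements; that parameter name, the class names, and the salted SHA-256 hashing are hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)
        self.count = 0  # insertions so far, used to decide if still active

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class DynamicBloomFilter:
    def __init__(self, m: int, k: int, capacity: int) -> None:
        self.m = m
        self.k = k
        self.capacity = capacity            # assumed activity threshold
        self.filters = [BloomFilter(m, k)]  # s starts at 1

    def add(self, item: str) -> None:
        active = self.filters[-1]
        if active.count >= self.capacity:   # no active filter left
            active = BloomFilter(self.m, self.k)
            self.filters.append(active)     # new row in the matrix
        active.add(item)

    def __contains__(self, item: str) -> bool:
        # Negative only if all s filters answer negatively.
        return any(item in bf for bf in self.filters)


dbf = DynamicBloomFilter(m=128, k=3, capacity=10)
items = [f"host-{i}" for i in range(25)]
for item in items:
    dbf.add(item)
print(len(dbf.filters))  # number of rows s after 25 insertions
```

Because a query must consult every row, both the query cost and the overall false positive probability grow with s.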

  22. Dynamic BF (4)

  23. Retouched BF • Statement: • Some false positives might be more troublesome than others • Some applications might tolerate a small level of false negatives • Question: • Can we trade off false positives against false negatives? • Solution? • Retouched Bloom filters ([Donnet et al. 06])

  24. Retouched BF (2)

  25. Retouched BF (3) • What if we randomly reset s bits in the vector? • This eliminates the same proportion of false positives as the proportion of false negatives it generates • This process is called randomized bit clearing • The process of removing selected false positives is called selective clearing
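
Randomized bit clearing can be sketched in a few lines. The sketch assumes the s positions are drawn uniformly and without replacement from the whole vector, whether the bit is currently 0 or 1; the function name and the byte-per-bit vector are hypothetical.

```python
import random


def randomized_bit_clearing(bits: bytearray, s: int, seed=None) -> list:
    """Reset s distinct, uniformly chosen positions of the vector to 0."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(bits)), s)
    for pos in chosen:
        bits[pos] = 0
    return chosen


# Toy vector: the first 32 bits are set, the last 32 are not.
vector = bytearray([1] * 32 + [0] * 32)
cleared = randomized_bit_clearing(vector, s=8, seed=1)
print(sorted(cleared))
print(sum(vector), "bits still set")
```

An element tests positive only while all k of its bits remain set. Since the cleared positions are chosen independently of the elements, a true member and a false positive are equally likely to lose a bit, which is the intuition behind the equal proportions stated above.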

  26. Retouched BF (4)

  27. Networking Applications

  28. Distributed Caching • Proxies cooperate to exchange cache information • Instead of sharing URL lists, proxies broadcast Bloom filters ([Fan et al.]) • A Bloom filter represents a proxy’s cache content

  29. Multicast • A router maintains, for each multicast address, a list of associated interfaces/connections • Replace the list with a Bloom filter ([Grönvall]) • Parallelization possible • Deletion of an address can be achieved with a counting Bloom filter

  30. Measurement • Topology discovery at the IP interface level • Traceroute monitors exchange information about what was previously discovered • Doubletree • The shared information can be encoded as a Bloom filter • Communication cost reduction ([Donnet et al. 05])

  31. Conclusion • A Bloom filter • solves the set membership problem • can generate false positives • Extensions to the standard Bloom filter were presented • A few networking applications were discussed

  32. References • [Bloom]: Space/Time Trade-Offs in Hash Coding with Allowable Errors. In Communications of the ACM. vol. 13, n° 7. • [McIlroy]: Development of a Spelling List. In Transactions on Communications. vol. 30, n° 1. • [Mitzenmacher]: Compressed Bloom Filters. In Transactions on Networking. vol. 10, n° 5. • [Fan et al.]: Summary Cache: a Scalable Wide-Area Web Cache Sharing Protocol. In Transactions on Networking. vol. 8, n° 3. • [Guo et al.]: Theory and Network Applications of Dynamic Bloom Filters. In Proc. INFOCOM 2006.

  33. References (2) • [Bratbergsengen]: Hashing Methods and Relational Algebra Operations. In Proc. ICVLD 1984. • [Bruck et al.]: Weighted Bloom Filters. In Proc. ISIT 2006. • [Donnet et al. 06]: Retouched Bloom Filters: Allowing Networked Applications to Trade-Off Selected False Positives Against False Negatives. In Proc. CoNEXT 2006. • [Grönvall]: Scalable Multicast Forwarding. In Proc. ACM SIGCOMM 2001. Student Workshop. • [Donnet et al. 05]: Improved Algorithms for Network Topology Discovery. In Proc. PAM 2005.

