
A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

Miguel Jimeno and Ken Christensen, Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, {mjimeno, christen}@cse.usf.edu


Presentation Transcript


  1. A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives. Miguel Jimeno and Ken Christensen, Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, {mjimeno, christen}@cse.usf.edu

  2. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  3. Introduction • The Internet consumes 2% of all electricity consumed in the US [1]. • An average PC consumes 120 W when fully powered on [10]. • PCs could add 10% to typical US residential electricity consumption. • P2P applications keep the PC “on the net” all the time, even though it is idle 99% of the time. [1] K. Kawamoto, J. Koomey, B. Nordman, R. Brown, M. Piette, M. Ting, and A. Meier, “Electricity Used by Office Equipment and Network Equipment in the U.S.: Detailed Report and Appendices,” Technical Report LBNL-45917, Energy Analysis Department, Lawrence Berkeley National Laboratory, 2001.

  4. Introduction • Can a P2P application be run on a small, low-power microcontroller? • If so, the PC could then be power managed. • The microcontroller cannot store a large list of file names. Bloom Filters: • Bloom filters are a well-known probabilistic data structure for compactly representing a list of file-name strings.

  5. Introduction Bloom Filters: • A group of k hash functions maps each element to k positions in an array of m bits, and those bits are set to 1. • False negatives are not possible, but there is a probability of generating false positives: $\Pr[\text{false positive}] = (s/m)^k$, where m = size of the Bloom filter in bits, k = number of hash functions used to calculate a Bloom filter, and s = number of bits set. Figure 1. Bloom filter of size m bits and k = 4 hash functions. Image taken from [9].
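A minimal Python sketch of the Bloom filter described above, assuming k hash functions built by salting SHA-256 with the hash number i (an illustrative choice; the experiments in this presentation use CRC32, MD5, RNG, and Kirsch hashing instead). The class `BloomFilter` and its method names are hypothetical. It shows that inserted items always test positive, and it estimates the false-positive probability as (s/m)^k from the number of bits set s.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _indices(self, item: str) -> list[int]:
        # Hash function i is SHA-256 salted with i (an assumption for this sketch).
        indices = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            indices.append(int.from_bytes(digest[:8], "big") % self.m)
        return indices

    def add(self, item: str) -> None:
        for idx in self._indices(item):
            self.bits[idx] = 1

    def might_contain(self, item: str) -> bool:
        # Always True for inserted items (no false negatives);
        # can also be True for items never inserted (a false positive).
        return all(self.bits[idx] for idx in self._indices(item))

    def bits_set(self) -> int:
        return sum(self.bits)

    def false_positive_probability(self) -> float:
        # Pr[false positive] = (s/m)^k, where s is the number of bits set.
        return (self.bits_set() / self.m) ** self.k


if __name__ == "__main__":
    bf = BloomFilter(m=16000, k=11)
    names = [f"song_{i:05d}.mp3" for i in range(1000)]  # hypothetical file names
    for name in names:
        bf.add(name)
    print(bf.bits_set(), bf.false_positive_probability())
```

Storing one byte per bit keeps the sketch readable; a memory-constrained proxy would pack the bits instead.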

  6. Background • Bloom filters were first proposed by Bloom [2]. • Kirsch and Mitzenmacher proposed a way to compute a Bloom filter with less hashing [7]. • Lumetta and Mitzenmacher used the power of two choices to improve Bloom filters [8]. [2] B. Bloom, “Space/Time Tradeoffs in Hash Coding with Allowable Errors,” Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.
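The less-hashing idea of Kirsch and Mitzenmacher [7] can be sketched as double hashing: two base hashes h1 and h2 generate all k indices as g_i(x) = (h1(x) + i·h2(x)) mod m. In this sketch, taking h1 and h2 from the two 64-bit halves of a single MD5 digest is an assumption, and `double_hash_indices` is a hypothetical helper name.

```python
import hashlib


def double_hash_indices(item: str, k: int, m: int) -> list[int]:
    """k Bloom filter indices from two base hashes: g_i = (h1 + i*h2) mod m."""
    digest = hashlib.md5(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")   # first base hash (assumed split)
    h2 = int.from_bytes(digest[8:], "big")   # second base hash (assumed split)
    return [(h1 + i * h2) % m for i in range(k)]


print(double_hash_indices("song_00042.mp3", k=4, m=16000))
```

Only one MD5 computation is needed per string, regardless of k.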

  7. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  8. Research Problem • We investigated new methods for reducing the probability of false positives of a Bloom filter with a fixed size m and a fixed number of stored elements n. • The target is implementing this structure in a power management proxy.

  9. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  10. The SmartNIC • NICs implement functions only up to the MAC layer and cannot respond to higher-layer packets. • A PC needs to be fully powered on in order to respond to such packets. • Applications like P2P file sharing therefore require the PC to be fully powered on all the time. • To manage power in PCs running P2P applications: • We are studying the idea of using a small controller to proxy for a sleeping PC.

  11. The SmartNIC • This proxy will be able to maintain P2P TCP connections and respond to query messages. • We are exploring placing the controller on the NIC, making it a “SmartNIC”.

  12. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  13. The New Design: Best-of-N method • Best-of-N method: N instances of a Bloom filter are generated, each with a different group of hash functions, and the instance with the fewest bits set to 1 is selected. • The hash group of the “winner” is then used to test membership in the Bloom filter. • Questions: 1) What improvement in Pr[false positive] can be achieved? 2) What is the computational cost to generate the filter?
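The Best-of-N construction can be sketched as follows, under stated assumptions: hash group j is simulated by salting SHA-256 with (j, i), and `hash_group`, `build_filter`, `best_of_n`, and `query` are hypothetical names. The sketch builds N filters over the same strings, keeps the one with the fewest bits set, and remembers its group number, because queries must use the winner's hash functions.

```python
import hashlib


def hash_group(item: str, group: int, k: int, m: int) -> list[int]:
    """Hypothetical hash group `group`: k indices from SHA-256 salted with (group, i)."""
    indices = []
    for i in range(k):
        digest = hashlib.sha256(f"{group}:{i}:{item}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "big") % m)
    return indices


def build_filter(items: list[str], group: int, k: int, m: int) -> bytearray:
    """One Bloom filter instance built with the given hash group."""
    bits = bytearray(m)
    for item in items:
        for idx in hash_group(item, group, k, m):
            bits[idx] = 1
    return bits


def best_of_n(items: list[str], n_instances: int, k: int, m: int) -> tuple[int, bytearray]:
    """Build N instances and keep the one with the fewest bits set to 1."""
    best_group, best_bits, best_count = -1, bytearray(), m + 1
    for group in range(n_instances):
        bits = build_filter(items, group, k, m)
        count = sum(bits)
        if count < best_count:  # ties keep the earlier instance
            best_group, best_bits, best_count = group, bits, count
    return best_group, best_bits


def query(item: str, group: int, bits: bytearray, k: int) -> bool:
    """Membership test; it must use the winner's hash group."""
    return all(bits[idx] for idx in hash_group(item, group, k, len(bits)))


names = [f"song_{i:05d}.mp3" for i in range(1000)]  # hypothetical file names
winner, bits = best_of_n(names, n_instances=8, k=11, m=16000)
print(winner, sum(bits), query("song_00042.mp3", winner, bits, k=11))
```

In this sketch, building the filter costs N·n·k hash evaluations, so construction time grows linearly with N.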

  14. The New Design: Best-of-N method • In order to compute N instances quickly, we developed a new pseudo-hashing method called “RNG hashing”. • This method, based on a Random Number Generator, generates multiple hashes from one initial “seed” hash.
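The presentation does not specify the RNG or how its output is divided among the N instances, so the following is only a sketch of the idea: one CRC32 hash of the string seeds a 32-bit linear congruential generator (the constants 1664525 and 1013904223 are a common choice, assumed here), and consecutive outputs are grouped into N hash groups of k indices each. The function names are hypothetical.

```python
import zlib

# 32-bit LCG constants; this particular RNG is an assumption for illustration.
LCG_A = 1664525
LCG_C = 1013904223
MASK32 = 0xFFFFFFFF


def rng_hashes(item: str, count: int) -> list[int]:
    """Generate `count` pseudo hashes from one CRC32 seed hash."""
    state = zlib.crc32(item.encode("utf-8")) & MASK32  # the only real hash
    values = []
    for _ in range(count):
        state = (LCG_A * state + LCG_C) & MASK32
        values.append(state)
    return values


def to_index(value: int, m: int) -> int:
    # Multiply-shift maps a 32-bit value into [0, m) using its high-order bits.
    return (value * m) >> 32


def rng_hash_groups(item: str, n_groups: int, k: int, m: int) -> list[list[int]]:
    """Split one RNG stream into N groups of k Bloom filter indices."""
    values = rng_hashes(item, n_groups * k)
    return [[to_index(v, m) for v in values[g * k:(g + 1) * k]] for g in range(n_groups)]


print(rng_hash_groups("song_00042.mp3", n_groups=3, k=4, m=16000))
```

Only one real hash is computed per string no matter how large N·k is; each further index costs one multiply and one add. The multiply-shift mapping uses the high-order bits because the low-order bits of an LCG with a power-of-two modulus repeat with short periods.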

  15. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  16. Analysis of Best-of-N Method • We define S to be the random variable for the number of bits set in a Bloom filter. • Using order statistics, we can determine the distribution of the minimum of the independent samples $S_1, S_2, \ldots, S_N$ (the instance selected as Best-of-N). • For order statistics, if the density f(s) and the CDF F(s) of S are known, then the CDF of the minimum is $F_{S_{\min}}(s) = 1 - [1 - F(s)]^N$.
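The order-statistics result can be checked exactly on a small discrete example. In this sketch a fair six-sided die stands in for S (an assumption for illustration only); the code compares 1 - (1 - F(s))^N with a brute-force enumeration of all outcomes of N independent throws, using exact fractions.

```python
from fractions import Fraction
from itertools import product


def min_cdf(cdf_value: Fraction, n_samples: int) -> Fraction:
    """F_min(s) = 1 - (1 - F(s))^N for the minimum of N i.i.d. samples."""
    return 1 - (1 - cdf_value) ** n_samples


def brute_force_min_cdf(support, s, n_samples: int) -> Fraction:
    """Pr[min(S_1, ..., S_N) <= s] by enumerating all equally likely outcomes."""
    outcomes = list(product(support, repeat=n_samples))
    hits = sum(1 for outcome in outcomes if min(outcome) <= s)
    return Fraction(hits, len(outcomes))


die = range(1, 7)  # stand-in distribution for S
for n_samples in (1, 2, 3):
    for s in die:
        assert min_cdf(Fraction(s, 6), n_samples) == brute_force_min_cdf(die, s, n_samples)

print(min_cdf(Fraction(2, 6), 2))  # 5/9
```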

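  17. Analysis of Best-of-N Method • For a continuous distribution, the density of the minimum is $f_{S_{\min}}(s) = N f(s)\,[1 - F(s)]^{N-1}$. • The mean can be computed as $E[S_{\min}] = \int_{-\infty}^{\infty} s\, f_{S_{\min}}(s)\, ds$. • Based on heuristic and empirical evidence, the distribution of S appears to be close to normal, so that $f(s) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(s-\mu)^2/(2\sigma^2)}$ and $F(s) = \Phi\!\left(\frac{s-\mu}{\sigma}\right)$, where μ = E[S], σ = σ[S], and Φ is the standard normal CDF. • Both μ and σ are known functions of m, n, and k; for example, $\mu = m\left[1 - (1 - 1/m)^{kn}\right]$.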
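The near-normal behavior of S can be probed with a quick simulation. This sketch assumes ideal hashing, in which all k·n hash positions are independent and uniform over the m bits; under that assumption, μ and σ follow from standard occupancy (balls-into-bins) results, which the code uses. The parameters m = 2000, n = 100, k = 7 and the function names are arbitrary choices for illustration.

```python
import math
import random
import statistics


def bits_set_mean_std(m: int, n: int, k: int) -> tuple[float, float]:
    """Mean and standard deviation of S under ideal hashing (k*n uniform positions)."""
    t = k * n
    p1 = (1 - 1 / m) ** t  # Pr[a given bit is still 0]
    p2 = (1 - 2 / m) ** t  # Pr[two given bits are both still 0]
    mean = m * (1 - p1)
    # Var[S] equals the variance of the number of zero bits.
    var = m * p1 + m * (m - 1) * p2 - (m * p1) ** 2
    return mean, math.sqrt(var)


def simulate_bits_set(m: int, n: int, k: int, trials: int, seed: int = 1) -> list[int]:
    """Sample S by building `trials` filters from uniformly random positions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        bits = bytearray(m)
        for _ in range(k * n):
            bits[rng.randrange(m)] = 1
        samples.append(sum(bits))
    return samples


mu, sigma = bits_set_mean_std(m=2000, n=100, k=7)
samples = simulate_bits_set(m=2000, n=100, k=7, trials=1000)
print(f"formula:    mu = {mu:.1f}, sigma = {sigma:.2f}")
print(f"simulation: mean = {statistics.fmean(samples):.1f}, stdev = {statistics.stdev(samples):.2f}")
```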

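  18. Analysis of Best-of-N Method • Substituting the normal f(s) and F(s) into the expression for the mean, we derive $E[S_{\min}]$. • The probability of false positive for our method is then $\Pr[\text{false positive}] = \left(E[S_{\min}]/m\right)^k$, where $E[S_{\min}]$ is computed by the substitution above.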
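A numerical sketch of this analysis, assuming S is normal with mean μ and standard deviation σ computed from the ideal-hashing (balls-into-bins) formulas. Because the normal family is closed under shifting and scaling, E[S_min] = μ + σ·E[Z_min], where Z_min is the minimum of N standard normal samples; the code integrates z·N·φ(z)·(1 - Φ(z))^(N-1) with the midpoint rule and returns (E[S_min]/m)^k. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bits_set_mean_std(m: int, n: int, k: int) -> tuple[float, float]:
    """Mean and standard deviation of S under ideal (independent, uniform) hashing."""
    t = k * n
    p1 = (1 - 1 / m) ** t
    p2 = (1 - 2 / m) ** t
    var = m * p1 + m * (m - 1) * p2 - (m * p1) ** 2
    return m * (1 - p1), math.sqrt(var)


def expected_min_std_normal(n_instances: int, lo: float = -8.0, hi: float = 8.0,
                            steps: int = 16000) -> float:
    """E[Z_min] for N i.i.d. standard normals: integral of z*N*phi(z)*(1-Phi(z))^(N-1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h  # midpoint rule
        tail = 1.0 - STD_NORMAL.cdf(z)
        total += z * n_instances * STD_NORMAL.pdf(z) * tail ** (n_instances - 1)
    return total * h


def best_of_n_false_positive(m: int, n: int, k: int, n_instances: int) -> float:
    """Pr[false positive] = (E[S_min]/m)^k with E[S_min] = mu + sigma * E[Z_min]."""
    mu, sigma = bits_set_mean_std(m, n, k)
    e_smin = mu + sigma * expected_min_std_normal(n_instances)
    return (e_smin / m) ** k


for n_instances in (1, 2, 4, 8):
    fp = best_of_n_false_positive(m=16000, n=1000, k=11, n_instances=n_instances)
    print(f"N = {n_instances}: Pr[false positive] ~ {fp:.4e}")
```

For N = 1 the integral is zero and the result reduces to (μ/m)^k, the ordinary single-filter estimate.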

  19. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  20. Numerical Results • For a given m and n, with k chosen optimally, we study the probability of false positive as a function of N. [Figure: probability of false positive versus N, annotated “30%”]

  21. Numerical Results For Figure 5, n = 1000 and m = 16,000. For Figure 6, n is the same, but m = 32,000.
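For the two configurations above, the usual choice k = (m/n) ln 2, rounded to an integer, gives k = 11 for m = 16,000 and k = 22 for m = 32,000; whether the authors rounded the same way is an assumption. The sketch also prints the classic approximation (1 - e^(-kn/m))^k for a standard Bloom filter, which corresponds to the N = 1 baseline. Function names are hypothetical.

```python
import math


def optimal_k(m: int, n: int) -> int:
    """k = (m/n) ln 2 rounded to the nearest integer (at least 1)."""
    return max(1, round((m / n) * math.log(2)))


def standard_false_positive(m: int, n: int, k: int) -> float:
    """Classic approximation (1 - e^(-kn/m))^k for a single standard Bloom filter."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 1000
for label, m in (("Figure 5", 16000), ("Figure 6", 32000)):
    k = optimal_k(m, n)
    print(f"{label}: m = {m}, n = {n}, k = {k}, "
          f"Pr[false positive] ~ {standard_false_positive(m, n, k):.3e}")
```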

  22. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  23. Experiments Evaluation • Environment • Dell OptiPlex GX620 PC (Pentium 4, 3.4 GHz, 2 MB cache) with 1 GB RAM. • Windows XP, gcc compiler (version 3.4.2 mingw-special, from Dev-C++). • A list of 25,000 unique music file names was obtained using BearShare 5.2. • Response Variables • Probability of false positive for the Bloom filter. • Execution time to generate a Bloom filter.

  24. Experiments Evaluation • Control variables • Hashing method used: CRC32, MD5, RNG method, Kirsch method. • Bloom filter parameters m, n, and k. • Best-of-N parameter N. • Number of strings used in the string test set. • Experiments description • False-positive experiment 1: vary N, measure the probability of false positive. • False-positive experiment 2: vary N, measure false positives. • Run-time experiment: collect CPU time for each N.
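A skeleton of the false-positive and run-time measurements, as a sketch only: the 25,000 BearShare file names are not available, so hypothetical synthetic names stand in for them; MD5 salted with the hash number stands in for the four hashing methods (any function with the same signature can be passed as `index_fn`); and CPU time is read with Python's `time.process_time`. Probe strings are constructed so that none of them is in the stored set, so every positive on a probe is a false positive. `md5_indices` and `run_trial` are hypothetical names.

```python
import hashlib
import time


def md5_indices(item: str, k: int, m: int) -> list[int]:
    """k indices from MD5 salted with the hash number i (illustrative choice)."""
    out = []
    for i in range(k):
        digest = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
        out.append(int.from_bytes(digest[:8], "big") % m)
    return out


def run_trial(stored, probes, m, k, index_fn=md5_indices):
    """Build a filter from `stored`, time the build, and count errors on `probes`."""
    start = time.process_time()
    bits = bytearray(m)
    for item in stored:
        for j in index_fn(item, k, m):
            bits[j] = 1
    build_cpu_s = time.process_time() - start

    def member(x):
        return all(bits[j] for j in index_fn(x, k, m))

    return {
        "bits_set": sum(bits),
        "false_negatives": sum(not member(s) for s in stored),
        "fp_rate": sum(member(p) for p in probes) / len(probes),
        "build_cpu_s": build_cpu_s,
    }


# Hypothetical stand-ins for the file-name list; probes never overlap `stored`.
stored = [f"artist_{i % 97}/track_{i:05d}.mp3" for i in range(1000)]
probes = [f"unlisted/track_{i:05d}.mp3" for i in range(20000)]
result = run_trial(stored, probes, m=8000, k=6)
print(result)
```

For the vary-N experiments, the build step would be repeated for N hash groups, keeping the filter with the fewest bits set before probing.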

  25. Experiments Evaluation • The experimental results for probability of false positive agree perfectly with the analysis. • CPU time results of the RNG method were as good as those of the Kirsch method, and better than those of CRC32. [Figure: CPU time results; curves labeled Kirsch and RNG]

  26. Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work

  27. Summary & Future Work • Two improvements to Bloom filters: • A new Best-of-N method that reduces the probability of false positive by generating N instances of a Bloom filter and selecting the best one. • A new RNG hashing method that generates pseudo hashes from a single seed hash. • Bloom filters could be implemented in a power management proxy for P2P applications. • Savings of up to 85 million could be obtained if 25% of PCs running P2P applications used SmartNICs.

  28. References • A. Broder and M. Mitzenmacher, “Network Applications of Bloom Filters: A Survey,” Internet Mathematics, Vol. 1, No. 4, pp. 485-509, 2005. • Energy Information Administration, “U.S. Household Electricity Report,” July 2005. Available: http://www.eia.doe.gov/emeu/reps/enduse/er01_us.html. • L. Fan, P. Cao, and J. Almeida, “Bloom Filters - The Math,” 2000. Available: http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html. • A. Kirsch and M. Mitzenmacher, “Less Hashing, Same Performance: Building a Better Bloom Filter,” Technical Report TR-02-5, Computer Science Group, Harvard University, 2005. • S. Lumetta and M. Mitzenmacher, “Using the Power of Two Choices to Improve Bloom Filters,” unpublished, 2006. Available: http://www.eecs.harvard.edu/~michaelm/postscripts/bftwo.ps. • A. Pagh, R. Pagh, and S. Rao, “An Optimal Bloom Filter Replacement,” Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 823-829, 2005. • http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html • US Department of Energy, Energy Efficiency and Renewable Energy, “Estimating Appliance and Home Electronic Energy Use,” 2005. Available: http://www.eere.energy.gov/consumer/your_home/appliances/index.cfm/mytopic=10040.

  29. Thanks! I’ll be happy to answer any questions.
