A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

  1. A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives Miguel Jimeno Ken Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 33620 {mjimeno, christen}@cse.usf.edu

  3. Introduction • The Internet consumes 2% of all the electricity consumed in the US.[1] • An average PC consumes 120 W when fully powered on.[10] • PCs could add 10% to the typical US residential consumption. • P2P applications keep the PC “on the net” all the time, even though they are idle 99% of the time. [1] K. Kawamoto, J. Koomey, B. Nordman, R. Brown, M. Piette, M. Ting, and A. Meier, “Electricity Used by Office Equipment and Network Equipment in the U.S.: Detailed Report and Appendices,” Technical Report LBNL-45917, Energy Analysis Department, Lawrence Berkeley National Laboratory, 2001.

  4. Introduction • Can a P2P application be run on a small, low-power microcontroller? • The PC could then be power managed. • The microcontroller can’t store a large list of file names. Bloom Filters: • Bloom filters are a well-known probabilistic data structure for representing a list of file name strings.

  5. Introduction Bloom Filters: • A group of hash functions is used to map elements into an array of bits. • False negatives are not possible, but there is a probability of generating false positives: $\Pr[\text{false positive}] = (s/m)^k$, where m = size of the Bloom filter in bits, k = number of hash functions used to calculate a Bloom filter, and s = number of bits set. Figure 1. Bloom filter of size m bits with k = 4 hash functions. Image taken from [9].
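The mapping described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the method names, and the use of salted SHA-256 digests as the k hash functions are all assumptions made for the example. The estimate uses the false-positive probability (s/m)^k, with s the number of bits set.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical names, salted SHA-256 as hashes)."""

    def __init__(self, m, k):
        self.m = m                  # size of the filter in bits
        self.k = k                  # number of hash functions
        self.bits = bytearray(m)    # one byte per bit, for clarity

    def _positions(self, item):
        # Assumption: derive k hash values by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True for every added item (no false negatives); may be True by chance.
        return all(self.bits[pos] for pos in self._positions(item))

    def bits_set(self):
        return sum(self.bits)

    def false_positive_estimate(self):
        # (s / m) ** k with s = number of bits set
        return (self.bits_set() / self.m) ** self.k
```

For example, after adding a list of file names, `name in bf` returns `True` for each of them, and `bf.false_positive_estimate()` gives the chance that a name not in the list also tests positive.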

  6. Background • Bloom filters were first proposed by Bloom [2] • Kirsch et al. proposed a way to calculate a Bloom filter with less hashing [7] • Lumetta et al. used the Power of Two Choices to calculate the Bloom filter [8] [2] B. Bloom, “Space/Time Tradeoffs in Hash Coding with Allowable Errors,” Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.

  8. Research Problem • We investigated new methods for reducing the probability of false positives for a Bloom filter for fixed m and n. • The target is the implementation of this structure in a power management proxy.

  10. The SmartNIC • NICs support up to the MAC layer, but can’t respond to higher-layer packets. • A PC needs to be fully powered on in order to respond to packets. • Applications like P2P file sharing require the PC to be fully powered on all the time. • To manage power in PCs running P2P applications: • We are studying the idea of using a small controller to proxy for a sleeping PC.

  11. The SmartNIC • This proxy will be able to maintain P2P TCP connections and respond to query messages. • We are exploring locating the controller on the NIC, so it’s a “SmartNIC”.

  13. The New Design: Best-of-N method • Best-of-N method: N instances of a Bloom filter are generated and the instance with the least number of bits set to 1 is selected. • The “winner” hash group is used to test the Bloom filter. • 1) What improvement in Pr[false positive] can be achieved? • 2) What is the computational cost to generate the filter?

  14. The New Design: Best-of-N method • In order to compute N instances quickly, we developed a new pseudo-hashing method called “RNG hashing”. • This method, based on a Random Number Generator, generates multiple hashes from one initial “seed” hash.
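The Best-of-N selection and the idea of RNG hashing can be sketched together as follows. This is a sketch under stated assumptions, not the authors' code: CRC32 is assumed as the seed hash, Python's `random.Random` stands in for the random number generator, and mixing the instance index into the seed (the constant `0x9E3779B1`) is a hypothetical way of obtaining N different hash groups. All function names are illustrative.

```python
import random
import zlib


def rng_positions(item, instance, k, m):
    """Generate k bit positions from one seed hash (CRC32, an assumption here)."""
    # Assumption: each instance perturbs the seed so it gets its own hash group.
    seed = zlib.crc32(item.encode("utf-8")) ^ ((instance * 0x9E3779B1) & 0xFFFFFFFF)
    rng = random.Random(seed)       # deterministic for a given integer seed
    return [rng.randrange(m) for _ in range(k)]


def build_filter(items, instance, k, m):
    """Build one Bloom filter instance for the given hash group."""
    bits = bytearray(m)
    for item in items:
        for pos in rng_positions(item, instance, k, m):
            bits[pos] = 1
    return bits


def best_of_n(items, n_instances, k, m):
    """Build n_instances filters and keep the one with the fewest bits set."""
    best = None
    for instance in range(n_instances):
        bits = build_filter(items, instance, k, m)
        s = sum(bits)
        if best is None or s < best[1]:
            best = (instance, s, bits)
    return best  # (winning instance index, bits set, filter)


def contains(bits, instance, item, k):
    """Test membership using the winner's hash group."""
    return all(bits[p] for p in rng_positions(item, instance, k, len(bits)))
```

The winning instance index has to be kept alongside the filter, since queries must be tested with the same hash group that produced it. Generation costs roughly N times the work of a single filter, while testing costs the same as for an ordinary Bloom filter.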

  16. Analysis of Best-of-N Method • We define S to be the random variable for the number of bits set in a Bloom filter. • Using order statistics we can determine the distribution of the minimum value of the independent samples S1, S2, …, SN (selected as Best-of-N). • For order statistics, if f(s) and F(s) are known, then $F_{S_{\min}}(s) = 1 - [1 - F(s)]^N$

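  17. Analysis of Best-of-N Method • For a continuous distribution, $f_{S_{\min}}(s) = N f(s) [1 - F(s)]^{N-1}$ • The mean can be computed as $E[S_{\min}] = \int_{-\infty}^{\infty} s \, f_{S_{\min}}(s) \, ds$ • Based on heuristic and empirical evidence, the distribution of S appears to be close to normal. Now we have that $f(s) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(s-\mu)^2 / (2\sigma^2)}$ • where μ = E[S] and σ = σ[S]. We know that $E[S] = m \left(1 - (1 - 1/m)^{kn}\right)$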

  18. Analysis of Best-of-N Method • We derive • The probability of false positive for our method is then: $\Pr[\text{false positive}] = (E[S_{\min}]/m)^k$, where E[Smin] is computed by substituting above.
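The analysis can be evaluated numerically with a short script. This is a sketch under assumptions, not the authors' derivation: S is treated as normal, its mean and standard deviation are taken from the standard occupancy ("balls into bins") formulas for kn hashes thrown into m bits, and the expected minimum of N normal samples is obtained by trapezoidal integration of the order-statistic density. The function names and integration limits are illustrative choices.

```python
import math


def expected_min_std_normal(n_instances, steps=20000, lo=-8.0, hi=8.0):
    """E[min of N standard normal samples], by trapezoidal integration of
    z * N * phi(z) * (1 - Phi(z))**(N - 1) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        value = z * n_instances * pdf * (1.0 - cdf) ** (n_instances - 1)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * value
    return total * h


def bits_set_mean_std(m, n, k):
    """Mean and standard deviation of S from the occupancy formulas
    (assumes k*n independent, uniformly distributed hash values)."""
    r = k * n
    p1 = (1.0 - 1.0 / m) ** r   # probability that a given bit stays 0
    p2 = (1.0 - 2.0 / m) ** r   # probability that two given bits both stay 0
    mean = m * (1.0 - p1)
    var = m * (m - 1) * p2 + m * p1 - (m * p1) ** 2
    return mean, math.sqrt(max(var, 0.0))


def best_of_n_false_positive(m, n, k, n_instances):
    """(E[S_min] / m) ** k under the normal approximation for S."""
    mu, sigma = bits_set_mean_std(m, n, k)
    e_s_min = mu + sigma * expected_min_std_normal(n_instances)
    return (e_s_min / m) ** k
```

With `n_instances = 1` the expected minimum reduces to the mean, so the function gives the estimate for an ordinary Bloom filter; larger N shifts E[S_min] below the mean and lowers the estimate.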

  20. Numerical Results • For a given m and n where k is chosen optimally, we study the probability of false positive as a function of N.

  21. Numerical Results For Figure 5, n = 1000 and m = 16,000. For Figure 6, n is the same, but m = 32,000.
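For reference, the optimal number of hash functions for the two configurations above can be computed with the standard rule k = (m/n) ln 2. The sketch below assumes that rule (rounded to the nearest integer) is what "chosen optimally" means here, and uses the usual approximation (1 - e^{-kn/m})^k for an ordinary Bloom filter as the N = 1 baseline; the function names are illustrative.

```python
import math


def optimal_k(m, n):
    """Number of hash functions minimizing the false-positive rate: (m/n) ln 2."""
    return max(1, round((m / n) * math.log(2)))


def baseline_false_positive(m, n, k):
    """Standard approximation for an ordinary (N = 1) Bloom filter."""
    return (1.0 - math.exp(-k * n / m)) ** k


# The two (m, n) configurations named for Figures 5 and 6.
for m, n in ((16000, 1000), (32000, 1000)):
    k = optimal_k(m, n)
    print(f"m={m}, n={n}: k={k}, baseline Pr[false positive]={baseline_false_positive(m, n, k):.3e}")
```

Doubling m with n fixed doubles the optimal k and lowers the baseline sharply, which is the reference point against which any Best-of-N improvement is measured.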

  23. Experiments Evaluation • Environment • Dell OptiPlex GX620 PC (Pentium 4, 3.4 GHz, 2 MBytes cache) with 1 GByte RAM. • Windows XP, gcc compiler (version 3.4.2 mingw-special from Dev-C++). • A list of 25,000 strings of unique music file names was obtained using BearShare 5.2. • Response Variables • Probability of false positive for the Bloom filter. • Execution time to generate a Bloom filter.

  24. Experiments Evaluation • Control variables • Hashing method used. • CRC32, MD5, RNG method, Kirsch method • Bloom filter parameters m, n, and k. • Best-of-N parameter N. • Number of strings used in the string test set. • Experiments Description • False Positive Exp 1: Vary N, measure Prob. of False Positive. • False Positive Exp 2: Vary N, measure False Pos. • Run-time experiment: Collect CPU time for each N.

  25. Experiments Evaluation • The experimental results for probability of false positive perfectly agree with the analysis. • The CPU time of the RNG method was as good as that of the Kirsch method, and better than that of CRC32.

  27. Summary & Future Work • Two Improvements to Bloom filters • A new Best-of-N method that reduces the probability of false positive by generating N instances of a Bloom filter and selecting the best one. • A new RNG hashing method that generates pseudo hashes given a single seed hash. • Bloom filters could be implemented in a power management proxy for P2P applications. • Savings of up to 85 Mill. could be obtained if 25% of PCs running P2P applications use SmartNICs.

  29. Thanks! I’ll be happy to answer any questions.

