290 likes | 411 Views
A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives. Miguel Jimeno Ken Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 33620 {mjimeno, christen}@cse.usf.edu. Outline. Introduction & Background
E N D
A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives Miguel Jimeno Ken Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 33620 {mjimeno, christen}@cse.usf.edu
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
Introduction • The internet consumes 2% of all the electricity consumed in the US.[1] • An average PC consumes 120 W when fully powered-on.[10] • PCs could add 10% to the typical US residential consumption. • P2P Applications make the PC remain “on the net” all the time, (they are idle 99% of the time) [1]K. Kawamoto, J. Koomey, B. Nordman, R. Brown, M. Piette, M. Ting, and A. Meier, “Electricity Used by Office Equipment and Network Equipment in the U.S.: Detailed Report and Appendices,” Technical Report LBNL-45917, Energy Analysis Department, Lawrence Berkeley National Laboratory, 2001.
Introduction • Can a P2P application can be run in small, low-power microcontroller? • The PC could then be power managed. • The microcontroller can’t store large list of file names. Bloom Filters: • Bloom filters are a well known probabilistic data structure for representing a list of file name strings.
Introduction Bloom Filters: • A group of hash functions are used to map elements into an array of bits. • False negatives are not possible, but there is a probability of generating false positives. where m = size of the Bloom filter in bits, k = number of hash functions used to calculate a Bloom filter, and s = number of bits set. Figure 1. Bloom filter of size mbits, and k = 4 hash functions. Image Taken from [9]
Background • Bloom filters were first proposed by Bloom [2] • Kirsch et. al. proposed a way to calculate bloom filter with less hashing [7] • Lumetta et. al. used the Power of Two Choices to calculate the bloom filter [8] [2] B. Bloom, “Space/Time Tradeoffs in Hash Coding with Allowable Errors,” Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
Research Problem • We investigated new methods for reducing the probability of false positives for a Bloom filter for fixed m and n. • The target is the implementation of this structure in a power management proxy.
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
The SmartNIC • NICs support up to MAC layer, but can’t respond to higher-layer packets. • A PC needs to be fully powered-on in order to respond to packets. • Applications like P2P file sharing require the PC to be fully powered-on all the time. • To manage power in PCs running P2P applications: • We are studying the idea of using small controller to proxy for a sleeping PC.
The SmartNIC • This proxy will be able to maintain P2P TCP connections and respond to query messages. • We are exploring locating the controller on the NIC, so it’s a “SmartNIC”.
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
The New Design: Best-of-N method • Best-of-N method: N instances of a Bloom filter are generated and the instance with the least number of bits set to 1 is selected. • The “winner” hash group is used to test the bloom filter. • What improvement in Pr[false positive] can be achieved? • 2) What is the computational cost to generate the filter?
The New Design: Best-of-N method • In order to compute N instances quickly, we developed a new pseudo-hashing method called “RNG hashing”. • This method, based on a Random Number Generator, generates multiple hashes from one initial “seed” hash.
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
Analysis of Best-of-N Method • We define S to be the random variable for the number of bits set in a Bloom filter. • Using order statistics we can determine the distribution of the minimum value of the independent samples S1, S2, …, SN (selected as Best-of-N). • For order statistics, if f(s) and F(s) are known, then
Analysis of Best-of-N Method • For a continuous distribution, • The mean can be computed as • Based on heuristic and empirical evidence, the distribution of S appears to be close to normal. Now we have that • where μ=E[S] and σ= σ[S]. We know that
Analysis of Best-of-N Method • We derive • The probability of false positive for our method is then: where E[Smin] is computed by substituting above.
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
Numerical Results • For a given m and n where k is chosen optimally, we study the probability of false positive as a function of N. 30%
Numerical Results For Figure 5, n = 1000 and m = 16,000. For Figure 6, same n, but m = 32,000
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
Experiments Evaluation • Environment • Dell OptiPlex GX620 PC (Pentium4, 3.4 Ghz, 2 MBytes cache) with 1 GByte RAM. • WindowsXP, gcc compiler (version 3.4.2 mingw-special from Dev C++. • A list of 25,000 strings of unique music file names was obtained using Bearshare 5.2. • Response Variables • Probability of false positive for the Bloom filter. • Execution time to generate a Bloom filter.
Experiments Evaluation • Control variables • Hashing method used. • CRC32, Md5, RNG Method, Kirsch Method • Bloom filter parameters m, n, and k. • Best-of-N parameter N. • Number of strings used in the string test set. • Experiments Description • False Positive Exp 1: Vary N, measure Prob. of False Positive. • False Positive Exp 2: Vary N, measure False Pos. • Run-time experiment: Collect CPU time for each N.
Experiments Evaluation • The experimental results for probability of false positive perfectly agree with the analysis. • CPU time results of RNG method were as good as Kirsch method, and better than CRC32. Kirsch and RNG
Outline • Introduction & Background • Research Problem • The SmartNIC • The new Design: Best-of-N Method • Analysis of Best-of-N Method • Numerical Results & Experiments Evaluation • Summary & Future Work
Summary & Future Work • Two Improvements to Bloom filters • A new Best-of-N method that reduces the probability of false positive by generating N instances of a Bloom filter and selecting the best one. • A new RNG hashing method that generates pseudo hashes given a single seed hash. • Bloom filters could be implemented in a power management proxy for P2P applications. • Savings of up to 85 Mill. could be obtained if 25% of PCs running P2P applications use SmartNICs.
References • A. Broder and M. Mitzenmacher, “Network Applications of Bloom Filters: A Survey,” Internet Mathematics, Vol. 1, No. 4, pp. 485-509, 2005. • Energy Information Administration, “U.S Household Electricity Report,” July 2005. Available: http://www.eia.doe.gov/emeu/reps/enduse/er01_us.html. • L. Fan, P. Cao, and J. Almeida, “Bloom Filters - The Math,” 2000. Available: http://www.cs.wisc.edu/~cao/ papers/summary-cache/node8.html. • A. Kirsch and M. Mitzenmacher, “Less Hashing, Same Performance: Building a Better Bloom Filter,” Technical Report TR-02-5, Computer Science Group, Harvard University, 2005. • S. Lumetta and M. Mitzenmacher, “Using the Power of Two Choices to Improve Bloom Filters,” unpublished, 2006. Available: http://www.eecs.harvard.edu/~michaelm/ postscripts/bftwo.ps. • A. Pagh, R. Pagh, and S. Rao, “An Optimal Bloom Filter Replacement,” Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 823-829, 2005. • http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html • US Department of Energy, Energy Efficiency and Renewable Energy, “Estimating Appliance and Home Electronic Energy Use,” 2005. Available: http://www.eere.energy.gov/consumer/your_home/appliances/index.cfm/mytopic=10040.
Thanks! I’ll be happy to answer any questions.