Random Number Generation and Testing. W. W. Tsang 曾衛寰, Department of Computer Science, The University of Hong Kong. http://www.cs.hku.hk/~tsang/RNGT.ppt
The University of Hong Kong. Sun Yat Sen (孫中山).
Random Number Generation and Testing (RNGT) is an interdisciplinary area within statistical computing (computational statistics), at the intersection of mathematics and computer science.
Overview
• Random numbers and their applications
• Early random number generators (RNGs) in computers
• Criteria of good RNGs
• Good RNGs
• Goodness-of-fit tests
• Statistical tests for RNGs
• Conversions of uniform random integers to variates of other distributions
1. Random numbers and their applications
• Entertainment
  • Gambling
  • Lottery, lucky draw
  • Games
• Cryptography
  • Key generation
• Computer simulation
• Software testing
  • Generating test data
• Randomized algorithms
  • Avoiding worst cases
2. Early RNGs in computers
• Reading a large file of random numbers
  • Deterministic
  • A 10-billion-bit file is available at Diehard Battery of Tests of Randomness v0.2 beta, http://www.csis.hku.hk/~diehard/
• Reading the last few bits of a fast-ticking clock
  • Unpredictable
2. Early RNGs in computers
• Mid-square method, 1940s
  • Suggested by John von Neumann (1903-1957) during the development of the first atomic bomb
  • X_{n+1} = middle_digits(X_n × X_n)
      X = 45086273
      X × X = 2032772013030529
      new X = 77201303 (the middle 8 digits)
  • Deterministic
  • The period depends on the seed and is hard to determine
  • Obsolete
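The mid-square step above can be sketched in C as follows. This is a minimal illustration that assumes 8-digit values, as in the slide's example; the function name `midsquare_next` is made up for this sketch.

```c
#include <stdint.h>

/* One step of the mid-square method for values below 10^8:
   square x (at most 16 decimal digits) and keep the middle 8 digits. */
uint32_t midsquare_next(uint32_t x)
{
    uint64_t sq = (uint64_t)x * x;                  /* fits in 64 bits for x < 10^8 */
    return (uint32_t)((sq / 10000u) % 100000000u);  /* drop 4 low digits, keep next 8 */
}
```

Calling `midsquare_next(45086273)` reproduces the slide's new X, 77201303.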
2. Early RNGs in computers
• Congruential generator, 1951, the most commonly used
  • Suggested by Lehmer
  • X_{n+1} = (a X_n + c) mod m
      X = 45086273
      (X × 7654321 + 1) mod 10^8 = 345104806235634 mod 10^8
      new X = 06235634
  • Simple and the fastest
  • For 32-bit words, the period can reach 2^32
  • Insecure: the formula can be worked out from the output
  • Fails many tests
  • Sufficiently random for many applications
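The congruential recurrence above is a one-liner in C. The sketch below is illustrative only: the name `lcg_next` is an assumption, and it relies on a*x + c not overflowing 64 bits, which holds for the slide's numbers.

```c
#include <stdint.h>

/* X_{n+1} = (a*X_n + c) mod m.
   Assumes a*x + c < 2^64 so that no overflow occurs before the reduction. */
uint64_t lcg_next(uint64_t x, uint64_t a, uint64_t c, uint64_t m)
{
    return (a * x + c) % m;
}
```

With the slide's values, `lcg_next(45086273, 7654321, 1, 100000000)` returns 6235634, i.e. 06235634 as an 8-digit number.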
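2. Early RNGs in computers
• 3D points generated using a congruential RNG fall on planes, unlike ideal random points: the output shows a pattern and is not random enough.
[Figure: two 3D scatter plots, "Points fall on planes" and "Ideal random points".]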
2. Early RNGs in computers
• Lagged Fibonacci generator, 1958
  • Suggested by Mitchell and Moore
  • X_n = (X_{n-24} + X_{n-55}) mod 2^32, n ≥ 55
  • The period is 2^31 (2^55 - 1): a long period
  • Fails the birthday spacings test
  • Knuth, The Art of Computer Programming, vol. 2, 1998.
[Figure: table of the last 55 values, positions 0 to 55, with taps at 31 and 55 combined by +; legend: exclusive-or.]
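A lagged Fibonacci generator only needs to remember its last 55 outputs. The following C sketch keeps them in a circular buffer; the type and function names are hypothetical, and the caller is assumed to supply the 55 starting values X_0 .. X_54 (at least one of them odd).

```c
#include <stdint.h>

#define LAG_LONG  55
#define LAG_SHORT 24

typedef struct {
    uint32_t x[LAG_LONG];  /* the last 55 values, stored as a circular buffer */
    int pos;               /* index of X_{n-55}, the oldest value */
} lagfib_t;

/* seed[] supplies X_0 .. X_54. */
void lagfib_init(lagfib_t *g, const uint32_t seed[LAG_LONG])
{
    for (int i = 0; i < LAG_LONG; i++)
        g->x[i] = seed[i];
    g->pos = 0;
}

/* X_n = (X_{n-24} + X_{n-55}) mod 2^32 */
uint32_t lagfib_next(lagfib_t *g)
{
    int i55 = g->pos;                                      /* X_{n-55} */
    int i24 = (g->pos + LAG_LONG - LAG_SHORT) % LAG_LONG;  /* X_{n-24} */
    uint32_t v = g->x[i24] + g->x[i55];  /* unsigned wraparound is mod 2^32 */
    g->x[i55] = v;                       /* X_n overwrites the oldest value */
    g->pos = (g->pos + 1) % LAG_LONG;
    return v;
}
```

In practice the 55 starting values would come from another generator, such as a congruential one.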
3. Criteria of good RNGs
• Fast, especially for simulation
• Well distributed: passes all known statistical tests
• Independent
• Portable and reproducible across different computers (for verifying simulation results)
• Long period (for deterministic RNGs)
• Unpredictable and irreproducible (for cryptography)
• Secure (for cryptography)
• Large seed space (for deterministic RNGs): there must be enough seeds to choose from
4. Good RNGs
• Mersenne Twister, 1998
  • Makoto Matsumoto and Takuji Nishimura
  • Output: x_{k+624} T
  • Period: 2^19937 - 1
  • Evenly distributed in high dimensions
  • Fast, passes all tests, insecure
  • Matsumoto, M., and Nishimura, T., 1998, Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Trans. Model. Comput. Simul. 8, No. 1, 3-30. http://www.math.sci.hiroshima-u.ac.jp/%7Em-mat/MT/emt.html
[Figure: state array of 624 words; words 0, 1 and 397 are combined through the matrix A to produce word 624, which is multiplied by T to give the output.]
4. Good RNGs
• Combined generators
  • Combine the outputs of two or more RNGs, e.g., using [operator symbols lost in extraction]
  • More evenly distributed, more independent, longer period, more secure
• The universal generators
  • Combine 2 generators:
    • a lagged Fibonacci generator, and
    • X_{n+1} = (X_n - k) mod 16777213
  • Portable, pass all tests
• Marsaglia, G., 1984, A current view of random number generators, Keynote Address, Computer Science and Statistics: 16th Symposium on the Interface, Atlanta.
• G. Marsaglia, A. Zaman and W.W. Tsang, Toward a universal random number generator, Statistics & Probability Letters, 9 (1), 35-39, January 1990.
• G. Marsaglia and W.W. Tsang, The 64-bit universal RNG, Statistics & Probability Letters, Vol. 66, Issue 2, pp. 183-187, January 2004.
[Photo: George Marsaglia, born 1924.]
4. Good RNGs
• Combined generators
• The KISS generator (Keep It Simple, Stupid)
  • Suggested by George Marsaglia
  • Combines three simple generators:
    • a congruential generator
    • a 3-shift generator
    • a multiply-with-carry generator
  • Passes all tests, popular
  • Period: ~2^124
  • http://oldmill.uchicago.edu/~wilder/Code/random/Papers/Marsaglia_2003.html

    #include <stdint.h>

    /* Fixed-width types keep the arithmetic modulo 2^32 even on
       platforms where unsigned long is 64 bits wide. */
    uint32_t KISS(void)
    {
        static uint32_t x = 123456789, y = 362436, z = 521288629, c = 7654321;
        uint64_t t, a = 698769069ULL;

        x = 69069 * x + 12345;                          /* congruential */
        y ^= (y << 13); y ^= (y >> 17); y ^= (y << 5);  /* 3-shift */
        t = a * z + c; c = (uint32_t)(t >> 32);         /* multiply-with-carry */
        return x + y + (z = (uint32_t)t);
    }
4. Good RNGs
• Andrew Chi-Chih Yao 姚期智, Turing Award winner in 2000
• Contributions
  • Theory of computation
  • Complexity
  • Theory of RNGs
  If there is no practical (polynomial-time) way to predict the next bit of an RNG with more than 50% chance, the RNG will pass all practical (polynomial-time) statistical tests.
[Photos: Andrew Chi-Chih Yao, born 1946; Alan Turing, 1912-1954.]
4. Good RNGs
• Blum-Blum-Shub (BBS) generators, 1986
  • The first generator that fulfills Andrew Yao's RNG theory
  • X_{n+1} = X_n^2 mod m, where m = pq, |p| = |q| (equal bit lengths), and p and q are distinct primes of the form 4z+3
  • m has 1024 to 4096 bits
  • Output the last bit of X_{n+1}
  • Well distributed: passes all tests in theory
  • Secure but very slow
  • The period depends on the seed and can only be worked out using an algorithm.
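A toy C sketch of the Blum-Blum-Shub step, squaring modulo m and outputting the last bit, is given below. It is only meant to show the mechanics: the function name is made up, the modulus must stay below 2^32 so that the square fits in 64 bits, and a real BBS modulus of 1024 to 4096 bits would need a multi-precision library.

```c
#include <stdint.h>

/* One Blum-Blum-Shub step: x <- x^2 mod m, then output the last bit of x.
   Toy sizes only: requires m < 2^32 so that x*x fits in 64 bits. */
unsigned bbs_next_bit(uint64_t *x, uint64_t m)
{
    *x = (*x * *x) % m;
    return (unsigned)(*x & 1u);
}
```

For example, with the small primes p = 11 and q = 23 (both of the form 4z+3), m = 253 and seed 3, the states are 9, 81, 236, 36, 31, 202 and the output bits are 1, 1, 0, 0, 1, 0.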
4. Good RNGs
• The HAVEGE generator
  • HArdware Volatile Entropy Gathering and Expansion
  • André Seznec
  • Reads fast-changing states inside the computer in real time, e.g., cache, pipeline states, TLB
  • Hardware dependent
  • Unpredictable
  • Irreproducible
  • http://www.irisa.fr/caps/projects/hipsor/HAVEGE1.0.html
5. Goodness-of-fit tests
• The following shows 176 outcomes of a die. Is the die honest?
    Face value    1    2    3    4    5    6
    Observed     14   35   28   25   39   35
• A goodness-of-fit test measures the discrepancy between the sample distribution and the purported distribution.
5. Goodness-of-fit tests
• Pearson's chi-square test
  Compute a statistic, X^2, that summarizes the difference between the observed counts O_i and the expected counts E_i:
  X^2 = Σ_i (O_i - E_i)^2 / E_i
    Face value     1      2      3      4      5      6
    Expected     29.3   29.3   29.3   29.3   29.3   29.3
    Observed      14     35     28     25     39     35
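As a rough illustration of how the statistic is computed for the die data, here is a small C sketch. The function name is hypothetical, and it assumes all categories are equally likely, as for an honest die.

```c
#include <stddef.h>

/* Pearson's chi-square statistic for k equally likely categories:
   the sum over categories of (observed - expected)^2 / expected. */
double chi_square_uniform(const unsigned *observed, size_t k)
{
    double total = 0.0;
    for (size_t i = 0; i < k; i++)
        total += observed[i];

    double expected = total / (double)k;  /* same expected count in every category */
    double x2 = 0.0;
    for (size_t i = 0; i < k; i++) {
        double d = observed[i] - expected;
        x2 += d * d / expected;
    }
    return x2;
}
```

For the observed counts 14, 35, 28, 25, 39, 35 the expected count is 176/6 ≈ 29.3 and the statistic comes out at about 14.1.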
5. Goodness-of-fit tests
• The chi-square test
  • If the samples are distributed as expected, X^2 follows the chi-square distribution with 5 degrees of freedom.
  • The p-value is the chance that X^2 is smaller than the observed value, 14.1: p = Pr[X^2 ≤ 14.1] = 0.985
  • If the p-value is greater than a predetermined threshold (e.g., 0.95), the hypothesis is rejected. Here 0.985 > 0.95, so the die is judged not to be honest.
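For readers who want to check the p-value by hand: for an odd number of degrees of freedom the chi-square CDF has a closed form in terms of the error function. The worked equation below uses the standard formula for 5 degrees of freedom, not something taken from the slides, and the numbers are rounded.

```latex
\Pr[X^2 \le x] \;=\; \operatorname{erf}\!\left(\sqrt{x/2}\right)
  \;-\; \sqrt{\frac{2x}{\pi}}\; e^{-x/2}\left(1 + \frac{x}{3}\right)
  \qquad (5 \text{ degrees of freedom})

x = 14.1:\quad
\operatorname{erf}(2.655) \approx 0.9998,\qquad
\sqrt{\frac{2x}{\pi}}\; e^{-x/2}\left(1 + \frac{x}{3}\right) \approx 0.0148,\qquad
p \approx 0.9998 - 0.0148 = 0.985
```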
5. Goodness-of-fit tests
• Let X be a random variable that is uniformly distributed in [0,1). The cumulative distribution function (CDF) of X, F(x) = x, is the diagonal from (0,0) to (1,1).
• Suppose x(1), x(2), …, x(n) are samples of X.
• Let x_1 ≤ x_2 ≤ … ≤ x_n be the x(i)'s sorted in ascending order.
• The empirical distribution function F_n(x), the fraction of samples not exceeding x, is a staircase that rises by 1/n at each x_i.
5. Goodness-of-fit tests
• If the samples truly follow the uniform distribution, the staircase will be close to the diagonal most of the time.
[Figure: two staircase plots drawn against the diagonal, labelled "Good fit" and "Bad fit".]
5. Goodness-of-fit tests
• The Kolmogorov-Smirnov (KS) test
  • Most commonly used
  • Measures the maximum absolute distance between the diagonal and the staircase
  • D_n = max_x | F(x) - F_n(x) |
[Photo: Kolmogorov, 1903-1987.]
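Because the staircase only jumps at the sample points, the maximum distance can be found by checking the top and the bottom of each step. The C sketch below does this for the uniform case F(x) = x; the function names are illustrative, and it sorts the caller's array in place.

```c
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* KS statistic D_n against the uniform CDF F(x) = x on [0,1).
   Sorts the samples in place, then inspects both ends of every step. */
double ks_statistic_uniform(double *x, size_t n)
{
    qsort(x, n, sizeof x[0], cmp_double);

    double d = 0.0;
    for (size_t i = 0; i < n; i++) {
        double above = (double)(i + 1) / n - x[i];  /* top of the step minus the diagonal */
        double below = x[i] - (double)i / n;        /* diagonal minus the bottom of the step */
        if (above > d) d = above;
        if (below > d) d = below;
    }
    return d;
}
```

For the three samples 0.7, 0.1, 0.4 the largest gap is at the last step, 1 - 0.7 = 0.3.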
5. Goodness-of-fit tests
• The KS test
  • The p-value is the CDF of D_n evaluated at the observed value. The CDF of D_n is difficult to evaluate.
  • In 2003, we found the long-forgotten matrix formula derived by Durbin. It is computationally stable and efficient except in the extreme right tail. We fixed the problem using an approximation. The resulting program evaluates the CDF with 13-digit accuracy for 2 ≤ n ≤ 16000.
  • G. Marsaglia, W.W. Tsang and J. Wang, Evaluating Kolmogorov's distribution, Journal of Statistical Software, Vol. 8, Issue 18, Pages 1-4, November 2003. (available at http://www.jstatsoft.org/ )
  • W. W. Tsang and J. Wang, Evaluating the CDF of the Kolmogorov statistic for normality testing, Proceedings of COMPSTAT 2004, 16th Symposium of IASC, Prague, August 23-27, 2004, 1893-1900.
5. Goodness-of-fit tests
• The Anderson and Darling (AD) test
  • Summation of the weighted squares of the vertical differences between the staircase and the diagonal
  • More powerful than the KS test
  • The CDF of A_n is harder to evaluate than the CDF of D_n
(Slide note, a pun on the two names in Chinese: 安徒生 "Andersen", 心愛的人 "darling".)
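For reference, the usual textbook form of the Anderson-Darling statistic in the uniform case F(x) = x is given below, with x_1 ≤ … ≤ x_n the sorted samples. The slides write the statistic as A_n; the literature often writes A_n^2. This is the standard definition and its computational form, not a formula taken from the slides.

```latex
A_n^2 \;=\; n \int_0^1 \frac{\bigl(F_n(x) - x\bigr)^2}{x\,(1 - x)}\,dx
      \;=\; -\,n \;-\; \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\,
            \Bigl[\ln x_i + \ln\bigl(1 - x_{n+1-i}\bigr)\Bigr]
```

The weight 1/(x(1 - x)) grows near 0 and 1, which is why the test is more sensitive than the KS test to discrepancies in the tails.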
5. Goodness-of-fit tests
• The AD test
  • In 2004, Marsaglia published a recursive procedure for computing the CDF of A_∞ with 13-digit accuracy. He also gave an approximation formula for evaluating the CDF of A_n with 3-digit accuracy for n > 35.
  • G. Marsaglia and J. Marsaglia, Evaluating the Anderson-Darling distribution, Journal of Statistical Software, 9(2): 1-5, 2004.
6. Statistical tests for RNGs
• Statistical tests are used to reject poor RNGs
• The collision test (an example)
  • Suppose we throw n balls at random into m cells. A collision occurs when a ball falls into a cell that is already occupied. The test counts the number of collisions (c). A generator passes this test if it does not induce too many or too few collisions.
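Counting collisions is straightforward once each ball has been mapped to a cell index. The C sketch below assumes the caller has already turned the RNG output into cell indices in the range 0 to m-1; the function name is made up for illustration.

```c
#include <stdlib.h>

/* Count collisions when n balls land in the cells cells[0..n-1],
   each index being in 0..m-1. Returns -1 if memory cannot be allocated. */
long count_collisions(const unsigned *cells, size_t n, size_t m)
{
    unsigned char *occupied = calloc(m, 1);  /* one flag per cell, all empty */
    if (!occupied)
        return -1;

    long c = 0;
    for (size_t i = 0; i < n; i++) {
        if (occupied[cells[i]])
            c++;                      /* the cell already holds a ball */
        else
            occupied[cells[i]] = 1;   /* first ball in this cell */
    }
    free(occupied);
    return c;
}
```

For instance, the landing sequence 3, 1, 3, 3, 2, 1 in 4 cells gives 3 collisions.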
6. Statistical tests for RNGs
• The collision test
  • The probability that c collisions occur is
    Pr[c collisions] = [ m(m-1)…(m-n+c+1) / m^n ] × S(n, n-c),
    where S(n, n-c) is a Stirling number of the second kind.
  • A p-value is computed from c. If it is greater than a threshold (e.g., 0.99), the generator is rejected.
  • Knuth, The Art of Computer Programming, vol. 2, 1998.
  • W.W. Tsang, L.C.K. Hui, K.P. Chow, C.F. Chong, and C.W. Tso, Tuning the collision test for power, Conferences in Research and Practice in Information Technology, Vol. 26, No. 1, pp. 23-30, 2004. (Proceedings of the 27th Australasian Computer Science Conference, Dunedin, New Zealand, 2004.)
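For small n the collision probabilities can be computed without Stirling numbers, by tracking how many cells are occupied after each ball. The C sketch below uses that dynamic-programming recurrence, which is a swapped-in but equivalent technique rather than the formula quoted on the slide; the name `collision_prob` and the limit `MAX_BALLS` are assumptions made for this example.

```c
#define MAX_BALLS 64

/* Probability of exactly c collisions when n balls are thrown at random
   into m cells. p[j] is the probability that j cells are occupied so far. */
double collision_prob(unsigned n, unsigned m, unsigned c)
{
    double p[MAX_BALLS + 1] = {0.0};
    if (m == 0 || n > MAX_BALLS || c > n)
        return 0.0;

    p[0] = 1.0;
    for (unsigned k = 0; k < n; k++) {           /* throw ball k+1 */
        for (unsigned j = k + 1; j >= 1; j--) {  /* downward, so p[j-1] is still the old value */
            double hit_empty = (j - 1 < m) ? p[j - 1] * (double)(m - (j - 1)) / m : 0.0;
            double hit_used  = p[j] * (double)j / m;
            p[j] = hit_empty + hit_used;
        }
        p[0] = 0.0;                              /* at least one cell is occupied now */
    }
    return p[n - c];  /* c collisions means n - c occupied cells */
}
```

As a sanity check against the closed form: for n = 3 balls and m = 2 cells, one collision has probability (2·1/2^3) × S(3,2) = (2/8) × 3 = 0.75.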
6. Statistical tests for RNGs
• Criteria of good tests
  • Powerful
  • Efficient
  • The experiment is similar to certain important applications. E.g., the collision test is similar to insertion into a hash table.
6. Statistical tests for RNGs
• Knuth's collection
  The most well-known collection of tests for RNGs is the one compiled by Knuth. It comprises 11 tests.
  • Knuth, The Art of Computer Programming, vol. 2, 1998.
[Photo: Donald Knuth, born 1938.]
6. Statistical tests for RNGs
• The National Institute of Standards and Technology (NIST) of the USA has suggested 16 statistical tests for checking cryptographic RNGs
  • Frequency (Monobit) Test
  • Frequency Test within a Block
  • Runs Test
  • Test for the Longest Run of Ones in a Block
  • Binary Matrix Rank Test
  • Discrete Fourier Transform (Spectral) Test
  • Non-overlapping Template Matching Test
  • Overlapping Template Matching Test
6. Statistical tests for RNGs
• The NIST collection (continued)
  • Maurer's "Universal Statistical" Test
  • Lempel-Ziv Compression Test
  • Linear Complexity Test
  • Serial Test
  • Approximate Entropy Test
  • Cumulative Sums (Cusum) Test
  • Random Excursions Test
  • Random Excursions Variant Test
• Official website: Random number generation and testing <http://csrc.nist.gov/rng/>.
6. Statistical tests for RNGs
• Diehard is the most widely used testing package for examining RNGs.
  • Developed by George Marsaglia
  • Birthday Spacings
  • GCD
  • Gorilla
  • Overlapping Permutations
  • Binary Rank n×n
  • Binary Rank 6×8
  • Monkey Tests OPSO, OQSO, DNA
  • Count the 1's
  • Count the 1's specific
[Callout on the slide: "Most powerful. An RNG that passes these tests passes all other tests."]
6. Statistical tests for RNGs
• Diehard (continued)
  • Parking Lot
  • Minimum Distance
  • Random Spheres
  • The Squeeze
  • Overlapping Sums
  • Runs Up and Down
  • The Craps
• Diehard Battery of Tests of Randomness v0.2 beta, http://www.csis.hku.hk/~diehard/
• G. Marsaglia and W.W. Tsang, Some difficult-to-pass tests of randomness, Journal of Statistical Software, Vol. 7, Issue 3, Pages 1-8, January 2002. (available at http://www.jstatsoft.org/ )
7. Conversions of uniform random integers to variates of other distributions
• An RNG outputs random integers that are uniformly distributed, e.g., in [0, 2^32 - 1]
• In applications, we often need random numbers of other distributions, e.g.,
  • Uniform in [0, 1)
  • Normal
  • Exponential
  • Gamma
  • Poisson
  • Binomial
• Fast methods are needed for the conversions
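7. Conversions
• Given I, a random integer uniformly distributed in [0, 2^32 - 1], generate U, a uniform random number in [0,1):
  U = I / 2^32
• Generate points that are uniformly distributed in a rectangle of width a and height c, using two independent uniforms U_1 and U_2:
  X = a U_1
  Y = c U_2
[Figure: a rectangle with corners (0,0) and (a,c) containing a point (X,Y).]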
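In C, the conversion U = I / 2^32 can be written as a multiplication by the constant 2^-32, as in the sketch below. The function name is hypothetical.

```c
#include <stdint.h>

/* Map a 32-bit random integer I to U = I / 2^32, which lies in [0, 1). */
double u32_to_unit(uint32_t i)
{
    return i * (1.0 / 4294967296.0);  /* 4294967296 = 2^32 */
}
```

A double has 53 bits of precision, so every 32-bit integer maps to a distinct value and the largest input, 2^32 - 1, still gives a result strictly below 1.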
7. Conversions
• If we generate points that are uniformly distributed in the area under a density function, the x-coordinates of the points are distributed according to that density.
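7. Conversions
• The acceptance-rejection method
  To generate X with the density f(x), 0 ≤ x ≤ a, where c is at least the maximum of f(x) and U_1, U_2 are independent uniforms in [0,1):
  1. X = a * U_1
  2. Y = c * U_2
  3. If (Y < f(X)) return X
  4. Go to Step 1
[Figure: the density f(x) inside the rectangle of width a and height c; points under the curve are accepted (Acc), points above it are rejected (Rej).]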
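The acceptance-rejection loop can be sketched in C as follows. Everything specific here is an assumption chosen for illustration: the example density f(x) = 2x on [0,1] (so a = 1 and c = 2), the function names, and the stand-in uniform source, which reuses the congruential step x = 69069x + 12345 from the KISS code.

```c
#include <stdint.h>

static uint32_t rng_state = 123456789u;

/* Stand-in uniform source: a congruential step scaled to [0, 1). */
static double uniform01(void)
{
    rng_state = 69069u * rng_state + 12345u;
    return rng_state * (1.0 / 4294967296.0);
}

/* Example density (an assumption for this sketch): f(x) = 2x on [0, 1]. */
static double f(double x)
{
    return 2.0 * x;
}

/* Draw X with density f on [0, a]; c must be at least the maximum of f. */
double sample_accept_reject(double a, double c)
{
    for (;;) {
        double x = a * uniform01();  /* Step 1 */
        double y = c * uniform01();  /* Step 2 */
        if (y < f(x))                /* Step 3: the point lies under the curve */
            return x;
        /* Step 4: rejected, try again */
    }
}
```

With this density half of the points are rejected on average, since the area under f is 1 and the rectangle has area a × c = 2. The closer the rectangle hugs the density, the fewer rejections there are.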
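7. Conversions
• The Monty Python method (patchwork method)
  Put a unit rectangle of width b and height c on top of a density (blue area). Flip the cap, the part of the density that sticks out above the rectangle, onto the empty area in the top-right corner.
  To generate X with the density f(x), using independent uniforms U_1 and U_2:
  1. X = b U_1
  2. If (X < a) return X
  3. Y = c U_2
  4. If (Y < f(X)) return X
  5. If (Y > f'(X)) return b - X
  6. Otherwise, sample from the tail
  f'(x) is f(x) after flipping over: f'(x) = c - [f(b - x) - c]
[Figure: the density f(x), the rectangle of width b and height c, the point a where the cap ends, and the flipped cap f'(x).]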
7. Conversions
• The Monty Python method
  The tail can be sampled using the acceptance-rejection method. Instead of using a rectangle, use an easy-to-sample density function g(x) that dominates, and is close to, the tail of f(x).
[Figure: the tail of f(x) under the dominating curve g(x).]
7. Conversions
• The Monty Python method can be used to generate variates of various distributions, including normal, exponential, gamma, Student's t, etc.
• G. Marsaglia and W.W. Tsang, The Monty Python method for generating random variables, ACM Transactions on Mathematical Software, Vol. 24, No. 3, Pages 341-350, September 1998.
• G. Marsaglia and W.W. Tsang, The Monty Python method for generating gamma variables, Journal of Statistical Software, Vol. 3, Issue 3, Pages 1-8, January 1999. (available at http://www.jstatsoft.org/ )
• G. Marsaglia and W.W. Tsang, A simple method for generating gamma variables, ACM Transactions on Mathematical Software, Vol. 26, No. 3, Pages 363-372, September 2000.
7. Conversions
• The Ziggurat method is a sophisticated version of the Monty Python method. Instead of using a unit rectangle, it uses a staircase curve that is close to the density being sampled.
• The method leads to the fastest way to sample from the normal and exponential distributions. It is used in Matlab and other software: www.mathworks.com/company/newsletters/news_notes/clevescorner/spring01_cleve.html
7. Conversions
• The Ziggurat method
• References
  • G. Marsaglia and W.W. Tsang, The ziggurat method for generating random variables, Journal of Statistical Software, Vol. 5, Issue 8, Pages 1-7, October 2000. (available at http://www.jstatsoft.org/ )
  • G. Marsaglia and W.W. Tsang, A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions, SIAM J. Sci. Stat. Comput., Vol. 5, No. 2, June 1984.
7. Conversions
• The alias method for generating discrete variates
  • Suggested by A. J. Walker in the 1970s
  • First convert the histogram into a rectangle. Then stack up the bars into a unit rod.
[Figure: histogram over the values 0 to 4 (vertical axis 0 to 1) and the levelled rectangle, with a label 0.2.]
7. Conversions
• The alias method
  • Sample from the rod using a single U. First find out which bar U lands on. Then determine whether it lands on the upper or the lower segment of that bar.
      // First set up V[ ] and K[ ]
      u = U;
      L = 1 + floor(5*u);
      if (u > V[L]) then return(K[L]);
      else return(L);
[Figure: the unit rod with two sample landings labelled "Rtn 4" and "Rtn 0".]
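A self-contained C sketch of the alias method, including the table set-up that the slide leaves out, might look like this. It is one common way to build the tables (robbing tall bars to top up short ones), not necessarily the author's. The names are hypothetical, outcomes are numbered from 0, and `cut[]` and `alias[]` play the roles of the slide's V[ ] and K[ ], except that `cut[i]` is measured within segment i rather than along the whole rod.

```c
#include <stddef.h>

#define ALIAS_MAX 64

typedef struct {
    size_t n;
    double cut[ALIAS_MAX];    /* share of segment i kept by outcome i */
    size_t alias[ALIAS_MAX];  /* outcome that owns the rest of segment i */
} alias_table;

/* Build the tables from probabilities p[0..n-1] that sum to 1 (n <= ALIAS_MAX). */
void alias_init(alias_table *t, const double *p, size_t n)
{
    size_t lo[ALIAS_MAX], hi[ALIAS_MAX], nlo = 0, nhi = 0;

    t->n = n;
    for (size_t i = 0; i < n; i++) {
        t->cut[i] = p[i] * n;  /* bar height relative to the average height */
        t->alias[i] = i;
        if (t->cut[i] < 1.0) lo[nlo++] = i; else hi[nhi++] = i;
    }
    while (nlo > 0 && nhi > 0) {
        size_t s = lo[--nlo], l = hi[--nhi];
        t->alias[s] = l;                /* top up short bar s from tall bar l */
        t->cut[l] -= 1.0 - t->cut[s];   /* tall bar l gives up that amount */
        if (t->cut[l] < 1.0) lo[nlo++] = l; else hi[nhi++] = l;
    }
    /* Bars left over differ from 1 only by rounding error: keep them whole. */
    while (nlo > 0) t->cut[lo[--nlo]] = 1.0;
    while (nhi > 0) t->cut[hi[--nhi]] = 1.0;
}

/* Draw one outcome from a single uniform u in [0, 1). */
size_t alias_sample(const alias_table *t, double u)
{
    double scaled = u * t->n;
    size_t i = (size_t)scaled;  /* which segment of the rod u lands on */
    return (scaled - i < t->cut[i]) ? i : t->alias[i];  /* lower or upper part */
}
```

Sampling costs one multiplication, one comparison and at most two table reads, regardless of how many outcomes there are.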
7. Conversions
• The straightforward table look-up method for generating discrete variates
  Let the distribution of Y be
    Pr[Y=1] = 0.345
    Pr[Y=2] = 0.103
    Pr[Y=3] = 0.276
    Pr[Y=4] = 0.050
    Pr[Y=5] = 0.226
  Table: V[0..999] holds 345 ones, 103 twos, 276 threes, 50 fours and 226 fives, in that order.
  Generation: Y = V[floor(1000 × U)]
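The table look-up method translates almost directly into C. In the sketch below the function and variable names are made up; the counts are the five probabilities above multiplied by 1000.

```c
#define TABLE_SIZE 1000

static int table_v[TABLE_SIZE];

/* Fill the table: 345 ones, 103 twos, 276 threes, 50 fours, 226 fives. */
void table_init(void)
{
    static const int counts[5] = {345, 103, 276, 50, 226};
    int pos = 0;
    for (int y = 1; y <= 5; y++)
        for (int j = 0; j < counts[y - 1]; j++)
            table_v[pos++] = y;
}

/* Y = V[floor(1000 * U)] for a uniform u in [0, 1). */
int table_sample(double u)
{
    return table_v[(int)(TABLE_SIZE * u)];
}
```

The method trades memory for speed: the table needs one entry per unit of probability resolution, here 1000 entries for three decimal digits.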