450 likes | 978 Views
Mersenne Twister Random Number Generator & Diehard Randomness Test. Samuel Antão Technical University of Lisbon/INESC-ID, Lisbon, Portugal 14th December, 2009. Outline. Mersenne Twister Overview Mersenne Twister Details Bitstream Test (Diehard) Other Diehard results Conclusions.
E N D
Mersenne TwisterRandom Number Generator&Diehard Randomness Test Samuel AntãoTechnical University of Lisbon/INESC-ID, Lisbon, Portugal14th December, 2009 Segurança em Redes Móveis 2009
Outline • Mersenne Twister Overview • Mersenne Twister Details • Bitstream Test (Diehard) • Other Diehard results • Conclusions Segurança em Redes Móveis 2009
Mersenne Twister Overview • Proposed in 1998 by M. Matsumoto and T. Nishimura • Ultra long period of 219937-1 • Based on GF(2) fast arithmetic • Standard random number generation in several applications • Maple • Matlab • Python scripting language • Faced some criticism by computer science field (George Marsaglia): • Not very elegant • Marsaglia proposed some RNG not so complex and with equivalent randomness properties M. Matsumoto and Takuji Nishimura. “Mersenne Twister – A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator ”.ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp 3-30 Segurança em Redes Móveis 2009
Mersenne Twister Overview • History • MT based on General Feedback Shift Register (GFSR) approach (Lewis and Payne, 1973) • Example for the polynomial x5+x2+1 and delay 6: Segurança em Redes Móveis 2009
Mersenne Twister Overview • History (cont.) • The GFSR quality depends on judicious chosen seeds • To overcome this problem in 1992 Matsumoto and Kurita proposed the Twisted GFSR (TGFSR) • In the TGFSR one of the feedback operands is multiplied by a matrix • In 1994, Tezuka, discovered significant correlation in the 2 least significant bits of a TGFSR sequence • In 1998, Matsumoto and Nishimura, overcame that problem by setting one of the feedback operands as a concatenation of high/low part of other two • The 1998 proposal sets the generator period to be of the form 2nw-r-1, thus a Mersenne prime: Mersenne Twister Segurança em Redes Móveis 2009
Mersenne Twister Details • Lack of definition of “good randomness” • The authors selected two randomness metrics that, together, are believed to be respected only by good generators • K-distribution • Number of terms in the recurrence characteristic polynomial Segurança em Redes Móveis 2009
Mersenne Twister Details • K-distribution • Considering a a pseudorandom sequence xiof w-bit integers of period P it is possible to obtain a kv-bit vector as: (truncv( xi ), truncv( xi+1 ), … , truncv( xi+k-1 )) (0 ≤ i < P) • truncv( y) is the most significant v bits of y • A sequence is said to be k-distributed to v-bit accuracy if all the 2kv possible combinations occur the same number of times in one period (except for zero) • k(v) is the maximum k such that the sequence is k-distributed to v-bit accuracy • Pseudorandom generator period bound: 2k(v)v-1 ≤ P • We are interested in as large as possible k correspondent to as large as possible v Segurança em Redes Móveis 2009
Mersenne Twister Details • Recurrence characteristic polynomial number of terms • Each step of the pseudorandom number generation (PRNG) performs a linear recurrence. This recurrence has a specific characteristic polynomial • Usually the characteristic polynomial is a trinomial (GFSR) • A PRNG based on polynomials with few terms shows poor randomness • A PRNG based on polynomials that can be generated from trinomials also shows poor randomness • We are interested in implementing a linear recurrence with a characteristic polynomial with a large amount of terms Segurança em Redes Móveis 2009
Mersenne Twister Details • MT recurrence step • General formula xk+n = xk+m xor (xuk|xlk+1)A, (k=0,1,2,…) • Matrix A chosen to be easy to operate Segurança em Redes Móveis 2009
Mersenne Twister Details • MT recurrence step • Data set (state) transformation • r bits removed from the state to solve the problems regarding low k-distribution for small v-bit accuracy (v=2) reported in previous TGFSR versions • Period given by 2p-1=2nw-r-1 Segurança em Redes Móveis 2009
Mersenne Twister Details • Tampering matrix T • After observing a sufficient large amount of random data, it becomes possible to predict nest random values • The recurrence does not suit cryptographic applications • Solution: after each step the state is multiplied by an invertible matrix T • It does not completely overcome the problem (hash function required for cryptographic applications) • Matrix T operations (z=xT): • y = x xor (x >> u) • y = y xor ((y<<s) and b) • y = y xor ((y<<t) and c) • z = y xor (y >> l) Segurança em Redes Móveis 2009
Mersenne Twister Details • Matrix T does not change the period of the recurrence • Parameters for the recurrence and matrix T obtained after extensive search • The theoretical period bound 2k(v)v – 1 ≤ 2nw-r – 1 cannot be reached • To attain the maximum period 2p – 1 = 2nw-r – 1 the recurrence characteristic polynomial c(t) must be primitive: • A special method to test for primitivity had to be used to compute the generator parameters in useful time Segurança em Redes Móveis 2009
Mersenne Twister Details • MT parameters search K-distribution bounds MT results Previous TGFSR Segurança em Redes Móveis 2009
Mersenne Twister Details • MT summary • Period 219937-1 • 623-distributed to 32-bit accuracy • Characteristic polynomial with several terms (~100) • Constrained data set of 624 word data • Fast generation (from 1.5 to 2 times faster than AES encryption) Segurança em Redes Móveis 2009
Bitstream Test • The random sequence were obtained from a C library available online by Geoff Kuenning: • http://www.cs.hmc.edu/~geoff/mtwist.html • 132 million random 32-bit integers per second (3.6 GHz Pentium Xeon) • 3000 files of 11,468,800 bytes were obtained, to support 3000 independent Diehard tests • The MT random sequences passed all the Diehard tests (Win32 version) Segurança em Redes Móveis 2009
Bitstream Test • Bitstream test details • The random sequence is used to obtain 221overlapped 20-bit words • Considering a repository that contains all the possible words of 20-bit (220), some of these words may not appear in the 221 obtained from the random sequence • For a truly random sequence, the number of missing words obtained from that sequence should approximately follow a normal distribution of µ=141,909 and σ=428 • Each Diehard test run performs 20 bitstream tests. The total of bitstream tests performed are 20x3000=60000. Segurança em Redes Móveis 2009
Bitstream Test • Bitstream test ~ N(141909, 428) (class width = 30) Segurança em Redes Móveis 2009
Other Diehard results • Birthday Spacings ~ χ2(6) (class width = 0.1) Segurança em Redes Móveis 2009
Other Diehard results • Overlapping 5-Permutation ~ χ2(99) (class width = 1) Segurança em Redes Móveis 2009
Other Diehard results • Binary Rank for 31x31 Matrices ~ χ2(3) (class width = 0.1) Segurança em Redes Móveis 2009
Other Diehard results • Binary Rank for 32x32 Matrices ~ χ2(6) (class width = 0.1) Segurança em Redes Móveis 2009
Other Diehard results • Binary Rank for 6x8 Matrices ~ Exp(2) (class width = 0.01) Segurança em Redes Móveis 2009
Other Diehard results • Overlapping-Pairs-Sparse-Occupancy ~ N(141909,290) (class width = 30) Segurança em Redes Móveis 2009
Other Diehard results • Overlapping-Quadruples-Sparse-Occupancy ~ N(141909,295) (class width = 30) Segurança em Redes Móveis 2009
Other Diehard results • DNA ~ N(141909,339) (class width = 30) Segurança em Redes Móveis 2009
Other Diehard results • Count the 1’s ~ χ2(2500) (class width = 7) Segurança em Redes Móveis 2009
Other Diehard results • Count the 1’s for specific bytes ~ χ2(2500) (class width = 5) Segurança em Redes Móveis 2009
Other Diehard results • Parking Lot ~ N(3523,21.9) (class width = 1) Segurança em Redes Móveis 2009
Other Diehard results • Minimum distance ~ Exp(0.995) (class width = 0.01) Segurança em Redes Móveis 2009
Other Diehard results • 3D Spheres ~ Exp(30) (class width = 1) Segurança em Redes Móveis 2009
Other Diehard results • Squeeze ~ χ2(42) (class width = 0.7) Segurança em Redes Móveis 2009
Other Diehard results • Craps - Wins ~ N(200000p,sqrt(200000p(1-p))) (class width = 15), p=244/495 Segurança em Redes Móveis 2009
Other Diehard results • Craps - Throws ~ χ2(20) (class width = 0.7) Segurança em Redes Móveis 2009
Other Diehard results • The remaining tests only provide the final p-values, thus it was no possible to plot the results • Runs test • Overlapping sums test • Although, both provide uniform p-values, confirming the good randomness of the generator Segurança em Redes Móveis 2009
Conclusions • MT is a widely used pseudorandom number generator • MT has extra large period • MT is computationally fast • MT passes the Diehard statistical tests (and others as well) • Statistical tests are a good platform to test for randomness • The bitstream test, is one of statistical tests to be employed Segurança em Redes Móveis 2009