Random Numbers in HEP

Random Numbers in HEP Tests of Random Engines in CLHEP Heinrich’s note about RandEngine

Essential Concepts • Instances • Use of a random in one module need not affect the sequence of randoms seen by another • Engines vs Distributions • An Engine is a source of uniform (pseudo-) randomness • A Distribution uses an engine to fire off variates • Flat() • Gaussian() • Validity of Randomness • How {unbiased, ergodic, unpredictable} is the engine?

How Engines Can be Bad • Wrong single-number statistics • Wrong mean (c.f. the buggy Linux RandEngine) • Wrong sigma (or higher moment) • Very easy to detect such a major deficiency • Non-uniform distribution with right moments • Possible! (but mathematically pathological) • And certainly possible to an excellent approximation * • Sequence-order dependent flaws • Can be harder to detect • Can still lead to incorrect physics!

An engine with good moments but flawed distribution • Every engine on a real computer! • (But if the granularity is small enough this “flaw” becomes moot)

Typical Sequence Flaws • Tom Nash, doing a thesis in the 70’s, found that pairs of numbers from the “random” engine in use lied in stripes. • He was far from the first to notice this! • But it was affecting the physics

Testing an Engine • The general test: • Start with a mathematical observation: “If Sn is an ergodic sequence, uniformly distributed on (0,1), then for all subsequences T, some property P should hold.” • Property P should hold, to accuracy e(n) and with probability p(n), where e(n)0 and p(n) 1 for large n. • Apply the “Does P hold” test to a bunch of sequences generated by the engine. • An Engine that passes every possible test based on every possible property to perfect precision would be a perfect random engine • But mere mortals have to pick some set of tests • And have to settle for some finite precision on each test

The DIEHARD Suite • Invented (more like collected) by Marsaglia. • 13 tests (for 13 properties of the sequence). • Each test uses the sequence to generate a group of numbers, with mathematically known distribution for random sequences. • The distribution obtained is compared via a Kolmogorov-Smyrnoff test with the proper distribution. • If any P-value is extremely close to 1, this is proof of non-randomness of the sequence.

Interpreting the Results of a test suite • The longer the sequences used, the more discrimination power. • If no test fails, that is not proof that the sequence is random: • It is proof that either the tester was not clever enough to find and expose non-random properties, or that the sequence used was not long enough. • But for practical uses, some suite of tests at some level of N is “good enough.”

The DIEHARD Tests • Birthday Spacings • The likelihood of j birthdays being shared among m people in an n-day year is Poisson distributed with mean m3/(4n). • Not order-sensitive. • Overlapping 5-permutation Test • Each 5-number subsequence forms one of 120 possible ordering numbers. The count for each should be equal and uncorrelated. • Binary Rank Test (32x32 and 6x8) • The distribution of ranks of 32x32 bit matrices is known theoretically. This detects many linear realations involving up to 1000 bits of the sequence. • Bitstream Test • For a random stream of 221 bits, the number of “missing” 20-bit “words” should be normally distributed with mean 142,000 and sigma 428.

and… • Overlapping Pairs and Quadruples, Sparse Occupancy • Missing 2- and 4-”letter” words in a long sequence of “letters” from a 1024-letter alphabet • Not very order sensitive past N=20 or 40 • DNA Test • Missing “10-genome” sequences in the 4-letter genome alphabet • Count-the-1-’s test on a stream of or specific bytes • Uses the number of 1’s in 8-bit sequences, to form pseudo-letters, which are then grouped into 5-letter words. • Runs Test • The theoretical distributions and covariances fro up and down runs of N numbers are known.

and… • Craps test • A long sequence of games of craps is simulated. The distribution of wins and losses, and of the number of dice throws for each game, is known. • Parking Lot Test • Make a number of attempts to park a circle in a random “lot” of a square of side 100. Re-try if there are collisions. After 120,000 tries, the number of successes shuld match some (experimentally determined) distribution. • Overlapping Sums Test • The sums of 100 variables should be virtually normally distributed. • Sensitive to medium-scale imperfections. • Squeeze Test • Number of iterations needed to reduce 231 down to 1, by doing k = ceiling(k*random()).

The Minimum Distance Test • The last two tests are the MD test in 2 and 3 dimensions: • Use N pairs (triplets) of randoms as coordinates of N points in the unit volume. • The closest pair of points has some distance d. • d has a “known” theoretical distribution • As applied by Marsaglia, most generators semi-fail the 2-D Minimum distance test!

The Minimum Distance Test • The last two tests are the MD test in 2 and 3 dimensions: • Use N pairs (triplets) of randoms as coordinates of N points in the unit volume. • The closest pair of points has some distance d. • d has a “known” theoretical distribution • As applied by Marsaglia, most generators semi-fail the 2-D Minimum distance test! • P-values of .99 are typical. • They did not semi-fail when the FORTRAN versions of the tests were applied. • The f2c version showed this behavior. • ?

The Minimum Distance Test • Marsaglia speculated this was due to some funny business (error) with converting array code using f2c. • Turns out it was a function of being able to do more iterations on later processors. • The theoretical distribution had been calculated only to lowest order. By the time ZOOM did these tests, we were exposing the fact that the actual distribution differed. • See my Technical memo TM2170, where I do the next few orders. This resolves the discrepancy.

The CLHEP Engines • JamesRandom • Fred James’ adaptation of Marsaglia’s add-with-carry generator. Large state. • Ranecu • Multiplicative Congruential generator using formula constants of L'Ecuyer. Large state. • Ranecu • Multiplicative Congruential generator using formula constants of L'Ecuyer. Loarge state. • Ranlux and Ranlux64 • Luescher’s “folding” generator. For luxury levels of 2+ it is excellent; for luxury 4 is “provably” ergodic. Moderate state. • TripleRand • Used in the Canopy Lattice gauge computations, this combines a Hurd primitive polynomial generator with two other unrelated generators to form a family of trustworthy yet distinct sequences. Moderate state.

More CLHEP Engines • Hurd288 and Hurd 160 • Based on primitive polynomials over Z(2). Provably ergodic (but I don’t know how well to trust the proof). Used as basis for TripleRand and DualRand. Small state. • RanshiEngine • Modeled on spinning balls. Very large state, but fast and excellent. • MTwistEngine • Based on number-theory considerations involving large Mersenne primes. Very Large state. Many randomness properties proven, though the math is difficult. Fast and excellent (can be made even faster…) • All the above engines have excellent randomness properties.

And some Low-quality CLHEP Engines • DRand48Engine • Based on a common Unix system random number generator. Very small state and fast, but fails several randomness tests. • RandEngine • Directly calls the system function rand(). Not portable; state not savable. Its claim to fame is that the usual implementation of rand() is dissected in Knuth and shown to be a terrible generator!! • RandEngine, when coded properly, fails half of the DIEHARD tests • NonRandomEngine • Lets people test program behavior on a set forced sequence of randoms. Requires user to specify a sequence. Useful for testing. If you use this to try to generate true random numbers, you get what you deserve. • Why was RandEngine left in CLHEP? • According to a web page circa 1997, because Geant4 insisted on it. • Fortunately, they then proceeded not to use it.

How the CLHEP Engines do on DIEHARD • DRand48Engine failed the three “alphabet sequences” tests • indicating bad correllations at the range of 5-number subsequences • Ranlux at luxury level 0 (which was never recommended) failed the Birthday and Craps tests. • At luxury level above 1 it is fine

How the CLHEP Engines do on DIEHARD • Several engines semi-failed the Minimum Distance test • The more numbers we could afford to generate, the more likely the failure • We now understand the flaw • A summer student in 1998 coded the test in C++ from scratch, with my correction terms in the distribution, and all the “good” engines passed. • All engines other than RandEngine and DRand48 pass the entire DIEHARD Suite.

RandEngine • On systems until Linux, rand() used a standardized 15-bit linear congruence • Lame, see Knuth • But kind of OK. • At some point, an attempt was made in CLHEP to combine 2 calls to rand() by a shift and OR – to get 3 bits of randomness. • Still just as lame

RandEngine • This coding was blown, in that when Linux came along with a 32-bit result for rand(), the OR caused the average value returned to be 5/8!!!! • Results from programs run on Linux which used RandEngine must be discarded. • Fortunately, it appears nobody trusted RandEngine anyway • CDF uses Ranecu • Geant4 does not have RandENgine anywhere in there source code.

Recommendations • Ranecu is perfectly fine, no need to change. • Ranlux on level 2 is about as fast but “provably” reliable. • MTwist (if you have only a few instances so can afford a very large state), TripleRand, and JamesEngine are about twice as fast and very probably just as good or better. • Ranshi is faster still but has a huge state, and I’m not 100% convinced it can’t have flaws.

Random Numbers in HEP

Random Numbers in HEP

Presentation Transcript

Pseudo Random and Random Numbers

Generating Random Numbers

Random numbers

Generating Random Numbers

Random Numbers

Random Numbers

Random Numbers

Random Numbers

Random Numbers

Pseudo Random Numbers

7 Random Numbers

Generating Random Numbers in Hardware

Random Numbers

Random Numbers

Random Numbers

Random event or random numbers?

Random Numbers in GAUDI

Random numbers

Using random numbers

Random numbers in Python

Generating Random Numbers

Random Numbers