330 likes | 536 Views
CSE474: Simulation and Modeling Generating Random Numbers. Mushfiqur Rouf nasarouf@gmail.com. Initial trials. Throwing dice Dealing out cards Drawing numbered balls Counting gamma rays Using digits in an expansion of p Picking numbers randomly from a phonebook
E N D
CSE474: Simulation and ModelingGenerating Random Numbers Mushfiqur Rouf nasarouf@gmail.com CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Initial trials • Throwing dice • Dealing out cards • Drawing numbered balls • Counting gamma rays • Using digits in an expansion of p • Picking numbers randomly from a phonebook • Using randomly pulsating vacuum tubes CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Importance of U(0,1) • Easy generation • Generation of variates from all other distributions • Realization of random processes CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Arithmetic Generators • Generated series is not random • Given the seed, the whole series is known. • Call it ‘pseudorandom’ • Benefits • Has low memory requirement • Fast CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Good generator • Produce numbers that appear to be • Distributed uniformly • Uncorrelated • Fast and memory efficient • Reproduce • Debugging • Precise comparison of different model simulation results • Produce separate streams of numbers CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Midsquare Method • Start with a four digit number • Square and take the middle four digits • Apparently scrambles well • But Degenerates to zero and stays there • Doing something strange may not be useful CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Linear Congruantial Generators • Most common, most understood • Recursive formula Zi = (aZi-1 + c) (mod m) Ui = Zi/m Z0 is the seed modulus increment multiplier CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
All Zis are completely determined Objection 1: Pseudorandomness CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Objection 2: Limited Values • Can take only the rational values • 0, 1/m, 2/m, … , (m-1)/m • P(Ui<1/m) = 0 • but should be > 0 • Solution: Use large m. • Might take only a fraction of these values. • Depends on a-c-m combination. CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Objection 3: Looping behaviour • If Z100 = Z50 then Z101 = Z51 • So a cycle starts. • Cycle length is called ‘period’ • There are only m possible values. • Max possible period is m. • If an LCG with “full period” has period = m • Period depends on a-c-m CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
LCG Full period • Iff all of the following hold: • m, c are relatively prime • If prime p divides m, then p divides a-1 • If 4 divides m, then 4 divides a-1 • Multiple Streams Support • If streams of size 100000 needed, • Use seeds Z0, Z100000, Z200000, … CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Mixed LCG • c > 0 • To avoid explicit division, use m = 2b and utilize integer overflow • Optimized ‘a’ a = 2l + 1 • Then the multiplication is just shift and add: Zi = 2lxZi–1 + Zi–1 • But has poor statistical properties CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Multiplicative LCG • c = 0 • Cannot have full period since m, c cannot be relatively prime. • Period m – 1 possible • To avoid explicit division: m = 2b • Period is at most 2b–2 • Then we don’t know how uniform the numbers are • NOT a good idea CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Multiplicative LCG • “Prime modulus multiplicative LCG” PMMLCG • Use largest prime < 2b • Example: 231–1 • a is ‘primitive element modulo m’ • am–1 – 1is divisible by m • No smaller power of a has this property • Period = m – 1 CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Generalization of LCG Zi = g(Zi-1,Zi-2, …) (mod m) • Quadratic Congruential Generator g = a’Zi-12 +aZi-12 + c • a’ = a = 1, c = 0, m is a mercenary • Multiple Recursive Generator g(Zi-1,Zi-2, …) = a1Zi-1 + a2Zi-2 + … + aqZi-q • Huge periods ~ mq – 1 • Fibonacci Generator Zi = (Zi-1 + Zi-2) (mod m) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Composite Generators • Combine two or more separate generators • Longer period, better statistical behavior • Shuffling • Use a second LCG to shuffle output • Shuffling bad LCG by another bad LCG may produce better results. • Little to gain by shuffling a good LCG CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Empirical Tests • Uniformity Test fj= Number of Ui s in interval j • For large n it will have approximately a chi-sq distribution with k–1 degrees of freedom. • Reject null hypothesis “Uis are U(0,1)” if CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Empirical Tests • Serial Test • Generalization of chi-square test to higher dimensions • Non overlapping d-tuples U1 = (U1, U2, …, Ud), U2 = (Ud+1, Ud+2, …, U2d), …should be distributed uniformly in d-dimensional hipercube [0,1]d. • χ2 will have an approximate chi-square distribution with kd – 1 degrees of freedom. CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Empirical Tests • Runs-up test • Run-up: a “maximal” monotonically increasing subsequence • ri = number of runs up of length i <= 5 • r6 = number of runs up of length >= 6 • For large n, R will have approximate chi-square distribution with 6 df. CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Tests • Empirical Tests • Only local test; for only that segment of the cycle. • Examines the actual random numbers to be used in simulation • Theoretical Tests • Global test; for the whole cycle. • Cannot examine a particular segment CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Crystalline structure • Random numbers mainly fall in planes • Overlapping d-tuples(U1, U2, …, Ud), (U2, U3, …, Ud+1), … fall in a small number of d – 1 dimensional hyper planes passing through the d-dimensional hypercube • If two such planes are far apart the generator will perform poorly CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform • How to generate X ~ F? • Generate U ~ U(0, 1) • Return X = F–1(U) • P(X<=x) = P(F–1(U)<=x) = P(U<=F(x)) = F(x) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform • Intuitive explanation F(x) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform • Discrete Case: • Generate U ~ U(0, 1) • Determine the smallest integer I such that U<=F(xI). Return X = xI • P(X = xi) = P[F(xi-1) < U <= F(xi)]=F(xi) – F(xi–1) = p(xi) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform • Can handle mixed distributions with • Discrete, • Continuous and • Flat components • X = min{x: F(x) >= U} CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform • Disadvantages • F–1 might not be found in a closed form • Use numerical methods • May not be the fastest way • Advantages • Facilitates variance reduction techniques • Easily generates truncated distributions • Useful for generating i-th order statistics (using beta distribution) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Composition • Target distribution is a ‘convex’ combination of other distributions • Generate a positive random integer J such that P(J = j) = pj • Return X with distribution Fj • Geometric interpretation • Select an area • Select from that distribution CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Convolution • X = Y1 + Y2 + … + Ym • Don’t mix up with ‘Composition’ • Generate Y1, Y2, …, Ym IID • Return X = Y1 + Y2 + … + Ym • P(X<=x) = P(Y1 + Y2 + … + Ym <= x) = F(x) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
t(x) f(x) Acceptance – Rejection • A less direct approach • Requires a function t that ‘majorizes’ f, ie, t(x) >= f(x) for all x • Define r(x)=t(x)/c, a density function. CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
t(x) f(x) Acceptance – Rejection • Generate Y having density r • Generate U~U(0, 1), independent of Y • If U <= f(Y)/t(Y), return X = YElse goto step 1 Intuitively, how does it work? CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
t(x) f(x) Acceptance – Rejection • Choice of t • Calculations must be very efficient • Probability of acceptance in step 3 is 1/c. So We need t with small c. • Conflicting • Tighter functions are more complicated. CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Acceptance – Rejection t(x) t(x) f(x) f(x) CSE 474 Simulation Modeling | MUSHFIQUR ROUF nasarouf@gmail.com http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474