Computational Methods in Physics PHYS 3437 Dr Rob Thacker Dept of Astronomy & Physics (MM-301C) thacker@ap.smu.ca
Today’s Lecture • Introduction to Monte Carlo methods • Background • Integration techniques
Introduction • “Monte Carlo” refers to the use of random numbers to model random events arising in a mathematical or physical problem • Typically, MC methods require many millions of random numbers • Of course, computers cannot actually generate truly random numbers • However, we can make the period of repetition absolutely enormous • Such pseudo-random number generators rely on arithmetic that scrambles and truncates numbers to a fixed number of significant digits • See Numerical Recipes, pp. 266-280 (2nd edition, FORTRAN)
“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” John von Neumann
History of numerical Monte Carlo methods • Another contribution to numerical methods arising from research at Los Alamos • Late 1940s: scientists wanted to follow the paths of neutrons through various sub-atomic collision events • Ulam & von Neumann suggested using random sampling to estimate this process • 100 events could be calculated in 5 hours on ENIAC • The method was given the name “Monte Carlo” by Nicholas Metropolis • An explosion of inappropriate use in the 1950s gave the technique a bad name • Subsequent research illuminated when the method was appropriate
Terminology • Random deviate – a number chosen uniformly at random from [0,1] • Normal deviate – a number chosen randomly from (-∞,∞), weighted by a Gaussian
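As a brief illustration (not part of the lecture notes), both kinds of deviate can be produced with NumPy's generator; the variable names here are just descriptive choices:

```python
import numpy as np

rng = np.random.default_rng()
random_deviates = rng.uniform(0.0, 1.0, size=5)   # uniform ("random") deviates on [0, 1]
normal_deviates = rng.normal(0.0, 1.0, size=5)    # Gaussian-weighted ("normal") deviates
print(random_deviates, normal_deviates)
```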
Background to MC integration • Suppose we have a definite integral I = ∫[a,b] f(x) dx • Given a “good” set of N sample points {xi} we can estimate the integral as I ≈ [(b−a)/N] Σi f(xi) • Each sample point yields an element of the integral of width (b−a)/N and height f(xi) • [Figure: f(x) on [a,b] with sample points, e.g. x3, x9, marked along the axis]
What MC integration really does • While the previous explanation is a reasonable interpretation of the way MC integration works, the most popular interpretation is that the estimate is (b−a) times the average height of f(x) obtained from the random samples • [Figure: f(x) on [a,b] with the average sampled height marked]
Mathematical Applications • Let’s formalize this just a little bit • By the mean value theorem, ∫[a,b] f(x) dx = (b−a)⟨f⟩ • We can approximate the integral by calculating (b−a)⟨f⟩, and we can estimate ⟨f⟩ by averaging many values of f(x): ⟨f⟩N = (1/N) Σi f(xi) • where xi ∈ [a,b] and the values are chosen randomly
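As an illustration (not from the lecture notes), a minimal Python sketch of this (b−a)⟨f⟩ estimator; the function name mc_integrate and the use of NumPy's default generator are assumptions of the example:

```python
import numpy as np

def mc_integrate(f, a, b, n, rng=None):
    """Estimate the integral of f on [a, b] as (b - a) times the mean of f
    evaluated at n uniformly chosen sample points."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(a, b, size=n)        # xi chosen randomly in [a, b]
    return (b - a) * np.mean(f(x))

# Example: integrate e^x on [0, 1]; the exact answer is e - 1 ~ 1.71828
print(mc_integrate(np.exp, 0.0, 1.0, n=1000))
```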
Example • Consider evaluating I = ∫[0,1] e^x dx = e − 1 • Let’s take N = 1000, then evaluate f(x) = e^x with x ∈ [0,1] at 1000 random points • For this set of points define I1 = (b−a)⟨f⟩N,1 = ⟨f⟩N,1 since b−a = 1 • Next choose 1000 different x ∈ [0,1] and create a new estimate I2 = ⟨f⟩N,2 • Next choose another 1000 different x ∈ [0,1] and create a new estimate I3 = ⟨f⟩N,3
Distribution of the estimates • We can carry on doing this, say 10,000 times, at which point we’ll have 10,000 values estimating the integral, and the distribution of these values will be a normal distribution • The distribution of all of the IN estimates constrains the error we would expect on a single IN estimate • This is the Central Limit Theorem: for any given IN estimate, the sum of the random variables within it converges toward a normal distribution • Specifically, the standard deviation will be the estimate of the error in a single IN estimate • The mean, x0, will approach e − 1 • [Figure: Gaussian distribution of the estimates, centred on x0 with width ±sN]
Calculating sN • The formula for the variance of the N sampled values is σN² = ⟨f²⟩N − ⟨f⟩N² • If there is no deviation in the data then the RHS is zero • Given some deviation, as N→∞ the RHS will settle to some constant value > 0 (for f(x) = e^x on [0,1] this is ~0.2420359…) • Thus we can write the expected error in a single IN estimate as sN = σN/√N ≈ 0.49/√N • A rough measure of how good a random number generator is: how well a histogram of the 10,000 estimates fits a Gaussian
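A short sketch of the experiment described above: repeat the 1000-sample estimate many times and compare the scatter of the estimates with the predicted sN (illustrative code, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng()
N = 1000            # samples per integral estimate
trials = 10_000     # number of independent I_N estimates

x = rng.uniform(0.0, 1.0, size=(trials, N))
estimates = np.exp(x).mean(axis=1)       # each row gives one I_N (b - a = 1)

print("mean of estimates    :", estimates.mean())                  # approaches e - 1
print("scatter of estimates :", estimates.std())                   # approaches sigma/sqrt(N)
print("predicted s_N        :", np.sqrt(0.2420359) / np.sqrt(N))   # ~ 0.0156
```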
[Figure: histogram of the IN estimates compared with the limiting Gaussian; 1000 samples per integral estimate, standard deviation 0.491/√1000] • Increasing the number of integral estimates makes the distribution closer and closer to the infinite limit.
Resulting statistics • For data that follows a Gaussian, the theory of probability distribution functions asserts that • 68.3% of the data (⟨f⟩N) will fall within ±sN of the mean • 95.4% of the data (19/20) will fall within ±2sN of the mean • 99.7% of the data will fall within ±3sN, etc. • Interpretation of poll data: • “These results will be accurate to ±4%, (19 times out of 20)” • The ±4% corresponds to ±2s, so s = 0.02 • Since s = 1/√N, N ≈ 2500 • This highlights one of the difficulties with random sampling: to improve the result by a factor of 2 we must increase N by a factor of 4!
Why would we use this method to evaluate integrals? • For 1D it doesn’t make a lot of sense • Taking h ~ 1/N, the composite trapezoid rule error ~ h² ~ 1/N² = N^(-2) • Double N, get a result 4 times better • In 2D, we can use an extension of the trapezoid rule on squares • Taking h ~ 1/N^(1/2), the error ~ h² ~ N^(-1) • In 3D we get h ~ 1/N^(1/3), so error ~ h² ~ N^(-2/3) • In 4D we get h ~ 1/N^(1/4), so error ~ h² ~ N^(-1/2)
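The pattern in these scalings can be written in one line: with N points on a regular grid in d dimensions the spacing is h ~ N^(-1/d), so

```latex
\text{grid (trapezoid) error} \;\sim\; h^{2} \;\sim\; N^{-2/d},
\qquad
\text{Monte Carlo error} \;\sim\; N^{-1/2}.
```

The two rates coincide at d = 4, which is the comparison made on the next slide.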
MC beneficial for d ≥ 4 • Monte Carlo methods always have sN ~ N^(-1/2) regardless of the dimension • Comparing to the 4D convergence behaviour we see that MC integration becomes practical at this point • It wouldn’t make any sense for 3D though • For anything higher than 4D (e.g. 6D, 9D, which are possible!) MC methods tend to be the only way of doing these calculations • MC methods also have the useful property of being comparatively immune to singularities, provided that • The random generator doesn’t hit the singularity • The integral does indeed exist!
Importance sampling • In reality many integrals have functions that vary rapidly in one part of the number line and more slowly in others • To capture this behaviour with MC methods requires that we introduce some way of “putting our points where we need them the most” • We really want to introduce a new function into the problem, one that allows us to put the samples in the right places
General outline • Suppose we have two similar functions g(x) & f(x), and g(x) is easy to integrate; then ∫[a,b] f(x) dx = ∫[a,b] [f(x)/g(x)] g(x) dx = ∫[y(a),y(b)] f(x(y))/g(x(y)) dy, where dy = g(x) dx, i.e. y(x) = ∫[a,x] g(x′) dx′
General outline cont • The integral we have derived has some nice properties: • Because g(x)~f(x) (i.e. g(x) is a reasonable approximation of f(x) that is easy to integrate) then the integrand should be approximately 1 • and the integrand shouldn’t vary much! • It should be possible to calculate a good approximation with a fairly small number of samples • Thus by applying the change of variables and mapping our sample points we get a better answer with fewer samples
Example • Let’s look at integrating f(x) = e^x again on [0,1] • MC random samples are 0.23, 0.69, 0.51, 0.93 • Our integral estimate is then I ≈ (1/4)(e^0.23 + e^0.69 + e^0.51 + e^0.93) ≈ 1.86, compared with the exact value e − 1 ≈ 1.718
Apply importance sampling • We first need to decide on our g(x) function; as a guess let us take g(x) = 1 + x • Well, it isn’t really a guess – we know this is the first two terms of the Taylor expansion of e^x! • y(x) is thus given by y(x) = ∫[0,x] (1 + x′) dx′ = x + x²/2 • For the end points we get y(0) = 0, y(1) = 3/2 • Rearranging y(x) to give x(y): x(y) = −1 + √(1 + 2y)
Set up integral & evaluate samples • The integral to evaluate is now I = ∫[0,3/2] e^x(y)/(1 + x(y)) dy • We must now choose y’s on the interval [0, 3/2] • The sampled values of the integrand are all close to 1, because g(x) ~ f(x)
Evaluate • For the new integral, averaging the sampled values of e^x(y)/(1 + x(y)) and multiplying by the interval length 3/2 gives an estimate noticeably closer to e − 1 than the plain MC result • Clearly this technique of ensuring the integrand doesn’t vary too much is extremely powerful • Importance sampling is particularly important in multidimensional integrals and can add 1 or 2 significant figures of accuracy for a minimal amount of effort
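The whole worked example can be reproduced in a few lines. One assumption is made here for illustration: the same four uniform deviates are mapped onto [0, 3/2] by simple scaling, since the lecture does not list the transformed sample values.

```python
import numpy as np

u = np.array([0.23, 0.69, 0.51, 0.93])        # the four uniform deviates from the slide

# Plain Monte Carlo estimate of the integral of e^x on [0, 1]
plain = np.exp(u).mean()                       # (b - a) = 1

# Importance sampling with g(x) = 1 + x, so y(x) = x + x^2/2 and x(y) = -1 + sqrt(1 + 2y)
y = 1.5 * u                                    # assumed: scale the same deviates onto [0, 3/2]
x = -1.0 + np.sqrt(1.0 + 2.0 * y)
weighted = (3.0 / 2.0) * np.mean(np.exp(x) / (1.0 + x))

print("plain MC          :", plain)            # ~1.86 with these four points
print("importance sampled:", weighted)         # ~1.77, noticeably closer to e - 1 ~ 1.718
```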
Rejection technique • Thus far we’ve looked in detail at the effect of changing sample points on the overall estimate of the integral • An alternative approach may be necessary when you cannot easily sample the desired region, which we’ll call W • Particularly important in multi-dimensional integrals when you can calculate the integral for a simple boundary but not a complex one • We define a larger region V that includes W • Note you must also be able to calculate the size of V easily • The sample function is then redefined to be zero outside the region W, but to have its normal value within it
Rejection technique diagram • [Figure: region W, the area under f(x) that we want to calculate, enclosed in a larger region V] • Area of W = area of region V multiplied by the fraction of points falling below f(x) within V • Algorithm: just count the total number of points calculated & the number in W!
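A minimal hit-or-miss sketch of the algorithm in the diagram: enclose the region W (here, the area under f(x) = e^x on [0, 1]) in an easy rectangle V and count the fraction of uniform points that land inside W. The function name and the choice of f are illustrative, not from the lecture.

```python
import numpy as np

def rejection_area(f, a, b, f_max, n, rng=None):
    """Estimate the area under f on [a, b] by sampling the enclosing
    rectangle V = [a, b] x [0, f_max] and counting hits below f(x)."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(a, b, size=n)
    y = rng.uniform(0.0, f_max, size=n)
    hits = np.count_nonzero(y < f(x))          # points falling inside W
    area_V = (b - a) * f_max
    return area_V * hits / n

# Area under e^x on [0, 1]; any f_max >= e works as the rectangle height
print(rejection_area(np.exp, 0.0, 1.0, f_max=np.e, n=100_000))
```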
Better selection of points: sub-random sequences • Choosing N points using a uniform deviate produces an error that converges as N^(-0.5) • If we could choose points “better” we could make convergence faster • For example, using a Cartesian grid of points leads to a method that converges as N^(-1) • Think of Cartesian points as “avoiding” one another and thus sampling a given region more completely • However, we don’t know a priori how fine the grid should be • We want to avoid short-range correlations – points shouldn’t be too close to one another • A better solution is to choose points that attempt to “maximally avoid” one another
A list of sub-random sequences • Many to choose from! • Tore-SQRT sequences • Van der Corput & Halton sequences • Faure sequence • Generalized Faure sequence • Nets & (t,s)-sequences • Sobol sequence • Niederreiter sequence • We’ll look very briefly at Halton & Sobol sequences, both of which are covered in detail in Numerical Recipes
Halton’s sequence • Suppose in 1D we obtain the jth number in the sequence, denoted Hj, via: • (1) write j as a number in base b, where b is prime • e.g. 17 in base 3 is 122 • (2) reverse the digits and place a radix point in front • e.g. 0.221 in base 3 • It should be clear why this works: adding an additional digit makes the “mesh” of numbers progressively finer • For a sequence of points in n dimensions (xi1,…,xin) we would typically use the first n primes to generate a separate sequence for each of the xij components
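A sketch of the digit-reversal recipe above (the name radical_inverse is an illustrative choice):

```python
def radical_inverse(j, base):
    """Return the j-th Halton number in the given (prime) base:
    write j in that base, reverse the digits, put a radix point in front."""
    result, scale = 0.0, 1.0 / base
    while j > 0:
        j, digit = divmod(j, base)
        result += digit * scale               # each reversed digit goes one place further right
        scale /= base
    return result

# j = 17 in base 3 is 122; reversed behind the radix point this is 0.221 (base 3)
print(radical_inverse(17, 3))                 # 2/3 + 2/9 + 1/27 = 0.92592...

# A 2-D Halton sequence pairs streams in two different prime bases (e.g. 3 and 5)
points = [(radical_inverse(j, 3), radical_inverse(j, 5)) for j in range(1, 9)]
```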
2D Halton’s sequence example • [Figure: pairs of points constructed from base 3 & 5 Halton sequences]
Sobol (1967) sequence • Useful method described in Numerical Recipes as providing close to N^(-1) convergence rate • Algorithms are also available at www.netlib.org • [Figure from Numerical Recipes]
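For practical use, library implementations exist; a hedged example assuming SciPy's quasi-Monte Carlo module (scipy.stats.qmc, available in recent SciPy releases) rather than the Numerical Recipes routine:

```python
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=True)       # 2-dimensional Sobol sequence
pts = sampler.random_base2(m=10)              # 2^10 = 1024 points in [0, 1)^2

# Quasi-random MC estimate of the integral of exp(x + y) over the unit square
vals = np.exp(pts[:, 0] + pts[:, 1])
print(vals.mean())                            # exact answer is (e - 1)^2 ~ 2.9525
```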
Summary • MC methods are a useful way of numerically integrating systems that are not tractable by other methods • The key feature of MC methods is the N^(-1/2) convergence rate • Numerical integration can be greatly improved using importance sampling • If you cannot easily sample the desired region, the rejection technique can often be employed
Next Lecture • More on MC methods – simulating random walks