BSTA 670 – Statistical Computing

BSTA 670 – Statistical Computing Lecture 12, Part 2: Numerical Integration and Monte Carlo Methods Partially adapted from: http://www.busim.ee.boun.edu.tr/~busim/bayes/5MCMCv2.ppt

Numerical Integration • Newton-Cotes quadrature (replaces integrand with a polynomial approximation at each sub-interval, where each sub-interval has equal length.) * Riemann rule * Trapezoidal rule, linear approx. for each sub-interval * Simpson's rule, quadratic approx. in each sub-interval • Midpoint rule (linear approximation for each sub-interval) • Guassian quadrature (Unequal sub-interval lengths, with emphasis on regions where integrand is largest. • Monte Carlo methods

In many cases it is difficult to evaluate integrals integrals analytically. • Examples- Computing expectations, Bayesian analysis

Marginal Distribution - Marginalisation integral • Suppose (multidimensional parameter) • We have calculated and we want to calculate the posterior for one parameter only • Where • This is a k-1 dimensional integration

Expectation integral • Evaluate a point estimate • Often, it is not possible to evaluate the integral analytically • This is a common problem with the Bayesian approach

Example: shape parameter of the Gamma distribution • α is known and we observe • Take the prior as uniform distribution • The posterior is proportional to: • Very difficult to have closed form integrals over this complicated integrand

One possible solution is numerical integration • Suppose we wish to integrate • Any of the methods we covered will work, with some modification for the infinite limits.

Approximation of Integral • 1. Find values and beyond which is negligible • 2. Split into N equivalent intervals. Then

With multiparameters, this is numerically inefficient, since we need very large N for good approximation • When there are multi-parameters, you need to form grids on which to assess the function • Where the distributions are peaked, finer grids are needed, and non-equal grids are best. • Methods become quite complicated in practice • Alternative: Monte Carlo methods

Monte Carlo • Monte-Carlo, created in 1866, named in honor of Prince Charles III, hosts an internationally famous Casino, luxury hotels and leisure facilities.Monte Carlo is a district of the Principality of Monaco.

Monte Carlo • Monte Carlo is the use of experiments with random numbers to evaluate mathematical expressions. • The experimental units are the random numbers. • The expressions can be definite integrals, systems of equations, or complicated mathematical models.

Monte Carlo • The term “Monte Carlo” was coined by John von Neumann and S.M. Ulam, as a code word for their secret work at Los Alamos Labs that applied these methods to problems in atomic physics as part of the development of the atomic bomb during World War II. The name was suggested by the gambling casinos in the city of Monte Carlo in Monaco. • The use of these methods increased rapids are the availability of the computer.

Monte Carlo • Standard approximations from numerical analysis are preferred. However, Monte Carlo methods provide an alternative when the numerical approaches are intractable (don’t work). • Monte Carlo methods are typically preferred for the evaluation of integrals over high-dimensional domains.

Monte Carlo • Monte Carlo methods can be used to solve very large systems of equations. In this application, there is no inherent randomness. The randomness is introduced through RVs, which are defined and simulated for the purpose of solving a deterministic problem.

Monte Carlo • In some applications, there is an inherent randomness in the model, and the objective of the Monte Carlo study is to estimate or evaluate the distribution of a variable or statistic. The Monte Carlo method is used to generate data from the underlying distribution of interest. Methods for probability density estimation are then applied using this data, as well as a graphical presentation of the density.

Monte Carlo Studies:Some Practical Considerations • Software: Most statistical software programs have very limited built in simulation functions, but include the basic tools needed to write programs to perform simulations. STATA, Splus, SAS, Matlab (?)

Monte Carlo Studies:Some Practical Considerations • Software: Matlab has Simulation module: Simulink® . But, this is written specifically to design, simulate, implement, and test a variety of time-varying systems, including communications, controls, signal processing, video processing, and image processing. Thus, it is not generally applicable to the problems you will encounter.

Monte Carlo Studies:Some Practical Considerations • Successive sets of Monte Carlo generated data should be based on non-overlapping sequences of random numbers. • One uninterrupted sequence of random numbers should be used for the complete Monte Carlo study.

Monte Carlo Studies:Some Practical Considerations • The results any scientific experiment should be reproducible. This requirement is stricter for a Monte Carlo study. • The experimental units should be strictly reproducible. This means that the complete sequence of random numbers can be repeated, given the same initial conditions (up to machine precision).

What is Monte Carlo used for? • Monte Carlo techniques are used for: • Integration problems • Marginalization (Bayesian problem primarily) • Computing Expectation • Optimization problems • Model selection • Modeling • Simulations

In the case of Bayesian Analysis: • We do not need to numerically calculate the posterior density function. Rather we generate values with its distribution. • These values are then used to approximate the density functions or estimates such as posterior means, variances, etc. • Various ways: • Rejection sampling • Importance sampling

Monte Carlo principle • Given a very large set X and a distribution p(x) over it • We draw i.i.d. a set of N samples • We can then approximate the distribution using these samples

p(x) X Monte Carlo principle

Monte Carlo principle • These samples can also be used to compute expectations • The samples can be used to determine the maximum or minimum (Note: “arg max” stands for “argument of the maximum”, which is the value of the argument for which the value of the expression attains its maximum)

Sampling X: Reverse sampling Method • Simplest method • Used frequently for non-uniform random number generation • Sample a random number ξ from U[0,1) • Evaluate x=Fֿ¹(ξ) • Simple, but you need the functional form of F.

Sampling X: Rejection sampling • More generally, we would like to sample from p(x), but it’s easier to sample from a proposal distribution q(x) • q(x) satisfies p(x) <M q(x) for some M<∞ • Procedure: • Sample x(i) from q(x) • Accept with probability p(x(i)) / Mq(x(i)) • Reject otherwise • The accepted x(i) are sampled from p(x)! • Problem: if M is too large, we will rarely accept samples

Rejection Sampling

Example: Rejection Sampling • Shape parameter of a Gamma distribution • Choosing uniform prior, the prior is bounded and on a finite interval • We need to find a constant such that • In this case c is

Note that only about 0.11% of the proposals are accepted in this example. A lot of waste!

Importance sampling • The problem with the rejection sampling is that we have to be clever in choosing the proposal density! • Importance sampling avoids difficult choices and generates random numbers economically.

Importance sampling • Also called biased sampling • It is a variance reduction sampling technique • Introduced by Metropolis in 1953 • Instead of choosing points from a uniform distribution, they are now chosen from a distribution which concentrates the points where the function being integrated is large. (It encourages important samples)

Importance sampling • Nicholas Constantine Metropolis • Worked ar Los Alamos on the Manhatten project • Worked with Enrico Fermi (developed quantum physics) and Edward Teller (father of the hydrogen bomb) on the first nuclear reactors

Importance sampling • Sample from g(x) and evaluate f(x)/g(x) • The new integrand, f/g, is close to unity and so the variance for this function is much smaller than that obtained when evaluating the function by sampling from a uniform distribution. Sampling from a non-uniform distribution for this function should therefore be more efficient than doing a crude Monte Carlo calculation without importance sampling

In importance sampling we generate N samples from q(x) Importance Sampling To account for the fact we sampled from the wrong distribution we introduce weights Then Monte Carlo estimate of I(f) Choose proposal distribution to minimize variance of the estimator

Markov Chain sampling • Recall again the set X and the distribution p(x) we wish to sample from • Suppose that it is hard to sample p(x) and difficult to suggest a proposal distribution but that it is possible to “walk around” in X using only local state transitions • Insight: we can use a “random walk” to help us draw random samples from p(x)

Andrei A. Markov 1856-1922 Russian mathematician

p(x) X Markov Chain sampling

Markov Chain sampling • That is, if our sampling follows a well defined structure, i.e. if our sample at one instant depends on our sampling at the previous step, we can use this information. • This is a Markov Chain Monte Carlo (MCMC) Sampling and we can benefit from a “random walk”. • MCMC theory basically says that if you sample using a Markov Chain, eventually your samples will be coming from the stationary distribution of the Markov Chain.

Markov Chain sampling • The key to the success of Markov Chain sampling is the fact that at every iteration you sample from a different distribution in a sequence of distributions, which eventually converge to the target distribution. • In importance sampling, you always sample from the same (and wrong) distribution, and then make an adjustment for this.

BSTA 670 – Statistical Computing