January 2007 Bayesian methods for parameter estimation and data assimilation with crop models. Part 4: The Metropolis-Hastings algorithm. David Makowski and Daniel Wallach, INRA, France
Previously • Bayes’ Theorem. • Approximation of the posterior distribution with the Importance Sampling algorithm. • Implementation with the R statistical software. • Application to the estimation of one parameter.
Objectives of part 4 • Present another algorithm to approximate the posterior probability distribution: the Metropolis-Hastings algorithm. • Illustrate it with two examples; the first has one parameter, the second three. • Provide a program in the R language that you can run to apply Metropolis-Hastings to the examples. • R is free (see http://www.r-project.org/).
Two approaches for approximating posterior distributions from Monte Carlo simulations • 1. Non-adaptive methods • All parameter vectors can be generated at the start of the procedure: the choice of parameters to be tested does not depend on the results for previously tested parameters. • Example: Importance Sampling (see part 3). • 2. Markov chain Monte Carlo (MCMC) methods • Parameter values are generated from a Markov chain: the parameter value to be tested at stage i+1 can depend on the parameter value at stage i. • The most important methods are the Metropolis-Hastings algorithm and Gibbs sampling. (The sketch below illustrates the structural difference.)
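To make the structural difference concrete, here is a minimal illustrative sketch in R (our own illustration, not taken from the course programs; the numerical values are placeholders):

N <- 1000
# 1. Non-adaptive: all candidate values can be generated at once,
#    independently of any previous results (here, draws from a N(5, 2^2) prior).
candidates <- rnorm(N, mean = 5, sd = 2)
# 2. MCMC: each candidate is drawn around the current value, so the
#    chain must be built one step at a time.
chain <- numeric(N)
chain[1] <- 5
for (i in 1:(N - 1)) {
  candidate <- rnorm(1, mean = chain[i], sd = 0.8)   # depends on chain[i]
  chain[i + 1] <- candidate   # accept/reject step omitted here; see the algorithm below
}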
The Metropolis-Hastings algorithm: general case
Step 0. Choose a starting value θ1. Define a proposal distribution Pp(θc|θi) (for example, a normal distribution with mean equal to θi).
Repeat steps 1-3 for i = 1, …, N:
Step 1. Generate a candidate parameter value θc from Pp(θc|θi).
Step 2. Calculate T = [ P(Y|θc) P(θc) Pp(θi|θc) ] / [ P(Y|θi) P(θi) Pp(θc|θi) ].
Step 3. If T ≥ 1, then θi+1 = θc. If T < 1, draw u from a uniform distribution on the interval (0, 1); if u < T then θi+1 = θc, otherwise θi+1 = θi.
The result of the algorithm is a list of N parameter values. The same value may be repeated several times.
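The steps above translate into a short generic function. The following is a minimal sketch, not one of the course programs; posterior_unnorm (the unnormalized posterior P(Y|θ)P(θ)), r_prop (a random draw from the proposal) and d_prop (the proposal density, with d_prop(x, y) = Pp(x|y)) are hypothetical user-supplied functions:

metropolis_hastings <- function(theta1, N, posterior_unnorm, r_prop, d_prop) {
  chain <- numeric(N + 1)
  chain[1] <- theta1                                    # Step 0
  for (i in 1:N) {
    theta_c <- r_prop(chain[i])                         # Step 1: candidate
    T <- (posterior_unnorm(theta_c) * d_prop(chain[i], theta_c)) /
         (posterior_unnorm(chain[i]) * d_prop(theta_c, chain[i]))   # Step 2
    # Step 3: u < min(1, T) accepts with probability min(1, T)
    chain[i + 1] <- if (runif(1) < min(1, T)) theta_c else chain[i]
  }
  chain
}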
The Metropolis-Hastings algorithm with a symmetric proposal distribution • A common choice for the proposal distribution Pp(θc|θi) is a normal distribution with mean equal to θi and constant variance. • In this case Pp(θc|θi) = Pp(θi|θc), the proposal densities cancel, and the expression for T simplifies to T = [ P(Y|θc) P(θc) ] / [ P(Y|θi) P(θi) ], i.e. the ratio of likelihood × prior density at the candidate value to likelihood × prior density at the current value.
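In practice this simplified ratio is often computed on the log scale to avoid numerical underflow. A small sketch of this common implementation detail (our addition; log_lik and log_prior are hypothetical user-supplied functions):

accept_ratio <- function(theta_c, theta_i, log_lik, log_prior) {
  exp((log_lik(theta_c) + log_prior(theta_c)) -
      (log_lik(theta_i) + log_prior(theta_i)))
}
# Example with the densities of example 1 below:
# accept_ratio(8, 5,
#              log_lik   = function(t) dnorm(9, t, 1, log = TRUE),
#              log_prior = function(t) dnorm(t, 5, 2, log = TRUE))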
The Metropolis-Hastings algorithm: choices to be made • The proposal distribution: the number of iterations required for a good approximation to the posterior distribution depends on this choice. • The number of iterations: a large number is generally necessary (N = 10000, 100000 or more, depending on the problem). • The number of parameter values to be discarded from the start of the list (to reduce dependence on the starting value of the algorithm); see the sketch below. • We will give some suggestions for these choices with example 1.
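For the third choice, discarding the first half of the chain is a common rule of thumb (and matches the results below, which keep the last 250 of 500 or the last 25000 of 50000 values); a sketch, assuming chain holds the output of the algorithm:

burn_in <- floor(length(chain) / 2)
kept <- chain[-(1:burn_in)]            # values used for the approximation
hist(kept, freq = FALSE, main = "Approximate posterior distribution")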
Example 1 Example already presented in parts 2 and 3: Estimation of crop yield. • The single unknown parameter is the yield of a particular field. The prior information is an expert’s opinion. There is also information from a measurement. Both the prior density and the likelihood are normal distributions. • In this case, the exact expression of the posterior distribution is known. • This example is used to show that the Metropolis-Hastings method can give a good approximation of the posterior distribution.
Example 1 – Exact posterior distribution • From part 2, we have: • Measurement: Y = 9 t/ha (sd = 1) • Prior distribution: P(θ) = N(5, 2²) • Likelihood: P(Y|θ) = N(θ, 1²) • Exact posterior distribution: P(θ|Y) = N(8.2, 0.89²), i.e. posterior mean 8.2 and posterior variance 0.8. [Figure: prior probability distribution, likelihood function, and posterior probability distribution.]
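These exact values follow from the standard conjugate-normal update; as a quick check in R:

prior_mean <- 5; prior_var <- 2^2     # P(theta) = N(5, 2^2)
y <- 9; meas_var <- 1^2               # P(Y|theta) = N(theta, 1^2)
post_var <- 1 / (1 / prior_var + 1 / meas_var)                   # 0.8
post_mean <- post_var * (prior_mean / prior_var + y / meas_var)  # 8.2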
Example 1 – Metropolis-Hastings
Step 0. Choose θ1 = 5 t/ha. As proposal distribution use a normal distribution: Pp(θc|θi) = N(θi, 0.8²).
Repeat steps 1-3 for i = 1, …, N:
Step 1. Generate a candidate parameter value θc from Pp(θc|θi).
Step 2. Calculate T = [ P(Y|θc) P(θc) ] / [ P(Y|θi) P(θi) ], with likelihood P(Y|θ) = N(θ, 1²) evaluated at Y = 9 and prior P(θ) = N(5, 2²).
Step 3. If u < min(1, T), where u is drawn from a uniform distribution on the interval (0, 1), then θi+1 = θc, otherwise θi+1 = θi.
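These steps translate directly into a few lines of R. The following is a minimal sketch of the algorithm exactly as specified above; it is not the MHyield.txt program distributed with the course, which may differ in its details:

set.seed(1)
N <- 50000
Y <- 9                                  # measurement (t/ha)
chain <- numeric(N)
chain[1] <- 5                           # Step 0: theta_1 = 5 t/ha
for (i in 1:(N - 1)) {
  theta_c <- rnorm(1, chain[i], 0.8)    # Step 1: candidate from N(theta_i, 0.8^2)
  T <- (dnorm(Y, theta_c, 1) * dnorm(theta_c, 5, 2)) /
       (dnorm(Y, chain[i], 1) * dnorm(chain[i], 5, 2))    # Step 2
  chain[i + 1] <- if (runif(1) < min(1, T)) theta_c else chain[i]   # Step 3
}
mean(chain[-(1:(N / 2))])   # close to the exact posterior mean 8.2
sd(chain[-(1:(N / 2))])     # close to the exact posterior sd sqrt(0.8) = 0.89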
Example 1 - Results. N = 500, chain 1. [Figure: chain of parameter values; posterior distribution approximated from the last 250 values, compared with the true posterior distribution; the measurement is marked for reference.]
Example 1 - Results. N = 500, chain 2. [Figure: chain of parameter values; posterior distribution approximated from the last 250 values, compared with the true posterior distribution; the measurement is marked for reference.]
Example 1 - Results. N = 50000, chain 1. [Figure: chain of parameter values; posterior distribution approximated from the last 25000 values, compared with the true posterior distribution; the measurement is marked for reference.]
Example 1 - Results. N = 50000, chain 2. [Figure: chain of parameter values; posterior distribution approximated from the last 25000 values, compared with the true posterior distribution; the measurement is marked for reference.]
Example 1 – Running the R program • Install the file “MHyield.txt” on your computer and note its path. This file contains the R program that does the calculations. • Open R. • Use the source() command to run the program: • The command is given as a comment in the first line of the program. • In my case, I had to type: source("c:\\David\\Enseignements\\Cours ICASA\\MHyield.txt"). • Replace the path with the path on your machine. • Copy and paste the corrected command (without the “#” character) into the R console. • Press RETURN to execute. • You can easily change the value of N, the measurement value, its accuracy, etc. See the comments in the MHyield function.
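As a side note (our addition, not from the course materials): R on Windows also accepts forward slashes in paths, which avoids doubling the backslashes. The path below is a placeholder:

source("C:/your/path/to/MHyield.txt")   # adjust to where you saved the file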
Example 2: Estimation of the three parameters of a model of yield response to fertilizer. • Nonlinear model relating wheat yield to nitrogen fertilizer dose: • Yield = θ1 + θ2 (Dose − θ3) if Dose < θ3 • Yield = θ1 if Dose ≥ θ3 • Objective: estimate the three parameters for a given wheat field. [Figure: linear-plus-plateau response curve; yield rises with slope θ2 up to the dose threshold θ3, then plateaus at θ1.]
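This response model transcribes directly into R; a sketch (the function name f_response is ours, hypothetical):

f_response <- function(dose, theta) {
  ifelse(dose < theta[3],
         theta[1] + theta[2] * (dose - theta[3]),   # linear part, slope theta[2]
         theta[1])                                  # plateau at the maximal yield theta[1]
}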
Example 2 – Prior distribution • The prior distribution of the parameters was defined in a previous study (Makowski and Lavielle. 2006. JABES 11, 45-60). • It represents the between-field variability of the parameter values in a region (the Paris basin). • P(θ1) = N(9.18, 1.16²) t/ha (maximal yield) • P(θ2) = N(0.026, 0.0065²) t/kg N (slope of the linear part) • P(θ3) = N(123.85, 46.7²) kg N/ha (N dose threshold) • The prior means define the “average” response curve in the region of interest.
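A sketch of drawing parameter vectors from this prior in R (the helper name is ours; each row is one plausible field's parameter set):

draw_prior <- function(n) {
  cbind(theta1 = rnorm(n, 9.18, 1.16),      # maximal yield (t/ha)
        theta2 = rnorm(n, 0.026, 0.0065),   # slope (t/kg N)
        theta3 = rnorm(n, 123.85, 46.7))    # dose threshold (kg N/ha)
}
draw_prior(3)   # three prior draws, i.e. three plausible response curves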
Example 2 - Data • Data collected in a new wheat plot in the same region: four yield measurements obtained for four different N doses. • Tested doses: 0, 50, 100, and 200 kg/ha. • Corresponding yield measurements: 2.50, 5.01, 7.45, and 7.51 t/ha.
Example 2 - Likelihood P(Y|θ) = ∏i (1 / (σ√(2π))) exp( −(Yi − f(Di; θ))² / (2σ²) ), where f(D; θ) is the linear-plus-plateau response function (D = N dose) and the product runs over the four measurements. σ was estimated in a previous study and is set equal to 0.3.
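In R this likelihood is conveniently coded on the log scale, using the f_response() function sketched above and the four observations. This is our own sketch; the MHresponse.txt program may code it differently:

dose <- c(0, 50, 100, 200)            # tested N doses (kg/ha)
yield <- c(2.50, 5.01, 7.45, 7.51)    # measured yields (t/ha)
log_lik <- function(theta, sigma = 0.3) {
  sum(dnorm(yield, mean = f_response(dose, theta), sd = sigma, log = TRUE))
}
# Example: log_lik(c(9.18, 0.026, 123.85))   # evaluated at the prior means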
Example 2 – Results with N=50000 – Chain 1. [Figure: chains of the three parameter values.]
Example 2 – Results with N=50000 – Chain 1. [Figure: response curve based on the prior means compared with the curve based on the posterior means.]
Example 2 – Results with N=50000 – Chain 2. [Figure: chains of the three parameter values.]
Example 2 – Results with N=50000 – Chain 2. [Figure: response curve based on the prior means compared with the curve based on the posterior means.]
Example 2 – Running the R program • The R program is in the file MHresponse.txt. • To run it yourself, follow the instructions given for example 1. • Press RETURN after the first series of graphs to obtain the second series.
Conclusion: importance sampling versus MCMC • Both methods can be used to approximate the posterior distribution of the parameters of any model. • Both methods require the definition of a proposal distribution to generate parameter values, which is not easy in practice. • The comparison of the two types of methods is an active area of research. • MCMC methods (Gibbs sampling and Metropolis-Hastings) can easily be implemented with the WinBUGS software. See http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.