1 / 51

Stochastic simulation: Monte-Carlo methods, power analyses and the bootstrap

This seminar covers Monte-Carlo methods, power analyses, and the bootstrap in stochastic simulation, with a focus on forest ecology applications. Topics include likelihood methods, forward simulation, and randomization tests.

jkenney
Download Presentation

Stochastic simulation: Monte-Carlo methods, power analyses and the bootstrap

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Stochastic simulation:Monte-Carlo methods, power analyses and the bootstrap Seminar 8 Likelihood Methods in Forest Ecology October 9th – 20th , 2006

  2. Monte-Carlo methods • Behavior of a statistic or a population can be assessed by drawing lots of random samples and observing this behavior. • The strategy is to create an artificial world and use this pseudo-world to investigate our system. • Bootstrap and power analyses are special cases of MC methods.

  3. Rationale and use of “forward” simulation • The how and why of stochastic simulation. • Power analyses. • Randomization tests: Permutation and bootstrapping.

  4. Rationale and use of “forward” simulation • The how and why of stochastic simulation. • Power analyses. • Randomization tests: Permutation and bootstrapping.

  5. 1. Why stochastic simulation (Monte-Carlo)? • Your data is a stochastic representation of some underlying process(es). • Simulation allows you to understand and test your data against some assumptions you make about generating processes. • Simulation is essential to realistic forecasting. • Allows you to bypass some of the restrictive assumptions of parametric statistical tests.

  6. Types of stochastic simulations • Static • Calculate the deterministic part of a model and add noise • Chain different stages of an underlying set of complex processes • Dynamic • Explore the behavior of a generating process over time.

  7. Static simulation I: Linear model with normal errors= x<-1:20 a=2 b=1 y_det<-a+b*x y<-rnorm(20, mean=y_det, sd=2) plot(x, y) abline (lm(y~x), lty=2, col="blue") abline (a, b, col="red")

  8. Static simulation II: Hyperbolic model with Poisson errors= a=6 b=1 x<-runif(50, min=0, max=5) y_det<-a/(b+x) y<-rpois(50, y_det) plot(x, y) curve(a/(b+x), add=TRUE, col="red")

  9. Example III: Pacala & Silander • Simulate spatial distribution of mother plants in 30x 30 m plot • Simulate offspring distribution • Calculate neighborhood density • Simulate biomass as a function of crowding • Compare theoretical and estimated values

  10. Example III: Pacala & Silander nparents=50 noffsprg=10 # for each parent L=30 # Set dimensions of plot parent_x<-runif(nparents, min=0, max=L) parent_y<-runif(nparents, min=0, max=L) plot(parent_x, parent_y, pch=16, col="blue", xlab="", ylab=" ") • Simulate spatial distribution of mother plants in 30x30 m plot

  11. Example III: Pacala & Silander • Simulate offspring location angle<-runif(nparents*noffsprg, min=0, max=2*pi) dist<-rexp(nparents*noffsprg, 0.5) # offspring are located on average within 2 m of parent plant # Calculate offspring locations with respect to parents: offsprg_x<-rep(parent_x, each=noffsprg) + cos(angle)*dist # each element of parent_x is repeated each times (noffspring) offsprg_y<-rep(parent_y, each=noffsprg) + sin(angle)*dist # Calculate distances for all offspring dist<-sqrt((outer(offsprg_x, offsprg_x,"-"))^2 +(outer(offsprg_y, offsprg_y,"-"))^2 )

  12. Example III: Pacala & Silander • Calculate neighborhood crowding nbrcrowd<-apply(dist<2, 1, sum)-1 # 1 indicates the function will be applied to rows, the -1 is so that the plant doesn't count itself # Plot offspring locations points(offsprg_x, offsprg_y, xlab=" ", ylab=" ")

  13. Simulate biomass as a function of crowding ci=nbrcrowd*3 # 3 is a proportionality constant to match numbers in P&S M=2.3 alpha=0.49 # Expected value of biomass: mass_det<-M/(1+ci) # Pick gamma deviates mass<-rgamma(length(mass_det), scale=mass_det, shape=alpha) plot(ci, mass, cex=0.5, xlab="Competition index", ylab="Biomass(g)") # Plot the expected value on top of the data curve(M/(1+x)*alpha, add=TRUE, from=0, col="blue")

  14. Example IV: Schmitt et al. Fish recruitment • Simulate settlers • Simulate recruits as a function of settlers

  15. Dynamic stochastic simulations • Set aside space to record state of population at each time step • Set starting conditions • For each time step • Simulate population dynamics • Record state of population • Analyze plot results

  16. Define simulation function: observation and process error linsim<-function(nt=20, N0=2, dN=1, sd_process=sqrt(2), sd_obs=sqrt(2)) { cur_N=N0 Nobs=numeric(nt) Nobs[1]=cur_N + rnorm(1, sd=sd_obs) # obs error for 1st time step for (i in 2:nt) { cur_N=cur_N +rnorm(1, mean=dN, sd=sd_process) Nobs[i]=cur_N + rnorm(1, sd=sd_obs) } return(Nobs) } Call function repeatedly nsim=1000 Nmat=matrix(nrow=20, ncol=nsim) for (i in 1:nsim) { Nmat[,i]=linsim(sd_process=2, sd_obs=2)}

  17. Plot results matplot(1:20, Nmat, col="red", type="l", lty=1) lines(1:20, rowMeans(Nmat), lwd=2) matlines (1:20, t(apply(Nmat, 1, quantile, c(0.025, 0.975))), lty=2, col=1)

  18. Sink population immigsim = function(nt = 20, N0 = 2, immig, surv) { N = numeric(nt) N[1] = N0 # starting population size for (i in 2:nt) { Nsurv = rbinom(1, size = N[i - 1], prob = surv) N[i] = Nsurv + rpois(1, immig) # add number of immigrants } return(N) }

  19. Vonesh & Bolker 2005

  20. Predator effects on terrestrial stage • Both frogs and flies cause embryos to hatch approximately 30% earlier. • Early hatchlings have lower weights and are at earlier developmental stages • Frogs can reduce density of tadpoles entering the pond by 60% • Flies have a much smaller effect.

  21. Experiment I: Quantifying the functional response • Vary larval density in the presence and absence of the dragonfly.

  22. Scientific Model: Functional response-Mortality as a function of density: Number of predators Handling time Attack rate Number of prey eaten in t days Assume that actual attacked number follows a binomial distribution with p =probability of an individual being killed over the course of the experiment

  23. Experiment II: Effect of larval size on predation risk • Expose five larval age/size classes to aquatic predators. • Dragonflies were replaced daily to keep predator densities constant

  24. The Scientific Model: Size-specific mortality Prey size Phenomenological scientific model Size-specific predation prob. Assume that probability of predation follows a binomial distribution with prob

  25. Combining size & density-dependent mortalityto predict attack rate • Two tricks…. Density of predation in the size experiment Number of prey eaten per predator per day= It becomes the risk of predation at density N

  26. Simulations M-C methods • Use joint distributions of parameters to estimate variance-covariance matrix: Growth DD mortality Size-dep. mortality

  27. M-C methods • Draw repeatedly from parameter estimates and V-C matrix. • Repeat procedure several times to estimate number killed.

  28. Equal growth rates Compensatory survival Vonesh & Bolker 2005

  29. 2. Power Analysis • Frequentist: Probability of failing to reject the null hypothesis when it is false. • General: How do the quality and quantity of my data and the true parameters of my system affect my ability to answer questions accurately and precisely?

  30. Power Analysis • How many data are there? • How are they distributed? • Number per site/species • Temporal and spatial pattern • Even or clustered • Distribution of covariates…… • How much variation is there? • How small are the effects you are trying to measure (tapering effects)?

  31. Power Analysis: Accuracy and precision • Accurate and imprecise better than imprecise and accurate • Bias (accuracy): minimize expected difference between true value and estimate. • Variance (precision): • Mean square error: combines bias and variance

  32. Power Analysis: Accuracy and precision • Coverage (accuracy): percentage of simulations in which the CI’s include true value. • Power (narrow sense): fraction of times that the null hypothesis will be included in the CI’s. Ho True Ho false Accept Ho 1-  Reject Ho  1- Specify , 1- and effect size

  33. Power test: example Schmitt et al. • How difficult is to fit a Shepherd function? d > 1 Overcompensating d =1 Beverton-Holt d < 1 Undercompensating

  34. Distribution of 400 simulations Mean and lower bound n =1000 observations Bolker ms

  35. Bolker ms

  36. b is upward biased Variance-Bias tradeoff Bolker ms

  37. Power analyses example 2 • Sink population data from previous example. • The true value of the immigration slope parameter is given in simulation. immigsim = function(nt = 20, N0 = 2, immig, surv) { N = numeric(nt) N[1] = N0 for (i in 2:nt) { Nsurv = rbinom(1, size = N[i - 1], prob = surv) N[i] = Nsurv + rpois(1, immig) } return(N) }

  38. Power analyses example 2 • How long do I have to sample to determine that the population is recovering? nt = 20 sim0 = immigsim(nt = nt, N0 = 2, surv = 0.9, immig = 10) # start conditions tvec = 1:nt # How much time will I need to sample the population to assess recovery # Run a linear regression and extract the point estimate and confidence # limits for the slope: lm1 = lm(sim0 ~ tvec) slope = coef(lm1)["tvec"] ci.slope = confint(lm1)["tvec", ]

  39. 3. Randomization tests • Permutation: scramble the data to estimate distribution of some statistic. Typically used for hypothesis testing (e.g.: Are two treatment means different?) • Bootstrapping: resample data with replacement to estimate distribution of a statistic under simulated repetition of an experiment. Typically used to estimate confidence intervals of parameters with data that does not follow a std distribution.

  40. Permutation & Bootstrapping:Seed predation data

  41. 3. Bootstrapping x<-taken theta1<-function(x) {tapply(x, Species, mean)} theta<-function(x) {(theta1(x)["abz"]-theta1(x)["cd"])/sd(x)} library(bootstrap) results <- bootstrap(x, 1000, theta) hist(results$thetastar)

  42. Serengeti Wildebeest Recruitment Survival Pascual et al. 1997

  43. Population no. data & Process error • Population numbers: 15 yrs of data for interval of 1961-1992 (31 yrs). • Survival and recruitment even lower. • Cannot properly characterize process error.

  44. Iteratively reweighted least squares (IRLS) Population numbers Survival numbers Recruitment numbers Must minimize total S(theta) but this depends on minimizing the values of sigmas which in turn depend only on the data for each submodel. There may be different combinations of sigmas that produce similar S(theta) values and we may place undue weight on extreme values.

  45. IRLS • First estimates for all param including sigmas. • Minimize equation 4. • New estimate of sigma from data. • Recycle

More Related