170 likes | 245 Views
Emulation of a Stochastic Forest Simulator Using Kernel Stick-Breaking Processes (Work in Progress). James L. Crooks (SAMSI, Duke University). Background. We desire to predict the distribution of tree species in the North Carolina forest under a variety of future climate change scenarios.
E N D
Emulation of a Stochastic Forest Simulator Using Kernel Stick-Breaking Processes(Work in Progress) James L. Crooks (SAMSI, Duke University)
Background • We desire to predict the distribution of tree species in the North Carolina forest under a variety of future climate change scenarios. • Toward this end we can use the forest simulator developed by J. Clark and P. Agarwal’s joint research group. • This simulator models the life-cycle of individual trees within a tree stand of pre-specified area. • Growth and fecundity are in part mediated by the climate-influenced variables temperature and soil moisture.
Motivation • The forest simulator has the following properties that make emulation both important and difficult: • Its speed limits the physical area that can be simulated in reasonable time (the current standard is 128 m x 128 m) • Its output is stochastic • Its output distribution can be non-gaussian • Its output distribution can vary over the input space. • Thus there is a need for a local, nonparametric statistical method to emulate the entire outputdistribution across in the input space.
Objectives • Run simulator with 3 species under “standard” climatic conditions for 1000+ years to establish equilibrium initial conditions. • Run simulator for a further 100 years at each of various points in the climate input space (temperature and soil moisture increase rates). • Emulate the output over this input space using the Kernel Stick-Breaking Processes idea of Dunson and Park (2006).
Summary of Input and Output Variables Simulator Climate Input Variables i indexes the run of the simulator xi1 = Mean Temperature Increase / Century xi2 = Mean Soil Moisture Increase / Century Design Matrix (see below) Simulator Output Variables yi1 = Final Number of Adult Trees of Species 1 yi2 = Final Number of Adult Trees of Species 2 yi3 = Final Number of Adult Trees of Species 3
We will focus on number of adult trees. Forest Simulator output for the 1001 year initialization run. Legend — Total — Species 1 — Species 2 — Species 3
Justifying the Choice of Model • We expect that the mean response will be suppressed at extreme values of climate variables. • →Therefore we model the mean response as: • with a design matrix having up to quadratic terms Single Regression Surface Number of Trees Climate Variable (Temperature or Soil Moisture Increase Rate) i indexes simulator run j indexes the tree species k indexes the regression coef.
We do not a priori expect the output distribution to be Gaussian anywhere on the input space. → Use a non-parametric (Dirichlet Process) infinite mixture of regression surfaces instead of a single surface. • We do not a priori expect the shape of the output distribution to be constant over the input space. → Use the Kernel Stick-Breaking Process of Dunson and Park (2006) to allow the DP mixture to be predictor-dependent. Finite (Truncated) Mixture of Regression Surfaces Number of Trees Climate Variable (Temperature or Soil Moisture Increase Rate)
Negative Binomial Likelihood • The output variable of interest is number of adult trees of each species. Why not use a Poisson likelihood? • Preliminary data show Var[y] scales roughly like E[y]2, not E[y], and Var[y] is also inversely dependent on the forest area. →Use the negative binomial distribution, which has pmf: and moments: where the prior range of n can be increased with area.
The Full Model Kernel Stick-Breaking Process
Comments on the Model • This model, unlike Dunson andPark’s original, lacks conjugacy between f and G0; thus two changes must be made to their algorithm: • We no longer have the full conditional for , so we must use a Metropolis-Hastings step to update it. • The integral cannot be evaluated exactly so we must approximate it numerically using (e.g., ) Monte-Carlo integration. • The original MATLAB code is itself not fast, but once a posterior sample has been generated it is cheap to predict the output pmf at new points in the input space.
Generating Simple Climate Change Scenarios • The ballpark estimates of today’s (soil moisture, temperature) mean and covariance are: • The 1000+ year initialization run has temperature and soil moisture generated by a MVN with this mean and covariance. • Temperature is measured in °C and soil moisture in %.
Future 100 year scenarios are generated assuming the means change linearly in time with rates given by the points on plot below: • GCM’s generally predict hotter, drier conditions for the Southeastern US. • Accordingly, ranges were: [-1,+2]*SD/century for Temperature and [-2,+1]*SD/century for Soil Moisture.
Shown are the generated soil moisture and temperature used in the initialization run, and three generated future scenarios. Climate change begins at year 1052. Legend — Stable Climate — Hotter/Drier — Cooler/Wetter
Results • I just got the initialization run back last week, so ask me in 3 months. Other Thoughts • May need to continue the initialization run another 500-1000 years to get a better equilibrium. • Need a lot more runs when using nonparametrics anyway, so the benefits of using a Latin Hyper-Cube design are less obvious (in 2-D anyway).
Acknowledgements • Jim Clark’s group for use of their simulator, and especially Sean McMahon for his invaluable assistance. • David Dunson and Ju-Hyun Park for explaining their paper to me and letting me use their algorithm. • The SAMSI Methodology and Terrestrial Models Working Groups for fruitful discussions. References Dunson, D. B., and J.-H. Park, “Kernel Stick-Breaking Processess”, ISDS Discussion Paper 22 (2006) and Biometrika (accepted) Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable simulator for forest dynamics”, Symposium on Computational Geometry 2004: 106-115 Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable algorithm for dispersing populations”, Journal of Intelligent Information Systems 2004 (online)