Mothematical Modeling: Temporal and Spatial Models of Moth Distribution at the H.J. Andrews Experimental Forest - Erin Childs (Pomona College), Andrew Calderon (Heritage University), Evan Goldman (Bard College, Boston University), Molly O’Neill (Lehigh University), Clay Showalter (Evergreen University), with the help of Olivia Poblacion (Oregon State University)
Acknowledgements • Dr. Dietterich, CS Professor • Dr. Wong, CS Professor • Steven Highland, Geosciences PhD Candidate • Jorge Ramirez, Math Professor • Dan Sheldon, CS Post-doc • Julia Jones, Geosciences Professor • Rebecca Hutchinson, CS Post-doc • Javier Illan, PhD, Post-doc
Studying Climate Change: Lepidoptera • Why are Lepidoptera good indicators of climate change? • Past studies on Lepidoptera • Woiwod 1996: Detecting the effects of climate change on Lepidoptera • Dewar and Watt 1992: Predicted changes in the synchrony of larval emergence and budburst under climatic warming
Research Questions How is vegetation related to moth species distribution and composition? How does climate affect moth phenology?
Study Site H.J. Andrews Experimental Forest http://andrewsforest.oregonstate.edu/about.cfm?topnav=2
How is vegetation related to moth species distribution and composition?
Vegetation Surveying: Methods • Recorded GPS coordinates • Walked out to 30 m and 100 m radii in all directions • Recorded presence/absence of 71 species of known host plants
Moth Trapping: Methods • Moth Trapping • 9 sites selected • Equipment used • Moth preservation
Methods • Moth Identification
Moth Trapping Results • Semiothisa signaria • Pero occidentalis
Overview: Is vegetation a good predictor of moth species presence/absence? • Develop software tools for exploring/analyzing data • Run generalized boosted regression models (GBMs) for each moth species • Create GIS layers for the predicted locations of each moth species
Software Tasks for Data Exploration • Format data • Compare the similarities and differences between sites, moths and vegetation • Discover correlations between vegetation and moth species • Calculate marginal probabilities of plant occurrences • Visualize results
Measuring Similarity: Hamming Distance • Hamming distance is the number of covariates that differ between two sample sets • A smaller number means the sets are more similar
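A minimal sketch of the Hamming distance described above, assuming each site is encoded as an equal-length presence/absence vector. The two site vectors are made-up illustrations, not survey data.

```python
def hamming_distance(a, b):
    """Count positions where two equal-length presence/absence vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical presence/absence of five host plants at two sites.
site_a = [1, 0, 1, 1, 0]
site_b = [1, 1, 1, 0, 0]
distance = hamming_distance(site_a, site_b)  # the sites differ in two plants
```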
Marginal Probabilities • Using the vegetation data collected at 20 sites, generate marginal probabilities for plant occurrences • Example: if huckleberry (VAHU) is found at a site, what is the probability of finding thimbleberry (RUPA) but not licorice root (!LIGR) at that site?
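The huckleberry question above can be answered by counting sites. The sketch below assumes each site is stored as a set of plant codes; the function name and the five example sites are hypothetical and only illustrate the calculation.

```python
def conditional_probability(sites, given, present=(), absent=()):
    """Estimate P(present plants found, absent plants missing | given plant found)."""
    matching_given = [s for s in sites if given in s]
    if not matching_given:
        return None  # the conditioning plant was never observed
    hits = [s for s in matching_given
            if all(p in s for p in present) and all(a not in s for a in absent)]
    return len(hits) / len(matching_given)

# Hypothetical survey: each set lists the plant codes recorded at one site.
sites = [
    {"VAHU", "RUPA"},
    {"VAHU", "RUPA", "LIGR"},
    {"VAHU"},
    {"RUPA"},
    {"VAHU", "RUPA"},
]
p = conditional_probability(sites, given="VAHU", present=["RUPA"], absent=["LIGR"])
```

In this toy data four sites contain VAHU, and two of those contain RUPA without LIGR, so the estimate is 0.5.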
Canonical Correlation Analysis (CCA) • Canonical correlation analysis aims to highlight correlations between two data sets • Gives us a way of making sense of cross-covariance matrices • Allows ecologists to relate the abundance of species to environmental variables • Using CCA, we analyzed our vegetation data and moth data
X-correlation: Highlights any correlations among only moth species (422x422) Y-correlation: Highlights any correlations among only plant species (71x71) Cross-correlation: Highlights any correlations between both data sets (71x422)
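The three matrices above can be sketched with NumPy. This is not a full CCA, only the correlation matrices it builds on, and it uses small random stand-ins (6 moths, 4 plants, 20 sites) instead of the 422 moth and 71 plant species; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_moths, n_plants = 20, 6, 4   # small stand-ins for 422 moths and 71 plants
moths = rng.integers(0, 2, size=(n_sites, n_moths)).astype(float)
plants = rng.integers(0, 2, size=(n_sites, n_plants)).astype(float)

def standardize(m):
    """Center each column and scale to unit variance (constant columns become 0)."""
    centered = m - m.mean(axis=0)
    std = m.std(axis=0)
    std[std == 0] = 1.0
    return centered / std

zm, zp = standardize(moths), standardize(plants)
x_corr = zm.T @ zm / n_sites        # moth vs. moth, shape (n_moths, n_moths)
y_corr = zp.T @ zp / n_sites        # plant vs. plant, shape (n_plants, n_plants)
cross_corr = zp.T @ zm / n_sites    # plant vs. moth, shape (n_plants, n_moths)
```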
Generalized Boosted Regression Models (GBMs) • Regression analysis allows us to explore the relationships between individual moth presence/absence (dependent variable) and various characteristics of each site (independent variables) • The goal is to minimize the loss function, which represents the loss associated with an estimate being different from the true value • A basis function is an element of a set of vectors that, in linear combination, can represent every vector in a given vector space • Every function in that space can be represented as a linear combination of basis functions • Boosting is the process of iteratively adding basis functions in a greedy fashion so that each additional basis function further reduces the selected loss function • The model is run several times with different values for the tuning parameters to determine the best values
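The greedy boosting loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the model used in the study: the basis functions are one-split "stumps", the loss is squared error (a presence/absence model would more typically use a Bernoulli loss), and the data, `n_rounds` and `learning_rate` values are invented.

```python
import numpy as np

def fit_stump(X, residual):
    """Find the single-feature threshold split that best fits the residuals."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:   # drop the max so both sides are non-empty
            left = X[:, j] <= thr
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = ((residual - np.where(left, lv, rv)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)

def boost(X, y, n_rounds=50, learning_rate=0.1):
    """Greedily add stumps, each fit to the current residuals (squared-error loss)."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps, losses = [], []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
        losses.append(((y - pred) ** 2).mean())
    return base, stumps, losses

# Synthetic site characteristics and presence/absence labels (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
base, stumps, losses = boost(X, y)
```

The `losses` list records the training loss after each round; it never increases, which is the "each additional basis function further reduces the loss" property. The number of rounds and the learning rate are the kind of tuning parameters the last bullet refers to.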
Validating the GBM • All available regressors are used in the model, meaning that the choice of independent variables is not supported by theory • The standard approach to validating models is to split the data into a training and a test data set • The model is fit on the training data, then used to make predictions on the test data • This checks that the model is generalizable and not overfit
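A minimal sketch of the train/test procedure above. The 25% test fraction, the random presence/absence labels, and the trivial "model" (the training-set prevalence) are assumptions chosen only to show where fitting and scoring happen.

```python
import numpy as np

def train_test_split(n_samples, test_fraction=0.25, seed=0):
    """Return disjoint index arrays for training and testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

# Hypothetical presence/absence labels, one per trap site.
rng = np.random.default_rng(0)
presence = rng.integers(0, 2, size=256)

train_idx, test_idx = train_test_split(len(presence))
prevalence = presence[train_idx].mean()                        # fit on training data only
test_error = ((presence[test_idx] - prevalence) ** 2).mean()   # scored on held-out data
```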
Running the Model • Ran the model for individual moth species using all 256 trap sites at HJA, using moth trapping data collected from 2004 to 2008 • Did not include vegetation data, since we only collected it at 20 sites • The GBM lays a grid over the Andrews forest and calculates the predicted probability of the moth species being present for each grid cell
Thermal Climate of the H.J. Andrews Experimental Forest PRISM estimated mean monthly maximum and minimum temperature maps showing topographic effects of radiation and sky view factors. Provided by Jonathan W. Smith
Degree Day Curve • Use a linear regression model to interpolate the degree days for a given trap site on specific days of the year • Parameterize temperature so that it can later be included in the temporal model • Produce degree day curves for any trap site
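One common way to build a degree day curve from daily minimum and maximum temperatures is the simple averaging method sketched below. The slides do not state which method or base temperature was used, so the 10 °C base and the four days of temperatures are assumptions.

```python
def degree_day_curve(t_min, t_max, base=10.0):
    """Cumulative degree days from daily min/max temperatures (simple averaging method)."""
    total, curve = 0.0, []
    for lo, hi in zip(t_min, t_max):
        # Only the part of the daily mean above the base temperature accumulates.
        total += max(0.0, (lo + hi) / 2.0 - base)
        curve.append(total)
    return curve

# Hypothetical daily temperatures in degrees Celsius.
t_min = [4.0, 6.0, 9.0, 12.0]
t_max = [12.0, 16.0, 19.0, 24.0]
curve = degree_day_curve(t_min, t_max)  # [0.0, 1.0, 5.0, 13.0]
```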
Multi-Linear Regression Analysis • Find coefficients • Each Trap_ID will have two sets of coefficients (Maximum and Minimum)
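A sketch of fitting the two coefficient sets for one trap with ordinary least squares. The predictors here (temperatures at two reference stations) are a hypothetical choice, since the slides do not list the regressors, and the data are synthetic and noise-free.

```python
import numpy as np

def fit_coefficients(predictors, target):
    """Least-squares coefficients (intercept first) for target ~ predictors."""
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

rng = np.random.default_rng(2)
stations = rng.normal(15.0, 5.0, size=(60, 2))   # hypothetical reference-station temps
trap_max = 1.0 + 0.8 * stations[:, 0] + 0.1 * stations[:, 1]
trap_min = -3.0 + 0.2 * stations[:, 0] + 0.6 * stations[:, 1]

# Two coefficient sets for this trap: one for daily maximum, one for daily minimum.
coefficients = {"max": fit_coefficients(stations, trap_max),
                "min": fit_coefficients(stations, trap_min)}
```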
Predicting Daily Temp • Linear Interpolation • Fill gaps in the daily temperature data • In goes the trap_ID, start_date and end_date • Out comes the min and max temperature for the given day(s)
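Gap filling by linear interpolation can be sketched with `numpy.interp`. The helper name `fill_gaps` and the five-day series are illustrative; missing days are assumed to be marked as NaN.

```python
import numpy as np

def fill_gaps(days, temps):
    """Linearly interpolate missing (NaN) daily temperatures from neighboring days."""
    days = np.asarray(days, dtype=float)
    temps = np.asarray(temps, dtype=float)
    known = ~np.isnan(temps)
    return np.interp(days, days[known], temps[known])

days = [1, 2, 3, 4, 5]
t_max = [10.0, np.nan, np.nan, 16.0, 17.0]
filled = fill_gaps(days, t_max)  # days 2 and 3 become 12.0 and 14.0
```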
The Problem • Year-round distribution of moths • Limited observation points • Unseen, unmeasurable data • Catching probabilities • Total moth population
Example: Flight times Consider 3 trapping times (t1, t2, t3) and 4 associated intervals (I0, I1, I2, I3), and moths with flight times as follows
Example: Distribution This gives us a distribution table over the intervals I0, I1, I2, I3:
Example cont'd This gives us a distribution table … and flight counts
Example: Flight Counts When trapping moths, all we see is flight counts Given flight counts, we want to predict moth distribution
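The forward direction of the example (distribution table to flight counts) can be sketched as follows. It assumes trapping time t_i separates intervals I_(i-1) and I_i, and that every moth alive at a trapping time is counted, which ignores catching probabilities. The table values are invented, since the original table is not reproduced here.

```python
def flight_counts(table):
    """table[j][k] = moths emerging in interval I_j and dying in interval I_k.

    Returns the number of moths alive at each trapping time t_1 .. t_T.
    """
    n_intervals = len(table)
    counts = []
    for i in range(1, n_intervals):              # t_i separates I_(i-1) and I_i
        alive = sum(table[j][k]
                    for j in range(i)                  # emerged before t_i
                    for k in range(i, n_intervals))    # dies after t_i
        counts.append(alive)
    return counts

# Hypothetical distribution table for intervals I0..I3 (rows: emergence, columns: death).
table = [
    [1, 2, 0, 1],
    [0, 1, 3, 0],
    [0, 0, 2, 1],
    [0, 0, 0, 1],
]
counts = flight_counts(table)  # [3, 4, 2] for t1, t2, t3
```

Many different tables produce the same counts, which is why recovering the distribution from flight counts needs a model.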
Maximum Likelihood Model • Maximize Prob(Data | Parameters) • Data = moth trapping • Moths trapped: f = (f1, f2, …, fT) • Times trapped: t = (t1, t2, …, tT)
Maximum Likelihood Model Emergence ~ N(µE, σE) Life Span ~ N(µS, σS) Parameters = probability distribution of emergence time and life span Emergence and life span assumed to be Gaussian with parameters µE, σE, µS, σS
Moth Distribution Use distributions to calculate p(j,k), the probability of a moth emerging in interval j and dying in interval k (Figure: timeline showing intervals Ij and Ik, bounded by tj, tj+1 and tk, tk+1, with labels r, s, d.)
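One way to approximate the p(j,k) table is Monte Carlo simulation, sketched below. It assumes emergence time and life span are independent Gaussians, that death time is emergence plus life span, and that negative life spans are clipped to zero; the slides may compute p(j,k) differently (for example by integration). The trap times and parameter values are made up.

```python
import numpy as np

def interval_probabilities(trap_times, mu_e, sigma_e, mu_s, sigma_s,
                           n_draws=200_000, seed=0):
    """Monte Carlo estimate of p[j, k]: emerge in interval I_j, die in interval I_k."""
    rng = np.random.default_rng(seed)
    emergence = rng.normal(mu_e, sigma_e, n_draws)
    # Assumption: a drawn life span below zero is clipped to zero.
    life_span = np.clip(rng.normal(mu_s, sigma_s, n_draws), 0.0, None)
    death = emergence + life_span
    j = np.searchsorted(trap_times, emergence)   # interval index 0..T
    k = np.searchsorted(trap_times, death)
    n = len(trap_times) + 1
    p = np.zeros((n, n))
    np.add.at(p, (j, k), 1.0)                    # count draws per (j, k) cell
    return p / n_draws

# Hypothetical trapping days and parameters (day of year).
p = interval_probabilities(trap_times=[150.0, 180.0, 210.0],
                           mu_e=170.0, sigma_e=15.0, mu_s=20.0, sigma_s=5.0)
```

The table sums to 1, and entries below the diagonal are zero because a moth cannot die in an interval earlier than the one it emerged in.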
Probability Table Emergence Interval Death Interval
Multinomial Distribution All moths fall into one of the probability squares Moths have a multinomial distribution Approximate this with a multivariate Gaussian (or normal)
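The Gaussian approximation matches the first two moments of the multinomial: for n moths and cell probabilities p, the mean is n·p and the covariance is n·(diag(p) − p pᵀ). The sketch below computes both for an invented three-cell example; the function name is illustrative.

```python
import numpy as np

def gaussian_approximation(n_moths, probs):
    """Mean and covariance of the Gaussian matching a multinomial's first two moments."""
    probs = np.asarray(probs, dtype=float)
    mean = n_moths * probs
    cov = n_moths * (np.diag(probs) - np.outer(probs, probs))
    return mean, cov

mean, cov = gaussian_approximation(100, [0.5, 0.3, 0.2])
```

Each row of the covariance matrix sums to zero because the cell counts always add up to n, so the approximating Gaussian is degenerate along that direction.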
Approximation Error What is the error associated with this approximation? m! approximated as m! ≈ s(m) Error of
Likelihood • Parameters = {µE, σE, µS, σS}