Correlation Modeling • Find a “Response” between predictor and response (field sample) variables • Environmental Modeling • Finding a response between environmental variables and a field measurement • Examples: • Habitat maps, biomass, board feet, etc. • Also applies to: • Social issues, economic questions, transportation, engineering, public health, security, disasters, and combinations
Correlation Modeling • Creates a model in N-dimensional “hyper-space” • Models vary by: • Predictor variables • Response variables • Mathematics used to create the model • Statistics used to optimize parameters • Options for model evaluation
Linear Regression: 2 Predictors (figure: Mathworks.com)
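With two predictors, linear regression fits a plane y = b0 + b1*x1 + b2*x2 through the samples. A minimal sketch, assuming ordinary least squares solved through the normal equations (X^T X) b = X^T y; the helper names `solve3` and `fit_plane` and the data are hypothetical, and in practice R's `lm()` or a statistics library would do this more robustly.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_plane(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve3(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 3, 2, 5, 4]
y = [1 + 2 * a - 0.5 * b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_plane(x1, x2, y)  # recovers 1, 2, -0.5 up to floating-point rounding
```

Because the example data lie exactly on a plane, the fit recovers the generating coefficients; with field data the residuals would be nonzero.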
Correlation Methods • Continuous Regression: • Linear Regression • Generalized Linear Models (GLM) • Generalized Additive Models (GAMs) • Categorical Regression (trees): • Regression Trees • Classification and regression trees (CART) • Machine Learning: • Maximum Entropy (Maxent) • NPMR, HEMI, BRTs, etc.
Brown Shrimp Size • Add graph from work
Spatial Modeling Process • [Figure: measurements (spreadsheets) and predictor layers (temperature, precipitation) feed a modeling algorithm; the resulting model parameters drive map generation, producing a habitat suitability map. Example: Habitat Suitability Map for Purple Loosestrife by Catherine Jarnevich]
ArcGIS Commands • Extract by Mask: Crop raster with polygon • Copy Raster: Change raster data type • Resample: Change raster resolution • [Figure: workflow from Douglas-Fir sample data and a precipitation (Precip) raster to a prediction raster, using Extract Multi Values to Points, a text file of attributes, Create the Model, Model “Parameters”, Raster To Points, Prediction, and Point To Raster]
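The “Extract Multi Values to Points” step attaches each predictor raster's cell value to every sample point. Outside ArcGIS, the core idea can be sketched as a cell lookup. This sketch assumes a north-up raster described by its upper-left corner (`x_min`, `y_max`) and a square `cell_size`; the function `extract_value` and the small precipitation grid are hypothetical.

```python
def extract_value(grid, x_min, y_max, cell_size, x, y, nodata=None):
    """Return the value of the raster cell containing point (x, y).

    grid is a list of rows with row 0 at the top (north-up raster).
    Points that fall outside the raster return `nodata`.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return nodata


# Hypothetical 3x3 precipitation raster (mm), 10 m cells, upper-left corner at (500, 1000)
precip = [
    [820, 830, 845],
    [790, 805, 815],
    [760, 770, 780],
]
samples = [(505.0, 995.0), (525.0, 975.0), (540.0, 1000.0)]  # (x, y) sample locations
# One table row per sample: coordinates plus the extracted predictor value
table = [(x, y, extract_value(precip, 500.0, 1000.0, 10.0, x, y)) for x, y in samples]
```

The third sample lies east of the raster, so it gets `None`; such rows usually need to be dropped or investigated before modeling.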
Sample Data • Original: • Occurrences (Presence) • Measured value: continuous or categorical • Date & Time • Uncertainty • Processed (aggregated): • Min, Max, Mean, Std. Dev., Range • “Filtered”
Aggregating Sample Data • Occurrences to Density • Gridded? • Height or average height?
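Converting occurrences to a gridded density, and point measurements such as tree height to a per-cell average, can be sketched as binning points into square cells. The function `aggregate_to_grid`, the 100 m cell size, and the heights below are hypothetical.

```python
from collections import defaultdict


def aggregate_to_grid(points, cell_size):
    """Bin (x, y, value) points into square cells.

    Returns {(col, row): (count, mean_value)}: count is the number of
    occurrences in the cell (density per cell) and mean_value is the
    average of the measured values in that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += value
        counts[cell] += 1
    return {cell: (counts[cell], sums[cell] / counts[cell]) for cell in counts}


# Hypothetical tree heights (m) at (x, y) locations, aggregated to 100 m cells
trees = [(10, 20, 30.0), (50, 80, 34.0), (150, 20, 25.0), (160, 90, 27.0), (170, 40, 29.0)]
cells = aggregate_to_grid(trees, 100)
```

Whether to keep the count, the mean, or both depends on the question: the count answers “how many?”, the mean answers “height or average height?”.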
Predictor Variables • Distance to: water, roads, cities? • Temperature, precipitation • Elevation, aspect, slope, absolute aspect • Soil types • Other species? • Distance to humans? • Census factors: income, age, etc.
Predictor Layers • Means, mins, maxes • Range of values • Heterogeneity • Spatial layers: • Distance to… • Topography: elevation, slope, aspect
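Topographic predictors such as slope are derived from an elevation raster. A minimal sketch, assuming a square-celled DEM and simple central differences on interior cells; this is simpler than the 3x3 weighted methods GIS slope tools typically use, so values will differ somewhat from ArcGIS output. The function name and DEM are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope (degrees) for interior cells of a DEM using central differences.

    dem is a list of rows of elevations; edge cells are left as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            # Steepest gradient magnitude converted to an angle
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# Hypothetical 3x3 DEM (m) with 10 m cells: elevation rises 10 m per cell to the east
dem = [
    [100, 110, 120],
    [100, 110, 120],
    [100, 110, 120],
]
slopes = slope_degrees(dem, 10.0)  # center cell: rise of 1 m per m, i.e. 45 degrees
```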
Characterizing Uncertainty • Where did the data come from? • What process has it gone through? • Collection methods • Equipment • Protocol • Processing • Transcription errors • Investigate to develop uncertainty estimates: • Documentation, contact those involved http://museum.sdsmt.edu
Data Qualification • What is the nature of the data? • Is the data good enough for the task? • Data: • Samples of the phenomenon we are going to predict (i.e. the response variable) • Predictor variables • Tools: • Plotting: Scattergrams, histograms • Mapping: Visual inspection • Analysis: Lots!
Gross Errors • Lat/Lon: • Reversed • Zeros, names, dates, etc. in coordinate fields • Dates: • Extended in databases • Measurements: • Inconsistent units • Inconsistent protocols • What can you expect from a field team?
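Some of these gross errors can be flagged automatically before modeling. A minimal sketch with hypothetical field names (`lat`, `lon`) and rules: non-numeric entries, exact (0, 0) positions, values outside valid ranges, and a latitude that would only be valid if swapped with the longitude. A swap where both values fall within ±90 cannot be detected this way and still needs a map check.

```python
def flag_coordinate_errors(record):
    """Return a list of problems found in a record's 'lat' and 'lon' fields."""
    problems = []
    try:
        lat = float(record["lat"])
        lon = float(record["lon"])
    except (TypeError, ValueError):
        return ["non-numeric coordinate"]
    if lat == 0 and lon == 0:
        problems.append("zero coordinates")
    if not -90 <= lat <= 90:
        # Invalid latitude; if the pair works when swapped, suspect reversal
        if -90 <= lon <= 90 and -180 <= lat <= 180:
            problems.append("lat/lon possibly reversed")
        else:
            problems.append("latitude out of range")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


records = [
    {"lat": "44.08", "lon": "-103.23"},   # plausible
    {"lat": "-103.23", "lon": "44.08"},   # swapped
    {"lat": "0", "lon": "0"},             # placeholder zeros
    {"lat": "unknown", "lon": ""},        # text in a coordinate field
]
flags = [flag_coordinate_errors(r) for r in records]
```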
Occurrences of Polar Bears From The Global Biodiversity Information Facility (www.gbif.org, 2011)
Temporal Issues • Divide data into months, seasons, years, decades. • Consistent between predictors and response • Extract predictors as close to sample location and dates as possible • Use the “best” predictor layers
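Dividing samples into months, seasons, or years is mostly bookkeeping on the sample dates, and the same keys can then be used to pick matching predictor layers. A minimal sketch, assuming Northern Hemisphere meteorological seasons (December through February is winter) and hypothetical sample dates; December is assigned to the following year's winter so each winter stays in one group.

```python
from collections import defaultdict
from datetime import date

SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}


def season_key(d):
    """Return (season_year, season) so December joins the next year's winter."""
    year = d.year + 1 if d.month == 12 else d.year
    return year, SEASONS[d.month]


samples = [date(2010, 12, 15), date(2011, 1, 20), date(2011, 7, 4), date(2011, 8, 30)]
groups = defaultdict(list)
for d in samples:
    groups[season_key(d)].append(d)
```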
Samples and Predictors • As close to field measurements as possible • Clean and aggregate data as needed • Document as you go • Estimate overall uncertainty • Answer the question: • At what spatial, temporal, and measurement scales is it appropriate to model, given the data?
Basic Tools • Histograms: What is the distribution of values (range and shape)? • Scattergrams: What is the relationship between the response and predictor variables, and among the predictor variables? • QQPlots: Are the residuals normally distributed?
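A QQ plot pairs each sorted residual with the quantile a normal distribution would place at the same rank; points close to a straight line suggest normally distributed residuals. A minimal sketch using the standard library's `statistics.NormalDist`, assuming plotting positions (i - 0.5)/n (one common choice among several) and hypothetical residuals; the pairs would then be drawn as a scatterplot.

```python
from statistics import NormalDist, mean, stdev


def qq_pairs(residuals):
    """Pair sorted residuals with theoretical normal quantiles.

    Uses plotting positions (i - 0.5) / n and a normal distribution with
    the residuals' own mean and standard deviation.
    """
    n = len(residuals)
    dist = NormalDist(mean(residuals), stdev(residuals))
    observed = sorted(residuals)
    expected = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


residuals = [-1.2, 0.3, 0.8, -0.4, 0.1, 1.5, -0.9, 0.2]
pairs = qq_pairs(residuals)  # (expected, observed) points for the QQ plot
```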
Histograms hist(Temp,breaks=400)
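For readers working in Python rather than R, the counting behind `hist(Temp, breaks=400)` can be sketched with equal-width bins. R treats a single `breaks` number as a suggestion and chooses “pretty” break points, so this sketch, with a hypothetical `temps` list and 4 bins, will not reproduce R's bins exactly.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min(values) to max(values).

    Returns (edges, counts); the maximum value is counted in the last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, bins - 1)] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


temps = [11.0, 12.5, 13.0, 14.2, 15.8, 16.1, 18.9, 19.0]
edges, counts = histogram(temps, 4)
```

Many bins (such as 400) reveal fine structure like rounding or duplicated values, while few bins show the overall shape.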
Model Optimization & Selection • Modeling approach • Predictor Selection • Parameter estimation • Validation: • Against sub-sample of data • Against new dataset • Parameter sensitivity • Uncertainty estimation
Model Approach • Model Selection: • There are many different modeling methods, and some methods have many options • Run a wide variety and select the one with the best (lowest) AIC/AICc
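For a model fit by maximum likelihood with k estimated parameters and n samples, AIC = 2k - 2 ln L, and the small-sample correction is AICc = AIC + 2k(k + 1)/(n - k - 1); lower is better. A minimal sketch, assuming least-squares models with Gaussian errors so ln L can be computed from the residual sum of squares (RSS), with k counting the coefficients, intercept, and error variance. The candidate names and RSS values are hypothetical.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a least-squares model with Gaussian errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(log_lik, k):
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction; requires n > k + 1."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


n = 30
# Hypothetical candidates: (name, residual sum of squares, number of parameters k)
candidates = [("temp only", 52.0, 3), ("temp + precip", 41.0, 4), ("all 6 predictors", 39.5, 8)]
scores = {name: aicc(gaussian_log_likelihood(rss, n), k, n) for name, rss, k in candidates}
best = min(scores, key=scores.get)
```

Here the 6-predictor model fits slightly better (lower RSS) but pays a larger penalty for its extra parameters, so the 2-predictor model scores best.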
Predictor Selection • Which predictors are the most important? • Jackknifing: • Remove each predictor, rerun model • All combinations of possible predictors?
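The remove-one-predictor jackknife can be sketched as: fit the model with all predictors, then refit with each predictor dropped in turn and record how much the fit degrades. This sketch assumes ordinary least squares and uses the drop in R² as the importance measure (Maxent reports its own jackknife statistics instead); the helper names `solve`, `r_squared`, and `jackknife` and the data are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def jackknife(predictors, y):
    """Drop in R^2 when each named predictor is removed from the full model."""
    full = r_squared(list(predictors.values()), y)
    return {name: full - r_squared([v for k, v in predictors.items() if k != name], y)
            for name in predictors}


# Hypothetical data: the response depends strongly on temp, weakly on precip, not on slope
preds = {
    "temp": [10, 12, 14, 16, 18, 20, 22, 24],
    "precip": [5, 3, 6, 2, 7, 4, 8, 1],
    "slope": [1, 4, 2, 5, 3, 1, 2, 4],
}
noise = [0.5, -0.4, 0.2, -0.1, 0.3, -0.5, 0.1, -0.2]
y = [2 * t + 0.3 * p + e for t, p, e in zip(preds["temp"], preds["precip"], noise)]
drops = jackknife(preds, y)
```

Removing a predictor can never raise the R² of a least-squares fit, so every drop is zero or positive; the largest drop marks the most important predictor.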
Parameter Estimation • Most methods estimate the parameters of the model for us • What can we modify to see the effect on parameter estimation? • Data set • Maxent: Regularization parameter
Validating Against Samples (Cross-Validation) • Optimal: • Completely separate dataset • Training/test: • Build the model with 70% of the samples (training set) • Randomly sampled • Test against the remaining 30% • Bootstrapping: • Remove samples randomly, model, repeat • Examine how well the model predicts the removed values
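A 70/30 training/test split can be sketched with the standard `random` module; the fixed seed makes the split reproducible. The function name and sample IDs are hypothetical. Repeating the split with different seeds and averaging the test error is one simple way to carry out the “remove samples randomly, model, repeat” loop.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=None):
    """Randomly split samples into (training, test) lists without replacement."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


sample_ids = list(range(1, 21))  # 20 hypothetical field samples
train, test = train_test_split(sample_ids, 0.7, seed=42)
```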
Parameter Sensitivity • Basic: • Modify parameters to expected bounds and re-run model • More: • Modify parameters based on statistical distribution and rerun repeatedly
Uncertainty Estimation • Document all potential uncertainties • If we know (or can guess at) the uncertainty in the sample data or predictors, we can estimate the uncertainty in the outputs: • “Jiggle” the sample data and/or predictors and re-run the model to see the effect
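The “jiggle” idea can be sketched as a Monte Carlo loop: perturb the sample values by an assumed measurement error, refit, and look at the spread of the resulting parameter. This sketch assumes Gaussian errors with a standard deviation of 0.5 m on the response and uses a one-predictor least-squares slope as the “model”; the data, error size, and function names are hypothetical.

```python
import random
from statistics import mean, stdev


def ls_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def jiggle_slopes(x, y, error_sd, runs=1000, seed=1):
    """Refit the slope after adding Gaussian noise (sd = error_sd) to each response value."""
    rng = random.Random(seed)
    return [ls_slope(x, [v + rng.gauss(0, error_sd) for v in y]) for _ in range(runs)]


elevation = [100, 200, 300, 400, 500, 600]     # hypothetical predictor (m)
height = [12.1, 13.8, 16.2, 17.9, 20.1, 21.8]  # hypothetical response (m)
slopes = jiggle_slopes(elevation, height, error_sd=0.5)
print(f"slope = {mean(slopes):.4f} +/- {stdev(slopes):.4f} m of height per m of elevation")
```

The same loop structure works for perturbing predictors, subsampling the data, or changing model options; only the body of the list comprehension changes.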
Monte Carlo Methods • Run models repeatedly changing samples, predictor variables, or model options • Provides insight into: • Uncertainty effects • Model sensitivity / robustness • Parameter estimation • Validation (sub-sampling)
Programming is Important • Python or R: • Subset sample data in different ways • Randomly • Select different predictors • Reject one, all combinations • Select models and options • ? • Repeat
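Rejecting one predictor at a time, or trying all combinations, is easy to script. A minimal sketch with `itertools.combinations`, using hypothetical predictor names; each subset would then be passed to whatever model-fitting function is in use (not shown here).

```python
from itertools import combinations


def predictor_subsets(names, min_size=1):
    """Yield every subset of predictor names with at least min_size members."""
    for size in range(min_size, len(names) + 1):
        yield from combinations(names, size)


def leave_one_out_sets(names):
    """Yield (rejected predictor, remaining predictors) for each predictor."""
    for dropped in names:
        yield dropped, tuple(n for n in names if n != dropped)


predictors = ["temp", "precip", "elevation", "dist_to_water"]
all_subsets = list(predictor_subsets(predictors))  # 2**4 - 1 = 15 non-empty subsets
drop_one = list(leave_one_out_sets(predictors))    # 4 leave-one-out subsets
```

The number of non-empty subsets grows as 2^p - 1 for p predictors, so exhaustive search becomes expensive quickly; rejecting one predictor at a time needs only p extra model runs.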
Process • [Flowchart steps: Query Database; Field Data (Points); Grid? (Yes/No: Convert to raster, Convert to points); Add Predictor Values from Predictor Rasters (ArcMap: Extract From Raster); Save to CSV; Analysis, Modeling in R; Predictor Rasters To Points in ArcMap; Predict in R; Point to Raster in ArcMap; Predicted Surface]
Ready? • Table of points, polylines, or polygons • Spatial data • Measured values • Predictor layers • Add predictor values to table • Time to model!