Correlation Modeling


  1. Correlation Modeling • Find a “Response” between predictor and response (field sample) variables • Environmental Modeling • Finding a response between environmental variables and a field measurement • Examples: • Habitat maps, biomass, board feet, etc. • Also applies to: • Social issues, economic questions, transportation, engineering, public health, security, disasters, and combinations

  2. Linear Regression

  3. Correlation Modeling • Creates a model in N-Dimensional “Hyper-Space” • Models vary by: • Predictor variables • Response variables • Mathematics used to create the model • Statistics used to optimize parameters • Options for model evaluation

  4. Multiple Linear Regression

  5. Linear Regression: 2 Predictors (figure: Mathworks.com)
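A minimal sketch of fitting a two-predictor linear regression, y = b0 + b1·x1 + b2·x2, by ordinary least squares using only the Python standard library. The `ols` helper and the noise-free toy data are illustrative assumptions; in practice R's `lm()` or a numerical library would do the same job.

```python
def ols(X, y):
    """Ordinary least squares: solve the normal equations (X^T X) b = X^T y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated from y = 2 + 3*x1 - 0.5*x2 (no noise)
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 3, 2, 5, 4, 0]
y = [2 + 3 * a - 0.5 * b for a, b in zip(x1, x2)]

# Design matrix with a leading column of ones for the intercept
X = [[1.0, a, b] for a, b in zip(x1, x2)]
coef = ols(X, y)
print([round(c, 6) for c in coef])  # approximately [2.0, 3.0, -0.5]
```

Each extra predictor simply adds a column to `X`; the same solver then fits a plane (two predictors) or a hyperplane (more).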

  6. Non-Linear Regression

  7. Correlation Methods • Continuous Regression: • Linear Regression • Generalized Linear Models (GLM) • Generalized Additive Models (GAMs) • Categorical Regression (trees): • Regression Trees • Classification and regression trees (CART) • Machine Learning: • Maximum Entropy (Maxent) • NPMR, HEMI, BRTs, etc.
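Regression trees (CART) are grown by repeatedly choosing the split that most reduces the squared error within the two resulting groups. A minimal sketch of a single such split on one predictor follows; the function names and data are hypothetical, and a full tree would recurse on each side and then prune (for example with R's `rpart`).

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean, total_sse) for the best binary split."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[3]:
            best = (threshold, sum(left) / len(left), sum(right) / len(right), total)
    return best


# Hypothetical predictor (e.g., precipitation) and response (e.g., tree height)
precip = [1, 2, 3, 10, 11, 12]
height = [5, 5, 5, 20, 20, 20]
print(best_split(precip, height))  # (6.5, 5.0, 20.0, 0.0)
```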

  8. Brown Shrimp Size • Add graph from work

  9. Exponential Phenomenon
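One common way to fit an exponential relationship y = a·e^(b·x) is to take the natural log of the response and fit a straight line, because ln y = ln a + b·x. A minimal sketch under that assumption (the response must be positive); note that least squares on the log scale weights errors differently than a true non-linear fit such as R's `nls()`.

```python
import math


def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by linear regression on ln(y); requires y > 0."""
    intercept, b = fit_line(x, [math.log(v) for v in y])
    return math.exp(intercept), b


# Hypothetical data generated from y = 5 * exp(0.3 * x)
x = [0, 1, 2, 3, 4, 5]
y = [5 * math.exp(0.3 * v) for v in x]
a, b = fit_exponential(x, y)
print(round(a, 6), round(b, 6))  # 5.0 0.3
```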

  10. Brown Shrimp in GOM

  11. Spatial Modeling Process • Spreadsheets: Measurements • Predictor Layers: Temperature, Precipitation • Modeling Algorithm → Model Parameters • Map Generation → Habitat Suitability Map • Figure: Habitat Suitability Map for Purple Loosestrife by Catherine Jarnevich
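The map-generation step applies the fitted model parameters to every cell of the predictor layers. A minimal sketch, assuming a hypothetical linear model in temperature and precipitation whose output is clipped to a 0 to 100 suitability scale; real layers would be read from rasters rather than nested lists.

```python
def predict_surface(params, temp_grid, precip_grid, lo=0.0, hi=100.0):
    """Apply a hypothetical linear model cell by cell and clip to [lo, hi]."""
    b0, b_temp, b_precip = params
    return [[min(hi, max(lo, b0 + b_temp * t + b_precip * p))
             for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(temp_grid, precip_grid)]


temp = [[10, 20], [30, 50]]     # hypothetical predictor layers (2x2 cells)
precip = [[20, 40], [0, 100]]
print(predict_surface((10.0, 2.0, 0.5), temp, precip))  # [[40.0, 70.0], [70.0, 100.0]]
```

The layers must share the same extent, cell size, and cell alignment, which is what the Extract by Mask and Resample steps on slide 13 are for.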

  12. Workflow diagram • Douglas-Fir sample data + Precip → Extract → Text File → Create the Model → Model “Parameters” • Precip → To Points → Prediction → Attributes To Raster

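  13. ArcGIS Commands • Extract by Mask: crop a raster with a polygon • Copy Raster: change the raster data type • Resample: change the raster resolution • Extract Multi Values to Points: add predictor raster values (e.g., Precip) to the Douglas-Fir sample points • Raster To Points: convert a predictor raster to points for prediction • Point To Raster: convert the predicted points back to a raster • Diagram: the slide 12 workflow labeled with these tool names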
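Extract Multi Values to Points attaches the value of the raster cell under each sample point to the point's attributes. The core lookup can be sketched as below, assuming north-up rasters that share one origin (`x_min`, `y_max`) and cell size; all names are hypothetical, and the ArcGIS tool can also interpolate bilinearly, which this sketch omits.

```python
import math


def extract_values_to_points(points, rasters, x_min, y_max, cell_size):
    """Attach the value of the containing cell of each raster to each (x, y) point."""
    table = []
    for x, y in points:
        # floor (not int) so points just outside the top/left edge give index -1
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y_max - y) / cell_size)
        record = {"x": x, "y": y}
        for name, grid in rasters.items():
            inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
            record[name] = grid[row][col] if inside else None  # None = NoData
        table.append(record)
    return table


rasters = {"precip": [[100, 200], [300, 400]], "temp": [[5, 6], [7, 8]]}
samples = [(5, 15), (15, 5), (25, 5)]  # hypothetical Douglas-Fir sample locations
table = extract_values_to_points(samples, rasters, x_min=0, y_max=20, cell_size=10)
for record in table:
    print(record)
```

The resulting table is what gets saved as the text file for modeling.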

  14. Sample Data • Original: • Occurrences (Presence) • Measured value: continuous or categorical • Date & Time • Uncertainty • Processed (aggregated): • Min, Max, Mean, Std. Dev., Range • “Filtered”

  15. Aggregating Sample Data • Occurrences to Density • Gridded? • Height or average height?
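Aggregating point samples to a grid turns occurrences into a density and individual measurements into per-cell summaries. A minimal sketch, assuming projected coordinates in the same units as the hypothetical `cell_size`, that reports the statistics listed on slide 14 for each cell.

```python
import math
import statistics
from collections import defaultdict


def aggregate_to_grid(samples, cell_size):
    """Group (x, y, value) samples by grid cell and summarize each cell."""
    cells = defaultdict(list)
    for x, y, value in samples:
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(value)
    summary = {}
    for key, vals in cells.items():
        summary[key] = {
            "count": len(vals),
            "density": len(vals) / cell_size ** 2,  # occurrences per unit area
            "min": min(vals),
            "max": max(vals),
            "mean": statistics.mean(vals),
            "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
            "range": max(vals) - min(vals),
        }
    return summary


# Hypothetical tree heights (m) at projected coordinates, gridded to 10-unit cells
trees = [(1, 1, 30.0), (2, 3, 40.0), (12, 1, 55.0)]
for cell, stats in aggregate_to_grid(trees, 10).items():
    print(cell, stats)
```

Whether to model "height" or "mean height per cell" is then a choice about which column of this summary becomes the response.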

  16. Doug-Fir Height vs. Precip.

  17. Douglas Fir Height

  18. Predictor Variables • Distance to: water, roads, cities? • Temperature, precipitation • Elevation, aspect, slope, absolute aspect • Soil types • Other species? • Distance to humans? • Census factors: income, age, etc.

  19. Predictor Layers • Means, mins, maxes • Range of values • Heterogeneity • Spatial layers: • Distance to… • Topography: elevation, slope, aspect
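A "distance to…" predictor layer gives each cell the distance to the nearest feature (water, road, city). A brute-force sketch over a boolean feature grid, measuring center to center; it is similar in spirit to the Spatial Analyst Euclidean Distance tool, but the function and data here are hypothetical and too slow for large rasters.

```python
import math


def distance_to_features(feature_mask, cell_size):
    """Euclidean distance from each cell center to the nearest feature cell.

    feature_mask: 2-D list of booleans, True where the feature (e.g., a road) is.
    Raises ValueError if the mask contains no feature cells.
    """
    features = [(r, c) for r, row in enumerate(feature_mask)
                for c, is_feature in enumerate(row) if is_feature]
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in features) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(feature_mask)]


# Hypothetical 3x3 grid with a single feature cell in the upper-left corner
mask = [[True, False, False], [False, False, False], [False, False, False]]
for row in distance_to_features(mask, cell_size=10):
    print([round(d, 1) for d in row])
```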

  20. Characterizing Uncertainty • Where did the data come from? • What process has it gone through? • Collection methods • Equipment • Protocol • Processing • Transcription errors • Investigate to develop uncertainty estimates: • Documentation, contact those involved http://museum.sdsmt.edu

  21. Data Qualification • What is the nature of the data? • Is the data good enough for the task? • Data: • Samples of the phenomenon we are going to predict (i.e. the response variable) • Predictor variables • Tools: • Plotting: Scattergrams, histograms • Mapping: Visual inspection • Analysis: Lots!

  22. Gross Errors • Lat/Lon: • Reversed • Zeros, names, dates, etc. in coordinate fields • Dates: • Extended in databases • Measurements: • Inconsistent units • Inconsistent protocols • What can you expect from a field team?
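Several of these gross coordinate errors can be caught with simple rules. A minimal sketch, assuming a hypothetical study-area bounding box: non-numeric values, zeros, out-of-range values, and pairs that only fall inside the study area once latitude and longitude are swapped.

```python
def flag_coordinates(lat, lon, bbox=None):
    """Return a list of suspected gross errors for one occurrence record.

    bbox is an optional study-area box (min_lat, max_lat, min_lon, max_lon).
    """
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return ["non-numeric"]  # names, dates, or blanks in coordinate fields
    flags = []
    if lat == 0 or lon == 0:
        flags.append("zero coordinate")  # often a placeholder for "unknown"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("out of range")
    if bbox is not None:
        min_lat, max_lat, min_lon, max_lon = bbox

        def inside(la, lo):
            return min_lat <= la <= max_lat and min_lon <= lo <= max_lon

        if not inside(lat, lon):
            # If swapping the pair lands inside the study area, it was likely reversed
            flags.append("possibly reversed" if inside(lon, lat) else "outside study area")
    return flags


conus = (24, 50, -125, -66)  # rough CONUS bounding box (assumed)
print(flag_coordinates(45.5, -122.7, conus))  # []
print(flag_coordinates(-122.7, 45.5, conus))  # ['out of range', 'possibly reversed']
print(flag_coordinates("Portland", -122.7))   # ['non-numeric']
```

Flagged records still need a human look; a rule like this only narrows down where to look.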

  23. Occurrences of Polar Bears from the Global Biodiversity Information Facility (www.gbif.org, 2011)

  24. Temporal Issues • Divide data into months, seasons, years, decades. • Consistent between predictors and response • Extract predictors as close to sample location and dates as possible • Use the “best” predictor layers
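Matching predictors to samples in time can start with two small helpers: one that bins a sample date into a season and one that picks the predictor layer dated closest to the sample. A sketch assuming Northern Hemisphere meteorological seasons and hypothetical mid-month layer dates.

```python
from datetime import date


def season(d):
    """Meteorological season (Northern Hemisphere): DJF, MAM, JJA, SON."""
    return ("DJF", "MAM", "JJA", "SON")[(d.month % 12) // 3]


def nearest_layer(sample_date, layer_dates):
    """Pick the predictor layer whose date is closest to the sample date."""
    return min(layer_dates, key=lambda d: abs((d - sample_date).days))


# Hypothetical monthly predictor layers (e.g., mid-month temperature grids)
layers = [date(2011, 6, 15), date(2011, 7, 15), date(2011, 8, 15)]
sample = date(2011, 7, 20)
print(season(sample), nearest_layer(sample, layers))  # JJA 2011-07-15
```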

  25. Samples and Predictors • As close to field measurements as possible • Clean and aggregate data as needed • Document as you go • Estimate overall uncertainty • Answer the question: • At what spatial, temporal, and measurement scales is it appropriate to model, given the data?

  26. What’s the Impact on Models?

  27. Basic Tools • Histograms: What is the distribution of occurrences of values (range and shape) • Scattergrams: What is the relationship between response and predictor variables and between predictor variables • QQPlots: Are the residuals normally distributed?
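The numbers behind a QQ plot are easy to compute: sort the standardized residuals and pair each with the standard normal quantile at its plotting position. A minimal sketch using `statistics.NormalDist` (Python 3.8+); the (i − 0.5)/n plotting position is one common convention among several. Points lying near the 1:1 line suggest approximately normal residuals.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each sorted, standardized residual with its theoretical normal quantile."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


for t, s in qq_points([-2, -1, 0, 1, 2]):  # hypothetical residuals
    print(f"{t:+.3f}  {s:+.3f}")
```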

  28. CONUS Annual Precip.

  29. Predictor Variables

  30. Min Temp of Coldest Month

  31. Histograms • R: hist(Temp, breaks=400)
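For readers scripting in Python, a rough equivalent of the R call is to count values in equal-width bins. This is a sketch only: R treats a single number for `breaks` as a suggestion and rounds the breakpoints to "pretty" values, whereas this helper (hypothetical name) uses exactly `n_bins` equal-width bins.

```python
def histogram(values, n_bins):
    """Equal-width bin counts; returns (bin edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-identical values
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value falls in the last bin rather than past it
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


temps = [0, 1, 1, 2, 3, 4]  # hypothetical temperature samples
edges, counts = histogram(temps, 4)
print(counts)  # [1, 2, 1, 2]
```

With a very large bin count such as 400, gaps and spikes in the counts reveal rounding, default values, or unit mix-ups in the data.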

  32. Model Optimization & Selection • Modeling approach • Predictor Selection • Parameter estimation • Validation: • Against sub-sample of data • Against new dataset • Parameter sensitivity • Uncertainty estimation

  33. Model Approach • Model Selection: • There are many different modeling methods, and some methods have many options • Run a wide variety and select the one with the best (lowest) AIC/AICc
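For least-squares models with Gaussian errors, AIC can be computed from the residual sum of squares (up to an additive constant that cancels when comparing models), and AICc adds a small-sample correction. A sketch with hypothetical candidate results; scores are only comparable between models fitted to the same response data, and `k` must be counted the same way for every candidate.

```python
import math


def aicc_least_squares(rss, n, k):
    """AICc for a least-squares model with Gaussian errors (up to a constant).

    rss: residual sum of squares, n: number of samples, k: estimated parameters.
    """
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical candidate models fitted to the same 30 samples: (RSS, parameters)
candidates = {"linear": (12.0, 2), "cubic": (11.5, 4), "flexible": (11.0, 8)}
scores = {name: aicc_least_squares(rss, 30, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # linear
```

The more flexible models fit slightly better, but not by enough to pay for their extra parameters.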

  34. Predictor Selection • Which predictors are the most important? • Jackknifing • Remove each predictor, rerun model • All combinations of possible predictors?
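Both ideas can be scripted around a single fitting function: jackknifing drops one predictor at a time, and an exhaustive search scores every subset. This sketch uses linear least squares with AICc as the score, which is an assumption for illustration (Maxent's built-in jackknife uses its own gain measure); the data, names, and noise level are hypothetical.

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yk - sum(b * xk for b, xk in zip(coef, row))) ** 2
               for row, yk in zip(X, y))


def aicc(rss, n, k):
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def score(data, y, names):
    """AICc of a model with an intercept plus the named predictors."""
    X = [[1.0] + [data[name][i] for name in names] for i in range(len(y))]
    return aicc(ols_rss(X, y), len(y), len(names) + 1)


# Hypothetical data: y depends on a and c; b is unrelated noise
rng = random.Random(1)
n = 30
data = {name: [rng.uniform(0, 10) for _ in range(n)] for name in ("a", "b", "c")}
y = [1 + 2 * data["a"][i] + 3 * data["c"][i] + rng.gauss(0, 0.5) for i in range(n)]
predictors = list(data)

# Jackknife: drop each predictor in turn; a large increase in AICc means it matters
full = score(data, y, predictors)
for name in predictors:
    reduced = score(data, y, [p for p in predictors if p != name])
    print(f"without {name}: delta AICc = {reduced - full:+.1f}")

# All combinations: every non-empty subset of predictors, ranked by AICc
subsets = [s for k in range(1, len(predictors) + 1)
           for s in itertools.combinations(predictors, k)]
best = min(subsets, key=lambda s: score(data, y, list(s)))
print("best subset:", best)
```

The exhaustive search grows as 2^p − 1 subsets, so it is practical only for modest numbers of candidate predictors.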

  35. Parameter Estimation • Most methods estimate the model parameters for us • What can we modify to see its effect on parameter estimation? • Data set • Maxent: Regularization parameter

  36. Validating Against Samples (Cross-Validation) • Optimal: • Completely separate dataset • Training/test: • Build the model with 70% of the samples (training) • Randomly sampled • Test against the remaining 30% • Bootstrapping: • Remove samples randomly, model, repeat • Examine how well the model predicts the removed values
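The 70/30 training/test split can be sketched as below: shuffle the sample indices, fit on the first 70%, and report the root-mean-square error on the held-out 30%. Repeating with different random seeds shows how much the test error depends on which samples were held out. The simple linear model and the data are hypothetical stand-ins for whatever model is being validated.

```python
import math
import random


def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def holdout_rmse(x, y, train_frac=0.7, seed=0):
    """Fit on a random train_frac of the samples; return RMSE on the rest."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    cut = round(len(idx) * train_frac)
    train, test = idx[:cut], idx[cut:]
    b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
    errors = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in test]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical samples: height roughly linear in precipitation, plus noise
rng = random.Random(42)
precip = [rng.uniform(500, 2500) for _ in range(50)]
height = [10 + 0.02 * p + rng.gauss(0, 3) for p in precip]

# Repeat the random split to see how stable the test error is
scores = [holdout_rmse(precip, height, seed=s) for s in range(20)]
print(f"test RMSE: mean {sum(scores) / len(scores):.2f}, "
      f"min {min(scores):.2f}, max {max(scores):.2f}")
```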

  37. Parameter Sensitivity • Basic: • Modify parameters to expected bounds and re-run model • More: • Modify parameters based on statistical distribution and rerun repeatedly
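The basic approach is one-at-a-time sensitivity: set each parameter to its lower and upper expected bound while holding the others at their fitted values, rerun the model, and compare the outputs. A sketch with a hypothetical linear model and hypothetical bounds.

```python
def one_at_a_time(model, params, bounds, x):
    """Re-run `model` with each parameter at its lower and upper bound, others fixed."""
    results = {"baseline": model(x, **params)}
    for name, (low, high) in bounds.items():
        results[name] = (model(x, **{**params, name: low}),
                         model(x, **{**params, name: high}))
    return results


def linear_model(x, b0, b1):
    # Hypothetical fitted model: response = b0 + b1 * predictor
    return b0 + b1 * x


res = one_at_a_time(linear_model, {"b0": 1.0, "b1": 2.0},
                    {"b0": (0.5, 1.5), "b1": (1.5, 2.5)}, x=10.0)
print(res)  # {'baseline': 21.0, 'b0': (20.5, 21.5), 'b1': (16.0, 26.0)}
```

Here the output is far more sensitive to the slope than to the intercept at x = 10. The "more" approach replaces the two bounds with many draws from each parameter's distribution, which is a Monte Carlo run.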

  38. Uncertainty Estimation • Document all potential uncertainties • If we know (or can guess at) the uncertainty in the sample data or predictors, we can estimate the uncertainty in the outputs: • “Jiggle” the sample data and/or predictors and re-run the model to see the effect

  39. Monte Carlo Methods • Run models repeatedly changing samples, predictor variables, or model options • Provides insight into: • Uncertainty effects • Model sensitivity / robustness • Parameter estimation • Validation (sub-sampling)
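"Jiggling" the sample data can be sketched as a Monte Carlo loop: add random noise matching an assumed measurement uncertainty to each response value, refit, and look at the spread of the fitted parameters. The Gaussian noise, its standard deviation, and the data below are assumptions for illustration.

```python
import random
import statistics


def fit_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def monte_carlo_slopes(x, y, sd, runs=500, seed=0):
    """Add N(0, sd) noise to each response value, refit, and collect the slopes."""
    rng = random.Random(seed)
    return [fit_slope(x, [v + rng.gauss(0, sd) for v in y]) for _ in range(runs)]


x = list(range(20))
y = [3 + 2 * v for v in x]  # hypothetical samples
slopes = monte_carlo_slopes(x, y, sd=0.5)
print(f"slope: mean {statistics.mean(slopes):.3f}, sd {statistics.stdev(slopes):.3f}")
```

The standard deviation of the collected slopes is an estimate of how much the assumed measurement uncertainty propagates into the model parameter; the same loop can jiggle predictors or resample which points are used.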

  40. Programming is Important • Python or R: • Subset sample data in different ways • Randomly • Select different predictors • Reject one, all combinations • Select models and options • ? • Repeat

  41. Additional Slides

  42. Process • Query Database → Field Data (Points) • Grid? Yes: convert to raster, then convert to points; No: keep the points • Add Predictor Values from the Predictor Rasters (ArcMap: Extract From Raster) • Save to CSV • Analysis, Modeling in R • Predictor Rasters → To Points in ArcMap → Predict in R • Point to Raster in ArcMap → Predicted Surface

  43. Digital Elevation Model (DEM)
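Topographic predictors such as slope are derived from the DEM with a moving window. A simplified sketch using central differences on interior cells; ArcGIS's Slope tool uses a weighted 3×3 (Horn) method instead, and aspect is omitted here. The DEM values are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope (degrees) at interior cells using central differences; edges stay None."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# Hypothetical 3x3 DEM (m, 10 m cells) rising 10 m per cell eastward: a 45-degree plane
dem = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
print(slope_degrees(dem, 10)[1][1])
```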

  44. Ready? • Table of points, polylines, or polygons • Spatial data • Measured values • Predictor layers • Add predictor values to table • Time to model!
