
Predicting Median Substrate

Predicting Median Substrate for Oregon and Washington EMAP Sites Utilizing GIS Data. Julia J. Smith, December 12, 2005.


Presentation Transcript


  1. Predicting Median Substrate for Oregon and Washington EMAP sites Utilizing GIS data Julia J. Smith December 12, 2005

  2. Why Predict Median Substrate? Indicator of overall stream health • Bed load transport • Stream power • Macroinvertebrate habitat • Fish habitat • How human development is affecting a stream

  3. What is LD50? LD50 is a measure of median substrate. • Geometric mean of class boundaries • Log10 of the geometric means • Several samples at each site • LD50 is the median value of log10(geometric mean of class)
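The definition above can be sketched in a few lines of Python. The size classes and their boundaries (in mm) are illustrative placeholders, not the EMAP classification used in the talk, and `CLASS_BOUNDS_MM`, `log10_geometric_mean`, and `ld50` are hypothetical names chosen for this sketch.

```python
import math
import statistics

# Hypothetical size classes: name -> (lower, upper) diameter bounds in mm.
# These values are placeholders for illustration only.
CLASS_BOUNDS_MM = {
    "sand": (0.06, 2.0),
    "fine_gravel": (2.0, 16.0),
    "coarse_gravel": (16.0, 64.0),
    "cobble": (64.0, 250.0),
}

def log10_geometric_mean(lower, upper):
    """Log10 of the geometric mean of a class's two boundaries."""
    return math.log10(math.sqrt(lower * upper))

def ld50(observed_classes):
    """Median of log10(geometric mean of class) over all samples at a site."""
    values = [log10_geometric_mean(*CLASS_BOUNDS_MM[c]) for c in observed_classes]
    return statistics.median(values)

# Several samples taken at one (made-up) site.
site_samples = ["sand", "fine_gravel", "fine_gravel", "cobble", "coarse_gravel"]
site_ld50 = ld50(site_samples)
```

Because each sample can only take one of a few class values, LD50 computed this way is effectively ordinal, which is relevant to the residual problems discussed on later slides.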

  4. Substrate Classifications

  5. Geomorphic Metrics • τ is the total bank-full shear stress • ρ_s is the density of sediment • ρ is fluid density • g is gravitational acceleration • h is bank-full depth • S is channel slope
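The equation on this slide did not survive in the transcript. A standard relation that uses the listed quantities is the depth-slope product for bank-full shear stress, with τ the shear stress, ρ the fluid density, g gravitational acceleration, h bank-full depth, and S channel slope. Whether the slide used exactly this form is an assumption. The sediment density ρ_s would typically enter through a Shields-type estimate of the grain size the flow can move; the critical Shields parameter θ_c and the grain size D in the second line are symbols introduced here for illustration only.

```latex
\tau = \rho \, g \, h \, S
\qquad\text{and, as an assumed companion relation,}\qquad
D \approx \frac{\tau}{\theta_c \, (\rho_s - \rho) \, g}
```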

  6. Geomorphic Metrics Distance-weighted stream power versus LD50: r = 0.327, p-value = 2.63 × 10^-12

  7. Geomorphic Metrics Outlet link mean slope versus LD50: r = 0.214, p-value = 3.78 × 10^-6

  8. Geologic Metrics Percent unconsolidated geologic type versus LD50: r = -0.246, p-value = 1.18 × 10^-7

  9. Climatic Metrics Annual average precipitation versus LD50: r = 0.199, p-value = 1.56 × 10^-6

  10. Climatic Metrics Average annual potential evapotranspiration (mm) versus LD50 r = -0.046, p-value = 0.342
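The r values reported on these slides read as ordinary Pearson correlations between a single metric and LD50. As a minimal sketch, assuming that is what was computed, the coefficient and its t statistic can be obtained as follows. The data are made up and the function names are hypothetical; the p-value would follow from a t distribution with n - 2 degrees of freedom (for example via `scipy.stats.pearsonr`, which returns both r and the p-value).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 d.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Made-up values for illustration only (not EMAP data).
stream_power = [1.0, 2.0, 3.0, 4.0, 5.0]
ld50_values = [0.2, 0.1, 0.9, 0.8, 1.5]

r = pearson_r(stream_power, ld50_values)
t = t_statistic(r, len(stream_power))
```

With several hundred sites, even a modest r such as 0.2 gives a very small p-value, so the slides' p-values indicate a detectable association rather than strong predictive power.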

  11. Land Cover Metrics 1. Developed 2. Barren 3. Forest 4. Grasses 5. Agriculture 6. Wetlands 7. Open water/perennial ice and snow 8. Shrubland

  12. Land Cover Metrics Percentage of watershed that is forest versus LD50: r = 0.19, p-value = 3.516 × 10^-5

  13. Distance-Weighted Metrics • j represents the land cover type of concern • A_j represents the total area for land cover type j in the watershed • the weighting uses a coefficient of exponential decay and the average distance from the outlet for land cover of type j • n represents the total number of land cover types
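The formula for the distance-weighted metric is missing from the transcript. One form consistent with the listed quantities, offered here only as an assumption, weights each land cover area by an exponential decay in its average distance from the outlet and normalizes over all n types. The symbols `alpha` (decay coefficient) and `mean_distances` (average distance per type), the function name, and the example numbers are all hypothetical.

```python
import math

def distance_weighted_share(areas, mean_distances, alpha):
    """Share of each land cover type after exponential distance decay.

    Assumed form: A_j * exp(-alpha * d_j) / sum_k A_k * exp(-alpha * d_k),
    where the sum runs over all land cover types in the watershed.
    """
    weighted = {
        j: areas[j] * math.exp(-alpha * mean_distances[j]) for j in areas
    }
    total = sum(weighted.values())
    return {j: w / total for j, w in weighted.items()}

# Made-up watershed: areas in km^2, average distances from the outlet in km.
areas_km2 = {"forest": 60.0, "agriculture": 30.0, "developed": 10.0}
mean_dist_km = {"forest": 12.0, "agriculture": 5.0, "developed": 1.0}

shares = distance_weighted_share(areas_km2, mean_dist_km, alpha=0.2)
```

Under this form, land cover close to the outlet counts for more than its raw area share, and setting `alpha` to zero recovers the plain percentage of watershed area.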

  14. Additional Land Cover Metrics • Buffered metrics: land cover measured only within a fixed distance of the stream (30 meters, 100 meters, 300 meters) • Buffered and distance-weighted metrics

  15. Goals • Predict LD50 without visiting sites • Small number of predictors for scientifically sensible model

  16. Methods: Stepwise Variable Selection • Multiple linear regression • Top-in-tier models • Top geomorphic models plus one from each of the remaining tiers

  17. Akaike’s Information Criterion N observations p predictors RSS is the sum of squared residuals
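The formula on this slide is missing from the transcript. For a linear model fitted by least squares, a common form of AIC in terms of the quantities listed (N observations, p predictors, RSS) is shown below, up to an additive constant that does not affect model comparison. Some conventions count the intercept, and sometimes the error variance, as extra parameters, so the penalty term on the slide may have differed slightly; treat this as an assumed reconstruction.

```latex
\mathrm{AIC} = N \ln\!\left(\frac{\mathrm{RSS}}{N}\right) + 2p
```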

  18. AIC in Stepwise Variable Selection • Forward stepwise selection: method for choosing the top predictor from each tier • Start with the intercept-only model • Choose the variable that reduces AIC the most and include it in the model • Stepwise selection in both directions: method for choosing all top geomorphic predictors • Start with the full model • Add and remove variables until the model with minimum AIC is found or iteration stops
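The forward procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis code from the talk: it uses AIC = N ln(RSS/N) + 2p, fits each candidate model by solving the normal equations directly, and keeps adding variables until AIC stops improving (for the top-in-tier models, only the first step would be needed). All names and data are hypothetical.

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    aug = [
        [sum(row[a] * row[b] for row in design) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(design, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic(rss, n, p):
    """Assumed AIC form for least squares: N ln(RSS/N) + 2p."""
    return n * math.log(rss / n) + 2 * p

def forward_stepwise(predictors, y):
    """Start from the intercept-only model; add the best variable while AIC drops."""
    n = len(y)
    selected = []
    best_aic = aic(ols_rss([], y), n, 0)
    while True:
        scored = []
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            scored.append((aic(ols_rss(cols, y), n, len(cols)), name))
        if not scored:
            break
        cand_aic, cand_name = min(scored)
        if cand_aic >= best_aic:
            break
        selected.append(cand_name)
        best_aic = cand_aic
    return selected, best_aic

# Made-up data: y depends on x1 plus small noise; x2 is unrelated.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
y = [2.1, 3.8, 6.05, 8.15, 9.9, 12.2, 13.85, 15.95]
selected, final_aic = forward_stepwise(predictors, y)
```

Selection in both directions would additionally try dropping each variable already in the model at every step and would start from the full model instead of the intercept-only model.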

  19. Methods: CART Classification and Regression Trees

  20. Methods: CART Classification and Regression Trees Predicted Response:
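The expression for the predicted response is missing from the transcript. In a standard regression tree, the prediction for a site is the mean of the observed responses in the terminal node the site falls into, and each split is chosen to minimize the within-node sum of squared errors. The sketch below shows a single split on one predictor under those standard assumptions; the data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Best single threshold on one predictor, by total within-node SSE.

    Returns (total_sse, threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, threshold, sum(left) / len(left), sum(right) / len(right))
    return best

def predict(split, xi):
    """Predicted response: the mean of the node that xi falls into."""
    _, threshold, left_mean, right_mean = split
    return left_mean if xi < threshold else right_mean

# Made-up data: channel slope (%) and observed LD50 at six sites.
slope = [0.5, 1.0, 1.5, 4.0, 4.5, 5.0]
ld50_obs = [0.1, 0.2, 0.3, 1.8, 2.0, 2.2]
split = best_split(slope, ld50_obs)
```

A full tree applies the same search recursively within each node and over all predictors. Because every site in a node receives the same prediction, a node that contains several observed substrate sizes cannot distinguish between them.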

  21. Hybrid of Multiple Linear Regression and CART • Utilize CART on the residuals of the multiple linear regression • Add indicator variables to the multiple linear regression equation for all but one of the terminal nodes in the tree (the number of terminal nodes minus one) • Create a new multiple regression model with the original variables and the indicator variables
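The indicator-variable step can be sketched as follows, assuming each site has already been assigned to a terminal node of the residual tree. One node serves as the baseline and gets no indicator, which avoids perfect collinearity with the intercept. The node labels and the function name are hypothetical.

```python
def node_indicators(node_ids):
    """Indicator columns for all but one terminal node (the baseline).

    Returns the baseline node label and a dict of 0/1 columns, one per
    remaining terminal node.
    """
    nodes = sorted(set(node_ids))
    baseline, others = nodes[0], nodes[1:]
    columns = {
        f"node_{node}": [1 if nid == node else 0 for nid in node_ids]
        for node in others
    }
    return baseline, columns

# Made-up terminal-node assignments for seven sites from a residual tree.
residual_tree_nodes = [3, 5, 5, 6, 3, 6, 5]
baseline, indicators = node_indicators(residual_tree_nodes)
```

The resulting columns would then be appended to the original predictors and the multiple regression refit, so each indicator's coefficient acts as a node-specific shift in the prediction.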

  22. Predictive-ability Statistics

  23. Analysis Comparison – Top 4-tier Models • Problems with top 4-tier models • Low adjusted R² • Low predictive ability • Over-prediction and under-prediction of fine and bedrock substrate • Non-normal residuals • Benefit of top 4-tier models • Small number of predictors

  24. Example of Non-normality of Residuals: Top 4-Tier Model
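Normality of residuals is judged in this talk with QQ-plots. As a minimal sketch of how the points of a normal QQ-plot are formed, the code below pairs sorted standardized residuals with standard normal quantiles. The plotting positions (i + 0.5)/n are one common choice and an assumption here, since plotting software differs on this detail; the residuals are made up.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pairs of (theoretical normal quantile, standardized sample quantile)."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    standardized = sorted((r - mu) / sigma for r in residuals)
    # Assumed plotting positions: (i + 0.5) / n for i = 0, ..., n - 1.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, standardized))

# Made-up residuals with one large positive value.
points = qq_points([-1.2, -0.3, 0.1, 0.4, 2.5])
```

If the residuals were normal, the points would fall close to the line y = x; systematic curvature or stray points at the ends indicate skewness or heavy tails.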

  25. Analysis Comparison – Geomorphic plus Top 3-Tier Models • Problems with top geomorphic plus top 3-tier model • Increase in number of variables • Predictive ability still low • Over-prediction and under-prediction of fine and bedrock substrate • Some collinearity between variables

  26. Analysis Comparison – Geomorphic plus Top 3-Tier Models • Benefits with top geomorphic plus top 3-tier model • Improved predictions • Improved normality of residuals

  27. Comparison of Analysis - CART • Problems with CART • Low predictive-ability • Predicts several observed substrate sizes in one node • Over-prediction and under-prediction of fines and bedrock substrate • Omitting one site creates different tree • Benefits of CART • Simple analysis • Missing variables not an issue

  28. CART Predictions

  29. Comparison of Analysis-Hybrids • Problems with hybrid models • Increased number of variables • Collinearity with introduction of node indicator variables • Non-normal residuals

  30. Comparison of Analysis-Hybrids • Benefit of hybrid models • Residuals closer to normal • Increased predictive-ability • Explains some of the variation created by fitting a linear model to ordinal data

  31. One example: Residual Tree for Hybrid Geomorphic plus Top 3-Tier Model • Most promising multiple regression prediction model: geomorphic plus top 3-tier

  32. One example: Residual Tree for Hybrid Geomorphic plus Top 3-Tier Model

  33. One example: Observed vs. Predicted for Hybrid Geomorphic plus Top 3-Tier Model • Plot of predictions against observed LD50

  34. QQ-Plot of Residuals for Hybrid Model

  35. Coast Range Ecoregion • Less skewed distribution of LD50 • No measurements are outliers • Similar ecosystem throughout region

  36. Ecoregion Distributions

  37. Coast Range EMAP Sites

  38. Top 4-Tier Coast Range Model • Predictors • Average aspect (climatic) • Average watershed elevation (geomorphic) • % watershed as volcanic geologic type (geologic) • % wetlands (distance weighted and buffered)

  39. QQ-Plot: Top 4-Tier Coast Range

  40. Observed versus Predicted: Top 4-Tier Coast Range Model

  41. Coast Range Model: Top Geomorphic Variables • Average watershed elevation (m) • Drainage density • Mean slope within a 300-meter buffer • Ratio of width of stream to width of floodplain • Coefficient of average hill connectivity • Distance to the first tributary (m) • Percent of landscape with less than 4% slope • Percent of landscape with less than 7% slope • Measure of size and complexity of river • Percent of stream as cascade • Distance-weighted stream power • Watershed relief divided by its length

  42. QQ-Plot: Coast Range Geomorphic plus Top 3-Tier model

  43. Observed versus Predicted: Coast Range Geomorphic + Top 3-Tier

  44. CART - Coast Range Ecoregion Predictions versus Observed LD50

  45. Coast Range: Hybrid Models • Benefits of hybrid • Improved prediction • Improved fit • Improved normality of residuals • Problems with hybrid • Increased number of predictors • Collinearity with node indicator variables

  46. QQ-Plot: Coast Range Hybrid Top 4-Tier

  47. Observed versus Predicted: Coast Range Hybrid Top 4-Tier

  48. QQ-Plot: Coast Range Hybrid Geomorphic plus Top 3-Tier
