
Mapping Many Species: Individual vs. Simultaneous with Random Forest

This study compares two strategies for community-level predictive mapping: "assemble first, predict later" and "predict first, assemble later". It uses random forest classification, random forest regression, and random forest nearest-neighbor imputation to map species distributions, then evaluates the maps with single-species and community-level accuracy metrics. The results show the strengths and limitations of each approach.


Presentation Transcript


  1. All for One or One for All? Mapping many species individually vs. simultaneously with random forest. Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald. August 10, 2012. Ecological Society of America Annual Meeting, Portland, Oregon.

  2. Species Distribution Modeling • Has been around for a long time, and has exploded over the last decade. “With the rise of new powerful statistical techniques and GIS tools, the development of predictive habitat distribution models has rapidly increased in ecology.” – Guisan and Zimmermann 2000 • Generalized Linear/Additive Models • Neural networks • Bayesian models • Ordination • Classification methods • Web of Knowledge: ‘species distribution’ • 2000–2001: 556 articles • 2011–2012: 1,389 articles

  3. SDM Uses (from Guisan and Thuiller 2005)

  4. Strategies for community-level modeling • ‘assemble first, predict later’ • ‘predict first, assemble later’ • ‘assemble and predict together’ --Ferrier & Guisan 2006 Objective: Compare two strategies for community-level predictive mapping.

  5. You Are Here

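  6. For RF classification, the predicted value for a pixel is the share of trees voting "True": # True / # Trees = 4/6 ≈ 0.67. For RF regression, the predicted value for a pixel is the average of the predictions from all trees (one terminal-node value per tree).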
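
The two aggregation rules on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the per-tree values are hypothetical, and the per-tree outputs are assumed to be already available.

```python
from statistics import mean


def vote_fraction(tree_votes):
    """Share of trees that vote 'present' (True) for one pixel."""
    return sum(1 for vote in tree_votes if vote) / len(tree_votes)


def regression_prediction(tree_values):
    """Random forest regression output: the mean of the per-tree predictions."""
    return mean(tree_values)


# Hypothetical pixel: 4 of 6 trees vote 'present', as on the slide.
votes = [True, True, False, True, False, True]
print(round(vote_fraction(votes), 2))  # 0.67

# Hypothetical per-tree basal-area predictions for the same pixel.
print(regression_prediction([0.0, 12.0, 0.0, 18.0, 6.0, 0.0]))  # 6.0
```

Note that the averaged regression value is non-zero even though half of the hypothetical trees predict zero, which is one reason averaged predictions rarely reproduce true absences.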

  7. Random Forest Nearest-Neighbor Imputation • Imputation = filling in missing values from existing values.

  8. Methods: k-NN (“Assemble and Predict Together”) (1) Place plots within feature space. (2) Place new pixel within feature space. (3) Find nearest-neighbor plot within feature space. (4) Impute nearest neighbor’s Plot ID # to pixel. [Figure: study area shown in feature space (axes: Elevation, Rainfall) and in geographic space.]
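
The four k-NN steps can be sketched as follows for k = 1. This is an illustrative sketch only: the plot IDs, pixel IDs, and feature values are hypothetical, the two features are assumed to be already rescaled to comparable ranges, and Euclidean distance is an assumption (the slide does not name a distance measure).

```python
import math


def nearest_plot(pixel, plots):
    """Step 3: return the ID of the plot closest to the pixel in feature space."""
    return min(plots, key=lambda plot_id: math.dist(plots[plot_id], pixel))


def impute(pixels, plots):
    """Step 4: assign each pixel the ID of its nearest-neighbor plot."""
    return {pixel_id: nearest_plot(features, plots)
            for pixel_id, features in pixels.items()}


# Step 1: plots placed in a (scaled elevation, scaled rainfall) feature space.
plots = {1: (0.2, 0.8), 2: (0.7, 0.3), 3: (0.5, 0.5)}
# Step 2: new pixels placed in the same feature space.
pixels = {"a": (0.6, 0.35), "b": (0.25, 0.7)}

print(impute(pixels, plots))  # {'a': 2, 'b': 1}
```

Because each pixel receives a plot ID rather than a single value, every attribute measured on that plot (all species at once) can be mapped from the same lookup.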

  9. Methods: GNN (Ohmann and Gregory 2002) (1) Conduct gradient analysis of plot data. (2) Calculate axis scores of pixel from mapped data layers. (3) Find nearest-neighbor plot in gradient space. (4) Impute nearest neighbor’s Plot ID # to pixel. [Figure: study area shown in gradient space and in geographic space; CCA Axis 1 (e.g., Rainfall, local topography), CCA Axis 2 (e.g., Temperature, Elevation).]

  10. Methods: Random Forest Nearest Neighbor Imputation [Figure: study area shown in Random Forest space and in geographic space.]

  11. [Figure: grid of numbers.] Nearest Neighbor Plot: #3 • Second Nearest Neighbor: #5
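
One way to rank neighbors in "random forest space" is by proximity: count the trees in which a pixel and a plot fall into the same terminal node, then order plots by that count. The slide does not state the exact measure used, so the sketch below is an assumption based on this common definition of random forest proximity; the node IDs and plot IDs are hypothetical.

```python
def shared_nodes(pixel_nodes, plot_nodes):
    """Number of trees in which the pixel and the plot share a terminal node."""
    return sum(1 for a, b in zip(pixel_nodes, plot_nodes) if a == b)


def rank_neighbors(pixel_nodes, plots):
    """Plot IDs ordered from most to fewest shared terminal nodes."""
    return sorted(plots,
                  key=lambda plot_id: shared_nodes(pixel_nodes, plots[plot_id]),
                  reverse=True)


# Hypothetical terminal-node IDs, one per tree, for four plots and one pixel.
plots = {
    1: [4, 2, 9, 6],
    3: [7, 5, 1, 3],
    5: [7, 5, 8, 3],
    8: [2, 5, 9, 1],
}
pixel = [7, 5, 1, 3]

print(rank_neighbors(pixel, plots)[:2])  # [3, 5]
```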

  12. Strategies for community-level modeling • ‘assemble first, predict later’ • ‘predict first, assemble later’ • Random forest – classification (binary prediction) • Random forest – regression (continuous prediction) • ‘assemble and predict together’ • Random forest – imputation (continuous prediction) --Ferrier & Guisan 2006

  13. Dimensions of Map Accuracy • Single-species metrics • Range – presence/absence • Abundance – How much basal area? • Is the distribution of values predicted realistic? • Community-level metrics • Diversity • Composition

  14. Fails To Predict Absences • Sensitivity: True Positives / (True Positives + False Negatives) • Specificity: True Negatives / (True Negatives + False Positives) • True Skill Statistic (TSS): Sensitivity + Specificity - 1
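
The three formulas on this slide translate directly into code. The sketch below uses hypothetical presence/absence data and a hypothetical function name; it shows why a map that never predicts absence scores a TSS of zero despite perfect sensitivity.

```python
def true_skill_statistic(observed, predicted):
    """TSS = sensitivity + specificity - 1, from paired presence/absence values."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # true positives
    fn = sum(1 for o, p in pairs if o and not p)      # false negatives
    tn = sum(1 for o, p in pairs if not o and not p)  # true negatives
    fp = sum(1 for o, p in pairs if not o and p)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one observed absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical plots: 3 observed presences, 5 observed absences.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
always_present = [1] * 8  # a map that never predicts absence

print(true_skill_statistic(observed, always_present))  # 0.0
print(true_skill_statistic(observed, observed))        # 1.0
```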

  15. Cannot Predict Abundance • Predictions missing zeros • Root Mean Square Difference: 17.72, 18.46

  16. Root Mean Square Difference: 21.34, 18.73
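
Root mean square difference (RMSD) between observed and predicted abundance can be computed as below. This is a generic sketch with hypothetical basal-area values; it does not reproduce the figures reported on the slides.

```python
import math


def rmsd(observed, predicted):
    """Root mean square difference between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical basal areas on four plots; the predictions miss the two zeros.
observed = [0.0, 0.0, 10.0, 20.0]
predicted = [2.0, 2.0, 10.0, 20.0]

print(round(rmsd(observed, predicted), 3))  # 1.414
```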

  17. Single Species Models • Range • Random Forest – Binary: best • Random Forest – Nearest Neighbor: acceptable • Random Forest – Continuous: fail • Abundance (Basal Area) • RMSD • Random Forest – Continuous: best • Random Forest – Nearest Neighbor: acceptable • Random Forest – Binary: NA • Empirical Cumulative Distribution Functions (predicted value distributions) • Random Forest – Nearest Neighbor: best • Random Forest – Continuous: fail • Random Forest – Binary: NA
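
An empirical cumulative distribution function (ECDF) makes the "is the distribution of predicted values realistic?" question concrete. The sketch below uses hypothetical values: the observed data contain zeros, while averaged predictions do not, so the two ECDFs disagree at zero.

```python
def ecdf(values, x):
    """Empirical CDF: the fraction of values less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


# Hypothetical basal areas: half of the observed plots have none of the species.
observed = [0.0, 0.0, 0.0, 5.0, 12.0, 30.0]
# Hypothetical averaged predictions: pulled toward the middle, no exact zeros.
averaged = [1.5, 2.0, 3.0, 6.0, 11.0, 24.0]

print(ecdf(observed, 0.0))  # 0.5
print(ecdf(averaged, 0.0))  # 0.0
```

Imputed values are copied from real plots, so their distribution can contain exact zeros and extreme values in a way that averaged predictions cannot.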

  18. Diversity: Species Richness and Evenness
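
Richness and evenness can be computed per plot or per pixel from a list of species abundances. The slide does not say which evenness index was used; the sketch below assumes Pielou's evenness (Shannon diversity divided by the log of richness) purely for illustration, with hypothetical abundances.

```python
import math


def richness(abundances):
    """Species richness: the number of species with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def pielou_evenness(abundances):
    """Pielou's J = H' / ln(S); an assumed index, not confirmed by the source."""
    present = [a for a in abundances if a > 0]
    if len(present) < 2:
        return 0.0  # J is undefined for fewer than two species; 0.0 is a placeholder
    total = sum(present)
    shannon = -sum((a / total) * math.log(a / total) for a in present)
    return shannon / math.log(len(present))


even_plot = [5.0, 5.0, 5.0, 5.0]       # four equally abundant species
dominated_plot = [97.0, 1.0, 1.0, 1.0]  # same richness, one dominant species

print(richness(even_plot), richness(dominated_plot))  # 4 4
print(round(pielou_evenness(even_plot), 2))           # 1.0
print(pielou_evenness(dominated_plot) < 0.2)          # True
```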

  19. Beta Diversity

  20. Average Alpha Diversity for Blue Pixel: 3.04

  21. Results – Composition What is the Bray-Curtis distance between our observed and predicted communities?
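
Bray-Curtis distance between an observed and a predicted community is the sum of absolute abundance differences divided by the total abundance in both communities. The sketch below uses hypothetical species codes and abundances.

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis distance between two communities given as {species: abundance}."""
    species = set(observed) | set(predicted)
    difference = sum(abs(observed.get(s, 0.0) - predicted.get(s, 0.0)) for s in species)
    total = sum(observed.values()) + sum(predicted.values())
    if total == 0:
        return 0.0  # two empty communities are treated as identical here
    return difference / total


# Hypothetical plot: the prediction adds a species that was not observed.
observed = {"PSME": 10.0, "TSHE": 5.0}
predicted = {"PSME": 5.0, "TSHE": 5.0, "ABAM": 10.0}

print(round(bray_curtis(observed, predicted), 3))  # 0.429
```

A falsely predicted species raises the distance just as a missed one does, which is why correctly mapped absences matter for composition.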

  22. Discussion • Species absences are an important dimension of composition • Disturbance? • Succession? • Competition/Facilitation? • Dispersal limitations? • Community assembly rules can be used to help refine mapped species lists. (e.g., Guisan and Rahbek, 2011) • But… imputation avoids the pitfalls & complications of re-assembling communities after mapping because they are never taken apart.

  23. Conclusions • Practical Considerations: • Models of individual species may be • Strongest in one dimension • Useful for understanding species’ ecology • The best option for some types of available data (e.g., presence-only data from museum specimens) • Nearest Neighbor mapping is a useful tool for building multipurpose maps. • Ranges and abundances • Composition • Diversity

  24. Acknowledgements • Nationwide Forest Imputation Study • Landscape Ecology, Modeling, Mapping and Analysis (LEMMA) team in Corvallis.
