260 likes | 392 Views
Imputating snag data to forest inventory for wildlife habitat modeling. Kevin Ceder College of Forest Resources University of Washington GMUG – 11 February 2008. Why impute snag data?. Snags are an important habitat element and needed for habitat assessments.
E N D
Imputating snag data to forest inventory for wildlife habitat modeling Kevin Ceder College of Forest Resources University of Washington GMUG – 11 February 2008
Why impute snag data? • Snags are an important habitat element and needed for habitat assessments. • These data are often not collected in forest inventory • The Large-Landscape Wildlife Assessment models will need these data
Why use Nearest-Neighbor? • Non-parametric requiring no assumptions of underlying functional form • Retains the variance/covariance structure of the input data in the output data
The Questions • Can snag data be imputed using kNN techniques with stand and site data? • How well do the results fit observed data? • Which distance measure performs best? • What is the effect of increasing neighborhood size? • How do the results compare with random sampling?
The Process • The database • FIA integrated database version 2.1 • Data for private forests in western Washington (1510 plots) • Both tree and snag data collected between 1989 - 1991 • Representative of the forest targeted for the LLWA project
The Process • The tool - • The yaImpute package for kNN imputation • Raw, Euclidean, Mahalanobis, MSN, MSN2, ICA, and randomForest distance measures • k = 1, 2, 3, 4, 5, 10 • For k>1 imputed data are distance weighted means of neighbors • 9999 permutations of the data for comparisons with random sampling • k = 1, 2, 3, 4, 5, 10 • For k>1 imputed data are distance weighted means of neighbors using Euclidean distance
Goodness of fit Comparison with random The Statistics
The Input Data – Snag data (yData) • 695 of 1510 plots did not have snags present
Results • Can snag data be imputed using kNN techniques with stand and site data? • Yes!
Results • How well do the results fit observed data?
Results • How well do the results fit observed data?
Results • How well do the results fit observed data?
Results • How well do the results fit observed data? • Marginally… • High RMSD and MAD relative to mean snag measures in the data • Observed vs imputed plots show poor patterning
Results • Which distance measure performs best? • What is the effect of increasing neighborhood size?
Results • Which distance measure performs best? • All are generally similar • randomForest imputations provide lower RMSD and MAD but under-predict more than others • What is the effect of increasing neighborhood size? • Increasing k reduces RMSD and MAD • Little effect on bias • Slightly decreased range in imputed values with k = 10
Results • How do the results compare with random sampling?
Results • How do the results compare with random sampling?
Results • How do the results compare with random sampling? • p-values of 0.001 suggest that there is some underlying very weak relationship between snags and overstory • Imputation is better than just randomly assigning snags to stands
Why didn’t it work better? • Very weak correlations between overstory and snags • Snags are from prior stand • Many of the snags in the FIA database have advanced decay classes • Often snags are larger than QMD • Management history • Snags were removed at harvest • Thinning captures mortality
Future Direction • Assessing the effects of imputed data on habitat model outputs • If there are big differences then what?