
The Maturation of Nearest Neighbors Techniques



  1. The Maturation of Nearest Neighbors Techniques Ronald E. McRoberts Northern Research Station U.S. Forest Service St. Paul, Minnesota Western Mensurationists Meeting 22-23 Jun 2009 Vancouver, WA

  2. Some nearest neighbors terminology: ● Response variable: variable for which predictions are desired ● Feature space variable: ancillary variable with observations available for every population unit ● Reference set: population units with observations of both response and feature space variables ● Target set: population units for which predictions of response variables are desired

  3. The k Nearest Neighbors (k-NN) technique

  4. k-Nearest Neighbors: $\{y_{ji}: j=1,\dots,k\}$ is the set of k reference pixels nearest to the i-th pixel in feature space with respect to a distance metric, d. The prediction is the weighted mean $\tilde{y}_i = \sum_{j=1}^{k} w_{ji}\,y_{ji} \big/ \sum_{j=1}^{k} w_{ji}$, with weights $w_{ji} \propto d_{ji}^{-t}$ and distances $d_{ji} = \sqrt{(x_j - x_i)' M (x_j - x_i)}$. Primary parameters: k, t, M
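A minimal Python sketch of this prediction, assuming inverse-distance weights and a Mahalanobis-style distance matrix M as above (the function and variable names are illustrative, not from the presentation):

```python
import numpy as np

def knn_predict(x_target, X_ref, y_ref, M, k=5, t=1.0, eps=1e-12):
    """Distance-weighted k-NN prediction for a single target unit.

    x_target: (p,) feature-space vector for the target unit
    X_ref:    (n, p) feature-space matrix for the reference set
    y_ref:    (n,) response observations for the reference set
    M:        (p, p) positive semi-definite distance matrix
    k, t:     number of neighbors and distance-decay exponent
    """
    diff = X_ref - x_target                               # (n, p) differences
    d = np.sqrt(np.einsum("ij,jk,ik->i", diff, M, diff))  # d_ji for every j
    nearest = np.argsort(d)[:k]                           # k nearest indices
    w = 1.0 / (d[nearest] + eps) ** t                     # w_ji proportional to d_ji^(-t)
    return float(np.sum(w * y_ref[nearest]) / np.sum(w))
```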

  5. Two primary applications ● Filling holes in databases (classic imputation) - target set < reference set ● Spatial estimation - map-based inference - target set >> reference set

  6. Issues in Nearest Neighbors prediction: ● Search for the nearest neighbors ● Search for parameter values - optimal distance metric - optimal weights {wji} - optimal k ● Inference ● Diagnostic tools

  7. Searching for nearest neighbors k-d tree searching
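One way to implement the k-d tree search, sketched here with SciPy's cKDTree; since k-d trees assume Euclidean distance, a full matrix M can be handled by pre-transforming the features with a Cholesky factor of M (the transformation trick is my assumption, not something the slides spell out):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(1000, 3))        # synthetic reference feature data
X_target = rng.normal(size=(5, 3))        # synthetic target units
M = np.diag([1.0, 0.5, 2.0])              # example distance matrix

# With M = L L' (Cholesky), (x-z)' M (x-z) = ||L'(x-z)||^2, so Euclidean
# search in the transformed space reproduces the M-weighted distance.
L = np.linalg.cholesky(M)
tree = cKDTree(X_ref @ L)                 # build the k-d tree once
d, idx = tree.query(X_target @ L, k=5)    # k nearest reference units per target
```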

  8. Diagnostics ● Extrapolations ● Influential observations ● Preserving covariances

  9. Diagnostic tools Ranges of feature space variables
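A sketch of the simplest such diagnostic, under my own reading of the slide: flag target units whose feature-space values fall outside the ranges observed in the reference set, since predictions for those units are extrapolations.

```python
import numpy as np

def extrapolation_mask(X_target, X_ref):
    """True for target units with any feature outside the reference range."""
    lo, hi = X_ref.min(axis=0), X_ref.max(axis=0)
    return np.any((X_target < lo) | (X_target > hi), axis=1)
```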

  10. Influential reference elements: particularly relevant when combining information from different sources, e.g., registering plot data to remotely sensed data

  11. Diagnostic tools: preserving covariances (FOR = forest, VOL = volume, BA = basal area, TD = tree density)

               Reference set                Target set
               FOR   VOL   BA    TD         FOR   VOL   BA    TD
      k=1 FOR  0.96  0.95  0.95  0.96       0.85  0.91  0.92  0.93
          VOL        0.98  0.98  0.98             1.07  1.07  1.06
          BA               0.97  0.98                   1.09  1.08
          TD                     0.99                         1.08
      k=5 FOR  0.65  0.72  0.73  0.74       0.56  0.65  0.66  0.67
          VOL        0.39  0.41  0.48             0.40  0.42  0.48
          BA               0.43  0.48                   0.44  0.49
          TD                     0.47                         0.49
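A sketch of how the table's entries can be computed, assuming (my reading of the slide) that each entry is the ratio of a covariance among k-NN predictions to the corresponding covariance among observations, so that values near 1 indicate preserved covariances:

```python
import numpy as np

def covariance_ratios(Y_pred, Y_obs):
    """Elementwise ratio of predicted to observed covariance matrices.

    Y_pred, Y_obs: (n, q) arrays for q response variables
    (here q = 4: FOR, VOL, BA, TD).
    """
    return np.cov(Y_pred, rowvar=False) / np.cov(Y_obs, rowvar=False)
```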

  12. Map-based scientific inference: probability-based (design-based) inference - validity based on randomization in the sampling design - one and only one value for each population unit

                   Predicted
      True         C1    …    Cp     Total
      C1           n11   …    n1p    n1●
      …
      Cp           np1   …    npp    np●
      Total        n●1   …    n●p

  13. Inference ● Complete enumeration ● Sample-based - expression of results in probabilistic manner - typically a confidence interval - requires bias assessment - requires variance estimate

  14. Map-based scientific inference: probability-based (design-based) inference - validity based on randomization in the sampling design - one and only one value for each population unit. Difference estimator (see below)
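The difference estimator here is presumably the standard form: the population mean of the unit-level predictions, corrected by the mean observed-minus-predicted difference over the sample of n observed units,

```latex
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N}\hat{y}_i
          + \frac{1}{n}\sum_{j=1}^{n}\bigl(y_j - \hat{y}_j\bigr)
```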

  15. Map-based scientific inference Model-based inference - validity based on model - an entire distribution of possible values for each population unit

  16. Bias assessment ● Bootstrap (a minimal sketch follows below) ● Compare to estimates that are unbiased in expectation and asymptotically unbiased
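A minimal sketch of the bootstrap bias assessment, leaning on the illustrative knn_predict() from the slide 4 sketch; the number of replicates, the resampling scheme, and the focus on the mean are my assumptions:

```python
import numpy as np

def bootstrap_bias(X_ref, y_ref, M, B=500, seed=0):
    """Bootstrap estimate of the bias of the mean k-NN prediction."""
    rng = np.random.default_rng(seed)
    n = len(y_ref)
    diffs = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        y_hat = np.array([knn_predict(x, X_ref[idx], y_ref[idx], M)
                          for x in X_ref])      # predict for the original units
        diffs.append(np.mean(y_hat) - np.mean(y_ref))
    return float(np.mean(diffs))                # average apparent bias
```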

  17. Tree density [figure: tree count (count/ha); image not recoverable]

  18. Optimal distance matrix, M: find a positive semi-definite matrix M that minimizes $\sum_i (y_i - \tilde{y}_i)^2$ over the reference set, where nearest is defined with respect to the distance $d_{ji} = \sqrt{(x_j - x_i)' M (x_j - x_i)}$

  19. Approaches: ● Canonical correlation analysis (Moeur, Stage, et al.) ● Canonical correspondence analysis (Ohmann et al.) ● Mahalanobis ● Genetic algorithm for weighted Euclidean (Tomppo et al.) ● Bayesian for full matrix (Finley et al.) ● Steepest descent (nonlinear regression) (McRoberts et al.)

  20. Steepest descent

  21. Steepest descent, subject to $m_{12} = m_{21}$ and $|M| \ge 0$

  22. Steepest descent
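A numerical sketch of the steepest-descent search for a 2×2 matrix M, using finite-difference gradients of the sum of squared leave-one-out errors and enforcing the m12 = m21 and |M| ≥ 0 constraints from slide 21; the step-size rule and starting point are my assumptions, and it again leans on the illustrative knn_predict() from the slide 4 sketch:

```python
import numpy as np

def sse(params, X, y):
    """Leave-one-out sum of squared errors; M parameterized as
    (m11, m22, m12) with m21 = m12 enforced by construction."""
    m11, m22, m12 = params
    M = np.array([[m11, m12], [m12, m22]])
    if m11 < 0 or m22 < 0 or np.linalg.det(M) < 0:   # require |M| >= 0
        return np.inf
    errs = [y[i] - knn_predict(X[i], np.delete(X, i, 0), np.delete(y, i), M)
            for i in range(len(y))]
    return float(np.sum(np.square(errs)))

def steepest_descent(X, y, step=0.05, iters=200, h=1e-3):
    p = np.array([1.0, 1.0, 0.0])                    # start at the identity
    for _ in range(iters):
        g = np.array([(sse(p + h * e, X, y) - sse(p - h * e, X, y)) / (2 * h)
                      for e in np.eye(3)])           # finite-difference gradient
        if sse(p - step * g, X, y) < sse(p, X, y):
            p = p - step * g                         # accept the descent step
        else:
            step *= 0.5                              # otherwise shrink the step
    return p                                         # (m11, m22, m12)
```

Because the neighbor sets change discontinuously as M varies, this objective surface is piecewise-rough, which is exactly the behavior slide 23 describes.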

  23. Consequences for finding an optimal distance matrix ● surface has many local minima and maxima ● surface is very “rough” ● surface dependent on reference set ● consequences similar for any approach

  24. Synthetic dataset:

      Dataset   Weighted Euclidean   Full matrix
                m22                  m12=m21   m22
      1         0.61                 0.73      0.66
      2         1.50                 0.92      0.87
      3         0.45                 0.46      0.50
      4         0.40                 0.98      0.95
      5         0.60                 1.02      1.05

  25. Conclusions: ● k-NN is a powerful multivariate, non-parametric technique ● efficient algorithms required for selecting parameter values ● diagnostic tools required for evaluating underlying assumptions, unbiasedness, homogeneity of variance, influential reference elements ● inferential methods required ● new thinking required for optimal distance matrix

  26. South Savoy, Finland

      k     Can Cor   Mah    Euc    Opt
      1     125.6     87.1   89.1   75.2
      5      95.0     70.2   67.2   64.3
      10     91.1     68.5   66.0   62.5
      15     90.2     68.1   65.7   62.9
      20     88.9     68.2   65.3   62.4
      30     88.5     68.0   65.1   61.0
