
Transforms and other prestidigitations—or new twists in imputation.


Presentation Transcript


  1. Transforms and other prestidigitations—or new twists in imputation. Albert R. Stage

  2. Imputation: • To use what we know about “everywhere” that may be useful but not very interesting (the X’s), • To fill in detail that is prohibitive to obtain except on a sample (the Y’s), • By finding surrogates based on similarity of the X’s.

  3. Topics • Measures of similarity (a few in particular) • Alternative MSN distance function leading to some improved estimates • Transformations that improve resolution • On the X-side (known everywhere) • On the Y-side (known for sample only)

  4. Distance measures for interval and ratio scale variables (Podani 2000): Euclidean/Mahalanobis, Chord, Angular, Geodesic, Manhattan, Canberra, Clark, Bray-Curtis, Marczewski-Steinhaus, 1-Kulczynski, Pinkham-Pearson, Gleason, Ellenberg, Pandeya, Chi-square, 1-Correlation, 1-similarity ratio, Kendall difference, Faith intermediate, Uppsala coefficient

  5. Distance measures for binary variables (Podani 2000) • Symmetric for 0/1: Simple matching, Euclidean, Rogers-Tanimoto, Sokal-Sneath, Anderberg I, Anderberg II, Correlation, Yule I, Yule II, Hamann • Asymmetric for 0/1: Baroni-Urbani-Buser I, Baroni-Urbani-Buser II, Russell-Rao, Faith I, Faith II • Ignore 0: Jaccard, Sorenson, Chord, Kulczynski, Sokal-Sneath II, Mountford

  6. Distance function in matrix notation: $D^2_{iu} = \min_i \left[ (X_i - X_u)\, W\, (X_i - X_u)' \right]$ where, for • Euclidean distance: $W = I$ (identity matrix) • Mahalanobis distance: $W = S^{-1}$ (inverse covariance matrix) • MSN (1995): $W = \Gamma \Lambda^2 \Gamma'$, with $\Gamma$ = matrix of coefficients of canonical variates and $\Lambda$ = diagonal matrix of canonical correlations
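The matrix form on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the reference observations, and the diagonal weight matrix are all hypothetical, and the weight matrix is supplied by hand rather than derived from a covariance or canonical analysis.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def quadratic_distance(x_ref: Vector, x_target: Vector, w: Matrix) -> float:
    """Squared distance (x_ref - x_target) W (x_ref - x_target)'."""
    diff = [a - b for a, b in zip(x_ref, x_target)]
    n = len(diff)
    return sum(diff[r] * w[r][c] * diff[c] for r in range(n) for c in range(n))


def nearest_reference(references: List[Vector], x_target: Vector,
                      w: Matrix) -> Tuple[int, float]:
    """Index and squared distance of the reference closest to the target."""
    distances = [quadratic_distance(x, x_target, w) for x in references]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical data: three reference observations, one target, two X variables.
references = [[0.0, 0.0], [3.0, 0.5], [1.0, 4.0]]
target = [1.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]             # W = I gives Euclidean distance
down_weight_second = [[1.0, 0.0], [0.0, 0.01]]  # illustrative W, not estimated

euclid_choice = nearest_reference(references, target, identity)
weighted_choice = nearest_reference(references, target, down_weight_second)
print(euclid_choice, weighted_choice)
```

With W = I the first reference is the nearest neighbor; once the second variable is down-weighted, the third reference becomes nearest. Only the choice of W changes between the distance functions listed on the slide.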

  7. Why Weight with Canonical Analysis? • Not degraded by non-informative X’s

  8. Why Weight with Canonical Analysis? • Not affected by non-informative Y’s if number of canonical pairs is determined by test of significance on rank.

  9. Comparison of MSN Distance Functions • Moeur and Stage 1995: Assumes Y’s are “true”. Searches for closest linear combination of Y’s. Set of near neighbors sensitive to lower order canonical correlations. • Stage 2003: Assumes Y’s include measurement error. Searches for closest linear combination of predicted Y’s. Set of near neighbors less sensitive to random elements “swept” into lower order canonical correlations.

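  10. New regression alternative: $d_{ij}^2 = (X_i - X_j)\, \Gamma \Lambda \left[ I - \Lambda^2 \right]^{-1} \Lambda \Gamma'\, (X_i - X_j)'$, where $\Gamma$ is the matrix of coefficients of canonical variates and $\Lambda$ is the diagonal matrix of canonical correlations for k =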

  11. Effect of change: • No change if only the first canonical pair is used. • Regression alternative gives more relative weight to higher correlated pairs. • Effects on root-mean-square error of imputation are mixed, e.g., in the following three data sets:

  12. Statistics for three data sets

  13. Change in Relative Weights Depends on $\lambda^2$
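The shift in relative weights can be illustrated numerically. The sketch below assumes the 1995 function weights canonical pair k by λ_k² and the regression alternative weights it by λ_k² / (1 − λ_k²); the function name and the three canonical correlations are made up for illustration and are not taken from the three data sets in the talk.

```python
from typing import List, Sequence


def relative_weights(canonical_correlations: Sequence[float],
                     regression_alternative: bool = False) -> List[float]:
    """Per-pair weights, normalized to sum to 1.

    Assumed weights: lambda^2 for the 1995 function and
    lambda^2 / (1 - lambda^2) for the regression alternative.
    """
    raw = []
    for lam in canonical_correlations:
        if not 0.0 <= lam < 1.0:
            raise ValueError("canonical correlations must lie in [0, 1)")
        lam2 = lam * lam
        raw.append(lam2 / (1.0 - lam2) if regression_alternative else lam2)
    total = sum(raw)
    return [w / total for w in raw]


lams = [0.9, 0.6, 0.3]  # hypothetical canonical correlations
msn_1995 = relative_weights(lams)
msn_regr = relative_weights(lams, regression_alternative=True)

for lam, w_old, w_new in zip(lams, msn_1995, msn_regr):
    print(f"lambda={lam:.1f}  1995={w_old:.3f}  regression={w_new:.3f}")
```

Under these assumptions the highest-correlated pair gains relative weight and the lowest loses it. With a single canonical pair both schemes normalize to a weight of 1, so the neighbor ordering is unchanged.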

  14. Transforming X-variables • To predict discrete classes of modal species composition (MSC) with Euclidean or Mahalanobis distance. • To predict continuous variables of species composition

  15. Euclidean vs. Cosine (Spectral angle). [Figure: Target Obs., Ref. A, and Ref. B plotted on Variable 1 and Variable 2 axes, with the Euclidean distance and the spectral angle marked.]

  16. Let: Euclidean distance function with cosine transformation
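The formula for this slide did not survive in the transcript, so the following is a sketch under an assumption rather than the author's definition: it takes the cosine transformation to mean dividing each observation by its vector length, after which ordinary Euclidean distance depends only on the spectral angle (d² = 2(1 − cos θ) for unit vectors). All names and data values are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_transform(x: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumed form of the cosine transformation)."""
    length = math.sqrt(sum(v * v for v in x))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / length for v in x]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def spectral_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two vectors."""
    ua, ub = cosine_transform(a), cosine_transform(b)
    cos_theta = sum(p * q for p, q in zip(ua, ub))
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error


def nearest(refs: Sequence[Sequence[float]], target: Sequence[float]) -> int:
    return min(range(len(refs)), key=lambda i: euclidean(refs[i], target))


# Hypothetical two-variable example in the spirit of the Ref. A / Ref. B figure.
target = [1.0, 1.0]
refs = [[2.0, 1.0],   # Ref. A: close in raw units, different direction
        [4.0, 4.0]]   # Ref. B: far in raw units, same direction

raw_nearest = nearest(refs, target)
cos_nearest = nearest([cosine_transform(r) for r in refs],
                      cosine_transform(target))
print(raw_nearest, cos_nearest)
```

On the raw values Ref. A is nearest; after the transformation Ref. B is nearest because it points in the same direction as the target. The transformation therefore removes overall magnitude from the comparison, which may or may not be desirable for a given data set.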

  17. Effect of using cosine transformation of TM data on classification accuracy* * Kappa statistics **TM data ***TM+ Enhanced data

  18. Transforming the Y-variables • Variance considerations—want homogeneity • And a logical functional form for Y = f(X) • Transformations of species composition • Logarithm of species basal area • Percent basal area by species • Cosine spectral angle • Logistic • Evaluated by predicting discrete Plant Association Group (PAG), Users’ Guide example data (Oregon)

  19. Composition transformations: • Logistic: $\ln[(\text{Total BA} - \text{spp BA})/\text{spp BA}] = \ln(\text{Total BA} - \text{spp BA}) - \ln(\text{spp BA})$. Represented in MSN by two separate variables. • Cosine Spectral Angle: $\text{spp BA} / \sqrt{\sum (\text{spp BA})^2}$
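The species-composition transformations named on slides 18 and 19 can be computed side by side as follows. This is a minimal sketch: the function name, dictionary keys, species labels, and basal-area values are hypothetical, and it assumes every species has positive basal area and that at least two species are present, since the logarithms are otherwise undefined.

```python
import math
from typing import Dict, Tuple, Union

Transformed = Dict[str, Union[float, Tuple[float, float]]]


def composition_transforms(basal_area: Dict[str, float]) -> Dict[str, Transformed]:
    """Transform basal area (BA) by species for one plot.

    Assumes all BA values are positive and total BA exceeds each species' BA.
    """
    total = sum(basal_area.values())
    length = math.sqrt(sum(ba * ba for ba in basal_area.values()))
    result: Dict[str, Transformed] = {}
    for spp, ba in basal_area.items():
        result[spp] = {
            "log_ba": math.log(ba),
            "percent_ba": 100.0 * ba / total,
            "cosine": ba / length,
            # Logistic form carried as two variables:
            # ln(Total BA - spp BA) and ln(spp BA).
            "logistic_pair": (math.log(total - ba), math.log(ba)),
        }
    return result


plot = {"sp_a": 3.0, "sp_b": 4.0}  # hypothetical plot, arbitrary BA units
transformed = composition_transforms(plot)
print(transformed["sp_a"])
```

The difference of the two logistic variables reproduces ln[(Total BA − spp BA)/spp BA], and the cosine values for a plot have unit length by construction.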

  20. Implications of transforming • Imputed value is derived from the neighbor, not directly from the model as in regression. • Neighbor selection may be improved by transforming Y’s and X’s. • Multivariate Y’s can resolve some indeterminacies from functions having extreme-value points (maxima or minima).

  21. MSN Software Now Includes Alternative Distance Functions: • Both canonical-correlation based distance functions. • Euclidean distance on normalized X’s. • Mahalanobis distance on normalized X’s. • You supply a weight matrix of your derivation. • K-nearest neighbors identification.

  22. So ?? • Of the many methods available for imputation of attributes, no one alternative is clearly superior for all data sets.

  23. Software Availability • E-mail: ncrookston@fs.fed.us • On the Web: http://forest.moscowfsl.wsu.edu/gems/msn.html • In print: Crookston, N.L., Moeur, M. and Renner, D.L. 2002. User’s guide to the Most Similar Neighbor Imputation Program Version 2. Gen. Tech. Rpt. RMRS-GTR-96. Ogden, UT: USDA Rocky Mountain Research Station. 35 p.
