1 / 13

Multivariate Distance and Similarity

Multivariate Distance and Similarity. Robert F. Murphy Cytometry Development Workshop 2000. General Multivariate Dataset. We are given values of p variables for n independent observations Construct an n x p matrix M consisting of vectors X 1 through X n each of length p.

julio
Download Presentation

Multivariate Distance and Similarity

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multivariate Distance and Similarity Robert F. Murphy Cytometry Development Workshop 2000

  2. General Multivariate Dataset • We are given values of p variables for n independent observations • Construct an n x p matrix M consisting of vectors X1 through Xn each of length p

  3. Multivariate Sample Mean • Define mean vector I of length p or vector notation matrix notation

  4. Multivariate Variance • Define variance vector s2 of length p matrix notation

  5. Multivariate Variance • or vector notation

  6. Covariance Matrix • Define a p x p matrix cov (called the covariance matrix) analogous to s2

  7. Covariance Matrix • Note that the covariance of a variable with itself is simply the variance of that variable

  8. Univariate Distance • The simple distance between the values of a single variable j for two observations i and l is

  9. Univariate z-score Distance • To measure distance in units of standard deviation between the values of a single variable j for two observations i and l we define the z-score distance

  10. Bivariate Euclidean Distance • The most commonly used measure of distance between two observations i and l on two variables j and k is the Euclidean distance

  11. Multivariate Euclidean Distance • This can be extended to more than two variables

  12. Effects of variance and covariance on Euclidean distance The ellipse shows the 50% contour of a hypothetical population. B A Points A and B have similar Euclidean distances from the mean, but point B is clearly “more different” from the population than point A.

  13. Mahalanobis Distance • To account for differences in variance between the variables, and to account for correlations between variables, we use the Mahalanobis distance

More Related