Measuring Distance

Measuring Distance Input for Multidimensional Scaling and Clustering

Distances and Similarities • Both measure how similar two objects are • Distances increase as objects become less similar; the distance of an object to itself is 0 • Similarities increase as objects become more similar; the similarity of an object to itself is the maximum value of the similarity measure

Distance Examples • Mileage between two towns, measured as straight-line (Euclidean) distance (“as the crow flies”), as driving distance, or as great-circle (spherical) distance • Instead of geographic locations, we can treat measurements such as the length, width, and thickness of an artifact as defining its position

Similarity Examples • The number of characteristics two objects have in common (cultural traits, genes, presence/absence traits) • Similarity measures can be converted to distances by subtracting each similarity from the maximum possible similarity
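As a small illustration of this conversion, the R sketch below uses a made-up presence/absence table (the objects A, B, C and their trait scores are hypothetical). It counts the traits on which each pair agrees, so that an object's similarity to itself equals the maximum possible value (the number of traits). It then subtracts each similarity from that maximum.

```r
# Hypothetical presence/absence table: 3 objects (rows) scored on 5 traits (columns)
traits <- rbind(A = c(1, 1, 0, 1, 0),
                B = c(1, 0, 0, 1, 1),
                C = c(0, 1, 1, 0, 0))
n_traits <- ncol(traits)

# Similarity: number of traits on which two objects agree (both present or both absent)
sim <- traits %*% t(traits) + (1 - traits) %*% t(1 - traits)

# Distance: subtract each similarity from the maximum possible similarity (n_traits)
d <- as.dist(n_traits - sim)
d
```

With these made-up data the distances are 2 (A and B), 3 (A and C), and 5 (B and C), and every object is at distance 0 from itself.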

Interval/Ratio Measures • Manhattan Distance (or City Block, 1-norm) • Euclidean Distance (and Squared Euclidean Distance, 2-norm) • Minkowski Distance (p-norm) • Chebyshev Distance (Maximum Distance, infinity norm)

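Definitions • Manhattan: $d_{jk} = \sum_i |x_{ij} - x_{ik}|$ • Euclidean: $d_{jk} = \sqrt{\sum_i (x_{ij} - x_{ik})^2}$ (the squared Euclidean distance omits the square root) • Minkowski: $d_{jk} = \left(\sum_i |x_{ij} - x_{ik}|^p\right)^{1/p}$ • Chebyshev: $d_{jk} = \max_i |x_{ij} - x_{ik}|$ • Here $x_{ij}$ is the value of variable $i$ for object $j$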
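The four interval/ratio distances can be checked by hand against R's dist(). This is a sketch with two hypothetical artifacts described by length, width, and thickness; the vectors x and y and the helper minkowski() are illustrative names, not part of any package.

```r
# Hypothetical measurements (length, width, thickness) for two artifacts
x <- c(length = 4.0, width = 2.0, thickness = 1.0)
y <- c(length = 1.0, width = 6.0, thickness = 1.0)

manhattan <- sum(abs(x - y))                         # 1-norm
euclidean <- sqrt(sum((x - y)^2))                    # 2-norm
minkowski <- function(p) sum(abs(x - y)^p)^(1 / p)   # p-norm
chebyshev <- max(abs(x - y))                         # infinity norm

# The same values from dist(), which computes distances between the rows of a matrix
m <- rbind(x, y)
dist(m, method = "manhattan")
dist(m, method = "euclidean")
dist(m, method = "minkowski", p = 3)
dist(m, method = "maximum")
```

Here the differences are 3, -4, and 0, so the Manhattan distance is 7, the Euclidean distance is 5, and the Chebyshev distance is 4. Setting p = 1 or p = 2 in minkowski() reproduces the first two.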

Counts • Ecologists use species counts from different plots to analyze compositional changes in community structure • Bray-Curtis dissimilarity compares the total number of specimens in two samples with the number of specimens they share, species by species

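Definitions: Bray-Curtis Dissimilarity • $BC_{jk} = 1 - \frac{2C_{jk}}{S_j + S_k}$ • $C_{jk}$ is the sum of the smaller count for each species found in both samples; $S_j$ and $S_k$ are the total numbers of specimens in samples $j$ and $k$ • Note: If samples j and k are percentages, then the denominator $S_j + S_k$ becomes 200.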
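A minimal sketch of Bray-Curtis dissimilarity in base R, assuming two hypothetical plots whose counts are listed in the same species order. The helper bray_curtis() is illustrative. vegan's vegdist() and ecodist's distance() also provide Bray-Curtis, but this version spells out the shared-count form and checks it against the equivalent sum-of-absolute-differences form.

```r
# Hypothetical species counts for two plots (same species order in both)
plot_j <- c(sp1 = 6,  sp2 = 7, sp3 = 4, sp4 = 0)
plot_k <- c(sp1 = 10, sp2 = 0, sp3 = 6, sp4 = 3)

bray_curtis <- function(xj, xk) {
  shared <- sum(pmin(xj, xk))             # C_jk: the lesser count for each species, summed
  1 - 2 * shared / (sum(xj) + sum(xk))    # S_j + S_k in the denominator
}
bray_curtis(plot_j, plot_k)

# Equivalent form: sum of absolute differences over the sum of all counts
sum(abs(plot_j - plot_k)) / sum(plot_j + plot_k)
```

With these counts C_jk = 10 and S_j + S_k = 36, so both forms give 16/36 (about 0.444).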

Ordinal Measures • There are few measures designed specifically for rank data, but rank correlation coefficients (Spearman, Kendall) can be used
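One way to turn rank correlations into distances, sketched with hypothetical rankings: compute Spearman's rho (or Kendall's tau) between objects with cor() and subtract it from 1, the maximum possible correlation. The resulting distance runs from 0 for identical rankings to 2 for completely reversed ones. The site names and ranks are made up.

```r
# Hypothetical ranks of four attributes at three sites (rows = sites)
ranks <- rbind(site1 = c(1, 2, 3, 4),
               site2 = c(2, 1, 3, 4),
               site3 = c(4, 3, 2, 1))

# cor() correlates columns, so transpose to correlate sites with each other
rho <- cor(t(ranks), method = "spearman")
tau <- cor(t(ranks), method = "kendall")

# Distance = maximum correlation (1) minus the observed correlation
d_rho <- as.dist(1 - rho)
d_tau <- as.dist(1 - tau)
d_rho
```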

Dichotomies • Can use interval/ratio measures • Numerous options based on the 2x2 table of matches and mismatches • Many similarity measures differ in how they weight joint presences (presence/presence) and joint absences (absence/absence) • Subtract similarities from 1 to create distances

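Definitions • Simple Matching Coefficient: (a+d)/(a+b+c+d) • Jaccard’s Coefficient (asymmetric binary): a/(a+b+c) • Here a = present in both objects, b = present in the first only, c = present in the second only, d = absent from both • Phi and Yule’s Q measures of association can also be used • The ade4 and proxy packages have many different options for dichotomies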
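The 2x2 counts and both coefficients can be computed directly. This sketch uses two hypothetical presence/absence vectors (obj1, obj2) and spells a, b, c, d out as both, only1, only2, neither so that R's c() function is not masked. It also checks that dist(method = "binary") returns 1 minus Jaccard's coefficient.

```r
# Hypothetical presence (1) / absence (0) records for two objects
obj1 <- c(1, 1, 0, 1, 0, 0)
obj2 <- c(1, 0, 0, 1, 1, 0)

both    <- sum(obj1 == 1 & obj2 == 1)   # a: present in both
only1   <- sum(obj1 == 1 & obj2 == 0)   # b: present in obj1 only
only2   <- sum(obj1 == 0 & obj2 == 1)   # c: present in obj2 only
neither <- sum(obj1 == 0 & obj2 == 0)   # d: absent from both

smc     <- (both + neither) / (both + only1 + only2 + neither)  # simple matching
jaccard <- both / (both + only1 + only2)                        # ignores joint absences

# For 0/1 data, dist(method = "binary") gives the Jaccard distance, 1 - jaccard
dist(rbind(obj1, obj2), method = "binary")
```

Here a = 2, b = 1, c = 1, d = 2, so the simple matching coefficient is 4/6 while Jaccard's is 2/4: the two joint absences raise the first but not the second.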

Nominal Variables • Similarity can be measured with chi-square-based measures • Convert to multiple dichotomies • E.g. Temper: Sand, Silt, Gravel becomes three variables: TSand, TSilt, TGravel • Then use measures for dichotomies or metric variables
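The conversion to dichotomies can be done with model.matrix(), sketched here on a hypothetical Temper factor for four sherds. Dropping the intercept (the - 1 in the formula) keeps one 0/1 column per category, and the columns are renamed to the TSand, TSilt, TGravel pattern. The resulting matrix can then be passed to a measure for dichotomies such as dist(method = "binary").

```r
# Hypothetical temper records for four sherds
sherds <- data.frame(Temper = factor(c("Sand", "Silt", "Gravel", "Sand")))

# One 0/1 indicator column per category; "- 1" removes the intercept column
dummies <- model.matrix(~ Temper - 1, data = sherds)
colnames(dummies) <- sub("Temper", "T", colnames(dummies))
dummies

# Sherds with the same temper are at distance 0, different tempers at distance 1
dist(dummies, method = "binary")
```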

Multiple Types • Gower’s Index computes a similarity from variables measured at different levels of measurement. It takes the mean of the per-variable similarities: • Presence/Absence – Jaccard (joint absences are left out) • Categorical – 1 if the same, 0 if not • Interval/Ratio/Ranks – 1 minus the absolute difference divided by the range
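A minimal base-R sketch of Gower's similarity for one pair of objects, following the three rules for presence/absence, categorical, and interval/ratio/rank variables. The data frame arts, its columns, and the helper gower_sim() are hypothetical. The caller names the presence/absence columns in asym, and a numeric column with zero range would cause a division by zero, so real data would need a guard. The cluster package's daisy() function (metric = "gower") provides Gower dissimilarities without hand-written code.

```r
# Similarity between rows i and j of df; columns named in asym are presence/absence (0/1)
gower_sim <- function(df, i, j, asym = character(0)) {
  s <- numeric(0)                            # per-variable similarities
  for (v in names(df)) {
    xi <- df[[v]][i]
    xj <- df[[v]][j]
    if (v %in% asym) {                       # presence/absence, Jaccard-style
      if (xi == 0 && xj == 0) next           # joint absence: left out of the mean
      s <- c(s, as.numeric(xi == 1 && xj == 1))
    } else if (is.factor(df[[v]]) || is.character(df[[v]])) {
      s <- c(s, as.numeric(xi == xj))        # categorical: 1 if same, 0 if not
    } else {
      r <- diff(range(df[[v]]))              # range of the variable over all rows
      s <- c(s, 1 - abs(xi - xj) / r)        # interval/ratio/ranks
    }
  }
  mean(s)
}

# Hypothetical artifacts with mixed variable types
arts <- data.frame(
  Length  = c(10, 14, 20),
  Temper  = factor(c("Sand", "Sand", "Silt")),
  Incised = c(1, 0, 0)                       # presence/absence of incised decoration
)
gower_sim(arts, 1, 2, asym = "Incised")
gower_sim(arts, 2, 3, asym = "Incised")
```

For artifacts 1 and 2 the three similarities are 0.6, 1, and 0 (mean about 0.533). For artifacts 2 and 3 the joint absence of incising is skipped, leaving 0.4 and 0 (mean 0.2).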

Issues • Weighting – how to weight variables with different variances (standardization or explicit weights) • Correlations between variables – how (and whether) to take correlations into account (Mahalanobis distances)
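The effect of standardization and the Mahalanobis alternative can be seen on a hypothetical two-variable matrix in which Length varies much more than Width. scale() rescales each column to mean 0 and standard deviation 1 before dist() is applied. mahalanobis() returns the squared distance (x - center)' S^-1 (x - center), computed here between two rows using the covariance matrix of all rows.

```r
# Hypothetical measurements: Length has a much larger variance than Width
m <- cbind(Length = c(50, 60, 70, 80),
           Width  = c(20, 21, 23, 22))
rownames(m) <- paste0("a", 1:4)

dist(m)          # dominated by Length
dist(scale(m))   # each column standardized to mean 0, sd 1 first

# Squared Mahalanobis distance between rows a1 and a2, accounting for the
# covariance (including the correlation) between Length and Width
S  <- cov(m)
d2 <- mahalanobis(m["a1", ], center = m["a2", ], cov = S)
d2
```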

Distance Matrix • For simple analyses, dist() in base R provides euclidean, maximum, manhattan, canberra, binary (Jaccard), and minkowski distances • Many other measures are available in packages such as ade4, amap, cluster, ecodist, labdsv, proxy, and vegan

# Load Darl
# Rcmdr to create scatterplot matrix
> Euclid <- dist(Darl[, 2:5])
> Euclid
          35-3043   35-2871   35-2866   36-3619   36-3520
35-2871 11.437657
35-2866  5.380520  6.542935
36-3619 14.621217  3.682391  9.570266
36-3520 15.309148  4.068169 10.163661  1.757840
36-3036  7.760155  4.442972  2.495997  7.195832  7.860662
> library(car)
> scatterplot(Width ~ Length, data = Darl, regLine = TRUE, smooth = FALSE,
+     boxplots = FALSE, ellipse = TRUE, grid = FALSE, pch = 16, id = list(n = 6))
# Squared Mahalanobis distance of each artifact from the centroid
# (colMeans() gives the column means of a data frame; mean() no longer does)
> mahalanobis(Darl[, 2:3], colMeans(Darl[, 2:3]), cov(Darl[, 2:3]))
  35-3043   35-2871   35-2866   36-3619   36-3520   36-3036
2.2577596 1.8173684 0.4641912 2.9652763 1.7527347 0.7426699

> install.packages("ecodist")
> library(ecodist)
> Mahal <- distance(Darl[, 2:3], method = "mahalanobis")
> Mahal
          35-3043   35-2871   35-2866   36-3619   36-3520
35-2871 4.9367446
35-2866 0.6900956 2.8905096
36-3619 8.5903617 7.5849187 4.7250487
36-3520 6.8826044 0.6084649 3.6631704 4.9720621
36-3036 2.4467510 4.8835727 0.8163226 1.9192663 4.3901066

# Rcmdr
> .PC <- princomp(~ Length + Weight, cor = TRUE, data = Darl)
> Darl$PC1 <- .PC$scores[, 1]
> Darl$PC2 <- .PC$scores[, 2]
# Typed commands
> PCDist <- dist(Darl[, c("PC1", "PC2")])
> PCDist
          35-3043   35-2871   35-2866   36-3619   36-3520
35-2871 2.5498737
35-2866 2.1968323 1.1918768
36-3619 3.7858013 1.2539806 1.9883494
36-3520 4.2220041 1.8034110 2.1957351 0.7029308
36-3036 2.6677120 0.9201698 0.5717135 1.4339465 1.6290415
> scatterplot(PC2 ~ PC1, data = Darl, regLine = FALSE, smooth = FALSE,
+     boxplots = FALSE, ellipse = TRUE, grid = FALSE, pch = 16, id = list(n = 6))
[1] "35-3043" "35-2866" "36-3619" "36-3520" "35-2871" "36-3036"
