Distance Metric

Distance Metric
A distance metric measures the dissimilarity between two data points. A metric is a function d of two points X and Y such that:
- d(X, Y) is positive definite: if X ≠ Y, d(X, Y) > 0; if X = Y, d(X, Y) = 0.
- d(X, Y) is symmetric: d(X, Y) = d(Y, X).
- d(X, Y) satisfies the triangle inequality: d(X, Y) + d(Y, Z) ≥ d(X, Z).
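As a sketch only, the three axioms can be tested numerically on a finite set of points. The helper name `check_metric` and the sample points are illustrative assumptions, not from the notes. Such a test can refute the metric property but never prove it:

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, ...]

def check_metric(d: Callable[[Point, Point], float],
                 points: Sequence[Point], tol: float = 1e-12) -> Dict[str, bool]:
    """Test the three metric axioms on every pair and triple drawn from `points`.

    Passing on a finite sample does not prove that d is a metric;
    a single failure does prove that it is not.
    """
    pairs = list(product(points, repeat=2))
    # Positive definite: d(X, Y) > 0 when X != Y, and d(X, X) = 0
    positive_definite = all(
        d(x, y) > 0 if x != y else abs(d(x, y)) <= tol for x, y in pairs)
    # Symmetric: d(X, Y) = d(Y, X)
    symmetric = all(abs(d(x, y) - d(y, x)) <= tol for x, y in pairs)
    # Triangle inequality: d(X, Y) + d(Y, Z) >= d(X, Z)
    triangle = all(d(x, y) + d(y, z) >= d(x, z) - tol
                   for x, y, z in product(points, repeat=3))
    return {"positive_definite": positive_definite,
            "symmetric": symmetric,
            "triangle_inequality": triangle}

def euclidean(x: Point, y: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def squared_euclidean(x: Point, y: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))

sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (6.0, 4.0)]
print(check_metric(euclidean, sample))          # all three True
print(check_metric(squared_euclidean, sample))  # triangle_inequality: False
```

Squared Euclidean distance fails on the collinear points (0, 0), (1, 0), (2, 0), because 1 + 1 < 4.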

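Standard Distance Metrics
Minkowski distance, or Lp distance:
$$d_p(X,Y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$
- Manhattan distance (p = 1)
- Euclidean distance (p = 2)
- Max distance (p = ∞): $d_\infty(X,Y) = \max_{i} |x_i - y_i|$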

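An Example
In a two-dimensional space, take X = (2, 1) and Y = (6, 4), with Z = (6, 1) the corner point, so that XZ = 4 and ZY = 3.
- Manhattan: $d_1(X,Y) = XZ + ZY = 4 + 3 = 7$
- Euclidean: $d_2(X,Y) = XY = \sqrt{4^2 + 3^2} = 5$
- Max: $d_\infty(X,Y) = \max(XZ, ZY) = XZ = 4$

So $d_1 \ge d_2 \ge d_\infty$. For any positive integer p, $d_1(X,Y) \ge d_p(X,Y) \ge d_\infty(X,Y)$.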
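A minimal Python sketch of the Lp distance with unit weights, reproducing the example above. The function name `minkowski` is a hypothetical helper, and `math.inf` is used to stand for p = ∞:

```python
import math
from typing import Sequence

def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """L_p distance with all weights 1; p = math.inf gives the max distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1.0 / p)

X, Y = (2, 1), (6, 4)
print(minkowski(X, Y, 1))         # Manhattan: 4 + 3 -> 7.0
print(minkowski(X, Y, 2))         # Euclidean: sqrt(16 + 9) -> 5.0
print(minkowski(X, Y, math.inf))  # Max: max(4, 3) -> 4
```

Intermediate values of p land between the extremes; for instance p = 3 gives 91^(1/3), roughly 4.5.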

HOBbit Similarity
These notes contain NDSU confidential & Proprietary material. Patents pending on bSQ, Ptree technology.

Higher Order Bit (HOBbit) similarity:
$$\mathrm{HOBbitS}(A,B) = \max\{\, s : 0 \le s \le m,\ a_i = b_i \text{ for all } 1 \le i \le s \,\}$$
where A and B are two scalars (integers), a_i and b_i are the ith bits of A and B (counted left to right), and m is the number of bits.

Example (m = 8):
Bit position: 1 2 3 4 5 6 7 8
x1: 0 1 1 0 1 0 0 1
y1: 0 1 1 1 1 1 0 1
HOBbitS(x1, y1) = 3

Bit position: 1 2 3 4 5 6 7 8
x2: 0 1 0 1 1 1 0 1
y2: 0 1 0 1 0 0 0 0
HOBbitS(x2, y2) = 4
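One way to compute HOBbitS for m-bit unsigned integers is to scan the bits from the left until the first mismatch. This Python sketch (the name `hobbit_similarity` is illustrative, and restricting inputs to unsigned m-bit values is an assumption) reproduces the two example values:

```python
def hobbit_similarity(a: int, b: int, m: int = 8) -> int:
    """HOBbitS(A, B): how many leading (high-order) bits of the m-bit
    unsigned integers a and b agree, scanning left to right."""
    if not (0 <= a < 2 ** m and 0 <= b < 2 ** m):
        raise ValueError("a and b must be m-bit unsigned integers")
    for i in range(m):              # i = 0 is bit position 1 (leftmost)
        shift = m - 1 - i
        if ((a >> shift) & 1) != ((b >> shift) & 1):
            return i                # positions 1..i agree, position i+1 differs
    return m                        # all m bits agree, i.e. a == b

x1, y1 = 0b01101001, 0b01111101
x2, y2 = 0b01011101, 0b01010000
print(hobbit_similarity(x1, y1), hobbit_similarity(x2, y2))  # 3 4
```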

HOBbit Distance (High Order Bifurcation bit)
HOBbit distance between two scalar values A and B:
$$d_v(A,B) = m - \mathrm{HOBbitS}(A,B)$$
HOBbit distance between X and Y, the maximum over all dimensions:
$$d_h(X,Y) = \max_{1 \le i \le n} d_v(x_i, y_i)$$
Example (same bit patterns as in the HOBbit similarity example, m = 8): HOBbitS(x1, y1) = 3 and HOBbitS(x2, y2) = 4, so dv(x1, y1) = 8 - 3 = 5 and dv(x2, y2) = 8 - 4 = 4. Considering 2-dimensional data X = (x1, x2) and Y = (y1, y2): dh(X, Y) = max(5, 4) = 5.
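A sketch of both HOBbit distances in Python, assuming values are m-bit unsigned integers. It relies on the observation that the first differing bit of A and B is the highest set bit of A XOR B, so m - HOBbitS(A, B) equals the bit length of A XOR B. The function names are hypothetical:

```python
from typing import Sequence

def hobbit_dv(a: int, b: int, m: int = 8) -> int:
    """dv(A, B) = m - HOBbitS(A, B), computed as the bit length of a XOR b."""
    if not (0 <= a < 2 ** m and 0 <= b < 2 ** m):
        raise ValueError("a and b must be m-bit unsigned integers")
    return (a ^ b).bit_length()

def hobbit_dh(x: Sequence[int], y: Sequence[int], m: int = 8) -> int:
    """dh(X, Y): the largest per-dimension dv value."""
    return max(hobbit_dv(a, b, m) for a, b in zip(x, y))

X = (0b01101001, 0b01011101)   # (x1, x2)
Y = (0b01111101, 0b01010000)   # (y1, y2)
print(hobbit_dv(X[0], Y[0]), hobbit_dv(X[1], Y[1]), hobbit_dh(X, Y))  # 5 4 5
```

The XOR form avoids an explicit bit scan: when A = B, the XOR is 0 and its bit length is 0 = m - m.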

HOBbit Distance Is a Metric
- HOBbit distance is positive definite: if X = Y, dh(X, Y) = 0; if X ≠ Y, dh(X, Y) > 0.
- HOBbit distance is symmetric.
- HOBbit distance satisfies the triangle inequality.
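The slide states these properties without proof. As a sanity check only (not a proof), an exhaustive test over every 2-dimensional point with 3-bit coordinates is cheap. This Python sketch uses the XOR form of dv; its names and the 3-bit grid are illustrative choices:

```python
from itertools import product

def hobbit_dv(a: int, b: int) -> int:
    # m - HOBbitS(a, b) equals the bit length of a XOR b
    return (a ^ b).bit_length()

def hobbit_dh(x, y) -> int:
    return max(hobbit_dv(a, b) for a, b in zip(x, y))

def is_metric_on(points, d) -> bool:
    """Exhaustively check the three metric axioms on a finite point set."""
    for x, y in product(points, repeat=2):
        if (d(x, y) == 0) != (x == y):   # positive definite (d is never negative here)
            return False
        if d(x, y) != d(y, x):           # symmetric
            return False
    return all(d(x, y) + d(y, z) >= d(x, z)   # triangle inequality
               for x, y, z in product(points, repeat=3))

M = 3  # bits per coordinate, kept small so all 64**3 triples can be checked
points = list(product(range(2 ** M), repeat=2))
print(is_metric_on(points, hobbit_dh))  # True
```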

Neighborhood of a Point
[Figure: neighborhoods of width 2r around a target point T under Manhattan, Euclidean, Max, and HOBbit distance]
The neighborhood of a target point T is a set of points S such that X ∈ S if and only if d(T, X) ≤ r. If X is a point on the boundary, d(T, X) = r.
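A brute-force neighborhood query, sketched in Python. The helper names and the 5 x 5 integer grid are illustrative assumptions, and HOBbit is left out here because it needs unsigned integer coordinates:

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]

def neighborhood(t: Vec, points: Sequence[Vec], r: float,
                 d: Callable[[Vec, Vec], float]) -> List[Vec]:
    """All X with d(T, X) <= r; boundary points (d(T, X) == r) are included."""
    return [x for x in points if d(t, x) <= r]

def manhattan(x: Vec, y: Vec) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x: Vec, y: Vec) -> float:
    return math.dist(x, y)

def max_distance(x: Vec, y: Vec) -> float:
    return max(abs(a - b) for a, b in zip(x, y))

T = (0, 0)
grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]   # 5 x 5 integer grid
n1 = neighborhood(T, grid, 2, manhattan)
n2 = neighborhood(T, grid, 2, euclidean)
ninf = neighborhood(T, grid, 2, max_distance)
print(len(n1), len(n2), len(ninf))   # 13 13 25
```

Because d1 ≥ d2 ≥ d∞, the Manhattan neighborhood is always contained in the Euclidean one, which is contained in the Max one; on this coarse grid the first two happen to coincide.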

Decision Boundary
[Figure: decision boundaries between points A and B, separating regions R1 and R2, for Manhattan, Euclidean, and Max distance, for angles > 45° and < 45°]
The decision boundary between points A and B is the locus of points X satisfying d(A, X) = d(B, X). The decision boundary for HOBbit distance is perpendicular to the axis that gives the max distance.
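To see how the choice of distance moves the boundary, this Python sketch reports which side of d(A, X) = d(B, X) a point X falls on. The function `side`, the labels it returns, and the sample points are illustrative, not from the notes:

```python
import math
from typing import Callable, Sequence

Vec = Sequence[float]

def side(a: Vec, b: Vec, x: Vec, d: Callable[[Vec, Vec], float]) -> str:
    """Return "A" or "B" for the nearer point, or "boundary" if d(A, X) == d(B, X)."""
    da, db = d(a, x), d(b, x)
    if math.isclose(da, db):
        return "boundary"
    return "A" if da < db else "B"

def manhattan(p: Vec, q: Vec) -> float:
    return sum(abs(u - v) for u, v in zip(p, q))

def euclidean(p: Vec, q: Vec) -> float:
    return math.dist(p, q)

def max_distance(p: Vec, q: Vec) -> float:
    return max(abs(u - v) for u, v in zip(p, q))

A, B, X = (0, 0), (4, 2), (3, 0)
for name, d in [("Manhattan", manhattan), ("Euclidean", euclidean), ("Max", max_distance)]:
    print(name, side(A, B, X, d))   # boundary, B, B
```

With A = (0, 0) and B = (4, 2), the point X = (3, 0) lies exactly on the Manhattan boundary (3 = 1 + 2) but is closer to B under Euclidean (√5 < 3) and Max (2 < 3) distance.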

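Minkowski Metrics
Lp-metrics (aka Minkowski metrics):
$$d_p(X,Y) = \left(\sum_{i=1}^{n} w_i |x_i - y_i|^p\right)^{1/p}$$
(weights w_i assumed = 1)
[Figure: unit disks and their boundaries for p = 1 (Manhattan), p = 2 (Euclidean), p = 3, 4, …, p = ∞ (chessboard), and p = ½, ⅓, ¼, …]
$$d_{\max} \equiv \max_i |x_i - y_i| \equiv d_\infty \equiv \lim_{p\to\infty} d_p(X,Y)$$
Proof (sort of): let $a_i = |x_i - y_i|$ and $b \equiv \max_i a_i$; we want $\lim_{p\to\infty} \left(\sum_{i=1}^{n} a_i^p\right)^{1/p} = b$. For p large enough, every $a_i^p$ with $a_i < b$ is much smaller than $b^p$, since $(a_i/b)^p \to 0$. So $\sum_{i=1}^{n} a_i^p \approx k\,b^p$, where k is the multiplicity of b in the sum, and therefore $\left(\sum_{i=1}^{n} a_i^p\right)^{1/p} \approx k^{1/p}\,b$ with $k^{1/p} \to 1$. More precisely, $b \le d_p \le n^{1/p}\,b$ and $n^{1/p} \to 1$.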
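A numerical illustration of the limit in Python; the values a_i are made up for the example and the helper name is hypothetical. Factoring out b before raising to the power p mirrors the proof and also keeps b^p from overflowing a float for large p:

```python
import math
from typing import Sequence

def lp_of_diffs(a: Sequence[float], p: float) -> float:
    """(sum a_i^p)^(1/p) for nonnegative a_i, computed as b * (sum (a_i/b)^p)^(1/p)
    with b = max a_i, so that large p cannot overflow."""
    b = max(a)
    if b == 0:
        return 0.0
    return b * sum((t / b) ** p for t in a) ** (1.0 / p)

a = [4.0, 3.0, 4.0, 1.0]   # a_i = |x_i - y_i|; the max b = 4 occurs k = 2 times
b, n, k = max(a), len(a), a.count(max(a))
for p in (1, 2, 10, 100, 1000):
    dp = lp_of_diffs(a, p)
    # Sandwich: b <= d_p <= n^(1/p) * b
    assert b <= dp <= n ** (1.0 / p) * b * (1 + 1e-12)
    # For large p, d_p is close to k^(1/p) * b, which itself tends to b
    print(p, dp, k ** (1.0 / p) * b)
```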

P > 1 Lp metrics
Each case gives x = (x1, x2), with y = (y1, y2) = (0, 0), and the Lq distance from x to y for several values of q.
- x = (.5, .5): q = 2: .7071067812; q = 4: .5946035575; q = 9: .5400298694; q = 100: .503477775; MAX: .5
- x = (.71, .71), where .71 ≈ 1/√2 as the distances show: q = 2: 1.0; q = 3: .8908987181; q = 7: .7807091822; q = 100: .7120250978; MAX: .7071067812
- x = (.99, .99): q = 2: 1.4000714267; q = 8: 1.0796026553; q = 100: .9968859946; q = 1000: .9906864536; MAX: .99
- x = (1, 1): q = 2: 1.4142135624; q = 9: 1.0800597389; q = 100: 1.0069555501; q = 1000: 1.0006933875; MAX: 1
- x = (.9, .1): q = 2: .9055385138; q = 9: .9000000003; q = 100: .9; q = 1000: .9; MAX: .9
- x = (3, 3): q = 2: 4.2426406871; q = 3: 3.7797631497; q = 8: 3.271523198; q = 100: 3.0208666502; MAX: 3
- x = (90, 45): q = 6: 90.232863532; q = 9: 90.019514317; q = 100: 90; MAX: 90
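The equal-coordinate cases share a pattern: when x = (a, a) and y = (0, 0), $L_q = (2a^q)^{1/q} = a \cdot 2^{1/q}$, which decreases to a as q grows. A Python sketch checking this closed form and a few of the listed values (the helper `lq` is hypothetical):

```python
import math
from typing import Sequence

def lq(x: Sequence[float], y: Sequence[float], q: float) -> float:
    """L_q distance; q = math.inf gives the max distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(q):
        return max(diffs)
    return sum(t ** q for t in diffs) ** (1.0 / q)

origin = (0.0, 0.0)
# For x = (a, a): Lq(x, 0) = a * 2^(1/q); q stays <= 100 so 3.0 ** q cannot overflow
for a in (0.5, 1 / math.sqrt(2), 0.99, 1.0, 3.0):
    for q in (2, 3, 9, 100, math.inf):
        expected = a if math.isinf(q) else a * 2 ** (1 / q)
        assert math.isclose(lq((a, a), origin, q), expected)

# Spot checks against individual rows
assert math.isclose(lq((0.5, 0.5), origin, 4), 0.5946035575, abs_tol=1e-9)
assert math.isclose(lq((1 / math.sqrt(2),) * 2, origin, 3), 0.8908987181, abs_tol=1e-9)
assert math.isclose(lq((0.9, 0.1), origin, 9), 0.9000000003, abs_tol=1e-9)
assert math.isclose(lq((90.0, 45.0), origin, 6), 90.232863532, abs_tol=1e-8)
```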

P < 1 Lp metrics
Same layout as above, with y = (0, 0):
- x = (.1, .1): q = 1: .2; q = .8: .238; q = .4: .566; q = .2: 3.2; q = .1: 102; q = .04: 3355443; q = .02: 112589990684263; q = .01: 1.2676E+29; q = 2: .141421356
- x = (.5, .5): q = 1: 1; q = .8: 1.19; q = .4: 2.83; q = .2: 16; q = .1: 512; q = .04: 16777216; q = .02: 5.63E+14; q = .01: 6.34E+29; q = 2: .7071
- x = (.9, .1): q = 1: 1; q = .8: 1.098; q = .4: 2.1445; q = .2: 10.82; q = .1: 326.27; q = .04: 10312196.962; q = .02: 341871052443154; q = .01: 3.8E+29; q = 2: .906

For P < 1, writing the exponent as 1/p with p > 1:
$$d_{1/p}(X,Y) = \left(\sum_{i=1}^{n} |x_i - y_i|^{1/p}\right)^{p}$$
For p = 0 (the limit as p → 0), Lp does not exist (it does not converge).
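For q < 1 the triangle inequality can fail, so these functions are not metrics. A small Python check with q = 1/2 (the points are chosen for illustration and `lq` is a hypothetical helper), followed by the growth of a · 2^(1/q) that keeps the limit as q → 0 from converging:

```python
import math
from typing import Sequence

def lq(x: Sequence[float], y: Sequence[float], q: float) -> float:
    """(sum |x_i - y_i|^q)^(1/q) for q > 0."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

# With q = 1/2, the detour X -> Z -> Y is "shorter" than going straight from X to Y
X, Z, Y = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(lq(X, Z, 0.5) + lq(Z, Y, 0.5), lq(X, Y, 0.5))   # 2.0 4.0

# For x = (a, a) and y = 0, Lq = a * 2^(1/q), which grows without bound as q -> 0
for q in (0.2, 0.1, 0.04):
    print(q, lq((0.5, 0.5), (0.0, 0.0), q))   # roughly 16, 512, 16777216
```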

Min dissimilarity function
The dmin function, $d_{\min}(X,Y) = \min_{1 \le i \le n} |x_i - y_i|$, is strange: it is not even a pseudo-metric.
[Figure: the unit disk of dmin, and the neighborhood of the blue point relative to the red point]
The neighborhood of the blue point relative to the red point (the set of points closer to the blue point than to the red one) is strangely shaped! http://www.cs.ndsu.nodak.edu/~serazi/research/Distance.html
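A two-line counterexample in Python (the points are chosen for illustration): dmin can be 0 for distinct points, and, more seriously, the triangle inequality fails, which rules out even a pseudo-metric:

```python
def d_min(x, y):
    """min over i of |x_i - y_i|."""
    return min(abs(a - b) for a, b in zip(x, y))

X, Z, Y = (0, 0), (0, 5), (5, 5)
# d(X, Z) + d(Z, Y) = 0 + 0 < 5 = d(X, Y)
print(d_min(X, Z), d_min(Z, Y), d_min(X, Y))   # 0 0 5
```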

Other Interesting Metrics
- Canberra metric: $d_c(X,Y) = \sum_{i=1}^{n} \frac{|x_i - y_i|}{x_i + y_i}$ (a normalized Manhattan distance)
- Squared Chord metric: $d_{sc}(X,Y) = \sum_{i=1}^{n} \left(\sqrt{x_i} - \sqrt{y_i}\right)^2$ (already discussed as Lp with p = 1/2)
- Squared Chi-squared metric: $d_{chi}(X,Y) = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i + y_i}$
- Scalar Product metric: $X \cdot Y = \sum_{i=1}^{n} x_i y_i$
- Hyperbolic metrics (which map infinite space 1-1 onto a sphere)

Which are rotationally invariant? Translation invariant? Other?
Some notes on distance functions can be found at http://www.cs.ndsu.NoDak.edu/~datasurg/distance_similarity.pdf
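Sketches of the first four functions in Python, assuming nonnegative coordinates (needed for the square roots in the squared chord form). Treating terms with x_i + y_i = 0 as 0 is an added convention, not from the notes, and all function names are illustrative:

```python
import math
from typing import Sequence

Vec = Sequence[float]

def canberra(x: Vec, y: Vec) -> float:
    # Terms with x_i + y_i == 0 are skipped (assumed convention: 0/0 counts as 0)
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b != 0)

def squared_chord(x: Vec, y: Vec) -> float:
    # Requires x_i, y_i >= 0
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

def squared_chi_squared(x: Vec, y: Vec) -> float:
    # Same 0/0 convention as canberra
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)

def scalar_product(x: Vec, y: Vec) -> float:
    return sum(a * b for a, b in zip(x, y))

X, Y = (1.0, 2.0), (3.0, 2.0)
print(canberra(X, Y), squared_chi_squared(X, Y), squared_chord(X, Y), scalar_product(X, Y))
```

For X = (1, 2) and Y = (3, 2), only the first coordinate differs, so Canberra gives 2/4 = 0.5 and squared chi-squared gives 4/4 = 1.0.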
