Dissimilarity representation Veronika Cheplygina (with slides by Bob, Marco and David)
Representation
[Diagram: Sensor → Representation → Generalization; objects measured by features A (area) and B (perimeter)]
Examples
Point sets: a ∈ A (points of A), b ∈ B (points of B); d(a,b): Euclidean distance
D(A,B) = max_a { min_b { d(a,b) } }
D(B,A) = max_b { min_a { d(b,a) } }
In general, D(A,B) ≠ D(B,A)
Hausdorff distance: D_H = max { D(A,B), D(B,A) }
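As a sketch, the Hausdorff distance can be computed directly from the definition above; the NumPy version below (function names are my own, not from the slides) also shows that the two directed distances differ in general.

```python
# A minimal sketch of the directed and symmetric Hausdorff distances
# between two point sets, using only NumPy.
import numpy as np

def directed_hausdorff(A, B):
    """D(A,B) = max_a min_b d(a,b) for point sets A (n x d) and B (m x d)."""
    # Pairwise Euclidean distances between all points of A and all points of B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """D_H = max{D(A,B), D(B,A)}; symmetric, unlike its two halves."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 1.0], [3.0, 0.0]])
print(directed_hausdorff(A, B), directed_hausdorff(B, A))  # differ: D(A,B) != D(B,A)
print(hausdorff(A, B))
```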
Examples
Strings: shapes, amino acid sequences, …
Alignment of strings X and Y
D_E(X,Y) = # of edit operations (insertions, deletions, substitutions) to turn X into Y
D_E(cat, scan) = 2: cat → can → scan
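A minimal dynamic-programming sketch of the edit distance D_E (the function name and unit costs are illustrative choices, not from the slides):

```python
# Edit (Levenshtein) distance via dynamic programming.
def edit_distance(x, y):
    n, m = len(x), len(y)
    # D[i][j] = cost of turning the first i chars of x into the first j chars of y.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # i deletions
    for j in range(m + 1):
        D[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution / match
    return D[n][m]

print(edit_distance("cat", "scan"))  # 2: cat -> can -> scan
```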
Examples A F B B E E F C D C D Graph ( Nodes, Edges, Attributes ) Distance (Graph_1, Graph_2 )
k-Nearest Neighbor
[Diagram: training set with classes A and B, and an unlabeled object x]
Dissimilarities d_ij are available between all training objects, but only the k distances from x (d_x) are used.
Nearest Neighbor
[Diagram: training set with classes A and B, and an unlabeled object x]
The nearest neighbor rule does not use the full training dissimilarity matrix D_T. Can we use this information to do better?
Kernels?
• K = n × n matrix of kernels / similarities
• Used with support vector machines, BUT the optimization is only convex for positive semi-definite K
• Many (dis)similarity functions used in practice are not p.s.d. kernels
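As a quick illustration of the p.s.d. requirement, the sketch below (a toy example of my own, not from the slides) tests a similarity matrix via its eigenvalues; this particular matrix is indefinite and hence not a valid kernel.

```python
# Checking whether a similarity matrix is positive semi-definite
# (and therefore usable as an SVM kernel) via its eigenvalues.
import numpy as np

def is_psd(K, tol=1e-10):
    K = 0.5 * (K + K.T)                 # symmetrize first
    return np.linalg.eigvalsh(K).min() >= -tol

# Similarity matrices built from non-metric dissimilarities are often
# indefinite; this toy matrix has a negative eigenvalue.
K = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])
print(np.linalg.eigvalsh(K))  # smallest eigenvalue is negative
print(is_psd(K))              # False -> not a valid kernel
```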
Alternatives for the Nearest Neighbor Rule
• Embedding
• Dissimilarity space
Pękalska, E. and Duin, R.P.W. The Dissimilarity Representation for Pattern Recognition. World Scientific, 2005.
Embedding
Given an n × n dissimilarity matrix D: is there a feature matrix X for which Dist(X,X) = D?
Example
Similarity of campaigns (Kiescompas.nl)
Classical scaling
• Inner product (Gram) matrix G = X Xᵀ
• Distances in D can be expressed in terms of G: d_ij² = g_ii + g_jj − 2 g_ij
• We can go X → G → D
• We want to go the other way: D → G → X
Classical scaling
• Rewrite G in terms of D (assume zero-mean data): G = −½ J D² J, where D² contains the squared dissimilarities and J = I − (1/n) 11ᵀ is the centering matrix
Classical scaling
• Eigendecomposition of G: G = V Λ Vᵀ
• Remember that G = X Xᵀ
• Therefore X = V Λ^{1/2}
• Columns of V are eigenvectors, with the corresponding eigenvalues on the diagonal of Λ
Classical scaling
• X is originally an n × p matrix, but the reconstruction X′ = V Λ^{1/2} is an n × n matrix
• G has n eigenvalues:
• p non-zero eigenvalues, corresponding to the dimensions with largest variance
• n − p eigenvalues close to 0
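A minimal NumPy sketch of the full D → G → X pipeline (assuming a Euclidean D; variable names follow the slides, the function itself is my own):

```python
# Classical scaling: recover a feature matrix X from a Euclidean
# distance matrix D, up to rotation/reflection.
import numpy as np

def classical_scaling(D, d=None):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ (D ** 2) @ J               # Gram matrix from squared D
    lam, V = np.linalg.eigh(G)                # G = V diag(lam) V^T
    idx = np.argsort(lam)[::-1]               # largest eigenvalues first
    lam, V = lam[idx], V[:, idx]
    if d is None:
        d = int(np.sum(lam > 1e-10))          # keep the p non-zero eigenvalues
    return V[:, :d] * np.sqrt(np.clip(lam[:d], 0, None))  # X = V Lambda^(1/2)

# Round trip: X -> D -> X' preserves all pairwise distances.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
Xr = classical_scaling(D)
Dr = np.linalg.norm(Xr[:, None] - Xr[None, :], axis=2)
print(np.allclose(D, Dr))  # True
```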
PCA
• Rotate the zero-mean n × p matrix X to its principal axes: Z = X V
• Distances in X are preserved in Z
• A configuration in d (d < p) dimensions (those with largest variance) minimizes the fit error
• Classical scaling applied to EuclDist(X,X) yields exactly Z
Representation: Euclidean – Non-Euclidean – Non-Metric
Non-metric distance
• Live example! ("Dutch is similar to German")
Non-metric distances Single-linkage clustering / variants of Hausdorff distance
Non-metric dissimilarities
We are only given (an estimate of) D
We want to find X such that Dist(X,X) = D
If D is Euclidean, we can find X (up to a rotation).
What if D is not Euclidean (or not even metric)?
Pseudo-Euclidean Embedding
• Euclidean D → positive (or zero) eigenvalues:
• more positive λ_i = larger variance along that eigenvector
• further apart on a dimension with λ_i > 0 = further apart in feature space
• Non-Euclidean D → p positive and q negative eigenvalues:
• further apart on a dimension with λ_i < 0 = ?
Pseudo-Euclidean Embedding
• Non-Euclidean D → p positive and q negative eigenvalues
• Ignore the negative eigenvalues?
• Use their absolute values?
• Either way D_1 → X → D_2 with D_1 ≠ D_2: the corrected embedding no longer reproduces the original dissimilarities
psem.m
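A hedged sketch of such an embedding, in the spirit of PRTools' psem.m but written from scratch in NumPy (so not that implementation): it keeps the signature (p, q), embeds with |λ|, and remembers the signs.

```python
# Pseudo-Euclidean embedding of a (possibly non-Euclidean) D.
import numpy as np

def pseudo_euclidean_embedding(D, tol=1e-10):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    lam, V = np.linalg.eigh(G)
    order = np.argsort(-np.abs(lam))           # largest |lambda| first
    lam, V = lam[order], V[:, order]
    keep = np.abs(lam) > tol
    lam, V = lam[keep], V[:, keep]
    X = V * np.sqrt(np.abs(lam))               # embed with absolute values
    signature = (int((lam > 0).sum()), int((lam < 0).sum()))  # (p, q)
    return X, np.sign(lam), signature

# In the pseudo-Euclidean space, squared distances subtract the
# contribution of the "negative" dimensions:
#   d2(x, y) = sum_i sign_i * (x_i - y_i)^2
# which reproduces D; dropping the signs (or the negative dimensions)
# gives a proper Euclidean space but a different D.
```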
Summary: Embedding
• Find a (possibly lower-dimensional) representation X from the n × n dissimilarity matrix D
• Exact (up to rotation) reconstruction with classical scaling for Euclidean D; closely linked to PCA
• D can be non-Euclidean or non-metric; this may be informative, and how to deal with it is an open question
Dissimilarity space
[Diagram: training set with classes A and B; objects r1, r2, r3 selected for representation, with dissimilarities r1(d1), r2(d4), r3(d7) to an unlabeled object]
Selection of 3 objects for representation
Selecting R ⊂ T
• Default: R = T
• Random (see the sketch below)
• Feature selection
• Clustering
• Prototype selection (e.g. r1, r2, r3)
• Sparse classifiers (Friday)
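Below is a minimal sketch of the random selection strategy (helper names and the toy data are my own): the dissimilarity space simply keeps the columns of D that correspond to R.

```python
# Building a dissimilarity space from a representation set R.
import numpy as np

def dissimilarity_space(D, prototype_idx):
    """Each object is represented by its dissimilarities to R."""
    return D[:, prototype_idx]               # n x k feature matrix

rng = np.random.default_rng(0)
n, k = 100, 3                                # e.g. r1, r2, r3 in the slides
D = rng.random((n, n))
D = 0.5 * (D + D.T)                          # toy symmetric dissimilarities
np.fill_diagonal(D, 0)
R = rng.choice(n, size=k, replace=False)     # random selection of R from T
X = dissimilarity_space(D, R)                # now use any vector classifier
print(X.shape)                               # (100, 3)
```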
Example: NIST Digits 3 and 8 Pękalska, E., Duin, R.P.W. and Paclík, P. "Prototype selection for dissimilarity-based classifiers." Pattern Recognition 39.2 (2006): 189-208.
Nearest neighbor vs dissimilarity space
knndc.m · knnc.m · clevald.m
Nearest neighbor vs dissimilarity space
[Plot: objects shown in a 2D dissimilarity space with axes d(blue) and d(red)]
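The slides use MATLAB tools for this comparison (knndc.m, knnc.m); as a stand-in, this scikit-learn sketch (toy data and all choices my own) contrasts 1-NN on a precomputed D with a linear classifier trained on the rows of D as features (R = T).

```python
# 1-NN on the dissimilarity matrix vs a linear classifier in the
# dissimilarity space, on toy two-class data.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=300, centers=2, cluster_std=4.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
Dtr = cdist(Xtr, Xtr)                        # train x train dissimilarities
Dte = cdist(Xte, Xtr)                        # test x train dissimilarities

nn = KNeighborsClassifier(n_neighbors=1, metric='precomputed').fit(Dtr, ytr)
print('1-NN on D:                  ', nn.score(Dte, yte))

# Dissimilarity space with R = T: each row of D is a feature vector.
lin = LogisticRegression(max_iter=1000).fit(Dtr, ytr)
print('linear clf in dissim. space:', lin.score(Dte, yte))
```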
Conclusions
Dissimilarity representation is an alternative to features
Classifiers can be built in:
• (pseudo-)Euclidean spaces, by embedding
• the dissimilarity space, by selecting a representation set
Conclusions
• Expert knowledge → more information in the dissimilarity matrix → better performance
• Fewer restrictions on the dissimilarity matrix (non-Euclidean / non-metric allowed)
• Use your favorite classifier in the resulting space