Classification on Manifolds Suman K. Sen joint work with Dr. J. S. Marron & Dr. Mark Foskey
Outline • Introduction • Introduction to M-reps • Goal • Classification • Popular methods of classification • Why do M-reps need special treatment ? • Proposed Approach • Results • Future Research
2 groups: Patients (56) and Controls (26). Can we say anything about the shape variation between the 2 groups? Figure: m-rep model of a hippocampus, Styner et al. (2004).
Visualization: Change of shape along separating direction • Blue: correctly classified • Magenta: misclassified • +’s & 0’s are the two classes • X-axis: scores along separating direction • Right panel: projected model
DWD in Face Recognition, (cont.) • Registered Data • Shifts and scale • Manually chosen • To align eyes and mouth • Still large variation • See males vs. females???
DWD in Face Recognition , (cont.) • DWD Direction • Good separation • Images “make sense” • Garbage at ends? (extrapolation effects?)
Visualization: Change of shape along separating direction • Separation better • shape change – flattening at the top
Medial Atom M-reps
Definitions • Geodesic: curve locally minimizing distance between points • Exponential Map Exp_p(X): maps a tangent vector X ∈ T_pM onto the manifold M along a geodesic
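ExpMaps and LogMaps
• ExpMap: for X ∈ T_pM there exists a geodesic x(t) with X as its initial velocity, x(t) = Exp_p(tX).
• ||dx/dt||(t) = ||X||, so the geodesic has constant speed and Exp_p preserves distance from p.
• LogMap: the inverse of the ExpMap.
• d(x, y) = ||Log_x(y)|| = ||Log_y(x)||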
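As a concrete illustration of the two maps, here is a minimal sketch on the unit sphere S², where both have closed forms. The sphere is chosen only for simplicity (m-reps live on a more complicated manifold), and the function names `sphere_exp`, `sphere_log` and `geodesic_dist` are hypothetical, not taken from the talk.

```python
import numpy as np

def sphere_exp(p, X):
    """Exp_p(X) on the unit sphere: walk from p along the geodesic with initial velocity X."""
    norm = np.linalg.norm(X)
    if norm < 1e-12:              # zero tangent vector: stay at p
        return p.copy()
    return np.cos(norm) * p + np.sin(norm) * X / norm

def sphere_log(p, q):
    """Log_p(q): tangent vector at p pointing toward q, with length d(p, q).
    Not defined when q is antipodal to p."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)  # geodesic distance between p and q
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - cos_theta * p         # part of q orthogonal to p
    return theta * v / np.linalg.norm(v)

def geodesic_dist(p, q):
    """d(p, q) = ||Log_p(q)||."""
    return np.linalg.norm(sphere_log(p, q))

p = np.array([0.0, 0.0, 1.0])     # base point (north pole)
X = np.array([0.5, 0.2, 0.0])     # tangent vector at p (orthogonal to p)
q = sphere_exp(p, X)
```

In this sketch `sphere_log(p, q)` recovers `X`, and `geodesic_dist(p, q)` equals `||X||` as long as `||X|| < π`.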
Literature Review on M-reps • Medial Locus, proposed by Blum (1967). • Property studied in 2D by Blum and Nagel (1978) and in 3D by Nackman and Pizer (1985). • Pizer et al. (1999) describes discrete M-reps. • Yushkevich et al. (2003) describes continuous M-reps. • Terriberry and Gerig (2006) treats continuous M-reps with branching.
Classification
• X_i: attributes describing individual i (i = 1, …, n)
• n: number of individuals
• Y_i: class label ∈ {1, 2, …, K}
• K: number of classes
NOTE: We work with only two groups and, for mathematical convenience, take Y ∈ {1, −1}.
Goal of classification: given a set of pairs (X_i, Y_i), find a rule f(x) that assigns a new individual to a group on the basis of its attributes x.
Classification: Popular Methods
• Mean Difference (MD): assigns a new observation to the class whose mean is closest to it.
• Fisher (1936): improvement over MD; the optimal rule when the two classes come from Normal distributions with the same covariance matrix. Now called Fisher Linear Discrimination (FLD).
• Vapnik (1982, 1995): Support Vector Machine (SVM). Also see Burges (1998).
• Marron et al. (2004): Distance Weighted Discrimination (DWD); unlike SVM it does not suffer from “data piling” and improves generalizability in High Dimension Low Sample Size (HDLSS) situations.
• Kernel Embedding: linear classification done after embedding the data in a higher dimensional space. See Schölkopf and Smola (2002).
List of publications on kernel methods: http://www.kernel-machines.org/publications.html
Classification • These methods give us: a) a separating plane b) a normal vector (the separating direction) c) projections of the data onto the separating direction
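To make the three outputs concrete, here is a minimal Euclidean sketch using the Mean Difference rule, the simplest of the methods listed. The toy data, the random seed and the names `w`, `b` and `md_score` are illustrative assumptions, not material from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data: two well-separated Gaussian clouds (illustrative only)
X_pos = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(20, 2))   # class +1
X_neg = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(20, 2))  # class -1

mean_pos, mean_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)

# b) Normal vector (separating direction): unit vector between the class means
w = mean_pos - mean_neg
w /= np.linalg.norm(w)

# a) Separating plane {x : w.x + b = 0}: passes through the midpoint of the means
midpoint = (mean_pos + mean_neg) / 2
b = -np.dot(w, midpoint)

def md_score(x):
    """c) Signed projection onto the separating direction (positive -> class +1)."""
    return x @ w + b

scores_pos = md_score(X_pos)
scores_neg = md_score(X_neg)
```

All three quantities rely on an inner product, which is exactly what is unavailable on a general manifold.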
Different approach in Manifolds ? • Difficult to describe separating surfaces • No inner products • Easier to calculate distances
Importance of Geodesic Distance
Choice of Base Point • Black and blue points represent different groups. • The figure shows that the choice of base point has a significant effect.
Proposed Approach in Manifolds • Key concept: control points (each representative of a class). • Use distances from the control points. • Goal: find “good” control points. For example, on the sphere, the control points corresponding to the red boundary separate the data better.
Decision Function
• f(x) = d²(c_{-1}, x) − d²(c_1, x)
• If f(x) > 0, assign x to class 1; otherwise assign x to class −1.
• If y f(x) > 0, the decision is correct (y is the class label, ±1); if y f(x) < 0, it is wrong.
NOTE: H = {x : f(x) = 0} is the separating boundary.
[Figure: level sets f(x) = 0 (the boundary H) and f(x) = 1, with the control points c_{-1} and c_1]
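The decision function can be sketched in a few lines once a geodesic distance is available. The example below assumes the unit sphere, where d(p, q) is the angle between p and q; the control points and the names `decision` and `classify` are illustrative assumptions.

```python
import numpy as np

def geodesic_dist(p, q):
    """Great-circle distance between unit vectors p and q."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def decision(x, c_pos, c_neg):
    """f(x) = d^2(c_{-1}, x) - d^2(c_1, x)."""
    return geodesic_dist(c_neg, x) ** 2 - geodesic_dist(c_pos, x) ** 2

def classify(x, c_pos, c_neg):
    """Assign class 1 if f(x) > 0, else class -1."""
    return 1 if decision(x, c_pos, c_neg) > 0 else -1

c_pos = np.array([0.0, 0.0, 1.0])   # control point c_1 (hypothetical)
c_neg = np.array([1.0, 0.0, 0.0])   # control point c_{-1} (hypothetical)

x = np.array([0.2, 0.1, 0.9])
x /= np.linalg.norm(x)              # new observation, closer to c_1
label = classify(x, c_pos, c_neg)

# A point equidistant from both control points lies on the boundary H
h = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
```

Because only distances to the two control points are used, no inner product or separating plane has to be defined on the manifold.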
Proposed Methods
1) Geodesic Mean Difference (GMD): analogous to the Mean Difference method; the two control points are the geodesic means of the two classes.
2) Iterative Tangent Plane SVM (ITanSVM): standard SVM done on a tangent plane, with the base point carefully chosen through iterative steps.
3) Manifold SVM (MSVM): a generalization of the SVM criterion to manifolds.
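For GMD, the only ingredient beyond a distance is the geodesic mean of each class. The slides do not say how it is computed; the sketch below assumes the usual fixed-point iteration for the Fréchet (Karcher) mean, again on the unit sphere, with hypothetical helper names and toy data.

```python
import numpy as np

def sphere_exp(p, X):
    n = np.linalg.norm(X)
    return p.copy() if n < 1e-12 else np.cos(n) * p + np.sin(n) * X / n

def sphere_log(p, q):
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - c * p
    return theta * v / np.linalg.norm(v)

def geodesic_mean(points, n_iter=50, tol=1e-10):
    """Fixed-point iteration: move m along the average of Log_m(x_i) until it vanishes."""
    m = points.mean(axis=0)
    m /= np.linalg.norm(m)                 # start from the normalized Euclidean mean
    for _ in range(n_iter):
        step = np.mean([sphere_log(m, x) for x in points], axis=0)
        m = sphere_exp(m, step)
        m /= np.linalg.norm(m)             # guard against numerical drift off the sphere
        if np.linalg.norm(step) < tol:
            break
    return m

# Class +1: four points placed symmetrically around the north pole
phis = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
class_pos = np.stack([np.sin(0.3) * np.cos(phis),
                      np.sin(0.3) * np.sin(phis),
                      np.full(4, np.cos(0.3))], axis=1)
# Class -1: three points near the x-axis
class_neg = np.array([[1.0, 0.2, 0.1], [1.0, -0.1, 0.3], [1.0, 0.1, -0.2]])
class_neg /= np.linalg.norm(class_neg, axis=1, keepdims=True)

c_pos = geodesic_mean(class_pos)           # GMD control point for class +1
c_neg = geodesic_mean(class_neg)           # GMD control point for class -1
```

The two means then serve as the control points c_1 and c_{-1} in the decision function f(x) = d²(c_{-1}, x) − d²(c_1, x).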
ITanSVM: The Algorithm
1) Calculate the mean of each of the 2 classes, then compute the mean (c) of those two means. Construct the tangent plane at c.
2) Compute the SVM decision line on the tangent plane.
3) Find the pair of points such that a) the SVM line is the perpendicular bisector of the line joining the points, and b) the distances from the new points to the old points are minimal.
4) Map these 2 new points back to the manifold.
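Step 3 has a closed form if "minimum" is read as minimizing the sum of squared Euclidean distances in the tangent plane; that reading is an assumption, since the slides do not spell out the criterion. Condition a) forces the two new points to be mirror images across the SVM line, and because a reflection R preserves distances, ||R(q) − p_{-1}|| = ||q − R(p_{-1})||, so the best q is the midpoint of p_1 and R(p_{-1}). The names `reflect` and `update_control_points` and the numbers are hypothetical.

```python
import numpy as np

def reflect(x, w, b):
    """Reflect x across the hyperplane {z : w.z + b = 0}."""
    return x - 2.0 * (np.dot(w, x) + b) / np.dot(w, w) * w

def update_control_points(p_pos, p_neg, w, b):
    """New points that are mirror images across the SVM line and closest
    (sum of squared distances) to the old points p_pos and p_neg."""
    q_pos = 0.5 * (p_pos + reflect(p_neg, w, b))   # midpoint of p_pos and mirrored p_neg
    q_neg = reflect(q_pos, w, b)                   # mirror image of q_pos
    return q_pos, q_neg

# SVM line w.x + b = 0 in a 2-D tangent plane (illustrative values)
w = np.array([1.0, 1.0])
b = -0.5
p_pos = np.array([1.0, 0.2])    # old point for class +1
p_neg = np.array([-0.8, 0.1])   # old point for class -1

q_pos, q_neg = update_control_points(p_pos, p_neg, w, b)
```

If the resulting point happened to fall exactly on the SVM line, the two new points would coincide, so a practical implementation would need to handle that degenerate case.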
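Manifold SVM (MSVM): The Setup
• Decision function: f(x), defined from the control points c_1 and c_{-1} as on the Decision Function slide.
• [expression not recoverable from the source] = distance of the point x_i from the separating boundary H given by c_1 and c_{-1}.
• Goal: find c_1 and c_{-1} that maximize the minimum distance of the training points to H (one way of looking at SVM that generalizes to manifolds).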
SVM • Separating rule (w, b) between the 2 groups: w·x + b = 0
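For reference, the "maximize the minimum distance" view of the Euclidean SVM can be written out as follows. This is the standard hard-margin formulation for linearly separable data, stated here as background rather than taken from the slides; the distance of x_i to the plane w·x + b = 0 is |w·x_i + b| / ||w||.

```latex
\max_{w,\,b}\ \min_{i}\ \frac{y_i\,(w \cdot x_i + b)}{\lVert w \rVert}
\qquad\Longleftrightarrow\qquad
\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1,\quad i = 1,\dots,n
```

The left-hand form uses only distances to the separating boundary, which is why it is the version that carries over to a manifold once (w, b) is replaced by a pair of control points.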
Results: Hippocampi Data • Separation shown by different methods [Figure panels: GMD, MSVM, TSVM, ITanSVM]
Results: Generated Ellipsoids • 25 randomly distorted (bending, twisting, tapering) ellipsoids. • Two groups • 11 with negative twist parameter. • 14 with positive twist parameter.
Future Research • Extend DWD to manifold data. Marron et al. (2004): Distance Weighted Discrimination (DWD), unlike SVM, does not suffer from “data piling” and improves generalizability in HDLSS situations.
Future Research • Application to Diffusion Tensor Imaging data (at each voxel, the observed datum is a 3×3 positive definite matrix). • Develop MSVM for the multi-category case.