Object Orie’d Data Analysis, Last Time • Kernel Embedding • Embed data in higher dimensional manifold • Gives greater flexibility to linear methods • Support Vector Machines • Aimed at very non-Gaussian Data • E.g. from Kernel Embedding • Distance Weighted Discrimination • HDLSS Improvement of SVM
Support Vector Machines Graphical View, using Toy Example: • Find separating plane • To maximize distance from data to plane • In particular the smallest such distance • The closest data points are called support vectors • The gap between them is called the margin (see the fitting sketch below)
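To make the graphical view concrete, here is a minimal sketch that fits a linear SVM and reads off the support vectors and margin; the toy data and scikit-learn setup are assumptions for illustration, not from the slides:

```python
# Minimal sketch: linear SVM on a toy 2-class data set.
# Toy data and library choice are assumptions for illustration.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)),    # class -1 cloud
               rng.normal(+2, 1, (20, 2))])   # class +1 cloud
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=10.0).fit(X, y)

# Separating plane: w'x + b = 0; half-width of the gap (margin) is 1/||w||.
w = clf.coef_[0]
print("support vectors (closest points):\n", clf.support_vectors_)
print("margin:", 1.0 / np.linalg.norm(w))
```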
Support Vector Machines Graphical View, using Toy Example:
Support Vector Machines Forgotten last time, an important extension: Multi-Class SVMs Hsu & Lin (2002), Lee, Lin, & Wahba (2002) • Defined for “implicit” version • “Direction Based” variation??? (a one-vs-one illustration follows)
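As a hedged illustration of the multi-class case: the cited papers compare formulations, and one common practical route is one-vs-one, which scikit-learn's SVC uses internally for k > 2 classes. The data set here is an assumption for illustration:

```python
# One-vs-one multi-class SVM, in the spirit of the comparisons in
# Hsu & Lin (2002); data set and library are assumptions for illustration.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                  # 3 classes
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
print(clf.decision_function(X[:1]).shape)          # (1, 3): one score per class pair
```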
Support Vector Machines Also forgotten last time: toy examples illustrating Explicit vs. Implicit Kernel Embedding, as well as the effect of the window width σ on Gaussian kernel embedding (kernel written out below)
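For reference, the Gaussian kernel with window width σ is (the exact parameterization used on the slides is an assumption here):

$$ K_\sigma(x, x') = \exp\!\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right) $$

Explicit embedding works with the embedded data directly; implicit embedding touches the data only through $K_\sigma$.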
SVMs, Comput’n & Embedding For an “Embedding Map” Φ, Explicit Embedding: maximize the margin over the embedded data, to get the classification function (one standard form is written out below) • Straightforward application of embedding • But loses inner product advantage
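The slide's formulas did not survive extraction; one standard way to write the explicit-embedding SVM, with training data $(x_i, y_i)$, $y_i \in \{-1, +1\}$, embedding map $\Phi$, and slack penalty $C$ (notation assumed):

$$ \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^n \xi_i \quad \text{s.t.} \quad y_i\big(w^\top \Phi(x_i) + b\big) \ge 1 - \xi_i,\ \ \xi_i \ge 0 $$

(equivalently, maximize the margin $1/\|w\|$), with classification function

$$ f(x) = \operatorname{sign}\big(w^\top \Phi(x) + b\big). $$

Here $w$ lives in the (possibly huge) embedded space, which is the computational price of working explicitly.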
SVMs, Comput’n & Embedding Implicit Embedding: maximize the dual objective, to get the classification function (standard kernel-trick form below) • Still defined only via inner products • Retains optimization advantage • Thus used very commonly • Comparison to explicit embedding? • Which is “better”???
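Again in standard (assumed) notation, the implicit version is the kernel-trick dual, with $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$:

$$ \max_{\alpha}\ \sum_{i=1}^n \alpha_i - \tfrac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^n \alpha_i y_i = 0, $$

with classification function

$$ f(x) = \operatorname{sign}\Big( \sum_{i=1}^n \alpha_i y_i K(x_i, x) + b \Big). $$

Everything is expressed through inner products $K(\cdot, \cdot)$ alone, which is the optimization advantage noted above.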
Support Vector Machines Target Toy Data set:
Support Vector Machines Explicit Embedding, window σ = 0.1:
Support Vector Machines Explicit Embedding, window σ = 1:
Support Vector Machines Explicit Embedding, window σ = 10:
Support Vector Machines Explicit Embedding, window σ = 100:
Support Vector Machines Notes on Explicit Embedding: • Too small ⇒ poor generalizability • Too big ⇒ miss important regions • Classical lessons from kernel smoothing • Surprisingly large “reasonable region” • I.e. parameter less critical (sometimes?), as the σ-scan sketch below suggests Also explore projections (in kernel space)
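A rough sketch of the σ-scan; the toy data and scikit-learn setup are assumptions, and note that scikit-learn parameterizes the RBF kernel by gamma = 1/(2σ²):

```python
# Scan the Gaussian window width sigma and watch cross-validated accuracy,
# illustrating the "surprisingly large reasonable region".
# Toy data and library choices are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

for sigma in [0.1, 1.0, 10.0, 100.0]:        # the slides' window widths
    clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"sigma = {sigma:6.1f}   CV accuracy = {acc:.3f}")
```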
Support Vector Machines Kernel space projection, window σ = 0.1:
Support Vector Machines Kernel space projection, window σ = 1:
Support Vector Machines Kernel space projection, window σ = 10:
Support Vector Machines Kernel space projection, window σ = 100:
Support Vector Machines Notes on Kernel space projection: • Too small: • Great separation • But recall, poor generalizability • Too big ⇒ no longer separable • As above: • Classical lessons from kernel smoothing • Surprisingly large “reasonable region” • I.e. parameter less critical (sometimes?) (the projection-score sketch below makes this concrete)
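One way to look at such projections numerically (assumed to match the slides' intent): for a kernel SVM, the decision function value is, up to centering and scale, the projection of the embedded point onto the SVM direction in kernel space.

```python
# Kernel-space projection scores via the SVM decision function:
# decision_function(x) = sum_i alpha_i y_i K(x_i, x) + b, i.e. the
# projection of Phi(x) onto the direction w (up to scale). Setup assumed.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

sigma = 1.0
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2)).fit(X, y)
scores = clf.decision_function(X)    # 1-d projection scores

# Well-separated score distributions <=> classes split in kernel space.
for c in (0, 1):
    print(f"class {c}: mean {scores[y == c].mean():+.2f}, sd {scores[y == c].std():.2f}")
```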
Support Vector Machines Implicit Embedding, window σ = 0.1:
Support Vector Machines Implicit Embedding, window σ = 0.5:
Support Vector Machines Implicit Embedding, window σ = 1:
Support Vector Machines Implicit Embedding, window σ = 10:
Support Vector Machines Notes on Implicit Embedding: • Similar large vs. small lessons • Range of “reasonable results” seems to be smaller (note the different range of windows) • Much different “edge” behavior An interesting topic for future work…
Distance Weighted Discrim’n 2-d Visualization: • Pushes plane away from data • All points have some influence (the optimization below makes this explicit)
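A compact statement of the DWD optimization (following Marron, Todd and Ahn (2007); the exact slack and penalty details here are assumed): with residuals $r_i$,

$$ \min_{w,\,b,\,\xi}\ \sum_{i=1}^n \frac{1}{r_i} + C \sum_{i=1}^n \xi_i \quad \text{s.t.} \quad r_i = y_i\big(w^\top x_i + b\big) + \xi_i,\ \ r_i > 0,\ \ \xi_i \ge 0,\ \ \|w\| \le 1. $$

Because each point enters through $1/r_i$, every point influences the direction (not just the support vectors), and points nearest the plane push it away hardest.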
Distance Weighted Discrim’n References for more on DWD: • Current paper: Marron, Todd and Ahn (2007) • Links to more papers: Ahn (2007) • JAVA Implementation of DWD: caBIG (2006) • SDPT3 Software: Toh (2007)
Batch and Source Adjustment Recall from Class Notes 8/28/07 • For Stanford Breast Cancer Data (C. Perou) • Analysis in Benito et al. (2004) Bioinformatics, 20, 105-114. https://genome.unc.edu/pubsup/dwd/ • Adjust for Source Effects • Different sources of mRNA • Adjust for Batch Effects • Arrays fabricated at different times
Why not adjust using SVM? • Major Problem: Proj’d Distrib’al Shape • Triangular Dist’ns (oppositely skewed) • Does not allow sensible rigid shift
Why not adjust using SVM? • Nicely Fixed by DWD • Projected Dist’ns near Gaussian • Sensible to shift (shift mechanics sketched below)
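The rigid shift itself is simple: find the direction separating the batches, then shift each batch along that direction so the batch means coincide. A hedged sketch of the mechanics follows; the slides use DWD via the caBIG software, and here a linear SVM direction stands in for the DWD direction purely to show the shift (an assumption, not the authors' implementation; as the previous slide notes, SVM projections are less well behaved in practice).

```python
# Batch adjustment by rigid shift along a batch-separating direction.
# A linear SVM direction stands in for the DWD direction (assumption).
import numpy as np
from sklearn.svm import LinearSVC

def batch_adjust(X, batch):
    """Rigidly shift each batch along the batch-separating direction,
    so all batch means project to zero (assumes two batches)."""
    w = LinearSVC(dual=False).fit(X, batch).coef_[0]
    w = w / np.linalg.norm(w)                  # unit normal of the plane
    Xadj = X.astype(float)                     # working copy
    for b in np.unique(batch):
        mask = batch == b
        mean_proj = (X[mask] @ w).mean()       # batch mean along direction
        Xadj[mask] -= mean_proj * w            # rigid shift, shape preserved
    return Xadj

# Hypothetical usage: X is samples-by-genes, batch is an array of batch ids.
# X_adjusted = batch_adjust(X, batch)
```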
Why not adjust by means? • DWD is complicated: value added? • Xuxin Liu example… • Key is the sizes of biological subtypes • Differing ratio trips up the mean (toy calculation below) • But DWD more robust (although still not perfect)
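A toy calculation (hypothetical numbers) of how a differing subtype ratio trips up the mean: suppose for some gene, subtype A has mean expression $a$ and subtype B has mean $b$, with no true batch effect. If batch 1 is 80% A / 20% B while batch 2 is 20% A / 80% B, then equalizing batch means removes

$$ (0.8a + 0.2b) - (0.2a + 0.8b) = 0.6\,(a - b), $$

which is purely biological signal, not batch artifact.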
Why not adjust by means? Next time: Work in before and after, slides like 138-141 from DWDnormPreso.ppt In Research/Bioinf/caBIG
Why not adjust by means? DWD is robust against non-proportional subtypes… Mathematical Statistical Question: Is there mathematics behind this? (will answer next time…)
DWD in Face Recognition • Face Images as Data (with M. Benito & D. Peña) • Male – Female Difference? • Discrimination Rule? • Represented as long vector of pixel gray levels • Registration is critical
DWD in Face Recognition, (cont.) • Registered Data • Shifts and scale • Manually chosen • To align eyes and mouth • Still large variation • See males vs. females???
DWD in Face Recognition, (cont.) • DWD Direction • Good separation • Images “make sense” • Garbage at ends? (extrapolation effects?)
DWD in Face Recognition, (cont.) • Unregistered Version • Much blurrier • Since features don’t properly line up • Nonlinear Variation • But DWD still works • Can see M-F differ’ce?
DWD in Face Recognition, (cont.) • Interesting summary: • Jump between means (in DWD direction) • Clear separation of Maleness vs. Femaleness
DWD in Face Recognition, (cont.) • Fun Comparison: • Jump between means (in SVM direction) • Also distinguishes Maleness vs. Femaleness • But not as well as DWD
DWD in Face Recognition, (cont.) Analysis of the difference: project onto the normals • SVM has “small gap” (feels noise artifacts?) • DWD “more informative” (feels real structure?) (a projection sketch follows)
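A small sketch of the projection comparison; the direction vectors here are hypothetical placeholders, and labels are assumed to be in {−1, +1}:

```python
# Compare class separation after projecting the data onto two candidate
# normal directions (e.g. SVM vs. DWD normals). Setup is assumed.
import numpy as np

def projection_gap(X, y, w):
    """Gap between the classes' projected ranges along direction w."""
    w = w / np.linalg.norm(w)
    s = X @ w
    return s[y == +1].min() - s[y == -1].max()

# Hypothetical usage, with w_svm and w_dwd obtained elsewhere:
# a small or negative gap suggests the direction feels noise artifacts;
# a clear gap suggests it feels real structure.
# print(projection_gap(X, y, w_svm), projection_gap(X, y, w_dwd))
```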
DWD in Face Recognition, (cont.) • Current Work: • Focus on “drivers” (regions of interest) • Relation to Discr’n? • Which is “best”? • Lessons for human perception?
Outcomes Data Breast Cancer Study (C. M. Perou): • Outcome of interest = death or survival • Connection with gene expression? Approach: • Treat death vs. survival during study as “classes” • Find “direction that best separates the classes”
Outcomes Data Find “direction that best separates the classes”