Kernel-based data fusion
Discussion of a paper by G. Lanckriet
Overview
• Problem: Aggregation of heterogeneous data
• Idea: Different data sources are represented by different kernels
• Question: How to combine different kernels in an elegant/efficient way?
• Solution: Linear combination and semidefinite programming (SDP)
• Application: Recognition of ribosomal and membrane proteins
Linear combination of kernels
K = Σᵢ μᵢ Kᵢ (weights μᵢ, kernels Kᵢ)
• The resulting kernel K is positive definite (xᵀKx > 0 for all x ≠ 0, provided μᵢ > 0 and xᵀKᵢx > 0)
• Elegant aggregation of heterogeneous data
• More efficient than training individual SVMs
• KCCA uses an unweighted sum over the individual kernels
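A weighted sum of Gram matrices can be sketched in a few lines of NumPy. The kernel choices, weights, and toy data below are illustrative assumptions, not taken from the paper; the point is that a non-negative combination of positive semidefinite kernels stays positive semidefinite:

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of the linear kernel."""
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    """Gram matrix of the RBF kernel."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                        # toy data: 6 samples, 3 features

mus = [0.7, 0.3]                                   # non-negative weights mu_i
Ks = [linear_kernel(X), rbf_kernel(X)]
K = sum(mu * Ki for mu, Ki in zip(mus, Ks))        # K = sum_i mu_i K_i

# a non-negative combination of PSD matrices is PSD: no negative eigenvalues
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-9)                      # True
```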
Support Vector Machine
Primal problem (1-norm soft margin):
minimize (1/2)‖w‖² + C Σᵢ ξᵢ
subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0
• ‖w‖²: squared norm of the weight vector w
• C Σᵢ ξᵢ: penalty term
• yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ: hyperplane constraint
• ξᵢ: slack variables
Dual form
maximize Σᵢ αᵢ − (1/2) Σᵢⱼ αᵢαⱼ yᵢyⱼ K(xᵢ, xⱼ)
subject to 0 ≤ αᵢ ≤ C, Σᵢ αᵢyᵢ = 0
• Maximization instead of minimization
• Equality constraint Σᵢ αᵢyᵢ = 0
• Lagrange multipliers αᵢ instead of w, b, ξ
• Quadratic, convex problem since K is positive definite: a quadratic program (QP)
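The dual QP above can be sketched with a simple coordinate-ascent loop. This is a minimal illustration, not the paper's solver: the bias term is omitted so the equality constraint drops out, and the toy data and function names are assumptions:

```python
import numpy as np

def svm_dual_coordinate_ascent(K, y, C=10.0, iters=200):
    """Maximize sum(a) - 0.5 * a^T (y y^T * K) a subject to 0 <= a_i <= C.
    Bias omitted for simplicity, so the equality constraint disappears."""
    n = len(y)
    a = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            g = 1.0 - y[i] * np.sum(a * y * K[:, i])    # dual gradient at a_i
            a[i] = np.clip(a[i] + g / K[i, i], 0.0, C)  # clipped Newton step
    return a

# linearly separable toy data
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                             # linear kernel

a = svm_dual_coordinate_ascent(K, y)
f = (a * y) @ K                                         # decision values f(x_i)
print(np.sign(f))                                       # matches y
```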
Inserting the linear combination
Substituting K = Σᵢ μᵢKᵢ into the dual turns the QP into an "ugly" min-max problem over μ and α.
• A fixed trace, trace(K) = c, avoids the trivial solution of arbitrarily scaling K
• The combined kernel must lie within the cone of positive semidefinite matrices
Cone and other stuff
• A is positive semidefinite: xᵀAx ≥ 0 for all x
• Positive semidefinite cone: the set of all symmetric positive semidefinite matrices of a particular dimension
http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html
Semidefinite program (SDP)
The min-max problem can be written as an SDP:
• A fixed trace, trace(K) = c, avoids the trivial solution
• Positive semidefinite constraints on the matrices involved
Dual form
• Quadratically constrained quadratic program (QCQP): the PSD constraint reduces to a quadratic constraint
• QCQPs can be solved more efficiently than SDPs (O(n³) vs. O(n⁴·⁵))
• Interior point methods
Interior point algorithm
• The classical simplex method follows the edges of the polyhedron
• Interior point methods walk through the interior of the feasible region
Linear program: maximize cᵀx subject to Ax ≤ b, x ≥ 0
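The idea of walking through the interior can be sketched with a toy log-barrier scheme for the linear program above. The barrier schedule, step sizes, and problem data are illustrative assumptions, not a production interior-point solver:

```python
import numpy as np

def lp_barrier(c, A, b, x0, ts=(1.0, 10.0, 100.0, 1000.0), iters=500, lr=1e-3):
    """Maximize c @ x subject to A @ x <= b and x >= 0
    via gradient ascent on a log-barrier objective with increasing t."""
    def obj(x, t):
        slack = b - A @ x
        if np.any(slack <= 0) or np.any(x <= 0):
            return -np.inf                       # outside the strict interior
        return t * (c @ x) + np.sum(np.log(slack)) + np.sum(np.log(x))

    x = np.asarray(x0, dtype=float)
    for t in ts:                                 # sharpen the barrier gradually
        for _ in range(iters):
            slack = b - A @ x
            grad = t * c - A.T @ (1.0 / slack) + 1.0 / x
            step = lr
            # backtrack: stay strictly feasible, never decrease the objective
            while obj(x + step * grad, t) < obj(x, t) and step > 1e-12:
                step *= 0.5
            x = x + step * grad
    return x

# toy LP: maximize x1 + x2 s.t. x1 <= 1, x2 <= 1, x >= 0 -> optimum (1, 1)
x = lp_barrier(np.array([1.0, 1.0]), np.eye(2), np.array([1.0, 1.0]),
               np.array([0.5, 0.5]))
```

As t grows, the barrier solution approaches the vertex (1, 1) from inside the feasible region, which is exactly the "walk through the interior" behavior of the slide.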
Application
• Recognition of ribosomal and membrane proteins in yeast
• 3 types of data:
• Amino acid sequences
• Protein-protein interactions (PPI)
• mRNA expression profiles
• 7 kernels:
• Empirical kernel map -> sequence homology: BLAST (B), Smith-Waterman (SW), Pfam
• FFT -> sequence hydropathy: KD hydropathy profiles, padding, low-pass filter, FFT, RBF
• Interaction kernel (LI) -> PPI
• Diffusion (D) -> PPI
• RBF (E) -> gene expression
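One of the seven kernels, the diffusion kernel on the PPI graph, can be sketched via an eigendecomposition of the graph Laplacian. The toy adjacency matrix and the value of β are illustrative assumptions:

```python
import numpy as np

def diffusion_kernel(A, beta=1.0):
    """Diffusion kernel K = exp(-beta * L) on a graph with adjacency A,
    where L = D - A is the graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)                 # L is symmetric
    return V @ np.diag(np.exp(-beta * w)) @ V.T

# toy PPI graph: 4 proteins in a chain, edges 0-1, 1-2, 2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

K = diffusion_kernel(A, beta=0.5)
# K is symmetric positive definite, hence a valid SVM kernel
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() > 0)
```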
Results
• Combination of kernels performs better than individual kernels
• Gene expression (E) is most important for ribosomal protein recognition
• PPI (D) is most important for membrane protein recognition
Results
• Small improvement compared to uniform weights (all μᵢ = 1)
• SDP is robust in the presence of noise
• How does SDP perform versus kernel weights derived from the accuracy of individual SVMs?
• Membrane protein recognition:
• Other methods use sequence information only
• TMHMM is designed for topology prediction
• TMHMM is not trained on yeast only
Why is this cool? Everything you ever dreamed of:
• Optimization of C included (the 2-norm soft margin SVM corresponds to adding the identity kernel with weight 1/C)
• Hyperkernels (optimize the kernel itself)
• Transduction (learn from labeled & unlabeled samples in polynomial time)
• SDP has many applications (graph theory, combinatorial optimization, …)
Literature
• Learning the kernel matrix with semidefinite programming, G.R.G. Lanckriet et al., 2004
• Kernel-based data fusion and its application to protein function prediction in yeast, G.R.G. Lanckriet et al., 2004
• Machine learning using Hyperkernels, C.S. Ong, A.J. Smola, 2003
• Semidefinite optimization, M.J. Todd, 2001
• http://www-user.tu-chemnitz.de/~helmberg/semidef.html
Software
• SeDuMi (SDP)
• Mosek (QCQP, Java, C++, commercial)
• YALMIP (Matlab)
• …
http://www-user.tu-chemnitz.de/~helmberg/semidef.html