1. Source: Gert Lanckriet’s slides. In this… KM / SDP
2. Project and Presentation Final project
Due on April 26th
Length: 20-30 pages
Hand in a hard copy, as well as an electronic copy of the report and any related source code (via Blackboard).
Presentation (April 26th and May 1st)
5-minute presentation for each student
Email the TA your slides one day before the presentation
jianhuic@gmail.com
3. Overview Find a mapping f such that, in the new feature space, the problem becomes easier to solve (e.g., linear).
SVM, PCA, LDA, CCA, etc.
The kernel is defined as the inner product between data points in this new feature space (see the sketch after this list).
Similarity measure
Valid kernels
Kernel construction
Kernels from pairwise similarities
Diffusion kernels for graphs
Kernels for vectors
Kernels for abstract data
Documents (string)
Protein Sequence
New kernels from existing kernels
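To make the "kernel = inner product in a new feature space" statement above concrete, here is a minimal Python sketch (not from the slides; the degree-2 polynomial map and function names are illustrative): the quadratic kernel k(x, z) = (x·z)^2 can be evaluated directly in input space, yet it equals the inner product under an explicit feature map.

    import numpy as np

    def quadratic_kernel(x, z):
        # Kernel evaluated directly in input space: k(x, z) = (x . z)^2.
        return float(np.dot(x, z)) ** 2

    def feature_map(x):
        # Explicit degree-2 feature map phi(x) with entries x_i * x_j.
        return np.outer(x, x).ravel()

    x = np.array([1.0, 2.0, 3.0])
    z = np.array([0.5, -1.0, 2.0])

    # Both give 20.25: the kernel is an inner product in the phi-space.
    print(quadratic_kernel(x, z))
    print(float(np.dot(feature_map(x), feature_map(z))))

In general (e.g., for Gaussian kernels) the feature space can be very high- or even infinite-dimensional, which is exactly why only the inner product is specified.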
4. How to choose the optimal kernel? Many different types of kernels for the same data.
Different kernels for protein sequences
Many different kernels from different types of data
Different data sources in bioinformatics
Question:
How to choose the optimal kernel?
Active research in machine learning
Simple approach: kernel alignment (against an ideal kernel built from the labels)
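A hedged sketch of that simple approach (the function names are mine, not from the lecture): kernel-target alignment scores a candidate kernel matrix K against the ideal kernel yy^T built from the labels, using the normalized Frobenius inner product; the kernel with the higher alignment is preferred.

    import numpy as np

    def alignment(K1, K2):
        # Kernel alignment A(K1, K2) = <K1, K2>_F / (||K1||_F * ||K2||_F).
        return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

    # Labels in {-1, +1} define the ideal target kernel y y^T.
    y = np.array([1.0, 1.0, -1.0, -1.0])
    K_target = np.outer(y, y)

    # Two candidate kernels on the same four points (toy values).
    X = np.array([[1.0, 1.0], [1.2, 0.9], [-1.0, -1.0], [-0.9, -1.1]])
    K_linear = X @ X.T          # respects the class structure
    K_noise = np.eye(4)         # ignores it

    print(alignment(K_linear, K_target))   # close to 1: well aligned with the labels
    print(alignment(K_noise, K_target))    # 0.5: much less aligned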
5. Outline of lecture Introduction
Kernel based learning
Kernel design for different data sources
Learning the optimal Kernel
Experiments
6. During the past decade, a heterogeneous spectrum of data became available describing the genome:
- Sequence data -> similarities between proteins / genes
- mRNA expression levels associated with a gene, measured under different experimental conditions
7. Membrane protein prediction
8. Different data sources are likely to contain different, and thus partly independent, information about the task at hand.
Protein-protein interactions are best expressed as graphs.
9. Kernel-based learning methods have already proven to be a very useful tool in bioinformatics.
10. Kernel methods work by embedding data items (genes, proteins, etc.) into a (possibly high-dimensional) Euclidean vector space. The embedding is performed implicitly: instead of giving explicit coordinates, the inner product is specified. This is done by defining a kernel function that specifies the inner product between any pair of data items (whichever are needed); this function can be regarded as a similarity measure between data items. When the amount of data is finite (e.g., here: a finite number of genes under consideration), the kernel values between all pairs of data points can be organized in a kernel matrix, the Gram matrix, which fully describes the embedding.
A matrix that is symmetric and positive semidefinite is a valid kernel matrix, in the sense that a corresponding mapping/embedding exists.
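A small Python sketch of the validity condition stated above (the helper name is illustrative): a Gram matrix of inner products is always symmetric positive semidefinite, while a symmetric matrix with a negative eigenvalue cannot be a valid kernel matrix.

    import numpy as np

    def is_valid_kernel_matrix(K, tol=1e-10):
        # Valid kernel matrix: symmetric with no (significantly) negative eigenvalues.
        return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

    # Inner products of explicit feature vectors always give a valid Gram matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))              # 5 items embedded in R^3
    print(is_valid_kernel_matrix(X @ X.T))   # True

    # A symmetric matrix with eigenvalues 3 and -1 is not a valid kernel matrix.
    K_bad = np.array([[1.0, 2.0], [2.0, 1.0]])
    print(is_valid_kernel_matrix(K_bad))     # False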
14. Find a good hyperplane
(w, b) ∈ R^(d+1)
that classifies these and future data points as well as possible
16. Intuition (Vapnik, 1965) if linearly separable:
Separate the data
Place the hyperplane “far” from the data: large margin
17. If not linearly separable:
Allow some errors
Still, try to place the hyperplane “far” from each class
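For reference, the standard soft-margin SVM primal (not spelled out on the slides, but it is the formal version of slides 16-17): slack variables ξ_i allow some errors, while minimizing ||w|| keeps the hyperplane far from each class.

    \min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n.

The linearly separable case of slide 16 corresponds to forcing all ξ_i = 0; the resulting margin is 2/||w||.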
Handwriting recognition (e.g., USPS)
Computational biology (e.g., micro-array data)
Text classification
Face detection
Face expression recognition
Time series prediction (regression)
Drug discovery (novelty detection)
21. Kernel-based learning methods represent data by means of a kernel matrix or function, which defines similarities between pairs of genes, proteins, etc. Such similarities can be established using a broad spectrum of data (examples later on); as long as the corresponding kernel matrix is positive semidefinite, it is fine. In that case, we can interpret its entries as inner products in some high-dimensional space, in which we can train our favorite linear classification algorithm.
Kernel matrix <-> kernel function
So we can have a very heterogeneous set of data up here, and every kernel function/matrix is geared towards a specific type of data, thus extracting a specific type of information from a data set.
Just as each data set describes the genome partially, in a heterogeneous way, so does each kernel, but in a homogeneous way (all compatible matrices). So here we have a chance to fuse the many partial descriptions of the data: by combining/fusing/mixing those compatible kernel matrices in a way that is statistically optimal, computationally efficient, and robust, we can try to find a kernel K that best represents all of the information available for a given learning task. We’ll explain further on how this can be accomplished.
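A minimal Python sketch of this fusion idea (toy data; the helper name is mine), assuming the individual kernel matrices are computed on the same set of items: any nonnegative weighting of valid kernel matrices is again symmetric positive semidefinite, so the fused K can be plugged into the same SVM.

    import numpy as np

    def combine_kernels(kernels, weights):
        # Weighted sum K = sum_i mu_i * K_i; with mu_i >= 0 the result stays PSD.
        assert all(mu >= 0 for mu in weights), "weights must be nonnegative"
        return sum(mu * K for mu, K in zip(weights, kernels))

    rng = np.random.default_rng(1)
    n = 6
    X1 = rng.normal(size=(n, 4)); K_expr = X1 @ X1.T   # e.g., an expression kernel
    X2 = rng.normal(size=(n, 8)); K_seq = X2 @ X2.T    # e.g., a sequence kernel

    K = combine_kernels([K_expr, K_seq], [0.7, 0.3])
    print(np.linalg.eigvalsh(K).min() >= -1e-10)       # True: fused K is still valid

How to pick the weights in a principled way is exactly the kernel-learning problem discussed later in the lecture.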
23. Each matrix entry is an mRNA expression measurement.
Each column is an experiment.
Each row corresponds to a gene.
24. Normalized scalar product
Similar vectors receive high values, and vice versa.
25. Use general similarity measurement for vector data: Gaussian kernel
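A hedged sketch of the two vector kernels of slides 24-25, applied to the rows of a toy expression matrix (function names are mine): the normalized scalar product is the cosine similarity of expression profiles, and the Gaussian kernel is the general-purpose similarity measure for vector data.

    import numpy as np

    def normalized_linear_kernel(X):
        # K_ij = <x_i, x_j> / (||x_i|| ||x_j||): similar profiles get values near 1.
        K = X @ X.T
        norms = np.sqrt(np.diag(K))
        return K / np.outer(norms, norms)

    def gaussian_kernel(X, sigma=1.0):
        # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
        sq = np.sum(X**2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
        return np.exp(-d2 / (2.0 * sigma**2))

    # Toy expression matrix: rows are genes, columns are experiments (slide 23).
    X = np.array([[2.0, 0.5, 1.5],
                  [1.9, 0.6, 1.4],
                  [-1.0, 2.0, 0.1]])
    print(normalized_linear_kernel(X).round(2))
    print(gaussian_kernel(X, sigma=1.0).round(2))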
30. Pairwise interactions can be represented as a graph or a matrix.
The simplest kernel counts the number of shared interactions between each pair.
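A minimal sketch of that simplest interaction kernel (toy matrix; variable names are mine): with a binary interaction matrix A, the product A A^T counts, for each pair of proteins, how many interaction partners they share, and it is automatically a valid (positive semidefinite) kernel.

    import numpy as np

    # Toy interaction matrix: A[i, j] = 1 if protein i interacts with protein j.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]])

    # K[i, j] = number of interaction partners shared by proteins i and j.
    K_interact = A @ A.T
    print(K_interact)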
31. A general method for establishing similarities between nodes of a graph.
Based upon a random walk.
Efficiently accounts for all paths connecting two nodes, weighted by path lengths.
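A hedged sketch of the diffusion kernel in its standard Kondor-Lafferty form (the toy graph and names are illustrative): exponentiating the negative graph Laplacian sums contributions from all paths between two nodes, with longer paths down-weighted by the diffusion parameter β.

    import numpy as np
    from scipy.linalg import expm

    # Toy undirected graph (a path on 4 nodes), given by its adjacency matrix.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

    L = np.diag(A.sum(axis=1)) - A   # graph Laplacian D - A
    beta = 0.5                       # diffusion parameter: larger = wider spread

    K_diff = expm(-beta * L)         # diffusion kernel; symmetric and PSD
    print(K_diff.round(3))           # nodes connected by short paths get larger values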
32. Integral plasma membrane proteins serve several functions. Often, one divides them into four classes: transporters, linkers, enzymes, and receptors.
- Transporters serve as gates through the cell membrane, generally for charged or polar molecules that otherwise could not pass the hydrophobic lipid bilayer the plasma membrane consists of.
- Linkers have a structural function in the cell membrane.
- Some membrane proteins are merely enzymes, moderating biochemical reactions inside or outside the cell.
- Receptors are capable of receiving biochemical signals from inside or outside the cell, thus triggering a reaction on the other side of the membrane. In particular, inside the membrane, receptors often interact with kinases (kinase is a generic name for enzymes that attach a phosphate to a protein, opposite in action to phosphatases; these enzymes are important metabolic regulators), thus initiating a signaling pathway in the cell triggered by an extracellular stimulus.
33. We will develop a kernel motivated by the low-frequency alternation of hydrophobic and hydrophilic regions in membrane proteins. However, we also demonstrate that the hydropathy profile only provides partial information: additional information is gained from sequence homology and protein-protein interactions.
34. Dir. Inc. … -> known to be useful in identifying membrane proteins
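To make the hydropathy idea of slide 33 concrete, here is a hedged Python sketch (not the exact kernel of the paper): a smoothed hydropathy profile computed with the Kyte-Doolittle scale and a sliding-window average. A hydropathy-based kernel would then compare the low-frequency content of such profiles (e.g., their first Fourier coefficients), which captures the alternation of hydrophobic and hydrophilic regions.

    import numpy as np

    # Kyte-Doolittle hydropathy values for the 20 amino acids.
    KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
          'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
          'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
          'Y': -1.3, 'V': 4.2}

    def hydropathy_profile(seq, window=7):
        # Sliding-window average of per-residue hydropathy values.
        values = np.array([KD[a] for a in seq])
        return np.convolve(values, np.ones(window) / window, mode='valid')

    profile = hydropathy_profile("MKTLLILAVVAAALA" * 3)   # toy sequence
    low_freq = np.fft.rfft(profile)[:5]                   # low-frequency content
    print(np.round(np.abs(low_freq), 2))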
36. Let’s forget about everything else for a moment and consider learning the optimal kernel. How can we do this?
Convex: local optimum = global optimum
37. Let’s forget about everything else for a moment and consider learning the optimal kernel. How can we do this?
38. Learning the Optimal Kernel
39. - Convex subset: good for us. We want a subset obtained by mixing our kernels somehow; here we take a linear subspace in the cone, spanned by those kernels, where we want to learn the weights.
- For SVMs, maximum margin classifiers:
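For reference, a hedged transcription of the kernel-learning problem from Lanckriet et al.’s KM/SDP framework that these notes refer to (as I recall it from that work, not copied from the slides): restrict K to a combination of the given kernels, fix its trace so the problem stays bounded, and optimize the soft-margin SVM objective over both the dual variables and the kernel weights.

    \min_{K \in \mathcal{K},\ \mathrm{trace}(K) = c} \ \omega(K),
    \qquad
    \mathcal{K} = \Big\{ K = \sum_i \mu_i K_i \ :\ K \succeq 0 \Big\},

    \omega(K) = \max_{\alpha}\ 2\,\alpha^\top e - \alpha^\top \mathrm{diag}(y)\, K\, \mathrm{diag}(y)\, \alpha
    \quad \text{s.t.} \quad 0 \le \alpha \le C,\ \ \alpha^\top y = 0.

With the additional restriction μ_i ≥ 0, the PSD constraint is automatic and the optimization reduces to a quadratically constrained program; in the general case it is a semidefinite program, hence convex, which is why “local = global optimum” on slide 36.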
40. Learning the optimal Kernel
41. Learning the optimal Kernel
46. Next class Student presentation
Schedule is online
Send the PPT slides to the TA one day in advance
Email: jianhuic@gmail.com
47. Survey Clustering
Classification
Regression
Semi-supervised learning
Dimensionality reduction
Manifold learning
Kernel learning