Predicting protein function from heterogeneous data Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology
Outline • Bayesian networks • Support vector machines • Diffusion / message passing
Annotation transfer • Rule: If two proteins are linked with high confidence, and one protein’s function is unknown, then transfer the annotation. [Figure: a protein of known function linked to a protein of unknown function]
Guilt by association • Rule: Assign function to an unannotated protein using a majority vote among its immediate neighbors. [Figure: an unannotated protein surrounded by annotated neighbors]
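A minimal sketch of the majority-vote rule, assuming the interaction network is given as an adjacency list and the annotations as a dictionary (the names and data below are illustrative, not from the slides):

    from collections import Counter

    def guilt_by_association(protein, neighbors, annotations):
        """Assign a function to an unannotated protein by majority vote
        among its annotated immediate neighbors."""
        votes = Counter(annotations[n] for n in neighbors[protein] if n in annotations)
        if not votes:
            return None  # no annotated neighbors; leave unannotated
        return votes.most_common(1)[0][0]

    # Example: an unannotated protein with three annotated neighbors
    neighbors = {"P?": ["A", "B", "C"]}
    annotations = {"A": "ribosome biogenesis", "B": "ribosome biogenesis", "C": "DNA repair"}
    print(guilt_by_association("P?", neighbors, annotations))  # ribosome biogenesis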
An example Bayesian network [Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of John calls and Mary calls]
• P(B) = 0.001, P(E) = 0.002
• P(A|B,E) = 0.95, P(A|B,¬E) = 0.94, P(A|¬B,E) = 0.29, P(A|¬B,¬E) = 0.001
• P(J|A) = 0.90, P(J|¬A) = 0.05
• P(M|A) = 0.70, P(M|¬A) = 0.01
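To show how these conditional probability tables combine, here is a small sketch that computes P(Burglary | John calls, Mary calls) by enumerating the joint distribution; the code is illustrative, not part of the original slides:

    from itertools import product

    # Conditional probability tables from the slide
    P_B = {True: 0.001, False: 0.999}
    P_E = {True: 0.002, False: 0.998}
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
    P_J = {True: 0.90, False: 0.05}   # P(John calls | Alarm)
    P_M = {True: 0.70, False: 0.01}   # P(Mary calls | Alarm)

    def joint(b, e, a, j, m):
        """P(B=b, E=e, A=a, J=j, M=m), factorized along the network."""
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p_j = P_J[a] if j else 1 - P_J[a]
        p_m = P_M[a] if m else 1 - P_M[a]
        return P_B[b] * P_E[e] * p_a * p_j * p_m

    # P(Burglary | John calls, Mary calls) by enumeration
    num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
    den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
    print(num / den)  # roughly 0.284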
One network per gene pair [Figure: a Bayesian network for genes A and B whose output node is the probability that genes A and B are functionally linked]
Conditional probability tables • A pair of yeast proteins that have a physical association will have a positive affinity precipitation result 75% of the time and a negative result in the remaining 25%. • Two proteins that do not physically interact in vivo will have a positive affinity precipitation result in 5% of the experiments, and a negative one in 95%.
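To make the use of these tables concrete, a hedged sketch of the Bayes-rule update for a single affinity precipitation result; the prior probability of interaction below is an assumed placeholder, not a value from the slides:

    def posterior_interaction(prior, positive_result=True):
        """P(interaction | affinity precipitation result) via Bayes' rule,
        using the likelihoods quoted on the slide."""
        p_pos_given_int = 0.75   # positive result given a true physical association
        p_pos_given_no = 0.05    # positive result given no interaction
        if positive_result:
            like_int, like_no = p_pos_given_int, p_pos_given_no
        else:
            like_int, like_no = 1 - p_pos_given_int, 1 - p_pos_given_no
        num = like_int * prior
        return num / (num + like_no * (1 - prior))

    print(posterior_interaction(prior=0.01))  # ~0.13 with an assumed 1% prior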
Inputs • Protein-protein interaction data from GRID. • Transcription factor binding sites data from SGD. • Stress-response microarray data set.
ROC analysis Using Gene Ontology biological process annotation as the gold standard.
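A minimal sketch of how such an ROC curve could be computed with scikit-learn, assuming predicted linkage scores and GO-derived gold-standard labels are available as arrays (the data below are hypothetical):

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # Hypothetical predicted probabilities of functional linkage and
    # gold-standard labels from shared GO biological process annotation.
    scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
    labels = np.array([1, 1, 0, 1, 0, 1, 0, 0])

    fpr, tpr, thresholds = roc_curve(labels, scores)
    print("AUC =", roc_auc_score(labels, scores))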
Pros and cons • Bayesian network framework is rigorous. • Exploits expert knowledge. • Does not (yet) learn from data. • Treats each gene pair independently.
Support vector machine [Figure: positive (+) and negative (−) examples in the plane, separated by a boundary] • Locate a plane that separates positive from negative examples. • Focus on the examples closest to the boundary.
Four key concepts • Separating hyperplane • Maximum margin hyperplane • Soft margin • Kernel function (mapping from input space to feature space)
Input space [Figure: the five patients plotted as labeled points in the gene1 vs. gene2 plane]

  patient     gene1   gene2
  patient1    -1.7     2.1
  patient2     0.3     0.5
  patient3    -0.4     1.9
  patient4    -1.3     0.2
  patient5     0.9    -1.2
Each subject may be thought of as a point in an m-dimensional space.
Separating hyperplane • Construct a hyperplane separating ALL from AML subjects.
Choosing a hyperplane • For a given set of data, many possible separating hyperplanes exist.
Maximum margin hyperplane • Choose the separating hyperplane that is farthest from any training example.
Support vectors • The location of the hyperplane is specified via a weight associated with each training example. • Examples near the hyperplane receive non-zero weights and are called support vectors.
Soft margin • When no separating hyperplane exists, the SVM uses a soft margin hyperplane with minimal cost. • A parameter C specifies the relative cost of a misclassification versus the size of the margin.
Reasons to prefer a soft margin: • Incorrectly measured or labeled data • The separating hyperplane does not generalize well • No separating hyperplane exists
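In a standard implementation the soft margin is controlled through this cost parameter C; a short sketch using scikit-learn with toy, not-quite-separable data (the values are illustrative only):

    import numpy as np
    from sklearn.svm import SVC

    # Toy two-dimensional data that are not perfectly separable
    X = np.array([[-1.7, 2.1], [0.3, 0.5], [-0.4, 1.9], [-1.3, 0.2],
                  [0.9, -1.2], [1.1, 0.4], [0.8, 1.0], [1.5, -0.3]])
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

    # Small C tolerates more margin violations; large C penalizes them heavily.
    for C in (0.1, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        print(C, "support vectors:", len(clf.support_))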
The kernel function • “The introduction of SVMs was very good for the most part, but I got confused when you began to talk about kernels.” • “I found the discussion of kernel functions to be slightly tough to follow.” • “I understood most of the lecture. The part that was more challenging was the kernel functions.” • “Still a little unclear on how the kernel is used in the SVM.”
Input space to feature space • SVMs first map the data from the input space to a higher-dimensional feature space.
Kernel function as dot product • Consider two training examples A = (a1, a2) and B = (b1, b2). • Define a mapping from input space to feature space: Φ(X) = (x1x1, x1x2, x2x1, x2x2) • Let K(X,Y) = (X • Y)² • Write Φ(A) • Φ(B) in terms of K.

Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2)
= a1a1b1b1 + a1a2b1b2 + a2a1b2b1 + a2a2b2b2
= a1b1a1b1 + a1b1a2b2 + a2b2a1b1 + a2b2a2b2
= (a1b1 + a2b2)(a1b1 + a2b2)
= [(a1, a2) • (b1, b2)]²
= (A • B)²
= K(A, B)
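A quick numerical check of this identity (a NumPy sketch; the two vectors are arbitrary examples, not data from the slides):

    import numpy as np

    A = np.array([1.0, 2.0])
    B = np.array([3.0, -1.0])

    def phi(x):
        """Explicit feature map: all pairwise products of the coordinates."""
        return np.outer(x, x).ravel()

    lhs = phi(A) @ phi(B)   # dot product in the feature space
    rhs = (A @ B) ** 2      # kernel evaluated in the input space
    print(lhs, rhs)         # both equal 1.0 for these vectors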
Kernel function • The kernel function plays the role of the dot product operation in the feature space. • The mapping from input to feature space is implicit. • Using a kernel function avoids representing the feature space vectors explicitly. • Any continuous, positive semi-definite function can act as a kernel function. • (Student comment: “Need for ‘positive semidefinite’ for kernel function unclear.” For the reason, see the proof of Mercer’s theorem in An Introduction to Support Vector Machines, Cristianini and Shawe-Taylor, 2000, pp. 33–35.)
The SVM learning problem • Input: training vectors x1 … xn and labels y1 … yn. • Output: a bias b plus one weight wi per training example. • The weights specify the location of the separating hyperplane. • The optimization problem is convex and quadratic, and can be solved using standard packages such as MATLAB.
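As an example of such a standard package (scikit-learn here, as an alternative to MATLAB), the learned per-example weights and bias can be read directly off a fitted model; the data below are placeholders:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.5]])
    y = np.array([1, 1, -1, -1])

    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    print("support vectors:", clf.support_)        # training examples with non-zero weight
    print("signed weights:", clf.dual_coef_)       # one weight per support vector
    print("bias b:", clf.intercept_)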
SVM prediction architecture [Figure: a query x is compared to each training example x1, x2, x3, …, xn via the kernel function k; the kernel values are multiplied by the weights w1, w2, w3, …, wn and summed to give the prediction]
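A minimal sketch of this prediction step: the query is compared to every training example through the kernel, and the kernel values are combined with the learned weights and bias (the names and values below are illustrative):

    import numpy as np

    def svm_predict(query, train_X, weights, bias, kernel):
        """Discriminant f(x) = sum_i w_i * K(x, x_i) + b; the sign gives the class."""
        score = sum(w * kernel(query, xi) for w, xi in zip(weights, train_X)) + bias
        return np.sign(score)

    quadratic_kernel = lambda x, y: np.dot(x, y) ** 2

    train_X = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    weights = [0.5, -0.5]   # signed weights (labels folded in), illustrative values
    bias = 0.0
    print(svm_predict(np.array([2.0, 0.0]), train_X, weights, bias, quadratic_kernel))  # 1.0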
A simple SVM training algorithm • Jaakkola, Diekhans, Haussler. “A discriminative framework for detecting remote protein homologies.” ISMB 99.

  do
      randomly select a training example
      find its optimal weight w.r.t. all other (fixed) weights
  until the weights stop changing
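A rough sketch of this loop, implemented as coordinate ascent on a bias-free SVM dual; this is a simplification for illustration, not the exact Jaakkola et al. update:

    import numpy as np

    def simple_svm_train(X, y, kernel, C=1.0, n_iters=1000, seed=0):
        """Repeatedly pick a random training example and set its weight to the
        optimum with all other weights held fixed, clipping to [0, C]."""
        rng = np.random.default_rng(seed)
        n = len(y)
        K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
        lam = np.zeros(n)
        for _ in range(n_iters):
            i = rng.integers(n)
            # contribution of all the other (fixed) weights
            rest = np.dot(lam * y, K[i]) - lam[i] * y[i] * K[i, i]
            # closed-form optimum of the dual objective in coordinate i
            lam[i] = min(max((1.0 - y[i] * rest) / K[i, i], 0.0), C)
        return lam  # one weight per training example

    linear_kernel = lambda a, b: float(np.dot(a, b))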