Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor 05/02/2008 Jae Hyun Kim Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2): 225-33.
Contents • Terminology • Motivation • Method • Molecular Signature • Signature Kernel • Signature Product Kernel • Results • Conclusion jaekim@ku.edu
Terminology (1) • Catalyst • Increases the rate of a chemical reaction or biological process • Is not consumed by the reaction (remains unchanged) • Enzyme • Biomolecule that catalyzes chemical reactions • Usually a protein • Metabolite • Intermediate or product of metabolism • The term is usually restricted to small molecules Reference: www.wikipedia.org
Terminology (2) • Inhibitor • Molecule that decreases enzyme activity • Often competes with the substrate for binding to the enzyme (competitive inhibition) • Many drugs and poisons act as enzyme inhibitors Reference: www.wikipedia.org
Enzyme Commission (EC) Number • EC number • Numerical classification scheme for enzyme-catalyzed reactions • Four levels of hierarchy • Example: EC 3.4.11.4: tripeptide aminopeptidases • EC 3: hydrolases (enzymes that use water to break up other molecules) • EC 3.4: hydrolases that act on peptide bonds • EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide • EC 3.4.11.4: hydrolases that cleave off the amino-terminal amino acid from a tripeptide Reference: www.wikipedia.org
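Because each EC level refines the one before it, the hierarchy can be handled as successive prefixes of the dotted code. The sketch below assumes well-formed four-level numbers such as "EC 3.4.11.4"; the helper names `ec_levels` and `shared_levels` are hypothetical and not part of any EC or KEGG tool.

```python
def ec_levels(ec: str) -> list[str]:
    """Split an EC number such as 'EC 3.4.11.4' into its four hierarchy prefixes."""
    code = ec.removeprefix("EC").strip()
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    # Level k is the first k numbers joined back together, e.g. '3.4' for level 2.
    return [".".join(parts[: k + 1]) for k in range(4)]


def shared_levels(ec_a: str, ec_b: str) -> int:
    """Count how many leading hierarchy levels two EC numbers share (0 to 4)."""
    shared = 0
    for a, b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if a != b:
            break
        shared += 1
    return shared


print(ec_levels("EC 3.4.11.4"))  # ['3', '3.4', '3.4.11', '3.4.11.4']
print(shared_levels("EC 3.4.11.4", "EC 3.4.11.2"))  # 3
```

Two enzymes in the same sub-subclass (EC 3.4.11) share three levels; classifying at "Class 1", "Class 1.1", and so on amounts to predicting progressively longer prefixes.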
Motivation • Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor • Goal: a single machine-learning technique for predicting protein-chemical interactions at large (genome) scale
Molecular Signature • $G=(V,E)$: molecular graph • $V$: vertex (atom) set • $E$: edge (bond) set • Atomic signature • Canonical representation of the subgraph surrounding a particular atom • Includes all atoms and bonds up to a predefined distance (height) from that atom • Molecular signature of $G$: ${}^h\sigma(G) = \sum_{x \in V} {}^h\sigma_G(x)$ • ${}^h\sigma_G(x)$: atomic signature of height $h$ in $G$ rooted at atom $x$ • Height • Chemicals: 0 to 6 • Proteins: 6 to 18 (spanning 1 to 7 amino acid residues)
Molecular Signature: Example (Isoleucine, Glycine, Leucine) • Atom types: c_, n_: sp3 carbon/nitrogen atom; c=, o=: sp2 (double-bonded) carbon/oxygen atom; h_: hydrogen • The signature string is built by a depth-first search from the root atom down to the given height • '(' marks going down one level, ')' marks going back up
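A simplified sketch of atomic and molecular signatures follows. It assumes a molecule is given as a dict of atom-type labels (the c_, h_, ... types above) plus an adjacency list. The children of an atom at depth d are its neighbours at distance d + 1 from the root, and child strings are sorted so the result does not depend on atom numbering. This is not Faulon's full canonical algorithm: ring-closure labelling and bond orders are omitted, and all function names are hypothetical.

```python
from collections import Counter, deque


def bfs_distances(adj, root):
    """Shortest distance (number of bonds) from root to every reachable atom."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def atomic_signature(labels, adj, root, height):
    """Simplified height-h atomic signature string rooted at `root`.

    Assumption: an atom at depth d has as children its neighbours at
    distance d + 1 from the root; ring closures and bond orders are ignored.
    """
    dist = bfs_distances(adj, root)

    def subtree(v, depth):
        if depth == height:
            return labels[v]
        # Depth-first: '(' goes down one level, ')' comes back up.
        children = sorted(subtree(w, depth + 1) for w in adj[v] if dist[w] == depth + 1)
        if not children:
            return labels[v]
        return labels[v] + "(" + "".join(children) + ")"

    return subtree(root, 0)


def molecular_signature(labels, adj, height):
    """Molecular signature as occurrence counts of atomic signatures over all atoms."""
    return Counter(atomic_signature(labels, adj, x, height) for x in labels)


# Ethane, C2H6: atoms 0 and 1 are carbons, 2-7 are hydrogens.
labels = {0: "c_", 1: "c_", **{i: "h_" for i in range(2, 8)}}
adj = {0: [1, 2, 3, 4], 1: [0, 5, 6, 7],
       2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}
print(dict(sorted(molecular_signature(labels, adj, 1).items())))
# {'c_(c_h_h_h_)': 2, 'h_(c_)': 6}
```

Representing the molecular signature as a `Counter` makes the "sum of atomic signatures" literal: identical atomic signatures simply add up their occurrence counts.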
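Reaction Signature • General form of an enzymatic reaction $R$: $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$ (substrates $S_i$, products $P_j$, stoichiometric coefficients $s_i$, $p_j$) • Height $h$ signature of reaction $R$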
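One way to collapse a stoichiometric reaction into a single signature, assumed here rather than taken from the slide, is the coefficient-weighted sum of product signatures minus the coefficient-weighted sum of substrate signatures, so that atom environments left unchanged by the reaction cancel out. The function name and the `o_` label for an sp3 oxygen are illustrative assumptions.

```python
from collections import Counter


def reaction_signature(substrates, products):
    """Assumed reaction signature: sum(p_j * sigma(P_j)) - sum(s_i * sigma(S_i)).

    substrates, products: lists of (stoichiometric coefficient, signature Counter).
    Entries that cancel out (unchanged atom environments) are dropped.
    """
    total = Counter()
    for coeff, sig in products:
        for key, count in sig.items():
            total[key] += coeff * count
    for coeff, sig in substrates:
        for key, count in sig.items():
            total[key] -= coeff * count  # Counter item arithmetic keeps negatives
    return {key: value for key, value in total.items() if value != 0}


# Toy height-0 signatures for 2 H2 + O2 -> 2 H2O ('o_' is an assumed sp3 oxygen label).
h2 = Counter({"h_": 2})
o2 = Counter({"o=": 2})
h2o = Counter({"h_": 2, "o_": 1})
print(reaction_signature([(2, h2), (1, o2)], [(2, h2o)]))
# {'o_': 2, 'o=': -2}
```

The hydrogens cancel because their height-0 environment is the same on both sides; only the oxygen type change survives.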
Pairwise Kernel • Purpose: predict/classify protein-protein interactions • Requires a measure of similarity between two pairs of proteins • Kernel function $K((X_1,X_2),(X'_1,X'_2))$ • How should similarity between pairs be measured?
Kernel Types (from Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.) • Pairwise similarity from component similarity • If $X_1 \sim X'_1$ and $X_2 \sim X'_2$ (or $X_1 \sim X'_2$ and $X_2 \sim X'_1$, since pairs are unordered), then $(X_1,X_2) \sim (X'_1,X'_2)$ • Direct assessment of similarity between pairs • $x_{12} = (x_{1i}x_{2j} + x_{2i}x_{1j})_{i,j}$: pairwise representation of $(X_1,X_2)$ • (Figure: similarity inside the pair vs. similarity between pairs)
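A minimal sketch of the Ben-Hur and Noble pairwise kernel built from a base kernel k: K((X1,X2),(X1',X2')) = k(X1,X1')·k(X2,X2') + k(X1,X2')·k(X2,X1'). It also checks that the explicit symmetric representation with entries x1i·x2j + x2i·x1j gives the same similarity up to a constant factor of 2. The linear base kernel, the vectors, and the function names are illustrative assumptions.

```python
from itertools import product


def dot(u, v):
    """Linear base kernel k between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_kernel(k, x1, x2, y1, y2):
    """Pairwise kernel: pairs are similar if their members match in either order."""
    return k(x1, y1) * k(x2, y2) + k(x1, y2) * k(x2, y1)


def pair_features(x1, x2):
    """Explicit symmetric pair representation with entries x1[i]*x2[j] + x2[i]*x1[j]."""
    n = len(x1)
    return [x1[i] * x2[j] + x2[i] * x1[j] for i, j in product(range(n), repeat=2)]


x1, x2 = [1, 0, 2], [0, 1, 1]  # hypothetical feature vectors of pair (X1, X2)
y1, y2 = [1, 1, 0], [2, 0, 1]  # hypothetical feature vectors of pair (X1', X2')
kernel = pairwise_kernel(dot, x1, x2, y1, y2)
explicit = dot(pair_features(x1, x2), pair_features(y1, y2))
print(kernel, explicit)  # 5 10
```

Expanding the explicit inner product yields each of the two kernel terms twice, which is why it equals exactly twice the kernel value; the kernel form avoids building the n² features.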
Signature Kernel • Definition • Applies to chemicals, proteins, and reactions
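A minimal sketch of a signature kernel, assuming it is the dot product of signature occurrence-count vectors, with an optional cosine normalization; both choices are assumptions rather than the paper's exact definition. Because chemicals, proteins, and reactions are all reduced to signature counts, the same function applies to each.

```python
import math
from collections import Counter


def signature_kernel(sig_a, sig_b):
    """Assumed signature kernel: dot product of occurrence-count vectors."""
    if len(sig_a) > len(sig_b):
        sig_a, sig_b = sig_b, sig_a  # iterate over the sparser signature
    return sum(count * sig_b.get(key, 0) for key, count in sig_a.items())


def normalized_signature_kernel(sig_a, sig_b):
    """Cosine-normalized variant, so every nonempty signature has self-similarity 1."""
    norm = math.sqrt(signature_kernel(sig_a, sig_a) * signature_kernel(sig_b, sig_b))
    return signature_kernel(sig_a, sig_b) / norm if norm else 0.0


# Height-1 signatures of ethane and propane, written out by hand.
ethane = Counter({"c_(c_h_h_h_)": 2, "h_(c_)": 6})
propane = Counter({"c_(c_h_h_h_)": 2, "c_(c_c_h_h_)": 1, "h_(c_)": 8})
print(signature_kernel(ethane, propane))  # 52
print(round(normalized_signature_kernel(ethane, propane), 3))  # 0.99
```

Only atomic signatures present in both molecules contribute, each weighted by its number of occurrences in each molecule.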
Signature Product Kernel (1/2) • P: protein, C: chemical • Definition: signature of the protein-chemical complex PC • Two protein-chemical interaction pairs: (P,C) and (Q,D)
Signature Product Kernel (2/2) • Similarly, • Therefore,
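A sketch of the product-kernel idea, assuming the signature of a complex PC is the tensor (outer) product of the protein signature and the chemical signature. Under that assumption the kernel between two complexes factorizes as K((P,C),(Q,D)) = K(P,Q)·K(C,D), which the code checks numerically. The toy signatures and function names are hypothetical.

```python
from collections import Counter


def dot(a, b):
    """Dot product of two sparse count vectors stored as dicts."""
    return sum(count * b.get(key, 0) for key, count in a.items())


def complex_signature(sig_p, sig_c):
    """Assumed complex signature: outer product of protein and chemical signatures."""
    return {(kp, kc): n_p * n_c for kp, n_p in sig_p.items() for kc, n_c in sig_c.items()}


def signature_product_kernel(sig_p, sig_c, sig_q, sig_d):
    """K((P,C),(Q,D)) computed without building the outer products."""
    return dot(sig_p, sig_q) * dot(sig_c, sig_d)


# Hypothetical toy signatures for proteins P, Q and chemicals C, D.
P, Q = Counter({"a": 1, "b": 2}), Counter({"a": 3, "c": 1})
C, D = Counter({"x": 2}), Counter({"x": 1, "y": 5})
explicit = dot(complex_signature(P, C), complex_signature(Q, D))
print(explicit, signature_product_kernel(P, C, Q, D))  # 6 6
```

The factorized form is what makes genome-scale use practical: the outer product has one feature per (protein signature, chemical signature) combination, while the product of two small kernels never materializes it.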
Signature Kernel: Example (height 1) • (Figure: number of occurrences of each atomic signature)
Signature Product Kernel: Example
Signature Similarity vs. Sequence Alignment Scores • Signature similarity computed for every pair of amino acids • Correlation: chemically similar amino acids (high signature similarity) receive high BLOSUM62 scores
EC Number Classification • Positive examples • Downloaded from KEGG • EC classes with more than 50 entries, at most 500 examples per class • Negative examples • Equal number, selected at random • Signature kernel, 5-fold cross-validation • (Figures: results using only protein sequences; results using only reactions)
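The evaluation protocol can be sketched as below. This assumes "more than 50, max 500" means a class needs more than 50 members and is capped at 500 positives, and that negatives are drawn at random from entries outside the class. `build_dataset` and `k_fold` are hypothetical helpers; a real pipeline would pass in KEGG entries.

```python
import random


def build_dataset(members, universe, min_size=50, max_size=500, seed=0):
    """Balanced examples for one EC class (hypothetical helper).

    Classes with min_size or fewer members are skipped (returns None); larger
    classes are capped at max_size positives. Negatives are drawn at random,
    in equal number, from entries outside the class.
    """
    if len(members) <= min_size:
        return None
    rng = random.Random(seed)
    positives = rng.sample(sorted(members), min(len(members), max_size))
    outside = sorted(set(universe) - set(members))
    negatives = rng.sample(outside, len(positives))  # raises if too few candidates
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]


def k_fold(examples, k=5, seed=0):
    """Yield (train, test) splits for k-fold cross-validation."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


members = [f"enz{i}" for i in range(60)]     # hypothetical class members
universe = [f"enz{i}" for i in range(2000)]  # hypothetical pool of all entries
data = build_dataset(members, universe)
for train, test in k_fold(data):
    print(len(train), len(test))
```

Drawing negatives in equal number keeps each binary task balanced, so accuracy is not inflated by simply predicting the majority class.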
EC Classification • Using both sequences and reactions • Signature product kernel • (Figure panels: Class 1, Class 1.1, Class 1.1.1, Class 1.1.1.1)
Comparison with Other Methods • Accuracy = (TP+TN)/(TP+TN+FP+FN) • AUC = area under the ROC curve • Precision = TP/(TP+FP) • Sensitivity = TP/(TP+FN) • Specificity = TN/(TN+FP) • Jaccard coefficient = TP/(TP+FP+FN) • For every measure, a larger value indicates better performance
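The measures above follow directly from confusion-matrix counts. The sketch below computes them, and computes AUC in its rank form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The counts and scores in the example are hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Evaluation measures from confusion-matrix counts (larger is better)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))  # hypothetical counts
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 5 of 6 pairs ranked correctly
```

The pairwise loop is quadratic; for large test sets a sort-based rank computation gives the same value more efficiently.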
Predicting New Enzyme Interactions • Test set: EC numbers accepted in September 2006 • Task: predict whether a given enzyme catalyzes a given reaction • Method: signature product kernel
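Predicting DrugBank Interactions Using KEGG • Class I: drug and target both in the training set • Class II: drug and target both in the training set, but with different partners • Class III: only the target in the training set • Class IV: only the drug in the training set • Class V: neither in the training set • Signature product kernel: area under the ROC curve = 0.74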
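Test pairs can be sorted into the five classes by checking what the training set already contains. The sketch assumes one reading of the classes: Class I means this exact drug-target pair is in the training set, and Class II means both the drug and the target appear there but only with other partners. `pair_class` and the identifiers are hypothetical.

```python
def pair_class(drug, target, train_pairs):
    """Assign a test drug-target pair to Class I-V (assumed reading of the classes)."""
    train_drugs = {d for d, _ in train_pairs}
    train_targets = {t for _, t in train_pairs}
    if (drug, target) in train_pairs:
        return "I"    # assumed: this exact pair is already in the training set
    if drug in train_drugs and target in train_targets:
        return "II"   # both known, but only with different partners
    if target in train_targets:
        return "III"  # only the target is known
    if drug in train_drugs:
        return "IV"   # only the drug is known
    return "V"        # neither is known


train = {("d1", "t1"), ("d2", "t2")}  # hypothetical KEGG-derived training pairs
for pair in [("d1", "t1"), ("d1", "t2"), ("d9", "t1"), ("d1", "t9"), ("d9", "t9")]:
    print(pair, pair_class(*pair, train))
```

Moving from Class I toward Class V makes the task harder, because the kernel has progressively less direct evidence about the drug or the target.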
Conclusion • A unified method for predicting protein-chemical interactions • The atomistic (signature) representation of proteins captures the information stored in amino acid substitution matrices such as BLOSUM62