CZ3253: Computer Aided Drug design Lecture 7: Drug Design Methods II: SVM Prof. Chen Yu Zong Tel: 6874-6877 Email: csccyz@nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore. Classification of Drugs by SVM.
Classification of Drugs by SVM
• A drug is classified as either belonging (+) or not belonging (-) to a class.
  • Examples of drug classes: inhibitor of a protein, BBB penetrating, genotoxic.
  • Examples of protein classes: enzyme EC3.4 family, DNA-binding.
• By screening against all classes, the property of a drug or the function of a protein can be identified.
[Figure: a drug is screened by one SVM per class. Class-1 SVM: -, Class-2 SVM: -, Class-3 SVM: +, so the drug belongs to Family-3.]
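The screening idea can be sketched in a few lines of Python. This is only an illustration: each trained SVM is reduced to a linear decision function, and the class names, weights, biases and the drug's feature vector are hand-picked assumptions, not values from the lecture.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a trained binary SVM reduced to its decision function,
# which returns a positive score for "belongs (+)" and a negative one otherwise.
DecisionFunction = Callable[[Sequence[float]], float]


def linear_decision(weights: Sequence[float], bias: float) -> DecisionFunction:
    """Build f(x) = w . x + b for a linear SVM with made-up parameters."""
    def f(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return f


def screen(x: Sequence[float], classifiers: Dict[str, DecisionFunction]) -> List[str]:
    """Return the names of all classes whose SVM votes (+) for x."""
    return [name for name, f in classifiers.items() if f(x) > 0]


# Illustrative parameters only (assumptions, not taken from the lecture).
classifiers = {
    "Class-1": linear_decision([1.0, -1.0, 0.0], -0.5),
    "Class-2": linear_decision([-1.0, 0.0, 1.0], -1.5),
    "Class-3": linear_decision([0.0, 1.0, 1.0], -1.0),
}

drug = [0.0, 1.0, 1.0]  # hypothetical feature vector of the query drug
hits = screen(drug, classifiers)
print(hits)
```

Each class gets its own binary classifier, so a drug may be assigned to several classes or to none.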
Classification of Drugs or Proteins by SVM
What is SVM?
• Support vector machines (SVMs) are a machine learning method based on statistical learning. They learn from examples and classify objects into one of two classes.
Advantages of SVM:
• Class members may be diverse; the method does not require them to resemble one another.
• Structure-derived physico-chemical features are used as the basis for drug classification (the algorithm requires no structural similarity).
SVM References
• C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (on-line).
• R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley, 2nd edition, 2001 (section 5.11, hard copy).
• S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy).
• Online lecture notes (http://www.cs.unr.edu/~bebis/MathMethods/SVM/lecture.pdf)
• Publications on SVM drug prediction:
  • J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
  • J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
  • Toxicol. Sci. 79, 170 (2004)
Machine Learning Method
Inductive learning: example-based learning
[Figure: positive examples and negative examples are each described by descriptors.]
Machine Learning Method
Feature vectors: the descriptors of each positive or negative example form a feature vector.
A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1)
SVM Method
Feature vectors in input space:
A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1)
[Figure: input space with axes X, Y, Z; points A, B, E and F are plotted.]
SVM Method
[Figure: protein family members and nonmembers separated by a border; projecting to a higher dimensional space gives a new border.]
SVM Method
[Figure: the new border between protein family members and nonmembers, with the support vectors marked.]
Find the best separator by solving a quadratic program. Many existing and new solvers are available.
Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes.
Distance = “Margin”
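For reference, the margin can be written out under the usual hard-margin normalization. The symbols w, b, x_i and y_i are assumptions here, since the slide's own notation is not shown; the supporting planes are scaled so that the closest points of each class lie on them.

```latex
\begin{aligned}
&\text{Supporting planes: } w \cdot x + b = +1, \qquad w \cdot x + b = -1 \\
&\text{Margin} = \frac{2}{\lVert w \rVert_2} \\
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert_2^2
  \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1,\dots,n
\end{aligned}
```

Maximizing 2/||w|| is equivalent to minimizing ||w||²/2 under linear constraints, which is why the separator can be found with a quadratic program.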
SVM Method
The border line is nonlinear.
SVM Method
Non-linear transformation: use of a kernel function.
[Figure: non-linear transformation.]
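A minimal sketch of why a non-linear transformation helps. It assumes a degree-2 polynomial kernel and a four-point XOR-style dataset, both chosen for illustration; the slide does not say which kernel it depicts. The sketch checks two things: the classes become separable by a single coordinate after the explicit mapping `phi`, and the kernel value equals the dot product in the mapped space, so the mapping never has to be computed explicitly.

```python
import math
from itertools import product
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map of the 2-D kernel k(x, y) = (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Degree-2 homogeneous polynomial kernel, evaluated in input space."""
    return dot(x, y) ** 2


# XOR-style data: no straight line separates the two classes in input space.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [+1, +1, -1, -1]

# After the mapping, the sign of the middle coordinate alone separates them.
separable = all((phi(p)[1] > 0) == (y > 0) for p, y in zip(points, labels))

# Kernel trick: k(p, q) equals phi(p) . phi(q) for every pair of points.
kernel_matches = all(
    math.isclose(poly2_kernel(p, q), dot(phi(p), phi(q)), abs_tol=1e-9)
    for p, q in product(points, repeat=2)
)

print(separable, kernel_matches)
```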
SVM for Classification of Drugs
How to represent a drug?
• Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties:
  • Simple molecular properties (molecular weight, no. of rotatable bonds etc.; 18 in total)
  • Molecular connectivity and shape (28 in total)
  • Electro-topological state polarity (84 in total)
  • Quantum chemical properties (electric charge, polarizability etc.; 13 in total)
  • Geometrical properties (molecular size vector, van der Waals volume, molecular surface etc.; 16 in total)
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
SVM Feature Selection
CACO2 - 718 descriptors
Average of 10 models
Q2 is MSE scaled by variance: Q2 = (mean square error) / (true variance)
Test Q2 = .7073
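The Q2 definition can be computed directly. The sketch below assumes the "true variance" is the population variance of the observed values (dividing by n); the slide does not say which estimator is used, and the function name `q2` is only illustrative.

```python
from typing import Sequence


def q2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error divided by the variance of the true values.

    Assumption: population variance (divide by n). Lower is better;
    0 is a perfect fit, 1 matches always predicting the mean.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    if variance == 0:
        raise ValueError("true values have zero variance")
    return mse / variance


observed = [1.0, 2.0, 3.0, 4.0]       # made-up values for illustration
print(q2(observed, observed))          # perfect predictions
print(q2(observed, [2.5] * 4))         # always predicting the mean
```

With this definition a smaller Q2 is better, so the drop from .7073 with all 718 descriptors to the values reported for the reduced descriptor sets is an improvement.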
Feature Selection
Using a subset of descriptors might greatly improve results.
• Do feature selection using a linear SVM with 1-norm regularization.
[Figure: 2-norm versus 1-norm.]
Feature Selection via Sparse SVM/LP
• Construct linear -SVM using 1-norm LP:
• Pick best C, for SVM
• Keep descriptors with nonzero coefficients
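As a sketch of what a 1-norm linear SVM looks like, here is the standard soft-margin classification form. It may differ from the exact variant on the slide, and the symbols w, b, ξ_i, p and q are assumptions introduced for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \lVert w \rVert_1 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n
\end{aligned}
```

Writing w = p - q with p, q ≥ 0 and replacing ||w||_1 by Σ_j (p_j + q_j) makes both the objective and the constraints linear, so the problem is a linear program. The 1-norm penalty tends to drive many coefficients exactly to zero, which is what lets the nonzero coefficients act as a descriptor filter.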
Bagged Feature Selection
• Partition the training data into a training set and a validation set.
• Run the linear SVM algorithm for feature selection to obtain a linear regression model.
• Repeat B times.
• Bag the B models and obtain a subset of features.
Random variable - r
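The bagging loop can be sketched as follows. This is not the sparse SVM/LP selector from the lecture: as a stand-in, `correlation_selector` keeps features whose absolute Pearson correlation with the target exceeds a threshold. The function names, the 80% training fraction and the majority-vote rule are all assumptions made for illustration, and the data are synthetic.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Set

Selector = Callable[[List[Sequence[float]], List[float]], Set[int]]


def correlation_selector(threshold: float) -> Selector:
    """Stand-in selector (NOT sparse SVM): keep features with |Pearson r| > threshold."""
    def select(X: List[Sequence[float]], y: List[float]) -> Set[int]:
        n = len(y)
        my = sum(y) / n
        sy = sum((v - my) ** 2 for v in y)
        kept: Set[int] = set()
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            mx = sum(col) / n
            sx = sum((v - mx) ** 2 for v in col)
            if sx == 0 or sy == 0:
                continue  # constant column: correlation undefined, drop it
            cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
            if abs(cov / math.sqrt(sx * sy)) > threshold:
                kept.add(j)
        return kept
    return select


def bagged_feature_selection(X, y, select: Selector, n_rounds: int = 20,
                             train_fraction: float = 0.8,
                             min_share: float = 0.5, seed: int = 0) -> List[int]:
    """Repeat selection on random training partitions; keep features chosen often."""
    rng = random.Random(seed)
    n = len(y)
    k = max(2, int(train_fraction * n))
    counts: Counter = Counter()
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # training part; the rest would be validation
        counts.update(select([X[i] for i in idx], [y[i] for i in idx]))
    return sorted(j for j, c in counts.items() if c >= min_share * n_rounds)


# Synthetic data: feature 0 drives y, feature 1 is constant, feature 2 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), 1.0, data_rng.random()] for _ in range(40)]
y = [2.0 * row[0] for row in X]

selected = bagged_feature_selection(X, y, correlation_selector(0.8))
print(selected)
```

Voting across partitions makes the selected subset less sensitive to any single train/validation split.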
Bagged SVM (RBF)
CACO2 - 31 descriptors
Test Q2 = .134
Starplot: Caco2 - 31 Descriptors
SlogP.VSA0, ABSDRN6, DRNB10, DRNB00, PIPB04, PEOE.VSA.FHYD, PEOE.VSA.FNEG, a.don, KB11, PEOE.VSA.4, BNPB31, PEOE.VSA.FPOL, PEOE.VSA.PPOS, FUKB14, KB54, SlogP.VSA6, PIPMAX, EP2, PEOE.VSA.FPPOS, SMR.VSA2, ANGLEB45, apol, BNPB50, SlogP.VSA9, pmiZ, BNP8, PIPB53, ABSFUKMIN, BNPB21, ABSKMIN, SIKIA
[Figure: modeling workflow with the stages: data + descriptors, feature selection, visualize features, assess chemistry, chemistry interpretation, construct SVM nonlinear model, SVM model, test data, predict bioactivities. Labels: "Chemistry In/Out", "Modeling".]
Bagged SVM (RBF)
CACO2 - 15 descriptors
Test Q2 = .166
CACO2 - 15 Variables
a.don, DRNB10, PEOE.VSA.FNEG, BNPB31, KB54, ABSDRN6, ABSKMIN, FUKB14, SMR.VSA2, PEOE.VSA.FPPOS, SIKIA, SlogP.VSA0, ANGLEB45, DRNB00, pmiZ
Chemical Insights
• Hydrophobicity: a.don
• Size and shape: ABSDRN6, SMR.VSA2, ANGLEB45, pmiZ. Large is bad. Flat is bad. Globular is good.
• Polarity: PEOE.VSA.FPPOS, PEOE.VSA.FNEG. Negative partial charge is good.
These correspond to conventional wisdom: the rule of 5.
Hybrid TAE/SHAPE
• Shape is an important overall factor
• DRNB10, DRNB00: del rho dot N
• BNPB31: bare nuclear potential
• KB54: kinetic energy descriptors; very large lipophilic molecules don't work
• FUKB14: Fukui surface
• Interpretations are difficult
• Point to chemistry challenges/hypotheses
Final SVM Approach
• Construct a large set of descriptors.
• Perform feature selection:
  • Sensitivity analysis or SVM-LP
• Construct many SVM models
• Optimize using QP or LP
• Evaluate by validation set or leave-one-out
• Select the best models by grid or pattern search
• Bag the best k models to create the final function
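The last step, bagging the best k models, can be sketched as follows. The sketch assumes regression-style models whose outputs are averaged and a validation error already computed for each model; the names `bag_best_models` and `candidates` and the toy models are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def bag_best_models(scored_models: List[Tuple[float, Model]], k: int) -> Model:
    """Average the predictions of the k models with the lowest validation error."""
    if k < 1 or k > len(scored_models):
        raise ValueError("k must be between 1 and the number of models")
    ranked = sorted(scored_models, key=lambda pair: pair[0])
    best = [model for _, model in ranked[:k]]

    def bagged(x: Sequence[float]) -> float:
        return sum(model(x) for model in best) / len(best)

    return bagged


# Toy candidates as (validation error, model); values are made up.
candidates: List[Tuple[float, Model]] = [
    (0.9, lambda x: 100.0),        # poor model, should be left out for k = 2
    (0.1, lambda x: x[0]),
    (0.2, lambda x: x[0] + 2.0),
]

final_function = bag_best_models(candidates, k=2)
print(final_function([1.0]))
```

For a classifier, the same pattern would average decision values or take a majority vote instead of averaging real-valued predictions.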
SVM-based drug design and property prediction software
Useful for inhibitor/activator/substrate prediction, drug safety and pharmacokinetic prediction.
• Input: your drug (chemical structure). Two options:
  • Input the structure through the internet and send it to the classifier (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi)
  • Input the structure on a local machine (computer loaded with SVMProt)
• A support vector machines classifier for every drug class answers: which class does your drug belong to?
• Output: identified classes; drug designed or property predicted.
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
SVM Drug Prediction Results
• Protein inhibitor/activator/substrate prediction:
  • 86% of the 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted.
  • 81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted.
• Drug toxicity prediction:
  • 97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted.
  • 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted.
• Pharmacokinetics prediction:
  • 95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted.
  • 90% of 131 human intestine absorption and 80% of 65 non-absorption agents correctly predicted.
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
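The per-class percentages can be combined into an approximate overall accuracy by weighting each rate by its group size. The sketch below only does that arithmetic on the figures quoted on the slide. Because the percentages are rounded, the results are approximate, and the dictionary keys are shorthand labels introduced here.

```python
from typing import Dict, Tuple


def overall_accuracy(pos_rate: float, n_pos: int,
                     neg_rate: float, n_neg: int) -> float:
    """Size-weighted average of the positive-class and negative-class accuracies."""
    return (pos_rate * n_pos + neg_rate * n_neg) / (n_pos + n_neg)


# (positive accuracy, no. positives, negative accuracy, no. negatives),
# copied from the slide; the percentages are rounded in the source.
reported: Dict[str, Tuple[float, int, float, int]] = {
    "Estrogen receptor activators": (0.86, 129, 0.84, 101),
    "P-glycoprotein substrates": (0.81, 116, 0.79, 85),
    "TdP": (0.97, 102, 0.84, 243),
    "Genotoxicity": (0.73, 229, 0.93, 631),
    "BBB penetration": (0.95, 276, 0.82, 139),
    "Human intestine absorption": (0.90, 131, 0.80, 65),
}

for name, figures in reported.items():
    print(f"{name}: {overall_accuracy(*figures):.3f}")
```

The overall figure is dominated by the larger group, so it should be read alongside the two per-class rates rather than in place of them.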