Introduction. Support Vector Regression. QSAR Problems and Data. SVMs for QSAR. Linear Program Feature Selection. Model Selection and Bagging. Computational Results. Discussion. Support Vector Regression: ε-insensitive loss function. Quadratic SVMs with L2-norm.
Introduction
• Support Vector Regression
• QSAR Problems and Data
• SVMs for QSAR
• Linear Program Feature Selection
• Model Selection and Bagging
• Computational Results
• Discussion
Support Vector Regression
ε-insensitive loss function
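The ε-insensitive loss ignores residuals smaller than ε and penalizes larger ones linearly: L(y, f(x)) = max(0, |y − f(x)| − ε). A minimal sketch follows; the function name and the value ε = 0.1 are illustrative and do not come from the slides.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Per-sample epsilon-insensitive loss: max(0, |y - f(x)| - eps)."""
    return [max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred)]


# A residual inside the eps-tube costs nothing; one outside costs its excess.
losses = eps_insensitive_loss([1.0, 2.0], [1.05, 2.5], eps=0.1)
```

Here the first prediction falls inside the ε-tube and contributes zero loss, while the second misses by 0.5 and is charged only the part of the residual that exceeds ε.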
QSAR Problems and Data
• Preparation of Input Data (Bioactivity value, Structures)
• 3D Geometry Optimization
• Calculation of Descriptors
• SVMs for QSAR
• Statistical Analysis
• QSAR Model Building
Data Sets
• HIV dataset: five classes of anti-HIV molecules, 64 molecules, 620 descriptors
• Lombardo benchmark dataset: blood-brain barrier partitioning dataset, 62 molecules, 649 descriptors

Data Matrix
              descriptor 1   descriptor 2   descriptor 3   ...   descriptor m   Activity
Molecule 1    x11            x12            x13            ...   x1m            ln BB
Molecule 2    x21            x22            x23            ...   x2m            ln BB
...           ...            ...            ...            ...   ...            ...
Molecule n    xn1            xn2            xn3            ...   xnm            ln BB
SVMs for QSAR
• Construct Datasets
• Model Selection (C, ε, ν, σ)
• Feature Selection
• Bagging Models
• Optimize Model
• Final Model
Model Selection
• Choose SVM model parameters: C, ε or ν, σ
• Select evaluation function Q2
• Evaluate on testing data
• Adjust using cross validation

Bagging
• Different validation sets give different models
• Many local minima in SVM parameter search
• Average models
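The selection-then-averaging idea can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' procedure. The slides do not define Q2, so it is assumed here to be the squared prediction error divided by the total variance of the held-out activities (smaller is better). A one-variable ridge regression stands in for the SVM, and its penalty `lam` stands in for the SVM parameters. All function names are hypothetical.

```python
import random
from statistics import mean


def q2(y_true, y_pred):
    """Assumed Q2: squared prediction error over total variance (smaller is better)."""
    y_bar = mean(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - y_bar) ** 2 for t in y_true)
    return press / total


def fit_ridge_1d(xs, ys, lam):
    """Stand-in learner (not an SVM): one-variable ridge regression with penalty lam."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    b = y_bar - w * x_bar
    return lambda x: w * x + b


def bagged_model(xs, ys, lams, n_bags=10, val_frac=0.2, seed=0):
    """Pick the best parameter on each random validation split, then average the models."""
    rng = random.Random(seed)
    n = len(xs)
    n_val = max(2, int(val_frac * n))
    models = []
    for _ in range(n_bags):
        idx = list(range(n))
        rng.shuffle(idx)
        val, train = idx[:n_val], idx[n_val:]
        xt, yt = [xs[i] for i in train], [ys[i] for i in train]
        xv, yv = [xs[i] for i in val], [ys[i] for i in val]
        # Model selection: keep the candidate with the lowest validation Q2.
        candidates = [fit_ridge_1d(xt, yt, lam) for lam in lams]
        best = min(candidates, key=lambda m: q2(yv, [m(x) for x in xv]))
        models.append(best)
    # Bagging: the final prediction is the average over the selected models.
    return lambda x: mean(m(x) for m in models)


# Toy data: y = 2x + 1 with a small alternating perturbation.
xs = [i / 10 for i in range(30)]
ys = [2 * x + 1 + 0.05 * (-1) ** i for i, x in enumerate(xs)]
model = bagged_model(xs, ys, lams=[0.0, 0.1, 1.0, 10.0])
prediction = model(1.5)
```

Because each random validation split can favor a different parameter value, the selected models differ from bag to bag; averaging them smooths out that variability.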
Computational Results
Methods (10-fold CV)   Full Data (649)   LP FS (21)      NN SA (9)
                       Q2      q2        Q2      q2      Q2      q2
L1-SVM                 .384    .382      .157    .153    .219    .217
L2-SVM                 .310    .292      .171    .160    .247    .245
NN                     .320    .301      .222    .193    .247    .238
Discussion
• Robust optimization methods
• LPFS outperforms NNSA
• L1-SVM can run faster than L2-SVM
• May improve LPFS method
• May improve performance of L1-SVM

This work is supported by NSF (IIS-9979860 and 970923)