
Optimization of SVM parameters in caspase cleavage sites prediction using grid-computing Lawrence Wee


Presentation Transcript


  1. Optimization of SVM parameters in caspase cleavage sites prediction using grid-computing Lawrence Wee

  2. What are caspases? Caspases are downstream effectors in apoptosis [1]. [Figure: the extrinsic and intrinsic pathways.] As the final effectors of apoptosis, caspases cleave many protein substrates. 1. Hengartner MO. The biochemistry of apoptosis. Nature. 2000 Oct 12;407(6805):770-6.

  3. Caspases are proteases. Caspase Cleavage of Substrates [1]
     • Caspases are cysteine proteases.
     • They recognize a tetrapeptide sequence on substrates (P4-P3-P2-P1).
     • They cleave after the canonical Asp (D) residue at the P1 position.
     Example site (| marks the cleaved bond):
     P4 P3 P2 P1 | P1' P2'
     D  E  V  D  | T   Y
     1. Fuentes-Prior et al. Biochem J. 2004 Dec 1;384(Pt 2):201-32.
     2. Thornberry et al. J Biol Chem. 1997 Jul 18;272(29):17907-11.
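The P4 to P2' layout above can be turned into a simple scan for candidate sites. The following is a minimal sketch, not the author's method: the function name `candidate_sites`, the window size, and the example sequence are all assumptions made for illustration. It only lists positions where an Asp could sit at P1; it says nothing about whether a caspase actually cleaves there.

```python
def candidate_sites(sequence, n_before=4, n_after=2):
    """Return (position, window) for every Asp (D) that could occupy P1.

    The window spans P4..P1 followed by P1'..P2', matching the slide's layout.
    Positions are 1-based indices of the P1 residue.
    """
    sites = []
    for i, residue in enumerate(sequence):
        if residue != "D":
            continue
        start = i - (n_before - 1)   # index of the P4 residue
        end = i + n_after + 1        # one past the P2' residue
        if start < 0 or end > len(sequence):
            continue  # skip sites too close to either terminus
        sites.append((i + 1, sequence[start:end]))
    return sites


example = "MKDEVDTYLA"  # hypothetical sequence containing the DEVD motif
print(candidate_sites(example))  # [(6, 'DEVDTY')]
```

The first Asp in the example is skipped because it has no full P4 to P1 stretch before it; the second yields the window DEVDTY.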

  4. Caspases are proteases. The Enormous Range of Caspase Substrates [1]
     Caspase substrates include:
     • Apoptotic regulators
     • Cytoskeletal proteins
     • Organelle proteins
     • DNA-associated proteins
     • RNA-associated proteins
     • Cell signaling proteins
     • Cell cycle proteins
     • Viral proteins
     • Other proteins (?)
     More than 400 caspase substrates have been experimentally determined to date [1]. Many more await discovery.
     1. Wee LJ, Tong JC, Tan TW, Ranganathan S. A multi-factor model for caspase degradome prediction. BMC Genomics. 2009, 10:S6.

  5. Computational prediction of caspase cleavage sites
     • Identifying caspase substrates is important for elucidating the biological function of caspases.
     • It refines our understanding of apoptotic and other caspase-dependent signaling pathways.
     • Wet-laboratory efforts can be laborious.
     • Consider computational prediction of caspase cleavage sites?

  6. Support Vector Machines (SVM)
     • A type of machine learning algorithm.
     • Works very well for several biological problems.
     • Can be computationally expensive when the data have many dimensions or when parameters must be optimized.

  7. Prediction of caspase cleavage sites. Support Vector Machines: A Brief Introduction [1]. Data-points belonging to 2 distinct classes are represented as vectors. A set of "learning" or "training" data-points belongs to 2 classes (green and orange). Each data-point has its own set of attributes, represented by a vector. 1. Cortes, C. and Vapnik, V. (1995) Support vector networks. Machine Learning, 20, 273–297.

  8. Prediction of caspase cleavage sites. Support Vector Machines: A Brief Introduction [1]. The SVM algorithm constructs a "classifier" to discriminate the two classes. The classifier is a maximal margin hyperplane that separates the two classes (green and orange). [Figure labels: maximal margin hyperplane, support vectors.] 1. Cortes, C. and Vapnik, V. (1995) Support vector networks. Machine Learning, 20, 273–297.

  9. Prediction of caspase cleavage sites. SVM: A Brief Introduction [1]. The SVM algorithm classifies new, unseen data into one of the two classes. The classifier assigns a new data-point to a class according to which side of the hyperplane it falls on. [Figure: new data-point assigned to the "orange" class.] 1. Cortes, C. and Vapnik, V. (1995) Support vector networks. Machine Learning, 20, 273–297.
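As an illustration of assigning a point by its position relative to a hyperplane, here is a minimal sketch for the linear case. The weights, the bias, and the choice of which side counts as "orange" are hypothetical values chosen for the example, not taken from the slides.

```python
def classify(point, weights, bias):
    """Assign a class from the sign of w . x + b (a linear decision rule)."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    # Assumption: the non-negative side of the hyperplane is labeled "orange".
    return "orange" if score >= 0 else "green"


weights, bias = [1.0, -1.0], 0.0  # hypothetical hyperplane x1 - x2 = 0
print(classify([2.0, 0.5], weights, bias))  # orange
print(classify([0.5, 2.0], weights, bias))  # green
```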

  10. Prediction of caspase cleavage sites. SVM: A Brief Introduction [1]. SVM decision function with an RBF kernel. Two parameters: C and gamma. 1. Cortes, C. and Vapnik, V. (1995) Support vector networks. Machine Learning, 20, 273–297.
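The slide's formula is not reproduced in this transcript. For orientation, a standard textbook form of the soft-margin SVM decision function with an RBF kernel is given below; the symbols (alpha_i, y_i, b, n) follow common convention and may differ from the notation used on the slide.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x} \rVert^{2} \right),
\qquad
0 \le \alpha_i \le C
```

In this form, gamma controls the width of the kernel and C bounds the multipliers alpha_i, which sets the trade-off between a wide margin and training errors. These are the two values tuned in the slides that follow.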

  11. Prediction of caspase cleavage sites. Computational issues. [Figure: training dataset (390 sequences) -> leave-one-out cross-validation -> SVM classifier.]

  12. Predicting caspase cleavage sites. Computational issues. Leave-one-out cross-validation for a set of C and gamma values. [Figure: a training set of 5 sequences (Seq 1 to Seq 5) is split into five sets (Set 1 to Set 5), which lead to a trained classifier.]
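The five-sequence illustration can be sketched in a few lines. This is a generic leave-one-out split, assuming that Set n is the round in which Seq n is held out for testing; the slide does not state that mapping, and the function name `leave_one_out` is introduced here for illustration.

```python
def leave_one_out(items):
    """Yield (training_items, held_out_item) pairs, one pair per item."""
    for i, held_out in enumerate(items):
        yield items[:i] + items[i + 1:], held_out


sequences = ["Seq 1", "Seq 2", "Seq 3", "Seq 4", "Seq 5"]
for n, (train, test) in enumerate(leave_one_out(sequences), start=1):
    # Assumption: Set n corresponds to holding out the n-th sequence.
    print(f"Set {n}: train on {train}, test on {test}")
```

With 390 training sequences the same procedure needs 390 train-and-test rounds for every (C, gamma) setting, which is where the computational cost comes from.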

  13. Prediction of caspase cleavage sites. Computational issues. [Figure: training dataset (390 sequences) -> leave-one-out cross-validation -> SVM classifier.] For C=0.1, g (gamma)=0.1: accuracy = 70%.

  14. Prediction of caspase cleavage sites Grid-based (brute force) optimization of SVM parameters
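A brute-force grid search simply scores every (C, gamma) combination and keeps the best one. The sketch below assumes hypothetical log-spaced grids and uses a toy scoring function as a stand-in for leave-one-out accuracy; in practice `evaluate` would train and cross-validate an SVM. None of the names or values come from the slides.

```python
import itertools


def grid_search(c_values, gamma_values, evaluate):
    """Try every (C, gamma) pair and return the best pair with its score."""
    best_pair, best_score = None, float("-inf")
    for c, gamma in itertools.product(c_values, gamma_values):
        score = evaluate(c, gamma)
        if score > best_score:
            best_pair, best_score = (c, gamma), score
    return best_pair, best_score


# Hypothetical log-spaced grids: 10 x 10 = 100 combinations.
c_values = [2.0 ** k for k in range(-5, 5)]
gamma_values = [2.0 ** k for k in range(-10, 0)]


def toy_accuracy(c, gamma):
    """Stand-in for leave-one-out accuracy; peaks at C=1, gamma=2**-3."""
    return 1.0 / (1.0 + abs(c - 1.0) + abs(gamma - 0.125))


print(grid_search(c_values, gamma_values, toy_accuracy))  # ((1.0, 0.125), 1.0)
```

Because each combination is scored independently of the others, the loop body is the natural unit of work to hand out to separate machines.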

  15. Two Computational Issues. Issue 1: Leave-one-out cross-validation is computationally tedious. With a dataset of 390 training examples, one leave-one-out cross-validation run for one setting of the two parameters (C and gamma) takes ~12 s on a 2.66 GHz Intel Core 2 Duo processor with 4 GB RAM. Challenge: How fast will grid computers complete the same computation?

  16. Two Computational Issues. Issue 2: Brute-force optimization is computationally tedious. Challenge: How fast will grid computers complete the same computation when it is repeated 100 times with different sets of C and gamma values?
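A back-of-the-envelope estimate puts the challenge in numbers, using the ~12 s per run and the 100 repetitions quoted on the slides. The worker counts are hypothetical, and the estimate is an ideal case that ignores job scheduling, data transfer, and differences between machines.

```python
import math


def estimated_seconds(n_combinations, seconds_each, n_workers=1):
    """Ideal wall-clock time when independent jobs are spread over workers."""
    rounds = math.ceil(n_combinations / n_workers)
    return rounds * seconds_each


print(estimated_seconds(100, 12))       # 1200, about 20 minutes on one machine
print(estimated_seconds(100, 12, 10))   # 120 with 10 hypothetical workers
print(estimated_seconds(100, 12, 100))  # 12 with one worker per combination
```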

  17. Practical
