Sixth International Conference on Bioinformatics InCoB2007. Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes. Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc. Faculty of Biology, University of Natural Sciences, VNU-HCMC, Vietnam.
Epitope prediction
“Epitope is the portion of an antigen that is recognized by the antigen receptor on lymphocytes” Molecular Biology
Epitope prediction: computational methods aid the development of epitope-based vaccines against human pathogens for which no vaccines currently exist.
http://www.scripps.edu/newsandviews/e_20050228/hiv.html
T-cell epitope prediction
• T-cell epitopes are a subset of MHC-binding peptides, so predicting which peptides bind to MHC is essential for the design of peptide-based vaccines
• HLA-A0201
• Sequence-based prediction methods: binding motifs, artificial neural networks, quantitative matrices, hidden Markov models, support vector machines, decision trees
HMMs & SVMs
• SVMs (Support Vector Machines): learning machines that can find the optimal separating hyperplane.
• HMMs (Hidden Markov Models): statistical models that can capture complex relationships in data sets.
Epitope prediction for dengue virus
• Tropical disease
• Dengue fever
• Dengue hemorrhagic fever
• Dengue shock syndrome
• Hypotheses of pathogenesis
• Antibody-dependent enhancement
• Virus virulence
• No dengue vaccine is available
In our research:
• Develop a procedure for automatically building T-cell epitope prediction models
• Find candidates in silico for making multivalent vaccines against the 4 types of dengue virus
Building models for predicting T-cell epitopes & applying these models to dengue virus
Building effective prediction models?
The predictive ability of HMM and SVM models depends on:
• Experimentally determined peptides that bind to MHC molecules
• Partition of the peptides into a training set and a testing set
• Encoding method
A system should quickly and easily find the best prediction model when the type of MHC molecule or the quantity of binding peptides changes
Training & testing procedure
• HMMs (HMMer)
• SVMs (SVM_light)
Result of the training by HMMs
• HMM.7.136: AROC=0.914
• Parameters chosen from HMM.7.136, at point: E=3.4, S=-8.5, SE=0.91, SP=0.86, AROC=0.885
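The figures above can be recomputed from raw model scores. The following is a minimal sketch, assuming SE, SP and AROC denote sensitivity, specificity and the area under the ROC curve; the function names, the toy scores and the cutoff of 0.0 are hypothetical and are not taken from the slides.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (SE, SP) when peptides scoring >= threshold are called binders.

    labels: 1 for an experimentally confirmed binder, 0 for a non-binder.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def aroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Counts the fraction of (binder, non-binder) pairs in which the binder
    receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not from the presentation).
toy_scores = [3.0, 1.0, -2.0, 0.5, -1.0, -4.0]
toy_labels = [1, 1, 1, 0, 0, 0]
se, sp = sensitivity_specificity(toy_scores, toy_labels, threshold=0.0)
print(f"SE={se:.2f} SP={sp:.2f} AROC={aroc(toy_scores, toy_labels):.3f}")
```

Sweeping the threshold over all observed scores and picking the point with the preferred SE/SP trade-off is one plausible way to arrive at an operating point like the one reported above.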
Result of the training by SVMs
• Binary encoding: AROC=0.42 to 0.77
• Blosum-62 encoding: AROC=0.47 to 0.87
• Chemical-physical encoding: AROC=0.41 to 0.71
• With Blosum-62 encoding, data set SVM.7.blo62.46: SE=0.83, SP=0.90, AROC=0.87
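To make the binary encoding concrete, here is a minimal sketch under the common convention that each residue becomes a 20-dimensional one-hot vector, so a 9-mer maps to 180 features. The amino acid ordering, the helper names and the example peptide are assumptions for illustration; the slides do not specify them. The second helper writes the sparse `<target> <feature>:<value>` line format that SVM_light reads.

```python
from typing import List

# The 20 standard amino acids; this alphabetical ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(peptide: str) -> List[int]:
    """One-hot encode a peptide: 20 features per residue."""
    vector: List[int] = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for non-standard residues.
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector


def to_svmlight_line(label: int, features: List[int]) -> str:
    """Format one example as a sparse SVM_light line (1-based feature ids)."""
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return f"{label:+d} " + " ".join(pairs)


# An arbitrary 9-mer used only to show the shape of the output.
example = "GILGFVFTL"
encoded = binary_encode(example)
print(len(encoded))                 # 9 residues x 20 = 180 features
print(to_svmlight_line(+1, encoded))
```

A Blosum-62 encoding would follow the same layout, replacing each one-hot vector with the residue's row of the substitution matrix.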
Training in 6-amino acid homologous groups
• HMM.6.78: AROC=0.883
• Parameters of HMM.6.78, at point: E=42, S=-9.2, SE=0.91, SP=0.84, AROC=0.875
Training in 7-amino acid homologous groups
[Plot legend: binary encoding, Blosum-62 encoding, binary-Blosum-62 encoding]
• At SVM.2.7.85: SE=0.93, SP=0.86, AROC=0.894
Epitope prediction procedure for dengue virus
• Perform multiple sequence alignment
• Extract consensus sequences of at least 9 amino acids
• Create overlapping 9-mer sequences
• Predict peptides binding to MHC with the HMM profile or SVM model
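The step that creates overlapping 9-mers can be sketched as a sliding window with a step of one residue. The function name and the toy sequence below are hypothetical; the step size of one is an assumption, since the slides only say the 9-mers overlap.

```python
from typing import List, Tuple


def overlapping_kmers(sequence: str, k: int = 9) -> List[Tuple[int, str]]:
    """Return (start, peptide) for every window of length k, step 1.

    A sequence shorter than k yields an empty list.
    """
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


# Toy consensus fragment of 11 residues, for illustration only.
for start, peptide in overlapping_kmers("MKTAYIAKQRL"):
    print(start, peptide)
```

A consensus sequence of length n yields n - 8 candidate 9-mers, each of which would then be scored by the HMM profile or SVM model.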
Result of epitope prediction (prediction of peptide binding to HLA-A0201)
[Figure panels: Experiment 1, Experiment 2]
Overlapping 9-amino acid peptides predicted to bind HLA-A0201 molecules are joined
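One way to read the joining step is as an interval merge: predicted 9-mers that share at least one residue position are fused into a single longer region. This is an assumption about what "join" means here, and the function name, start positions and toy sequence are hypothetical.

```python
from typing import List, Sequence, Tuple


def join_overlapping(
    sequence: str, starts: Sequence[int], k: int = 9
) -> List[Tuple[int, str]]:
    """Merge predicted k-mers (given by start index) that overlap.

    Returns (region_start, region_sequence) for each merged region.
    """
    regions: List[List[int]] = []  # each entry is [start, end), end exclusive
    for start in sorted(set(starts)):
        end = start + k
        if regions and start < regions[-1][1]:
            # Shares residues with the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(a, sequence[a:b]) for a, b in regions]


# Toy 30-residue sequence and hypothetical predicted start positions.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
print(join_overlapping(toy_sequence, [0, 1, 2, 15]))
```

With these inputs, the 9-mers starting at 0, 1 and 2 collapse into one 11-residue region, while the 9-mer at position 15 stays separate.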
Result of prediction
• The HMM profile is stable, and its predictive ability increases when additional data sets are added.
• The SVM model performs well, but its predictive ability decreases as the amount of training data increases.
Conclusion
• Successfully built a system for training hidden Markov models and support vector machines
• Generating training and testing data by separating the data set into homologous groups gives good results
• Consensus epitopes for the 4 types of dengue virus could be predicted based on data on peptides binding to HLA-A0201
Future plans
• Try other kernels with the SVM method
• Survey other encoding methods for sequences of flexible length
• Survey other methods for classifying MHC data into homologous groups
• Automate the procedure for collecting and updating MHC-binding peptide data from databases