
Vo Cam Quy, Nguyen Thanh Khoi, Nguyen Thi Truc Minh, Tran Linh Thuoc, Department of Biotechnology

Sixth International Conference on Bioinformatics (InCoB2007), Hong Kong. T-cell EPITOPES PREDICTION OF HEMAGGLUTININ, NEURAMINIDASE AND MATRIX PROTEIN OF INFLUENZA A VIRUS USING SUPPORT VECTOR MACHINE AND HIDDEN MARKOV MODEL.





Presentation Transcript


  1. Sixth International Conference on Bioinformatics (InCoB2007), Hong Kong: T-cell EPITOPES PREDICTION OF HEMAGGLUTININ, NEURAMINIDASE AND MATRIX PROTEIN OF INFLUENZA A VIRUS USING SUPPORT VECTOR MACHINE AND HIDDEN MARKOV MODEL. Vo Cam Quy, Nguyen Thanh Khoi, Nguyen Thi Truc Minh, Tran Linh Thuoc, Department of Biotechnology, University of Natural Sciences, Vietnam National University – Ho Chi Minh City, Vietnam

  2. OUTLINE • Introduction • Epitope prediction methods • Influenza A virus • Materials and Methods • Results and Discussion • Conclusion and future work

  3. Epitope in silico analysis [Figure: VACCINOME pipeline: Gene/Protein Sequence Database → Disease-related protein DB → Epitope prediction → Candidate Epitope DB → Peptide multiepitope vaccines]

  4. Epitope • An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies, B cells, or T cells. • The term most often refers to three-dimensional surface features of an antigen molecule • Linear epitopes are determined by the amino acid sequence

  5. EPITOPE PREDICTION STRATEGIES [Figure: tree diagram] • Epitope prediction divides into B cell epitope prediction and T cell epitope prediction, based on sequence, structure, or chemical features • Statistical methods: binding motifs, (quantitative) matrices • Machine learning methods: Support Vector Machine, Artificial Neural Network… (high accuracy) • Hidden Markov Model (flexible model)

  6. T CELL EPITOPE PREDICTION APPROACHES [Figure] • Direct approach: positive = putative epitopes; negative = non-epitopes • Indirect approach: positive = MHC-binding peptides (binders); negative = MHC-I non-binding peptides (non-binders) • The results of both approaches are compared to obtain epitope candidates

  7. Influenza A virus • Influenza A viruses continue to emerge from the aquatic avian reservoir and cause pandemics • Many variants and mutations in the virus population make vaccine production difficult • Genome: single-stranded, negative-sense RNA in 8 segments • Hemagglutinin, neuraminidase and matrix protein are three of the proteins of greatest interest [Figure: influenza A virion; red: M2 protein, green: hemagglutinin, blue: neuraminidase, inside: viral RNA; source: http://www.roche.com/pages/facets/10/viruse.htm]

  8. OBJECTIVE • Building HMM and SVM models for T cell epitope prediction (MHC class I and II) • Direct approach (epitope prediction) • Indirect approach (MHC binder prediction) • Combining the results to obtain epitope candidates • Predicting epitopes in Influenza A virus proteins for in silico vaccine design

  9. METHODS [Figure: workflow] • 1. Data collection and processing: raw data collected from AntiJen, MHCBN and IEDB are processed into training sets • 2. Building model: SVM method and HMM method • 3. Parameter optimization: training models, evaluating them, and selecting the optimal model • 4. Applying: the optimal models predict epitopes in the protein; epitopes predicted by both methods / both approaches were considered as epitopes
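In the final step, only peptides flagged by both methods (and both approaches) are kept. Below is a minimal sketch of that consensus rule as a set intersection. It assumes each method's output is available as a set of peptide strings. The function name `consensus` and the peptide labels are hypothetical placeholders, not output from the study.

```python
from typing import Iterable, Set


def consensus(*predictions: Iterable[str]) -> Set[str]:
    """Return peptides predicted by every source (set intersection)."""
    sets = [set(p) for p in predictions]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Hypothetical outputs for one protein/allele; labels are placeholders.
    svm_hits = {"pepA", "pepB", "pepC"}
    hmm_hits = {"pepA", "pepC", "pepD"}
    print(sorted(consensus(svm_hits, hmm_hits)))
```

The same call accepts more than two sets, for example SVM-direct, SVM-indirect, HMM-direct and HMM-indirect predictions.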

  10. RESULTS OF DATA COLLECTION AND PROCESSING [Table: data sets by peptide type and allele; 24 data sets in total]

  11. METHODS (workflow overview, as in slide 9)

  12. STEP 2: BUILDING MODEL – HMM METHOD Positive training set → ClustalW → Perl script → modelfromalign → initial model • Result: 11 matrices × 6 alleles × 2 approaches = 132 initial models
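In this step an initial HMM is built from aligned positive peptides. The core idea, estimating per-column residue probabilities from an alignment, can be sketched as follows. This is a simplified stand-in, not SAM's `modelfromalign`: it assumes an ungapped, equal-length alignment, ignores insert/delete states, and uses an arbitrary pseudocount of 1. The function name and example peptides are hypothetical.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_emissions(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Estimate per-column residue probabilities from equal-length aligned peptides.

    Gap characters ('-') and non-standard residues are not counted; a uniform
    pseudocount keeps every probability non-zero, and each column sums to 1.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] != "-")
        total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return profile


if __name__ == "__main__":
    # Hypothetical aligned positive peptides (placeholders, not study data).
    aligned = ["AYLKDGHL", "AYIKDGHL", "AYLKEGHL"]
    profile = match_emissions(aligned)
    # Most probable residue per column, i.e. the consensus of the alignment.
    print("".join(max(col, key=col.get) for col in profile))
```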

  13. STEP 2: BUILDING MODEL – SVM METHOD • Positive data: motif information from the SYFPEITHI database (Perl script); for MHC class I binder/epitope data processing, peptides conforming to the reported motif are chosen; for MHC class II binder/epitope data processing, the 9-mer motif (binding core) is used • Negative data: for non-binder/non-epitope data processing, sequences are cut into overlapping 8-mers/9-mers
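The slide describes two data-processing operations. One cuts sequences into overlapping 8-mers/9-mers; the other keeps only peptides that conform to a reported binding motif. Both can be sketched as below. The anchor pattern `{5: "N", 9: "MIL"}` is a hypothetical illustration of an anchor-residue motif, not a value taken from the slides; real motifs should be read from SYFPEITHI. The function names are likewise hypothetical.

```python
from typing import Dict, List


def overlapping_kmers(sequence: str, k: int) -> List[str]:
    """Cut a protein sequence into all overlapping windows of length k (step 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def matches_motif(peptide: str, anchors: Dict[int, str]) -> bool:
    """True if every 1-based anchor position holds one of the allowed residues."""
    return all(
        pos <= len(peptide) and peptide[pos - 1] in allowed
        for pos, allowed in anchors.items()
    )


if __name__ == "__main__":
    # The queried sequence shown on slide 18, cut into 9-mers.
    windows = overlapping_kmers("PPVPVSKVVSTDEYVAR", 9)
    print(len(windows), windows[0], windows[-1])
    # Hypothetical anchor pattern for illustration only.
    anchors = {5: "N", 9: "MIL"}
    print([w for w in windows if matches_motif(w, anchors)])
```

A sequence of length L yields L - k + 1 windows, so the 17-residue example gives nine 9-mers.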

  14. METHODS (workflow overview, as in slide 9)

  15. STEP 3: PARAMETER OPTIMIZATION – HMM METHOD

  16. TRAINING PRINCIPLE: COUPLE OF MODELS • 132 initial models • 12 positive data sets → buildmodel (Baum-Welch or Viterbi) → positive model • 12 negative data sets → buildmodel (Baum-Welch or Viterbi) → negative model

  17. 10-FOLD CROSS-VALIDATION [Figure: the positive and negative data sets are split into 10 parts (1–10); in each round, nine parts form the training set used to train a couple of models (positive and negative) from the initial model, and the remaining part is the test set, evaluated by ROC analysis; the average accuracy is computed over Acc. 1 to Acc. 10]
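The 10-fold loop can be sketched as follows. The data are shuffled and split into 10 parts. Each part is used once as the test set while the other nine train the model, and the per-fold scores are averaged. This sketch makes two assumptions: it scores each fold by plain accuracy rather than ROC analysis, and it represents "training" as a hypothetical `train_fn` that returns a predictor. It does not reproduce the buildmodel calls.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k folds."""
    if n < k:
        raise ValueError("need at least k examples for k-fold cross-validation")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(
    data: Sequence,
    labels: Sequence,
    train_fn: Callable[[list, list], Callable],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Average test-fold accuracy over k rounds (Acc. 1 ... Acc. k)."""
    accuracies = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(data)) if i not in held_out]
        predict = train_fn([data[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(data[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy 1-D scores; the placeholder "model" ignores its training data and
    # thresholds at 0 (standing in for training a positive/negative HMM couple).
    xs = [x / 10 for x in range(-15, 15)]
    ys = [x > 0 for x in xs]
    print(cross_validate(xs, ys, lambda tx, ty: (lambda x: x > 0)))
```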

  18. NLL CALCULATING PRINCIPLE • The queried sequence (e.g. PPVPVSKVVSTDEYVAR) is scored with hmmscore (Viterbi) against the positive model (NLL 1) and the negative model (NLL 2) • final NLL = NLL 1 - NLL 2 • The final NLL is compared with a threshold NLL: final NLL < threshold NLL → epitope; final NLL ≥ threshold NLL → non-epitope
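The decision rule can be sketched by scoring a peptide under a positive and a negative model, then subtracting the two negative log-likelihoods (NLLs) and comparing the result with a threshold. As a simplification, each "model" here is an ungapped position-specific profile rather than a full HMM. Such a profile has a single path, so no Viterbi dynamic programming is needed. The default threshold of 0 and the toy two-residue models are hypothetical; the study optimizes its own threshold.

```python
import math
from typing import Dict, List

# Hypothetical simplification: one dict of residue probabilities per position.
Profile = List[Dict[str, float]]


def nll(peptide: str, profile: Profile, floor: float = 1e-6) -> float:
    """Negative natural-log likelihood of the peptide's single path through the profile."""
    if len(peptide) != len(profile):
        raise ValueError("peptide length must equal profile length")
    # Residues missing from a column get a small floor probability to avoid log(0).
    return -sum(math.log(max(col.get(aa, 0.0), floor)) for aa, col in zip(peptide, profile))


def classify(peptide: str, positive: Profile, negative: Profile, threshold: float = 0.0) -> str:
    """final NLL = NLL(positive) - NLL(negative); below the threshold means epitope."""
    final = nll(peptide, positive) - nll(peptide, negative)
    return "epitope" if final < threshold else "non-epitope"


if __name__ == "__main__":
    # Toy 3-position models over a two-letter alphabet (placeholders, not trained).
    pos_model = [{"A": 0.9, "C": 0.1}] * 3
    neg_model = [{"A": 0.1, "C": 0.9}] * 3
    for pep in ("AAA", "CCC", "ACA"):
        print(pep, classify(pep, pos_model, neg_model))
```

A lower NLL means a better fit. A peptide that fits the positive model much better than the negative one therefore gets a strongly negative final NLL.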

  19. ROC (Receiver Operating Characteristic) curve analysis • AROC (area under the ROC curve) > 90%: excellent prediction • AROC > 80%: good prediction • AROC < 80%: not acceptable prediction
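AROC can be computed without drawing the curve: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half (the Mann-Whitney formulation). Below is a minimal sketch together with the grading thresholds from the slide. It assumes higher scores mean "more epitope-like", so a final NLL, where lower is better, would be negated first. The slide does not say how an AROC of exactly 80% is graded; the sketch treats it as not acceptable.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def grade(aroc: float) -> str:
    """Quality bands from the slide; exactly 0.8 is treated as not acceptable."""
    if aroc > 0.9:
        return "excellent"
    if aroc > 0.8:
        return "good"
    return "not acceptable"


if __name__ == "__main__":
    # Hypothetical scores for positive and negative test peptides.
    a = auc([0.9, 0.4], [0.5, 0.1])
    print(a, grade(a))
```

The pairwise loop is O(n·m), which is fine for test folds of this size.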

  20. RESULTS OF VALIDATION Validation results for 22 couples of models trained with the Baum-Welch and Viterbi algorithms in the indirect approach for the H-2-Db allele

  21. OPTIMAL PARAMETERS

  22. STEP 3: PARAMETER OPTIMIZATION – SVM METHOD

  23. LOOCV (LEAVE-ONE-OUT CROSS-VALIDATION) • One peptide is removed from the training set • The model is built from the remaining data • Testing is done on the removed peptide
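The leave-one-out procedure, followed by accuracy = correct predictions / number of peptides, can be sketched as below. To keep the example self-contained, a 1-nearest-neighbour classifier using Hamming distance stands in for SVM training. That substitution is an assumption made for illustration and is not the method used in the slides. The function names and peptides are hypothetical.

```python
from typing import Callable, List, Sequence


def loocv_accuracy(data: Sequence, labels: Sequence, train_fn: Callable) -> float:
    """Leave-one-out accuracy: train on n-1 examples, test on the held-out one."""
    n = len(data)
    if n < 2 or len(labels) != n:
        raise ValueError("need at least two labelled examples")
    correct = 0
    for i in range(n):
        train_x = list(data[:i]) + list(data[i + 1:])
        train_y = list(labels[:i]) + list(labels[i + 1:])
        predict = train_fn(train_x, train_y)  # model built from the remaining data
        correct += int(predict(data[i]) == labels[i])  # tested on the removed peptide
    return correct / n


def train_1nn_hamming(xs: List[str], ys: List) -> Callable[[str], object]:
    """Hypothetical stand-in for SVM training: 1-nearest neighbour by Hamming distance."""
    def predict(x: str):
        dists = [sum(a != b for a, b in zip(x, t)) for t in xs]
        return ys[dists.index(min(dists))]
    return predict


if __name__ == "__main__":
    # Placeholder peptides and labels (1 = binder, 0 = non-binder).
    peptides = ["AAAA", "AAAC", "CCCC", "CCCA"]
    labels = [1, 1, 0, 0]
    print(loocv_accuracy(peptides, labels, train_1nn_hamming))
```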

  24. ACCURACY (MHC CLASS I MODELS) [Chart: accuracy of the predictive models for the direct and indirect methods, per MHC allele, after the LOOCV procedure (MHC class I)]

  25. ACCURACY (MHC CLASS II MODELS) [Chart: accuracy of the direct and indirect methods per MHC allele]

  26. OPTIMAL PARAMETERS (MHC CLASS I) Kernel functions: - Linear function - Polynomial function - RBF function - Sigmoid function

  27. OPTIMAL PARAMETERS (MHC CLASS II) Kernel functions: - Linear function - Polynomial function - RBF function - Sigmoid function
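The slides list four candidate kernels but not the peptide encoding or the kernel formulas. Below is a sketch of the four kernels in the parameterization used by common SVM packages such as LIBSVM (`gamma`, `coef0`, `degree`). It applies them to a one-hot (20 values per residue) peptide encoding. Both the encoding and the default parameter values are assumptions for illustration, not the study's optimal parameters.

```python
import math
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide: str) -> List[float]:
    """Encode each residue as 20 binary features; unknown residues become all zeros."""
    vec: List[float] = []
    for aa in peptide:
        vec.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return vec


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def linear(u, v) -> float:
    return _dot(u, v)


def polynomial(u, v, gamma: float = 1.0, coef0: float = 0.0, degree: int = 3) -> float:
    return (gamma * _dot(u, v) + coef0) ** degree


def rbf(u, v, gamma: float = 1.0) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid(u, v, gamma: float = 1.0, coef0: float = 0.0) -> float:
    return math.tanh(gamma * _dot(u, v) + coef0)


if __name__ == "__main__":
    x, y = one_hot("AC"), one_hot("AD")  # hypothetical 2-mers sharing residue A
    print(linear(x, y), polynomial(x, y), rbf(x, y, gamma=0.5), sigmoid(x, y))
```

With one-hot encoding, the linear kernel simply counts position-wise identical residues. The RBF kernel decays with the number of mismatches (each mismatch adds 2 to the squared distance).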

  28. METHODS (workflow overview, as in slide 9)

  29. EPITOPE PREDICTION RESULTS – SVM METHOD

  30. EPITOPE PREDICTION RESULTS – HMM METHOD

  31. Total number of epitopes in Influenza A virus proteins. Table 7: The number of epitopes predicted by both the HMM and SVM methods, by protein and allele

  32. EPITOPE PREDICTION RESULTS – EXAMPLES

  33. WEB PREDICTION TOOL FOR HMM METHOD

  34. WEB PREDICTION TOOL FOR HMM METHOD (cont.) [Screenshot: positive results with the number of positive sequences; negative results with the number of negative sequences]

  35. CONCLUSIONS • SVM method, model accuracy (the indirect method is better): • MHC class I: H-2-Db (86.58%), H-2-Kb (80.25%) and H-2-Kd (83.45%) • MHC class II: H-2-IEd (93.26%), H-2-IEk (95.19%), H-2-IAd (89.42%) • HMM method, model accuracy (the direct method is better): • MHC class I: H-2-Db (86%), H-2-Kb (84.54%) and H-2-Kd (84.72%) • MHC class II: H-2-IEd (93.90%), H-2-IEk (95.11%), H-2-IAd (77.84%)

  36. CONCLUSIONS • Built HMM and SVM models for T cell epitope prediction (MHC class I and II) • Direct approach (epitope prediction) • Indirect approach (MHC binder prediction) with high accuracy • Successfully applied these models to epitope prediction for Influenza A virus proteins for in silico vaccine design

  37. FUTURE WORK • Applying this tool to other proteins • Making all programs available to run via the web • B cell epitope prediction • Testing the results by biological experiments • …

  38. THANK YOU FOR YOUR ATTENTION
