Prediction of Bacterial Effectors using SVM and Naïve Bayes classifier
Sneha Joshi
MU Informatics Institute
November 30, 2009
Effector Prediction
• What are effectors:
• Why predict effectors:
  • Prime candidates involved in host-pathogen interaction
  • Modulate host cell functions
• What are our goals:
  • Develop a classifier to classify pathogenic proteins into effectors or non-effectors
  • Identify important features of the signal
  • Provide potential drug targets
Available Methods
• Experimental:
  • Translocation assays using fusion proteins of a putative effector with a reporter gene
  • Detection of effectors in supernatant
  • Prior knowledge is required to screen for effectors experimentally
• Computational:
  • Homology to known effectors
    • Cannot predict novel effectors
  • Transcriptional co-regulation
  • Few methods exist, and each is limited to one of the secretion systems
SVM prediction
• Features from N-terminal 25 amino acids → SVM 2
• Features from full length of protein → SVM 1
• Features from C-terminal 25 amino acids → SVM 3
• Outputs of SVM 1, SVM 2, and SVM 3 → Naïve Bayes classifier → Effectors / Non-Effectors
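The combining step in this diagram can be illustrated with a minimal sketch. It assumes each SVM emits a binary vote (1 = effector, 0 = non-effector) and that the votes are merged by a Bernoulli Naïve Bayes model with Laplace smoothing. The function names, the smoothing constant, and the toy votes are hypothetical; the slides do not specify how the SVM outputs are encoded or how the Naïve Bayes classifier is parameterised.

```python
import math


def train_vote_combiner(votes, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary base-classifier votes.

    votes  : list of tuples, one 0/1 vote per base classifier (e.g. three SVMs)
    labels : list of 0/1 class labels (1 = effector)
    alpha  : Laplace smoothing constant (an assumption, not from the slides)
    """
    n_clf = len(votes[0])
    model = {}
    for c in (0, 1):
        rows = [v for v, y in zip(votes, labels) if y == c]
        prior = (len(rows) + alpha) / (len(labels) + 2 * alpha)
        # Smoothed estimate of P(vote_i = 1 | class = c) for each classifier
        p_one = [
            (sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_clf)
        ]
        model[c] = (prior, p_one)
    return model


def predict_effector(model, vote):
    """Return the class (0 or 1) with the highest log posterior for one vote tuple."""
    scores = {}
    for c, (prior, p_one) in model.items():
        log_p = math.log(prior)
        for v, p in zip(vote, p_one):
            log_p += math.log(p if v == 1 else 1.0 - p)
        scores[c] = log_p
    return max(scores, key=scores.get)


# Toy votes in the order (SVM 1, SVM 2, SVM 3); invented for illustration only.
toy_votes = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
toy_labels = [1, 1, 1, 0, 0, 0]
combiner = train_vote_combiner(toy_votes, toy_labels)
```

Calling `predict_effector(combiner, (1, 0, 0))` then returns the class favoured by the combined evidence. Because Naïve Bayes treats the three votes as conditionally independent, it can weight a more reliable SVM above the others instead of taking a plain majority.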
Features from Protein sequence
Example sequence: MLKYEERKLNNLTLSSFSKVGVSNDARL
• Amino acid composition
• Dipeptide composition
• Secondary structure
• Relative solvent accessibility
• Dielectric constant
• Charge
• Polar, non-polar, charged, acidic, basic amino acids
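Two of these features, amino acid composition and dipeptide composition, can be computed directly from the sequence. The sketch below is one plausible way to do that, applied to the example fragment shown on the slide. The function names and the 20-letter alphabet constant are illustrative assumptions, and the structure-based features (secondary structure, solvent accessibility) are not covered because they need an external predictor.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 features)."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among overlapping pairs (400 features)."""
    total = len(seq) - 1
    if total <= 0:
        raise ValueError("sequence needs at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard letters are skipped
            counts[pair] += 1
    return {dp: c / total for dp, c in counts.items()}


def terminal_windows(seq, width=25):
    """Return the N-terminal and C-terminal windows of the given width."""
    return seq[:width], seq[-width:]


example = "MLKYEERKLNNLTLSSFSKVGVSNDARL"  # fragment shown on the slide
aac = amino_acid_composition(example)
dpc = dipeptide_composition(example)
n_term, c_term = terminal_windows(example)
```

The same functions can be applied to the full-length sequence and to each 25-residue terminal window to produce the three feature sets used by the three SVMs.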
Features from Nucleotide sequence
• Distance from known effector
Results: SVM1 (full-length amino acid sequence)
Precision = TP/(TP+FP), Recall = TP/(TP+FN)
Results: SVM2 (N-terminal 25 amino acids)
Precision = TP/(TP+FP), Recall = TP/(TP+FN)
Results: SVM3 (C-terminal 25 amino acids)
Precision = TP/(TP+FP), Recall = TP/(TP+FN)
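The precision and recall definitions used on these slides translate into a few lines of code. The sketch below counts true positives (TP), false positives (FP), and false negatives (FN) from label lists; the function name and the toy labels are hypothetical and are not results from the presentation.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Return 0.0 when a denominator is zero instead of raising an error
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (1 = effector); invented for illustration only.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_precision, toy_recall = precision_recall(toy_true, toy_pred)
```

With these toy labels there are two true positives, one false positive, and one false negative, so both measures come out to 2/3.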
Results
• Effect of predicted secondary structure and solvent accessibility on prediction accuracy
Results
• Effect of serine on prediction accuracy
Feature Selection
• Feature space reduction
• Correlation-based feature selection [1]
  • Hypothesis: Good feature subsets contain features highly correlated with the class yet uncorrelated with each other.
• Feature space reduced to 36 dimensions for full length, 19 dimensions for N-terminal, and 25 dimensions for C-terminal.
[1] Mark Hall, Correlation-based Feature Selection for Machine Learning
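The hypothesis above is commonly scored with the CFS merit, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation. The sketch below is a simplified illustration, not the procedure used in this work: it substitutes absolute Pearson correlation for the correlation measure and a greedy forward search for the subset search, and all function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(features, labels, subset):
    """CFS merit of a subset; `features` is a list of feature columns."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[i], labels)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, labels):
    """Greedy forward selection: add the best feature until the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        score, choice = max(
            (cfs_merit(features, labels, selected + [i]), i) for i in remaining
        )
        if score <= best:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = score
    return selected, best


# Toy data: feature 1 duplicates feature 0, feature 2 is weakly related to the class.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_features = [[0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 1], [1, 0, 1, 0, 1, 0]]
chosen, merit = greedy_cfs(toy_features, toy_labels)
```

On the toy data the search keeps one of the two identical informative features and rejects both the redundant copy and the weak feature, which is the behaviour the hypothesis describes.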
Case study: Xanthomonas oryzae
• Causes leaf blight of rice
• Has T2SS and T3SS
• The system detects 2 effector substrates of the type II secretion system, along with 6 effectors of the type III secretion system.
Future Work
• Naïve Bayes Classifier:
• Application to biological system: Mycobacterium tuberculosis
• Evolutionary study of effector proteins
• Extending beyond bacterial secretion systems
• Nematode effector proteins
Acknowledgement
• This work was supported by NSF Award #0845196
• Dmitry Korkin
• Gavin Conant