Improving Protein-RNA Interface Prediction by Combining a Sequence Homology-based Method with a Naïve Bayes Classifier: Preliminary Results

Li Xue (1,2), Rasna Walia (1,2), Yasser El-Manzalawy (2,4), Drena Dobbs (1,3), Vasant Honavar (1,2)
1 Bioinformatics & Computational Biology Program; 2 Dept. of Computer Science; 3 Dept. of Genetics, Development & Cell Biology, Iowa State University; 4 Dept. of Systems & Computer Engineering, Al-Azhar University, Cairo, Egypt

• Protein-RNA interactions play important roles in cellular processes including protein synthesis, RNA processing, and regulation of gene expression. Reliable identification of the interfaces involved in protein-RNA interactions is essential for understanding their mechanisms and functional implications, and provides a valuable guide for rational drug discovery and design.
• Experimental determination of interfaces in protein-RNA complexes is time-consuming and expensive, so computational techniques for predicting RNA-binding sites on proteins are valuable. Here we propose a novel family of sequence homology-based methods:
• HomPRIP uses interface information from putative homologs of a query protein to predict interface residues in the query protein.
• When no sequence homologs of the query protein can be found, HomPRIP-NB uses a Naïve Bayes (NB) classifier trained on evolutionary information derived from protein sequences in the NCBI nr database to return interface predictions.

http://einstein.cs.iastate.edu/HomPRIP-NB

Method

Datasets:
• NR216: for analyzing protein interface conservation
• RB199: for testing the prediction performance of HomPRIP and its combination with an NB classifier
• nr_RNAprot_s2c: for searching for putative sequence homologs using BLASTP

Prediction workflow (flowchart):
• Input: query protein sequence.
• Search nr_RNAprot_s2c to find homologous sequences.
• Homologous sequences found?
• Yes (homologs in the Safe zone, Twilight zone, or Dark zone): HomPRIP returns predicted interface residues.
• No: HomPRIP-NB returns predicted interface residues.

• Support Vector Machine and Naïve Bayes classifiers were trained using three different features (amino acid identity, PSSM profiles, and smoothed PSSM profiles) and evaluated using five-fold cross-validation.

Results & Conclusion

• Performance of HomPRIP is reported only for the 71% of complexes in the RB199 dataset for which homologs could be found; HomPRIP-NB returned predictions for the entire RB199 dataset.

Future Directions

• Ongoing work aims to compare HomPRIP-NB with other publicly available servers that predict RNA-binding sites on proteins (e.g., BindN, PiRaNha, PRIP, RNABindR), using an independent test set.

Protein-RNA Interface Conservation

An interface conservation score (ICscore) is calculated as a measure of the similarity of a homolog's interface residues to those of the query protein. A regression model is used to calculate the ICscore, based on BLAST sequence alignment statistics.
• Safe zone: a high degree of interface conservation (red data points)
• Twilight zone: moderate conservation of interfaces (yellow and orange data points)
• Dark zone: poor conservation of interfaces (blue data points)

Acknowledgements

Funding provided by: NIH GM 066387

References

• B.A. Lewis, R.R. Walia, M. Terribilini, J. Ferguson, C. Zheng, V. Honavar, and D. Dobbs. PRIDB: a protein–RNA interface database. Nucleic Acids Research, 39(suppl 1):D277, 2011.
• L.C. Xue, D. Dobbs, and V. Honavar. HOMPPI: A class of sequence homology based protein-protein interface prediction methods. BMC Bioinformatics, 12:244, 2011.
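
Illustrative sketch of the prediction workflow

The decision step in the Method flowchart (use homolog-derived interface information when homologs are found, otherwise fall back to the Naïve Bayes classifier) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. The names `search_homologs`, `nb_predict`, and `transfer_labels` are hypothetical, and the majority-vote rule for combining several homologs is an assumption, since the poster does not state how homolog interface information is combined.

```python
from typing import Callable, List, Sequence, Tuple

Labels = List[int]  # one entry per query residue: 1 = interface, 0 = non-interface


def transfer_labels(homolog_labels: Sequence[Labels]) -> Labels:
    """Combine homolog-derived labels by strict majority vote (assumed rule)."""
    n = len(homolog_labels)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*homolog_labels)]


def predict_interface(
    query: str,
    search_homologs: Callable[[str], Sequence[Labels]],  # hypothetical homolog search
    nb_predict: Callable[[str], Labels],                 # hypothetical NB classifier
) -> Tuple[Labels, str]:
    """Return per-residue labels and the name of the method that produced them."""
    hits = search_homologs(query)
    if hits:
        # Each hit is assumed to be already mapped onto the query positions.
        if any(len(h) != len(query) for h in hits):
            raise ValueError("homolog labels must align with the query sequence")
        return transfer_labels(hits), "HomPRIP"
    # No homologs found: fall back to the sequence-based classifier.
    return list(nb_predict(query)), "HomPRIP-NB"
```

In a real pipeline, `search_homologs` would wrap a BLASTP search against a database such as nr_RNAprot_s2c and map each homolog's known interface residues onto the query through the alignment. That mapping is omitted here.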