10 likes | 185 Views
MSCBB 2007. Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program Department of Computer Science. Prediction of RNA-Protein interfaces Using Structural Features.
E N D
MSCBB 2007 Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program Department of Computer Science Prediction of RNA-Protein interfaces Using Structural Features Fadi Towfic, David C. Gemperline, Cornelia Caragea, Feihong Wu, Drena Dobbs, and Vasant Honavar Abstract RNA-protein interactions play a critical role in gene expression: From splicing to translation, proteins must be able to recognize and interact with specific sites of RNA in order to perform their respective functions. In this paper, 147 different chains from RNA-binding proteins in the Protein Databank were characterized according to multiple structural features and the type of RNA bound to each protein chain. Furthermore, Naive Bayes classifiers were constructed to predict protein-RNA interfaces on the surface residues of the proteins. The three structural features used in this study were surface roughness, solid angle and CX value. A possible reason for the aforementioned discrepancy is that the preliminary clustering using ANOVA may have not been sophisticated enough to identify subclusters that lie within each group. The poor clustering may have contributed to the poor classification performance by Naïve Bayes. However, it is appropriate to note that each of the structural features had at least one cluster where classification performance was increased compared to the “No clustering” baseline. This result demonstrates the potential of using more sophisticated clustering as well as classification algorithms to improve the performance of RNA-protein interface prediction algorithms. Dataset and Classification The protein chains in the RB147 dataset available from the RNAbindr website (http://bindr.gdcb.iastate.edu/) were classified according to the type of RNA bound by each chain. Each type of RNA was then clustered using ANOVA as described by Towficet al. (Towficet al., 2007) as shown in Table 1. A Naïve-Bayes classification algorithm with 10-fold cross-validation with a window size of 12 (Witten and Frank, 2005) was then used to classify each of the groups shown in table 1. Table 1: Clustering of each RNA-binding type based on ANOVA analysis of the propensities for each chain. Results As shown in table 2, the clustering of the RNA types seems to improve the prediction accuracy, correlation, sensitivity and specificity in some cases (alpha carbon group2, roughness value group 1, solid angle value group 1) while contributing to poor performance in others (alpha carbon group 3, roughness value group 2, solid angle value group 2) compared to the classifiers that do not use clustering. Table 2: Comparison of the performance of the Naïve Bayes classifier with and without clustering. References F. Towfic, D. C. Gemperline, C. Caragea, F. Wu, D. Dobbs, and V. Honavar . Structural Characterization of RNA-Binding Sites of Proteins: Preliminary Results. Computational Structural Bioinformatics Workshop proceedings, 2007. In Press. I. H. Witten and E. Frank . Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, 2005 Acknowledgements: This research was supported in part by a grant from the National Institutes of Health (GM066387) to Vasant Honavar and Drena Dobbs, an Integrative Graduate Education and Research Training (IGERT) fellowship to Fadi Towfic, funded by the National Science Foundation grant (DGE 0504304) to Iowa State University, and a Bioengineering and Bioinformatics Summer Institute (BBSI) fellowship to David Gemperline, funded by a National Science Foundation award (EEC 0608769) to Iowa State University. This work has benefited from discussions with Dr. Robert Jernigan of Iowa State University.