Developed a Struct-SVM classifier that takes domain knowledge into account to improve identification of protein-RNA interface residues. Results show that the ROC curve of Struct-SVM dominates the ROC curve of the Support Vector Machine (SVM) classifier.
Predicting Protein-RNA Binding Sites Using Structural Information

Cornelia Caragea, Michael Terribilini, Jivko Sinapov, Jae-Hyung Lee, Fadi Towfic, Drena Dobbs and Vasant Honavar

Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science

Introduction

RNA molecules play diverse functional and structural roles in cells:
• messengers for transferring genetic information from DNA to proteins
• primary genetic material in many viruses
• enzymes important for protein synthesis and RNA processing
• essential and ubiquitous regulators of gene expression in living organisms

These functions depend on interactions between RNA molecules and specific proteins in cells.

Struct-SVM Classifier

A machine learning classifier that incorporates domain knowledge (that is, the structure of the protein) to improve classification.

[Figure: Struct-SVM workflow. Labels: Training Data; Collection of Surface Windows; Collection of Non-Surface Windows; Learning System L; Resulting Classifier; Test Data; decision "x_{test,j} = surface?" (yes / no); h(x_{test,j}) = y; h(x_{test,j}) = -1; Final Predictions]

[Figure: feature extraction steps. Labels: Seq2SeqWins, SeqWins2TargetAA, SeqWins2ZeroOne, SeqWins2Blast, SeqWins2SS, SS2ZeroOne, TargetAA2Struct, Struct2Blast, SeqWins2CXValue, SeqWins2Roughness]

Sequence: x_i = (x_{i,1}, …, x_{i,j-k}, …, x_{i,j}, …, x_{i,j+k}, …, x_{i,m})
Label: y_i = (y_{i,1}, …, y_{i,j-k}, …, y_{i,j}, …, y_{i,j+k}, …, y_{i,m})

windowise:
…
x'_{i,j-1} = (x_{i,j-1-k}, …, x_{i,j-1}, …, x_{i,j-1+k})        x'_{i,j-1} = (x_{i,j-1})
x'_{i,j} = (x_{i,j-k}, …, x_{i,j}, …, x_{i,j+k})                x'_{i,j} = (x_{i,j})
x'_{i,j+1} = (x_{i,j+1-k}, …, x_{i,j+1}, …, x_{i,j+1+k})        x'_{i,j+1} = (x_{i,j+1})
…
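The windowing notation on this poster, x'_{i,j} = (x_{i,j-k}, …, x_{i,j}, …, x_{i,j+k}), and the surface check in the Struct-SVM workflow figure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the padding symbol for sequence ends, the function names `windowize` and `struct_predict`, the surface flags, and the toy classifier are all hypothetical. The rule that non-surface windows receive the label -1 is one reading of the workflow figure.

```python
from typing import Callable, List, Sequence

# Hypothetical padding symbol: the poster does not say how windows
# near the ends of a sequence are handled.
PAD = "-"


def windowize(sequence: str, k: int) -> List[str]:
    """Return one window of length 2k+1 per residue, centered on that residue."""
    if k < 0:
        raise ValueError("k must be non-negative")
    padded = PAD * k + sequence + PAD * k
    return [padded[j:j + 2 * k + 1] for j in range(len(sequence))]


def struct_predict(
    windows: Sequence[str],
    is_surface: Sequence[bool],
    classify: Callable[[str], int],
) -> List[int]:
    """Assumed reading of the workflow figure: a non-surface window is
    labeled -1 directly; a surface window is passed to the classifier h."""
    if len(windows) != len(is_surface):
        raise ValueError("windows and is_surface must have the same length")
    return [classify(w) if surface else -1
            for w, surface in zip(windows, is_surface)]


# Nine-residue fragment shown on the poster.
xi = "ANTPVLRKS"
windows = windowize(xi, k=1)

# Hypothetical surface flags and a toy stand-in for the trained classifier.
is_surface = [True, False, True, True, False, False, True, True, False]


def toy_classifier(window: str) -> int:
    """Toy rule for illustration only: +1 if the central residue is R or K."""
    return 1 if window[len(window) // 2] in "RK" else -1


predictions = struct_predict(windows, is_surface, toy_classifier)
```

In practice `toy_classifier` would be replaced by a trained SVM and `is_surface` by flags derived from the protein structure; both are placeholders here.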
Protein-RNA interface residue identification

1T0K_B
SINQKLALVIKSGKYTLGYKSTVKSLRQGKSKLIIIAANTPVLRKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVSILEAGDSDILTTLA

x_i:  A N T P V L R K S
y_i:  0 0 1 1 0 0 1 0 0        (y_i ∈ {0,1}*)

Dataset

• RNA-Protein Interface dataset, RB181: consists of RNA-binding protein sequences extracted from structures of known RNA-protein complexes solved by X-ray crystallography in the Protein Data Bank

Feature Extraction

[Figure labels: Seq2SeqWins, SeqWins2TargetAA]

Results

Table 1. Accuracy, Correlation Coefficient and Area Under the ROC Curve for SVM and Struct-SVM

Fig. 1. Receiver Operating Characteristic (ROC) curves for SVM and Struct-SVM classifiers on the protein-RNA dataset

Conclusions

References

[1] Chen, Y., Varani, G. (2005). Protein families and RNA recognition. FEBS Journal, 272:2088-2097.
[2] Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167.
[3] Towfic, F., Caragea, C., Dobbs, D., and Honavar, V. (2008). Struct-NB: Predicting protein-RNA binding sites using structural features. International Journal of Data Mining and Bioinformatics, in press.

Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM 066387) to Vasant Honavar and Drena Dobbs.