A Computational Method to Identify Amino Acid Residues in RNA-protein Interactions

BCB, NSF IGERT. Michael Terribilini & Jae-Hyung Lee; Cornelia Caragea, Deepak Reyon, Ben Lewis, Jeffry Sander, Robert Jernigan, Vasant Honavar and Drena Dobbs. Bioinformatics and Computational Biology Program.

Presentation Transcript


  1. BCB, NSF IGERT. A Computational Method to Identify Amino Acid Residues in RNA-protein Interactions. Michael Terribilini & Jae-Hyung Lee; Cornelia Caragea, Deepak Reyon, Ben Lewis, Jeffry Sander, Robert Jernigan, Vasant Honavar and Drena Dobbs. Bioinformatics and Computational Biology Program; Center for Computational Intelligence, Learning, and Discovery; L.H. Baker Center for Bioinformatics & Biological Statistics. ROC 2008 meeting

  2. PROBLEM: Given the sequence of a protein (and possibly its structure), predict which amino acids participate in protein-RNA interactions.
     GOAL: Classify each amino acid in the target protein as either an interface or a non-interface residue.
     APPROACH: Generate datasets of known complexes from the PDB to train and test machine learning algorithms (Naïve Bayes, SVM, etc.).
     • Guiding hypothesis: the principal determinants of protein binding sites are reflected in local sequence features.
     • Observation: binding-site residues are often clustered within the primary amino acid sequence.

  3. Sequence-based classifier:
     • Dataset: RB181, a non-redundant set of 181 protein chains from protein-RNA complexes in the PDB
     • Input: a window of amino acid identities centered on the target residue and contiguous in the protein sequence
     • Classifier: Naïve Bayes
     • Evaluation: leave-one-out cross-validation
     Structure-based classifier:
     • Calculate the distance between each pair of residues in the known structure
     • Input: identities of the target's nearest n spatial neighbors
     • Classifier: Naïve Bayes
     • Evaluation: leave-one-out cross-validation
     PSSM-based classifier:
     • Run PSI-BLAST against the NCBI nr database to generate position-specific scoring matrices (PSSMs)
     • Input: PSSM vectors for residues contiguous in sequence
     • Classifier: Support Vector Machine (SVM)
     • Evaluation: 10-fold cross-validation
     [Figure: example inputs for target residue Ser 28: sequence window QSVSTSSFRYM; spatial neighbors SSFRLNKSGRT; a window of PSSM rows with 20 values each]
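
The sequence- and structure-based classifiers above both reduce each residue to a short string of amino acid identities and score it with Naïve Bayes. The sketch below is one possible way to build both feature types and a Naïve Bayes model over them, not the authors' code. The window half-width of 5 (matching the 11-residue Ser 28 example), the neighbor count n = 10, the "X" padding symbol, one C-alpha-like coordinate per residue and Laplace smoothing are all assumptions, and every function and class name is hypothetical. Cross-validation and the PSSM/SVM classifier are omitted.

```python
import math
from collections import Counter

# Assumed symbol set: 20 standard amino acids plus "X", a hypothetical pad symbol
# for windows that run past either end of the chain.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"
ALPHABET_SIZE = len(AMINO_ACIDS) + 1


def sequence_windows(seq, half_width=5):
    """For every residue, return the 2*half_width+1 identities centred on it."""
    padded = PAD * half_width + seq + PAD * half_width
    return [padded[i:i + 2 * half_width + 1] for i in range(len(seq))]


def spatial_neighbors(seq, coords, n=10):
    """For every residue, return its own identity followed by the identities of its
    n nearest residues in space; coords holds one (x, y, z) point per residue."""
    features = []
    for i, ci in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(ci, coords[j]))
        features.append(seq[i] + "".join(seq[j] for j in others[:n]))
    return features


class IdentityNaiveBayes:
    """Naive Bayes over fixed-length strings of residue identities: each position is
    treated as conditionally independent given the class (1 = interface, 0 = not)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace pseudo-count (assumed)

    def fit(self, features, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        width = len(features[0])
        # counts[c][j][aa]: occurrences of residue aa at position j among class-c examples
        self.counts = {c: [Counter() for _ in range(width)] for c in self.class_counts}
        for f, c in zip(features, labels):
            for j, aa in enumerate(f):
                self.counts[c][j][aa] += 1
        return self

    def log_score(self, f, c):
        n_c = self.class_counts[c]
        score = math.log(n_c / self.total)  # log prior P(c)
        for j, aa in enumerate(f):
            # smoothed log P(residue aa at position j | class c)
            score += math.log((self.counts[c][j][aa] + self.alpha)
                              / (n_c + self.alpha * ALPHABET_SIZE))
        return score

    def predict(self, f):
        """Return the class with the highest log posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(f, c))
```

On a toy chain GGRKRGG labeled 0, 0, 1, 1, 1, 0, 0 with a 3-residue window (`half_width=1`), the fitted model predicts 1 for the window "RKR" and 0 for "GGG". The same class accepts the strings from `spatial_neighbors`, since both feature types are fixed-length strings of identities.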

  4. Dataset of RNA-protein interface residues
     • Extract all protein-RNA complexes from the PDB
     • Select high-resolution structures (resolution < 3.5 Å): 503 complexes
     • Filter using PISCES (< 30% pairwise sequence identity): 181 chains, 48,791 residues
     • Identify interface residues using a 5 Å distance cutoff: 7,456 interface residues (positive examples) and 41,335 non-interface residues (negative examples)
     PISCES: Wang and Dunbrack, 2003, Bioinformatics 19:1589
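
The slide gives the 5 Å cutoff but not which atoms are compared. A minimal sketch, assuming the common convention that a residue is an interface residue if any of its atoms lies within 5 Å of any RNA atom; the function name and input layout are hypothetical. It also checks the class balance implied by the slide's counts.

```python
import math

CUTOFF_ANGSTROM = 5.0  # distance cutoff from the slide


def interface_labels(protein_residues, rna_atoms, cutoff=CUTOFF_ANGSTROM):
    """Label each protein residue 1 (interface) or 0 (non-interface).

    protein_residues: one list of (x, y, z) atom coordinates per residue.
    rna_atoms: (x, y, z) coordinates of every RNA atom in the complex.
    Assumed rule: any residue atom within `cutoff` of any RNA atom.
    """
    return [
        int(any(math.dist(a, r) <= cutoff for a in atoms for r in rna_atoms))
        for atoms in protein_residues
    ]


# Class balance of the final dataset (counts taken from the slide).
POSITIVES, NEGATIVES = 7456, 41335
assert POSITIVES + NEGATIVES == 48791
print(f"interface fraction: {POSITIVES / (POSITIVES + NEGATIVES):.1%}")  # 15.3%
```

This brute-force check compares every atom pair, which is fine for illustration; only about 15% of residues are positives, so the classes are clearly imbalanced.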

  5. Performance in predicting interface residues using only protein sequence as input

  6. A few "good" predictions mapped onto structures, using only protein sequence as input
     [Figure: protein-protein, protein-DNA and protein-RNA panels; classifiers shown: Naïve Bayes and a 2-stage classifier (SVM + Naïve Bayes); residues colored as true positives, false positives, false negatives and true negatives]
     Yan, Bioinformatics 2004; Yan, BMC Bioinformatics 2006; Terribilini, RNA 2006

  7. Combining Sequence, Structure & PSSM-Based Classifiers Improves Prediction of RNA-Binding Residues
     Predictions illustrated on 3D structures: 30S ribosomal protein S17 (PDB ID 1FJG:Q), shown for the sequence-based, structure-based, PSSM-based and combined classifiers (for clarity, bound RNA is not shown).
     Combined results for 1FJG:Q: Spec+ = 0.89, Sens+ = 0.96, Accuracy = 0.91, Correlation Coefficient = 0.83
     • TP (true positive): interface residues predicted as interface residues
     • FP (false positive): non-interface residues predicted as interface residues
     • TN (true negative): non-interface residues predicted as non-interface residues
     • FN (false negative): interface residues predicted as non-interface residues
     • Spec+ (specificity): precision for the positive, RNA-binding class
     • Sens+ (sensitivity): recall for the positive, RNA-binding class
     • AUC: area under the Receiver Operating Characteristic (ROC) curve
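
The metric definitions above can be written directly in terms of TP, FP, TN and FN. A minimal sketch with a hypothetical function name; note the slides' convention that Spec+ is precision (not the usual TN/(TN+FP) specificity). The slides do not define "correlation coefficient"; computing it as the Matthews correlation coefficient is an assumption.

```python
import math


def interface_metrics(tp, fp, tn, fn):
    """Per-residue metrics with the RNA-binding (interface) class as positive.

    Spec+ = precision and Sens+ = recall for the positive class, following the
    slide. CC is computed as the Matthews correlation coefficient (assumed).
    """
    spec_pos = tp / (tp + fp) if tp + fp else 0.0
    sens_pos = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"spec+": spec_pos, "sens+": sens_pos, "accuracy": accuracy, "cc": cc}
```

For hypothetical counts TP = 8, FP = 2, TN = 85, FN = 5, this gives Spec+ = 0.8, Sens+ ≈ 0.615, accuracy = 0.93 and CC ≈ 0.664.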

  8. Predictions for Signal Recognition Particle 19 kDa protein (PDB ID 1JID_A)
     • Combined predictions: Accuracy = 82%, Specificity = 55%, Sensitivity = 75%, CC = 0.52
     • IDSeq predictions: Accuracy = 80%, Specificity = 56%, Sensitivity = 21%, CC = 0.25

  9. RNABindR: An RNA Binding Site Prediction Server

  10. Applications
     • Lentiviral Rev proteins
     • Telomerase Reverse Transcriptase (TERT) (http://telomerase.asu.edu/)

  11. Rev: a potential target for novel HIV therapies
     • Rev is a multifunctional regulatory protein that plays an essential role in the production of infectious virus
     • A small nucleocytoplasmic shuttling protein (HIV Rev: 115 aa; EIAV Rev: 165 aa)
     • Recognizes a specific binding site on viral RNA, the Rev Responsive Element (RRE)
     • Contains specific domains that mediate nuclear localization, RNA binding and nuclear export
     • Rev's critical role in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy

  12. Problem: no high-resolution Rev structure, not even for HIV Rev, despite intense effort
     • Why? Rev aggregates at the concentrations needed for NMR or X-ray crystallography
     • The only high-resolution information available is for short peptide fragments of HIV-1 Rev: a 22 amino acid fragment of Rev bound to a 34 nucleotide RRE RNA fragment
     • What about insights from sequence comparisons?
       • HIV Rev has low sequence identity with proteins of known structure
       • There is very little sequence similarity among different Rev family members (e.g., EIAV vs HIV: < 10%)

  13. HIV-1 Rev: Predictions vs Experiments
     Prediction on the RNA-binding protein HIV-1 Rev
     [Figure: HIV-1 Rev sequence numbered from residue 33 (DTRQARRNRRRRWRERQRAA AA) with actual interface residues (IR) and predicted interface residues marked "+"; predicted and actual NMR structures colored with interface residues red, non-interface residues grey and RNA green]
     Sequence-based prediction on HIV-1 Rev (not included in the training set) identified every interface residue, plus 3 false positives.
     NMR structure (1ETF:B): 22 aa Rev peptide bound to RNA (Battiste et al., 1996, Science 273:1547)

  14. EIAV Rev: Predictions vs Experiments
     [Figure: EIAV Rev sequence segments (GPLESDQWCRVLRQSLPEEKISSQTCI; ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI; QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL) with PREDICTED protein-binding and RNA-binding residues marked "+", a predicted structure, and a domain map (residues 31, 57, 125, 145, 165) showing the NES, the NLS and the ERLE, KRRRK and RRDRW motifs; VALIDATED protein-binding and RNA-binding residues from constructs WT, MBP, 31-165, 31-145, 57-165 and 145-165]
     Credits: Ihm, Ho, Carpenter. Lee, J Virol 2006; Terribilini, RNA 2006

  15. Mutations in EIAV Rev: Experimental evaluation of RNA binding sites
     [Figure: EIAV Rev sequence segments and domain map (NES, NLS) as in slide 14, with the wild-type (WT) motifs ERLE, KRRRK and RRDRW and the mutants KAAAK, AADAA, AALA and ERDE]
     Lee, J Virol 2006; Terribilini, RNA 2006

  16. Summary
     [Figure: HIV-1 Rev and EIAV Rev, with the KRRRK and RRDRW motifs marked]
     The predicted protein- and RNA-binding sites in the Rev proteins of HIV-1 and EIAV agree with the available experimental data.

  17. Telomerase Reverse Transcriptase (TERT)
     Functions:
     • “Cap” the ends of chromosomes to prevent recombination, end-to-end fusion and degradation
     • Allow complete replication of chromosomes
     Interactions:
     • Protein-DNA: binds linear chromosome ends (and extends them)
     • Protein-RNA: telomerase contains an essential RNA component (TR), which is bound by the TERT subunit
     • Protein-protein: dyskerin is a component of the active human telomerase complex; there are many other interacting proteins, e.g., PPI1, RAP1, TEP1, HSP90
     Lingner (1997) Science 276:561-567; adapted from P. J. Mason

  18. Human TERT: preliminary docking of 3 modeled domains
     [Figure: preliminary model (lacking the TEN domain)]
     Credits: Kurcinski, Kolinski, Kloczkowski

  19. Predicted vs Actual RNA-Binding Residues in Human TRBD
     [Figure panels: Predicted; Actual]

  20. Current & future work
     Progress towards our goals:
     √ Model human TERT domains
     √ Dock domains to generate a complete model of the TERT protein
     Underway:
     • Generate a working model of the TERT-TR complex
     • Predict the TR RNA tertiary structure, then dock it with the protein
     Future:
     • Experimentally interrogate the protein-RNA interfaces suggested by this work
     • Investigate these interfaces as potential therapeutic targets

  21. Conclusions
     • A combined classifier that uses the query sequence plus information derived from the known structure and from a PSSM generated from PSI-BLAST sequence homologs (trained and tested on RB181, a dataset of diverse protein-RNA interfaces) predicts interface residues with ~86% overall accuracy (CC = 0.43).
     • Combining structure prediction with machine learning could provide valuable insights into the structure and function of important large RNP complexes, especially those for which high-resolution experimental structural information is not yet available.
     • Computational methods can provide insight into protein-RNA interfaces, even for "recalcitrant" proteins whose structures are not yet available.
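
The slides do not say how the sequence-, structure- and PSSM-based predictions are merged into the combined classifier. As one simple possibility, and purely as an assumption, a per-residue majority vote over the three 0/1 prediction lists is sketched below; the function name is hypothetical.

```python
def majority_vote(*prediction_lists):
    """Combine per-residue 0/1 predictions from several classifiers by majority vote.

    With three voters (e.g. sequence-, structure- and PSSM-based) there are no ties.
    A residue is called interface (1) only if more than half the voters say so.
    """
    n = len(prediction_lists)
    length = len(prediction_lists[0])
    if any(len(p) != length for p in prediction_lists):
        raise ValueError("all prediction lists must cover the same residues")
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*prediction_lists)]
```

For example, combining [1, 0, 1, 0], [1, 1, 0, 0] and [0, 1, 1, 0] yields [1, 1, 1, 0]. A weighted vote or a second-stage classifier trained on the three outputs would be natural alternatives.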

  22. Acknowledgements
     Dobbs Lab @ Iowa State University (http://ddobbs.public.iastate.edu/): Drena Dobbs, BCB & GDCB; Michael Terribilini, Jeffry Sander, Peter Zaback, Deepak Reyon, Ben Lewis
     Honavar Lab @ Iowa State University (http://www.cs.iastate.edu/~honavar/aigroup.htm): Vasant Honavar, BCB & Computer Science; Cornelia Caragea
     Kolinski Lab @ University of Warsaw (http://biocomp.chem.uw.edu.pl/): Andrzej Kolinski, Chemistry; Mateusz Kurcinski
     @ Iowa State University: Andrzej Kloczkowski, BBMB; Robert Jernigan, BBMB; Kai-Ming Ho, Physics
     @ Washington State University: Susan Carpenter, Vet Micro & Patho
     @ UCLA: Yungok Ihm, Biochemistry
     Supported by: NSF IGERT Computational Molecular Biology; USDA MGET Animal Genomics; Iowa State University: Bioinformatics & Computational Biology Program (BCB), LH Baker Center for Bioinformatics & Biological Statistics, Center for Integrated Animal Genomics (CIAG), Center for Computational Intelligence, Learning & Discovery (CILD)
