Computational Analysis of Protein-DNA Interactions


Presentation Transcript


  1. Computational Analysis of Protein-DNA Interactions • Changhui (Charles) Yan, Department of Computer Science, Utah State University

  2. Problem I: Identifying amino acid residues involved in protein-DNA interactions from sequence

  3. Materials And Methods • 56 double-stranded DNA-binding proteins previously used in the study of Jones et al. (2003) • Encoding

  4. Materials And Methods

  5. Naïve Bayes Classifier • Leave-one-out cross-validation • Naïve Bayes

  6. Naïve Bayes Classifier • Leave-one-out cross-validation • Naïve Bayes
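The slides do not give the window size, the feature encoding, or whether leave-one-out holds out one residue or one whole protein. The sketch below therefore assumes a symmetric window of residue identities around each target residue (`half_width` is a hypothetical parameter), a categorical naïve Bayes model with add-one smoothing, and protein-level leave-one-out. The names `windows`, `WindowNaiveBayes` and `leave_one_protein_out` are illustrative, not taken from the original work.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for window positions beyond the chain ends


def windows(seq, half_width):
    """For each residue, return the tuple of residues in a window centred on it."""
    padded = PAD * half_width + seq + PAD * half_width
    size = 2 * half_width + 1
    return [tuple(padded[i:i + size]) for i in range(len(seq))]


class WindowNaiveBayes:
    """Categorical naive Bayes over window positions, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, examples):
        # examples: list of (window_tuple, label) with label 1 = binding, 0 = non-binding
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (label, position, residue) -> count
        for window, label in examples:
            self.class_counts[label] += 1
            for pos, res in enumerate(window):
                self.feature_counts[(label, pos, res)] += 1
        self.total = sum(self.class_counts.values())
        self.n_symbols = len(AMINO_ACIDS) + 1  # 20 residues plus the padding symbol
        return self

    def log_score(self, window, label):
        # log P(label) + sum over positions of log P(residue at position | label)
        n_c = self.class_counts.get(label, 0)
        score = math.log((n_c + self.alpha) / (self.total + 2 * self.alpha))
        for pos, res in enumerate(window):
            count = self.feature_counts.get((label, pos, res), 0)
            score += math.log((count + self.alpha) / (n_c + self.alpha * self.n_symbols))
        return score

    def predict(self, window):
        return int(self.log_score(window, 1) > self.log_score(window, 0))


def leave_one_protein_out(proteins, half_width=2):
    """proteins: list of (sequence, labels), labels[i] = 1 if residue i contacts DNA.
    Each protein is predicted by a model trained on all the other proteins."""
    predictions = []
    for held_out, (seq, _) in enumerate(proteins):
        train = [
            (window, label)
            for k, (other_seq, labels) in enumerate(proteins) if k != held_out
            for window, label in zip(windows(other_seq, half_width), labels)
        ]
        model = WindowNaiveBayes().fit(train)
        predictions.append([model.predict(w) for w in windows(seq, half_width)])
    return predictions


if __name__ == "__main__":
    # Hypothetical toy data for illustration only
    toy = [
        ("KRAKRLE", [1, 1, 0, 1, 1, 0, 0]),
        ("GKRSLEA", [0, 1, 1, 0, 0, 0, 0]),
        ("AKRKEGL", [0, 1, 1, 1, 0, 0, 0]),
    ]
    print(leave_one_protein_out(toy, half_width=1))
```

Scores are summed in log space to avoid underflow, and the smoothing keeps a residue never seen at some window position during training from driving a class probability to zero.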

  7. Leave-One-Out Cross-Validations

  8. Predictions in The Context of 3-D Structures • Pit-1, PDB 1au7 • TP: 30, FP: 16, TN: 86, FN: 14 • CC: 0.51 (2nd) • Accuracy: 79% • [Figure panels: Actual, Predicted]
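The per-protein counts can be turned into the reported figures directly. The sketch below assumes that CC denotes the Matthews correlation coefficient and, for the specificity+ and sensitivity+ figures quoted in the summary, the conventions TP/(TP+FP) and TP/(TP+FN) respectively; the slides do not define these, so treat the formulas as assumptions. With the Pit-1 counts, accuracy comes out to 116/146, about 0.79, consistent with the slide's 79%. The function name `confusion_metrics` is illustrative.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Residue-level metrics from confusion-matrix counts.

    Assumptions: CC is the Matthews correlation coefficient,
    specificity+ = TP / (TP + FP) and sensitivity+ = TP / (TP + FN).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity_plus = tp / (tp + fp) if tp + fp else 0.0
    sensitivity_plus = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "specificity+": specificity_plus,
        "sensitivity+": sensitivity_plus,
        "cc": cc,
    }


# Counts reported for Pit-1 (PDB 1au7)
pit1 = confusion_metrics(tp=30, fp=16, tn=86, fn=14)
print(f"accuracy = {pit1['accuracy']:.2f}")  # accuracy = 0.79
```

Returning 0.0 when a denominator vanishes is a convenience choice for degenerate proteins (for example, no predicted positives), not part of the metric definitions.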

  9. Predictions in The Context of 3-D Structures • λ-Cro, PDB 6cro • TP: 10, FP: 5, TN: 34, FN: 10 • CC: 0.37 (19th) • Accuracy: 73% • [Figure panels: Predicted, Actual]

  10. Predictions Compared With PROSITE Motifs • Predicted binding sites substantially overlap with 34 of the 37 “DNA-binding” PROSITE motifs • In 52 of the 56 proteins, the predictor identifies at least 20% of the DNA-binding residues • 28 of the 56 proteins contain no PROSITE motifs that are annotated as “DNA-binding”
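Both PROSITE criteria on this slide amount to set comparisons between residue positions. In the sketch below, a protein counts toward the "at least 20% of the DNA-binding residues" figure when the fraction of its actual binding residues that are also predicted reaches 0.2. The slide does not say what "substantially overlap" means, so `min_fraction` in `motif_overlaps` is a hypothetical cutoff, and all function names are illustrative.

```python
def fraction_covered(reference, predicted):
    """Fraction of positions in `reference` that also appear in `predicted`."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(predicted)) / len(reference)


def count_proteins_meeting(proteins, threshold=0.2):
    """proteins: dict of name -> (actual binding positions, predicted binding positions).
    Counts proteins in which at least `threshold` of the actual binding residues are predicted."""
    return sum(
        1 for actual, predicted in proteins.values()
        if fraction_covered(actual, predicted) >= threshold
    )


def motif_overlaps(motif_span, predicted, min_fraction=0.5):
    """True if at least `min_fraction` of the motif's residues are predicted as binding.
    motif_span is an inclusive (start, end) pair of residue positions;
    min_fraction is a hypothetical cutoff, not a value from the slides."""
    start, end = motif_span
    return fraction_covered(range(start, end + 1), predicted) >= min_fraction


if __name__ == "__main__":
    # Hypothetical positions for two proteins, for illustration only
    proteins = {"protA": ({1, 2, 3, 4, 5}, {1, 9}), "protB": ({10, 11}, {12})}
    print(count_proteins_meeting(proteins))  # 1
    print(motif_overlaps((3, 6), {3, 4, 8}))  # True
```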

  11. Comparison With a Previous Study • *Ahmad, S. and Sarai, A. (2005) PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 6, 33.

  12. Summary • A simple sequence-based Naïve Bayes classifier predicts interface residues in DNA-binding proteins with 75% accuracy, 37% specificity+, 53% sensitivity+ and a correlation coefficient of 0.29 • Predicted binding sites correctly indicate the locations of actual binding sites and substantially overlap with known PROSITE motifs

  13. Problem II: Identification of Helix-Turn-Helix (HTH) DNA-binding motifs

  14. HTH Motifs • Sequences with low similarity can fold into similar HTH structures • Identifying HTH motifs from sequence is therefore extremely challenging

  15. Trick 1 • Including more information • Amino acid sequence • Secondary structure

  16. Hidden Markov Model (HMM) LQQITHIANQL-GLE----KDVVRVWF

  17. Hidden Markov Model (HMM_AA_SS) LQQITHIANQL-GLE----KDVVRVWF HHHEEHEEEHMHE----HHEEMMEH
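The idea on these slides is that each aligned position carries two symbols, a residue and a secondary-structure state. One simple way to score such paired sequences (an assumption; the slides do not describe the model's internals) is an HMM whose states emit the pair, with the two symbols treated as conditionally independent given the state. The forward algorithm below computes the log-likelihood of a paired sequence. All parameters in the usage example are hypothetical, and every probability must be positive because the code takes logarithms.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(aa_seq, ss_seq, states, start, trans, emit_aa, emit_ss):
    """Log P(aa_seq, ss_seq) for an HMM whose states emit (amino acid, SS) pairs.

    Assumes the two symbols are independent given the state, so the pair
    emission probability is emit_aa[s][aa] * emit_ss[s][ss].
    """
    if len(aa_seq) != len(ss_seq):
        raise ValueError("sequence and secondary-structure strings must align")

    def log_emit(s, aa, ss):
        return math.log(emit_aa[s][aa]) + math.log(emit_ss[s][ss])

    # alpha[s] = log P(first t symbol pairs, state at t = s)
    alpha = {s: math.log(start[s]) + log_emit(s, aa_seq[0], ss_seq[0]) for s in states}
    for aa, ss in zip(aa_seq[1:], ss_seq[1:]):
        alpha = {
            s: log_sum_exp([alpha[r] + math.log(trans[r][s]) for r in states])
            + log_emit(s, aa, ss)
            for s in states
        }
    return log_sum_exp(list(alpha.values()))


if __name__ == "__main__":
    # Hypothetical two-state model over a toy alphabet, for illustration only
    states = ["helix", "other"]
    start = {"helix": 0.5, "other": 0.5}
    trans = {"helix": {"helix": 0.8, "other": 0.2}, "other": {"helix": 0.3, "other": 0.7}}
    emit_aa = {"helix": {"L": 0.6, "G": 0.4}, "other": {"L": 0.2, "G": 0.8}}
    emit_ss = {"helix": {"H": 0.9, "E": 0.1}, "other": {"H": 0.2, "E": 0.8}}
    print(forward_log_likelihood("LLG", "HHE", states, start, trans, emit_aa, emit_ss))
```

A profile HMM adds match, insert and delete states per alignment column, but the pairing of emissions works the same way in each emitting state.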

  18. Trick 2 • There are similarities among the 20 naturally occurring amino acids • Reduced alphabets

  19. Reduced Alphabets • Schemes for reducing the amino acid alphabet based on the BLOSUM50 matrix (Henikoff and Henikoff 1992), derived by grouping and averaging the similarity matrix elements as described in the text (Murphy et al. 2000)
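A reduced alphabet is applied by mapping every residue to a representative of its group before training or scoring. The five groups below are illustrative, in the style of the BLOSUM50-derived schemes of Murphy et al. (2000); check their published table before relying on a specific grouping. `GROUPS`, `REDUCE` and `reduce_sequence` are hypothetical names. Gap characters pass through unchanged so aligned sequences keep their columns.

```python
# Illustrative five-group reduction (hypothetical here; verify against Murphy et al. 2000)
GROUPS = ["LVIMC", "AGSTP", "FYW", "EDNQ", "KRH"]

# Map each of the 20 amino acids to the first letter of its group
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq, mapping=REDUCE, gap="-"):
    """Replace each residue by its group representative; keep alignment gaps."""
    return "".join(ch if ch == gap else mapping[ch] for ch in seq.upper())


if __name__ == "__main__":
    # Aligned sequence from the HMM slide
    print(reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF"))  # LEELAKLAEEL-ALE----KELLKLFF
```

Fewer distinct symbols means fewer emission parameters per state, which is the usual motivation for reduced alphabets when training data are scarce.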

  20. Cross-Family Evaluations • True positive: HTH motifs that are correctly identified as such • False positive: non-HTH motifs that are identified as HTH motifs • The alphabet used to encode amino acid sequences
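Cross-family evaluation is commonly understood as testing on a family whose members were withheld from training; the slide does not spell out the protocol, so the leave-one-family-out split below is an assumption. The tally follows the slide's definitions of true and false positives. `leave_one_family_out` and `count_tp_fp` are hypothetical names.

```python
from collections import defaultdict


def leave_one_family_out(motifs):
    """Yield (held_out_family, train, test) splits.

    motifs: list of (family, item) pairs. Each split tests on one family and
    trains on all the others, so no family contributes to both sides.
    """
    by_family = defaultdict(list)
    for family, item in motifs:
        by_family[family].append(item)
    for held_out in sorted(by_family):
        train = [item for fam, items in by_family.items() if fam != held_out for item in items]
        yield held_out, train, list(by_family[held_out])


def count_tp_fp(predicted_hth, is_hth):
    """TP: HTH motifs predicted as HTH. FP: non-HTH segments predicted as HTH."""
    tp = sum(1 for p, y in zip(predicted_hth, is_hth) if p and y)
    fp = sum(1 for p, y in zip(predicted_hth, is_hth) if p and not y)
    return tp, fp
```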

  21. Questions

  22. Within-Family Three-Fold Cross-Validations
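For the within-family setting, one plausible reading (an assumption, since the slide gives only its title) is that each family's members are spread across three folds so that every family appears in both training and test data. The sketch shuffles each family's members with a fixed seed and deals them round-robin into `k` folds; `within_family_folds` and `cross_validation_splits` are hypothetical names.

```python
import random
from collections import defaultdict


def within_family_folds(motifs, k=3, seed=0):
    """Split (family, item) pairs into k folds, dealing each family's members round-robin.

    A family with at least k members is represented in every fold.
    """
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for family, item in motifs:
        by_family[family].append(item)
    folds = [[] for _ in range(k)]
    for family in sorted(by_family):
        members = by_family[family][:]
        rng.shuffle(members)
        for i, item in enumerate(members):
            folds[i % k].append((family, item))
    return folds


def cross_validation_splits(folds):
    """Yield (train, test) pairs, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        yield train, test
```

Because every family starts dealing at fold 0, small families leave the first fold slightly larger; an offset per family would balance fold sizes if that matters.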

  23. Comparisons of HMM_AA_SS with FFAS03 in Cross-Family Evaluations

  24. Putative HTH motifs in Ureaplasma parvum
