
Laboratory of Computational Biology eead.csic.es/compbio

The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs. Álvaro Sebastian Yagüe asebastian@eead.csic.es. Laboratory of Computational Biology http://www.eead.csic.es/compbio Estación Experimental de Aula Dei



Presentation Transcript


  1. The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs. Álvaro Sebastian Yagüe (asebastian@eead.csic.es), Laboratory of Computational Biology (http://www.eead.csic.es/compbio), Estación Experimental de Aula Dei, CSIC, Zaragoza, Spain. February 2, 2010, V National Conference BIFI 2011.

  2. Content index • DNA recognition and binding • 3D footprinting • footprintDB database • alignment of DNA motifs • alignment of protein interfaces

  3. DNA recognition and binding

  4. DNA-binding proteins. DNA-binding proteins contain DNA-binding domains and therefore have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, some DNA-binding ligands are known to bind the minor (narrow) groove. [Figure: lac repressor, residues Tyr 7, Tyr 12 and Tyr 17.] Jones CE, Olson OM: Sequence-specific DNA-protein interaction: the lac repressor. J Theor Biol 64:323-332, 1977.

  5. DNA-binding proteins. DNA-binding proteins contain DNA-binding domains and therefore have a specific or general affinity for single- or double-stranded DNA. [Figure: lac repressor, residues Tyr 7, Tyr 12 and Tyr 17.] Lewis M, Chang G, Horton NC, Kercher MA, Pace HC, Schumacher MA, Brennan RG, Lu P: Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science 271:1247-1254, 1996.


  7. DNA-binding proteins. DNA-binding proteins contain DNA-binding domains and therefore have a specific or general affinity for single- or double-stranded DNA. [Figure: residues Tyr 7, Tyr 12 and Tyr 17.]

  8. 3D footprinting

  9. Methods for studying protein-DNA interactions. Helwa R, Hoheisel JD: Analysis of DNA-protein interactions: from nitrocellulose filter binding assays to microarray studies. Anal Bioanal Chem 398:2551-2561.

  10. 3D Footprinting. 3D footprinting is a computational technique developed in our lab that annotates DNA-binding interfaces by analyzing published 3D structures from the PDB. [Figure: 3D-footprint calculated interface for 1D5Y.] Interface residues for 1d5y_ATF: 32, 34, 35, 37, 38. http://floresta.eead.csic.es/3dfootprint/
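To illustrate the kind of geometric analysis involved, here is a minimal sketch that flags protein residues with at least one atom within a distance cutoff of any DNA atom. The 4.5 Å cutoff, the `Atom` tuple layout, the function name and the toy coordinates are all assumptions for illustration; 3D-footprint's actual contact criteria are not given on this slide.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical atom record: (chain, residue number, residue name, is_dna, x, y, z).
# A real pipeline would read these from a PDB/mmCIF file.
Atom = Tuple[str, int, str, bool, float, float, float]


def interface_residues(atoms: List[Atom], cutoff: float = 4.5) -> Dict[str, List[int]]:
    """Per protein chain, the sorted numbers of residues having at least one
    atom within `cutoff` angstroms of a DNA atom (assumed contact criterion)."""
    dna_xyz = [(x, y, z) for _c, _r, _n, is_dna, x, y, z in atoms if is_dna]
    contacts: Dict[str, Set[int]] = {}
    for chain, resnum, _name, is_dna, x, y, z in atoms:
        if is_dna:
            continue  # only protein atoms can define protein interface residues
        if any(math.dist((x, y, z), d) <= cutoff for d in dna_xyz):
            contacts.setdefault(chain, set()).add(resnum)
    return {chain: sorted(res) for chain, res in contacts.items()}


# Toy coordinates, invented for illustration only.
example_atoms: List[Atom] = [
    ("A", 32, "ARG", False, 0.0, 0.0, 0.0),
    ("A", 33, "GLY", False, 10.0, 0.0, 0.0),
    ("A", 34, "ASN", False, 1.0, 1.0, 0.0),
    ("B", 5, "DG", True, 0.0, 3.0, 0.0),
]
print(interface_residues(example_atoms))  # {'A': [32, 34]}
```

A plain all-against-all distance check is fine for a single complex; for large structures a spatial index would be the usual choice.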

  11. footprintDB

  12. footprintDB. We have designed, implemented and curated a database of more than 3000 unique DNA-binding proteins (mostly transcription factors, TFs) and 4000 position weight matrices (PWMs) extracted from the literature and other repositories. The DNA-binding interface residues of each TF sequence in footprintDB are annotated by aligning the sequence with 3D-footprint templates.
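Transferring an interface annotation from a template to a new sequence amounts to reading off which query residues sit in the alignment columns of the template's interface positions. The sketch below assumes a precomputed pairwise alignment ('-' for gaps); the function name, its parameters and the toy sequences are hypothetical, and the alignment tool footprintDB actually uses is not specified here.

```python
from typing import Dict, List


def transfer_interface(template_aln: str, query_aln: str,
                       template_positions: List[int],
                       template_start: int = 1) -> str:
    """Return the query residues aligned to the template's interface positions.

    template_aln, query_aln: the two rows of a pairwise alignment ('-' = gap).
    template_positions: interface residue numbers in template numbering.
    template_start: residue number of the first template residue in the alignment.
    A '-' in the result marks an interface position aligned to a query gap.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("alignment rows must have the same length")
    wanted = set(template_positions)
    found: Dict[int, str] = {}
    resnum = template_start - 1  # number of the last template residue seen
    for t, q in zip(template_aln, query_aln):
        if t == "-":
            continue  # query insertion: no template residue in this column
        resnum += 1
        if resnum in wanted:
            found[resnum] = q  # may itself be '-' (deletion in the query)
    return "".join(found.get(p, "-") for p in template_positions)


print(transfer_interface("AC-DEF", "ACGDKF", [2, 4]))  # CK
```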

  13. footprintDB

  14. footprintDB • footprintDB predicts: • transcription factors that bind a specific DNA site or motif • DNA motifs likely to be recognized by a specific DNA-binding protein

  15. http://floresta.eead.csic.es/footprintdb/

  16. alignment of protein interfaces

  17. Alignment of protein interfaces. The rationale behind footprintDB is the observation that proteins which recognize a similar DNA motif most often have a similar set of residues at the interface.
     DNA motif ~ TF interface
     yCAATTAws ~ RKRTQNTK
     -yaATTAam ~ RRRIQNTK
     -yAATTArg ~ RRRIQNAK
     -TAATTArc ~ RRRIQNAK
     -tmATTAAs ~ KRRIQNMK

  18. Alignment of protein interfaces. Noyes et al. have recently shown that homeodomain binding specificities depend on the interface residues involved in DNA motif recognition. Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH, Wolfe SA: Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133:1277-1289, 2008.

  19. Alignment of protein interfaces.
     Unknown homeodomain protein; homeodomain interface residues: RRRIQNAK
     Interface alignment with footprintDB annotated interfaces:
     yCAATTAws ~ RKRTQNTK
     -yaATTAam ~ RRRIQNTK
     -TAATTArc ~ RRRIQNAK
     -tmATTAAs ~ KRRIQNMK
     Predicted DNA-binding motif: TAATTArc
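The prediction on this slide can be sketched as a nearest-neighbour lookup: score the unknown interface against each annotated interface and return the motif of the best match. Plain percent identity is an assumed scoring rule here (footprintDB may well use a substitution matrix), and the leading '-' alignment offsets are dropped from the motifs.

```python
from typing import Dict, Tuple

# Annotated interfaces and motifs as listed on the slide (offset '-' removed).
ANNOTATED: Dict[str, str] = {
    "RKRTQNTK": "yCAATTAws",
    "RRRIQNTK": "yaATTAam",
    "RRRIQNAK": "TAATTArc",
    "KRRIQNMK": "tmATTAAs",
}


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length interfaces."""
    if len(a) != len(b):
        raise ValueError("interfaces must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_motif(query: str, annotated: Dict[str, str]) -> Tuple[str, float]:
    """Motif of the most similar annotated interface, and its identity score."""
    best = max(annotated, key=lambda iface: identity(query, iface))
    return annotated[best], identity(query, best)


motif, score = predict_motif("RRRIQNAK", ANNOTATED)
print(motif, score)  # TAATTArc 1.0
```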

  20. Alignment of protein interfaces. Scoring aligned protein interfaces predicts which DNA motif an unknown DNA-binding protein binds more accurately than other scoring methods, such as local alignment. [Figure: ROC curves for homeodomains and bZIPs.] The ROC curves show that interface alignments improve DNA motif predictions in comparison with BLAST scores.
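One way to summarise a ROC comparison such as interface alignment versus BLAST in a single number is the area under the curve, which equals the probability that a true pair scores above a false pair. This is a generic sketch, not the analysis behind the slide; the score lists are invented toy values.

```python
from typing import Sequence


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count one half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented scores for correct (pos) and incorrect (neg) protein-motif pairs.
print(roc_auc([0.9, 0.8, 0.7], [0.6, 0.5, 0.75]))  # 8/9, about 0.889
```

Computing the AUC for each scoring method on the same positive and negative pairs makes the two curves directly comparable.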

  21. alignment of DNA motifs

  22. DNA motif alignment issues
     • Three alignment combinations, because either motif can be reverse-complemented: ATC / GTT ; ATC / AAC ; GAT / GTT
       • this takes longer to calculate and yields a higher false-positive rate than a single pairwise alignment
     • Different motif sizes: TgAGt / ackrTGACGTCAycra
       • not a big issue if the score is divided by the number of aligned nucleotides
     • Small motifs are prone to false high-scoring alignments because of the small nucleotide alphabet: AGt / CGT
       • high similarity thresholds are required, particularly for individual zinc fingers, which usually recognize 3 nt
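The three combinations above come from reverse-complementing one motif or the other; reverse-complementing both just reads the original pair on the opposite strand, so it adds nothing. A minimal sketch, assuming motifs written as IUPAC consensus strings (the function names are illustrative):

```python
from typing import List, Tuple

# IUPAC complements for both cases: R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                           "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA consensus sequence (case is preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def orientation_pairs(a: str, b: str) -> List[Tuple[str, str]]:
    """The strand combinations worth aligning for motifs a and b."""
    return [(a, b), (a, revcomp(b)), (revcomp(a), b)]


print(orientation_pairs("ATC", "GTT"))
# [('ATC', 'GTT'), ('ATC', 'AAC'), ('GAT', 'GTT')]
```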

  23. DNA motif alignment issues
     • Complex motifs (multimeric proteins): ackrTGACGTCAycra / rTGACwmAGCA
       • they are hard to align, and heteromultimers might bind different sites
     • A single motif for TFs with multiple DNA-binding domains
       • it might not be possible to know which domain binds each submotif
     • TFs with different annotated motifs
       • a result of different oligomeric conformations or experimental approaches
     • Motifs with very low information content: akaTTrchhaAhcw
       • might be genuine or the result of low-resolution experiments; a source of false-positive hits
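Low information content can be quantified per position as 2 bits minus the Shannon entropy of the column. The sketch assumes a uniform 0.25 background and no pseudocounts; it uses the count matrix shown on slide 27 purely as an example input.

```python
import math
from typing import List, Sequence

# Count matrix from slide 27 (positions 01-09; columns A, C, G, T).
MOTIF = [
    [2, 0, 4, 0], [1, 0, 4, 1], [0, 6, 0, 0], [2, 0, 0, 4], [0, 0, 0, 6],
    [0, 6, 0, 0], [0, 6, 0, 0], [3, 0, 0, 3], [1, 4, 1, 0],
]


def information_content(counts: Sequence[Sequence[float]]) -> List[float]:
    """Per-position information content in bits (uniform background assumed)."""
    ic = []
    for row in counts:
        total = sum(row)
        entropy = -sum((c / total) * math.log2(c / total) for c in row if c > 0)
        ic.append(2.0 - entropy)
    return ic


ic = information_content(MOTIF)
print([round(x, 2) for x in ic], round(sum(ic), 2))
```

Fully conserved columns score 2 bits and a W (A/T) column scores 1 bit; a motif whose positions mostly sit near 0 bits is the kind flagged above as a likely source of false positives.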

  24. Alignment of DNA motifs. Some families of transcription factors and their peculiarities:

  25. Alignment of DNA motifs. Motifs are aligned with an ungapped Smith-Waterman algorithm, and motif similarity is calculated as the sum of the Pearson correlation coefficients of the aligned motif positions. Example, GAC vs GCC: similarity = (1 + 0 + 1) / 3 = 2/3 = 0.67

  26. Alignment of DNA motifs. Motifs are aligned with an ungapped Smith-Waterman algorithm, and motif similarity is calculated as the sum of the Pearson correlation coefficients of the aligned motif positions.
     Motif 1 (consensus GCC)
     P0  A  C  G  T
     01  0  0  6  0  G
     02  1  4  0  1  C
     03  0  4  0  2  C
     Motif 2 (consensus GAC)
     P0  A  C  G  T
     01  0  0  3  1  G
     02  3  1  0  0  A
     03  0  4  0  0  C
     GCC vs GAC: Simil = r1 + r2 + r3 = 0.94 + 0.14 + 0.87 = 1.95, where ri is the Pearson correlation coefficient between the counts at aligned position i.
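The worked example can be checked with a short sketch: per-column Pearson correlation as the position score, and every diagonal (offset) between the two motifs scanned with Kadane's maximum-subarray rule, which is what Smith-Waterman reduces to when gaps are forbidden. Treating a flat column as correlation 0 and using raw PCC without any offset or penalty are assumptions; footprintDB's exact implementation may differ.

```python
import math
from typing import List, Sequence, Tuple

Column = Sequence[float]  # counts or frequencies for A, C, G, T


def pearson(x: Column, y: Column) -> float:
    """Pearson correlation of two motif columns (0.0 if a column is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ungapped_local_alignment(a: List[Column], b: List[Column]) -> Tuple[float, int, int, int]:
    """Best ungapped local alignment: (summed PCC, start in a, start in b, length)."""
    best = (0.0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        i0, j0 = max(offset, 0), max(-offset, 0)
        run, run_start, k = 0.0, 0, 0
        while i0 + k < len(a) and j0 + k < len(b):
            s = pearson(a[i0 + k], b[j0 + k])
            if run <= 0:
                run, run_start = s, k  # start a new local segment here
            else:
                run += s  # extend the current segment
            if run > best[0]:
                best = (run, i0 + run_start, j0 + run_start, k - run_start + 1)
            k += 1
    return best


gcc = [[0, 0, 6, 0], [1, 4, 0, 1], [0, 4, 0, 2]]  # motif 1, consensus GCC
gac = [[0, 0, 3, 1], [3, 1, 0, 0], [0, 4, 0, 0]]  # motif 2, consensus GAC
print([round(pearson(x, y), 2) for x, y in zip(gcc, gac)])  # [0.94, 0.14, 0.87]
score, i, j, length = ungapped_local_alignment(gcc, gac)
print(round(score, 2), length, round(score / length, 2))  # 1.95 3 0.65
```

The last value divides the summed correlation by the number of aligned positions, as suggested on slide 22 for comparing motifs of different sizes.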

  27. Alignment of DNA motifs. 4900 individual TRANSFAC DNA sites were aligned with their corresponding DNA motifs (PWMs), yielding a mean similarity of 0.70.
     P0  A  C  G  T
     01  2  0  4  0  G
     02  1  0  4  1  G
     03  0  6  0  0  C
     04  2  0  0  4  T
     05  0  0  0  6  T
     06  0  6  0  0  C
     07  0  6  0  0  C
     08  3  0  0  3  W
     09  1  4  1  0  C
     Sites: AGCTTCCTC, GGCATCCAG, GTCTTCCTA, AGCTTCCAC, GGCATCCAC, GACTTCCTC
     Half of the DNA sites share <0.70 similarity with their motif: DNA motifs have a large variability.
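A single site can be compared with a motif by turning it into a one-count-per-position matrix and averaging the per-position Pearson correlations. This sketch assumes the site is already aligned and of equal length; encoding sites this way is an assumption about how the TRANSFAC comparison was done, and the output is for this one site only, not the reported mean.

```python
import math
from typing import List, Sequence

BASES = "ACGT"

# Count matrix from the slide (positions 01-09; columns A, C, G, T).
MOTIF = [
    [2, 0, 4, 0], [1, 0, 4, 1], [0, 6, 0, 0], [2, 0, 0, 4], [0, 0, 0, 6],
    [0, 6, 0, 0], [0, 6, 0, 0], [3, 0, 0, 3], [1, 4, 1, 0],
]


def one_hot(site: str) -> List[List[int]]:
    """DNA site as a count matrix with a single count per position."""
    return [[1 if base == b else 0 for b in BASES] for base in site.upper()]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two columns (0.0 if a column is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def site_similarity(site: str, motif: Sequence[Sequence[float]]) -> float:
    """Mean per-position PCC between a pre-aligned site and a motif."""
    if len(site) != len(motif):
        raise ValueError("this sketch assumes a pre-aligned site of equal length")
    return sum(pearson(c, m) for c, m in zip(one_hot(site), motif)) / len(motif)


print(round(site_similarity("AGCTTCCTC", MOTIF), 2))  # 0.84
```

Positions where the site carries the minority base (for example the A at position 01) pull the score down, which is one source of the variability noted on the slide.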

  28. Alignment of DNA motifs. 4900 individual TRANSFAC DNA sites were aligned against random footprintDB motifs, yielding a mean similarity of 0.47. [Figure: site AGCTTCCTC aligned against an unknown motif.] Individual DNA sites and motifs can reach moderate similarities by chance.

  29. Alignment of DNA motifs. Which motif similarity threshold should we use to decide whether a DNA site matches a motif? 0.47 < threshold < 0.70 [Figure: site AGCTTCCTC aligned against the slide 27 motif.]

  30. Alignment of DNA motifs. Drawing a ROC curve by interpolating the TPR and FPR of the TRANSFAC alignments, we find that motif similarity thresholds between 0.60 and 0.55 give a sensitivity (TPR) of 0.71-0.80 and a specificity (1-FPR) of 0.88-0.74. [Figure: ROC curve, similarity 0.55-0.60.]
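Reading sensitivity and specificity off a threshold is a simple count: similarities of sites to their own motifs are the positives, and similarities to random motifs are the negatives. The sketch evaluates the curve only at the requested thresholds (no interpolation), and the score lists are invented toy values, not the TRANSFAC data.

```python
from typing import List, Sequence, Tuple


def roc_points(pos: Sequence[float], neg: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """For each threshold t, call a pair a match when similarity >= t and
    return (t, TPR, specificity = 1 - FPR)."""
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        points.append((t, tpr, 1.0 - fpr))
    return points


# Invented toy similarities: sites vs own motif, and sites vs random motifs.
true_pairs = [0.90, 0.75, 0.62, 0.58, 0.50]
random_pairs = [0.65, 0.56, 0.45, 0.40, 0.30]
for t, tpr, spec in roc_points(true_pairs, random_pairs, [0.60, 0.55]):
    print(f"threshold {t:.2f}: TPR {tpr:.2f}, specificity {spec:.2f}")
```

As on the slide, lowering the threshold from 0.60 to 0.55 trades specificity for sensitivity.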

  31. Thanks for your attention

  32. Laboratory of Computational Biology, Estación Experimental de Aula Dei / CSIC, Av. Montañana 1.005, 50059 Zaragoza (Spain). Tel.: +34 976716089. Web: http://www.eead.csic.es/compbio/

  33. Questions?
