This study explores the relationship between amino-acid substitutions at the DNA-binding interfaces of transcription factors and the DNA motifs they recognize. It focuses on how the interface residues of DNA-binding proteins determine motif recognition, and it uses computational techniques such as 3D footprinting and protein interface alignment, together with footprintDB, a database of transcription factors and DNA motifs. Aligning protein interfaces is proposed as a way to predict more accurately which DNA motifs an uncharacterized protein binds.
The relation between amino-acid substitutions in the interface of transcription factors and their recognized DNA motifs
Álvaro Sebastian Yagüe, asebastian@eead.csic.es
Laboratory of Computational Biology, http://www.eead.csic.es/compbio
Estación Experimental de Aula Dei, CSIC, Zaragoza, Spain
February 2, 2010 - V National Conference BIFI 2011
Content index
• DNA recognition and binding
• 3D footprinting
• footprintDB database
• alignment of DNA motifs
• alignment of protein interfaces
DNA-binding proteins
DNA-binding proteins are composed of DNA-binding domains and therefore have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, some DNA-binding ligands are known to bind the minor (narrow) groove.
Figure: lac repressor (labeled residues Tyr 7, Tyr 12, Tyr 17)
Jones CE, Olson OM: Sequence-specific DNA-protein interaction: the lac repressor. J Theor Biol 64:323-332, 1977.
Figure: lac repressor (labeled residues Tyr 7, Tyr 12, Tyr 17)
Lewis M, Chang G, Horton NC, Kercher MA, Pace HC, Schumacher MA, Brennan RG, Lu P: Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science 271:1247-1254, 1996.
Methods for studying protein-DNA interactions
Helwa R, Hoheisel JD: Analysis of DNA-protein interactions: from nitrocellulose filter binding assays to microarray studies. Anal Bioanal Chem 398:2551-2561.
3D Footprinting
3D footprinting is a computational technique developed in our lab that annotates DNA-binding interfaces by analyzing published 3D structures from the PDB.
Figure: interface calculated by 3D-footprint for PDB entry 1D5Y
Interface residues for 1d5y_ATF: 32, 34, 35, 37, 38
http://floresta.eead.csic.es/3dfootprint/
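To make "annotating an interface from a structure" concrete, here is a minimal sketch that assumes a simple distance criterion: a protein residue counts as interface if any of its atoms lies within a cutoff (4.5 angstroms here, an assumed value) of any DNA atom. This is not necessarily the criterion 3D-footprint uses; the `Atom` type, `interface_residues` and the toy coordinates are hypothetical.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class Atom(NamedTuple):
    chain: str                         # chain ID, e.g. "A" (protein) or "B" (DNA)
    resnum: int                        # residue number from the structure file
    name: str                          # atom name, e.g. "NZ" or "N7"
    xyz: Tuple[float, float, float]    # Cartesian coordinates in angstroms


def interface_residues(protein: Iterable[Atom], dna: Iterable[Atom],
                       cutoff: float = 4.5) -> List[int]:
    """Residue numbers of protein residues having at least one atom within
    `cutoff` angstroms of any DNA atom (a hypothetical interface criterion)."""
    dna_xyz = [a.xyz for a in dna]
    hits = {a.resnum for a in protein
            if any(math.dist(a.xyz, d) <= cutoff for d in dna_xyz)}
    return sorted(hits)


# Toy coordinates, invented for illustration (not taken from 1D5Y).
protein = [
    Atom("A", 32, "NZ", (0.0, 0.0, 3.0)),   # 3.0 angstroms from the DNA atom
    Atom("A", 33, "CA", (0.0, 0.0, 9.0)),   # 9.0 angstroms away: not interface
    Atom("A", 34, "OG", (1.0, 1.0, 1.0)),   # about 1.7 angstroms away
]
dna = [Atom("B", 5, "N7", (0.0, 0.0, 0.0))]

print(interface_residues(protein, dna))  # [32, 34]
```

In practice the coordinates would come from a parsed PDB file, and a real tool may also use contact types (hydrogen bonds, base vs backbone contacts) rather than a single distance threshold.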
footprintDB
We have designed, implemented and curated a database of more than 3000 unique DNA-binding proteins (mostly transcription factors, TFs) and 4000 position weight matrices (PWMs) extracted from the literature and other repositories. The DNA-binding interface residues of each TF sequence in footprintDB are annotated by aligning the sequence with 3D-footprint templates.
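Annotating a TF by alignment amounts to projecting the template's interface positions onto the aligned query. The sketch below assumes the pairwise alignment has already been computed by some aligner; `transfer_interface` and the toy sequences are hypothetical and are not footprintDB's actual code.

```python
from typing import Set


def transfer_interface(template_aln: str, query_aln: str,
                       template_interface: Set[int],
                       template_start: int = 1) -> str:
    """Project template interface positions onto an aligned query sequence.

    template_aln, query_aln: the two rows of a pairwise alignment ('-' = gap).
    template_interface: template residue numbers annotated as interface.
    template_start: residue number of the first template residue in the row.
    Returns the query residues aligned to those positions, in template order,
    with '-' wherever the query has a gap.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("alignment rows must have the same length")
    t_num = template_start - 1
    projected = []
    for t_res, q_res in zip(template_aln, query_aln):
        if t_res == "-":
            continue                  # query insertion: no template residue here
        t_num += 1
        if t_num in template_interface:
            projected.append(q_res)   # may be '-' (deletion in the query)
    return "".join(projected)


# Toy alignment with hypothetical sequences.
print(transfer_interface("AC-DEFG", "ACKDE-G", {2, 4, 5}))  # CE-
```

The resulting string is the query's interface signature, which is the kind of residue string compared across proteins in the interface alignments that follow.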
footprintDB
footprintDB predicts:
• transcription factors that bind a specific DNA site or motif
• DNA motifs likely to be recognized by a specific DNA-binding protein
Alignment of protein interfaces
The rationale behind footprintDB is the observation that proteins which recognize a similar DNA motif most often have a similar set of residues at the interface.
DNA motif ~ TF interface
yCAATTAws ~ RKRTQNTK
-yaATTAam ~ RRRIQNTK
-yAATTArg ~ RRRIQNAK
-TAATTArc ~ RRRIQNAK
-tmATTAAs ~ KRRIQNMK
Alignment of protein interfaces
Noyes et al. have recently shown that homeodomain binding specificities depend on the interface residues involved in DNA motif recognition.
Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH, Wolfe SA: Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133:1277-1289, 2008.
Alignment of protein interfaces
Unknown homeodomain protein
Homeodomain interface residues: RRRIQNAK
Interface alignment with footprintDB annotated interfaces:
yCAATTAws ~ RKRTQNTK
-yaATTAam ~ RRRIQNTK
-TAATTArc ~ RRRIQNAK
-tmATTAAs ~ KRRIQNMK
Predicted DNA-binding motif: TAATTArc
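The prediction above can be reproduced with a very simple scorer. This sketch assumes plain percent identity between equal-length interfaces; footprintDB's actual interface scoring may differ (for example, it could use a substitution matrix). `interface_identity`, `rank_motifs` and `ANNOTATED` are hypothetical names, and the leading alignment gaps of the motifs are dropped.

```python
from typing import List, Tuple

# (motif, interface) pairs from the homeodomain example above.
ANNOTATED = [
    ("yCAATTAws", "RKRTQNTK"),
    ("yaATTAam", "RRRIQNTK"),
    ("TAATTArc", "RRRIQNAK"),
    ("tmATTAAs", "KRRIQNMK"),
]


def interface_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned, equal-length interfaces."""
    if len(a) != len(b):
        raise ValueError("interfaces must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_motifs(query: str, annotated) -> List[Tuple[float, str]]:
    """Rank candidate motifs by interface identity to the query, best first."""
    scored = [(interface_identity(query, iface), motif) for motif, iface in annotated]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


for score, motif in rank_motifs("RRRIQNAK", ANNOTATED):
    print(f"{score:.3f} {motif}")
# 1.000 TAATTArc
# 0.875 yaATTAam
# 0.750 tmATTAAs
# 0.625 yCAATTAws
```

Identity is the crudest choice; its appeal here is that it scores only the handful of residues that contact DNA, instead of the whole domain as a local alignment would.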
Alignment of protein interfaces
Scoring aligned protein interfaces predicts which DNA motif an unknown DNA-binding protein binds more accurately than other scoring methods such as local alignment.
Figure: ROC curves for homeodomains and bZIPs
The ROC curves show that interface alignments improve DNA motif predictions in comparison with BLAST scores.
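One common way to summarize such a ROC comparison is the area under the curve (AUC). The sketch below computes it as the probability that a correct protein/motif pair outscores an incorrect one; the scores are made up for illustration and are not the homeodomain or bZIP results.

```python
from typing import Sequence


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties count
    as one half). Equals the Mann-Whitney U statistic / (len(pos) * len(neg))."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for correct (pos) and incorrect (neg) protein/motif pairs.
interface_pos, interface_neg = [0.9, 0.8, 0.4], [0.5, 0.3, 0.2]
blast_pos, blast_neg = [0.7, 0.4, 0.3], [0.6, 0.5, 0.2]

print(round(roc_auc(interface_pos, interface_neg), 3))  # 0.889
print(round(roc_auc(blast_pos, blast_neg), 3))          # 0.556
```

The pairwise loop is quadratic but transparent; for large benchmarks a rank-based formula gives the same value faster.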
DNA motif alignment issues
• Three alignment combinations, because both DNA strands must be considered: ATC / GTT ; ATC / AAC ; GAT / GTT
  • longer calculation time and higher false-positive rate than a single pairwise alignment
• Different motif sizes: TgAGt / ackrTGACGTCAycra
  • not a big issue if the score is divided by the number of aligned nucleotides
• Small motifs are prone to false high-scoring alignments because of the small nucleotide alphabet: AGt / CGT
  • high similarity thresholds are required, particularly for individual zinc fingers, which usually recognize 3 nt
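The strand combinations and the small-alphabet problem can both be illustrated in a few lines. This sketch assumes motifs written as IUPAC consensus strings; the helper names are hypothetical. Note that the fourth pairing, rc(m1)/rc(m2), is just m1/m2 read on the opposite strand, which is why only three are listed.

```python
# IUPAC nucleotide codes and their complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                           "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")


def revcomp(motif: str) -> str:
    """Reverse complement of a motif written in IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]


def strand_combinations(m1: str, m2: str):
    """The three pairings to align: m1/m2, m1/rc(m2) and rc(m1)/m2."""
    return [(m1, m2), (m1, revcomp(m2)), (revcomp(m1), m2)]


def chance_exact_match(k: int) -> float:
    """Probability that a random k-mer (uniform bases) equals a given k-mer."""
    return 0.25 ** k


print(strand_combinations("ATC", "GTT"))
# [('ATC', 'GTT'), ('ATC', 'AAC'), ('GAT', 'GTT')]
# 1/64 for a 3-mer versus roughly 4e-06 for a 9-mer:
print(chance_exact_match(3), chance_exact_match(9))
```

The 1/64 chance for a 3-mer is what makes short motifs, such as single zinc-finger sites, match unrelated motifs so easily.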
DNA motif alignment issues
• Complex motifs (multimeric proteins): ackrTGACGTCAycra / rTGACwmAGCA
  • they are not easy to align, and heteromultimers might bind different sites
• A single motif for TFs with multiple DNA-binding domains
  • it might not be possible to know which domain binds each submotif
• TFs with different annotated motifs
  • a result of different oligomeric conformations or experimental approaches
• Motifs with very low information content: akaTTrchhaAhcw
  • might be genuine or result from low-resolution experiments; a source of false-positive (FP) hits
Alignment of DNA motifs
Some families of transcription factors and their peculiarities:
Alignment of DNA motifs
Motifs are aligned with an ungapped Smith-Waterman algorithm, and motif similarity is calculated as the sum of the Pearson correlation coefficients of the aligned motif positions.
GAC
GCC
Similarity: (1 + 0 + 1) / 3 = 0.67
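An ungapped Smith-Waterman alignment scans every diagonal and keeps the best-scoring run of consecutive columns. The sketch below uses a 1/0 identity score on consensus letters so that it reproduces the GAC/GCC example; the same scan works with any column-scoring function, such as a per-position Pearson correlation between matrices. Function names are hypothetical.

```python
from typing import Callable, Tuple


def ungapped_local(a: str, b: str,
                   score: Callable[[str, str], float]) -> Tuple[float, int, int]:
    """Best ungapped local alignment of a and b (ungapped Smith-Waterman).

    Every diagonal (offset = index in a minus index in b) is scanned once;
    the running score is reset to zero whenever adding the next column would
    not keep it positive. Returns (best_score, aligned_length, offset).
    """
    best = (0.0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        run, length = 0.0, 0
        while i < len(a) and j < len(b):
            s = score(a[i], b[j])
            if run + s > 0:
                run, length = run + s, length + 1
            else:
                run, length = 0.0, 0
            if run > best[0]:
                best = (run, length, offset)
            i, j = i + 1, j + 1
    return best


def identity(x: str, y: str) -> float:
    """1 for a match, 0 for a mismatch, as in the GAC/GCC example."""
    return 1.0 if x == y else 0.0


total, length, offset = ungapped_local("GAC", "GCC", identity)
print(total, length, round(total / length, 2))  # 2.0 3 0.67
```

Dividing by the aligned length gives the normalized similarity (0.67 here), which is what lets motifs of different sizes be compared.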
Alignment of DNA motifs
Motif 1 (consensus GCC):
P0 A C G T
01 0 0 6 0 G
02 1 4 0 1 C
03 0 4 0 2 C
Motif 2 (consensus GAC):
P0 A C G T
01 0 0 3 1 G
02 3 1 0 0 A
03 0 4 0 0 C
Pearson correlation coefficient at position $i$, where $x_{ib}$ and $y_{ib}$ are the counts of base $b \in \{A, C, G, T\}$ in the two motifs:
$r_i = \frac{\sum_b (x_{ib} - \bar{x}_i)(y_{ib} - \bar{y}_i)}{\sqrt{\sum_b (x_{ib} - \bar{x}_i)^2}\,\sqrt{\sum_b (y_{ib} - \bar{y}_i)^2}}$
Position 1: $r_1 = 12 / \sqrt{27 \cdot 6} = 0.94$
Simil = r1 + r2 + r3 = 0.94 + 0.14 + 0.87 = 1.95
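The worked example can be checked directly. This sketch computes the per-position Pearson correlation between the count columns of the two matrices (positions already aligned). Returning 0 for a constant column is an assumption, since the correlation is undefined there; `pearson` and `column_similarity` are hypothetical names.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant column as uninformative
    return cov / (sx * sy)


# Count matrices from the slide; each row is one position, columns A, C, G, T.
MOTIF_GCC = [[0, 0, 6, 0], [1, 4, 0, 1], [0, 4, 0, 2]]
MOTIF_GAC = [[0, 0, 3, 1], [3, 1, 0, 0], [0, 4, 0, 0]]


def column_similarity(m1, m2) -> List[float]:
    """Pearson r for each pair of aligned positions."""
    return [pearson(c1, c2) for c1, c2 in zip(m1, m2)]


r = column_similarity(MOTIF_GCC, MOTIF_GAC)
print([round(v, 2) for v in r], round(sum(r), 2))  # [0.94, 0.14, 0.87] 1.95
```

Because Pearson's r is unchanged by scaling, matrices built from different numbers of sites (six versus four here) can be compared without normalizing the counts first.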
Alignment of DNA motifs
4900 individual TRANSFAC DNA sites were aligned with their corresponding DNA motifs (PWMs), yielding a mean similarity of 0.70.
P0 A C G T
01 2 0 4 0 G
02 1 0 4 1 G
03 0 6 0 0 C
04 2 0 0 4 T
05 0 0 0 6 T
06 0 6 0 0 C
07 0 6 0 0 C
08 3 0 0 3 W
09 1 4 1 0 C
Sites:
AGCTTCCTC
GGCATCCAG
GTCTTCCTA
AGCTTCCAC
GGCATCCAC
GACTTCCTC
Half of the DNA sites share less than 0.70 similarity with their motif: DNA motifs have a large variability.
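The matrix above is simply the base counts of the six aligned sites. The sketch below rebuilds it and then scores one site against it, assuming the site is encoded as a one-hot matrix and that similarity is the mean per-position Pearson r (the "score divided by aligned nucleotides" normalization). This is an illustration of the idea, not necessarily how the 0.70 figure was computed; the function names are hypothetical.

```python
import math
from typing import List, Sequence

BASES = "ACGT"
SITES = ["AGCTTCCTC", "GGCATCCAG", "GTCTTCCTA",
         "AGCTTCCAC", "GGCATCCAC", "GACTTCCTC"]


def count_matrix(sites: Sequence[str]) -> List[List[int]]:
    """Per-position counts of A, C, G, T over equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    return [[sum(s[i] == b for s in sites) for b in BASES] for i in range(length)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def site_similarity(site: str, matrix: List[List[int]]) -> float:
    """Mean Pearson r between the one-hot encoded site and each matrix column."""
    rs = [pearson([1.0 if site[i] == b else 0.0 for b in BASES], col)
          for i, col in enumerate(matrix)]
    return sum(rs) / len(rs)


m = count_matrix(SITES)
print(m[0], m[7])                                   # [2, 0, 4, 0] [3, 0, 0, 3]
print(round(site_similarity("AGCTTCCTC", m), 3))    # 0.838
```

The site scored here is one of the six used to build the matrix, so its similarity is optimistic; the slide's point is that even so, variable positions (such as 01 and 08) pull the score well below 1.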
Alignment of DNA motifs
4900 individual TRANSFAC DNA sites were aligned against random footprintDB motifs, yielding a mean similarity of 0.47.
Figure: site AGCTTCCTC aligned against an unknown random motif (9-position matrix marked "?")
Individual DNA sites and motifs can yield moderate similarities by chance.
Alignment of DNA motifs
Which motif similarity threshold should we use to identify DNA sites and motifs?
0.47 < ? < 0.70
Figure: the 9-position motif and site AGCTTCCTC shown above
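Alignment of DNA motifs
Drawing a ROC curve by interpolating TPR and FPR from the TRANSFAC alignments, we find that motif similarity ratios between 0.55 and 0.60 cover a sensitivity (TPR) range of 0.80-0.71 and a specificity (1-FPR) range of 0.74-0.88: a threshold of 0.55 gives TPR 0.80 and specificity 0.74, while 0.60 gives TPR 0.71 and specificity 0.88.
Figure: ROC curve with the similarity range 0.55-0.60 highlighted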
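Each point of such a ROC curve comes from one similarity threshold. This sketch assumes a pair is called a match when its similarity is at or above the threshold; the scores are made up and do not reproduce the TRANSFAC numbers. `tpr_fpr` and `roc_points` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def tpr_fpr(pos: Sequence[float], neg: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """True- and false-positive rates when similarity >= threshold counts as a match."""
    tp = sum(s >= threshold for s in pos)
    fp = sum(s >= threshold for s in neg)
    return tp / len(pos), fp / len(neg)


def roc_points(pos, neg) -> List[Tuple[float, float, float]]:
    """(threshold, TPR, FPR) for every distinct observed score, highest first."""
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    return [(t, *tpr_fpr(pos, neg, t)) for t in thresholds]


# Made-up similarities: sites vs their own motif (pos) and vs random motifs (neg).
pos = [0.80, 0.70, 0.58, 0.50]
neg = [0.62, 0.50, 0.45, 0.30]

for t in (0.55, 0.60):
    tpr, fpr = tpr_fpr(pos, neg, t)
    print(f"threshold {t:.2f}: sensitivity {tpr:.2f}, specificity {1 - fpr:.2f}")
# threshold 0.55: sensitivity 0.75, specificity 0.75
# threshold 0.60: sensitivity 0.50, specificity 0.75
```

Raising the threshold can only lower both rates, which is why sensitivity falls and specificity rises as the cutoff moves from 0.55 to 0.60.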
Laboratory of Computational Biology
Estación Experimental de Aula Dei / CSIC
Av. Montañana 1.005
50059 Zaragoza (Spain)
Tel.: +34 976716089
Web: http://www.eead.csic.es/compbio/