
TF-DNA binding dependency: A progress report

TF-DNA binding dependency: A progress report. March 17, 2010, Hugo Willy. Outline: Re-Introduction of my problem; Current state of affairs; Known dependency factor 1 – Rotamer; Known dependency factor 2 – Water; Known dependency factor 3 – DNA flexibility; Some thoughts on what to do next.


Presentation Transcript


  1. TF-DNA binding dependency: A progress report, March 17, 2010, Hugo Willy

  2. Outline • Re-Introduction of my problem • Current state of affairs • Known dependency factor 1 – Rotamer • Known dependency factor 2 – Water • Known dependency factor 3 – DNA flexibility • Some thoughts on what to do next

  3. Re-Introduction • I am working on finding a dependency model of TF-DNA binding. • What is TF-DNA binding? • If you ask this, you may be in the wrong room. • It is known that different TFs prefer to bind different DNA sequences. • Classic example: the TATA box binding protein binds the sequence “TATA”.

  4. Re-Introduction (2) • It is commonly assumed that each position in T-A-T-A contributes independently to the binding energy. • That is to say, some guys (amino acid residues) from the TF will bind the first “T”, some others will bind the “A” at the second position, and so on. • If the sequence becomes CATA, the outcome depends on how much the guys who bind the 1st position like the new “C”. If they are OK with it, the binding energy may change a little but the TF still binds. • Otherwise, too bad.
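
The independence assumption above says the binding energy is a sum of per-position terms. A minimal Python sketch, assuming a hypothetical per-position energy table: `additive_energy` and `toy_table` are names made up here, and the numbers are invented for illustration, not measured. Under this model, the cost of a T to C change at position 1 is the same whatever the other bases are.

```python
from typing import Dict, List

# One dict per site position: base -> energy contribution (arbitrary units,
# lower = more favorable). Hypothetical layout chosen for this sketch.
EnergyTable = List[Dict[str, float]]


def additive_energy(seq: str, table: EnergyTable) -> float:
    """Binding energy under the independence (additivity) assumption."""
    if len(seq) != len(table):
        raise ValueError("sequence length must equal the number of positions")
    return sum(column[base] for column, base in zip(table, seq))


# Hypothetical values, made up for illustration only; not experimental data.
toy_table: EnergyTable = [
    {"A": 0.5, "C": 1.0, "G": 1.2, "T": 0.0},  # position 1 prefers T
    {"A": 0.0, "C": 1.1, "G": 0.9, "T": 0.6},  # position 2 prefers A
    {"A": 0.7, "C": 1.0, "G": 1.3, "T": 0.0},  # position 3 prefers T
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 0.5},  # position 4 prefers A
]

# T -> C at position 1 costs the same amount in both contexts under this model.
print(additive_energy("CATA", toy_table) - additive_energy("TATA", toy_table))
print(additive_energy("CTTA", toy_table) - additive_energy("TTTA", toy_table))
```

Both differences equal `toy_table[0]["C"] - toy_table[0]["T"]`; a dependency between positions would show up as those two numbers differing.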

  5. Re-Introduction (3) • One such model, and a very popular one, is the position-specific scoring matrix (PSSM) model. • It has been shown to be very good at predicting the real binding sites of many TFs. • However, some researchers were curious whether the model holds for all TFs.
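
A minimal sketch of how a PSSM is commonly built and applied, assuming aligned equal-width sites, a uniform background, and a pseudocount of 0.5 (all choices made for this illustration). Each column is a log2-odds score estimated on its own, which is exactly the independence assumption. The helper names (`build_pssm`, `score`, `scan`) and the training sites are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

BASES = "ACGT"
Pssm = List[Dict[str, float]]


def build_pssm(sites: Sequence[str], pseudocount: float = 0.5,
               background: Optional[Dict[str, float]] = None) -> Pssm:
    """Log2-odds matrix from aligned sites; each column is estimated independently."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same width")
    total = len(sites) + pseudocount * len(BASES)
    pssm: Pssm = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        pssm.append({b: math.log2((counts[b] + pseudocount) / total / background[b])
                     for b in BASES})
    return pssm


def score(seq: str, pssm: Pssm) -> float:
    """Sum of per-position log-odds scores (no interaction terms)."""
    return sum(column[base] for column, base in zip(pssm, seq))


def scan(seq: str, pssm: Pssm) -> List[Tuple[int, float]]:
    """Score every window of the PSSM's width along seq."""
    w = len(pssm)
    return [(i, score(seq[i:i + w], pssm)) for i in range(len(seq) - w + 1)]


# Hypothetical training sites for illustration only.
sites = ["TATA", "TATA", "TATA", "TATT"]
pssm = build_pssm(sites)
best = max(scan("GGTATAGG", pssm), key=lambda hit: hit[1])
print(best)  # the window starting at index 2 ("TATA") scores highest
```

The pseudocount keeps unseen bases from getting a score of minus infinity; its size is a tuning choice, not something the slides specify.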

  6. Current state of affairs • There are quite a few publications that try to show that there are measurable dependencies among the positions. • RECOMB 2003-Modeling dependencies in Protein-DNA binding sites • Multi PSSM, Tree, Multi Tree. Bayesian-network-based training. • Bioinformatics 2004-Modeling within-motif dependence for transcription factor binding site predictions • PSSM with pairwise correlated positions using Bayes factors. Gibbs-sampling-based. • BIBE 2006-Discovering DNA Motifs with Nucleotide Dependency • PSSM with multi-positions, heuristic. • Bioinformatics 2007-Position dependencies in transcription factor binding sites • Checks dependencies within a set of aligned binding sites with different statistical measures.
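
One common way to measure a dependency between two columns of aligned sites is mutual information. The papers above use a variety of measures, so this is not claimed to be any one paper's method. A minimal plug-in estimator in bits, assuming no small-sample correction (so it is biased upward for small site sets); `column_mutual_information` and the example sites are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def column_mutual_information(sites: Sequence[str], i: int, j: int) -> float:
    """Plug-in mutual information (bits) between positions i and j of aligned sites."""
    n = len(sites)
    count_i = Counter(s[i] for s in sites)
    count_j = Counter(s[j] for s in sites)
    count_ij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical sites for illustration only.
coupled = ["GAC", "TAT", "GAC", "TAT"]      # position 0 predicts position 2
independent = ["GAC", "GAT", "TAC", "TAT"]  # all four combinations equally often
print(column_mutual_information(coupled, 0, 2))      # 1.0
print(column_mutual_information(independent, 0, 2))  # 0.0
```

In practice one would compare the observed value against a null (for example, by shuffling one column) before calling a pair dependent.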

  7. Current state of affairs (2) • Bioinformatics 2008-Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors • Neural-network-based. • PLoSCompBio 2008-A Feature-Based Approach to Modeling Protein-DNA Interactions • Feature-based; currently only considers pairwise position-dependency features. • NAR 2010-On the detection and refinement of transcription factor binding sites using ChIP-Seq data • Similar to Bioinformatics 2004.

  8. Current state of affairs (3) • However, they all share a similar framework: • Start with a set of “known” binding sequences • Guess a model with and without dependencies • Train the model on the dataset (possibly making gradual changes to the model during training) • Compare which model is better • They then list the positions with dependencies; most are consecutive positions, but some are quite far apart.
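
The "with vs. without dependencies" comparison can be sketched as a likelihood comparison between an independent-column model and a model in which one pair of positions (i, j) is treated jointly, penalized with BIC. This is one simple instance of the framework, assuming maximum-likelihood multinomial estimates over a 4-letter alphabet and BIC as the selection criterion; the cited papers use their own models and criteria. All function names and the sites are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple


def _multinomial_loglik(values: Iterable[Hashable]) -> float:
    """Maximum-likelihood log-likelihood (nats) of observations under a multinomial."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values())


def loglik_independent(sites: Sequence[str]) -> float:
    """Every column modeled on its own."""
    return sum(_multinomial_loglik(s[k] for s in sites) for k in range(len(sites[0])))


def loglik_with_pair(sites: Sequence[str], i: int, j: int) -> float:
    """Columns i and j modeled jointly, all other columns independently."""
    rest = sum(_multinomial_loglik(s[k] for s in sites)
               for k in range(len(sites[0])) if k not in (i, j))
    return rest + _multinomial_loglik((s[i], s[j]) for s in sites)


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    return -2.0 * loglik + n_params * math.log(n_obs)


def compare_models(sites: Sequence[str], i: int, j: int) -> Tuple[float, float]:
    """Return (BIC without dependency, BIC with the (i, j) dependency); lower is better."""
    n, width = len(sites), len(sites[0])
    k_independent = 3 * width      # 3 free parameters per single column
    k_pair = 3 * (width - 2) + 15  # the joint (i, j) column has 16 - 1 free parameters
    return (bic(loglik_independent(sites), k_independent, n),
            bic(loglik_with_pair(sites, i, j), k_pair, n))


# Hypothetical site sets for illustration only.
coupled = ["GAC"] * 20 + ["TAT"] * 20
independent = ["GAC", "GAT", "TAC", "TAT"] * 10
print(compare_models(coupled, 0, 2))      # pair model has the lower BIC
print(compare_models(independent, 0, 2))  # extra parameters do not pay off
```

With maximum-likelihood estimates, the log-likelihood gain from modeling (i, j) jointly equals N times the plug-in mutual information between the two columns (in nats), so the penalty term is what decides whether a weak correlation is reported as a dependency.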

  9. Current state of affairs (4) • Well, these just fit a model to a set of sequences known to bind. The binding energy is not really taken into account. • So others, with more $$$ in their labs, did large-scale biological experiments to see whether the experimentally measured binding energies of some TFs do exhibit a dependency pattern.
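
A standard way to ask whether measured energies are non-additive is a double-mutant cycle: compare the effect of a double substitution with the sum of the two single substitutions. A minimal sketch; `coupling_energy` is a hypothetical helper, and the ΔG values (kcal/mol) are invented for illustration, not taken from any of the cited experiments.

```python
def coupling_energy(dg_wt: float, dg_mut1: float, dg_mut2: float, dg_double: float) -> float:
    """Double-mutant-cycle coupling: zero when the two substitutions act additively.

    Arguments are binding free energies of the same TF to the wild-type site,
    the two single-substitution sites, and the double-substitution site.
    """
    ddg_1 = dg_mut1 - dg_wt
    ddg_2 = dg_mut2 - dg_wt
    ddg_12 = dg_double - dg_wt
    return ddg_12 - (ddg_1 + ddg_2)


# Hypothetical energies (kcal/mol), for illustration only.
print(coupling_energy(-10.0, -9.0, -8.5, -7.5))  # additive case: 0.0
print(coupling_energy(-10.0, -9.0, -8.5, -6.5))  # double mutant loses 1.0 more than predicted
```

A nonzero coupling (beyond experimental error) is direct evidence that the two positions do not contribute independently.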

  10. Current state of affairs (5) • Hence some more papers: • NAR 2002-Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors • NAR 2002-Additivity in protein-DNA interactions-how good an approximation is it? • Nature Biotechnology 2006-Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities • Science 2009-Diversity and Complexity in DNA Recognition by Transcription Factors • PLoSCompBio 2009-Inferring Binding Energies from Selected Binding Sites

  11. Current state of affairs (6) From Science 2009, protein binding microarray experiment.

  12. Current state of affairs (7) • Yet none of the publications I have read so far gives concrete evidence of HOW such dependencies arise. • We are now trying to find out what happens at the physical level when two positions in the DNA are dependent.

  13. Known dependency factor 1 – Rotamer • Recently there was an experiment involving the zinc finger TF Zif268, which has been one of the most popular zinc finger modeling targets.

  14. Known dependency factor 1 – Rotamer • They changed the wild-type DNA triplet GCG to ACG, CCG, AAG, and CAG. • We tried to see whether a program that repacks the TF's side chains to fit the new DNA sequence can approximate the change in binding energy. • We tried FoldX, which does a rotamer search; we are not sure whether that search is optimal.
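
Comparing a program's predicted energy changes with experiment usually means converting measured dissociation constants into free-energy differences, ΔΔG = RT ln(Kd,mut / Kd,wt). A minimal sketch, assuming Kd values in molar units and a default temperature of 298.15 K; positive ΔΔG means the variant binds more weakly than the wild type. The function names and the Kd values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd: float, temperature: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in molar units."""
    return R_KCAL * temperature * math.log(kd)


def delta_delta_g(kd_mut: float, kd_wt: float, temperature: float = 298.15) -> float:
    """ddG = dG(variant) - dG(wild type) = RT ln(Kd_mut / Kd_wt); the Kd unit cancels."""
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)


# Hypothetical: a variant site whose Kd is 10x the wild-type Kd.
print(round(delta_delta_g(1e-8, 1e-9), 2))  # about 1.36 kcal/mol
```

A tenfold change in Kd corresponds to roughly 1.4 kcal/mol at room temperature, which gives a sense of how accurate a predictor must be to rank variants correctly.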

  15. FoldX results

  16. Known dependency factor 1 – Rotamer • However, the rotamers that FoldX predicts do not coincide with those in the paper's diagrams. • Either FoldX is not optimal, or the homology modeling done in the paper is not accurate. • But given the close agreement between the paper's predicted and experimental differences in binding affinity, the paper's models are most probably (more) correct. • I am still checking on that.

  17. Known dependency factor 2 – Water • What the NAR paper explicitly computes are the solvation penalties (the circles, rectangles, and triangles in the diagram). • They claim that water-mediated H-bonds are not that crucial. • We can see that FoldX does compute hydration to a certain extent. Yet its rotamer search may not be good enough.

  18. Different solvation state of polar atoms

  19. Known dependency factor 3 – DNA flexibility • DNA is not a rigid rod.

  20. Known dependency factor 3 – DNA flexibility [Figure: base labels T, A, C, G]

  21. Known dependency factor 3 – DNA flexibility

  22. Known dependency factor 3 – DNA flexibility • A G-C step has a higher roll angle, which makes it less stable (weaker stacking energy) and easier to “open”. • Several studies show that different dinucleotide steps have different bending and twisting energies.
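
Because a dinucleotide-step energy depends on two neighboring bases at once, it couples adjacent positions even when nothing else does. A minimal nearest-neighbor sketch, assuming a step-energy table supplied by the user; `step_energy` is a hypothetical helper, and the values in `toy_steps` are placeholders (GC is simply set lower to mirror the slide's point that G-C is easier to deform), not published parameters.

```python
from itertools import product
from typing import Dict


def step_energy(seq: str, steps: Dict[str, float]) -> float:
    """Sum of dinucleotide-step energies along seq (nearest-neighbor model)."""
    return sum(steps[seq[k:k + 2]] for k in range(len(seq) - 1))


# Hypothetical placeholder table: every step costs 1.0 except GC.
toy_steps = {a + b: 1.0 for a, b in product("ACGT", repeat=2)}
toy_steps["GC"] = 0.2

# The effect of A -> G at the first position depends on the base that follows it.
effect_next_to_c = step_energy("GC", toy_steps) - step_energy("AC", toy_steps)
effect_next_to_t = step_energy("GT", toy_steps) - step_energy("AT", toy_steps)
print(effect_next_to_c, effect_next_to_t)
```

The two effects differ (about -0.8 versus 0.0), which is exactly the kind of non-additivity a per-position PSSM cannot represent.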

  23. Known dependency factor 3 – DNA flexibility • The TATA binding protein actually binds TATA not because TATA gives the best binding energy. • Most of its contacts with the DNA are non-specific; what favors TATA is how easily that sequence can be bent.

  24. Known dependency factor 3 – DNA flexibility

  25. Conclusion • So far, these 3 factors are the known (or most probable) sources of dependency between DNA positions. • The challenge is to combine all of them into one scoring function that is simple enough to run on large datasets.

  26. Thank you for bearing with me. Q & A
