Ongoing and Future Projects Tianyin Zhou 4/9/2011
1. FIS
Two binding sites: the binding affinity of the left site is 1000x stronger than that of the right site. Why?
• Left: the intrinsic shape of the free DNA already matches that of the bound DNA, so binding is easier.
• Right: the FIS protein must induce a shape change (narrowing the minor groove) before it can bind. This requires energy and is the bottleneck.
Minor groove width → binding affinity
• Previous slide: extreme cases
• Quantitatively investigate the relationship between
  • the average DNA minor groove width change (induced by the protein)
  • the resulting binding affinity
• Why is the minor groove width change so important?
  • FIS binds as a dimer: binding is achieved by inserting two groups into adjacent major grooves.
  • If the minor groove is not narrowed, the distance between the two adjacent major grooves is too large for the FIS dimer to fit.
Minor groove width change
• Binding affinity is measured experimentally by the dissociation constant
• How to measure the minor groove width change?
• 1. Gibbs free energy
  • The lower the free energy, the more stable the conformation
  • Free DNA has the minimum free energy
  • Energy difference = free energy (induced DNA) - free energy (free DNA)
Gibbs Free Energy
• Enzymes increase the rate of biological reactions
• How does an enzyme achieve that? By dramatically reducing the energy increase required during the reaction
• Hypothesis: the higher the free energy difference between the free DNA and the bound DNA, the lower the binding rate and the lower the binding affinity
• (Plot: binding affinity vs. energy difference)
Direct method
• How to measure the minor groove width change?
• 2. Take the average of the minor groove width change over the sites that show a narrow minor groove in the complex
• Average: (0.9 + 1.7) / 2 = 1.3
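The averaging step above can be sketched in a few lines of Python. This is only an illustration: the function name, the `narrow_threshold` parameter, and the example widths are assumptions made up for the sketch (the widths are chosen so that the two narrow sites reproduce the slide's changes of 0.9 and 1.7).

```python
def average_narrowing(sites, narrow_threshold):
    """Average width change over sites that are narrow in the complex.

    sites: list of (free_width, bound_width) pairs, one per site.
    A site counts as narrow when its bound width is below narrow_threshold
    (the threshold itself is an assumption, not a value from the slides).
    """
    changes = [free - bound for free, bound in sites if bound < narrow_threshold]
    if not changes:
        raise ValueError("no site is narrow in the complex")
    return sum(changes) / len(changes)


# Made-up widths: the first two sites narrow by 0.9 and 1.7,
# the third site is not narrow in the complex and is skipped.
example_sites = [(5.4, 4.5), (5.9, 4.2), (6.0, 6.1)]
print(round(average_narrowing(example_sites, narrow_threshold=5.0), 2))
```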
Correlation
11 complexes:
• 3IV5 - Fis bound to 27 bp optimal binding sequence F1
• 3JR9 - Fis bound to 27 bp optimal binding sequence F2
• 3JRA - Fis bound to 27 bp non-consensus sequence DNA F6
• 3JRB - Fis bound to 27 bp DNA F24 containing T-tract at center
• 3JRC - Fis bound to 27 bp DNA F29 containing 5 G/Cs at center
• 3JRD - Fis bound to 27 bp DNA F25 containing T2A3 sequence at center
• 3JRE - Fis bound to 27 bp DNA F26 containing A-tract at center
• 3JRF - Fis bound to 27 bp DNA F27 containing a C/G at center
• 3JRG - Fis bound to 27 bp non-consensus sequence DNA F18
• 3JRH - Fis bound to 27 bp non-consensus sequence DNA F21
• 3JRI - Fis bound to 27 bp non-consensus sequence DNA F23
Plan:
• 11 data points on the plot
• Fit the data: Affinity = a·x^2 + b·x + c, with x the average minor groove width change
• Questions that could be answered:
  • What proportion of the affinity can be explained by the minor groove change alone?
  • Can the binding affinity be predicted from the crystal structure?
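A minimal sketch of the quadratic fit and of the "proportion explained" question, assuming ordinary least squares and using R^2 as the proportion of variance explained (the slides do not name a fitting method, so both choices are assumptions). The function names are hypothetical, and the demo points are generated from known coefficients rather than taken from any measurement.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums S[k] = sum(x^k) and moments T[k] = sum(y * x^k).
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))


def r_squared(xs, ys, coeffs):
    """Fraction of the variance in ys explained by the fitted curve."""
    a, b, c = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("ys are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Synthetic demo points generated from y = 2x^2 - 3x + 1 (not real data).
demo_x = [0.0, 1.0, 2.0, 3.0, 4.0]
demo_y = [2 * x * x - 3 * x + 1 for x in demo_x]
coeffs = fit_quadratic(demo_x, demo_y)
print(coeffs, r_squared(demo_x, demo_y, coeffs))
```

With the 11 complexes, `xs` would hold the average minor groove width change per structure and `ys` the measured affinity; R^2 then gives one possible answer to how much of the affinity the groove change alone explains.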
2. Improve Monte Carlo Algorithms
• Traditional way: sequence → structure, by simulation
• Incorporate the hydroxyl radical cleavage prediction data into our simulation
  • Sequence → query the ORChID database → intensity value for each base
• Benefits:
  • Reduce the number of simulation cycles required
  • Avoid generating artifacts
3. Structure Prediction without Simulation
• Sequence: ATTAGCTTGACGTAAAAGGG
• Distance, minor groove width, angle, ...?
• Take the tetramer ACGT, query the database, and retrieve all of its structural parameters
• Why use a tetramer?
• Use a sliding window to process all the dimers of the sequence
• Once the structural parameters of all the dimers are retrieved, we have our structure prediction
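The sliding-window lookup can be sketched as follows. The table `TOY_TABLE`, its single entry, and the function names are hypothetical placeholders; the real database and its parameter set are not described on the slide, so windows missing from the table simply return `None`.

```python
def tetramer_windows(seq):
    """All overlapping 4-letter windows of seq, left to right."""
    return [seq[i:i + 4] for i in range(len(seq) - 3)]


def lookup_parameters(seq, table):
    """Pair each window position with its tetramer and the table entry.

    table maps a tetramer to its structural parameters; tetramers that
    are absent from the table map to None.
    """
    return [(i, tet, table.get(tet)) for i, tet in enumerate(tetramer_windows(seq))]


# Placeholder table with one made-up entry (not a real measurement).
TOY_TABLE = {"ACGT": {"minor_groove_width": 5.0}}

sequence = "ATTAGCTTGACGTAAAAGGG"
for position, tetramer, params in lookup_parameters(sequence, TOY_TABLE):
    if params is not None:
        print(position, tetramer, params)
```

A 20-base sequence yields 17 overlapping tetramers, so every internal step of the sequence is covered by at least one window.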
4. Discovering Motifs of TF-Binding Sites
• ChIP-sequencing (ChIP-seq): used to analyze protein interactions with DNA
Common Motif
• Get a set of DNA sequences
  • AATTGCGGTTAAGTGCATAC
  • GACTAGCATTTAAGTACTTA
  • AAATTTCCTTAAGTTTTAAA
• In each sequenced DNA fragment (a fragment bound by the protein), only a small part is the binding site responsible for the specificity
• How to find the common motif?
• Discover the common subsequence (motif)
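For the three example sequences, the simplest form of "common subsequence" discovery is an exact longest-common-substring search. The sketch below is a brute-force illustration under that assumption, with a hypothetical function name; real motif finders tolerate mismatches and do not require an exact shared substring.

```python
def longest_common_substring(seqs):
    """Longest substring present in every sequence (first one found on ties)."""
    shortest = min(seqs, key=len)
    # Try candidate lengths from longest to shortest.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


fragments = [
    "AATTGCGGTTAAGTGCATAC",
    "GACTAGCATTTAAGTACTTA",
    "AAATTTCCTTAAGTTTTAAA",
]
print(longest_common_substring(fragments))
```

On these three fragments the search recovers the shared hexamer TTAAGT.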
Position Weight Matrix
• Many sequence-based motif discovery algorithms exist
• Graphical representation of the discovered motif: the position weight matrix (PWM)
• Map the motif back onto the sequences and score each position as match/mismatch
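A minimal PWM sketch, assuming log2-odds scores against a uniform background with a pseudocount of 1 (common conventions, but not stated on the slide). The function names are hypothetical; the aligned sites are the TTAAGT occurrences found in the three example fragments of the previous slide.

```python
import math

BASES = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build log2-odds weights, one dict per motif position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must have equal length")
    pwm = []
    for i in range(length):
        counts = {b: sum(1 for s in sites if s[i] == b) for b in BASES}
        total = len(sites) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score(pwm, window):
    """Sum of per-position weights for a window as long as the motif."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_match(pwm, seq):
    """Map the PWM back onto seq; return (start, score) of the best window."""
    width = len(pwm)
    scored = [(score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1)]
    best_score, best_start = max(scored, key=lambda pair: pair[0])
    return best_start, best_score


aligned_sites = ["TTAAGT", "TTAAGT", "TTAAGT"]
pwm = position_weight_matrix(aligned_sites)
print(best_match(pwm, "AATTGCGGTTAAGTGCATAC"))
```

Windows that match the motif score high and mismatching windows score low, which is the match/mismatch mapping the slide refers to.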
Why May It Fail?
• Protein-DNA specificity comes from
  • base readout
  • shape readout
• For the middle part of the binding site
  • there is no base readout
  • the TF favors a narrow minor groove (it selects for shape)
Extend the Traditional Method
• Expand the alphabet from 4 to 12 letters
• For a sequence with a narrow minor groove: TGAACCTG → KHAADDKH
• Pre-processed sequence with the extended alphabet → motif discovery by the traditional method
Problem
************A*********
***B****
• We have a dilemma: positions A and B are a mismatch in terms of shape but a match in terms of sequence
• We have 12 × 11 / 2 = 66 pairs of letters
• Assign a similarity score to each pair: arbitrarily defined?
• Similarity matrix
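The pair count and the shape of such a similarity matrix can be sketched as below. The twelve letters are placeholders and the scores are left to the caller, since the slide itself asks whether they can be defined other than arbitrarily; the function name and default values are assumptions.

```python
from itertools import combinations


def build_similarity_matrix(alphabet, pair_scores, match_score=1.0, default=0.0):
    """Symmetric lookup of similarity scores for every ordered pair of letters.

    pair_scores holds caller-chosen scores for unordered pairs; pairs that
    are not listed fall back to default.
    """
    matrix = {(s, s): match_score for s in alphabet}
    for a, b in combinations(alphabet, 2):
        value = pair_scores.get((a, b), pair_scores.get((b, a), default))
        matrix[(a, b)] = value
        matrix[(b, a)] = value
    return matrix


ALPHABET = "ABCDEFGHIJKL"  # placeholder letters, not the slides' alphabet
pairs = list(combinations(ALPHABET, 2))
print(len(pairs))  # number of unordered pairs of distinct letters

similarity = build_similarity_matrix(ALPHABET, {("A", "B"): 0.5})
print(similarity[("B", "A")])
```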
New Representation
• A, T, G, C → A, T, G, C, W, M, N
• Idea: each binding site position is either sequence-selected or shape-selected; it cannot be both (yes or no)
• Steps:
  • Motif discovery by the traditional method, allowing more mismatches
  • For the mismatched part, replace the bases with a shape representation (W, M, N)
• Discovered candidate motif (there may be more than one candidate): AAATTTGTTTGAATTTTGAGCAAATTT
• Final discovered motif: AAATTTGTTTGNNNNNTGAGCAAATTT
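The replacement step can be sketched as masking the alignment columns where the candidate sites disagree. This assumes the candidates are already aligned and that one shape symbol is applied to every masked column; how W, M, or N would be chosen per position is not specified on the slide. The function name is hypothetical, and the second and third candidates are invented variants of the slide's candidate.

```python
from collections import Counter


def mask_variable_columns(sites, shape_symbol="N", min_fraction=1.0):
    """Keep conserved bases; replace poorly conserved columns with shape_symbol.

    A column is kept when its most common base reaches min_fraction of
    the sites; otherwise it is treated as shape-selected and masked.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must have equal length")
    motif = []
    for column in zip(*sites):
        base, count = Counter(column).most_common(1)[0]
        motif.append(base if count / len(sites) >= min_fraction else shape_symbol)
    return "".join(motif)


left, right = "AAATTTGTTTG", "TGAGCAAATTT"
candidates = [
    left + "AATTT" + right,  # candidate motif from the slide
    left + "TTAAA" + right,  # made-up variant of the middle part
    left + "ATATA" + right,  # made-up variant of the middle part
]
print(mask_variable_columns(candidates))
```

With these candidates the conserved flanks are kept and the five variable middle positions are masked, which reproduces the final motif shown on the slide.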