DNA structural properties in functional genomics Pieter Meysman, Kathleen Marchal and Kristof Engelen
Structural properties, scales and profiles The structural properties of the DNA molecule can be roughly divided into two categories: Conformational: details of the static DNA structure. Physicochemical: dynamic potential of the DNA structure or the free energy.
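As a rough illustration of how a scale turns a sequence into a profile: a dinucleotide scale assigns one value to each of the 16 dinucleotides, and reading the sequence step by step gives one value per position, optionally smoothed over a window. The sketch below assumes a made-up scale (`TOY_SCALE` holds placeholder numbers, not a published conformational or physicochemical scale), and the function names are hypothetical.

```python
from itertools import product

# Hypothetical dinucleotide scale: placeholder values only, NOT a published scale.
BASES = "ACGT"
TOY_SCALE = {a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))}


def structural_profile(seq, scale):
    """Return one scale value per dinucleotide step along seq."""
    seq = seq.upper()
    return [scale[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(profile, window=3):
    """Moving average over `window` consecutive dinucleotide steps."""
    if window < 1 or window > len(profile):
        raise ValueError("window must be between 1 and len(profile)")
    return [sum(profile[i:i + window]) / window
            for i in range(len(profile) - window + 1)]


profile = structural_profile("ACGT", TOY_SCALE)  # steps AC, CG, GT
smoothed = smooth(profile, window=3)
```

Swapping `TOY_SCALE` for a real scale (for example a rigidity or stability scale) would give the corresponding profile with the same code.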
DNA-binding proteins Nucleosomes Recent models that estimate the dinucleotide deformation energy are better at predicting nucleosome positioning. -- rigidity
DNA-binding proteins Transcription factor binding sites Structural preferences can be used to predict new binding sites.
Promoters Eukaryotes In general, promoters are more rigid than the remainder of the genome, which is important for excluding nucleosomes from the promoter region. The proximal promoter (where most TF binding sites are) shows a decrease in rigidity (to allow binding of TFs). Extreme rigidity values are embedded in elements such as the TATA box (rigid regions are found even when the TATA box is absent). Prediction of promoters. -- stability -- rigidity
Promoters Prokaryotes Promoters are less stable, more rigid and have more extreme curvature. -- stability -- rigidity -- curvature
Transposons Transposons insert preferentially into sites with a consensus sequence, a typical deformability, and a high bendability.
Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli. Meysman P, Dang TH, Laukens K, De Smet R, Wu Y, Marchal K, Engelen K. Nucleic Acids Res. 2011 Jan;39(2):e6. Epub 2010 Nov 4.
CRoSSeD methodology CRoSSeD uses structural properties to model and predict novel binding sites. Green: binding sites from RegulonDB
Method evaluation Creating a positive synthetic dataset
Method evaluation 40 positive 1000 negative
Method evaluation Training set 36 positive 104 unknown 900 negative
Method evaluation The synthetic dataset was used to compare the predictive power of CRoSSeD to: Position-weight matrix (PWM). CRFseq: di- and trinucleotide relationships. BioBayesNet: Bayesian-network, structure-based methodology.
Method evaluation on a real dataset Real datasets were derived from experimentally confirmed binding sites of E. coli (obtained from RegulonDB). Positive: all known binding sites. Negative: 1000 random. For 17 out of 27 TFs, the CRoSSeD model outperformed the other three methods.
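For readers unfamiliar with the PWM baseline, here is a minimal sketch of log-odds PWM scoring. It assumes a uniform background and a pseudocount of 1, and the aligned sites are toy sequences, not RegulonDB data; the function names are hypothetical and this is not the implementation used in the comparison.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position-weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum of per-position log-odds for a window as wide as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score_window(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]  # toy alignment, assumption
pwm = build_pwm(toy_sites)
score, start = best_hit(pwm, "GGCGTATAATGCGC")
```

A PWM treats positions independently and looks only at the bases, which is the limitation that structure-based models try to address.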
Screening for novel binding sites To evaluate those novel targets they used gene expression data and extensive literature.
Screening for novel binding sites 14 out of 23 gene sets were enriched with high-scoring predicted binding sites obtained from the structural model.
CRP Binds as a dimer. Introduces two kinks. The values of this flexibility scale are derived from DNase I.
PurR Induces a single kink by intercalating a pair of leucine residues into the minor groove. The highest weight was assigned to stability (the disruption energy).
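The slide does not say which statistic was used to call a gene set "enriched". One common choice, assumed here purely for illustration, is a one-sided hypergeometric test: given how many genes in the genome carry a high-scoring predicted site, how surprising is the number found in the gene set? The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` carry a predicted site."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(observed, upper + 1))
    return tail / total


# Toy numbers (assumption): 10 genes, 5 with a predicted site,
# a gene set of 5 genes that all carry a predicted site.
p_value = hypergeom_enrichment_p(population=10, successes=5, draws=5, observed=5)
```

A small p-value means the gene set contains more predicted sites than expected by chance; with many gene sets, a multiple-testing correction would normally be applied on top.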
Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites Bauer AL, Hlavacek WS, Unkefer PJ, Mu F. PLoS Comput Biol. 2010 Nov 18;6(11):e1001007
SiteSleuth Combines DNA structural prediction (MD), computational chemistry and machine learning to identify and predict new TF binding sites.
Sequence-specific DNA structure (A) Same base, different shape: GCTGGGC (left) is twisted −4.3 degrees; GCAGAGC (right) is twisted −20.4 degrees. (B) Different bases, similar shape: GCCAGGC (left) is twisted −9.5 degrees; GCCGGGC (right) is twisted −9.5 degrees. (MD simulations)
Mapping of DNA MD simulations 6 shear, buckle, stretch, propeller, stagger and opening. 8 shift, tilt, slide, roll, rise, and twist
Chemical features Interaction energy between the DNA and 31 probes
Support Vector Machine (SVM) For each of the 54 TFs: Positive: Binding sites from RegulonDB Negative: 10,000 randomly selected non-coding sequences Using SVM
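To make the classification step concrete, here is a minimal sketch of training a linear SVM on feature vectors labelled +1 (binding site) or -1 (random non-coding sequence). It is not SiteSleuth's implementation: the kernel and solver used there are not stated on the slide, so this assumes a linear SVM fitted by plain subgradient descent on the hinge loss, and the two-dimensional toy features stand in for the real structural and chemical features.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    X: list of feature vectors, y: labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (predicted binding site) or -1 (predicted background)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy feature vectors (assumption): two made-up features per sequence.
X = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0], [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

With 10,000 negatives against a handful of known sites per TF, the classes are heavily imbalanced, so in practice class weighting and cross-validation matter far more than in this balanced toy example.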
Comparison methods BvH, MATRIX SEARCH, Match, QPMEME
Cross-validation SiteSleuth outperforms all other methods in 28 of the 54 cases
SiteSleuth vs. BvH BvH predicts more estimated false positives than SiteSleuth
Validation against ChIP-chip data * SiteSleuth produced the fewest false positives * SiteSleuth outperformed the other methods with 41% correct predictions
Conclusions * Adding shape information can help in predicting new binding sites. * Although SiteSleuth produces the highest fraction of correct predictions, that fraction is still small (about 40%).