Long-range correlations in genomic sequences: relationship with nucleosome structure and dynamics. Multi-scale analysis of genomes. Study of long-range correlations in genomes
Study of long-range correlations in genomes
• Study of the "global properties" of genomic sequences (Voss, 92; Peng et al., 92):
  - the nature of a nucleotide depends on that of other nucleotides at large distances (up to kb scales)
  - long-range correlations are observed in introns (non-coding) but not in exons (coding)
  - methodological controversy (compositional heterogeneity of genomes)
• Proposed biological mechanisms:
  - genome dynamics:
    - replication-mutation (Li, 93);
    - insertion-deletion (Buldyrev et al., 93);
    - tandem repeats (Dokholyan et al., 97; Li et al., 98)
What are long-range correlations in DNA sequences?
• Construction of a random sequence: how is the new nucleotide chosen?
  - without correlations: …ACAGTACT G : the new nucleotide does not depend on the other nucleotides
  - with short-range correlations: …CGATTAAC A : the new nucleotide depends on a few neighbouring nucleotides (Markov chains)
  - with long-range correlations: …TCCGACGG A : the new nucleotide depends on all nucleotides over large distances, with a power-law correlation function ($1/d^{\gamma}$)
• In genomic sequences these correlation properties can extend over tens to thousands of bp
• Long-range correlations are scale-invariant
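The first two constructions can be sketched in a few lines of Python. This is only an illustration: the `stay` probability of the Markov chain, the sequence length and the function names are assumptions, not values from the slides. The autocovariance of the indicator "nucleotide is A" stays near zero for the uncorrelated sequence and decays quickly with distance for the Markov chain.

```python
import random

NUCLEOTIDES = "ACGT"

def random_sequence(n, rng):
    """Each nucleotide is drawn independently: no correlations."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(n))

def markov_sequence(n, stay, rng):
    """First-order Markov chain: the new nucleotide repeats the previous one
    with probability `stay` (a hypothetical parameter); otherwise it is drawn
    uniformly among the three other nucleotides."""
    seq = [rng.choice(NUCLEOTIDES)]
    for _ in range(n - 1):
        prev = seq[-1]
        if rng.random() < stay:
            seq.append(prev)
        else:
            seq.append(rng.choice([c for c in NUCLEOTIDES if c != prev]))
    return "".join(seq)

def autocovariance(seq, base, lag):
    """Autocovariance of the indicator 'nucleotide == base' at distance `lag`."""
    x = [1.0 if c == base else 0.0 for c in seq]
    mean = sum(x) / len(x)
    n = len(x) - lag
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n)) / n

rng = random.Random(0)
uncorrelated = random_sequence(20000, rng)
markov = markov_sequence(20000, 0.7, rng)
for lag in (1, 2, 4, 8):
    print(lag,
          round(autocovariance(uncorrelated, "A", lag), 4),
          round(autocovariance(markov, "A", lag), 4))
```

A long-range correlated sequence cannot be produced by a finite-memory rule of this kind, which is why it needs a different construction.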
Scale-invariant processes in genomic sequences
[Figure: a DNA sequence viewed at three scales: L = 10 nt, 10 L = 100 nt and 100 L = 1000 nt]
Nucleotide compositions are correlated with each other in the same manner whatever the scale.
Short-range correlations: processes described by Markov chains
• a nucleotide depends only on the adjacent nucleotides
• the correlation function is $C(L) \sim \exp(-L/L_0)$
• characteristic length $L_0$: at distances larger than $L_0$, the nucleotides have no influence
• invariance by translation
Long-range correlations: scale-invariant processes
• "zooming" on the sequence does not change the shape of the correlation function
• the correlation function is a power law: $C(L) \sim L^{-\gamma}$, so that $C(kL) \sim k^{-\gamma} L^{-\gamma}$
• no characteristic scale
• invariance by dilation, fractal structure
These particular correlation properties differ from repeated motifs or periodic patterns.
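The difference between the two behaviours can be made explicit with a short worked comparison. The symbol $\gamma$ for the power-law exponent and the prefactor $A$ are notational assumptions introduced here; $L_0$ is the characteristic length of the short-range case.

```latex
% Power law: a dilation L -> kL only rescales the amplitude
C(L) = A\,L^{-\gamma}
\;\Longrightarrow\;
\frac{C(kL)}{C(L)} = k^{-\gamma} \quad \text{(independent of } L\text{)}

% Exponential: the same dilation changes the shape
C(L) = A\,e^{-L/L_0}
\;\Longrightarrow\;
\frac{C(kL)}{C(L)} = e^{-(k-1)L/L_0} \quad \text{(depends on } L\text{)}
```

In the power-law case the ratio is the same at every distance, which is what "zooming does not change the shape" means; in the exponential case the ratio depends on how $L$ compares with $L_0$.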
What biological mechanisms?
[Figure: gene structure with exons and introns; duplications]
• Long-range correlations:
  - observed in introns (non-coding regions)
  - absent in exons (coding regions)
• Long-range correlations can be generated by genome dynamics:
  - expansion-modification systems (duplication-mutation systems, Li, 1991)
  - oligonucleotide repeats (Li and Kaneko, 1992)
  - insertion-deletion of pseudogenes (Buldyrev et al., 1993)
  - tandem repeats (Dokholyan et al., 97; Li et al., 98)
FIRST HYPOTHESIS: long-range correlations are a consequence of genome dynamics
Processes without a characteristic length scale: duplication-mutation
[Figure: a DNA fragment undergoing duplications and mutations over time]
If the duplication rate is high (e.g. 0.9) and the mutation rate is small (0.1), then as the sequence becomes longer and longer it exhibits long-range correlations (1/f power spectra).
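A minimal sketch of such a duplication-mutation process is given below. It is one simple reading of the rule (at each generation every symbol is either flipped or duplicated) on a two-letter alphabet, and is not claimed to reproduce the exact model of the cited work. The function names, the seed and the target length are assumptions.

```python
import random

def expansion_modification(n_min, p_mut, rng):
    """Grow a binary sequence until it has at least n_min symbols.
    At each generation every symbol is mutated (flipped) with probability
    p_mut, and duplicated otherwise."""
    seq = [0]
    while len(seq) < n_min:
        nxt = []
        for s in seq:
            if rng.random() < p_mut:
                nxt.append(1 - s)      # mutation: the symbol is flipped
            else:
                nxt.extend((s, s))     # duplication: the symbol is copied
        seq = nxt
    return seq

def autocorrelation(x, lag):
    """Normalized autocorrelation of a numeric sequence at distance `lag`."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    n = len(x) - lag
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n)) / n
    return cov / var

rng = random.Random(0)
seq = expansion_modification(20000, 0.1, rng)   # duplication 0.9, mutation 0.1
for lag in (1, 4, 16, 64):
    print(lag, round(autocorrelation(seq, lag), 3))
```

Because duplications keep stretching old patterns to larger distances, no single length scale is singled out, in contrast with a Markov chain.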
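Quantification of LRC
Sequence length = 260 000 nt (50 % purines), Pu/Pyr coding
[Figure: purine content $C_{Pu}$ of an uncorrelated sequence and of a correlated sequence, computed in windows of w1 = 32 bp and w2 = 512 bp]
$$\frac{\sigma(w_1)}{\sigma(w_2)} = \left(\frac{w_1}{w_2}\right)^{H-1}$$
where $\sigma(w)$ denotes the amplitude of the fluctuations of the signal at window size $w$.
• uncorrelated sequence: $(32/512)^{0.5-1} = 4$
• correlated sequence: $(32/512)^{0.9-1} = 1.3$
Roughness exponent $H$: $H = 0.5$, no LRC; $H > 0.5$, LRC
$$\log \sigma(w) = (H-1)\log w + \mathrm{Cte}$$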
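The two ratios quoted on the slide, and the way $H$ can be recovered from them, can be checked with a short sketch. It assumes that $\sigma(w)$ is the standard deviation of the signal averaged over non-overlapping windows of size $w$; the function names and the random test signal are illustrative.

```python
import math
import random

def window_std(signal, w):
    """Standard deviation of the signal averaged over non-overlapping
    windows of size w (assumed meaning of sigma(w))."""
    means = [sum(signal[i:i + w]) / w
             for i in range(0, len(signal) - w + 1, w)]
    mu = sum(means) / len(means)
    return math.sqrt(sum((m - mu) ** 2 for m in means) / len(means))

def estimate_h(signal, w1, w2):
    """Solve sigma(w1) / sigma(w2) = (w1 / w2) ** (H - 1) for H."""
    ratio = window_std(signal, w1) / window_std(signal, w2)
    return 1.0 + math.log(ratio) / math.log(w1 / w2)

# Ratios expected from the scaling law for H = 0.5 and H = 0.9.
ratio_uncorrelated = (32 / 512) ** (0.5 - 1)
ratio_correlated = (32 / 512) ** (0.9 - 1)
print(round(ratio_uncorrelated, 2), round(ratio_correlated, 2))

# Uncorrelated purine (+1) / pyrimidine (-1) signal of 260 000 nt.
rng = random.Random(1)
purine_signal = [rng.choice((1, -1)) for _ in range(260000)]
h_uncorrelated = estimate_h(purine_signal, 32, 512)
print(round(h_uncorrelated, 2))
```

For the uncorrelated signal the estimate should come out close to 0.5, up to statistical fluctuations.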
Properties of LRC
Sequence length = 260 000 nt (50 % purines), Pu/Pyr coding
[Figure: an uncorrelated sequence (H = 0.5, no LRC) and a correlated sequence (H > 0.5, LRC) shown at window sizes w = 1 bp, 32 bp and 512 bp]
H > 0.5: persistence (small "roughness")
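A unique way to display results
$$\frac{\sigma(w_1)}{\sigma(w_2)} = \left(\frac{w_1}{w_2}\right)^{H-1}, \qquad \log \sigma(w) = (H-1)\log w + \mathrm{Cte}$$
1 - A straight line indicates scale-invariance properties
2 - The slope gives the roughness exponent H
[Figure: log-log plots versus $\log_2 w$ for H = 0.8 and H = 0.5, shown both directly and after subtracting $0.6 \log_2 w$]
H = 0.5: no LRC; H > 0.5: LRC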
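Reading $H$ from the slope of a log-log plot amounts to a straight-line fit. The sketch below fits $\log_2 \sigma(w)$ against $\log_2 w$ by least squares and returns slope + 1, following the relation $\log \sigma(w) = (H-1)\log w + \mathrm{Cte}$. The definition of $\sigma(w)$ as a window-averaged standard deviation, the list of scales and the test signal are assumptions.

```python
import math
import random

def window_std(signal, w):
    """Standard deviation of the signal averaged over non-overlapping windows."""
    means = [sum(signal[i:i + w]) / w
             for i in range(0, len(signal) - w + 1, w)]
    mu = sum(means) / len(means)
    return math.sqrt(sum((m - mu) ** 2 for m in means) / len(means))

def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def roughness_exponent(signal, scales):
    """Slope of log2 sigma(w) versus log2 w, plus one."""
    xs = [math.log2(w) for w in scales]
    ys = [math.log2(window_std(signal, w)) for w in scales]
    return fit_slope(xs, ys) + 1.0

rng = random.Random(2)
signal = [rng.choice((1, -1)) for _ in range(65536)]   # uncorrelated signal
scales = [8, 16, 32, 64, 128, 256, 512]
h_estimate = roughness_exponent(signal, scales)
print(round(h_estimate, 2))
```

Using several scales instead of two also shows whether the points really fall on a straight line, which is the scale-invariance check listed as point 1.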
A WAY TO MEASURE: THE WAVELET TRANSFORM (A. Grossmann & J. Morlet, 1984)
Computation of the wavelet coefficients:
$$T_\psi[f](x_0, w) = \frac{1}{w}\int_{-\infty}^{+\infty} f(x)\,\psi\!\left(\frac{x - x_0}{w}\right) dx$$
$x_0$: space parameter; $w$: scale parameter
Advantage: elimination of composition biases
The wavelet transform eliminates composition biases
[Figure: analyzing wavelets g(1) and g(2)]
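Why a constant composition bias drops out can be seen on a discrete version of the transform. In this sketch the analyzing wavelet is assumed to be the first derivative of a Gaussian, which has zero mean; the truncation width, the test signal and the size of the bias are illustrative choices.

```python
import math

def g1(u):
    """First derivative of a Gaussian: an analyzing wavelet with zero mean."""
    return -u * math.exp(-u * u / 2.0)

def wavelet_coefficient(f, x0, w, half_width=5):
    """Discrete version of T[f](x0, w) = (1/w) * sum_x f(x) * psi((x - x0) / w),
    truncated to |x - x0| <= half_width * w."""
    lo = max(0, x0 - half_width * w)
    hi = min(len(f) - 1, x0 + half_width * w)
    return sum(f[x] * g1((x - x0) / w) for x in range(lo, hi + 1)) / w

# Square-wave test signal and the same signal with a constant offset (bias).
signal = [1 if (x // 50) % 2 == 0 else -1 for x in range(1000)]
biased = [v + 0.3 for v in signal]

for x0 in (400, 425, 450):
    a = wavelet_coefficient(signal, x0, 8)
    b = wavelet_coefficient(biased, x0, 8)
    print(x0, round(a, 6), round(b, 6))
```

The coefficients of the two signals coincide because the wavelet integrates to zero, so any constant added to the signal contributes nothing.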
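Quantification of LRC
[Figure: DNA, its coding signal and the wavelet transform (WT) of the signal at scales w = 8 bp and w = 128 bp; the wavelet coefficients are large or small depending on position]
$$T_\psi[f](x_0, w) = \frac{1}{w}\int_{-\infty}^{+\infty} f(x)\,\psi\!\left(\frac{x - x_0}{w}\right) dx$$
[Figure: log-log plot after subtracting $0.6 \log_2 w$, versus $\log_2 w$, with reference values H = 0.8 and H = 0.5]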
Presence of LRC in exonic sequences (human)
[Figure: curves for introns, exons (high GC) and all exons]
Two regimes of LRC: S. cerevisiae
[Figure: three panels, each showing regime I with H = 0.6 and regime II with H = 0.8]
Two regimes of LRC
• E. coli: regime I, H = 0.5; regime II, H = 0.8
• Human: regime I, H = 0.6; regime II, H = 0.8
Nucleosomes?
Presence of LRC in exonic sequences: necessity of a new hypothesis
STRUCTURAL HYPOTHESIS: the LRC are associated with the bending of DNA in nucleosomes
Long-range correlations between DNA bending sites?
Test: existence of long-range correlations between di- and trinucleotides associated with DNA bending in nucleosomes?
- nucleosomal DNA bending table (Pnuc) -> LRC? (Andrew & Travers, 1986)
Control:
- DNase bending table (DNase) -> no LRC? (Satchwell et al., 1995)
- eubacteria (no nucleosomes) -> no LRC?
Nucleosome-based bending table (Pnuc)
• chromatin fiber
• nuclease digestion of linker DNA
• released nucleosomes
• dissociation of histones
• 146-nucleotide DNA fragments
• cloning and sequencing of nucleosomal DNA
• sequence analysis of aligned nucleosomal fragments (Fourier transform)
• Pnuc table
DNase I bending table (DNase)
• DNase I induces bending; DNase activity is favoured by DNA flexibility
• digestion of known DNA fragments by DNase I
• measurement of cutting efficiency along the DNA molecule
• sequence analysis of the cutting profile
• DNase table
A-tracts preferred here (minor groove inside) (Luger et al., Nature, 1997)
Analysis of nucleosomal DNA
[Figure: AAA frequency versus position (0 to 60), followed by Fourier analysis]
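The Fourier step can be illustrated on a positional frequency profile. The sketch below computes a plain discrete Fourier transform and reports the period of the strongest component. The profile is synthetic and its period of 10 positions is an assumption made for the example, not data from the slide.

```python
import cmath
import math

def dominant_period(profile):
    """Period (in positions) of the strongest non-constant Fourier component."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[x] * cmath.exp(-2j * math.pi * k * x / n)
                    for x in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic trinucleotide-frequency profile (made-up numbers, assumed period 10).
profile = [1.0 + 0.5 * math.cos(2 * math.pi * x / 10) for x in range(140)]
period = dominant_period(profile)
print(period)
```

On real aligned fragments the profile would be the observed frequency of a given trinucleotide at each position, and the same function would pick out its main periodicity.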
Different ways of coding sequences
DNA sequence (text) -> coding -> signal (profile) -> treatment -> H
• Mononucleotide: A T G A T C -> +1 -1 -1 +1 -1 -1
• Nucleosomal profile (Pnuc): A T G A T C -> 6.7, 5.4
• Flexibility profile (DNase): A T G A T C -> 8.7, 10
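Turning the DNA text into a numeric signal can be sketched as two small functions. The mononucleotide coding reproduces the +1/-1 example for A on the sequence ATGATC. The trinucleotide coding looks up each 3-nt window in a table; the table used here is a toy with made-up values, not the Pnuc or DNase table, and the function names are assumptions.

```python
def mononucleotide_coding(seq, base="A"):
    """+1 where the nucleotide equals `base`, -1 elsewhere."""
    return [1 if c == base else -1 for c in seq]

def trinucleotide_coding(seq, table):
    """Slide a 3-nt window along the sequence and look up each
    trinucleotide in `table` (a dict from trinucleotide to value)."""
    return [table[seq[i:i + 3]] for i in range(len(seq) - 2)]

# Toy table with made-up values, for illustration only.
toy_table = {"ATG": 1.0, "TGA": 2.0, "GAT": 3.0, "ATC": 4.0}

print(mononucleotide_coding("ATGATC"))
print(trinucleotide_coding("ATGATC", toy_table))
```

Either signal can then be passed to the same scaling analysis, so that the value of H obtained depends only on the choice of coding.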
[Figure, panel 1: Human; Pnuc, DNase and random-table codings; regime I, H = 0.6; regime II, H = 0.8]
[Figure, panel 2: Human (chr 21); DNase coding; regime I, H = 0.5; regime II, H = 0.8]
H (Pnuc) > H (DNase)
EUKARYOTES: Human, D. melanogaster, C. elegans, A. thaliana
EUBACTERIA: H. influenzae, M. pneumoniae, B. subtilis, Synechocystis
DNA viruses
• Bacteriophages: T4, Lambda, SPBc2
• Animal viruses: Adenovirus, Herpesvirus, M. sanguinipes (Pox)
RNA viruses
• ss RNA (-)
• ss RNA (+)
• ds RNA
Retroviruses
• Spumavirus
• MMTV
• HIV (1, 2)
New test of the structural hypothesis
• A's present LRC
• A tracts induce DNA curvature
• are these LRC specific to A tracts?
Test: A tracts (curvature) -> LRC?
Control: isolated A -> LRC?
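The test and its control need two codings that separate A's inside tracts from isolated A's. A minimal sketch follows, assuming that an "A tract" means a run of at least two consecutive A's; this definition and the function name are assumptions.

```python
def a_codings(seq):
    """Return two +1/-1 signals of the same length as seq:
    tract    : +1 for an A that has at least one A neighbour,
    isolated : +1 for an A with no A neighbour."""
    tract, isolated = [], []
    for i, c in enumerate(seq):
        left = i > 0 and seq[i - 1] == "A"
        right = i + 1 < len(seq) and seq[i + 1] == "A"
        is_a = c == "A"
        tract.append(1 if is_a and (left or right) else -1)
        isolated.append(1 if is_a and not (left or right) else -1)
    return tract, isolated

tract_signal, isolated_signal = a_codings("GAATCAG")
print(tract_signal)
print(isolated_signal)
```

Comparing the roughness exponent obtained from the two signals is then the test (tracts) against the control (isolated A).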
Human (chr 21)
[Figure: curves for the A, AA, Pnuc, DNase and Aiso (isolated A) codings]
LRC are associated with A tracts, not with isolated A
[Figure: curves for the A, Pnuc, AA, DNase and Aiso codings]
Structural hypothesis: LRC are associated with DNA curvature
Question: to what extent does the DNA sequence contribute to its own packaging into nucleosomes?
Contradictory answers:
- Nucleosomal DNA is "periodic" (Drew & Travers, 1985, JMB; Bina, 1994, JMB)
- Affinity of eukaryotic DNA for the histone octamer (Lowary & Widom, 1997, JMB): 5 % of genomic sequences have a strong affinity; 95 % of bulk genomic DNA behaves like random DNA
Two types of nucleosomes:
I - strongly bound: periodic distribution of bending sites (5 % of genomic DNA)
II - weakly bound: same bending sites, "apparently random" (95 % of genomic DNA)
Model
For most nucleosomes (weakly bound), the bending sites are distributed with long-range correlations. The persistent nature of the distribution of bending sites favours the dynamics of nucleosome formation and diffusion: displacement requires less energy, as in super-diffusive processes. This organisation of genome sequences favours dynamical processes.
Distribution of bending sites along the DNA:
• periodic: H not defined
• long-range correlations (persistence), weakly bound nucleosomes: H > 0.5
Human globin locus (70 kb)
[Figure: analysis along the locus, with the positions of the globin genes; horizontal axis in bp]
Few bacteria present LRC in the 0 - 200 nt range
Hypothesis: is there a DNA packaging in the 0 - 200 nt range that is specific to these bacteria?
Presence of LRC (in the 0 - 200 nt range) in archaebacteria
[Figure: Archaeoglobus fulgidus]
The Pnuc coding is not the one that best "extracts" LRC in archaebacteria
[Figure: Archaeoglobus fulgidus]
Aeropyrum pernix (56.3% GC) Sulfolobus solfataricus (35.8% GC)
Conclusion
Long-range correlations between DNA bending sites in the 10-200 nt range are a signature of nucleosomes.
Model
The persistent nature of the distribution of bending sites favours the dynamics of chromatin.
Perspectives
Find the DNA structural codings (related to DNA packaging?) that better "extract" the LRC in genomic sequences.
• Samuel Nicolay, Cédric Vaillant, Alain Arnéodo (ENS-Lyon)
• Benjamin Audit (EMBL-EBI, Cambridge)
• Marie Touchon, Yves d'Aubenton-Carafa, C. Thermes (CGM, Gif sur Yvette)