
Long-range correlations in genomic sequences

Long-range correlations in genomic sequences: relation to the structure and dynamics of nucleosomes. Multi-scale analysis of genomes. Study of long-range correlations in genomes.


Presentation Transcript


  1. Long-range correlations in genomic sequences: relation to the structure and dynamics of nucleosomes. Multi-scale analysis of genomes

  2. Study of long-range correlations in genomes
     • Study of the "global properties" of genomic sequences (Voss, 92; Peng et al., 92):
       • the identity of a nucleotide depends on that of other nucleotides at large distances (up to kb)
       • long-range correlations are observed in introns (non-coding) but not in exons (coding)
       • methodological controversy (compositional heterogeneity of genomes)
     • Proposed biological mechanisms (genome dynamics):
       • replication-mutation (Li, 93)
       • insertion-deletion (Buldyrev et al., 93)
       • tandem repeats (Dokholyan et al., 97; Li et al., 98)

  3. What are long-range correlations in DNA sequences?
     • Construction of a random sequence: how the new nucleotide is chosen
       • without correlations: …ACAGTACT G (does not depend on the other nucleotides)
       • with short-range correlations: …CGATTAAC A (depends on a few neighbouring nucleotides; Markov chains)
       • with long-range correlations: …TCCGACGG A (depends on all nucleotides over large distances, with a power-law correlation function, a power of 1/d)
     • In genomic sequences these correlation properties can extend over tens to thousands of bp
     • Long-range correlations are scale-invariant

  4. Scale-invariant processes in genomic sequences. [Figure: a DNA sequence viewed in windows of L = 10 nt, 10 L = 100 nt and 100 L = 1000 nt.] Nucleotide compositions are correlated with each other in the same manner whatever the scale.

  5. Short-range correlations: processes described by Markov chains
     • a nucleotide depends only on the adjacent nucleotides
     • the correlation function is exponential: $C(L) \sim \exp(-L/L_0)$
     • characteristic length $L_0$: at distances larger than $L_0$, the nucleotides have no influence
     • invariance by translation
     Long-range correlations: scale-invariant processes
     • "zooming" on the sequence does not change the shape of the correlation function
     • the correlation function is a power law: $C(L) \sim L^{-\gamma}$, so that $C(kL) \sim k^{-\gamma} L^{-\gamma}$
     • no characteristic scale
     • invariance by dilation; fractal structure
     These particular correlation properties differ from repeated motifs or periodic patterns.
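The contrast between the two correlation functions on this slide can be written out explicitly. The short derivation below is only a sketch: the amplitude $A$ and the exponent symbol $\gamma$ are notational assumptions (the exponent symbol did not survive in the transcript), while $L_0$ and $k$ are the slide's own symbols.

```latex
\begin{align*}
% Power law: a change of scale L -> kL only rescales the amplitude
C(L) = A\,L^{-\gamma}
  &\;\Longrightarrow\;
  C(kL) = A\,(kL)^{-\gamma} = k^{-\gamma}\,C(L) \\
% Exponential: the same change of scale alters the shape, because L_0 sets a scale
C(L) = A\,e^{-L/L_0}
  &\;\Longrightarrow\;
  C(kL) = A\,e^{-kL/L_0} = C(L)\,e^{-(k-1)L/L_0}
\end{align*}
```

In the power-law case the factor relating $C(kL)$ to $C(L)$ does not depend on $L$, which is what "zooming does not change the shape" means; in the exponential case the factor still depends on $L$ through $L_0$.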

  6. What biological mechanisms? [Figure: exons and introns along a gene; duplications.]
     • Long-range correlations:
       • observed in introns (non-coding regions)
       • absent in exons (coding regions)
     • Long-range correlations can be generated by genome dynamics:
       • expansion-modification systems (duplication-mutation systems, Li, 1991)
       • oligonucleotide repeats (Li and Kaneko, 1992)
       • insertion-deletion of pseudogenes (Buldyrev et al., 1993)
       • tandem repeats (Dokholyan et al., 97; Li et al., 98)
     FIRST HYPOTHESIS: long-range correlations are a consequence of genome dynamics

  7. Processes without a characteristic length scale: duplication-mutation. [Figure: a DNA fragment undergoing duplications and mutations over time.] If the duplication rate is high (e.g. 0.9) and the mutation rate is low (e.g. 0.1), the sequence exhibits long-range correlations (1/f power spectra) as it becomes longer and longer.
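The duplication-mutation process on this slide can be illustrated with a toy simulation. The sketch below assumes a two-letter alphabet and assumes that, at each pass, every symbol is either duplicated (probability 0.9) or flipped (probability 0.1). The function name `duplication_mutation` and its parameters are hypothetical, and this is not claimed to be the exact model used by the authors.

```python
import random

def duplication_mutation(n_steps, p_dup=0.9, seed=0):
    """Grow a binary sequence by repeated duplication-or-mutation passes."""
    rng = random.Random(seed)
    seq = [0]
    for _ in range(n_steps):
        nxt = []
        for s in seq:
            if rng.random() < p_dup:
                nxt.extend([s, s])   # duplication: the symbol is copied twice
            else:
                nxt.append(1 - s)    # mutation: the symbol is flipped
        seq = nxt
    return seq

seq = duplication_mutation(12)
print(len(seq), sum(seq) / len(seq))  # sequence length and fraction of 1s
```

Because each pass roughly doubles the sequence, blocks created early are stretched over ever larger distances, which is the intuitive reason no single length scale is singled out.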

  8. H - 1 1 w1 ( ) 2 w2 Quantification of LRC Sequence length = 260 000 nt (50 % purines) uncorrelated correlated sequence sequence w1 = 32 pb w1 = 32 pb Pu/Pyr coding CPu w1 = 32 pb w2 = 512 pb w2 = 512 pb Pu/Pyr coding CPu w2 = 512 pb (32/512)0.5-1= 4 (32/512)0.9-1= 1.3 roughness exponent H H = 0.5  No LRC H > 0.5 LRC log = (H-1) logw+ Cte

  9. Properties of LRC. Sequence length = 260 000 nt (50 % purines). [Figure: Pu/Pyr coding of an uncorrelated and a correlated sequence at w = 1 bp, w = 32 bp and w = 512 bp.] H = 0.5: no LRC. H > 0.5: LRC, persistence (small "roughness").

  10. H- 1 1 w1 ( ) log = (H-1) logw+ Cte 2 w2 H=0.8 log2 (wtw) - 0.6 log2w H=0.5 log2w A unique way to display results 1 - Straight line scale invariance properties 2 - The slope gives the roughness exponent H H=0.8 log2 (wtw) H=0.5 log2w H = 0.5 NO LRC H > 0.5 LRC

  11. A way to measure the fluctuations at scale w: the wavelet transform (A. Grossmann & J. Morlet, 1984). Computation of the wavelet coefficients: $T_\psi[f](x_0, w) = \frac{1}{w} \int_{-\infty}^{+\infty} f(x)\, \psi\!\left(\frac{x - x_0}{w}\right) dx$, where $x_0$ is the space parameter and $w$ the scale parameter. Advantage: elimination of composition biases.

  12. The wavelet transform eliminates the composition biases. [Figure: analysing wavelets g(1) and g(2).]
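How a wavelet removes composition biases can be illustrated with a small numerical sketch. It assumes that g(1) and g(2) denote the first and second derivatives of a Gaussian, and it discretises the transform of slide 11 as a plain sum over integer positions. The function names and the truncation parameter `half_width` are hypothetical choices, not the authors' implementation.

```python
import math

def g1(u):
    """First derivative of a Gaussian (assumed meaning of g(1))."""
    return -u * math.exp(-u * u / 2)

def g2(u):
    """Second derivative of a Gaussian (assumed meaning of g(2))."""
    return (u * u - 1) * math.exp(-u * u / 2)

def wavelet_coefficient(f, x0, w, psi, half_width=5):
    """Discrete version of T[f](x0, w) = (1/w) * sum_x f(x) * psi((x - x0) / w)."""
    lo = max(0, x0 - half_width * w)
    hi = min(len(f), x0 + half_width * w + 1)
    return sum(f[x] * psi((x - x0) / w) for x in range(lo, hi)) / w

constant = [0.7] * 2001                   # uniform composition offset
ramp = [0.001 * x for x in range(2001)]   # slowly drifting composition

print(wavelet_coefficient(constant, 1000, 16, g1))  # close to 0: g1 ignores a constant offset
print(wavelet_coefficient(constant, 1000, 16, g2))  # close to 0: g2 ignores it too
print(wavelet_coefficient(ramp, 1000, 16, g2))      # close to 0: g2 also ignores a linear drift
print(wavelet_coefficient(ramp, 1000, 16, g1))      # clearly non-zero: g1 still sees the slope
```

Under these assumptions, g(1) is blind to a constant offset and g(2) is additionally blind to a linear trend, so slow changes in base composition contribute little to the measured fluctuations.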

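  13. Quantification of LRC. [Figure: the DNA sequence is coded into a signal, and its wavelet transform (WT) is computed at scales w = 8 bp and w = 128 bp; the wavelet coefficients are large or small depending on the local signal.] $T_\psi[f](x_0, w) = \frac{1}{w} \int_{-\infty}^{+\infty} f(x)\, \psi\!\left(\frac{x - x_0}{w}\right) dx$. [Figure: log2 σ(w) - 0.6 log2 w against log2 w, with H = 0.8 and H = 0.5 marked, up to 128 bp.]

  14. Presence of LRC in exonic sequences (human). [Figure panels: intron; exon (high GC); all exons.]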

  15. Presence of LRC in exonic sequences

  16. Two regimes of LRC (S. cerevisiae). [Figure: three panels, each showing regime I with H = 0.6 and regime II with H = 0.8.]

  17. Two regimes of LRC. E. coli: regime I, H = 0.5; regime II, H = 0.8. Human: regime I, H = 0.6; regime II, H = 0.8. Nucleosomes?

  18. Two regimes of LRC. E. coli: regime I, H = 0.5; regime II, H = 0.8.

  19. Presence of LRC in exonic sequences: necessity of a new hypothesis.
     STRUCTURAL HYPOTHESIS: the LRC are associated with the bending of DNA in nucleosomes. Long-range correlations between DNA bending sites?
     Test: do long-range correlations exist between the di- and trinucleotides associated with DNA bending in nucleosomes?
     • nucleosomal DNA bending table (Pnuc) -> LRC? (Andrew & Travers, 1986)
     Control:
     • DNase bending table (DNase) -> no LRC? (Satchwell et al., 1995)
     • eubacteria (no nucleosomes) -> no LRC?

  20. Nucleosome-based bending table (Pnuc): chromatin fiber -> nuclease digestion of linker DNA -> released nucleosomes -> dissociation of histones -> 146-nucleotide DNA fragments -> cloning and sequencing of nucleosomal DNA -> sequence analysis of aligned nucleosomal fragments (Fourier transform) -> Pnuc table.
     DNase I bending table (DNase): DNase I induces bending, and DNase activity is favoured by DNA flexibility. Digestion of known DNA fragments by DNase I -> measurement of cutting efficiency along the DNA molecule -> sequence analysis of the cutting profile -> DNase table.

  21. A-tracts preferred here (minor groove inside) (Luger et al., Nature, 1997)

  22. Analysis of nucleosomal DNA. [Figure: AAA frequency against position (0 to 60) along nucleosomal DNA.] Fourier analysis.

  23. Different ways of coding sequences: DNA sequence (text) -> coding -> signal (profile) -> treatment -> H
     • Mononucleotide coding: A T G A T C -> +1 -1 -1 +1 -1 -1
     • Nucleosomal profile (Pnuc): A T G A T C -> 6.7 5.4
     • Flexibility profile (DNase): A T G A T C -> 8.7 10
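The codings listed on this slide turn a DNA text into a numerical profile. A minimal sketch follows, assuming that the mononucleotide coding assigns +1 to A and -1 to every other base (as the slide's example suggests), and that a table-based coding assigns one value per trinucleotide. `TOY_TABLE` holds placeholder values only; it is not the published Pnuc or DNase table, and the function names are hypothetical.

```python
def mononucleotide_profile(seq, base="A"):
    """+1 where the chosen base occurs, -1 elsewhere."""
    return [1 if nt == base else -1 for nt in seq.upper()]

def trinucleotide_profile(seq, table, default=0.0):
    """One table value per overlapping trinucleotide along the sequence."""
    seq = seq.upper()
    return [table.get(seq[i:i + 3], default) for i in range(len(seq) - 2)]

# Placeholder values for illustration only (not the Pnuc or DNase tables).
TOY_TABLE = {"ATG": 1.0, "TGA": 2.0, "GAT": 3.0, "ATC": 4.0}

print(mononucleotide_profile("ATGATC"))            # [1, -1, -1, 1, -1, -1]
print(trinucleotide_profile("ATGATC", TOY_TABLE))  # [1.0, 2.0, 3.0, 4.0]
```

Either profile can then be passed to the same scaling analysis, so that differences in H reflect the coding rather than the method.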

  24. Pnuc DNase

  25. [Figure, human: Pnuc, DNase and random-table codings; regime I, H = 0.6; regime II, H = 0.8.] [Figure, human (chr 21), DNase coding: regime I, H = 0.5; regime II, H = 0.8.] H (Pnuc) > H (DNase)

  26. EUKARYOTES Human D. melanogaster C. elegans A. thaliana

  27. EUKARYOTES: Human, D. melanogaster, C. elegans, A. thaliana. EUBACTERIA: H. influenzae, M. pneumoniae, B. subtilis, Synechocystis

  28. DNA viruses. Bacteriophages: T4, Lambda, SPBc2

  29. DNA viruses. Bacteriophages: T4, Lambda, SPBc2. Animal viruses: Adenovirus

  30. DNA viruses. Bacteriophages: T4, Lambda, SPBc2. Animal viruses: Adenovirus, Herpesvirus

  31. DNA viruses. Bacteriophages: T4, Lambda, SPBc2. Animal viruses: Adenovirus, Herpesvirus, M. sanguinipes (Pox)

  32. RNA viruses. [Figure panels: SS RNA (-), SS RNA (+), dS RNA.]

  33. RNA viruses. Retroviruses: Spumavirus. [Figure panels: SS RNA (-), SS RNA (+), dS RNA.]

  34. RNA viruses. Retroviruses: Spumavirus, MMTV, HIV (1,2). [Figure panels: SS RNA (-), SS RNA (+), dS RNA.]

  35. RNA viruses. Retroviruses: Spumavirus, MMTV, HIV (1,2). [Figure panels: SS RNA (-), SS RNA (+), dS RNA, Retroviruses.]

  36. New test of the structural hypothesis
     • A's present LRC
     • A tracts induce DNA curvature
     • are these LRC specific to A tracts?
     All A: LRC+. Test: A tracts (curvature) -> LRC? Control: isolated A -> LRC?
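The test and control codings on this slide can be sketched as three binary profiles built from the same sequence. The definition used below is an assumption: an A counts as part of a tract when at least one neighbouring base is also A, and as isolated otherwise. The function name `a_codings` is hypothetical.

```python
def a_codings(seq):
    """Return three +1/-1 profiles: every A, A inside tracts, isolated A."""
    seq = seq.upper()
    n = len(seq)
    all_a, tract_a, iso_a = [], [], []
    for i, nt in enumerate(seq):
        is_a = nt == "A"
        # An A belongs to a tract if the previous or the next base is also A.
        has_a_neighbour = (i > 0 and seq[i - 1] == "A") or (i + 1 < n and seq[i + 1] == "A")
        all_a.append(1 if is_a else -1)
        tract_a.append(1 if is_a and has_a_neighbour else -1)
        iso_a.append(1 if is_a and not has_a_neighbour else -1)
    return all_a, tract_a, iso_a

all_a, tract_a, iso_a = a_codings("GAATCAG")
print(all_a)    # [-1, 1, 1, -1, -1, 1, -1]
print(tract_a)  # [-1, 1, 1, -1, -1, -1, -1]
print(iso_a)    # [-1, -1, -1, -1, -1, 1, -1]
```

Comparing the scaling exponent obtained from the tract profile with that from the isolated-A profile is then the test-versus-control comparison the slide describes.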

  37. Human (chr 21). [Figure: codings A, AA, Pnuc, DNase and Aiso (isolated A).] LRC are associated with A tracts, not with isolated A.

  38. [Figure: codings A, AA, Pnuc, DNase and Aiso.] Structural hypothesis: LRC are associated with DNA curvature.

  39. Question: to what extent does the sequence of DNA contribute to its own packaging into nucleosomes?
     Contradictory answers:
     • Nucleosomal DNA is "periodic" (Drew & Travers, 1985, JMB; Bina, 1994, JMB)
     • Affinity of eukaryotic DNA for the histone octamer (Lowary & Widom, 1997, JMB): 5 % of genomic sequences have a strong affinity; 95 % of bulk genomic DNA behaves like random DNA
     Two types of nucleosomes:
     • I - strongly bound: periodic distribution of bending sites (5 % of genomic DNA)
     • II - weakly bound: same bending sites, "apparently random" (95 % of genomic DNA)

  40. Model. For most nucleosomes (weakly bound), the bending sites are distributed with long-range correlations. The persistent nature of the distribution of bending sites favours the dynamics of nucleosome formation and diffusion: displacement requires less energy, as in super-diffusive processes. This organisation of genome sequences favours dynamical processes. [Figure: DNA with periodic bending sites (H not defined) and with weakly bound nucleosomes (long-range correlations, persistence, H > 0.5).]

  41. Human globin locus (70 kb). [Figure: globin genes along the locus; position in bp.]

  42. Presence of LRC in organelles

  43. Few bacteria present LRC in the 0 - 200 nt range. Hypothesis: is DNA packaging in the 0 - 200 nt range specific to these bacteria?

  44. Presence of LRC (in the 0 - 200 nt range) in archaebacteria Archaeoglobus fulgidus G

  45. The Pnuc coding does not best « extract » LRC in archaebacteria Archaeoglobus fulgidus G

  46. Aeropyrum pernix (56.3% GC)

  47. Aeropyrum pernix (56.3% GC) Sulfolobus solfataricus (35.8% GC)

  48. Conclusion: long-range correlations between DNA bending sites in the 10-200 nt range are a signature of nucleosomes. Model: the persistent nature of the distribution of bending sites favours the dynamics of chromatin. Perspectives: find the DNA structural codings (related to DNA packaging?) that better "extract" the LRC in genomic sequences.

  49. Samuel Nicolay, Cédric Vaillant, Alain Arnéodo (ENS-Lyon) • Benjamin Audit (EMBL-EBI, Cambridge) • Marie Touchon, Yves d'Aubenton-Carafa, C. Thermes (CGM, Gif sur Yvette)
