Replication associated strand asymmetries in mammalian genomes: in silico detection of replication origins. Samuel Nicolay, Benjamin Audit, Edward Brodie of Brodie, Alain Arneodo (ENS-Lyon). Maxime Huvet, Marie Touchon, Yves d'Aubenton-Carafa, Claude Thermes (CGM, Gif sur Yvette).
Supports: CNRS, ACI IMPBio, ANR
« SECOND PARITY RULE »
Long genome sequence fragments tend to show, on the same strand: $f_A = f_T$ and $f_G = f_C$
[Figure: two panels, Bacteria/Archaebacteria and Human chromosomes; axes A (Mb), T (Mb), G (Mb), C (Mb)]
LARGE SCALE PROPERTIES OF GENOMIC MUTATIONS
• Same mutation/repair processes on the 2 DNA strands
• Same values of complementary substitution rates
• At equilibrium, second parity rule (PR2): $f_A = f_T$ and $f_G = f_C$ (at large scales) (Chargaff, 1962; Sueoka, Lobry, 1995)
[Diagram: substitutions among G, A, T, C]
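As a rough illustration of the second parity rule, the base frequencies on one strand can be counted and compared. This is only a sketch: the function names `base_frequencies` and `pr2_deviation` and the toy sequence are assumptions made for the example, not part of the original analysis.

```python
from collections import Counter


def base_frequencies(seq):
    """Return the frequencies of A, C, G, T on the given strand (other symbols are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def pr2_deviation(seq):
    """Return (fA - fT, fG - fC); both values are close to 0 when PR2 holds."""
    f = base_frequencies(seq)
    return f["A"] - f["T"], f["G"] - f["C"]


# Hypothetical toy sequence; a real test would use a long genome fragment.
toy = "ATGCATGCTTAACCGG"
print(base_frequencies(toy))
print(pr2_deviation(toy))
```

On long fragments the two deviations are expected to be small; local departures from zero are the strand asymmetries studied in the rest of the talk.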
What mechanisms cause composition asymmetries?
REPLICATION: asymmetry of mutation/repair processes between leading and lagging strands
$S_{GC} = \frac{n_G - n_C}{n_G + n_C}$, $S_{TA} = \frac{n_T - n_A}{n_T + n_A}$
EUBACTERIA: G > C and T > A in the leading strand ($S_{GC} > 0$, $S_{TA} > 0$)
[Figure: replication origin with leading and lagging strands, 5’ and 3’ ends marked]
Composition asymmetry in procaryotes
Bacillus subtilis: $S_{GC} = \frac{n_G - n_C}{n_G + n_C}$ in 1 kb windows
Lagging strand: G < C; leading strand: G > C
[Figure: $S_{GC}$ versus position x ($10^6$ bp), with ORI and TER marked]
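The skews $S_{GC}$ and $S_{TA}$ can be sketched in a few lines of code. The version below assumes non-overlapping windows (the slides only state a 1 kb window size) and returns `None` when a denominator is zero; the function names are illustrative.

```python
def skews(seq):
    """Return (S_TA, S_GC) for one sequence; a skew is None when its denominator is 0."""
    s = seq.upper()
    n = {b: s.count(b) for b in "ACGT"}
    s_ta = (n["T"] - n["A"]) / (n["T"] + n["A"]) if n["T"] + n["A"] else None
    s_gc = (n["G"] - n["C"]) / (n["G"] + n["C"]) if n["G"] + n["C"] else None
    return s_ta, s_gc


def windowed_skews(seq, window=1000):
    """Return a list of (start, S_TA, S_GC) over consecutive non-overlapping windows.

    A trailing fragment shorter than `window` is dropped.
    """
    return [(i, *skews(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, window)]


# Hypothetical toy input with a tiny window, for illustration only.
print(windowed_skews("GGGCTTTA" + "CCCGAAAT", window=8))
```

Plotting the second and third fields against `start` gives profiles of the kind shown for Bacillus subtilis, where the sign of the skew changes at ORI and TER.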
What mechanisms cause composition asymmetries?
TRANSCRIPTION: asymmetry of mutation/repair processes between transcribed and non-transcribed strands
$S_{TA} = \frac{n_T - n_A}{n_T + n_A}$, $S_{GC} = \frac{n_G - n_C}{n_G + n_C}$
EUBACTERIA: G > C and T > A on the non-transcribed strand ($S_{TA} > 0$, $S_{GC} > 0$)
[Figure: RNA polymerase on a gene, with transcribed and non-transcribed strands, 5’ and 3’ ends marked]
Skew profiles associated to transcription and replication in Eubacteria
$S = S_{TA} + S_{GC}$
[Figure, three panels of S around ORI: transcriptional skew profile (genes on the (+) and (-) strands, transcribed and non-transcribed strands); replicative skew profile (leading and lagging strands); superposition of replication and transcription]
Bacillus subtilis
[Figure: S versus position (Mbp); legend: genes (strand +), genes (strand -), intergenic regions]
STRAND ASYMMETRIES IN EUKARYOTES ? 1. Strand asymmetries associated to transcription in the human genome
Strand asymmetries associated to transcription in human genes
• Introns (126 000), ≈ 12 000 genes (no exons, no repeats)
• $S_{TA} = \frac{n_T - n_A}{n_T + n_A}$, $S_{GC} = \frac{n_G - n_C}{n_G + n_C}$
• Mean skew associated to transcription: ∆S = $S_{TA}$ + $S_{GC}$ ~ 7%
• Upward jumps (5’), downward jumps (3’)
[Figure: $S_{TA}$ and $S_{GC}$ versus distance (kb, from -40 to 40) around the 5’ and 3’ gene ends, with intergenic sequences on either side; skew axis ticks from -2 to 8]
2. Strand asymmetries associated to replication in the human genome
Skew profiles around human replication origins
[Figure legend: genes (strand +), genes (strand -), intergenic regions]
Superimposition of replication and transcription biases
• Transcription: ∆S ~ ± 7%
• Replication: ∆S ~ + 14%
[Figure: S around an ORI; legend: genes (strand +), genes (strand -), intergenic regions]
Conservation of replication origins in mammalian genomes
Conservation of skew profiles in mammalian genomes
[Figure: skew profiles in human, mouse, rat, dog]
3. In silico detection of replication origins in the human genome
Detection of upward jumps associated to replication
• Main problem: necessity to avoid the jumps due only to transcription
• Scale of analysis: larger than typical size of genes, smaller than typical size of replicons
• Necessity of multi-scale analysis
[Figure: S around ORIs, with genes; labels: mean size 30 kb, 100 kb, 1 Mb]
Multi scale jump detection using the wavelet transform
• Scales: w = 10 kb (numerous jumps, high precision), w = 50 kb, w = 100 kb, w = 200 kb (few jumps, low precision)
[Figure: S and its first derivative at each scale w]
Multi scale jump detection using the wavelet transform
• Signal smoothed at large scale (200 kb)
• Identification of transitions
• Position of transitions (1 kb)
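The idea of detecting jumps at a chosen scale can be sketched by convolving the skew profile with the first derivative of a Gaussian and keeping the local maxima of the response. This is an assumed, simplified stand-in for the wavelet analysis used in the talk: the kernel choice, the edge handling, the threshold and the function names are all illustrative, and scales are expressed in samples rather than kb.

```python
import math


def smoothed_derivative(signal, scale):
    """Convolve `signal` with the first derivative of a Gaussian of width `scale` (in samples).

    The kernel is left unnormalised so that an ideal step of height d
    gives a peak response of approximately d.
    """
    half = int(4 * scale)
    kernel = [(u, -u / scale**2 * math.exp(-u * u / (2 * scale**2)))
              for u in range(-half, half + 1)]
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for u, k in kernel:
            idx = min(max(t - u, 0), n - 1)  # repeat edge values beyond the ends
            acc += k * signal[idx]
        out.append(acc)
    return out


def upward_jumps(signal, scale, threshold):
    """Return the positions where the smoothed derivative has a local maximum above `threshold`."""
    d = smoothed_derivative(signal, scale)
    return [t for t in range(1, len(d) - 1)
            if d[t] > threshold and d[t] >= d[t - 1] and d[t] > d[t + 1]]


# Hypothetical profile: a single upward step of height 2 between samples 49 and 50.
skew = [-1.0] * 50 + [1.0] * 50
for w in (2, 5, 10):
    print(w, upward_jumps(skew, w, threshold=1.0))
```

Running the detection at several scales reflects the trade-off listed on the slides: small scales locate jumps precisely but respond to many of them, large scales keep only the large transitions at lower positional precision.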
Asymmetry of the human genome
[Figure: histograms of jump amplitude, upward and downward; axis in %]
« Factory roofs » around experimentally determined replication origins
[Figure: S versus x (kb) around the MCM4 and TOP1 origins]
Conservation of potential origins in mammalian genomes
[Figure: human, mouse, dog]
Model of eucaryotic replicon
Replication termination sites: distributed between fixed adjacent origins
[Figure: skew S between Ori 1 and Ori 2 at each cycle, after several cycles and after N cycles, with origins (O) and termination sites (T) marked; procaryote versus eucaryote]
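The replicon model can be illustrated with a small simulation. The sketch below assumes that the termination site is drawn uniformly between the two origins at each cycle and that each cycle contributes a fixed skew increment `delta`, positive where the fork comes from Ori 1 and negative where it comes from Ori 2; these choices and all names are assumptions for illustration, not the authors' implementation.

```python
import random


def simulate_replicon_skew(n_sites=51, n_cycles=5000, delta=1.0, seed=0):
    """Average per-cycle skew contribution at each site between Ori 1 (site 0) and Ori 2 (last site).

    At each cycle a termination site is drawn uniformly between the origins.
    Sites before it are replicated by the fork coming from Ori 1 (+delta),
    sites after it by the fork coming from Ori 2 (-delta).
    """
    rng = random.Random(seed)
    total = [0.0] * n_sites
    for _ in range(n_cycles):
        termination = rng.uniform(0, n_sites - 1)
        for x in range(n_sites):
            total[x] += delta if x < termination else -delta
    return [v / n_cycles for v in total]


profile = simulate_replicon_skew()
# Print every tenth site: the averaged profile decreases from Ori 1 to Ori 2.
print([round(v, 2) for v in profile[::10]])
```

Under these assumptions the averaged profile decreases roughly linearly from +delta at Ori 1 to -delta at Ori 2, and repeating the pattern over adjacent replicons gives upward jumps at the origins, which is the « factory roof » shape.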
Detection of factory roofs using the wavelet transform
• 759 « factory roofs » spanning ~ 40% of the human genome
[Figure: factory roof wavelets]
ASYMMETRY OF HUMAN GENOME
[Figure: two panels labelled « factory roofs = 40 % » and « factory roofs < 1 % »]
EUCARYOTIC REPLICON MODEL
[Figure, three panels of S around ORIs: transcriptional skew profile (genes on the (+) and (-) strands, transcribed and non-transcribed strands); replicative skew profile; superposition of transcription and replication]
Comparison with replication timing data
Replication timing: Woodfine et al., Cell Cycle (2005)
[Figure: replication timing (early to late) versus position on human chromosome 6 (Mbp), with ori positions marked]
Organisation of transcription around predicted replication origins
Co-orientation of transcription and replication
Model of mammalian chromatin organization
Replication origins are situated at the center of open chromatin regions
[Figure: genomic DNA, open chromatin around ORIs, and skew S]
Conclusions
• Existence of replication-coupled strand asymmetries in the human genome
• Replication origins correspond to large transitions of skew profiles
• These transitions are conserved in mammalian genomes
• Detection of more than one thousand putative origins active in germ-line cells
• « Factory roof » profiles: regularly distributed termination sites
• Essential role of replication in the organisation of gene order and expression