
A high-resolution map of transcription in the yeast genome



Presentation Transcript


  1. A high-resolution map of transcription in the yeast genome. Wolfgang Huber, EMBL-EBI

  2. GeneChip S. cerevisiae Tiling Array: 4 bp tiling path over the complete genome (12 M base pairs, 16 chromosomes); sense and antisense strands; 6.5 million oligonucleotides; 5 µm feature size; manufactured by Affymetrix; designed by Lars Steinmetz (EMBL & Stanford Genome Center)

  3. 3,039,046 perfect match probes; 7,359 splice junction probes; 127,813 YJM789 polymorphism probes; 16,271 Tag3 barcode probes

  4. Samples • Genomic DNA • Poly-A RNA (double enriched) from exponential growth in rich media (RH6) • Total RNA from exponential growth in rich media (RH6) • 3 replicates each

  5. RNA Hybridization

  6. Before normalization

  7. Probe-specific response normalization. [Figure labels: S/N 3.22, 3.47, 4.04; remove 'dead' probes; 4.58, 4.36]

  8. Probe-specific response normalization. s_i: probe-specific response factor; the estimate is taken from DNA hybridization data. b_i = b(s_i): probe-specific background term. Estimation: for strata of probes with similar s_i, estimate b through a location estimator of the distribution of intergenic probes, then interpolate to obtain a continuous b(s)
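
The estimation steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tilingArray implementation: the function names are hypothetical, the median stands in for the unspecified "location estimator", strata are equal-sized groups of intergenic probes sorted by s, and the final formula (y - b(s)) / s is an assumed way of combining background and response factor that the slide itself does not spell out.

```python
from bisect import bisect_right
from statistics import median


def estimate_background(s, y, is_intergenic, n_strata=4):
    """Per stratum of intergenic probes with similar s, return (median s, median y)."""
    pairs = sorted((si, yi) for si, yi, g in zip(s, y, is_intergenic) if g)
    size = max(1, len(pairs) // n_strata)
    knots, values = [], []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        knots.append(median(p[0] for p in chunk))   # representative s of the stratum
        values.append(median(p[1] for p in chunk))  # location estimate of background
    return knots, values


def interpolate(knots, values, x):
    """Piecewise-linear interpolation of b(s); constant outside the knot range."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    k = bisect_right(knots, x)  # knots[k-1] <= x < knots[k]
    x0, x1 = knots[k - 1], knots[k]
    w = (x - x0) / (x1 - x0)
    return values[k - 1] + w * (values[k] - values[k - 1])


def normalize(s, y, is_intergenic, n_strata=4):
    """Assumed normalization: subtract b(s_i), then divide by s_i (s_i must be > 0)."""
    knots, values = estimate_background(s, y, is_intergenic, n_strata)
    return [(yi - interpolate(knots, values, si)) / si for si, yi in zip(s, y)]


# Toy data: eight intergenic probes with background 10 + 2*s, plus two
# transcribed probes that differ in response factor but share one true level.
s = [1, 2, 3, 4, 5, 6, 7, 8, 4, 2]
y = [12, 14, 16, 18, 20, 22, 24, 26, 38, 24]
intergenic = [True] * 8 + [False] * 2
normalized = normalize(s, y, intergenic)
```

In the toy data the last two probes have different raw intensities (38 and 24) only because their response factors differ; after normalization they end up on the same scale, which is the purpose stated on the slide.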

  9. Estimation of b: joint distribution of (DNA, RNA) values of intergenic PM probes. [Figure: log2 RNA intensity vs. log2 DNA intensity, with the background curve b(s) and unannotated transcripts marked]

  10. After normalization

  11. Segmentation. One option: moving window. Simple, but estimates of transcript boundaries will be biased and depend on expression level. Our solution: fit a piecewise constant function; the only parameter is the average segment length. [Figure label: change point]
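
The bias of the moving-window approach can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not taken from the slides: it smooths an ideal step signal with a centered moving average and calls the boundary at the first position where the smoothed signal exceeds a fixed threshold. The window width, threshold, and function names are all hypothetical.

```python
def moving_average(y, width):
    """Centered moving average; the window is truncated at the array edges."""
    half = width // 2
    out = []
    for i in range(len(y)):
        lo = max(0, i - half)
        hi = min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def first_above(y, threshold):
    """Index of the first value above the threshold, or None."""
    for i, v in enumerate(y):
        if v > threshold:
            return i
    return None


def detected_boundary(level, n=40, true_start=20, width=9, threshold=1.0):
    """Boundary called by smoothing plus thresholding for a transcript of a given level."""
    signal = [0.0] * true_start + [float(level)] * (n - true_start)
    return first_above(moving_average(signal, width), threshold)


boundary_low = detected_boundary(level=2.0)   # weakly expressed transcript
boundary_high = detected_boundary(level=8.0)  # strongly expressed transcript
```

Both signals have the same true start position, yet the strongly expressed one is called earlier because its smoothed edge crosses the threshold sooner. A piecewise constant fit avoids this dependence on expression level.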

  12. Structural change model (SCM): piecewise constant functions. t_1, …, t_S: change points; Y: normalized intensities; x: genomic coordinates; μ_k: level of the k-th segment

  13. Model fitting. Minimize ... t_1, …, t_S: change points; J: number of replicate arrays
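
The formulas from slides 12 and 13 did not survive in this transcript. One plausible reading, written with the symbols the slides define (change points t, levels μ, normalized intensities Y at genomic coordinate x on replicate array j of J), is a least-squares fit of a piecewise constant function. The segment boundary convention and the error term ε are assumptions made here for illustration.

```latex
% Assumed model: segment k covers the coordinates t_k <= x < t_{k+1}
\[
  Y_{xj} = \mu_k + \varepsilon_{xj},
  \qquad t_k \le x < t_{k+1}, \quad j = 1, \dots, J
\]

% Assumed objective: residual sum of squares over all segments and replicates
\[
  \min_{t_1, \dots, t_S,\; \mu_k}
  \; \sum_{k} \; \sum_{x = t_k}^{t_{k+1} - 1} \; \sum_{j = 1}^{J}
  \left( Y_{xj} - \mu_k \right)^2
\]
```

For fixed change points the optimal level of each segment is simply the mean of the intensities inside it, so the search is only over the change points.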

  14. Maximization. Naïve optimization has complexity n^S, where n ≈ 10^5 and S ≈ 10^3. Fortunately, there is a dynamic programming algorithm with complexity O(n^2), and a good heuristic with O(n): F. Picard, S. Robin, M. Lavielle, C. Vaisse, G. Celeux, JJ Daudin, BMC Bioinformatics (2005); Bai and Perron, Journal of Applied Econometrics (2003). Software: W. Huber, package tilingArray, www.bioconductor.org; A. Zeileis, package strucchange, CRAN
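
To make the dynamic programming idea concrete, here is a minimal Python sketch of the textbook least-squares segmentation recursion. It is not the code of tilingArray or strucchange, and it simplifies in ways that are assumptions of this example: a single data series instead of J replicates, a fixed number of segments, and no limit on segment length, which gives O(S·n^2) time rather than the figures quoted on the slide.

```python
def segment(y, n_segments):
    """Split y into n_segments pieces minimizing the residual sum of squares.

    Returns (change_points, cost), where change_points are the start indices
    of segments 2..n_segments.
    """
    n = len(y)
    # Prefix sums let us evaluate the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def rss(i, j):
        """Residual sum of squares of y[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of covering y[:j] with exactly k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # i is the start of the last segment
                cost = best[k - 1][i] + rss(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i

    # Trace back the segment starts from the end of the series.
    starts, j = [], n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    return starts[1:], best[n_segments][n]


change_points, cost = segment([0, 0, 0, 5, 5, 5, 1, 1], 3)
```

The recursion reuses the optimal solutions for k-1 segments when building those for k segments, which is what removes the exponential n^S search.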

  15. Confidence intervals. D_i: level difference; Q_i: no. of data points per unit t; W_i: error variance (allowing serial correlations); true and estimated change points; V_i(s): appropriately scaled and shifted Wiener process (Brownian motion). Bai and Perron, J. Appl. Econometrics 18 (2003)

  16. Segmentation results: (1) compare to known; (2) discover new

  17. A closer look: splicing; transcribed introns; gray = redundant probes; unprecedented, strand-specific resolution

  18. Mapping of UTRs

  19. UTR lengths for 2044 ORFs. 5’ UTRs: 68 nucleotides median; 3’ UTRs: 91 nucleotides median. On average 3’ UTRs are longer than 5’ UTRs. No correlation between 3’ and 5’ lengths

  20. Long 5' UTR including cotranscribed uORFs Mapped to precision of 9 bases to known

  21. Transcriptional architectures: 921 ORFs were divided into at least two segments. MET7: folylpolyglutamate synthetase, catalyzes extension of the glutamate chains of the folate coenzymes

  22. Operon-like structures: 123 segments contained ORFs of more than one protein-coding gene. YCK2: casein kinase I, involved in cytokinesis. GIM3: tubulin binding, involved in microtubule biogenesis. [Figure labels: YCK2, GIM3, PCR product]

  23. Transcription over active promoters. Martens, J. A., Laprade, L. & Winston, F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 429, 571-574 (2004).

  24. Expressed features. 5654 ORFs with ≥ 7 unique probes; 5104 (90%) detected above background (FDR = 0.001); untranscribed: meiosis, sporulation, mating, sugar transport, vitamin metabolism. Fraction of transcribed base pairs: 11,412,997 bp of unique sequence; 75.2% density of prior annotation (either strand); 84.5% detected above background (either strand); 16.2% of transcribed bp (exponential growth in rich media) not yet annotated

  25. Categorization of segments. All segments; segments overlapping annotation; novel transcription: isolated, antisense. Confidence filter: >48 bp long; reduced signal on both sides; lower signal on opposite strand
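
The three confidence criteria can be read as a simple predicate over a segment and its neighbourhood. The sketch below is a hypothetical illustration: the `Segment` record, its field names, and the plain "less than" comparisons are assumptions made here, since the slide does not say how much lower the flanking and opposite-strand signals must be.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical summary of one segment from the segmentation step."""
    length_bp: int         # segment length in base pairs
    level: float           # estimated level of this segment
    left_level: float      # level of the neighbouring segment upstream
    right_level: float     # level of the neighbouring segment downstream
    opposite_level: float  # level at the same coordinates on the other strand


def passes_confidence_filter(seg, min_length_bp=48):
    """Apply the slide's three criteria with simple 'less than' comparisons."""
    long_enough = seg.length_bp > min_length_bp
    flanks_reduced = seg.left_level < seg.level and seg.right_level < seg.level
    opposite_lower = seg.opposite_level < seg.level
    return long_enough and flanks_reduced and opposite_lower


candidate = Segment(length_bp=120, level=3.0, left_level=0.4,
                    right_level=0.6, opposite_level=0.2)
too_short = Segment(length_bp=40, level=3.0, left_level=0.4,
                    right_level=0.6, opposite_level=0.2)
kept = [seg for seg in (candidate, too_short) if passes_confidence_filter(seg)]
```

A real filter would likely require a minimum difference between levels rather than any difference at all; that margin is left out because the slide gives no value for it.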

  26. Novel Transcripts

  27. Antisense transcripts CBF1-bs CBF1: regulatory module involved in cell cycle and stress response; DNA replication and chromosome cycle; defects in growth in rich media

  28. Length and expression levels of segments

  29. Conservation of novel isolated segments across four yeast species (S. cerevisiae, S. paradoxus, S. mikatae and S. bayanus): although some are conserved, there is little overall sequence conservation; lack of protein-coding signature

  30. Antisense and UTR length: 3’ UTRs have more antisense than 5’ UTRs; UTRs with antisense are longer than UTRs without

  31. Antisense transcripts • Cell wall • M phase of meiotic cell cycle • Transcriptional regulator • Monosaccharide transporter activity • (p < 2×10^-6)

  32. Antisense transcripts: GAC1

  33. Antisense transcripts: HOS4

  34. RNA-mediated regulation • UTR lengths associated with function, localization, regulation • Antisense found predominantly to 3’ UTRs and longer UTRs • Antisense correlated with GO categories • Similar to patterns for miRNAs in other species. Suggests a functional role for antisense in S. cerevisiae

  35. Cell cycle: temperature-sensitive cdc28, arrest at G1; monitored at 10 min intervals for 230 min in total (~3 cell cycles)

  36. G1 cyclin involved in regulation of the cell cycle; activates Cdc28p kinase to promote the G1 to S phase transition

  37. G1 cyclin involved in regulation of the cell cycle; activates Cdc28p kinase to promote the G1 to S phase transition; late G1 specific expression depends on transcription factor complexes, MBF (Swi6p-Mbp1p) and SBF (Swi6p-Swi4p)

  38. Cycling of novel transcript

  39. Cycling of antisense transcript

  40. DPH1: protein required, along with Dph2p, Kti11p, Jjj3p, and Dph5p, for synthesis of diphthamide, which is a modified histidine residue of translation elongation factor 2. [Figure labels: cdc28; alpha factor arrest]

  41. R package tilingArray contains: segmentation algorithm; DNA reference normalization; along-genome plots; vignettes to reproduce the plots shown here

  42. Data is available • from EBI's microarray database ArrayExpress: www.ebi.ac.uk/arrayexpress, acc. no. E-TABM-14 • from Bioconductor, data package davidTiling

  43. Conclusions • Conventional microarrays: measure transcript levels • High-resolution tiling arrays: also transcript structure (introns, exons; transcription start and stop sites; overlapping populations of transcripts; non-coding RNA: UTRs, ncRNAs, antisense) • Probe-response normalization: makes signal comparable across probes • Model-based segmentation method with exact algorithm, including confidence intervals • Genome-wide evidence for association of non-coding RNA (antisense, UTRs) with function of the corresponding genes

  44. Further computational challenges • Modelling of (linear) ramps in addition to piecewise constant segments • Sequence-based background correction and gain factor modeling (no need for DNA reference hybe?) • Biological interpretation of confidence intervals; non-asymptotic (resampling-based?) methods? • Multiple conditions and time-courses: discovery and testing for differential segment start and end

  45. Acknowledgements Lars Steinmetz EMBL Heidelberg & Lior David, Curt Palm Stanford Genome Tech. Center Marina Granovskaia EMBL Heidelberg Matt Ritchie, Jörn Tödling, Lee Bofkin, Nick Goldman EMBL-EBI Cambridge Bioconductor project Robert Gentleman Ben Bolstad Vince Carey Paul Murrell Rafael Irizarry Achim Zeileis

  46. Reverse transcription artifacts. The array measures the sum of cDNA molecules present at each probe. Filtered segments: 234 isolated and 193 antisense. [Figure label: mRNA]
