A high-resolution map of transcription in the yeast genome

Wolfgang Huber, European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute (EBI), Cambridge, UK

GeneChip S. cerevisiae Tiling Array
o 4 bp tiling path over complete genome (12 million base pairs, 16 chromosomes)
o Sense and antisense strands
o 6.5 million oligonucleotides
o 5 µm feature size
o manufactured by Affymetrix
o designed by Lars Steinmetz (EMBL & Stanford Genome Center)

Samples
o Genomic DNA
o Poly-A RNA (double enriched) from exponential growth in rich media (RH6)
o Total RNA from exponential growth in rich media (RH6)
o 3 replicates each

Before probe-specific normalization

Probe-specific response normalization
[Figure: signal-to-noise ratio (S/N) values shown: 3.22, 3.47, 4.04; remove ‘dead’ probes: 4.58, 4.36]

Probe-specific response normalization
o s_i: probe-specific response factor. Estimate taken from DNA hybridization data.
o b_i = b(s_i): probe-specific background term. Estimation: for strata of probes with similar s_i, estimate b through a location estimator of the distribution of intergenic probes, then interpolate to obtain a continuous b(s).
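The estimation steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tilingArray implementation: the function names are hypothetical, the median is assumed as the location estimator, interpolation is assumed to be linear, and the final formula (y_i - b(s_i)) / s_i is an assumption, since the slide's own formula did not survive in this transcript.

```python
from bisect import bisect_left
from statistics import median


def estimate_background(s_intergenic, y_intergenic, n_strata=10):
    """Return (s, b) knots: one per stratum of probes with similar s.

    Assumption: strata are equal-sized groups of intergenic probes sorted
    by response factor s, and the location estimator is the median.
    """
    pairs = sorted(zip(s_intergenic, y_intergenic))
    size = max(1, len(pairs) // n_strata)
    knots = []
    for start in range(0, len(pairs), size):
        stratum = pairs[start:start + size]
        knots.append((median(p[0] for p in stratum),
                      median(p[1] for p in stratum)))
    return knots


def background_at(s, knots):
    """Linearly interpolate b(s) between knots; constant outside their range."""
    xs = [k[0] for k in knots]
    if s <= xs[0]:
        return knots[0][1]
    if s >= xs[-1]:
        return knots[-1][1]
    j = bisect_left(xs, s)  # xs[j - 1] < s <= xs[j]
    (x0, y0), (x1, y1) = knots[j - 1], knots[j]
    return y0 + (y1 - y0) * (s - x0) / (x1 - x0)


def normalize(y, s, knots):
    """Assumed normalization: subtract background b(s_i), divide by s_i."""
    return [(yi - background_at(si, knots)) / si for yi, si in zip(y, s)]
```

With knots estimated from intergenic probes, a call such as `normalize(rna_values, dna_response, knots)` returns values near zero for background probes and positive values for transcribed probes; `rna_values` and `dna_response` are placeholder names for the RNA intensities and the DNA-derived response factors.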

Estimation of b: joint distribution of (DNA, RNA) values of intergenic PM probes
[Figure: log2 RNA intensity against log2 DNA intensity; annotations: unannotated transcripts, background, b(s)]

After normalization

Segmentation
Two obvious options:
o Smoothing and thresholding: simple, but estimates of transcript boundaries will be biased and depend on expression level
o Hidden Markov Model (HMM): but our “states” come from a continuum, unclear how to discretize
Our solution: fit a piecewise constant function; the segment boundaries are the change points

Structural change model (SCM): piecewise constant functions
o t_1,…, t_S: change points
o Y: normalized intensities
o x: genomic coordinates
o μ_k: level of k-th segment
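The model equation itself is not preserved in this transcript. A plausible reconstruction from the symbol definitions above is given below; the index conventions (i for probes ordered along the chromosome, j for replicate arrays, t_{S+1} as the end of the last segment) and the error term are assumptions.

```latex
Y_{ij} = \mu_k + \varepsilon_{ij}
\qquad \text{for } t_k \le x_i < t_{k+1}, \quad k = 1, \dots, S
```

Here each probe i at genomic coordinate x_i belongs to exactly one segment k, and all probes in that segment share the level μ_k.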

Model fitting
Minimize the residual sum of squares over the change points
o t_1,…, t_S: change points
o J: number of replicate arrays
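The objective function is missing from this transcript. Assuming a least-squares fit of a piecewise constant model, with i indexing probes at genomic coordinates x_i, j indexing the J replicate arrays, and t_{S+1} marking the end of the last segment, it would take roughly this form:

```latex
G(t_1, \dots, t_S) =
\sum_{k=1}^{S} \;
\sum_{i:\, t_k \le x_i < t_{k+1}} \;
\sum_{j=1}^{J} \left( Y_{ij} - \hat{\mu}_k \right)^2 ,
\qquad
\hat{\mu}_k = \text{mean of all } Y_{ij} \text{ in segment } k
```

For fixed change points the optimal level of each segment is simply its mean, so the minimization is over the change points only.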

Maximization
Naïve optimization has complexity n^s, where n ≈ 10^5 and s ≈ 10^3. Fortunately, there is a dynamic programming algorithm with complexity O(n^2), and a good heuristic with complexity O(n):
o F. Picard, S. Robin, M. Lavielle, C. Vaisse, G. Celeux, J.J. Daudin, BMC Bioinformatics (2005)
o Bai and Perron, Journal of Applied Econometrics (2003)
Software:
o W. Huber, package tilingArray, www.bioconductor.org
o A. Zeileis, package strucchange, CRAN
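To make the dynamic programming idea concrete, here is a minimal sketch in Python for a single intensity series (for example, the average over replicates). It is an illustration under stated assumptions, not the code in tilingArray or strucchange: the function name `segment` is hypothetical, the cost is the residual sum of squares around each segment mean, and the loop as written costs O(s·n²) because it places no limit on segment length.

```python
def segment(y, n_segments):
    """Split y into n_segments contiguous pieces with minimal total
    residual sum of squares. Returns (segment start indices, total cost)."""
    n = len(y)

    # Prefix sums of y and y^2 give the cost of any segment in O(1).
    c1, c2 = [0.0], [0.0]
    for v in y:
        c1.append(c1[-1] + v)
        c2.append(c2[-1] + v * v)

    def cost(i, j):
        """Residual sum of squares of y[i:j] around its mean (i < j)."""
        total = c1[j] - c1[i]
        return (c2[j] - c2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into exactly k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0

    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end to recover the segment starts.
    starts = []
    j = n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    return starts, best[n_segments][n]
```

Each optimal split into k segments reuses the optimal split into k - 1 segments of a shorter prefix, which is what removes the exponential search over all change point combinations.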

Confidence Intervals
o Δ_i: level difference
o Q_i: no. data points per unit t
o Ω_i: error variance (allowing serial correlations)
o true and estimated change points
o V_i(s): appropriately scaled and shifted Wiener process (Brownian motion)
Bai and Perron, J. Appl. Econometrics 18 (2003)

Model extensions: general piecewise linear models
o t_1,…, t_S: change points
o Y: normalized intensities
o x: genomic coordinates
o b_k: model matrix of k-th segment

A closer look

Mapping of UTRs

UTR lengths for 2044 ORFs
o 5’ UTRs: median 68 nucleotides
o 3’ UTRs: median 91 nucleotides
o On average 3’ UTRs are longer than 5’ UTRs
o No correlation between 3’ and 5’ lengths

Long 5' UTR including cotranscribed uORFs Mapped to precision of 9 bases to known

Transcriptional architectures
921 ORFs were divided into at least two segments
MET7: folylpolyglutamate synthetase, catalyzes extension of the glutamate chains of the folate coenzymes

Operon-like structures
123 segments contained ORFs of more than one protein-coding gene
o YCK2: casein kinase I, involved in cytokinesis
o GIM3: tubulin binding, involved in microtubule biogenesis
[Figure labels: YCK2, GIM3, PCR product]

Adjacent transcripts of non-coding and coding genes
Martens, J. A., Laprade, L. & Winston, F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 429, 571-574 (2004).

Expressed Features
o 5654 ORFs with ≥ 7 unique probes
o 5104 (90%) detected above background (FDR = 0.001)
o untranscribed: meiosis, sporulation, mating, sugar transport, vitamin metabolism
Fraction of transcribed basepairs
o 11,412,997 bp of unique sequence
o 75.2% density of prior annotation (either strand)
o 84.5% detected above background (either strand)
o 16.2% of transcribed bp (exponential growth in rich media) not yet annotated

Novel Transcripts

Antisense transcripts
CBF1-bs
CBF1: regulatory module involved in cell cycle and stress response; DNA replication and chromosome cycle; defects in growth in rich media

Novel transcripts
o Basis: multiple alignment of 4 yeast genomes: S. cerevisiae, S. bayanus, S. mikatae, S. paradoxus. Kellis et al., Nature (2003)
o Conservation analysis: fraction of segments for which there is a multiple alignment; total tree length
o Codon signature: 3-periodicity of mutation frequencies
o novel transcribed segments  untranscribed << annotated transcripts
with Lee Bofkin, Nick Goldman
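The codon signature idea can be illustrated with a deliberately simplified sketch: in protein-coding sequence, substitutions tend to concentrate at every third position, so the substitution rate shows a period of 3. The Python below counts mismatches by alignment column modulo 3 for one pair of aligned sequences. It is not the analysis used for these slides; the function name is hypothetical, and it assumes the alignment starts in frame and that gap columns can simply be skipped.

```python
def substitution_rate_by_phase(seq_a, seq_b):
    """Fraction of mismatching columns at each phase (column index mod 3)
    of a pairwise alignment. Columns with a gap ('-') are skipped."""
    mismatches = [0, 0, 0]
    totals = [0, 0, 0]
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == "-" or b == "-":
            continue
        phase = pos % 3
        totals[phase] += 1
        if a != b:
            mismatches[phase] += 1
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


# Toy alignment in which only third codon positions differ.
rates = substitution_rate_by_phase("ATGGCTAAAGGT", "ATGGCCAAGGGA")
print(rates)  # [0.0, 0.0, 0.75]
```

A segment whose three phase-specific rates are clearly unequal carries a coding-like signature, whereas roughly equal rates are what one would expect from non-coding sequence.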

Antisense and UTR length
o 3’ UTRs have more antisense than 5’ UTRs
o UTRs with antisense are longer than UTRs without

Antisense transcripts
• microtubule-mediated nuclear migration
• cell separation during cytokinesis
• cell wall
• single-stranded RNA binding (NAB2, NAB3, NPL3, PAB1, SGN1)
• meiosis genes

Antisense transcripts: NAB2

Antisense transcripts: NAB3

RNA mediated regulation
o UTR lengths correlate with function, localization, regulation
o Antisense transcripts correlate with GO categories
o Antisense found predominantly to 3’ UTRs and longer UTRs
o 3’ UTRs are targets of miRNAs in other species
… suggesting a functional role of antisense transcripts in S. cerevisiae

Antisense to CLN2 – G1 cyclin

R package tilingArray contains
o segmentation algorithm
o DNA reference normalization
o along-genome plots
o vignettes to reproduce the plots shown here

Data is available
o from EBI's microarray database ArrayExpress: www.ebi.ac.uk/arrayexpress, acc. no.: E-TABM-14
o from Bioconductor, data package davidTiling

Conclusions
o Conventional microarrays: measure transcript levels
o High resolution tiling arrays: also transcript structure: introns, exons; transcription start & stop sites; overlapping populations of transcripts; non-coding RNA (UTRs, ncRNAs, antisense)
o Probe-response normalization: make signal comparable across probes
o Model-based segmentation method with exact algorithm, including confidence intervals
o Genome-wide evidence for association of non-coding RNA (antisense, UTRs) with function of the corresponding genes

Acknowledgements
o Lars Steinmetz (EMBL Heidelberg) & Lior David, Curt Palm (Stanford Genome Tech. Center)
o Marina Granovskaia (EMBL Heidelberg)
o Jörn Tödling, Lee Bofkin, Nick Goldman (EMBL-EBI Cambridge)
o Bioconductor project
o Robert Gentleman, Ben Bolstad, Vince Carey, Paul Murrell, Rafael Irizarry, Achim Zeileis

Model selection criteria

RRB1 – essential regulator of ribosome biogenesis
JPL2 – protein of unknown function

Cell Cycle
o Temperature sensitive cdc28 – arrest at G1
o Monitored at 10 min intervals for 230 min in total (>2 cell cycles)

MET7 has a later conserved M

Length and expression levels of segments

o 3,039,046 perfect match probes
o 7,359 splice junction probes
o 127,813 YJM789 polymorphism probes
o 16,271 Tag3 barcode probes

The group: Matt Ritchie, Jörn Tödling, Lígia Brás, Wolfgang Huber, Thomas Horn, Oleg Sklyar
