array of plenty - results from a 4 base resolution yeast genome tiling array

Wolfgang Huber, Lars Steinmetz, European Molecular Biology Laboratory. GeneChip S. cerevisiae Tiling Array: 4 bp tiling path over the complete genome (12 million base pairs, 16 chromosomes), sense and antisense strands.

Presentation Transcript


  1. array of plenty - results from a 4 base resolution yeast genome tiling array. Wolfgang Huber, Lars Steinmetz, European Molecular Biology Laboratory

  2. GeneChip S. cerevisiae Tiling Array
      - 4 bp tiling path over the complete genome (12 million base pairs, 16 chromosomes)
      - sense and antisense strands
      - 6.5·10^6 oligonucleotides
      - 5 µm feature size
      - chips manufactured by Affymetrix
      - application and analysis by L. Steinmetz (EMBL/Stanford Genome Center) and W. Huber (EMBL/EBI)

  3. The first complete genome on one array
      - 3,039,046 perfect match probes
      - 7,359 splice junction probes
      - 127,813 YJM789 polymorphism probes
      - 16,271 Tag3 barcode probes
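A back-of-the-envelope check of the probe counts on slides 2 and 3 follows. One probe position every 4 bp along a genome of about 12 million base pairs gives roughly 3 × 10^6 positions. That is the same order as the 3,039,046 perfect match probes. This is only arithmetic on the numbers above. How the positions are divided between the sense and antisense strands is not spelled out on the slides, and nothing below assumes a particular split.

```python
# Arithmetic on the numbers from slides 2 and 3 (genome size rounded to 12 Mb).
genome_bp = 12_000_000          # "12 million base pairs" (rounded)
tiling_step_bp = 4              # "4 bp tiling path"

positions = genome_bp // tiling_step_bp
print(f"probe positions at a 4 bp step: {positions:,}")

# Probe classes listed on slide 3
probe_counts = {
    "perfect match": 3_039_046,
    "splice junction": 7_359,
    "YJM789 polymorphism": 127_813,
    "Tag3 barcode": 16_271,
}
total_listed = sum(probe_counts.values())
ratio = probe_counts["perfect match"] / positions
print(f"probes listed on slide 3: {total_listed:,}")
print(f"perfect match probes / positions: {ratio:.3f}")
```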

  4. Samples Genomic DNA Poly-A RNA (double enriched) from exponential growth in rich media Total RNA from exponential growth in rich media 3 replicates each

  5. RNA Hybridization

  6. Probe-specific affinity normalization (figure: before and after)

  7. Probe-specific affinity normalization

  8. Probe-specific affinity normalization
      - s_i: probe-sequence-specific affinity. Estimation: geometric mean of the intensities from the DNA hybridizations.
      - b_i = b(s_i): probe-sequence-specific background. Estimation: for strata of probes with similar s_i, estimate b with a location estimator of the distribution of intergenic probe intensities, then interpolate to obtain a continuous function b(s).
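Below is a minimal NumPy sketch of the two estimation steps above. The slide fixes what s_i and b_i are, but not every detail. The following choices are therefore assumptions:

- the median as the location estimator;
- quantile bins as the strata;
- linear interpolation between strata;
- (y_i − b_i)/s_i as the final normalized value.

The function name affinity_normalize is hypothetical.

```python
import numpy as np

def affinity_normalize(y, dna, intergenic, n_strata=20):
    """Probe-specific affinity normalization (sketch; choices are assumptions).

    y          : RNA hybridization intensities, shape (n_probes,)
    dna        : DNA hybridization intensities, shape (n_probes, n_dna_arrays)
    intergenic : boolean mask marking probes in intergenic regions
    """
    # s_i: geometric mean of the probe's DNA hybridization intensities
    s = np.exp(np.log(dna).mean(axis=1))

    # Strata of probes with similar s (quantile bins); in each stratum, the
    # median RNA intensity of the intergenic probes estimates the background
    edges = np.quantile(s, np.linspace(0.0, 1.0, n_strata + 1))
    centers, levels = [], []
    for left, right in zip(edges[:-1], edges[1:]):
        in_stratum = intergenic & (s >= left) & (s <= right)
        if in_stratum.any():
            centers.append(np.median(s[in_stratum]))
            levels.append(np.median(y[in_stratum]))

    # Linear interpolation between strata gives a continuous b(s); b_i = b(s_i)
    b = np.interp(s, centers, levels)

    # Assumed final step: subtract the background, then divide by the affinity
    return (y - b) / s
```

Dividing by s_i puts probes with strong and weak sequence affinity on a common scale. That is the purpose the conclusions slide gives for probe-response normalization.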

  9. Segmentation
      Two obvious options:
      - smoothing (e.g. running median) and thresholding: simple, but the estimated change points will be biased and will depend on the expression level
      - hidden Markov model (HMM): but our “states” come from a continuum, which makes this fiddly
      Our solution: fit a piecewise constant function (figure label: change point)
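For comparison, the first option on the slide can be sketched in a few lines: a centered running median followed by a fixed threshold. The window size, the threshold and the function name running_median_calls are hypothetical. The example uses a gradual transition between levels, as probes that only partly overlap a transcript might produce. There, the point where the smoothed curve crosses a fixed threshold moves with the level of the segment, which is one way to see the bias mentioned on the slide.

```python
import numpy as np

def running_median_calls(y, window=5, threshold=1.0):
    """Smooth with a centered running median, then call probes above threshold.

    window must be odd; the ends are padded by repeating the edge values.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    y = np.asarray(y, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    smooth = np.array([np.median(padded[i:i + window]) for i in range(len(y))])
    return smooth, smooth > threshold

# The same gradual rise (starting at index 6) at two expression levels
ramp_low = np.clip(np.arange(20) - 5, 0, 4) * 0.5     # plateau at 2.0
ramp_high = np.clip(np.arange(20) - 5, 0, 4) * 2.0    # plateau at 8.0
first_calls = [int(np.argmax(running_median_calls(r)[1])) for r in (ramp_low, ramp_high)]
print(first_calls)    # index of the first probe called "expressed" at each level
```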

  10. The model
      - t_1, …, t_S: change points
      - Y: normalized intensities
      - x: genomic coordinates
      - m_k: level of the k-th segment

  11. Model fitting
      Minimize …
      - t_1, …, t_S: change points
      - J: number of replicate arrays
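The formulas on slides 10 and 11 are missing from the transcript. Below is a plausible reconstruction using the symbols defined there, with i indexing probes and j = 1, …, J replicate arrays. It assumes a least-squares criterion and sets t_0 = −∞ and t_{S+1} = +∞, so that S change points give S + 1 segments.

```latex
Y_{ij} = m_k + \varepsilon_{ij}, \qquad t_{k-1} \le x_i < t_k, \qquad j = 1, \dots, J

\min_{t_1 < \dots < t_S} \; \sum_{k=1}^{S+1} \; \sum_{i:\, t_{k-1} \le x_i < t_k} \; \sum_{j=1}^{J} \bigl(Y_{ij} - \hat m_k\bigr)^2,
\qquad \hat m_k = \text{mean of all } Y_{ij} \text{ with } t_{k-1} \le x_i < t_k
```

For fixed change points, the optimal levels under this criterion are the segment means. Only the change points t_1, …, t_S then have to be searched over.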

  12. Maximization
      Naïve optimization has complexity n^s, where n ≈ 10^5 and s ≈ 10^3. Fortunately, there is a dynamic programming algorithm with complexity ≈ n^2: F. Picard et al., A statistical approach for array CGH data analysis, BMC Bioinformatics 6 (2005).
      Implementation: W. Huber, Bioconductor package tilingArray
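Below is a minimal sketch of least-squares segmentation by dynamic programming. It follows the spirit of the algorithm cited above but is not taken from that paper or from tilingArray. The function name segment_dp and its arguments are hypothetical.

Prefix sums give the residual sum of squares of any candidate segment in constant time. The recursion then costs O(K n^2) for up to K segments, instead of enumerating every placement of the change points.

```python
import numpy as np

def segment_dp(y, max_segments):
    """Least-squares fit of piecewise constant functions with 1..max_segments pieces.

    y: probe intensities in genomic order, shape (n,) or (n, J) for J replicate arrays.
    Returns {k: (residual sum of squares, start index of each of the k segments)}.
    """
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    n, J = y.shape
    # Prefix sums of row totals and of squares give any segment's RSS in O(1)
    c1 = np.concatenate([[0.0], np.cumsum(y.sum(axis=1))])
    c2 = np.concatenate([[0.0], np.cumsum((y ** 2).sum(axis=1))])

    def rss(a, b):
        """Residual sum of squares of rows a..b-1 around their common mean."""
        total = c1[b] - c1[a]
        return (c2[b] - c2[a]) - total * total / ((b - a) * J)

    K = max_segments
    # F[k, b]: minimal RSS for the first b probes split into k segments
    F = np.full((K + 1, n + 1), np.inf)
    start = np.zeros((K + 1, n + 1), dtype=int)  # start of the last segment
    F[0, 0] = 0.0
    for k in range(1, K + 1):
        for b in range(k, n + 1):
            for a in range(k - 1, b):   # last segment covers rows a..b-1
                cand = F[k - 1, a] + rss(a, b)
                if cand < F[k, b]:
                    F[k, b], start[k, b] = cand, a

    result = {}
    for k in range(1, K + 1):
        starts, b = [], n
        for kk in range(k, 0, -1):      # backtrack through the stored starts
            b = int(start[kk, b])
            starts.append(b)
        result[k] = (float(F[k, n]), starts[::-1])
    return result

y = np.array([0.0] * 10 + [5.0] * 10 + [1.0] * 10)
print(segment_dp(y, 3)[3])
```

The pure-Python triple loop only illustrates the recursion; with n ≈ 10^5 probes it would be far too slow. Each entry also reports the residual sum of squares for k segments, which is what a model selection criterion needs.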

  13. Piecewise linear models
      strucchange package by Achim Zeileis, TU Vienna (CRAN):
      - more general piecewise linear models
      - confidence intervals
      Confidence intervals are based on the asymptotic theory of Bai and Perron (2003).
      The dynamic programming algorithm has been around since the mid-1970s; context: mostly econometrics.

  14. Confidence Intervals
      - D_i: level difference
      - Q_i: number of data points per unit t
      - W_i: error variance (allowing serial correlations)
      - true and estimated change points
      - V_i(s): appropriately scaled and shifted Wiener process (Brownian motion)
      Bai and Perron, J. Appl. Econometrics 18 (2003)
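The limit statement itself is missing from the transcript. Below is the general form of Bai and Perron's result as we understand it, written with the slide's symbols. Treat it as a reconstruction to be checked against the paper. Here \hat t_i is the estimated change point and t_i^0 the true one.

```latex
\frac{(D_i' Q_i D_i)^2}{D_i' W_i D_i}\,\bigl(\hat t_i - t_i^0\bigr)
\;\xrightarrow{\;d\;}\; \arg\max_{s} V_i(s)
```

For a piecewise constant model, D_i, Q_i and W_i are scalars, and the scaling factor reduces to D_i^2 Q_i^2 / W_i. In the simplest case (equal error variance on both sides of the change point, no serial correlation), V_i(s) = B(s) − |s|/2, where B is a two-sided standard Brownian motion. A confidence interval for t_i then follows from quantiles of the argmax of this process.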

  15. Model selection criteria: the model family has just one parameter, the number of segments
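The slide does not name the criteria that were used. As an illustration only, a BIC-type penalty is one common way to pick the number of segments k from the residual sums of squares of the fits with 1, 2, … segments. Two things are assumptions here: the parameter count 2k − 1 (k levels plus k − 1 change points) and the function name choose_k_bic.

```python
import math

def choose_k_bic(rss_by_k, n_obs):
    """Return the k minimizing n*log(RSS_k/n) + (2k - 1)*log(n).

    rss_by_k: dict mapping number of segments k -> residual sum of squares
    n_obs:    number of observations (probes times replicate arrays)
    """
    def bic(k):
        rss = max(rss_by_k[k], 1e-300)  # guard against log(0) for a perfect fit
        return n_obs * math.log(rss / n_obs) + (2 * k - 1) * math.log(n_obs)
    return min(rss_by_k, key=bic)

# Illustrative values: the fit improves sharply from 1 to 2 segments, then barely
best_k = choose_k_bic({1: 1000.0, 2: 100.0, 3: 99.0, 4: 98.5}, n_obs=300)
print(best_k)    # 2
```

Because the model family has a single parameter, any such criterion reduces to comparing one penalized score per k.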

  16. Results

  17. Splicing

  18. Unexpected Transcript Structure

  19. Novel Transcripts

  20. Novel Transcripts: potential antisense regulator

  21. Defining Expressed Transcripts (figure legend: segments not overlapping any annotated features; segments overlapping annotated features; normal distribution)
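One simple way to turn a normal fit into an expression call is sketched below. It is an assumption, since the exact procedure is not in the transcript. First, estimate the mean and standard deviation of the background segment levels robustly (median and scaled median absolute deviation). Then call a segment expressed if its level exceeds the (1 − α) quantile of the fitted normal.

Which segments count as background, and the function name expression_threshold, are hypothetical. A per-segment α like this is not the same thing as the false discovery rate quoted on the next slide.

```python
import statistics
from statistics import NormalDist

def expression_threshold(background_levels, alpha=0.001):
    """Upper (1 - alpha) quantile of a normal fitted robustly to background levels."""
    levels = list(background_levels)
    mu = statistics.median(levels)
    # 1.4826 * MAD estimates the standard deviation for normally distributed data
    sigma = 1.4826 * statistics.median(abs(v - mu) for v in levels)
    return mu + NormalDist().inv_cdf(1 - alpha) * sigma

background = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.4]    # illustrative values
threshold = expression_threshold(background)
segment_levels = [0.1, 2.5, 0.5, 4.0]            # illustrative values
calls = [lvl > threshold for lvl in segment_levels]
print(round(threshold, 3), calls)
```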

  22. Expressed Features
      5646 ORFs with ≥ 7 probes:
      - 5306 (94%) above background in poly-A RNA
      - 5192 (92%) above background in total RNA (FDR = 0.001)
      - untranscribed: meiosis, sporulation
      Fraction of transcribed base pairs:
      - poly-A RNA: 9356k of 11360k (82.4%)
      - total RNA: 8786k (77.2%)
      - both: 9612k (84.3%), of which not annotated: 1559k (13.7%)
      - annotated total: 8997k of 12071k (74.5%)
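Counting transcribed base pairs, as in the lower half of this slide, amounts to measuring the union of the transcribed segments. Overlapping segments are then not counted twice; the "both" line, for example, should not double-count regions seen in poly-A and total RNA. Below is a minimal sketch with hypothetical half-open genomic intervals. Whether the slide's counts were computed exactly this way (for example, how the two strands were combined) is an assumption.

```python
def covered_bp(intervals):
    """Number of base pairs covered by the union of half-open intervals [start, end)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # disjoint from the current run: close it and start a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlapping or adjacent: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

poly_a = [(100, 500), (900, 1200)]       # hypothetical transcribed segments
total_rna = [(450, 700), (1500, 1600)]   # hypothetical transcribed segments
both = covered_bp(poly_a + total_rna)
print(covered_bp(poly_a), covered_bp(total_rna), both)
```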

  23. Novel transcripts
      Basis: multiple alignment of 4 yeast genomes (S. cerevisiae, S. bayanus, S. mikatae, S. paradoxus; Kellis et al., Nature 2003)
      - conservation analysis: fraction of segments for which there is a multiple alignment; total tree length
      - codon signature: 3-periodicity of mutation frequencies
      novel transcribed segments  untranscribed << annotated transcripts
      (with Lee Bofkin, Nick Goldman)
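In protein-coding sequence, substitutions accumulate preferentially at the third codon position, because many changes there are synonymous. That is the 3-periodicity the slide uses as a codon signature. The sketch below computes the mismatch frequency at each codon position for one ungapped pairwise alignment in a known reading frame. The function name and the pairwise simplification are assumptions; the analysis on the slide used a four-species multiple alignment.

```python
def mutation_frequency_by_codon_position(seq_a, seq_b):
    """Fraction of mismatched sites at codon positions 1, 2 and 3.

    Assumes an ungapped pairwise alignment whose first column is the first
    position of a codon (a simplification of a multiple alignment).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = [0, 0, 0]
    sites = [0, 0, 0]
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        pos = i % 3
        sites[pos] += 1
        mismatches[pos] += a != b
    return [m / s if s else float("nan") for m, s in zip(mismatches, sites)]

# Differences only at third codon positions give a strong 3-periodic profile
profile = mutation_frequency_by_codon_position("ATGAAACCC", "ATCAAGCCA")
print(profile)
```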

  24. Antisense transcripts
      • microtubule-mediated nuclear migration
      • cell separation during cytokinesis
      • cell wall
      • single-stranded RNA binding (all 5: NAB2, NAB3, NPL3, PAB1, SGN1)
      • (p < 2×10^-16)
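The slide does not say which test produced the p-value. A standard choice for this kind of category enrichment is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below implements it with exact integer arithmetic. The function name and all numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the category, n drawn).

    comb() returns exact integers, so the only rounding happens in the
    final division (Python's int / int is correctly rounded).
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 20 genes, 5 in the category, 6 selected, 3 of them in it
print(hypergeom_upper_tail(3, 20, 5, 6))
```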

  25. Antisense transcripts: NAB2

  26. Antisense transcripts: NAB3

  27. Antisense transcripts: NPL3 (?)

  28. Antisense transcripts: PAB1

  29. Antisense transcripts: SGN1 (?)

  30. Mapping of UTRs:

  31. UTR lengths
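Take a transcribed segment from the segmentation and the annotated ORF it contains. The UTR lengths follow from the distances between the segment boundaries and the ORF ends, with the roles of the two ends swapped on the minus strand.

Below is a minimal sketch; the function and its 1-based inclusive coordinate convention are hypothetical. The result can be no more precise than the estimated change points that define the segment.

```python
def utr_lengths(orf_start, orf_end, seg_start, seg_end, strand):
    """5' and 3' UTR lengths (bp) of an ORF inside a transcribed segment.

    Coordinates are 1-based and inclusive; the segment must contain the ORF.
    """
    if not (seg_start <= orf_start <= orf_end <= seg_end):
        raise ValueError("segment does not contain the ORF")
    left = orf_start - seg_start      # transcribed bases left of the ORF
    right = seg_end - orf_end         # transcribed bases right of the ORF
    if strand == "+":
        return left, right
    if strand == "-":
        return right, left            # on the minus strand, 5' is on the right
    raise ValueError("strand must be '+' or '-'")

print(utr_lengths(100, 400, 80, 460, "+"))   # (20, 60)
print(utr_lengths(100, 400, 80, 460, "-"))   # (60, 20)
```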

  32. Bioconductor package tilingArray
      - contains Picard’s segmentation algorithm
      - along-chromosome plots
      - to do: better user interface and documentation

  33. Data will be publicly available from EBI's microarray database ArrayExpress: www.ebi.ac.uk/arrayexpress

  34. Conclusions
      o Conventional microarrays: measure transcript levels
      o High-resolution tiling arrays: also transcript structure (introns, exons, alternative transcription start sites), partial degradation, novel transcripts, annotation errors
      o Probe-response normalization: makes the signal comparable across probes
      o Simple segmentation algorithm
      o Well-developed theory: accurate estimation of change points, including confidence intervals
      o Software available as part of Bioconductor (http://www.bioconductor.org; also CEL file import, normalization, further statistical testing)

  35. Acknowledgements: Oleg Sklyar, Jörn Tödling, Raeka Aiyar; EMBL-GE & Stanford: Lior David, Marina Granovskaia; Robert Gentleman, Rafael Irizarry, Vince Carey, Ben Bolstad; Lars Steinmetz
