
Transformations… what for? which one?

Wolfgang Huber, Div. Molecular Genome Analysis, DKFZ Heidelberg. Microarray intensities x_1, …, x_n: log-ratio with/without background correction; shrunken log-ratio (BHM); variance stabilized log-ratio (= generalized log-ratio, “glog”).


Presentation Transcript


  1. Transformations… what for? which one? Wolfgang Huber, Div. Molecular Genome Analysis, DKFZ Heidelberg

  2. Microarray intensities x_1, …, x_n: o log-ratio with/without background correction o shrunken log-ratio (BHM) o variance stabilized log-ratio (= generalized log-ratio, “glog”). How do you like to think about (interpret) it? How do you estimate the parameters? What comes out (the “bottom line”)?

  3. ratios and fold changes. Fold changes are useful to describe continuous changes in expression. But what if the gene is “off” (below detection limit) in one condition? [Figure: expression levels in conditions A, B, C (axis values 0 to 3000), annotated with fold changes ×1.5 and ×3 and with “?” marks.]

  4. ratios and fold changes • Many interesting genes will be off in some of the conditions of interest. 1. If you want the expression measure (“net normalized spot intensity”) to be an unbiased estimator of abundance ⇒ many values ≤ 0 ⇒ need something more than the (log-)ratio. 2. If you let the expression measure be biased (always > 0) ⇒ can keep ratios ⇒ how do you choose the bias?

  5. ratios and fold changes. Ratios are scale-free: [formula not preserved]. But there is (at least) one absolute scale in the data: [formula not preserved]. Can we use this to construct useful functions?

  6. In the following: o How to compare microarray intensities with each other? o How to incorporate measurement uncertainty (“variance”)? o How to simultaneously and consistently deal with calibration (“normalization”)?

  7. Sources of variation: amount of RNA in the biopsy; efficiencies of RNA extraction, reverse transcription, labeling, fluorescent detection; probe purity and length distribution; spotting efficiency, spot size; cross-/unspecific hybridization; stray signal. Systematic: o similar effect on many measurements o corrections can be estimated from data ⇒ Calibration. Stochastic: o too random to be explicitly accounted for o remain as “noise” ⇒ Error model.

  8. modeling ansatz: measured intensity = offset + gain × true abundance. a_i: per-sample offset; ε_ik ~ N(0, b_i² s_1²): “additive noise”; b_i: per-sample normalization factor; b_k: sequence-wise probe efficiency; η_ik ~ N(0, s_2²): “multiplicative noise”.
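
Written out as equations, the ansatz might look as follows. This is a sketch: the way offset and gain are decomposed into the listed terms is an assumption consistent with the roles the slide gives each symbol, and y_ik and x_ik (measured intensity and true abundance for probe k in sample i) are names introduced here for illustration.

```latex
\begin{align*}
y_{ik} &= a_{ik} + b_{ik}\, x_{ik}, \\
a_{ik} &= a_i + \varepsilon_{ik}, & \varepsilon_{ik} &\sim N(0,\; b_i^2 s_1^2), \\
b_{ik} &= b_i\, b_k\, e^{\eta_{ik}}, & \eta_{ik} &\sim N(0,\; s_2^2).
\end{align*}
```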

  9. The two-component model (B. Durbin, D. Rocke, JCB 2001). [Figure: panels on raw scale and log scale, with regions labelled “additive” noise and “multiplicative” noise.]
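
A small simulation can illustrate what the two noise components do. This is only a sketch: the function name and all parameter values (s_add, s_mult, the abundances) are made up for illustration and are not taken from the talk. It draws net intensities as abundance times multiplicative noise plus additive noise, then reports the standard deviation on the raw scale and the coefficient of variation (a rough proxy for spread on the log scale).

```python
import numpy as np

def simulate_net_intensity(x, s_add=20.0, s_mult=0.15, n=20000, seed=0):
    """Draw n net intensities x * exp(eta) + eps for a true abundance x.

    eps ~ N(0, s_add^2) is the additive noise, eta ~ N(0, s_mult^2) the
    multiplicative noise. All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, s_add, size=n)
    eta = rng.normal(0.0, s_mult, size=n)
    return x * np.exp(eta) + eps

abundances = [10.0, 100.0, 1000.0, 10000.0]
raw_sd = []  # spread on the raw scale
cv = []      # sd / mean, roughly the spread on the log scale
for x in abundances:
    y = simulate_net_intensity(x)
    raw_sd.append(y.std())
    cv.append(y.std() / y.mean())
    print(f"abundance {x:8.0f}: sd = {raw_sd[-1]:8.1f}, CV = {cv[-1]:.3f}")
```

At low abundance the raw-scale spread stays near the additive noise level while the CV is large; at high abundance the raw-scale spread grows with the signal while the CV settles near the multiplicative noise level.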

  10. variance stabilizing transformations. X_u: a family of random variables with E X_u = u, Var X_u = v(u). Define f(x) = ∫^x du / √v(u) ⇒ Var f(X_u) is approximately independent of u. Derivation: linear approximation.
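
The linear approximation mentioned on the slide is the usual first-order (delta method) argument. A sketch of the steps, under the assumption that f is smooth and that X_u stays close to its mean u:

```latex
\begin{align*}
f(X_u) &\approx f(u) + f'(u)\,(X_u - u) \\
\Rightarrow\quad \operatorname{Var} f(X_u) &\approx f'(u)^2 \, v(u) \\
\text{require } f'(u)^2\, v(u) = 1
\quad\Rightarrow\quad f'(u) &= \frac{1}{\sqrt{v(u)}},
\qquad f(x) = \int^{x} \frac{du}{\sqrt{v(u)}} .
\end{align*}
```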

  11. variance stabilizing transformations [Figure: graph of f(x) against x.]

  12. variance stabilizing transformations for four variance models: 1.) constant variance (‘additive’) 2.) constant CV (‘multiplicative’) 3.) offset 4.) additive and multiplicative
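
The formulas for the four cases did not survive in the transcript. Applying f(x) = ∫^x du / √v(u) to each variance model gives the following, as a sketch; the constants s (additive standard deviation), c (coefficient of variation) and u_0 (offset) are names introduced here, and each f is determined only up to an affine transformation.

```latex
\begin{align*}
\text{1.)}\quad v(u) &= s^2            & f(u) &= \frac{u}{s} \\
\text{2.)}\quad v(u) &= c^2 u^2        & f(u) &= \frac{1}{c}\,\log u \\
\text{3.)}\quad v(u) &= c^2 (u+u_0)^2  & f(u) &= \frac{1}{c}\,\log (u+u_0) \\
\text{4.)}\quad v(u) &= c^2 (u+u_0)^2 + s^2
      & f(u) &= \frac{1}{c}\,\operatorname{arsinh}\!\left(\frac{c\,(u+u_0)}{s}\right)
\end{align*}
```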

  13. the “glog” transformation. [Figure legend: dashed line: f(x) = log(x); solid line: h_s(x) = asinh(x/s).] P. Munson, 2001; D. Rocke & B. Durbin, ISMB 2002
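
To make the comparison with the logarithm concrete, here is a minimal sketch of h_s(x) = asinh(x/s) in Python. The function name glog and the value s = 100 are illustrative choices, not taken from the talk.

```python
import numpy as np

def glog(x, s=1.0):
    """h_s(x) = asinh(x / s); unlike log(x), defined for zero and negative x."""
    return np.arcsinh(np.asarray(x, dtype=float) / s)

s = 100.0  # illustrative scale parameter
x = np.array([-50.0, 0.0, 50.0, 1e3, 1e5])
h = glog(x, s)
print(h)

# For x much larger than s, asinh(x/s) approaches log(2x/s) = log(x) + log(2/s),
# so the glog differs from the ordinary log only by a constant shift there.
gap_large = float(glog(1e5, s) - np.log(2 * 1e5 / s))
print(gap_large)
```

Since differences of h_s are what get compared between conditions, the constant shift at high intensities cancels and the result agrees with the ordinary log-ratio there.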

  14. parameter estimation o maximum likelihood estimator: straightforward – but sensitive to deviations from normality o model holds for genes that are unchanged; differentially transcribed genes act as outliers. o robust variant of ML estimator, à la Least Trimmed Sum of Squares regression. o works as long as <50% of genes are differentially transcribed

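  15. Least trimmed sum of squares regression (P. Rousseeuw, 1980s): minimize the sum of only the smallest squared residuals, rather than the sum over all of them. [Figure legend: least sum of squares; least trimmed sum of squares.]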
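
The idea of trimming can be sketched for a straight-line fit. The code below is not the estimator used in vsn; it is a generic illustration that uses random starts and repeated refitting on the points with the smallest residuals. The function name lts_line and all data and parameter values are assumptions made for this example.

```python
import numpy as np

def lts_line(x, y, keep_frac=0.5, n_starts=50, n_steps=20, seed=0):
    """Fit y ~ intercept + slope * x by least trimmed squares (sketch).

    Minimizes the sum of the h smallest squared residuals, h = keep_frac * n,
    by refitting ordinary least squares on the h best-fitting points,
    starting from random pairs of points.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = max(2, int(np.ceil(keep_frac * n)))
    X = np.column_stack([np.ones(n), x])
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=2, replace=False)
        for _ in range(n_steps):
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            subset = np.argsort((y - X @ beta) ** 2)[:h]
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta  # array [intercept, slope]

# Illustrative data: true line y = 1 + 2x, with 30% of points shifted upwards.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 200)
outliers = rng.choice(200, size=60, replace=False)
y[outliers] += 30.0

ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(200), x]), y, rcond=None)
lts = lts_line(x, y)
print("least squares:        ", ols)
print("least trimmed squares:", lts)
```

With keep_frac = 0.5 the fit ignores the worst-fitting half of the points, so the shifted points pull the ordinary least squares intercept upwards but leave the trimmed fit close to the true line.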

  16. evaluation: effects of different data transformations [Figure: difference red-green plotted against rank(average).]

  17.  Normality: QQ-plot

  18. evaluation: sensitivity / specificity in detecting differential abundance o Data: paired tumor/normal tissue from 19 kidney cancers, in color flip duplicates on 38 cDNA slides with 4000 genes each. o 6 different strategies for normalization and quantification of differential abundance o Calculate for each gene & each method: t-statistics, permutation-p o For threshold α, compare the number of genes the different methods find, #{p_i | p_i ≤ α}
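
The per-gene computation can be sketched as follows. The slide does not say which permutation scheme was used; for paired data, randomly flipping the sign of each pair's difference is one natural choice, and that is what this example assumes. The test here is two-sided, and the data, function names and parameter values are simulated or invented for illustration.

```python
import numpy as np

def t_stat(d):
    """One-sample t-statistic per gene; d has shape (genes, pairs)."""
    n = d.shape[1]
    return d.mean(axis=1) / (d.std(axis=1, ddof=1) / np.sqrt(n))

def sign_flip_pvalues(d, n_perm=1000, seed=0):
    """Two-sided permutation p-values from random sign flips of the pairs."""
    rng = np.random.default_rng(seed)
    observed = np.abs(t_stat(d))
    exceed = np.zeros(d.shape[0])
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=d.shape[1])
        exceed += np.abs(t_stat(d * signs)) >= observed
    return (exceed + 1) / (n_perm + 1)

# Simulated paired differences: 200 genes, 19 pairs, first 20 genes shifted.
rng = np.random.default_rng(2)
d = rng.normal(0.0, 1.0, size=(200, 19))
d[:20] += 2.0

p = sign_flip_pvalues(d)
alpha = 0.01
n_found = int((p <= alpha).sum())  # #{p_i | p_i <= alpha}
print(n_found)
```

Running the same count for each normalization strategy, over a range of thresholds α, gives the kind of comparison the slide describes.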

  19. evaluation: comparison of methods [Figure: panels “one-sided test for up” and “one-sided test for down”.] More accurate quantification of differential expression ⇒ higher sensitivity / specificity

  20. evaluation: a benchmark for Affymetrix genechip expression measures o Data: Spike-in series: from Affymetrix, 59 x HGU95A, 16 genes, 14 concentrations, complex background. Dilution series: from GeneLogic, 60 x HGU95Av2, liver & CNS cRNA in different proportions and amounts o Benchmark: 15 quality measures regarding - reproducibility - sensitivity - specificity. Put together by Rafael Irizarry (Johns Hopkins) http://affycomp.biostat.jhsph.edu

  21. affycomp results (28 Sep 2003) [Figure: results shown on a scale from good to bad.]

  22.  ROC curves

  23. position- and sequence-specific effects w_i(s): Naef et al., Phys Rev E 68 (2003). Stratification. [Figure: w_i against i.] Collaboration with R. Irizarry

  24. glog versus "sliding z-score"

  25. Availability o implementation in R o open source package vsn on www.bioconductor.org o Bioconductor is an international collaboration on open source software for bioinformatics and statistical omics

  26. What to do with the gene lists: the functional genomics pipeline @ DKFZ. [Diagram labels: Neoplastic diseases; High-throughput transcriptome sequencing: clones with unannotated full length ORFs; functional characterization; association of mRNA profiles with - genetic aberrations - histopathology - clinical behavior.]

  27. HT functional assays (S. Wiemann, D. Arlt). [Diagram labels: Library of "unknown" transcripts; GFP-ORF expression clone; protein; BrdU incorporation; proliferation: + activator, - inhibitor; automated microscope (Rainer Pepperkok, EMBL); DAPI: identification; CFP: expression; BrdU: proliferation; image segmentation and quantification.]

  28. Detection of modulators of cell proliferation by automated image analysis (Dorit Arlt). Measurement of fluorescence intensities. [Figure: microscope images: DAPI, ORF-YFP, Anti-BrdU/Cy5, overlay YFP – Cy5; annotations: 231.6, -4.8; 80.9, 68.5; SMPCell.]

      YFP channel   Cy5 channel
             72.0         761.0
             71.0         684.1
            119.7         779.0
             87.3         820.2
            149.5         645.6
             70.2         536.1
             84.7         799.5
            103.1         912.8
             81.0         916.7
           2621.8         267.6
             74.1         766.2
            156.8         866.6
            169.0         819.8
            105.5         757.7
            156.0         367.8
             76.5         746.2
            135.2         731.2
             86.2         567.3
             77.7         896.3
             92.6        1095.4
            104.6         633.3
            481.2         567.7
            539.0         663.9
             95.0         726.2
            156.7         842.1

  29. Statistical analysis of cellular assay data. Detect transfection effect: inh (p = 10^-8). [Figure: plate summary plot; transfected cells versus control cells; activation and inhibition.]

  30. Cellular assays: challenges for statisticians o Image analysis: pattern recognition, classification o Low-level analysis: what are good models for calibration, “normalization”, data transformation? o High-level analysis: models for the dependence of cellular processes on over-/underexpression of genes; connect results from different assays, microarray data

  31. Summary o log-ratio Δlog: what about genes that are not expressed in some of the conditions of interest? o generalized log-ratio Δh: a useful extrapolation - interpretability - sensitivity - specificity - computational convenience o what to do with the gene lists? systematic (high throughput) functional assays

  32. Acknowledgements. MPI Molekulare Genetik: Anja von Heydebreck, Martin Vingron. Uni Heidelberg: Günther Sawitzki. DFCI Harvard: Robert Gentleman. UMC Leiden: Judith Boer. RZPD: Anke Schroth, Bernd Korn. EMBL: Urban Liebel. DKFZ Heidelberg, Molecular Genome Analysis: Annemarie Poustka, Holger Sültmann, Andreas Buneß, Markus Ruschhaupt, Katharina Finis, Jörg Schneider, Klaus Steiner, Stefan Wiemann, Dorit Arlt. ...and many more!

  33. Models are never correct, but some are useful. [Figure: true relationship; model: linear dependence; model: quadratic dependence.]

  34. variance stabilization. Variance: constant part, proportional part. [Figure: panels on raw scale, log and glog; labels: difference, log-ratio, generalized log-ratio.]
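
The contrast between the log-ratio and the generalized log-ratio can be checked in a small simulation. This is a sketch under stated assumptions: two channels measure the same true abundance with additive noise (standard deviation s_add) and multiplicative noise (standard deviation c), the offset has already been subtracted, and the glog is taken as arsinh(c·y / s_add). All names and parameter values are illustrative.

```python
import numpy as np

def net_intensity(x, s_add, c, n, rng):
    """Net intensity with multiplicative and additive noise (offset removed)."""
    return x * np.exp(rng.normal(0.0, c, n)) + rng.normal(0.0, s_add, n)

def compare_ratios(x, s_add=20.0, c=0.15, n=20000, seed=0):
    """Spread of log-ratio and generalized log-ratio for an unchanged gene."""
    rng = np.random.default_rng(seed)
    red = net_intensity(x, s_add, c, n, rng)
    green = net_intensity(x, s_add, c, n, rng)
    usable = (red > 0) & (green > 0)  # the log-ratio needs both values positive
    dlog = np.log(red[usable]) - np.log(green[usable])
    dh = np.arcsinh(c * red / s_add) - np.arcsinh(c * green / s_add)
    return {
        "undefined_frac": float(1.0 - usable.mean()),
        "sd_dlog": float(dlog.std()),
        "sd_dh": float(dh.std()),
    }

low = compare_ratios(0.0)       # gene "off" in both channels
high = compare_ratios(10000.0)  # strongly expressed gene
print(low)
print(high)
```

For the gene that is off, most log-ratios are undefined and the remaining ones scatter widely, while the generalized log-ratio is defined everywhere and has about the same spread as for the strongly expressed gene.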

  35. ratio compression Yue et al., (Incyte Genomics) NAR (2001) 29 e41
