1 / 80

Microarray normalization, error models, quality

Microarray normalization, error models, quality. Wolfgang Huber EMBL-EBI Brixen 16 June 2008. Oligonucleotide microarrays. Base Pairing. *. *. *. *. *. Oligonucleotide microarrays. GeneChip. Hybridized Probe Cell. Target - single stranded cDNA. Oligonucleotide probe. 5µm.

lpate
Download Presentation

Microarray normalization, error models, quality

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008

  2. Oligonucleotide microarrays

  3. Base Pairing

  4. * * * * * Oligonucleotide microarrays GeneChip Hybridized Probe Cell Target - single stranded cDNA Oligonucleotide probe 5µm Millions of copies of a specific oligonucleotide probe molecule per patch 1.28cm up to 6.5 Mio different probe patches Image of array after hybridisation and staining

  5. Probe sets

  6. Each target molecule (transcript) is represented by several oligonucleotides of (intended) length 25 bases Probe: one of these 25-mer oligonucleotides Perfect match (PM): A probe exactly complementary to the target sequence Mismatch (MM): same as PM but with a single homomeric base change for the middle (13th) base (G-C, A-T) Probe-pair: a (PM, MM) pair. Probe set: a collection of probe-pairs (e.g. 11) targeting the same transcript MGED/MIAME: „probe“ is ambiguous! Reporter: the sequence Feature: a physical patch on the array with molecules intended to have the same reporter sequence (one reporter can be represented by multiple features) Terminology for transcription arrays

  7. Image analysis • about 100 pixels per feature • segmentation • summarisation into one number representing the intensity level for this feature •  CEL file

  8. samples: mRNA from tissue biopsies, cell lines fluorescent detection of the amount of sample-probe binding arrays: probes = gene-specific DNA strands tissue A tissue B tissue C ErbB2 0.02 1.12 2.12 VIM 1.1 5.8 1.8 ALDH4 2.2 0.6 1.0 CASP4 0.01 0.72 0.12 LAMA4 1.32 1.67 0.67 MCAM 4.2 2.93 3.31 marray data

  9. Why do you need ‘normalisation’?

  10. bias From: lymphoma dataset vsn package Alizadeh et al., Nature 2000

  11. MA-plot M A

  12. log2 intensity arrays / dyes

  13. Non-linearity log2 Cope et al. Bioinformatics 2003

  14. ratio compression nominal 3:1 nominal 1:1 nominal 1:3 Yue et al., (Incyte Genomics) NAR (2001) 29 e41

  15. A complex measurement process lies between mRNA concentrations and intensities The problem is less that these steps are ‘not perfect’; it is that they vary from array to array, experiment to experiment.

  16. Why do you need statistics?

  17. tumor-normal log-ratio Which genes are differentially transcribed? same-same

  18. Statistics 101: biasaccuracy  precision variance

  19. Basic dogma of data analysis Can always increase sensitivity on the cost of specificity, or vice versa, the art is to - optimize both, then - find the best trade-off. X X X X X X X X X

  20.  How to compare microarray intensities with each other?  How to address measurement uncertainty (“variance”)?  How to calibrate (“normalize”) for biases between samples? Questions

  21. Systematic Stochastic o similar effect on many measurements o corrections can be estimated from data o too random to be ex-plicitely accounted for o remain as “noise” Calibration Error model Sources of variation amount of RNA in the biopsy efficiencies of -RNA extraction -reverse transcription -labeling -fluorescent detection probe purity and length distribution spotting efficiency, spot size cross-/unspecific hybridization stray signal

  22. Quantile normalisation

  23. Ben Bolstad 2001 Quantile normalisation

  24. data("Dilution") nq = normalize.quantiles(exprs(Dilution)) nr = apply(exprs(Dilution), 2, rank) for(i in 1:4) plot(nr[,i], nq[,i], pch=".", log="y", xlab="rank", ylab="quantile normalized", main=sampleNames(Dilution)[i])

  25. Quantile normalisation is: per array rank-transformation followed by replacing ranks with values from a common reference distribution

  26. + Simple, fast, easy to implement + Always works, needs no user interaction / tuning + Non-parametric: can correct for quite nasty non-linearities (saturation, background) in the data - Always "works", even if data are bad / inappropriate - Conservative: rank transformation looses information - may yield less power to detect differentially expressed genes - Aggressive: when there is an excess of up- (or down) regulated genes, it removes not just technical, but also biological variation Quantile normalisation

  27. loess normalisation

  28. loess (locally weighted scatterplot smoothing): an algorithm for robust local polynomial regression by W. S. Cleveland and colleagues (AT&T, 1980s) and handily available in R "loess" normalisation

  29. Local polynomial regression bandwidth b

  30. Robust regression

  31. C. Loader Local Regression and Likelihood Springer Verlag

  32. loess normalisation • local polynomial regression of M against A • 'normalised' M-values are the residuals before after

  33. local polynomial regression normalisation in >2 dimensions

  34. n-dimensional local regression model for microarray normalisation An algorithm for fitting this robustly is described (roughly) in the paper. They only provided software as a compiled binary for Windows. The method has not found much use.

  35. Estimating relative expression (fold-changes)

  36. 3000 3000 x3 ? 1500 200 1000 0 ? x1.5 A A B B C C But what if the gene is “off” (below detection limit) in one condition? ratios and fold changes Fold changes are useful to describe continuous changes in expression

  37. ratios and fold changes The idea of the log-ratio (base 2) 0: no change +1: up by factor of 21 = 2 +2: up by factor of 22 = 4 -1: down by factor of 2-1 = 1/2 -2: down by factor of 2-2 = ¼ A unit for measuring changes in expression: assumes that a change from 1000 to 2000 units has a similar biological meaning to one from 5000 to 10000. What about a change from 0 to 500? - conceptually - noise, measurement precision

  38. Many data are measured in definite units: time in seconds lengths in meters energy in Joule, etc. Climb Mount Plose (2465 m) from Brixen (559 m) with weight of 78 kg, working against a gravitation field of strength 9.81 m/s2 : What is wrong with microarray data? (2465 - 559) · 78 · 9.81 m kg m/s2 = 1 458 433 kg m2 s-2 = 1 458.433 kJ

  39. vsn normalisation

  40. Robust affine regression normalisation (n-dim.) with the additive-multiplicative error model Error model: Trey Ideker et al. (2000) JCB David Rocke and Blythe Durbin: (2001) JCB, (2002) Bioinformatics Use for robust affine regression normalisation: W. Huber, Anja von Heydebreck et al. (2002) Bioinformatics

  41. bi per-sample gain factor bk sequence-wise probe efficiency hik multiplicative noise ai per-sample offset eik additive noise  The two component model measured intensity = offset + gain  true abundance

  42. “multiplicative” noise “additive” noise The two-component model raw scale log scale B. Durbin, D. Rocke, JCB 2001

  43. Parameterization two practically equivalent forms (h<<1)

  44.  variance stabilizing transformations Xu a family of random variables with E Xu = u, Var Xu = v(u). Define Var f(Xu ) independent of u derivation: linear approximation

  45. variance stabilizing transformation f(x) x

  46. 1.) constant variance (‘additive’) 2.) constant CV (‘multiplicative’) 3.) offset 4.) additive and multiplicative  variance stabilizing transformations

  47. the “glog” transformation - - - f(x) = log(x) ———hs(x) = asinh(x/s) P. Munson, 2001 D. Rocke & B. Durbin, ISMB 2002 W. Huber et al., ISMB 2002

  48. generalized log-ratio difference log-ratio variance: constant part proportional part glog raw scale log glog

More Related