
Microarray Data Processing and Quality Control

Learn about RNA isolation, amplification, labeling, hybridization, scanning, and data analysis for microarray experiments. Understand the importance of normalization, channel balancing, and spike-in RNA for high-quality results.


Presentation Transcript


  1. WORKSHOP: SPOTTED 2-channel ARRAYS. DATA PROCESSING AND QUALITY CONTROL. Eugenia Migliavacca and Mauro Delorenzi, ISREC, December 11, 2003

  2. AIMS: discussion; information; introduction to the use of the webpage for automated normalization; interface between experimentalists and analysts; feedback; resource allocation

  3. Acknowledgments: some slides originally provided by Terry Speed (Berkeley / WEHI), Sandrine Dudoit (Berkeley), Yee Hwa Yang (Berkeley), Natalie Thorne (WEHI), Otto Hagenbuechle, Eugenia Migliavacca, Darlene Goldstein and others

  4. Preparation: RNA ISOLATION, (AMPLIFICATION) AND LABELING WITH FLUORO-DYES

  5. Hybridisation: binding labelled samples (targets) to complementary probes on a slide (figure label: cover slip). Mix, hybridise for 5-12 hours, wash

  6. Scanning: adjust scanner parameters; the following can frequently be adapted: 1. excitation wave (laser) intensity; 2. "gain" (amplification) of the photon detection system

  7. How to extract data ? How to recognize problems ? (Image: Human 10K cDNA Array)

  8. Scanner's Spots: part of the image of one channel, false-coloured on a scale from white (very high) and red (high) through yellow and green (medium) to blue (low) and black.

  9. Steps of a Microarray Experiment: first: DESIGN ! Then: RNA preparation and labeling; hybridisation; slide scanning; image analysis; normalization; data for further analysis. Why perform an experiment ? What is the aim ? Which conclusions do you want to reach ?

  10. mRNA abundance: approx. 300'000 mRNA molecules/cell; approx. 10-20'000 different genes. Figure labels: abundance classes 1-50, 50-500, 500+; RNA mass: rRNA 80%, mRNA 1%, tRNA; different in different cells. What do you want to measure ?

  11. Relative vs Absolute changes: in one cell, 200'000 mRNA molecules/cell, of which 200 for gene X (0.1%); in another, 400'000 mRNA molecules/cell, of which 400 for gene X (0.1%). Is gene X differentially expressed ?
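
The arithmetic behind this question can be sketched as follows. The counts are the ones on the slide; the names `sample_a`, `sample_b` and `relative_abundance` are made up for illustration.

```python
def relative_abundance(gene_count, total_mrna):
    """Fraction of all mRNA molecules in a cell that come from one gene."""
    return gene_count / total_mrna

# Counts from the slide: copies of gene X and total mRNA molecules per cell.
sample_a = {"gene_x": 200, "total": 200_000}
sample_b = {"gene_x": 400, "total": 400_000}

# Absolute change: copies per cell in B relative to A.
absolute_fold = sample_b["gene_x"] / sample_a["gene_x"]

# Relative change: share of the mRNA pool in B relative to A.
relative_fold = (
    relative_abundance(sample_b["gene_x"], sample_b["total"])
    / relative_abundance(sample_a["gene_x"], sample_a["total"])
)

print(f"absolute fold change: {absolute_fold:.1f}")
print(f"relative fold change: {relative_fold:.1f}")
```

The copy number doubles while the share of the mRNA pool stays the same, so the answer depends on which of the two quantities the experiment is meant to measure.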

  12. Steps of a Microarray Experiment. What is needed for high quality data ? Which are the critical steps ? RNA preparation and labeling; hybridisation; slide scanning (output: 16-bit TIFF files); image analysis (output: (Rfg, Rbg), (Gfg, Gbg), etc.); normalization (output: R, G, M, A, etc.); data for further analysis

  13. Steps of a Microarray Experiment: RNA preparation and labeling (spike-in RNA in known concentrations and ratios); hybridisation; slide scanning (adjust / balance channels approximately; avoid saturation); image analysis; normalization (check normalized and unnormalized data of experimental RNA and of spiked RNA); data for further analysis
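
A minimal sketch of two checks suggested by "avoid saturation" and "balance channels", assuming pixel or spot intensities from a 16-bit scan are available as plain lists. The function names and the use of a median ratio as a balance measure are illustrative assumptions, not the workshop's procedure.

```python
from statistics import median

# Largest value a 16-bit pixel can hold; readings at this level are clipped.
SATURATION_LEVEL = 2**16 - 1  # 65535

def saturated_fraction(values, level=SATURATION_LEVEL):
    """Return the fraction of intensities at or above the saturation level."""
    if not values:
        raise ValueError("no intensities supplied")
    return sum(1 for v in values if v >= level) / len(values)

def channel_balance(red_values, green_values):
    """Ratio of median red to median green intensity (about 1 when balanced)."""
    return median(red_values) / median(green_values)

# Hypothetical intensities for a handful of spots.
red = [1200, 65535, 800, 65535]
green = [1000, 40000, 900, 52000]
print("saturated fraction (red):", saturated_fraction(red))
print("saturated fraction (green):", saturated_fraction(green))
print("median red / median green:", channel_balance(red, green))
```

A saturated spot has lost information in that channel, so its ratio is unreliable whatever normalization is applied afterwards.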

  14. Why calculate ratios ? Why calculate log ratios ? Why avoid saturation ? Why balance channels ? Why perform "normalization" ? What to check before and after normalization ?

  15. Aim: Gene Expression Data. Gene expression data on p genes (rows) for n samples (slides, columns):

        Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
        1        0.46      0.30      0.80      1.51      0.90    ...
        2       -0.10      0.49      0.24      0.06      0.46    ...
        3        0.15      0.74      0.04      0.10      0.20    ...
        4       -0.45     -1.03     -0.79     -0.56     -0.32    ...
        5       -0.06      1.06      1.35      1.09     -1.09    ...

      Each entry is a gene expression level, e.g. that of gene 5 in slide 4 (1.09). M = Log2(Red intensity / Green intensity). These values are conventionally displayed on a red (>0), yellow (0), green (<0) scale.

  16. Objectives for high quality. Important aspects include: • Tentatively separating systematic sources of variation ("artefacts"), which bias the results, from random sources of variation ("noise"), which hide the truth. • Removing the former as well as possible and quantifying the latter. Only if this is done can we hope to reach good quality and make valid statements about the confidence in the results

  17. Typical Statistical Approach. Measured value = real value + systematic errors + noise. Corrected value = real value + noise. • Analysis of corrected value => (unbiased) CONCLUSIONS • Estimation of noise => quality of CONCLUSIONS, statistical significance (level of confidence) of the conclusions

  18. Step 1: a) Background correction; b) Calculation of (log) ratios. Image analysis => Rfg ; Rbg ; Gfg ; Gbg (fg = foreground, bg = background). For each spot on the slide calculate: Red intensity = R = Rfg - Rbg; Green intensity = G = Gfg - Gbg; M = Log2(Red intensity / Green intensity). Subtraction of background values assumes an additive background model in which the background is locally constant. Sources of background: probe sticking unspecifically to the slide, irregular / dirty slide surface, dust, and noise / errors in the scanner measurement. Not included: real cross-hybridisation and unspecific hybridisation to the probe
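
Step 1 can be sketched per spot as below. The spot values are hypothetical, and returning `None` for spots whose corrected intensity is not positive is an assumption made here for illustration; how such spots are handled is a separate choice.

```python
import math

def log_ratio(rfg, rbg, gfg, gbg):
    """Background-correct both channels and return M = log2(R / G).

    Returns None when a corrected intensity is zero or negative,
    because the log ratio is undefined in that case.
    """
    r = rfg - rbg  # Red intensity
    g = gfg - gbg  # Green intensity
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

# Hypothetical spots as (Rfg, Rbg, Gfg, Gbg).
spots = [
    (1100, 100, 600, 100),  # R = 1000, G = 500
    (300, 100, 900, 100),   # R = 200,  G = 800
    (120, 130, 400, 100),   # background exceeds foreground in red
]
for spot in spots:
    print(spot, "M =", log_ratio(*spot))
```

The third spot shows one practical cost of subtraction: weak spots can end up with no usable log ratio at all.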

  19. Comment on Background Correction: subtraction of background has frequently been shown not to improve performance: while it brings the average of many measurements closer to the true values (reduced bias or systematic error), it causes higher variability (lower reproducibility). Figure labels: average; single measurement; A. High variance - unbiased estimator; B. Low variance - biased estimator

  20. Which is better ? A. High variance - unbiased estimator: when you take many measurements, the average will be closer to the true value more frequently. B. Low variance - slightly biased estimator: when you take one or a few measurements, the average will be closer to the true value more frequently. DAF Microarrays 2002: we preferred no subtraction; this should be re-evaluated with the Agilent scanner (and GenePix IAS)
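
One way to see the trade-off is through the mean squared error of an average of n independent measurements, which equals bias squared plus variance divided by n. The sketch below uses that criterion as a stand-in for "closer to the true value more frequently"; the bias and variance figures are hypothetical.

```python
def mse_of_average(bias, variance, n):
    """Mean squared error of the average of n independent measurements.

    Averaging divides the variance by n but leaves the bias unchanged.
    """
    return bias**2 + variance / n

# Hypothetical estimators, in arbitrary units.
estimator_a = {"bias": 0.0, "variance": 4.0}  # A: high variance, unbiased
estimator_b = {"bias": 0.5, "variance": 1.0}  # B: low variance, slightly biased

for n in (1, 4, 100):
    mse_a = mse_of_average(estimator_a["bias"], estimator_a["variance"], n)
    mse_b = mse_of_average(estimator_b["bias"], estimator_b["variance"], n)
    better = "A" if mse_a < mse_b else "B"
    print(f"n={n:3d}  MSE(A)={mse_a:.2f}  MSE(B)={mse_b:.2f}  better: {better}")
```

With these numbers B wins for one or a few replicates and A wins once many replicates are averaged, which matches the rule of thumb on the slide.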

  21. A reminder on logarithms
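
The slide's own content is not preserved in this transcript. As a sketch of the two properties of logarithms that the rest of the deck relies on: log2 makes up- and down-regulation symmetric around zero, and the log of a ratio is a difference of logs. The intensities are hypothetical.

```python
import math

# Equal fold changes up and down map to equal distances from zero.
for ratio in (4, 2, 1, 0.5, 0.25):
    print(f"R/G = {ratio:<5} log2(R/G) = {math.log2(ratio):+.1f}")

# The log of a ratio equals the difference of the logs.
r, g = 800.0, 200.0
assert math.isclose(math.log2(r / g), math.log2(r) - math.log2(g))
```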

  22. A numerical example

  23. Step 2: An M vs A (MVA) Plot. M = log(R/G) = log R - log G; A = (log R + log G) / 2. Plot labels: lowess curve; blanks; positive controls (spotted in varying concentrations); negative controls
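
A small sketch of the coordinate change behind the plot, assuming base-2 logarithms as used for M elsewhere in the deck. The function names are illustrative.

```python
import math

def to_ma(r, g):
    """Convert background-corrected intensities (R, G) to (M, A)."""
    log_r, log_g = math.log2(r), math.log2(g)
    m = log_r - log_g        # log ratio
    a = (log_r + log_g) / 2  # average log intensity
    return m, a

def from_ma(m, a):
    """Invert the transform: recover (R, G) from (M, A)."""
    return 2 ** (a + m / 2), 2 ** (a - m / 2)

m, a = to_ma(1024, 256)
print("M =", m, "A =", a)
print("back to (R, G):", from_ma(m, a))
```

Because the transform is invertible, no information is lost: the MVA plot is the log R vs log G scatter plot rotated by 45 degrees and rescaled.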

  24. Why use an M vs A plot ? • Logs stretch out region we are most interested in. • Can more clearly see features of the data such as intensity dependent variation, and dye-bias. • Differentially expressed genes more easily identified. • Intuitive interpretation

  25. MVA plot: looking at data Spot identifier Lowess curve S1.n. Control Slide: Dye Effect, Spread.

  26. Step 3: Normalisation - global median centering (common median). • Assumption: changes roughly symmetric. • First panel: smoothed density of log2 G and log2 R. • Second panel: M vs A plot with the median set to zero
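
Global median centering amounts to subtracting one number from every log ratio on the slide. A minimal sketch with hypothetical M values:

```python
from statistics import median

def median_center(m_values):
    """Shift all log ratios so that their median becomes zero."""
    shift = median(m_values)
    return [m - shift for m in m_values]

# Hypothetical log ratios with an overall shift towards red.
m_values = [0.5, 1.0, 1.5, 0.2, 0.9]
centered = median_center(m_values)
print("median before:", median(m_values))
print("median after:", median(centered))
```

The correction is the same at every intensity, so it cannot remove a dye bias that varies with A.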

  27. Step 4: Normalisation - lowess - local median centering. • Assumption: changes roughly symmetric at all intensities.

  28. What is this normalization doing?

  29. Local regression • Classical (global) regression: fits a single line to the entire set of points • Local regression: draws a curve through noisy data by smoothing • Lowess (LOcally WEighted Scatterplot Smoothing) is a type of local regression • Can correct for both print-tip and intensity-dependent bias with lowess fits to the data within print-tip groups
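
The idea can be sketched in plain Python as a simplified lowess: at each point, fit a straight line to the nearest neighbours with tricube weights and keep the fitted value. This version omits the robustness iterations of the full lowess procedure, and the `frac` default and the example data are assumptions for illustration.

```python
def lowess_fit(x, y, frac=0.4):
    """Simplified (non-robust) lowess: local weighted linear fit at each x[i]."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of points in each neighbourhood
    fitted = []
    for x0 in x:
        # The k nearest neighbours of x0, sorted by distance.
        near = sorted((abs(xi - x0), xi, yi) for xi, yi in zip(x, y))[:k]
        h = near[-1][0] or 1.0  # neighbourhood half-width (guard against 0)
        w = [(1 - (d / h) ** 3) ** 3 for d, _, _ in near]  # tricube weights
        xs = [xi for _, xi, _ in near]
        ys = [yi for _, _, yi in near]
        # Weighted least-squares line through the neighbourhood.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical data: an intensity-dependent dye bias plus alternating noise.
a_values = [6 + 0.5 * i for i in range(20)]
m_values = [0.1 * (a - 6) + (0.3 if i % 2 else -0.3) for i, a in enumerate(a_values)]

trend = lowess_fit(a_values, m_values)
m_normalised = [m - t for m, t in zip(m_values, trend)]
```

Lowess normalization subtracts the fitted curve from M, so the correction differs at each intensity A. Applying `lowess_fit` separately to the spots of each print-tip group gives the print-tip variant mentioned in the last bullet.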

  30. Local regression illustrated

  31. Lowess line

  32. Step 5: Normalisation - spatial corrections. • After within-slide global lowess normalization. • Likely to be a spatial effect. (Plot: log-ratios by print-tip group.)

  33. Normalization between groups (ctd): normalized values look nice, but ..... • After print-tip location- and scale-normalization. (Plot: log-ratios by print-tip group.)

  34. Effects of Location Normalisation (example) Before After

  35. Identifying sub-array effects Lowess lines through points from each pin group Boxplots of log ratios by pin group
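
The numbers behind such boxplots can be sketched as a per-group summary. The group labels and M values are hypothetical, and `statistics.quantiles` needs at least two spots per group.

```python
from statistics import median, quantiles

def summarise_by_group(m_values, group_ids):
    """Return median, interquartile range and spot count of M per pin group."""
    groups = {}
    for m, g in zip(m_values, group_ids):
        groups.setdefault(g, []).append(m)
    summary = {}
    for g, ms in sorted(groups.items()):
        q1, _, q3 = quantiles(ms, n=4, method="inclusive")
        summary[g] = {"median": median(ms), "iqr": q3 - q1, "n": len(ms)}
    return summary

# Hypothetical log ratios from two pin groups.
m_values = [0.1, 0.2, 0.3, -0.5, -0.4, -0.6]
group_ids = [1, 1, 1, 2, 2, 2]
for group, stats in summarise_by_group(m_values, group_ids).items():
    print(group, stats)
```

Group medians that differ clearly from each other point to a location effect; differing interquartile ranges point to a scale effect.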

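  36. Step 6: Rescaling (Spread-Normalisation). Taking varying scale into account. Assumption: all (print-tip) groups should have the same spread in M. The true ratio is $\mu_{ij}$, where i indexes the (print-tip) groups and j the spots. Observed is $M_{ij}$, where $M_{ij} = a_i \log(\mu_{ij})$. A robust estimate of $a_i$ is $\hat{a}_i = \mathrm{MAD}_i / \sqrt[I]{\prod_{k=1}^{I} \mathrm{MAD}_k}$, where $\mathrm{MAD}_i$ is the median absolute deviation of M in group i and I is the number of groups. Corrected values are calculated as $M_{ij} / \hat{a}_i$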
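
A minimal sketch of the rescaling, assuming the scale factor of each group is estimated as its MAD (median absolute deviation) divided by the geometric mean of the MADs of all groups, in line with the scale rule the deck gives for choosing a normalization. Group names and values are hypothetical, and every group is assumed to have a non-zero MAD.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)

def rescale_groups(groups):
    """Divide M in each print-tip group by its estimated scale factor a_i.

    groups: dict mapping a group label to a list of M values.
    """
    mads = {g: mad(ms) for g, ms in groups.items()}
    # Geometric mean of the group MADs, computed via logs.
    geo_mean = math.exp(sum(math.log(v) for v in mads.values()) / len(mads))
    scaled = {}
    for g, ms in groups.items():
        a_i = mads[g] / geo_mean
        scaled[g] = [m / a_i for m in ms]
    return scaled

# Hypothetical, already location-normalized log ratios.
groups = {
    "tip1": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "tip2": [-4.0, -2.0, 0.0, 2.0, 4.0],
}
scaled = rescale_groups(groups)
for g in groups:
    print(g, "MAD before:", mad(groups[g]), "MAD after:", mad(scaled[g]))
```

After rescaling every group has the same MAD, namely the geometric mean of the original MADs.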

  37. Illustration: print-tip-group normalisation. Assumption: for every print-tip group, changes roughly symmetric at all intensities. Figure: glass slide with array of bound cDNA probes; 4x4 blocks = 16 pin groups

  38. Which normalization to use? Case 1: A few genes that are likely to change and / or a random large collection of genes (expect as many up as down): Each slide per se: • Location: print-tip-group lowess normalization. • Scale: for all print-tip groups, adjust the MAD to equal the geometric mean of the MADs of all print-tip groups. Case 2: Non-random gene collection and / or many genes do change appreciably: • USE DYE-SWAP APPROACH • Self-normalization: take the difference of the two log-ratios. • Check using controls or known information.
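
Self-normalization for a dye-swap pair can be sketched as below. Two assumptions are made for illustration: the dye bias of each spot is the same on both slides, and the difference is halved so that the result stays on the scale of a single log ratio. The values are hypothetical.

```python
def self_normalise(m_forward, m_swapped):
    """Combine a dye-swap pair spot by spot: half the difference of the log ratios."""
    return [(mf - ms) / 2 for mf, ms in zip(m_forward, m_swapped)]

# Hypothetical true log ratios and spot-specific dye bias.
true_m = [1.0, -0.5, 0.0]
dye_bias = [0.3, 0.1, -0.2]

# Swapping the dyes flips the sign of the true log ratio but not of the bias.
m_forward = [t + b for t, b in zip(true_m, dye_bias)]
m_swapped = [-t + b for t, b in zip(true_m, dye_bias)]

recovered = self_normalise(m_forward, m_swapped)
print(recovered)
```

The bias cancels in the difference without any assumption about how many genes change, which is why the approach suits Case 2.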

  39. MVA plots: what to look at ? How to use the spikes ? Points: signal intensity; background; saturation; homogeneity, normalizability; problem diagnosis

  40. Webpage How to use the plots ? Use of the different options

  41. Quality control before normalization (?) Choice of normalization
