Preprocessing of cDNA microarray data

Preprocessing of cDNA microarray data. Lecture 19, Statistics 246, April 1, 2004. Begin by looking at the data. Was the experiment a success? What analysis tools should be used? Are there any specific problems? Red/Green overlay images.


Presentation Transcript


  1. Preprocessing of cDNA microarray data. Lecture 19, Statistics 246, April 1, 2004.

  2. Begin by looking at the data. Was the experiment a success? What analysis tools should be used? Are there any specific problems?

  3. Red/Green overlay images. Co-registration and overlay offer a quick visualization, revealing information on color balance, uniformity of hybridization, spot uniformity, background, and artifacts such as dust or scratches. Good: low background (bg), lots of differential expression (d.e.). Bad: high bg, ghost spots, little d.e.

  4. Always log, always rotate. Plot log2 R vs log2 G, then rotate to M = log2(R/G) vs A = log2 √(RG).
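
The log-and-rotate step on slide 4 can be sketched as follows. This is a minimal illustration, not code from the lecture; the function name `ma_transform` and the sample intensities are hypothetical.

```python
import math

def ma_transform(red, green):
    """Return (M, A) lists for paired red/green spot intensities."""
    m_values, a_values = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_values.append(log_r - log_g)         # M = log2(R/G)
        a_values.append((log_r + log_g) / 2)   # A = log2 sqrt(RG)
    return m_values, a_values

# Hypothetical intensities chosen as powers of two so the logs are exact.
m, a = ma_transform([1024.0, 256.0], [256.0, 256.0])
print(m, a)
```

Plotting `m` against `a` gives the MA-plot used throughout the later slides.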

  5. Histograms. Signal/Noise = log2(spot intensity / background intensity).

  6. Boxplots of log2(R/G). Liver samples from 16 mice: 8 WT, 8 ApoAI KO.

  7. Spatial plots: background from the two slides

  8. Highlighting extreme log ratios. Top (black) and bottom (green) 5% of log ratios.
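
Selecting the spots to highlight can be sketched as below. The helper `flag_extremes` and the demo values are hypothetical; the returned indices would then be drawn at their positions on the array.

```python
def flag_extremes(m_values, fraction=0.05):
    """Return (top, bottom) index lists for the most extreme log ratios."""
    n_flag = max(1, int(len(m_values) * fraction))   # at least one spot per tail
    order = sorted(range(len(m_values)), key=lambda i: m_values[i])
    return order[-n_flag:], order[:n_flag]

# Hypothetical log ratios: 20 strictly increasing values.
m_demo = [0.1 * i - 1.0 for i in range(20)]
top, bottom = flag_extremes(m_demo)
print(top, bottom)
```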

  9. Boxplots and highlighting. Clear example of spatial bias (here high is red, low green). Axes: log-ratios by print-tip group (pin group #).

  10. Pin group (sub-array) effects. Lowess lines through points from pin groups. Boxplots of log ratios by pin group.

  11. Plate effects

  12. KO #8. Probes: ~6,000 cDNAs, including 200 related to lipid metabolism. Arranged in a 4x4 array of 19x21 sub-arrays.

  13. Time of printing effects. Green channel intensities (log2 G) against spot number. Printing over 4.5 days. The previous slide depicts a slide from this print run.

  14. Normalization. Why? To correct for systematic differences between samples on the same slide, or between slides, which do not represent true biological variation between samples. How do we know it is necessary? By examining self-self hybridizations, where no true differential expression is occurring. We find dye biases which vary with overall spot intensity, location on the array, plate origin, pins, scanning parameters, and so on.

  15. Self-self hybridizations: false color overlay; boxplots within pin-groups; scatter (MA-)plots.

  16. A series of non self-self hybridizations. From the NCI60 data set (Stanford web site).

  17. Early Ngai lab, UC Berkeley

  18. Early Goodman lab, UC Berkeley

  19. From the Ernest Gallo Clinic & Research Center

  20. Early PMCRI, Melbourne Australia

  21. Normalization: methods. a) Normalization based on a global adjustment: log2 R/G -> log2 R/G - c = log2 R/(kG). Choices for k or c = log2 k are: c = median or mean of log ratios for a particular gene set (e.g. housekeeping genes); or total intensity normalization, where k = ∑Ri / ∑Gi. b) Intensity-dependent normalization. Here we run a line through the middle of the MA plot, shifting the M value of the pair (A, M) by c = c(A), i.e. log2 R/G -> log2 R/G - c(A) = log2 R/(k(A)G). One estimate of c(A) is made using the LOWESS function of Cleveland (1979): LOcally WEighted Scatterplot Smoothing.
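
The two global choices of c in (a) can be sketched as follows. The function names and the demo intensities are hypothetical; `gene_set` stands for whichever spot indices (e.g. housekeeping genes) are chosen.

```python
import math
import statistics

def median_shift(m_values, gene_set=None):
    """c = median of log ratios over gene_set (all spots if None)."""
    idx = range(len(m_values)) if gene_set is None else gene_set
    return statistics.median(m_values[i] for i in idx)

def total_intensity_shift(red, green):
    """c = log2 k with k = sum(R_i) / sum(G_i)."""
    return math.log2(sum(red) / sum(green))

def normalize_global(m_values, c):
    """Apply log2 R/G -> log2 R/G - c to every spot."""
    return [m - c for m in m_values]

# Hypothetical intensities with a constant dye bias on most spots.
red = [400.0, 800.0, 1600.0, 3200.0]
green = [100.0, 200.0, 400.0, 1600.0]
m_raw = [math.log2(r / g) for r, g in zip(red, green)]

m_median = normalize_global(m_raw, median_shift(m_raw))
m_total = normalize_global(m_raw, total_intensity_shift(red, green))
print(m_median, m_total)
```

Both variants shift every log ratio by the same constant, so neither can remove a bias that changes with A; that is the motivation for (b).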

  22. Normalization: methods (cont.). c) Within print-tip group normalization. In addition to intensity-dependent variation in log ratios, spatial bias can also be a significant source of systematic error. Most normalization methods do not correct for spatial effects produced by hybridization artifacts or print-tip or plate effects during the construction of the microarrays. It is possible to correct for both print-tip and intensity-dependent bias by performing LOWESS fits to the data within print-tip groups, i.e. log2 R/G -> log2 R/G - ci(A) = log2 R/(ki(A)G), where ci(A) is the LOWESS fit to the MA-plot for the ith grid only.
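
A rough sketch of intensity-dependent normalization within print-tip groups follows. It is a simplification, not Cleveland's full procedure: the smoother uses tricube-weighted local linear fits only and omits the robustness iterations. The names `lowess_fit` and `print_tip_normalize`, the `span` default and the demo data are all assumptions made for illustration.

```python
def lowess_fit(a_values, m_values, span=0.4):
    """Locally weighted linear fit of M on A, evaluated at every spot."""
    n = len(a_values)
    k = max(2, min(n, int(round(span * n))))    # neighbours per local fit
    fitted = []
    for a0 in a_values:
        dists = sorted(abs(a - a0) for a in a_values)
        h = dists[k - 1] or 1e-12               # bandwidth: k-th nearest distance
        sw = swx = swy = swxx = swxy = 0.0
        for a, m in zip(a_values, m_values):
            u = abs(a - a0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3             # tricube weight
            sw += w
            swx += w * a
            swy += w * m
            swxx += w * a * a
            swxy += w * a * m
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:                  # degenerate: use the weighted mean
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * a0)
    return fitted

def print_tip_normalize(a_values, m_values, groups, span=0.4):
    """Subtract c_i(A), the fit for print-tip group i, from each M."""
    normalized = list(m_values)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        fit = lowess_fit([a_values[i] for i in idx],
                         [m_values[i] for i in idx], span)
        for i, c in zip(idx, fit):
            normalized[i] = m_values[i] - c
    return normalized

# Hypothetical data: two print-tip groups with different linear dye biases.
a_demo = [8.0 + 0.5 * i for i in range(10)] * 2
groups_demo = [0] * 10 + [1] * 10
m_demo = ([0.5 * a - 3.0 for a in a_demo[:10]]
          + [-0.25 * a + 4.0 for a in a_demo[10:]])
resid = print_tip_normalize(a_demo, m_demo, groups_demo)
print(max(abs(r) for r in resid))
```

Passing a single group label for every spot reduces this to a global intensity-dependent fit.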

  23. Which spots to use for normalization? The LOWESS lines can be run through many different sets of points, and each strategy has its own implicit set of assumptions justifying its applicability. For example, we can justify the use of a global LOWESS approach by supposing that, when stratified by mRNA abundance, a) only a minority of genes are expected to be differentially expressed, or b) any differential expression is as likely to be up-regulation as down-regulation. Pin-group LOWESS requires stronger assumptions: that one of the above applies within each pin-group. The use of other sets of genes, e.g. control or housekeeping genes, involves similar assumptions.

  24. Use of control spots. Legend: lowess curve; blanks; positive controls (spotted in varying concentrations); negative controls. M = log R/G = log R - log G, A = (log R + log G)/2.

  25. Global scale, global lowess, pin-group lowess; spatial plot after, smooth histograms of M after

  26. MSP titration series (Microarray Sample Pool). Pool the whole library. Control set to aid intensity-dependent normalization. Different concentrations. Spotted evenly spread across the slide.

  27. MSP normalization compared to other methods. Orange: Schadt-Wong rank invariant set. Red line: lowess smooth. Yellow: GAPDH, tubulin. Light blue: MSP pool / titration.

  28. Composite normalization: ci(A) = aA g(A) + (1 - aA) fi(A). Legend: MSP lowess curve; global lowess curve; composite lowess curve (other colours: control spots). Before and after composite normalization.
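
The weighted combination on slide 28 can be sketched as below. The slide does not define the weight aA; the sketch assumes it is the fraction of spots with average intensity below A, and treats g and fi as already-fitted curves passed in as functions. All names and values here are hypothetical.

```python
def fraction_below(a_all):
    """Assumed weight a_A: fraction of spots whose A is below the given value."""
    ordered = sorted(a_all)
    def weight(a):
        return sum(1 for x in ordered if x < a) / len(ordered)
    return weight

def composite_curve(a, g_fit, f_fit, weight):
    """c_i(A) = a_A * g(A) + (1 - a_A) * f_i(A)."""
    w = weight(a)
    return w * g_fit(a) + (1.0 - w) * f_fit(a)

# Hypothetical stand-ins for the two fitted curves.
g_demo = lambda a: 1.0
f_demo = lambda a: 0.0
weight_demo = fraction_below([8.0, 9.0, 10.0, 11.0, 12.0])
c_at_10 = composite_curve(10.0, g_demo, f_demo, weight_demo)
print(c_at_10)
```

Under this assumed weight, g(A) dominates at high intensities and fi(A) at low intensities.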

  29. Comparison of Normalization Schemes (courtesy of Jason Goncalves). No consensus on best segmentation or normalization method. Scheme was applied to assess the common normalization methods. Based on reciprocal labeling experiment data for a series of 140 replicate experiments on two different arrays, each with 19,200 spots.

  30. Design of reciprocal labeling experiment. Replicate experiment in which we assess the same mRNA pools but invert the fluors used. The replicates are independent experiments and are scanned, quantified and normalized as usual.

  31. The following relationship would be observed for reciprocal microarray experiments in which the slides are free of defects and the normalization scheme performed ideally. We can measure using real data sets how well each microarray normalization scheme approaches this ideal.

  32. Deviation metric to assess normalization schemes. We now use the mean array average deviation to compare the normalization methods. Note that this comparison addresses only variance (precision) and not bias (accuracy) aspects of normalization.

  33. ***

  34. Scale normalization: between slides. Boxplots of log ratios from 3 replicate self-self hybridizations. Left panel: before normalization. Middle panel: after within print-tip group normalization. Right panel: after a further between-slide scale normalization.

  35. The “NCI 60” experiments (no bg). Some scale normalization seems desirable.

  36. Scale normalization: another data set. Log-ratios: only small differences in spread apparent. No action required.

  37. One way of taking scale into account. Assumption: all slides have the same spread in M. The true log ratio is mij, where i indexes slides and j indexes spots. The observed value is Mij = ai mij. A robust estimate of ai is MADi = medianj { |Mij - medianj(Mij)| }.
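
The MAD-based scale adjustment can be sketched as follows. One detail is an assumption not stated on the slide: the scale factors are rescaled to have geometric mean 1, so that the overall size of the log ratios is preserved. The function names and demo values are hypothetical, and every slide is assumed to have a non-zero MAD.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation: median_j |M_ij - median_j(M_ij)|."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def scale_normalize(slides):
    """slides: one list of M values per slide. Returns (scaled, factors)."""
    mads = [mad(m) for m in slides]
    # Assumption: constrain the factors a_i to have geometric mean 1.
    geo_mean = math.exp(sum(math.log(x) for x in mads) / len(mads))
    factors = [x / geo_mean for x in mads]
    scaled = [[m / a for m in slide] for slide, a in zip(slides, factors)]
    return scaled, factors

# Hypothetical log ratios: the second slide has twice the spread of the first.
slides_demo = [[-2.0, -1.0, 0.0, 1.0, 2.0], [-4.0, -2.0, 0.0, 2.0, 4.0]]
scaled, factors = scale_normalize(slides_demo)
print(factors, mad(scaled[0]), mad(scaled[1]))
```

The same function applies to print-tip groups within a slide if each inner list holds the M values of one group.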

  38. A slightly harder normalization problem. Global lowess doesn’t do the trick here.

  39. Print-tip-group normalization helps

  40. But not completely. There is still a lot of scatter in the middle in a WT vs KO comparison.

  41. Effects of previous normalization. Panels: before normalization; after print-tip-group normalization.

  42. Within print-tip-group box plots of M after print-tip-group normalization.

  43. Taking scale into account, cont. Assumption: all print-tip-groups have the same spread in M. The true log ratio is mij, where i indexes print-tip-groups and j indexes spots. The observed value is Mij = ai mij. A robust estimate of ai is MADi = medianj { |Mij - medianj(Mij)| }.

  44. Effect of location & scale normalization. Clearly care is needed in making decisions like this one.

  45. A comparison of three MA-plots. Panels: print-tip normalization; print-tip & scale normalization; unnormalized.

  46. The same idea on another data set. Log-ratios by print-tip group, after print-tip location and scale normalization.

  47. Follow-up experiment. On each slide, half the spots (8) are differentially expressed, the other half are not.

  48. Paired-slides: dye-swap. Slide 1: M = log2 (R/G) - c. Slide 2: M’ = log2 (R’/G’) - c’. Combine by subtracting the normalized log-ratios: [ (log2 (R/G) - c) - (log2 (R’/G’) - c’) ] / 2 = [ log2 (R/G) + log2 (G’/R’) ] / 2 = [ log2 ((RG’)/(GR’)) ] / 2, provided c = c’. Assumption: the normalization functions are the same for the two slides.

  49. Checking the assumption. MA plot for slides 1 and 2: it isn’t always like this.

  50. Result of self-normalization: (M - M’)/2 vs. (A + A’)/2.
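
The dye-swap combination can be checked numerically with a small sketch. The function name and the intensities are hypothetical; the demo assumes a true log ratio of 1 and the same dye bias c = c’ = 0.5 on both slides, so the bias cancels without ever being estimated.

```python
import math

def self_normalize(r, g, r_swap, g_swap):
    """Return ((M - M')/2, (A + A')/2) for one spot on a dye-swap pair."""
    m = math.log2(r / g)
    m_swap = math.log2(r_swap / g_swap)
    a = (math.log2(r) + math.log2(g)) / 2
    a_swap = (math.log2(r_swap) + math.log2(g_swap)) / 2
    return (m - m_swap) / 2, (a + a_swap) / 2

# Slide 1: log2(R/G) = 1 + 0.5. Slide 2 (dyes swapped): log2(R'/G') = -1 + 0.5.
m_comb, a_comb = self_normalize(2 ** 11.5, 2 ** 10.0, 2 ** 10.5, 2 ** 11.0)
print(m_comb, a_comb)
```

If the two slides had different biases, half their difference would remain in the combined value, which is why the assumption on slide 48 needs checking.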
