
Low-Level Analysis and QC



Presentation Transcript


  1. Low-Level Analysis and QC: Regional Biases • Mark Reimers, NCI

  2. Outline • Regional biases on spotted arrays • Relation to background • Measures of bias • Affy technical variation measures • Dynamic range • RNA degradation • Regional biases on Affymetrix arrays • Using bias.display and affyPLM for QC

  3. The Quality Issue • Frequent outliers in experiments • Lack of agreement between labs • The hybridization process is complex and cannot be observed directly • Many factors cannot be optimized for all reactions • Statistical QC tools attempt to make visible subtle but pervasive effects

  4. What are Regional Biases? • Regions where all genes give consistently higher readings in one dye than in other regions, or than in the same region on other slides • Most spots in images are relatively dark • Region may not appear brighter in one dye or the other • Biases not obvious by image inspection • Balázsi et al (2003), Qian et al (2003) identified high correlation between nearby probes in the Spellman cell-cycle data and in other data sets • Workman et al (2002), Colantuoni et al (SNOMAD, 2003) identified regional biases in cDNA arrays by fitting loess surfaces to ratios across each slide

  5. Visualizing Bias by Ratios • Display ratios for each spot at constant brightness: easier to see biases • Some slides show bias toward one color in some areas

  6. A Common Standard • Expression ratios vary from spot to spot • Harder to see patterns • A series of experiments on a single tissue often uses a common reference • Construct average ratios (tissue typical ratios?) • More informative image: spot ratios compared to the typical ratio for that spot across all slides

  7. Common Reference Highlights Difference • Red/Green ratios show variation • Ratios of ratios on slide to ratios on standard show less variation

  8. Visualizing Bias using Standard • Ratio of ratios shows much clearer concentration of red spots on some slides • Note non-random but highly irregular concentration of red

  9. Bias and Background • We observe that local background contributes to bias • Does subtracting background remove bias? • Local off-spot background may not be the best estimate of spot background (non-specific hybridization) • Figure panels: Spots; BG subtracted

  10. Bias and Background (2) • Raw spot ratios show a mild bias relative to average • After subtracting a high green background in the center, a red bias results

  11. Other Bias Patterns • Figure panels: Processed; Raw Spot; Background • This spotted oligo array shows strong biases at the beginning and end of each print-tip group • The background shows a milder version of this effect • Subtracting background removes some regional biases while adding bias in other regions

  12. How to Measure Regional Biases? • Correlation between neighboring probes • $r = \mathrm{Cor}\left( r_{i,j},\ \left( r_{i-1,j} + r_{i+1,j} + r_{i,j-1} + r_{i,j+1} \right)/4 \right)$, where $r_{i,j}$ is the log ratio relative to the standard at row $i$, column $j$ • Red-green ratios: r ~ 0.05-0.1 • Ratio to average: r ~ 0.1-0.3 • For some slides r > 0.5
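
The neighbor-correlation measure on slide 12 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the author's code: the function names `pearson` and `neighbor_correlation` are hypothetical, the log ratios are assumed to sit in a complete rectangular grid with no missing spots, and edge spots (which lack one or more neighbors) are simply skipped.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def neighbor_correlation(grid):
    """Correlate each interior spot with the mean of its 4 nearest neighbors.

    grid: list of rows, each a list of log ratios relative to the standard.
    Assumption: no missing values; border spots are excluded.
    """
    centers, neighbor_means = [], []
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            centers.append(grid[i][j])
            neighbor_means.append(
                (grid[i - 1][j] + grid[i + 1][j]
                 + grid[i][j - 1] + grid[i][j + 1]) / 4
            )
    return pearson(centers, neighbor_means)
```

A slide with a smooth gradient in its log ratios gives a value near 1, while spatially unstructured ratios should give a value near 0, which is the contrast the slide relies on.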

  13. Regional Bias Affects Analysis • A major source of false positives for single slides • In some slides half the apparently most up-regulated genes come from 10% of slide area • In replicated experimental samples, regional bias results in increased variance - false negatives • In clinical samples, regional bias results in serious distortion of exploratory procedures such as clustering

  14. Visualizing Other QC Measures • A heat plot of signal/SD ratios shows clearly that some slides and regions are better than others • One persistently bad region in a batch was printed poorly • Low S/N implies less reliable ratios • Figure label: S/N ratio

  15. Prospects for Normalization • Try to fit smooth (loess) surface to ratios to estimate bias. • Workman (2002) finds modest (20%) improvements in replicates’ variance • Colantuoni (2003) finds moderate improvements • Qian et al (2004) find that SNOMAD does not remove a majority of correlation between neighboring probes

  16. Prospects for Normalization (2) • Are ratios described well by smooth gradient? • Irregular regions are common • Short-range effects • Poor prospects for normalization by smoothing

  17. Regional Bias on Affy Chips

  18. Current Quality Measures • RNA quality • Gel or BioAnalyzer • Affymetrix Microarray Suite: • 3’/5’ ratios • Process of reverse transcription • Scaling factor • Labeling efficiency (and total RNA) • Per cent present calls • PM/MM ratios • Specificity of hybridization • Varies with stringency of wash solution

  19. Types of Problems Undetected • Local Artifacts - scratches, smudges • Regional Bias - large regions shifted • Hybridization differences causing differences in dynamic range • Small differences in RNA degradation

  20. Three Variables • RNA Quality • RNA degrades rapidly in intact samples • cRNA production may be variable • Hybridization conditions • Temperature, salinity • Defects or uneven conditions on chip • Bubbles spend more time in some places • Leading to regional biases

  21. RNA Degradation Plot • MAS5.0 displays 3’/5’ ratios for selected genes • Degradation plot displays relative signal at each probe position, ordered from the 5’ to the 3’ end of the transcript • AffyRNAdeg function in the affy package of Bioconductor • Home-crafted plotting function

  22. Amplified RNA Deg. Plot • Doubly amplified cRNA • Fairly even • No great discrepancies

  23. Hybridization Conditions • Variation in thermodynamics of hybridization affects • Background • Ratios of PM to MM • Specificity of hybridization • Distribution of signals from probes • Each of these can be investigated

  24. Visualizing Probe Distribution • Either as signal distribution (log scale works best) or as ratios • Ratios: • Construct reference standard: average each probe over all chips (20% trimmed mean) • Log scale works best • Subtract log standard from log probe signals

  25. Effects of Distribution Changes • Figure panels: Distribution of Probe Ratios; MDS Plot of Chips • Chips 90122, 90123, 90124, 97444 are replicates

  26. Local Artifacts, Regional Bias • Workman et al (2003) identified artifacts by displaying raw data image on log2 scale • Not many scars visible - are the chips that good? • Running means of (log2) intensity show little bias • Dynamic range - neighboring probes vary 10X to 100X • No obvious reference • Need to compensate for large dynamic range

  27. Visualizing Artifacts by Ratio • Construct a standard (virtual) chip: • Trimmed (20%) mean of each probe across all chips • Roughly estimates ‘typical’ level • Robust: genes highly expressed in few samples don’t affect • Compute ratio of each probe on any chip to corresponding probe on standard chip
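
As a rough sketch of the standard-chip construction on slides 24 and 27, the following plain-Python code builds a per-probe trimmed mean and subtracts it from one chip. Several things here are assumptions rather than statements from the slides: the helper names are hypothetical, the averaging is done on log2 intensities, and "20% trimmed" is read as dropping 20% of the values from each end (the convention used by R's `mean(x, trim = 0.2)`).

```python
def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values.

    Assumption: `trim` is the fraction removed from EACH end.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def standard_chip(chips, trim=0.2):
    """Per-probe trimmed mean across chips (the 'virtual' standard chip).

    chips: list of chips, each an equal-length list of log2 intensities.
    """
    n_probes = len(chips[0])
    return [
        trimmed_mean([chip[p] for chip in chips], trim)
        for p in range(n_probes)
    ]


def log_ratios(chip, standard):
    """Log ratio of each probe to the standard (difference of logs)."""
    return [x - s for x, s in zip(chip, standard)]
```

Because the extremes are trimmed before averaging, a probe that is very high on only one or two chips barely moves the standard, which is the robustness property the slide mentions.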

  28. Visualizing Artifacts, Bias • Image of raw data on a log2 scale shows striations but no obvious artifacts • Image of ratios of probes to standard shows a smudge • Figure label: Non-coding probes

  29. Background and Scale • For each region: fit regression lines to probes on this chip vs corresponding probes on standard • Intercept and slope may be interpreted as local minimum intensity (background) and sensitivity (scale factor) • Figure annotations: y=x; Slope ~ 1.4; Background ~ +10
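
The per-region fit on slide 29 might look like the sketch below. It assumes ordinary least squares (the slide does not say which regression was used), square blocks of a hypothetical size `block`, and intensities stored as 2D lists of the same shape for the chip and the standard; `fit_line` and `regional_fits` are illustrative names, not functions from any package.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regional_fits(chip, standard, block):
    """Fit chip-vs-standard lines within each block x block region.

    Returns {(block_row, block_col): (intercept, slope)}; the intercept is
    read as local background and the slope as local scale factor.
    """
    fits = {}
    n_rows, n_cols = len(chip), len(chip[0])
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            xs, ys = [], []
            for i in range(r0, min(r0 + block, n_rows)):
                for j in range(c0, min(c0 + block, n_cols)):
                    xs.append(standard[i][j])
                    ys.append(chip[i][j])
            fits[(r0 // block, c0 // block)] = fit_line(xs, ys)
    return fits
```

Plotting the intercepts and the slopes as two separate block-level images gives one picture of background and one of scale across the chip.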

  30. Visualizing Bias as BG and Scale

  31. A Good Chip • Probe ratio image shows small (<5%) elevated region • Background plot shows this artifact mostly in background

  32. An Acceptable Chip Less than 10% of chip area affected in both background and scale

  33. A Bad Chip Half of this chip shows strong biases in background

  34. Quantifying Bias • Compute correlation over the chip between probe log-intensities and the averages of the 4 nearest neighbors • Typical ‘good’ Affy chip has correlation of ratios ~0.2 • Some chips have correlations near 0.8 • Horizontal correlation > vertical correlation
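
One way to compare horizontal and vertical correlation, as the last bullet of slide 34 does, is to correlate each probe with its immediate right-hand neighbor and, separately, with the neighbor directly below it. The slide does not say how the two directions were computed, so this lag-one definition is an assumption, and `directional_correlations` is a hypothetical name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def directional_correlations(grid):
    """Return (horizontal, vertical) lag-one correlations.

    grid: list of rows of log ratios to the standard chip.
    Horizontal pairs each probe with the probe one column to its right;
    vertical pairs each probe with the probe one row below it.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    left = [grid[i][j] for i in range(n_rows) for j in range(n_cols - 1)]
    right = [grid[i][j + 1] for i in range(n_rows) for j in range(n_cols - 1)]
    top = [grid[i][j] for i in range(n_rows - 1) for j in range(n_cols)]
    bottom = [grid[i + 1][j] for i in range(n_rows - 1) for j in range(n_cols)]
    return pearson(left, right), pearson(top, bottom)
```

A chip with row-wise striations would show a high horizontal value and a low or negative vertical value under this definition.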

  35. Does Bias Affect Measures? • Affymetrix distributes the probes of each probe set across the chip: does that make the measures robust? • Experiment: distort a chip in software by raising 10,000 probes 2X • MAS5: 4% of genes distorted by > 0.2 (log2 scale) • RMA: 0.2% of genes distorted by > 0.2 (log2 scale) • Figure panels: RMA; MAS5

  36. Bias Affects Measures - II Experiment: 50% of probes raised 2X

  37. Consequences for Analysis • A study with 41 chips founders on quality • Six groups, color coded in plot at right • Several chips seem very atypical for their groups

  38. QC by affyPLM • Robust Multi-array Average (RMA) fits a linear model to each probe set • High residuals show regional patterns • Mean residuals are a global indicator of quality • Available in the affyPLM package at www.bioconductor.org • Figure note: high residuals in green

  39. Current Affy Pipeline • Construct standard chip • if few samples, add samples of similar tissues • Compute ratios of probes to standard • Compute correlations of ratios • Examine images • Decide to accept/reject
