Learn about calculating expression values for gene probes and choosing between MAS 4, MAS 5, RMA, and GC-RMA methods for Affymetrix data pre-processing. Understand background correction, normalization techniques, and the application of MEDIAN POLISH for robust analysis.
Lecture Topic 5 Pre-processing AFFY data
Probe Level Analysis • The Purpose • Calculate an expression value for each probe set (gene) from the 11-25 PM and MM intensities • Critical for later analysis: avoids GIGO (garbage in, garbage out)
Difficulties • Large variability • Few measurements (11-25 at most) • MM is very complex: it is signal plus background • Signal has to be SCALED • Probe-level effects
Different Methods • MAS 4 Affymetrix 1996 • MAS 5 Affymetrix 2002 • Robust Multichip Analysis (RMA) 2002 • GC-RMA 2004
MAS 4 • A: the set of probe pairs selected
Avg Diff • Calculated by taking the difference PM-MM for every probe pair and averaging over the selected probe pairs • Excluded OUTLIER pairs whose PM-MM was more than 3 SD from the mean • Was NOT a robust average • NOT log-transformed • COULD be negative (about 1/3 of the time)
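The Avg Diff calculation can be sketched in a few lines of Python. This is a simplified illustration, not the exact MAS 4 procedure: the function name `avg_diff`, the use of the sample standard deviation, and the way the 3 SD rule is applied (distance from the mean difference) are assumptions made for the example.

```python
import statistics

def avg_diff(pm, mm, n_sd=3.0):
    """Simplified MAS 4-style Avg Diff for one probe set (illustrative only)."""
    # One difference per probe pair, on the raw (not log) scale.
    diffs = [p - m for p, m in zip(pm, mm)]
    center = statistics.mean(diffs)
    spread = statistics.stdev(diffs)  # assumption: sample SD
    # Keep only the pairs within n_sd standard deviations of the mean.
    selected = [d for d in diffs if abs(d - center) <= n_sd * spread]
    # A plain mean of the selected pairs: not robust, and can be negative.
    return statistics.mean(selected)

# Hypothetical intensities for a tiny probe set.
print(avg_diff([120, 150, 130, 90], [100, 100, 100, 100]))
print(avg_diff([50, 60], [100, 100]))  # MM above PM gives a negative value
```

Because the result is an ordinary mean of raw differences, a probe set whose MM intensities exceed its PM intensities yields a negative expression value, which is the weakness the later methods address.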
MAS 5 • Signal = TukeyBiweight{log2(PMj-IMj)} • Discussed this earlier • Requires calculating IM (ideal mismatch) • The adjusted PM-MM differences are log transformed and made robust to outlying observations using the Tukey Biweight.
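A minimal sketch of the Tukey Biweight step, assuming the IM values have already been computed and are passed in as a list. The one-step form below, with tuning constant `c = 5` and a small `eps` to avoid division by zero, is the commonly described variant; the function names and defaults are assumptions for illustration.

```python
import math
import statistics

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that downweights outliers."""
    values = list(values)
    center = statistics.median(values)
    # Median absolute deviation from the median.
    mad = statistics.median([abs(v - center) for v in values])
    weights = []
    for v in values:
        u = (v - center) / (c * mad + eps)
        # Points far from the median (|u| >= 1) get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mas5_log_signal(pm, im):
    """Log2-scale signal for one probe set; assumes PMj > IMj for every pair."""
    return tukey_biweight([math.log2(p - i) for p, i in zip(pm, im)])

# The outlying value 100 receives zero weight.
print(tukey_biweight([1, 2, 3, 4, 100]))
```

Unlike the plain average in MAS 4, a single extreme probe pair has little or no influence on the result.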
Robust Multichip Analysis • Uses ONLY PM and ignores MM • SACRIFICES accuracy but makes major gains in PRECISION • Basic Steps: • 1. Calculate chip background (*BG) and subtract from PM • 2. Carry out intensity-dependent normalization for PM-*BG • Lowess • Quantile Normalization (discussed before) • 3. Log transform the normalized PM-*BG • 4. Carry out a robust multichip analysis of all probes in the set using the Tukey median polish procedure. Signal is the antilog of the result.
RMA- Step 1: Background Correction • Irizarry et al. (2003) • Finds the conditional expectation of the TRUE signal given the observed signal (which is assumed to be the true signal plus noise) • E(si | si+bi) • Here, si is assumed to follow an Exponential distribution with parameter θ • bi is assumed to follow N(μe, σe²) • Estimate μe and σe as the mean and standard deviation of empty spots
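Under these assumptions the conditional expectation has a closed form in terms of the standard normal density and distribution function. The sketch below evaluates it for a single observed intensity. It treats the exponential parameter as a rate (the argument `rate`), which is an assumption, and the function and argument names are illustrative; estimating the parameters from the data is not shown.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    # erfc keeps precision in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def rma_background_correct(observed, noise_mean, noise_sd, rate):
    """E(signal | observed) for exponential signal plus normal noise."""
    a = observed - noise_mean - rate * noise_sd ** 2
    b = noise_sd
    numerator = normal_pdf(a / b) - normal_pdf((observed - a) / b)
    # Equal to Phi(a/b) + Phi((o - a)/b) - 1, written to avoid cancellation.
    denominator = normal_cdf(a / b) - normal_cdf(-(observed - a) / b)
    return a + b * numerator / denominator

# Hypothetical noise parameters: mean 100, SD 10, exponential rate 0.01.
print(rma_background_correct(1000.0, 100.0, 10.0, 0.01))  # bright probe
print(rma_background_correct(50.0, 100.0, 10.0, 0.01))    # below the noise mean
```

For bright probes the correction is essentially a subtraction of the noise mean plus a small shift. For dim probes the result stays positive, so the later log transform is always defined.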
RMA- Normalization • Use the background-corrected intensities B(PM) to carry out normalization • Lowess (for spatial effects) • Quantile Normalization (to allow comparability amongst replicate slides) • Normalized B(PM) are log transformed
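Quantile normalization forces every array to share the same intensity distribution: each value is replaced by the mean, across arrays, of the values holding the same rank. A minimal sketch, assuming each array is a list of equal length and ignoring special handling of ties (the function name is illustrative):

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of intensities per array)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    rank_means = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        # Positions of this array's values in increasing order.
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays with three probes each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every array contains the same set of values, only in a different order, so differences between arrays reflect rank changes rather than overall intensity shifts.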
RMA summarization • Use MEDIAN POLISH to fit a linear model • Given a MATRIX of data: • Data = overall effect + row effects + column effects + residual • Find row and column effects by subtracting row medians and column medians alternately until all the medians are smaller in absolute value than some epsilon • Gives estimated row, column and overall effects when done
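The alternating sweeps can be sketched as follows. This is an illustrative implementation of Tukey's median polish, not the code used by any particular RMA package; the function name, the `eps` and `max_iter` defaults, and the stopping rule are assumptions.

```python
import statistics

def median_polish(matrix, eps=1e-6, max_iter=50):
    """Fit data = overall + row effect + column effect + residual."""
    resid = [list(row) for row in matrix]  # working copy, ends as residuals
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row's median into the row effect.
        row_med = [statistics.median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = statistics.median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Column sweep: move each column's median into the column effect.
        col_med = [
            statistics.median([resid[i][j] for i in range(n_rows)])
            for j in range(n_cols)
        ]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = statistics.median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once every median removed in this pass is negligible.
        if max(abs(m) for m in row_med + col_med) < eps:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical additive data: 3 rows, 2 columns.
overall, row_eff, col_eff, resid = median_polish([[7, 9], [8, 10], [9, 11]])
print(overall, row_eff, col_eff)
```

Because medians are used instead of means, a few aberrant cells end up in the residuals rather than distorting the row and column effects.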
Median Polish of RMA • For each probe set we have a matrix (probes in rows and arrays in columns) • We assume: • Signal = probe affinity effect + log-scale expression level + error • Also assume the sum of the probe affinities is 0 • Use MEDIAN polish to estimate the expression level in each array
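Written out, the model for one probe set looks like this. The symbols are assumptions chosen for illustration: i indexes probes (rows), j indexes arrays (columns), y is the log2 of the normalized, background-corrected PM intensity, α is the probe affinity effect and μ is the log-scale expression level.

```latex
y_{ij} = \mu_j + \alpha_i + \varepsilon_{ij}, \qquad \sum_{i} \alpha_i = 0

\hat{\mu}_j = \hat{m} + \hat{c}_j
```

Here \hat{m} is the overall effect and \hat{c}_j the column effect returned by median polish, so the expression estimate for array j is their sum. Median polish centers the row effects at median zero rather than sum zero, so the zero-sum constraint on the probe affinities holds only approximately in the fitted values.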