Filtering and Normalization of Microarray Gene Expression Data Waclaw Kusnierczyk Norwegian University of Science and Technology Trondheim, Norway
Outline • Filtering: spots • removal of spots based on quality measures • Normalization • compensation for measurement errors • Examples of common problems
Useful plots • Channel-channel plot (CC) • Intensity-ratio plot (AM or IR)
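The two axes of the intensity-ratio plot can be computed directly from the channel intensities. The sketch below uses the usual convention M = log2(R/G) for the ratio and A = 0.5 * log2(R*G) for the intensity; the function name `ma_values` and the example intensities are illustrative assumptions, not part of the slides.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired channel intensities.

    M = log2(R/G) is the log ratio, A = 0.5 * log2(R*G) the mean log intensity.
    Spots with a non-positive intensity in either channel are skipped.
    """
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            continue
        m_vals.append(math.log2(r / g))
        a_vals.append(0.5 * math.log2(r * g))
    return m_vals, a_vals

# Hypothetical intensities for three spots
m, a = ma_values([400.0, 100.0, 50.0], [100.0, 100.0, 200.0])
```

Plotting `m` against `a` gives the AM plot; plotting the log intensities of one channel against the other gives the CC plot.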
Filtering: Spots • Criteria used to remove spots • spot area [pixels] • signal/noise ratio (spot intensity vs. background intensity) • other quality measures (e.g. based on quality scores from image analysis software) • morphological criteria • pixel-level variability
Filtering: Spots • Spot area
Filtering: Spots • Spot area based filtering • keep spots with area > threshold in both channels • problem: setting the appropriate threshold • dependent on the definition of the spot (image analysis software), and the distribution of the spot area • typical value: 10 pixels
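A minimal sketch of the area criterion, assuming each spot is a dictionary with hypothetical keys `area_ch1` and `area_ch2` holding the pixel counts reported by the image analysis software. The default of 10 pixels is the typical value quoted above and should be tuned to the spot definition in use.

```python
def filter_by_area(spots, min_area=10):
    """Keep spots whose area exceeds min_area (in pixels) in both channels."""
    return [s for s in spots
            if s["area_ch1"] > min_area and s["area_ch2"] > min_area]

# Hypothetical spots: only the first exceeds 10 pixels in both channels
spots = [
    {"id": "s1", "area_ch1": 52, "area_ch2": 48},
    {"id": "s2", "area_ch1": 9, "area_ch2": 40},
    {"id": "s3", "area_ch1": 30, "area_ch2": 10},
]
kept = filter_by_area(spots)
```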
Filtering: Spots • Signal and background
Filtering: Spots • Signal/noise based filtering • keep spots with signal / background > threshold in both channels • problem: setting the appropriate threshold • dependent on the spot and background definition (image analysis software) • typical value: sgn/bkg > 2 (or, equivalently, sgn - bkg > bkg)
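The ratio criterion can be sketched as follows. The key names (`sgn_ch1`, `bkg_ch1`, and so on) are hypothetical, and the comparison is written as sgn > threshold * bkg, which matches sgn/bkg > threshold for positive backgrounds.

```python
def passes_signal_to_background(spot, min_ratio=2.0):
    """True if signal / background > min_ratio in both channels."""
    for ch in ("ch1", "ch2"):
        sgn, bkg = spot["sgn_" + ch], spot["bkg_" + ch]
        # Comparing sgn with min_ratio * bkg avoids dividing by a zero background
        if not sgn > min_ratio * bkg:
            return False
    return True

# Hypothetical spots
strong = {"sgn_ch1": 900.0, "bkg_ch1": 100.0, "sgn_ch2": 450.0, "bkg_ch2": 120.0}
weak = {"sgn_ch1": 900.0, "bkg_ch1": 100.0, "sgn_ch2": 200.0, "bkg_ch2": 120.0}
```

Here `strong` passes in both channels, while `weak` fails because its second channel has a ratio below 2.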
Filtering: Spots • Signal/noise based filtering (alternative) • flag spots if $S_{ij} < B_{ij} + \theta\,\sigma_{B_{ij}}$, where: • $S_{ij}$: $i$th spot intensity in $j$th channel (not corrected) • $B_{ij}$: $i$th spot background in $j$th channel • $\sigma_{B_{ij}}$: $i$th spot background deviation in $j$th channel • $\theta$: user-defined threshold
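The flagging rule above can be sketched as below. Two choices here are assumptions rather than something the slides state: a spot is flagged as soon as the inequality holds in any one channel, and the default value of `theta` is arbitrary.

```python
def flag_weak_spots(signal, background, background_sd, theta=2.0):
    """Return one boolean per spot.

    Inputs are lists of per-spot rows, one value per channel. A spot is
    flagged if S_ij < B_ij + theta * sigma_Bij holds in at least one channel
    (the any-channel rule is an assumption of this sketch).
    """
    flags = []
    for s_row, b_row, sd_row in zip(signal, background, background_sd):
        flags.append(any(s < b + theta * sd
                         for s, b, sd in zip(s_row, b_row, sd_row)))
    return flags

# Hypothetical two-channel data for two spots
signal = [[500.0, 450.0], [120.0, 400.0]]
background = [[100.0, 100.0], [100.0, 100.0]]
background_sd = [[20.0, 20.0], [20.0, 20.0]]
flags = flag_weak_spots(signal, background, background_sd)
```

With these numbers the cutoff is 140 in each channel, so only the second spot (intensity 120 in the first channel) is flagged.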
Filtering: Spots • Other criteria • Intensity threshold on background corrected intensity (for each channel separately) • Spot quality measures (pixelwise distributional properties of spot and background intensities, manual morphology-based spot flagging etc.) • Replicate-based spot filtering (adaptive threshold selection based on a repeatability coefficient, coefficient of variation etc.)
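As an illustration of replicate-based filtering, the sketch below computes the coefficient of variation (sample standard deviation divided by the mean) of replicate spot intensities and keeps genes below a cutoff. It uses a fixed cutoff `max_cv`, which is a simplification: the adaptive threshold selection mentioned above is not implemented, and the names and values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.mean(values)

def filter_by_replicate_cv(replicates, max_cv=0.3):
    """Keep genes with at least two replicates and CV <= max_cv."""
    return {gene: vals for gene, vals in replicates.items()
            if len(vals) >= 2 and coefficient_of_variation(vals) <= max_cv}

# Hypothetical replicate intensities per gene
replicates = {
    "geneA": [100.0, 110.0, 90.0],   # consistent replicates
    "geneB": [100.0, 300.0, 20.0],   # poorly reproducible replicates
}
kept = filter_by_replicate_cv(replicates)
```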
Filtering: Spots • Total intensity (log2) threshold
Filtering: Spots • Morphology based filtering
Normalization • Analysis of systematic errors • adjustment for bias coming from variation in the technology rather than from biology • Different sources of non-linearity • Print-tip differences • Efficiency of dye incorporation (labelling) • Non-uniformity in hybridisation • Scanning • Between slide variation (print quality, ambient conditions)
Normalization • Selection of elements • Housekeeping genes, spike controls, tip-dependence, raw data, between array normalization • Method • Constant subtraction (shift) (mean/median log2 ratio, iterative c estimation, ANOVA) • Locally weighted mean (intensity or location dependent) • Other recently proposed methods
Normalization (example 1) • Intensity independent normalization with median ratio subtraction
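Median ratio subtraction shifts every log2 ratio by the same constant, so that the median log ratio on the array becomes zero. A minimal sketch, with hypothetical log ratios:

```python
import statistics

def median_normalize(log_ratios):
    """Subtract the median log2 ratio from every spot (constant shift)."""
    shift = statistics.median(log_ratios)
    return [m - shift for m in log_ratios]

# Hypothetical log2 ratios with a systematic bias towards one channel
m = [1.0, 0.5, 2.0, 1.5, 1.0]
normalized = median_normalize(m)
```

Because the shift does not depend on intensity, any curvature in the AM plot is left untouched; that is what the intensity dependent methods address.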
Normalization (example 1) • Intensity dependent normalization with locally weighted mean, global
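A global intensity dependent normalization can be sketched as follows: for each spot, estimate the local bias as a weighted mean of the log ratios M of spots with similar intensity A, then subtract it. This sketch uses a tricube-weighted mean over the nearest fraction `span` of spots, which is one simple form of locally weighted mean; it is not a full lowess regression, the `span` default is an assumption, and the quadratic running time is only suitable for illustration.

```python
def local_weighted_mean(x, y, span=0.4):
    """Tricube-weighted mean of y around each x, over the nearest span*n points."""
    n = len(x)
    k = min(n, max(2, round(span * n)))
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        num = den = 0.0
        for xj, yj in zip(x, y):
            u = abs(xj - xi) / h
            if u < 1.0:
                w = (1.0 - u ** 3) ** 3
                num += w * yj
                den += w
        fitted.append(num / den)
    return fitted

def normalize_intensity_dependent(a, m, span=0.4):
    """Subtract the intensity dependent local mean of M from each M value."""
    return [mi - fi for mi, fi in zip(m, local_weighted_mean(a, m, span))]

# Hypothetical data: the bias in M grows with intensity A
a = [float(i) for i in range(21)]
m = [0.1 * ai for ai in a]
normalized = normalize_intensity_dependent(a, m)
```

Away from the ends of the intensity range the trend is removed almost completely; near the ends a local mean is biased because the neighbours lie on one side only, which is one reason local regression is often preferred in practice.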
Normalization (example 1) • Intensity dependent normalization with locally weighted mean, print-tip dependent
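The print-tip dependent variant applies the same intensity dependent correction separately to the spots printed by each tip. The sketch below assumes a list `tips` giving a print-tip identifier per spot (a hypothetical input) and reuses a tricube-weighted local mean as a stand-in for the smoother.

```python
def local_weighted_mean(x, y, span=0.4):
    """Tricube-weighted mean of y around each x, over the nearest span*n points."""
    n = len(x)
    k = min(n, max(2, round(span * n)))
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        num = den = 0.0
        for xj, yj in zip(x, y):
            u = abs(xj - xi) / h
            if u < 1.0:
                w = (1.0 - u ** 3) ** 3
                num += w * yj
                den += w
        fitted.append(num / den)
    return fitted

def normalize_by_print_tip(a, m, tips, span=0.4):
    """Fit and subtract the local mean of M within each print-tip group."""
    groups = {}
    for idx, tip in enumerate(tips):
        groups.setdefault(tip, []).append(idx)
    normalized = [0.0] * len(m)
    for idxs in groups.values():
        fit = local_weighted_mean([a[i] for i in idxs], [m[i] for i in idxs], span)
        for i, f in zip(idxs, fit):
            normalized[i] = m[i] - f
    return normalized

# Hypothetical data: two tips with opposite constant biases
a = [6.0, 7.0, 8.0, 9.0, 6.0, 7.0, 8.0, 9.0]
m = [0.5, 0.5, 0.5, 0.5, -0.3, -0.3, -0.3, -0.3]
tips = [1, 1, 1, 1, 2, 2, 2, 2]
normalized = normalize_by_print_tip(a, m, tips)
```

A single global fit would average the two tips together and leave a residual bias in each group; fitting per tip removes both.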
Normalization (example 1) • Intensity dependent normalization with locally weighted mean, global vs. print-tip dependent
Normalization (example 2) • Intensity dependent normalization with locally weighted mean, print-tip dependent
Normalization • Location dependent normalization with locally weighted mean (from SNOMAD web page)
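Location dependent normalization replaces intensity with the spot's position on the slide as the smoothing variable. The sketch below is a generic illustration and makes no claim to reproduce SNOMAD's procedure: it subtracts a tricube-weighted mean of the log ratios of all spots within a fixed `radius` (in grid units, an assumed parameter) of each spot.

```python
import math

def spatial_local_mean(rows, cols, values, radius=2.0):
    """Tricube-weighted mean of values over spots closer than radius."""
    fitted = []
    for ri, ci in zip(rows, cols):
        num = den = 0.0
        for rj, cj, vj in zip(rows, cols, values):
            u = math.hypot(rj - ri, cj - ci) / radius
            if u < 1.0:
                w = (1.0 - u ** 3) ** 3
                num += w * vj
                den += w
        fitted.append(num / den)
    return fitted

def normalize_location_dependent(rows, cols, m, radius=2.0):
    """Subtract the spatially smoothed log ratio from each spot."""
    fit = spatial_local_mean(rows, cols, m, radius)
    return [mi - fi for mi, fi in zip(m, fit)]

# Hypothetical 5 x 5 grid with a left-to-right gradient in M
rows = [r for r in range(5) for _ in range(5)]
cols = [c for _ in range(5) for c in range(5)]
m = [0.2 * c for c in cols]
normalized = normalize_location_dependent(rows, cols, m)
```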
Acknowledgments • Mette Langaas, Department of Mathematical Sciences, Norwegian University of Science and Technology • Astrid Lægreid, Kristin Nørsett, Department of Physiology and Biomedical Engineering, Norwegian University of Science and Technology • Per Kristian Lehre, Department of Computer and Information Science, Norwegian University of Science and Technology