Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression
Tom Kepler, Santa Fe Institute
kepler@santafe.edu

[Figure: rat mesothelioma cells, control; rat mesothelioma cells treated with KBrO2]

Normalization
Method to be improved:
• Assume that some genes will not change under the treatment under investigation.
• Identify these core genes in advance of the experiment.
• Normalize all genes against these core genes, assuming they do not change.

Normalization
New method:
1. Assume that some genes will not change under the treatment under investigation.
2. Choose these core genes arbitrarily.
3. Normalize (provisionally) all genes against these core genes, assuming they do not change.
4. Determine which genes do not change under this normalization.
5. Make this set the new core. If this core differs from the previous core, go to step 3; otherwise, done.
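The loop in the new method can be sketched for the simplest case of one control array and one treated array. This is a minimal illustration, not the talk's implementation: the function name `normalize_by_core`, the use of the mean log ratio as the normalization, and the `threshold` rule for "does not change" are all assumptions made for the example.

```python
from statistics import fmean


def normalize_by_core(y_control, y_treated, threshold=1.0, max_iter=100):
    """Iterate the core-gene normalization until the core stops changing.

    y_control, y_treated: log intensities, one value per gene.
    threshold: hypothetical cutoff on |normalized log ratio| below which a
               gene is judged unchanged (an assumption of this sketch).
    Returns (offset, normalized log ratios, set of core gene indices).
    """
    log_ratio = [t - c for c, t in zip(y_control, y_treated)]
    core = set(range(len(log_ratio)))  # step 2: arbitrary start (all genes)
    for _ in range(max_iter):
        # Step 3: provisional normalization against the current core.
        offset = fmean([log_ratio[k] for k in core])
        normalized = [r - offset for r in log_ratio]
        # Step 4: genes that do not change under this normalization.
        new_core = {k for k, r in enumerate(normalized) if abs(r) <= threshold}
        if not new_core:
            raise ValueError("no gene judged unchanged; threshold too small")
        # Step 5: stop once the core reproduces itself.
        if new_core == core:
            return offset, normalized, core
        core = new_core
    raise RuntimeError("core did not stabilize within max_iter iterations")


# Ten genes, a constant array offset of 0.3, and two genes truly changed.
control = [float(v) for v in range(1, 11)]
treated = [c + 0.3 for c in control]
treated[8] += 2.0
treated[9] += 2.0
offset, normalized, core = normalize_by_core(control, treated)
print(round(offset, 3), sorted(core))
```

Starting from all genes as the core, the first pass overestimates the offset because the two changed genes pull the mean up; once they drop out of the core, the offset settles on the value shared by the unchanged genes.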

Error Model
$$I_{ijk} = c_{ij}\,[\mathrm{mRNA}]_{ik}\,\eta_{ijk}$$
I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant
η = lognormal multiplicative error
index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)

$$Y_{ijk} = \mu_k + d_{ik} + a_{ij} + \varepsilon_{ijk}$$
Y = log spot intensity
μ = mean log concentration of specific mRNA
d = treatment effect (conc. specific mRNA)
a = normalization constant
ε = normal additive error (the log of the lognormal multiplicative error)
index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)

Model, with identifiability constraints; estimate by ordinary least squares.

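Model, with identifiability constraints. But note: the fit cannot distinguish between a and d. A shift common to all genes in a treatment group can be attributed either to the normalization a or to the treatment effect d.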
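The least-squares fit and the confounding of a and d can be illustrated with a small sketch. The constraints on the slide did not survive in this transcript, so the ones used here are assumptions: a balanced design, the a values summing to zero over all arrays, and the d values summing to zero over treatments for each gene and over genes for each treatment. The function name `ols_fit` and the toy numbers are hypothetical.

```python
from statistics import fmean


def ols_fit(Y):
    """Closed-form least-squares fit of Y[i][j][k] = mu[k] + d[i][k] + a[i][j].

    Assumes a balanced design and sum-to-zero constraints on a and d
    (assumptions of this sketch, see the lead-in).
    """
    I, J, K = len(Y), len(Y[0]), len(Y[0][0])
    array_mean = [[fmean(Y[i][j]) for j in range(J)] for i in range(I)]
    trt_mean = [fmean(array_mean[i]) for i in range(I)]
    grand = fmean(trt_mean)
    gene_mean = [fmean([Y[i][j][k] for i in range(I) for j in range(J)])
                 for k in range(K)]
    cell_mean = [[fmean([Y[i][j][k] for j in range(J)]) for k in range(K)]
                 for i in range(I)]
    a = [[array_mean[i][j] - grand for j in range(J)] for i in range(I)]
    d = [[cell_mean[i][k] - trt_mean[i] - gene_mean[k] + grand
          for k in range(K)] for i in range(I)]
    return gene_mean, d, a


# Noise-free toy data: 2 treatments, 2 replicates, 4 genes.
mu_true = [1.0, 2.0, 3.0, 4.0]
d_true = [[0.5, -0.5, 0.0, 0.0], [-0.5, 0.5, 0.0, 0.0]]
a_true = [[0.2, -0.2], [0.1, -0.1]]
Y = [[[mu_true[k] + d_true[i][k] + a_true[i][j] for k in range(4)]
      for j in range(2)] for i in range(2)]
mu, d, a = ols_fit(Y)

# Add 1.0 to every gene in treatment 1: a real, global treatment effect.
Y_shift = [[[Y[i][j][k] + (1.0 if i == 1 else 0.0) for k in range(4)]
            for j in range(2)] for i in range(2)]
mu_s, d_s, a_s = ols_fit(Y_shift)
print(d_s[1], a_s[1])
```

In the shifted data the estimated d values are unchanged and the whole shift lands in a, which is the confounding the slide warns about.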

Self-consistency: the weight $w_k(d)$ is small if the kth gene is judged to be changed, and close to one if it is judged to be unchanged. The procedure is iterative.
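A soft-weight version of the iteration might look like the sketch below. The actual form of the weight function is not given in this transcript; the Gaussian-shaped weight, the `scale` parameter, and the function name `weighted_offset` are assumptions chosen only to show the mechanics for a single pair of arrays.

```python
import math


def weighted_offset(log_ratio, scale=0.5, tol=1e-10, max_iter=200):
    """Iteratively reweighted estimate of a normalization offset.

    Each gene's weight is near one when its normalized log ratio d is
    near zero and near zero when |d| is large. The Gaussian form and the
    scale are assumptions of this sketch.
    """
    offset = sum(log_ratio) / len(log_ratio)  # start with all weights equal
    for _ in range(max_iter):
        d = [r - offset for r in log_ratio]
        w = [math.exp(-0.5 * (dk / scale) ** 2) for dk in d]
        total = sum(w)
        if total == 0.0:
            raise ValueError("all weights underflowed; increase scale")
        new_offset = sum(wk * r for wk, r in zip(w, log_ratio)) / total
        if abs(new_offset - offset) < tol:
            return new_offset, w
        offset = new_offset
    raise RuntimeError("weights did not converge within max_iter iterations")


# Eight unchanged genes (log ratio 0.3) and two changed genes (2.3).
ratios = [0.3] * 8 + [2.3] * 2
offset, weights = weighted_offset(ratios)
print(round(offset, 3), [round(w, 3) for w in weights])
```

Compared with a hard in-or-out core, the soft weights let a borderline gene contribute partially instead of flipping between iterations.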

Failure of Model

Generalized Model
The normalization $a_{ij}(x_k)$ and the heteroscedasticity function $g_{ij}(x_k)$ are slowly varying functions of the intensity $x$.
Estimate by local regression.

Local Regression: data

Predict value at x = 50: weight the nearby data, then fit a linear regression

Predict whole function similarly

Compare to known true function
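The local regression steps on these slides (weight the data near a target x, fit a line, repeat across the range, compare with the truth) can be sketched as follows. The tricube kernel, the `bandwidth` value, the sine test function, and the name `local_linear` are assumptions for illustration; the slides do not say which kernel or span was used.

```python
import math
import random


def local_linear(xs, ys, x0, bandwidth=20.0):
    """Predict y at x0 from a weighted straight-line fit to nearby points.

    Uses tricube weights on |x - x0| / bandwidth (an assumed kernel).
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / bandwidth
        if u >= 1.0:
            continue  # outside the window: weight zero
        w = (1.0 - u ** 3) ** 3
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx * swx
    if denom == 0.0:
        raise ValueError("not enough distinct points near x0")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def true_f(x):
    """Hypothetical smooth function used as the known truth."""
    return 10.0 * math.sin(x / 15.0)


random.seed(0)
xs = [float(x) for x in range(101)]
ys = [true_f(x) + random.gauss(0.0, 1.0) for x in xs]

print(local_linear(xs, ys, 50.0), true_f(50.0))  # single prediction at x = 50
fitted = [local_linear(xs, ys, x) for x in xs]   # whole function
print(max(abs(f - true_f(x)) for f, x in zip(fitted, xs)))
```

A wider bandwidth gives a smoother but more biased curve; a narrower one follows the data more closely at the cost of more noise.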

Simulation-based Validation 1. Reproduce observed bias.

Simulation-based Validation 2. Reproduce observed heteroscedasticity.

Test based on z statistic:
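The statistic itself did not survive in this transcript. As a rough sketch, a z test for one gene could divide an estimated treatment effect by its standard error and compare against the standard normal; that form, the two-sided p-value, and the names `z_test`, `d_hat` and `std_err` are assumptions here, not the talk's definition.

```python
from statistics import NormalDist


def z_test(d_hat, std_err, alpha=0.001):
    """Two-sided z test of 'no change' for one gene.

    d_hat: estimated treatment effect (log scale), hypothetical input.
    std_err: its standard error, hypothetical input.
    Returns (z, p_value, significant_at_alpha).
    """
    z = d_hat / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


z, p, significant = z_test(d_hat=1.3, std_err=0.3, alpha=0.001)
print(z, p, significant)
```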

Choice of significance level
Expected number of false positives: E(false positives) = αN, where N is the number of spots tested.
But the minimum detectable difference increases as α gets smaller.

α        E(fp)   min diff   min ratio
0.05     250     0.916      2.5
0.01     50      1.09       3
0.001    5       1.29       3.6
0.0001   0.5     1.61       5
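The arithmetic behind the table can be checked in a few lines. Two things are inferred from the numbers rather than stated on the slide: the number of spots (every row gives E(fp) / α = 5000), and that "min diff" is a natural-log difference, since its exponential matches "min ratio" to the precision shown. The names `N_SPOTS` and `table_row` are hypothetical.

```python
import math

N_SPOTS = 5000  # inferred: E(fp) / alpha = 250 / 0.05 in the table

# (alpha, minimum detectable difference on the log scale), from the table
ROWS = [(0.05, 0.916), (0.01, 1.09), (0.001, 1.29), (0.0001, 1.61)]


def table_row(alpha, min_diff, n_spots=N_SPOTS):
    """Return (expected false positives, minimum detectable fold ratio)."""
    expected_fp = alpha * n_spots
    min_ratio = math.exp(min_diff)  # natural-log difference -> ratio
    return expected_fp, min_ratio


for alpha, min_diff in ROWS:
    fp, ratio = table_row(alpha, min_diff)
    print(alpha, round(fp, 2), min_diff, round(ratio, 1))
```

The trade-off is visible in the rows: cutting α from 0.05 to 0.0001 takes the expected false positives from 250 to 0.5 but doubles the smallest fold change that can be called.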

Validation of method against simulated data 3. Hypothesis testing: simulated from the stated model. Axes: bias, "-fold change", proportion of changed spots. "Rate false pos." = mean observed / expected.

Simulated data: mis-specified model — multiplicative + additive noise

Validation of method against simulated data 4. Hypothesis testing: simulated from the "wrong" model (additive + multiplicative noise). Axes: bias, "-fold change", proportion of changed spots.
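Data from a model with both multiplicative and additive noise can be simulated along these lines. The noise levels, the concentrations, the floor applied before taking logs, and the name `simulate_log_intensity` are all assumptions made for illustration; the talk's actual simulation settings are not in this transcript.

```python
import math
import random
from statistics import stdev


def simulate_log_intensity(conc, n, sigma_mult=0.1, sigma_add=10.0,
                           floor=1.0, rng=None):
    """Draw n log intensities for one spot with mixed noise.

    intensity = conc * exp(sigma_mult * z1) + sigma_add * z2, with z1, z2
    standard normal. Values below `floor` are clipped so the log is defined
    (a choice made for this sketch).
    """
    if rng is None:
        rng = random.Random(0)
    out = []
    for _ in range(n):
        multiplicative = conc * math.exp(sigma_mult * rng.gauss(0.0, 1.0))
        additive = sigma_add * rng.gauss(0.0, 1.0)
        out.append(math.log(max(multiplicative + additive, floor)))
    return out


rng = random.Random(1)
low = simulate_log_intensity(conc=50.0, n=2000, rng=rng)
high = simulate_log_intensity(conc=5000.0, n=2000, rng=rng)
print(stdev(low), stdev(high))
```

On the log scale the additive term matters mainly at low intensity, so the spread of the low-concentration spot is larger than that of the high-concentration spot. This is the kind of intensity-dependent variance a purely multiplicative model does not produce.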

Acknowledgments
Lynn Crosby, North Carolina State University
Kevin Morgan, Strategic Toxicological Sciences, GlaxoWellcome

Santa Fe Institute
www.santafe.edu
Postdoctoral fellowships available (apply before the end of the year)
kepler@santafe.edu
