Two Color Microarrays

Two Color Microarrays SPH 247 Statistical Analysis of Laboratory Data

Two-Color Arrays
• Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.
• If a spot is too large, for example, both signals will be too big, and the difference or ratio will eliminate that source of variability.
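The cancellation of spot-level variability can be illustrated with a small simulation. This is only a sketch with made-up numbers: the names `spot_effect`, `true_red` and `true_green` are hypothetical, and the spot effect is assumed to multiply both channels equally.

```r
# Simulated spots: each spot has its own size effect that scales both channels
set.seed(1)
n_spots <- 1000
spot_effect <- exp(rnorm(n_spots, sd = 0.5))  # assumed multiplicative spot effect
true_red <- 2000                              # hypothetical true signal, red channel
true_green <- 1000                            # hypothetical true signal, green channel

red <- true_red * spot_effect
green <- true_green * spot_effect

# The spot effect appears in both channels, so it cancels in the ratio
log_ratio <- log2(red / green)
sd_single <- sd(log2(red))   # spread of a single channel across spots
sd_ratio <- sd(log_ratio)    # spread of the log ratio across spots
```

Under these assumptions `sd_single` is clearly positive while `sd_ratio` is zero up to rounding error. Real arrays also carry channel-specific noise that does not cancel.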

Dyes
• The most common dye set is Cy3 (green) and Cy5 (red). Cy3 absorbs most strongly near 550 nm and emits near 570 nm; Cy5 absorbs most strongly near 649 nm and emits near 670 nm (for reference, green light is ~550 nm and red light is ~700 nm).
• The dyes are excited with lasers at 532 nm (Cy3, green) and 635 nm (Cy5, red).
• The emissions are read through filters by a detector such as a CCD or a photomultiplier tube.


File Format
• A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results: http://www.axon.com/gn_GenePix_File_Formats.html
• This contains 29 rows of headers followed by 43 columns of data (in our example files).
• For full analysis one may also need a .gal file that describes the layout of the arrays.
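A minimal sketch of reading such a file into R follows. Rather than hard-coding the number of header rows, it assumes the file follows the ATF text layout, in which the second line gives the number of optional header records and the column titles come right after them. The helper name `read_gpr` and the toy file contents are hypothetical.

```r
# Hypothetical helper: read the data table from a GenePix .gpr file,
# assuming an ATF layout (line 1: "ATF 1.0"; line 2: header count, column count)
read_gpr <- function(path) {
  second <- scan(path, what = "", skip = 1, nlines = 1, quiet = TRUE)
  n_header <- as.integer(second[1])
  # Skip the two ATF lines plus the header records; the next line holds column titles
  read.delim(path, skip = n_header + 2, header = TRUE,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Toy file in the assumed layout (contents are made up for illustration)
tmp <- tempfile(fileext = ".gpr")
writeLines(c(
  "ATF\t1.0",
  "2\t3",
  "\"Type=toy example\"",
  "\"Scanner=hypothetical\"",
  "\"Name\"\t\"F635 Median\"\t\"F532 Median\"",
  "\"gene1\"\t1200\t800",
  "\"gene2\"\t150\t160"
), tmp)

spots <- read_gpr(tmp)
```

With `check.names = FALSE` the column titles keep their spaces, so a column is accessed as `spots[["F635 Median"]]`.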

Analysis Choices
• Mean or median foreground intensity
• Background corrected or not
• Log transform (base 2, e, or 10) or glog transform
• Log is compatible only with no background correction, because background-corrected intensities can be zero or negative
• Glog is best with background correction
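The difference between the two transforms on background-corrected data can be sketched as follows. The glog is written here in one common form, log(x + sqrt(x^2 + lambda)); the value of `lambda` and the intensities are assumptions chosen only for illustration.

```r
# One common form of the generalized log; lambda is a tuning constant (assumed here)
glog <- function(x, lambda) log(x + sqrt(x^2 + lambda))

foreground <- c(1200, 150, 60)    # made-up foreground intensities
background <- c(100, 100, 100)    # made-up local background
corrected <- foreground - background   # the last value is negative

log_vals <- suppressWarnings(log(corrected))   # NaN for the negative value
glog_vals <- glog(corrected, lambda = 100^2)   # defined for all three values
```

For large intensities the glog behaves like log(2x), so it differs from the ordinary log only by a constant there, while remaining defined near and below zero.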

Array normalization
• Array normalization is meant to increase the precision of comparisons by adjusting for variation that affects entire arrays.
• Without normalization, the analysis would be valid, but possibly less sensitive.
• However, a poor normalization method will be worse than none at all.

Possible normalization methods
• We can equalize the mean or median intensity by adding or multiplying a correction term.
• We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles.
• We can normalize for other things such as print tips.
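As one illustration of the quantile option, the sketch below forces every array (column) to share the same set of values, namely the average of the sorted columns. The function name `quantile_norm` is hypothetical, and this is a bare-bones version that assumes a matrix with at least two rows and no missing values.

```r
# Hypothetical sketch of quantile normalization across the columns of a matrix
quantile_norm <- function(mat) {
  mat <- as.matrix(mat)
  sorted <- apply(mat, 2, sort)          # each column sorted in increasing order
  target <- rowMeans(sorted)             # mean of the k-th smallest values
  ranks <- apply(mat, 2, rank, ties.method = "first")
  normed <- apply(ranks, 2, function(r) target[r])  # replace values by target quantiles
  dimnames(normed) <- dimnames(mat)
  normed
}

# Small example: 3 genes (rows) on 4 arrays (columns)
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
qn <- quantile_norm(normex)
```

Because the three genes are ordered the same way on every array in this example, each column of `qn` comes out identical.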

Example for Normalization

> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))
> anova(lm(normex[1,] ~ group))
Analysis of Variance Table

Response: normex[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 262656  262656  18.888 0.04908 *
Residuals  2  27812   13906
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> anova(lm(normex[2,] ~ group))
Analysis of Variance Table

Response: normex[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5
> anova(lm(normex[3,] ~ group))
Analysis of Variance Table

Response: normex[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5
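The three per-gene tests can also be run in one pass. This is a small convenience sketch; the helper name `row_pvalue` is hypothetical, and it simply extracts the group p-value from the same one-way ANOVA used in the example.

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
group <- as.factor(c(1,1,2,2))

# Hypothetical helper: p-value for the group effect on one gene (one row)
row_pvalue <- function(y, group) {
  anova(lm(y ~ group))[1, "Pr(>F)"]
}

# Apply the test to every row of the matrix
pvals <- apply(normex, 1, row_pvalue, group = group)
round(pvals, 4)
```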

Additive Normalization by Means

> cmn <- apply(normex,2,mean)
> cmn
[1] 430.0000 353.3333 188.3333 246.6667
> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
         [,1]   [,2]   [,3]     [,4]
cmn 974.58333 851.25 541.25 607.9167
cmn -15.41667  46.25 201.25 167.9167
cmn -45.41667  16.25 171.25 137.9167
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn

> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table

Response: normex.1[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 114469  114469  23.295 0.04035 *
Residuals  2   9828    4914
> anova(lm(normex.1[2,] ~ group))
Analysis of Variance Table

Response: normex.1[2, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035 *
Residuals  2  2456.9  1228.5
> anova(lm(normex.1[3,] ~ group))
Analysis of Variance Table

Response: normex.1[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035 *
Residuals  2  2456.9  1228.5
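The same additive adjustment can be written with `sweep()`, which avoids stacking copies of the column means by hand. This is an equivalent sketch, not a different method; the name `normex_add` is hypothetical.

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)

cmn <- colMeans(normex)                 # mean of each array (column)
offsets <- cmn - mean(cmn)              # how far each array sits from the grand mean
normex_add <- sweep(normex, 2, offsets, FUN = "-")  # subtract each column's offset

colMeans(normex_add)                    # all arrays now share the same mean
```

In this toy example the large first gene dominates the column means, which is why the second and third genes pick up an apparent group effect after the adjustment.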

Multiplicative Normalization by Means

> normex*mn/rbind(cmn,cmn,cmn)
         [,1]      [,2]      [,3]      [,4]
cmn 779.16667 775.82547 687.33407 679.13851
cmn  77.91667  81.89269 137.46681 135.82770
cmn  56.66667  56.03184  88.94912  98.78378
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))
Response: normex.2[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 8884.9  8884.9  453.71 0.002197 **
Residuals  2   39.2    19.6
> anova(lm(normex.2[2,] ~ group))
Response: normex.2[2, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 3219.7  3219.7  696.33 0.001433 **
Residuals  2    9.2     4.6
> anova(lm(normex.2[3,] ~ group))
Response: normex.2[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 1407.54 1407.54  57.969 0.01682 *
Residuals  2   48.56   24.28

Multiplicative Normalization by Medians

> cmd <- apply(normex,2,median)
> cmd
[1] 110  95  85 110
> md <- mean(cmd)   # not defined on the slide; mean(cmd) = 100 reproduces the output below
> normex.3 <- normex*md/rbind(cmd,cmd,cmd)
> normex.3
          [,1]      [,2]      [,3]      [,4]
cmd 1000.00000 947.36842 500.00000 500.00000
cmd  100.00000 100.00000 100.00000 100.00000
cmd   72.72727  68.42105  64.70588  72.72727
> anova(lm(normex.3[1,] ~ group))
Response: normex.3[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 224377  224377     324 0.003072 **
Residuals  2   1385     693
> anova(lm(normex.3[2,] ~ group))
Response: normex.3[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1      0       0
Residuals  2      0       0
> anova(lm(normex.3[3,] ~ group))
Response: normex.3[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1  3.451   3.451  0.1665 0.7228
Residuals  2 41.443  20.722
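Both multiplicative variants follow the same pattern and can be sketched as one function. The name `scale_norm` and its `stat` argument are hypothetical, and the target level is assumed to be the mean of the per-array statistics, which is consistent with the values printed above.

```r
# Hypothetical helper: rescale each array so its mean (or median) matches a common target
scale_norm <- function(mat, stat = median) {
  centers <- apply(mat, 2, stat)        # per-array statistic
  # Divide each column by its statistic relative to the average statistic
  sweep(mat, 2, centers / mean(centers), FUN = "/")
}

normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
by_median <- scale_norm(normex, median)
by_mean <- scale_norm(normex, mean)
```

In this example the median of every array is the second gene, so median scaling makes that gene exactly constant across arrays and leaves no variation to test.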

Intensity-based normalization
• Normalize by means, medians, etc., but do so only in groups of genes with similar expression levels.
• lowess is a procedure that produces a running estimate of the middle, like a robustified running mean.
• If we subtract the lowess curve of each array and add the average of the lowess curves, we get the lowess normalization.
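The idea of lowess as a robust running estimate can be sketched on simulated data. Everything here is made up for illustration: a linear trend with noise, two injected outliers, and a 21-point running mean for comparison.

```r
set.seed(2)
x <- 1:200
y <- 0.05 * x + rnorm(200)                 # simulated trend plus noise
y[c(50, 150)] <- y[c(50, 150)] + 30        # two gross outliers

# lowess: locally weighted fit with robustness iterations; f is the span
fit <- lowess(x, y, f = 0.3)

# Plain 21-point running mean for comparison (NA at the ends)
running_mean <- stats::filter(y, rep(1/21, 21))

# Error of each estimate at the first outlier, relative to the true trend
err_lowess <- abs(fit$y[50] - 0.05 * 50)
err_mean <- abs(running_mean[50] - 0.05 * 50)
```

The running mean is pulled toward the outlier, while the lowess fit largely ignores it because the robustness iterations downweight points with large residuals.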

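# Additive normalization: shift each column (array) so that all column
# means equal the grand mean. Note that this name masks base::norm.
norm <- function(mat1) {
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]                    # number of genes (rows)
  n <- dim(mat2)[2]                    # number of arrays (columns)
  cmean <- apply(mat2,2,mean)          # column means
  cmean <- cmean - mean(cmean)         # offset of each column from the grand mean
  mnmat <- matrix(rep(cmean,p),byrow=TRUE,ncol=n)
  return(mat2-mnmat)
}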

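# Lowess normalization: subtract each array's lowess curve (fit against
# the rank of each gene's mean intensity) and add back the average curve.
lnorm <- function(mat1,span=.1) {
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]
  n <- dim(mat2)[2]
  rmeans <- apply(mat2,1,mean)                  # mean intensity of each gene
  rranks <- rank(rmeans,ties.method="first")
  matsort <- mat2[order(rranks),]               # rows sorted by mean intensity
  r0 <- 1:p
  lcol <- function(x) {
    lowess(r0,x,f=span)$y                       # smoothed values for one array
  }
  lmeans <- apply(matsort,2,lcol)               # one lowess curve per array
  lgrand <- apply(lmeans,1,mean)                # average curve across arrays
  lgrand <- matrix(rep(lgrand,n),byrow=FALSE,ncol=n)
  matnorm0 <- matsort-lmeans+lgrand
  matnorm1 <- matnorm0[rranks,]                 # restore the original row order
  return(matnorm1)
}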
