Presented by John Quackenbush, Ph.D. at the June 10, 2003 meeting of the Pharmacology Toxicology Subcommittee of the Advisory Committee for Pharmaceutical Science

The Experimental Design
• The experimental design dictates a good deal of what you can do with the data.
• Good normalization and processing reflect the experimental design.
• The design also facilitates certain comparisons between samples and provides the statistical power needed to assign confidence limits to individual measurements.
• The design must reflect experimental reality.
• The most straightforward designs compare expression in two classes of samples to look for patterns that distinguish them.

Sample Pairing for Co-Hybridization Experiments
Direct Comparison with Dye Swap: [diagram: pairs A1/B1, A2/B2, A3/B3, A4/B4, each hybridized twice with the dyes flipped]
• RNA sample is not limiting (i.e., plenty of sample is available)
• Dye flips account for any gene-dye effects
Balanced Block Design: [diagram: pairs A1/B1, A2/B2, A3/B3, A4/B4, each hybridized once]
• RNA sample is limiting
• Balanced blocking (each class labeled with Cy3 and Cy5 equally often) accounts for any gene-dye effects
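The two pairing schemes can be sketched as hybridization lists. This is a minimal illustration, assuming the four A/B pairs from the diagrams, writing each hybridization as a (Cy3 sample, Cy5 sample) tuple, and alternating dyes between pairs for the balanced block; the helper names are hypothetical.

```python
from collections import Counter


def dye_swap_design(pairs):
    """Direct comparison with dye swap: each (A, B) pair is hybridized twice,
    once as A-Cy3/B-Cy5 and once with the dyes flipped."""
    hybs = []
    for a, b in pairs:
        hybs.append((a, b))  # (Cy3 sample, Cy5 sample)
        hybs.append((b, a))  # dye flip of the same pair
    return hybs


def balanced_block_design(pairs):
    """Balanced block: each pair is hybridized once; the dye assignment
    alternates between pairs so each class gets Cy3 and Cy5 equally often."""
    return [(a, b) if i % 2 == 0 else (b, a) for i, (a, b) in enumerate(pairs)]


def dye_counts(hybs, prefix):
    """Count how often samples whose names start with `prefix` get each dye."""
    counts = Counter()
    for cy3, cy5 in hybs:
        if cy3.startswith(prefix):
            counts["Cy3"] += 1
        if cy5.startswith(prefix):
            counts["Cy5"] += 1
    return counts


pairs = [(f"A{i}", f"B{i}") for i in range(1, 5)]
swap = dye_swap_design(pairs)
block = balanced_block_design(pairs)
print(len(swap), dye_counts(swap, "A"))    # 8 hybridizations
print(len(block), dye_counts(block, "A"))  # 4 hybridizations
```

Both plans label class A with Cy3 and Cy5 equally often, which is what lets gene-dye effects cancel. The dye-swap plan does it with eight hybridizations that use every sample twice, while the balanced block needs only four hybridizations and one aliquot of each sample.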

Multiple Sample Pairings
[Diagrams: reference design, with samples compared against a common reference R; loop design]
Reference Design (Indirect Comparison):
• More than two samples are compared (e.g., tumor classification, time course)
• Dye flips are not necessary but can be done to increase precision
• Ratio values are inferred (indirect)
• Suited for cluster analysis, which needs a common reference
Loop Design: [diagram only: samples linked pairwise in a closed loop]
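In the reference design no two experimental samples share a slide, so their ratio is inferred through R. A minimal sketch, assuming each input is a measured expression ratio for one gene (sample/R); the helper name and the numbers are hypothetical.

```python
import math


def indirect_log2_ratio(a_vs_ref, b_vs_ref):
    """Infer log2(A/B) from two hybridizations against a common reference R,
    using log2(A/R) - log2(B/R) = log2(A/B)."""
    return math.log2(a_vs_ref) - math.log2(b_vs_ref)


# Hypothetical ratios for one gene: A/R = 4.0 and B/R = 0.5
print(indirect_log2_ratio(4.0, 0.5))  # 3.0, i.e. A is 8-fold higher than B
```

Because the inferred value is the difference of two measured log ratios, its variance is the sum of their variances (twice the single-array variance if the errors are independent and equal), which is the cost of the indirect comparison.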

Loop vs. Reference Designs
• Loop design
  – Can provide direct measurements
  – Gives more data on each experimental sample with the same number of hybridizations
  – Requires more RNA per sample
  – A single bad sample, or bad data for a gene, can "unwind" (break) the loop
• Reference design
  – Easily extensible
  – Simple interpretation of all results
  – Requires less RNA per sample
  – Less sensitive to bad RNA samples and bad array elements
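The trade-offs above can be checked by counting. A minimal sketch, assuming four samples A to D, hybridizations written as (Cy3 sample, Cy5 sample) tuples, and the reference R always on Cy3 (an arbitrary choice); all helper names are hypothetical.

```python
from collections import Counter


def reference_design(samples, ref="R"):
    """Each sample is hybridized once against the common reference."""
    return [(ref, s) for s in samples]


def loop_design(samples):
    """Sample i (Cy3) vs sample i+1 (Cy5); the last sample closes the loop."""
    n = len(samples)
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


def labelings(hybs, samples):
    """Per-sample (Cy3 count, Cy5 count); the total is a proxy for RNA needed."""
    cy3 = Counter(h[0] for h in hybs)
    cy5 = Counter(h[1] for h in hybs)
    return {s: (cy3[s], cy5[s]) for s in samples}


def drop_sample(hybs, bad):
    """Hybridizations that remain usable if one sample's RNA is bad."""
    return [h for h in hybs if bad not in h]


samples = ["A", "B", "C", "D"]
ref_hybs = reference_design(samples)
loop_hybs = loop_design(samples)

print(len(ref_hybs), labelings(ref_hybs, samples))    # 4 hybs, each sample once on Cy5
print(len(loop_hybs), labelings(loop_hybs, samples))  # 4 hybs, each sample once per dye
print(len(drop_sample(ref_hybs, "B")), len(drop_sample(loop_hybs, "B")))  # prints: 3 2
```

For the same four hybridizations, the loop measures each sample twice, compares neighbors directly, and is dye-balanced, but it uses twice the RNA per sample; losing sample B removes two loop hybridizations (leaving an open chain) versus one in the reference design.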

One Possible Experimental Paradigm: Examining Genotype, Phenotype, and Environment
[Diagram: four conditions (parental, stressed; derived, stressed; parental, unstressed; derived, unstressed) with labels Genotype, Environment, Reference Sample, and Assay Variation]

Basic Design Principles
• Biological replicates (independent RNA, independent slides) are more informative than correlated (technical) replicates.
• More replicates are better: they give higher statistical power.
• For loops, hybridizations of individual samples should be "balanced" (as many Cy3 as Cy5 labelings).
• Self-self hybridizations add data on reproducibility and can be used to build error models.
• At a minimum, use dye-swap replicates to compensate for any dye biases in labeling or detection.
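A self-self hybridization puts the same RNA in both channels, so any spread in log2(Cy5/Cy3) is assay noise or dye bias. The sketch below is a deliberately simple, intensity-independent error model, assuming toy intensities (not real data), a hypothetical helper pair, and an assumed 3-SD threshold; published error models are usually intensity-dependent.

```python
import math
import statistics


def self_self_error_model(cy3, cy5):
    """Mean and SD of log2(Cy5/Cy3) across spots of a self-self hybridization.
    The mean estimates overall dye bias; the SD estimates assay noise."""
    log_ratios = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    return statistics.mean(log_ratios), statistics.stdev(log_ratios)


def flag_changes(log_ratios, bias, sd, k=3.0):
    """Indices of log ratios outside bias +/- k*sd of the self-self noise."""
    return [i for i, x in enumerate(log_ratios) if abs(x - bias) > k * sd]


# Toy self-self intensities (illustrative only)
cy3 = [100.0, 100.0, 100.0, 100.0]
cy5 = [100.0 * 2 ** d for d in (0.1, -0.1, 0.2, -0.2)]
bias, sd = self_self_error_model(cy3, cy5)
print(round(sd, 3))
print(flag_changes([0.0, 1.0, -0.05], bias, sd))  # [1]
```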

How Many Replicates?
(Simon et al., Genetic Epidemiology 23: 21-36, 2002)
$$ n = \frac{4\,(z_{\alpha/2} + z_{\beta})^2}{\left(\delta / 1.4\sigma\right)^2} $$
where $z_{\alpha/2}$ and $z_{\beta}$ are standard normal percentiles at significance level $\alpha$ and false-negative rate $\beta$; $\delta$ is the minimum detectable log2 ratio; and $\sigma$ is the SD of the log ratio values.
For $\alpha = 0.001$ and $\beta = 0.05$, $z_{\alpha/2} = -3.29$ and $z_{\beta} = -1.65$. Assuming $\delta = 1.0$ (a 2-fold change) and $\sigma = 0.25$, $n = 4(-4.94)^2 / (1.0/0.35)^2 \approx 97.6 / 8.16 \approx 11.96$, which rounds up to n = 12 samples (6 query and 6 control).
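The replicate calculation can be reproduced directly. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, a hypothetical function name, and rounding up to an even total so the query and control groups are equal.

```python
import math
from statistics import NormalDist


def replicates_needed(alpha, beta, delta, sigma):
    """Total samples from n = 4(z_{a/2} + z_b)^2 / (delta / 1.4 sigma)^2,
    rounded up to an even number so both classes get the same count."""
    z = NormalDist().inv_cdf      # standard normal percentile function
    z_half_alpha = z(alpha / 2)   # about -3.29 for alpha = 0.001
    z_beta = z(beta)              # about -1.645 for beta = 0.05
    n = 4 * (z_half_alpha + z_beta) ** 2 / (delta / (1.4 * sigma)) ** 2
    return 2 * math.ceil(n / 2)


print(replicates_needed(alpha=0.001, beta=0.05, delta=1.0, sigma=0.25))  # 12
```

With unrounded percentiles the raw value is about 11.94 rather than 11.96, which gives the same 12 samples. Doubling $\sigma$ to 0.5 raises the requirement to 48, since $n$ scales with $\sigma^2$.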
