Design of experiments and basic analysis: estimating and testing for differential expression.

Statistics for Microarray Data Analysis – Lecture 3. The Fields Institute for Research in Mathematical Sciences, May 25, 2002

Design of cDNA microarray experiments

Some aspects of design
Layout of the array
• Which cDNA sequences to print?
  • Library
  • Controls
• Spatial positions
Allocation of samples to the slides
• Different design layouts
  • A vs B: treatment vs control
  • Multiple treatments
  • Time series
  • Factorial
• Replication
  • Number of hybridizations
  • Use of dye swap in replication
  • Different types of replicates (e.g. pooled vs unpooled material (samples))
• Other considerations
  • Physical limitations: the number of slides and the amount of material
  • Extensibility: linking

Graphical representation

Case 1: Meaningful biological control (C)
Samples: Liver tissue from four mice treated with cholesterol-modifying drugs.
Question 1: Genes that respond differently between the Ti and C.
Question 2: Genes that respond similarly across two or more treatments relative to control.
Natural design choice: each of T1, T2, T3, T4 is hybridized against C.

Case 2: Use of universal reference
Samples: Different tumor samples.
Question: To discover tumor subtypes.
Natural design choice: each of T1, T2, ..., Tn-1, Tn is hybridized against Ref.

Finding differentially expressed genes
Wanted: tools to identify the genes whose expression levels are associated with a covariate or a response of interest. Examples include:
• Qualitative covariates or factors: e.g. treatment, cell type, tumor class;
• Quantitative covariates: e.g. dose, time;
• Responses: e.g. survival, cholesterol level, weight;
• Any combination of the above.

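The simplest design question: Direct versus indirect comparisons
Two samples, e.g. KO vs. WT or mutant vs. WT. Take two slides in each design, and let $\sigma^2$ be the variance of a single log ratio.
Direct (T hybridized against C): estimate log(T/C) by average(log(T/C)), with variance $\sigma^2/2$.
Indirect (T and C each hybridized against Ref): estimate log(T/C) by log(T/Ref) – log(C/Ref), with variance $2\sigma^2$.
These calculations assume independence of replicates: the reality is not so simple.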
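The variance comparison can be checked with a small simulation. This is a minimal sketch, assuming each slide contributes one log ratio with independent normal error of variance sigma^2 and that each design uses two slides; the function name and parameters are illustrative only.

```python
import random
import statistics

def simulate_variances(sigma=1.0, n_sim=20000, seed=0):
    """Simulate the direct and indirect estimates of log(T/C).

    Assumption: every slide gives one log ratio with independent
    N(0, sigma^2) error around its true value (taken as 0 here).
    """
    rng = random.Random(seed)
    direct, indirect = [], []
    for _ in range(n_sim):
        # Direct: two T-vs-C slides, averaged.
        d1 = rng.gauss(0.0, sigma)
        d2 = rng.gauss(0.0, sigma)
        direct.append((d1 + d2) / 2)
        # Indirect: one T-vs-Ref slide minus one C-vs-Ref slide.
        t_ref = rng.gauss(0.0, sigma)
        c_ref = rng.gauss(0.0, sigma)
        indirect.append(t_ref - c_ref)
    return statistics.variance(direct), statistics.variance(indirect)

var_direct, var_indirect = simulate_variances()
print(f"direct   variance: {var_direct:.3f} (theory: sigma^2 / 2)")
print(f"indirect variance: {var_indirect:.3f} (theory: 2 sigma^2)")
```

With the same number of slides, the indirect estimate is about four times as variable as the direct one under these independence assumptions.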

Identifying differentially expressed genes with one slide
• This is a common enough hope
• Efforts are frequently successful
• It is not hard to do by eye
• The problem is probably beyond formal statistical inference (valid p-values, etc.) for the foreseeable future... why?

Single-slide methods
• Existing methods: model-dependent rules for deciding whether (R,G) corresponds to a differentially expressed gene.
• These amount to drawing two curves in the (R,G)-plane and calling a gene differentially expressed if its (R,G) falls outside the region between the two curves.
• At this time we do not know enough about the systematic and random variation within a microarray experiment to justify such strong modeling assumptions.
• n=1 may not be enough.

Single-slide methods, cont
Existing methods differ in the distributional assumptions they make regarding (R,G).
• Chen et al.: Each (R,G) is assumed to be normally and independently distributed with constant CV. Decision based on R/G only.
• Newton et al.: Gamma-Gamma-Bernoulli hierarchical model for each (R,G).
• Roberts et al.: Each (R,G) is assumed to be normally and independently distributed with variance depending linearly on the mean.
• Sapir & Churchill: Each log R/G is assumed to be distributed according to a mixture of normal and uniform distributions. Decision based on R/G only.

Matt Callow’s Srb1 dataset (#5). Newton’s and Chen’s single slide method

Identifying differentially expressed genes with replicated slides
Some aspects:
• Between-slide normalization.
• Summaries:
  • Averages and SDs,
  • t, Mann-Whitney, Cox model score and F statistics, regression coefficients, and others
• How should we look at them?
• Can we make valid probability statements?

Apo AI experiment
Goal: To identify genes with altered expression in the livers of Apo AI knock-out mice (K) compared to inbred C57Bl/6 control mice (C).
• 8 treatment mice (Ki) and 8 control mice (Ci)
• 16 hybridizations: liver mRNA from each of the 16 mice (Ki, Ci) is labelled with Cy5, while pooled liver mRNA from the control mice (C*) is labelled with Cy3.
• Probes: ~6,000 cDNAs (genes), including 200 related to lipid metabolism.
Design: 8 slides of K vs C* and 8 slides of C vs C*.
Data provided by Matt Callow, LBNL

Identifying differentially expressed genes, cont
For each slide, we summarize each spot with M = log2(R/G). For each spot, call these k1, k2, …, k8, c1, c2, …, c8.
Statistics
a) average difference: $\bar{k} - \bar{c}$
b) t statistic: $(\bar{k} - \bar{c}) / \mathrm{SE}(\bar{k} - \bar{c})$
c) B statistic: later…
d) Robust t: not today.
To identify differentially expressed genes:
a) Diagnostic plots: q-q plot, histogram.
b) Testing: p-values, adjusted p-values.
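The first two statistics can be sketched for a single spot as follows. The slide does not say which form of the two-sample t statistic is meant, so the separate-variance (Welch) form used here is an assumption, and the M values are made up for illustration.

```python
import math
import statistics

def average_difference(k, c):
    """Difference between the mean M of the knock-outs and of the controls."""
    return statistics.mean(k) - statistics.mean(c)

def two_sample_t(k, c):
    """Two-sample t statistic with separate variances (Welch form; an assumption)."""
    se = math.sqrt(statistics.variance(k) / len(k) +
                   statistics.variance(c) / len(c))
    return average_difference(k, c) / se

# Hypothetical M = log2(R/G) values for one spot: k1..k8 and c1..c8.
k = [-3.1, -2.9, -3.3, -3.0, -2.8, -3.2, -3.1, -2.9]
c = [0.1, -0.1, 0.0, 0.2, -0.2, 0.1, 0.0, -0.1]

print("average difference:", round(average_difference(k, c), 4))
print("t statistic:", round(two_sample_t(k, c), 2))
```

Applying the two functions to every spot gives the per-gene statistics that the diagnostic plots and tests then work from.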

Histogram & normal q-q plot of t-statistics ApoA1

Why a normal q-q plot?
One of the things we want to do with our t-statistics is, roughly speaking, to identify the extreme ones. It is natural to rank them, but how extreme is extreme? Since the sample sizes here are not too small (two samples of 8 each gives 16 terms in the difference of the means), approximate normality is not an unreasonable expectation for the null marginal distribution. Converting ranked t’s into a normal q-q plot is a great way to see the extremes: they are the ones that are “off the line”, at one end or the other. This technique is particularly helpful when we have thousands of values. Of course we can’t expect all differentially expressed genes to stand out as extremes: many will be masked by more extreme random variation, which is a big problem in this context.
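The coordinates of such a plot are easy to compute. This is a minimal sketch using only the Python standard library; the plotting positions (i - 0.5)/n are one common convention, chosen here as an assumption, and the t values are made up.

```python
from statistics import NormalDist

def normal_qq(values):
    """Return (theoretical normal quantile, ordered value) pairs."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, SD 1
    # Plotting positions (i - 0.5) / n: an assumed convention.
    return [(std_normal.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(ordered, start=1)]

# Hypothetical t statistics; one value is far more extreme than the rest.
t_stats = [0.3, -1.2, 0.8, -0.1, 9.5, 0.4, -0.6, 1.1]
points = normal_qq(t_stats)
for q, t in points:
    print(f"{q:6.2f}  {t:6.2f}")
```

Plotting the second coordinate against the first, the bulk of the points fall near a line and the extreme value sits well above it at the upper end.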

Useful plots of t-statistics

A more cautious approach These plots are useful, but we need to look at them more closely. Can we trust average effect sizes alone? Can we trust the t statistic alone? Here is evidence that the answer is no.

Results from 4 replicates of a different experiment

Points to note One set (green) has a high average M but also a high variance and a low t. Another (pale blue) has an average M near zero but a very small variance, leading to a large negative t. A third (dark blue) has a modest average M and a low variance, leading to a high positive t. A fourth (purple) has a moderate average M and a moderate variance, leading to a small t. Another pair (yellow, red) have moderate average Ms and middling variances, and moderately large ts. Does this happen with our Apo AI experiment?

Sets defined by cut-offs: from the Apo AI ko experiment (point sets: M\t, t\M, t∩M)

Results from the Apo AI ko experiment (point sets: M\t, t\M, t∩M)

Apo AI experiment: t vs average A (point sets: M\t, t\M, t∩M)

An empirical Bayes story
Using average M alone, we ignore useful information in the SD across replicates. Some large values are large because of outliers. Using t alone, we are liable to be misled by very small SDs. With thousands of genes, some SDs will be very small. Formal testing can sort out these issues for us, but if we simply want to rank, what should we rank on? One approach (SAM) is to inflate the SDs slightly. Another approach can be based on the following empirical Bayes story. There are a number of variants.

Suppose that our M values are independently and normally distributed, and that a proportion p of genes are differentially expressed, i.e. have M’s with non-zero means. Further, suppose that the variances and means of these are chosen jointly from inverse chi-square and normal conjugate priors, respectively. Genes that are not differentially expressed have zero means and variances from the same inverse chi-square distribution. The scale and d.f. parameters of the inverse chi-square are estimated from the data, as is a parameter c connecting the prior for the mean with that for the variances. We then look for the posterior probability that a given gene is differentially expressed, and find that it is an increasing function of the statistic B given on the next page.
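The effect of inflating the SDs can be seen in a toy example. This is a sketch of the general idea only, not SAM's actual procedure: the constant s0 is fixed by hand here (an assumption), and the two genes are invented.

```python
import math
import statistics

def t_like(m_values, s0=0.0):
    """Mean of the replicated M values divided by (standard error + s0).

    s0 = 0 gives the ordinary one-sample t statistic; s0 > 0 damps
    statistics whose denominators are very small.
    """
    n = len(m_values)
    se = statistics.stdev(m_values) / math.sqrt(n)
    return statistics.mean(m_values) / (se + s0)

# Hypothetical replicated M values for two genes.
genes = {
    "tiny_sd": [0.10, 0.11, 0.10, 0.09],  # small effect, very small SD
    "solid":   [1.6, 2.5, 1.9, 2.2],      # large effect, moderate SD
}

plain = {g: t_like(m) for g, m in genes.items()}
inflated = {g: t_like(m, s0=0.1) for g, m in genes.items()}  # s0 chosen by hand

print("plain t:    ", {g: round(v, 2) for g, v in plain.items()})
print("inflated SD:", {g: round(v, 2) for g, v in inflated.items()})
```

With the plain t, the gene with the tiny SD ranks first; once the denominator is inflated, the gene with the large average M ranks first.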

Empirical Bayes log posterior odds ratio (LOR)
Notice that for large n this is approximately t = M./s.

Comparison of different criteria These data come from the Srb1 transgenic mouse experiment with 8 replicates. See Table on next page.

Table: Sets of genes. "1" indicates that the genes in the set are extreme* for that statistic.

M.  T   B   Comment
0   0   0   Not differentially expressed genes.
0   0   1   False negatives in M. and T (high but not extreme), detected by B.
0   1   0   False positives in T: small M. but tiny variance.
0   1   1   False negatives in M., but detected by T and B.
1   0   0   False positives in M.: large M. but too large a variance to be trusted.
1   0   1   False negatives in T: large M. and true moderately high variance.
1   1   0   No genes here: extreme for M. and T => extreme for B!
1   1   1   High in all three statistics: clearly differentially expressed.

*Extreme: |M.| > 0.5, |T| > 4.5, B > -2. These limits are chosen for illustration. Normally they would be slightly higher.

Summary
• Microarray experiments typically have thousands of genes, but only a few (1-10) replicates for each gene.
• Averages can be driven by outliers.
• ts can be driven by tiny variances.
• B = LOR will, we hope,
  • use information from all the genes
  • combine the best of M. and t
  • avoid the problems of M. and t
Ranking on B could be helpful.

Identifying differentially expressed genes K samples: Single factor experiment

Linear models
In many situations we want to combine data from different experiments in a slightly more elaborate manner than simply averaging. One way of doing so is via (fixed effects) linear models, in which we estimate, for each gene on our slide, certain quantities of interest that we call effects. Typically these estimates may be regarded as approximately normally distributed with common SD, and with mean zero in the absence of any relevant differential expression. In such cases, the two preceding strategies both apply: q-q plots, and various combinations of the estimated effect (cf. M.) and the standardized estimate (cf. t). We illustrate with a couple of cases.

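Design I and Design II: two layouts for comparing A, P and L, one hybridizing the samples directly against each other and one using a common reference w.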

Linear model analysis
Log ratios: y = (y1, y2, y3)
Parameters: b = (a-p, l-p), where a = log2 A, p = log2 P and l = log2 L
Model: E(y1) = p – a, E(y2) = l – p, E(y3) = a – l
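The model can be written as E(y) = Xb, with one row of X per slide, and fitted by least squares. This is a minimal sketch with a small hand-written solver so that it needs no external libraries; the log ratios are made up for illustration, and `ols` is a hypothetical helper name.

```python
def ols(X, y):
    """Least-squares solution of X b = y via the normal equations.

    Intended for small, well-conditioned problems only.
    """
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * z for x, z in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Rows express E(y1) = p - a, E(y2) = l - p, E(y3) = a - l
# in terms of b = (a - p, l - p).
X = [[-1, 0],
     [0, 1],
     [1, -1]]
y = [-1.1, 0.4, 0.6]  # hypothetical log ratios for one gene

b_hat = ols(X, y)
print("estimate of a - p:", round(b_hat[0], 4))
print("estimate of l - p:", round(b_hat[1], 4))
```

Because the three slides form a loop, every slide contributes to both estimates, so the fitted a - p is not simply -y1.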

For k = 3, efficiency ratio (Design I(a) / Design II) = 3. In general, efficiency ratio = 2k / (k-1).

For k = 3, efficiency ratio (Design I(b) / Design II) = 1.5. In general, efficiency ratio = k / (k-1).

Target samples: K = 6 (A, D, P, L, V, M)

Estimation
Multiple direct comparisons between different samples (no common reference).
Different ways of estimating the same contrast, e.g. A compared to P:
Direct = A-P
Indirect = (A-M) + (M-P), or (A-D) + (D-P), or -(L-A) - (P-L)

Analysis using a linear model
Log ratios: y
Parameters: b = (a–l, p–l, d–l, v–l, m–l), where a = log2 A, p = log2 P, d = log2 D, v = log2 V, m = log2 M, l = log2 L
Model: $E(y) = Xb$, where X is the design matrix
Ordinary least squares: $\hat{b} = (X^{T}X)^{-1}X^{T}y$
In practice, we use robust regression. Estimates for other estimable contrasts follow in the usual way.
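Each slide contributes one row of the design matrix, determined by which two samples were hybridized together. The sketch below builds such rows with L as the baseline; the slide list is hypothetical and is not the actual set of hybridizations from this experiment.

```python
# Columns correspond to the parameters a-l, p-l, d-l, v-l, m-l.
SAMPLES = ["A", "P", "D", "V", "M"]

def design_row(red, green):
    """Row of X for a slide whose log ratio is log2(red / green)."""
    row = [0] * len(SAMPLES)
    if red != "L":    # L is the baseline, so it has no column
        row[SAMPLES.index(red)] += 1
    if green != "L":
        row[SAMPLES.index(green)] -= 1
    return row

# Hypothetical hybridizations, as (Cy5 sample, Cy3 sample) pairs.
slides = [("A", "P"), ("A", "M"), ("M", "P"),
          ("L", "A"), ("P", "L"), ("D", "V")]
X = [design_row(red, green) for red, green in slides]

for (red, green), row in zip(slides, X):
    print(f"{red} vs {green}: {row}")
```

The rows for A vs M and M vs P add up to the row for A vs P, which is the design-matrix view of estimating the same contrast directly and indirectly.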

Pairwise comparisons: effects vs average intensity (red: genes used in clusters; blue: genes used to normalize)

Contrasts Because of the connectivity of our experiment, we can estimate all 15 different pairwise comparisons directly and/or indirectly. For every gene we thus have a pattern based on the 15 pairwise comparisons. Gene #15,228

Contrasts in another way
Instead of estimating pairwise comparisons between each of the six effects, we can come closer to estimating the effects themselves by doing so subject to the standard zero sum constraint (6 parameters, 5 d.f.). What we estimate for a, say, subject to this constraint, is in reality an estimate of a – (a + p + d + v + m + l)/6. In effect we have created the whole-bulb reference in silico.
Gene #15,228
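Moving from baseline-relative estimates to zero-sum effects is a simple re-centering. This is a minimal sketch, assuming the estimates are given relative to l in the order (a-l, p-l, d-l, v-l, m-l); the numbers are invented.

```python
def zero_sum_effects(rel_to_l):
    """Convert estimates of (a-l, p-l, d-l, v-l, m-l) into zero-sum effects.

    Returns effects in the order a, p, d, v, m, l. Each is the sample's
    level minus the average level of all six samples.
    """
    rel = list(rel_to_l) + [0.0]   # l - l = 0
    centre = sum(rel) / len(rel)
    return [r - centre for r in rel]

beta_hat = [1.2, -0.3, 0.0, 0.6, 0.3]   # hypothetical estimates for one gene
effects = zero_sum_effects(beta_hat)

for name, e in zip(["a", "p", "d", "v", "m", "l"], effects):
    print(f"{name}: {e:+.2f}")
```

Pairwise differences are unchanged by the re-centering, so the 15 pairwise comparisons can still be read off as differences between these six values.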

Single factor experiment – time course
Samples: T1, T2, T3, T4, T5, T6, T7, Ref
Possible designs:
• All samples vs common pooled reference
• All samples vs time 0
• Direct hybridization between times
Designs (pooled reference, compare to T1, ...) are compared on the contrasts t vs t+1, t vs t+2, t vs t+3.


Identifying differentially expressed genes 2 x 2 factorial experiment:

Examples of two factors, each with two levels
Example 1: Suppose we wish to study the joint effect of two drugs, A and B. 4 possible treatment combinations:
• C: no treatment
• A: drug A only
• B: drug B only
• A.B: both drug A and drug B
Example 2: We are interested in comparing two strains of mice (mutant and wild-type) at two different times, postnatal and adult. 4 possible samples:
• C: WT at postnatal
• A: WT at adult (effect of time only)
• B: MT at postnatal (effect of the mutation only)
• A.B: MT at adult (effect of both time and the mutation)

One possible design: use C as a common reference
y1 = log(A / C) = a + error
y2 = log(B / C) = b + error
y3 = log(AB / C) = a + b + ab + error
Estimate (ab) with y3 - y2 - y1.
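The cost of this design for the interaction shows up in the variance of y3 - y2 - y1. This is a minimal sketch, assuming the three errors are independent with a common variance sigma^2; the helper names are illustrative.

```python
def interaction_estimate(y1, y2, y3):
    """Estimate of ab from the three common-reference log ratios."""
    return y3 - y2 - y1

def contrast_variance(coeffs, sigma2=1.0):
    """Variance of a linear combination of independent log ratios
    that share the variance sigma2 (independence is an assumption)."""
    return sigma2 * sum(c * c for c in coeffs)

# ab_hat = (-1) * y1 + (-1) * y2 + (+1) * y3
var_ab = contrast_variance([-1, -1, 1])
# a_hat = y1
var_a = contrast_variance([1, 0, 0])

print("variance of the ab estimate, in units of sigma^2:", var_ab)
print("variance of the a estimate, in units of sigma^2: ", var_a)
```

Under these assumptions the interaction is estimated with three times the variance of a main effect, which is one reason to consider other layouts when the interaction is the quantity of interest.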

Statisticians recognise a factorial design
Expected log intensities: C: m; A: m+a; B: m+b; AB: m+a+b+ab
Six hybridizations (numbered 1 to 6) connect the four samples C, A, B and AB.

Analysis using a linear model
Log ratios: y
Parameters: b = (a, b, ab), where a and b are the main effects and ab is the interaction effect.
Model: $E(y) = Xb$, where X is the design matrix for the hybridizations among C, A, B and A.B
Ordinary least squares: $\hat{b} = (X^{T}X)^{-1}X^{T}y$
In practice, we use robust regression. Estimates for other estimable contrasts follow in the usual way.
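Design options can be compared through the diagonal of (X'X)^-1, which gives the variance of each estimated parameter in units of sigma^2. The sketch below assumes independent slides with equal variance and ordinary least squares (not the robust regression used in practice); the two slide lists are hypothetical layouts, not necessarily the ones tabulated in the lecture.

```python
# Expected log intensity of each sample in terms of (a, b, ab); m cancels.
LEVELS = {"C": (0, 0, 0), "A": (1, 0, 0), "B": (0, 1, 0), "AB": (1, 1, 1)}

def design_matrix(slides):
    """One row per slide: expected log2(red / green) as coefficients of (a, b, ab)."""
    return [[r - g for r, g in zip(LEVELS[red], LEVELS[green])]
            for red, green in slides]

def invert(M):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(M)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def parameter_variances(slides):
    """Variances of the OLS estimates of (a, b, ab), in units of sigma^2."""
    X = design_matrix(slides)
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    inv = invert(xtx)
    return [inv[i][i] for i in range(p)]

common_ref = [("A", "C"), ("B", "C"), ("AB", "C")]
all_pairs = common_ref + [("AB", "A"), ("AB", "B"), ("B", "A")]

print("common reference:", parameter_variances(common_ref))
print("all six pairs:   ", parameter_variances(all_pairs))
```

Any other layout can be assessed the same way by changing the slide list, provided the four samples stay connected so that X'X is invertible.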

Estimates of a effect: log2(A/C) vs average A, where A = average log√(R*G). Labelled points: gene A, gene B.

Estimates of a effect vs SE
Point sets: • α • t = α / SE • α ∩ t
Axis: log2(SE)

2 x 2 factorial: design options
Table entry: variance
