Design of Statistical Investigations

3. Design of Experiments 1
Some Basic Ideas
Stephen Senn
SJS SDI_3

Elements of an Experiment: The “Nouns”
• Experimental material
  • Basic units
  • Blocks
  • Replications
• Treatments
  • Orderings
  • Dimensions
  • Combinations

Elements of an Experiment: The “Verbs”
• Allocation
  • Which material gets which treatment
  • For example, using some form of randomisation
• Conduct
  • How will it all be carried out?
• Measuring
  • When to measure what
• Analysis

Exp_1: Rat TXB2
• Experimental material
  • 36 rats
• Treatments to be studied
  • 6 in a ‘one-way layout’
    • 4 new chemical entities
    • 1 vehicle
    • 1 marketed product

Caution!!!!!
• In practice such things are not given
• Material
  • Why rats and not mice, dogs, or guinea-pigs?
  • Why 36?
• Treatments
  • Why these 6?
• In practice the statistician can also be involved in such decisions

Exp_1: Rat TXB2 Allocation
• If the rats cannot be differentiated in any way we can determine, we might as well allocate at random
• Unconstrained randomisation is not a good idea, however: by chance, some treatments may be allocated to very few rats
• So constrain the randomisation to give 6 rats per group
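To get a feel for how unbalanced unconstrained randomisation can be, the following is a minimal simulation sketch in Python (not the S-PLUS used in these slides). It assumes "unconstrained" means each rat is given one of the six treatments independently with equal probability; the seed, the number of simulations and the function names are hypothetical choices made for illustration.

```python
import random
from collections import Counter

TREATMENTS = ["V", "M", "a", "b", "c", "d"]  # labels as used in the slides
N_RATS = 36


def unconstrained_allocation(rng):
    """Give each rat a treatment independently and uniformly at random."""
    return [rng.choice(TREATMENTS) for _ in range(N_RATS)]


def smallest_group(allocation):
    """Size of the smallest treatment group (0 if a treatment is never used)."""
    counts = Counter(allocation)
    return min(counts.get(t, 0) for t in TREATMENTS)


rng = random.Random(2024)  # hypothetical seed, for reproducibility only
n_sims = 10_000
sizes = [smallest_group(unconstrained_allocation(rng)) for _ in range(n_sims)]

# Proportion of simulated designs in which some treatment gets 3 rats or fewer
share_small = sum(s <= 3 for s in sizes) / n_sims
# Proportion of simulated designs that happen to be perfectly balanced
share_balanced = sum(s == 6 for s in sizes) / n_sims

print(f"Some group has 3 or fewer rats in {share_small:.0%} of simulations")
print(f"Exactly 6 rats per group in {share_balanced:.2%} of simulations")
```

Perfect balance is rare under this scheme, which is the motivation for constraining the randomisation to 6 rats per group.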

S-PLUS Randomisation
    #M2 Rat TXB2 Randomisation
    #Vector of treatments
    treat<-c(rep("V",6),rep("M",6),rep("a",6),
             rep("b",6),rep("c",6),rep("d",6))
    #Random number for each rat
    rnumb<-runif(36,0,1)
    #Sort rats by random number
    rat<-sort.list(rnumb)
    #Join rats and treatments
    temp.frame<-data.frame(rat,treat)
    #Sort rows by rat
    des.frame<-sort.col(temp.frame,
                        c("rat","treat"),"rat")
    #Print design
    des.frame
We shall illustrate an alternative using the sample function later in the course.
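The same constrained randomisation can also be obtained by randomly permuting the vector of treatment labels. The sketch below does this in Python rather than S-PLUS, using `random.shuffle`; the function name `randomise` and the seed are hypothetical, and the design it prints will not match the one shown in these slides.

```python
import random


def randomise(treatments, per_group, seed=None):
    """Map rat numbers 1..N to treatment labels, with exactly
    `per_group` rats on each treatment."""
    # One label per rat: per_group copies of each treatment
    labels = [t for t in treatments for _ in range(per_group)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # random permutation of the labels, in place
    return {rat: label for rat, label in enumerate(labels, start=1)}


design = randomise(["V", "M", "a", "b", "c", "d"], per_group=6, seed=1)
for rat, label in design.items():
    print(rat, label)
```

Because the labels are permuted rather than drawn independently, every treatment is guaranteed its 6 rats.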

Result of Randomisation
(the unlabelled first column is the row label carried over from temp.frame)
     rat treat        rat treat        rat treat
 9     1     M   12    14     M   14    27     a
22     2     b   17    15     a    1    28     V
 4     3     V   20    16     b   29    29     c
33     4     d   24    17     b   36    30     d
13     5     a   34    18     d    6    31     V
11     6     M   26    19     c    5    32     V
10     7     M   23    20     b   35    33     d
31     8     d   30    21     c   15    34     a
 7     9     M   16    22     a    2    35     V
19    10     b   21    23     b   27    36     c
 3    11     V   32    24     d
25    12     c   28    25     c
18    13     a    8    26     M

Exp_1: Rat TXB2 Conduct
• We will not cover this in this course
• This does not mean that it is unimportant
• In the Exp_1 example, precise instructions might be necessary for treating the rats

Exp_1: Rat TXB2 Measurement
• Obviously we have to decide what it is important to measure
• Here it has been decided to measure TXB2, a marker of Cox-1 activity
  • Cox = cyclooxygenase
• Analgesics are designed to inhibit Cox-2, which is involved in the synthesis of inflammatory prostaglandins

Measurement (Cont)
• However, they also tend to inhibit Cox-1, which is involved in the synthesis of the prostaglandins that help maintain the gastric mucosa
• Cox-1 inhibition can lead to ulcers
• Ulcers are an unwanted side-effect of non-steroidal anti-inflammatory drugs (NSAIDs)

The Moral
• Even ‘simple’ experiments may involve complex subject-matter knowledge
• It may be dangerous for the statistician to assume that all that is being produced is sets of numbers, the details being irrelevant
• Teamwork may be necessary

Analysis
• One-way layout
• Six treatments
• Balanced design
• “No-brainer” is one-way ANOVA
• We shall look at the maths of one-way ANOVA in more detail later
• For the moment, take this as understood

S-PLUS ANOVA Code
    #Analysis of TXB2 data
    #Set contrast options
    options(contrasts=c(factor="contr.treatment",
                        ordered="contr.poly"))
    #Input data
    treat<-factor(c(rep(1,6),rep(2,6),
                    rep(3,6),rep(4,6),rep(5,6),rep(6,6)),
                  labels=c("V","M","a","b","c","d"))
    TXB2<-c(196.85,124.40,91.20,328.05,268.30,214.70,
            2.08,1.97,4.80,5.01,2.52,9.35,
            315.85,75.60,322.80,212.15,42.95,111.90,
            127.95,81.75,52.70,352.85,198.80,107.65,
            83.19,66.80,81.15,39.00,61.96,87.00,
            74.48,60.00,77.00,42.00,48.95,66.30)
    fit1<-aov(TXB2~treat) #ANOVA
    summary(fit1)

S-PLUS Output
    summary(fit1)
              Df Sum of Sq  Mean Sq  F Value       Pr(F)
        treat  5  184595.5 36919.11 6.313142 0.000409356
    Residuals 30  175439.3  5847.98
So there is a highly significant difference between treatments, but this does not make this an adequate analysis.
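The sums of squares and the F value in this table can be checked by hand. The following is a minimal sketch in Python (standard library only, not S-PLUS) that recomputes them from the TXB2 data given in the ANOVA code; the function name `one_way_anova` is hypothetical, and the p-value is left out because it needs the F distribution, which the standard library does not provide.

```python
# TXB2 values by treatment, copied from the S-PLUS input above
txb2 = {
    "V": [196.85, 124.40, 91.20, 328.05, 268.30, 214.70],
    "M": [2.08, 1.97, 4.80, 5.01, 2.52, 9.35],
    "a": [315.85, 75.60, 322.80, 212.15, 42.95, 111.90],
    "b": [127.95, 81.75, 52.70, 352.85, 198.80, 107.65],
    "c": [83.19, 66.80, 81.15, 39.00, 61.96, 87.00],
    "d": [74.48, 60.00, 77.00, 42.00, 48.95, 66.30],
}


def one_way_anova(groups):
    """Return the treatment and residual sums of squares, their degrees
    of freedom and the F value for a one-way layout."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    means = {t: sum(ys) / len(ys) for t, ys in groups.items()}
    # Between-treatment variation: group means around the grand mean
    ss_treat = sum(len(ys) * (means[t] - grand_mean) ** 2
                   for t, ys in groups.items())
    # Within-treatment variation: observations around their group mean
    ss_resid = sum((y - means[t]) ** 2
                   for t, ys in groups.items() for y in ys)
    df_treat = len(groups) - 1
    df_resid = len(values) - len(groups)
    f_value = (ss_treat / df_treat) / (ss_resid / df_resid)
    return ss_treat, df_treat, ss_resid, df_resid, f_value


ss_treat, df_treat, ss_resid, df_resid, f_value = one_way_anova(txb2)
print(f"treat     Df={df_treat:2d}  SS={ss_treat:.1f}  MS={ss_treat / df_treat:.2f}  F={f_value:.6f}")
print(f"Residuals Df={df_resid:2d}  SS={ss_resid:.1f}  MS={ss_resid / df_resid:.2f}")
```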

S-PLUS Diagnostic Code
    #Diagnostic plot data
    par(mfrow=c(2,2))
    plot(treat~TXB2)
    hist(resid(fit1),xlab="residual")
    plot(fit1$fitted.values,resid(fit1),xlab="fitted",ylab="residual")
    abline(h=0)
    qqnorm(resid(fit1),xlab="theoretical",ylab="empirical")
    qqline(resid(fit1))


Model Failure
• Histogram of residuals has heavy tails
• QQ plot shows clear departure from Normality
• Variance increases with mean
• This suggests a log-transformation
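The link between variance and mean can be seen directly from the group summaries. The sketch below, in Python rather than S-PLUS, prints each treatment's mean and standard deviation on the raw scale alongside the standard deviation of the natural logs, and compares the largest to the smallest standard deviation on each scale. The helper `sd_ratio` is a hypothetical name, and the ratio is only a rough indicator of heteroscedasticity, not a formal test.

```python
import math
import statistics

# TXB2 values by treatment, copied from the S-PLUS input earlier in the slides
txb2 = {
    "V": [196.85, 124.40, 91.20, 328.05, 268.30, 214.70],
    "M": [2.08, 1.97, 4.80, 5.01, 2.52, 9.35],
    "a": [315.85, 75.60, 322.80, 212.15, 42.95, 111.90],
    "b": [127.95, 81.75, 52.70, 352.85, 198.80, 107.65],
    "c": [83.19, 66.80, 81.15, 39.00, 61.96, 87.00],
    "d": [74.48, 60.00, 77.00, 42.00, 48.95, 66.30],
}
log_txb2 = {t: [math.log(y) for y in ys] for t, ys in txb2.items()}


def sd_ratio(groups):
    """Largest group standard deviation divided by the smallest."""
    sds = [statistics.stdev(ys) for ys in groups.values()]
    return max(sds) / min(sds)


print("treat     mean   SD(raw)  SD(log)")
for t, ys in txb2.items():
    print(f"{t:5s} {statistics.mean(ys):8.2f} {statistics.stdev(ys):9.2f} "
          f"{statistics.stdev(log_txb2[t]):8.2f}")

raw_ratio = sd_ratio(txb2)
log_ratio = sd_ratio(log_txb2)
print(f"max/min SD, raw scale: {raw_ratio:.1f}")
print(f"max/min SD, log scale: {log_ratio:.1f}")
```

The spread of the group standard deviations is much narrower on the log scale, which is consistent with the suggestion to transform before refitting the ANOVA.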


Exp_2: A Simple Design Problem (The Simplest)
• You have N experimental units in total
  • They are completely exchangeable
• You have two treatments, A and B
  • with no prior knowledge of their effects
• You wish to compare A and B
  • continuous outcome, assumed Normal
• How many units for A and for B?

Solution is obvious
• Allocate half the units to one treatment and half to the other
  • Assuming that there is an even number of units
• However, we should go through the design cycle
  • What sort of data will we collect?
  • What will we do with them?

Basic Design Cycle
(Diagram with the elements: Objective, Possible Conclusions, Tentative Design, Potential Data, Possible Analysis, Relevant factors)

The Anticipated Data
• Two mean outcomes
• Variances expected to be the same
  • An assumption, but
    • Reasonable under the null hypothesis
    • No other assumption is more reasonable, given that we know nothing about the treatments
• We will calculate the contrast between these means


Now set the derivatives equal to zero.
From (2) and (3) we have
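The algebra for Exp_2 can be sketched as follows. This assumes that the contrast is the difference between the two treatment means, that the common variance is σ², and that the constraint is n_A + n_B = N, handled with a Lagrange multiplier λ. The symbols n_A and n_B and the assignment of the labels (1) to (3) are assumptions, chosen so that (2) and (3) are the two derivatives combined in the final step.

```latex
% Sketch under stated assumptions: common variance \sigma^2, n_A + n_B = N
\begin{align}
\operatorname{Var}\left(\bar{Y}_A - \bar{Y}_B\right)
  &= \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right) \notag \\
L(n_A, n_B, \lambda)
  &= \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right)
     + \lambda\,(n_A + n_B - N) \tag{1} \\
\frac{\partial L}{\partial n_A}
  &= -\frac{\sigma^2}{n_A^2} + \lambda = 0 \tag{2} \\
\frac{\partial L}{\partial n_B}
  &= -\frac{\sigma^2}{n_B^2} + \lambda = 0 \tag{3} \\
\frac{\partial L}{\partial \lambda}
  &= n_A + n_B - N = 0 \notag
\end{align}
% (2) and (3) give n_A^2 = n_B^2, hence n_A = n_B;
% the constraint then gives
\begin{equation*}
n_A = n_B = \frac{N}{2}
\end{equation*}
```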

So What!!??
• The solution is obvious
• Statistical theory does not seem to have helped us very much
• However, this was a trivial problem
• We now try a slightly more complicated experiment
• This leads to a non-trivial problem

Exp_3: A More Complicated Case
• Now suppose that we are comparing k experimental treatments to a single control
• The treatments will not be compared to each other
• How many units should we allocate to each treatment?
• We assume that variances do not vary with treatment: homoscedasticity

Exp_3 Continued
• Arguments of symmetry suggest that each active treatment be given to the same number of units, say n
• Suppose that m units will be allocated to the control
• With N units in total we have N = m + kn

We consider the variance of a typical contrast.
Incorporating the necessary constraint using a Lagrange multiplier, we obtain the following objective function.
We then proceed to minimise this by setting the partial derivatives with respect to m, n and λ equal to zero. (Note that we assume that k and N are fixed in the design specification.)

Set derivatives equal to zero.
Solution gives
Setting equal to zero we have

From (4) and (5) we have
Substituting in (4) we have
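The Exp_3 derivation can be sketched as follows. This assumes that a typical contrast is the difference between one active-treatment mean (n units) and the control mean (m units), with common variance σ² and the constraint N = m + kn. The labels (a) to (c) are introduced here for illustration only; how they correspond to equations (4) and (5) is not assumed.

```latex
% Sketch under stated assumptions: common variance \sigma^2, N = m + kn
\begin{align}
\operatorname{Var}\left(\bar{Y}_T - \bar{Y}_C\right)
  &= \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right) \notag \\
L(m, n, \lambda)
  &= \sigma^2\left(\frac{1}{m} + \frac{1}{n}\right)
     + \lambda\,(m + kn - N) \notag \\
\frac{\partial L}{\partial m}
  &= -\frac{\sigma^2}{m^2} + \lambda = 0 \tag{a} \\
\frac{\partial L}{\partial n}
  &= -\frac{\sigma^2}{n^2} + k\lambda = 0 \tag{b} \\
\frac{\partial L}{\partial \lambda}
  &= m + kn - N = 0 \tag{c}
\end{align}
% (a) and (b) give m^2 = k n^2, i.e. m = \sqrt{k}\, n.
% Substituting in the constraint (c): N = \sqrt{k}\, n + k n, so
\begin{equation*}
n = \frac{N}{k + \sqrt{k}}, \qquad
m = \frac{N}{1 + \sqrt{k}}, \qquad
\frac{m}{n} = \sqrt{k}
\end{equation*}
```

Under these assumptions the control receives √k times as many units as each active treatment.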

Check
• Exp_2 was a special case of Exp_3 with k = 1
• So our general solution must give the same answer as the special case when k = 1
• Indeed, when k = 1 the formula yields m = N/2, which is the solution we reached before
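A numerical check is also possible. The sketch below, in Python, searches over all integer allocations satisfying N = m + kn for the one minimising 1/m + 1/n, which is proportional to the variance of a treatment-versus-control contrast when the variance is common to all groups. It compares the result with the closed form m = N/(1 + √k); that closed form is a reconstruction consistent with the k = 1 check above rather than a formula quoted from the slides, and the function names and the values N = 36, k = 4 are hypothetical.

```python
import math


def closed_form(N, k):
    """Assumed closed form: m = N / (1 + sqrt(k)), with n from N = m + k*n."""
    m = N / (1 + math.sqrt(k))
    n = (N - m) / k
    return m, n


def brute_force(N, k):
    """Integer (m, n) with m + k*n = N that minimises 1/m + 1/n."""
    best = None
    for n in range(1, N // k + 1):
        m = N - k * n
        if m < 1:
            continue  # the control must receive at least one unit
        variance_factor = 1 / m + 1 / n
        if best is None or variance_factor < best[0]:
            best = (variance_factor, m, n)
    return best[1], best[2]


for k in (1, 4):
    m_cf, n_cf = closed_form(36, k)
    m_bf, n_bf = brute_force(36, k)
    print(f"k={k}: closed form m={m_cf:.2f}, n={n_cf:.2f}; "
          f"search m={m_bf}, n={n_bf}")
```

When √k is not a whole number the closed form is not an integer allocation, so in practice it has to be rounded to something that still satisfies N = m + kn.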


Exp_3 Concluded
• The “optimal solution” was not easy to guess
• It allocates more units to the control than to each experimental treatment
• Lesson: be careful!

Questions
• What are the practical problems in implementing the solution we found for Exp_3?
• Why might this not be a good solution after all?
• Are there any implications for the design of Exp_1?
