Latin Squares: Design, Analysis, and Estimation

V. Latin squares designs (LS)
V.A Design of Latin squares
V.B Indicator-variable models and estimation for a Latin square
V.C Hypothesis testing using the ANOVA method for a Latin square
V.D Diagnostic checking
V.E Treatment differences
V.F Design of sets of Latin squares
V.G Hypothesis tests for sets of Latin squares
Statistical Modelling Chapter V

V.A Design of Latin squares
• Definition V.1: A Latin square design is one in which
  • each treatment occurs once and only once in each row and each column,
  • so that the numbers of rows, columns and treatments are all equal.
• Clearly, the total number of observations is $n = t^2$.
• Suppose that in a field trial moisture varies across the field and stoniness varies down the field.
• A Latin square can eliminate both sources of variability.
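Definition V.1 can be checked mechanically. The following is a minimal sketch in base R; the function name `is_latin_square` and the 4 × 4 example square are illustrative assumptions, not part of the course code.

```r
# Check Definition V.1 for a square stored as a matrix of treatment labels.
# is_latin_square is a hypothetical helper written for this sketch.
is_latin_square <- function(sq) {
  n_side <- nrow(sq)
  if (ncol(sq) != n_side) return(FALSE)                        # rows = columns
  if (length(unique(as.vector(sq))) != n_side) return(FALSE)   # = treatments
  # With exactly n_side treatments overall, n_side distinct labels in a
  # row (column) means each treatment occurs once and only once there.
  once_each <- function(v) length(unique(v)) == n_side
  all(apply(sq, 1, once_each)) && all(apply(sq, 2, once_each))
}

# Illustrative 4 x 4 square
sq <- matrix(c("A","B","C","D",
               "D","C","B","A",
               "B","D","A","C",
               "C","A","D","B"), nrow = 4, byrow = TRUE)
is_latin_square(sq)
```

Changing a single cell, for example `sq[1, 1] <- "B"`, makes the check fail because treatment B then occurs twice in the first row.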

Example V.1 Fertilizer experiment

Notes
• Even if one has not identified trends in two directions, an LS may be employed to guard against the problem of putting the blocks in the wrong direction.
• LSs may also be used when there are two different kinds of blocking variables, for example animals and times.
• The general principle is to maximize row and column differences so as to minimize the uncontrolled variation affecting treatment differences.
• The problem is the restriction that no. of replicates = no. of treatments.
• Several fundamentally different LSs exist for a particular t:
  • for t = 4 there are three different squares.
• Latin squares for t = 3, 4, ..., 9 are given in Appendix 8A of Box, Hunter and Hunter.
• To randomize these designs appropriately involves:
  • randomly selecting one of the Latin squares available for t;
  • randomly permuting the rows and then the columns;
  • randomly assigning letters to treatments.
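The three randomization steps can be sketched in base R as follows. This is an illustration only: it uses `sample` rather than the `fac.layout` function used later in the chapter, the starting square and the seed are arbitrary choices, and the object names are hypothetical.

```r
set.seed(941)  # arbitrary seed, only to make the sketch reproducible

# Step 1 (assumed already done): a Latin square chosen for t = 4, coded 1..4.
sq <- matrix(c(1,2,3,4,
               4,3,2,1,
               2,4,1,3,
               3,1,4,2), nrow = 4, byrow = TRUE)
n_side <- nrow(sq)

# Step 2: randomly permute the rows, then the columns.
sq <- sq[sample(n_side), ]
sq <- sq[, sample(n_side)]

# Step 3: randomly assign letters to the treatment codes.
letters_for_codes <- sample(LETTERS[1:n_side])
layout <- matrix(letters_for_codes[sq], nrow = n_side)
layout
```

Permuting whole rows and whole columns and relabelling treatments cannot break the Latin property, so the result is still a Latin square.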

a) Obtaining a layout for a Latin square in R
• General instructions are given in Appendix B, Randomized layouts and sample size computations in R.
Example V.2 Pollution effects of petrol additives
• 4 cars and 4 drivers are used in a study of the effects of 4 petrol additives on pollution.
• It is desirable to isolate both car-to-car and driver-to-driver differences.
• A 4 × 4 Latin square would enable this to be done.
• The names for rows, columns and treatments for this example are Drivers, Cars and Additives, respectively.
• Also, t = 4 and the design was obtained from BH2.

Expressions to be used for example
> t <- 4
> n <- t*t
> LSPolut.unit <- list(Drivers=t, Cars=t)
> Additives <- factor(c(1,2,3,4, 4,3,2,1, 2,4,1,3, 3,1,4,2),
+                     labels=c("A","B","C","D"))
> LSPolut.lay <- fac.layout(unrandomized=LSPolut.unit,
+                           randomized=Additives, seed=941)
> remove("Additives")
> LSPolut.lay
• Note: no nested.factors, as Drivers and Cars are to be randomized independently.
• Hence they are not nested (they are crossed).

Randomized layout
> LSPolut.lay
   Units Permutation Drivers Cars Additives
1      1          11       1    1         B
2      2          12       1    2         D
3      3          10       1    3         C
4      4           9       1    4         A
5      5           7       2    1         A
6      6           8       2    2         B
7      7           6       2    3         D
8      8           5       2    4         C
9      9          15       3    1         D
10    10          16       3    2         C
11    11          14       3    3         A
12    12          13       3    4         B
13    13           3       4    1         C
14    14           4       4    2         A
15    15           2       4    3         B
16    16           1       4    4         D

V.B Indicator-variable models and estimation for a Latin square
• One has to decide whether each of the factors Rows, Columns and Treatments is to be regarded as fixed or random.
• As for the RCBD, it happens that the analysis of the Latin square is essentially unaffected by which model is used.
• Generally, the Latin square involves t rows and t columns so that there are $n = t^2$ observations in all.

a) Maximal model
• The maximal model when all factors are fixed is $\psi_{R+C+T} = E[Y] = X_R\beta + X_C\delta + X_T\tau$, where
  • $Y$ is the $t^2$-vector of random variables for the response variable observations,
  • $\beta$ is the t-vector of parameters specifying a different mean response for each row,
  • $X_R$ is the $t^2 \times t$ matrix indicating the row from which an observation came,
  • $\delta$ is the t-vector of parameters specifying a different mean response for each column,
  • $X_C$ is the $t^2 \times t$ matrix indicating the column from which an observation came,
  • $\tau$ is the t-vector of parameters specifying a different mean response for each treatment,
  • $X_T$ is the $t^2 \times t$ matrix indicating the observations that received each of the treatments.
• Our model also assumes $Y \sim N(\psi_{R+C+T}, V)$.

Example V.3 A 33 Latin square • Suppose that a 33 Latin square with the following arrangement of treatments was being considered: • Then, for this example, • Note for general systematic layout XR=It1t and XC=1tIt but XTcannot be written as a direct product. Statistical Modelling Chapter V

Estimators of expected values under the maximal model
• $\hat\psi_{R+C+T} = \bar{R} + \bar{C} + \bar{T} - 2\bar{G}$, where $\bar{R}$, $\bar{C}$, $\bar{T}$ and $\bar{G}$ are the $t^2$-vectors of row, column, treatment and grand means, respectively.
• Also, note that $\bar{R} = M_R Y$, $\bar{C} = M_C Y$, $\bar{T} = M_T Y$ and $\bar{G} = M_G Y$.
• That is, $M_R$, $M_C$, $M_T$ and $M_G$ are the row, column, treatment and grand mean operators, respectively.
• So once again the estimators are functions of means.
• Further, if the data in the vector Y have been arranged in standard order for Rows then Columns, the operators are $M_G = t^{-2} J_t \otimes J_t$, $M_R = t^{-1} I_t \otimes J_t$ and $M_C = t^{-1} J_t \otimes I_t$.
• In this case it is not possible to write $M_T$ as a direct product of I and J matrices, as the treatments will not be in a systematic order expressible in this form.

Example V.3 A 33 Latin square (continued) Statistical Modelling Chapter V

b) Alternative expectation models
• There are 8 possible different models for the expectation when Rows, Columns and Treatments are considered fixed:
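One way to arrive at 8 models is to include or exclude each of the three terms, with the grand-mean model as the case in which all are excluded. The list below is a reconstruction on that assumption, written with the symbols used elsewhere in the chapter ($\psi$ for the expectation, $X_G\mu$ for the grand-mean term); it is a sketch rather than a copy of the original list.

```latex
\begin{align*}
\psi_G &= X_G\mu \\
\psi_R &= X_R\beta, \qquad \psi_C = X_C\delta, \qquad \psi_T = X_T\tau \\
\psi_{R+C} &= X_R\beta + X_C\delta, \qquad
\psi_{R+T} = X_R\beta + X_T\tau, \qquad
\psi_{C+T} = X_C\delta + X_T\tau \\
\psi_{R+C+T} &= X_R\beta + X_C\delta + X_T\tau
\end{align*}
```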

Marginality relations between the models
• The estimators are all functions of the four mean vectors for this design.

V.C Hypothesis testing using the ANOVA method for a Latin square
• An ANOVA will be used to choose between the 8 alternative expectation models for a Latin square.

a) Analysis of an example
• Example V.2 Pollution effects of petrol additives (continued)

Hypothesis test for the example
Step 1: Set up hypotheses
a) H0: $\tau_A = \tau_B = \tau_C = \tau_D$ (or $X_A\tau$ not required in model)
   H1: not all population Additives means are equal
b) H0: $\beta_I = \beta_{II} = \beta_{III} = \beta_{IV}$ (or $X_D\beta$ not required in model)
   H1: not all population Drivers means are equal
c) H0: $\delta_1 = \delta_2 = \delta_3 = \delta_4$ (or $X_C\delta$ not required in model)
   H1: not all population Cars means are equal
Set $\alpha = 0.05$.

Hypothesis test for the example (continued)
Step 2: Calculate test statistics
• Note that Drivers#Cars refers to the "interaction between Drivers and Cars":
  • it contrasts with Cars[Drivers] or Drivers[Cars];
  • these are explained in chapter VII;
  • R does not distinguish between them, as all are written Drivers:Cars.

Hypothesis test for the example (continued)
Step 3: Decide between hypotheses
There are differences between drivers but not between cars, and there are differences between the additives. The model that best describes the data would appear to be $\psi_{D+A} = X_D\beta + X_A\tau$, an additive model for Driver and Additive effects.

b) Sums of squares for the analysis of variance
• In this section we will use the generic names of Rows, Columns and Treatments for the factors in a Latin square.
• The estimators of the SSqs for the Latin square ANOVA are the SSqs of the following vectors:
• where
  • the Ds are n-vectors of deviations from Y and
  • the vectors with the e subscripts are n-vectors of effects.

SSqs for the ANOVA (continued)
• From section V.B, Indicator-variable models and estimation for a Latin square,
• It can be shown that all the Ms and Qs are symmetric and idempotent.
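The symmetry and idempotency of the Ms and Qs can be checked numerically. The sketch below assumes the usual definitions $Q_R = M_R - M_G$, $Q_C = M_C - M_G$, $Q_T = M_T - M_G$ and $Q_{Res} = I - M_R - M_C - M_T + 2M_G$, and a hypothetical cyclic 3 × 3 square in standard order; these definitions and all object names are assumptions made for illustration.

```r
nt <- 3
trt <- factor(c("A","B","C", "B","C","A", "C","A","B"))  # hypothetical square
J <- matrix(1, nt, nt)
I <- diag(nt)

# Mean operators for data in standard order (Rows, then Columns)
M_G <- kronecker(J, J) / nt^2
M_R <- kronecker(I, J) / nt
M_C <- kronecker(J, I) / nt
X_T <- model.matrix(~ trt - 1)
M_T <- X_T %*% solve(crossprod(X_T)) %*% t(X_T)

# Assumed definitions of the Q operators
Q_R   <- M_R - M_G
Q_C   <- M_C - M_G
Q_T   <- M_T - M_G
Q_Res <- diag(nt^2) - M_R - M_C - M_T + 2 * M_G

is_sym_idem <- function(A, tol = 1e-10) {
  max(abs(A - t(A))) < tol && max(abs(A %*% A - A)) < tol
}
checks <- sapply(list(M_G = M_G, M_R = M_R, M_C = M_C, M_T = M_T,
                      Q_R = Q_R, Q_C = Q_C, Q_T = Q_T, Q_Res = Q_Res),
                 is_sym_idem)
checks
sum(diag(Q_Res))  # trace = residual df, (t - 1)(t - 2)
```

Because each operator is symmetric and idempotent, the sum of squares for a source is simply the quadratic form of Y in the corresponding Q, and the trace of the Q gives its degrees of freedom.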

The ANOVA table is constructed as follows:
• See the notes for an example of the computation of the vectors and the geometrical interpretation.

c) Expected mean squares
• To justify our choice of test statistics, we want to work out the E[MSq]s in the ANOVA table under the 8 alternative expectation models.
• However, to save space, we work out the E[MSq]s under the maximal model and identify which terms in the E[MSq]s go to zero under the alternative models.

E[MSq]s with fixed Rows and Columns effects
• Given the expressions in the above table, the population means of the mean squares could be computed if we knew the $\beta_i$s, $\delta_j$s, $\tau_k$s and $\sigma^2$.
• Each of $q_R(\psi)$, $q_C(\psi)$ and $q_T(\psi)$ equals 0 when the terms $X_R\beta$, $X_C\delta$ and $X_T\tau$, respectively, are removed from the model.
• Hence a significant F value for a line indicates that the corresponding term should be included in the model.
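For reference, the standard fixed-effects expressions are sketched below. They assume $Q_R = M_R - M_G$ (and similarly for Columns and Treatments) and the maximal model $\psi = X_R\beta + X_C\delta + X_T\tau$; they are a reconstruction under those assumptions and may differ in notation from the original table.

```latex
\begin{align*}
E[\mathrm{MSq}_{\mathrm{Rows}}] &= \sigma^2 + q_R(\psi), &
q_R(\psi) &= \frac{\psi' Q_R \psi}{t-1} = \frac{t\sum_{i=1}^{t}(\beta_i - \bar\beta_{\cdot})^2}{t-1} \\
E[\mathrm{MSq}_{\mathrm{Columns}}] &= \sigma^2 + q_C(\psi), &
q_C(\psi) &= \frac{\psi' Q_C \psi}{t-1} = \frac{t\sum_{j=1}^{t}(\delta_j - \bar\delta_{\cdot})^2}{t-1} \\
E[\mathrm{MSq}_{\mathrm{Treatments}}] &= \sigma^2 + q_T(\psi), &
q_T(\psi) &= \frac{\psi' Q_T \psi}{t-1} = \frac{t\sum_{k=1}^{t}(\tau_k - \bar\tau_{\cdot})^2}{t-1} \\
E[\mathrm{MSq}_{\mathrm{Residual}}] &= \sigma^2
\end{align*}
```

Each $q$ function is zero exactly when the corresponding parameters are all equal, which is why the ratio of a line's mean square to the Residual mean square tests whether that term is needed.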

Alternative analysis
• Both Rows and Columns are random.
• The model in this case would be that $\psi_T = E[Y] = X_T\tau$ and $V = \sigma^2 I_{t^2} + \sigma_R^2 (I_t \otimes J_t) + \sigma_C^2 (J_t \otimes I_t)$.
• It allows for equal covariance between units from the same row and also between units from the same column.

Example V.3 A 33 Latin square (continued) Shows: Statistical Modelling Chapter V

E[MSq]s under alternative model
• Alternative variance models involve setting the Rows variance component $\sigma_R^2 = 0$ and/or the Columns variance component $\sigma_C^2 = 0$, and this will result in the one(s) set to zero being dropped from the expected mean square.
• The alternative expectation model is $\psi_G = X_G\mu$, and under this model $q_T(\psi) = 0$.
• This exactly parallels what happens when both are fixed.

d) Summary of the hypothesis test
• See notes
e) Comparison with traditional Latin-square ANOVA table
• Differences are symbolic; see notes for details

f) Computation of ANOVA and diagnostic checking in R
• Diagnostic checking is the same as for the RCBD.
Example V.2 Pollution effects of petrol additives (continued)
• First set up and attach the data.frame and do initial boxplots.
• Then use the aov function, either with or without the Error term as part of the model.
• In this experiment the uncontrolled variation is made up of Drivers, Cars and Drivers:Cars.
• The R shorthand for this is Drivers*Cars, which expands to Drivers + Cars + Drivers:Cars, the last term being equivalent to Drivers#Cars.
• Outputs for the analysis with Error and the diagnostic checking are given below.

R output
> load("LSPolut.dat.rda")
> attach(LSPolut.dat)
> boxplot(split(Reduct.NO, Drivers), xlab="Drivers", ylab="Reduction in NO")
> boxplot(split(Reduct.NO, Cars), xlab="Cars", ylab="Reduction in NO")
> boxplot(split(Reduct.NO, Additives), xlab="Additives", ylab="Reduction in NO")

Boxplots for initial graphical exploration of the data

R output (continued)
> LSPolut.aov <- aov(Reduct.NO ~ Drivers + Cars + Additives +
+                    Error(Drivers*Cars), LSPolut.dat)
> summary(LSPolut.aov)
Error: Drivers
        Df Sum Sq Mean Sq
Drivers  3    216      72
Error: Cars
     Df Sum Sq Mean Sq
Cars  3     24       8
Error: Drivers:Cars
          Df Sum Sq Mean Sq F value Pr(>F)
Additives  3 40.000  13.333       5 0.0452
Residuals  6 16.000   2.667
> #Compute Drivers and Cars Fs and p-values
> Drivers.F <- 72/2.667
> Drivers.p <- 1-pf(Drivers.F, 3, 6)
> Cars.F <- 8/2.667
> Cars.p <- 1-pf(Cars.F, 3, 6)
> data.frame(Drivers.F,Drivers.p,Cars.F,Cars.p)
  Drivers.F    Drivers.p   Cars.F    Cars.p
1  26.99663 0.0006989578 2.999625 0.1169842

R output (continued)
> #
> # Diagnostic checking
> #
> res <- resid.errors(LSPolut.aov)
> fit <- fitted.errors(LSPolut.aov)
> data.frame(Drivers,Cars,Additives,Reduct.NO,res,fit)
   Drivers Cars Additives Reduct.NO res fit
1        1    1         B        20   1  19
2        1    2         D        20   1  19
3        1    3         C        17  -1  18
4        1    4         A        15  -1  16
5        2    1         A        20  -1  21
6        2    2         B        27  -1  28
7        2    3         D        23   1  22
8        2    4         C        26   1  25
9        3    1         D        20  -1  21
10       3    2         C        25  -1  26
11       3    3         A        21   1  20
12       3    4         B        26   1  25
13       4    1         C        16   1  15
14       4    2         A        16   1  15
15       4    3         B        15  -1  16
16       4    4         D        13  -1  14
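The analysis can also be reproduced without the saved data file or the `Error` term. The sketch below re-enters the 16 observations from the listing above and uses a single-stratum `aov` fit in base R; this is an alternative to the `Error(Drivers*Cars)` call, not the method used in the chapter, and the object names are illustrative.

```r
# Data re-entered from the listing (Drivers, then Cars within Drivers)
dat <- data.frame(
  Drivers   = factor(rep(1:4, each = 4)),
  Cars      = factor(rep(1:4, times = 4)),
  Additives = factor(c("B","D","C","A", "A","B","D","C",
                       "D","C","A","B", "C","A","B","D")),
  Reduct.NO = c(20,20,17,15, 20,27,23,26, 20,25,21,26, 16,16,15,13)
)

# Single-stratum fit: every F is formed against the Residual mean square
fit_aov <- aov(Reduct.NO ~ Drivers + Cars + Additives, data = dat)
tab <- summary(fit_aov)[[1]]
tab

# Fitted values as functions of means: row + column + treatment - 2 * grand
fit_means <- with(dat, ave(Reduct.NO, Drivers) + ave(Reduct.NO, Cars) +
                       ave(Reduct.NO, Additives) - 2 * mean(Reduct.NO))
all.equal(unname(fitted(fit_aov)), fit_means)
```

The sums of squares (216, 24, 40 and 16) agree with the stratified output, and the F values for Drivers and Cars appear directly in the table rather than having to be computed by hand.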

R output (continued)
> plot(fit, res, pch=16)
> qqnorm(res, pch=16)
> qqline(res)
> tukey.1df(LSPolut.aov, LSPolut.dat, error.term = "Drivers:Cars")
$Tukey.SS
[1] 4.54224
$Tukey.F
[1] 1.982167
$Tukey.p
[1] 0.2181923
$Devn.SS
[1] 11.45776
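The one-degree-of-freedom test for nonadditivity can be sketched in base R by adding the squared fitted values from the additive model as an extra covariate. This is a generic formulation, assumed here to correspond to what `tukey.1df` computes; the data are re-entered from the earlier listing and the object names are illustrative.

```r
dat <- data.frame(
  Drivers   = factor(rep(1:4, each = 4)),
  Cars      = factor(rep(1:4, times = 4)),
  Additives = factor(c("B","D","C","A", "A","B","D","C",
                       "D","C","A","B", "C","A","B","D")),
  Reduct.NO = c(20,20,17,15, 20,27,23,26, 20,25,21,26, 16,16,15,13)
)

# Additive model, then the same model augmented with squared fitted values
add_fit <- lm(Reduct.NO ~ Drivers + Cars + Additives, data = dat)
dat$fit_sq <- fitted(add_fit)^2
aug_fit <- lm(Reduct.NO ~ Drivers + Cars + Additives + fit_sq, data = dat)

tukey_SS <- deviance(add_fit) - deviance(aug_fit)  # 1 df for nonadditivity
devn_SS  <- deviance(aug_fit)                      # remaining residual SSq
devn_df  <- df.residual(aug_fit)
tukey_F  <- tukey_SS / (devn_SS / devn_df)
tukey_p  <- pf(tukey_F, 1, devn_df, lower.tail = FALSE)
c(Tukey.SS = tukey_SS, Tukey.F = tukey_F, Tukey.p = tukey_p, Devn.SS = devn_SS)
```

The Residual sum of squares of 16 is split into a nonadditivity component on 1 df and a deviations component on 5 df, which is the partition shown in the `tukey.1df` output.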

Hypothesis test for the example
Step 1: Set up hypotheses (as before)
a) H0: $\tau_A = \tau_B = \tau_C = \tau_D$ (or $X_A\tau$ not required in model)
   H1: not all population Additives means are equal
b) H0: $\beta_I = \beta_{II} = \beta_{III} = \beta_{IV}$ (or $X_D\beta$ not required in model)
   H1: not all population Drivers means are equal
c) H0: $\delta_1 = \delta_2 = \delta_3 = \delta_4$ (or $X_C\delta$ not required in model)
   H1: not all population Cars means are equal
Set $\alpha = 0.05$.

Hypothesis test for the example (continued)
Step 2: Calculate test statistics
• Note inclusion of Nonadditivity

Hypothesis test for the example (continued)
Step 3: Decide between hypotheses
As before, there are differences between drivers but not between cars, and there are differences between the additives. The model that best describes the data would appear to be $\psi_{D+A} = X_D\beta + X_A\tau$, the additive model for Driver and Additive effects. The test for transformable nonadditivity is nonsignificant.

Diagnostic checking
• The residuals-versus-fitted-values plot indicates that the residuals are either -1 or 1 (so the data must be artificial).
• The normal probability plot indicates that the residuals are not normally distributed.
• Clearly the example can only be considered illustrative.

V.D Diagnostic checking
• Again, we have assumed $Y \sim N(\psi, \sigma^2 I)$ where, for the maximal model, $\psi_{R+C+T} = E[Y] = X_R\beta + X_C\delta + X_T\tau$.
• For this model to be appropriate requires a similar set of behaviours as for the RCBD:
  • the response is operating additively: a treatment has about the same additive effect on each unit;
  • the variability of the units is the same for all row-column combinations;
  • each observation displays the covariance implied by the model (independence for Rows and Columns fixed; equal correlation within rows (columns) for Rows (Columns) random); and
  • the response of the units is normally distributed.
• As noted before, diagnostic checking is the same as for the RCBD.

V.E Treatment differences
• For the purposes of the scientist, the effects of rows and columns are not of primary interest.
• Rather, the focus is on treatment differences.
• The procedures are the same as for the CRD and RCBD.
Example V.2 Pollution effects of petrol additives (continued)
• As Additives is significant, use Tukey's HSD procedure.

Example V.2 Pollution effects of petrol additives (continued)
> # multiple comparisons
> #
> model.tables(LSPolut.aov, type="means")
Tables of means
Grand mean
20
 Drivers
Drivers
 1  2  3  4
23 24 15 18
 Cars
Cars
 1  2  3  4
19 19 22 20
 Additives
Additives
 A  B  C  D
18 22 21 19
> q <- qtukey(0.95, 4, 6)
> q
[1] 4.895599
• Comparing the differences in the additive means with Tukey's HSD, it is concluded that only the difference between A and B is significant.
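The comparison with Tukey's HSD can be made explicit. The sketch below takes the Additives means, the Residual mean square (16 on 6 df) and the replication of 4 from the output above, and uses the usual formula $w = q\sqrt{\mathrm{MSq}_{Res}/r}$; the object names are illustrative.

```r
means   <- c(A = 18, B = 22, C = 21, D = 19)  # Additives means from model.tables
MSq_res <- 16 / 6                             # Residual mean square on 6 df
r       <- 4                                  # replicates of each additive

q   <- qtukey(0.95, nmeans = 4, df = 6)
hsd <- q * sqrt(MSq_res / r)   # same as (q / sqrt(2)) * sqrt(2 * MSq_res / r)
hsd

diffs <- abs(outer(means, means, "-"))  # all pairwise absolute differences
diffs > hsd
```

The HSD is a little under 4, so the A versus B difference of exactly 4 exceeds it only narrowly, and no other pair does.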

Bar chart of Additives differences
> # Plotting Treat means
> library(lattice)   # provides barchart
> LSPolut.tab <- model.tables(LSPolut.aov, type="means")
> LSPolut.Adds.Mean <- data.frame(Adds.lev = levels(Additives),
+                                 Adds.Mean = as.vector(LSPolut.tab$tables$Additives))
> LSPolut.Adds.Mean <- LSPolut.Adds.Mean[order(LSPolut.Adds.Mean$Adds.Mean),]
> LSPolut.Adds.Mean$Adds.lev <- factor(LSPolut.Adds.Mean$Adds.lev,
+                                      levels=LSPolut.Adds.Mean$Adds.lev)
> barchart(Adds.Mean ~ Adds.lev, xlab="Additives",
+          ylim=c(0,25), ylab="NO Reduction",
+          main="Fitted values for Nitrous Oxide Reduction",
+          data=LSPolut.Adds.Mean)
• Note the use of ylim to include 0 on the y-axis.

V.F Design of sets of Latin squares
• To overcome the small residual df problem, several squares can be used.
• In the case of Example V.2, Pollution effects of petrol additives, the Latin square could be repeated:
  • using the same drivers and cars in each replicate;
  • using the same drivers but new cars (or the same cars but new drivers); or
  • using new cars and drivers.
• In general, one can have as many (r) squares as one likes.
• However, we will only present layouts for 2 squares.
• General expressions for randomizing the various cases are given in Appendix B, Randomized layouts and sample size computations in R.

Case 1: same Drivers and Cars
• This case involves a complete repetition of the experiment, say on consecutive mornings, with the same 4 Drivers and 4 Cars on the two occasions.
• There is no re-randomization of the square for the second occasion; this preserves the crossed relationships between Occasions and the other factors.
• Layout (r=2)

Case 2: same cars, different drivers
• The experiment is repeated on a different occasion with
  • the same 4 cars on both occasions,
  • but with different drivers on the second occasion.
• As a result the rows of the square, but not the columns, are rerandomized on the second occasion.
• Layout (r=2)
• Note that the order in which additives are tested by the second driver on occasion 1 is the same as for the fourth driver on occasion 2.
• That is, the second row of the square on occasion 1 is the same as the fourth row on occasion 2.

Case 3: different drivers and cars
• In this case,
  • not only are the drivers on different occasions unconnected,
  • but so are the cars, as the cars used on the second occasion are completely different from those used on the first occasion.
• As a result the rows and columns of the square are rerandomized on the second occasion.
• Layout (r=2)
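The difference between the three cases lies only in what is rerandomized for the second occasion. The sketch below illustrates this in base R with Drivers as rows and Cars as columns; it uses `sample` rather than the Appendix B expressions, and the starting square, seed and object names are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility of the sketch
base_sq <- matrix(c("A","B","C","D", "D","C","B","A",
                    "B","D","A","C", "C","A","D","B"),
                  nrow = 4, byrow = TRUE)
n_side <- nrow(base_sq)

# Occasion 1: rows (Drivers) and columns (Cars) both randomized
occ1 <- base_sq[sample(n_side), sample(n_side)]

# Occasion 2 under each case
case1 <- occ1                                   # same drivers and cars: no re-randomization
case2 <- occ1[sample(n_side), ]                 # new drivers, same cars: rows only
case3 <- occ1[sample(n_side), sample(n_side)]   # new drivers and cars: rows and columns

list(occasion1 = occ1, case1 = case1, case2 = case2, case3 = case3)
```

In Case 2 every row of the second square is some row of the first, which is the pattern noted above for the second and fourth drivers.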

V.G Hypothesis tests for sets of Latin squares
• The previous section discussed the use of several squares to overcome the residual df problem.
  • e.g. a 4 × 4 Latin square has 6 (< 10) Residual df.
• It gave 3 cases for Example V.2, Pollution effects of petrol additives:
  • using the same drivers and cars in each replicate;
  • using the same drivers but new cars (or the same cars but new drivers); or
  • using new cars and drivers.
• We shall determine the ANOVA for each of these cases.
• In determining the E[MSq]s it will be assumed that
  • unrandomized factors are to be classified as random factors and
  • randomized factors as fixed factors.
• While the layouts were for 2 squares, the DF will be given for the general case of r squares.
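The residual df problem for a single square can be tabulated directly: the Residual df is the total df, $t^2 - 1$, less $t - 1$ for each of Rows, Columns and Treatments. A short base R sketch, with illustrative object names:

```r
t_vals <- 3:9
# Total df minus the df for Rows, Columns and Treatments
res_df <- (t_vals^2 - 1) - 3 * (t_vals - 1)
data.frame(t = t_vals, Residual.df = res_df)
```

This simplifies to $(t-1)(t-2)$, so the 3 × 3 and 4 × 4 squares are the ones that most need replication to reach a reasonable Residual df.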

a) Case 1: same Drivers and Cars
• no re-randomization of the square for the second occasion
• Layout (r=2)

A. Description of pertinent features of the study
• Observational unit: a car with a driver on an occasion
• Response variable: Reduction
• Unrandomized factors: Occasions, Drivers, Cars
• Randomized factors: Additives
• Type of study: Sets of Latin Squares

B. The experimental structure
• For this structure to be appropriate requires that the same square, without re-randomization, be used for each occasion; otherwise, some factors would be nested (as would happen when randomizing within Occasions).
C. Sources derived from the structure formulae
Occasions*Drivers*Cars
  = (Occasions + Drivers + Occasions#Drivers)*Cars
  = Occasions + Drivers + Occasions#Drivers + Cars + Occasions#Cars + Drivers#Cars + Occasions#Drivers#Cars
Additives = Additives
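The expansion of the crossed structure formula can be checked with R's own formula machinery, which writes interactions with `:` where the text uses `#`. A small sketch with an illustrative object name:

```r
# Expand the unrandomized structure formula into its terms
unrand <- attr(terms(~ Occasions * Drivers * Cars), "term.labels")
unrand

# The same sources in the '#' notation used in the text
gsub(":", "#", unrand)
```

R lists the seven sources by order of interaction (main effects, then two-factor, then three-factor terms) rather than in the order of the algebraic expansion, but the set of sources is the same.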
