Chris Brien 1 , Bronwyn Harch 2 , Ray Correll 2 & Rosemary Bailey 3

Principles in the design of multiphase experiments with a later laboratory phase: orthogonal designs
• Chris Brien1, Bronwyn Harch2, Ray Correll2 & Rosemary Bailey3
• 1University of South Australia, 2CSIRO Mathematics, Informatics & Statistics, 3Queen Mary University of London
• http://chris.brien.name/multitier
• Chris.brien@unisa.edu.au

Outline
1) Primary experimental design principles.
2) Factor-allocation description for standard designs.
3) Single-set description.
4) Principles for simple multiphase experiments.
5) Principles leading to complications, even with orthogonality.
6) Summary.

1) Primary experimental design principles
• Principle 1 (Evaluate designs with skeleton ANOVA tables): use them whether or not the data are to be analyzed by ANOVA.
• Principle 2 (Fundamentals): use randomization, replication and blocking (local control).
• Principle 3 (Minimize variance): block entities to form new entities, so that the entities within each new entity are more homogeneous; assign treatments to the least variable entity-type.
• Principle 4 (Split units): confound some treatment sources with more variable sources if some treatment factors:
  (i) require larger units than others,
  (ii) are expected to have a larger effect, or
  (iii) are of less interest than others.

A standard athlete training example (Peeling et al., 2009)
• 9 training conditions (combinations of 3 surfaces and 3 intensities of training) are to be investigated.
• Assume the prime interest is in surface differences:
  • intensities are only included to observe the surfaces over a range of intensities.
• Testing is to be conducted over 4 Months:
  • In each month, 3 endurance athletes are to be recruited.
  • Each athlete will undergo 3 tests, separated by 7 days, under 3 different training conditions.
  • On completion of each test, the heart rate of the athlete will be measured.
• Randomize 3 intensities to 3 athletes in a month and 3 surfaces to 3 tests in an athlete.
• A split-unit design, employing Principles 2, 3 and 4(iii).
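The two randomizations described on this slide can be sketched in a few lines of Python. This is only an illustration: the surface and intensity labels are hypothetical placeholders, and the function name `randomize_split_unit` is not from the source.

```python
import random

# Hypothetical labels; the source only says there are 3 of each.
INTENSITIES = ["intensity 1", "intensity 2", "intensity 3"]
SURFACES = ["surface 1", "surface 2", "surface 3"]

def randomize_split_unit(n_months=4, seed=None):
    """Return {(month, athlete, test): (intensity, surface)}."""
    rng = random.Random(seed)
    layout = {}
    for month in range(1, n_months + 1):
        # Main units: permute the 3 intensities over the 3 athletes in this month.
        intensities = rng.sample(INTENSITIES, k=len(INTENSITIES))
        for athlete, intensity in enumerate(intensities, start=1):
            # Subunits: permute the 3 surfaces over the 3 tests of this athlete.
            surfaces = rng.sample(SURFACES, k=len(SURFACES))
            for test, surface in enumerate(surfaces, start=1):
                layout[(month, athlete, test)] = (intensity, surface)
    return layout

layout = randomize_split_unit(seed=2010)
for unit in sorted(layout)[:3]:
    print(unit, layout[unit])
```

Each athlete receives a single intensity for all of their tests, which is what makes athletes the experimental units for Intensities and tests the experimental units for Surfaces.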

2) Factor-allocation description for standard designs (Nelder, 1965; Brien, 1983; Brien & Bailey, 2006)
• Standard designs involve a single allocation in which a set of treatments is assigned to a set of units:
  • treatments are whatever are allocated;
  • units are what treatments are allocated to;
  • treatments and units are each referred to as a set of objects.
• The allocation is often done by randomization, using a permutation of the units.
• More generally, treatments are allocated to units by other means, e.g. using a spatial design or systematically.
• Each set of objects is indexed by a set of factors:
  • unit or unallocated factors (indexing units);
  • treatment or allocated factors (indexing treatments).
• Represent the allocation using factor-allocation diagrams that have a panel for each set of objects with:
  • a list of the factors;
  • their numbers of levels;
  • their nesting relationships.

Factor-allocation diagram for the standard athlete training experiment
[Diagram: panel "9 training conditions" (3 Intensities, 3 Surfaces) allocated to panel "36 tests" (4 Months, 3 Athletes in M, 3 Tests in M, A)]
• One allocation (randomization):
  • a set of training conditions to a set of tests.
• The set of factors belonging to a set of objects forms a tier:
  • they have the same status in the allocation (randomization);
  • e.g. {Intensities, Surfaces} or {Months, Athletes, Tests}.
• Textbook experiments are two-tiered.
• A crucial feature is that the diagram automatically shows the experimental units (EUs) and the restrictions on randomization/allocation.

Some derived items
[Factor-allocation diagram for the standard athlete training experiment, repeated]
• Sets of generalized factors (terms in the mixed model):
  • Months, Months∧Athletes, Months∧Athletes∧Tests;
  • Intensities, Surfaces, Intensities∧Surfaces.
• Corresponding types of entities (groupings of objects):
  • month, athlete, test (last two are EUs);
  • intensity, surface, training condition (intensity-surface combination).
• Corresponding sources (in an ANOVA):
  • Months, Athletes[M], Tests[M∧A];
  • Intensities, Surfaces, Intensities#Surfaces.
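The generalized factors listed above follow mechanically from the structure of each tier: a chain of nested factors gives one term per level of nesting, and a set of crossed factors gives one term per non-empty subset. A minimal sketch, with hypothetical helper names `nested_terms` and `crossed_terms`, and with ∧ joining the factors in a term:

```python
from itertools import combinations

def nested_terms(factors):
    """Terms for a nested chain, e.g. Months / Athletes / Tests."""
    return ["∧".join(factors[:k]) for k in range(1, len(factors) + 1)]

def crossed_terms(factors):
    """Terms for fully crossed factors, e.g. Intensities * Surfaces."""
    return [
        "∧".join(subset)
        for k in range(1, len(factors) + 1)
        for subset in combinations(factors, k)
    ]

print(nested_terms(["Months", "Athletes", "Tests"]))
print(crossed_terms(["Intensities", "Surfaces"]))
```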

Skeleton ANOVA
[Factor-allocation diagram for the standard athlete training experiment, repeated]
• Intensities is confounded with the more variable Athletes[M], and Surfaces with Tests[M∧A].
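The degrees of freedom in a skeleton ANOVA for this design follow from the numbers of levels alone. The sketch below derives them under the confounding stated above (Intensities with Athletes[M]; Surfaces and Intensities#Surfaces with Tests[M∧A]); the helper `nested_df` and the table layout are illustrative, not the authors' code.

```python
def nested_df(levels):
    """df for a chain of nested factors: [4, 3, 3] -> [3, 8, 24]."""
    dfs, outer = [], 1
    for n in levels:
        dfs.append(outer * (n - 1))  # (n - 1) df within each outer combination
        outer *= n
    return dfs

months_df, athletes_df, tests_df = nested_df([4, 3, 3])
i_df, s_df = 3 - 1, 3 - 1
is_df = i_df * s_df

skeleton = [
    ("Months", months_df, []),
    ("Athletes[M]", athletes_df,
     [("Intensities", i_df), ("Residual", athletes_df - i_df)]),
    ("Tests[M∧A]", tests_df,
     [("Surfaces", s_df), ("Intensities#Surfaces", is_df),
      ("Residual", tests_df - s_df - is_df)]),
]

for source, df, confounded in skeleton:
    print(f"{source:<26}{df:>3}")
    for name, sub_df in confounded:
        print(f"  {name:<24}{sub_df:>3}")
```

The unit sources account for 3 + 8 + 24 = 35 df, one fewer than the 36 tests.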

Mixed model (Brien & Demétrio, 2009)
• Take the generalized factors derived from the factor-allocation diagram and assign each to either the fixed or the random model:
  Intensities + Surfaces + Intensities∧Surfaces | Months + Months∧Athletes + Months∧Athletes∧Tests
• Corresponds to the mixed model:
  $Y = X_I\theta_I + X_S\theta_S + X_{IS}\theta_{IS} + Z_M u_M + Z_{MA} u_{MA} + Z_{MAT} u_{MAT}$,
  where each X and Z is the indicator-variable matrix for the generalized factor in its subscript, and the θs and us are fixed and random parameters, respectively, with [equation missing from the source].
• This is an ANOVA model, equivalent to the randomization model, and is also written:
  $Y = X_I\theta_I + X_S\theta_S + X_{IS}\theta_{IS} + Z_M u_M + Z_{MA} u_{MA} + e$.
• To fit it in SAS, the DDFM option of the MODEL statement must be set to KENWARDROGER.
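To make the indicator-variable matrices concrete, the sketch below builds the Z matrices for the three random terms as plain nested lists, assuming the 36 tests are listed in standard (month, athlete, test) order. The function name `indicator_matrix` is hypothetical. Because Months∧Athletes∧Tests has one level per observation, its matrix is the identity, which is why that term can be written as the residual e.

```python
from itertools import product

# One row per test, in standard order: (month, athlete, test).
units = list(product(range(4), range(3), range(3)))

def indicator_matrix(units, key):
    """0/1 matrix with one column per level of the generalized factor `key`."""
    levels = sorted({key(u) for u in units})
    column = {level: j for j, level in enumerate(levels)}
    return [
        [1 if column[key(u)] == j else 0 for j in range(len(levels))]
        for u in units
    ]

Z_M = indicator_matrix(units, lambda u: u[0])      # Months
Z_MA = indicator_matrix(units, lambda u: u[:2])    # Months∧Athletes
Z_MAT = indicator_matrix(units, lambda u: u)       # Months∧Athletes∧Tests

for name, Z in [("Z_M", Z_M), ("Z_MA", Z_MA), ("Z_MAT", Z_MAT)]:
    print(name, len(Z), "x", len(Z[0]))
```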

3) Single-set description (e.g. Searle, Casella & McCulloch, 1992; Littell et al., 2006)
• Single set of factors that uniquely indexes observations:
  • {Months, Intensities, Surfaces} (Athletes and Tests omitted).
• What are the EUs in the single-set approach?
  • One set of units is indexed by the Months-Intensities combinations and another set by the Months-Intensities-Surfaces combinations.
  • Of course, Months-Intensities(-Surfaces) are not actual EUs, as Intensities (Surfaces) are not randomized to those combinations.
  • They act as a proxy for the unnamed units.
• Mixed model is: I + S + IS | M + MI + MIS.
• Previous model: I + S + IS | M + MA + MAT.
• The former is more economical, as A and T are not needed.
• In SAS, the default DDFM option of MODEL (CONTAIN) works.
• However, MA and MI are different sources of variability:
  • inherent variability versus block-treatment interaction.
• This "trick" is confusing, a false economy and not always possible.
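The sense in which Months-Intensities combinations act as a proxy for athletes can be checked directly: once intensities have been randomized to athletes within months, each Months-Intensities combination labels exactly one athlete. The sketch below uses an arbitrary randomization with integer codes; all names are illustrative.

```python
import random

rng = random.Random(1)

# Arbitrary randomization: intensity_of[(month, athlete)] = intensity code.
intensity_of = {}
for month in range(4):
    for athlete, intensity in enumerate(rng.sample(range(3), k=3)):
        intensity_of[(month, athlete)] = intensity

# Invert it: each (month, intensity) combination points at one athlete.
proxy = {(m, i): (m, a) for (m, a), i in intensity_of.items()}

print(len(proxy), "Months-Intensities combinations label",
      len(set(proxy.values())), "athletes")
```

The one-to-one labelling holds for any randomization of this form, but the label says nothing about why the units differ, which is the point made above about inherent variability versus block-treatment interaction.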

Single-set description ANOVA
• Single set of factors that uniquely indexes observations:
  • {Months, Intensities, Surfaces}.
• Use the factors to derive the skeleton ANOVA.
• Confounding is not exhibited, and E[MSq] is needed to see that M#I is the denominator for Intensities.
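As a sketch of the expected mean squares referred to here, take the single-set model I + S + IS | M + MI + MIS with variance components written, as an assumed notation, $\sigma^2_M$, $\sigma^2_{MI}$ and $\sigma^2_{MIS}$, and with $q(\theta)$ denoting a quadratic form in the fixed parameters. For the balanced design with 4 months, 3 intensities and 3 surfaces:

```latex
\begin{align*}
E[\mathrm{MSq}(\text{Months})]      &= \sigma^2_{MIS} + 3\sigma^2_{MI} + 9\sigma^2_{M} \\
E[\mathrm{MSq}(\text{Intensities})] &= \sigma^2_{MIS} + 3\sigma^2_{MI} + q_{I}(\theta) \\
E[\mathrm{MSq}(\text{M\#I})]        &= \sigma^2_{MIS} + 3\sigma^2_{MI} \\
E[\mathrm{MSq}(\text{Surfaces})]    &= \sigma^2_{MIS} + q_{S}(\theta) \\
E[\mathrm{MSq}(\text{I\#S})]        &= \sigma^2_{MIS} + q_{IS}(\theta) \\
E[\mathrm{MSq}(\text{Residual})]    &= \sigma^2_{MIS}
\end{align*}
```

Intensities and M#I share the same random part, so M#I is the appropriate denominator for testing Intensities, while Surfaces and I#S are tested against the Residual.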

Summary of factor-allocation versus single-set description
• Factor-allocation description is based on the tiers and so is multi-set.
• It has a specific factor for the EUs, so their identity is not obscured.
• Single-set description factors are a subset of those identified in the factor-allocation description and so are more economical.
• Skeleton ANOVA from the factor-allocation description shows:
  • the confounding resulting from the allocation;
  • the origin of the sources of variation more accurately (Athletes[Months] versus Months#Intensities).

4) Principles for simple multiphase experiments
• Suppose in the athlete training experiment:
  • in addition to heart rate taken immediately upon completion of a test,
  • the free haemoglobin is to be measured using blood specimens taken from the athletes after each test, and
  • the specimens are transported to the laboratory for analysis.
• The experiment is two-phase: testing and laboratory phases.
• The outcome of the testing phase is heart rate and a blood specimen.
• The outcome of the laboratory phase is the free haemoglobin.
• How should the specimens from the first phase be processed in the laboratory phase?

Some principles
• Principle 5 (Simplicity desirable): assign first-phase units to laboratory units so that each first-phase source is confounded with a single laboratory source.
  • Use composed randomizations with an orthogonal design.
• Principle 6 (Preplan all): if possible.
• Principle 7 (Allocate all and randomize in laboratory): always allocate all treatment and unit factors and randomize first-phase units and lab treatments.
• Principle 8 (Big with big): confound big first-phase sources with big laboratory sources, provided there is no confounding of treatment with first-phase sources.

A simple two-phase athlete training experiment
[Diagram, composed randomizations: 9 training conditions (3 Intensities, 3 Surfaces) → 36 tests (4 Months, 3 Athletes in M, 3 Tests in M, A) → 36 locations (36 Locations)]
• Simplest is to randomize specimens from a test to locations (in time or space) during the laboratory phase.

A simple two-phase athlete training experiment (cont’d)
• No. tests = no. locations = 36, and so the tests sources exhaust the locations source.
• Cannot separately estimate locations and tests variability, but can estimate their sum.
• But do not want to hold blood specimens for 4 months.

A simple two-phase athlete training experiment (cont’d)
[Diagram, composed randomizations: 9 training conditions (3 Intensities, 3 Surfaces) → 36 tests (4 Months, 3 Athletes in M, 3 Tests in M, A) → 36 locations (4 Batches, 9 Locations in B)]
• Simplest is to align lab-phase and first-phase blocking.
• Note that Months is confounded with Batches (i.e. Big with Big).
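A minimal sketch of the laboratory-phase allocation with aligned blocking: months are randomized to batches, and the 9 tests from a month are randomized to the 9 locations within the corresponding batch. Randomizing months to batches is an assumption made for illustration (in practice the batch order may simply follow the months), and `randomize_lab_phase` is a hypothetical name.

```python
import random
from itertools import product

def randomize_lab_phase(seed=None):
    """Return {(month, athlete, test): (batch, location)}."""
    rng = random.Random(seed)
    months = list(range(1, 5))
    # Big with big: each month's specimens go to a single batch.
    batch_of = dict(zip(months, rng.sample(range(1, 5), k=4)))
    allocation = {}
    for month in months:
        tests = [(month, a, t) for a, t in product(range(1, 4), repeat=2)]
        # Randomize the month's 9 tests to the 9 locations in its batch.
        locations = rng.sample(range(1, 10), k=9)
        for test, location in zip(tests, locations):
            allocation[test] = (batch_of[month], location)
    return allocation

allocation = randomize_lab_phase(seed=2010)
for test in sorted(allocation)[:3]:
    print(test, "->", allocation[test])
```

With this alignment a month's specimens can be analyzed as soon as that month's testing is complete, rather than being held until all 4 months are finished.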

A simple two-phase athlete training experiment – mixed model
[Factor-allocation diagram repeated from the previous slide]
• Form generalized factors and assign to fixed or random:
  • I + S + IS | M + MA + MAT + B + BL.
• ANOVA shows us that (i) there will be aliasing and (ii) the model without lab terms will fit and be sufficient: a “model of convenience”.

The multiphase law
• In this example, the DF for first-phase sources are unaffected.
• DF for sources from a previous phase can never be increased as a result of the laboratory-phase design.
• However, it is possible that first-phase sources are split into two or more sources, each with fewer degrees of freedom than the original source.

Factor-allocation in multiphase experiments
• While multiphase experiments will often involve multiple allocations, this is not always so:
  • a two-phase experiment will not if the first phase involves a survey, i.e. no allocation, e.g. tissues sampled from animals of different sexes.
• A two-phase experiment may include more than two allocations, e.g. a grazing trial in the first phase that involves two composed randomizations.
• Factor-allocation description is particularly helpful in understanding multiphase experiments with multiple allocations.

5) Principles leading to complications, even with orthogonality
• Principle 9 (Use pseudofactors):
  • an elegant way to split sources (as opposed to introducing grouping factors unconnected to real sources of variability).
• Principle 10 (Compensating across phases):
  • sometimes, if something is confounded with a more variable first-phase source, it can be confounded with a less variable lab source.
• Principle 11 (Laboratory replication):
  • replicate laboratory analysis of first-phase units if lab variability is much greater than first-phase variation;
  • often involves splitting product from the first phase into portions (e.g. batches of harvested crop, wines, blood specimens into aliquots, drops, lots, samples and fractions).
• Principle 12 (Laboratory treatments):
  • sometimes treatments are introduced in the laboratory phase and this involves extra randomization.

A replicated two-phase athlete training experiment
[Diagram: 9 training conditions (3 Intensities, 3 Surfaces) → 72 fractions (4 Months, 3 Athletes in M, 3 Tests in M, A, 2 Fractions in M, A, T; pseudofactor 2 F1 in M) → 72 locations (4 Batches, 2 Rounds in B, 9 Locations in B, R)]
• Suppose duplicates of free haemoglobin are to be done:
  • 2 fractions are taken from each specimen;
  • one fraction from each of the 9 specimens for the month is analyzed;
  • then the other fraction from each of the 9 specimens is analyzed.
• Problem: 18 fractions in a month to assign to 2 rounds in a batch:
  • use the pseudofactor F1 to group the 9 fractions to be analyzed in the same round;
  • an alternative is to introduce the grouping factor FGroups, but, while it would be in the analysis, it is not an anticipated source of variability.

A replicated two-phase athlete training experiment (cont’d)
[Factor-allocation diagram repeated from the previous slide]
• Note the split source.

A replicated two-phase athlete training experiment (cont’d)
• The last line of the ANOVA table measures laboratory variation.
• Again, no. fractions = no. locations = 72, and so the fractions sources exhaust the locations sources.
• Consequently, not all terms can be included in a mixed model.
• Pseudofactors are not needed in the mixed model (cf. grouping factors).
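The degrees-of-freedom bookkeeping for the replicated experiment can be sketched from the factor levels. The confounding pattern used below (Months with Batches, the F1 part of Fractions[M∧A∧T] with Rounds[B], everything else with Locations[B∧R]) is a reading of the design described on these slides and should be treated as an assumption; `nested_df` is a hypothetical helper.

```python
def nested_df(levels):
    """df for a chain of nested factors: [4, 2, 9] -> [3, 4, 64]."""
    dfs, outer = [], 1
    for n in levels:
        dfs.append(outer * (n - 1))
        outer *= n
    return dfs

fractions = dict(zip(
    ["Months", "Athletes[M]", "Tests[M∧A]", "Fractions[M∧A∧T]"],
    nested_df([4, 3, 3, 2]),
))
locations = dict(zip(
    ["Batches", "Rounds[B]", "Locations[B∧R]"],
    nested_df([4, 2, 9]),
))

# The pseudofactor F1 (2 levels within each month) splits Fractions[M∧A∧T].
f1_df = 4 * (2 - 1)
rest_df = fractions["Fractions[M∧A∧T]"] - f1_df

print(fractions)
print(locations)
print("Fractions[M∧A∧T] split:", f1_df, "+", rest_df)

# Each laboratory source is exhausted by the first-phase sources assumed
# to be confounded with it.
assert locations["Batches"] == fractions["Months"]
assert locations["Rounds[B]"] == f1_df
assert locations["Locations[B∧R]"] == (
    fractions["Athletes[M]"] + fractions["Tests[M∧A]"] + rest_df
)
```

The total df for a first-phase source is never increased; Fractions[M∧A∧T] is instead split into a part with 4 df and a part with 32 df.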

A replicated two-phase athlete training experiment – mixed model
• Form generalized factors and assign to fixed or random:
  • I + S + IS | M + MA + MAT + MATF + B + BR + BRL.
• Removing aliased terms gives one “model of convenience”:
  • I + S + IS | MA + MAT + B + BR + BRL (M, MATF omitted).
• Another model would result from removing B and BRL (need BR).
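Two random terms are aliased when they group the 72 observations in exactly the same way. The sketch below checks this for one systematic (unrandomized) allocation that is assumed for illustration: month m goes to batch m, fraction f of every specimen goes to round f, and locations within a round follow athlete and test order. Randomizing within this structure would relabel the groups but not change which groupings coincide.

```python
from itertools import combinations, product

rows = []
for m, a, t, f in product(range(4), range(3), range(3), range(2)):
    # Assumed allocation: month -> batch, fraction f -> round f.
    rows.append({"M": m, "A": a, "T": t, "F": f,
                 "B": m, "R": f, "L": 3 * a + t})

def partition(term):
    """Grouping of observations induced by a term such as 'MA' or 'BRL'."""
    groups = {}
    for obs, row in enumerate(rows):
        groups.setdefault(tuple(row[x] for x in term), set()).add(obs)
    return frozenset(frozenset(g) for g in groups.values())

terms = ["M", "MA", "MAT", "MATF", "B", "BR", "BRL"]
aliased = [(s, t) for s, t in combinations(terms, 2)
           if partition(s) == partition(t)]
print(aliased)
```

Under this allocation M is aliased with B and MATF with BRL, so one term of each pair has to be dropped, whereas BR matches no first-phase term and is needed in every version of the model.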

Single-set description for example
[Factor-allocation diagram of the replicated two-phase experiment, repeated]
• How do the locations sources and Fractions affect the response?
• Single set of factors that uniquely indexes observations:
  • {4 Months × 2 Rounds × 3 Intensities × 3 Surfaces}.
• Similar to the model of convenience.

6) Summary
• The factor-allocation description, rather than the single-set description, was used, being more informative and particularly helpful in multiphase experiments.
• Multiphase experiments usually have multiple allocations.
• Have provided 4 standard principles and 8 principles specific to orthogonal, multiphase designs.
• In practice, it will be important to have some idea of the likely sources of laboratory variation.
• Are laboratory treatments to be incorporated?
• Will laboratory replicates be necessary?

References
• Brien, C.J. (1983) Analysis of variance tables based on experimental structure. Biometrics, 39, 53-59.
• Brien, C.J. and Bailey, R.A. (2006) Multiple randomizations (with discussion). J. Roy. Statist. Soc., Ser. B, 68, 571-609.
• Brien, C.J. and Demétrio, C.G.B. (2009) Formulating mixed models for experiments, including longitudinal experiments. J. Agr. Biol. Env. Stat., 14, 253-80.
• Brien, C.J., Harch, B.D., Correll, R.L. and Bailey, R.A. (2010) Multiphase experiments with laboratory phases subsequent to the initial phase. I. Orthogonal designs. Journal of Agricultural, Biological and Environmental Statistics, submitted.
• Littell, R.C., Milliken, G.A., et al. (2006) SAS for Mixed Models. Cary, N.C., SAS Press.
• Nelder, J.A. (1965) The analysis of randomized experiments with orthogonal block structure. Proceedings of the Royal Society of London, Series A, 283(1393), 147-162, 163-178.
• Peeling, P., Dawson, B., et al. (2009) Training Surface and Intensity: Inflammation, Hemolysis, and Hepcidin Expression. Medicine & Science in Sports & Exercise, 41, 1138-1145.
• Searle, S.R., Casella, G., et al. (1992) Variance components. New York, Wiley.

Web address for link to Multitiered experiments site http://chris.brien.name/multitier
