Design and analysis of experiments with a laboratory phase subsequent to an initial phase. Chris Brien, University of South Australia; Bronwyn Harch and Ray Correll, CSIRO Mathematical and Information Sciences. Chris.brien@unisa.edu.au
Outline • Designing two-phase experiments • A biodiversity example • When first-phase factors do not divide lab factors • Trend adjustment in the biodiversity example • Taking trend into account in design • Duplicates
Notation
• Factor relationships
  A*B: factors A and B are crossed
  A/B: factor B is nested within A
• Generalized factor: AB is the ab-level factor formed from the combinations of A (with a levels) and B (with b levels)
• Symbolic mixed model: fixed terms : random terms, e.g. (A*B : Blocks/Runs)
  A*B = A + B + AB
  A/B/C = A + AB + ABC
• Sources in ANOVA table
  A#B: a source for the interaction of A and B
  B[A]: a source for the effects of B nested within A
1. Designing two-phase experiments • Two-phase experiments as introduced by McIntyre (1955). • Consider the special case in which the second phase is a laboratory phase.
General considerations • The laboratory phase needs to be randomized, so two randomizations are involved: • 1st-phase treatments to 1st-phase, unrandomized factors • the latter to unrandomized, laboratory factors • Often there is a sequence of analyses to be performed, and the question is how to group these over time. • Fundamental difference between the 1st and 2nd randomizations: • 1st has randomized factors that are crossed and nested • 2nd has two sets of factors, and not all combinations of the two sets are observable; factors within each set are crossed or nested; there is a tendency to ignore the 1st-phase, unrandomized factors. • Categories of designs: • Lab-phase factors are purely hierarchical or involve crossed rows and columns • Two-phase randomizations are composed or randomized-inclusive (Brien & Bailey, 2006), depending on whether the 1st-phase, unrandomized factors divide the laboratory unrandomized factors • Treatments added in the laboratory phase or not • Lab duplicates included or not
1a) A Biodiversity example (Harch et al., 1997) • Effect of tillage treatments on bacterial and fungal diversity • Two-phase experiment: field and laboratory phase Field phase: • 2 tillage treatments assigned to plots using RCBD with 4 blocks • 2 soil samples taken at each of 2 depths • 2 × 4 × 2 × 2 = 32 samples
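The field-phase count can be checked by enumerating the units. The following is a minimal sketch, not the authors' randomization: the treatment labels `T1` and `T2`, the seed, and the integer coding of Blocks, Plots, Depths and Samples are all assumptions made for illustration.

```python
import random
from itertools import product

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# RCBD: within each of the 4 blocks, randomize the 2 tillage treatments to the 2 plots.
tillage_of_plot = {}
for block in range(1, 5):
    labels = ["T1", "T2"]  # hypothetical treatment labels
    rng.shuffle(labels)
    for plot, tillage in zip((1, 2), labels):
        tillage_of_plot[(block, plot)] = tillage

# 2 samples at each of 2 depths in every plot.
field_units = [
    {"Block": b, "Plot": p, "Depth": d, "Sample": s,
     "Tillage": tillage_of_plot[(b, p)]}
    for b, p, d, s in product(range(1, 5), (1, 2), (1, 2), (1, 2))
]

print(len(field_units))  # 32
```

Each block receives both tillage treatments, and every plot contributes 2 depths × 2 samples, which gives the 32 samples carried into the laboratory phase.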
Laboratory phase: • Then analysed soil samples in the lab using Gas Chromatography - Fatty Acid Methyl Ester (GC-FAME) analysis • 2 preprocessing methods randomized to 2 samples in each PlotDepth • All samples analysed twice — necessary? • once on days 1 & 2; again on day 3 • In each Int2, 16 samples analyzed
Processing order within Int1Int2 • Logical as similar to order obtained from field • But confounding with systematic laboratory effects: • Preprocessing method effects • Depth effects • Depths assigned to lowest level ─ sensible?
Towards an analysis
[Factor-allocation diagram. Panels: 2 field treats (2 Tillage); 32 samples (4 Blocks; 2 Plots in B; 2 Depths; 2 Samples in B, P, D); 2 lab treats (2 Methods); 64 analyses (2 Int1; 2 Int2 in I1; 2 Int3 in I1, I2; 2 Int4 in I1, I2, I3; 2 Int5 in I1, I2, I3, I4; 2 Int6 in I1, I2, I3, I4, I5). Also listed: 2 B1; 2 B2 in B1.]
• Dashed arrows indicate systematic assignment
• 64 analyses divided up hierarchically by 6 × 2-level factors Int1…Int6 of size 32, …, 2 analyses, respectively.
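The hierarchical division of the 64 analyses by six two-level factors can be sketched by reading the run order in binary. This assumes the analyses are numbered 0 to 63 in run order and that Int1 is the coarsest split (first half versus second half); the function names are illustrative only.

```python
N_FACTORS = 6
N_ANALYSES = 2 ** N_FACTORS  # 64 analyses

def int_levels(run):
    """Levels (1 or 2) of Int1..Int6 for a 0-based run index, coarsest factor first."""
    return tuple(((run >> (N_FACTORS - 1 - j)) & 1) + 1 for j in range(N_FACTORS))

layout = [int_levels(run) for run in range(N_ANALYSES)]

def group_size(k):
    """Number of analyses that share one combination of levels of Int1..Intk."""
    counts = {}
    for levels in layout:
        counts[levels[:k]] = counts.get(levels[:k], 0) + 1
    sizes = set(counts.values())
    assert len(sizes) == 1  # the hierarchy is balanced
    return sizes.pop()

sizes = [group_size(k) for k in range(1, N_FACTORS + 1)]
print(sizes)  # [32, 16, 8, 4, 2, 1]
```

Under this coding each combination of Int1 and Int2 contains 16 analyses, and each further factor halves the set again.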
Analysis of example for lab variability • Variability for: • Int4 > Int5 > Int6 • 8 > 4 > 2 Analyses • Int1, Int2, Int3 small (< Int4)
Alternative blocking for the biodiversity example • Want to assign the 32 samples to 64 analyses • Consider with the experimenter: • Uninteresting effects — Blocks • Large effects — Depth? • Some treatments best changed infrequently — Methods? • Period over which analyses effectively homogeneous — 16 analyses? 4 analyses? 2 analyses?
Alternative blocking for the biodiversity example • For now, divide the 64 analyses into 2 Occasions = Int1, 4 Times = Int2Int3, 8 Analyses = Int4Int5Int6 • Blocks of 8 would be best, as 2 Plots × 2 Depths × 2 Methods, but blocks equal to Times are too variable. • Best if pairs of analyses form a block. • Also, since Times are similar, could take 4 Times × 2 Analyses. • Many other possibilities, e.g. blocks of size 4 with Depths randomized to pairs of blocks.
Proposed laboratory design
[Factor-allocation diagram. Panels: 2 field treats (2 Tillage); 32 samples (4 Blocks; 2 Plots in B; 2 Depths; 2 Samples in B, P, D); 2 lab treats (2 Methods); 64 analyses (2 Occasions; 4 Intervals in O; 8 Analyses in O, I).]
• Organise 64 analyses into blocks of 8:
• Randomization of field units ignores treats
• Two composed randomizations (Brien and Bailey, 2006)
• Field treats to samples to analyses
• Two independent randomizations (Brien and Bailey, 2006)
• Field and lab treats to samples
• Experiment with hierarchical lab phase, composed randomizations, duplicates and treatments added at laboratory phase.
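One way to picture the allocation of the 32 samples to the 64 analyses is the sketch below. It is an illustration under assumptions that the slides do not state explicitly: each Occasion analyses every sample once (so Occasions carry the duplicates), the 4 field Blocks are randomized to the 4 Intervals within an Occasion, and the 8 samples of a Block are randomized to the 8 Analyses within an Interval. The function name and seed are hypothetical.

```python
import random
from itertools import product

def lab_layout(seed=1):
    """Randomize field units to analyses, ignoring the treatments they carry."""
    rng = random.Random(seed)
    runs = []
    for occasion in (1, 2):
        block_order = rng.sample(range(1, 5), 4)  # Blocks -> Intervals
        for interval, block in enumerate(block_order, start=1):
            # The 8 samples of a block: (Plot, Depth, Sample) combinations.
            samples = list(product((1, 2), (1, 2), (1, 2)))
            rng.shuffle(samples)  # samples -> Analyses
            for analysis, (plot, depth, sample) in enumerate(samples, start=1):
                runs.append({
                    "Occasion": occasion, "Interval": interval, "Analysis": analysis,
                    "Block": block, "Plot": plot, "Depth": depth, "Sample": sample,
                })
    return runs

runs = lab_layout()
print(len(runs))  # 64
```

Because the randomization works on the field units only, the Tillage and Methods labels travel with their samples, which is what makes the two randomizations composed.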
Decomposition table for proposed design • Important for design: shows confounding and apportionment of variability • Each of the 15 lines is a separate subspace in the final decomposition • Note Residual df determined by field phase • Randomization-based mixed model (Brien & Bailey, 2006): • Till*Meth*Dep : ((Blk/Plot)*Dep)/Sample – Dep + Occ/Int/Anl • Or Till*Meth*Dep : ((Blk/Till)*Dep)/Meth – Dep + Occ/Int/Anl
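The symbolic model Till*Meth*Dep : ((Blk/Plot)*Dep)/Sample – Dep + Occ/Int/Anl can be expanded mechanically with the crossing and nesting rules from the Notation slide (A*B = A + B + AB, A/B/C = A + AB + ABC). The sketch below is a small illustrative expander, not software used by the authors; a formula is held as a set of terms, each term a set of factor names, and "." is used only to print generalized factors.

```python
def factor(name):
    """A single factor, held as a one-term formula."""
    return {frozenset([name])}

def cross(a, b):
    """A*B = A + B + (every term of A combined with every term of B)."""
    return a | b | {x | y for x in a for y in b}

def nest(a, b):
    """A/B = A + (all factors in A combined with every term of B)."""
    above = frozenset().union(*a)
    return a | {above | y for y in b}

def show(terms):
    """Terms as sorted strings; '.' joins the factors of a generalized factor."""
    return sorted((".".join(sorted(t)) for t in terms),
                  key=lambda s: (s.count("."), s))

till, meth, dep = factor("Till"), factor("Meth"), factor("Dep")
blk, plot, sample = factor("Blk"), factor("Plot"), factor("Sample")
occ, interval, anl = factor("Occ"), factor("Int"), factor("Anl")

fixed = cross(cross(till, meth), dep)
random_terms = ((nest(cross(nest(blk, plot), dep), sample) - dep)
                | nest(nest(occ, interval), anl))

print("fixed :", show(fixed))
print("random:", show(random_terms))
```

The subtraction of Dep removes the Depths main effect from the random part, since it already appears among the fixed terms.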
1b) When first-phase factors do not divide lab factors • Need to use a nonorthogonal design and two randomized-inclusive randomizations (Brien and Bailey, 2006) Willow experiment (Peacock et al., 2003) • Is beetle damage inhibiting rust on willows? • Glasshouse and lab phases • The example here poses the same problem but differs in its details • Will be an experiment with • hierarchical lab phase, randomized-inclusive randomizations, no duplicates and no treatments added at laboratory phase
Willow experiment (cont’d)
[Factor-allocation diagram. Panels: 12 treatments (12 Damages); 60 locations (5 Reps; 2 Benches in R; 6 Locations in R, B).]
• Glasshouse: 60 locations, each with a plant
• 12 Damages to assign to locations.
• Only 6 locations per bench:
• The number of Damages (12) does not divide the number of Locations per Bench (6) or Benches per Rep (2), so an incomplete-block design (IBD) is needed
• Use a resolvable IBD (RIBD) with v = 12, k = 6, E = 0.893, bound = 0.898.
• Randomize between Reps, Benches within Reps and Locations within Benches.
Willow experiment (cont’d)
[Factor-allocation diagram. Panels: 12 treatments (12 Damages); 60 locations (5 Reps; 2 Benches in R; 6 Locations in R, B; also listed: 2 L1; 3 L2); 60 cells (5 Occasions; 4 Plates in O; 3 Cells in O, P).]
• Lab phase: one disk per plant put onto 20 plates, 3 disks per plate
• Plates divided into 5 groups, each processed on one Occasion
• Locations does not divide Cells
• divide the 6 Locations into 2 sets of 3: cannot do this ignoring Damages
• RIBD related to 1st phase (v = 12, k = 3, r = 5, E = 0.698, bound = 0.721)
• In fact, obtained this design using CycDesigN (Whitaker et al., 2002) and combined pairs of blocks to get the 1st-phase design.
• To include Locations, read the numbers as Locations with these Damages.
• Renumber Locations as L1 and L2 to identify those assigned to the same Plate.
• Sometimes get a better design if the lab phase is allowed for when designing the 1st phase
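The block counts in the two phases, and a rough ceiling on the efficiency factors, follow from v, k and r alone. The sketch below uses the classical upper bound v(k − 1) / (k(v − 1)) on the average efficiency factor of an equireplicate binary block design, which is attained only by a balanced incomplete block design. This is a swapped-in, looser bound: the slides do not say how their quoted bounds (0.898 and 0.721) were computed, and those values are slightly smaller. The function name is hypothetical.

```python
def block_design_summary(v, k, r):
    """Units, number of blocks and the classical (BIBD) efficiency bound."""
    units = v * r
    blocks, remainder = divmod(units, k)
    if remainder:
        raise ValueError("v * r must be a multiple of the block size k")
    bound = v * (k - 1) / (k * (v - 1))
    return {"units": units, "blocks": blocks, "bibd_bound": round(bound, 3)}

glasshouse = block_design_summary(v=12, k=6, r=5)  # benches as blocks
laboratory = block_design_summary(v=12, k=3, r=5)  # plates as blocks

print(glasshouse)  # {'units': 60, 'blocks': 10, 'bibd_bound': 0.909}
print(laboratory)  # {'units': 60, 'blocks': 20, 'bibd_bound': 0.727}
```

The 10 blocks of 6 correspond to 5 Reps × 2 Benches, and the 20 blocks of 3 to the 20 plates; both quoted efficiencies (0.893 and 0.698) sit below the corresponding classical bound, as they must.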
Decomposition table for proposed design • Each of the 6 lines is a separate subspace in the final decomposition. • Note Residual df for Locations from 1st phase is 39 and has been reduced to 29 in lab phase. • ξs are strata variances or portions of E[MSq] from cells and ηs from locations. • Four estimable variance functions: ξO + ηR, ξOP + ηRB, ξOP + ηRBL, ξOPC + ηRBL, although the 2nd may be difficult. • Randomization-based mixed model (Brien & Bailey, 2006) that corresponds to estimable quantities: • Damages : Rep/Benches/L1 + OccasionsPlatesCells. • Must have Locations in the form of L1 in this model, i.e. cannot ignore unrandomized factors from the 1st phase.
2. Trend in the biodiversity example • Trend can be a problem in the laboratory phase. Is it here? • Plot of Lab-only residuals in run order for 8 Analyses within Times • A linear trend that varies is evident • Proposed design (4 × 2) is appropriate (trend and low Times variability); smallest Analysis Residual
Trend adjustment for example • REML analysis with vector of 1…8 for each Occasion • Significantly different linear trends (p < 0.001) • Effect on fixed effects • Trend adjustment reduced • Tillage effect from -0.99 to -0.07 • Plot[Block] component from 13.25 to 0.001. • Low Plot[Block] df makes this dubious.
3. Taking trend into account in design • Cox (1958, section 14.2) discusses trend elimination: • concludes that, where estimation of the trend is not required, blocking is preferred to trend adjustment. • Yeh, Bradley and Notz (1985) combine blocking for trend with adjustment and provide trend-free and nearly trend-free designs with blocks that • allow for common quadratic trends within blocks • minimize the effects of adjustment • Look at design of laboratory phase • for field phase with RCBD, b = 3, v = 18 • 3 Occasions in lab phase to which 3 Blocks randomized • allow for different linear & cubic Trends within each Occasion
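Allowing for different trends within each Occasion amounts to adding Occasion-specific polynomial covariates in run position. The sketch below builds orthonormal polynomial covariates over the 18 run positions by Gram-Schmidt and places them in separate columns for each of the 3 Occasions. It is an illustration only: the slides do not say how the trend terms were coded, and taking "linear & cubic" literally as degrees 1 and 3 (no quadratic) is an assumption exposed through the `degrees` argument.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_polynomials(n, max_degree):
    """Orthonormal polynomial covariates of degree 1..max_degree over positions 1..n."""
    centre = (n + 1) / 2
    xs = [i - centre for i in range(1, n + 1)]  # centred run positions
    basis = []
    for degree in range(max_degree + 1):
        col = [x ** degree for x in xs]
        for prev in basis:  # remove the components along lower degrees
            coef = dot(col, prev)
            col = [c - coef * p for c, p in zip(col, prev)]
        norm = dot(col, col) ** 0.5
        basis.append([c / norm for c in col])
    return basis[1:]  # drop the constant column

def occasion_trend_columns(n_occasions, n_analyses, degrees=(1, 3)):
    """One covariate per (Occasion, degree): zero outside that Occasion."""
    polys = orthogonal_polynomials(n_analyses, max(degrees))
    columns = {}
    for occ in range(n_occasions):
        for d in degrees:
            col = [0.0] * (n_occasions * n_analyses)
            col[occ * n_analyses:(occ + 1) * n_analyses] = polys[d - 1]
            columns[(occ + 1, d)] = col
    return columns

columns = occasion_trend_columns(n_occasions=3, n_analyses=18)
print(sorted(columns))  # (Occasion, degree) pairs
```

Fitting a common trend instead would use a single column per degree, repeated across the Occasions.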
Different designs for blocks of 18 analyses • RCBD for this number of treatments is relatively efficient when adjusting for trend • Blocks assigned to 3 Occasions × 6 Analyses (blocking perpendicular to trend?) • Use when Occasions variability is low, e.g. recalibration • Nearly Trend-Free (using Yeh, Bradley and Notz, 1985) is worse than RCBD for different trends: • optimal for common linear trend. • Still to investigate designs that protect against different trends.
Comparing RCBD with RIBDs for k = 6, 9 • Use Relative Efficiencies • = ratio of av. pairwise variance of RCBD to that of RIBD for sets of generated data • Generate using random model: • Y = Occasion + Interval[Occasion] + Analyses[OccasionsInterval] + Plots[Blocks] • Expect efficiency • k = 6 > k = 9 • RIBD > RCBD • provided • γBP not dominant and • γOI is non-zero. • How much? • γBP < 10 • γOI ≥ 0.5 (very little extra required, but after trend adjustment)
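Generating one data set from the random model Y = Occasion + Interval[Occasion] + Analyses[OccasionsInterval] + Plots[Blocks] could look like the sketch below. Several things here are assumptions rather than the authors' procedure: effects are drawn as independent normal deviates, the variance parameters are passed in a dictionary with the hypothetical keys `O`, `OI`, `OIA` and `BP` (with `OIA` set to 1 as the reference), Block b is analysed on Occasion b, and plots are placed in run order by a plain random permutation rather than by a particular design. The relative-efficiency calculation itself is not shown.

```python
import random

def simulate(k, gamma, seed=0, n_occasions=3, n_per_occasion=18):
    """One simulated data set; k is the Interval (block) size within an Occasion."""
    if n_per_occasion % k:
        raise ValueError("k must divide the number of analyses per Occasion")
    rng = random.Random(seed)
    sd = {name: g ** 0.5 for name, g in gamma.items()}
    data = []
    for occ in range(1, n_occasions + 1):
        occ_eff = rng.gauss(0, sd["O"])
        int_eff = [rng.gauss(0, sd["OI"]) for _ in range(n_per_occasion // k)]
        plots = rng.sample(range(1, n_per_occasion + 1), n_per_occasion)
        for pos in range(n_per_occasion):
            interval = pos // k
            y = (occ_eff + int_eff[interval]
                 + rng.gauss(0, sd["OIA"])   # Analyses within Occasion and Interval
                 + rng.gauss(0, sd["BP"]))   # Plots within Blocks
            data.append({"Occasion": occ, "Interval": interval + 1,
                         "Analysis": pos % k + 1, "Block": occ,
                         "Plot": plots[pos], "y": y})
    return data

# Hypothetical variance parameters, chosen only to exercise the function.
data = simulate(k=6, gamma={"O": 1.0, "OI": 0.5, "OIA": 1.0, "BP": 2.0})
print(len(data))  # 54
```

Repeating the call with different seeds gives the "sets of generated data" over which average pairwise variances would be compared.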
Resolvable design with cols & latinized rows [using CycDesigN (Whitaker et al., 2002), Intra E = 0.49] • Expect LRCD > RIBD if γIA ≠ 0 and γBP not dominant; • How much? • If γIA > 1 irrespective of γOI. Again only small γs. • Expect LRCD > RCD if Occasions different. • REs as γBP • (LRCD/RCD < 2 if γBP ≥ 2.5).
4. Duplicates • Commonly used, but only needed in two-phase experiments if Lab variation is large compared to field variation. • Possibilities: • Separated: analyze all & then reanalyze all in different random order • Nested: some analyzed & then these reanalyzed in a different random order • Crossed: some analyzed & then these reanalyzed in same order • Consecutive: duplicate immediately follows first analysis • Randomized: some analyzed & everything randomized • From ANOVAs and REs relative to randomized, when adjusting for different cubic trends, conclude • Separated duplicates superior, with nested duplicates 2nd best; little gain in efficiency if γOI ≤ 0.5 and γBP is considerable; • Crossed and consecutive duplicates perform poorly, often with RE < 1
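The five duplicate arrangements can be illustrated as run orders. The sketch below is one reading of the slide, with these assumptions: "some" means a group of `group_size` samples analysed together, "Randomized" randomizes the first analyses and duplicates jointly within each group, and a "different random order" is redrawn until it differs from the first. The function names and the small example (8 samples in groups of 4) are hypothetical.

```python
import random

def reshuffled(seq, rng):
    """Random permutation of seq that differs from seq (when len(seq) > 1)."""
    out = list(seq)
    while len(out) > 1 and out == list(seq):
        rng.shuffle(out)
    return out

def duplicate_orders(samples, group_size, seed=0):
    """Run orders in which every sample is analysed exactly twice."""
    rng = random.Random(seed)
    samples = list(samples)
    if len(samples) % group_size:
        raise ValueError("group_size must divide the number of samples")
    first = rng.sample(samples, len(samples))  # random order of first analyses
    groups = [first[i:i + group_size] for i in range(0, len(first), group_size)]
    return {
        "separated": first + reshuffled(first, rng),
        "nested": [s for g in groups for s in g + reshuffled(g, rng)],
        "crossed": [s for g in groups for s in g + g],
        "consecutive": [s for s in first for _ in range(2)],
        "randomized": [s for g in groups for s in rng.sample(g + g, 2 * len(g))],
    }

orders = duplicate_orders(range(1, 9), group_size=4, seed=3)
for name, order in orders.items():
    print(f"{name:12s}", order)
```

Printing the orders side by side makes the differences in how far apart the two analyses of a sample fall easy to see, which is what matters when trends are present.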
5. Summary for lab phase design • Two-phase: initial expt & lab phase • Leads to 2 randomizations: composed or r-inclusive, depending on whether the 1st-phase, unrandomized factors divide the laboratory, unrandomized factors • Use of pseudofactors with r-inclusive does not ignore field terms and makes explicit what has occurred • Adding treatments in lab phase leads to more randomizations • Cannot improve on field design but can make it worse • Important to have some idea of likely laboratory variation: • Will there be recalibration or the like? • Are consistent differences between and/or across Occasions likely? • How does the magnitude of the field and laboratory variation compare? • Are trends probable: common vs different; linear vs cubic? • Will laboratory duplicates be necessary and how will they be arranged? • If yes, separated duplicates are best but other arrangements may be OK. • RCBD will suffice if • field variation >> lab variation, in which case duplicates are unnecessary. • after adjustment for trend, there is no extra laboratory variation, except Occasions • can block across Occasions when there are no Occasion differences • If there are Intervals differences, RIBD is better than RCBD; the differences need not be large. • LRCD is better than RIBD provided that, after trend adjustment, there are moderate consistent differences between Analyses across Occasions.
References
• Brien, C.J. and Bailey, R.A. (2006) Multiple randomizations (with discussion). J. Roy. Statist. Soc., Ser. B, 68, 571–609.
• Cox, D.R. (1958) Planning of Experiments. New York, Wiley.
• John, J.A. and Williams, E.R. (1995) Cyclic and Computer Generated Designs. Chapman & Hall, London.
• Harch, B.E., Correll, R.L., Meech, W., Kirkby, C.A. and Pankhurst, C.E. (1997) Using the Gini coefficient with BIOLOG substrate utilisation data to provide an alternative quantitative measure for comparing bacterial soil communities. Journal of Microbiological Methods, 30, 91–101.
• McIntyre, G. (1955) Design and analysis of two phase experiments. Biometrics, 11, 324–34.
• Peacock, L., Hunter, P., Yap, M. and Arnold, G. (2003) Indirect interactions between rust (Melampsora epitea) and leaf beetle (Phratora vulgatissima) damage on Salix. Phytoparasitica, 31, 226–35.
• Whitaker, D., Williams, E.R. and John, J.A. (2002) CycDesigN: A Package for the Computer Generation of Experimental Designs. (Version 2.0) CSIRO, Canberra, Australia. http://www.ffp.csiro.au/software
• Yeh, C.-M., Bradley, R.A. and Notz, W.I. (1985) Nearly Trend-Free Block Designs. J. Amer. Statist. Assoc., 80(392), 985–92.