540 likes | 753 Views
A tale of randomization: randomization versus mixed model analysis for single and chain randomizations. Chris Brien Phenomics & Bioinformatics Research Centre, University of South Australia. The Australian Centre for Plant Functional Genomics, University of Adelaide.
E N D
A tale of randomization: randomization versus mixed model analysis for single and chain randomizations Chris Brien Phenomics & Bioinformatics Research Centre, University of South Australia. The Australian Centre for Plant Functional Genomics, University of Adelaide. This work was supported by the Australian Research Council.
A tale of randomization: outline • Era umavezquandoeu era jovem… • Randomization analysis for a single randomization. • Examples for a single randomization. • A BIBD simulation study. • Randomization model for a chain of randomizations. • Randomization analysis for a chain of randomizations. • Some issues. • Conclusions.
1. Era umavez • In the 70s, I believed randomization inference to be the correct method of analysing experiments.
Purism • Kempthorne (1975): • These books demonstrate that p-value from randomization analysis is approximated by p-value from analyses assuming normality for CRDs & RCBDs; • Welch (1937) & Atiqullah(1963) show that true, provided the observed data actually conforms to the variance for the assumed normal model (e.g. homogeneity between blocks).
Sex created difficulties … as did time • Preece (1982, section 6.2): Is Sex a block or a treatment factor? • Semantic problem: what is a block factor? • Often Sex is unrandomized, but is of interest – I believe this to be the root of the dilemma. • If it is unrandomized, it cannot be tested using a randomization test (at all?). • In longitudinal studies, Time is similar. Sites also. • What about incomplete block designs with recombination of information? Missing values? • Seems that not all inference is possible with randomization analysis.
Fisher (1935, Section 21) first proposed randomization tests: • It seems clear that Fisher intended randomization tests to be only a check on normal theory tests.
Fisher (1960, 7thedition) added Section 21.1 that includes: • Less intelligible test nonparametric test. • He is emphasizing that one should model using subject-matter knowledge. • Others (e.g. Kempthorne) have interpreted this differently.
Conversion • I became a modeller, • BUT, I did not completely reject randomization inference. • I have advocated randomization-based mixed models: • a mixed model that starts with the terms that would be in a randomization model (Brien & Bailey, 2006; Brien & Demétrio, 2009). • This allowed me to: • test for block effects and block-treatment interactions; • model longitudinal data. • I comforted myself that when testing a model that has an equivalent randomization test, the former is an approximation to the latter and so robust.
More recently …. • Cox, Hinkelmann and Gilmour pointed out, in the discussion of Brien and Bailey (2006), • no one had so far indicated how a model for a multitiered experiment might be justified by the randomizations employed. • Rosemary Bailey and I have been working for some time on the analysis of experiments with multiple randomizations, • using randomization-based (mixed) models; • Bailey and Brien (2013) details estimation & testing. • I decided to investigate randomization inference for such experiments, • but first single randomizations.
2. Randomization analysis: what is it? • A randomization model is formulated. • It specifies the distribution of the response over all randomized layouts possible for the design. • Estimation and hypothesis testing based on this distribution. • Will focus on hypothesis testing. • A test statistic is identified. • The value of the test statistic is computed from the data for: • all possible randomized layouts, or a random sample (with replacement) of them randomization distribution of the test statistic, or an estimate; • the randomized layout used in the experiment: the observed test statistic. • The p-value is the proportion of all possible values that are as, or more, extreme than the observed test statistic. • Different to a permutation test in that it is based on the randomization employed in the experiment.
Randomization model for a single randomization • Additive model of constants: y=w+ Xht • where y is the m-vector of observed responses for each unit ; • w is the m-vector of constants representing the contributions of each unit to the response; and • t is a t-vector of treatment constants; • Xhis mt design matrix giving the assignment of treatments to units. • Under randomization, i.e. over all allowable unit permutations applied to w, each element of w becomes a random variable, as does each element of y. • Let W and Y be the m-vectors of random variables and so we have Y=W + Xht. • The set of Y forms the multivariate randomization distribution, our randomization model. • Now, we assume ER[W] =0 and so ER[Y] =Xht.
Randomization model (cont’d) • Further, • H is the set of s1 generalized factors (terms) derived from a poset of factors on the units; • zH is the covariance between variables with the same levels of generalized factor H; • yH is the canonical component of excess covariance for H; • hHis the spectral component (eigenvalue) of VR for H and is its contribution to E[MSq]; • BH, SH, and QH are known mm matrices. • This model has the same terms as a randomization-based mixed model (Brien & Bailey, 2006; Brien & Demétrio, 2009) • However, the distributions differ.
Randomization by permutation of units & unit factors • Permutations for an RCBD with b= 2, k=v= 2. • The allowable permutations are: • those that permute the blocks as a whole, and • those that permute the units within a block; • there are b!(k!)b=2!(2!)2=8. • Equivalent to Treatments randomization 1, 2, 2, 1.
Null randomization distribution: RCBD • Under the assumption of no treatment effects, Y*=W + m*1. • In which case, the randomization distribution of Y*is termed the null randomization distribution • Actual distribution obtained by applying each unit permutation to y: Y*ijfor Unit jin Block i. • Note that the null ANOVA is the same for all permutations. • Can show that 1st & 2nd order parameters of the distribution, m*, z*G, z*B and z*BU, are equal to sample statistics. • For example, for all Y*ij: • The distribution of gives the distribution of W.
VR for the RCBD example • The matrices in the expressions for are known. where
Randomization estimation & testing for a single randomization • Propose to use I-MINQUE to estimate the ys and use these estimates to estimate t via EGLS. • I-MINQUE yields the same estimates as REML, but without the need to assume a distributional form for the response.
Test statistics • Have a set R of idempotents specifying a treatment decomposition. • For single treatments factor, only RT = MT – MG for treatment effects. • For an R R, to test H0: RXht=0, use a Wald F, a Wald test statistic divided by its numerator df: • Numerator is a quadratic form: (est)’ (var(est))-1 (est). • For an orthogonal design, FWald is the same as the F from an ANOVA. Otherwise, it is a combined F test statistic. • For nonorthogonal designs, an alternative test statistic is an intrablockF-statistic. • For a single randomization, let QH be the matrix that projects on the eigenspace of V that corresponds to the intrablock source. • Then • The intrablock
Randomization distribution of the test statistic • To obtain it: • Apply, to the unit factors and y, but not the treatment factors, all allowable unit permutations for the design employed: effects a rerandomization of the treatments; • Compute the test statistic for each allowable permutation; • This set of values is the required distribution. • Number of allowable permutations. • For our RCBD, there are 8 permutations and so computing the 8 test statistics is easy. • For b= 10 and k= 3, there are 1.4 x 1035— not so easy. • An alternative is random data permutation (Edgington, 1995): take a Monte Carlo sample of the permutations.
Null distribution of the test statistic under normality • Under normality of the response, the null distribution of FWald is: • for orthogonal designs, an exact F-distribution; • for nonorthogonal designs, an F-distribution asymptotically. • Under normality of the response, the null distribution of an intrablockF-statistic is an exact F-distribution.
3. Examples for a single randomization • Wheat experiment in a BIBD (Joshi, 1987) • Rabbit experiment using the same BIBD (Hinkelmann & Kempthorne, 2008). • Casuarina experiment in a latinized row-column design (Williams et al., 2002).
Wheat experiment in a BIBD (Joshi, 1987) • Six varieties of wheat are assigned to plots arranged in 10 blocks of 3 plots. • The intrablock efficiency factor is 0.80. • The ANOVA with the intrablockF and p: • FWald= 3.05 with p= 0.035 (n1= 5, n2= 19.1). • Estimates: yB= 14.60 (p= 0.403); yBP= 58.28.
Test statistic distributions • 50,000 randomly selected permutations of blocks and plots within blocks selected. IntrablockF-statistic Combined F-statistic • Peak on RHS is all values 10.
Combined F-statistic 50,000 samples from Parametric bootstrap • Part of the discrepancy between F- and the randomization distributions is that combined F-statistic is only asymptotically distributed as an F. • Differs from Kenward & Rogers (1997) & Schaalje et al (2002) for nonorthogonal designs. Randomization distribution
Two other examples • Rabbit experiment using the same BIBD (Hinkelmann & Kempthorne, 2008). • 6 Diets assigned to 10 Litters, each with 3 Rabbits. • Estimates: yL= 21.70 (p= 0.002), yLR= 10.08. • Casuarina experiment in a latinized row-column design (Williams et al., 2002). • 4 Blocks of 60 provenances arranged in 6 rows by 10 columns. • Provenances grouped according to 18 Countries of origin. • 2 Inoculation dates each applied to 2 of the blocks. • Estimates: yC= 0.2710; yB, yBR , yBC < 0.06;yBRC= 0.2711.
ANOVA for Casuarina experiment Provenance represents provenance differences within countries.
Comparison of p-values • For intrablockF, p-values from F and randomization distributions generally agree. • For FWald, p-values from F-distribution generally underestimates that from randomization distribution: • (Rabbit Diets an exception – little interblock contribution).
4. A BIBD simulation study • Evidently, the adequacy of the approximation depends on the size of the Block component. • But on what else does it depend? • Size of the experiment? Residual d.f.? Efficiency? • Took 6 BIBDs that varied in these aspects.`
Skeleton anova for a BIBD • Taking Block and Plots to be random and Treatments to be fixed, gives E[MSq] in the following table.
Data generation • Generated two data sets for each BIBD that differ in their Blocks covariance components (yB). • Wanted data sets with known properties: • Generated a set of normally-distributed, random values, one value for each unit; • The generated values projected into the following 4 subspaces: • Treatments from Block, Blocks Residual, Treatments from Plots[Blocks], and Plots[Blocks]; • The projected effects for each subspace are scaled so that the variance of each equals its E[MSq] and then they are summed. • The scaling ensures that all ANOVA quantities are fixed. • yBP = 1, yB = 0.25 or 2, and sT2 chosen so that intrablockF is 0.05; • However, the I-MINQUE estimates of yB for the same ANOVA values vary for different generated values. • Generated data sets until had one with I-MINQUE estimates close to the target yBvalues.
Analysis of data for b10t6 and yB = 0.25 • Here, the Treats from Plots[Blocks] E[MSq] is: • For the Blocks Residual E[MSq] is: • The I-MINQUE estimates for the chosen data set are:
Properties of the simulated data sets • The treatment effect variance (st2) changed so that p-values are approximately 0.05 — reflects power. • Combined d.f. larger than intrablockd.f. when yB= 0.25, except for b5t5. • This indicates that both inter- and intra-block information used, • Fvalues also larger. • Not the case for yB=2.
p-values from 50,000 randomizations • There is a general tendency for the p-values based on: • the intrablockF to be similar, • the Combined F to be smaller than those for intrablockF, especially for small yB, • the Combined F to differ between Rand and Fdistn , especially for small yB – Fdistn dangerous. • For large yB, can use intrablockF and F-distn, except for a design of: • Medium size with low intrablock efficiency (b15t6): use Combined F and Rand. • For small yB, can use intrablockF and F-distn, except for a design of: • small size (b5t5): use Rand; • a medium design with low intrablock efficiency (b15t6) : use Combined Fand Rand; • large size with large Blocks Residual d.f. (b30t10): use Combined F. S M M L LL S M M L LL
p-values from 50,000 bootstraps • Comparing bootstrap- and randomization-based p-values for one type of F, they are similar, except for small design. • Small design needs further investigation. S M M L LL S M M L LL
Conclusion • Use IntrablockF and F distn, except for • for a design with low intrablockefficiency (b15t6); • when Block variation is small, • for a small design (b5t5), or • a design with large Blocks Residual d.f. (b30t10). • For small design may need randomization analysis. • For other exceptions, can use bootstrap or randomization analysis for combined-F. S M M L LL S M M L LL
2 Squares 3 Rows 4 Columns in Q 2 Halfplots in Q, R, C 6 Judges 2Occasions 3 Intervals in O 4Sittings in O, I 4 Positions in O, I, S, J 4 Trellis 2 Methods 8 treatments 48 halfplots 576 evaluations 5. Randomization model for a chain of randomizations • A chain of two randomizations consists of: • the randomization of treatments to the first set of units; • the randomization of the first set of units, along with treatments, to a second set of units. • For example, a two-phase sensory experiment (Brien & Payne, 1999; Brien & Bailey, 2006, Example 15)involves two randomizations: • Field phase: 8 treatments to 48 halfplots using split-plot with 2 Youden squares for main plots. • Sensory phase: 48 halfplotsrandomized to 576 evaluations, using Latin squares and an extended Youden square. (Q = Squares) • Three sets of objects: treatments (G), halfplots () & evaluations (W).
Randomization model • Additive model of constants: y=z + Xf(w + Xht)=z + Xfw + XfXht • where y is the n-vector of observed responses for each unit w after second phase; • zis the n-vector of constants representing the contributions of each unit in the 2nd randomization (wW) to the response; • w is the m-vector of constants representing the contributions of each unit in the 1st randomization (u) to the response; and • t is a t-vector of treatment constants; • Xf& Xh are nm & mt design matrices showing the randomization assignments. • Under the two randomizations, each element of z and of w become random variables, as does each element of y. Y=Z + XfW + XfXht • where Y, Z and W are the vectors of random variables. • Now, we assume ER[Z] =ER[W] =0and so ER[Y] =XfXht .
Randomization model (cont’d) • Further, • CW& C are the contributions to the variance arising from W and , respectively. • HW & Hare the sets of s2 & s1 generalized factors (terms) derived from the posets of factors on W and ; • are the covariances; • are the canonical components of excess covariance; • are the spectral components (eigenvalues) of CW and C, respectively; • are known nnmatrices.
Forming the null randomization distribution of the response • Under the assumption of no treatment effects, Y*=Z + XfW+ m*1. • There are two randomizations, G to and to W; • to effect G to , and Hare permuted, and • to effect to W, W and HWare permuted. • However, in this model Xf is fixed and reflects the result of the second randomization in the experiment. • Hence, we do not apply the second randomization and consider the null randomization distribution, conditional on the observed randomization of to W. • Apply the permutations of to H, HWand y, to effect a rerandomization of G to . • must also be applied to HW so that it does not effect a rerandomization of to W. • Again the null ANOVA is the same if permutations for just first randomization used, but is not if both randomizations are considered.
6. Randomization analysis for a chain of randomizations • Again, based on the randomization distribution of the response. • Use the same test statistics as for a single randomization: FWald and intrablock F-statistics. • Obtain or estimate the randomization distributions of these test statistics • Based on randomization of G to and is conditional on the observed randomization of toW. • An additional consideration is that it is necessary to constrain spectral components to be nonnegative.
2 Squares 3 Rows 4 Columns in Q 2 Halfplots in Q, R, C 6 Judges 2Occasions 3 Intervals in O 4Sittings in O, I 4 Positions in O, I, S, J 4 Trellis 2 Methods 8 treatments 48 halfplots 576 evaluations A Two-Phase Sensory Experiment (Brien & Bailey, 2006, Example 15) • Involves two randomizations: (Brien & Payne, 1999) (Q = Squares) • The randomization distribution will be based on the randomization of treatments to halfplots and is conditional on the actual randomization of halfplots to evaluations. • Permuting evaluations and y will almost certainly result in unobserved combinations of halfplots and evaluations, so that the randomization model is no longer valid.
ANOVA table for sensory exp't Orthogonalsources Intrablock Trellis
Fit a mixed model • Randomization model: • Trellis * Methods | (Judges * (Occasions / Intervals / Sittings) ) / Positions + (Rows * (Squares / Columns)) / Halfplots • T + M + TM | J + O+ OI + OIS + OJ + OIJ + OISJ + OISJP + R + Q + RQ + QC + RQC + RQCH(Q = Square) • Model of convenience, to achieve a fit • Delete one of O and Q (see decomposition table on previous slide). • Actually dropped both because a small 1 df random term is very difficult to fit.
Checking spectral components • Recall that • It is necessary that each of CW and C are positive semidefinite. • For this, all spectral components, x and , must be nonnegative. • However, fit canonical components, f and y. • Calculate spectral components from canonical components, the relationship between spectral and canonical components being expressions like those for expected mean squares. • If negative, constrain canonical components so that spectral components are zero.
Spectral and canonical components relationships • To constrain RQC to zero, constrain yRQC = -(12/24) yRQCH.
Comparison of p-values • The constrained analysis provides the observed FWald. • Now calculate p-values using the F or randomization distribution. • Need to check spectral components for each rerandomization. • Note the difference in denominator df for Trellis: • Although these are the df for the unconstrained fit, because algorithm failed for the constrained fit. • Not a problem for randomization p-values as they are not needed.
Comparison of distributions Fintra= 13.47 pF= 0.001 pR= 0.004 Fcomb= 15.92 (unconst 25.59) pF= <0.001 pR= 0.004 • Trellis • Trellis F= 5.10 pF= 0.009 pR= 0.005 F= 0.24 pF= 0.627 pR= 0.630 • Trellis#Method • Method
7. Some issues • Size of permutations sample • A controversy: sometimes pooling • Unit-treatment additivity
Size of permutations sample • A study of subsamples of the 50,000 randomly selected permutations revealed that: • the estimates of p-values from samples of 25,000 or more randomized layouts have a range < 0.005. • samples of 5,000 randomized layouts will often be sufficiently accurate – the estimates of p-values • around 0.01 or less, exhibit a range < 0.005; • in excess of 0.20, show a range about 0.03; • around 0.05, display a range of 0.01.
Unit-treatment additivity • Cox and Reid (2000) allow random unit-treatment interaction; • Test hypothesis that treatment effects are greater than unit-treatment interaction. • Nelder(1977) suggests the random form is questionable. • The Iowa school allows arbitrary (fixed) unit-treatment interactions. • Test difference between the average treatment effects over all units, which is biased in the presence of unit-treatment interaction. • Such a test ignores marginality/hierarchy. • Questions: • Which form applies? • How to detect unit-treatment interaction? Often impossible, but, when it is possible, cannot be part of a randomization analysis. • Randomization analysis requires unit-treatment additivity. • If not appropriate, use a randomization-based mixed model.