Factorial Design of Experiments
Kevin Leyton-Brown
Terminology: variables
• response variable
  • the outcome of an experiment; the thing being measured
• factor
  • a variable upon which the experimenter believes one or more response variables may depend, and which the experimenter can control
  • the design of the experiment largely consists of a policy for determining how to set the factors in each experimental trial
  • the possible values of a factor are called levels
  • in other literature, the word "version" is used for a qualitative (not quantitative) level
• experimental region / factor space
  • all possible levels of the factors that are candidates for inclusion in the experiment
• covariates
  • like factors, except that they are not under the control of the experimenter
  • like factors, they are believed to possibly affect the value of the response
  • unlike factors, they cannot be controlled, though they can be observed
Terminology: experiments
• test run
  • a single factor-level combination for which an observation of the response is recorded
• repeat tests
  • two or more test runs at the same factor-level combination; every effort is made to ensure that the individual test runs take place under identical circumstances
• replications
  • two or more repetitions of an experiment, or a portion of an experiment, under potentially different conditions
• experimental unit
  • a measurement (or a material upon which a measurement is made)
  • it is important that the experimental units be homogeneous, i.e., that they do not exhibit systematic variation
Terminology: Experimental Design
• confounding
  • if two factors are varied simultaneously, the effect of the factors is said to be confounded; likewise, bad experimental design can lead to the effect of one or more factors being confounded with one or more covariates
• blocking
  • if experimental units can be partitioned into groups (blocks) so that units within a block are more homogeneous than units across blocks, the experiment can be designed to minimize the effects of this non-homogeneity; for example, the experiment could be replicated in each block
• experimental design / experimental layout
  • the choice of the factor-level combinations to be examined
  • the number of repeat tests and replications; blocking
  • the assignment of factor-level combinations to experimental units
  • the sequencing of test runs
• effect of a factor on the response
  • the change in the average response under two or more factor-level combinations
Examples
• Suspended-particulate study
  • measure three factors
  • consider all combinations of factor levels (possible because each factor takes on only two or three values)
  • random sequencing of test runs
  • temperature measured as a covariate
• Soil treatment study
  • effect of two factors (fertilizer; plowing) on soybean yield
  • three different fields must be used, so blocking is done
  • thus, each factor-level combination is tested on each field, and the experiment is replicated three times
Common Design Problems
• Masking factor effects
  • if variation in test results is on the same order of magnitude as the factor effects, the latter may go undetected
  • this can be addressed through an appropriate choice of sample size
  • unmeasured covariates (e.g., the effect of time passing as an instrument degrades) can also lead to variation that masks factor effects
• Uncontrolled factors
  • it's pretty obvious that all variables of interest should be included as factors
  • sometimes, though, it can be tricky to choose the appropriate level of model granularity: a "high-level feature" (i.e., some function of the levels of many factors) could be chosen in place of the many factors that influence it
  • this can make it difficult to vary the factor appropriately, and can limit analysis of the experimental results
• One-factor-at-a-time testing
  • it can be tempting to vary each factor independently, holding the others constant; however, this does not explore the factor space very effectively (it would be a terrible local search strategy!)
  • looked at another way, it neglects the possibility that interactions between factors could be important
Experimental Design: Overview
• Consideration of objectives
  • consider the sorts of hypotheses that should be tested by the experiment
  • determine which variables are observable
• Factor effects
  • select factors and covariates
  • guess at whether the factors act independently of each other or not
• Precision and efficiency
  • Precision: the least precise definition of this term that could possibly be given. My best guess: "variability in a statistic of interest"
    • variance in measured response values given identical repeat tests
    • variance in measured factor levels under homogeneous conditions
  • ways of improving precision: blocking, repeat tests, replication, accounting for covariates
    • that is, either increasing sample size or accounting for previously unconsidered sources of variation
  • Efficiency: for efficiency reasons, the authors leave this one out!
• Randomization
  • randomization in the sequence of test runs or in the assignment of factor-level combinations can help protect against systematic bias. Do it.
Factorial Experiments in Completely Randomized Designs
• A (complete) factorial experiment includes all possible factor-level combinations
• One of the most straightforward experimental designs to implement in this setting is a completely randomized design
  • in such a design, factor-level combinations are sequenced / assigned to experimental units uniformly at random (see the sketch below)
• Randomization can prevent all kinds of problems, most of which we probably don't have in a computational setting:
  • "instrument drift, power surges, equipment malfunctions, technician or operator errors, …"
  • note, of course, that randomization doesn't prevent these problems from occurring; it just eliminates bias
• While many of these problems don't affect us in empirical algorithmic experiments, some still can, and in any case randomization should be very easy for us to do
• Amazingly enough, they suggest that we obtain random numbers by consulting a table in the appendix!
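As a concrete illustration, here is a minimal Python sketch of a completely randomized design. The factor names, levels, and repeat count are illustrative assumptions (echoing the notation example later in these slides), not something the slides prescribe:

```python
# Minimal sketch of a completely randomized design: enumerate every
# factor-level combination, repeat each one r times, then randomize
# the run order. Factor names and levels are assumed for illustration.
import itertools
import random

factors = {"A": [1, 2], "B": [1, 2], "C": [1, 2, 3, 4]}
r = 2  # repeat tests per factor-level combination

# One test run per (combination, repeat) pair.
runs = [combo
        for combo in itertools.product(*factors.values())
        for _ in range(r)]
random.shuffle(runs)  # uniformly random sequencing of the test runs

for i, combo in enumerate(runs, start=1):
    settings = ", ".join(f"{name}={level}" for name, level in zip(factors, combo))
    print(f"run {i:2d}: {settings}")
```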
Repeat Tests
• Experiment designs should include repeat test runs
  • that is, runs where all the parameters are exactly the same
  • these can be used to estimate variance due to uncontrolled / random causes
  • while we often do this when studying randomized algorithms, we probably do it very rarely when studying non-randomized algorithms. Is it a good idea?
• Balance: having each factor-level combination repeated an equal number of times is desirable
  • when this is not possible or appropriate, a good approach is to randomly select (without replacement) factor-level combinations to repeat once (see the sketch below)
  • simply repeating some factor-level combinations many times while not repeating others is a bad idea
• For some reason, they state that six repeated tests is a good amount for estimating variance. In practice, I'm sure it depends on the dimensionality of the factor space, the degree of interaction between factors, etc.
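A sketch of the random-repeat idea: when repeating every combination equally is infeasible, sample a subset of combinations without replacement and run each sampled one twice. The budget of six extra runs is an assumption echoing the authors' suggestion:

```python
# Sketch: randomly choose which factor-level combinations get a repeat
# test when full balance is infeasible. The repeat budget is assumed.
import itertools
import random

combos = list(itertools.product([1, 2], [1, 2], [1, 2, 3, 4]))
n_extra = 6  # assumed budget of extra repeat tests

extra = random.sample(combos, n_extra)  # sampled without replacement
design = combos + extra                 # sampled combinations run twice
random.shuffle(design)                  # randomize the run order as before
```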
Interactions between Factors
• Here we investigate situations where joint factor effects are present
• in other words, the value of the response variable does not depend only on an additive function of the separate factor effects, of the form
  $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
• instead, the response variable depends on a function that includes joint (interaction) terms (a toy illustration follows):
  $y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ij}$
• how do we calculate the magnitudes of the interaction effects?
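A toy numeric illustration (all coefficients made up for this sketch): under the additive model, the effect of moving A from level 1 to level 2 is the same at every level of B, while the interaction term makes it depend on B:

```python
# Toy illustration with made-up coefficients: under the additive model
# the change in response from A=1 to A=2 is the same at every level of
# B; with an interaction term it depends on B.
alpha = {1: 0.0, 2: 3.0}            # main effect of A
beta = {1: 0.0, 2: 1.0}             # main effect of B
gamma = {(1, 1): 0.0, (1, 2): 0.0,
         (2, 1): 0.0, (2, 2): 2.0}  # interaction term

def additive(i, j):
    return 10.0 + alpha[i] + beta[j]

def with_interaction(i, j):
    return additive(i, j) + gamma[(i, j)]

for j in (1, 2):
    print(f"B={j}: additive A-effect = {additive(2, j) - additive(1, j)}, "
          f"interaction A-effect = {with_interaction(2, j) - with_interaction(1, j)}")
# prints an A-effect of 3 at both levels of B under the additive model,
# but 3 vs. 5 once the interaction term is included
```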
Calculation of Factor Effects: Notation
• factors are denoted using uppercase Latin letters: A, B, C, …
• an individual response is a lowercase letter, usually subscripted
  • one subscript for each factor, designating the factor-level combination used to generate the response; possibly other subscripts to denote which repeat trial
• example: 3 factors, A, B, C
  • the domain of A is {1, 2}, subscripted by i; the domain of B is {1, 2}, subscripted by j; the domain of C is {1, 2, 3, 4}, subscripted by k
  • r repeat trials were conducted; the repeat trial is denoted by a fourth subscript l, with domain {1, …, r}
  • responses are written as $y_{ijkl}$
• to denote averaging over one or more of the subscripts, we replace each such subscript by a dot. In our example (where $n = 2 \cdot 2 \cdot 4 \cdot r = 16r$), see the sketch below:
  $\bar{y}_{i\cdot\cdot\cdot} = \frac{1}{8r} \sum_{j=1}^{2} \sum_{k=1}^{4} \sum_{l=1}^{r} y_{ijkl}$, and $\bar{y}_{\cdot\cdot\cdot\cdot} = \frac{1}{n} \sum_{i,j,k,l} y_{ijkl}$
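In code, the dot averages are simply means over the corresponding array axes. A small sketch with synthetic placeholder data:

```python
# Dot-notation averages as axis means over an array y[i, j, k, l].
# The response values here are synthetic placeholders.
import numpy as np

r = 2
y = np.random.default_rng(0).normal(size=(2, 2, 4, r))  # y_{ijkl}

ybar_all = y.mean()              # ȳ....  : average over all n = 16r runs
ybar_i = y.mean(axis=(1, 2, 3))  # ȳ_i... : average over j, k, l, one per level of A
ybar_ij = y.mean(axis=(2, 3))    # ȳ_ij.. : average over k and l
```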
Calculation of Factor Effects
• Effects representation:
  • in the case of two-level factors, encode the domain of each factor as {−1, +1}
  • the AB column is then the elementwise product of the A and B columns
• Main effect for factor A (two factors, two levels each): the average response at the high level of A minus the average response at the low level, where $\bar{y}_{ij}$ is the average over repeat tests at level i of A and level j of B:
  $M(A) = \frac{(\bar{y}_{21} + \bar{y}_{22}) - (\bar{y}_{11} + \bar{y}_{12})}{2}$
• Interaction between two factors A and B: half the difference between the effect of A at the high level of B and the effect of A at the low level of B:
  $I(AB) = \frac{(\bar{y}_{22} - \bar{y}_{12}) - (\bar{y}_{21} - \bar{y}_{11})}{2}$
Calculation of Factor Effects
• Based on these definitions, it seems pretty easy to calculate the factor effects by summing appropriately. However, they give us:
• Algorithm for calculating effects of two-level factors (see the sketch below):
  1. construct the effects representation for each main effect and each interaction
  2. calculate linear combinations of the responses, using the signs in the effects column for each main effect and for each interaction
  3. divide the results from (2) by $2^{k-1}$, where k is the total number of factors
• Then, we can compare the calculated effects for each interaction with the standard error. If an effect is much larger (e.g., more than twice as large), the authors seem to treat it as significant.
  • they calculate the standard error as $2 s_e / \sqrt{n}$, where $s_e$ is the estimated standard deviation of the uncontrolled experimental error and n is the total number of test runs (including repeat tests)
• What if we have more than two factor levels? Basically, things get messy. One solution (if the numbers of factor levels are powers of two) is to represent each bit of the factor level using a new, derived two-level factor. Now we need interactions between these simpler factors just to get the main effects of the original factor; however, everything else works as before.
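A sketch of the sign-table algorithm for a 2² design with made-up cell means, so k = 2 and we divide the signed sums by 2^(k−1) = 2. The results agree with the main-effect and interaction formulas on the previous slide:

```python
# Sign-table calculation of main effects and the interaction for a 2^2
# design. Cell means ȳ_ij are made up for illustration; rows are
# ordered (A, B) = (-1,-1), (+1,-1), (-1,+1), (+1,+1).
import numpy as np

ybar = np.array([10.0, 13.0, 11.0, 16.0])

A = np.array([-1, +1, -1, +1])
B = np.array([-1, -1, +1, +1])
AB = A * B  # interaction column: elementwise product of A and B

k = 2  # total number of factors
main_A = (A @ ybar) / 2 ** (k - 1)     # -> 4.0
main_B = (B @ ybar) / 2 ** (k - 1)     # -> 2.0
inter_AB = (AB @ ybar) / 2 ** (k - 1)  # -> 1.0
print(main_A, main_B, inter_AB)
```

As a cross-check, the interaction formula gives the same value: ((16 − 11) − (13 − 10)) / 2 = 1.0.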