0. Why Conduct an Experiment?

• Infer causation • Ensure repeatability Also, • Determine the relationships between variables • Estimate the significance of each variable

1. Components of Experimentation • Formulate research hypotheses • Derivations from a theory • Deductions from empirical experience • Speculation Research hypotheses are the questions we hope to answer by conducting the experiment

Components of experimentation • Define variables and design • Are the independent variables capable of testing the hypotheses? • Are the independent variables confounded? • For example, assume buffer size and cycle time are independent variables • Condition 1: buffer size = 10, cycle time = 1 min • Condition 2: buffer size = 15, cycle time = 1 min • Condition 3: buffer size = 15, cycle time = 2 min

Components of experimentation • No meaningful inference can be drawn from an experiment that includes only conditions 1 and 3, because buffer size and cycle time are confounded, that is, the researcher varies both simultaneously, so the effect of one cannot be separated from the effect of the other.
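The confounding check in the buffer size and cycle time example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the names `factors`, `full_factorial`, and `changed_factors` are hypothetical, and the second cycle-time level of 2 min is taken from condition 3.

```python
from itertools import product

# Factor levels taken from the three conditions in the example.
factors = {
    "buffer_size": [10, 15],
    "cycle_time_min": [1, 2],
}

def full_factorial(factors):
    """Return every treatment: one level chosen for each factor."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

def changed_factors(a, b):
    """List the factors whose levels differ between two conditions."""
    return [name for name in a if a[name] != b[name]]

cond1 = {"buffer_size": 10, "cycle_time_min": 1}
cond2 = {"buffer_size": 15, "cycle_time_min": 1}
cond3 = {"buffer_size": 15, "cycle_time_min": 2}

print(changed_factors(cond1, cond2))  # only buffer size changes
print(changed_factors(cond1, cond3))  # both factors change: confounded
print(len(full_factorial(factors)))   # number of treatments in the full design
```

A comparison is interpretable when exactly one factor changes between the two conditions; the full factorial design lists every treatment, so each factor can be examined at each level of the other.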

Components of experimentation Design of experiments is a technique for examining and maximizing the information gained from an experiment.

Components of experimentation • Conduct experiment • Collect data • Extract information • Analyze results • Test hypotheses • Report outcomes

Components of experimentation An experiment is conducted usually to test a theory. If the outcome of the experiment is negative, the experiment may be inadequate while the theory may be valid.

Definitions • Factor - an input variable • Level - a possible value of a factor • Treatment - a combination of factors, all at a specified level, as in a simulation run • Parameter - a measure calculated from all observations in a population • Statistic - a measure calculated from all observations in a sample

2. Hypothesis Testing In analyzing data from an experiment, we are interested not only in describing the performance of the subjects selected in the treatment conditions; we also want to make inferences about the behavior of the source population from which our sample of subjects was drawn.

Hypothesis testing • We start by making an assumption about the value of a population parameter. • We can test this assumption in two ways: • census • foolproof, time consuming • random sample • not foolproof, faster than census

Hypothesis testing • In hypothesis testing, we make an assumption (hypothesis) about the value of a population parameter and test it by examining evidence from a random sample taken from the population. • Since we are not testing the entire population, we must be aware of the risk associated with our decision.

Hypothesis testing • We start by formulating two competing hypotheses. • We test the hypothesis that is the opposite of the inference we want to make. • The hypothesis we test is called the null hypothesis (H0); the inference we want to make is called the alternative hypothesis (H1).

Example 1 Yosemite recently acquired the Acme Disintegrating Pistol. However, after repeated attempts with the pistol, he has been unsuccessful at destroying Bugs. Yosemite suspects that the pistol is not delivering its rated output of 10 megatons/shot. He has decided to keep the pistol only if the output is over 10 megatons/shot. He takes a random sample of 100 shots and records the outputs. What null and alternative hypotheses should Yosemite use to make the decision?

Example 1 - One Sided Alternative • Let μ denote the mean output/shot. H0: μ ≤ 10 H1: μ > 10 • Practically H0: μ = 10 H1: μ > 10

Example 2 - Two Sided Alternative Suppose Yosemite bought a used pistol and he suspects that the output is not 10 megatons/shot. What should be the null and alternative hypotheses? H0: μ = 10 H1: μ ≠ 10
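As a rough illustration of how the one-sided and two-sided alternatives lead to different p-values, the sketch below runs a large-sample z test on a sample of shot outputs. It assumes the sample is large enough (such as the 100 shots in Example 1) for the normal approximation to be reasonable; the function name `z_test_mean` and the data are made up for illustration and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_mean(sample, mu0):
    """Large-sample z test of H0: mu = mu0.

    Returns the z statistic, the one-sided p-value for H1: mu > mu0,
    and the two-sided p-value for H1: mu != mu0.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    upper_tail = 1.0 - NormalDist().cdf(z)
    two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, upper_tail, two_sided

# Made-up outputs for 100 shots (illustration only, not data from the example).
shots = [9.6, 10.1, 9.9, 10.4, 9.8] * 20
z, p_one_sided, p_two_sided = z_test_mean(shots, mu0=10.0)
print(round(z, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```

The one-sided p-value counts only outcomes above 10 as evidence against H0, while the two-sided p-value counts departures in either direction.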

Hypothesis Testing-Two Populations • With one population, we are interested in making an inference about a parameter of a population. • With two populations, we are interested in testing hypotheses about parameters of two populations. • We want to compare the difference between the parameters, not their actual values.

Example 3 Han Solo has been disappointed with the performance of his X-Wing fighter lately. He usually finds himself trailing the other fighters on Death Star missions. He suspects the quality of the fuel he is getting from the neighborhood fuel portal on his home planet of Tatooine. He decides to try the fuel portal located on the nearby planet of Betelgeuse. After each fill, Han checks the fighter’s logs for the time it takes to jump to hyperspace and compares it with the logs from the Tatooine fuel. The jump takes an average of 17.01 trilons on Tatooine fuel and 16.9 trilons on Betelgeuse fuel. Can Han attribute this difference to fuel?

Example 3 • Let μ1 denote the mean time taken to jump to hyperspace on Tatooine fuel and μ2 denote the mean time taken to jump to hyperspace on Betelgeuse fuel. H0: μ1 – μ2 ≥ 0 H1: μ1 – μ2 < 0

Hypothesis Testing-Two Populations • Practically H0: μ1 – μ2 = 0 H1: μ1 – μ2 < 0 or H0: μ1 = μ2 H1: μ1 < μ2
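A two-population test works on the difference between the sample means rather than on either mean alone. The sketch below is one possible large-sample version, using a normal approximation with separate variance estimates for the two samples; this is an assumption, since the slides do not name a test statistic. The function `z_test_difference` and the timing data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def z_test_difference(sample1, sample2):
    """Large-sample z test of H0: mu1 - mu2 = 0 against H1: mu1 - mu2 < 0."""
    diff = mean(sample1) - mean(sample2)
    std_err = sqrt(variance(sample1) / len(sample1)
                   + variance(sample2) / len(sample2))
    z = diff / std_err
    p_lower_tail = NormalDist().cdf(z)  # small when mean 1 is well below mean 2
    return diff, z, p_lower_tail

# Made-up timing logs for two fuels (illustration only).
fuel_1_times = [16.8, 17.1, 16.9, 17.0] * 10
fuel_2_times = [17.0, 17.2, 16.9, 17.1] * 10
diff, z, p = z_test_difference(fuel_1_times, fuel_2_times)
print(round(diff, 3), round(z, 2), round(p, 4))
```

Only the difference and its standard error enter the statistic, which matches the point that the comparison concerns the difference between the parameters, not their actual values.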

Hypothesis testing • We formulate hypotheses to assert that the treatments (independent variables) will produce an effect. We would not perform an experiment otherwise. • We formulate two mutually exclusive hypotheses that cover all possible parameter values.

Hypothesis testing • The statistical hypothesis we test is called the null hypothesis (H0). It specifies values of a parameter, often the mean. • If the values obtained from the treatment groups are very different from those specified by the null hypothesis, we reject H0 in favor of the alternative hypothesis (H1).

Hypothesis Testing-Multiple Populations The null hypothesis usually assigns the same value to the treatment means: H0: μ1 = μ2 = μ3 = … H1: not all μs are equal

Hypothesis Testing-Multiple Populations 1 = 2 = 3 1  2  3

Hypothesis Testing-Multiple Populations • The null hypothesis is an exact statement – the treatment means are equal. • The alternative hypothesis is an inexact statement – any two treatment means may be unequal. Nothing is said about the actual differences between the means because we would not need to experiment in that case.

Hypothesis Testing-Multiple Populations • A decision to reject H0 suggests significant differences in the treatment means. • If the treatment means are reasonably close to the ones specified in H0, we do not reject H0. • We usually cannot accept H0; we question the experiment instead.

2.1 Experimental Error • We can attribute a portion of the difference among the treatment means to experimental error. • This error can result from: • sampling • error in entering input data • error in recording output data • inadequate run length

Experimental error • Under the null hypothesis, we have two sources of experimental error: differences within treatment groups and differences between treatment means. • Under the alternative hypothesis, we have genuine differences among treatment means. However, a false null hypothesis does not preclude experimental error.

Experimental error • A false null hypothesis implies that treatment effects are also contributing toward the differences in means.

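2.2 Evaluating H0 • If we form a ratio of the two experimental errors under H0, we have: (differences between treatment means) / (differences within treatment groups) • This can also be thought of as contrasting two experimental errors: (experimental error) / (experimental error)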

Evaluating H0 • Under H1, there is an additional component in the numerator: (treatment effects + experimental error) / (experimental error)

3. ANOVA and the F ratio • To evaluate the null hypothesis, it is necessary to transform the between- and within-group differences into variances. • The statistical analysis involving the comparison of variances is called the analysis of variance.

ANOVA and the F ratio • A variance is a sum of squared deviations divided by its degrees of freedom. • Degrees of freedom is approximately the number of observations with independent information, so a variance is roughly an average of the squared deviations.

3.1 The F Ratio • Under H0, we expect the F ratio to be approximately 1. • Under H1, we expect the F ratio to be greater than 1.

Typical data for 1-way ANOVA

ANOVA table (l = number of treatment levels, N = total number of observations)

Computational formulas
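The slide's formulas did not survive in this text, so the sketch below uses the standard deviation-based definitions of the one-way ANOVA quantities rather than whatever computational shortcuts the original showed. The function name `one_way_anova` and the tiny data set are made up for illustration; `levels` and `n_total` play the roles of l and N.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of treatment groups."""
    levels = len(groups)                         # l: number of treatment levels
    n_total = sum(len(g) for g in groups)        # N: total number of observations
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: spread of the treatment means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = levels - 1, n_total - levels
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between, "df_between": df_between, "MS_between": ms_between,
        "SS_within": ss_within, "df_within": df_within, "MS_within": ms_within,
        "SS_total": ss_between + ss_within, "df_total": n_total - 1,
        "F": ms_between / ms_within,
    }

# Made-up scores for three treatment levels (illustration only).
table = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
for name, value in table.items():
    print(name, value)
```

Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the between-groups mean square to the within-groups mean square.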

3.2 Evaluating the F ratio • Assume we have a population of scores and we draw at random 3 sets of 15 scores each. • Assume the null hypothesis is true, that is, each treatment group is drawn from the same population (μ1 = μ2 = μ3). • Assume we conduct a very large number of such experiments and compute the value of F for each case.

3.3 Sampling Distribution of F • If we group the Fs according to size, we can graph them by the frequency of occurrence. • A frequency distribution of a statistic such as F is called the sampling distribution of the statistic.
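The thought experiment of drawing 3 sets of 15 scores many times under a true null hypothesis can be approximated by simulation. The sketch below assumes a standard normal population and 5000 repetitions, both of which are choices made here rather than details from the slides; the function names are hypothetical. The value 3.23 is the cutoff quoted later in this section.

```python
import random

def f_ratio(groups):
    """Between-groups mean square divided by within-groups mean square."""
    levels = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (levels - 1)) / (ss_within / (n_total - levels))

def simulate_f_under_h0(n_experiments, n_groups=3, n_per_group=15, seed=0):
    """Draw every group from the same population and collect the sorted F values."""
    rng = random.Random(seed)
    fs = []
    for _ in range(n_experiments):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        fs.append(f_ratio(groups))
    return sorted(fs)

fs = simulate_f_under_h0(5000)
cutoff_95 = fs[int(0.95 * len(fs))]                      # empirical 95th percentile
share_above_3_23 = sum(f >= 3.23 for f in fs) / len(fs)  # how rare is F >= 3.23?
print(round(cutoff_95, 2), round(share_above_3_23, 3))
```

With a finite number of repetitions the histogram of `fs` only approximates the F distribution; the approximation tightens as the number of simulated experiments grows.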

Sampling distribution of F • The graph demonstrates that the F distribution is the sampling distribution of F when infinitely many experiments are conducted. • This distribution can be determined for any experiment, that is, any number of groups and any number of subjects in the groups.

Sampling distribution of F • The F distribution allows us to make statements concerning how common or rare an observed F value is. For example, only 5% of the time would we expect an Fobs ≥ 3.23. • This is the probability that an Fobs ≥ 3.23 will occur on the basis of chance factors alone.

Sampling distribution of F • We have considered the sampling distribution of F under H0. However, we conduct experiments expecting to find treatment effects. • If H0 is false, we expect that F > 1. The sampling distribution of F under H1 is called F'.

Sampling distribution of F • We cannot plot the distribution of F' as we can with F, because the distribution of F' depends on the magnitude of the treatment effects as well as the degrees of freedom.

3.4 Testing the Null Hypothesis H0: all means are equal H1: not all means are equal Alternatively, H0: there are no treatment effects H1: there are some treatment effects

Testing the null hypothesis • When we conduct an experiment, we need to decide if the observed F is from the F distribution or the F' distribution. • Since we test the null hypothesis, we focus on the F distribution. • Theoretically, it is possible to obtain any value of F under H0.

Testing the null hypothesis • Thus, we cannot be certain that an observed F is from the F or the F' distribution, that is, we do not know if the sample means are different due to chance. • We can take this attitude and render the experiment useless or we can be willing to make mistakes in rejecting the null hypothesis when it’s true.

Testing the null hypothesis • We select an arbitrary dividing line for any F distribution where values of F falling above the line are unlikely and ones falling below the line are likely. • If the observed F falls above the line, we can conclude that it is incompatible with the null hypothesis (reject H0).

Testing the null hypothesis • If the observed F falls below the line, we can conclude that it is compatible with the null hypothesis (retain H0). • The line conventionally divides the F distribution so that 5% of the area under the curve (cumulative probability) is the region of incompatibility. This probability is called the significance level.

Testing the null hypothesis • We can choose any significance level, as long as it’s done before the experiment. • The formal rule is stated as: Reject H0 when Fobs ≥ Fα(dfnum, dfdenom); otherwise retain H0

Testing the null hypothesis Most software reports the probability of occurrence of Fobs. This relieves us from consulting the F tables (but not from specifying α before the test). The formal rule becomes: If p ≤ α, reject H0; otherwise retain H0
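The p-value rule can be sketched for the three-treatment case without any statistics library, because when the numerator degrees of freedom equal 2 the upper tail of the F distribution has the closed form (1 + 2F/dfdenom)^(-dfdenom/2). This shortcut covers only that special case; for other designs, statistical software would supply p. The function names and the choice of dfdenom = 42 (3 groups of 15 scores) are assumptions made for illustration.

```python
def p_value_three_groups(f_obs, df_denom):
    """P(F >= f_obs) under H0 when the numerator df is 2 (three treatments).

    Uses the closed-form upper tail of the F distribution that holds
    only for numerator df = 2.
    """
    return (1.0 + 2.0 * f_obs / df_denom) ** (-df_denom / 2.0)

def decide(p, alpha=0.05):
    """Apply the formal rule; alpha must be fixed before the experiment."""
    return "reject H0" if p <= alpha else "retain H0"

# Three groups of 15 scores give df_num = 2 and df_denom = 45 - 3 = 42.
for f_obs in (1.0, 3.23, 6.0):
    p = p_value_three_groups(f_obs, df_denom=42)
    print(f_obs, round(p, 4), decide(p))
```

Comparing p with α gives the same decision as comparing Fobs with the tabled critical value, since p falls below α exactly when Fobs exceeds that value.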
