Framing and testing hypotheses


Hypotheses • Potential explanations that can account for our observations of the external world • They usually describe cause and effect relationships

Collecting observations is a means to the understanding of a cause

Observations from • Manipulative experiments • Observational or correlative studies

Hypothesis • Suggested by the data • Existing body of scientific literature • Predictions of theoretical models • Our own intuition and reasoning

A valid scientific hypothesis • Must be testable • Should generate novel predictions • Should provide a unique set of predictions that do not emerge from other explanations

Scientific method • Is the technique used to decide among hypotheses on the basis of observations and predictions

Deduction and induction • Deduction proceeds from the general case to the specific case: “certain inference” • Induction proceeds from the specific case to the general case: “probable inference” • Both induction and deduction are used in all models of scientific reasoning, but they receive different emphasis

Statistics • It is an inductive process: we are trying to draw general conclusions based on a specific, limited sample

[Figure: The inductive method (flowchart)] • Initial observation suggests a hypothesis • The hypothesis generates a prediction • Experiments and data provide new observations • Do new observations match predictions? NO, modify hypothesis; YES, confirm hypothesis • A repeatedly confirmed hypothesis becomes “accepted truth”

Advantages of the inductive method • It emphasizes the link between data and theory • Explicitly builds and modifies the hypothesis based on previous knowledge • It is confirmatory (we seek data that support the hypothesis)

Disadvantages of the inductive method • Considers only a single starting hypothesis • Derives theory exclusively from empirical observations; “some important hypotheses have emerged well in advance of the critical data that are needed to test them” • Places emphasis on a single correct hypothesis, making it difficult to evaluate cases in which multiple factors are at work.

The null hypothesis • Is the starting point of a scientific investigation • It tries to account for patterns in the data in the simplest way possible, which often means initially attributing variation in the data to randomness or measurement error

How do we generate an appropriate null hypothesis? • Example: the photosynthetic response of leaves to increases in light intensity

Each point represents a different leaf for which we record the light intensity (x axis, predictor variable) and the photosynthetic rate (y axis, response variable). • The simplest null hypothesis is that there is no relationship between the two variables

The Michaelis-Menten equation • Y = kX/(D + X) • Notice that if X is large compared to D, X/(D + X) approaches 1. Therefore, Y approaches its maximum value, k (the asymptotic rate of product formation). • When X equals D, X/(D + X) equals 0.5. In this case, the rate of product formation is half of the maximum rate (k/2). • By plotting Y against X, one can estimate Ymax (k) and D.

Using our knowledge about plant physiology, we can formulate a more realistic initial hypothesis • The Michaelis-Menten equation: Y = kX/(D + X), where k = asymptotic assimilation rate and D = half-saturation constant • Real data could be used to test the degree of support for this more realistic hypothesis against other alternatives
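To make the contrast between the two starting hypotheses concrete, here is a minimal sketch that compares the "no relationship" null model (a constant mean rate) with the Michaelis-Menten curve by their residual sums of squares. The light and rate values are invented for illustration, and k = 20 and D = 200 are assumed rather than fitted; a real analysis would estimate both parameters from the data.

```python
def michaelis_menten(x, k, d):
    """Predicted response Y = k*X / (D + X)."""
    return k * x / (d + x)


def residual_ss(observed, predicted):
    """Sum of squared differences between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical leaf measurements (illustrative only, not real data).
light = [50, 100, 200, 400, 800, 1600]      # predictor: light intensity
rate = [4.3, 6.2, 10.5, 12.9, 16.4, 17.3]   # response: photosynthetic rate

# Null model: no relationship, so every leaf is predicted by the overall mean.
mean_rate = sum(rate) / len(rate)
rss_null = residual_ss(rate, [mean_rate] * len(rate))

# Alternative: Michaelis-Menten with assumed (not fitted) parameters.
K_ASSUMED, D_ASSUMED = 20.0, 200.0
rss_mm = residual_ss(rate, [michaelis_menten(x, K_ASSUMED, D_ASSUMED) for x in light])

print(f"RSS, null model:       {rss_null:.2f}")
print(f"RSS, Michaelis-Menten: {rss_mm:.2f}")
```

A smaller residual sum of squares only says that the curve tracks these points more closely; deciding between the models still calls for a formal comparison.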

The Hypothetico-Deductive Method • Championed by the philosopher of science Karl Popper (1902-1994) • The goal of these tests is not to confirm, but to falsify, the hypothesis • The accepted scientific explanation is the hypothesis that successfully withstands repeated attempts to falsify it

[Figure: The Hypothetico-Deductive Method (flowchart)] • Initial observation suggests multiple hypotheses (A, B, C, D), each generating its own prediction (A, B, C, D) • New observations: do they match predictions? NO, falsify hypothesis; YES, repeat attempts to falsify • Multiple failed falsifications lead to “accepted truth”

Advantages of the Hypothetico-Deductive Method • It forces a consideration of multiple working hypotheses right from the start • It highlights the key predictive differences between them • The emphasis on falsification tends to produce simple, testable hypotheses, so that parsimonious explanations are considered first and more complicated mechanisms only later.

Disadvantages of the Hypothetico-Deductive Method • Multiple working hypotheses may not always be available, particularly in the early stages of investigation • Even if multiple hypotheses are available, the method does not really work unless the “correct” hypothesis is among the alternatives • Places emphasis on a single correct hypothesis, making it difficult to evaluate cases in which multiple factors are at work.

Testing Statistical Hypotheses • Statistical hypothesis versus Scientific hypothesis • We use statistics to describe patterns in our data, and then we use statistical tests to decide whether the predictions of a hypothesis are supported or not

The Scientific Method • Establishing hypotheses • Articulating predictions • Designing and executing valid experiments • Collecting data • Organizing data • Summarizing data • Statistical tests

Statistical hypothesis versus Scientific hypothesis • Accepting or rejecting a statistical hypothesis is quite distinct from accepting or rejecting a scientific hypothesis. • The statistical null hypothesis is usually one of “no pattern”, such as no difference between groups or no relationship between two continuous variables.

Statistical hypothesis versus Scientific hypothesis • In contrast, the alternative hypothesis is that pattern exists. • You must ask how such patterns relate to the scientific hypothesis you are testing • The absence of evidence is not evidence of absence; failure to reject a null hypothesis is not equivalent to accepting a null hypothesis

The statistical null hypothesis A typical statistical null hypothesis is that “differences between groups are no greater than we would expect due to random variation”

The statistical alternative hypothesis • Once we state the statistical null hypothesis, we then define one or more alternatives to the null hypothesis • The alternative hypothesis is focused simply on the pattern that is present in the data • The investigator “infers” the mechanism from the pattern, but that inference is a separate step

The statistical test merely reveals whether the pattern is likely or unlikely, given that the null hypothesis is true. • Our ability to assign causal mechanisms to those statistical patterns depends on the quality of our experimental design and our measurements

An important goal of a good experimental design is to avoid confounded designs

Statistical significance and P-values • In many statistical analyses, we ask whether the null hypothesis of random variation among individuals can be rejected • A statistical P-value measures the probability that the observed or more extreme differences would be found if the null hypothesis were true: P(data | H0)
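One way to see what P(data | H0) means is an exact permutation test, sketched below. It is offered as an illustration, not as the test these slides have in mind: under the null hypothesis the group labels are arbitrary, so the P-value is the fraction of all possible relabelings that give a difference in means at least as large as the observed one. The function name and the data are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation P-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_total = len(group_a), len(pooled)
    total_sum = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    as_extreme = 0
    relabelings = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(n_total), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total_sum - sum_a) / (n_total - n_a))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
        relabelings += 1
    return as_extreme / relabelings


# Hypothetical, completely separated groups (illustrative only).
p = permutation_p_value([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(f"P = {p:.4f}")
```

Full enumeration is only practical for small samples; with larger samples, the relabelings are usually sampled at random instead.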

What determines the P-value? • The calculated P-value depends on three things: • The number of observations in the samples (n) • The differences between the means of the samples • The level of variation among individuals
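The three ingredients listed above can be explored with a small sketch. It assumes a two-sample z-test with equal group sizes and a known common standard deviation, which is a simplifying normal approximation (a t-test would be the more usual choice when the standard deviation is estimated from the sample). The function name and the numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n):
    """Two-sided P-value for a difference in means (n observations per group).

    Assumes a known common standard deviation `sd`; normal approximation.
    """
    standard_error = sd * sqrt(2.0 / n)
    z = abs(mean_diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


p_base = two_sample_z_p_value(mean_diff=1.0, sd=2.0, n=10)
p_more_n = two_sample_z_p_value(mean_diff=1.0, sd=2.0, n=40)       # larger samples
p_bigger_diff = two_sample_z_p_value(mean_diff=2.0, sd=2.0, n=10)  # larger difference
p_more_noise = two_sample_z_p_value(mean_diff=1.0, sd=4.0, n=10)   # more variation

for label, value in [("baseline", p_base), ("larger n", p_more_n),
                     ("larger difference", p_bigger_diff),
                     ("more variation", p_more_noise)]:
    print(f"{label:>18}: P = {value:.4f}")
```

Larger samples and larger differences push the P-value down, while more variation among individuals pushes it up.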

When is a P-value small enough? • This is a judgment call, as there is no natural critical value below which we should always reject the null hypothesis and above which we should never reject it. • Convention: P<0.05 (1/20)

When is a P-value small enough? • Perhaps the strongest argument in favor of requiring a low critical value is that we humans are psychologically predisposed to recognizing and seeing patterns in our data, even when they don’t exist!

Decision Errors • Because we have incomplete and imperfect information, there are four possible outcomes when testing H0: • When we correctly reject a false H0 • When we correctly retain a true H0 • When we mistakenly reject a true H0 (Type I Error) • When we mistakenly retain a false H0 (Type II Error)


Type I Error • If we falsely reject a null hypothesis that is true, we have made a false claim that some factor above and beyond random variation is causing patterns in our data. • In environmental impact assessment, this would be a “false positive”. • Its probability is denoted by the Greek letter α (alpha). • This error can only occur when H0 is indeed true. • Generally, this is the most concerning error because it misleads us into believing that our results are significant when they are not. • Also called “producer error”.
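A simulation can illustrate why the Type I error rate equals the chosen α. The sketch below assumes a two-sample z-test with known standard deviation 1 and draws both groups from the same normal distribution, so H0 is true by construction and every rejection is a Type I error. The function name, sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def type_i_error_rate(alpha=0.05, n=20, trials=2000, seed=1):
    """Fraction of tests that reject H0 when H0 is true by construction."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    standard_error = sqrt(2.0 / n)  # SE of the difference in means when sd = 1
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same distribution: no real difference.
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (mean(a) - mean(b)) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a false positive
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f} (nominal alpha = 0.05)")
```

The observed rate should fall close to 0.05, with some simulation noise.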


Type II Error • This error occurs when there are systematic differences between the groups being compared, but the investigator has failed to reject the null hypothesis and has concluded incorrectly that only random variation among observations is present. • In environmental impact assessment, this would be a “false negative”. • Its probability is denoted by the Greek letter β (beta). • This error can only occur when H0 is false. • A Type II error will mislead you into thinking that there is no significant effect, when in actuality there is one. • Depending on the experimental design, this type of error can be just as damaging (e.g. environmental impact surveys, medical diagnosis, etc.). • Also called “consumer error”.


Power • (1-β): equals the probability of correctly rejecting the null hypothesis when it is false • Ideally, we would like to minimize both Type I and Type II errors in our statistical inference. However, strategies designed to reduce Type I error inevitably increase the risk of Type II error, and vice versa.

Power • Although Type I and Type II errors are inversely related to one another, there is no simple mathematical relationship between them, because the probability of a Type II error depends on: • The alternative hypothesis • How large an effect we hope to detect • Sample size • Wisdom of our experimental design and sampling protocol

The relationship between Type I and Type II errors

Estimating Power • ES is effect size we wish to detect, n is sample size, α is the significance level, and σ is the standard deviation between sampling or experimental units • R. Lenth provides free online software to assist in a priori power analysis for various statistical tests: http://www.stat.uiowa.edu/~rlenth/Power/
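As a rough illustration of how ES, n, α, and σ combine, the sketch below computes power for a two-sided, two-sample z-test with n observations per group. This normal-approximation formula is an assumption made for the example and may differ from the expression used on the original slide; the function names and input values are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sample_z(effect_size, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided two-sample z-test, n per group."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size / (sd * sqrt(2.0 / n))  # true difference in SE units
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def smallest_n_for_power(effect_size, sd, target_power=0.8, alpha=0.05, n_max=100000):
    """Smallest per-group n reaching the target power, or None if not found."""
    for n in range(2, n_max + 1):
        if power_two_sample_z(effect_size, sd, n, alpha) >= target_power:
            return n
    return None


# Lowering alpha (fewer Type I errors) raises beta (more Type II errors).
betas = {alpha: 1.0 - power_two_sample_z(1.0, 2.0, 20, alpha)
         for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")

n_needed = smallest_n_for_power(effect_size=1.0, sd=2.0)
print(f"Per-group n for 80% power: {n_needed}")
```

With ES, σ, and n held fixed, tightening α raises β, which is the trade-off between the two error types; increasing n is the usual way to reduce β without relaxing α.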

Parameter estimation and prediction • Rather than try to test multiple hypotheses, it may be more worthwhile to estimate the relative contributions of each to a particular pattern. • In such cases, rather than ask whether a particular cause has some effect versus no effect, we ask what is the best estimate of the parameter that expresses the magnitude of the effect
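The shift from testing to estimation can be sketched as follows: instead of asking only whether a treatment has an effect, we report the estimated size of the effect with an interval around it. The example uses invented data and an approximate normal-based 95% interval; for samples this small, a t-based interval would normally be preferred. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_with_ci(control, treatment, confidence=0.95):
    """Estimated effect (treatment mean minus control mean) with an
    approximate normal-based confidence interval."""
    estimate = mean(treatment) - mean(control)
    standard_error = sqrt(stdev(control) ** 2 / len(control)
                          + stdev(treatment) ** 2 / len(treatment))
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return estimate, estimate - z * standard_error, estimate + z * standard_error


# Hypothetical measurements (illustrative only).
control = [10.1, 9.8, 10.4, 10.0, 9.7]
treatment = [11.2, 10.9, 11.5, 11.1, 10.8]

estimate, ci_low, ci_high = mean_difference_with_ci(control, treatment)
print(f"Estimated effect: {estimate:.2f} (approx. 95% CI {ci_low:.2f} to {ci_high:.2f})")
```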
