Comparing Classical and Bayesian Approaches to Hypothesis Testing

James O. Berger
Institute of Statistics and Decision Sciences
Duke University
www.stat.duke.edu

Outline
• The apparent overuse of hypothesis testing
• When is point null testing needed?
• The misleading nature of P-values
• Bayesian and conditional frequentist testing of plausible hypotheses
• Advantages of Bayesian testing
• Conclusions

I. The apparent overuse of hypothesis testing
• Tests are often performed when they are irrelevant.
• Rejection by an irrelevant test is sometimes viewed as “license” to forget statistics in further analysis.

Prototypical example

Statistical mistakes in the example
• The hypothesis is not plausible; testing serves no purpose.
• The observed usage levels are given without confidence sets.
• The rankings are based only on observed means, and are given without uncertainties. (For instance, perhaps Pr(A > B) is only 0.6.)

II. When is testing of a point null hypothesis needed?
Answer: When the hypothesis is plausible, to some degree.

Examples of hypotheses that are not realistically plausible:
• H0: Small mammals are as abundant on livestock grazing land as on non-grazing land.
• H0: Survival rates of brood mates are independent.
• H0: Bird abundance does not depend on the type of forest habitat they occupy.
• H0: Cottontail choice of habitat does not depend on the season.

Examples of hypotheses that may be plausible, to at least some degree:
• H0: Males and females of a species are the same in terms of characteristic A.
• H0: Proximity to logging roads does not affect ground nest predation.
• H0: Pollutant A does not affect Species B.

III. For plausible hypotheses, P-values are misleading as measures of evidence
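The gap between a P-value and the evidence against H0 can be illustrated with a small sketch. It uses one well-known calibration, the lower bound -e p ln(p) on the Bayes factor of H0 to H1 (valid for p < 1/e), which may or may not be what the original slide displayed. The function names and the equal prior odds are assumptions made for illustration.

```python
import math


def bayes_factor_lower_bound(p):
    """Lower bound -e * p * ln(p) on the Bayes factor of H0 to H1.

    The bound is only valid for 0 < p < 1/e.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def posterior_h0_lower_bound(p, prior_h0=0.5):
    """Lower bound on Pr(H0 | data) implied by the Bayes factor bound.

    prior_h0 = 0.5 (equal prior odds) is an assumption, not from the slides.
    """
    bound = bayes_factor_lower_bound(p)
    posterior_odds = bound * prior_h0 / (1.0 - prior_h0)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(posterior_h0_lower_bound(p), 3))
```

Under these assumptions, p = 0.05 corresponds to a posterior probability of H0 of at least about 0.29, far from the "1 in 20" reading often attached to it.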

IV. Bayesian testing of point hypotheses

The prior distribution

Posterior probability that H0 is true, given the data (from Bayes theorem):
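As a hedged illustration of this Bayes-theorem calculation, the sketch below assumes a setting the slides do not specify: a sample mean xbar of n normal observations with known standard deviation sigma, H0: theta = theta0, and under H1 a normal prior on theta centered at theta0 with standard deviation tau. With prior probability pi0 on H0, Bayes' theorem gives Pr(H0 | x) = pi0 f(x | theta0) / [pi0 f(x | theta0) + (1 - pi0) m1(x)], where m1 is the marginal density of the data under H1. All names and default values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_prob_h0(xbar, n, sigma, theta0=0.0, tau=1.0, prior_h0=0.5):
    """Pr(H0 | xbar) for H0: theta = theta0 versus H1: theta ~ N(theta0, tau^2).

    Assumed model (not from the slides): xbar ~ N(theta, sigma^2 / n).
    """
    se2 = sigma ** 2 / n
    like_h0 = normal_pdf(xbar, theta0, se2)             # f(x | theta0)
    marg_h1 = normal_pdf(xbar, theta0, tau ** 2 + se2)  # m1(x), prior integrated out
    numerator = prior_h0 * like_h0
    return numerator / (numerator + (1.0 - prior_h0) * marg_h1)


if __name__ == "__main__":
    # z = 1.96 (two-sided p of about 0.05) with n = 100, sigma = 1
    print(round(posterior_prob_h0(xbar=0.196, n=100, sigma=1.0), 3))
```

In this assumed setup, data that are "significant at the 0.05 level" leave H0 with a posterior probability of roughly 0.6.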

Conditional frequentist interpretation of the posterior probability of H0

V. Advantages of Bayesian testing
• Pr(H0 | data x) reflects real expected error rates; P-values do not.
• A default formula exists for all situations:

Posterior probabilities allow for incorporation of personal opinion, if desired. Indeed, if the published default posterior probability of H0 is P*, and your prior probability of H0 is P0, then your posterior probability of H0 is
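The conversion from a published default posterior probability to a personal one can be sketched as follows, assuming the default analysis gave H0 and H1 equal prior probability 1/2 (the slides do not state this explicitly). Under that assumption, P* / (1 - P*) is the Bayes factor, and the personal posterior probability is [1 + (1/P0 - 1)(1/P* - 1)]^(-1). The function name is hypothetical.

```python
def personal_posterior_h0(p_star, p0):
    """Convert a default posterior probability of H0 into a personal one.

    Assumes the default posterior p_star was computed with prior Pr(H0) = 1/2,
    so that p_star / (1 - p_star) equals the Bayes factor of H0 to H1.
    """
    if not (0.0 < p_star < 1.0 and 0.0 < p0 < 1.0):
        raise ValueError("p_star and p0 must lie strictly between 0 and 1")
    bayes_factor = p_star / (1.0 - p_star)
    posterior_odds = bayes_factor * p0 / (1.0 - p0)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # A reader who is sceptical of H0 (P0 = 0.2) sees a published P* = 0.3
    print(round(personal_posterior_h0(p_star=0.3, p0=0.2), 3))
```

A reader whose prior probability is 1/2 recovers the published value unchanged, which is a quick sanity check on the conversion.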

• Posterior probabilities are not affected by the reason for stopping experimentation, and hence do not require rigid experimental designs (as do classical testing measures).
• Posterior probabilities can be used for multiple models or hypotheses.

An aside: integrating science and statistics via the Bayesian paradigm
• Any scientific question can be asked (e.g., What is the probability that switching to management plan A will increase species abundance by 20% more than will plan B?).
• Models can be built that simultaneously incorporate known science and statistics.
• If desired, expert opinion can be built into the analysis.

Conclusions
• Hypothesis testing is overutilized while (Bayesian) statistics is underutilized.
• Hypothesis testing is needed only when testing a “plausible” hypothesis (and this may be a rare occurrence in wildlife studies).
• The Bayesian approach to hypothesis testing has considerable advantages in terms of interpretability (actual error rates), general applicability, and flexible experimentation.
