BIAS AND CONFOUNDING

BIAS AND CONFOUNDING Nigel Paneth

HYPOTHESIS FORMULATION AND ERRORS IN RESEARCH All analytic studies must begin with a clearly formulated hypothesis. The hypothesis must be quantitative and specific. It must predict a relationship of a specific size.

• For example: “Babies who are breast-fed have less illness than babies who are bottle-fed.” Which illnesses? How is feeding type defined? How large a difference in risk?
• A better example: “Babies who are exclusively breast-fed for three months or more will have a reduction in the incidence of hospital admissions for gastroenteritis of at least 30% over the first year of life.”

Only a specific prediction allows one to draw legitimate conclusions from a study which tests a hypothesis. But even with the best formulated hypothesis, two types of errors can occur.
• Type 1 - observing a difference when in truth there is none.
• Type 2 - failing to observe a difference when there is one.

These errors are generally produced by one or more of the following:
• RANDOM ERROR
• RANDOM MISCLASSIFICATION
• BIAS
• CONFOUNDING

RANDOM ERROR Deviation of results and inferences from the truth, occurring only as a result of the operation of chance. Can produce type 1 or type 2 errors.

RANDOM (OR NON-DIFFERENTIAL) MISCLASSIFICATION Random error applied to the measurement of an exposure or outcome. Errors in classification can only produce type 2 errors, except if applied to a confounder or to an exposure gradient.

BIAS Systematic, non-random deviation of results and inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication or review of data that can lead to conclusions which are systematically different from the truth. (Dictionary of Epidemiology, 3rd ed.)

MORE ON BIAS Note that in bias, the focus is on an artifact of some part of the research process (assembling subjects, collecting data, analyzing data) that produces a spurious result. Bias can produce either a type 1 or a type 2 error, but we usually focus on type 1 errors due to bias.

MORE ON BIAS Bias can be either conscious or unconscious. In epidemiology, the word bias does not imply, as in common usage, prejudice or deliberate deviation from the truth.

CONFOUNDING A problem resulting from the fact that one feature of study subjects has not been separated from a second feature, and has thus been confounded with it, producing a spurious result. The spuriousness arises from the effect of the first feature being mistakenly attributed to the second feature. Confounding can produce either a type 1 or a type 2 error, but we usually focus on type 1 errors.

THE DIFFERENCE BETWEEN BIAS AND CONFOUNDING Bias creates an association that is not true, but confounding describes an association that is true, but potentially misleading.

EXAMPLES OF RANDOM ERROR, BIAS, MISCLASSIFICATION AND CONFOUNDING IN THE SAME STUDY: STUDY: In a cohort study, babies of women who bottle feed and women who breast feed are compared, and it is found that the incidence of gastroenteritis, as recorded in medical records, is lower in the babies who are breast-fed.

EXAMPLE OF RANDOM ERROR By chance, there are more episodes of gastroenteritis in the bottle-fed group in the study sample, producing a type 1 error (when in truth breast feeding is not protective against gastroenteritis). Or, also by chance, no difference in risk is found, producing a type 2 error (when in truth breast feeding is protective against gastroenteritis).

EXAMPLE OF RANDOM MISCLASSIFICATION Lack of good information on feeding history results in some breast-feeding mothers being randomly classified as bottle-feeding, and vice versa. If this happens, the observed relative risk (RR) is pulled toward 1 and so understates the true association, whichever feeding modality is associated with higher disease incidence. This can produce a type 2 error.
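The attenuation can be checked with simple arithmetic. The sketch below is illustrative only: the true risks (20% in bottle-fed, 10% in breast-fed babies), the group sizes and the misclassification rates are assumed numbers, not data from the study, and the function name is hypothetical.

```python
def observed_relative_risk(risk_bottle, risk_breast, n_bottle, n_breast, misclass_rate):
    """Expected RR (bottle vs breast) when the same fraction of each
    feeding group is wrongly recorded as the other group."""
    keep = 1 - misclass_rate
    # Babies recorded as bottle-fed: true bottle-fed plus mislabeled breast-fed.
    recorded_bottle = n_bottle * keep + n_breast * misclass_rate
    cases_bottle = n_bottle * keep * risk_bottle + n_breast * misclass_rate * risk_breast
    # Babies recorded as breast-fed: true breast-fed plus mislabeled bottle-fed.
    recorded_breast = n_breast * keep + n_bottle * misclass_rate
    cases_breast = n_breast * keep * risk_breast + n_bottle * misclass_rate * risk_bottle
    return (cases_bottle / recorded_bottle) / (cases_breast / recorded_breast)


# Assumed true risks: 20% (bottle) vs 10% (breast), so the true RR is 2.0.
for rate in (0.0, 0.2, 0.5):
    rr = observed_relative_risk(0.20, 0.10, 1000, 1000, rate)
    print(f"misclassification {rate:.0%}: observed RR = {rr:.2f}")
```

Under these assumptions the observed RR falls from 2.0 with perfect classification to 1.5 with 20% misclassification, and to 1.0 when feeding type is recorded no better than a coin toss. Reversing the two risks moves the RR toward 1 from the other side.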

EXAMPLE OF BIAS The medical records of bottle-fed babies are less complete than those of breast-fed babies (perhaps bottle-fed babies go to the doctor less), and thus record fewer episodes of gastroenteritis in the bottle-fed group only. This is called bias because the observation itself is in error.
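A rough numerical sketch of this kind of bias follows. It assumes, purely for illustration, that the true risk of gastroenteritis is 15% in both feeding groups and that medical records capture 60% of episodes in bottle-fed babies but 90% in breast-fed babies; these figures and the function name are hypothetical.

```python
def recorded_relative_risk(true_risk_bottle, true_risk_breast,
                           completeness_bottle, completeness_breast):
    """RR (bottle vs breast) as it would appear in the medical records,
    given the fraction of true episodes each group's records capture."""
    recorded_bottle = true_risk_bottle * completeness_bottle
    recorded_breast = true_risk_breast * completeness_breast
    return recorded_bottle / recorded_breast


# Assumed: no true difference in risk, but unequal record completeness.
rr = recorded_relative_risk(0.15, 0.15, 0.60, 0.90)
print(f"true RR = 1.00, recorded RR = {rr:.2f}")
```

With these assumed values the records show an RR of about 0.67, so bottle feeding appears protective even though the true risks are identical. The error lies in the measurement of the outcome, not in chance and not in a third factor.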

EXAMPLE OF CONFOUNDING The mothers of breast-fed babies are of higher social class, and the babies thus have better hygiene, less crowding and perhaps other factors that protect against gastroenteritis. Better hygiene and less crowding are truly protective against gastroenteritis, but we mistakenly attribute their effects to breast feeding. This is called confounding because the observation is correct, but its explanation is wrong.
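Stratifying by the suspected confounder shows how a correct observation can carry a wrong explanation. The counts below are invented for illustration: within each social class the risk of gastroenteritis is the same for breast-fed and bottle-fed babies, but breast-fed babies come mostly from the higher social class.

```python
# Hypothetical counts: (cases of gastroenteritis, number of babies).
strata = {
    "higher social class": {"breast": (40, 800), "bottle": (10, 200)},
    "lower social class": {"breast": (40, 200), "bottle": (160, 800)},
}


def relative_risk(breast, bottle):
    """RR (breast-fed vs bottle-fed) from (cases, total) pairs."""
    return (breast[0] / breast[1]) / (bottle[0] / bottle[1])


# Stratum-specific RRs: compare like with like within each social class.
stratum_rr = {name: relative_risk(g["breast"], g["bottle"]) for name, g in strata.items()}

# Crude RR: ignore social class and pool all babies by feeding type.
pooled = {
    feeding: (sum(g[feeding][0] for g in strata.values()),
              sum(g[feeding][1] for g in strata.values()))
    for feeding in ("breast", "bottle")
}
crude_rr = relative_risk(pooled["breast"], pooled["bottle"])

for name, rr in stratum_rr.items():
    print(f"{name}: RR = {rr:.2f}")
print(f"crude (all babies pooled): RR = {crude_rr:.2f}")
```

In this made-up data the crude RR is about 0.47, which looks like strong protection from breast feeding, while the RR within each social class is 1.0. The lower crude incidence in breast-fed babies is real, but it is explained by social class rather than by feeding type.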

PROTECTION AGAINST RANDOM ERROR AND RANDOM MISCLASSIFICATION Random error can work to falsely produce an association (type 1 error) or falsely not produce an association (type 2 error). We protect ourselves against random misclassification producing a type 2 error by choosing the most precise and accurate measures of exposure and outcome.

PROTECTION AGAINST TYPE 1 ERRORS We protect our study against random type 1 errors by establishing that the result must be unlikely to have occurred by chance (e.g. p < .05). P-values are established entirely to protect against type 1 errors due to chance, and do not guarantee protection against type 1 errors due to bias or confounding. This is the reason we say statistics demonstrate association but not causation.
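A small simulation can make the 5% threshold concrete. The sketch below assumes a hypothetical cohort of 300 babies per feeding group, a true gastroenteritis risk of 15% in both groups (so breast feeding has no effect), and a pooled two-proportion z-test as the significance test; none of these choices come from the study described above. Because there is no true difference, every "significant" result is a type 1 error.

```python
import random
from statistics import NormalDist


def two_proportion_p_value(cases_a, n_a, cases_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (cases_a + cases_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (cases_a / n_a - cases_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(true_risk=0.15, n_per_group=300, n_studies=2000,
                        alpha=0.05, seed=1):
    """Fraction of simulated null studies that reach p < alpha by chance alone."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # Both groups share the same true risk: any difference is pure chance.
        cases_breast = sum(rng.random() < true_risk for _ in range(n_per_group))
        cases_bottle = sum(rng.random() < true_risk for _ in range(n_per_group))
        p = two_proportion_p_value(cases_breast, n_per_group, cases_bottle, n_per_group)
        if p < alpha:
            false_positives += 1
    return false_positives / n_studies


rate = false_positive_rate()
print(f"share of null studies with p < 0.05: {rate:.3f}")
```

The share should land close to 0.05. The simulated studies contain no bias and no confounding, which is exactly why the p-value threshold says nothing about those two sources of type 1 error.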

PROTECTION AGAINST TYPE 2 ERRORS We protect our study against random type 2 errors by
• providing adequate sample size, and
• hypothesizing large differences.
The larger the sample size, the easier it will be to detect a true difference, and the largest differences will be the easiest to detect. (Imagine how hard it would be to detect a 1% increase in the risk of gastroenteritis with bottle-feeding).
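To put rough numbers on this, the sketch below uses a common normal-approximation formula for the sample size needed to compare two proportions. The baseline risk of 10%, the two-sided alpha of 0.05 and the 80% chance of detecting the difference are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(risk_1, risk_2, alpha=0.05, power=0.80):
    """Approximate babies needed per group to detect risk_1 vs risk_2
    with a two-sided test (normal approximation for two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    mean_risk = (risk_1 + risk_2) / 2
    numerator = (z_alpha * sqrt(2 * mean_risk * (1 - mean_risk))
                 + z_beta * sqrt(risk_1 * (1 - risk_1) + risk_2 * (1 - risk_2))) ** 2
    return ceil(numerator / (risk_1 - risk_2) ** 2)


# Assumed baseline risk of 10%.
n_large_effect = sample_size_per_group(0.10, 0.07)   # 30% relative reduction
n_small_effect = sample_size_per_group(0.10, 0.101)  # 1% relative increase
print(f"30% reduction: about {n_large_effect} babies per group")
print(f"1% increase:   about {n_small_effect} babies per group")
```

Under these assumptions a 30% reduction needs on the order of a thousand babies per group, while a 1% increase needs more than a million.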

TWO WAYS TO INCREASE POWER The probability that a study will detect a true difference of a given size as statistically significant is called the power of a study.
• Choosing the most precise and accurate measures of exposure and outcome increases the power of our study, because the variances of the outcome measures, which enter into statistical testing, are decreased.
• Having an adequately sized sample of study subjects.
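Both routes to higher power can be illustrated with an approximate power calculation for comparing two proportions. This is a sketch under assumed numbers: true risks of 10% (bottle-fed) and 7% (breast-fed), a two-sided alpha of 0.05, and, for the measurement example, 20% random misclassification of feeding type in equal-sized groups. The exposure misclassification example shows the related loss of power from a diluted observed difference rather than from outcome variance as such.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(risk_1, risk_2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    (normal approximation; the negligible opposite tail is ignored)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(risk_1 * (1 - risk_1) / n_per_group
              + risk_2 * (1 - risk_2) / n_per_group)
    return NormalDist().cdf(abs(risk_1 - risk_2) / se - z_alpha)


# Route 1: a larger sample raises power for the same true difference.
power_small_n = power_two_proportions(0.10, 0.07, 200)
power_large_n = power_two_proportions(0.10, 0.07, 1400)

# Route 2: better measurement. With 20% of each group mislabeled, the risks
# seen in the data become 0.8*0.10 + 0.2*0.07 = 0.094 and 0.8*0.07 + 0.2*0.10 = 0.076.
power_misclassified = power_two_proportions(0.094, 0.076, 1400)

print(f"n = 200 per group:                 power = {power_small_n:.2f}")
print(f"n = 1400 per group:                power = {power_large_n:.2f}")
print(f"n = 1400, 20% misclassified:       power = {power_misclassified:.2f}")
```

With these assumed inputs, power rises from roughly 0.2 to roughly 0.8 as the sample grows from 200 to 1400 per group, and drops back to about 0.4 when feeding type is measured poorly.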

KEY PRINCIPLE IN BIAS AND CONFOUNDING The factor that creates the bias, or the confounding variable, must be associated with both the independent and dependent variables (i.e. with the exposure and the disease). Association of the bias or confounder with just one of the two variables is not enough to produce a spurious result.

In the example just given: The BIAS, namely incomplete chart recording, has to be associated with feeding type (the independent variable) and also with recording of gastroenteritis (the dependent variable) to produce the false result. The CONFOUNDING VARIABLE (or CONFOUNDER), better hygiene, has to be associated with feeding type and also with gastroenteritis to produce the spurious result.

Were the bias or the confounder associated with just the independent variable or just the dependent variable, they would not produce bias or confounding. This gives a useful rule: If you can show that a potential confounder is NOT associated with at least one of the two variables under study (exposure or outcome), confounding can be ruled out.
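The rule can be checked numerically. In the sketch below, feeding type is assumed to have no true effect, and social class stands in for the potential confounder; the shares and risks are hypothetical values chosen only to show the three situations.

```python
def crude_relative_risk(share_high_breast, share_high_bottle, risk_high, risk_low):
    """Crude RR (breast-fed vs bottle-fed) when feeding itself has no effect.

    share_high_*: fraction of each feeding group from the higher social class.
    risk_high, risk_low: gastroenteritis risk in the higher / lower social class.
    """
    risk_breast = share_high_breast * risk_high + (1 - share_high_breast) * risk_low
    risk_bottle = share_high_bottle * risk_high + (1 - share_high_bottle) * risk_low
    return risk_breast / risk_bottle


# Social class linked to BOTH feeding type and gastroenteritis: spurious effect.
linked_to_both = crude_relative_risk(0.8, 0.2, 0.05, 0.20)
# Social class linked to gastroenteritis but balanced across feeding groups.
not_linked_to_exposure = crude_relative_risk(0.5, 0.5, 0.05, 0.20)
# Social class linked to feeding type but with no effect on gastroenteritis.
not_linked_to_outcome = crude_relative_risk(0.8, 0.2, 0.10, 0.10)

print(f"associated with both:        RR = {linked_to_both:.2f}")
print(f"not associated with feeding: RR = {not_linked_to_exposure:.2f}")
print(f"not associated with disease: RR = {not_linked_to_outcome:.2f}")
```

Only the first case departs from the true RR of 1.0 (it gives about 0.47). Breaking either link is enough to remove the spurious association.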

GOOD STUDY DESIGN PROTECTS AGAINST ALL FORMS OF ERROR

SOME TYPES OF BIAS 1. SELECTION BIAS Any aspect of the way subjects are assembled in the study that creates a systematic difference between the compared populations that is not due to the association under study.

2. INFORMATION BIAS Any aspect of the way information is collected in the study that creates a systematic difference between the compared populations that is not due to the association under study. (Some call this measurement bias.) The incomplete chart recording in the baby feeding example would be a form of information bias. Other examples:
• Diagnostic suspicion bias
• Recall bias
Sometimes biases apply to a population of studies, rather than to one study, as in publication bias (the tendency to publish papers which show positive results).
