Advanced Epidemiology Methods Short Course by Matthew Fox, Boston University

Advanced Epi
August 15-19th 2011, SACEMA
Matthew Fox
Boston University Center for Global Health and Development, Department of Epidemiology
Health Economics and Epidemiology Research Office
mfox@bu.edu

Introductions • Who are you? • Where do you work/study? • What do you study?

Welcome • About me • Week-long short course on epi methods • 2 sessions/day, each about 3 hours (depending) • Assumes intro/intermediate epi, practical experience with epi and stats • Mix of lecture and discussion • Too much material, take good notes, go back to them • Finish mid-day on Friday • Course works if you read and participate

Course Overview • Review basic epidemiologic principles • Reinterpret them in a new light • Think through problems/implications of what we learned in intro/intermediate epi • Develop a causal framework(s) on which to hang our epidemiologic thinking • Learn/apply advanced epi methods

Modern Epidemiology III

Questions for Today • What is epidemiology, what is its goal? • What are measures of association and measures of effect? • What do these measures really mean? • Which ones have causal meanings? • What is the odds ratio really about? • Why does everyone use it?

The goal of epidemiologic research • Epidemiology is the study of: • The distribution and determinants of disease in human populations and the application of that knowledge to the control of disease • But the goal is: • To obtain a valid and precise (and generalizable) estimate of the effect of an exposure on a disease • Validity is the opposite of bias, precision is the opposite of random error • Fundamentally concerned with measurement

Anyone remember Type I and Type II error? What are they?

Basic Statistics • Type I: If there is truly no effect, what is the chance we reject the null? • Type II: If there truly is an effect, what is the chance we fail to reject the null?

How do we know a particular epidemiologic finding is true? • Find that the relative risk of exposure to vitamin # on cancer @ is 2.5, p=0.049 • Assume we did the perfect study • No bias (confounding, selection, information) • 80% power, alpha = 0.05 • What is the chance there is really no effect of vitamins on cancer? • i.e. True relative risk is 1

Syphilis testing in the US • In US pre-2005, Massachusetts required a syphilis test before marriage • Assume the test was: • 95% sensitive and 95% specific • If I test positive, how likely is it that I truly have syphilis? • Answer is that it depends

Syphilis: Se = 95%, Sp = 95%, PPV = 16%

                 Syphilis   No syphilis    Total
Test positive        95          495         590
Test negative         5         9405        9410
Total               100         9900      10,000
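The 16% figure can be checked with a short Python sketch. It assumes the 1% prevalence implied by the table totals (100 of 10,000); the function name `ppv` is illustrative and not part of the original slides.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(disease | positive test)."""
    # True positives and false positives as fractions of the whole population
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed prevalence of 1%, read off the table totals (100 / 10,000)
syphilis_ppv = ppv(sensitivity=0.95, specificity=0.95, prevalence=0.01)
print(round(syphilis_ppv, 2))  # about 0.16, i.e. 95 / 590
```

Changing only `prevalence` shows why the answer "depends": the same test gives a much higher PPV in a population where the disease is common.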

Back to our study Alpha and beta use the TRUTH as the denominator and so are like Se and Sp

Back to our study • Judging the “correctness” of a single study is the PPV, and depends on the prevalence of true hypotheses

Back to our study: alpha = 5% (Sp 95%), beta = 5% (Se 95%); 68% chance our study is right

                       Effect is real   No effect    Total
Reject null                  950            450       1400
Fail to reject null           50           8550       8600
Total                       1000           9000     10,000
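The same arithmetic as a minimal Python sketch, treating power (1 - beta) as sensitivity and 1 - alpha as specificity. The 10% share of tested hypotheses that are truly non-null is read off the table totals (1000 of 10,000); the function and parameter names are illustrative assumptions.

```python
def prob_finding_is_true(alpha, power, prior_true):
    """P(effect is real | null rejected), assuming an unbiased study."""
    # Rejections among hypotheses where an effect truly exists
    true_rejections = power * prior_true
    # Rejections among hypotheses where the null is true
    false_rejections = alpha * (1 - prior_true)
    return true_rejections / (true_rejections + false_rejections)


# Values from the slide: alpha = 5%, beta = 5%, 10% of hypotheses true
as_on_slide = prob_finding_is_true(alpha=0.05, power=0.95, prior_true=0.10)
print(round(as_on_slide, 2))  # about 0.68, i.e. 950 / 1400

# With the 80% power of the vitamin example, the chance is lower still
with_80_power = prob_finding_is_true(alpha=0.05, power=0.80, prior_true=0.10)
print(round(with_80_power, 2))  # about 0.64
```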

Take home message: We need to critically examine the way we have been taught to design and interpret epidemiologic research

Review of basic concepts Study design, measures of disease frequency, measures of effect/association

The Source Population • The population that gives rise to cases • It is defined: • In time and place • With respect to population characteristics • With respect to external influences (modifiers) • Not as a sample of the general population

Cohorts • Membership in a cohort requires a person meet admissibility criteria • Have common admissibility-defining events • Membership begins once the temporally last criterion is met • Once a member, a person never leaves (membership is static or closed) • A closed cohort adds no new members and loses members only to death; an open cohort keeps adding new members

Dynamic population • Membership requires a person satisfy the membership status criteria • They have common admissibility-defining characteristics • Membership exists so long as all of the status criteria are satisfied • A person can enter a dynamic population, leave it, and then re-enter

Cohorts vs. Dynamic Populations • Framingham heart study • Cohort – the admissibility criteria are enrolling in the study in 1948. Never leave the cohort once you enroll. • Dynamic population – could have instead studied all residents of Framingham from 1948 onwards, the catchment population for a case registry there. Some will leave, new people will join.

STUDY DESIGN: How to harvest information from the base • Census (cohort) or Sample (case-control) • Cases are valuable (information rich) • In SE calcs, these drive your standard error • Ex. SE(LN(RR)) = sqrt(1/A–1/N1+1/B–1/N0) • Include all the cases in the population • Information density of population that gave rise to cases is not great • Can include all or sample • Nearly all base’s info is harvested when sample of base is small multiple of the cases
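A short Python sketch of the standard error formula above, with a Wald-type 95% confidence interval on the log scale. The counts (A = 400, N1 = 1000, B = 100, N0 = 1000) are borrowed from the trohoc example later in the deck purely for illustration, and the function name is an assumption.

```python
import math


def risk_ratio_with_ci(a, n1, b, n0, z=1.96):
    """Risk ratio, SE of ln(RR), and a Wald confidence interval.

    a, b   : exposed and unexposed cases
    n1, n0 : exposed and unexposed persons at risk
    """
    rr = (a / n1) / (b / n0)
    # SE(ln(RR)) = sqrt(1/A - 1/N1 + 1/B - 1/N0), as on the slide
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, lower, upper


rr, se_log_rr, lower, upper = risk_ratio_with_ci(a=400, n1=1000, b=100, n0=1000)
print(rr, se_log_rr, lower, upper)
```

Because 1/A and 1/B are the large terms whenever cases are far fewer than persons at risk, the case counts dominate the standard error, which is the sense in which cases are information rich.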

Which is the best measure to assess causal effects? 1) Risk Difference 2) Risk Ratio 3) Odds Ratio

In a case-control study, from what population do we sample controls? • Those with disease • Those without disease • Everyone, regardless of whether they have the disease

Cohort Study

Case-control Study

Kramer and Boivin 1987 “We define a cohort study as a study in which subjects are followed forward from exposure to outcome… Inferential reasoning is from cause to effect. In case-control studies, the directionality is the reverse. Study subjects are investigated backwards from outcome to exposure, and the reasoning is from effect to cause.”

Cohort Study: Relative Risks • Relative risk: (A/N1) / (B/N0) • Risk in exposed / risk in unexposed • Risk is number of cases / total at risk • Numerator is number of cases • Denominator is cases and controls!

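Cohort Concept [Figure: an exposed group of size NE+ and an unexposed group of size NE- are followed from t0 to t. The exposed group yields A exposed cases and C = (NE+ - A) non-cases; the unexposed group yields B unexposed cases and D = (NE- - B) non-cases.]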

Cohort Study: Relative Risks • Relative risk: • (A/N1)/(B/N0) can be rearranged as (A/B)/(N1/N0) • A/B is ratio of exposed to unexposed cases • N1/N0 is ratio of exposed to unexposed in population
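The rearrangement can be checked numerically with a small Python sketch. The counts below are hypothetical and chosen only for illustration; the function names are not from the slides.

```python
import math


def relative_risk(a, n1, b, n0):
    """Risk in exposed divided by risk in unexposed."""
    return (a / n1) / (b / n0)


def relative_risk_rearranged(a, n1, b, n0):
    """Ratio of exposed to unexposed cases, divided by the ratio of
    exposed to unexposed persons in the population."""
    return (a / b) / (n1 / n0)


# Hypothetical cohort: 30 cases among 200 exposed, 10 among 400 unexposed
rr_direct = relative_risk(30, 200, 10, 400)
rr_rearranged = relative_risk_rearranged(30, 200, 10, 400)
print(math.isclose(rr_direct, rr_rearranged))  # True
```

The second form matters because a case-control study can estimate N1/N0 from a sample of the base without ever counting the full denominators.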

Relative risk has meaning: average increase in risk produced by exposure

Case-control: Cases • Members of population who develop disease over the follow-up period • Same cases as the analogous cohort study • Case ascertainment is influenced by design • Primary base: population defined first • Secondary base: cases defined first

Case-control: Controls • A sample of the population experience that gave rise to the cases • 3 options (paradigms) • Un-diseased experience • Population at risk at beginning of the study • Population experience over follow-up

             0 mos   6 mos   12 mos   18 mos   24 mos
Cases           0       5       10       15       20
Non-cases     100      95       90       85       80

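Case-control Concept [Figure: the cohort diagram (exposed NE+ and unexposed NE- followed from t0 to t, giving cases A and B and non-cases C = NE+ - A and D = NE- - B) annotated with three control-sampling options. Option 1: Cumulative. Option 2: Case-cohort. Option 3: Density Sampling.]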

Case-control study • Now we can’t estimate risk A/N1 and B/N0 because we don’t know the denominators • Left with an odds ratio • But how to interpret?

2 ways to calculate an OR • Cross product ratio: • (A*D)/(B*C) • Not particularly meaningful, but it works

2 ways to calculate an OR • Case ratio/base ratio: • (A/B) / (C/D) • A/B is the ratio of exposed to unexposed cases • C/D is the ratio of exposed to unexposed controls • Remember back to Relative Risk • Here C/D fills in for N1/N0

The trohoc fallacy • 10% sample of non-cases • RR = (400/1000) / (100/1000) = 4.0 • OR = (400/60) / (100/90) = 6.0 • The trohoc fallacy is the idea that a case-control study is a cohort study done backwards ("trohoc" is "cohort" spelled backwards, a heteropalindrome) • Requires a rare disease assumption for the odds ratio to approximate the relative risk

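Case-control Concept [Figure: the cohort diagram (exposed NE+ and unexposed NE- followed from t0 to t, giving cases A and B and non-cases C = NE+ - A and D = NE- - B) annotated with two control-sampling options. Option 1: Cumulative. Option 2: Case-cohort.]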

The trohoc fallacy revealed • 10% sample of population that gave rise to cases • RR = (400/1000) / (100/1000) = 4.0 • OR = (400/100) / (100/100) = 4.0 • Sample total population that gave rise to cases (which includes cases), not undiseased at end • Cases can be their own controls if randomly sampled • Requires no rare disease assumption
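Both versions of the example can be reproduced with a short Python sketch. It uses expected control counts (sampling fraction times group size) rather than a random draw, which is a simplifying assumption; the variable and function names are illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Case ratio divided by control (base) ratio: (A/B) / (C/D)."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)


# Underlying cohort from the slides
a, n1 = 400, 1000   # exposed cases, exposed persons at risk
b, n0 = 100, 1000   # unexposed cases, unexposed persons at risk
sampling_fraction = 0.10

rr = (a / n1) / (b / n0)

# Cumulative sampling: controls drawn from the non-cases at end of follow-up
or_cumulative = odds_ratio(
    a, b, sampling_fraction * (n1 - a), sampling_fraction * (n0 - b)
)

# Case-cohort sampling: controls drawn from everyone at risk at baseline
or_case_cohort = odds_ratio(
    a, b, sampling_fraction * n1, sampling_fraction * n0
)

print(round(rr, 2), round(or_cumulative, 2), round(or_case_cohort, 2))
# 4.0 6.0 4.0
```

The sampling fraction cancels in the control ratio, so any fraction gives the same expected odds ratio; only the choice of what is sampled (non-cases versus the whole base) changes the answer.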

Miettinen on the trohoc fallacy “Consider the clinical trial: the concern is, as always, to contrast categories of treatment as to subsequent occurrence of some outcome phenomenon, whereas comparing different categories of the outcome as to the antecedent distribution of treatment is uninteresting if not downright perverse.” He preferred terms like “case-referent” and “case-base” studies, as “the base sample is no more a control series than a census of the base is”

Why it works • OR = [A*D] / [B*C] = [A/B] / [C/D] • If we sample 10% of the base then the odds ratio is: • OR = [A/B] /[(10%*N1)/(10%*N0)] • = [A/B]/(N1/N0) = RR

Cohort studies exclude those who are not at risk for disease (though they don’t need to). In a case-control study, should we exclude those not at risk for exposure? Ex. In a study of hormonal contraception and heart disease, should we exclude nuns?

With appropriate sampling, odds ratio is interpreted as estimate of relative risk, which has meaning. Case control studies are cohort studies done efficiently, not cohort studies done backwards.

Measures of Disease Frequency • Provide an estimate of the occurrence of disease in a population • Typically we study first occurrence as later occurrences are often affected by first • Incorporates: • Disease state • Time • Population definition

Measures of Disease Frequency • Prevalence: • Proportion of population with disease at a particular time • Cross-sectional • Reflects rate of disease occurrence and survival with disease

Measures of Disease Frequency • Cumulative Incidence (Simple) • Proportion of a population that develops disease over a follow-up period • Also called incidence proportion or risk • Bounded by 0 and 1 • Time not part of measure but must report • Difficult to measure in dynamic populations • CI(t0,t) = I(t0,t)/N0, where I(t0,t) is the number of new cases between t0 and t and N0 is the number at risk at t0
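As a minimal Python sketch of this formula, the follow-up table from the case-control controls slide (100 people at risk at 0 months, 20 cumulative cases by 24 months) gives a 24-month cumulative incidence of 0.20. The function name and the input check are illustrative assumptions.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """CI(t0, t) = I(t0, t) / N0 for a closed cohort."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new_cases must lie between 0 and at_risk_at_start")
    return new_cases / at_risk_at_start


# Cumulative cases at 6, 12, 18, and 24 months among 100 at risk at baseline
cases_by_month = {6: 5, 12: 10, 18: 15, 24: 20}
ci_by_month = {
    month: cumulative_incidence(cases, 100)
    for month, cases in cases_by_month.items()
}
print(ci_by_month[24])  # 0.2, reported as a 24-month risk
```

The follow-up period is not part of the number itself, so each value only has meaning when reported with its time window, as the slide notes.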
