Confounding: control, standardization

Summer Course: Introduction to Epidemiology. August 29, 0900-1030. Confounding: control, standardization. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa. Session Overview. Review methods used to control, prevent or deal with confounding


Presentation Transcript


  1. Summer Course: Introduction to Epidemiology, August 29, 0900-1030. Confounding: control, standardization. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

  2. Session Overview • Review methods used to control, prevent or deal with confounding • Review matching methods • Present standardization methods both direct and indirect (SMR). • This session will give you a ‘flavour’ of the more advanced things in our graduate courses

  3. Confounding (1) Consider a case-control study relating alcohol intake to mouth cancer risk. • Crude OR = 3.2 (95% CI: 2.1 to 4.9) • Stratify the data by smoking status (ever/never): • Ever: OR = 1.2 (95% CI: 0.5 to 2.9) • Never: OR = 1.2 (95% CI: 0.5 to 2.9) • Best guesstimate of the 'true' OR is 1.2 • This is a simple type of adjusted OR • More complex methods are used in the 'real world'. • This is CONFOUNDING.

  4. Confounding (2) [Diagram: confounding triangles linking Alcohol and Mouth cancer, first through Smoking, then through an unknown factor (???)]

  5. Confounding (3) • Confounding requires three or more variables. • Two variables, even with multiple levels, cannot produce confounding. • Three requirements for confounding • The confounder is related to the outcome • The confounder is related to the exposure • The confounder is not part of the causal pathway between exposure and outcome

  6. Confounding (4) • Confounding is a very serious problem in epidemiological research • Potential confounders are often unknown • The odds ratio for leukemia in children living near high-power hydro lines is about 1.3 • BUT, this could be explained by unknown confounders • e.g., pesticide application to grass under hydro towers.

  7. Confounding (5) • How do we deal with confounding? • Prevention • You need to ‘break’ one of the links between the confounder and the exposure or outcome • ‘Treatment’ (analysis) • Stratified analysis • Like my simple example • Standardization • We’ll discuss this later • Regression modeling methods • Covered in a different course 

  8. Confounding (6) • Prevention • Randomization • One of the big advantages of an RCT • Restriction • Limits the study subjects to one level of the confounder • Study effect of alcohol on mouth cancer ONLY in non-smokers • Matching • Makes the distribution of the confounder the same in the groups being compared

  9. Confounding (7) • Randomization • The 'treatment' is the exposure in the confounding triangle. • Subjects randomly assigned to each treatment without regard to other factors. • On average, distribution of other factors will be the same in each treatment group • Implies no confounder/exposure correlation • → no confounding. • Issues • Small sample sizes • Chance imbalances • Infeasible in many situations • Stratified allocation
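A small simulation sketch can make the 'on average' and 'chance imbalances' points concrete. The setup is hypothetical and not from the slides. Exactly half of the subjects smoke, and subjects are shuffled into two equal arms without regard to smoking. For each trial we measure how much the smoking prevalence differs between the arms. The helper names are illustrative only.

```python
import random
from statistics import mean


def arm_imbalance(n_subjects: int, rng: random.Random) -> float:
    """Absolute difference in smoking prevalence between two randomized arms.

    Hypothetical setup: exactly half of the subjects smoke, and subjects are
    shuffled and split into two equal arms without looking at smoking status.
    """
    smokers = [True] * (n_subjects // 2) + [False] * (n_subjects - n_subjects // 2)
    rng.shuffle(smokers)  # random allocation, ignoring the confounder
    half = n_subjects // 2
    arm_1, arm_2 = smokers[:half], smokers[half:]
    return abs(sum(arm_1) / len(arm_1) - sum(arm_2) / len(arm_2))


def mean_imbalance(n_subjects: int, trials: int = 500, seed: int = 2024) -> float:
    """Average imbalance over many hypothetical randomized trials."""
    rng = random.Random(seed)
    return mean(arm_imbalance(n_subjects, rng) for _ in range(trials))


for n in (20, 200, 2000):
    print(n, round(mean_imbalance(n), 3))
```

With 20 subjects, the arms typically differ in smoking prevalence by a sizeable fraction. With 2000 subjects, the difference is typically small. Randomization removes the confounder/exposure link only on average, which is why small trials can still show chance imbalances.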

  10. Confounding (8) • Restriction • Limit the study to people who have the same level of a potential confounder. • Study alcohol and mouth cancer only in non-smokers. • Lack of variability in the confounder means it cannot 'confound' • There is only one 2x2 table in the stratified analysis • Relatively cheap

  11. Confounding (9) • Restriction (cont.) • ISSUES • Limits generalizability • Cannot study effect of confounder on risk • Limited value with multiple potential confounders • Continuous variables? • Can only study risk in one level of confounder • exposure X confounder interactions can’t be studied • Impact on sample size and feasibility • Alternative: do a regular study with stratified analysis • Report separate analyses in each stratum

  12. Confounding (10) • Matching • The process of making a study group and a comparison group comparable with respect to some extraneous factor. • Breaks one link of the confounding triangle (confounder/exposure in a cohort study; confounder/outcome in a case-control study) • Most often used in case-control studies. • Usually can't match on more than 3-4 factors in one study • With 4 factors, the minimum # of matching groups is: • 2x2x2x2 = 16 • Let's talk more about matching
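The number of matching groups (cells) is the product of the number of levels of each matching factor. This is why the count grows so fast. The minimal sketch below reproduces the 2x2x2x2 = 16 figure for four two-level factors. The factor names and the 5-level age grouping in the second call are hypothetical.

```python
from math import prod


def matching_cells(levels_per_factor: dict[str, int]) -> int:
    """Number of distinct matching groups for a set of matching factors."""
    return prod(levels_per_factor.values())


# Four two-level factors (the slide's minimum case): 2 x 2 x 2 x 2
print(matching_cells({"sex": 2, "smoker": 2, "diabetes": 2, "urban": 2}))  # 16
# Hypothetical: swap one binary factor for a 5-level age grouping
print(matching_cells({"sex": 2, "smoker": 2, "diabetes": 2, "age_group": 5}))  # 40
```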

  13. Matching (1) • Example study (case-control) • Identify 200 cases of mouth cancer from a local hospital. • As each new case is found, do a preliminary interview to determine their smoking status. • Identify a non-case who has the same smoking status as the case • If there are 150 cases who smoke, there will also be 150 controls who smoke.

  14. Matching (2) Here is the confounder/outcome 2x2 table in the observed study: • Smokers: 150 cases, 150 controls • Non-smokers: 50 cases, 50 controls • This implies no smoking/outcome link and, therefore, no confounding
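In the example study, 150 of the 200 cases smoke, and each case gets one control with the same smoking status. Matching therefore forces the smoking/case-status odds ratio in the observed data to exactly 1.0. The minimal check below treats smoking as the 'exposure' in the 2x2 table. The helper `odds_ratio` is a hypothetical name.

```python
def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c) from a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Smoking vs. case status after matching: 150/50 in cases, 150/50 in controls
print(odds_ratio(exposed_cases=150, exposed_controls=150,
                 unexposed_cases=50, unexposed_controls=50))  # 1.0
```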

  15. Matching (3) Two main types of matching • Individual (pair) • Matches subjects as individuals • Twins • Right/left eye • Frequency • Ensures that the distribution of the matching variable in cases and controls is similar but does not match individual people.

  16. Matching (4) • Matching by itself does not fully eliminate confounding in a case-control study! • You must use analytic methods as well • Matched OR • Stratified analyses • Logistic regression models • In a cohort study, matching is uncommon • But, if it is done, complex statistics are not needed to complete the adjustment

  17. Matching (5) • Advantages • Strengthens statistical analysis, especially when the number of cases is small. • Increases study credibility for ‘naive’ readers. • Useful when confounder is a complex, nominal variable (e.g. occupation). • Without matching, standard statistical methods can be problematic, especially if many levels have very few subjects.

  18. Matching (6) • Disadvantages • You cannot study the relationship of the matched variable to the outcome. • Can be costly and time consuming to find matches, especially if you have many matching factors. • Often, some important predictors cannot be matched since you have no information on their level in potential controls before doing interviews/lab tests • Genotype • Depression/stress • If the matching factor is not a confounder, matching can reduce precision and power.

  19. Matching (7) Individual matching • My personal view • many studies misrepresent frequency matching as individual matching • In many apparent cases of individual matching, that isn’t what is going on. • Most useful when there is a strong ‘natural’ pairing. • Twins • Body parts • Analysis uses McNemar method to estimate OR (and to do a chi-square test). • Unit of analysis is the pair.

  20. Matching (8) 626 pairs of subjects • a = 201 pairs where both case and control were exposed • b = 80 pairs where only the case was exposed • c = 43 pairs where only the control was exposed • d = 302 pairs where neither case nor control was exposed

  21. Matching (9) • If exposure causes disease, there should be more pairs with only the case exposed than pairs with only the control exposed. • McNemar OR = 80/43 = 1.86 • Ignoring matching would give OR = 1.28
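Both odds ratios on this slide can be reproduced from the four pair counts. The sketch below assumes the usual labels: a = both exposed, b = only the case exposed, c = only the control exposed, d = neither exposed. The 'ignoring matching' estimate comes from breaking the pairs and collapsing them into an ordinary 2x2 table of cases vs. controls. Function names are illustrative.

```python
def mcnemar_or(b: int, c: int) -> float:
    """Matched-pair OR: pairs with only the case exposed (b) divided by
    pairs with only the control exposed (c)."""
    return b / c


def unmatched_or(a: int, b: int, c: int, d: int) -> float:
    """OR obtained by breaking the pairs and treating cases and controls
    as two independent groups."""
    exposed_cases = a + b       # case is exposed in 'both' and 'only case' pairs
    unexposed_cases = c + d
    exposed_controls = a + c    # control is exposed in 'both' and 'only control' pairs
    unexposed_controls = b + d
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


a, b, c, d = 201, 80, 43, 302
print(round(mcnemar_or(b, c), 2))          # 1.86
print(round(unmatched_or(a, b, c, d), 2))  # 1.28
```

Only the discordant pairs (b and c) enter the matched estimate. The unmatched estimate mixes in the concordant pairs and is pulled toward 1.0.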

  22. Matching (10) • Can calculate a statistical test of the null hypothesis that OR=1.0 (McNemar’s chi-square test)
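A minimal sketch of McNemar's test, using the standard statistic χ² = (b − c)²/(b + c). An optional continuity correction replaces this with (|b − c| − 1)²/(b + c). For 1 degree of freedom, the upper-tail probability equals erfc(√(χ²/2)), so the standard library's `math.erfc` is enough to get a p-value. The function name is illustrative.

```python
import math


def mcnemar_test(b: int, c: int, continuity: bool = False) -> tuple[float, float]:
    """McNemar chi-square (1 df) and p-value from the discordant pair counts."""
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, p_value


chi2, p = mcnemar_test(80, 43)
print(round(chi2, 2), p < 0.001)  # 11.13 True
print(round(mcnemar_test(80, 43, continuity=True)[0], 2))  # 10.54
```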

  23. Matching (11) • McNemar OR = b/c • 'a' and 'd' pairs contribute no information on the OR (wasteful of interviews). • Avoiding this justifies 'counter-matching', another advanced topic • Make sure the table is set up correctly!! • More sophisticated analysis uses conditional logistic regression modeling (another course).

  24. Matching (12) Frequency matching • Most commonly used method • Many ways to implement this. Here is one: • Case-control study of prostate cancer. • Cases will include all new cases in Ottawa in one year. • Based on cancer registry data, we know what the age distribution of cases will be. • Controls selected at random from the population. • We use the projected distribution of age in the cases to describe how many controls we need in each age group.

  25. Matching (13) • 400 cases & 400 controls • 5% of cases are under age 60 • I want 5% of my controls to be under 60 • 400 * 0.05 = 20 • Similarly for other age groups
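The same arithmetic applies to every age group: multiply the total number of controls by the projected share of cases in that group. In this sketch, only the 5% under 60 and the roughly 30% aged 75 to 84 come from the slides. The 60-74 and 85+ shares are hypothetical, chosen so the shares sum to 1. Simple rounding is used, so with awkward shares the counts may not add up exactly to the target.

```python
def controls_per_stratum(total_controls: int,
                         case_distribution: dict[str, float]) -> dict[str, int]:
    """Frequency matching: allocate controls in proportion to the projected
    distribution of cases across matching strata (simple rounding)."""
    if abs(sum(case_distribution.values()) - 1.0) > 1e-9:
        raise ValueError("case shares must sum to 1")
    return {stratum: round(total_controls * share)
            for stratum, share in case_distribution.items()}


# Projected age distribution of cases (partly hypothetical, see above)
projected = {"<60": 0.05, "60-74": 0.55, "75-84": 0.30, "85+": 0.10}
print(controls_per_stratum(400, projected))
# {'<60': 20, '60-74': 220, '75-84': 120, '85+': 40}
```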

  26. Matching (14) • Frequency matching (cont.) • Do you distribute the control recruitment throughout the case recruitment period? • Analysis must stratify by matching groups or strata • Having too many matching groups is a problem • How do I find the matching controls? • Only 4% of the population is between ages 75 and 84, but about 30% of my cases are in this group. How do I efficiently over-sample this age group? • Lack of control selection lists in Canada • Random Digit Dialing (RDD) methods. • Volunteers

  27. Confounding (11) Analysis options • Stratified analysis • Divide the study into strata based on levels of the potential confounding variable(s). • Do the analysis within each stratum to give a stratum-specific OR or RR. • If the stratum-specific values are 'close', produce an adjusted estimate as some type of average of the stratum-specific values. • Many methods of adjustment are available. • Mantel-Haenszel is most commonly used.
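A minimal sketch of the Mantel-Haenszel summary odds ratio is OR_MH = Σ(aᵢdᵢ/nᵢ) / Σ(bᵢcᵢ/nᵢ). For stratum i: a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls, and n = stratum total. The counts below are hypothetical. They are chosen so that, as in the alcohol/smoking example, the crude OR is inflated while both stratum-specific ORs agree.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    a: int  # exposed cases
    b: int  # exposed controls
    c: int  # unexposed cases
    d: int  # unexposed controls

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(s.a * s.d / s.n for s in strata)
    denominator = sum(s.b * s.c / s.n for s in strata)
    return numerator / denominator


# Hypothetical alcohol (exposure) / mouth cancer counts, stratified by smoking
smokers = Stratum(a=80, b=40, c=20, d=10)
never_smokers = Stratum(a=10, b=20, c=40, d=80)
crude = Stratum(*(x + y for x, y in zip(smokers, never_smokers)))  # strata collapsed

print(smokers.odds_ratio(), never_smokers.odds_ratio())  # 1.0 1.0
print(crude.odds_ratio())                                 # 2.25
print(mantel_haenszel_or([smokers, never_smokers]))       # 1.0
```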

  28. Confounding (12) • Stratified analysis (cont.) • Stratum-specific ORs are: 2.3, 2.6, 3.4 • Stratum-specific sample sizes: 100, 200, 50 • A 'credible' adjusted estimate should be between 2.3 and 3.4. • Simple average is: 2.8 • Ignores the number of subjects in the strata. If one group has very few subjects, its estimate is less 'valuable'. • Weight by # of subjects in each group: (2.3×100 + 2.6×200 + 3.4×50)/350 = 2.63 • Mantel-Haenszel does the same thing with different weights
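The slide's two averages can be checked in a few lines: the simple average of the three stratum-specific ORs, and the average weighted by stratum size. Size weighting is shown only to illustrate the idea. Mantel-Haenszel uses different weights.

```python
def simple_average(values: list[float]) -> float:
    return sum(values) / len(values)


def weighted_average(values: list[float], weights: list[float]) -> float:
    """Average of stratum-specific estimates, weighted by stratum size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


stratum_ors = [2.3, 2.6, 3.4]
stratum_sizes = [100, 200, 50]

print(round(simple_average(stratum_ors), 2))                   # 2.77
print(round(weighted_average(stratum_ors, stratum_sizes), 2))  # 2.63
```

The small third stratum has the largest OR (3.4), so giving it less weight pulls the adjusted estimate down from 2.77 (2.8 on the slide) to 2.63.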

  29. Confounding (13) • Stratified analysis (cont.) • This approach limits the number of variables which can be controlled or adjusted. • Also hard to apply to continuous confounders • But, it gives information about stratum-specific effects and can help identify effect modification. • Used to be very common; now no longer widely used in case-control research. • Stratified analysis methods can be applied to cohort studies with person-time. This is still commonly used.

  30. Confounding (14) • Analysis options • Regression modeling • Beyond the scope of this course • The most common approach to confounding • Can control multiple factors (often 10-20 or more) • Can control for continuous variables • Logistic regression is the most popular method for case-control studies • Cox models (proportional hazards models) are often used in cohort studies.

  31. Standardization (1) • Crude prostate cancer incidence rates (fictional): • Canada (2000): 100/100,000 • Canada (1940): 50/100,000 • Does this mean that prostate cancer is twice as common in 2000 (RR = 2.0)? • Yes, the rate is twice as high • BUT: answer is too simplistic if it is taken to mean that people in 2000 are at higher risk of developing prostate cancer.

  32. Standardization (2) • Concern is that the population in Canada is older in 2000 than in 1940. • Prostate cancer incidence increases with age. • Sound familiar? [Diagram: confounding triangle with Age as a confounder of the Calendar time → Prostate cancer association]

  33. Standardization (3) • Changes in the age distribution of the Canadian population could confound any change in incidence over time. • Will make it appear that the population is at higher risk when it really isn’t • This is really a type of confounding. • For historical reasons, this issue is usually taught as a separate topic, often before confounding is introduced. • Approached through direct standardization or age adjustment.

  34. Standardization (4) • Remember stratified analysis? • Divide the sample into strata • Within each stratum, compute the OR/RR/etc. • Produce an average of the strata-specific estimates to adjust for the confounder. • Roughly, the same process is used for direct standardization.

  35. Standardization (5) Direct Standardization • Compute age-specific incidence in each study group. • Select a reference population (can be anything) • Multiply the age-specific incidence by the # of people in the reference population in that age stratum → 'expected' number of cases • Add up the 'expected' numbers and divide by the total size of the reference population → age-adjusted rate for the study group. • Let's look at an example
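The four steps of direct standardization translate directly into code. This minimal sketch uses hypothetical two-stratum data, not the slide's table. Area A is older, and in every age group A's rate is 0.75 times B's rate. As in the example that follows, the reference population is the combined population of both areas. Function names are illustrative.

```python
def age_specific_rates(cases: dict[str, int],
                       population: dict[str, int]) -> dict[str, float]:
    """Step 1: age-specific incidence (cases / people) in one study group."""
    return {age: cases[age] / population[age] for age in population}


def direct_standardized_rate(rates: dict[str, float],
                             reference_pop: dict[str, int]) -> float:
    """Steps 3-4: apply the group's age-specific rates to the reference
    population, add up the expected cases, divide by the reference total."""
    expected = sum(rates[age] * reference_pop[age] for age in reference_pop)
    return expected / sum(reference_pop.values())


# Hypothetical data (not the slide's table): area A is older than area B,
# and in each age group A's rate is 0.75 times B's rate.
cases_a, pop_a = {"young": 6, "old": 480}, {"young": 2000, "old": 8000}
cases_b, pop_b = {"young": 32, "old": 160}, {"young": 8000, "old": 2000}

# Step 2: reference population = combined population of both areas
reference = {age: pop_a[age] + pop_b[age] for age in pop_a}

crude_rr = (sum(cases_a.values()) / sum(pop_a.values())) / (
    sum(cases_b.values()) / sum(pop_b.values()))
adjusted_rr = (direct_standardized_rate(age_specific_rates(cases_a, pop_a), reference)
               / direct_standardized_rate(age_specific_rates(cases_b, pop_b), reference))

print(round(crude_rr, 2), round(adjusted_rr, 2))  # 2.53 0.75
```

The crude RR points the 'wrong' way because A's cases are concentrated in the high-incidence old stratum. The standardized RR recovers the stratum-specific value of 0.75.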

  36. Standardization (6) Crude RR(A:B) = 2.35 • Mean age: Area A = 63.4 yrs., Area B = 49.7 yrs.

  37. Standardization (7) RR in each age stratum (A vs. B) = 0.75, not 2.35

  38. Standardization (8) • Why the difference? • Area 'B' is a lot younger than area 'A'. • Incidence increases with age. • → confounding by age. • Direct standardization • First, select reference population • Use combined population of area 'A' and area 'B'.

  39. Standardization (9)

  40. Standardization (10) Area A adjusted incidence = 1,110/32,000 = 0.0347 Area B adjusted incidence = 1,480/32,000 = 0.0463 The adjusted RR (area A to B) = 0.75
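Both areas are standardized to the same reference population, so the reference denominator cancels. The adjusted RR is therefore just the ratio of the two expected case totals:

```latex
\mathrm{RR}_{\text{adj}} = \frac{1110/32000}{1480/32000} = \frac{1110}{1480} = 0.75
```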

  41. Standardization (11) • Adjustment has rendered the rates comparable by eliminating the confounding due to age. • There are more complex ways of doing this but this approach gives the basic ideas.

  42. Standardization (11) • Does it always work? NO!! • If the rate is higher in one area for younger age groups but lower for older ones, adjustment can give a misleading picture. • Do NOT treat adjusted rates as 'real' rates. • To estimate the burden of illness, you must use unadjusted rates. • What if the group has very few events? • SMR & indirect standardization.

  43. Standardization (12) Indirect Standardization • Used when the study group has few cases, so the age-specific rates will be unstable (subject to wide chance variation). • Does not produce adjusted estimates. • Is used to compare the study population to rates expected based on a large general population or reference population. • Rates are taken from the reference pop (unlike direct standardization). • Main statistic produced is the SMR (standardized mortality ratio)

  44. Standardization (13)

  45. Standardization (14) Indirect Standardization (cont.) • SMR does not depend on the number of observed events in each age stratum. • Only the total # of observed events matters. • That is why it is useful when the number of cases is small. • Interpret an SMR similar to an RR or OR: • < 1.0 → protection • 1.0 → null value (no effect) • > 1.0 → increased risk
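A minimal sketch of indirect standardization, with hypothetical reference rates and a hypothetical small cohort. Expected events come from applying the reference population's age-specific rates to the study group's age structure. The SMR is total observed divided by total expected. Only the total observed count enters the calculation, which matches the point on this slide. Function names are illustrative.

```python
def expected_events(reference_rates: dict[str, float],
                    study_population: dict[str, int]) -> float:
    """Apply reference age-specific rates to the study group's age structure."""
    return sum(reference_rates[age] * n for age, n in study_population.items())


def smr(observed_total: int, reference_rates: dict[str, float],
        study_population: dict[str, int]) -> float:
    """Standardized mortality ratio = total observed / total expected."""
    return observed_total / expected_events(reference_rates, study_population)


def interpret(ratio: float) -> str:
    if ratio < 1.0:
        return "fewer events than expected (protection)"
    if ratio > 1.0:
        return "more events than expected (increased risk)"
    return "null value (no effect)"


# Hypothetical: reference rates per person-year and a small occupational cohort
reference_rates = {"<50": 0.001, "50+": 0.01}
cohort = {"<50": 1000, "50+": 500}
observed = 9  # total deaths in the cohort; no age breakdown is needed

ratio = smr(observed, reference_rates, cohort)
print(round(ratio, 2), interpret(ratio))  # 1.5 more events than expected (increased risk)
```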

  46. Summary • Confounding is a very common problem • Try to prevent it through: • Restriction • Matching • Use statistical methods to adjust for it: • Stratified analysis • Matched analysis • Regression modeling
