
Bias and Confounding in Information Accuracy

Explore the concepts of bias and confounding in data collection, analysis, and interpretation, focusing on information accuracy. Learn how misclassification and recall bias can impact study results and techniques to reduce information bias.

Presentation Transcript


  1. Precision and Validity: Information Bias. Dr. Jørn Olsen, Epi 200B, January 21 and 26, 2010

  2. Bias and confounding (Last, Dictionary). Bias: Deviation of results or inference from truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.

  3. Bias and confounding (Last, Dictionary). Confounding: A situation in which the effects of two processes are not separated. Confounder, confounding factor, confounding variable: poor terms, because confounding is study specific. No variables are always confounders.

  4. Dictionary; IEA/Last: • Information bias (observational bias): A flaw in measuring exposure or outcome data that results in different quality (accuracy) of information between comparison groups.

  5. Information Bias and Other Method Problems • Information: exposures, end points, confounders, modifiers • For discrete variables: classification error/misclassification • Differential/non-differential information bias

  6. Data accuracy • Data are almost never 100% accurate • Coding errors, measurement errors • We ask questions that cannot be answered correctly, e.g. exposure to ETS (environmental tobacco smoke) last year

  7. Non-differential: does not depend upon the value of other variables. Example: diagnosing has the same sensitivity and specificity among exposed and non-exposed, or exposure is reported with the same sensitivity and specificity among cases and controls.

  8. Non-differential misclassification better than differential • Non-differential misclassification can often be achieved in follow-up studies • Exposures are recorded prior to disease occurrence • Diseases may be recorded by doctors who do not ask about exposures

  9. Recall bias: misclassification of the exposure. A serious problem in case-control studies or cross-sectional studies based upon recall.

  10. Recall bias: Hungarian case-control surveillance of congenital abnormalities (Epidemiology 2001; 12: 461-66). Drug use was recorded in two ways: self-reported data (interview, memory aids) and the log-book (medicine prescribed by ANC doctors). Sensitivity = a/(a+c); specificity = d/(b+d), where a, b, c and d are the cells of the 2×2 table comparing the two sources.
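The two formulas on this slide can be written as a small function. This is only a sketch: the slide's 2×2 table is not part of the transcript, so the cell layout (a = positive in both sources, b = positive only in the source being evaluated, c = positive only in the reference source, d = negative in both) is the conventional one and is assumed here, and the counts are hypothetical.

```python
def sensitivity_specificity(a, b, c, d):
    """Return (sensitivity, specificity) from the cells of a 2x2 table.

    Assumed layout (not shown in the transcript):
      a = evaluated source positive, reference positive
      b = evaluated source positive, reference negative
      c = evaluated source negative, reference positive
      d = evaluated source negative, reference negative
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec = sensitivity_specificity(a=40, b=10, c=60, d=890)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.3f}")
```

Recall bias would show up as sensitivity or specificity that differs between mothers of cases and mothers of controls, so in practice the function would be applied to each group separately.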

  11. A low sensitivity is expected if mothers provide a complete recall since only ANC prescribed drugs are in the log book.

  12. Short-term drugs

  13. Long-term drugs

  14. What to do to reduce differential information bias? • Use blinding if possible: "blind till it hurts" (Cochrane). • Use of hospital controls may, in some cases, help to reduce information bias. • The disease used to identify the comparison group must NOT be associated with the exposure under study (must not be a cause or a preventive factor).

  15. For case-control studies • First study is important • No disclosure of study hypothesis • Use biomarkers of exposure if possible • Use secondary data collected prior to the disease • Use neutral interviewers

  16. Differential misclassification of the endpoint: sometimes a problem in follow-up studies

  17. Is this follow-up study vulnerable to differential misclassification of DVT?

  18. Follow-up studies are usually less vulnerable to differential recall bias because the exposure is recorded before the end point, but knowing the hypothesis may introduce bias if the exposure is a suspected cause of the disease under study. Blind the clinicians, if possible.

  19. It is often stated that non-differential misclassification leads to bias towards no association (RR = IRR = OR = 1, RD = IRD = 0). The first argument for this was provided by Bross in the 1950s. Non-differential misclassification is not the same as random misclassification (random is only non-differential in the long run). Random misclassification (blinding) can be very differential by chance in a small study.

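  20. $P$ = proportion of smokers: $P_l$ and $P_r$, where $l$ = lung cancer and $r$ = reference.

  21. $TP = P \times \text{sens}$; $FN = P \times (1 - \text{sens})$; $FP = (1 - P) \times (1 - \text{spec})$; $TN = (1 - P) \times \text{spec}$

  22. If we take interest in the difference between $P_l$ and $P_r$: $D = P_l - P_r$ (normally we would take an interest in, for example, exposure odds).

  23. We are only able to observe the misclassified proportions $\hat{P}_l$ and $\hat{P}_r$, and hence $\hat{D} = \hat{P}_l - \hat{P}_r$. In case of non-differential misclassification, $FP_l = FP_r = FP$ and $FN_l = FN_r = FN$.

  24. Then $\hat{D} = D\,(1 - (FN + FP))$ (check it out), with $FN$ and $FP$ taken as the error rates $1 - \text{sens}$ and $1 - \text{spec}$. Meaning:
     • $\hat{D} \neq D$ if $FN + FP > 0$ (sens + spec < 2)
     • $FN + FP < 1.0$: $|\hat{D}| < |D|$ (but same sign)
     • $FN + FP = 1.0$: $\hat{D} = 0$ (like flipping a coin)
     • $FN + FP = 2$: $\hat{D} = -D$ (coding!)
     Also true for ORs.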
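The "check it out" on slide 24 can be done numerically. The sketch below assumes that FN and FP in that formula are the error rates 1 − sens and 1 − spec, and it builds the observed proportion as true positives plus false positives, following slide 21. The proportions of smokers (0.8 and 0.5) are hypothetical.

```python
def observed_proportion(p, sens, spec):
    """Proportion classified as exposed: true positives plus false positives."""
    return p * sens + (1 - p) * (1 - spec)


def observed_difference(p_l, p_r, sens, spec):
    """Observed difference under non-differential misclassification
    (same sens and spec in the lung cancer and reference groups)."""
    return observed_proportion(p_l, sens, spec) - observed_proportion(p_r, sens, spec)


# Hypothetical true proportions of smokers among cases and references.
p_l, p_r = 0.8, 0.5
true_d = p_l - p_r

for sens, spec in [(0.9, 0.8), (0.5, 0.5), (0.0, 0.0)]:
    fn_rate, fp_rate = 1 - sens, 1 - spec
    d_hat = observed_difference(p_l, p_r, sens, spec)
    predicted = true_d * (1 - (fn_rate + fp_rate))
    print(f"FN+FP = {fn_rate + fp_rate:.1f}: observed D = {d_hat:+.3f}, "
          f"D(1-(FN+FP)) = {predicted:+.3f}")
```

The three settings correspond to the attenuated, coin-flip and reversed-coding cases listed on the slide.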

  25. Non-differential misclassification of a dichotomous variable will, in most cases, bias values towards no association (but there are other sources of error in a study, and the combined effect may be away from the null). Non-differential misclassification of a variable with more than two categories can cause bias away from the null, but mainly in rather unusual situations. Misclassification of a confounder can cause bias in any direction.

  26. When estimating relative effect measures, a high specificity is wanted. True cohort data. If sensitivity is 0.8 but specificity is 1:

  27. If sensitivity is 1 but specificity is 0.80

  28. If sensitivity is 0.8 and specificity is 0.9

  29. The corresponding case-cohort studies would produce the following (similar) results (if done right in this situation as a case-cohort study).

  30. The corresponding case-cohort studies would produce the following (similar) results

  31. If we get a reference pathologist to eliminate all FP cases, we would get (for the last table)
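The tables for slides 26 to 31 are not included in this transcript. As a stand-in, the sketch below applies the same three sensitivity/specificity settings to a hypothetical cohort (1000 exposed with risk 0.2, 1000 unexposed with risk 0.1, so the true RR is 2). The cohort numbers are assumptions made for illustration and are not the slides' data.

```python
def observed_rr(n_exp, risk_exp, n_unexp, risk_unexp, sens, spec):
    """Risk ratio after non-differential misclassification of the outcome."""

    def observed_cases(n, risk):
        true_cases = n * risk
        # Detected true cases plus non-cases wrongly labelled as cases.
        return true_cases * sens + (n - true_cases) * (1 - spec)

    risk_exp_obs = observed_cases(n_exp, risk_exp) / n_exp
    risk_unexp_obs = observed_cases(n_unexp, risk_unexp) / n_unexp
    return risk_exp_obs / risk_unexp_obs


# Hypothetical cohort: true RR = 0.2 / 0.1 = 2.
cohort = dict(n_exp=1000, risk_exp=0.2, n_unexp=1000, risk_unexp=0.1)

for sens, spec in [(0.8, 1.0), (1.0, 0.8), (0.8, 0.9)]:
    rr = observed_rr(**cohort, sens=sens, spec=spec)
    print(f"sens = {sens}, spec = {spec}: observed RR = {rr:.2f}")
```

With perfect specificity the missed cases are lost in the same proportion in both groups, so the ratio is preserved. Once false positives are added to both numerators, the ratio moves towards 1. Removing all false positives, as the reference pathologist does on slide 31, brings the last setting back to the first one.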

  32. Adjusting for misclassification is possible if sens and spec are known

  33. Example: sens = 0.44, spec = 0.94, based upon comparison with the "Golden Standard" (clinical diagnosing)

  34. Exp P (M) = (350/1777 + 0.94 − 1) / (0.44 + 0.94 − 1) = 0.360 (640 with the disease). Exp P (F) = (277/2064 + 0.94 − 1) / (0.44 + 0.94 − 1) = 0.195 (403 with the disease). Ratio of the corrected proportions (M/F) = 0.360/0.195 = 1.85. In case of differential misclassification, use sex-specific sens and spec.
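The correction used on this slide is (observed proportion + spec − 1) / (sens + spec − 1). A minimal sketch that reproduces the slide's figures follows; the function name is made up for illustration, and the same sens and spec are assumed for both sexes, as on the slide.

```python
def corrected_proportion(observed, sens, spec):
    """Back-calculate the true proportion from a misclassified one.

    Only meaningful when sens + spec > 1.
    """
    return (observed + spec - 1) / (sens + spec - 1)


SENS, SPEC = 0.44, 0.94

p_m = corrected_proportion(350 / 1777, SENS, SPEC)
p_f = corrected_proportion(277 / 2064, SENS, SPEC)

print(f"M: corrected proportion = {p_m:.3f}, about {p_m * 1777:.0f} with the disease")
print(f"F: corrected proportion = {p_f:.3f}, about {p_f * 2064:.0f} with the disease")
print(f"ratio M/F: observed = {(350 / 1777) / (277 / 2064):.2f}, corrected = {p_m / p_f:.2f}")
```

The observed ratio is about 1.47, so in this example the correction moves the estimate away from 1, which is the expected direction when the misclassification is non-differential.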

  35. Misclassification of a confounder may bias a result in any direction (Greenland & Robins. Am J Epidemiol 1985:122;495-506) Let this be the true data:

  36. The confounder has an effect (OR=2). The exposure has no effect (OR=1).

  37. Now assume exposure and disease status are recorded without error. Only the confounder is non-differentially misclassified (sens=0.8 and spec=0.9). We then get:

  38. When stratifying on the confounder: true data

  39. When stratifying on the confounder: misclassified data
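The tables for slides 35 to 39 are missing from this transcript, so the sketch below uses hypothetical counts built to match the stated setup: the confounder has OR = 2, the exposure has OR = 1 within each confounder stratum, and only the confounder is misclassified (sens = 0.8, spec = 0.9, independent of exposure and disease). The counts are not those of Greenland and Robins.

```python
# Hypothetical true data: counts[(confounder, exposure)] = (cases, non_cases).
TRUE_COUNTS = {
    (1, 1): (200, 900),
    (1, 0): (50, 225),
    (0, 1): (25, 225),
    (0, 0): (100, 900),
}


def odds_ratio(exposed, unexposed):
    """Odds ratio from (cases, non_cases) pairs for exposed and unexposed."""
    (a, b), (c, d) = exposed, unexposed
    return (a * d) / (b * c)


def misclassify_confounder(counts, sens, spec):
    """Expected counts when the confounder is misclassified non-differentially."""
    observed = {}
    for e in (0, 1):
        with_c, without_c = counts[(1, e)], counts[(0, e)]
        observed[(1, e)] = tuple(sens * x1 + (1 - spec) * x0
                                 for x1, x0 in zip(with_c, without_c))
        observed[(0, e)] = tuple((1 - sens) * x1 + spec * x0
                                 for x1, x0 in zip(with_c, without_c))
    return observed


def stratum_or(counts, c):
    """Exposure-disease odds ratio within one confounder stratum."""
    return odds_ratio(counts[(c, 1)], counts[(c, 0)])


def crude_or(counts):
    """Exposure-disease odds ratio ignoring the confounder."""
    exposed = tuple(counts[(1, 1)][i] + counts[(0, 1)][i] for i in range(2))
    unexposed = tuple(counts[(1, 0)][i] + counts[(0, 0)][i] for i in range(2))
    return odds_ratio(exposed, unexposed)


misclassified = misclassify_confounder(TRUE_COUNTS, sens=0.8, spec=0.9)
print(f"crude OR: {crude_or(TRUE_COUNTS):.2f}")
for c in (1, 0):
    print(f"confounder = {c}: true-stratum OR = {stratum_or(TRUE_COUNTS, c):.2f}, "
          f"misclassified-stratum OR = {stratum_or(misclassified, c):.2f}")
```

In this constructed example, stratifying on the true confounder removes the confounding completely, while stratifying on the misclassified confounder leaves stratum-specific odds ratios between 1 and the crude value (residual confounding). Other data configurations can push the bias in other directions, which is the point of slide 35.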

  40. Misclassification is likely if we ask for sensitive data (alcohol intake), if we ask for data that cannot be easily recalled (diet), if the relevant time window is short (teratology), if we give little attention to the data collection, or perhaps if we give too much attention to the data collection.

  41. Regression towards the mean. Misclassification arises for a group of people because we oversample large random errors; this selection leads to misclassification. Measured IQ: $\widehat{IQ} = IQ + \varepsilon$. $\sum \varepsilon = 0$ for all in the study, but not for those selected from extreme parts of the distribution ($\sum \varepsilon > 0$ for those selected from the top). Their measured IQs may be unusual because their IQs are unusual, because their measurement errors were large, or both. In a new round of measuring IQ one would expect $\sum \varepsilon$ to be zero (or at least closer to 0).
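A small simulation can illustrate the argument. All parameters below are assumptions chosen for illustration (true IQ with mean 100 and SD 15, measurement error with SD 5, selection of everyone measured above 130); none of them come from the slides.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=20000, cutoff=130, seed=1):
    """Measure IQ twice with independent errors; select on the first measurement."""
    rng = random.Random(seed)
    true_iq = [rng.gauss(100, 15) for _ in range(n)]
    first = [iq + rng.gauss(0, 5) for iq in true_iq]   # measured = true + error
    second = [iq + rng.gauss(0, 5) for iq in true_iq]  # new, independent errors
    selected = [i for i in range(n) if first[i] > cutoff]
    return {
        "n_selected": len(selected),
        "mean_error_all": mean(f - t for f, t in zip(first, true_iq)),
        "mean_error_selected": mean(first[i] - true_iq[i] for i in selected),
        "mean_first_selected": mean(first[i] for i in selected),
        "mean_second_selected": mean(second[i] for i in selected),
    }


result = simulate_regression_to_mean()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The mean error is close to zero in the whole sample but positive among those selected for a high first measurement, and their second measurement is on average lower even though nothing about them has changed.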

  42. Regression towards the mean comes in many different forms. Assume you want to predict PTB and collect data on a number of potential risk factors. • You select the risk factors that have the highest RR and claim you can predict 60% of PTB using these markers. When you apply these 'predictors' in a new data source, you are in for a disappointment. Why?

  43. Misclassification has an impact on estimates of effect sizes and on power. A smaller study with better quality data may be preferable to a large study with poor quality data. Use blinding to avoid differential misclassification. Estimate misclassification/repeated measures.

  44. Capture-recapture to estimate completeness of recording (the degree of underreporting). If you have two different data sources (parental reporting of febrile seizures and hospitalizations for febrile seizures), you may be able to estimate the actual coverage of these data sources.

  45. The arguments come from biologists and go like this: You want to know the number of salmon in a given lake. You can empty the lake and count all the salmon, or:
     1. You catch some salmon (M1) in the lake, give them a mark and throw them back into the lake.
     2. You make another catch of salmon (M2) and note how many had the mark, i.e. were caught in the first catch (M3).
     3. Now you know M1, M2 and M3, and you are ready to estimate the total number of salmon in the lake, N.

  46. $P_1$ (first catch) $= M_1/N$; $P_2$ (second catch) $= M_2/N$. $M_3 = N \times P_1 \times P_2 = N \times \frac{M_1}{N} \times \frac{M_2}{N} = \frac{M_1 \times M_2}{N}$, so $N = \frac{M_1 \times M_2}{M_3}$.

  47. Say, in our study, we had parental reports for 100 children with FS and hospital reports for 75. If 50 were registered with FS in both places, our estimate of the total number of children with FS in the study would be (100 x 75)/50 = 150.
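The estimate on slides 46 and 47 (often called the Lincoln-Petersen estimator) is short enough to write as a function. The sketch assumes, as the derivation does, that being captured by one source is independent of being captured by the other; the function name is made up for illustration.

```python
def capture_recapture_estimate(m1, m2, m3):
    """Estimate the total N from two sources: N = M1 x M2 / M3.

    m1, m2: numbers found by each source; m3: number found by both.
    """
    if m3 <= 0:
        raise ValueError("the two sources must overlap (M3 > 0)")
    return m1 * m2 / m3


# Febrile seizures example from slide 47.
parental, hospital, both = 100, 75, 50
n_hat = capture_recapture_estimate(parental, hospital, both)

print(f"estimated total: {n_hat:.0f}")
print(f"coverage of parental reports: {parental / n_hat:.0%}")
print(f"coverage of hospital records: {hospital / n_hat:.0%}")
```

Dividing each source's count by the estimated total gives the coverage that slide 44 refers to.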
