
Understanding Variability in Medical Measurement Practice

This lecture explores biological and measurement variation affecting precision and validity in medical data, discussing measures of variation, agreement, and remedies for clinical variability.


Presentation Transcript


  1. EPI-820 Evidence-Based Medicine (EBM) LECTURE 2: MEDICAL MEASUREMENT Mat Reeves BVSc, PhD Department of Epidemiology Michigan State University

  2. Objectives: • 1. Understand biological and measurement variation and their effects on precision and validity. • 2. Understand the components of variability • biological and measurement • between- and within-person/observer • 3. Understand measures of variation and measures of agreement. • 4. Understand the calculation and application of kappa (K). • 5. Understand the consequences of variability in clinical data and possible remedies. • 6. Understand regression to the mean.

  3. I. Variation in Clinical Data • 1. Biologic Variation = variation in the actual entity being measured • derives from the dynamic nature of physiology, homeostasis and pathophysiology. • within-person (intra-person) biologic variability, and • between-person (inter-person) biologic variability

  4. Within-Person (day-to-day) and Between-Person Biological Variation: Coefficient of Variation (%) (see Winkel et al, 1974)
      Variable       CV (Within)    CV (Between)
      Na             0.7%           0.8%
      K              4.3%           4.3%
      Cl             2.1%           1.2%
      Ca             1.7%           2.8%
      BUN            12.3%          16.4%
      Creatinine     4.3%           9.5%
      Cholesterol    5.3%           13.6%
      SGOT (AST)     24.2%          24.8%
      TP             2.9%           5.7%

  5. I. Variation in Clinical Data • 2. Measurement Variation = variation due to the measurement process • inaccuracy of the instrument (instrument error), and/or, • inaccuracy of the person (operator error) • can introduce both random error and bias

  6. Analytical Variation - Coefficient of Variation (%) of Duplicate Samples
      Variable       CV (Analytical)
      Na             1.1%
      K              2.6%
      Cl             2.1%
      Ca             2.1%
      BUN            2.2%
      Creatinine     3.4%
      Cholesterol    3.1%
      SGOT (AST)     7.3%
      TP             1.7%

  7. Validity • Degree to which a measurement process measures what is intended i.e., accuracy. • Lack of systematic error or bias. • A valid instrument will, on average, be close to the underlying true value. • Assessment of validity requires a “gold standard” (a reference).

  8. What if no gold standard? (e.g., pain, nausea or anxiety) • Use instrument or clinical scale to measure a specific phenomenon or construct. • Criterion Validity - the degree to which the scale predicts a directly observable phenomenon e.g. APGAR score and neonatal survival. • Content Validity - the extent to which the instrument includes all of the dimensions of the construct being measured e.g. does APGAR include all relevant patho-physiological parameters? • Construct Validity - the degree to which the scale correlates with other known measures of the phenomenon e.g. how well does a new “Neonatal assessment scale” correlate with APGAR score?

  9. How do you measure validity? • Dichotomous data • sensitivity, specificity, and predictive values. • Continuous data • mean and standard deviation of the difference between surrogate measure and gold standard (see Bland and Altman, 1986).
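
For continuous data, the mean and standard deviation of the differences described on this slide can be sketched as follows. The paired readings are made-up values for illustration and the function name `bland_altman` is an assumption; the 1.96 multiplier gives the usual 95% limits of agreement.

```python
from statistics import mean, stdev

def bland_altman(surrogate, gold):
    """Return (bias, sd_diff, lower_limit, upper_limit) for paired readings."""
    diffs = [s - g for s, g in zip(surrogate, gold)]
    bias = mean(diffs)        # systematic difference (a validity problem)
    sd_diff = stdev(diffs)    # spread of the differences (random error)
    return bias, sd_diff, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff

# Hypothetical paired measurements: gold standard vs. surrogate instrument
gold = [120, 135, 110, 150, 142, 128]
surrogate = [124, 137, 115, 152, 147, 131]

bias, sd_diff, lower, upper = bland_altman(surrogate, gold)
print(f"bias = {bias:.2f}, SD of differences = {sd_diff:.2f}")
print(f"95% limits of agreement: {lower:.2f} to {upper:.2f}")
```

A bias far from zero suggests systematic error; wide limits of agreement suggest the surrogate is imprecise even if it is unbiased on average.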

  10. Precision (or reliability or reproducibility) • the extent that repeated measurements of a phenomenon tend to yield the same results (regardless of their accuracy!). • Precision refers to the lack of random error • Precision ~ 1 / random error

  11. Hard versus Soft Data? • Blood chloride level • Left ventricular ejection volume • Migraine severity • 28-d stroke case-fatality rate • Indirect costs of school absenteeism • Direct costs of school absenteeism • Degree of depression • Alzheimer severity • Self-reported ability to do domestic chores • Self-reported ability to climb stairs • Patient preferences for induced labour • Self-reported assessment of health

  12. Hard versus Soft Data • No specific criteria to define “hard” data, attributes include: • Consistency: the ability to preserve basic evidence (repeated observations are consistent) (most important attribute). • Objectivity: observations are free of subjective influences. • Quantifiable: the ability to express the result as a number.

  13. Hard versus Soft Data • Usually hard data are numeric measures, such as lab data, but not always (e.g., histology, cancer stage) • Are hard (numeric) data preferred to softer (qualitative) measures because they are more objective and reliable? (but see Feinstein AR et al, 1985, Will Rogers phenomenon)

  14. Between and Within Person Variation • Four categories of clinical variability: • 1. Between-person biological variability • 2. Within-person biological variability • 3. Between-observer measurement variability • 4. Within-observer measurement variability

  15. ANOVA Model Conceptualization • $y_{ijkl} = \mu_i + \alpha_{ij} + \beta_{ik} + \varepsilon_{il}$ • where: • $y_{ijkl}$ = the observed measurement for individual i, measured at time j, by the kth observer at the lth replication. • $\mu_i$ = the individual's usual true mean (between-person biological variation). • $\alpha_{ij}$ = perturbation due to biological variation at time j (within-person biologic variation). • $\beta_{ik}$ = perturbation due to measurement error by the kth observer (between-observer measurement variation). • $\varepsilon_{il}$ = perturbation due to measurement error at the lth replication (within-observer measurement variation).
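
A small simulation can make the four-component model on this slide concrete. This is only a sketch: the standard deviations, the population mean and all names are assumed values chosen for illustration, and the components are assumed to be independent and normally distributed, so the total variance should be close to the sum of the four component variances.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical standard deviations for the four components (assumed values)
SD_BETWEEN_PERSON = 10.0    # spread of individuals' usual true means
SD_WITHIN_PERSON = 5.0      # biological fluctuation at the time of measurement
SD_BETWEEN_OBSERVER = 3.0   # measurement error attributable to the observer
SD_WITHIN_OBSERVER = 2.0    # measurement error at a given replication
POPULATION_MEAN = 100.0

def simulate_measurement():
    """One observed value built as the sum of the four components."""
    true_mean = random.gauss(POPULATION_MEAN, SD_BETWEEN_PERSON)
    return (true_mean
            + random.gauss(0, SD_WITHIN_PERSON)
            + random.gauss(0, SD_BETWEEN_OBSERVER)
            + random.gauss(0, SD_WITHIN_OBSERVER))

observed = [simulate_measurement() for _ in range(20000)]

expected_variance = (SD_BETWEEN_PERSON ** 2 + SD_WITHIN_PERSON ** 2
                     + SD_BETWEEN_OBSERVER ** 2 + SD_WITHIN_OBSERVER ** 2)
observed_variance = pvariance(observed)
print(f"sum of component variances: {expected_variance:.1f}")
print(f"variance of simulated data: {observed_variance:.1f}")
```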

  16. II. Statistical aspects of variability • A. Measures of Variation • 1. Variance and Standard Deviation • SD = square root of the average squared difference of individual values from the overall mean. • For normally distributed data, about 68%, 95% and 99.7% of values lie within 1, 2 and 3 SD of the mean. • Example: • Av. US Cholesterol = 220 mg/dl, SD = 15 mg/dl • Indv. readings expected to vary 190-250 mg/dl (mean ± 2 SD, about 95% of readings)
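
The cholesterol example can be checked with a few lines of code. This is a minimal sketch that assumes readings are roughly normally distributed, so that the mean ± 2 SD covers about 95% of values; the function names and the five sample readings are hypothetical.

```python
from math import sqrt

def sample_sd(values):
    """Square root of the average squared deviation from the mean (n - 1 divisor)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def two_sd_range(mean, sd):
    """Interval expected to contain about 95% of normally distributed readings."""
    return mean - 2 * sd, mean + 2 * sd

# Slide example: mean 220 mg/dl, SD 15 mg/dl
low, high = two_sd_range(220, 15)
print(f"expected range: {low}-{high} mg/dl")

# Hypothetical readings, to show the SD calculation itself
readings = [205, 220, 235, 210, 230]
print(f"sample SD = {sample_sd(readings):.2f} mg/dl")
```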

  17. A. Measures of Variation • 2. Coefficient of Variation (CV) • represents the % variation of a set of measurements around their mean • CV = (SD / mean) x 100% • conceptualized as a “noise-to-signal ratio” • useful index for comparing the precision of different instruments, individuals and/or laboratories.

  18. B. Measures of Agreement • 1. Correlation (r) • Pearson product moment correlation and Spearman’s rank correlation • measures the degree of linear relationship between two variables (ranges from -1 to +1) • correlation between two sets of continuous measurements (= reliability) or extent of replication
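
As a quick illustration of comparing precision with the CV, the sketch below assumes the usual definition CV = SD / mean x 100 and uses two made-up sets of repeat readings of the same sample; the instrument names and values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV as a percentage: sample SD divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical repeat measurements of the same sample on two instruments
instrument_a = [98, 100, 102]
instrument_b = [90, 100, 110]

cv_a = coefficient_of_variation(instrument_a)
cv_b = coefficient_of_variation(instrument_b)
print(f"CV instrument A = {cv_a:.1f}%")
print(f"CV instrument B = {cv_b:.1f}%")
```

Because the CV is unit-free, the same comparison works across analytes measured on different scales, which is how the tables earlier in the lecture are read.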

  19. 1. Correlation (Cont’d) • Two observers, same time period = inter-rater reliability. • Single observer, two time periods = intra-rater reliability (test-retest reliability). • Can have very high values of r, but little direct agreement between raters or instruments. • Can only be used as a test of validity if the actual true values are known.
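
The point that r can be very high while direct agreement is poor is easy to demonstrate. In this sketch the readings are invented and observer B is assumed to read exactly 10 units higher than observer A every time.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation computed from first principles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical readings: observer B is always 10 units above observer A
observer_a = [110, 120, 130, 140, 150]
observer_b = [a + 10 for a in observer_a]

r = pearson_r(observer_a, observer_b)
mean_difference = sum(b - a for a, b in zip(observer_a, observer_b)) / len(observer_a)
exact_agreements = sum(a == b for a, b in zip(observer_a, observer_b))

print(f"r = {r:.2f}")
print(f"mean difference = {mean_difference}")
print(f"readings in exact agreement = {exact_agreements}")
```

The correlation is perfect because the relationship is perfectly linear, yet the two observers never record the same value.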

  20. B. Measures of Agreement • 2. Intra-class Correlation Coefficient (R or reliability) • a measure of reliability for continuous or quantitative data • an observed value (X) consists of two parts: • X = T + e • where: • T = the “True” unknown level or “error-free” score or “steady state” or “signal” • e = error (whether “biologic” or “measurement” error) • true error-free value varies about some unknown mean ($\mu$) with a variance of $\sigma^2_T$.

  21. 2. R (Cont’d) • error term is regarded as iid with mean 0 and variance $\sigma^2_e$. • Variance of X: $\sigma^2_X = \sigma^2_T + \sigma^2_e$ • relative size of error variance ($\sigma^2_e$) in relation to variance of true value ($\sigma^2_T$) is a measure of the imprecision. • $R = \sigma^2_T / (\sigma^2_T + \sigma^2_e)$ • R = the proportion of the total variance due to subject-to-subject (or between-person) variability in the “true” value. • As random error decreases, the value of R increases
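
The behaviour of R as the error variance shrinks can be sketched directly from its definition as true-value variance over total variance. The variance values below are assumptions chosen for illustration, and `reliability` is a hypothetical helper name.

```python
def reliability(var_true, var_error):
    """R = true-value variance as a proportion of total (true + error) variance."""
    return var_true / (var_true + var_error)

VAR_TRUE = 9.0  # assumed between-person variance of the true value

# Shrinking error variance while the true-value variance stays fixed
error_variances = (9.0, 3.0, 1.0)
r_values = [reliability(VAR_TRUE, var_error) for var_error in error_variances]

for var_error, r in zip(error_variances, r_values):
    print(f"error variance = {var_error}: R = {r:.2f}")
```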

  22. 3. Categorical data – Kappa (K) • A measure of reliability for categorical or qualitative data. • Kappa corrects for the degree of chance in the overall level of agreement, and is preferred over other measures (like overall percent agreement). • K = (Po - Pe) / (1 - Pe) = (actual agreement beyond chance) / (potential agreement beyond chance) • Po = the total proportion of observations on which there is agreement • Pe = the proportion of agreement expected by chance alone.

  23. Agreement matrix for kappa statistic (inter-rater agreement, 2 observers, dichotomous data)
                         OBSERVER B
      OBSERVER A     Yes     No      TOTALS
      Yes            a       b       f1
      No             c       d       f2
      TOTALS         n1      n2      N

  24. Agreement matrix for kappa statistic (2 observers, dichotomous data)
                         OBSERVER B
      OBSERVER A     Yes     No      TOTALS
      Yes            69      15      84
      No             18      48      66
      TOTALS         87      63      150

  25. K (Cont’d) • Observed agreement (Po) = 78% • (69 + 48)/150 = 0.78 or 78%. • Agreement expected due to chance (Pe) = 51%. • Calculated from the products of the marginal totals for cells a and d: (87 x 84)/150 = 48.72 and (63 x 66)/150 = 27.72 • Then divide the sum (76.44) by 150 to get Pe = 0.51 or 51%.

  26. K (Cont’d) • K = (Po - Pe) / (1 - Pe) = (0.78 - 0.51) / (1 - 0.51) = 0.27 / 0.49 = 0.55 or 55% • Kappa varies from -1 to +1, with a value of zero denoting agreement no better than chance (negative values denote agreement worse than chance!) • Value of K and strength of agreement:
      Value of K      Strength of agreement
      <0              Poor
      0 - 0.20        Slight
      0.21 - 0.40     Fair
      0.41 - 0.60     Moderate
      0.61 - 0.80     Substantial
      0.81 - 1.0      Almost perfect
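
The kappa calculation for the 2 x 2 agreement matrix (a = 69, b = 15, c = 18, d = 48) can be reproduced in a few lines. This is a sketch: the function names are hypothetical, and the strength labels simply follow the cut-offs listed on the slide.

```python
def cohen_kappa(a, b, c, d):
    """Return (Po, Pe, kappa) for a 2x2 agreement table with cells a, b, c, d."""
    n = a + b + c + d
    po = (a + d) / n                                        # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    return po, pe, (po - pe) / (1 - pe)

def strength(kappa):
    """Map a kappa value to the strength-of-agreement label used on the slide."""
    if kappa < 0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost perfect"

po, pe, kappa = cohen_kappa(69, 15, 18, 48)
print(f"Po = {po:.2f}, Pe = {pe:.2f}, kappa = {kappa:.2f} ({strength(kappa)})")
```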

  27. K - Issue of Prevalence • The prevalence of the condition affects the likelihood that observers will agree purely due to chance - hence the importance of using kappa. • Example: • Observer A classified 120/150 patients as “Yes” • Observer B classified 130/150 patients as “Yes” • Pe is now 72%.

  28. K - More Complicated Scenarios • Overall (summary) kappa: • several observers or raters and/or where the subjects are classified into several different categories. • Weighted kappa: • measuring the relative degree of disagreement when subjects are classified into several ordinal categories (e.g., normal, slightly abnormal and very abnormal). • Maclure and Willett (1987): • Use kappa for dichotomous data or nominal polytomous data only. • For ordinal data use either Spearman’s rank correlation or R.
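
The effect of prevalence on chance agreement can be checked numerically. The sketch assumes the counts on this slide are the numbers of patients each observer placed in the same category (here called "yes"); the observed agreement of 0.80 is a hypothetical value added only to show how kappa responds.

```python
def chance_agreement(yes_a, yes_b, n):
    """Pe from the two observers' marginal 'yes' counts out of n patients."""
    no_a, no_b = n - yes_a, n - yes_b
    return (yes_a * yes_b + no_a * no_b) / n ** 2

# Marginals from the earlier agreement matrix (84 and 87 of 150)
pe_balanced = chance_agreement(84, 87, 150)
# Marginals from the prevalence example (120 and 130 of 150)
pe_skewed = chance_agreement(120, 130, 150)

# Hypothetical observed agreement, to show how little is left beyond chance
observed_po = 0.80
kappa_skewed = (observed_po - pe_skewed) / (1 - pe_skewed)

print(f"Pe with balanced marginals = {pe_balanced:.2f}")
print(f"Pe with skewed marginals   = {pe_skewed:.2f}")
print(f"kappa at Po = 0.80         = {kappa_skewed:.2f}")
```

With skewed marginals, 80% raw agreement translates into only fair agreement beyond chance.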

  29. IV. Consequences of variability of clinical data • A. Clinical impact • Errors in diagnosis, prognosis and even treatment. • Clinical disagreement between clinicians. • B. Research Impact • Between-person biological variability is a prerequisite for etiologic studies. • Random within-person variability (a form of unreliability) results in non-differential misclassification - with a resulting dilution or attenuation of effect.

  30. B. Research impact • Generally, imprecision has less impact in the research setting than in the individual clinical setting because one can average over a large number of observations (but the measure must still be valid). • Variability and misclassification result in the need for larger sample sizes (and increased costs). • Measurement errors can introduce bias if they do not occur at random - differential misclassification

  31. Regression Dilution Bias • Example: MacMahon et al., (1990) • imprecision resulting from a single measurement of diastolic blood pressure resulted in a 60% attenuation of relative risks (RRs) (for the effect of elevated blood pressure on stroke and MI). • “regression dilution bias”.
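
Regression dilution can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the MacMahon analysis: a linear outcome rather than a relative risk, normally distributed values, and made-up variances. Under the classical measurement error model, the slope estimated from a single noisy reading is expected to shrink by the reliability of that reading.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

TRUE_SLOPE = 2.0
N = 20000

# "Usual" exposure level for each person (assumed variance 10**2 = 100)
usual = [random.gauss(80, 10) for _ in range(N)]
# One noisy reading per person (assumed error variance 10**2 = 100)
single = [u + random.gauss(0, 10) for u in usual]
# Outcome depends on the usual level, not on the noisy reading
outcome = [TRUE_SLOPE * u + random.gauss(0, 5) for u in usual]

slope_usual = slope(usual, outcome)
slope_single = slope(single, outcome)
reliability = 100 / (100 + 100)  # true variance / (true + error variance)

print(f"slope using usual level:   {slope_usual:.2f}")
print(f"slope using single reading: {slope_single:.2f}")
print(f"true slope x reliability:   {TRUE_SLOPE * reliability:.2f}")
```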

  32. C. Regression towards the mean • Group of individuals selected based on the results of an “abnormal” test can be divided into: • a) those with a true underlying abnormal value, and • b) those with a true underlying normal value (but random fluctuations resulted in an outlying [abnormal] value). • On retesting, patients in group b are closer to their typical (normal) values, so, the overall mean is less extreme (= regression to the mean). • Occurs when repeated observations are performed on a variable that is inherently variable.

  33. C. RTTM • Often interpreted as a sign of clinical improvement, regardless of effectiveness of treatment (an important explanation for the placebo effect) • If the first reading is d units higher than the true value ($\mu$), then on average, the next value will be closer to the mean by d(1 - r) units, • where r is the correlation between the two measurements • RTTM increases if d is large and r is small. • RTTM describes the average behaviour of a group, not necessarily of individuals!!
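
The d(1 - r) rule can be turned into a short calculation. In this sketch the true mean, first reading and correlation are hypothetical numbers, and the result is the expected average for a group selected at that first reading, not a prediction for any one patient.

```python
def expected_regression(d, r):
    """Average amount by which a repeat reading moves back toward the mean."""
    return d * (1 - r)

def expected_second_reading(true_mean, first_reading, r):
    """Expected repeat value given the first reading and test-retest correlation r."""
    d = first_reading - true_mean
    return first_reading - expected_regression(d, r)

# Hypothetical example: true mean 90, first reading 110, correlation 0.7
second = expected_second_reading(90, 110, 0.7)
print(f"expected repeat reading: {second:.1f}")

# A lower correlation between readings gives more regression to the mean
print(f"with r = 0.4: {expected_second_reading(90, 110, 0.4):.1f}")
```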

  34. V. Remedies for variability of clinical data • A. Within-person biologic variation • Standardized measurements: use a standard protocol i.e., time of day, body position etc. • Average repeated tests e.g., take several blood pressure readings. • Use a less variable test e.g., for diabetes use glycosylated Hb, rather than blood glucose. • Plot the data - what is the trend? • Develop reference values for each individual - especially if: • within-person variability <<< between-person variability • this results in a wide population reference range which makes it difficult to identify individual deviations • e.g., body weight, PSA, EKG

  35. B. Measurement Error • Measurement imprecision is corrected by adjusting the machine or re-training the tester (or averaging several values?). • Measurement error that causes bias requires quality assurance testing. Fix by re-calibration (don’t average!!).
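
Why averaging helps with imprecision but not with bias can be shown with two one-line helpers. This sketch assumes independent readings with the same SD, so the SD of their average falls with the square root of the number of readings, while a constant bias carries through unchanged; the numbers are hypothetical.

```python
from math import sqrt

def sd_of_mean(sd_single, n_readings):
    """SD of the average of n independent readings with the same SD."""
    return sd_single / sqrt(n_readings)

def expected_average(true_value, bias):
    """A constant bias shifts every reading, so it shifts the average too."""
    return true_value + bias

# Hypothetical blood pressure readings: SD 8 mmHg, cuff reads 5 mmHg high
for n in (1, 4, 16):
    print(f"n = {n:2d}: SD of average = {sd_of_mean(8, n):.1f} mmHg, "
          f"expected average = {expected_average(120, 5)} mmHg")
```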

  36. Sackett - Six strategies for preventing or minimizing clinical disagreements • 1. Match diagnostic environment to the diagnostic task. • 2. Corroborate key findings by: • repeating observations and questions • confirming information with other sources (e.g., family members) • confirming key findings using appropriate diagnostic tests • seeking confirmation from “blinded” colleagues • 3. Report actual findings, then report inference. • 4. Use appropriate technical aids to avoid imprecision (e.g., ruler). • 5. “Blinded” assessments of diagnostic findings. • 6. Apply skills of social sciences (establish understanding, follow a logical order, listen, observe, interrupt only where necessary).
