Clinical Research:

Presentation Transcript


  1. Clinical Research: Sample Measure (Intervene) Analyze Infer

  2. “A study can only be as good as the data . . .” (J.M. Bland). That is, no matter how brilliant your study design or analytic skills, you can never overcome poor measurements.

  3. Understanding Measurement: Aspects of Reproducibility and Validity • Reproducibility vs validity • Impact of reproducibility on validity & statistical precision • Assessing reproducibility of interval scale measurements • within-subject standard deviation • coefficient of variation • (Next week’s Section: assessing validity of interval scale measurements)

  4. Measurement Scales

  5. Reproducibility vs Validity • Reproducibility • the degree to which a measurement provides the same result each time it is performed on a given subject or specimen • less-than-perfect reproducibility is caused by random error • Validity • from the Latin validus, “strong” • the degree to which a measurement truly measures (represents) what it purports to measure (represent) • less-than-perfect validity is caused by systematic error

  6. Reproducibility vs Validity • Reproducibility • aka: reliability, repeatability, precision, variability, dependability, consistency, stability • “Reproducibility” is most descriptive term: “how well can a measurement be reproduced” • Validity • aka: accuracy

  7. Vocabulary for Error

  8. Reproducibility and Validity of a Measurement • Consider having 5 replicates • [Figure panels: Good Reproducibility, Poor Validity | Poor Reproducibility, Good Validity]

  9. Reproducibility and Validity of a Measurement • [Figure panels: Good Reproducibility, Good Validity | Poor Reproducibility, Poor Validity]

  10. Why Care About Reproducibility? Impact on Validity of Inferences Derived from Measurement (and later: Impact on Precision of Inferences) • Consider a study of height and basketball shooting ability: • Assume the height measurement has imperfect reproducibility • Imperfect reproducibility means that if we measure height twice on a given person, most of the time we get two different values; at least 1 of the 2 values must be wrong (imperfect validity) • If the study measures everyone only once, the errors, despite being random, will lead to biased inferences when these measurements are used (i.e. the inferences lack validity)
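
One concrete form of this bias is regression dilution. The sketch below is a hypothetical simulation, not from the slides: shooting score is assumed to depend on true height with slope 1.0, and height is measured once with random, unbiased error. Regressing on the error-prone height shrinks the slope toward roughly $\sigma_T^2 / (\sigma_T^2 + \sigma_E^2)$. All SDs and the seed are illustrative assumptions.

```python
import random
import statistics

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n = 20_000
sd_true, sd_error = 10.0, 5.0  # hypothetical SDs (cm) of true height and of measurement error

true_height = [rng.gauss(175.0, sd_true) for _ in range(n)]
# Hypothetical outcome: depends on TRUE height with slope 1.0, plus unrelated noise
score = [1.0 * h + rng.gauss(0.0, 10.0) for h in true_height]
# Everyone measured only once, with purely random (mean-zero) error
observed_height = [h + rng.gauss(0.0, sd_error) for h in true_height]

slope_true = ols_slope(true_height, score)
slope_observed = ols_slope(observed_height, score)
expected_attenuation = sd_true**2 / (sd_true**2 + sd_error**2)  # 100 / 125 = 0.8

print(round(slope_true, 3), round(slope_observed, 3), expected_attenuation)
```

The error is random, yet the estimated slope is systematically too small: random error in a single measurement produces a biased inference.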

  11. Impact of Reproducibility on Statistical Precision • Classical Measurement Theory: observed value (O) = true value (T) + measurement error (E), i.e. $O = T + E$ • If we assume E is random and normally distributed: $E \sim N(0, \sigma_E^2)$ • [Figure: histogram of the error distribution; x-axis: error, y-axis: fraction]

  12. Impact of Reproducibility on Statistical Precision • Assume: observed value (O) = true value (T) + measurement error (E), with E random and $E \sim N(0, \sigma_E^2)$ • Then, when measuring a group of subjects, the variability of observed values ($\sigma_O^2$) is a combination of the variability in their true values ($\sigma_T^2$) and the variability in the measurement error ($\sigma_E^2$), provided the error is independent of the true value: $\sigma_O^2 = \sigma_T^2 + \sigma_E^2$
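
A quick numerical check of the variance decomposition, as a minimal sketch under assumed values: true values with SD 8 and independent random error with SD 6 (both hypothetical). The observed variance should come out close to $8^2 + 6^2 = 100$.

```python
import random
import statistics

rng = random.Random(42)
n = 50_000
sigma_T, sigma_E = 8.0, 6.0  # hypothetical SDs: true values and measurement error

T = [rng.gauss(100.0, sigma_T) for _ in range(n)]  # true values
E = [rng.gauss(0.0, sigma_E) for _ in range(n)]    # random error, drawn independently of T
O = [t + e for t, e in zip(T, E)]                  # observed = true + error

var_O = statistics.pvariance(O)
var_T_plus_var_E = statistics.pvariance(T) + statistics.pvariance(E)
print(round(var_O, 1), round(var_T_plus_var_E, 1), sigma_T**2 + sigma_E**2)
```

If E were correlated with T, the simple sum would gain a covariance term $2\,\mathrm{cov}(T, E)$. That is why the independence assumption matters.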

  13. Why Care About Reproducibility? $\sigma_O^2 = \sigma_T^2 + \sigma_E^2$ • More measurement error means more variability in observed measurements • e.g. measure height in a group of subjects: • if no measurement error • if measurement error • [Figure: distribution of observed height measurements with and without measurement error; x-axis: height, y-axis: frequency]

  14. More variability of observed measurements has important influences on statistical precision/power: $\sigma_O^2 = \sigma_T^2 + \sigma_E^2$ • Descriptive studies: wider confidence intervals • Analytic studies (observational/RCTs): power to detect an exposure (treatment) difference is reduced • [Figure: two pairs of panels, truth vs. truth + error]
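
To see the cost in an analytic study, the sketch below uses the standard normal-approximation sample-size formula for comparing two means, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma_O^2 / \delta^2$ per group. The formula is not in the slides, and the SDs and target difference are hypothetical. Because $n$ scales with $\sigma_O^2$, adding error variance inflates the required sample size by $\sigma_O^2 / \sigma_T^2$.

```python
import math
from statistics import NormalDist

def n_per_group(sigma_obs, delta, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`
    with a two-sided, two-sample z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma_obs**2 / delta**2)

sigma_T, sigma_E, delta = 8.0, 6.0, 4.0  # hypothetical values
n_no_error = n_per_group(sigma_T, delta)
n_with_error = n_per_group(math.sqrt(sigma_T**2 + sigma_E**2), delta)
print(n_no_error, n_with_error)
```

With these assumed numbers, measurement error raises the requirement from about 63 to about 99 subjects per group.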

  15. Mathematical Definition of Reproducibility • Reproducibility $R = \frac{\sigma_T^2}{\sigma_O^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$ • Varies from 0 (poor) to 1 (optimal) • As $\sigma_E^2$ approaches 0 (no error), reproducibility approaches 1
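
This sketch assumes the usual classical-measurement-theory definition $R = \sigma_T^2 / (\sigma_T^2 + \sigma_E^2)$, which matches the properties listed (bounded by 0 and 1, and reaching 1 when $\sigma_E^2 = 0$). The variance values are hypothetical.

```python
def reproducibility(var_true, var_error):
    """Share of observed variance due to true between-subject differences:
    R = var_true / (var_true + var_error)."""
    return var_true / (var_true + var_error)

# Hypothetical true-value variance of 64 with shrinking error variance
for var_error in (64.0, 36.0, 4.0, 0.0):
    print(var_error, round(reproducibility(64.0, var_error), 3))
```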

  16. Power: simulation study looking at the association between a given risk factor and a certain disease • Truth is an odds ratio = 1.6 • R = reproducibility of the risk-factor measurement • Power: probability of estimating a risk ratio within 15% of 1.6 • Phillips and Smith, J Clin Epi 1993

  17. Taking the average of many replicates of a measurement with poor reproducibility can result in a highly valid measurement, provided the errors are random rather than systematic • [Figure panels: Good Reproducibility, Poor Validity | Poor Reproducibility, Good Validity]
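
A minimal sketch of why averaging helps. It assumes the replicate errors are independent, mean zero, and share the same SD (no systematic error), so the error variance of a mean of $k$ replicates is $\sigma_E^2 / k$. The true value, error SD, and $k$ are hypothetical.

```python
import random
import statistics

def reproducibility_of_mean(var_true, var_error, k):
    """Reproducibility of the average of k replicates with independent random errors."""
    return var_true / (var_true + var_error / k)

rng = random.Random(7)
true_value, sigma_E, k = 300.0, 40.0, 16  # hypothetical
trials = 5000

single = [true_value + rng.gauss(0.0, sigma_E) for _ in range(trials)]
averaged = [
    statistics.fmean(true_value + rng.gauss(0.0, sigma_E) for _rep in range(k))
    for _ in range(trials)
]
sd_single = statistics.stdev(single)      # close to 40
sd_averaged = statistics.stdev(averaged)  # close to 40 / sqrt(16) = 10
print(round(sd_single, 1), round(sd_averaged, 1))
```

A constant bias would survive any amount of averaging, so this only improves validity that is limited by random error.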

  18. Sources of Random Measurement Error: What contributes to $\sigma_E^2$? • Observer (the person who performs the measurement) • within-observer (intrarater) • between-observer (interrater) • Instrument • within-instrument • between-instrument • Importance of each varies by study

  19. Sources of Measurement Error • e.g., plasma HIV viral load • observer: measurement-to-measurement differences in tube filling and time before processing • instrument: run-to-run differences in reagent concentration, PCR cycle times, and enzymatic efficiency

  20. Within-Subject Biologic Variability • Although not the fault of the measurement process, moment-to-moment biological variability can have the same effect as errors in the measurement process • Recall that: • observed value (O) = true value (T) + measurement error (E) • Assume, for biological variables with intrinsic variability: • true value = the average of measurements taken over time • E is the difference between any one value and the average value • Moment-to-moment biologic variability increases the variability in the error term and therefore the overall variability: $\sigma_O^2 = \sigma_T^2 + \sigma_E^2$

  21. [Figure: error]

  22. Assessing Reproducibility Depends on measurement scale • Interval Scale • within-subject standard deviation and derivatives • coefficient of variation • Categorical Scale • Kappa (see Clinical Epidemiology course) • (can be used for both predictors and outcomes)

  23. Reproducibility of an Interval Scale Measurement: Peak Flow • Assessment requires >1 measurement per subject • Peak Flow in 17 adults (Bland & Altman)

  24. Assessment by Simple Correlation and Correlation Coefficients?

  25. Don’t Use Simple Correlation for Assessment of Reproducibility • Too sensitive to the range of the data • for the same measurement error, the correlation is higher when the data span a greater range • Depends upon the ordering of the data • you get a different correlation coefficient depending on which replicate is classified as measurement 1 vs measurement 2 • Importantly: it measures linear association only • it would be amazing if the replicates weren’t related • association is not the relevant issue; agreement is

  26. Final Limitation of Simple Correlation for Assessment of Reproducibility • Gives no meaningful parameter on the same scale as the original measurement • Cannot be evaluated in substantive (clinical) terms • What does a correlation coefficient of 0.7 vs 0.8 vs 0.9 mean in the context of peak flow data, which range from 200 to 600 l/min?

  27. Special Note on the Intraclass Correlation Coefficient (ICC) • ICC • Overcomes many of the limitations of the simple (Pearson) correlation coefficient • However, still does not portray reproducibility on the same unit scale as the measurement • (Calculation explained in S&N Appendix)

  28. Within-Subject Standard Deviation • Common (or mean) within-subject standard deviation ($s_w$) = 15.3 l/min
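
One common way to get the common within-subject SD is to take the square root of the mean of the per-subject variances. This sketch assumes every subject has the same number of replicates. The readings below are hypothetical, not the Bland-Altman peak-flow data. For duplicates, each subject's variance is simply $d^2/2$, where $d$ is the difference between the two readings.

```python
import math
import statistics

def within_subject_sd(replicates):
    """Common within-subject SD: sqrt of the mean of per-subject sample variances.

    `replicates` is a list of per-subject lists, each with >= 2 measurements;
    assumes all subjects have the same number of replicates.
    """
    per_subject_var = [statistics.variance(r) for r in replicates]
    return math.sqrt(statistics.fmean(per_subject_var))

# Hypothetical duplicate readings (l/min)
pairs = [[490, 525], [397, 415], [512, 508], [401, 444], [470, 500]]
sw = within_subject_sd(pairs)
print(round(sw, 2))
```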

  29. What can be done with the within-subject standard deviation ($s_w$)? We would like to know: • Just how different could two measurements taken on the same individual be, from random measurement error alone? • This begins to give a sense of how small a difference you could detect with adequate statistical power using this measurement: • between two or more groups, or • within a given person before/after an intervention

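  30. Further work with $s_w$: How different might two measurements appear to be from random error alone? • Difference between any 2 replicates for the same person: diff = meas1 - meas2 • Because the two errors are independent, $\mathrm{var}(\text{diff}) = \mathrm{var}(\text{meas1}) + \mathrm{var}(\text{meas2})$; therefore $s_{\text{diff}}^2 = s_w^2 + s_w^2 = 2 s_w^2$ and $s_{\text{diff}} = \sqrt{2}\, s_w \approx 1.41\, s_w$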

  31. Distribution of Differences Between Two Replicates • If we assume that the differences between two replicates: • are normally distributed and the mean of the differences is 0 • $s_{\text{diff}}$ estimates the standard deviation of the differences • Then the absolute difference between 2 measurements for the same subject is expected to be less than $(1.96)(s_{\text{diff}}) = (1.96)(\sqrt{2})\, s_w \approx 2.77\, s_w$ for 95% of all pairs of measurements • [Figure: normal distribution of $x_{\text{diff}}$ centered at 0 with SD $s_{\text{diff}}$; 95% of differences lie within $\pm(1.96)(s_{\text{diff}})$]
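
This sketch checks the 95% claim by simulation. It assumes normally distributed, independent errors and uses $s_w = 15.3$ l/min from the peak-flow example. It computes the limit $1.96 \sqrt{2}\, s_w$ and counts how often simulated replicate pairs fall within it. The seed and number of pairs are arbitrary choices.

```python
import math
import random

rng = random.Random(3)
sw = 15.3                         # within-subject SD (l/min) from the peak-flow example
limit = 1.96 * math.sqrt(2) * sw  # 1.96 * s_diff, with s_diff = sqrt(2) * sw
n = 100_000

# Two replicates on the same subject share the true value, so only their
# independent N(0, sw^2) errors contribute to the difference.
within = sum(abs(rng.gauss(0.0, sw) - rng.gauss(0.0, sw)) < limit for _ in range(n))
coverage = within / n
print(round(limit, 1), round(coverage, 3))
```

The limit works out to about 42.4 l/min, and the coverage is close to 0.95.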

  32. $2.77\, s_w$ = Repeatability Value • For Peak Flow data: • The absolute difference between 2 measurements for the same subject is expected to be less than $2.77\, s_w = (2.77)(15.3) = 42.4$ l/min for 95% of all pairs • i.e. the difference between 2 replicates may be as much as 42.4 l/min from random measurement error alone • 42.4 l/min is termed (by Bland-Altman) the “repeatability” or “repeatability coefficient” of the measurement

  33. Interpreting the “Repeatability” Value: Is 42.4 l/min a lot? It depends upon the context. • Clinical management • If other gold standards exist that are more reproducible, and: • differences < 42.4 are clinically relevant, then 42.4 is bad • differences < 42.4 are not clinically relevant, then 42.4 is not bad • If no gold standards exist, it is probably unwise to consider differences as large as 42.4 to represent clinically important changes • It would be valuable to know the “repeatability” of all clinical tests • Research • Depends upon the differences in peak flow you hope to detect • If ~40 l/min, you’re in trouble • If several hundred, then not bad

  34. One Common Underlying $s_w$ • Appropriate only if there is one $s_w$ • i.e., $s_w$ does not vary with the true underlying value • Bland-Altman approach: plot each subject’s difference (or standard deviation) against the subject mean • [Figure: within-subject standard deviation vs. subject mean peak flow; correlation coefficient = 0.17, p = 0.36]

  35. Another Interval Scale Example • Salivary cotinine in children (Bland-Altman) • n = 20 participants measured twice

  36. Cotinine: Absolute Difference vs. Mean • [Figure: subject absolute difference vs. subject mean cotinine; correlation = 0.62, p = 0.001]

  37. Logarithmic (base 10) Transformation

  38. Log10 Transformed: Absolute Difference vs. Mean • [Figure: subject absolute difference of log cotinine vs. subject mean log cotinine; correlation = 0.07, p = 0.7]

  39. $s_w$ for log-transformed cotinine data • $s_w$ = 0.175 (log10 units) • because this is on the log scale, it refers to a multiplicative factor and hence is known as the geometric within-subject standard deviation • it describes variability in ratio terms (rather than absolute numbers)

  40. “Repeatability” of Cotinine Measurement • On the log scale, the absolute difference between 2 measurements for the same subject is expected to be less than $(1.96)(s_{\text{diff}}) = (1.96)(\sqrt{2})\, s_w \approx 2.77\, s_w$ for 95% of all pairs of measurements • For cotinine data, $s_w$ = 0.175 log10 units, therefore: • $2.77 \times 0.175 \approx 0.485$ log10 units • back-transforming, $\text{antilog}(0.485) = 10^{0.485} \approx 3.1$ • For 95% of all pairs of measurements, the ratio between the two measurements may be as much as 3.1-fold (this is the “repeatability”)
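
The back-transformation as a small sketch. The function name is hypothetical, and it assumes $s_w$ is expressed in log10 units, as on the slide.

```python
import math

def ratio_repeatability(sw_log10, z=1.96):
    """Back-transformed repeatability for log10 data: the fold-difference
    between two replicates expected to be exceeded in only ~5% of pairs."""
    return 10 ** (z * math.sqrt(2) * sw_log10)

cotinine_ratio = ratio_repeatability(0.175)  # roughly 3.1-fold
print(round(cotinine_ratio, 2))
```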

  41. Coefficient of Variation • For cotinine data, the within-subject standard deviation (on the native scale) varies with the level of the measurement • If the within-subject standard deviation is proportional to the level of the measurement, this can be summarized as: coefficient of variation $= \text{antilog}(s_w) - 1 = 1.49 - 1 = 0.49$ • At any level of cotinine, the within-subject standard deviation of repeated measures is about 49% of the level
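
A sketch of the coefficient-of-variation calculation. It assumes the formula is the antilog of the log-scale $s_w$ minus 1, which matches the slide's “1.49 - 1” arithmetic. Using the rounded $s_w = 0.175$ gives about 0.496, in line with the slide's 0.49 given the rounding of $s_w$. The natural-log route gives the same number.

```python
import math

def cv_from_log10_sd(sw_log10):
    """CV implied by a constant log10-scale within-subject SD: antilog(sw) - 1."""
    return 10 ** sw_log10 - 1

cv = cv_from_log10_sd(0.175)
cv_via_ln = math.exp(0.175 * math.log(10)) - 1  # same quantity via natural logs
print(round(cv, 3), round(cv_via_ln, 3))
```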

  42. Coefficient of Variation for Peak Flow Data • By definition, when the within-subject standard deviation is not proportional to the mean value, as in the Peak Flow data, there is no constant ratio between the within-subject standard deviation and the mean • Therefore, there is no single common coefficient of variation • Estimating the “average” coefficient of variation (within-subject SD/overall mean) is not meaningful

  43. Peak Flow Data: Use of Coefficient of Variation when $s_w$ is Constant • Could report a family of CVs (one for each level of peak flow), but this is tedious
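
This sketch shows what such a family looks like. With a constant $s_w$ of 15.3 l/min, the CV at each level is $s_w$ divided by the level, so it shrinks as peak flow rises across the 200 to 600 l/min range. The chosen levels are illustrative.

```python
sw = 15.3  # l/min, common within-subject SD from the peak-flow example

# CV at each illustrative peak-flow level (l/min): constant sw divided by the level
cv_by_level = {level: sw / level for level in (200, 300, 400, 500, 600)}
for level, cv in cv_by_level.items():
    print(level, f"{cv:.1%}")
```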

  44. Assessing Validity • Measures can be assessed for validity in 3 ways: • Content validity • Face • Sampling • Construct validity • Criterion validity (aka empirical; when gold standards are present) • Concurrent (concurrent gold standards present) • Interval scale measurement: 95% limits of agreement • Categorical scale measurement: sensitivity & specificity • Predictive (gold standards present in future)

  45. Assessing Validity of Interval Scale Measurements - When Gold Standards are Present • Use a similar approach as when evaluating reproducibility • Examine plots of within-subject differences against the mean of the two approaches (Bland-Altman plots) • Determine the mean within-subject difference • Determine the range of within-subject differences, aka the “95% limits of agreement” • Practice in next week’s Section
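
A minimal sketch of the usual Bland-Altman summary: the mean difference (bias) plus or minus 1.96 SDs of the differences. The paired readings and the function name are hypothetical, and the normal-theory limits assume the differences do not vary with the level of the measurement.

```python
import statistics

def limits_of_agreement(method_a, method_b, z=1.96):
    """Bland-Altman mean difference (bias) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd

# Hypothetical paired readings: a new method vs. a gold standard
new = [102, 98, 110, 95, 104, 100]
gold = [100, 99, 105, 96, 100, 98]
bias, lower, upper = limits_of_agreement(new, gold)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```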

  46. Conclusions • Measurement reproducibility plays a key role in determining validity and statistical precision in our different study designs • When assessing reproducibility, for interval scale measurements: • avoid correlation coefficients • use within-subject standard deviation and derivatives like “repeatability” • (For categorical scale measurements, use Kappa) • What is acceptable reproducibility depends upon desired use • Assessment of validity depends upon whether or not gold standards are present, and can be a challenge when they are absent

  47. Measurement in Clinical Research: Epi 225; Fall Quarter; A. Stewart, Ph.D., Course Director • Conceptualizing health and its determinants and developing one’s own conceptual framework • Measurement terminology and locating measures • Classical methods of scale construction • Psychometric characteristics I: variability, reliability, and interpretability • Psychometric characteristics II: validity and bias, responsiveness and sensitivity to change • Choosing measures and pretesting • Creating a questionnaire and questionnaire guides • Issues in research with diverse populations including health disparities research • Adapting measures, steps in creating and testing scale scores, and presenting measurement data
