
Testing reliability and validity in medical research

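Testing reliability and validity in medical research. Moon Seok Park, MD, Seoul National University Bundang Hospital. Reliability. In my first year, my professor gave the order: "Measure 1,000 X-rays by tomorrow and reach a conclusion!!" It was an angle I had never measured before, and I measured all night. It was so hard that I had an intern do some too. I am not even sure it was done properly. And yet the results came out significant. OK!!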


Presentation Transcript


  1. Testing reliability and validity in medical research Moon Seok Park, MD Seoul National University Bundang Hospital

  2. Reliability

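  3. In my first year, my professor gave the order: "Measure 1,000 X-rays by tomorrow and reach a conclusion!!" • It was an angle I had never measured before, and I measured all night. It was so hard that I had an intern do some too. I am not even sure it was done properly. • And yet the results came out significant. OK!!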

  4. When something is measured with two different methods, can't we simply use a paired t-test to assess reliability? • When is the paired t-test the appropriate method?
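One way to approach the question on this slide: the paired t-test only asks whether the mean difference between two sets of measurements is zero, so it can report "no difference" even when individual measurements disagree badly. The sketch below uses made-up angles (the data and function names are assumptions for illustration, not from the presentation).

```python
import math
import statistics


def paired_t_statistic(x, y):
    """t statistic of the paired t-test: mean difference divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def mean_absolute_difference(x, y):
    """Average size of the disagreement between paired measurements."""
    return statistics.mean(abs(a - b) for a, b in zip(x, y))


# Hypothetical angles (degrees) measured on six subjects by two methods.
method_a = [20, 30, 40, 50, 60, 70]
method_b = [35, 15, 55, 35, 75, 55]  # differs from method_a by 15 degrees, alternating sign

t = paired_t_statistic(method_a, method_b)
mad = mean_absolute_difference(method_a, method_b)

# The mean difference is zero, so the t statistic is zero (no detectable bias),
# yet every single pair disagrees by 15 degrees.
print("paired t statistic:", t)
print("mean absolute difference:", mad)
```

In this toy case the t statistic is zero because the over- and under-estimates cancel out, which is why a non-significant paired t-test says little about reliability.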

  5. Reliability • Extent to which scale items measure the same construct, free from random error • Korean term: 신뢰도 • Are the measured values similar each time the measurement is made? • Test-retest reliability, inter-rater reliability, intra-rater reliability, alternative-form reliability, internal consistency.

  6. Test-retest reliability • Mainly used in psychometric analysis: interviews, questionnaires, ... • The same test is administered again after a fixed time interval. • Cohen's kappa, weighted kappa, Pearson's correlation, intraclass correlation coefficient (ICC). • Cf) Intra-rater (intra-observer) reliability: radiographic measurements, ... • Memory contamination

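  7. Inter-rater reliability • Expert interviews, scoring, physical measurements, radiographic measurements. • Several raters measure the same subject, and the results are compared for similarity. • Cf) Agreement: often used interchangeably, but refers especially to measurements made with different instruments, for example comparing MRI with CT. • For radiographic measurements and the like, intra- and inter-observer (rater) reliability are reported as a set. • Cohen's kappa, weighted kappa, Pearson's correlation, intraclass correlation coefficient (ICC)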

  8. Internal consistency • A slightly different meaning from the preceding types of reliability. Used mostly in psychometric analysis (questionnaires, interviews). • Homogeneity • For example, if there are 10 items, the items are similar to one another. • Item to item, item to total, Cronbach's alpha • Too high internal consistency = item redundancy. • Cf) Uni-dimensionality, item response theory, Rasch analysis (INFIT statistics)
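As a rough illustration of the Cronbach's alpha mentioned on this slide, the sketch below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), to a small made-up questionnaire. The responses and the function name are assumptions for illustration only.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup the scores item by item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical questionnaire: 5 respondents x 3 items, each scored 1 to 5.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print("Cronbach's alpha:", round(alpha, 3))

# Items that are exact copies of each other give alpha = 1,
# which is the "item redundancy" extreme.
redundant = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print("alpha for identical items:", cronbach_alpha(redundant))
```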

  9. Question: which is reliable? (Figure: four panels, labelled 1 to 4.)

  10. What are the main measures of reliability? • What if the data are dichotomous or polychotomous? • Kappa coefficient • What if the data are quantitative (interval or ratio scale)? • Intraclass correlation coefficient (ICC)
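For categorical ratings, the kappa coefficient corrects the observed agreement for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of unweighted Cohen's kappa for two raters; the ratings and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters classifying the same subjects."""
    n = len(rater_a)
    # Observed proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no readings of 10 radiographs by two raters.
rater_a = ["yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no"]

kappa = cohens_kappa(rater_a, rater_b)
print("Cohen's kappa:", round(kappa, 3))
```

Here the raters agree on 8 of 10 radiographs, but half of that agreement would be expected by chance, so kappa is lower than the raw 80% agreement.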

  11. ICC • Intraclass correlation coefficient • Reliability test for quantitative data

  12. Models of ICC • One-way random effect model • Raters: a random effect • Two-way random effect model • Raters: a random effect • Subjects: a random effect • Two-way mixed effect model • Raters: a fixed effect • Subjects: a random effect

  13. Types of ICC • Absolute agreement • Measures if raters assign the same absolute score • Consistency • Measures if raters’ scores are highly correlated even if they are not identical in absolute terms
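The difference between the two types can be sketched with a two-way ANOVA on made-up data in which one rater always reads 10 units higher than the other. The formulas follow the single-rater forms of Shrout and Fleiss (ICC(3,1) for consistency, ICC(2,1) for absolute agreement); the data, function names, and the use of JMS and EMS (between-rater and residual mean squares, not named in the slides) are assumptions of this sketch.

```python
def two_way_mean_squares(ratings):
    """Two-way ANOVA without replication on n subjects x k raters.

    Returns (BMS, JMS, EMS): between-subject, between-rater and residual mean squares.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_error / ((n - 1) * (k - 1))


def icc_consistency(ratings):
    """Single-rater consistency ICC: a systematic rater offset is ignored."""
    k = len(ratings[0])
    bms, _, ems = two_way_mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_absolute_agreement(ratings):
    """Single-rater absolute-agreement ICC: a systematic rater offset is penalised."""
    n, k = len(ratings), len(ratings[0])
    bms, jms, ems = two_way_mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical angles: the second rater always reads exactly 10 higher than the first.
ratings = [[a, a + 10] for a in (10, 20, 30, 40, 50)]

print("consistency:", icc_consistency(ratings))
print("absolute agreement:", round(icc_absolute_agreement(ratings), 3))
```

With a pure offset the scores are perfectly correlated, so the consistency ICC is 1, while the absolute-agreement ICC is lower because the raters never assign the same score.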

  14. Measures of ICC • Single measures • Individual ratings constitute the unit of analysis • Average measures • The mean of all ratings is the unit of analysis

  15. ICC • Affected by true subject variability as well as measurement error

  16. Example • Measurement error: Data 1 = Data 2 • Subject variability: Data 1 < Data 2 (Figure: two panels, Data 1 and Data 2.)

  17. Shrout and Fleiss, 1979 • Propose 6 ICC types • ICC(1,1), ICC(2,1), ICC(3,1): expected reliability of a single rater's rating • ICC(1,k), ICC(2,k), ICC(3,k): expected reliability of the mean of a set of k raters

  18. k (no. of observers), n (no. of targets)

  19. Between-target mean square (BMS); within-target mean square (WMS). BMS represents true subject variability, and WMS represents measurement error.
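For reference, the six Shrout and Fleiss (1979) coefficients can be written in terms of these mean squares. The slides name only BMS and WMS; JMS (between-rater mean square) and EMS (residual mean square) come from the two-way ANOVA in the original paper and are added here as an assumption about what the slide's table showed, with k and n as defined on the previous slide.

```latex
\begin{align*}
\mathrm{ICC}(1,1) &= \frac{\mathrm{BMS}-\mathrm{WMS}}{\mathrm{BMS}+(k-1)\,\mathrm{WMS}}, &
\mathrm{ICC}(1,k) &= \frac{\mathrm{BMS}-\mathrm{WMS}}{\mathrm{BMS}}, \\[4pt]
\mathrm{ICC}(2,1) &= \frac{\mathrm{BMS}-\mathrm{EMS}}{\mathrm{BMS}+(k-1)\,\mathrm{EMS}+k(\mathrm{JMS}-\mathrm{EMS})/n}, &
\mathrm{ICC}(2,k) &= \frac{\mathrm{BMS}-\mathrm{EMS}}{\mathrm{BMS}+(\mathrm{JMS}-\mathrm{EMS})/n}, \\[4pt]
\mathrm{ICC}(3,1) &= \frac{\mathrm{BMS}-\mathrm{EMS}}{\mathrm{BMS}+(k-1)\,\mathrm{EMS}}, &
\mathrm{ICC}(3,k) &= \frac{\mathrm{BMS}-\mathrm{EMS}}{\mathrm{BMS}}.
\end{align*}
```

In every form the numerator grows with BMS, which is why the ICC reflects subject variability as well as measurement error.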

  20. Shrout and Fleiss, 1979 • Important issues in the choice of an appropriate index • Whether the ANOVA design should be one way or two way • Whether raters are considered fixed or random effects • Whether the unit of analysis is a single rater or the mean of several raters

  21. Pitfalls and important issues in testing reliability using ICC in orthopaedic research

  22. Literature review • PubMed database • Orthopaedic articles that used ICC • Of the 92 articles identified, 58 (63%) did not clarify the ICC model used. • The model, type, and measure used were all clearly declared in only 5 (5%).

  23. ICC of physical examinations • 30 patients with CP • Interobserver reliability of physical examinations using ICC • Popliteal angle • Thomas test • Staheli test • Same dimension (joint angle)!!

  24. Simulated data

  25. Conclusion • The ICC value could show the opposite tendency to the true measurement error (mean absolute difference), even when similar dimensions were measured. • The ICC could vary depending on the model used. • The ICC value was affected by measurement error, subject variability, and slopes.

  26. In conclusion, what we should do: • ICC values were large when measurement errors were small, subject variability was large, and slopes were parallel. • Clinical context needs to be considered when interpreting ICC. • The ICC settings should be declared.
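The dependence of ICC on subject variability can be sketched with two made-up data sets that share exactly the same measurement error (mean absolute difference between raters) but differ in how spread out the subjects are. The one-way ICC(1,1) formula from Shrout and Fleiss is used; the data and function names are hypothetical and are not the simulated data from the presentation.

```python
def icc_1_1(ratings):
    """One-way random-effects, single-rater ICC from BMS and WMS."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-target mean square: spread of the subject means.
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: spread of the ratings around each subject's mean.
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


def mean_abs_diff(ratings):
    """Mean absolute difference between two raters."""
    return sum(abs(a - b) for a, b in ratings) / len(ratings)


# Both data sets: rater 2 differs from rater 1 by 2 degrees, alternating sign.
offsets = [2, -2, 2, -2, 2]
data_1 = [[a, a + d] for a, d in zip([30, 31, 32, 33, 34], offsets)]  # similar subjects
data_2 = [[a, a + d] for a, d in zip([10, 20, 30, 40, 50], offsets)]  # widely spread subjects

for name, data in (("Data 1", data_1), ("Data 2", data_2)):
    print(name, "MAD:", mean_abs_diff(data), "ICC(1,1):", round(icc_1_1(data), 3))
```

The measurement error is identical in both sets, yet the ICC is much higher for the widely spread subjects, which is one reason the clinical context matters when an ICC value is interpreted.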

  27. Validity

  28. Validity • Extent to which an instrument really measures what it purports to measure. • Usually referred to as internal validity. • Cf) external validity = generalisability

  29. Validity • Face validity • Content validity • Criterion(concurrent, predictive) validity • Construct(convergent, discriminant) validity

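  30. Face validity • Korean term: 안면 타당도 (also 액면 타당도) • Can be confused with content validity, but is more abstract. • For example, if an English exam contains a mathematics question, there is a problem with face validity. • Usually described as little more than a screening by the authors.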

  31. Content validity • Korean term: 내용 타당도 • Similar to face validity, but analysed more systematically. • A panel of a set number of members scores content validity, and an item is rejected if its mean score falls below the threshold.

  32. Criterion validity • Concurrent validity: how close is the measure to the gold standard? • Example: a radiographic index is measured and compared with CT measurements, which are regarded as the gold standard. • Cf) convergent validity. • Predictive validity

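  33. Construct validity • Korean term: 구인 타당도 • Convergent validity: does the measure correlate with a similar index (one that is not a gold standard)? • An English test called TEPS was created. To examine its validity, its correlation with TOEFL was assessed. (What is the gold standard for English proficiency?) • Is there a correlation between measurements made by a person and measurements made by a computer? • Pearson correlation.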
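A minimal sketch of the Pearson correlation used for convergent validity, with made-up manual and computer measurements (the data and function name are assumptions). The example is deliberately extreme: the computer readings are a linear transformation of the manual ones, so the correlation is perfect although the two never give the same value.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical angles measured by hand and by software on the same five images.
manual = [18, 25, 31, 40, 47]
computer = [2 * m + 10 for m in manual]  # perfectly linear, but never equal to manual

r = pearson_r(manual, computer)
print("Pearson r:", round(r, 3))
```

A high correlation therefore supports convergent validity, but on its own it does not show that two methods agree in absolute terms.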

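  34. Construct validity • Discriminant validity: is the measure correlated with an index that measures something entirely different? • Example: the correlation between a personality test and an intelligence test • Cf) Known-group validity: do clearly different groups obtain different scores?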

  35. Others • Precision • Responsiveness • Sensitivity • Specificity • Sensitivity analysis • Item response theory • Rasch analysis

  36. Introduction • Increased femoral anteversion and coxa valga are common deformities associated with intoeing gait and unstable hips in CP, which need surgical correction.

  37. Introduction • Physical examination and the neck shaft angle measured on hip radiographs are the primary tools for evaluating femoral anteversion and coxa valga.

  38. Introduction • Physical examinations measuring femoral anteversion include • Trochanteric prominence angle test (TPAT) • Hip internal rotation (IR) • Hip external rotation (ER)

  39. Introduction • CT measurement is accurate, but expensive and involves radiation exposure.

  40. Purpose of Study • To assess the validity and reliability of physical exams measuring femoral anteversion and neck shaft angle on hip X-ray • Concurrent validity • Intra- and interobserver reliability

  41. (Figure: four panels illustrating the combinations: reliable and valid; not reliable but valid; not reliable and not valid; reliable but not valid.)

  42. Materials and Methods • Prospective study approved by IRB • 36 consecutive patients with CP • Mean age 11.0 years (SD 1.3) • M : F = 26 : 10 • GMFCS level I / II / III / IV / V = 5 / 11 / 11 / 7 / 2 • Exclusion: previous operation, trauma, infection, etc.

  43. Hip Internal Rotation • Prone position • Angle between vertical line & long axis of the leg • legs are rotated outward maximally
