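Testing reliability and validity in medical research. Moon Seok Park, MD, Seoul National University Bundang Hospital. Reliability. In my first year of residency, my professor ordered, "Measure 1,000 X-rays by tomorrow and reach a conclusion!!" It was an angle I had never measured before, and I measured all night. It was so tiring that I had an intern measure some as well. I am not even sure it was done properly. Yet the results came out significant. OK!!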
Testing reliability and validity in medical research Moon Seok Park, MD Seoul National University Bundang Hospital
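In my first year of residency, my professor ordered, "Measure 1,000 X-rays by tomorrow and reach a conclusion!!" • It was an angle I had never measured before, and I measured all night. It was so tiring that I had an intern measure some as well. I am not even sure it was done properly. • Yet the results came out significant. OK!!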
When measurements are made with two different methods, can't we use a paired t-test to assess reliability? • When is a paired t-test the appropriate method?
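A minimal sketch of why a paired t-test does not answer the reliability question. The angle values are hypothetical, invented only for illustration. The paired t-test looks at the mean of the differences, so disagreements in opposite directions cancel out, while the mean absolute difference shows how far apart the two methods really are.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic of a paired t-test: mean difference / its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD of the differences
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

def mean_absolute_difference(a, b):
    """Average size of the disagreement, ignoring its direction."""
    return statistics.mean([abs(x - y) for x, y in zip(a, b)])

# Hypothetical angles (degrees) from two methods on the same 6 subjects.
method_a = [10, 20, 30, 40, 50, 60]
method_b = [20, 10, 40, 30, 60, 50]   # every pair disagrees by 10 degrees

t = paired_t_statistic(method_a, method_b)
mad = mean_absolute_difference(method_a, method_b)
print(f"paired t = {t:.2f}, mean absolute difference = {mad} degrees")
```

Here the t statistic is 0 (no systematic difference between the methods), yet every single pair differs by 10 degrees. A paired t-test detects a systematic shift in the mean; it says nothing about how closely individual measurements agree.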
Reliability • Extent to which scale items measure the same construct, free from random error • Are the measured values similar each time the measurement is made? • Test-retest reliability, inter-rater reliability, intra-rater reliability, alternative form reliability, internal consistency.
Test-retest reliability • Mainly used in psychometric analysis: interviews, questionnaires, etc. • The same test is administered again after a fixed time interval. • Cohen's kappa, weighted kappa, Pearson's correlation, intraclass correlation coefficient (ICC). • Cf.) Intra-rater (intra-observer) reliability: radiographic measurements, etc. • Memory contamination
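Inter-rater reliability • Expert interviews, scoring, physical measurements, radiographic measurements. • Several raters measure the same subject, and their results are compared for similarity. • Cf.) Agreement: often used interchangeably, but applies particularly to measurements made with different instruments, for example comparing MRI with CT. • For radiographic measurements and the like, intra- and inter-observer (rater) reliability are reported as a set. • Cohen's kappa, weighted kappa, Pearson's correlation, intraclass correlation coefficient (ICC)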
Internal consistency • A somewhat different meaning from the preceding types of reliability. Its use is mostly limited to psychometric analysis (questionnaires, interviews). • Homogeneity • For example, if there are 10 items, the items should all be similar to one another. • Item to item, item to total, Cronbach's alpha • Too high internal consistency = item redundancy. • Cf.) Uni-dimensionality, item response theory, Rasch analysis (INFIT statistics)
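As a hedged illustration of Cronbach's alpha, the sketch below uses the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total score). The questionnaire scores are hypothetical and the function name `cronbach_alpha` is only for this example.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items table of scores."""
    k = len(responses[0])                                   # number of items
    items = [[row[j] for row in responses] for j in range(k)]
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)

# Three copies of the same item: perfectly consistent, but redundant.
redundant = [[x, x, x] for x in [2, 4, 1, 5, 3]]
alpha_redundant = cronbach_alpha(redundant)

print(f"alpha = {alpha:.3f}, alpha for duplicated items = {alpha_redundant:.3f}")
```

The second case shows the redundancy point from the slide: items that are exact copies of each other drive alpha to 1, which signals duplication rather than a better questionnaire.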
Question: which is reliable? 1 2 3 4
What are the main measures of reliability? • What if the data are dichotomous or polychotomous? • Kappa coefficient • What if the data are quantitative (interval or ratio scale)? • Intraclass correlation coefficient (ICC)
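For categorical ratings, Cohen's kappa corrects the observed agreement for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). A minimal sketch with hypothetical ratings from two raters:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater1)
    # Observed agreement: proportion of subjects given the same category.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal proportions, summed.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_chance = sum(counts1[c] / n * counts2[c] / n for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical dichotomous ratings (e.g. "lesion present?") on 10 subjects.
rater1 = ["yes"] * 5 + ["no"] * 5
rater2 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

kappa = cohens_kappa(rater1, rater2)
print(f"kappa = {kappa:.2f}")
```

The raters agree on 8 of 10 subjects (p_o = 0.8), but with a 50/50 split in each rater's answers half of that agreement is expected by chance (p_e = 0.5), so kappa is 0.6 rather than 0.8.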
ICC • Intraclass correlation coefficient • Reliability test for quantitative data
Models of ICC • One-way random effect model • Raters: a random effect • Two-way random effect model • Raters: a random effect • Subjects: a random effect • Two-way mixed effect model • Raters: a fixed effect • Subjects: a random effect
Types of ICC • Absolute agreement • Measures if raters assign the same absolute score • Consistency • Measures if raters’ scores are highly correlated even if they are not identical in absolute terms
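The difference between the two types can be sketched with a hypothetical second rater who always reads 20 units higher than the first. The formulas are the standard single-measure two-way forms; BMS, JMS and EMS denote the between-subject, between-rater and residual mean squares, and all data values are invented for illustration.

```python
def single_measure_iccs(ratings):
    """Return (consistency ICC, absolute-agreement ICC) for single measures.

    `ratings` is a subjects x raters table; a two-way layout is assumed.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ems = ss_error / ((n - 1) * (k - 1))

    consistency = (bms - ems) / (bms + (k - 1) * ems)
    absolute = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    return consistency, absolute

# Hypothetical angles: rater B always measures 20 units more than rater A.
rater_a = [10, 20, 30, 40, 50]
rater_b = [a + 20 for a in rater_a]
ratings = [list(pair) for pair in zip(rater_a, rater_b)]

consistency, absolute = single_measure_iccs(ratings)
print(f"consistency = {consistency:.3f}, absolute agreement = {absolute:.3f}")
```

The scores are perfectly correlated, so the consistency ICC is 1, while the absolute-agreement ICC drops to about 0.56 because the constant offset between raters is counted as disagreement.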
Measures of ICC • Single measures • Individual ratings constitute the unit of analysis • Average measures • The mean of all ratings is the unit of analysis
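Single and average measures are linked by the Spearman-Brown relation: averaging k ratings raises the reliability from r to k·r / (1 + (k − 1)·r). The helper name below is hypothetical and the input values are only examples.

```python
def average_measures_icc(single_icc, k):
    """Spearman-Brown step-up: reliability of the mean of k ratings."""
    return k * single_icc / (1 + (k - 1) * single_icc)

# A single-rater ICC of 0.5 becomes about 0.67 when two ratings are averaged,
# and 0.8 when four ratings are averaged.
two_raters = average_measures_icc(0.5, 2)
four_raters = average_measures_icc(0.5, 4)
print(f"k=2: {two_raters:.3f}, k=4: {four_raters:.3f}")
```

This is why the average-measures value is always at least as large as the single-measures value for the same data, and why a report needs to state which of the two was used.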
ICC • Affected by true subject variability as well as measurement error
Example • Measurement error: Data 1 = Data 2 • Subject variability: Data 1 < Data 2 • [Figure: plots of Data 1 and Data 2]
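The situation on this slide can be reproduced with a small sketch. Both hypothetical data sets have exactly the same measurement error (the two readings of each subject always differ by 2 units), but the subjects in Data 2 are spread much more widely. The one-way single-measure form (BMS − WMS) / (BMS + (k − 1)·WMS) is used, with BMS and WMS the between- and within-subject mean squares.

```python
def one_way_icc(ratings):
    """One-way random-effects, single-measure ICC for a subjects x k table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)

def measure_twice(true_values):
    """Two readings per subject, each off by 1 unit in opposite directions."""
    errors = [1, -1, 1, -1, 1]
    return [[t + e, t - e] for t, e in zip(true_values, errors)]

data1 = measure_twice([30, 31, 32, 33, 34])   # narrow range of subjects
data2 = measure_twice([10, 20, 30, 40, 50])   # wide range of subjects

icc1 = one_way_icc(data1)
icc2 = one_way_icc(data2)
print(f"Data 1 ICC = {icc1:.3f}, Data 2 ICC = {icc2:.3f}")
```

With identical measurement error, the ICC is about 0.43 for Data 1 and about 0.99 for Data 2. The difference comes entirely from subject variability, so an ICC cannot be read as a pure measure of measurement error.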
Shrout and Fleiss, 1979 • Propose 6 ICC types • ICC(1,1), ICC(2,1), ICC(3,1): expected reliability of a single rater's rating • ICC(1,k), ICC(2,k), ICC(3,k): expected reliability of the mean of a set of k raters
Between-target mean square (BMS) • Within-target mean square (WMS) • BMS represents true subject variability, and WMS represents measurement error
Shrout and Fleiss, 1979 • Important issue in the choice of an appropriate index • Whether the ANOVA design should be one way or two way • Whether raters are considered fixed or random effects • Whether the unit of analysis is a single rater or the mean of several raters
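A sketch of how the three choices above lead to six different values from the same data. The mean squares are BMS (between targets), WMS (within targets), JMS (between judges) and EMS (residual). The 6-target, 4-judge table is the small example commonly reproduced from Shrout and Fleiss's paper; treat it as illustrative data rather than part of this talk.

```python
def mean_squares(ratings):
    """ANOVA mean squares for an n-targets x k-judges table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_between        # one-way: everything within targets
    ss_error = ss_within - ss_judges         # two-way: judge effect removed

    bms = ss_between / (n - 1)
    wms = ss_within / (n * (k - 1))
    jms = ss_judges / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return n, k, bms, wms, jms, ems

def shrout_fleiss_icc(ratings):
    """The six ICC forms, keyed by their (model, unit) label."""
    n, k, bms, wms, jms, ems = mean_squares(ratings)
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }

ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = shrout_fleiss_icc(ratings)
for name, value in icc.items():
    print(f"{name} = {value:.2f}")
```

On this table the six values come out at roughly 0.17, 0.29, 0.71, 0.44, 0.62 and 0.91. The same ratings can therefore be reported as anything from poor to excellent reliability depending on the model and unit chosen.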
Pitfalls and important issues in testing reliability using ICC in orthopaedic research
Literature review • PubMed database • Orthopaedic articles that used ICC • Of the 92 articles identified, 58 (63%) did not clarify the ICC model used. • The model, types, and measures used were clearly declared in only 5 articles (5%)
ICC of physical examinations • 30 patients with CP • Interobserver reliability of physical examinations using ICC • Popliteal angle • Thomas test • Staheli test • Same dimension!! (joint angle)
Conclusion • ICC values could show the opposite tendency to the true measurement error (mean absolute difference), even when measuring a similar dimension. • ICC could vary depending on the model used. • ICC values were affected by measurement error, subject variability, and slopes.
In conclusion, this is what should be done • ICC values were large when measurement errors were small, subject variability was large, and slopes were parallel. • Clinical context needs to be considered when interpreting ICC. • The ICC settings should be declared.
Validity • Extent to which an instrument really measures what it purports to measure. • Usually referred to as internal validity. • Cf.) external validity = generalisability
Validity • Face validity • Content validity • Criterion(concurrent, predictive) validity • Construct(convergent, discriminant) validity
Face validity • Can be confused with content validity, but is more abstract. • For example, if an English exam contains math problems, it has a face validity problem. • Usually reported as little more than a screening by the authors.
Content validity • Similar to face validity, but analyzed more systematically. • A panel of a set number of members scores the content validity, and the content is rejected if the mean score falls below the threshold.
Criterion validity • Concurrent validity: how similar is it to the gold standard? • A radiographic index is measured and compared with the CT measurement, which is regarded as the gold standard. • Cf.) Convergent validity. • Predictive validity
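Construct validity • Convergent validity: does it correlate with a similar measure (one that is not a gold standard)? • Suppose an English test called TEPS has been developed. To examine its validity, its correlation with TOEFL was examined. (What is the gold standard for English proficiency?) • Is there a correlation between measurements made by a person and measurements made by a computer? • Pearson correlation.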
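A minimal sketch of the Pearson correlation used for convergent validity, applied to hypothetical manual and computer-assisted measurements of the same five subjects. The numbers are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical angles (degrees) for the same five subjects.
manual = [10, 20, 30, 40, 50]
computer = [12, 19, 33, 41, 48]

r = pearson_r(manual, computer)
print(f"r = {r:.3f}")
```

A high r supports convergent validity because the two methods rank subjects similarly. It does not show that the two methods give the same values, since a constant offset or a scale factor leaves r unchanged.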
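Construct validity • Discriminant validity: does it correlate with a measure of something entirely different? • Correlation between a personality test and an intelligence test • Cf.) Known-group validity: do clearly different groups obtain different scores?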
Others • Precision • Responsiveness • Sensitivity • Specificity • Sensitivity analysis • Item response theory • Rasch analysis
Introduction • Increased femoral anteversion and coxa valga are common deformities associated with intoeing gait and unstable hips in CP, which need surgical correction.
Introduction • Physical examination and the neck shaft angle measured on hip radiographs are the primary tools for evaluating femoral anteversion and coxa valga.
Introduction • Physical examinations measuring femoral anteversion include • Trochanteric prominence angle test (TPAT) • Hip internal rotation (IR) • Hip external rotation (ER)
Introduction • CT measurement is accurate, but expensive and involves radiation exposure.
Purpose of Study • To assess the validity and reliability of physical exams measuring femoral anteversion and neck shaft angle on hip X-ray • Concurrent validity • Intra- and interobserver reliability
Reliable and valid • Not reliable but valid • Not reliable and not valid • Reliable but not valid
Materials and Methods • Prospective study approved by IRB • 36 consecutive patients with CP • Mean age 11.0 years (SD 1.3) • M : F = 26 : 10 • GMFCS level I / II / III / IV / V = 5 / 11 / 11 / 7 / 2 • Exclusion • Previous operation, trauma, infection, etc.
Hip Internal Rotation • Prone position • Angle between vertical line & long axis of the leg • legs are rotated outward maximally