On Some Statistical Aspects of Agreement Among Measurements
• Bikas Sinha [ISI, Kolkata]
• Math. & Stat. Sciences, ASU [Tempe]
• February 26, 2016
Quotes of the Day
• “I now tend to believe … somehow … for so long … I was completely wrong.”
• “Ah! That’s good. You and I finally agree!”
• ***************
• “When two men of science disagree, they do not invoke the secular arm; they wait for further evidence to decide the issue, because, as men of science, they know that neither is infallible.”
Book Chapters….
• 1. Introduction
• 1.1 Precision, Accuracy, and Agreement
• 1.2 Traditional Approaches for Continuous Data
• 1.3 Traditional Approaches for Categorical Data
Chapter 2 • 2. Continuous Data • 2.1 Basic Model • 2.2 Absolute Indices • 2.2.1 Mean Squared Deviation • 2.2.2 Total Deviation Index • 2.2.3 Coverage Probability • 2.3 Relative Indices • 2.3.1 Intraclass Correlation Coefficient • 2.3.2 Concordance Correlation Coefficient
Chapter 3 • 3. Categorical Data • 3.1 Basic Approach When Target Values Are Random • 3.1.1 Data Structure • 3.1.2 Absolute Indices • 3.1.3 Relative Indices: Kappa and Weighted Kappa
Seminar Plan
• Agreement for Categorical Data [Part I] : 30 minutes
• Agreement for Continuous Data [Part II] : 25 minutes
• Discussion : 5 minutes
Key References : Part I
• Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37-46. [Famous for Cohen’s Kappa]
• Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4): 213-220.
References …. contd.
• Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Journal of Statistics, 27(1): 3-23.
Measurements : Provided by Experts / Observers / Raters • Could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating system or raters, diagnosis or treatments, instruments or methods, processes or techniques or formulae…… • Rater....Generic Term
Agreement : Categorical Data : Illustrative Example
Study on Diabetic Retinopathy Screening
Problem : Interpretation of Single-Field Digital Fundus Images
Assessment of Agreement WITHIN / ACROSS 4 EXPERT GROUPS
Retina Specialists / General Ophthalmologists / Photographers / Nurses : 3 from each Group
Description of Study Material 400 Diabetic Patients Selected randomly from a community hospital in Bangkok One Good Single-Field Digital Fundus Image Taken from each patient with Signed Consent Approved by Ethical Committee on Research with Human Subjects Raters : Allowed to Magnify / Move the Images NOT TO MODIFY Brightness / Contrasts
THREE Major Features
#1. Diabetic Retinopathy Severity [6 options] : No Retinopathy / Mild NPDR / Moderate NPDR / Severe NPDR / PDR / Ungradable
#2. Macular Edema [3 options] : Presence / Absence / Ungradable
#3. Referral to Ophthalmologists [3 options] : Referrals / Non-Referrals / Uncertain
Retina Specialists’ Ratings [DR]
RS1 (rows) vs RS2 (columns)
CODES         0     1     2     3     4     9   Total
0           247     2     2     1     0     0     252
1            12    18     7     1     0     0      38
2            22    10    40     8     0     1      81
3             0     0     3     2     2     0       7
4             0     0     0     1     9     0      10
9             5     0     1     0     0     6      12
Total       286    30    53    13    11     7     400
Retina Specialists’ Consensus Rating [DR]
RS1 (rows) vs Consensus Rating RSCR (columns)
CODES         0     1     2     3     4     9   Total
0           252     0     0     0     0     0     252
1            17    19     2     0     0     0      38
2            15    19    43     2     1     1      81
3             0     0     2     4     1     0       7
4             0     0     0     0    10     0      10
9             8     0     0     0     0     4      12
Total       292    38    47     6    12     5     400
Retina Specialists’ Ratings [Macular Edema]
RS1 (rows) vs RS2 (columns)
              Presence   Absence   Subtotal   Ungradable   Total
Presence           326        11        337            1     338
Absence             18        22         40            3      43
Subtotal           344        33        377           --      --
Ungradable           9         0         --           10      19
Total              353        33         --           14     400
Retina Specialists’ Consensus Rating [ME]
RS1 (rows) vs Consensus Rating RSCR (columns)
              Presence   Absence   Subtotal   Ungradable   Total
Presence           335         2        337            1     338
Absence             10        33         43            0      43
Subtotal           345        35        380           --      --
Ungradable          10         0         --            9      19
Total              355        35         --           10     400
Cohen’s Kappa for 2 x 2 Ratings
• Rater I vs Rater II : 2 x 2 case
• Categories : Yes & No : cell proportions π(i, j)
• (Y,Y) & (N,N) : agreement proportions
• (Y,N) & (N,Y) : disagreement proportions
• π₀ = π(Y,Y) + π(N,N) = P[agreement]
• πₑ = π(Y,·) π(·,Y) + π(N,·) π(·,N) = P[chance agreement]
• κ = [π₀ − πₑ] / [1 − πₑ]
• Chance-corrected Agreement Index
Study of Agreement [RS-ME]
2 x 2 Table : Cohen’s Kappa (κ) Coefficient
Retina Specialist 1 (rows) vs Retina Specialist 2 (columns)
             Presence   Absence   Subtotal
Presence          326        11        337
Absence            18        22         40
Subtotal          344        33        377
‘Ungradable’ IGNORED to work with a 2 x 2 table
% agreement : (326 + 22) / 377 = 0.9231 = π₀
% chance agreement : %Yes · %Yes + %No · %No = (337/377)(344/377) + (40/377)(33/377) = 0.8250 = πₑ
κ = [π₀ − πₑ] / [1 − πₑ] = 56% only !
Net agreement, standardized
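A minimal sketch of this computation in Python (the 2 x 2 table is from the slide; numpy and the helper name cohen_kappa are my choices):

```python
# Unweighted Cohen's kappa for the RS1 vs RS2 Macular Edema table,
# with the 'Ungradable' row and column ignored as on the slide.
import numpy as np

def cohen_kappa(table):
    """Unweighted Cohen's kappa from a square table of counts."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p0 = np.trace(t) / n                              # observed agreement
    pe = (t.sum(axis=1) / n) @ (t.sum(axis=0) / n)    # chance agreement
    return (p0 - pe) / (1.0 - pe)

me_table = [[326, 11],
            [ 18, 22]]
print(round(cohen_kappa(me_table), 2))   # 0.56, the 56% quoted above
```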
What About Multiple Ratings like Diabetic Retinopathy [DR] ?
Retina Specialist 1 (rows) vs Retina Specialist 2 (columns)
CODES         0     1     2     3     4     9   Total
0           247     2     2     1     0     0     252
1            12    18     7     1     0     0      38
2            22    10    40     8     0     1      81
3             0     0     3     2     2     0       7
4             0     0     0     1     9     0      10
9             5     0     1     0     0     6      12
Total       286    30    53    13    11     7     400
κ - Computation……
% agreement = (247 + 18 + 40 + 2 + 9 + 6)/400 = 322/400 = 0.8050 = π₀
% chance agreement = (252/400)(286/400) + … + (12/400)(7/400) = 0.4860 = πₑ
κ = [π₀ − πₑ] / [1 − πₑ] = 62% !
Note : 100% credit for a ‘hit’ & no credit for a ‘miss’.
Criticism : heavy penalty for a narrow miss !
Concept of Weighted Kappa
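The same kind of calculation for the full 6 x 6 DR table, as a self-contained sketch (the array is copied from the slide; variable names are mine):

```python
# Unweighted kappa for the 6 x 6 RS1 vs RS2 table on DR severity (codes 0-4, 9).
import numpy as np

dr = np.array([[247,  2,  2,  1,  0,  0],
               [ 12, 18,  7,  1,  0,  0],
               [ 22, 10, 40,  8,  0,  1],
               [  0,  0,  3,  2,  2,  0],
               [  0,  0,  0,  1,  9,  0],
               [  5,  0,  1,  0,  0,  6]], dtype=float)

n  = dr.sum()                                      # 400 patients
p0 = np.trace(dr) / n                              # 322/400 = 0.8050
pe = (dr.sum(axis=1) / n) @ (dr.sum(axis=0) / n)   # about 0.4860
print(round((p0 - pe) / (1 - pe), 2))              # 0.62, as quoted above
```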
Table of Weights for 6 x 6 Ratings
Ratings \ Ratings [1 to 6]
          1       2       3       4       5       6
1         1   24/25   21/25   16/25    9/25       0
2     24/25       1   24/25   21/25   16/25    9/25
3     21/25   24/25       1   24/25   21/25   16/25
4     16/25   21/25   24/25       1   24/25   21/25
5      9/25   16/25   21/25   24/25       1   24/25
6         0    9/25   16/25   21/25   24/25       1
Formula : w_ij = 1 − [(i − j)^2 / (6 − 1)^2]
Formula for Weighted Kappa
• π₀(w) = ∑∑ w_ij f_ij / n
• πₑ(w) = ∑∑ w_ij (f_i. / n)(f_.j / n)
• κ(w) = [π₀(w) − πₑ(w)] / [1 − πₑ(w)]
• These ∑∑ run over ALL cells, with f_ij the frequency in the (i, j)th cell
• For unweighted Kappa we take into account only the cell frequencies along the main diagonal, with 100% weight
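A minimal sketch of the weighted index, assuming the quadratic weights w_ij = 1 − (i − j)^2/(c − 1)^2 of the previous slide (the generalization from 6 categories to any c and the function name are mine):

```python
# Weighted kappa: near-misses earn partial credit via the weight matrix w.
import numpy as np

def weighted_kappa(table):
    t = np.asarray(table, dtype=float)
    c = t.shape[0]
    i, j = np.indices((c, c))
    w = 1.0 - (i - j) ** 2 / (c - 1) ** 2     # 1 on the diagonal, 0 in the corners
    p = t / t.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)
    p0w = (w * p).sum()                       # pi_0(w)
    pew = (w * np.outer(row, col)).sum()      # pi_e(w)
    return (p0w - pew) / (1.0 - pew)
```

Applied to the 6 x 6 DR table above, this comes out at roughly 0.75 against the unweighted 62%, because a cell such as the 22 patients coded 2 by RS1 and 0 by RS2 now earns weight 21/25 instead of 0.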
κ-statistics for Pairs of Raters
Retina Specialists      DR     ME     Referral
1 vs 2                0.63   0.58       0.65
1 vs 3                0.55   0.64       0.65
2 vs 3                0.56   0.51       0.59
1 vs CGroup           0.67   0.65       0.66
2 vs CGroup           0.70   0.65       0.66
3 vs CGroup           0.71   0.73       0.72
κ for Multiple Raters’ Agreement
• Judgement on simultaneous agreement of multiple raters with multiple classification of attributes…
# Raters = n
# Subjects = k
# Mutually exclusive & exhaustive nominal categories = c
Example…. Retina Specialists (n = 3), Patients (k = 400) & DR (c = 6 codes)
Formula for Kappa
• Set k_ij = # raters who assign the ith subject to the jth category
• P_j = ∑_i k_ij / nk = proportion of all assignments to the jth category
• Chance-corrected agreement for category j :

            ∑_i k_ij^2 − knP_j [1 + (n−1)P_j]
  κ_j = -------------------------------------------
                 kn(n−1) P_j (1 − P_j)
Computation of Kappa
• Chance-corrected measure of overall agreement :

         ∑_j numerator of κ_j
  κ = ----------------------------
        ∑_j denominator of κ_j

• Interpretation …. intraclass correlation
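A minimal sketch coding the two formulas above directly (the function name and the input layout, k_ij[i, j] = number of raters placing subject i in category j, are my choices):

```python
# Per-category and overall chance-corrected agreement for n raters,
# k subjects and c nominal categories.
import numpy as np

def multi_rater_kappa(k_ij):
    k_ij = np.asarray(k_ij, dtype=float)
    k, c = k_ij.shape
    n = int(k_ij[0].sum())                # raters per subject, assumed constant
    P = k_ij.sum(axis=0) / (n * k)        # P_j: share of all assignments in category j
    num = (k_ij ** 2).sum(axis=0) - k * n * P * (1 + (n - 1) * P)
    den = k * n * (n - 1) * P * (1 - P)   # assumes every category is used (P_j > 0)
    return num / den, num.sum() / den.sum()   # (kappa_j per category, overall kappa)

# Toy check: 4 subjects, 3 raters, 2 categories; every value comes out to 1/3 here.
print(multi_rater_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]))
```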
κ-statistic for Multiple Raters
                          DR     ME     Referral
Retina Specialists      0.58   0.58       0.63
Gen. Ophthalmologists   0.36   0.19       0.24
Photographers           0.37   0.38       0.30
Nurses                  0.26   0.20       0.20
All Raters              0.34   0.27       0.28
Except for the Retina Specialists, no other expert group shows good agreement on any feature.
Conclusion based on the κ-Study
• Of all 400 cases…
• 44 warranted Referral to Ophthalmologists due to Retinopathy Severity
• 5 warranted Referral to Ophthalmologists due to uncertainty in diagnosis
• A fourth Retina Specialist carried out a Dilated Fundus Exam of these 44 patients, and substantial agreement [κ = 0.68] was noticed for DR severity……
• The exam confirmed Referral of 38 / 44 cases.
Discussion on the Study • Retina Specialists : All in active clinical practice : Most reliable for digital image interpretation • Individual Rater’s background and experience play roles in digital image interpretation • Unusually high % of ungradable images among nonphysician raters, though only 5 out of 400 were declared as ’ungradable’ by consensus of the Retina Specialists’ Group. • Lack of Confidence of Nonphysicians, rather than true image ambiguity ! • For this study, other factors [blood pressure, blood sugar, cholesterol etc] not taken into account……
That’s it in Part I …… • Part II : Continuous Data Set-up
Cohen’s Kappa : Need for Further Theoretical Research • COHEN’S KAPPA STATISTIC: A CRITICAL APPRAISAL AND SOME MODIFICATIONS • Sinha et al (2007) • Calcutta Statistical Association Bulletin, 58, 151-169
Further Theoretical Studies on Kappa-Statistics….
• Recent study on Kappa : attaining the limits
• Where’s the problem ?
• κ = [π₀ − πₑ] / [1 − πₑ]    Range : −1 ≤ κ ≤ 1
• κ = 1 iff 100% perfect rankings
• κ = 0 iff 100% chancy ranking
• κ = −1 iff 100% imperfect AND split-half [?]
Why Split-Half ? Example
             Presence   Absence
Presence          --        30%
Absence          70%         --
κ = −73% [& not −100%]
************************************
Only the split-half case
             Presence   Absence
Presence          --        50%
Absence          50%         --
provides κ = −1
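A quick numeric check of this point (my own illustration): with no diagonal mass and off-diagonal proportions α and 1 − α, πₑ = 2α(1 − α), so κ = −2α(1 − α) / [1 − 2α(1 − α)].

```python
# kappa for a 2 x 2 table with zero agreement: only the 50/50 split reaches -1.
for a in (0.3, 0.5):
    pe = 2 * a * (1 - a)                 # chance agreement
    kappa = (0.0 - pe) / (1.0 - pe)      # observed agreement is 0
    print(a, round(kappa, 3))            # 0.3 -> -0.724 (the -73% above); 0.5 -> -1.0
```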
Kappa Modification…
• This modification starts from κ_M = [π₀ − πₑ] / [A − πₑ]
• and suggests a value of ‘A’ to take care of the situations :
• π(Y,Y) = π(N,N) = 0 and
• π(Y,N) = α and π(N,Y) = 1 − α for all α, along with κ_M = −1.
Kappa Modification….
• The above implies
• κ_M = −2α(1 − α) / [A − 2α(1 − α)] = −1, which implies
• A = 4α(1 − α)
• It is seen that α has a dual interpretation
• [α = π(Y,·) = π(·,N)], and hence a choice is given by α = [π(Y,·) + π(·,N)]/2.
• Substituting for α in A and simplifying, we end up with κ_M1.
Kappa-Modified….
• κ_M1 = [π₀ − πₑ] / [π(Y,·) π(N,·) + π(·,Y) π(·,N)]
• κ_M1 satisfies :
  κ_M1 = 1 iff 100% perfect rankings … whatever
  κ_M1 = 0 iff 100% chancy ranking
  κ_M1 = −1 iff 100% imperfect ranking … whatever
• whatever …… arbitrary distribution of frequencies across the categories, subject to perfect / imperfect ranking
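A minimal sketch of κ_M1 for a 2 x 2 table of proportions, checking the two boundary cases claimed above (function name mine):

```python
# kappa_M1: Cohen's numerator with the modified denominator
# pi(Y,.)pi(N,.) + pi(.,Y)pi(.,N), for a 2 x 2 table.
import numpy as np

def kappa_m1(table):
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)   # pi(Y,.), pi(N,.) and pi(.,Y), pi(.,N)
    p0 = np.trace(p)                          # observed agreement
    pe = row @ col                            # chance agreement
    return (p0 - pe) / (row[0] * row[1] + col[0] * col[1])

print(round(kappa_m1([[0.0, 0.3], [0.7, 0.0]]), 2))   # -1.0 : 100% imperfect, any split
print(round(kappa_m1([[0.3, 0.0], [0.0, 0.7]]), 2))   #  1.0 : 100% perfect, any split
```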
Other Formulae…
• What if it is a priori known that there is 80% (observed) agreement between the two raters, i.e., π₀ = 80% ?
• κ_Max = 1 ? κ_Min = −1 ? … NOT really….
• So we need a standardization of κ as
• κ_M2 = [κ − κ_Min] / [κ_Max − κ_Min], where κ_Max and κ_Min are to be evaluated under the stipulated value of the observed agreement.
Standardization yields…

                 κ + (1 − π₀)/(1 + π₀)
• κ_M2 = --------------------------------------------------
          π₀^2 / [1 + (1 − π₀)^2] + (1 − π₀)/(1 + π₀)

                 κ_M1 + (1 − π₀)/(1 + π₀)
• κ_M3 = --------------------------------------------------
          π₀ / (2 − π₀) + (1 − π₀)/(1 + π₀)

• Related inference procedures are studied.
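A minimal sketch transcribing the two standardized indices as displayed (pi0 is the stipulated observed agreement; the κ_Max and κ_Min expressions are read off the denominators above rather than derived independently here):

```python
# Standardized kappas given a stipulated observed agreement pi0.
def kappa_m2(kappa, pi0):
    k_min = -(1 - pi0) / (1 + pi0)               # attainable minimum at this pi0
    k_max = pi0 ** 2 / (1 + (1 - pi0) ** 2)      # attainable maximum at this pi0
    return (kappa - k_min) / (k_max - k_min)

def kappa_m3(kappa_m1, pi0):
    k_min = -(1 - pi0) / (1 + pi0)
    k_max = pi0 / (2 - pi0)
    return (kappa_m1 - k_min) / (k_max - k_min)

# With pi0 = 0.80 the raw kappa is confined to about [-0.11, 0.62]:
print(round(-(1 - 0.8) / (1 + 0.8), 2), round(0.8 ** 2 / (1 + (1 - 0.8) ** 2), 2))
```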
Beyond Kappa ….. • A Review of Inter-rater Agreement Measures • Banerjee et al : Canadian Journal of Statistics : 1999; 3-23 • Modelling Patterns of Agreement : • Log Linear Models • Latent Class Models
The End • That’s it in Part I …… • BKSinha