
On Some Statistical Aspects of Agreement Among Measurements



  1. On Some Statistical Aspects of Agreement Among Measurements • Bikas Sinha [ISI, Kolkata] • Math. & Stat. Sciences • ASU [Tempe] • February 26, 2016

  2. Quotes of the Day • “I now tend to believe … somehow … for so long … I was completely wrong.” • “Ah! That’s good. You and I finally agree!” • *************** • “When two men of science disagree, they do not invoke the secular arm; they wait for further evidence to decide the issue, because, as men of science, they know that neither is infallible.”

  3. Latest Book on Measuring Agreement

  4. Book Chapters …. • 1. Introduction • 1.1 Precision, Accuracy, and Agreement • 1.2 Traditional Approaches for Continuous Data • 1.3 Traditional Approaches for Categorical Data

  5. Chapter 2 • 2. Continuous Data • 2.1 Basic Model • 2.2 Absolute Indices • 2.2.1 Mean Squared Deviation • 2.2.2 Total Deviation Index • 2.2.3 Coverage Probability • 2.3 Relative Indices • 2.3.1 Intraclass Correlation Coefficient • 2.3.2 Concordance Correlation Coefficient

  6. Chapter 3 • 3. Categorical Data • 3.1 Basic Approach When Target Values Are Random • 3.1.1 Data Structure • 3.1.2 Absolute Indices • 3.1.3 Relative Indices: Kappa and Weighted Kappa

  7. Seminar Plan • Agreement for Categorical Data [Part I] : 30 minutes • Agreement for Continuous Data [Part II] : 25 minutes • Discussion : 5 minutes

  8. Key References : Part I • Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37-46. [Famous for Cohen’s Kappa] • Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4): 213-220.

  9. References …. contd. • Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Journal of Statistics, 27(1): 3-23.

  10. Measurements : Provided by Experts / Observers / Raters • Could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating system or raters, diagnosis or treatments, instruments or methods, processes or techniques or formulae…… • Rater....Generic Term

  11. Agreement : Categorical Data • Illustrative Example : Study on Diabetic Retinopathy Screening • Problem : Interpretation of Single-Field Digital Fundus Images • Assessment of Agreement WITHIN / ACROSS 4 EXPERT GROUPS • Retina Specialists / General Ophthalmologists / Photographers / Nurses : 3 from each group

  12. Description of Study Material • 400 Diabetic Patients, selected randomly from a community hospital in Bangkok • One good Single-Field Digital Fundus Image taken from each patient, with signed consent • Approved by the Ethical Committee on Research with Human Subjects • Raters : allowed to magnify / move the images, NOT to modify brightness / contrast

  13. THREE Major Features • #1. Diabetic Retinopathy Severity [6 options] : No Retinopathy / Mild / Moderate NPDR / Severe NPDR / PDR / Ungradable • #2. Macular Edema [3 options] : Presence / Absence / Ungradable • #3. Referral to Ophthalmologists [3 options] : Referrals / Non-Referrals / Uncertain

  14. Retina Specialists’ Ratings [DR] : RS1 (rows) vs RS2 (columns)
      CODES     0    1    2    3    4    9   Total
      0       247    2    2    1    0    0     252
      1        12   18    7    1    0    0      38
      2        22   10   40    8    0    1      81
      3         0    0    3    2    2    0       7
      4         0    0    0    1    9    0      10
      9         5    0    1    0    0    6      12
      Total   286   30   53   13   11    7     400

  15. Retina Specialists’ Consensus Rating [DR] : RS1 (rows) vs RSCR (columns)
      CODES     0    1    2    3    4    9   Total
      0       252    0    0    0    0    0     252
      1        17   19    2    0    0    0      38
      2        15   19   43    2    1    1      81
      3         0    0    2    4    1    0       7
      4         0    0    0    0   10    0      10
      9         8    0    0    0    0    4      12
      Total   292   38   47    6   12    5     400

  16. Retina Specialists’ Ratings [Macular Edema] : RS1 (rows) vs RS2 (columns)
      CODES        Presence  Absence  Subtotal  Ungradable  Total
      Presence          326       11       337           1    338
      Absence            18       22        40           3     43
      Subtotal          344       33       377          --     --
      Ungradable          9        0        --          10     19
      Total             353       33        --          14    400

  17. Retina Specialists’ Consensus Rating [ME] : RS1 (rows) vs RSCR (columns)
      CODES        Presence  Absence  Subtotal  Ungradable  Total
      Presence          335        2       337           1    338
      Absence            10       33        43           0     43
      Subtotal          345       35       380          --     --
      Ungradable         10        0        --           9     19
      Total             355       35        --          10    400

  18. Cohen’s Kappa for 2x2 Rating • Rater I vs Rater II : 2 x 2 Case • Categories : Yes & No ; cell proportions π(i,j) • π(Y,Y) & π(N,N) : Agreement Proportions • π(Y,N) & π(N,Y) : Disagreement Proportions • π₀ = π(Y,Y) + π(N,N) = P[Agreement] • πₑ = π(Y,.) π(.,Y) + π(N,.) π(.,N) = P[Chancy Agreement] • κ = [π₀ − πₑ] / [1 − πₑ] • Chance-corrected Agreement Index

  19. Study of Agreement [RS-ME] : 2 x 2 Table : Cohen’s Kappa (κ) Coefficient
      Retina Specialist 1 (rows) vs Retina Specialist 2 (columns)
                 Presence  Absence  Subtotal
      Presence        326       11       337
      Absence          18       22        40
      Subtotal        344       33       377
      IGNORED ’Ungradable’ to work with a 2 x 2 table
      % agreement : (326 + 22) / 377 = 0.9231 = π₀
      % chance agreement : %Yes.%Yes + %No.%No = (337/377)(344/377) + (40/377)(33/377) = 0.8250 = πₑ
      κ = [π₀ − πₑ] / [1 − πₑ] = 56% only ! Net agreement, standardized.
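A minimal Python sketch of the 2 x 2 computation above, assuming raw counts as input; the function name cohen_kappa_2x2 is mine, not part of the study.

```python
# Minimal sketch (not from the talk): Cohen's kappa for a 2 x 2 table.
def cohen_kappa_2x2(a, b, c, d):
    """a, b, c, d = counts for (Y,Y), (Y,N), (N,Y), (N,N)."""
    n = a + b + c + d
    p0 = (a + d) / n                                           # observed agreement
    pe = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)  # chance agreement
    return (p0 - pe) / (1 - pe)

# Retina Specialist 1 vs 2, Macular Edema, 'ungradable' ignored (slide 19):
print(round(cohen_kappa_2x2(326, 11, 18, 22), 2))              # -> 0.56
```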

  20. What About Multiple Ratings like Diabetic Retinopathy [DR] ?
      Retina Specialist 1 (rows) vs Retina Specialist 2 (columns)
      CODES     0    1    2    3    4    9   Total
      0       247    2    2    1    0    0     252
      1        12   18    7    1    0    0      38
      2        22   10   40    8    0    1      81
      3         0    0    3    2    2    0       7
      4         0    0    0    1    9    0      10
      9         5    0    1    0    0    6      12
      Total   286   30   53   13   11    7     400

  21. κ-Computation …… % Agreement = (247+18+40+2+9+6)/400 = 322/400 = 0.8050 = π₀ • % Chance Agreement = (252/400)(286/400) + …. + (12/400)(7/400) = 0.4860 = πₑ • κ = [π₀ − πₑ] / [1 − πₑ] = 62% ! • Note : 100% credit for a ’hit’ & no credit for a ’miss’. • Criticism : heavy penalty for ratings that miss only narrowly ! • Concept of Weighted Kappa

  22. Table of Weights for 6x6 Ratings : w_ij = 1 − [(i − j)² / (6 − 1)²]
      Ratings      1      2      3      4      5      6
      1            1  24/25  21/25  16/25   9/25      0
      2        24/25      1  24/25  21/25  16/25   9/25
      3        21/25  24/25      1  24/25  21/25  16/25
      4        16/25  21/25  24/25      1  24/25  21/25
      5         9/25  16/25  21/25  24/25      1  24/25
      6            0   9/25  16/25  21/25  24/25      1

  23. Formula for Weighted Kappa • π₀(w) = ∑∑ w_ij f_ij / n • πₑ(w) = ∑∑ w_ij (f_i. /n)(f_.j /n) • κ(w) = [π₀(w) − πₑ(w)] / [1 − πₑ(w)] • These ∑∑ run over ALL cells, with f_ij the frequency in the (i,j)th cell • For unweighted Kappa we take into account only the cell frequencies along the main diagonal, with 100% weight
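A short Python sketch of the weighted κ just described, applied to the 6 x 6 DR table of slide 20; the function name weighted_kappa is mine, and treating the six codes (including the ’ungradable’ code 9) as equally spaced ordinal categories is only an illustrative assumption. With identity weights it reproduces the unweighted κ ≈ 0.62 of slide 21.

```python
# Sketch: weighted kappa with quadratic weights w_ij = 1 - (i-j)^2/(c-1)^2.
def weighted_kappa(freq, quadratic=True):
    c = len(freq)                                    # number of categories
    n = sum(sum(row) for row in freq)                # total count
    row = [sum(freq[i]) for i in range(c)]           # row marginals f_i.
    col = [sum(freq[i][j] for i in range(c)) for j in range(c)]  # column marginals f_.j
    def w(i, j):
        return 1 - (i - j) ** 2 / (c - 1) ** 2 if quadratic else float(i == j)
    p0 = sum(w(i, j) * freq[i][j] / n for i in range(c) for j in range(c))
    pe = sum(w(i, j) * (row[i] / n) * (col[j] / n)
             for i in range(c) for j in range(c))
    return (p0 - pe) / (1 - pe)

dr = [[247, 2, 2, 1, 0, 0],
      [12, 18, 7, 1, 0, 0],
      [22, 10, 40, 8, 0, 1],
      [0, 0, 3, 2, 2, 0],
      [0, 0, 0, 1, 9, 0],
      [5, 0, 1, 0, 0, 6]]

print(round(weighted_kappa(dr, quadratic=False), 2))  # ~0.62, the unweighted kappa of slide 21
print(round(weighted_kappa(dr), 2))                   # weighted version using w_ij above
```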

  24. κ-statistics for Pairs of Raters
      Retina Specialists     DR     ME    Referral
      1 vs 2               0.63   0.58      0.65
      1 vs 3               0.55   0.64      0.65
      2 vs 3               0.56   0.51      0.59
      1 vs CGroup          0.67   0.65      0.66
      2 vs CGroup          0.70   0.65      0.66
      3 vs CGroup          0.71   0.73      0.72

  25. κ for Multiple Raters’ Agreement • Judgement on simultaneous agreement of multiple raters with multiple classification of attributes … • # Raters = n • # Subjects = k • # Mutually Exclusive & Exhaustive Nominal Categories = c • Example … Retina Specialists (n = 3), Patients (k = 400) & DR (c = 6 codes)

  26. Formula for Kappa • Set k_ij = # raters who assign the ith subject to the jth category • P_j = ∑_i k_ij / nk = proportion of all assignments made to the jth category • Chance-corrected agreement for category j :
      κ_j = [∑_i k_ij² − knP_j {1 + (n−1)P_j}] / [kn(n−1) P_j (1 − P_j)]

  27. Computation of Kappa • Chance-corrected measure of over-all agreement :
      κ = [∑_j Numerator of κ_j] / [∑_j Denominator of κ_j]
      • Interpretation …. intraclass correlation
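A compact Python sketch of the per-category κ_j and the overall κ defined on the last two slides (a Fleiss-style computation); the function name multi_rater_kappa and the tiny 3-rater example are illustrative assumptions, not data from the study.

```python
# Sketch: per-category and overall kappa for n raters, k subjects, c categories.
def multi_rater_kappa(counts, n):
    """counts[i][j] = number of the n raters assigning subject i to category j."""
    k = len(counts)                                   # number of subjects
    c = len(counts[0])                                # number of categories
    P = [sum(counts[i][j] for i in range(k)) / (n * k) for j in range(c)]
    num, den = [], []
    for j in range(c):
        s2 = sum(counts[i][j] ** 2 for i in range(k))
        num.append(s2 - k * n * P[j] * (1 + (n - 1) * P[j]))   # numerator of kappa_j
        den.append(k * n * (n - 1) * P[j] * (1 - P[j]))        # denominator of kappa_j
    kappa_j = [num[j] / den[j] if den[j] else float('nan') for j in range(c)]
    overall = sum(num) / sum(den)                     # sum of numerators / sum of denominators
    return kappa_j, overall

# Tiny illustrative example: 3 raters, 4 subjects, 3 categories.
toy = [[3, 0, 0],
       [2, 1, 0],
       [0, 3, 0],
       [1, 1, 1]]
print(multi_rater_kappa(toy, n=3))
```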

  28. κ-statistic for multiple raters …
      Rater Group               DR     ME    Referral
      Retina Specialists      0.58   0.58      0.63
      Gen. Ophthalmologists   0.36   0.19      0.24
      Photographers           0.37   0.38      0.30
      Nurses                  0.26   0.20      0.20
      All Raters              0.34   0.27      0.28
      Except for the Retina Specialists, no other expert group shows good agreement on any feature.

  29. Conclusion based on the κ-Study • Of all 400 cases … • 44 warranted referral to ophthalmologists due to retinopathy severity • 5 warranted referral to ophthalmologists due to uncertainty in diagnosis • A fourth Retina Specialist carried out a dilated fundus exam of these 44 patients, and substantial agreement [κ = 0.68] was noticed for DR severity … • The exam confirmed referral of 38 / 44 cases.

  30. Discussion on the Study • Retina Specialists : All in active clinical practice : Most reliable for digital image interpretation • Individual Rater’s background and experience play roles in digital image interpretation • Unusually high % of ungradable images among nonphysician raters, though only 5 out of 400 were declared as ’ungradable’ by consensus of the Retina Specialists’ Group. • Lack of Confidence of Nonphysicians, rather than true image ambiguity ! • For this study, other factors [blood pressure, blood sugar, cholesterol etc] not taken into account……

  31. That’s it in Part I …… • Part II : Continuous Data Set-up

  32. Cohen’s Kappa : Need for Further Theoretical Research • COHEN’S KAPPA STATISTIC: A CRITICAL APPRAISAL AND SOME MODIFICATIONS • Sinha et al (2007) • Calcutta Statistical Association Bulletin, 58, 151-169

  33. Further Theoretical Studies on Kappa – Statistics …. • Recent study on Kappa : attaining the limits • Where’s the problem ? • κ = [π₀ − πₑ] / [1 − πₑ] , Range : −1 ≤ κ ≤ 1 • κ = 1 iff 100% Perfect Rankings • κ = 0 iff 100% Chancy Ranking • κ = −1 iff 100% Imperfect AND Split-Half [?]

  34. Why Split Half ? Example
                  Presence   Absence
      Presence        ----       30%
      Absence          70%      ----
      gives κ = −73% [& not −100%]
      ************************************
      Only the Split Half
                  Presence   Absence
      Presence        ----       50%
      Absence          50%      ----
      provides κ = −1
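A worked check of the two tables above, using the κ definition of slide 18 (my arithmetic; the first value matches the roughly −73% quoted above up to rounding):

```latex
% 30/70 table: \pi_0 = 0 and \pi_e = (0.3)(0.7) + (0.7)(0.3) = 0.42, so
\kappa = \frac{0 - 0.42}{1 - 0.42} \approx -0.72 ;
% 50/50 split-half table: \pi_0 = 0 and \pi_e = (0.5)(0.5) + (0.5)(0.5) = 0.50, so
\kappa = \frac{0 - 0.50}{1 - 0.50} = -1 .
```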

  35. Kappa Modification … • This modification originated from κ_M = [π₀ − πₑ] / [A − πₑ] • and suggests a value of ‘A’ to take care of the situations : • π(Y,Y) = π(N,N) = 0 and • π(Y,N) = α and π(N,Y) = 1 − α for all α, along with κ_M = −1.

  36. Kappa Modification …. • The above implies • κ_M = −2α(1−α) / [A − 2α(1−α)] = −1 , which implies • A = 4α(1−α) • It is seen that α has a dual interpretation • [α = π(Y,.) = π(.,N)] and hence a choice is given by α = [π(Y,.) + π(.,N)]/2. • Substituting for α in A and simplifying, we end up with κ_M1 (see the step below).
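The substitution step, spelled out (my algebra, consistent with the formula on the next slide): using 2 − π(Y,.) − π(.,N) = π(N,.) + π(.,Y) and πₑ = π(Y,.)π(.,Y) + π(N,.)π(.,N),

```latex
A = 4\alpha(1-\alpha)
  = \bigl[\pi(Y,\cdot)+\pi(\cdot,N)\bigr]\,\bigl[\pi(N,\cdot)+\pi(\cdot,Y)\bigr],
\qquad
A - \pi_e = \pi(Y,\cdot)\,\pi(N,\cdot) + \pi(\cdot,Y)\,\pi(\cdot,N).
```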

  37. Kappa-Modified …. • κ_M1 = [π₀ − πₑ] / [π(Y,.) π(N,.) + π(.,Y) π(.,N)] • κ_M1 satisfies : • κ_M1 = 1 iff 100% Perfect Rankings … whatever • κ_M1 = 0 iff 100% Chancy Ranking • κ_M1 = −1 iff 100% Imperfect Ranking … whatever … • whatever …… arbitrary distribution of frequencies across the categories, subject to perfect/imperfect ranking
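A small numerical check (sketch only; the function name kappa_m1 is mine): for the degenerate tables with π(Y,Y) = π(N,N) = 0, π(Y,N) = α, π(N,Y) = 1 − α, κ_M1 returns −1 for every α in (0, 1), unlike the ordinary κ.

```python
# Sketch: modified kappa M1 = (pi0 - pie) / [pi(Y,.)pi(N,.) + pi(.,Y)pi(.,N)].
def kappa_m1(pyy, pyn, pny, pnn):
    pi0 = pyy + pnn
    py_, pn_ = pyy + pyn, pny + pnn          # row marginals pi(Y,.), pi(N,.)
    p_y, p_n = pyy + pny, pyn + pnn          # column marginals pi(.,Y), pi(.,N)
    pie = py_ * p_y + pn_ * p_n
    return (pi0 - pie) / (py_ * pn_ + p_y * p_n)

for alpha in (0.3, 0.5, 0.9):
    print(alpha, kappa_m1(0.0, alpha, 1 - alpha, 0.0))   # -1.0 in each case
```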

  38. Other Formulae … • What if it is a priori known that there is 80% (observed) agreement between the two raters, i.e., π₀ = 80% ? • κ_Max = 1 ? κ_Min = −1 ? … NOT really … • So we need a standardization of κ as • κ_M2 = [κ − κ_Min] / [κ_Max − κ_Min] • where κ_Max & κ_Min are to be evaluated under the stipulated value of the observed agreement
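For reference, a sketch of the extreme values of κ for a fixed observed agreement π₀ (my algebra, obtained by minimizing or maximizing πₑ over all 2 x 2 tables with that π₀); these are the κ_Max and κ_Min that enter κ_M2 on the next slide.

```latex
% For fixed \pi_0, the chance term satisfies
%   \pi_0 - \pi_0^2/2 \;\le\; \pi_e \;\le\; \pi_0 + (1-\pi_0)^2/2 ,
% which gives
\kappa_{\min} = -\frac{1-\pi_0}{1+\pi_0},
\qquad
\kappa_{\max} = \frac{\pi_0^2}{1+(1-\pi_0)^2}.
% Example: \pi_0 = 0.80 gives \kappa_{\min} \approx -0.11 and \kappa_{\max} \approx 0.62.
```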

  39. Standardization yields …
      κ_M2 = [κ + (1−π₀)/(1+π₀)] / [ π₀²/{1+(1−π₀)²} + (1−π₀)/(1+π₀) ]
      κ_M3 = [κ_M1 + (1−π₀)/(1+π₀)] / [ π₀/(2−π₀) + (1−π₀)/(1+π₀) ]
      Related inference procedures are studied.

  40. Beyond Kappa ….. • A Review of Inter-rater Agreement Measures • Banerjee et al : Canadian Journal of Statistics : 1999; 3-23 • Modelling Patterns of Agreement : • Log Linear Models • Latent Class Models

  41. The End • That’s it in Part I …… • B. K. Sinha
