
Inter-rater Reliability of Clinical Ratings: A Brief Primer on Kappa

Daniel H. Mathalon, Ph.D., M.D., Department of Psychiatry, Yale University School of Medicine.


Presentation Transcript


  1. Inter-rater Reliability of Clinical Ratings: A Brief Primer on Kappa. Daniel H. Mathalon, Ph.D., M.D., Department of Psychiatry, Yale University School of Medicine

  2. Inter-rater Reliability of Clinical Interview Based Measures
     • Ratings of clinical severity for specific symptom domains (e.g., PANSS, BPRS, SAPS, SANS)
       • Continuous scales
       • Use intraclass correlations to assess inter-rater reliability.
     • Diagnostic Assessment
       • Categorical Data / Nominal Scale Data
       • How do we quantify reliability between diagnosticians?
       • Percent Agreement, Chi-Square, Kappa

  3. Two raters classify n cases into k mutually exclusive categories.
     [Table: k × k cross-classification, Rater 1 categories (rows) by Rater 2 categories (columns)]
     $n_{ij}$ = number of cases falling into cell $(i, j)$ = frequency of joint event $ij$
     $n_{..}$ = total number of cases
     $p_{ij} = n_{ij} / n_{..}$ = proportion of cases falling into a particular cell
     Reliability by Percentage Agreement = $\sum_i p_{ii} = \frac{1}{n_{..}} \sum_i n_{ii}$
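The percentage-agreement formula above can be sketched in a few lines of Python. The 3 × 3 table of counts is a hypothetical example (not taken from the slide), and `percent_agreement` is an illustrative helper name.

```python
# Hypothetical cross-classification of 100 cases:
# rows = Rater 1's category, columns = Rater 2's category.
table = [
    [53, 5, 2],
    [11, 14, 5],
    [1, 6, 3],
]

def percent_agreement(counts):
    """Proportion of cases on the main diagonal (sum_i n_ii / n..)."""
    n_total = sum(sum(row) for row in counts)
    n_agree = sum(counts[i][i] for i in range(len(counts)))
    return n_agree / n_total

agreement = percent_agreement(table)
print(f"percent agreement = {agreement:.2f}")
```

Only the diagonal cells count toward agreement; how the disagreements are spread across the off-diagonal cells has no effect on this measure.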

  4. Percent Agreement Fails to Consider Agreement by Chance
     • Assume that two raters whose judgments are completely independent (i.e., not influenced by the true diagnostic status of the patient) each diagnose 90% of cases to have schizophrenia and 10% of cases to not have schizophrenia (i.e., Other).
     • Expected agreement by chance for each category is obtained by multiplying the marginal probabilities together:
       Schizophrenia: .90 x .90 = .81
       Other: .10 x .10 = .01
       Proportion Agreement = .81 + .01 = .82
     • Can get Percentage Agreement of 82% strictly by chance.
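The chance-agreement arithmetic on this slide can be reproduced directly from the two raters' marginal proportions. This is a minimal sketch; `chance_agreement` is an illustrative name, and the marginals are the .90 / .10 split described in the slide.

```python
def chance_agreement(marginals_rater1, marginals_rater2):
    """Expected agreement if the raters judge independently:
    the sum over categories of the product of the marginal proportions."""
    return sum(a * b for a, b in zip(marginals_rater1, marginals_rater2))

# Each rater labels 90% of cases "schizophrenia" and 10% "other".
rater1 = [0.90, 0.10]
rater2 = [0.90, 0.10]

expected = chance_agreement(rater1, rater2)
print(f"agreement expected by chance = {expected:.2f}")
```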

  5. Chi-Square Test of Association as Proposed Solution
     • Can perform a Chi-Square Test of Association to test the null hypothesis that the two raters’ judgments are independent.
     • To reject independence, show that observed agreement departs from what would be expected by chance alone.
       $\chi^2 = \sum_{\text{cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
     • Problem: In the example below, we have a perfect association between the raters with zero agreement. Chi-Square is a test of association, not agreement. It is sensitive to any departure from chance agreement, even when the dependency between the raters’ judgments involves perfect non-agreement.
     • So, we cannot use the Chi-Square Test to assess agreement between raters.
     [Table: Rater 1 (rows) by Rater 2 (columns); cell values not recoverable]
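To illustrate why chi-square measures association rather than agreement, the sketch below computes the statistic for a hypothetical 2 × 2 table in which the raters never agree. The table values and the helper name `chi_square` are assumptions made for illustration; the slide's own table did not survive extraction.

```python
def chi_square(counts):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
    with expected counts computed from the row and column totals."""
    n_total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    statistic = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical table: whenever Rater 1 says category A, Rater 2 says B, and vice versa.
perfect_disagreement = [
    [0, 50],
    [50, 0],
]

chi2 = chi_square(perfect_disagreement)
n_cases = sum(sum(row) for row in perfect_disagreement)
agreement = sum(perfect_disagreement[i][i] for i in range(2)) / n_cases
print(f"chi-square = {chi2:.1f}, percent agreement = {agreement:.2f}")
```

The statistic is large because the judgments are strongly dependent, yet the diagonal is empty, so a significant chi-square says nothing about whether the raters agree.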

  6. Kappa Coefficient (Cohen, 1960)
     • High reliability requires that the frequencies along the diagonal should be > chance and off-diagonal frequencies should be < chance.
     • Use marginal frequencies/probabilities to estimate chance agreement.
     Proportion agreement observed: $p_o = \sum_i p_{ii} = \frac{1}{n} \sum_i n_{ii}$
     Proportion agreement expected by chance: $p_c = \sum_i p_{i.} \, p_{.i}$
     Kappa: $K = \frac{p_o - p_c}{1 - p_c}$
     [Table: Rater 1 (rows) by Rater 2 (columns); products $p_{i.} \times p_{.i}$ = .39, .075, .01]
     $p_o$ = .53 + .14 + .03 = .70
     $p_c$ = .39 + .075 + .01 = .475
     $K = \frac{.70 - .475}{1 - .475} = .429$
     K = 1, perfect agreement; K = 0, chance agreement; K < 0, agreement worse than chance.
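The kappa computation can be sketched as follows. The diagonal counts (53, 14, 3) and the marginal products (.39, .075, .01) match the slide's figures; the off-diagonal counts and the total of 100 cases are assumptions chosen to be consistent with those figures, since the original table was lost. `cohen_kappa` is an illustrative helper name.

```python
def cohen_kappa(counts):
    """Cohen's kappa for a square table of counts
    (rows = Rater 1, columns = Rater 2)."""
    n_total = sum(sum(row) for row in counts)
    k = len(counts)
    row_props = [sum(row) / n_total for row in counts]
    col_props = [sum(col) / n_total for col in zip(*counts)]
    p_o = sum(counts[i][i] for i in range(k)) / n_total          # observed agreement
    p_c = sum(row_props[i] * col_props[i] for i in range(k))     # chance agreement
    return (p_o - p_c) / (1 - p_c)

# Assumed table: row marginals .60, .30, .10; column marginals .65, .25, .10.
table = [
    [53, 5, 2],
    [11, 14, 5],
    [1, 6, 3],
]

kappa = cohen_kappa(table)
print(f"kappa = {kappa:.3f}")
```

With these counts the observed agreement is .70 and the chance agreement is .475, so the result agrees with the slide's K = .429.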

  7. Interpretations of Kappa
     $K = \frac{p_o - p_c}{1 - p_c}$, with $p_o$ = .53 + .14 + .03 = .70 and $p_c$ = .39 + .075 + .01 = .475, so $K = \frac{.70 - .475}{1 - .475} = .429$
     K = P(agreement | no agreement by chance)
     • $1 - p_c$ = 1 - .475 = .525 is the proportion of cases with no agreement by chance.
     • $p_o - p_c$ = .70 - .475 = .225 is the proportion of cases in which the observers agreed and the agreement was not due to chance.
     • Kappa is the probability that judges will agree given no agreement by chance.
     • Can test H0 that Kappa = 0. Kappa is approximately normally distributed in large samples, so its significance can be tested using the normal distribution.
     • Can construct confidence intervals for Kappa.
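A rough sketch of the significance test and confidence interval mentioned above. The slide gives no standard-error formula, so the two used here are assumptions: the large-sample standard error under H0 usually attributed to Fleiss, Cohen and Everitt (1969), and Cohen's (1960) simpler approximation for the interval. The table, including its total of 100 cases, is hypothetical apart from the diagonal and marginal figures shown on the slide.

```python
import math

# Assumed counts (rows = Rater 1, columns = Rater 2); n = 100 is an assumption.
table = [
    [53, 5, 2],
    [11, 14, 5],
    [1, 6, 3],
]

n = sum(sum(row) for row in table)
row_p = [sum(row) / n for row in table]
col_p = [sum(col) / n for col in zip(*table)]
p_o = sum(table[i][i] for i in range(len(table))) / n
p_c = sum(r * c for r, c in zip(row_p, col_p))
kappa = (p_o - p_c) / (1 - p_c)

# Standard error of kappa under H0: kappa = 0 (assumed large-sample formula).
s = sum(r * c * (r + c) for r, c in zip(row_p, col_p))
se_null = math.sqrt(p_c + p_c ** 2 - s) / ((1 - p_c) * math.sqrt(n))
z = kappa / se_null  # compare against the standard normal distribution

# Approximate 95% confidence interval (assumed simple large-sample formula).
se_ci = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_c) ** 2))
ci_low, ci_high = kappa - 1.96 * se_ci, kappa + 1.96 * se_ci

print(f"kappa = {kappa:.3f}, z = {z:.2f}")
print(f"approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

A z value beyond about 1.96 in absolute value would reject H0 at the two-sided .05 level. More refined variance formulas exist for the interval; the one above is only the simplest approximation.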

  8. Weighted Kappa Coefficient
     • Can assign weights, $w_{ij}$, to classification errors according to their seriousness, using ratio scale weights.
     $K_w = \frac{p_{o(w)} - p_{c(w)}}{1 - p_{c(w)}}$
     [Table: Rater 1 (rows) by Rater 2 (columns); cell values not recoverable]
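A minimal sketch of weighted kappa, assuming agreement weights (1 on the diagonal, smaller values for more serious disagreements), so that $p_{o(w)} = \sum_{ij} w_{ij} p_{ij}$ and $p_{c(w)} = \sum_{ij} w_{ij} p_{i.} p_{.j}$. The slide does not spell out these two sums, so treat them as an assumption; the table and the weight matrix are hypothetical, and `weighted_kappa` is an illustrative name.

```python
def weighted_kappa(counts, weights):
    """Weighted kappa with agreement weights w_ij (w_ii = 1)."""
    n_total = sum(sum(row) for row in counts)
    k = len(counts)
    row_p = [sum(row) / n_total for row in counts]
    col_p = [sum(col) / n_total for col in zip(*counts)]
    p_o_w = sum(weights[i][j] * counts[i][j] / n_total
                for i in range(k) for j in range(k))
    p_c_w = sum(weights[i][j] * row_p[i] * col_p[j]
                for i in range(k) for j in range(k))
    return (p_o_w - p_c_w) / (1 - p_c_w)

# Hypothetical counts (rows = Rater 1, columns = Rater 2).
table = [
    [53, 5, 2],
    [11, 14, 5],
    [1, 6, 3],
]

# Identity weights give no credit for any disagreement: ordinary kappa.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical weights giving half credit to adjacent-category disagreements.
partial_credit = [[1, 0.5, 0], [0.5, 1, 0.5], [0, 0.5, 1]]

k_unweighted = weighted_kappa(table, identity)
k_weighted = weighted_kappa(table, partial_credit)
print(f"identity weights: {k_unweighted:.3f}, partial credit: {k_weighted:.3f}")
```

With identity weights the function reduces to unweighted kappa, which is a quick sanity check on any weight scheme.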

  9. Kappa Rules of Thumb
     • K ≥ .75 is considered excellent agreement.
     • K ≤ .46 is considered poor agreement.

  10. Weighted Kappa and the ICC
      • Weighted kappa is an intraclass correlation coefficient (except for a factor of 1/n) when the weights have the following property:
        $w_{ij} = 1 - \frac{(i - j)^2}{(k - 1)^2}$
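The weight formula above can be turned into a weight matrix for k ordered categories. This is a small sketch; `quadratic_weights` is an illustrative name, and categories are indexed from 0, which leaves the differences i - j unchanged.

```python
def quadratic_weights(k):
    """Agreement weights w_ij = 1 - (i - j)^2 / (k - 1)^2 for k ordered categories."""
    return [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)]
            for i in range(k)]

weights = quadratic_weights(3)
for row in weights:
    print(row)
```

Exact agreement gets weight 1, the most distant pair of categories gets weight 0, and intermediate disagreements are penalized by the squared distance between categories.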

  11. Problems with Kappa
      • Affected by base rates of diagnoses.
      • Can’t easily compare across studies that have different base rates, either in the population or in the reliability study.
      • Chance agreement is a problem?
      • When the null hypothesis of rater independence is not met (which is most of the time), the estimate of chance agreement is inaccurate and possibly inappropriate.
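The base-rate problem can be illustrated with two hypothetical 2 × 2 tables that have the same percent agreement but different marginal distributions. Both tables are invented for illustration, and `agreement_and_kappa` is an illustrative helper name.

```python
def agreement_and_kappa(counts):
    """Return (observed agreement, Cohen's kappa) for a square table of counts."""
    n_total = sum(sum(row) for row in counts)
    k = len(counts)
    row_p = [sum(row) / n_total for row in counts]
    col_p = [sum(col) / n_total for col in zip(*counts)]
    p_o = sum(counts[i][i] for i in range(k)) / n_total
    p_c = sum(row_p[i] * col_p[i] for i in range(k))
    return p_o, (p_o - p_c) / (1 - p_c)

# Both raters use each diagnosis for about half of the cases.
balanced = [[45, 5], [5, 45]]
# Both raters assign the first diagnosis to 90% of the cases.
skewed = [[85, 5], [5, 5]]

po_balanced, kappa_balanced = agreement_and_kappa(balanced)
po_skewed, kappa_skewed = agreement_and_kappa(skewed)
print(f"balanced: agreement = {po_balanced:.2f}, kappa = {kappa_balanced:.3f}")
print(f"skewed:   agreement = {po_skewed:.2f}, kappa = {kappa_skewed:.3f}")
```

Both tables show 90% raw agreement, but the skewed base rate raises the estimated chance agreement and roughly halves kappa, which is why kappa values from studies with different base rates are hard to compare.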
