Interpreting Kappa in Observational Research: Baserate Matters Cornelia Taylor Bruckner Vanderbilt University
Acknowledgements • Paul Yoder • Craig Kennedy • Niels Waller • Andrew Tomarken • MRDD training grant • KC Quant core
Overview • Agreement is a proxy for accuracy • Agreement statistics 101 • Chance agreement • Agreement matrix • Baserate • Kappa and baserate, a paradox • Estimating accuracy from kappa • Applied example
Framing as observational coding • I will be framing the talk today within observational measurement, but the concepts apply to many other situations, e.g.: • Agreement between clinicians on diagnosis • Agreement between reporters on child symptoms (e.g., mothers and fathers)
“Rater accuracy”: A fictitious session • Madeline Scientist writes a script for an interval-coded observation session that specifies the presence or absence of the target behavior in each interval. • Two coders (Eager Beaver and Slack Jack), blind to the script, are asked to code the session. • The accuracy of each coder against the script is calculated.
Who has the best accuracy? • Eager Beaver, of course. • Slack Jack was not very accurate. • Notice that accuracy is about agreement with both the occurrence and nonoccurrence of behavior.
We don’t always know the truth • It is great when we know the true occurrence and nonoccurrence of behaviors • But, in the real world we deal with agreement between fallible observers
Agreement between raters • Point-by-point interobserver agreement is achieved when independent observers: • see the same thing (behavior, event) • at the same time
Difference between agreement and accuracy • Agreement can be directly measured. • Accuracy cannot be directly measured. • We don’t know the “truth” of a session. • However, agreement is used as a proxy for accuracy. • Accuracy can be estimated from agreement. • The method for this estimation is the focus of today’s talk.
Percent agreement • Percent agreement is the proportion of intervals that were agreed upon • agreements / (agreements + disagreements) • Takes into account both occurrence and nonoccurrence agreement • Varies from 0% to 100%
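A minimal sketch of this computation in Python, assuming each coder's record is a list of 0/1 values, one per interval (the data below are invented for illustration):

```python
def percent_agreement(coder_a, coder_b):
    """Agreements / (agreements + disagreements) for interval-coded data."""
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)

# 1 = target behavior recorded in the interval, 0 = not recorded
a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
b = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(percent_agreement(a, b))  # 0.8, i.e. 80% agreement
```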
Occurrence and Nonoccurrence agreement • Occurrence agreement • The proportion of intervals in which either coder recorded the behavior that were agreed upon • Positive agreement • Non-occurrence agreement • The proportion of intervals in which either coder recorded a nonoccurrence that were agreed upon • Negative agreement
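One way to compute these, assuming the reading above (agreements on occurrence out of intervals where either coder recorded an occurrence, and the mirror image for nonoccurrence), using the same illustrative data as the earlier sketch:

```python
def occurrence_agreement(coder_a, coder_b):
    """Positive agreement: joint occurrences / intervals where either coder recorded one."""
    both = sum(a == 1 and b == 1 for a, b in zip(coder_a, coder_b))
    either = sum(a == 1 or b == 1 for a, b in zip(coder_a, coder_b))
    return both / either

def nonoccurrence_agreement(coder_a, coder_b):
    """Negative agreement: joint nonoccurrences / intervals where either coder recorded one."""
    both = sum(a == 0 and b == 0 for a, b in zip(coder_a, coder_b))
    either = sum(a == 0 or b == 0 for a, b in zip(coder_a, coder_b))
    return both / either

a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
b = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(occurrence_agreement(a, b))     # 2 / 4 = 0.50
print(nonoccurrence_agreement(a, b))  # 6 / 8 = 0.75
```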
Problem with agreement statistics • We assume that agreement is due to accuracy • Agreement statistics do not control for chance agreement • So agreement could be due only to chance
Chance agreement and point by point agreement • [Figure: occurrence agreement and nonoccurrence agreement]
Using a 2x2 table to check agreement on individual codes • When IOA is computed on the total code set, it is an omnibus measure of agreement. • This does not tell us about agreement on any one code. • To know agreement on a particular code, the confusion matrix needs to be collapsed into a 2x2 matrix.
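A minimal sketch of this collapsing step, assuming interval-by-interval code pairs; the codes and counts below are invented for illustration:

```python
from collections import Counter

def collapse_to_2x2(pairs, target):
    """Collapse (coder_a_code, coder_b_code) pairs into a 2x2 table for one target code."""
    table = Counter()
    for a, b in pairs:
        row = target if a == target else "other"
        col = target if b == target else "other"
        table[(row, col)] += 1
    return table

pairs = [("Happy", "Happy"), ("Sad", "Sad"), ("Happy", "Angry"),
         ("Angry", "Angry"), ("Sad", "Happy")]
print(collapse_to_2x2(pairs, "Happy"))
# 2x2 cells: Happy/Happy = 1, Happy/other = 1, other/Happy = 1, other/other = 2
```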
Baserate in a 2x2 table

                                        Slack Jack
                            Happy    All other emotions    Total
Eager Beaver   Happy          60            10               70
               All other       7           123              130
               Total          67           133              200

Estimated base rate = (67 + 70) / (2 * 200) = .34
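A quick check of that calculation, assuming the convention that the best estimate of the base rate averages the two coders' occurrence proportions:

```python
eb_happy = 70   # Eager Beaver coded Happy in 70 of 200 intervals
sj_happy = 67   # Slack Jack coded Happy in 67 of 200 intervals
n = 200

base_rate = (eb_happy + sj_happy) / (2 * n)
print(round(base_rate, 2))  # 0.34
```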
Review • Defined accuracy • Described the relationship between chance agreement and IOA • Created a 2x2 table • Calculated a best estimate of the base rate
Kappa • Kappa is an agreement statistic that controls for chance agreement • Before kappa there was a sense that we should control for chance but we did not know how • Cohen’s 1960 paper has been cited over 7000 times
Definition of Kappa • Kappa is the proportion of non-chance agreement observed out of all the non-chance agreement: K = (Po - Pe) / (1 - Pe)
Definition of Terms • Po = the proportion of events for which there is observed agreement • Same metric as percent agreement • Pe = the proportion of events for which agreement would be expected by chance alone • Defined as the probability of two raters coding the same behavior at the same time by chance
Agreement matrix for EB and SJ, with chance agreement in parentheses [matrix not reproduced here] • Po = .36 + .18 = .54 • Pe = .33 + .15 = .48 • K = (.54 - .48) / (1 - .48) = .12
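A minimal sketch of the same arithmetic from raw counts, with Pe computed from the row and column marginals; the cohen_kappa name is mine, and the example reuses the Happy / all-other-emotions counts from the earlier 2x2 table:

```python
def cohen_kappa(a, b, c, d):
    """Kappa from a 2x2 table: a, d = agreement cells; b, c = disagreement cells."""
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement from marginals
    return (po - pe) / (1 - pe)

# Happy example: both coded Happy = 60, EB only = 10, SJ only = 7, neither = 123
print(round(cohen_kappa(60, 10, 7, 123), 2))  # ~0.81
```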
What determines the value of kappa • Accuracy and base rate • Increasing accuracy increases observed agreement; therefore, kappa is a consistent estimator of accuracy if the base rate is held constant • If accuracy is held constant, kappa will decrease as the estimated true base rate deviates from .5
Obtained kappa, across baserate, for 80% accuracy [figure]
Obtained kappa, across baserate, for 80% and 99% accuracy [figure]
Obtained kappa, across baserate, from 80% to 99% accuracy (curves for 80%, 85%, 90%, 95%, and 99% accuracy) [figure]
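The shape of these curves can be reproduced with a short sketch. The model below is my own simplifying assumption (two independent coders who each record the true state of an interval correctly with probability equal to their accuracy), not necessarily the one behind the figures above:

```python
def expected_kappa(base_rate, acc):
    """Expected kappa for two independent coders of a given accuracy."""
    p, a = base_rate, acc
    q = p * a + (1 - p) * (1 - a)   # probability a coder records an occurrence
    po = a**2 + (1 - a)**2          # observed agreement: both correct or both wrong
    pe = q**2 + (1 - q)**2          # chance agreement from the marginals
    return (po - pe) / (1 - pe)

for acc in (0.80, 0.90, 0.99):
    print(acc, [round(expected_kappa(p, acc), 2) for p in (0.05, 0.10, 0.25, 0.50)])
# Kappa drops as the base rate moves away from .5, even though accuracy is fixed.
```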
Bottom line • When we observe behaviors that are high or low baserate, our kappas will be low. • This is important for researchers studying low baserate behaviors. • Many of the behaviors we observe in young children with developmental disabilities are very low baserate.
Criterion values for IOA • Cohen never suggested using criterion values for kappa • Many professional organizations recommend criteria for IOA • e.g., The Council for Exceptional Children: Division for Research Recommendations 2005 • “Data are collected on the reliability or inter-observer agreement (IOA) associated with each dependent variable, and IOA levels meet minimal standards (e.g., IOA = 80%; Kappa = .60)”
Criterion accuracy? • Setting a criterion for kappa independent of baserate is not useful • If we can estimate accuracy (and I am suggesting that we can) • We need to consider what sufficient accuracy would be
Criterion accuracy cont. • If we consider 80% agreement sufficient, then would we consider 80% accuracy sufficient? • If we used 80% accuracy as a criterion, acceptable kappa could be as low as .19 depending on baserate
Why it is really important not to use criterion kappas • There is a belief that the quality of data will be higher if kappa is higher. • This is only true if there is no associated loss of content or construct validity. • The processes of collapsing and redefining codes often result in a loss of validity.
Applied example • See handout for formulas and data
Use the table on the first page of your handout to determine the accuracy of raters from baserate and kappa
Worked example values from the handout: .32 and .85
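The handout table itself is not reproduced here. As a rough stand-in, the sketch below inverts the kappa-accuracy relationship numerically under the same simplified two-independent-coder model as the earlier sketch; the function names and the base rate of .15 are illustrative assumptions, not values from the handout:

```python
def expected_kappa(base_rate, acc):
    """Expected kappa for two independent coders of a given accuracy (same model as above)."""
    q = base_rate * acc + (1 - base_rate) * (1 - acc)
    po = acc**2 + (1 - acc)**2
    pe = q**2 + (1 - q)**2
    return (po - pe) / (1 - pe)

def accuracy_from_kappa(observed_kappa, base_rate):
    """Bisect on accuracy; expected_kappa increases with accuracy above .5."""
    lo, hi = 0.5, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_kappa(base_rate, mid) < observed_kappa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# e.g. an obtained kappa of .32 at an estimated base rate of .15
print(round(accuracy_from_kappa(0.32, 0.15), 2))  # ~0.85 under this model
```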
Recommendations • Calculate agreement for each code using a 2x2 table • Use the table to determine the accuracy of observers from baserate and obtained kappa • Report kappa and accuracy
Software to calculate kappa • Comkappa, developed by Bakeman, calculates kappa, the SE of kappa, kappa max, and weighted kappa. • MOOSES, developed by Jon Tapp, calculates kappa on the total code set and on individual codes; it can be used with live coding, video coding, and transcription. • SPSS
Challenge • The challenge is to change the standards of observational research that demand kappas above a criterion of .6 • Editors • PI’s • Collaborators