RELIABILITY OF DISEASE CLASSIFICATION Nigel Paneth
TERMINOLOGY • Reliability is analogous to precision. • Validity is analogous to accuracy. • Reliability is how well an observer classifies the same individual under different circumstances. • Validity is how well a given test reflects another test of known greater accuracy.
RELIABILITY AND VALIDITY Reliability includes: • assessments by the same observer at different times (INTRA-OBSERVER RELIABILITY) • assessments by different observers at the same time (INTER-OBSERVER RELIABILITY) Reliability assumes that all tests or observers are equal; validity assumes that there is a gold standard to which a test or observer should be compared.
ASSESSING RELIABILITY How do we assess reliability? One way is to look simply at percent agreement. Percent agreement is the proportion of all diagnoses classified the same way by two observers.
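Percent agreement can be sketched in a few lines of Python. The function name `percent_agreement` and the four sample readings are illustrative assumptions, not data from the lecture.

```python
def percent_agreement(ratings_a, ratings_b):
    """Proportion of subjects that two observers classified the same way."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical readings of four X-rays by two physicians.
md1 = ["no", "no", "yes", "no"]
md2 = ["no", "no", "yes", "yes"]
print(percent_agreement(md1, md2))  # 3 of 4 readings match -> 0.75
```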
EXAMPLE OF PERCENT AGREEMENT Two physicians are each given a set of 100 X-rays to look at independently and asked to judge whether pneumonia is present or absent. When both sets of diagnoses are tallied, it is found that 95% of the diagnoses are the same.
IS PERCENT AGREEMENT GOOD ENOUGH? Do these two physicians exhibit high diagnostic reliability? Can there be 95% agreement between two observers without really having good reliability?
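Compare the two tables below (rows are MD #2's readings, columns are MD #1's readings):

Table 1
              MD #1 Yes   MD #1 No   Total
MD #2 Yes         1           3         4
MD #2 No          2          94        96
Total             3          97       100

Table 2
              MD #1 Yes   MD #1 No   Total
MD #2 Yes        43           3        46
MD #2 No          2          52        54
Total            45          55       100

In both instances, the physicians agree 95% of the time. Are the two physicians equally reliable in the two tables?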
What is the essential difference between the two tables? • The problem arises from the ease of agreement on common events (e.g. not having pneumonia in the first table). • So a measure of agreement should take into account the “ease” of agreement due to chance alone.
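A rough sketch of why agreement on common outcomes is "easy": if two observers rated independently, the agreement expected by chance depends only on how often each says "yes". The function name `chance_agreement` is an illustrative assumption. The proportions are the marginal totals used later in the deck: 3% and 4% "yes" for the first table, 45% and 46% for the second.

```python
def chance_agreement(p_yes_1, p_yes_2):
    """Agreement expected if two observers rate independently.

    p_yes_1, p_yes_2: proportion of "yes" calls made by each observer.
    """
    both_yes = p_yes_1 * p_yes_2
    both_no = (1 - p_yes_1) * (1 - p_yes_2)
    return both_yes + both_no


# Rare finding: the observers say "yes" 3% and 4% of the time.
rare = chance_agreement(0.03, 0.04)
# Common finding: the observers say "yes" 45% and 46% of the time.
common = chance_agreement(0.45, 0.46)
print(f"rare: {rare:.4f}, common: {common:.4f}")
```

With a rare finding, chance alone yields about 93% agreement; with a finding seen about half the time, chance yields only about 50%.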
USE OF THE KAPPA STATISTIC TO ASSESS RELIABILITY Kappa is a widely used measure of inter- or intra-observer agreement (or reliability) which corrects for chance agreement.
KAPPA VARIES FROM +1 TO -1 • +1 means that the two observers are perfectly reliable: they classify everyone exactly the same way. • 0 means there is no relationship at all between the two observers’ classifications beyond the agreement that would be expected by chance. • -1 means the two observers classify exactly the opposite of each other: if one observer says yes, the other always says no.
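The three landmark values can be checked numerically. This is a minimal sketch that uses the usual definition of kappa (observed agreement minus expected agreement, divided by one minus expected agreement). The function name, the argument names and the three 100-subject tables are made up for illustration.

```python
def kappa_2x2(both_yes, only_first_yes, only_second_yes, both_no):
    """Kappa for two observers making yes/no calls, from the four cell counts.

    Undefined (division by zero) if chance agreement is 100%, i.e. both
    observers put every subject in the same single category.
    """
    n = both_yes + only_first_yes + only_second_yes + both_no
    observed = (both_yes + both_no) / n
    first_yes = (both_yes + only_first_yes) / n
    second_yes = (both_yes + only_second_yes) / n
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    return (observed - expected) / (1 - expected)


print(kappa_2x2(50, 0, 0, 50))    # identical calls -> 1.0
print(kappa_2x2(25, 25, 25, 25))  # agreement no better than chance -> 0.0
print(kappa_2x2(0, 50, 50, 0))    # always opposite calls -> -1.0
```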
GUIDE TO USE OF KAPPAS IN EPIDEMIOLOGY AND MEDICINE Kappa > .80 is considered excellent Kappa .60 - .80 is considered good Kappa .40 - .60 is considered fair Kappa < .40 is considered poor
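The guide can be written as a small lookup. The guide does not say which band the cut-points themselves (0.40, 0.60, 0.80) belong to, so this sketch assumes that 0.80 counts as "good" (the guide reserves "excellent" for values above .80), 0.60 counts as "good" and 0.40 counts as "fair". The function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the verbal labels in the guide.

    Assumption: each cut-point is assigned as described in the lead-in,
    because the guide itself leaves the boundaries ambiguous.
    """
    if kappa > 0.80:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "poor"


for value in (0.90, 0.70, 0.50, 0.26):
    print(value, interpret_kappa(value))
```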
1st WAY TO CALCULATE KAPPA 1. Calculate observed agreement (the number of observations in the cells where the observers agree, divided by the total number of observations). In both Table 1 and Table 2 it is 95%. 2. Calculate expected agreement (chance agreement) based on the marginal totals.
How do we calculate the N expected by chance in each cell? • We assume that each cell should reflect the marginal distributions, i.e. the proportion of yes and no answers should be the same within the four-fold table as in the marginal totals.
To do this, we find the proportion of answers in either the column marginal totals (3% yes and 97% no for MD #1) or the row marginal totals (4% yes and 96% no for MD #2), and apply one of the two proportions to the other marginal total. For example, 96% of the row totals are in the “No” category. Therefore, by chance, 96% of MD #1’s 97 “No” answers should also fall in MD #2’s “No” row. 96% of 97 is 93.12.
By subtraction, all other cells fill in automatically, and each yes/no distribution reflects the marginal distribution. Any cell could have been used to make the calculation, because once one cell is specified in a 2x2 table with fixed marginal distributions, all other cells are also specified.
Now you can see that, by the operation of chance alone, the two observers would be expected to agree on 93.24 of the 100 observations (93.12 + 0.12).
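The chance-expected cells can be reproduced from the marginal totals alone. In this sketch each expected cell is its row total times its column total, divided by N, which is the same as applying one marginal proportion to the other marginal total. The function name `expected_counts` is an illustrative assumption; the marginal totals are those quoted for Table 1.

```python
def expected_counts(row_totals, col_totals):
    """Cell counts expected by chance in a table with fixed marginal totals."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Marginal totals for Table 1, in the order (yes, no):
# MD #2 (rows) said yes 4 times and no 96 times;
# MD #1 (columns) said yes 3 times and no 97 times.
table = expected_counts([4, 96], [3, 97])
expected_agreement = table[0][0] + table[1][1]  # yes/yes cell + no/no cell
print(table)
print(round(expected_agreement, 2))  # 0.12 + 93.12 -> 93.24
```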
Let's now compare the actual agreement with the expected agreement. • Expected agreement falls 6.76 percentage points short of perfect agreement (100 – 93.24). • Actual agreement falls 5.0 percentage points short of perfect agreement (100 – 95). • So our two observers were 1.76 percentage points better than chance, but if they had agreed perfectly they would have been 6.76 percentage points better than chance. So they achieved only about ¼ of the possible improvement over chance (1.76/6.76).
Below is the formula for calculating kappa from expected agreement:
$$\kappa = \frac{\text{Observed agreement} - \text{Expected agreement}}{1 - \text{Expected agreement}}$$
For Table 1:
$$\kappa = \frac{95\% - 93.24\%}{100\% - 93.24\%} = \frac{1.76\%}{6.76\%} = 0.26$$
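The formula translates directly into code. This is a minimal sketch; the function name `kappa_from_agreement` is an assumption, and both agreements are passed as proportions rather than percentages.

```python
def kappa_from_agreement(observed, expected):
    """Kappa from observed and chance-expected agreement (both as proportions)."""
    if expected >= 1:
        raise ValueError("kappa is undefined when expected agreement is 100%")
    return (observed - expected) / (1 - expected)


# Table 1: 95% observed agreement, 93.24% expected by chance.
kappa = kappa_from_agreement(0.95, 0.9324)
print(round(kappa, 2))  # 0.26
```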
How good is a Kappa of 0.26? Kappa > .80 is considered excellent Kappa .60 - .80 is considered good Kappa .40 - .60 is considered fair Kappa < .40 is considered poor By this guide, a Kappa of 0.26 is poor.
In the second example, the observed agreement was also 95%, but the marginal totals were very different.
Using the same procedure as before, we calculate the expected N in any one cell, based on the marginal totals. For example, the lower right cell is 54% of 55, which is 29.7.
And, by subtraction, the other cells follow. The two cells that indicate agreement (29.7 and 20.7) add up to 50.4%.
Enter the two agreements into the formula:
$$\kappa = \frac{\text{Observed agreement} - \text{Expected agreement}}{1 - \text{Expected agreement}} = \frac{95\% - 50.4\%}{100\% - 50.4\%} = \frac{44.6\%}{49.6\%} = 0.90$$
In this example, the observers have the same % agreement as before, but now their agreement is far better than chance. A Kappa of 0.90 is considered excellent.
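The whole first method can be run end to end on both tables. This sketch assumes the cell counts used in the deck's worked calculations (agreement cells 1 and 94 for Table 1, 43 and 52 for Table 2), with categories ordered yes then no. The function name `kappa_from_table` is hypothetical.

```python
def kappa_from_table(table):
    """Percent agreement and kappa for a square table of counts.

    table[i][j] = number of subjects placed in category i by one observer
    and in category j by the other.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over categories of (row total * column total) / N^2.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return observed, (observed - expected) / (1 - expected)


table_1 = [[1, 3], [2, 94]]
table_2 = [[43, 3], [2, 52]]
for name, table in (("Table 1", table_1), ("Table 2", table_2)):
    agreement, kappa = kappa_from_table(table)
    print(f"{name}: agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Both tables show 0.95 agreement, while kappa comes out near 0.26 for the first and near 0.90 for the second.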
A 2nd WAY TO CALCULATE THE KAPPA STATISTIC
$$\kappa = \frac{2(AD - BC)}{N_1 N_4 + N_2 N_3}$$
where the Ns are the marginal totals, labeled thus:
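Look again at the tables on slide 7.
For Table 1:
$$\kappa = \frac{2(94 \times 1 - 2 \times 3)}{4 \times 97 + 3 \times 96} = \frac{176}{676} = 0.26$$
For Table 2:
$$\kappa = \frac{2(52 \times 43 - 3 \times 2)}{46 \times 55 + 45 \times 54} = \frac{4460}{4960} = 0.90$$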
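As a check, the cross-product formula can be compared with the observed/expected route. In this sketch the layout of A, B, C, D is an assumption (A and D are the agreement cells, B and C the disagreement cells), and the denominator is written out as row total times column total, which reproduces the products 4 x 97 + 3 x 96 and 46 x 55 + 45 x 54 shown above. Both function names are hypothetical.

```python
def kappa_cross_product(a, b, c, d):
    """Kappa for a 2x2 table via 2(AD - BC) / (N1*N4 + N2*N3).

    Assumed layout (a and d on the agreement diagonal):
        a  b
        c  d
    """
    numerator = 2 * (a * d - b * c)
    denominator = (a + b) * (b + d) + (a + c) * (c + d)
    return numerator / denominator


def kappa_first_way(a, b, c, d):
    """Same statistic via observed and expected agreement, for comparison."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


for cells in ((94, 2, 3, 1), (52, 3, 2, 43)):
    print(round(kappa_cross_product(*cells), 4), round(kappa_first_way(*cells), 4))
```

The two routes give the same value for each table, which is why either can be used.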
Note parallels between: • THE ODDS RATIO • THE CHI-SQUARE STATISTIC • THE KAPPA STATISTIC Note that the cross-products of the four-fold table, and their relation to marginal totals, are central to all three expressions.
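For reference, the three statistics can be written side by side for a four-fold table. The cell labels are an assumed convention (A and D on the main diagonal, B and C off it, N the grand total), and the chi-square shown is the uncorrected Pearson form.

```latex
\begin{align*}
\text{Odds ratio} &= \frac{AD}{BC} \\
\chi^2 &= \frac{N\,(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)} \\
\kappa &= \frac{2\,(AD - BC)}{(A+B)(B+D) + (A+C)(C+D)}
\end{align*}
```

Each expression is built from the cross-products AD and BC: the odds ratio takes their ratio, while chi-square and kappa take their difference and scale it by products of the marginal totals.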