
Observational Methods Part Two


Presentation Transcript


  1. Observational Methods Part Two January 20, 2010

  2. Today’s Class • Survey Results • Probing Question for today • Observational Methods • Probing Question for next class • Assignment 1

  3. Survey Results • Much broader response this time – thanks! • Good to go with modern technology • Generally positive comments • Some contradictions in best-part, worst-part • The best sign one is doing well is when everyone wants contradictory changes?  • I will give another survey in a few classes

  4. Today’s Class • Survey Results • Probing Question for today • Observational Methods • Probing Question for next class • Assignment 1

  5. Probing Question • For today, you have read • D'Mello, S., Taylor, R.S., Graesser, A. (2007) Monitoring Affective Trajectories during Complex Learning. Proceedings of the 29th Annual Meeting of the Cognitive Science Society, 203-208 • Which used data from a lab study • If you wanted to study affective transitions in real classrooms, which of the methods we discussed today would be best? Why?

  6. What’s the best way? • First, let’s list out the methods • For now, don’t critique, just describe your preferred method • One per person, please • If someone else has already presented your method, no need to repeat it • If you propose something similar, quickly list the difference (no need to say why right now)

  7. For each method • What are the advantages? • What are the disadvantages?

  8. Votes for each method

  9. Today’s Class • Survey Results • Probing Question for today • Observational Methods • Probing Question for next class • Assignment 1

  10. Topics • Measures of agreement • Study of prevalence • Correlation to other constructs • Dynamics models • Development of EDM models (ref to later)

  11. Agreement/ Accuracy • The easiest measure of inter-rater reliability is agreement, also called accuracy • Agreement = (# of agreements) / (total number of codes)
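
A minimal Python sketch of that computation (the function name and example labels are illustrative, not from the slides):

```python
# Percent agreement: (# of agreements) / (total number of codes)
def percent_agreement(coder1, coder2):
    """Proportion of observations on which two coders gave the same code."""
    agreements = sum(1 for a, b in zip(coder1, coder2) if a == b)
    return agreements / len(coder1)

# Hypothetical parallel lists of codes, one entry per observed time sequence
coder1 = ["on-task", "on-task", "off-task", "on-task"]
coder2 = ["on-task", "off-task", "off-task", "on-task"]
print(percent_agreement(coder1, coder2))  # 0.75
```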

  12. Agreement/ Accuracy • There is general agreement across fields that agreement/accuracy is not a good metric • What are some drawbacks of agreement/accuracy?

  13. Agreement/ Accuracy • Let’s say that Tasha and Uniqua agreed on the classification of 9,200 out of 10,000 time sequences • For a coding scheme with two codes • 92% accuracy • Good, right?

  14. Non-even assignment to categories • Percent Agreement does poorly when there is non-even assignment to categories • Which is almost always the case • Imagine an extreme case • Uniqua (correctly) picks category A 92% of the time • Tasha always picks category A • Agreement/accuracy of 92% • But essentially no information
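
To make the slide’s point concrete, here is a small illustration (hypothetical labels) of the extreme case: Tasha’s codes carry no information, yet raw agreement is still 92%.

```python
# Extreme case from the slide: Uniqua codes A 92% of the time, Tasha always codes A
uniqua = ["A"] * 92 + ["B"] * 8
tasha = ["A"] * 100
agreements = sum(1 for a, b in zip(uniqua, tasha) if a == b)
print(agreements / len(uniqua))  # 0.92, despite Tasha providing no information
```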

  15. An alternate metric • Kappa = (Agreement – Expected Agreement) / (1 – Expected Agreement)

  16. Kappa • Expected agreement is computed from a contingency table of the two coders’ codes (one coder’s codes as rows, the other’s as columns, with counts in the cells)

  17. Kappa • Expected agreement is computed from that contingency table • Note that Kappa can be calculated for any number of categories (but only 2 raters)

  18. Cohen’s (1960) Kappa • The formula here is for 2 categories (it extends directly to more categories, but still only 2 raters) • Fleiss’s (1971) Kappa, which is more complex, can be used with 3 or more raters • I have an Excel spreadsheet which calculates multi-category Kappa, which I would be happy to share with you

  19. Expected agreement • Look at the proportion of labels each coder gave to each category • To find the agreement on category A that could be expected by chance, multiply pct(coder1/categoryA)*pct(coder2/categoryA) • Do the same thing for categoryB • Add these two values together (they are already proportions, so no further division is needed) • This is your expected agreement
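
Putting the Kappa formula and the expected-agreement recipe together, here is a minimal Python sketch for two coders; the function name and data layout are illustrative choices, not from the slides.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's (1960) Kappa for two coders, any number of categories."""
    n = len(coder1)
    # Observed agreement: proportion of observations where the two codes match
    agreement = sum(1 for a, b in zip(coder1, coder2) if a == b) / n
    # Expected agreement: for each category, multiply the proportion of labels
    # each coder gave to that category, then sum across categories
    counts1, counts2 = Counter(coder1), Counter(coder2)
    expected = sum((counts1[c] / n) * (counts2[c] / n)
                   for c in set(counts1) | set(counts2))
    # Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)
    return (agreement - expected) / (1 - expected)
```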

  20. Example

  21. Example • What is the percent agreement?

  22. Example • What is the percent agreement? • 80%

  23. Example • What is Tyrone’s expected frequency for on-task?

  24. Example • What is Tyrone’s expected frequency for on-task? • 75%

  25. Example • What is Pablo’s expected frequency for on-task?

  26. Example • What is Pablo’s expected frequency for on-task? • 65%

  27. Example • What is the expected on-task agreement?

  28. Example • What is the expected on-task agreement? • 0.65*0.75= 0.4875

  29. Example • What is the expected on-task agreement? • 0.65*0.75= 0.4875

  30. Example • What are Tyrone and Pablo’s expected frequencies for off-task behavior?

  31. Example • What are Tyrone and Pablo’s expected frequencies for off-task behavior? • 25% and 35%

  32. Example • What is the expected off-task agreement?

  33. Example • What is the expected off-task agreement? • 0.25*0.35= 0.0875

  34. Example • What is the expected off-task agreement? • 0.25*0.35= 0.0875

  35. Example • What is the total expected agreement?

  36. Example • What is the total expected agreement? • 0.4875+0.0875 = 0.575

  37. Example • What is kappa?

  38. Example • What is kappa? • (0.8 – 0.575) / (1-0.575) • 0.225/0.425 • 0.529
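
As a check on the arithmetic, the sketch below reuses the cohens_kappa function from the earlier code block. The slides give only the marginals and the agreement, so the raw counts here are hypothetical but consistent with them (out of 20 observations: 12 agreed on-task, 4 agreed off-task, 3 coded on-task by Tyrone only, 1 by Pablo only).

```python
# Hypothetical 20 observations consistent with the slide's numbers:
# Tyrone 75% on-task, Pablo 65% on-task, 80% raw agreement
tyrone = ["on"] * 12 + ["on"] * 3 + ["off"] * 1 + ["off"] * 4
pablo = ["on"] * 12 + ["off"] * 3 + ["on"] * 1 + ["off"] * 4
print(round(cohens_kappa(tyrone, pablo), 3))  # 0.529, matching the slide
```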

  39. So is that any good? • What is kappa? • (0.8 – 0.575) / (1-0.575) • 0.225/0.425 • 0.529

  40. Interpreting Kappa • Kappa = 0 • Agreement is at chance • Kappa = 1 • Agreement is perfect • Kappa = negative infinity • Agreement is perfectly inverse • Kappa > 1 • You messed up somewhere
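
Using the cohens_kappa sketch from earlier (hypothetical labels again), the two well-defined endpoints can be checked directly: identical codes give Kappa = 1, and the Tasha-always-picks-A case from slide 14 gives Kappa = 0 even though raw agreement is 92%.

```python
identical = ["A", "B", "A", "A", "B"]
print(cohens_kappa(identical, identical))  # 1.0: perfect agreement

uniqua = ["A"] * 92 + ["B"] * 8
tasha = ["A"] * 100                        # Tasha always picks A
print(cohens_kappa(uniqua, tasha))         # 0.0: agreement is at chance
```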

  41. Kappa<0 • It does happen, but usually not in the case of inter-rater reliability • Occasionally seen when Kappa is used for EDM or other types of machine learning • More on this in 2 months!

  42. 0<Kappa<1 • What’s a good Kappa? • There is no absolute standard • For inter-rater reliability, • 0.8 is usually what ed. psych. reviewers want to see • You can usually make a case that values of Kappa around 0.6 are good enough to be usable for some applications • Particularly if there’s a lot of data • Or if you’re collecting observations to drive EDM • Remember that Baker, Corbett, & Wagner (2006) had Kappa = 0.58

  43. Landis & Koch’s (1977) scale • < 0: Poor • 0.00–0.20: Slight • 0.21–0.40: Fair • 0.41–0.60: Moderate • 0.61–0.80: Substantial • 0.81–1.00: Almost perfect

  44. Why is there no standard? • Because Kappa is scaled by the proportion of each category • When one class is much more prevalent, expected agreement is higher than if classes are evenly balanced
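
A quick illustration (hypothetical proportions) of why: the same 90% observed agreement yields very different Kappa values depending on how skewed the categories are, because expected agreement changes with prevalence. The helper below assumes two categories.

```python
# Same observed agreement, different base rates -> different expected agreement and Kappa
def kappa_from_proportions(agreement, coder1_pct_a, coder2_pct_a):
    expected = coder1_pct_a * coder2_pct_a + (1 - coder1_pct_a) * (1 - coder2_pct_a)
    return (agreement - expected) / (1 - expected)

print(kappa_from_proportions(0.90, 0.50, 0.50))  # balanced: expected 0.50, Kappa = 0.80
print(kappa_from_proportions(0.90, 0.90, 0.90))  # skewed: expected 0.82, Kappa ~ 0.44
```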

  45. Because of this… • Comparing Kappa values between two studies, in a principled fashion, is highly difficult • A lot of work went into statistical methods for comparing Kappa values in the 1990s • No real consensus • Informally, you can compare two studies if the proportions of each category are “similar”

  46. There is a way to statistically compare two inter-rater reliabilities… • “Junior high school” meta-analysis

  47. There is a way to statistically compare two inter-rater reliabilities… • “Junior high school” meta-analysis • Do a 1 df Chi-squared test on each reliability, convert the Chi-squared values to Z, and then compare the two Z values using the method in Rosenthal & Rosnow (1991)

  48. There is a way to statistically compare two inter-rater reliabilities… • “Junior high school” meta-analysis • Do a 1 df Chi-squared test on each reliability, convert the Chi-squared values to Z, and then compare the two Z values using the method in Rosenthal & Rosnow (1991) • Or in other words, nyardleynyardleynyoo
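
A sketch of how that recipe could look in code, under my reading of the slide: scipy.stats.chi2_contingency gives a 1-df chi-squared for each 2x2 agreement table, Z = sqrt(chi-squared) for 1 df, and the two independent Z values are compared with Z = (Z1 - Z2)/sqrt(2) as described in Rosenthal & Rosnow (1991). The agreement tables are hypothetical, and this is an illustration of the recipe rather than a validated implementation; check the cited source before relying on it.

```python
from math import sqrt
from scipy.stats import chi2_contingency, norm

def reliability_z(agreement_table):
    """1-df chi-squared on a 2x2 agreement table, converted to Z (Z = sqrt(chi2) for 1 df)."""
    chi2, p, dof, expected = chi2_contingency(agreement_table, correction=False)
    return sqrt(chi2)

# Hypothetical 2x2 agreement tables (rows: coder 1's codes, columns: coder 2's codes)
study1 = [[12, 3], [1, 4]]
study2 = [[40, 10], [8, 42]]

z1, z2 = reliability_z(study1), reliability_z(study2)
z_diff = (z1 - z2) / sqrt(2)             # comparing two independent Zs (Rosenthal & Rosnow, 1991)
p_two_tailed = 2 * norm.sf(abs(z_diff))
print(z1, z2, z_diff, p_two_tailed)
```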

  49. Comments? Questions?

  50. Topics • Measures of agreement • Study of prevalence • Correlation to other constructs • Dynamics models • Development of EDM models
