
Generalizability Theory

Generalizability Theory. Nothing more practical than a good theory! This presentation is made by Prof. Zhao. Overview of Presentation: Classes of reliability theories, Generalizability Theory, G-study, D-study, Illustrations. Three Reliability Theories: Classical Test Theory


Presentation Transcript


  1. Generalizability Theory Nothing more practical than a good theory!

  2. This presentation is made by Prof. Zhao

  3. Overview of Presentation • Classes of reliability theories • Generalizability Theory • G-study • D-study • Illustrations

  4. Three Reliability Theories • Classical Test Theory • Generalizability Theory • Item Response Theory

  5. Overview of Presentation • Classes of reliability theories • Generalizability Theory • G-study • D-study • Illustrations

  6. Generalizability Theory • Fundamental is the concept of parallel measures (like classical test theory), but the theory allows a multitude of error sources • Generalizability concept: Reliability is dependent on the inferences (generalizations) that the investigator wishes to make with the data from the measurement

  7. Illustration • Essay test • 7 vignette-based essay questions • 2 markers independently marking all questions for all examinees • Reliability in a classical framework: • Cronbach's alpha: 0.66 • Inter-rater reliability (i.e., kappa): 0.71

  8. Fundamental Equation
     X = T + E, where X = observed score, T = true score, E = error score
     $$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}$$
     The larger the variance of T in relation to X, the higher the reliability.

  9. Fundamental Equation
     X = T + E, where X = observed score, T = true score, E = error score
     $$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}$$

  10. Fundamental Equation
      X = T + E, where X = observed score, T = true score, E = error score
      $$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$
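
The variance ratio on this slide can be checked with a small simulation. The sketch below is only an illustration: the variances (4 for true scores, 1 for error), the sample size, and the assumption of normally distributed, independent T and E are all hypothetical choices, not values from the presentation.

```python
import random
import statistics


def theoretical_reliability(var_true, var_error):
    """Reliability = Var(T) / (Var(T) + Var(E))."""
    return var_true / (var_true + var_error)


def simulated_reliability(var_true, var_error, n_examinees=20000, seed=0):
    """Simulate X = T + E and return the sample ratio Var(T) / Var(X)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n_examinees)]
    # Error is drawn independently of the true score (an assumption).
    observed = [t + rng.gauss(0.0, var_error ** 0.5) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


if __name__ == "__main__":
    print("theoretical:", theoretical_reliability(4.0, 1.0))
    print("simulated:  ", simulated_reliability(4.0, 1.0))
```

With a large simulated sample, the two printed values should be close to each other; the simulated one varies slightly with the seed.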

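  11. Multiple sources of error variance
      $$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$
      Var(E) has several sources: markers, essays, and unexplained error.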

  12. Two steps in G analysis • G(eneralizability)-study: Estimation of sources of variance that influence the measurement (e.g., variance between examinees, essays and markers) • D(ecision)-study: Estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers)

  13. G-study steps • Determine facets (factors of variance) • Determine design • Random vs fixed • Crossed vs nested

  14. Crossed vs nested designs [Figure: diagrams of a crossed design and a nested design]

  15. G-study • Determine facets (factors of variance) • Determine design • Random vs Fixed • Crossed vs nested • Collect data • Analysis of Variance (ANOVA) • Estimation of variance components

  16. Illustration 1 • Essay test • 7 vignette-based open-ended questions • 100 students • One marker marked all essays for all students • G-study questions: • N of factors/facets? One-facet design • Random/fixed facets? Random • Nested or crossed? Crossed

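  17. Sources of Variance: Person x Items [Figure: sources of variance p, i, and pi,e]

  18. Sources of Variance: Person x Items [Figure: sources of variance p, i, and pi,e]

  19. Sources of Variance: Person x Items [Figure: sources of variance p, i, and pi,e]

  20. Sources of Variance: Person x Items [Figure: sources of variance p and pi,e]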

  21. Variance component estimation (one facet design)
      An observed score for a person on an item ($X_{pi}$) decomposes as:
      $$\begin{aligned}
      X_{pi} ={}& \mu && \text{[overall mean]} \\
      &+ (\mu_p - \mu) && \text{[person effect]} \\
      &+ (\mu_i - \mu) && \text{[item effect]} \\
      &+ (X_{pi} - \mu_p - \mu_i + \mu) && \text{[residual]}
      \end{aligned}$$
      Each of these effects has a mean (always 0) and a variance ($\sigma^2$). These variances are the variance components. The variance of all observed scores $X_{pi}$ across all persons and items is:
      $$\hat{\sigma}^2(X_{pi}) = \hat{\sigma}^2_p + \hat{\sigma}^2_i + \hat{\sigma}^2_{pi,e}$$
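
The slides list ANOVA followed by estimation of variance components as G-study steps. A minimal sketch of that estimation for a crossed persons x items design follows, assuming one score per cell and the usual random-effects expected mean squares. The function name and the tiny score matrix are hypothetical, not from the presentation.

```python
def variance_components_pxi(scores):
    """Estimate variance components for a crossed persons x items design.

    scores[p][i] is the score of person p on item i (one score per cell).
    Returns estimates for p, i, and the residual pi,e.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    # Sums of squares for persons, items, and the residual.
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_res = sum(
        (scores[p][i] - person_means[p] - item_means[i] + grand) ** 2
        for p in range(n_p)
        for i in range(n_i)
    )

    # Mean squares = sum of squares / degrees of freedom.
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


if __name__ == "__main__":
    # Hypothetical scores: 4 persons (rows) by 3 items (columns).
    example = [[6, 4, 5], [8, 5, 7], [3, 2, 4], [7, 6, 6]]
    print(variance_components_pxi(example))
```

Estimates obtained this way can come out negative in small samples; a common convention is to report such a component as zero.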

  22. Variance components P x I design
      Source   Estimated variance component   Percentage of total variance   Standard error
      p                  97.57                          13.35                     19.02
      i                 261.24                          35.75                    112.98
      pi,e              371.97                          50.90                     17.60

  23. Crossed vs nested designs [Figure: diagrams of a crossed design and a nested design]

  24. Sources of Variance: Items : Persons [Figure: sources of variance p and i,pi,e]

  25. Variance components I : P design
      Source   Estimated variance component   Percentage of total variance
      p                  97.57                          13.35
      i,pi,e            633.21                          86.65
      In the nested design, i and pi,e cannot be separated. The P x I estimates they combine are:
      i                 261.24                          35.75
      pi,e              371.97                          50.90

  26. Variance components I : P design (highlighted values): p = 97.57 (13.35%); i,pi,e = 633.21 (86.65%)

  27. Sources of Variance: Person x Items x Judges [Figure: sources of variance p, i, j, pi, pj, ij, and pij,e]

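  28. Variance component estimation (two facet design)
      An observed score for a person on an item marked by a judge ($X_{pij}$) decomposes as:
      $$\begin{aligned}
      X_{pij} ={}& \mu && \text{[overall mean]} \\
      &+ (\mu_p - \mu) && \text{[person effect]} \\
      &+ (\mu_i - \mu) && \text{[item effect]} \\
      &+ (\mu_j - \mu) && \text{[judge effect]} \\
      &+ (\mu_{pj} - \mu_p - \mu_j + \mu) && \text{[person by judge effect]} \\
      &+ (\mu_{pi} - \mu_p - \mu_i + \mu) && \text{[person by item effect]} \\
      &+ (\mu_{ij} - \mu_i - \mu_j + \mu) && \text{[judge by item effect]} \\
      &+ (X_{pij} - \mu_{pj} - \mu_{pi} - \mu_{ij} + \mu_p + \mu_j + \mu_i - \mu) && \text{[residual]}
      \end{aligned}$$
      The variance of observed scores $X_{pij}$ across all persons, items, and judges is:
      $$\hat{\sigma}^2(X_{pij}) = \hat{\sigma}^2_p + \hat{\sigma}^2_i + \hat{\sigma}^2_j + \hat{\sigma}^2_{pi} + \hat{\sigma}^2_{pj} + \hat{\sigma}^2_{ij} + \hat{\sigma}^2_{pij,e}$$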
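
For the fully crossed persons x items x judges design, the seven components can be estimated from the ANOVA mean squares. The sketch below assumes one score per cell, all facets random, and the standard expected-mean-square equations for this design. The function name and data layout (`x[p][i][j]`) are hypothetical.

```python
from itertools import product


def variance_components_pxixj(x):
    """x[p][i][j]: score of person p on item i from judge j (fully crossed)."""
    n_p, n_i, n_j = len(x), len(x[0]), len(x[0][0])
    P, I, J = range(n_p), range(n_i), range(n_j)

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Grand mean, main-effect means, and two-way cell means.
    m = mean(x[p][i][j] for p, i, j in product(P, I, J))
    m_p = [mean(x[p][i][j] for i, j in product(I, J)) for p in P]
    m_i = [mean(x[p][i][j] for p, j in product(P, J)) for i in I]
    m_j = [mean(x[p][i][j] for p, i in product(P, I)) for j in J]
    m_pi = [[mean(x[p][i][j] for j in J) for i in I] for p in P]
    m_pj = [[mean(x[p][i][j] for i in I) for j in J] for p in P]
    m_ij = [[mean(x[p][i][j] for p in P) for j in J] for i in I]

    # Mean squares = sum of squares / degrees of freedom.
    ms_p = n_i * n_j * sum((m_p[p] - m) ** 2 for p in P) / (n_p - 1)
    ms_i = n_p * n_j * sum((m_i[i] - m) ** 2 for i in I) / (n_i - 1)
    ms_j = n_p * n_i * sum((m_j[j] - m) ** 2 for j in J) / (n_j - 1)
    ms_pi = n_j * sum((m_pi[p][i] - m_p[p] - m_i[i] + m) ** 2
                      for p, i in product(P, I)) / ((n_p - 1) * (n_i - 1))
    ms_pj = n_i * sum((m_pj[p][j] - m_p[p] - m_j[j] + m) ** 2
                      for p, j in product(P, J)) / ((n_p - 1) * (n_j - 1))
    ms_ij = n_p * sum((m_ij[i][j] - m_i[i] - m_j[j] + m) ** 2
                      for i, j in product(I, J)) / ((n_i - 1) * (n_j - 1))
    ms_res = sum((x[p][i][j] - m_pi[p][i] - m_pj[p][j] - m_ij[i][j]
                  + m_p[p] + m_i[i] + m_j[j] - m) ** 2
                 for p, i, j in product(P, I, J)
                 ) / ((n_p - 1) * (n_i - 1) * (n_j - 1))

    # Solve the expected-mean-square equations for each component.
    return {
        "p": (ms_p - ms_pi - ms_pj + ms_res) / (n_i * n_j),
        "i": (ms_i - ms_pi - ms_ij + ms_res) / (n_p * n_j),
        "j": (ms_j - ms_pj - ms_ij + ms_res) / (n_p * n_i),
        "pi": (ms_pi - ms_res) / n_j,
        "pj": (ms_pj - ms_res) / n_i,
        "ij": (ms_ij - ms_res) / n_p,
        "pij,e": ms_res,
    }
```

Each facet needs at least two levels, otherwise the degrees of freedom in the denominators are zero.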

  29. Variance components P x I x J design
      Source   Estimated variance component   Percentage of total variance
      p                  48.71                          10.57
      i                  25.12                           5.45
      j                  15.00                           3.26
      pi                185.87                          40.33
      pj                 33.18                           7.20
      ij                 80.00                          17.36
      pij,e              72.94                          15.83

  30. Overview of Presentation • Classes of reliability theories • Generalizability Theory • G-study • D-study • Illustrations

  31. Two steps in G analysis • G(eneralizability)-study: Estimation of sources of variance that influence the measurement (e.g., variance between examinees, essays and markers) • D(ecision)-study: Estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers)

  32. Interpretation of scores • Norm-oriented perspective: scores have relative meaning; they have meaning in relation to each other • Domain-oriented perspective: scores have absolute meaning with respect to the domain of measurement • Mastery-oriented perspective: scores have meaning in relation to a cut-off score (reliability of decisions, not of scores)

  33. Fundamental Equation
      X = T + E, where X = observed score, T = true score, E = error score
      $$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$

  34. Illustration 1 • Essay test • 7 vignette-based essay questions • 1 marker marked all questions for all examinees • Norm-referenced perspective: calculate the generalizability coefficient!

  35. D-study (ni = 7; norm-referenced)
      Source   Estimated variance component   Percentage of total variance   Standard error
      p                  97.57                          13.35                     19.02
      i                 261.24                          35.75                    112.98
      pi,e              371.97                          50.90                     17.60
      $$G = \frac{T}{T + E} = \frac{97.57}{97.57 + 371.97/7} = 0.65$$

  36. Illustration 2 • Essay test • 7 vignette-based essay questions • 1 marker marked all questions for all examinees • Domain-referenced perspective: calculate the dependability coefficient!

  37. D-study (ni = 7; domain-referenced)
      Source   Estimated variance component   Percentage of total variance   Standard error
      p                  97.57                          13.35                     19.02
      i                 261.24                          35.75                    112.98
      pi,e              371.97                          50.90                     17.60
      $$D = \frac{97.57}{97.57 + 261.24/7 + 371.97/7} = 0.52$$

  38. Illustration 3 • Essay test • 7 vignette-based essay questions • 1 marker marked all questions for all examinees • Domain-referenced perspective: calculate the dependability coefficient for a sample of 10 essays!

  39. D-study (ni = 10; domain-referenced)
      Source   Estimated variance component   Percentage of total variance   Standard error
      p                  97.57                          13.35                     19.02
      i                 261.24                          35.75                    112.98
      pi,e              371.97                          50.90                     17.60
      $$D = \frac{97.57}{97.57 + 261.24/10 + 371.97/10} = 0.61$$

  40. D-studies for several item samples
      N essays   Generalizability coefficient (G)   Dependability coefficient (D)
       1                    0.21                               0.13
       5                    0.57                               0.44
       7                    0.65                               0.52
      10                    0.72                               0.61
      15                    0.80                               0.70
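
The table above can be reproduced from the P x I variance components with the two formulas used in the preceding D-study slides. In this sketch the function and variable names are hypothetical; the numbers are the estimates from the G-study.

```python
# Variance components from the P x I G-study.
VARIANCE = {"p": 97.57, "i": 261.24, "pi,e": 371.97}


def g_coefficient(n_items, vc=VARIANCE):
    """Norm-referenced: only pi,e contributes to the (relative) error."""
    relative_error = vc["pi,e"] / n_items
    return vc["p"] / (vc["p"] + relative_error)


def d_coefficient(n_items, vc=VARIANCE):
    """Domain-referenced: item variance also counts as (absolute) error."""
    absolute_error = (vc["i"] + vc["pi,e"]) / n_items
    return vc["p"] / (vc["p"] + absolute_error)


if __name__ == "__main__":
    for n in (1, 5, 7, 10, 15):
        print(f"{n:>2} essays  G = {g_coefficient(n):.2f}  D = {d_coefficient(n):.2f}")
```

Because the item component enters only the absolute error, D is never larger than G for the same number of essays.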

  41. Illustration 4 • Essay test • 7 vignette-based essay questions • 2 markers independently marked all questions for all examinees • Norm-referenced perspective: calculate the generalizability coefficient!

  42. D-study (ni = 7; nj = 2; norm-referenced)
      Source   Variance component   % of total variance
      p              48.71                10.57
      i              25.12                 5.45
      j              15.00                 3.26
      pi            185.87                40.33
      pj             33.18                 7.20
      ij             80.00                17.36
      pij,e          72.94                15.83
      $$G = \frac{48.71}{48.71 + 185.87/7 + 33.18/2 + 72.94/(2 \times 7)} = 0.50$$

  43. Illustration 5 • Essay test • 7 vignette-based essay questions • 2 markers independently marked all questions for all examinees • Domain-referenced perspective: calculate the dependability coefficient!

  44. D-study (ni = 7; nj = 2; domain-referenced)
      Source   Variance component   % of total variance
      p              48.71                10.57
      i              25.12                 5.45
      j              15.00                 3.26
      pi            185.87                40.33
      pj             33.18                 7.20
      ij             80.00                17.36
      pij,e          72.94                15.83
      $$D = \frac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$$
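
The two-facet coefficients for the crossed design can be computed the same way for any number of essays and markers. This sketch uses the P x I x J variance components from the G-study; the function names are hypothetical.

```python
# Variance components from the P x I x J G-study.
VC = {"p": 48.71, "i": 25.12, "j": 15.00,
      "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94}


def relative_error(n_i, n_j, vc=VC):
    """Error variance for norm-referenced use: interactions involving persons."""
    return vc["pi"] / n_i + vc["pj"] / n_j + vc["pij,e"] / (n_i * n_j)


def absolute_error(n_i, n_j, vc=VC):
    """Error variance for domain-referenced use: adds the i, j, and ij components."""
    return (relative_error(n_i, n_j, vc)
            + vc["i"] / n_i + vc["j"] / n_j + vc["ij"] / (n_i * n_j))


def g_coefficient(n_i, n_j, vc=VC):
    return vc["p"] / (vc["p"] + relative_error(n_i, n_j, vc))


def d_coefficient(n_i, n_j, vc=VC):
    return vc["p"] / (vc["p"] + absolute_error(n_i, n_j, vc))


if __name__ == "__main__":
    print(f"G(7 essays, 2 markers) = {g_coefficient(7, 2):.2f}")
    print(f"D(7 essays, 2 markers) = {d_coefficient(7, 2):.2f}")
```

Varying `n_i` and `n_j` shows how adding essays or markers trades off against each other when the same markers mark all essays.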

  45. Illustration 6 • Essay test • 7 vignette-based essay questions • 2 different markers independently marked each question for all examinees • Norm-referenced perspective: calculate the generalizability coefficient!

  46. D-study (ni = 7; nj = 2; norm-referenced), (Judges : Items) x Persons
      Source      Estimated variance component   Percentage of total variance
      p                     48.71                          10.57
      i                     25.12                           5.45
      j,ij                  95.00                          20.62
      pi                   185.87                          40.33
      pj,pij,e             106.12                          23.03
      $$G = \frac{48.71}{48.71 + 185.87/7 + 106.12/(2 \times 7)} = 0.59$$

  47. D-study summary table (norm-referenced score interpretation)
                          Same marker for all essays    Different marker for each essay
      Number of essays    One marker    Two markers     One marker    Two markers
       5                     0.36          0.44            0.39          0.46
       7                     0.41          0.50            0.47          0.54
      10                     0.45          0.56            0.56          0.63
      15                     0.49          0.61            0.65          0.72

  48. Another reliability index • Reliability coefficient (G & D coefficients): scale independent (0-1); non-intuitive interpretation • Standard Error of Measurement (SEM): intuitive interpretation; scale dependent

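  49. Standard Error of Measurement
      X = T + E, where X = observed score, T = true score, E = error score
      $$\text{Reliability index} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$
      $$\text{Standard Error of Measurement (SEM)} = \sigma_E = \sqrt{\operatorname{Var}(E)}$$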

  50. Interpretation of SEM
      Suppose an examinee has a score of 60% and the SEM is 5:
      [Figure: three score scales running from 45 to 75, each showing a confidence interval around 60]
      • 65% CI
      • 1.96 x 5 ≈ 10: 95% CI
      • 2.14 x 5 ≈ 11: 95% CI
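
The SEM and a confidence interval around an observed score can be computed as follows. This sketch assumes normally distributed measurement error, so the multiplier is a standard normal quantile (about 1.96 for 95%); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sem(error_variance):
    """Standard error of measurement: the square root of the error variance."""
    return sqrt(error_variance)


def confidence_interval(score, sem_value, level=0.95):
    """Two-sided interval: score +/- z * SEM, with z from the normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return score - z * sem_value, score + z * sem_value


if __name__ == "__main__":
    low, high = confidence_interval(60, 5, level=0.95)
    print(f"95% CI for a score of 60 with SEM 5: {low:.1f} to {high:.1f}")
    # SEM for the one-facet essay test with 7 essays (relative error 371.97 / 7).
    print(f"SEM with 7 essays: {sem(371.97 / 7):.2f}")
```

Unlike the G and D coefficients, the SEM is expressed on the score scale, which is why it can be read directly as a band around an examinee's score.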
