This presentation is made by Prof. Zhao.
Generalizability Theory Nothing more practical than a good theory!
Overview of Presentation • Classes of reliability theories • Generalizability Theory • G-study • D-study • Illustrations
Three Reliability Theories • Classical Test Theory • Generalizability Theory • Item Response Theory
Generalizability Theory • Like classical test theory, it builds on the concept of parallel measures, but it allows for multiple sources of error at the same time • Generalizability concept: reliability depends on the inferences (generalizations) that the investigator wishes to make from the measurement data
Illustration • Essay test • 7 vignette-based essay questions • 2 markers independently marked all questions for all examinees • Reliability in a classical framework: • Cronbach’s alpha: 0.66 • Inter-rater reliability (e.g., kappa): 0.71
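For reference, Cronbach's alpha in the classical framework is computed from the item variances and the variance of the examinees' total scores. The sketch below assumes a complete examinee-by-question score matrix; the function name `cronbach_alpha` and the toy data are hypothetical and are not the essay data behind the 0.66 above.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a complete examinee-by-item score matrix.

    scores: list of rows, one per examinee, each holding k item scores.
    """
    k = len(scores[0])

    def sample_var(xs):
        # Sample variance with an (n - 1) denominator.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [sample_var([row[j] for row in scores]) for j in range(k)]
    total_var = sample_var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): identical items give alpha = 1,
# items that are uncorrelated across examinees give alpha = 0.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
print(cronbach_alpha([[1, 1], [1, 3], [3, 1], [3, 3]]))
```

Alpha treats all error as a single undifferentiated term, which is exactly the limitation generalizability theory addresses.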
Fundamental Equation • $X = T + E$, where X = observed score, T = true score, E = error score • Reliability = $\sigma^2_T / \sigma^2_X$: the larger the variance of T in relation to the variance of X, the higher the reliability • Because T and E are assumed to be uncorrelated, $\sigma^2_X = \sigma^2_T + \sigma^2_E$, so
$$\text{Reliability} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$$
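Multiple sources of error variance • $\text{Reliability} = \sigma^2_T / (\sigma^2_T + \sigma^2_E)$, where the error variance $\sigma^2_E$ is split into variance due to markers, variance due to essays, and unexplained variance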
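A quick simulation can make the fundamental equation concrete. The sketch below assumes normally distributed true scores and three independent error sources (standing in for markers, essays and unexplained variance); all variances are made-up illustration values. It checks that Var(T)/Var(X) and the squared correlation between X and T both approach Var(T)/(Var(T) + Var(E)).

```python
import random


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate_reliability(var_t=4.0, error_vars=(1.0, 0.5, 0.5), n=100_000, seed=1):
    """Simulate X = T + E, where E is the sum of independent error sources."""
    rng = random.Random(seed)
    t_scores, x_scores = [], []
    for _ in range(n):
        t = rng.gauss(0.0, var_t ** 0.5)
        # Each error source is drawn independently of T and of the others.
        e = sum(rng.gauss(0.0, v ** 0.5) for v in error_vars)
        t_scores.append(t)
        x_scores.append(t + e)
    ratio = variance(t_scores) / variance(x_scores)
    r_xt = covariance(x_scores, t_scores) / (variance(x_scores) * variance(t_scores)) ** 0.5
    return ratio, r_xt ** 2


expected = 4.0 / (4.0 + 1.0 + 0.5 + 0.5)  # Var(T) / (Var(T) + Var(E)) = 2/3
print(expected, simulate_reliability())
```

Both simulated quantities land close to 2/3, because with uncorrelated T and E the squared correlation between observed and true scores equals the variance ratio.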
Two steps in G analysis • G(eneralizability)-study: Estimation of sources of variance that influence the measurement (e.g., variance between examinees, essays and markers) • D(ecision)-study: Estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers)
G-study steps • Determine facets (factors of variance) • Determine design • Random vs fixed • Crossed vs nested
Crossed vs nested designs [Figure: two panels, "Crossed design" and "Nested design", with labels A to L and 1 to 6]
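One way to picture the two designs is as lists of (person, condition) observations. In the sketch below the numbers 1 to 6 stand for persons and the letters for conditions such as markers, mirroring the labels left over from the figure; the function names and the two-markers-per-person nesting are illustrative assumptions.

```python
def crossed_design(persons, conditions):
    """Crossed: every condition (e.g. marker) is paired with every person."""
    return [(p, c) for p in persons for c in conditions]


def nested_design(persons, conditions_per_person):
    """Nested: each person has their own set of conditions, shared with no one else."""
    return [(p, c) for p, own in zip(persons, conditions_per_person) for c in own]


persons = [1, 2, 3, 4, 5, 6]
crossed = crossed_design(persons, ["A", "B"])
# Hypothetical nesting: two of the letters A to L per person (A, B for 1; C, D for 2; ...).
letters = "ABCDEFGHIJKL"
nested = nested_design(persons, [letters[2 * k:2 * k + 2] for k in range(len(persons))])

print(len(crossed), sorted({c for _, c in crossed}))
print(len(nested), sorted({c for _, c in nested}))
```

Both layouts have 12 observations, but the crossed one reuses 2 markers while the nested one uses 12 distinct markers, so in the nested layout marker effects cannot be separated from the person-by-marker interaction.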
G-study • Determine facets (factors of variance) • Determine design • Random vs Fixed • Crossed vs nested • Collect data • Analysis of Variance (ANOVA) • Estimation of variance components
Illustration 1: one-facet, random, crossed design • Essay test • 7 vignette-based open-ended questions • 100 students • One marker marked all essays for all students • G-study questions: • Number of factors/facets? • Random or fixed facets? • Nested or crossed?
Sources of Variance, Persons x Items [Venn diagram: person (p), item (i), and person-by-item interaction confounded with residual error (pi,e)]
Variance component estimation (one-facet design) • An observed score for a person on an item ($X_{pi}$) decomposes into the overall mean, a person effect, an item effect and a residual:
$$X_{pi} = \mu + (\mu_p - \mu) + (\mu_i - \mu) + (X_{pi} - \mu_p - \mu_i + \mu)$$
• Each of these effects has a mean (always 0) and a variance ($\sigma^2$); these variances are the variance components. • The variance of all observed scores $X_{pi}$ across all persons and items is then
$$\hat\sigma^2(X_{pi}) = \hat\sigma^2_p + \hat\sigma^2_i + \hat\sigma^2_{pi,e}$$
Variance components, P x I design
Source | Estimated variance component | % of total variance | Standard error
p | 97.57 | 13.35 | 19.02
i | 261.24 | 35.75 | 112.98
pi,e | 371.97 | 50.90 | 17.60
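The G-study steps (ANOVA, then variance component estimation) can be sketched for the one-facet P x I design by solving the expected mean squares of a random-effects ANOVA: $E(MS_p) = \sigma^2_{pi,e} + n_i\sigma^2_p$, $E(MS_i) = \sigma^2_{pi,e} + n_p\sigma^2_i$ and $E(MS_{pi,e}) = \sigma^2_{pi,e}$. This is a minimal sketch assuming a complete, balanced persons-by-items matrix; the function names and the toy matrix are hypothetical, and the essay data behind the table are not available here, so only the percentage column is recomputed from the published components.

```python
def p_by_i_components(scores):
    """Variance components for a random p x i design (one score per cell)."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    # Sums of squares for persons, items and the residual (pi,e).
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected mean squares; negative estimates are often set to 0 in practice.
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


def percent_of_total(components):
    total = sum(components.values())
    return {k: 100 * v / total for k, v in components.items()}


# Toy matrix (hypothetical): 3 persons x 2 items, purely additive, so pi,e = 0.
print(p_by_i_components([[1, 3], [2, 4], [6, 8]]))
# Percentages for the essay-test components in the P x I table.
print(percent_of_total({"p": 97.57, "i": 261.24, "pi,e": 371.97}))
```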
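Sources of Variance, Items : Persons (items nested within persons) [Venn diagram: person (p), and item effect confounded with person-by-item interaction and residual error (i,pi,e)]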
Variance components, I : P design
Source | Estimated variance component | % of total variance
p | 97.57 | 13.35
i,pi,e | 633.21 | 86.65
In the nested design, the item component (261.24; 35.75%) and the person-by-item/residual component (371.97; 50.90%) of the crossed design cannot be separated and are pooled into i,pi,e.
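For the nested I : P design the ANOVA has only a between-persons and a within-persons term, with $E(MS_p) = \sigma^2_{i,pi,e} + n_i\sigma^2_p$ and $E(MS_{within}) = \sigma^2_{i,pi,e}$. The sketch below assumes each person answered their own $n_i$ items (one score each, balanced design); the function name and toy data are hypothetical. Because item and person-by-item effects are confounded within persons, only their pooled component can be estimated, which is why the I : P table has a single i,pi,e row equal to the crossed-design i and pi,e components combined.

```python
def i_within_p_components(scores):
    """Variance components for a random I:P design.

    scores: one row per person; each row holds that person's own n_i item scores.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]

    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    # Within-person variation pools item, person-by-item and residual effects.
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, p_means) for x in row)

    ms_p = ss_p / (n_p - 1)
    ms_within = ss_within / (n_p * (n_i - 1))
    return {"p": (ms_p - ms_within) / n_i, "i,pi,e": ms_within}


# Toy data (hypothetical): 3 persons, 2 items each.
print(i_within_p_components([[1, 3], [2, 4], [6, 8]]))
```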
Sources of Variance, Persons x Items x Judges [Venn diagram: p, i, j, pi, pj, ij, and pij,e]
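Variance component estimation (two-facet design) • An observed score for a person on an item from a judge ($X_{pij}$) decomposes into the overall mean, a person effect, an item effect, a judge effect, person-by-item, person-by-judge and item-by-judge effects, and a residual:
$$X_{pij} = \mu + (\mu_p - \mu) + (\mu_i - \mu) + (\mu_j - \mu) + (\mu_{pi} - \mu_p - \mu_i + \mu) + (\mu_{pj} - \mu_p - \mu_j + \mu) + (\mu_{ij} - \mu_i - \mu_j + \mu) + (X_{pij} - \mu_{pi} - \mu_{pj} - \mu_{ij} + \mu_p + \mu_i + \mu_j - \mu)$$
• The variance of the observed scores $X_{pij}$ across all persons, items and judges is
$$\hat\sigma^2(X_{pij}) = \hat\sigma^2_p + \hat\sigma^2_i + \hat\sigma^2_j + \hat\sigma^2_{pi} + \hat\sigma^2_{pj} + \hat\sigma^2_{ij} + \hat\sigma^2_{pij,e}$$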
Variance components, P x I x J design
Source | Estimated variance component | % of total variance
p | 48.71 | 10.57
i | 25.12 | 5.45
j | 15.00 | 3.26
pi | 185.87 | 40.33
pj | 33.18 | 7.20
ij | 80.00 | 17.36
pij,e | 72.94 | 15.83
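The same ANOVA approach extends to the two-facet P x I x J design: compute the seven mean squares, then solve the expected mean squares of the random-effects model, for example $\hat\sigma^2_p = (MS_p - MS_{pi} - MS_{pj} + MS_{pij,e})/(n_i n_j)$. This is a minimal sketch assuming a complete, balanced array with one score per person-item-judge cell; it uses NumPy, and the function name and toy data are hypothetical.

```python
import numpy as np


def p_by_i_by_j_components(x):
    """Variance components for a random p x i x j design.

    x: array of shape (n_p, n_i, n_j), one score per person-item-judge cell.
    """
    x = np.asarray(x, dtype=float)
    n_p, n_i, n_j = x.shape
    m = x.mean()
    mp, mi, mj = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    mpi, mpj, mij = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for main effects, two-way interactions and the residual.
    ss = {
        "p": n_i * n_j * ((mp - m) ** 2).sum(),
        "i": n_p * n_j * ((mi - m) ** 2).sum(),
        "j": n_p * n_i * ((mj - m) ** 2).sum(),
        "pi": n_j * ((mpi - mp[:, None] - mi[None, :] + m) ** 2).sum(),
        "pj": n_i * ((mpj - mp[:, None] - mj[None, :] + m) ** 2).sum(),
        "ij": n_p * ((mij - mi[:, None] - mj[None, :] + m) ** 2).sum(),
        "pij,e": ((x - mpi[:, :, None] - mpj[:, None, :] - mij[None, :, :]
                   + mp[:, None, None] + mi[None, :, None] + mj[None, None, :] - m) ** 2).sum(),
    }
    df = {
        "p": n_p - 1, "i": n_i - 1, "j": n_j - 1,
        "pi": (n_p - 1) * (n_i - 1), "pj": (n_p - 1) * (n_j - 1),
        "ij": (n_i - 1) * (n_j - 1), "pij,e": (n_p - 1) * (n_i - 1) * (n_j - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected mean squares (negative estimates are often set to 0).
    return {
        "p": (ms["p"] - ms["pi"] - ms["pj"] + ms["pij,e"]) / (n_i * n_j),
        "i": (ms["i"] - ms["pi"] - ms["ij"] + ms["pij,e"]) / (n_p * n_j),
        "j": (ms["j"] - ms["pj"] - ms["ij"] + ms["pij,e"]) / (n_p * n_i),
        "pi": (ms["pi"] - ms["pij,e"]) / n_j,
        "pj": (ms["pj"] - ms["pij,e"]) / n_i,
        "ij": (ms["ij"] - ms["pij,e"]) / n_p,
        "pij,e": ms["pij,e"],
    }


# Toy data (hypothetical): purely additive scores, so all interaction components are 0.
persons = np.array([0.0, 2.0, 4.0])
items = np.array([0.0, 1.0])
judges = np.array([0.0, 3.0])
scores = persons[:, None, None] + items[None, :, None] + judges[None, None, :]
print(p_by_i_by_j_components(scores))
```

For additive data each main-effect component equals the sample variance of that facet's effects (4.0, 0.5 and 4.5 here), which is a handy sanity check.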
Interpretation of scores • Norm-oriented perspective: scores have relative meaning, i.e., meaning in relation to each other • Domain-oriented perspective: scores have absolute meaning with respect to the domain of measurement • Mastery-oriented perspective: scores have meaning in relation to a cut-off score (reliability of decisions, not of scores)
Illustration 1 • Essay test • 7 vignette-based essay questions • 1 marker marked all questions for all examinees • Norm-referenced perspective: calculate the generalizability coefficient
D-study (n_i = 7; norm-referenced), using the P x I variance components (p = 97.57, i = 261.24, pi,e = 371.97):
$$G = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} = \frac{\hat\sigma^2_p}{\hat\sigma^2_p + \hat\sigma^2_{pi,e}/n_i} = \frac{97.57}{97.57 + 371.97/7} = 0.65$$
Illustration 2 • Essay test • 7 vignette-based essay questions • 1 marker marked all questions for all examinees • Domain-referenced perspective: calculate the dependability coefficient
D-study (n_i = 7; domain-referenced), using the P x I variance components:
$$D = \frac{\hat\sigma^2_p}{\hat\sigma^2_p + \hat\sigma^2_i/n_i + \hat\sigma^2_{pi,e}/n_i} = \frac{97.57}{97.57 + 261.24/7 + 371.97/7} = 0.52$$
Illustration 3 • Essay test • 7 vignette-based essay questions • 1 marker marked all questions for all examinees • Domain-referenced perspective: calculate the dependability coefficient for a sample of 10 essays
D-study (n_i = 10; domain-referenced), using the P x I variance components:
$$D = \frac{97.57}{97.57 + 261.24/10 + 371.97/10} = 0.61$$
D-studies for several item samples
N essays | Generalizability coefficient (G) | Dependability coefficient (D)
1 | 0.21 | 0.13
5 | 0.57 | 0.44
7 | 0.65 | 0.52
10 | 0.72 | 0.61
15 | 0.80 | 0.70
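The D-study calculations in Illustrations 1 to 3 follow one pattern: the relative error variance is $\hat\sigma^2_{pi,e}/n_i$ and the absolute error variance is $(\hat\sigma^2_i + \hat\sigma^2_{pi,e})/n_i$. The sketch below (the function name is hypothetical) recomputes both coefficients from the P x I variance components; rounded to two decimals it reproduces the table for 1, 5, 7, 10 and 15 essays.

```python
def d_study_p_by_i(var_p, var_i, var_pi_e, n_i):
    """G and D coefficients for a random p x i design with n_i items."""
    rel_error = var_pi_e / n_i            # error for norm-referenced use
    abs_error = (var_i + var_pi_e) / n_i  # error for domain-referenced use
    return var_p / (var_p + rel_error), var_p / (var_p + abs_error)


for n in (1, 5, 7, 10, 15):
    g, d = d_study_p_by_i(97.57, 261.24, 371.97, n)
    print(f"{n:>2} essays: G = {g:.2f}, D = {d:.2f}")
```

The item component only enters the absolute error, which is why D stays below G for every sample size.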
Illustration 4 • Essay test • 7 vignette-based essay questions • 2 markers independently marked all questions for all examinees • Norm-referenced perspective: calculate the generalizability coefficient
D-study (n_i = 7; n_j = 2; norm-referenced), using the P x I x J variance components:
$$G = \frac{\hat\sigma^2_p}{\hat\sigma^2_p + \hat\sigma^2_{pi}/n_i + \hat\sigma^2_{pj}/n_j + \hat\sigma^2_{pij,e}/(n_i n_j)} = \frac{48.71}{48.71 + 185.87/7 + 33.18/2 + 72.94/(2 \times 7)} = 0.50$$
Illustration 5 • Essay test • 7 vignette-based essay questions • 2 markers independently marked all questions for all examinees • Domain-referenced perspective: calculate the dependability coefficient
D-study (n_i = 7; n_j = 2; domain-referenced), using the P x I x J variance components:
$$D = \frac{\hat\sigma^2_p}{\hat\sigma^2_p + \hat\sigma^2_i/n_i + \hat\sigma^2_j/n_j + \hat\sigma^2_{pi}/n_i + \hat\sigma^2_{pj}/n_j + \hat\sigma^2_{ij}/(n_i n_j) + \hat\sigma^2_{pij,e}/(n_i n_j)} = \frac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$$
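Illustrations 4 and 5 generalize the one-facet calculation to two facets. In the sketch below (function and key names are hypothetical), the relative error collects the interactions of persons with items and judges, and the absolute error adds the item, judge and item-by-judge components. With the P x I x J components it gives G of about 0.50 and D of about 0.43 for 7 essays and 2 markers; varying n_i and n_j gives the generalizability coefficients for the design with the same marker(s) for all essays.

```python
def d_study_p_by_i_by_j(vc, n_i, n_j):
    """G and D coefficients for a random p x i x j design."""
    rel_error = vc["pi"] / n_i + vc["pj"] / n_j + vc["pij,e"] / (n_i * n_j)
    abs_error = rel_error + vc["i"] / n_i + vc["j"] / n_j + vc["ij"] / (n_i * n_j)
    return vc["p"] / (vc["p"] + rel_error), vc["p"] / (vc["p"] + abs_error)


components = {"p": 48.71, "i": 25.12, "j": 15.00, "pi": 185.87,
              "pj": 33.18, "ij": 80.00, "pij,e": 72.94}
g, d = d_study_p_by_i_by_j(components, n_i=7, n_j=2)
print(f"G = {g:.2f}, D = {d:.2f}")

# Norm-referenced G for 5, 7, 10 and 15 essays with one or two markers.
for n_j in (1, 2):
    row = [d_study_p_by_i_by_j(components, n_i, n_j)[0] for n_i in (5, 7, 10, 15)]
    print(n_j, "marker(s):", " ".join(f"{v:.2f}" for v in row))
```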
Illustration 6 • Essay test • 7 vignette-based essay questions • Each question was marked independently by 2 markers for all examinees, with different markers for each question • Norm-referenced perspective: calculate the generalizability coefficient
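D-study (n_i = 7; n_j = 2; norm-referenced), (Judges : Items) x Persons design
Source | Estimated variance component | % of total variance
p | 48.71 | 10.57
i | 25.12 | 5.45
j,ij | 95.00 | 20.62
pi | 185.87 | 40.33
pj,pij,e | 106.12 | 23.03
$$G = \frac{\hat\sigma^2_p}{\hat\sigma^2_p + \hat\sigma^2_{pi}/n_i + \hat\sigma^2_{pj,pij,e}/(n_i n_j)} = \frac{48.71}{48.71 + 185.87/7 + 106.12/(2 \times 7)} = 0.59$$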
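D-study summary table: generalizability coefficients (norm-referenced score interpretation)
Design | 5 essays | 7 essays | 10 essays | 15 essays
Same marker for all essays, one marker | 0.36 | 0.41 | 0.45 | 0.49
Same marker for all essays, two markers | 0.44 | 0.50 | 0.56 | 0.61
Different marker for each essay, one marker | 0.39 | 0.47 | 0.56 | 0.65
Different marker for each essay, two markers | 0.46 | 0.54 | 0.63 | 0.72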
Another reliability index • Reliability coefficient (G & D coefficients) • Scale independent (0-1) • Non-intuitive interpretation • Standard Error of Measurement (SEM) • Intuitive interpretation • Scale dependent
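Standard Error of Measurement • $X = T + E$, where X = observed score, T = true score, E = error score • Reliability index = $\sigma^2_T / (\sigma^2_T + \sigma^2_E)$ • Standard Error of Measurement: $\text{SEM} = \sigma_E = \sqrt{\sigma^2_E}$, the standard deviation of the error scores, expressed in the units of the score scale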
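Interpretation of SEM [Figure: score scale from 45 to 75 with intervals around an observed score of 60] Suppose an examinee has a score of 60% and the SEM is 5: • 68% CI: 60 ± 1 × 5, i.e., 55% to 65% • 95% CI: 60 ± 1.96 × 5 ≈ 60 ± 10, i.e., 50% to 70% • 95% CI: 60 ± 2.14 × 5 ≈ 60 ± 11, i.e., 49% to 71%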
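The interval calculation can be sketched as score ± z × SEM, with z taken from the standard normal distribution. In G theory the same idea applies to the D-study error variances: the square root of the relative error variance ($\hat\sigma^2_{pi,e}/n_i$) serves as the SEM for norm-referenced interpretation, and the square root of the absolute error variance ($(\hat\sigma^2_i + \hat\sigma^2_{pi,e})/n_i$) as the SEM for domain-referenced interpretation. The function name is hypothetical, and the SEMs computed from the essay-test components are in the units of the original essay scores, which the presentation does not state.

```python
import math
from statistics import NormalDist


def confidence_interval(score, sem, level=0.95):
    """Normal-theory interval: score +/- z * SEM."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return score - z * sem, score + z * sem


# Example from the slide: observed score 60%, SEM = 5.
print(confidence_interval(60, 5, level=0.95))  # about (50.2, 69.8)

# SEMs for Illustration 1 (p x i design, 7 essays, one marker).
var_i, var_pi_e, n_i = 261.24, 371.97, 7
sem_relative = math.sqrt(var_pi_e / n_i)            # norm-referenced
sem_absolute = math.sqrt((var_i + var_pi_e) / n_i)  # domain-referenced
print(round(sem_relative, 2), round(sem_absolute, 2))
```

Unlike the G and D coefficients, these SEMs are scale dependent, which is what makes them directly interpretable as a band around an observed score.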