Generalizability Theory

Nothing more practical than a good theory!

Presentation by Prof. Zhao

Overview of Presentation
• Classes of reliability theories
• Generalizability Theory
• G-study
• D-study
• Illustrations

Three Reliability Theories
• Classical Test Theory
• Generalizability Theory
• Item Response Theory

Generalizability Theory
• Like classical test theory, it rests on the concept of parallel measures, but it allows for multiple sources of error
• Generalizability concept: reliability depends on the inferences (generalizations) that the investigator wishes to make from the measurement data

Illustration
• Essay test
• 7 vignette-based essay questions
• 2 markers independently marking all questions for all examinees
• Reliability in a classical framework:
  • Cronbach’s alpha: 0.66
  • Inter-rater reliability (kappa): 0.71

Fundamental Equation

$$X = T + E$$

where X = observed score, T = true score, E = error score.

$$\text{Reliability} = \frac{\mathrm{Var}(T)}{\mathrm{Var}(X)} = \frac{\mathrm{Var}(T)}{\mathrm{Var}(T) + \mathrm{Var}(E)}$$

The larger the variance of T in relation to X, the higher the reliability.
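A minimal simulation sketch of the fundamental equation, under assumptions that are not in the slides: true scores and errors are drawn independently from normal distributions with hypothetical standard deviations of 4 and 2, so the theoretical reliability is 16 / (16 + 4) = 0.8.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 20000
# Hypothetical values: Var(T) = 4**2 = 16, Var(E) = 2**2 = 4
true_scores = [random.gauss(60, 4) for _ in range(n)]
errors = [random.gauss(0, 2) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Reliability = Var(T) / Var(X); should land close to 0.8
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(f"estimated reliability: {reliability:.3f}")
```

In real data T is never observed directly, which is why the variance components have to be estimated from the design instead.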

Multiple sources of error variance

$$\text{Reliability} = \frac{\mathrm{Var}(T)}{\mathrm{Var}(T) + \mathrm{Var}(E)}$$

Var(E) is split into several sources: markers, essays, and unexplained variance.

Two steps in G analysis
• G(eneralizability)-study: estimation of the sources of variance that influence the measurement (e.g., variance between examinees, essays and markers)
• D(ecision)-study: estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers)

G-study steps
• Determine facets (factors of variance)
• Determine design
  • Random vs fixed
  • Crossed vs nested

Crossed vs nested designs
[Figure: diagrams of a crossed design and a nested design]

G-study
• Determine facets (factors of variance)
• Determine design
  • Random vs fixed
  • Crossed vs nested
• Collect data
• Analysis of Variance (ANOVA)
• Estimation of variance components

Illustration 1
• Essay test
• 7 vignette-based open-ended questions
• 100 students
• One marker marked all essays for all students
• G-study questions:
  • Number of factors/facets?
  • Random or fixed facets?
  • Nested or crossed?
Answer: one-facet design, random, crossed.

Sources of Variance: Person x Items
[Venn diagram: person (p), item (i), and person-by-item interaction confounded with error (pi,e)]

Variance component estimation (one facet design)

An observed score for a person on an item ($X_{pi}$):

$$
\begin{aligned}
X_{pi} ={}& \mu && \text{[Overall mean]} \\
&+ (\mu_p - \mu) && \text{[Person effect]} \\
&+ (\mu_i - \mu) && \text{[Item effect]} \\
&+ (X_{pi} - \mu_p - \mu_i + \mu) && \text{[Residual]}
\end{aligned}
$$

Each of these effects has an average (always 0) and a variance ($\sigma^2$). The variances are the variance components. The variance of all observed scores $X_{pi}$ across all persons and items:

$$\hat\sigma^2(X_{pi}) = \hat\sigma^2_p + \hat\sigma^2_i + \hat\sigma^2_{pi,e}$$
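The estimation step can be sketched as follows, assuming the usual ANOVA (expected mean square) estimators for a random, crossed persons x items design with one score per cell. The function name and the 3 x 2 score matrix are hypothetical and serve only to make the arithmetic easy to follow.

```python
def variance_components_pxi(scores):
    """Estimate p, i and pi,e components from scores[person][item]."""
    n_p = len(scores)
    n_i = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    # Sums of squares for persons, items and the residual
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    # Mean squares
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


# Hypothetical toy data: 3 persons (rows) x 2 items (columns)
toy_scores = [[4, 6], [5, 9], [9, 9]]
components = variance_components_pxi(toy_scores)
print(components)  # p = 3.0, i ≈ 1.33, pi,e = 2.0 for this toy matrix
```

With small samples these estimators can return negative values; how to handle that is a separate decision the slides do not address.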

Variance components, P x I design

Source   Estimated Variance Component   Percentage of Total Variance   Standard Error
p                    97.57                        13.35                     19.02
i                   261.24                        35.75                    112.98
pi,e                371.97                        50.90                     17.60
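As a quick check, the percentage column follows from dividing each estimated component by their sum. The sketch below uses only the three estimates from the slide; the variable names are illustrative.

```python
# Estimated variance components from the P x I table
components = {"p": 97.57, "i": 261.24, "pi,e": 371.97}

total = sum(components.values())  # total observed-score variance
percent = {name: round(100 * value / total, 2) for name, value in components.items()}

for name, share in percent.items():
    print(f"{name:5s} {share:6.2f}%")  # 13.35, 35.75, 50.90
```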


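Sources of Variance: Items : Persons (items nested within persons)
[Venn diagram: person (p), and item confounded with the person-by-item interaction and error (i,pi,e)]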

Variance components, I : P design

Source    Estimated Variance Component   Percentage of Total Variance
p                    97.57                        13.35
i,pi,e              633.21                        86.65

In the nested design the item effect cannot be separated from the residual, so the i (261.24; 35.75%) and pi,e (371.97; 50.90%) components of the crossed design pool into i,pi,e.
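A small sketch of the pooling, assuming the nested component is simply the sum of the two crossed components it confounds. The dictionary names are illustrative; the numbers are the crossed-design estimates from the slides.

```python
# Crossed P x I estimates
crossed = {"p": 97.57, "i": 261.24, "pi,e": 371.97}

# In I : P the item effect is confounded with the residual
nested = {
    "p": crossed["p"],
    "i,pi,e": crossed["i"] + crossed["pi,e"],
}

total = sum(nested.values())
nested_percent = {k: round(100 * v / total, 2) for k, v in nested.items()}
print(nested)          # i,pi,e ≈ 633.21
print(nested_percent)  # p ≈ 13.35, i,pi,e ≈ 86.65
```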

Sources of Variance: Person x Items x Judges
[Venn diagram: p, i, j, pi, pj, ij, and pij,e]

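Variance component estimation (two facet design)

An observed score for a person on an item as marked by a judge ($X_{pij}$):

$$
\begin{aligned}
X_{pij} ={}& \mu && \text{[Overall mean]} \\
&+ (\mu_p - \mu) && \text{[Person effect]} \\
&+ (\mu_i - \mu) && \text{[Item effect]} \\
&+ (\mu_j - \mu) && \text{[Judge effect]} \\
&+ (\mu_{pi} - \mu_p - \mu_i + \mu) && \text{[Person by item effect]} \\
&+ (\mu_{pj} - \mu_p - \mu_j + \mu) && \text{[Person by judge effect]} \\
&+ (\mu_{ij} - \mu_i - \mu_j + \mu) && \text{[Judge by item effect]} \\
&+ (X_{pij} - \mu_{pi} - \mu_{pj} - \mu_{ij} + \mu_p + \mu_i + \mu_j - \mu) && \text{[Residual]}
\end{aligned}
$$

The variance of observed scores $X_{pij}$ across all persons, items and judges:

$$\hat\sigma^2(X_{pij}) = \hat\sigma^2_p + \hat\sigma^2_i + \hat\sigma^2_j + \hat\sigma^2_{pi} + \hat\sigma^2_{pj} + \hat\sigma^2_{ij} + \hat\sigma^2_{pij,e}$$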

Variance components, P x I x J design

Source   Estimated Variance Component   Percentage of Total Variance
p                    48.71                        10.57
i                    25.12                         5.45
j                    15.00                         3.26
pi                  185.87                        40.33
pj                   33.18                         7.20
ij                   80.00                        17.36
pij,e                72.94                        15.83
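The slides do not show how these seven estimates are obtained from the ANOVA. One common route, sketched here as an assumption, is to solve the expected-mean-square equations of a fully random, fully crossed design. Because no mean squares are given, the sketch builds them from the table's components and then recovers the components again. The sample size n_p = 100 is assumed (it is stated only for the one-facet illustration), and both function names are hypothetical.

```python
def expected_mean_squares(vc, n_p, n_i, n_j):
    """Expected mean squares for a random, crossed p x i x j design."""
    e = vc["pij,e"]
    return {
        "pij,e": e,
        "pi": e + n_j * vc["pi"],
        "pj": e + n_i * vc["pj"],
        "ij": e + n_p * vc["ij"],
        "p": e + n_j * vc["pi"] + n_i * vc["pj"] + n_i * n_j * vc["p"],
        "i": e + n_j * vc["pi"] + n_p * vc["ij"] + n_p * n_j * vc["i"],
        "j": e + n_i * vc["pj"] + n_p * vc["ij"] + n_p * n_i * vc["j"],
    }


def variance_components(ms, n_p, n_i, n_j):
    """Invert the equations above: mean squares -> variance components."""
    e = ms["pij,e"]
    return {
        "pij,e": e,
        "pi": (ms["pi"] - e) / n_j,
        "pj": (ms["pj"] - e) / n_i,
        "ij": (ms["ij"] - e) / n_p,
        "p": (ms["p"] - ms["pi"] - ms["pj"] + e) / (n_i * n_j),
        "i": (ms["i"] - ms["pi"] - ms["ij"] + e) / (n_p * n_j),
        "j": (ms["j"] - ms["pj"] - ms["ij"] + e) / (n_p * n_i),
    }


# Components from the P x I x J table; n_p = 100 is an assumption
table = {"p": 48.71, "i": 25.12, "j": 15.00, "pi": 185.87,
         "pj": 33.18, "ij": 80.00, "pij,e": 72.94}
mean_squares = expected_mean_squares(table, n_p=100, n_i=7, n_j=2)
recovered = variance_components(mean_squares, n_p=100, n_i=7, n_j=2)
print({k: round(v, 2) for k, v in recovered.items()})
```

The round trip only demonstrates that the two sets of equations are mutually consistent; with real data the mean squares would come from the three-way ANOVA.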


Interpretation of scores
• Norm-oriented perspective: scores have relative meaning; they have meaning in relation to each other
• Domain-oriented perspective: scores have absolute meaning with respect to the domain of measurement
• Mastery-oriented perspective: scores have meaning in relation to a cut-off score (reliability of decisions, not of scores)


Illustration 1
• Essay test
• 7 vignette-based essay questions
• 1 marker marked all questions for all examinees
• Norm-referenced perspective
Calculate the generalizability coefficient!

D-study (ni = 7; norm-referenced)

Using the P x I variance components (p = 97.57, i = 261.24, pi,e = 371.97), with T the true-score variance and E the error variance:

$$G = \frac{T}{T + E} = \frac{97.57}{97.57 + \frac{371.97}{7}} = 0.65$$
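The calculation above can be written as a short function. This is a sketch for the one-facet crossed design only; the function and argument names are illustrative.

```python
def g_coefficient(var_p, var_pi_e, n_i):
    """Generalizability coefficient for a P x I design (relative decisions)."""
    relative_error = var_pi_e / n_i  # only pi,e contributes to relative error
    return var_p / (var_p + relative_error)


g = g_coefficient(var_p=97.57, var_pi_e=371.97, n_i=7)
print(round(g, 2))  # 0.65
```

The item component (261.24) is left out because, in a norm-referenced reading, item difficulty shifts every examinee by the same amount and does not change their relative standing.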

Illustration 2
• Essay test
• 7 vignette-based essay questions
• 1 marker marked all questions for all examinees
• Domain-referenced perspective
Calculate the dependability coefficient!

D-study (ni = 7; domain-referenced)

Using the P x I variance components (p = 97.57, i = 261.24, pi,e = 371.97):

$$D = \frac{97.57}{97.57 + \frac{261.24}{7} + \frac{371.97}{7}} = 0.52$$

Illustration 3
• Essay test
• 7 vignette-based essay questions
• 1 marker marked all questions for all examinees
• Domain-referenced perspective
Calculate the dependability coefficient for a sample of 10 essays!

D-study (ni = 10; domain-referenced)

Using the P x I variance components (p = 97.57, i = 261.24, pi,e = 371.97):

$$D = \frac{97.57}{97.57 + \frac{261.24}{10} + \frac{371.97}{10}} = 0.61$$
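The two dependability calculations (7 and 10 essays) differ only in the divisor, which a small function makes explicit. This is a sketch for the one-facet crossed design; the names are illustrative.

```python
def dependability(var_p, var_i, var_pi_e, n_i):
    """Dependability coefficient for a P x I design (absolute decisions)."""
    absolute_error = (var_i + var_pi_e) / n_i  # item variance now counts as error
    return var_p / (var_p + absolute_error)


d_7 = dependability(97.57, 261.24, 371.97, n_i=7)
d_10 = dependability(97.57, 261.24, 371.97, n_i=10)
print(round(d_7, 2), round(d_10, 2))  # 0.52 0.61
```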

D-studies for several item samples

N Essays   Generalizability Coefficient (G)   Dependability Coefficient (D)
 1                     0.21                              0.13
 5                     0.57                              0.44
 7                     0.65                              0.52
10                     0.72                              0.61
15                     0.80                              0.70
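The table can be regenerated by sweeping the number of essays. The sketch assumes the one-facet crossed formulas used in the preceding illustrations and the P x I estimates from the slides; the variable names are illustrative.

```python
VAR_P, VAR_I, VAR_PI_E = 97.57, 261.24, 371.97  # P x I estimates

rows = []
for n_essays in (1, 5, 7, 10, 15):
    g = VAR_P / (VAR_P + VAR_PI_E / n_essays)            # relative error
    d = VAR_P / (VAR_P + (VAR_I + VAR_PI_E) / n_essays)  # absolute error
    rows.append((n_essays, round(g, 2), round(d, 2)))

for n_essays, g, d in rows:
    print(f"{n_essays:2d} essays  G = {g:.2f}  D = {d:.2f}")
```

D is below G at every sample size because the absolute error adds the item component to the error term.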

Illustration 4
• Essay test
• 7 vignette-based essay questions
• 2 markers independently marked all questions for all examinees
• Norm-referenced perspective
Calculate the generalizability coefficient!

D-study (ni = 7; nj = 2; norm-referenced)

Using the P x I x J variance components (p = 48.71, pi = 185.87, pj = 33.18, pij,e = 72.94):

$$G = \frac{48.71}{48.71 + \frac{185.87}{7} + \frac{33.18}{2} + \frac{72.94}{2 \times 7}} = 0.50$$

Illustration 5
• Essay test
• 7 vignette-based essay questions
• 2 markers independently marked all questions for all examinees
• Domain-referenced perspective
Calculate the dependability coefficient!

D-study (ni = 7; nj = 2; domain-referenced)

Using the P x I x J variance components (p = 48.71, i = 25.12, j = 15.00, pi = 185.87, pj = 33.18, ij = 80.00, pij,e = 72.94):

$$D = \frac{48.71}{48.71 + \frac{25.12}{7} + \frac{15.00}{2} + \frac{185.87}{7} + \frac{33.18}{2} + \frac{80.00}{14} + \frac{72.94}{14}} = 0.43$$
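Both two-facet coefficients for the fully crossed design can be computed from the same seven components. In this sketch each component is divided by the number of conditions it is averaged over: n_i for item terms, n_j for judge terms, and n_i x n_j for terms involving both. The function name is illustrative.

```python
def crossed_two_facet(vc, n_i, n_j):
    """Return (G, D) for a fully crossed P x I x J D-study."""
    relative_error = (vc["pi"] / n_i
                      + vc["pj"] / n_j
                      + vc["pij,e"] / (n_i * n_j))
    absolute_error = (relative_error
                      + vc["i"] / n_i
                      + vc["j"] / n_j
                      + vc["ij"] / (n_i * n_j))
    g = vc["p"] / (vc["p"] + relative_error)
    d = vc["p"] / (vc["p"] + absolute_error)
    return g, d


vc = {"p": 48.71, "i": 25.12, "j": 15.00, "pi": 185.87,
      "pj": 33.18, "ij": 80.00, "pij,e": 72.94}
g, d = crossed_two_facet(vc, n_i=7, n_j=2)
print(round(g, 2), round(d, 2))  # 0.5 0.43
```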

Illustration 6
• Essay test
• 7 vignette-based essay questions
• 2 different markers independently marked each question for all examinees
• Norm-referenced perspective
Calculate the generalizability coefficient!

D-study (ni = 7; nj = 2; norm-referenced), (Judges : Items) x Persons design

Source      Estimated Variance Component   Percentage of Total Variance
p                      48.71                        10.57
i                      25.12                         5.45
j,ij                   95.00                        20.62
pi                    185.87                        40.33
pj,pij,e              106.12                        23.03

$$G = \frac{48.71}{48.71 + \frac{185.87}{7} + \frac{106.12}{2 \times 7}} = 0.52$$

D-study summary table (norm-referenced score interpretation)

                     Same marker for all essays     Different marker for each essay
Number of Essays     One Marker     Two Markers     One Marker     Two Markers
 5                      0.36           0.44            0.39           0.46
 7                      0.41           0.50            0.47           0.54
10                      0.45           0.56            0.56           0.63
15                      0.49           0.61            0.65           0.72
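The "same marker for all essays" values can be reproduced with the crossed-design relative error from Illustration 4, as sketched below with illustrative names. The "different marker for each essay" values are not attempted here, since they depend on how the marker-related components are pooled in the nested design.

```python
VC = {"p": 48.71, "pi": 185.87, "pj": 33.18, "pij,e": 72.94}


def g_crossed(n_essays, n_markers):
    """Norm-referenced G for the crossed design (same markers mark every essay)."""
    relative_error = (VC["pi"] / n_essays
                      + VC["pj"] / n_markers
                      + VC["pij,e"] / (n_essays * n_markers))
    return VC["p"] / (VC["p"] + relative_error)


same_marker = {
    n_markers: [round(g_crossed(n_essays, n_markers), 2) for n_essays in (5, 7, 10, 15)]
    for n_markers in (1, 2)
}
print(same_marker[1])  # [0.36, 0.41, 0.45, 0.49]
print(same_marker[2])  # [0.44, 0.5, 0.56, 0.61]
```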

Another reliability index
• Reliability coefficient (G & D coefficients)
  • Scale independent (0-1)
  • Non-intuitive interpretation
• Standard Error of Measurement (SEM)
  • Intuitive interpretation
  • Scale dependent

Standard Error of Measurement

$$X = T + E$$

where X = observed score, T = true score, E = error score.

$$\text{Reliability index} = \frac{\mathrm{Var}(T)}{\mathrm{Var}(T) + \mathrm{Var}(E)}$$

$$\text{Standard Error of Measurement (SEM)} = \sqrt{\mathrm{Var}(E)}$$
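A minimal sketch of the SEM, taking it to be the square root of the error variance (the standard definition). It reuses the one-facet P x I estimates with 7 essays; the resulting SEM values are computed here and do not appear in the slides.

```python
import math

VAR_I, VAR_PI_E = 261.24, 371.97  # P x I estimates
n_essays = 7

# Error variances for the two score interpretations
relative_error = VAR_PI_E / n_essays
absolute_error = (VAR_I + VAR_PI_E) / n_essays

sem_relative = math.sqrt(relative_error)  # norm-referenced
sem_absolute = math.sqrt(absolute_error)  # domain-referenced
print(f"relative SEM = {sem_relative:.2f}, absolute SEM = {sem_absolute:.2f}")
```

Unlike G and D, these values are on the scale of the scores themselves, which is what makes them scale dependent.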

Interpretation of SEM

Suppose an examinee has a score of 60% and the SEM is 5:
• 60 ± 5: 68% CI
• 1.96 x 5 ≈ 10, so 60 ± 10: 95% CI
• 2.14 x 5 ≈ 11, so 60 ± 11: 95% CI

[Figure: score scale from 45 to 75 with the intervals marked around 60]
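The 95% interval in this example can be sketched with the standard library, assuming normally distributed measurement error. The function name is illustrative.

```python
from statistics import NormalDist


def score_interval(score, sem, level=0.95):
    """Two-sided confidence interval around an observed score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * sem
    return score - half_width, score + half_width


low, high = score_interval(score=60, sem=5, level=0.95)
print(f"95% CI: {low:.1f} to {high:.1f}")  # 50.2 to 69.8
```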
