Cognitive Diagnosis as Evidentiary Argument. Robert J. Mislevy Department of Measurement, Statistics, & Evaluation University of Maryland, College Park, MD October 21, 2004 Presented at the Fourth Spearman Conference, Philadelphia, PA, Oct. 21-23, 2004.
Thanks to Russell Almond, Charles Davis, Chun-Wei Huang, Sandip Sinharay, Linda Steinberg, Kikumi Tatsuoka, David Williamson, and Duanli Yan.
Introduction • An assessment is a particular kind of evidentiary argument. • Parsing a particular assessment in terms of the elements of an argument provides insights into more visible features such as tasks and statistical models. • Will look at cognitive diagnosis from this perspective.
Toulmin's (1958) structure for arguments Reasoning flows from data (D) to claim (C) by justification of a warrant (W), which in turn is supported by backing (B). The inference may need to be qualified by alternative explanations (A), which may have rebuttal evidence (R) to support them.
Specialization to assessment • The role of psychological theory: • Nature of claims & data • Warrant connecting claims and data: “If student were x, would probably do y” • The role of probability-based inference: “Student does y; what is support for x’s?” • Will look first at assessment under behavioral perspective, then see how cognitive diagnosis extends the ideas.
Behaviorist Perspective: "The evaluation of the success of instruction and of the student’s learning becomes a matter of placing the student in a sample of situations in which the different learned behaviors may appropriately occur and noting the frequency and accuracy with which they do occur." D.R. Krathwohl & D.A. Payne, 1971, pp. 17-18.
Behaviorist assessment argument (Toulmin diagram; the same diagram appears on four slides, each highlighting one part) • C (claim): Sue's probability of correctly answering a 2-digit subtraction problem with borrowing is p. • W (warrant, "since"): Sampling theory machinery for reasoning from true proportion correct to observed counts of n responses in targeted situations. • A (alternative explanations, "unless"): e.g., observational errors, data errors, misclassification of responses or performance situations, distractions, etc. • D1j (student data, "and"): Sue's answer to Item j. • D2j (task data): structure and contents of Item j.
The claim addresses the expected value of performance of the targeted kind in the targeted situations.
The student data address the salient features of the responses.
The task data address the salient features of the stimulus situations (i.e., tasks).
The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.
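A minimal sketch of the sampling-theory warrant, assuming Sue's items are exchangeable draws from the targeted class of situations, so that her number correct is binomial with success probability p. The function name and the response data are hypothetical; the sketch only turns observed counts into an estimate of p and its binomial standard error.

```python
import math

def proportion_correct_summary(responses):
    """Estimate p from 0/1 responses to n items in one targeted class.

    Assumes the items are exchangeable draws from the targeted class of
    situations, so the number correct is Binomial(n, p).
    Returns (observed proportion correct, binomial standard error).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one response")
    p_hat = sum(responses) / n                 # observed proportion correct
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # binomial standard error of p_hat
    return p_hat, se

# Hypothetical data: Sue's scored answers to ten 2-digit borrowing items.
sue = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
p_hat, se = proportion_correct_summary(sue)
print(f"p_hat = {p_hat:.2f}, SE = {se:.3f}")  # p_hat = 0.80, SE = 0.126
```

The standard error covers sampling variation only; the alternative explanations listed under A (misclassified responses, distractions, and so on) sit outside this calculation.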
Statistical Modeling of Assessment Data • Claims in terms of values of unobservable variables in student model (SM)--characterize student knowledge. • Data modeled as depending probabilistically on SM vars. • Estimate conditional distributions of data given SM vars. • Bayes theorem to infer SM variables given data.
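The four steps above can be sketched for a single discrete student-model variable. This is an illustrative assumption, not a model from the talk: two hypothetical states ("master", "nonmaster"), the same conditional probability of a correct response on every item, and responses that are conditionally independent given the state.

```python
def posterior(prior, p_correct, responses):
    """Bayes theorem for one discrete student-model (SM) variable.

    prior:      dict mapping each SM state to its prior probability
    p_correct:  dict mapping each SM state to Pr(x = 1 | state), assumed
                equal across items, with responses conditionally independent
    responses:  list of observed 0/1 responses
    """
    unnormalized = {}
    for state, pr in prior.items():
        like = 1.0
        p = p_correct[state]
        for x in responses:
            like *= p if x == 1 else 1 - p   # Pr(x | state) for one item
        unnormalized[state] = pr * like      # prior times likelihood
    total = sum(unnormalized.values())
    return {state: v / total for state, v in unnormalized.items()}

post = posterior({"master": 0.5, "nonmaster": 0.5},
                 {"master": 0.9, "nonmaster": 0.2},
                 [1, 1, 0])
print(post)
```

The conditional probabilities play the role of the warrant ("if the student were x, would probably do y"); Bayes theorem reverses the direction ("student does y; what is the support for each x?").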
Specialization to cognitive diagnosis • Information-processing perspective foregrounded in cognitive diagnosis • Student model contains variables in terms of, e.g., • Production rules at some grain-size • Components / organization of knowledge • Possibly strategy availability / usage • Importance of purpose
Responses consistent with the “subtract smaller from larger” bug • “Buggy arithmetic”: Brown & Burton (1978); VanLehn (1990)
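The bug can be made concrete as a procedure. A minimal sketch, assuming non-negative integers where the top number has at least as many digits as the bottom one; the function name is hypothetical. The bug yields wrong answers only on problems that need borrowing, which is why the features of the tasks matter for diagnosing it.

```python
def smaller_from_larger(top, bottom):
    """Simulate the 'subtract smaller from larger' bug (hypothetical helper).

    In every column the smaller digit is subtracted from the larger one,
    whichever row it appears in, so the procedure never borrows.
    Assumes non-negative integers with top having at least as many digits.
    """
    top_digits = str(top)
    if len(str(bottom)) > len(top_digits):
        raise ValueError("bottom has more digits than top")
    bottom_digits = str(bottom).rjust(len(top_digits), "0")  # pad with leading zeros
    columns = (abs(int(a) - int(b)) for a, b in zip(top_digits, bottom_digits))
    return int("".join(str(d) for d in columns))

# No borrowing needed, then two problems that need borrowing.
for top, bottom in [(75, 23), (81, 27), (524, 298)]:
    print(f"{top} - {bottom}: correct {top - bottom}, buggy {smaller_from_larger(top, bottom)}")
```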
Some Illustrative Student Models in Cognitive Diagnosis • Whole number subtraction: • ~ 200 production rules (VanLehn, 1990) • Can model at level of bugs (Brown & Burton) or at the level of impasses (VanLehn) • John Anderson’s ITSs in algebra, LISP • ~ 1000 production rules • 1-10 in play at a given time • Reverse-engineered large-scale tests • ~10-15 skills • Mixed number subtraction (Tatsuoka) • ~5-15 production rules / skills
Mixed number subtraction • Based on example from Prof. Kikumi Tatsuoka (1982). • Cognitive analysis & task design • Methods A & B • Overlapping sets of skills under methods • Bayes nets described in Mislevy (1994): • Five “skills” required under Method B. • Conjunctive combination of skills • DINA stochastic model
• Skill 1: Basic fraction subtraction • Skill 2: Simplify/reduce • Skill 3: Separate whole number from fraction • Skill 4: Borrow from whole number • Skill 5: Convert whole number to fraction
Cognitive diagnosis argument (Toulmin diagram; the same diagram appears on four slides with different annotations) • C: Sue's configuration of production rules for operating in the domain (knowledge and skill) is K. • W0 (since): Theory about how persons with configurations {K1, ..., Km} would be likely to respond to items with different salient features. • Data (and) for C are the class-level claims C1, ..., Cn: Sue's probability of answering a Class 1, ..., Class n subtraction problem with borrowing is p1, ..., pn. • Each class-level claim is supported (since) by a sampling-theory warrant W for items with the feature set defining that class, applied to data D11j, ..., D1nj (Sue's answer to Item j in Class 1, ..., n) and D21j, ..., D2nj (structure and contents of Item j in Class 1, ..., n).
Like behaviorist inference, at the level of behavior in classes of structurally similar tasks.
Structural patterns among behaviorist claims are data for inferences about the unobservable production rules that govern behavior.
• This level distinguishes cognitive diagnosis from subscores. • A typical (but not necessary) difference is that cognitive diagnosis has a many-to-many relationship between observable variables and student-model variables. As partitions, subscores have 1-1 relationships between scores and inferential targets.
Structural and stochastic aspects of inferential models • Structural model relates student-model variables ($\theta$s) to observable variables ($x$s) • Conjunctive, disjunctive, mixture • Complete vs. incomplete (e.g., fusion model) • The Q-matrix (next slide) • Stochastic model addresses uncertainty • Rule based; logical with noise • Probability-based inference (discrete Bayes nets, extended IRT models) • Hybrid (e.g., Rule Space)
The Q-matrix (Fischer, Tatsuoka): rows are items, columns are features • $q_{jk}$ is the extent to which Feature k pertains to Item j • Special case: 0/1 entries and a 1-1 relationship between features and student-model variables.
Conjunctive structural relationship • Person i: $\theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{iK})$ • Each $\theta_{ik} = 1$ if the person possesses “skill” k, 0 if not. • Task j: $q_j = (q_{j1}, q_{j2}, \ldots, q_{jK})$ • $q_{jk} = 1$ if item j “requires skill k”, 0 if not. • $I_{ij} = 1$ if $(q_{jk} = 1 \Rightarrow \theta_{ik} = 1)$ for all k; $I_{ij} = 0$ if $q_{jk} = 1$ but $\theta_{ik} = 0$ for any k.
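The conjunctive rule can be written as $I_{ij} = \prod_k \theta_{ik}^{q_{jk}}$, where $\theta_i$ is person i's 0/1 skill vector and $q_j$ is item j's 0/1 Q-vector: any required skill the person lacks contributes a factor of 0, and skills the item does not require contribute $0^0 = 1$ or $1^0 = 1$. The sketch applies this to a small hypothetical Q-matrix over the five mixed-number skills, one row per skill-set pattern; the rows are labeled by pattern, not by Tatsuoka's item numbers, and the person is hypothetical.

```python
import math

# Hypothetical Q-matrix over five skills (1: basic fraction subtraction,
# 2: simplify/reduce, 3: separate whole number from fraction,
# 4: borrow from whole number, 5: convert whole number to fraction).
# One row per skill-set pattern; entry k is q_jk.
Q = {
    "Skill 1":           (1, 0, 0, 0, 0),
    "Skills 1, 2":       (1, 1, 0, 0, 0),
    "Skills 1, 3":       (1, 0, 1, 0, 0),
    "Skills 1, 3, 4":    (1, 0, 1, 1, 0),
    "Skills 1, 2, 3, 4": (1, 1, 1, 1, 0),
    "Skills 1, 3, 4, 5": (1, 0, 1, 1, 1),
}

def ideal_response(theta, q):
    """Conjunctive indicator I_ij = prod_k theta_ik ** q_jk (0 ** 0 == 1 in Python)."""
    return math.prod(t ** need for t, need in zip(theta, q))

theta = (1, 1, 1, 0, 0)   # hypothetical person: has Skills 1-3, lacks 4 and 5
ideal = [ideal_response(theta, q) for q in Q.values()]
print(ideal)  # [1, 1, 1, 0, 0, 0]
```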
Conjunctive structural relationship: no stochastic model • $\Pr(x_{ij} = 1 \mid \theta_i, q_j) = I_{ij}$ • No uncertainty about x given $\theta$. • There is still uncertainty about $\theta$ given x, even with no stochastic part, because of competing explanations (Falmagne): $x_{ij} \in \{0, 1\}$ only partitions the skill profiles into all $\theta$s that cover $q_j$ versus those that miss it with respect to at least one skill.
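The partition can be enumerated directly. A minimal sketch under the noise-free conjunctive model, where a person's skill profile is a 0/1 tuple and the item's requirements are a 0/1 Q-vector; the helper names and the four-skill example are hypothetical.

```python
from itertools import product

def covers(theta, q):
    """True if profile theta has every skill that Q-vector q requires."""
    return all(t >= need for t, need in zip(theta, q))

def consistent_profiles(q, x, K):
    """All K-skill profiles that could produce response x (0/1) to an item
    with Q-vector q when Pr(x = 1 | theta, q) is the conjunctive indicator."""
    return [theta for theta in product((0, 1), repeat=K)
            if int(covers(theta, q)) == x]

right = consistent_profiles((1, 1, 0, 0), 1, K=4)
wrong = consistent_profiles((1, 1, 0, 0), 0, K=4)
print(len(right), len(wrong))  # 4 12
```

A right answer pins down Skills 1 and 2 but says nothing about 3 and 4; a wrong answer leaves twelve competing explanations.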
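Conjunctive structural relationship: DINA stochastic model • Now there is uncertainty about x given $\theta$: $\Pr(x_{ij} = 1 \mid I_{ij} = 0) = \pi_{j0}$ (false positive); $\Pr(x_{ij} = 1 \mid I_{ij} = 1) = \pi_{j1}$ (true positive) • Likelihood over n items: $\Pr(x_i \mid \theta_i) = \prod_{j=1}^{n} \pi_{jI_{ij}}^{x_{ij}} (1 - \pi_{jI_{ij}})^{1 - x_{ij}}$ • Posterior: $\Pr(\theta_i \mid x_i) \propto \Pr(x_i \mid \theta_i) \Pr(\theta_i)$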
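A minimal sketch of the DINA likelihood and posterior, enumerating all $2^K$ skill profiles. Assumptions: a uniform prior over profiles unless one is supplied, responses conditionally independent given the profile, per-item false-positive (`pi0`) and true-positive (`pi1`) probabilities, and hypothetical items and parameter values.

```python
from itertools import product

def dina_posterior(responses, Q, pi0, pi1, prior=None):
    """Posterior over all 0/1 skill profiles under the DINA model.

    responses: observed 0/1 scores x_ij, one per item
    Q:         list of item Q-vectors q_j (0/1 tuples of length K)
    pi0, pi1:  per-item Pr(x = 1 | I = 0) and Pr(x = 1 | I = 1)
    prior:     dict profile -> probability; uniform if omitted
    """
    K = len(Q[0])
    profiles = list(product((0, 1), repeat=K))
    if prior is None:
        prior = {theta: 1 / len(profiles) for theta in profiles}
    unnormalized = {}
    for theta in profiles:
        like = 1.0
        for x, q, p0, p1 in zip(responses, Q, pi0, pi1):
            ideal = all(t >= need for t, need in zip(theta, q))  # conjunctive I_ij
            p = p1 if ideal else p0                              # Pr(x_ij = 1 | I_ij)
            like *= p if x == 1 else 1 - p
        unnormalized[theta] = prior[theta] * like
    total = sum(unnormalized.values())
    return {theta: v / total for theta, v in unnormalized.items()}

def skill_marginals(post):
    """Posterior probability that each skill is possessed."""
    K = len(next(iter(post)))
    return [sum(p for theta, p in post.items() if theta[k] == 1) for k in range(K)]

# Hypothetical example: two skills; items require Skill 1, Skill 2, and both.
Q = [(1, 0), (0, 1), (1, 1)]
post = dina_posterior([1, 0, 0], Q, pi0=[0.2] * 3, pi1=[0.9] * 3)
marg = skill_marginals(post)
print([round(m, 3) for m in marg])  # [0.802, 0.034]
```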
The particular challenge of competing explanations • Triangulation • Different combinations of data fail to support some alternative explanations of responses, and reinforce others. • Why was an item requiring Skills 1 & 2 wrong? • Missing Skill 1? Missing Skill 2? A slip? • Try items requiring 1 & 3, 2 & 4, 1 & 2 again. • Degree to which the design supports inferences • Test design as experimental design
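Triangulation can be sketched under the noise-free conjunctive model by intersecting the sets of skill profiles consistent with each response. This is a simplification: it cannot represent a slip, so an empty candidate set would signal that the responses contradict the noise-free model. The response pattern and helper names are hypothetical.

```python
from itertools import product

def covers(theta, q):
    """True if profile theta has every skill that Q-vector q requires."""
    return all(t >= need for t, need in zip(theta, q))

def triangulate(observations, K):
    """Filter all K-skill profiles by each (q_j, x_j) in turn.

    Returns the surviving profiles and the number left after each step.
    """
    candidates = list(product((0, 1), repeat=K))
    sizes = [len(candidates)]
    for q, x in observations:
        candidates = [th for th in candidates if int(covers(th, q)) == x]
        sizes.append(len(candidates))
    return candidates, sizes

obs = [((1, 1, 0, 0), 0),   # item requiring Skills 1 & 2: wrong
       ((1, 0, 1, 0), 1),   # Skills 1 & 3: right
       ((0, 1, 0, 1), 0),   # Skills 2 & 4: wrong
       ((1, 1, 0, 0), 0)]   # Skills 1 & 2 again: wrong
cands, sizes = triangulate(obs, K=4)
print(sizes)  # [16, 12, 2, 2, 2]
print(cands)  # [(1, 0, 1, 0), (1, 0, 1, 1)]
```

Here the right answer on the Skills 1 & 3 item rules out "missing Skill 1", leaving "missing Skill 2"; the last two items add nothing under this noise-free model, but under a stochastic model such as DINA the repeat weighs against the slip explanation.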
Bayes net for mixed number subtraction (Method B) [Figure: skill nodes Skill 1 (basic fraction subtraction), Skill 2 (simplify/reduce), Skill 3 (separate whole number from fraction), Skill 4 (borrow from whole number), Skill 5 (convert whole number to fraction), and a "mixed number skills" node; item nodes grouped by the skill sets they require: Skill 1 only (Item 6, 6/7 - 4/7; Item 8, 2/3 - 2/3), Skills 1 & 2, Skills 1 & 3, Skills 1, 3, & 4, Skills 1, 2, 3, & 4, Skills 1, 3, 4, & 5, and Skills 1, 2, 3, 4, & 5; other items shown: 4, 7, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20.]
Structural aspects: The logical conjunctive relationships among skills, and which sets of skills an item requires. The latter is determined by the item's $q_j$ vector.
Stochastic aspects, Part 1: Empirical relationships among skills in the population (red).
Stochastic aspects, Part 2: Measurement errors for each item (yellow).
Bayes net for mixed number subtraction: probabilities before observations
Bayes net for mixed number subtraction: probabilities after observations
Bayes net for mixed number subtraction: for a mixture of strategies across people
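With a mixture of strategies across people, each strategy (e.g., Method A or Method B) has its own Q-matrix, and the posterior probability that a student used strategy s is proportional to its prior times the likelihood averaged over skill profiles. A minimal sketch, assuming DINA responses with common false- and true-positive probabilities, a uniform prior over skill profiles, and tiny hypothetical Q-matrices that are not Tatsuoka's Methods A and B.

```python
from itertools import product

def dina_likelihood(responses, Q, theta, pi0, pi1):
    """Pr(responses | theta, Q) under DINA with common pi0, pi1."""
    like = 1.0
    for x, q in zip(responses, Q):
        ideal = all(t >= need for t, need in zip(theta, q))  # conjunctive I_ij
        p = pi1 if ideal else pi0
        like *= p if x == 1 else 1 - p
    return like

def strategy_posterior(responses, Q_by_strategy, strategy_prior, pi0=0.2, pi1=0.9):
    """Pr(strategy | responses), marginalizing over a uniform prior on profiles."""
    unnormalized = {}
    for s, Q in Q_by_strategy.items():
        profiles = list(product((0, 1), repeat=len(Q[0])))
        marginal = sum(dina_likelihood(responses, Q, th, pi0, pi1)
                       for th in profiles) / len(profiles)
        unnormalized[s] = strategy_prior[s] * marginal
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# Hypothetical: under "A" item 2 needs both skills; under "B" both items need only Skill 1.
Q_by_strategy = {"A": [(1, 0), (1, 1)], "B": [(1, 0), (1, 0)]}
post = strategy_posterior([1, 0], Q_by_strategy, {"A": 0.5, "B": 0.5})
print(post)
```

The pattern "right on item 1, wrong on item 2" is natural under A but requires a slip or a lucky guess under B, so the posterior favors A.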
Extensions (1) • More general … • Student models (continuous vars, uses) • Observable variables (richer, times, multiple) • Structural relationships (e.g., disjuncts) • Stochastic relationships (e.g., NIDA, fusion) • Model-tracing temporary structures (VanLehn)
Extensions (2) • Strategy use • Single strategy (as discussed above) • Mixture across people (Rost, Mislevy) • Mixtures within people (Huang: MV Rasch) • Huang’s example of last of these follows…
What are the forces at the instant of impact? [Figure: speeds shown, 20 mph and 20 mph] • A. The truck exerts the same amount of force on the car as the car exerts on the truck. • B. The car exerts more force on the truck than the truck exerts on the car. • C. The truck exerts more force on the car than the car exerts on the truck. • D. There’s no force because they both stop.
What are the forces at the instant of impact? [Figure: speeds shown, 10 mph and 20 mph] • A. The truck exerts the same amount of force on the car as the car exerts on the truck. • B. The car exerts more force on the truck than the truck exerts on the car. • C. The truck exerts more force on the car than the car exerts on the truck. • D. There’s no force because they both stop.
What are the forces at the instant of impact? [Figure: speeds shown, 10 mph and 1 mph] • A. The truck exerts the same amount of force on the fly as the fly exerts on the truck. • B. The fly exerts more force on the truck than the truck exerts on the fly. • C. The truck exerts more force on the fly than the fly exerts on the truck. • D. There’s no force because they both stop.
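The Andersen/Rasch Multidimensional Model for m strategy categories: $\Pr(X_{ij} = p \mid \theta_i, \beta_j) = \dfrac{\exp(\theta_{ip} + \beta_{jp})}{\sum_{h=1}^{m} \exp(\theta_{ih} + \beta_{jh})}$, where p is an integer between 1 and m; $X_{ij}$ is the strategy person i uses for item j; $\theta_{ip}$ is the pth element in person i’s vector-valued parameter; $\beta_{jp}$ is the pth element in item j’s vector-valued parameter.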
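Assuming the model takes the form $\Pr(X_{ij} = p) = \exp(\theta_{ip} + \beta_{jp}) / \sum_{h=1}^{m} \exp(\theta_{ih} + \beta_{jh})$, the strategy probabilities for one person-item pair are a softmax of the summed person and item parameters. A minimal sketch; the function name and parameter values are hypothetical.

```python
import math

def strategy_probabilities(theta_i, beta_j):
    """Pr(X_ij = p) for p = 1..m, as a softmax of theta_ip + beta_jp.

    theta_i: person i's m-vector; beta_j: item j's m-vector.
    """
    if len(theta_i) != len(beta_j):
        raise ValueError("theta_i and beta_j must both have m elements")
    scores = [t + b for t, b in zip(theta_i, beta_j)]
    top = max(scores)                              # subtract max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical m = 2: a person with no preference, an item that favors strategy 1.
probs = strategy_probabilities([0.0, 0.0], [math.log(3), 0.0])
print([round(p, 2) for p in probs])  # [0.75, 0.25]
```

Adding the same constant to every element of $\theta_i$ (or of $\beta_j$) leaves the probabilities unchanged, so some constraint on the parameters is needed to identify them.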
Conclusion: The Importance of Coordination… • Among psychological model, task design, and analytic model • (the “assessment triangle” of Knowing What Students Know, KWSK) • Tatsuoka’s work is exemplary in this respect: • Grounded in psychological analyses • Grain size & character tuned to learning model • Test design tuned to instructional options
Conclusion: The Importance of Coordination… • With purpose, constraints, resources • Lower expectations for retrofitting existing tests designed for different purposes, under different perspectives & warrants. • Information & Communication Technology (ICT) project at ETS • Simulation-based tasks • Large scale • Forward design