A Machine Learning Approach for Automatic Student Model Discovery Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger Computer Science Department Carnegie Mellon University
Student Model • A set of knowledge components (KCs) • Encoded in intelligent tutors to model how students solve problems • Example: What to do next on problems like 3x=12 • A key factor behind instructional decisions in automated tutoring systems
Student Model Construction • Traditional Methods (require expert input; highly subjective) • Structured interviews • Think-aloud protocols • Rational analysis • Previous Automated Methods (search within the space of human-provided factors) • Learning factor analysis (LFA) • Proposed Approach (independent of human-provided factors) • Use a machine-learning agent, SimStudent, to acquire knowledge • 1 production rule acquired => 1 KC in student model (Q matrix)
A Brief Review of SimStudent • A machine-learning agent that • acquires production rules from • examples & problem solving experience • given a set of feature predicates & functions
Production Rules • Skill divide (e.g. -3x = 6) • What: • Left side (-3x) • Right side (6) • When: • Left side (-3x) does not have constant term => • How: • Get-coefficient (-3) of left side (-3x) • Divide both sides by the coefficient • Each production rule is associated with one KC • Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step • Original model required strong domain-specific operators, like Get-coefficient • Does not differentiate important distinctions in learning (e.g., -x=3 vs -3x = 6)
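The what/when/how structure of the divide skill can be sketched in Python. This is only an illustration of the rule on this slide, not SimStudent's actual representation: the names ProductionRule, no_constant_term, get_coefficient, and apply_rule are hypothetical, and the regular expression assumes a left side consisting of a single term in x.

```python
import re
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Optional

# Hypothetical pattern for a lone variable term such as -3x, 3x, or -x.
TERM = re.compile(r"^(-?\d*)x$")


def no_constant_term(left: str) -> bool:
    """'When' condition: the left side is a single variable term."""
    return TERM.match(left) is not None


def get_coefficient(left: str) -> Fraction:
    """Strong, domain-specific operator: -3x -> -3, -x -> -1, x -> 1."""
    digits = TERM.match(left).group(1)
    if digits in ("", "-"):
        return Fraction(digits + "1")
    return Fraction(digits)


@dataclass
class ProductionRule:
    name: str
    when: Callable[[str, str], bool]  # precondition on (left, right)
    how: Callable[[str, str], str]    # action producing the next step


def divide_how(left: str, right: str) -> str:
    """'How' part: divide both sides by the coefficient of the left side."""
    coefficient = get_coefficient(left)
    return f"x = {Fraction(right) / coefficient}"


divide = ProductionRule(
    name="divide",
    when=lambda left, right: no_constant_term(left),
    how=divide_how,
)


def apply_rule(rule: ProductionRule, equation: str) -> Optional[str]:
    """'What' part: pick out the left and right sides, then test and fire."""
    left, right = [side.strip() for side in equation.split("=")]
    if rule.when(left, right):
        return rule.how(left, right)
    return None


print(apply_rule(divide, "-3x = 6"))     # x = -2
print(apply_rule(divide, "3x + 2 = 8"))  # None: the rule does not apply
```

In this sketch the single rule fires on both -3x = 6 and -x = 3, which mirrors the slide's point that the original model does not distinguish the two cases.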
Deep Feature Learning • Expert vs Novice (Chi et al., 1981) • Example: What’s the coefficient of -3x? • Expert uses deep functional features to reply -3 • Novice may use shallow perceptual features to reply 3 • Model deep feature learning using machine learning techniques • Integrate acquired knowledge into SimStudent learning • Remove dependence on strong operators & split KCs into finer grain sizes
Feature Recognition as PCFG Induction • Underlying structure in the problem => Grammar • Feature => Non-terminal symbol in a grammar rule • Feature learning task => Grammar induction • Student errors => Incorrect parsing
Learning Problem • Input is a set of feature recognition records consisting of • An original problem (e.g. -3x) • The feature to be recognized (e.g. -3 in -3x) • Output • A probabilistic context free grammar (PCFG) • A non-terminal symbol in a grammar rule that represents target feature
A Two-Step PCFG Learning Algorithm • Greedy Structure Hypothesizer: • Hypothesizes grammar rules in a bottom-up fashion • Creates non-terminal symbols for frequently occurring sequences • E.g. – and 3, SignedNumber and Variable • Viterbi Training Phase: • Refines rule probabilities • Rules that occur more frequently => higher probabilities • Generalizes the Inside-Outside Algorithm (Lari & Young, 1990)
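The bottom-up idea behind the Greedy Structure Hypothesizer can be sketched as repeated merging of the most frequent adjacent pair of symbols. This is a simplified stand-in under stated assumptions, not the authors' algorithm: it only handles binary rules, the category names (Minus, Number, Variable) and generated symbol names (N1, N2) are hypothetical, and it leaves out rule probabilities entirely.

```python
from collections import Counter


def merge(seq, left, right, symbol):
    """Replace each adjacent (left, right) pair in seq with the new symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def greedy_structure(sequences):
    """Hypothesize binary grammar rules bottom-up from symbol sequences."""
    seqs = [list(s) for s in sequences]
    rules = {}  # new non-terminal -> (left symbol, right symbol)
    while any(len(s) > 1 for s in seqs):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        (left, right), _ = pairs.most_common(1)[0]
        symbol = f"N{len(rules) + 1}"
        rules[symbol] = (left, right)
        seqs = [merge(s, left, right, symbol) for s in seqs]
    return rules, seqs


# Hypothetical training data: -3x, -5x, and -3 as sequences of categories.
rules, roots = greedy_structure([
    ["Minus", "Number", "Variable"],
    ["Minus", "Number", "Variable"],
    ["Minus", "Number"],
])
print(rules)
```

Here N1 plays the role of SignedNumber (a minus sign followed by a number) and N2 combines that signed number with a variable, matching the two merges named on the slide.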
Example of Production Rules Before and After Integration • Extend the “What” Part in Production Rule • Original: • Skill divide (e.g. -3x = 6) • What: • Left side (-3x) • Right side (6) • When: • Left side (-3x) does not have constant term • => • How: • Get coefficient (-3) of left side (-3x) • Divide both sides by the coefficient (-3) • Extended: • Skill divide (e.g. -3x = 6) • What: • Left side (-3, -3x) • Right side (6) • When: • Left side (-3x) does not have constant term • => • How: • Get coefficient (-3) of left side (-3x) • Divide both sides by the coefficient (-3) • Fewer operators • Eliminate need for domain-specific operators
Experiment Method • SimStudent vs. Human-generated model • Code real student data • 71 students used a Carnegie Learning Algebra I Tutor on equation solving • SimStudent: • Tutored by a Carnegie Learning Algebra I Tutor • Coded each step by the applicable production rule • Used human-generated coding in case of no applicable production • Human-generated model: • Coded manually based on expertise
How Well the Two Models Fit Real Student Data • Used Additive Factor Model (AFM) • An instance of logistic regression that • Uses each student, each KC, and each KC-by-opportunity interaction as independent variables • To predict the probability of a student making an error on a specific step
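A minimal sketch of how an AFM prediction is computed, assuming the standard AFM form: the log-odds of a correct step is the student's proficiency plus, for each KC on the step, the KC's easiness and its learning rate times the number of prior opportunities. The function and parameter names are hypothetical, and the numbers in the usage lines are made up for illustration, not fitted values from the study.

```python
import math


def afm_error_probability(theta, beta, gamma, opportunities, kcs):
    """Probability that a student makes an error on one step.

    theta         -- student proficiency (one value per student)
    beta          -- dict: KC name -> easiness of that KC
    gamma         -- dict: KC name -> learning rate of that KC
    opportunities -- dict: KC name -> prior practice opportunities
    kcs           -- KCs that the step is labeled with
    """
    logit = theta + sum(beta[k] + gamma[k] * opportunities[k] for k in kcs)
    p_correct = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - p_correct


# Made-up parameters for a single KC.
beta = {"simSt-divide": -0.5}
gamma = {"simSt-divide": 0.2}

first = afm_error_probability(0.0, beta, gamma, {"simSt-divide": 0}, ["simSt-divide"])
later = afm_error_probability(0.0, beta, gamma, {"simSt-divide": 5}, ["simSt-divide"])
print(round(first, 3), round(later, 3))
```

With a positive learning rate, the predicted error probability drops as opportunities accumulate. A KC model fits better when its labels let these per-KC learning curves match the observed errors more closely.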
An Example of Split in Division • Human-generated Model • divide: Ax=B & -x=A • SimStudent • simSt-divide: Ax=B • simSt-divide-1: -x=A
Production Rules for Division • Skill simSt-divide (e.g. -3x = 6) • What: • Left side (-3, -3x) • Right side (6) • When: • Left side (-3x) does not have constant term • How: • Divide both sides by the coefficient (-3) • Skill simSt-divide-1 (e.g. -x = 3) • What: • Left side (-x) • Right side (3) • When: • Left side (-x) is of the form -v • How: • Generate one (1) • Divide both sides by -1
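The effect of the split on step coding can be sketched as a labeling function: the human-generated model assigns one KC to every division step, while the SimStudent model assigns a separate KC when the left side has the form -v. The function name and the regular expression are hypothetical; the real coding was done by whichever production rule applied to each step.

```python
import re


def label_division_step(equation: str, model: str) -> str:
    """Return the KC label for a division step under the given KC model."""
    left = equation.split("=")[0].strip()
    if model == "human":
        return "divide"
    # SimStudent model: a left side of the form -v gets its own KC.
    if re.fullmatch(r"-[a-z]", left):
        return "simSt-divide-1"
    return "simSt-divide"


steps = ["-3x = 6", "-x = 3", "4x = 12"]
for step in steps:
    print(step, label_division_step(step, "human"), label_division_step(step, "simstudent"))
```

Each step-to-KC assignment of this kind corresponds to one row of the Q matrix that the fitted model uses.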
An Example without Split in Divide Typein • Human-generated Model • divide-typein • SimStudent • simSt-divide-typein
SimStudent vs. SimStudent + Feature Learning • SimStudent • Needs strong operators • Constructs student models similar to human-generated model • Extended SimStudent • Only requires weak operators • Splits KCs into finer grain sizes based on different parse trees • Does Extended SimStudent produce a KC model that better fits student learning data?
Results • Significance Test • SimStudent outperforms the human-generated model in 4260 out of 6494 steps • p < 0.001 • SimStudent outperforms the human-generated model across 20 runs of cross validation • p < 0.001
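The slide does not say which significance test was used. As one plausible reading, and purely as an assumption, the step-level comparison can be checked with a two-sided sign test: under the null hypothesis each model is equally likely to give the better prediction on a step. The sketch below uses the normal approximation to the binomial distribution; the function name is hypothetical.

```python
import math


def sign_test_p_value(wins: int, n: int) -> float:
    """Two-sided sign test p-value via the normal approximation.

    Null hypothesis: each of the n steps is a fair coin flip between models.
    """
    mean = n / 2.0
    sd = math.sqrt(n) / 2.0
    z = (wins - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


p = sign_test_p_value(4260, 6494)
print(p < 0.001)  # True
```

Under this assumed test, 4260 wins out of 6494 steps lies about 25 standard deviations above an even split, which agrees with the reported p < 0.001.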
Summary • Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models. • Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.
Future Studies • Test generality in other datasets in DataShop • Apply this proposed approach in other domains • Stoichiometry • Fraction addition
A Computational Model of Deep Feature Learning • Extended a PCFG Learning Algorithm (Li et al., 2009) • Feature Learning • Stronger Prior Knowledge: • Transfer Learning Using Prior Knowledge
Feature Learning • Build most probable parse trees • For all observation sequences • Select a non-terminal symbol that • Matches the most training records as the target feature
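The selection step above can be sketched as a vote over parse trees: for each training record, find the non-terminals whose covered text equals the target feature, then choose the symbol matched by the most records. The sketch assumes the most probable parse trees have already been built and are given as nested tuples of the form (label, child, ...). The tree encoding, the symbol names, and the function names are all hypothetical.

```python
from collections import Counter


def covered_text(tree):
    """Concatenate the leaves under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return tree
    return "".join(covered_text(child) for child in tree[1:])


def labeled_spans(tree):
    """Yield (label, covered text) for every non-terminal in the tree."""
    if isinstance(tree, str):
        return
    yield tree[0], covered_text(tree)
    for child in tree[1:]:
        yield from labeled_spans(child)


def select_feature_symbol(records):
    """Pick the non-terminal whose span matches the target in most records."""
    votes = Counter()
    for tree, target in records:
        votes.update({label for label, text in labeled_spans(tree) if text == target})
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical records: (most probable parse tree, feature to recognize).
records = [
    (("Expression", ("SignedNumber", "-", "3"), ("Variable", "x")), "-3"),
    (("Expression", ("SignedNumber", "-", "5"), ("Variable", "y")), "-5"),
    (("Expression", ("SignedNumber", "4"), ("Variable", "x")), "4"),
]
print(select_feature_symbol(records))  # SignedNumber
```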
Transfer Learning Using Prior Knowledge • GSH Phase: • Build parse trees based on some previously acquired grammar rules • Then call the original GSH • Viterbi Training: • Add rule frequency in previous task to the current task
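The Viterbi training bullet can be sketched as count pooling: rule frequencies from the previous task are added to those from the current task before normalizing per left-hand side. The function name, the rule encoding as (left-hand side, right-hand side) pairs, and the counts below are hypothetical and chosen only to show the direction of the effect.

```python
from collections import Counter, defaultdict


def rule_probabilities(current_counts, prior_counts=None):
    """Normalize rule counts per left-hand side, optionally adding prior counts."""
    total = Counter(current_counts)
    if prior_counts:
        total.update(prior_counts)  # Counter.update adds the frequencies
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in total.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in total.items()}


# Made-up counts: two competing expansions of SignedNumber.
signed = ("SignedNumber", ("Minus", "Number"))
unsigned = ("SignedNumber", ("Number",))

without_prior = rule_probabilities({signed: 1, unsigned: 1})
with_prior = rule_probabilities({signed: 1, unsigned: 1}, {signed: 1})
print(without_prior[signed], with_prior[signed])
```

A rule seen in the previous task starts the new task with extra weight, so its probability rises relative to competing rules with the same left-hand side.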