Student simulation and evaluation

Student simulation and evaluation DOD meeting Hua Ai (hua@cs.pitt.edu) 03/03/2006

Outline • Motivations • Background • Corpus • Student Simulation Model • Comparisons • Conclusions & Future Work

Motivations • For a larger corpus • Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically • The best strategy is often not even present in a small dataset • For a cheaper corpus • Human subjects are expensive

[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Components shown: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Reinforcement Learning, Strategy]

Background (1) • Education community • Focuses on changes in the student’s internal knowledge representation • Usually not dialogue based • Simulated students (VanLehn et al., 1994) used for • Tutor training • Collaborative learning

Background (2) • Dialogue community • Focuses on interactions and dialogue behaviors • Simulated users have limited actions to take • (Schatzmann et al., 2005) • Simulation at the dialogue act (DA) level

Corpus (1) • Spoken dialogue physics tutor (ITSPOKE)

Corpus (2) • Tutoring procedure • 5 problems • [Figure: each problem alternates Dialogue, made up of tutor (T) questions and student (S) answers, with Essay revision]

Corpus (3) • Tutor’s behaviors • Defined in KCDs (Knowledge Construction Dialogues) • [Figure: KCD flow with branches labeled Correct and Incorrect/Partially Correct]

Corpus (4) • f03 vs. s05 • Different groups of subjects

Simulation Models (1) • Simulating on word level • Students have more complex behaviors • DA info alone isn’t enough for the system • Two models trained on two corpora • [Figure: the ProbCorrect and Random models are each trained on f03 and s05, giving 03ProbCorrect, 03Random, 05ProbCorrect and 05Random]

Simulation Models (2) • ProbCorrect Model • Simulates the average knowledge level of real students • Simulates meaningful dialogue behaviors • Random Model • Produces nonsense answers • Serves as a contrast

ProbCorrect Model • Real corpus • question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic) • question2: Answer2_1 (c), Answer2_2 (ic) • Candidate answers • For question1, c:ic = 1:2; c: Answer1_1; ic: Answer1_2, Answer1_3 • For question2, c:ic = 1:1; c: Answer2_1; ic: Answer2_2 • To answer question 1 • Choose to give a correct (c) or incorrect (ic) answer with the same average probability as the real students • Randomly choose one answer from the corresponding answer set
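
The ProbCorrect procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the tuple layout of the corpus and the names `EXAMPLE_CORPUS`, `train_prob_correct`, `p_correct` and `simulate_answer` are hypothetical, and the probability of a correct answer is assumed to be estimated per question from the c:ic counts shown on the slide.

```python
import random
from collections import defaultdict

# Toy corpus copied from the slide: (question, answer, is_correct).
EXAMPLE_CORPUS = [
    ("question1", "Answer1_1", True),
    ("question1", "Answer1_2", False),
    ("question1", "Answer1_3", False),
    ("question2", "Answer2_1", True),
    ("question2", "Answer2_2", False),
]


def train_prob_correct(corpus):
    """Group the observed answers to each question into correct/incorrect sets."""
    model = defaultdict(lambda: {True: [], False: []})
    for question, answer, is_correct in corpus:
        model[question][is_correct].append(answer)
    return dict(model)  # plain dict, so an unseen question raises KeyError


def p_correct(model, question):
    """Fraction of the observed answers to this question that were correct."""
    sets = model[question]
    return len(sets[True]) / (len(sets[True]) + len(sets[False]))


def simulate_answer(model, question, rng):
    """Pick c or ic with the observed probability, then a random answer from that set."""
    is_correct = rng.random() < p_correct(model, question)
    return rng.choice(model[question][is_correct]), is_correct


model = train_prob_correct(EXAMPLE_CORPUS)
rng = random.Random(0)  # seeded only so the sketch is repeatable
print(simulate_answer(model, "question1", rng))
```

With the toy corpus, question1 is answered correctly about one time in three, matching the 1:2 ratio on the slide, and the returned answer always comes from the set that matches the chosen correctness label.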

Random Model • Real corpora (HC03&05) • Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4 • Question2: Answer2_1, Answer2_2 • Candidate answers: 1) Answer1_1 2) Answer1_2 3) Answer1_3 4) Answer1_4 5) Answer2_1 6) Answer2_2 • Big random model • For question i, answer with any of the 6 candidates with equal probability (regardless of the question)
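
For contrast, the Random model reduces to a uniform draw from one pooled answer list. The sketch below rests on the same assumptions as any illustration of these slides: the corpus layout and the names `POOLED_CORPUS`, `build_answer_pool` and `simulate_random_answer` are hypothetical and are not taken from the original system.

```python
import random

# Toy pooled corpus copied from the slide: (question, answer).
POOLED_CORPUS = [
    ("Question1", "Answer1_1"),
    ("Question1", "Answer1_2"),
    ("Question1", "Answer1_3"),
    ("Question1", "Answer1_4"),
    ("Question2", "Answer2_1"),
    ("Question2", "Answer2_2"),
]


def build_answer_pool(corpus):
    """Collect every observed answer into one pool, dropping the question."""
    return [answer for _question, answer in corpus]


def simulate_random_answer(pool, question, rng):
    """Return any pooled answer with equal probability; the question is ignored."""
    return rng.choice(pool)


pool = build_answer_pool(POOLED_CORPUS)
rng = random.Random(0)  # seeded only so the sketch is repeatable
print(simulate_random_answer(pool, "Question2", rng))
```

Because the question argument is never consulted, a reply to Question2 can just as easily be one of the Question1 answers, which is what makes this model a nonsense baseline.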

Experiments • Comparisons between real corpora • Comparisons between real & simulated corpora • Comparisons between simulated corpora

Real Corpora Comparisons (1) • Evaluation metrics • High-level dialog features • Dialog style and cooperativeness • Dialog Success Rate and Efficiency • Learning Gains

Real corpora comparisons (2) • High-level dialog features

Real corpora comparisons (3) • Dialogue style features

Real corpora comparisons (4) • Dialogue success rate

Real corpora comparisons (5) • Learning gains features

Results • Differences captured by these simple metrics are not enough to conclude whether a corpus is real or not (Schatzmann et al., 2005) • The differences could be due to different user populations

Real vs. Simulated Corpora Comparisons

Results (1) • Most of the measurements are able to distinguish between the Random and ProbCorrect models • The ProbCorrect model generates more realistic behaviors • We can’t draw conclusions about the power of these metrics, since the two simulated corpora are very different

Results (2) • Differences between the real corpora and the Random models are captured clearly, but differences between the real corpora and the ProbCorrect models are not clear • We did not expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small

Results (3) • s05 variety > f03 variety ⇒ 05ProbCorrect variety > 03ProbCorrect variety • However, we don’t get significantly more variety in the simulated corpora than in the real ones • Possibly because the computer tutor is simple (c/ic) • We’re using the same candidate answer set

Results (4) • ProbCorrect models trained on different real corpora are quite different • A ProbCorrect model is more similar to the real corpus it was trained on than to the other real corpus

Comparisons between simulated dialogues with different dialogue structure

Results • Larger differences between the two simulated corpora in prob7 than in prob34 • Dialogue structure of prob34 is more restricted • The power of these simple metrics is restricted by the dialogue structure

Conclusions • The simple measurements can distinguish between • Real corpora (different populations) • Simulated and real corpora (to different extents) • Simulated corpora (different models; trained on different corpora) • Their power is limited by the dialogue structure

Future work • Explore “deep” evaluation metrics • Test simulated corpus on policy • More simulation models • More human features (emotion, learning) • Special cases (quick learners, slow learners)
