Boredom Across Activities, and Across the Year, within Reasoning Mind William L. Miller, Ryan Baker, Mathew Labrum, Karen Petsche, Angela Z. Wagner
In recent years • Increasing interest in modeling more about students than just what they know • Can we assess a broad range of constructs • In a broad range of contexts?
Boredom • A particularly important construct to measure
Boredom is • Common in real-world learning (D’Mello, 2013) • Associated with worse learning outcomes in the short-term (Craig et al., 2004; Rodrigo et al., 2007) • Associated with worse course grades and standardized exam performance (Pekrun et al., 2010; Pardos et al., 2013) • Associated with lower probability of going to college, years later (San Pedro et al., 2013)
Online learning environments • Offer great opportunities to study boredom in context • Very fine-grained interaction logs that indicate everything the student did in the system
Automated boredom detection • Can we detect boredom in real time, while a student is learning? • Can we detect boredom retrospectively, from log files? • Would allow us to study affect at a large scale • Figure out which content is most boring, in order to improve it
Affect Detection: Physical Sensors? • Lots of work shows that affect can be detected using physical sensors • Tone of voice (Litman & Forbes-Riley, 2005) • EEG (Conati & McLaren, 2009) • Posture sensor and video (D’Mello et al., 2007) • It’s hypothesized – but not yet conclusively demonstrated – that using physical sensors may lead to better performance than interaction logs alone
Sensor-free affect detection • Easier to scale to the millions of students who use online learning environments • In settings that do not have cameras, microphones, and other physical sensors • Home settings • have parents bought equipment? • can they set it up and maintain it? • Classroom settings • can school maintain equipment? • do students intentionally destroy equipment? • parent concerns and political climate
Sensor-free boredom detection • Has been developed for multiple learning environments • Problem solving tutors (Baker et al. 2012; Pardos et al. 2013) • Dialogue tutors (D’Mello et al. 2008) • Narrative virtual learning environments (Sabourin et al. 2011; Baker et al. 2014) • Science simulations (Paquette et al., 2014) • The principles of affect detection are largely the same across environments • But the behaviors associated with boredom differ considerably between environments
This talk • We discuss our work to develop sensor-free boredom detection for Reasoning Mind Genie 2 (Khachatryan et al., 2014) • Self-paced blended learning mathematics curriculum for elementary school students • Youngest population for sensor-free affect detection so far • Used by approximately 100,000 students a year
Reasoning Mind Genie 2 • Combines • Guided Study with a pedagogical agent “Genie” • Speed Games that support development of fluency • Used in schools 3-5 days a week for 45-90 minutes per day
[Figure: screenshots of the Reasoning Mind Genie 2 interface, panels (a), (b), and (c)]
Reasoning Mind Genie 2 • Better affect and more on-task behavior than most pedagogies, online or offline (Ocumpaugh et al., 2013) • Still a substantial amount of boredom • Reducing boredom is a key goal
Role for affect detection • If we can detect boredom in log files • We can determine which content is more boring, and improve that content
Related Work • Evidence that specific design features are associated with boredom in Cognitive Tutors for high school algebra (Doddannara et al., 2013) • Evidence that some disengaged behaviors increase during the year (Beck, 2005) • Important to verify that differences in affect are due to actual content/design, not time of year
Approach to Boredom Detection • Collect “ground truth” data on student boredom, using field observations • Synchronize log data to field observations • Distill meaningful data features of log data, hypothesized to relate to boredom • Develop automated detector using classification algorithm • Validate detector for new students/new lessons/new populations
BROMP 2.0 Field Observations (Ocumpaugh et al., 2012) • Conducted through the Android app HART (Baker et al., 2012) • Protocol designed to reduce disruption to students • Some features of the protocol: observe with peripheral vision or side glances; hover over a student who is not being observed; 20-second "round-robin" observations of several students; bored-looking people are boring • Inter-rater reliability around 0.8 for behavior, 0.65 for affect • 64 coders now certified in the USA, the Philippines, and India
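As a concrete illustration of how reliability figures like these are computed, here is a minimal sketch of Cohen's kappa between two coders' simultaneous labels; the label sequences are toy data, not real BROMP observations:

```python
from sklearn.metrics import cohen_kappa_score

# Toy label sequences standing in for two coders' simultaneous affect codes;
# BROMP-style inter-rater reliability is reported as Cohen's kappa.
coder_a = ["BORED", "CONCENTRATING", "CONCENTRATING", "CONFUSED", "BORED"]
coder_b = ["BORED", "CONCENTRATING", "CONFUSED", "CONFUSED", "BORED"]

print(cohen_kappa_score(coder_a, coder_b))  # ~0.71 for this toy data
```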
Data collection • 408 elementary school students • Diverse sample important for model generalizability (Ocumpaugh et al., 2014) • 11 different classes • 6 schools • 2 urban in Texas, predominantly African-American • 1 urban in Texas, predominantly Latino • 1 suburban in Texas, predominantly White • 1 suburban in Texas, mixed ethnicity/race • 1 rural in West Virginia, predominantly White
Affect coding • 3 expert coders observed each student using BROMP • Coded 5 categories of affect • Engaged Concentration • Boredom • Confusion • Frustration • ? (unable to code) • 4,891 observations collected in RM classrooms
Building detectors • Observations were synchronized with the logs of the students' interactions with RM, using the HART app and an internet time server • For each observation, a set of 93 meaningful features describing the student's behavior was engineered • Features were computed on actions occurring during or preceding an observation (up to 20 seconds before)
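A minimal sketch of the synchronization step, assuming a hypothetical log layout (the student_id and timestamp column names are illustrative, not Reasoning Mind's actual schema):

```python
import pandas as pd

WINDOW = pd.Timedelta(seconds=20)

def actions_for_observation(logs: pd.DataFrame, obs: pd.Series) -> pd.DataFrame:
    """Log actions by the observed student occurring during or up to
    20 seconds before the field observation."""
    in_window = (
        (logs["student_id"] == obs["student_id"])
        & (logs["timestamp"] >= obs["timestamp"] - WINDOW)
        & (logs["timestamp"] <= obs["timestamp"])
    )
    return logs[in_window]
```

The 93 features would then be computed over each returned slice of actions.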
Features: Examples • Individual action features • Whether an action was correct or not • How long the action took • Features across all past activity • Fraction of previous attempts on the current skill the student has gotten correct • Other known models applied to logs • Probability student knows skill (Bayesian Knowledge Tracing) • Carelessness • Moment-by-Moment Learning Graph
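Hedged sketches of two of these feature types: a running "fraction of prior attempts correct on the current skill" feature, and a standard Bayesian Knowledge Tracing update (the BKT equations are the standard ones; the parameter values and column names are illustrative):

```python
import pandas as pd

def frac_prior_correct(history: pd.DataFrame, skill: str) -> float:
    """Fraction of the student's previous attempts on this skill that were correct."""
    prior = history[history["skill"] == skill]
    return float(prior["correct"].mean()) if len(prior) else 0.0

def bkt_update(p_know: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2, learn: float = 0.1) -> float:
    """One step of Bayesian Knowledge Tracing: condition P(known) on the
    observed response, then apply the learning transition."""
    if correct:
        cond = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        cond = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
    return cond + (1 - cond) * learn
```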
Automated detector of boredom • Detectors were built using RapidMiner 5.3 • For each algorithm, the best features were selected using forward selection/backward elimination • Data were re-sampled to have more equal class frequencies; models were evaluated on the original class distribution • Detectors were validated using 10-fold student-level cross-validation
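A sketch of this validation scheme using scikit-learn as a stand-in for RapidMiner (a decision tree here roughly approximates J48; X, y, and student_ids are assumed outputs of the feature step):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.tree import DecisionTreeClassifier  # rough stand-in for J48
from sklearn.metrics import roc_auc_score

def student_level_cv(X, y, student_ids, n_folds=10, seed=0):
    """10-fold CV where no student appears in both train and test folds.
    Training folds are oversampled toward class balance; test folds keep
    the original class distribution."""
    rng = np.random.default_rng(seed)
    aucs = []
    for train, test in GroupKFold(n_splits=n_folds).split(X, y, groups=student_ids):
        X_tr, y_tr = X[train], y[train]
        pos = np.flatnonzero(y_tr == 1)          # assumes both classes present
        n_extra = max(0, int((y_tr == 0).sum()) - len(pos))
        extra = rng.choice(pos, size=n_extra, replace=True)
        clf = DecisionTreeClassifier(min_samples_leaf=20)
        clf.fit(np.vstack([X_tr, X_tr[extra]]), np.concatenate([y_tr, y_tr[extra]]))
        aucs.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))
    return float(np.mean(aucs))
```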
Automated detector of boredom • Detectors were built using 4 machine learning algorithms that have been successful for building affect detectors in the past: • J48 • JRip • Step Regression • Naïve Bayes
Machine learning • Performance of the detectors was evaluated using • A’ • Given two observations, probability of correctly identifying which one is an example of a specific affective state and which one is not • A’ of 0.5 is chance level and 1 is perfect • Identical to Wilcoxon statistic • Very similar to AUC ROC (Area Under the Receiver-Operating Characteristic Curve)
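A' can be computed directly from this definition; a minimal sketch (ties between confidences count as half):

```python
import numpy as np

def a_prime(confidence: np.ndarray, bored: np.ndarray) -> float:
    """Probability that a randomly chosen bored observation receives a higher
    detector confidence than a randomly chosen non-bored one; equivalently
    the Wilcoxon statistic / AUC ROC."""
    pos = confidence[bored == 1]
    neg = confidence[bored == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# 0.5 = chance, 1.0 = perfect ranking
print(a_prime(np.array([0.9, 0.4, 0.7]), np.array([1, 0, 1])))  # 1.0
```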
Results • A’ = 0.64 • Compared to similar detectors in other systems, validated in similar stringent fashion
Using detectors • Model applied to entire year of data from these classrooms • 2,974,944 actions by 462 students • Includes 54 additional students not present during observations • Aggregation over pseudo-confidences rather than binary predictions • Retains more information
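A sketch of this aggregation, assuming a fitted detector with a predict_proba method and a hypothetical feature table (column names are illustrative):

```python
import pandas as pd

def mean_boredom_by_week(frame: pd.DataFrame, detector, feature_cols) -> pd.Series:
    """Apply the detector to every logged action and average its
    pseudo-confidences per week, rather than thresholding to 0/1 first."""
    frame = frame.copy()
    frame["p_bored"] = detector.predict_proba(frame[feature_cols].to_numpy())[:, 1]
    week = frame["timestamp"].dt.isocalendar().week
    return frame.groupby(week)["p_bored"].mean()
```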
Apparent downward trend • Is it statistically significant? • Yes. • Students are less bored later in the year • F-test controlling for student • p<0.001
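One way to run such a test, sketched with statsmodels on synthetic stand-in data (the real analysis used the detector's outputs; this df is an assumption for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic stand-in: per-action boredom pseudo-confidence, date, and student.
rng = np.random.default_rng(0)
n = 20000
df = pd.DataFrame({
    "student_id": rng.integers(0, 100, n),
    "day_of_year": rng.integers(1, 180, n).astype(float),
})
df["p_bored"] = 0.16 - 1e-4 * df["day_of_year"] + rng.normal(0, 0.05, n)

# Regress boredom on date with student fixed effects, then F-test the date term.
model = smf.ols("p_bored ~ day_of_year + C(student_id)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2).loc["day_of_year"])
```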
Is it practically significant? • No. • r = -0.06 • With large enough samples, anything is statistically significant
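A quick worked check of that claim, using the action count from the earlier slide as n: the t statistic for r = -0.06 is enormous even though the effect itself is negligible.

```python
import numpy as np
from scipy import stats

r, n = -0.06, 2_974_944
t = r * np.sqrt((n - 2) / (1 - r**2))   # t statistic for a Pearson correlation
p = 2 * stats.t.sf(abs(t), df=n - 2)    # two-sided p-value
print(t, p)  # t ~ -103.7; p underflows to 0.0
```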
Kind of a positive thing • At minimum, students aren’t getting more bored as the year goes on • In other systems, students get more disengaged as the year goes on (Beck, 2005) • And the overall level of boredom (~14%) is not very high
Beyond this • Curriculum is self-paced • Which means that predicting boredom by date may obscure real variation • Instead, look at boredom by learning objective
Predicting boredom by objective • p<0.001 • r=0.343
If we cluster objectives into two groups • "High boredom" • "Low boredom" • Ignoring the one point in between the two groups • Cohen's d = 0.67
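For reference, a minimal sketch of the pooled-standard-deviation Cohen's d used here (the arrays are placeholders for the per-objective boredom values in each cluster):

```python
import numpy as np

def cohens_d(high: np.ndarray, low: np.ndarray) -> float:
    """Standardized mean difference between the two clusters of objectives,
    using the pooled standard deviation."""
    pooled_var = (
        (len(high) - 1) * high.var(ddof=1) + (len(low) - 1) * low.var(ddof=1)
    ) / (len(high) + len(low) - 2)
    return (high.mean() - low.mean()) / np.sqrt(pooled_var)
```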