Some Useful Design Tactics for Mining ITS Data

• Jack Mostow
• Project LISTEN (www.cs.cmu.edu/~listen)
• Carnegie Mellon University
• Funding: National Science Foundation
• ITS 04 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceio, Brazil

Outline
• Project LISTEN’s Reading Tutor
• Modify tutor to get mineable data
• Map data stream to analyzable data set
• Mine data set to discover insights

Project LISTEN’s Reading Tutor (video)
• John Rubin (2002). The Sounds of Speech (Show 3). On Reading Rockets (Public Television series commissioned by U.S. Department of Education). Washington, DC: WETA.
• Available at www.cs.cmu.edu/~listen.

Thanks to fellow LISTENers
Tutoring:
• Dr. Joseph Beck, mining tutorial data
• Prof. Albert Corbett, cognitive tutors
• Prof. Rollanda O’Connor, reading
• Prof. Kathy Ayres, stories for children
• Joe Valeri, activities and interventions
• Becky Kennedy, linguist
Listening:
• Dr. Mosur Ravishankar, recognizer
• Dr. Evandro Gouvea, acoustic training
• John Helman, transcriber
Programmers:
• Andrew Cuneo, application
• Karen Wong, Teacher Tool
Field staff:
• Dr. Roy Taylor
• Kristin Bagwell
• Julie Sleasman
Grad students:
• Hao Cen, HCI
• Cecily Heiner, MCALL
• Peter Kant, Education
• Shanna Tellerman, ETC
Plus:
• Advisory board
• Research partners
  • DePaul
  • UBC
  • U. Toronto
• Schools

Project LISTEN’s Reading Tutor: A rich source of experimental data
• 2003-2004 database:
  • 9 schools
  • > 200 computers
  • > 50,000 sessions
  • > 1.5M tutor responses
  • > 10M words recognized
• Embedded experiments
• Randomized trials

Modify tutor to get mineable data
• Log operations at grain size and level of interest
  • Click <x, y> at time t: motor control
  • Click “Goldilocks”: item selection
• Reify operations to log them analyzably
  • Handwriting or speech → typed input
  • Freehand drawing → graphical palette (Geometry Tutor)
  • Free-form responses → menu selection (Self 88)
  • Natural language → sentence starters (Goodman 03)
• Time student and tutor actions
  • Time allocation reflects motivation (ITS 02)
  • Hasty responses indicate guessing (TICL 04)
  • Latency reflects automaticity (TICL 04)
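A minimal sketch of logging timed, reified events, assuming a simple in-memory log. The `Event` and `EventLog` names and their fields are hypothetical illustrations, not the Reading Tutor's actual logging code.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    actor: str    # "student" or "tutor"
    action: str   # reified operation, e.g. "click_word" rather than raw <x, y>
    target: str   # item the action refers to, e.g. the word clicked
    t: float      # timestamp in seconds


@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def log(self, actor, action, target, t=None):
        """Append one event, stamping it with the current time if none is given."""
        stamp = time.time() if t is None else t
        event = Event(actor, action, target, stamp)
        self.events.append(event)
        return event


def latency(log, i):
    """Seconds between event i and the event logged just before it."""
    return log.events[i].t - log.events[i - 1].t


log = EventLog()
log.log("tutor", "show_sentence", "People sit down and read a book.", t=0.0)
log.log("student", "click_word", "read", t=2.5)
```

Logging the word clicked rather than only the screen coordinates keeps the data at the level of item selection, and the timestamps make latency available for later analysis.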

Modify tutor: add relevant data
• Randomize tutorial decisions
  • What skill to test, what help to give
• Probe skills
  • Assess cognitive development (Arroyo 00)
  • Test vocabulary words (IJAIE 01)
  • Insert automated comprehension questions (TICL 04)
• Import student data
  • Gender, age, IQ (Shute 96)
  • Prior knowledge (Corbett 00)
  • Pretest scores (TICL 04)
• Hand-label when appropriate
  • Transcribe (some) spoken input (FLET 04)
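A minimal sketch of randomizing a tutorial decision and logging it so the choice can be analyzed later. The function name, the record layout, and the three-item list of help types are assumptions made for illustration.

```python
import random

# Illustrative subset of help types; the tutor's full set is larger.
HELP_TYPES = ["Say Word", "Rhymes With", "Sound Out"]


def choose_help(word, rng, log):
    """Pick a help type uniformly at random and record the decision."""
    help_type = rng.choice(HELP_TYPES)
    log.append({"word": word, "decision": help_type})
    return help_type


rng = random.Random(0)   # seeded only so the sketch is repeatable
trial_log = []
choice = choose_help("read", rng, trial_log)
```

Recording the decision at the moment it is made is what later allows each choice to be treated as one arm of an embedded experiment.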

Modify tutor: an example
• Randomize: explain some new words but not others.
• Probe: test each new word the next day.
• Did kids do better on explained vs. unexplained words?
  • Overall: NO; 38% vs. 36%, N = 3,171 trials (IJAIE 01).
  • Rare, 1-sense words tested 1-2 days later: YES! 44% >> 26%, N = 189.

Map data stream to data set: structure data into a single type
• Data stream: heterogeneous events over time
• Data set: elements with the same features
• Segment into shorter episodes
  • Tutorial action(s) + student response (Beck 00)
• Slice into narrower strands
  • Successive encounters of a specific word (AMLDP 98)
  • Successive instances of a specific skill (learning curves)
• Measure aggregated events
  • Allocation of time among activities (ITS 02)
• Formulate data as experimental trials
  • Context where the trial occurred
  • Decision made in this trial
  • Outcome based on subsequent events
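Slicing a stream into strands might look like the sketch below, which groups word encounters by student and word and orders each strand by time. The event fields (`t`, `student`, `word`, `ok`) are hypothetical.

```python
from collections import defaultdict


def slice_by_word(events):
    """Group events into time-ordered strands, one per (student, word) pair."""
    strands = defaultdict(list)
    for event in sorted(events, key=lambda e: e["t"]):
        strands[(event["student"], event["word"])].append(event)
    return dict(strands)


events = [
    {"t": 3, "student": "s1", "word": "read", "ok": True},
    {"t": 1, "student": "s1", "word": "read", "ok": False},
    {"t": 2, "student": "s1", "word": "book", "ok": True},
]
strands = slice_by_word(events)
```

Each strand is then a homogeneous sequence (successive encounters of one word by one student), which is the shape a learning-curve analysis needs.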

Map data stream to data set: formulate data as experimental trials
Data stream:
• Context:
  • Student is reading a story: ‘People sit down and …’
  • Student needs help on a word: student clicks ‘read.’
• Decision (randomized): tutor chooses what help to give.
• Student continues reading: ‘… read a book.’
• Time passes…
• Outcome: student sees word in a later sentence, ‘I love to read stories.’ Read fluently?

Map data stream to data set: trials
• Each trial: Context, Decision, Outcome
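One way to turn a stream into trials is sketched below: each help event becomes a trial whose outcome is taken from the same student's next encounter of the same word. The event layout, the `kind` values, and the `fluent` flag are assumptions for illustration.

```python
def build_trials(events):
    """Pair each help event with the student's next encounter of the same word."""
    ordered = sorted(events, key=lambda e: e["t"])
    trials = []
    for i, event in enumerate(ordered):
        if event["kind"] != "help":
            continue
        outcome = None  # stays None if the word is never encountered again
        for later in ordered[i + 1:]:
            if (later["kind"] == "encounter"
                    and later["student"] == event["student"]
                    and later["word"] == event["word"]):
                outcome = later["fluent"]
                break
        trials.append({
            "context": {"student": event["student"], "word": event["word"]},
            "decision": event["help_type"],
            "outcome": outcome,
        })
    return trials


stream = [
    {"t": 1, "kind": "help", "student": "s1", "word": "read",
     "help_type": "Rhymes With"},
    {"t": 9, "kind": "encounter", "student": "s1", "word": "read",
     "fluent": True},
]
trials = build_trials(stream)
```

A real analysis would likely also require that the later encounter fall in a later sentence or on a later day; that filter is omitted here.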

Mine data set to make discoveries
• Count outcome frequency
  • Success rate of each help type (ICALL 04)
• Fit a parametric model
  • Knowledge tracing (Corbett 95)
• Train a model
  • Statistics, e.g. regression (TICL 04)
  • Machine learning, e.g. decision trees (AIED 01)
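As an illustration of fitting a parametric model, the sketch below shows one step of the commonly cited knowledge tracing update: a Bayesian update of the probability that a skill is known, followed by a learning transition. The parameter values are arbitrary assumptions; the slides give none, and parameter fitting is not shown.

```python
def bkt_update(p_known, correct, guess, slip, learn):
    """One knowledge tracing step: condition on the response, then apply learning."""
    if correct:
        num = p_known * (1 - slip)
        den = num + (1 - p_known) * guess
    else:
        num = p_known * slip
        den = num + (1 - p_known) * (1 - guess)
    posterior = num / den                       # P(known | observed response)
    return posterior + (1 - posterior) * learn  # chance of learning at this step


# Illustrative parameters only.
after_correct = bkt_update(0.5, True, guess=0.2, slip=0.1, learn=0.1)
after_wrong = bkt_update(0.5, False, guess=0.2, slip=0.1, learn=0.1)
```

Fitting the model means choosing the guess, slip, learn, and initial-knowledge parameters that best predict the observed sequence of responses for each skill.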

Count outcome frequency: which help types worked best?
• Best: Rhymes With 69.2% ± 0.4%
• Worst: Recue 55.6% ± 0.4%
• Compare within level to control for word difficulty.
• Supplying the word helped best in the short term…
• But rhyming hints had longer-lasting benefits.
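Counting outcome frequency within level could be sketched as follows. The trial fields (`level`, `decision`, `outcome`) and the toy data are hypothetical; they are not the study's data.

```python
from collections import Counter


def success_rates(trials):
    """Success rate per (level, help type), so comparisons stay within a level."""
    tried, succeeded = Counter(), Counter()
    for trial in trials:
        key = (trial["level"], trial["decision"])
        tried[key] += 1
        succeeded[key] += int(trial["outcome"])
    return {key: succeeded[key] / tried[key] for key in tried}


toy_trials = [
    {"level": 1, "decision": "Rhymes With", "outcome": True},
    {"level": 1, "decision": "Rhymes With", "outcome": False},
    {"level": 1, "decision": "Recue", "outcome": False},
    {"level": 2, "decision": "Recue", "outcome": True},
]
rates = success_rates(toy_trials)
```

Keying on level as well as help type is one simple way to keep harder words from being compared against easier ones.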

Summary: modify, map, mine.
• Modify tutor to make data mineable.
  • Log, reify, time, hand-label, import, probe, randomize.
• Map data streams to data sets.
  • Segment, slice, measure.
• Mine data set to make discoveries.
  • Count, fit, train.
• See videos, papers, etc. at www.cs.cmu.edu/~listen.
• Thank you! Questions?

Modify tutor to get mineable data
• Word features

Structure of Reading Tutor database
Reading Tutor and student actions, with the database level that records them:
• Login; list readers → Session
• List stories; pick stories → Story Encounter
• Show one sentence at a time; read sentence → Sentence Encounter
• Listens and helps; read each word → Word Encounter

Map data stream to data set: formulate data as experimental trials
• Context where the trial occurred
• Decision made in this trial
• Outcome based on subsequent events

Learning curves for students’ help requests
• Selected data: 53 students, 175,961 words, 29,278 help requests
• Try to predict subset: Grade 1-2 level, 1-6 prior encounters
• Train predictive model: count help requests 5x
• Predict other kids’ data: 71% accuracy

Count outcome frequency (average success rate 66.1%)
Example: ‘People sit down and read a book.’
• Whole word:
  • Say In Context: 24,841
  • Say Word: 56,791
• Decomposition:
  • Syllabify: 6,280
  • Onset Rime: 14,223
  • Sound Out: 19,677
  • One Grapheme: 22,933
• Analogy:
  • Rhymes With: 13,165
  • Starts Like: 13,671
• Semantic:
  • Recue: 14,685
  • Show Picture: 2,285
  • Sound Effect: 488
Which types stood out?
• Best: Rhymes With 69.2% ± 0.4%
• Worst: Recue 55.6% ± 0.4%
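The ± 0.4% figures are consistent with a binomial standard error, as the sketch below checks. This assumes that the counts listed for Rhymes With (13,165) and Recue (14,685) are numbers of trials and that ± denotes one standard error; the slide states neither.

```python
import math


def binomial_se(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# Assumed inputs: success rates and counts as read from the slide.
se_rhymes = binomial_se(0.692, 13_165)
se_recue = binomial_se(0.556, 14_685)
```

Under those assumptions both values round to 0.4 percentage points. The independence assumption is itself optimistic, since many trials come from the same students and words.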
