Explore how data mining methods can be used to analyze chats in educational settings, providing adaptive feedback and visualizing learner actions. Learn about chat classifications, experimental methods, and results. Discover how terms and grammar patterns can be utilized as features for automated analysis.
Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes. Anjo Anjewierden, Bas Kollöffel and Casper Hulshof (Anjo Anjewierden, http://anjo.blogs.com). Department of Instructional Technology, Faculty of Behavioural Sciences, University of Twente, The Netherlands.
Overview (1) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Motivation • Chats can structure collaborative learning • Doing vs. doing and discussing with other learners • Current use of chats is limited to • Logging the messages for later analysis • Our goals related to chat analysis • Provide adaptive feedback based on on-line analysis of the chats • Make the learner part of the simulation by visualising her actions and behaviour (e.g. through avatars)
Approach • Define models by which messages can be classified • One model is based on term usage • Another model is based on the grammar • Later we want to combine the models to find "semantic patterns" • Applying the models to each message of a particular chat assigns a class to that message • Aggregating the class assignments over time yields a profile that an avatar can visualise (see the sketch below)
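As an illustration of the aggregation step only (not the actual implementation, which the slides do not show), one could compute the distribution of classes over a learner's recent messages and feed that profile to an avatar; the window size and function name below are assumptions.

```python
from collections import Counter

def class_profile(classified_messages, window=20):
    """Proportion of each chat class over the last `window` classified messages,
    e.g. as input for an avatar visualisation (window size is illustrative)."""
    recent = classified_messages[-window:]
    counts = Counter(cls for _, cls in recent)
    total = sum(counts.values()) or 1
    return {cls: n / total for cls, n in counts.items()}

# Example: a learner whose recent messages are mostly regulative
profile = class_profile([("ok", "regulative"), ("next", "regulative"),
                         ("the speed increases", "domain")])
print(profile)  # roughly {'regulative': 0.67, 'domain': 0.33}
```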
Learning environment • Both learners see the same simulation on two different screens • One learner can run the simulation • Learners use chat to discuss: • Simulations to run, variable settings, etc. • Interpretation of the results of simulations • Which answer to give to a question • etc.
Overview (2) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Classifications of chats • Which functions should we distinguish in chat messages? • We use a classification proposed by Gijlers and De Jong (2005): • Regulative: planning, monitoring, agreeing, etc. • Domain: transformative • Technical: about the learning environment • Social: greetings, compliments and other off-task messages
Examples • Regulative: • Ok // Yes // Next • I think the answer is 3 • Perhaps we should try again • Domain: • The momentum becomes negative • Speed of the red ball is 2 m/s • Technical: • Move the mouse to the right • Social: • Well done partner
Data used • Chats collected by Nadira Saab for her Ph.D. research (University of Amsterdam, 2005) • Domain: simulations related to collisions (e.g. momentum for elastic and inelastic collisions) • Language: Dutch • 78 chat sessions • 16,879 chat messages
Data normalisation • Messages are extremely noisy • Misspellings (accidental and on purpose) • Chat language (w8 = wait) • See paper for Dutch examples • Messages have been manually corrected to obtain words that can be found in the dictionary • Grammar has not been corrected
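In the study the correction was done by hand. Purely as an illustration, the chat-language expansion part of such normalisation could look like the sketch below; the lookup entries are illustrative Dutch examples (e.g. "w8" for "wacht", "wait"), not the actual correction rules.

```python
# Tiny illustrative lookup table of Dutch chat-language forms and normalised spellings
CHAT_FORMS = {
    "w8": "wacht",   # "wait"
    "ff": "even",    # "just a moment"
}

def normalise(message):
    """Replace known chat-language tokens by dictionary words."""
    return " ".join(CHAT_FORMS.get(word, word) for word in message.lower().split())

print(normalise("w8 ff"))  # "wacht even"
```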
Overview (3) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Types of features • For each class one can define • Characterising terms (domain: speed, increases) • Grammatical patterns: • the speed increases (<article> <noun> <verb>) • I think (<personal pronoun> <verb>) • Both terms and syntactic patterns are used by humans to classify the messages • Data mining • Discover the terms and patterns automatically
Words as features • Each word in a message is a feature • Order is not taken into account • Smileys, !, ?, integers are separate words • Example • The answer is 5!!!! :-) • Features: { answer, is, the, #, !, <smiley> } • (where # is any integer)
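A minimal sketch (assuming simple regular-expression tokenisation; the actual tokeniser is not described on the slide) of how such word features could be extracted, reproducing the example above:

```python
import re

def word_features(message):
    """Bag-of-words features: order ignored, integers mapped to '#',
    smileys mapped to '<smiley>', '!' and '?' kept as separate tokens."""
    tokens = re.findall(r":-?\)|[!?]|\d+|\w+", message.lower())
    features = set()
    for tok in tokens:
        if re.fullmatch(r"\d+", tok):
            features.add("#")
        elif re.fullmatch(r":-?\)", tok):
            features.add("<smiley>")
        else:
            features.add(tok)
    return features

print(word_features("The answer is 5!!!! :-)"))
# {'the', 'answer', 'is', '#', '!', '<smiley>'}
```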
Grammar as features • Each message is parsed by a part-of-speech (POS) tagger • Determines role words play in a message (noun, verb, etc.) • POS-sequences are a feature, if: • They occur at least 20 times, and • They do not fully overlap a longer sequence • Example: • the speed: {<article>, <noun>, <article> <noun>} • Remove full overlaps: {<article> <noun>}
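A sketch of how the frequent, non-overlapping POS-sequences could be collected as features. It assumes the messages have already been tagged (the Dutch POS tagging step itself is not shown), and the maximum sequence length is an assumption for efficiency, not something stated on the slide.

```python
from collections import Counter

def pos_ngrams(tags, max_len=4):
    """All contiguous POS subsequences of a tagged message (length limit is assumed)."""
    for i in range(len(tags)):
        for j in range(i + 1, min(i + max_len, len(tags)) + 1):
            yield tuple(tags[i:j])

def grammar_features(tagged_messages, min_count=20):
    """POS sequences occurring at least `min_count` times that are not
    fully contained in a longer frequent sequence."""
    counts = Counter()
    for tags in tagged_messages:
        counts.update(pos_ngrams(tags))
    frequent = {seq for seq, n in counts.items() if n >= min_count}

    def contained_in_longer(seq):
        return any(len(other) > len(seq) and
                   any(other[k:k + len(seq)] == seq
                       for k in range(len(other) - len(seq) + 1))
                   for other in frequent)

    return {seq for seq in frequent if not contained_in_longer(seq)}

# With the slide's fragment "the speed" tagged as (article, noun):
# ('article',) and ('noun',) are removed because they fully overlap
# the longer frequent sequence ('article', 'noun').
```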
Naive Bayes classifier • Standard Naive Bayes classifier is used • Once for the word features • Once for the grammar features • See paper for technical details
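The slides defer the implementation details to the paper; below is a minimal sketch using scikit-learn's MultinomialNB as a stand-in for the word-feature model (the grammar model would be trained the same way on POS-sequence strings). The training messages and labels are taken from the example slide and are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative training data from the example slide; a real model is trained
# on the hand-classified messages described in the experiment.
messages = ["the momentum becomes negative",
            "speed of the red ball is 2 m/s",
            "perhaps we should try again",
            "i think the answer is 3",
            "move the mouse to the right",
            "well done partner"]
labels = ["domain", "domain", "regulative", "regulative", "technical", "social"]

# Word-feature model; a second pipeline fitted on POS-sequence strings
# would play the role of the grammar model.
word_model = make_pipeline(CountVectorizer(), MultinomialNB())
word_model.fit(messages, labels)

print(word_model.predict(["the speed of the ball becomes negative"]))
```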
Overview (4) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Experiment • Four researchers each classified 400 messages • Randomly selected with a bias towards longer messages (nearly all short messages are regulative) • 1280 unique messages were classified • Expert manually checked whether the classifications were "correct" • Result was used to create two classification models (words, grammar) using Naive Bayes
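The slides describe the labelling setup but not the evaluation procedure. The sketch below shows one plausible way to score a word-feature model on the hand-classified messages with cross-validation; the function name, variable names and the 10-fold choice are assumptions, and no data is included here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def word_model_accuracy(texts, classes, folds=10):
    """Cross-validated accuracy of a word-feature Naive Bayes model.

    `texts` and `classes` would hold the ~1280 hand-classified messages
    and their expert-checked labels (not included here)."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    return cross_val_score(model, texts, classes, cv=folds).mean()
```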
Overview (5) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Overview (6) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Conclusions • Automatic classification of messages • Naive Bayes works surprisingly well • Even for a small feature set per item (chat) • And for a large number of features over all items • Sufficiently accurate for • The classes we used • Visualising aggregated learner behaviour through avatars • Misspellings are a source of concern
Future work • Combining manual and automatic classification • Started: see interaction classification tool • Can speed up chat coding in general (also for research) • Find "semantic patterns" in chats • Based on combining information from the word and grammar models • Relate these "semantic patterns" to learner actions in the simulation environment
Thank you! • And thanks to • Nadira Saab • Hannie Gijlers • Petra Hendrikse • Sylvia van Borkulo • Jan van der Meij • Wouter van Joolingen • and the anonymous reviewers