Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes. Anjo Anjewierden, Bas Kollöffel and Casper Hulshof. Anjo Anjewierden, http://anjo.blogs.com. Department of Instructional Technology

Presentation Transcript


  1. Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes • Anjo Anjewierden, Bas Kollöffel and Casper Hulshof • Anjo Anjewierden, http://anjo.blogs.com • Department of Instructional Technology, Faculty of Behavioural Sciences, University of Twente, The Netherlands

  2. Overview (1) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions

  3. Motivation • Chats can structure collaborative learning • Doing vs. doing and discussing with other learners • Current use of chats is limited to • Logging the messages for later analysis • Our goals related to chat analysis • Provide adaptive feedback based on on-line analysis of the chats • Make the learner part of the simulation by visualising her actions and behaviour (e.g. through avatars)

  4. Approach • Define models by which messages can be classified • One model is based on term usage • Another model is based on the grammar • Later we want to combine the models to find "semantic patterns" • Applying the models to each message of a particular chat assigns that message a class • Aggregation of class assignments over time is what an avatar can visualise
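
The slide does not say how class assignments are aggregated over time. One possible reading, sketched below, is a rolling share of each class over the most recent messages. The function name `rolling_class_shares` and the window size are assumptions made for illustration only.

```python
from collections import Counter, deque

def rolling_class_shares(labels, window=10):
    """Yield, after each message, the share of every class among the
    last `window` class assignments (window size is an assumption)."""
    recent = deque(maxlen=window)  # oldest assignment drops out automatically
    for label in labels:
        recent.append(label)
        counts = Counter(recent)
        yield {cls: n / len(recent) for cls, n in counts.items()}

# Class labels as they might come out of a classifier, one per message.
assigned = ["social", "regulative", "domain", "domain", "regulative", "domain"]
for shares in rolling_class_shares(assigned, window=3):
    print(shares)
```

An avatar could then be driven by the most recent dictionary, for example by reflecting the dominant class in the current window.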

  5. Inquiry learning

  6. Learning environment • Both learners see the same simulation on two different screens • One learner can run the simulation • Learners use chat to discuss: • Simulations to run, variable settings, etc. • Interpretation of the results of simulations • Which answer to give to a question • etc.

  7. Overview (2) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions

  8. Classifications of chats • Which functions should we distinguish in chat messages? • We use a classification proposed by Gijlers and De Jong (2005): • Regulative: planning, monitoring, agreeing, etc. • Domain: transformative • Technical: about the learning environment • Social: greetings, compliments and other off-task

  9. Examples • Regulative: • Ok // Yes // Next • I think the answer is 3 • Perhaps we should try again • Domain: • The momentum becomes negative • Speed of the red ball is 2 m/s • Technical: • Move the mouse to the right • Social: • Well done partner

  10. Data used • Chats collected by Nadira Saab for her Ph.D. research (University of Amsterdam, 2005) • Domain: simulations related to collisions (e.g. momentum for elastic and inelastic collisions) • Language: Dutch • 78 chat sessions • 16879 chat messages

  11. Data normalisation • Messages are extremely noisy • Misspellings (accidental and on purpose) • Chat language (w8 = wait) • See paper for Dutch examples • Messages have been manually corrected to obtain words that can be found in the dictionary • Grammar has not been corrected

  12. Overview (3) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions

  13. Types of features • For each class one can define • Characterising terms (domain: speed, increases) • Grammatical patterns: • the speed increases (<article> <noun> <verb>) • I think (<personal pronoun> <verb>) • Both terms and syntactic patterns are used by humans to classify the messages • Data mining • Discover the terms and patterns automatically

  14. Words as features • Each word in a message is a feature • Order is not taken into account • Smileys, !, ?, integers are separate words • Example • The answer is 5!!!! :-) • Features: { answer, is, the, #, !, <smiley> } • (where # is any integer)
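
The word-feature extraction on this slide can be sketched as follows. The regular expressions are an assumption (the slides do not describe the actual tokenizer), and the set of recognised smileys is deliberately small; the point is only to reproduce the example message.

```python
import re

# Assumed token pattern: a smiley, an integer, a single '!' or '?', or a word.
SMILEY = re.compile(r"[:;]-?[()DPp]")
TOKEN = re.compile(r"[:;]-?[()DPp]|\d+|[!?]|[^\W\d_]+")

def word_features(message: str) -> set[str]:
    """Return the unordered set of word features of one chat message."""
    features = set()
    for tok in TOKEN.findall(message):
        if SMILEY.fullmatch(tok):
            features.add("<smiley>")   # all smileys collapse to one feature
        elif tok.isdigit():
            features.add("#")          # '#' stands for any integer
        else:
            features.add(tok.lower())  # words, '!' and '?'
    return features

print(sorted(word_features("The answer is 5!!!! :-)")))
```

Because the result is a set, word order and repetition (the four exclamation marks) are discarded, as the slide describes.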

  15. Grammar as features • Each message is parsed by a part-of-speech (POS) tagger • Determines role words play in a message (noun, verb, etc.) • POS-sequences are a feature, if: • They occur at least 20 times, and • They do not fully overlap a longer sequence • Example: • the speed: {<article>, <noun>, <article> <noun>} • Remove full overlaps: {<article> <noun>}
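
A minimal sketch of the POS-sequence selection, assuming messages have already been tagged (no tagger is called here). The maximum sequence length `max_len` is an assumption that the slides do not mention, and "full overlap" is interpreted as a sequence whose span lies entirely inside a longer retained sequence in the same message.

```python
from collections import Counter

def ngrams(tags, max_len=4):
    """Yield (start, end, sequence) for each contiguous run of up to max_len tags."""
    for i in range(len(tags)):
        for j in range(i + 1, min(i + max_len, len(tags)) + 1):
            yield i, j, tuple(tags[i:j])

def frequent_sequences(tagged_messages, min_count=20, max_len=4):
    """POS sequences that occur at least min_count times in the corpus."""
    counts = Counter(seq for tags in tagged_messages
                     for _, _, seq in ngrams(tags, max_len))
    return {seq for seq, n in counts.items() if n >= min_count}

def grammar_features(tags, frequent, max_len=4):
    """Frequent sequences in one message, minus those fully covered by a longer one."""
    spans = [(i, j, seq) for i, j, seq in ngrams(tags, max_len) if seq in frequent]
    keep = set()
    for i, j, seq in spans:
        covered = any(a <= i and j <= b and (b - a) > (j - i) for a, b, _ in spans)
        if not covered:
            keep.add(seq)
    return keep

# Toy corpus: "the speed" tagged 20 times, so every sequence reaches the threshold.
corpus = [["article", "noun"]] * 20
frequent = frequent_sequences(corpus)
print(grammar_features(["article", "noun"], frequent))
```

For the slide's example this leaves only the `<article> <noun>` sequence, since both single tags are fully covered by it.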

  16. Naive Bayes classifier • Standard Naive Bayes classifier is used • Once for the word features • Once for the grammar features • See paper for technical details
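
The slide defers the technical details to the paper, so the following is only a textbook formulation: a Naive Bayes classifier over feature sets with add-one (Laplace) smoothing. The class name `NaiveBayes`, the smoothing choice, and the English toy messages (adapted from the examples slide) are assumptions for illustration, not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, feature_sets, labels):
        self.class_counts = Counter(labels)          # messages per class
        self.feature_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()
        for feats, label in zip(feature_sets, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)              # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in feats:
                if f in self.vocab:                  # unseen features are ignored
                    score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

train = [
    ({"ok"}, "regulative"), ({"yes"}, "regulative"), ({"next"}, "regulative"),
    ({"the", "momentum", "becomes", "negative"}, "domain"),
    ({"speed", "of", "the", "red", "ball", "is", "#", "m", "s"}, "domain"),
    ({"move", "the", "mouse", "to", "right"}, "technical"),
    ({"well", "done", "partner"}, "social"),
]
model = NaiveBayes().fit([f for f, _ in train], [c for _, c in train])
print(model.predict({"the", "speed", "increases"}))
```

The same class can be trained once on word features and once on grammar features, matching the two models mentioned on the slide.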

  17. Overview (4) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions

  18. Experiment • Four researchers each classified 400 messages • Randomly selected with a bias towards longer messages (nearly all short messages are regulative) • 1280 unique messages were classified • Expert manually checked whether the classifications were "correct" • Result was used to create two classification models (words, grammar) using Naive Bayes

  19. Overview (5) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions

  20. Results by demonstration

  21. Overview (6) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions

  22. Conclusions • Automatic classification of messages • Naive Bayes works surprisingly well • Even for a small feature set per item (chat) • And for a large number of features over all items • Sufficiently accurate for • The classes we used • Visualising aggregated learner behaviour through avatars • Misspellings are a source of concern

  23. Future work • Combining manual and automatic classification • Started: see interaction classification tool • Can speed up chat coding in general (also for research) • Find "semantic patterns" in chats • Based on combining information from the word and grammar models • Relate these "semantic patterns" to learner actions in the simulation environment

  24. Thank you! • And thanks to • Nadira Saab • Hannie Gijlers • Petra Hendrikse • Sylvia van Borkulo • Jan van der Meij • Wouter van Joolingen • and the anonymous reviewers
