Towards Identifying Unresolved Discussions in Student Online Forums Jihie Kim, Jia Li, and Taehwan Kim Information Sciences Institute/ University of Southern California http://ai.isi.edu/discourse jihie@isi.edu
“Talk to as many other people as possible. CS is learned by talking to others, not by reading, or so it seems to me now.” -- Advice from an undergraduate computer science student http://www-scf.usc.edu/~csci402/
Discussion Board and Corpora
• Extensible open-source discussion board (phpBB) serves as a platform for bridging ISI research and USC teaching practice
• 15 semesters running…
• CS and Engineering courses
• Undergrad/Graduate
• USC/Non-USC
• Almost 800 students
• Over 8000 messages
Student Messages in an Undergraduate Operating Systems Course
• Text is incoherent and ungrammatical.
• Problem description: Non-factoid questions are difficult to identify, dependent on context, and may include multiple sentences or paragraphs.
• Answers require explanations.
Thread Length Distribution
[Figure: two plots of # of threads against # of messages, one with data from an undergraduate CS course and one with data from a graduate CS course]
• Threads are often very short, many consisting of only 1-2 messages
• Students jump into programming details without understanding the larger picture or related concepts
• TAs and instructors are not always available to fully guide interactions
• Need for discussion assessment and scaffolding
PedDiscourse Research
• Discussion Assessment
  • Which discussions need instructor attention?
  • Who is asking and answering questions?
  • What topics are discussed when?
• Discussion Scaffolding
  • Promote reflection
  • Promote collaboration among students
Modeling discussion threads
• Individual messages: topic, quantity
• Relations among messages: responses/replies, roles that a message plays
• Discussion threads: thread lengths and quantity, discussion topic, discussion focus, …
• Related course data: notes, web pages, readings, assignments and projects
Discussion Assessment
Which discussions need instructor attention?
• Identify roles that individual messages play (ques, ans, ack, etc.)
• Analyze patterns of message roles
• Find discussion threads without an answer for the initial question
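The last step above can be illustrated with a minimal sketch. It assumes each message already carries a set of speech-act labels; the `Message` class, the label strings `"QUES"` and `"ANS-SUG"`, and the function name are hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """One forum post with its (assumed, already assigned) speech-act labels."""
    poster: str
    speech_acts: set = field(default_factory=set)


def initial_question_unanswered(thread):
    """Return True if the first message asks a question and no later
    message in the thread carries an answer/suggestion label."""
    if not thread or "QUES" not in thread[0].speech_acts:
        return False
    return not any("ANS-SUG" in m.speech_acts for m in thread[1:])


# Usage: a one-message thread with a question is flagged; an answered one is not.
lonely = [Message("S1", {"QUES"})]
answered = [Message("S1", {"QUES"}), Message("S2", {"ANS-SUG"})]
flags = [initial_question_unanswered(lonely), initial_question_unanswered(answered)]
```

This simple rule ignores follow-up questions and unresolved issues later in the thread, which is why richer features are needed for longer threads.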
Roles of individual messages
• Use Searle’s theory of Speech Acts (Searle, 1969) to model threaded discussions
• Speech Acts
  • Choose SAs to use: Question (QUES), Answer or Suggestion (ANS-SUG), Correction or Objection (Neg-Ack), …
  • Provide the relationship between a pair of messages
  • Multiple SAs per pair of messages in a thread
  • A single message can be related (via SAs) to multiple messages
Speech Acts (SAs) in a discussion thread
S1 [QUES]: The Professor gave us 2 methods for forking threads from the main program. One was ....... The other was to ......... When you fork a thread where does it get created and take its 8 pages from? Do you have to calculate ......? If so how? Where does it store its PCReg .......? Any suggestions would be helpfule.
S2 [ANS-SUG]: read the student documentation for the Fork syscall
S1 [ISSUE, QUES]: I am still confused. I understand it is in the same address space as the parent process, where do we allocate the 8 pages of mem for it? And how do we keep track of .....? … I am sure it is a simple concept that I am just missing.
S3 [ANS-SUG]: If you use the first implementation...., then you'll have a hard limit on the number of threads....If you use the second implementation, you need to.... Either way, you'll need to implement the AddrSpace::NewStack() function and make sure that there is memory available.
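Speech Act categories explored
[Table residue: coding schemes Code 1, Code 2, and Code 3. Code 1 Kappa: 0.54. Kappa values of 0.70 and 0.58 are reported for the other two codes; the slide text does not preserve which value belongs to which code.]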
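For readers unfamiliar with the Kappa statistic, the following is a minimal sketch of how agreement between two annotators can be computed. It assumes Cohen's kappa for two annotators labeling the same items; the slides say only "Kappa", so the exact variant used by the authors is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Usage with made-up annotations of four messages.
annotator_1 = ["QUES", "ANS-SUG", "QUES", "ANS-SUG"]
annotator_2 = ["QUES", "ANS-SUG", "ANS-SUG", "ANS-SUG"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

The annotations in the usage lines are invented for illustration and are unrelated to the corpus values reported on the slide.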
Data cleaning and pre-processing
• Discussion data
  • Noisy, incoherent
  • High variation: messages may contain answers or suggestions in the form of questions
  • Informal dialect used by students
• Data pre-processing
  • Tokenization, stemming, and other filtering steps (e.g. removing programming code embedded in messages, normalizing pluralized words, etc.)
• Data categorization
  • Replace commonly occurring words and word sequences with categories
  • Apostrophe words (‘re, ‘ve, ‘m, …)
  • Technical terms within messages are replaced by TECH_TERM (drawn from technical terms commonly used in the course)
  • Don’t replace pronouns (“you can” in ANS vs. “I can”)
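The steps above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the term list, the category names for apostrophe words, the code-line heuristic, and the crude plural stripping (which stands in for real stemming) are hypothetical, not the authors' pipeline.

```python
import re

# Hypothetical course vocabulary; the real list came from course materials.
TECH_TERMS = {"fork", "thread", "syscall"}
# Hypothetical category names for apostrophe words.
APOSTROPHE_CATEGORIES = {"'re": "APOS_RE", "'ve": "APOS_VE", "'m": "APOS_M"}

TOKEN_RE = re.compile(r"[a-z_]+|'[a-z]+|[?!.]")
# Assumed heuristic: a line ending in ';', '{' or '}' is program code.
CODE_LINE_RE = re.compile(r"[;{}]\s*$")


def singular(token):
    """Crude plural stripping, a stand-in for a real stemmer."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def preprocess(message):
    """Drop code-like lines, tokenize, and map tokens to categories.
    Pronouns such as "you" and "i" are deliberately left untouched."""
    lines = [ln for ln in message.splitlines() if not CODE_LINE_RE.search(ln)]
    text = " ".join(lines).lower().replace("\u2019", "'")
    result = []
    for token in TOKEN_RE.findall(text):
        if token in APOSTROPHE_CATEGORIES:
            result.append(APOSTROPHE_CATEGORIES[token])
        elif singular(token) in TECH_TERMS:
            result.append("TECH_TERM")
        else:
            result.append(token)
    return result


tokens = preprocess("int pid = fork();\nI'm stuck. You can fork threads?")
```

Keeping pronouns intact preserves the contrast between "you can" (typical of answers) and "I can", as the slide notes.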
Features for SA Classification
• F1: Cue phrases and their positions (e.g. “Thank” position)
• F2: Message Position
• F3: Previous Message Information
• F4: Poster Class
• F5: Poster Change
• F6: Message Length
Example TBL rules [table not preserved in the slide text]
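A minimal sketch of how features F1 to F6 might be extracted for one message is shown below. The slides name the features but not their encodings, so the cue list, the start/middle/end position buckets, the "previous message contains a question mark" signal, and the 10-token length threshold are all assumptions made for illustration.

```python
# Hypothetical cue phrases; the real cue list is not given in the slides.
CUE_PHRASES = ("thank", "how", "you can", "?")


def message_features(thread, index):
    """Build a feature dict for thread[index].

    Each message is a dict with keys "poster", "poster_class", and "text".
    """
    msg = thread[index]
    prev = thread[index - 1] if index > 0 else None
    tokens = msg["text"].lower().split()
    text = " ".join(tokens)
    features = {}
    # F1: cue phrases and the third of the message in which they first occur.
    for cue in CUE_PHRASES:
        pos = text.find(cue)
        if pos == -1:
            features[f"cue={cue}"] = "absent"
        else:
            features[f"cue={cue}"] = ("start", "middle", "end")[3 * pos // len(text)]
    # F2: message position in the thread.
    if index == 0:
        features["position"] = "first"
    elif index == len(thread) - 1:
        features["position"] = "last"
    else:
        features["position"] = "middle"
    # F3: previous message information (assumed: did it contain a question mark?).
    features["prev_has_question"] = prev is not None and "?" in prev["text"]
    # F4: poster class, e.g. student or instructor.
    features["poster_class"] = msg["poster_class"]
    # F5: poster change relative to the previous message.
    features["poster_change"] = prev is not None and prev["poster"] != msg["poster"]
    # F6: message length (assumed threshold of 10 tokens).
    features["length"] = "short" if len(tokens) < 10 else "long"
    return features


example_thread = [
    {"poster": "S1", "poster_class": "student", "text": "How do we allocate the pages?"},
    {"poster": "S2", "poster_class": "student", "text": "Thank you, that helps"},
]
first = message_features(example_thread, 0)
second = message_features(example_thread, 1)
```

Feature dictionaries of this kind can be fed to a rule learner or any other classifier; the choice of learner is independent of the extraction step.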
Profiling discussion threads with SAs
(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)
Thread classification with SA classifiers
(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)
• Feature Set1: Whether there was an [SA] in the thread
• Feature Set2: Whether the last message in the thread included [SA]
(a) SVM classification results with human-annotated SAs
(b) SVM classification results with system-generated SAs
[Result tables not preserved in the slide text]
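The two feature sets can be sketched as binary vectors over a fixed list of speech acts. The label inventory below is only the subset that appears in these slides, and the function name and vector layout are hypothetical; the resulting vectors would then be passed to an SVM or another classifier, which is not shown here.

```python
# Subset of SA labels that appear in these slides; the full inventory is assumed.
SPEECH_ACTS = ("QUES", "ANS-SUG", "ISSUE", "Neg-Ack")


def thread_features(thread_sas):
    """Turn a thread into a binary feature vector.

    thread_sas is a list of sets of SA labels, one set per message,
    in posting order. The first len(SPEECH_ACTS) entries are Feature
    Set1 (SA occurs anywhere in the thread); the remaining entries are
    Feature Set2 (SA occurs in the last message).
    """
    anywhere = set().union(*thread_sas)
    last = thread_sas[-1] if thread_sas else set()
    set1 = [int(sa in anywhere) for sa in SPEECH_ACTS]
    set2 = [int(sa in last) for sa in SPEECH_ACTS]
    return set1 + set2


# Usage: SA labels of a four-message thread (question, answer,
# issue plus follow-up question, answer).
vector = thread_features([{"QUES"}, {"ANS-SUG"}, {"ISSUE", "QUES"}, {"ANS-SUG"}])
```

A thread that ends with an ANS-SUG message and one that ends with an ISSUE message differ only in the Set2 half of the vector, which is the signal the last-message features are meant to capture.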
Direct thread classification without SA classifiers
(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)
• F1’: cue phrases and their positions (last message or not) in the thread
(a) With SAs
(b) Direct classification
[Result tables not preserved in the slide text]
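The F1’ features can be sketched as follows: each cue phrase yields two binary features, one for the last message and one for any earlier message. The cue list and the function name are hypothetical, and plain substring matching is an assumed simplification.

```python
# Hypothetical cue phrases; the real list is not given in the slides.
CUES = ("thank", "still confused", "?")


def direct_thread_features(messages):
    """Map a thread (list of message texts in posting order) to
    {(cue, "last" | "earlier"): bool} without any SA labels."""
    features = {}
    for cue in CUES:
        features[(cue, "last")] = False
        features[(cue, "earlier")] = False
    for i, text in enumerate(messages):
        where = "last" if i == len(messages) - 1 else "earlier"
        lowered = text.lower()
        for cue in CUES:
            if cue in lowered:
                features[(cue, where)] = True
    return features


feats = direct_thread_features([
    "Where does it store its PCReg?",
    "Read the student documentation",
    "I am still confused. Where do we allocate the pages?",
])
```

In this example the thread ends with a question and an expression of confusion, so the "last" features for both cues are set, which is the kind of pattern that suggests an unresolved discussion.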
Summary and Discussion
Identifying unresolved discussions
• Discerning speech acts (SAs) in student online discussions
• Classify discussion threads with SAs as features
• Compare SA-based classification and direct thread classification with phrase features
• SA-based features may help some difficult cases
  • E.g. longer threads with more than one question raised
Related Work
• Pedagogical/tutorial dialogue: instructional discourse modeling (Yuan et al., 2008; Graesser et al., 2005; McLaren et al., 2007; Boyer et al., 2008; Fossati, 2008; Litman et al., 2003)
• Dialogue modeling in email messages or blogs (e.g. AAAI 2008 workshop on Enhanced Messaging)
  • Email speech acts
  • Requests and commitments
• Handling noisy data and high variance in text (Knoblock et al., 2007)
• Course topic and task modeling using information extraction techniques (Roy et al., 2008; Jovanovic et al., 2006)
• Tracing student e-learning activities (Israel and Aiken, 2007; Dringus and Ellis, 2005)
Ongoing Work: Discussion Assessment
• Discussion thread pattern and phase analysis
  • question, understanding, solving and closing
• Discussion topic analysis
  • Coherency of discussion topics
• Student profiling
  • Information providers (peer mentors) vs. information seekers
  • Information flow and influence network among participants
• Use of workflows (distributed systems) for large-scale assessment
  • E.g. participation changes over several semesters
Supported by the National Science Foundation (NSF)
More details available at http://ai.isi.edu/discourse
Email: jihie@isi.edu