
Cohesion and Learning in a Tutorial Spoken Dialog System

Presentation Transcript


  1. Cohesion and Learning in a Tutorial Spoken Dialog System Art Ward Diane Litman

  2. Outline • Tutoring • Goals • 4 issues in measuring cohesion • Why they’re interesting • How we test them • Results

  3. Natural Language Dialog Tutoring • Human tutors are better than classroom instruction (Bloom 84) • Intelligent Tutoring Systems (ITSs) hope to replicate this advantage • Is Dialog important to learning? • Dialog acts: question answering, explanatory reasoning, deep student answers (Graesser et al. 95, Forbes-Riley et al. 05) • Difficult to automatically tag dialog input, so: • Automatically detectable dialog features • Average turn length, etc. (Litman et al. 04) • We look at Cohesion • Lexical Co-occurrence between turns

  4. Goals and Results • Goals • Find out whether cohesion is correlated with learning in our tutoring dialogs • If it is, this may inform ITS design • Find a computationally tractable measure of cohesion • So it can be used in a real-time tutor • Results • We do find strong correlations with learning • For low pre-testers • For interactive (tutor-to-student) measures of cohesion • Robust to multiple measures of lexical cohesion

  5. 4 Issues • Why and how do we identify cohesion in dialogs? • Do students of different skill levels respond to cohesion in the same way? (Is there an aptitude/treatment interaction?) • Is interactivity important? • What other processing steps help?

  6. Issue 1: How do we identify cohesion in dialogs? • Why might cohesion be important in tutoring? • McNamara & Kintsch (96) • Students read high- and low-coherence text • The high coherence text was the low coherence version altered to: • Use consistent referring expressions • Identify anaphora • Supply background information • Interaction between pre-test score & response to textual coherence • Low pre-testers learned more from the more coherent text • High pre-testers learned LESS from the more coherent text

  7. Measuring Cohesion • Measurements from Computational Linguistics • Hearst (94): topic segmentation, text • Word-count similarity of spans of text • Olney & Cai (05): topic segmentation, tutorial dialog • Several measures, including Hearst’s • Morris & Hirst (91): Lexical Chains • Thesaurus entries • Barzilay & Elhadad (97): Automatic Lexical Chains • WordNet senses • We develop measures similar to Hearst’s • But novel in that they are: • Applied to dialog rather than text, and used to find correlations with learning

  8. Issue 1: How do we identify cohesion in dialogs? Defining Cohesion • Halliday and Hasan (76) • Grammatical vs Lexical Cohesion • Lexical Cohesion • Reiteration • Exact word repetition • Synonym repetition • Near synonym repetition • Super-ordinate class • General referring noun • Cohesion measured by counting “cohesive ties” • Two words joined by a cohesive device (e.g., reiteration)

  10. Issue 1: How do we identify cohesion in dialogs? • How we measure Lexical Cohesion • We count cohesive ties between turns • Tokens (with stop words) • (token = “word”) • Tokens (stop words removed) • (stop words = high-frequency, low-information words) • Stems (stop words removed)
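For illustration, a minimal sketch of this tie-counting scheme in Python, assuming a simple regex tokenizer and a toy stop list (neither is the actual ITSPOKE implementation):

```python
import re

# Small illustrative stop list; the actual stop list used in the study is not given here.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "are", "it", "and", "in", "on", "that"}

def tokenize(turn):
    """Lowercase a turn and split it into word tokens."""
    return re.findall(r"[a-z']+", turn.lower())

def ties_between(turn_a, turn_b, remove_stops=False):
    """Count cohesive ties as word types shared by two adjacent turns."""
    a, b = set(tokenize(turn_a)), set(tokenize(turn_b))
    if remove_stops:
        a, b = a - STOP_WORDS, b - STOP_WORDS
    return len(a & b)

# Invented two-turn example in the spirit of a physics tutoring dialog.
tutor = "What forces act on the keys after they leave your hand?"
student = "Gravity acts on the keys, pulling them down."
print(ties_between(tutor, student))                     # tokens, stop words kept    -> 3
print(ties_between(tutor, student, remove_stops=True))  # tokens, stop words removed -> 1
```

Note that "act" and "acts" do not match at the token level, which is exactly what the stem-level measure on the next slide addresses.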

  11. Stems • Stem = non-inflected core of a word • Porter Stemmer • Allows us to find ties between various inflected forms of the same word in adjacent turns. • “Turns” are tutor and student contributions to Tutoring Dialogs collected by the ITSPOKE group.
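The stemming step can be approximated with NLTK's Porter stemmer (any equivalent stemmer would do); a minimal sketch showing how inflected forms collapse to one stem:

```python
from nltk.stem.porter import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()

# Inflected forms of the same word reduce to a common stem,
# so repetitions across adjacent turns can be counted as ties.
for word in ["act", "acts", "acting", "pulled", "pulling"]:
    print(word, "->", stemmer.stem(word))
# act -> act, acts -> act, acting -> act, pulled -> pull, pulling -> pull
```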

  12. Applying Cohesion measures to our Corpora: example

  16. Issue 2: Is there an aptitude/treatment interaction? • Why there might be: • McNamara & Kintsch • How we test it: • Mean pre-test split • All students • Above-mean pretest students (“high” pre-testers) • Below-mean pretest students (“low” pre-testers)
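A mean split is simple to reproduce; the sketch below uses made-up scores and variable names, not the study's data:

```python
pretest = {"s01": 0.35, "s02": 0.62, "s03": 0.48, "s04": 0.71}  # illustrative pre-test scores

mean_score = sum(pretest.values()) / len(pretest)
# Students exactly at the mean are grouped with the high pre-testers here (an arbitrary choice).
low_pretesters  = [s for s, score in pretest.items() if score < mean_score]
high_pretesters = [s for s, score in pretest.items() if score >= mean_score]

print(f"mean = {mean_score:.2f}")
print("low:", low_pretesters, "high:", high_pretesters)
```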

  17. Issue 3: Is Interactivity Important? • Why it might be: • Chi et al. (01) • Tutor centered, Student centered, Interactive • Deep learning through self-construction • Not tutor actions alone • Litman & Forbes-Riley (05) • Learning correlated with both: • student utterances that display reasoning • tutor questions that require reasoning • How we test it: • Interactive corpus: compare tutor to student turns • Tutor-only corpus • Student-only corpus
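One way to picture the three conditions is to filter adjacent turn pairs by speaker before counting ties; this is a sketch of that idea, not necessarily how the corpora were actually constructed:

```python
def turn_pairs(dialog, condition):
    """Yield adjacent turn-text pairs for one of three conditions.

    dialog: list of (speaker, text) tuples in order, speaker in {"tutor", "student"}.
    condition: "interactive" (adjacent tutor/student pairs),
               "tutor" (consecutive tutor turns only),
               "student" (consecutive student turns only).
    """
    if condition == "interactive":
        turns = dialog
    else:
        turns = [(s, t) for s, t in dialog if s == condition]
    for (s1, t1), (s2, t2) in zip(turns, turns[1:]):
        # In the interactive view, only count pairs where the speaker changes.
        if condition != "interactive" or s1 != s2:
            yield t1, t2

dialog = [("tutor", "What forces act on the keys?"),
          ("student", "Gravity acts on the keys."),
          ("tutor", "Right, gravity is the only force.")]
print(list(turn_pairs(dialog, "tutor")))  # the two tutor turns form one pair
```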

  18. Issue 4: What other processing steps help? • Tried several on the training corpus: • Removing stop words • N-turn spans • Selecting “substantive” turns • TF-IDF normalization • Turn-normalized counts • (raw tie count / # of turns in dialog) • Final options chosen on the training corpus: • One-turn spans, turn normalization, no TF-IDF, no substantive turn selection • All reported results use these options • Tested these options on the new corpus
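Putting the chosen options together, the per-dialog measure reduces to summing ties over adjacent one-turn spans and dividing by the number of turns. A minimal sketch (simplified tie counter, no stop-word removal or stemming):

```python
import re

def count_ties(turn_a, turn_b):
    """Cohesive ties as shared lowercase word types (simplified)."""
    words = lambda t: set(re.findall(r"[a-z']+", t.lower()))
    return len(words(turn_a) & words(turn_b))

def turn_normalized_cohesion(turns):
    """Sum ties over adjacent one-turn spans and divide by the number of turns."""
    raw_ties = sum(count_ties(a, b) for a, b in zip(turns, turns[1:]))
    return raw_ties / len(turns) if turns else 0.0

# Invented three-turn dialog for illustration only.
dialog = ["What forces act on the keys?",
          "Gravity acts on the keys.",
          "Right, gravity is the only force on the keys."]
print(turn_normalized_cohesion(dialog))  # 7 raw ties / 3 turns ≈ 2.33
```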

  19. Where did the corpora come from? • ITSPOKE is a speech-enabled version of Why2-Atlas (VanLehn et al. 02) • Qualitative physics • Tutoring cycle • Student reads instructional materials • Takes a pre-test • Starts the interactive tutoring cycle: • Problem • Essay • Tutor evaluates essay, engages in dialog • Revise essay • Repeat • Takes a post-test

  20. Tutoring Corpora • Transcripts of tutoring sessions • Training corpus (fall 2003): • 20 students, 5 problems each • 95 dialogs (5 had no dialog) • 13 low pre-testers, 7 high pre-testers • Testing corpus (spring 2005): • 34 students, 5 problems each • 163 dialogs (7 had no dialog) • 18 low pre-testers, 16 high pre-testers

  21. Results: Aptitude/Treatment • Test: partial correlation of post-test & cohesion count, controlling for pre-test • Cohesion correlated with learning for low pre-test students • Not for high pre-test students • Little difference between types of measurement • Less significant on testing data, “token with stops” level reduced to a trend
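The statistic is easy to reproduce: a first-order partial correlation is the Pearson correlation of the residuals left after regressing both variables on the covariate. The sketch below uses made-up numbers, not the study's data or its actual analysis script:

```python
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, covar):
    """First-order partial correlation of x and y, controlling for covar."""
    x, y, covar = map(np.asarray, (x, y, covar))
    # Residuals of x and y after a simple linear regression on the covariate.
    res_x = x - np.polyval(np.polyfit(covar, x, 1), covar)
    res_y = y - np.polyval(np.polyfit(covar, y, 1), covar)
    # Note: pearsonr's p-value uses n-2 df; a strict partial-correlation test would use n-3.
    return pearsonr(res_x, res_y)

# Illustrative per-student values: pre-test, post-test, turn-normalized cohesion.
pre  = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
post = [0.45, 0.55, 0.50, 0.65, 0.70, 0.72, 0.78, 0.80]
coh  = [1.2,  1.8,  1.5,  2.1,  2.3,  2.2,  2.6,  2.7]

r, p = partial_corr(coh, post, pre)
print(f"partial r = {r:.2f}, p = {p:.3f}")
```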

  28. Results: Aptitude/Treatment (2003 data) • No significant difference between amounts of (turn normalized) cohesion in high and low pre-test groups. • Difference in correlation between high and low pre-testers not due to different amounts of cohesion.
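The slide does not name the test used; one plausible way to check such a group comparison is an independent-samples t-test, sketched here with made-up scores:

```python
from scipy.stats import ttest_ind

# Illustrative turn-normalized cohesion scores for the two groups (not the study's data).
low_group  = [1.9, 2.1, 2.4, 2.0, 2.2]
high_group = [2.0, 2.3, 2.1, 2.2, 1.8]

t, p = ttest_ind(low_group, high_group)
print(f"t = {t:.2f}, p = {p:.3f}")  # a large p suggests no significant group difference
```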

  29. Results: Interactivity (2003) • Cohesion between tutor utterances is not correlated with learning

  30. Results: Interactivity (2003) • No evidence that cohesion between student productions is correlated with learning (but student utterances are very short with computer tutor)

  31. Discussion • Both high and low pre-testers successfully learned from these dialogs • Our measure of lexical cohesion seems to reflect only what the low pre-testers do to learn; it is not correlated with what the high pre-testers do. • McNamara & Kintsch also found a positive correlation for low pre-testers, but a negative correlation for high pre-testers.

  32. Discussion • Our measures are slightly different: • McNamara & Kintsch: manipulated coherence in text • The reader does not contribute to coherence • Coherence is the extent to which semantic relations are spelled out in the text, rather than inferred by the reader • Low pre-testers probably learned because the high coherence text allowed them to make inferences they couldn’t make from the low coherence text • Low pre-testers & low coherence: didn’t know the terms • High coherence may allow a greater number of successful inferences for their low pre-testers • Our work: dialog • The student does contribute to cohesion • Higher cohesion means using more of the same terms • Speculation: high cohesion may indicate the number of successful inferences our low pre-testers have already made • High pre-testers already know the terms, so new inferences are not involved in using them

  33. Summary • We have taken automatically computable measures of cohesion from computational linguistics • Applied them to tutorial dialog • Found correlations with student learning

  34. Conclusions • Simple, automatically computable measures of lexical cohesion correlate with learning • But only for students with low pre-test scores, even though low and high pre-testers showed similar amounts of cohesion. • Correlation is robust to differences in type of measurement • It’s the cohesion between student and tutor that’s important

  35. Future Work • Short term: • Cohesion may also be related to learning in high pre-testers, but we may be measuring the wrong kind of cohesion • Work underway to try “sense” level measures • Halliday & Hasan’s “synonym” levels of reiteration • e.g., “acceleration” & “speeding up” • New issues: • Word sense disambiguation (one sense per discourse?) • Or we may be measuring it in the wrong places • Try finding cohesion at impasses (VanLehn 03) • Try finding change in cohesion over time (Pickering & Garrod 04) • Is it the dialog, or the essay? • Long term: • Test by manipulating cohesion in ITSPOKE

  36. Thanks • Diane Litman • ITSPOKE group

  37. Questions?

  38. Cohesion vs Coherence • Cohesive Devices • Things that “tie” different parts of a discourse together: • Anaphora, repetition, etc… • But still may not make sense: • John hid Bill’s car keys. He likes spinach. (Jurafsky & Martin 00) • Coherence relations • Semantic relations between utterances. • Result, Explanation, elaboration, etc. (Hobbs 79)

  39. Britton & Gulgoz (91) • Original text: “Air war in the North, 1965. By the fall of 1964, Americans in both Saigon and Washington had begun to focus on Hanoi as the source of the continuing problem in the south.” • Modified text: “Air war in North Vietnam, 1965. By the beginning of 1965, Americans in both Saigon and Washington had begun to focus on Hanoi, capital of North Vietnam, as the source of the continuing problems in the south.”
