
Annotating Students’ Understanding of Science Concepts


Presentation Transcript


  1. Annotating Students’ Understanding of Science Concepts Rodney D. Nielsen, Wayne Ward, James Martin, and Martha Palmer Center for Computational Language and Education Research University of Colorado, Boulder

  2. Annotating Fine-Grained Entailments • Question: Kate said: “An object has to move to produce sound.” Do you agree with her? Why or why not? • Reference answer: Agree. Vibrations are movements and vibrations produce sound. • Learner answer: I do not agree because a radio does not move to make sound. • The student agrees Contradicted • Vibrations are movement Unaddressed • Vibrations produce something Different Argument • Something produces sound Expressed

  3. Recognizing Textual Entailment • Hypothesis: Agree. Vibrations are movements and vibrations produce sound. • Text: I do not agree because a radio does not move to make sound. • The student agrees False • Vibrations are movement Unknown • Vibrations produce something Unknown • Something produces sound True
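To connect the fine-grained labels on slide 2 with the three-way RTE judgments on slide 3, here is a minimal sketch of one plausible collapse. The mapping is an illustrative assumption that is consistent with the two examples above; it is not a mapping published by the authors.

```python
# Hypothetical collapse of fine-grained facet labels (slide 2) to three-way
# RTE judgments (slide 3). Illustrative only; consistent with the examples
# above but not an official mapping from the paper.
FACET_TO_RTE = {
    "Expressed": "True",
    "Inferred": "True",
    "Assumed": "True",
    "Contra-Expr": "False",
    "Contra-Infr": "False",
    "Self-Contra": "False",
    "Diff-Arg": "Unknown",
    "Unaddressed": "Unknown",
}

def to_rte(facet_label: str) -> str:
    """Collapse a fine-grained facet label to an RTE-style judgment."""
    return FACET_TO_RTE[facet_label]

print(to_rte("Diff-Arg"))      # Unknown
print(to_rte("Contra-Expr"))   # False
```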

  4. Prior Work • Automated Tutors • Aleven et al., 2001; Graesser et al., 2001; Jordan et al., 2004; Koedinger et al., 1997; Makatchev et al., 2004; Peters et al., 2004; Pon-Barry et al., 2004; Roll et al., 2005; Rosé et al., 2003; VanLehn et al., 2005 • Constructed Response Scoring • Callear et al., 2001; Leacock and Chodorow, 2003; Mitchell et al., 2002 & 2003; Pulman, 2005; Sukkarieh, 2003 & 2005 • PASCAL RTE (Dagan, Glickman and Magnini, 2005) • Differences / Weaknesses • Coarse-grained entailment – yes/no or grade: 0-2 points • Question-specific systems • Hand-crafted dialog control, parsers, knowledge-based ontologies, logic representations, and/or rules • Require 100-500 responses per question

  5. Necessity of Finer-Grained Analysis • Imagine a tutor only knowing that there is some unspecified part of the reference answer that we are not sure the student understands • Reference Answer: A long string produces a low pitch. [slide figure: dependency parse of this sentence with det, subject, object, and nmod arcs] • Break the reference answer down into low-level facets derived from a dependency parse and thematic roles • NMod(string, long) The string is long. • Agent(produces, string) A string is producing something. • Product(produces, pitch) A pitch is being produced. • NMod(pitch, low) The pitch is low. • Assess whether an understanding of each facet is implicated by the student’s response (a code sketch of this facet extraction follows)
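The facet decomposition on this slide can be illustrated in a few lines of code. Below is a minimal sketch that reads candidate facets off an automatic dependency parse using spaCy; the model name, the FACET_DEPS filter, and the simple triple format are assumptions for illustration, whereas the slides describe manual parses refined with thematic-role labels.

```python
# Minimal sketch: extracting candidate facets (relation, head, dependent)
# from an automatic dependency parse with spaCy. Illustrative only; the
# slides describe manual parses plus thematic roles, not this pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")       # assumed model name
FACET_DEPS = {"nsubj", "dobj", "amod"}   # assumed subset of relations to keep

def extract_facets(sentence: str):
    doc = nlp(sentence)
    facets = []
    for tok in doc:
        if tok.dep_ in FACET_DEPS:
            # e.g. amod(pitch, low) corresponds to "The pitch is low."
            facets.append((tok.dep_, tok.head.lemma_, tok.lemma_))
    return facets

print(extract_facets("A long string produces a low pitch."))
# Roughly: [('amod', 'string', 'long'), ('nsubj', 'produce', 'string'),
#           ('amod', 'pitch', 'low'), ('dobj', 'produce', 'pitch')]
```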

  6. Representing Fine-Grained Semantics • Assess the relationship between the student’s answer and the reference answer facets at a finer grain • Reference Ans: A long string produces a low pitch. • Facets: NMod(string, long), Agent(produces, string), Product(produces, pitch), NMod(pitch, low) • “A long string produces a pitch.”: Expressed, Expressed, Expressed, Unaddressed (yes/no view: Yes, Yes, Yes, No) • “It produces a loud pitch.”: Assumed, Expressed, Expressed, Different Argument • “It produces a high pitch.”: Assumed, Expressed, Expressed, Contra-Expr

  7. The Focus of This Effort • Low level facets of reference answer • Finer-grained relationship to the facets

  8. The Corpus • Assessing Science Knowledge (ASK): Full Option Science System • Berkeley, Lawrence Hall of Science national assessment project (NSF) • 16 science teaching and learning modules, Grades 3-6 • 287 constructed response questions • 15,400 total student responses • 146,000 facet entailment annotations

  9. Annotation Process • Step 1: FOSS/ASK reference answers were manually decomposed into constituent facets • Ref Answer: The string is tighter, so the pitch is higher. • Be(string, tighter) The string is tighter. • Be(pitch, higher) The pitch is higher. • Cause(X, Y) X is caused by Y • Step 2: Learner answers are annotated to indicate whether and how each facet was addressed • Learner Answer: The string is tighter, so there is less tension so the pitch gets higher. • Be(string, tighter) The string is tighter. Self-Contra • Be(pitch, higher) The pitch is higher. Expressed • Cause(X, Y) X is caused by Y Expressed
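To make the two-step process concrete, here is a small sketch of how the reference-answer facets and the per-facet annotations from this slide might be represented; the class and field names are hypothetical and do not reflect the released corpus format.

```python
# Hypothetical data structures for the two annotation steps; names are
# illustrative and do not reflect the released corpus format.
from dataclasses import dataclass

@dataclass
class Facet:
    relation: str    # e.g. "Be", "Cause", "NMod"
    head: str        # governing word, e.g. "string"
    dependent: str   # argument/modifier, e.g. "tighter"
    gloss: str       # plain-language form, e.g. "The string is tighter."

@dataclass
class FacetAnnotation:
    facet: Facet
    label: str       # one of the eight labels defined on slide 12

# Step 1: reference answer decomposed into facets
reference_facets = [
    Facet("Be", "string", "tighter", "The string is tighter."),
    Facet("Be", "pitch", "higher", "The pitch is higher."),
    Facet("Cause", "higher", "tighter", "X is caused by Y"),
]

# Step 2: each facet annotated against the learner answer
learner_answer = ("The string is tighter, so there is less tension "
                  "so the pitch gets higher.")
annotations = [
    FacetAnnotation(reference_facets[0], "Self-Contra"),
    FacetAnnotation(reference_facets[1], "Expressed"),
    FacetAnnotation(reference_facets[2], "Expressed"),
]
```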

  10. Reference Answer Decomposition • Begin with a manual dependency parse of the reference answer: The brass ring would not stick to the nail because the ring is not iron. [slide figure: dependency tree over this sentence with arcs labeled sub, vc, vmod, prd, nmod, pmod, and sbar] • Then raise main verbs, remove unimportant dependencies, incorporate copulas, prepositions and negation into dependency labels, and utilize thematic role labels [slide figure: the resulting arcs nmod, theme_not, destination_to_not, be_not, and cause_because over the same sentence] (a sketch of this label folding follows)
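Below is a minimal sketch of the label-folding step described on this slide, where prepositions and negation are absorbed into the relation label (yielding labels such as destination_to_not and cause_because). The function and its simple string concatenation are an assumption about how one might implement the step, not the authors' exact procedure.

```python
# Illustrative sketch of folding prepositions and negation into relation
# labels, as on slide 10; the rule is assumed, not the authors' exact one.
from typing import Optional

def fold_label(role: str, prep: Optional[str], negated: bool) -> str:
    label = role
    if prep:
        label += f"_{prep}"   # e.g. "destination" + "to" -> "destination_to"
    if negated:
        label += "_not"       # e.g. "destination_to" -> "destination_to_not"
    return label

# "The brass ring would not stick to the nail because the ring is not iron."
print(fold_label("destination", "to", True))    # destination_to_not
print(fold_label("theme", None, True))          # theme_not
print(fold_label("be", None, True))             # be_not
print(fold_label("cause", "because", False))    # cause_because
```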

  11. Reference Answer Markup • Final facets for Ref Answer: The brass ring would not stick to the nail because the ring is not iron. • NMod(ring, brass) The ring is brass. • Theme_not(stick, ring) The ring does not stick. • Destination_to_not(stick, nail) Something does not stick to the nail. • Be_not(ring, iron) The ring is not iron. • Cause_because(stick, is) X is caused by Y [slide figure: the labeled dependency arcs from the previous slide]

  12. Answer Annotation Labels • Assumed: Facets that are assumed to be understood a priori based on the question • Expressed: Any facet directly expressed or inferred by simple reasoning • Inferred: Facets inferred by pragmatics or nontrivial logical reasoning • Contra-Expr: Facets directly contradicted by negation, antonymous expressions and their paraphrases • Contra-Infr: Facets contradicted by pragmatics or complex reasoning • Self-Contra: Facets that are both contradicted and implied (self contradictions) • Diff-Arg: The core relation is expressed, but it has a different modifier or argument • Unaddressed: Facets that are not addressed at all by the student’s answer

  13. Annotation – Expressed & Inferred • Question: Kate said: “An object has to move to produce sound.” Do you agree with her? Why or why not? • Reference Answer: Agree. Vibrations are movements and vibrations produce sound. • Root(root, agree) student agrees Expressed • Be(vibration, movement) vibration is movement Inferred • Agent(produce, vibrations) vibrations produce something Expressed • Patient(produce, sound) something produces sound Expressed • Student Answer: Yes because it has to vibrate to make sounds.

  14. Annotation – Contradictions • Question: Darla tied one end of a string around a doorknob and held the other end in her hand. When she plucked the string (pulled and let go quickly) she heard a sound. How would the pitch change if Darla pulled the string tighter? • Reference Answer: When the string is tighter, the pitch will be higher. • Be(string, tighter) The string is tighter. Assumed • Be(pitch, higher) The pitch is higher. Contra-Expr • Cause(X, Y) X is caused by Y Assumed • Student Answer: it will be low the pitch change

  15. Annotation – Unaddressed • Question: … Write a note to David to tell him why the pitch gets higher rather than lower • Ref Ans: The string is tighter, so the pitch is higher. The string between the cup and table is not longer. • … • Be_not(string, longer) The string is not longer Unaddressed • Student Answer: David pitch is not happening tension is happening okay so calm down.

  16. Labels • Assumed: Facets that are assumed to be understood a priori based on the question • Expressed: Any facet directly expressed or inferred by simple reasoning • Inferred: Facets inferred by pragmatics or nontrivial logical reasoning • Contra-Expr: Facets directly contradicted by negation, antonymous expressions and their paraphrases • Contra-Infr: Facets contradicted by pragmatics or complex reasoning • Self-Contra: Facets that are both contradicted and implied (self contradictions) • Diff-Arg: The core relation is expressed, but it has a different modifier or argument • Unaddressed: Facets that are not addressed at all by the student’s answer

  17. Inter-annotator Agreement • In most disagreements (57%) one annotator chose Unaddressed • 49% of disagreements were between Unaddressed and Understood • 35% of disagreements were between the labels implying understanding • Only 2.3% of disagreements were between Understood and Contradicted • Label granularities: Fine-Grn: all labels kept separate; Tutor: combine {Expressed, Inferred & Assumed} and {Contra-Expr & Contra-Infr}, others separate; Y/N: combine {Expressed, Inferred & Assumed} vs. {everything else} (a scoring sketch follows)
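As an illustration of the three granularities defined on this slide, the sketch below collapses fine-grained labels to Tutor and Y/N labels and scores agreement with Cohen's kappa via scikit-learn. The collapse sets follow the slide; the helper names and toy annotations are invented for illustration.

```python
# Sketch of scoring inter-annotator agreement at the three granularities
# on this slide. Collapse sets follow the slide; the toy data is invented.
from sklearn.metrics import cohen_kappa_score

UNDERSTOOD = {"Expressed", "Inferred", "Assumed"}
CONTRADICTED = {"Contra-Expr", "Contra-Infr"}

def to_tutor(label: str) -> str:
    if label in UNDERSTOOD:
        return "Understood"
    if label in CONTRADICTED:
        return "Contradicted"
    return label            # Self-Contra, Diff-Arg, Unaddressed stay separate

def to_yes_no(label: str) -> str:
    return "Yes" if label in UNDERSTOOD else "No"

def agreement(ann1, ann2, collapse=lambda lab: lab):
    a = [collapse(lab) for lab in ann1]
    b = [collapse(lab) for lab in ann2]
    return cohen_kappa_score(a, b)

# Toy annotations from two annotators over the same four facets
ann1 = ["Expressed", "Unaddressed", "Contra-Expr", "Inferred"]
ann2 = ["Inferred", "Unaddressed", "Contra-Infr", "Expressed"]
print(agreement(ann1, ann2))              # Fine-Grn kappa
print(agreement(ann1, ann2, to_tutor))    # Tutor kappa
print(agreement(ann1, ann2, to_yes_no))   # Y/N kappa
```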

  18. Assessment Technology Overview • Start with hand-generated reference answer facets • Automatically parse reference & learner answer and automatically extract representation • Generate machine learning feature vectors indicative of the student’s understanding of each facet • From answers, their parses, the relations between these, and corpus co-occurrence statistics • Train a machine learning classifier on the training set feature vectors • Use classifier to assess the test set answers, assigning one of five Tutor-Labels for each RA facet
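Here is a rough sketch of the classification stage outlined on this slide, with scikit-learn's DecisionTreeClassifier (CART) standing in for the C4.5 tree mentioned on the next slide. The features below are toy placeholders, not the lexical, syntactic, and co-occurrence features the system actually used.

```python
# Rough sketch of the classification stage. DecisionTreeClassifier (CART)
# stands in for C4.5, and the features are invented placeholders.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def facet_features(facet_gloss: str, learner_answer: str) -> dict:
    """Toy features comparing a reference-answer facet to a learner answer."""
    f_toks = set(facet_gloss.lower().split())
    a_toks = set(learner_answer.lower().split())
    overlap = f_toks & a_toks
    return {
        "token_overlap": len(overlap),
        "overlap_ratio": len(overlap) / max(len(f_toks), 1),
        "has_negation": int(any(t in a_toks for t in ("not", "no", "never"))),
    }

# Tiny illustrative training set (facet gloss, learner answer, Tutor-Label)
train_X = [
    facet_features("The pitch is higher.", "the pitch gets higher"),
    facet_features("The pitch is higher.", "it will be low the pitch change"),
    facet_features("The string is tighter.", "okay so calm down"),
]
train_y = ["Expressed", "Contra-Expr", "Unaddressed"]

vec = DictVectorizer()
clf = DecisionTreeClassifier()
clf.fit(vec.fit_transform(train_X), train_y)

test = facet_features("The pitch is higher.", "the pitch is higher now")
print(clf.predict(vec.transform([test])))
```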

  19. Results (C4.5 decision tree) • Results on Tutor-Labels: • 24.4, 8.1 and 15.4% over the most-frequent-class baseline • 19.4, 3.1 and 5.9% over a lexical baseline • (All Unseen Modules facets were adjudicated; about half of the other modules’ facets were adjudicated)

  20. Conclusions • New assessment paradigm to enable more effective tutoring dialog management • Facet break down: enables the tutor to provide feedback relevant specifically to the appropriate part of the reference answer • Additional labels: facilitate understanding the type of mismatch between the reference answer/hypothesis and the student’s answer/text

  21. Conclusions • Corpus of annotated answers • Substantial agreement: 86.2% on Tutor-Labels, 0.728 Kappa • About 146K facet annotations • Only corpus of fine-grained inference information • Freely available • Will support alternative approaches to the Recognizing Textual Entailment task

  22. Conclusions • Answer Assessment System • Evaluation according to new paradigm • Within domain performance: • 24% over majority class baseline • Out-of-domain performance: • 15% over majority class baseline • First system to address out-of-domain assessment • First successful assessment of Grade 3-6 constructed responses

  23. Thanks! • This work was partially funded by Award Numbers: • NSF 0551723, • IES R305B070434, and • NSF DRL-0733323.
