Recognizing Discourse Structure: Text



  1. Recognizing Discourse Structure: Text. Discourse & Dialogue, CMSC 35900-1, October 16, 2006

  2. Roadmap • Discourse structure • TextTiling (Hearst 1994) • RST Parsing (Marcu 1999) • Contrasts & Issues

  3. Discourse Structure Models • Attention, Intentions (G&S) • Rhetorical Structure Theory (RST)

  4. Contrasts • G&S • Participants: monologue, dialogue, multi-party • Relations: 2 (dominance, satisfaction-precedence) • Purpose: infinite set • Speech issues: addresses some • Granularity: variable; minimum: phrase/clause • RST • Participants: monologue • Relations: myriad • Speech issues: largely ignored • Granularity: clausal

  5. Similarities • Structure • Predominantly hierarchical, tree-structured • Some notion of core intentions/topics • Intentions of speaker and hearer • Co-constraint of discourse structure and linguistic form

  6. Questions • Which would be better for generation? • For recognition? • For summarization? • Why?

  7. Recognizing Discourse Structure • Decompose text into subunits • Questions: • What type of structure is derived? • Sequential spans, hierarchical trees, arbitrary graphs • What is the granularity of the subunits? • Clauses? Sentences? Paragraphs? • What information guides segmentation? • Local cue phrases? Lexical cohesion? • How is the information modeled? Learned? • How do we measure success?

  8. TextTiling (Hearst 1994) • Identify discourse structure of expository text • Hypothesized structure • Small number of main topics • Sequence of subtopical discussions • Could be marked by headings and subheadings • Non-overlapping, contiguous multi-paragraph blocks • Linear sequence of segments • Identifies extents only, not (summaries of) contents • Useful for WSD, IR, summarization • Assumes small coherent units

  9. Characterizing Topic Structure • Presumes “piecewise” structure (Skorochod’ko 1972) • Sequences of densely interrelated discussions • Linked linearly • Allows multiple concurrent active themes • Segment at points of sudden radical change • Change in subject matter: e.g. earth/moon vs. binary stars • Characterize similarity via lexical cohesion • Repeated terms, literal or semantically related

  10. Core Algorithm • Tokenization • Pseudo-sentences of fixed length: 20 words • Stemmed, stopword-filtered • Block similarity computation • Blocks: groups of k pseudo-sentences • k=6 works well for many texts • Gap: between-block position, a possible segment boundary • Compute similarity between the blocks adjacent to each gap (see the sketch below)
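A minimal Python sketch of these two steps, assuming a toy stopword list and skipping stemming (Hearst uses a fuller stopword list plus a morphological stemmer); the function names are illustrative, not from the paper:

```python
import math
import re
from collections import Counter

# Toy stopword list; the real algorithm uses a much fuller list plus stemming.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "it"}

def pseudo_sentences(text, w=20):
    """Split text into fixed-length pseudo-sentences of w content tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]

def block_cosine(left, right):
    """Cosine similarity between two blocks (lists of pseudo-sentences),
    with term weight = term frequency within the block."""
    a = Counter(t for ps in left for t in ps)
    b = Counter(t for ps in right for t in ps)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) *
                     sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_similarities(pss, k=6):
    """Similarity at each gap, comparing the k pseudo-sentences on each side."""
    return [block_cosine(pss[max(0, g - k):g], pss[g:g + k])
            for g in range(1, len(pss))]
```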

  11. Block Similarity • Term weight w(t,b) = frequency of term t in block b • Compute the similarity measure at each gap • Perform average smoothing of the gap scores
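The measure is the cosine over these term-frequency weights, as in Hearst (1994):

$$
\mathrm{sim}(b_1, b_2) = \frac{\sum_{t} w_{t,b_1}\, w_{t,b_2}}{\sqrt{\sum_{t} w_{t,b_1}^{2} \, \sum_{t} w_{t,b_2}^{2}}}
$$

where $w_{t,b}$ is the frequency of term $t$ in block $b$; this is what `block_cosine` in the sketch above computes.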

  12. Boundary Detection • Based on changes in similarity scores • For each gap, move left and right until similarity stops increasing • Compute the difference between each peak and the gap's score • Sum the two peak differences into a depth score • Largest scores -> boundaries • Require some minimum separation between boundaries (3) • Threshold the depth score for boundary placement (see the sketch below)
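A sketch of this boundary pass under the same assumptions, consuming the gap similarities from the earlier sketch (smoothing omitted); the mean-minus-half-standard-deviation cutoff follows Hearst's paper, while treating `min_sep` as gap positions is an assumption:

```python
def depth_scores(sims):
    """Depth score at each gap: climb left and right while similarity keeps
    increasing, then sum the (peak - gap) differences from the two sides."""
    scores = []
    for i, s in enumerate(sims):
        left = right = s
        j = i
        while j > 0 and sims[j - 1] >= left:               # climb to left peak
            left = sims[j - 1]
            j -= 1
        j = i
        while j < len(sims) - 1 and sims[j + 1] >= right:  # climb to right peak
            right = sims[j + 1]
            j += 1
        scores.append((left - s) + (right - s))
    return scores

def pick_boundaries(scores, min_sep=3):
    """Keep the deepest gaps above a cutoff, enforcing minimum separation."""
    mean = sum(scores) / len(scores)
    sd = (sum((x - mean) ** 2 for x in scores) / len(scores)) ** 0.5
    cutoff = mean - sd / 2                     # Hearst's depth threshold
    chosen = []
    for g in sorted(range(len(scores)), key=lambda g: -scores[g]):
        if scores[g] >= cutoff and all(abs(g - c) >= min_sep for c in chosen):
            chosen.append(g)
    return sorted(chosen)
```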

  13. Evaluation • Contrast with reader judgments • Alternatively, with author markup or a task-based evaluation • 7 readers, 13 articles: “mark the topic changes” • If 3 readers agree, a position counts as a true boundary • Run the algorithm and align its output with the nearest paragraph break • Contrast with random assignment of boundaries at the same frequency • Auto: 0.66 precision, 0.61 recall; Human: 0.81, 0.71 • Random: 0.44, 0.42

  14. Discussion • Overall: automatic segmentation much better than random • Often a “near miss”, within one paragraph • Counting near misses: 0.83, 0.78 • Issues: summary material • Often not similar to adjacent paragraphs • Similarity measures • Is raw term frequency the best we can do? • Other cues? • Other experiments with TextTiling perform less well. Why?

  15. RST Parsing (Marcu 1999) • Learn and apply classifiers for • Segmentation and parsing of discourse • Discourse structure • RST trees • Fine-grained, hierarchical structure • Clause-based units • Inter-clausal relations: 71 relations grouped into 17 clusters • Mix of intentional and informational relations

  16. Corpus-based Approach • Training & testing on 90 RST trees • Texts from MUC, Brown (science), WSJ (news) • Annotations: • Identify edus (elementary discourse units) • Clause-like units between which relations hold • Parentheticals: could be deleted with no effect on understanding • Identify nucleus-satellite status • Identify the relation that holds, e.g. elaboration, contrast, …

  17. RST Parsing Model • Shift-reduce parser • Assumes input already segmented into edus • Each edu becomes an edt, the minimal discourse tree unit: relation undefined • Promotion set: the unit itself • Shift: add the next input edt to the stack • Reduce: pop the top 2 edts, combine them into a new tree • Update: status, relation, promotion set • Push the new tree onto the stack • Requires 1 shift operation and 6 reduce operations • Reduce ops ns, sn, nn build binary trees; the “below” variants build non-binary trees • Learn relation and operation jointly: 17 × 6 + 1 = 103 classes (see the sketch below)
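A minimal sketch of the parser's data structures and control loop, assuming these illustrative names; `next_action` stands in for the learned classifier, and the non-binary “below” reduce variants are omitted for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    """A discourse (sub)tree; an edt is a leaf with an undefined relation and
    a promotion set containing only itself."""
    relation: str = "UNDEFINED"
    status: str = "N"                        # 'N' (nucleus) or 'S' (satellite)
    children: list = field(default_factory=list)
    promotion: set = field(default_factory=set)

def shift(stack, inputs):
    """SHIFT: move the next edt from the input list onto the stack."""
    stack.append(inputs.pop(0))

def reduce_binary(stack, order, relation):
    """Binary REDUCE with order 'ns', 'sn', or 'nn': pop the top two trees,
    assign statuses, and promote the nucleus' promotion set (both for 'nn')."""
    right, left = stack.pop(), stack.pop()
    left.status, right.status = order[0].upper(), order[1].upper()
    promo = (left.promotion if order == "ns" else
             right.promotion if order == "sn" else
             left.promotion | right.promotion)
    stack.append(Tree(relation=relation, children=[left, right],
                      promotion=set(promo)))

def parse(n_edus, next_action):
    """Driver loop: next_action(stack, inputs) returns ('shift',) or
    ('reduce', order, relation), the choice Marcu learns from configuration
    features."""
    stack = []
    inputs = [Tree(promotion={i}) for i in range(n_edus)]
    while inputs or len(stack) > 1:
        action = next_action(stack, inputs)
        if action[0] == "shift":
            shift(stack, inputs)
        else:
            reduce_binary(stack, action[1], action[2])
    return stack[0]
```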

  18. Learning Segmentation • Per-word classification: sentence, edu, or parenthetical boundary • Features: • Context: 5-word window; POS tags of 2 words before and after • Discourse marker present? • Abbreviations? • Global context: • Discourse markers, commas & dashes, verbs present • 2417 binary features per example (see the sketch below)
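A sketch of the per-word featurization under assumed feature names and a toy marker list (the global-context features are omitted):

```python
# Illustrative subset; the full model uses a much longer cue-phrase inventory.
DISCOURSE_MARKERS = {"because", "although", "but", "however", "thus"}

def boundary_features(tokens, pos_tags, i):
    """Binary features for classifying the position after tokens[i] as a
    sentence, edu, or parenthetical boundary (or none)."""
    feats = set()
    for j in range(-2, 3):                       # 5-word lexical window
        if 0 <= i + j < len(tokens):
            feats.add(f"w[{j}]={tokens[i + j].lower()}")
    for j in (-2, -1, 1, 2):                     # POS of 2 words before/after
        if 0 <= i + j < len(pos_tags):
            feats.add(f"p[{j}]={pos_tags[i + j]}")
    if tokens[i].lower() in DISCOURSE_MARKERS:   # discourse marker present?
        feats.add("marker_at_position")
    if tokens[i].endswith(".") and len(tokens[i]) <= 4:  # crude abbreviation cue
        feats.add("possible_abbreviation")
    return feats
```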

  19. Segmentation Results • Good news: overall ~97% accuracy • Contrast: majority baseline (no boundary): ~92% • Contrast: a boundary at every period (“DOT”): ~93% • Comparable to alternative approaches • Bad news: • Problem cases: starts of parentheticals and edus

  20. Learning Shift-Reduce • Construct the action sequence from each training tree • For a configuration: • 3 top trees on the stack, next edt in the input • Features: number of trees on the stack and in the input • Tree characteristics: size, branching, relations • Words and POS of the first and last 2 lexemes in the spans • Presence and position of any discourse markers • Previous 5 parser actions • Hearst-style semantic similarity across trees • WordNet-based similarity measures (see the sketch below)
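In the same spirit, a sketch of how a configuration might be featurized, reusing the Tree class from the parser sketch above (feature names and selection are assumptions, not Marcu's exact set; span-lexeme and WordNet features omitted):

```python
def config_features(stack, inputs, history):
    """Illustrative features of a shift-reduce parser configuration."""
    feats = {
        "stack_size": len(stack),
        "input_size": len(inputs),
        "prev_actions": tuple(history[-5:]),         # previous 5 parser actions
    }
    for k, tree in enumerate(reversed(stack[-3:])):  # 3 top trees on the stack
        feats[f"tree{k}.relation"] = tree.relation
        feats[f"tree{k}.status"] = tree.status
        feats[f"tree{k}.arity"] = len(tree.children)
    return feats
```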

  21. Classifying Shift-Reduce • C4.5 decision-tree classifier • Overall accuracy: 61% • Majority baseline: 50%; random: 27% • End-to-end evaluation: • Good on spans and nucleus/satellite status • Poor on relation labels

  22. Discussion • Noise in segmentation degrades parsing • Poor segmentation -> poor parsing • Need sufficient training data • A 27-text subset is insufficient • More varied data is better than less-varied but similar data • Constituency and N/S status are good • Relation labeling is far below human performance
