Progress Update

Presentation Transcript


  1. Progress Update Lin Ziheng

  2. Outline • Update Summarization • Opinion Summarization • Discourse Analysis

  3. Update Summarization • The TAC 2008 update summarization task differs slightly from the DUC 2007 update task • The documents will be drawn from the AQUAINT-2 collection rather than the AQUAINT collection • Cluster format: • There will be only two sets per cluster (Set A and Set B) • Each document set will have exactly 10 documents • The summary for document Set A should be a regular topic-focused summary • The summary for Set B should be written under the assumption that the user has already read all the documents in Set A
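  As a rough sketch only (the class and field names below are illustrative, not part of any TAC distribution format), a cluster in this setup can be thought of as:

```python
# Illustrative only: a minimal in-memory view of one TAC 2008 update cluster.
from dataclasses import dataclass, field
from typing import List

@dataclass
class UpdateCluster:
    topic_id: str                                    # hypothetical identifier
    set_a: List[str] = field(default_factory=list)   # 10 doc IDs; gets a regular topic-focused summary
    set_b: List[str] = field(default_factory=list)   # 10 doc IDs; summary assumes the user has read Set A
```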

  4. Tarsqi: a tool for event/time anchoring/ordering • Recognizes events and times • Creates event/event, event/time, and time/time temporal links • Example sentences: John fell after Mary pushed him. They heard an explosion on Monday, but not in 2007. This reminded them of the 1968 war, which ravaged the countryside in 1969. He slept on Friday night. She hopes to succeed before noon. Gonzalez said he would resign on Tuesday. He thought it was a great deal. John leaves today. John does not leave today.
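  As a purely illustrative sketch of the kind of output this yields (the class and field names below are our simplification, not Tarsqi's actual TimeML output format or API):

```python
# Illustrative only: events/times and their temporal links as plain Python objects.
from dataclasses import dataclass

@dataclass
class TLink:
    source: str    # id of an event or time expression, e.g. "e1" (fell)
    target: str    # e.g. "e2" (pushed)
    relation: str  # e.g. "AFTER", "BEFORE", "INCLUDES"

# "John fell after Mary pushed him."  ->  fell AFTER pushed
links = [TLink(source="e1", target="e2", relation="AFTER")]
```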

  5. Graph Layering [Figure: numbered sentences from documents D1 and D2 are linked by Tarsqi temporal relations and arranged into layers]

  6. D0703A-A

  7. BFS

  8. Topmost layering

  9. Optimal layering
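  Slides 6 through 9 are figure-only. As a hedged sketch of one standard layering scheme that is consistent with the "BFS" and "Topmost layering" labels above (we are assuming the temporal graph is treated as a DAG; this may not be the exact algorithm shown in the figures):

```python
from collections import defaultdict, deque

def topmost_layering(nodes, edges):
    """Assign each node of a DAG to a layer: sources go to layer 0 and every other
    node goes one layer below its deepest predecessor (longest-path layering).
    `edges` is a list of (earlier, later) pairs, e.g. temporal precedence links."""
    succs = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)   # BFS over a topological order
    while queue:
        u = queue.popleft()
        for v in succs[u]:
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return layer

# Toy example: three sentences linked by precedence.
print(topmost_layering(["1", "2", "3"], [("1", "2"), ("2", "3"), ("1", "3")]))
# {'1': 0, '2': 1, '3': 2}
```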

  10. Opinion Summarization • Input: a target with a set of questions and a list of blog documents (example below) • Output: a summary for each target that summarizes the answers to the questions

  <target id = "9902" text = "Time Magazine 2005 Person of the Year">
    <q id = "9902.1" type = "SquishyList">
      Why did readers support Time's inclusion of Bono for Person of the Year?
    </q>
    <q id = "9902.2" type = "SquishyList">
      Why did readers not support the inclusion of Bill Gates as Person of the Year?
    </q>
    <q id = "9902.3" type = "SquishyList">
      Why did readers not support the inclusion of Melinda Gates as Person of the Year?
    </q>
    <doc id = "BLOG06-20051222-014-0013437834" />
    <doc id = "BLOG06-20051224-070-0016186787" />
    <doc id = "BLOG06-20051225-087-0014047570" />
    <doc id = "BLOG06-20051225-022-0003271778" />
    <doc id = "BLOG06-20051223-002-0006769403" />
    <doc id = "BLOG06-20051222-023-0003513393" />
    <doc id = "BLOG06-20051228-009-0011259747" />
    <doc id = "BLOG06-20051221-029-0028769327" />
  </target>
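  As a small sketch of how this input could be read with Python's standard library (the file name opinion_topics.xml is hypothetical):

```python
# A minimal sketch of reading the <target> input above.
import xml.etree.ElementTree as ET

root = ET.parse("opinion_topics.xml").getroot()
for target in root.iter("target"):
    print(target.get("id"), target.get("text"))
    questions = [q.text.strip() for q in target.findall("q")]
    doc_ids = [d.get("id") for d in target.findall("doc")]
    print(len(questions), "questions,", len(doc_ids), "documents")
```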

  11. Existing opinion corpus: the Movie Review corpus • Document level: 1000 positive and 1000 negative documents • Problem: too coarse-grained • Sentence level: 5331 positive and 5331 negative sentences • Problem: not enough data • We collected data from productreview.com.au and rateitall.com • Fine-grained: • productreview.com.au: each review has pros, cons, overall, and a rating • rateitall.com: each review has a rating • Large datasets: • productreview.com.au: 2.4 GB • rateitall.com: 2.0 GB • http://wing.comp.nus.edu.sg/~hung/productreview/ • http://wing.comp.nus.edu.sg/~hung/rateitall/
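  For illustration only, one scraped review from these fine-grained sources could be stored as a record like the following (the field names are ours, not the sites' markup):

```python
# Illustrative only: one fine-grained review record from the scraped data.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    source: str                  # "productreview.com.au" or "rateitall.com"
    rating: float                # numeric rating attached to the review
    text: str                    # overall review text
    pros: Optional[str] = None   # only productreview.com.au reviews carry pros/cons
    cons: Optional[str] = None
```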

  12. Discourse Analysis • Penn Discourse Treebank 2.0 • Based on PTB 2 • 18,459 Explicit relations, 16,053 Implicit relations
      • TEMPORAL (950::3696)
        • Asynchronous (697::2090): precedence, succession
        • Synchronous (251::1594)
      • CONTINGENCY (4255::3417)
        • Cause (4172::2240): reason, result
        • Pragmatic Cause (83::13): justification
        • Condition (1::1416): hypothetical, general, unreal present, unreal past, factual present, factual past
        • Pragmatic Condition (1::67): relevance, implicit assertion
      • COMPARISON (2503::5589)
        • Contrast (2120::3928): juxtaposition, opposition
        • Pragmatic Contrast (4::32)
        • Concession (223::1213): expectation, contra-expectation
        • Pragmatic Concession (1::15)
      • EXPANSION (8861::6423)
        • Conjunction (3534::5320)
        • Instantiation (1445::302)
        • Restatement (3206::162): specification, equivalence, generalization
        • Alternative (185::351): conjunctive, disjunctive, chosen alternative
        • Exception (2::14)
        • List (400::250)
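  As an illustrative sketch (the structure follows the hierarchy above, the count pairs are copied verbatim from the slide, and the helper name is hypothetical), the sense hierarchy can be held in a nested mapping so that fine-grained senses can be mapped back to their level-1 class:

```python
# Partial sketch of the PDTB 2.0 sense hierarchy; only two of the four classes
# are filled in here, and the count pairs are copied as-is from the slide.
PDTB_SENSES = {
    "TEMPORAL":    {"counts": "950::3696",
                    "types": {"Asynchronous": ["precedence", "succession"],
                              "Synchronous": []}},
    "CONTINGENCY": {"counts": "4255::3417",
                    "types": {"Cause": ["reason", "result"],
                              "Pragmatic Cause": ["justification"],
                              "Condition": ["hypothetical", "general", "unreal present",
                                            "unreal past", "factual present", "factual past"],
                              "Pragmatic Condition": ["relevance", "implicit assertion"]}},
    # COMPARISON and EXPANSION follow the same pattern.
}

def level1_class(level2_type):
    """Map a level-2 type (e.g. 'Cause') to its level-1 class (hypothetical helper)."""
    for cls, entry in PDTB_SENSES.items():
        if level2_type in entry["types"]:
            return cls
    return None
```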

  13. Marcu and Echihabi baseline • Used word pairs in a Naive Bayes model • Wellner et al. baseline • Used a total of 7 feature classes • Claimed that proximity (prox) and connective (conn) are the most useful feature classes • prox: 0.60 • prox + conn: 0.7677 • I have only implemented prox and conn in the baseline system
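  As a rough sketch of the word-pair Naive Bayes idea behind the Marcu and Echihabi baseline (every (w1, w2) pair with w1 from Arg1 and w2 from Arg2 is a feature; the tokenization and add-one smoothing choices here are ours, not taken from the original papers):

```python
from collections import Counter, defaultdict
import math

class WordPairNB:
    def __init__(self):
        self.pair_counts = defaultdict(Counter)  # relation -> Counter over (w1, w2) pairs
        self.rel_counts = Counter()              # relation -> number of training examples
        self.vocab = set()                       # all word pairs seen in training

    @staticmethod
    def pairs(arg1, arg2):
        return [(w1, w2) for w1 in arg1.lower().split() for w2 in arg2.lower().split()]

    def train(self, examples):                   # examples: [(arg1, arg2, relation), ...]
        for arg1, arg2, rel in examples:
            ps = self.pairs(arg1, arg2)
            self.pair_counts[rel].update(ps)
            self.rel_counts[rel] += 1
            self.vocab.update(ps)

    def predict(self, arg1, arg2):
        total = sum(self.rel_counts.values())
        best, best_lp = None, float("-inf")
        for rel in self.rel_counts:
            lp = math.log(self.rel_counts[rel] / total)                   # log prior
            denom = sum(self.pair_counts[rel].values()) + len(self.vocab)
            for p in self.pairs(arg1, arg2):                              # add-one smoothed likelihoods
                lp += math.log((self.pair_counts[rel][p] + 1) / denom)
            if lp > best_lp:
                best, best_lp = rel, lp
        return best
```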
