Progress Update Lin Ziheng
Outline • Update Summarization • Opinion Summarization • Discourse Analysis
Update Summarization
• The TAC 2008 update summarization task differs slightly from the DUC 2007 update task
• The documents come from the AQUAINT-2 collection rather than the AQUAINT collection
• Cluster format:
  • There are only two sets per cluster (Set A and Set B)
  • Each document set has exactly 10 documents
• The summary for document Set A should be a regular topic-focused summary
• The summary for Set B should be written under the assumption that the user has already read all the documents in Set A
Tarsqi: a tool for event/time anchoring/ordering
• Recognizes events and times
• Creates event/event, event/time, and time/time temporal links
Example sentences:
• John fell after Mary pushed him.
• They heard an explosion on Monday, but not in 2007.
• This reminded them of the 1968 war, which ravaged the countryside in 1969.
• He slept on Friday night.
• She hopes to succeed before noon.
• Gonzalez said he would resign on Tuesday.
• He thought it was a great deal.
• John leaves today.
• John does not leave today.
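To make the three link types concrete, here is a minimal sketch of how events, times, and temporal links could be represented. This is not Tarsqi's own API or output format: the class names, the identifiers (`e1`, `t1`), and the relation labels are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str  # hypothetical identifier, e.g. "e1" or "t1"
    kind: str     # "event" or "time"
    text: str     # the word or phrase in the sentence


@dataclass(frozen=True)
class TLink:
    source: str    # node_id of the first argument
    relation: str  # illustrative label such as "AFTER" or "IS_INCLUDED"
    target: str    # node_id of the second argument


# "John fell after Mary pushed him." and "He slept on Friday night."
nodes = [
    Node("e1", "event", "fell"),
    Node("e2", "event", "pushed"),
    Node("e3", "event", "slept"),
    Node("t1", "time", "Friday night"),
]
links = [
    TLink("e1", "AFTER", "e2"),        # fell AFTER pushed
    TLink("e3", "IS_INCLUDED", "t1"),  # slept within Friday night
]
by_id = {n.node_id: n for n in nodes}


def link_type(link, by_id):
    """Return 'event/event', 'event/time', or 'time/time' for a link."""
    kinds = sorted([by_id[link.source].kind, by_id[link.target].kind])
    return "/".join(kinds)


for link in links:
    print(link.source, link.relation, link.target, "->", link_type(link, by_id))
```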
[Figure: graphs over nodes 1-9 from documents D1 and D2, with processing stages labeled Tarsqi and Graph Layering]
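The slide names a graph layering step but does not say which algorithm is used. As one possible reading, the sketch below applies longest-path layering to a directed acyclic graph: each node is placed one layer below its deepest predecessor. The function name and the example edges are hypothetical and are not taken from the figure.

```python
from collections import defaultdict, deque


def layer_graph(edges):
    """Assign each node the length of the longest path reaching it.

    Nodes with no predecessors land in layer 0. Raises ValueError on cycles.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))

    layer = {n: 0 for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            # A node sits at least one layer below each predecessor.
            layer[v] = max(layer[v], layer[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if visited != len(nodes):
        raise ValueError("graph has a cycle; layering needs a DAG")
    return layer


# Hypothetical "happens before" edges between numbered nodes.
example_edges = [(1, 2), (1, 3), (2, 4), (3, 4), (1, 4), (4, 5)]
print(layer_graph(example_edges))
```

If the temporal links contain a cycle, the sketch raises an error rather than guessing an order; a real system would need some way to resolve such conflicts first.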
Opinion Summarization
• Input: a set of targets, each with a list of questions and a list of documents (example below)
• Output: a summary for each target that summarizes the answers to the questions

<target id="9902" text="Time Magazine 2005 Person of the Year">
  <q id="9902.1" type="SquishyList">
    Why did readers support Time's inclusion of Bono for Person of the Year?
  </q>
  <q id="9902.2" type="SquishyList">
    Why did readers not support the inclusion of Bill Gates as Person of the Year?
  </q>
  <q id="9902.3" type="SquishyList">
    Why did readers not support the inclusion of Melinda Gates as Person of the Year?
  </q>
  <doc id="BLOG06-20051222-014-0013437834" />
  <doc id="BLOG06-20051224-070-0016186787" />
  <doc id="BLOG06-20051225-087-0014047570" />
  <doc id="BLOG06-20051225-022-0003271778" />
  <doc id="BLOG06-20051223-002-0006769403" />
  <doc id="BLOG06-20051222-023-0003513393" />
  <doc id="BLOG06-20051228-009-0011259747" />
  <doc id="BLOG06-20051221-029-0028769327" />
</target>
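The target format on this slide can be read with Python's standard `xml.etree.ElementTree`. The sketch below uses a shortened excerpt of the slide's example and assumes a single `<target>` element as the root; the function name `parse_target` is hypothetical, and the real task files may wrap several targets in an outer element.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the example target shown on the slide.
TARGET_XML = """
<target id="9902" text="Time Magazine 2005 Person of the Year">
  <q id="9902.1" type="SquishyList">
    Why did readers support Time's inclusion of Bono for Person of the Year?
  </q>
  <doc id="BLOG06-20051222-014-0013437834" />
  <doc id="BLOG06-20051224-070-0016186787" />
</target>
"""


def parse_target(xml_text):
    """Return the target's id, text, questions, and document ids."""
    root = ET.fromstring(xml_text.strip())
    questions = [
        {
            "id": q.get("id"),
            "type": q.get("type"),
            # Collapse the whitespace around the question text.
            "text": " ".join((q.text or "").split()),
        }
        for q in root.findall("q")
    ]
    doc_ids = [d.get("id") for d in root.findall("doc")]
    return {
        "id": root.get("id"),
        "text": root.get("text"),
        "questions": questions,
        "docs": doc_ids,
    }


target = parse_target(TARGET_XML)
print(target["text"], len(target["questions"]), len(target["docs"]))
```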
Existing opinion corpus: Movie Review corpus
• Document level:
  • 1000 positive documents and 1000 negative documents
  • Problem: too coarse-grained
• Sentence level:
  • 5331 positive sentences and 5331 negative sentences
  • Problem: not enough data
We collected data from productreview.com.au and rateitall.com
• Fine-grained:
  • Productreview.com.au: each review has pros, cons, overall, and a rating
  • Rateitall.com: each review has a rating
• Large datasets:
  • Productreview.com.au: 2.4G
  • Rateitall.com: 2.0G
• http://wing.comp.nus.edu.sg/~hung/productreview/
• http://wing.comp.nus.edu.sg/~hung/rateitall/
Discourse Analysis
• Penn Discourse Treebank 2.0
• Based on PTB 2
• 18459 Explicit relations, 16053 Implicit relations
• TEMPORAL (950::3696)
  • Asynchronous (697::2090)
    • precedence
    • succession
  • Synchronous (251::1594)
• CONTINGENCY (4255::3417)
  • Cause (4172::2240)
    • reason
    • result
  • Pragmatic Cause (83::13)
    • justification
  • Condition (1::1416)
    • hypothetical
    • general
    • unreal present
    • unreal past
    • factual present
    • factual past
  • Pragmatic Condition (1::67)
    • relevance
    • implicit assertion
• COMPARISON (2503::5589)
  • Contrast (2120::3928)
    • juxtaposition
    • opposition
  • Pragmatic Contrast (4::32)
  • Concession (223::1213)
    • expectation
    • contra-expectation
  • Pragmatic Concession (1::15)
• EXPANSION (8861::6423)
  • Conjunction (3534::5320)
  • Instantiation (1445::302)
  • Restatement (3206::162)
    • specification
    • equivalence
    • generalization
  • Alternative (185::351)
    • conjunctive
    • disjunctive
    • chosen alternative
  • Exception (2::14)
  • List (400::250)
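The top-level counts above are unevenly distributed, which matters when judging a classifier against a majority-class guess. The sketch below copies the four class-level pairs in the order they appear on the slide and computes each class's share per column. The slide does not say which number in each pair is which, so the code refers to them only as column 0 and column 1; the function names are hypothetical.

```python
# Class-level pairs copied from the slide, in the order "(first::second)".
CLASS_COUNTS = {
    "TEMPORAL": (950, 3696),
    "CONTINGENCY": (4255, 3417),
    "COMPARISON": (2503, 5589),
    "EXPANSION": (8861, 6423),
}


def class_shares(counts, column):
    """Fraction of the column total that falls in each class."""
    total = sum(pair[column] for pair in counts.values())
    return {name: pair[column] / total for name, pair in counts.items()}


def majority_class(counts, column):
    """Class with the largest count in the given column."""
    return max(counts, key=lambda name: counts[name][column])


for column in (0, 1):
    shares = class_shares(CLASS_COUNTS, column)
    top = majority_class(CLASS_COUNTS, column)
    print(f"column {column}: majority class {top}, share {shares[top]:.3f}")
```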
• Marcu and Echihabi baseline
  • Used word pairs in a Naive Bayes model
• Wellner et al. baseline
  • Used 7 feature classes in total
  • Claimed that proximity and connective are the most useful feature classes
    • prox: 0.60
    • prox + conn: 0.7677
• I implemented only prox and conn in the baseline system
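As a rough illustration of the word-pair idea, the sketch below pairs every token of the first argument with every token of the second and feeds those pairs to a Naive Bayes classifier with add-one smoothing. It is a toy reconstruction under stated assumptions, not the published system: the class name `WordPairNB`, the whitespace tokenization, the smoothing choice, and the two training examples are all invented for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def word_pairs(arg1, arg2):
    """Cross product of the lowercased tokens of the two arguments."""
    return list(product(arg1.lower().split(), arg2.lower().split()))


class WordPairNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # smoothing constant (assumed)
        self.pair_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (arg1, arg2, label) triples."""
        for arg1, arg2, label in examples:
            self.label_counts[label] += 1
            for pair in word_pairs(arg1, arg2):
                self.pair_counts[label][pair] += 1
                self.vocab.add(pair)
        return self

    def predict(self, arg1, arg2):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.pair_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)        # log prior
            for pair in word_pairs(arg1, arg2):
                if pair in self.vocab:         # skip pairs never seen in training
                    score += math.log((counts[pair] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, only to show the call pattern.
TOY = [
    ("it rained", "the game was cancelled", "CONTINGENCY"),
    ("prices rose", "sales fell", "COMPARISON"),
]
model = WordPairNB().fit(TOY)
print(model.predict("prices rose", "sales fell"))
```

The prox and conn feature classes from the second baseline are not covered here, since the slide does not define how they are computed.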