This presentation describes the Pythy summarization system developed by Microsoft Research for DUC 2007. It covers an overview of the system, training methods, the feature inventory, ranking models, and dynamic scoring techniques.
The Pythy Summarization System: Microsoft Research at DUC 2007
Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki, and Lucy Vanderwende
Microsoft Research, April 26, 2007
DUC Main Task Results
• Automatic evaluations (30 participants)
• Human evaluations
• Did well on both measures
Overview of Pythy
• Linear sentence ranking model
• Learns to rank sentences based on:
  • ROUGE scores against model summaries
  • Semantic Content Unit (SCU) weights of sentences selected by past peers
• Considers simplified sentences alongside original sentences
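At its core, the ranker scores each candidate sentence with a linear model over its features. A minimal sketch of that scoring step (feature names here are illustrative, not the paper's):

```python
def sentence_score(weights, features):
    """Score a sentence as the dot product of a learned weight vector
    and its feature values, i.e. w . f(s). Features absent from the
    weight dictionary contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())
```

Ranking a document cluster then amounts to sorting its candidate sentences by this score.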
PYTHY training pipeline (diagram): documents → sentences and simplified sentences → feature inventory → ranking training against targets (ROUGE oracle, Pyramid/SCU, ROUGE ×2) → model
PYTHY testing pipeline (diagram): documents → sentences and simplified sentences → feature inventory → search with dynamic scoring using the trained model → summary
Sentence Simplification
• Extension of the simplification method used for DUC06
• Provides sentence alternatives rather than deterministically simplifying a sentence
• Uses syntax-based heuristic rules
• Simplified sentences are evaluated alongside the originals
• In DUC 2007:
  • Average new candidates generated: 1.38 per sentence
  • Simplified sentences generated for 61% of all sentences
  • Simplified sentences in final output: 60%
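To make the "alternatives, not replacements" idea concrete, here is a toy sketch. The paper's rules operate on syntax trees; the surface patterns below (dropping parentheticals and one comma-delimited appositive) are my own stand-ins, shown only to illustrate that each rule emits an extra candidate while keeping the original:

```python
import re

def simplify(sentence):
    """Return simplified *alternatives* for a sentence; the original is
    kept and competes with them. Toy surface heuristics only."""
    candidates = []
    # Rule 1 (illustrative): drop parenthetical material.
    no_parens = re.sub(r"\s*\([^)]*\)", "", sentence)
    if no_parens != sentence:
        candidates.append(no_parens)
    # Rule 2 (illustrative): drop a comma-delimited appositive.
    m = re.match(r"^([^,]+), [^,]+, (.*)$", sentence)
    if m:
        candidates.append(m.group(1) + " " + m.group(2))
    return candidates
```

The ranker then sees both the original and every generated alternative as separate candidates.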
Sentence-Level Features
• SumFocus features: SumBasic (Nenkova et al. 2006) + task focus
  • cluster frequency and topic frequency
  • only these were used in MSR's DUC06 system
• Other content-word unigrams: headline frequency
• Sentence length features (binary)
• Sentence position features (real-valued and binary)
• N-grams (bigrams, skip bigrams, multiword phrases)
• All tokens (topic and cluster frequency)
• Simplified sentences (binary indicator and ratio of relative length)
• Inverse document frequency (idf)
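As a hedged sketch of the two SumFocus-style features named first (feature names and the exact normalization are assumptions, not the paper's):

```python
def sumfocus_features(sentence, cluster_counts, topic_words):
    """Two SumBasic/SumFocus-style features: the average cluster-frequency
    probability of the sentence's words, and the fraction of its words
    that appear in the topic description."""
    words = sentence.lower().split()
    total = sum(cluster_counts.values())
    cluster_freq = sum(cluster_counts.get(w, 0) / total for w in words) / len(words)
    topic_freq = sum(w in topic_words for w in words) / len(words)
    return {"cluster_freq": cluster_freq, "topic_freq": topic_freq}
```

A real implementation would restrict these to content words, as the slide notes.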
Pairwise Ranking
• Define preferences over sentence pairs
  • derived from human summaries and SCU weights
• Log-linear ranking objective used in training
  • maximize the probability of choosing the better sentence from each pair of comparable sentences
• [Ofer et al. 03], [Burges et al. 05]
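A minimal sketch of the per-pair log-linear loss (the optimizer and feature sets are left out; this is only the objective's shape, cf. Burges et al. 05):

```python
import math

def pairwise_loss(w, f_better, f_worse):
    """-log sigmoid(w . f_better - w . f_worse): minimizing this
    maximizes the probability of preferring the better sentence of a
    comparable pair under a log-linear model."""
    margin = sum(wi * (fb - fw) for wi, fb, fw in zip(w, f_better, f_worse))
    return math.log(1.0 + math.exp(-margin))
```

A larger margin between the better and worse sentence yields a smaller loss.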
ROUGE Oracle Metric
• Find an oracle extractive summary
  • the summary with the highest average ROUGE-2 and ROUGE-SU4 scores
• All sentences in the oracle are considered "better" than any sentence not in the oracle
• An approximate greedy search is used to find the oracle summary
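The approximate greedy search can be sketched as follows, with `score` standing in for the average ROUGE-2/ROUGE-SU4 score against the model summaries (the stopping rule here is an assumption):

```python
def greedy_oracle(sentences, score, max_sents):
    """Greedily build an extractive oracle: repeatedly add the sentence
    that most improves the summary-level score, stopping when no
    sentence helps or the length budget is reached."""
    summary, remaining = [], list(sentences)
    while remaining and len(summary) < max_sents:
        best = max(remaining, key=lambda s: score(summary + [s]))
        if score(summary + [best]) <= score(summary):
            break
        summary.append(best)
        remaining.remove(best)
    return summary
```

Sentences inside the returned oracle then form preference pairs against every sentence outside it.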
Pyramid-Derived Metric
• University of Ottawa SCU-annotated corpus (Copeck et al. 06)
• Some sentences in the 05 & 06 document collections are:
  • known to contain certain SCUs
  • known not to contain any SCUs
• A sentence's score is the sum of the weights of all its SCUs
  • for un-annotated sentences, the score is undefined
• A sentence pair is used for training with s1 > s2 iff w(s1) > w(s2)
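Constructing the training pairs from SCU weights is straightforward; in this sketch `None` marks the undefined score of an un-annotated sentence, which is simply excluded from pairing:

```python
def scu_pairs(weights):
    """Build preference pairs from SCU-annotated sentences:
    s1 > s2 iff w(s1) > w(s2); sentences with undefined (None)
    scores contribute no pairs."""
    scored = [(s, w) for s, w in weights.items() if w is not None]
    return [(s1, s2) for s1, w1 in scored for s2, w2 in scored if w1 > w2]
```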
Model Frequency Metrics
• Based on unigram and skip-bigram frequency in the model summaries
• Computed for content words only
• Sentence si is "better" than sj if si's model-frequency score is higher
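A sketch of the unigram version of this score (the skip-bigram variant and the content-word filter are omitted; the averaging choice is an assumption):

```python
def model_freq_score(sentence, model_counts):
    """Average frequency, in the model summaries, of the words of a
    candidate sentence. Sentence si is preferred to sj when this
    score is higher."""
    words = sentence.split()
    if not words:
        return 0.0
    return sum(model_counts.get(w, 0) for w in words) / len(words)
```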
Combining Multiple Metrics
• From the ROUGE oracle: all sentences in the oracle summary are better than the other sentences
• From SCU annotations: sentences with higher average SCU weights are better
• From model frequency: sentences whose words occur in the model summaries are better
• Combined loss: add the losses from all metrics
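Since each metric contributes its own set of preference pairs, the combined objective is just the sum of the per-pair log-linear losses over all of them. A self-contained sketch (the equal weighting of metrics is an assumption):

```python
import math

def combined_loss(w, metric_pairs):
    """Sum pairwise log-losses over the pairs contributed by every
    target metric (ROUGE oracle, SCU weights, model frequency).
    Each pair is (features_of_better, features_of_worse)."""
    def pair_loss(fb, fw):
        margin = sum(wi * (b - c) for wi, b, c in zip(w, fb, fw))
        return math.log(1.0 + math.exp(-margin))
    return sum(pair_loss(fb, fw) for pairs in metric_pairs for fb, fw in pairs)
```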
Dynamic Sentence Scoring
• Eliminate redundancy by re-weighting
• Similar to SumBasic (Nenkova et al. 2006): re-weight given previously selected sentences
• Discounts apply to features that decompose into word-frequency estimates
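The SumBasic-style discount can be sketched in a few lines: after a sentence is selected, the probability of every word it covers is squared, so frequency-derived features favor novel content on the next pick.

```python
def reweight(word_probs, selected_sentence):
    """SumBasic-style redundancy discount: square (i.e. sharply reduce)
    the probability of every word appearing in an already-selected
    sentence; other words keep their probability."""
    seen = set(selected_sentence.lower().split())
    return {w: (p * p if w in seen else p) for w, p in word_probs.items()}
```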
Search
• The search constructs partial summaries and scores them
• The score of a summary does not decompose into an independent sum of sentence scores
• Global dependencies make exact search hard
• Multiple beams are used, one for each length of partial summary
• [McDonald 2007]
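A sketch of beam search with one beam per summary length (the beam width and measuring length in sentences rather than words are assumptions); note that `score` evaluates a whole partial summary, so sentence scores need not decompose independently:

```python
def beam_search(sentences, score, max_sents, beam_width=4):
    """Keep a separate beam of high-scoring partial summaries for each
    summary length; extend every kept partial by one unused sentence,
    then return the best partial seen at any length."""
    beams = {0: [[]]}  # length -> list of partial summaries
    for length in range(max_sents):
        candidates = []
        for partial in beams.get(length, []):
            for s in sentences:
                if s not in partial:
                    candidates.append(partial + [s])
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beams[length + 1] = candidates[:beam_width]
    return max((p for beam in beams.values() for p in beam), key=score)
```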
Impact of Sentence Simplification
• Trained on 05 data, tested on 06 data
Evaluating the Metrics
• Trained on 05 data, tested on 06 data
• Includes simplified sentences
Update Summarization Pilot
• SVM novelty classifier trained on the TREC 02 & 03 novelty tracks
Summary and Future Work
• Summary
  • Combination of different target metrics for training
  • Many sentence features
  • Pairwise ranking function
  • Dynamic scoring
• Future work
  • Boost robustness: the system is sensitive to cluster properties (e.g., size)
  • Improve grammatical quality of simplified sentences
  • Reconcile novelty and (ir)relevance
  • Learn features over whole summaries rather than individual sentences