300 likes | 315 Views
Understand Hidden Markov Modeling for summarization tasks, advantages of decomposition methods in summarization, paraphrasing operations, and studying human-generated summaries vs. automated processes. Investigation into improving summary generation techniques.
E N D
Summarization and Personal Information Management Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
Announcements • Questions? • Homework 3 will be assigned in 1 week • Until then: • Finalize your group and your plans • Collect readings and resources • Plan for Today • Hidden Markov Modeling • Jing, 2002 paper
Getting into Technology Problem Human Behavior * Hidden Markov Models Solution Design Technology Component Technology Tool for understanding the process of generating summaries Today’s focus Problem?
Hidden Markov Modeling • Different from typical markov models because states not directly observable • From one sequence of observations, more than one sequence of states is possible • Viterbi search is used at decoding time to identify the most likely sequence of states
Hidden Markov Modeling • Pattern • y1 y1 y1 y3 y4 • State Sequences • x1 x2 x1 x2 x1 • x1 x2 x1 x2 x3
Simplistic Summarization • Select a subset of sentences from the source document or documents • Present them in the same order in which they appeared in the source
Less Simplistic Summarization • Select a subset of sentences from the source document or documents • Paraphrase those sentences • Present them in the same or different order in which they appeared in the source
Advantages of Solving the Decomposition Problem • Gain insight into desirable generation techniques for summarization • They could have provided more analysis to this end • Automatically produce training data for extraction based summarization approaches
Paraphrase Operations • Sentence reduction • Sentence combination • Syntactic transformation • Lexical paraphrasing • Generalization or specification
Student Quote • They say "based on careful analysis of human-written summaries", which suggests that they sat in a room by themselves reading summaries and original texts, trying to figure out what human summarizers do. Why didn't they just go out and talk to some real people?
If someone asked you how you generate a summary what would you tell them?
Paraphrase Operations • Sentence reduction • Sentence combination • Syntactic transformation • Lexical paraphrasing • Generalization or specification • Can you think of another way of paraphrasing that is not mentioned here?
Sentence Reduction • Non-essential phrases are removed • What counts as non-essential?
Sentence Combination • Merge sentences, typically after reducing both • How you merge depends on overlap between sentences • When is it advantageous to merge?
Syntactic Transformation • Changing the syntactic structure • Which syntactic transformations are allowed? • Do these two sentences mean the same thing?
Lexical Paraphrasing • Replacing a phrase with something that means the same thing • “hits the nail on the head” versus “fit squarely into” • What counts as a lexical paraphrase?
Generalization or Specification • Similar to lexical paraphrasing
Problem Formulation • Identify the most likely position in the document (if any) of each summary word • Then apply the decomposition operations
Evaluations • Alignment • How accurately can this approach align summary sentences with document sentences • Only tests the HMM • Decomposition • Humans judged whether decomposition was correct • Only tests decomp operators • Portability evaluation – test of generality
Alignment • Used 10 documents paired with human written summaries • Other humans looked at the pairs and matched summary sentences to document sentences • Precision, Recall, and F-measure can be computed by comparing these extracts with the automatic ones • Error analysis: problems with creative rewordings or when irrelevant sentences contain summary words
Decomposition • 50 summaries from telecommunications corpus • Ran decomposition program • 93.8% of sentences were correctly decomposed • Seems like a weak definition of correct decomposition • Correct pairing between sentences • Correctly identified where phrases came from
Portability • Test on a new type of data • Performed well
But what did we learn about how humans generate summaries? • Analyzed 300 human written summaries • 19% of summary sentences did not have a matching sentence • 42% matched a single sentence • Often along with sentence reduction • 36% were created by combining 2 or 3 sentences • 3% created by combining more than that
What would be interesting next steps?