
Summarization and Personal Information Management



Presentation Transcript


  1. Summarization and Personal Information Management Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. Announcements • Questions? • Homework 2 assigned today and due in 1 week • Plan for Today • Hyland Chapter • Hidden Markov Modeling • Jing, 2002 paper

  3. Getting into Technology • [Slide diagram: Problem (Human Behavior) → Solution Design → Technology Component] • Today’s focus: Hidden Markov Models, a technology tool for understanding the process of generating summaries

  4. Hyland Chapter

  5. Always know what problem you are trying to solve!!!

  6. According to Hyland, what is the problem that abstracts solve?

  7. What do we get from Hyland? • Methodology for understanding how humans write abstracts • Important: Acknowledgement of the social context in which abstracts are written • Mixed methods: interviews, rhetorical analysis, comparative analysis, interpretation • Not shown: collocational analysis

  8. Hyland’s Coding Scheme • Note that selected rhetorical strategy says something about what the writer assumes about the audience • What do you remember from that?

  9. What’s your analysis: • 1 – Purpose • 2 – Introduction, Purpose+Method, Method • 3 – Product • 4 – Product • 5 – Product • 6 – Product, Conclusion

  10. What’s your analysis:

  11. What’s your analysis: • 7 – Purpose, Introduction, Method • 8 – Purpose, Introduction, Conclusion • 9 – Purpose, Method, Introduction • 10 – Product, Introduction, Conclusion

  12. What’s your analysis:

  13. Comparison Across Genres

  14. Comparison Across Genres

  15. Quotes

  16. Change over time

  17. Change over time

  18. Homework Two • Taking into account the feedback you received on Assignment 1, refine the focus of your term project • State the problem you are now trying to solve • Assignment 2 focuses on rhetorical analysis • Find some data to work with: for the assignment you’ll need 3 examples of what you are trying to summarize. This can be 3 documents or 3 collections of documents • Design a coding scheme like Hyland’s and do a rhetorical analysis of your data. If you are working on 3 collections of documents, just analyze a sample; you don’t have to analyze all 3 collections in full. • Now, based on your rhetorical analysis, “generate” by hand the summary you think you should get from your 3 examples • Finally, argue why you think this summary should “solve” the problem you set out to solve

  19. Hidden Markov Modeling

  20. Hidden Markov Modeling • Different from typical Markov models because the states are not directly observable • From one sequence of observations, more than one sequence of states is possible • Viterbi search is used at decoding time to identify the most likely sequence of states

  21. Hidden Markov Modeling • Pattern • y1 y1 y1 y3 y4 • State Sequences • x1 x2 x1 x2 x1 • x1 x2 x1 x2 x3
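The Viterbi decoding mentioned above can be sketched in a few lines. The probabilities below are purely illustrative (the slide's P1–P6 values are not given); the states x1, x2 and the observation pattern follow the slide, with a two-state model for brevity.

```python
# Minimal Viterbi decoder for a toy HMM. All probabilities here are
# made up for illustration; they are not from the lecture.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return list(reversed(path))

states = ["x1", "x2"]
start = {"x1": 0.6, "x2": 0.4}
trans = {"x1": {"x1": 0.7, "x2": 0.3},
         "x2": {"x1": 0.4, "x2": 0.6}}
emit = {"x1": {"y1": 0.5, "y3": 0.3, "y4": 0.2},
        "x2": {"y1": 0.1, "y3": 0.4, "y4": 0.5}}

path = viterbi(["y1", "y1", "y1", "y3", "y4"], states, start, trans, emit)
```

For these toy numbers the decoder settles on `['x1', 'x1', 'x1', 'x2', 'x2']`: many state sequences could have produced the observations, but Viterbi recovers the single most likely one.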

  22. Question from Nitin • From what I have understood, assigning probability values to the transition of states (P1-P6) is experimental.

  23. Resources in Wikipedia

  24. Jing, 2002

  25. Simplistic Summarization • Select a subset of sentences from the source document or documents • Present them in the same order in which they appeared in the source
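A minimal sketch of this "simplistic" extractive approach, assuming naive whitespace tokenization and word-frequency scoring (both are my simplifications, not a method from the readings):

```python
from collections import Counter

def extract_summary(sentences, k=2):
    """Pick the k highest-scoring sentences, preserving source order."""
    # Score each sentence by the average corpus frequency of its words.
    freq = Counter(w.lower() for s in sentences for w in s.split())
    scores = [sum(freq[w.lower()] for w in s.split()) / len(s.split())
              for s in sentences]
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]

doc = ["cats like fish", "dogs bark", "cats chase mice", "cats sleep a lot"]
summary = extract_summary(doc, k=2)
```

Note that the final `sorted(top)` is what implements the second bullet above: the selected sentences are presented in the order they appeared in the source.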

  26. Less Simplistic Summarization • Select a subset of sentences from the source document or documents • Paraphrase those sentences • Present them in the same or different order in which they appeared in the source

  27. Advantages of Solving the Decomposition Problem • Gain insight into desirable generation techniques for summarization • They could have provided more analysis to this end • Automatically produce training data for extraction based summarization approaches

  28. Paraphrase Operations • Sentence reduction • Sentence combination • Syntactic transformation • Lexical paraphrasing • Generalization or specification

  29. Student Quote from Last Time • They say "based on careful analysis of human-written summaries", which suggests that they sat in a room by themselves reading summaries and original texts, trying to figure out what human summarizers do. Why didn't they just go out and talk to some real people?

  30. Sentence Reduction • Non-essential phrases are removed • What counts as non-essential?

  31. Sentence Combination • Merge sentences, typically after reducing both • How you merge depends on overlap between sentences • When is it advantageous to merge?

  32. Syntactic Transformation • Changing the syntactic structure • Which syntactic transformations are allowed? • Do these two sentences mean the same thing?

  33. Lexical Paraphrasing • Replacing a phrase with something that means the same thing • “hits the nail on the head” versus “fit squarely into” • What counts as a lexical paraphrase?

  34. Generalization or Specification • Similar to lexical paraphrasing, but the replacement is broader or narrower in meaning than the original phrase

  35. Problem Formulation • Identify the most likely position in the document (if any) of each summary word • Then apply the decomposition operations
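A much-simplified sketch of that formulation (not Jing's actual model: her HMM uses transition probabilities and Viterbi decoding, whereas this greedy version just prefers document positions adjacent to the previous match):

```python
# Toy summary-word alignment: for each summary word, list its candidate
# positions in the document, then greedily prefer the position that
# continues the previous word's position. A stand-in for HMM decoding.

def align(summary_words, doc_words):
    positions = []
    prev = -1
    for w in summary_words:
        cands = [i for i, d in enumerate(doc_words) if d == w]
        if not cands:
            positions.append(None)   # word does not appear in the document
            continue
        # Prefer the candidate closest to the slot right after the previous match.
        best = min(cands, key=lambda i: abs(i - (prev + 1)))
        positions.append(best)
        prev = best
    return positions

doc = "the cat sat on the mat".split()
summ = "the cat on the mat".split()
positions = align(summ, doc)
```

The repeated word "the" resolves to different document positions (0, then 4) depending on context, which is exactly why positions are modeled as a sequence rather than word by word.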

  36. Example

  37. Remember: Connection with Statistical MT

  38. Evaluations • Alignment • How accurately can this approach align summary sentences with document sentences? • Only tests the HMM • Decomposition • Humans judged whether the decomposition was correct • Only tests the decomposition operators • Portability evaluation: a test of generality

  39. Alignment • Used 10 documents paired with human-written summaries • Other humans looked at the pairs and matched summary sentences to document sentences • Precision, recall, and F-measure can be computed by comparing these extracts with the automatic ones • Error analysis: problems arise with creative rewordings or when irrelevant sentences contain summary words
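The precision/recall/F-measure comparison described above reduces to set overlap between the human extract and the automatic one. The sentence IDs in the example are hypothetical:

```python
def prf(gold, predicted):
    """Precision, recall, and F1 for two sets of selected sentence IDs."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical: the human aligned a summary sentence to document
# sentences {3, 7}; the automatic aligner chose {3, 8}.
p, r, f = prf({3, 7}, {3, 8})
```

Here one of two predicted sentences is correct and one of two gold sentences is found, so precision, recall, and F1 all come out to 0.5.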

  40. Alignment

  41. Decomposition • 50 summaries from a telecommunications corpus • Ran the decomposition program • 93.8% of sentences were correctly decomposed • Seems like a weak definition of correct decomposition • Correct pairing between sentences • Correctly identified where phrases came from

  42. Portability • Test on a new type of data • Performed well

  43. But what did we learn about how humans generate summaries? • Analyzed 300 human-written summaries • 19% of summary sentences did not have a matching document sentence • 42% matched a single sentence • Often along with sentence reduction • 36% were created by combining 2 or 3 sentences • 3% were created by combining more than that

  44. What would be interesting next steps?

  45. Idea from Nitin • As we have seen from the Hyland chapter, abstracts tend to implicitly map the meta-discourse structure of the entire document (P-M-Pr, etc.). We could use this structure in the heuristic to assign probabilities according to the document position of a word, e.g., coming from the introduction section versus the methods section. This would allow the HMM to model the transition probabilities more realistically, accommodating information about the discourse structure of the original document.

  46. Questions?
