
Summarization and Personal Information Management

Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute


  1. Summarization and Personal Information Management Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. Announcements • Questions? • Plan for Today • Paice article on Cue Phrases • Next time we’ll again talk about old work, but then starting next Tuesday we’ll get into more recent techniques • Critique of Summary design

  3. Quote from paper • The alternative of picking sentences from here and there in a document is an unattractive proposition

  4. Paice on Cue Phrases • Luhn 1958: consider certain words as keywords, and select sentences with a high density of them • Baxendale 1958: position of a sentence in a document is an important factor
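The keyword-density idea attributed to Luhn above can be sketched as follows. This is a simplified illustration, not Luhn's original significance factor: it assumes keywords are simply the most frequent non-stopword tokens and scores a sentence by the fraction of its tokens that are keywords. The stopword list, `top_k`, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "are", "it", "on"}

def tokenize(text):
    """Lowercase a string and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_density_scores(sentences, top_k=3):
    """Score each sentence by the fraction of its tokens that are keywords.

    Keywords are the top_k most frequent non-stopword tokens in the document.
    """
    counts = Counter(
        tok for s in sentences for tok in tokenize(s) if tok not in STOPWORDS
    )
    keywords = {word for word, _ in counts.most_common(top_k)}
    scores = []
    for s in sentences:
        toks = tokenize(s)
        hits = sum(1 for t in toks if t in keywords)
        scores.append(hits / len(toks) if toks else 0.0)
    return keywords, scores

doc = [
    "Summarization selects sentences from a document.",
    "Keywords mark sentences that matter for summarization.",
    "The weather is nice.",
]
keywords, scores = keyword_density_scores(doc, top_k=2)
# Index of the sentence with the highest keyword density.
best = max(range(len(doc)), key=scores.__getitem__)
```

An extractor built on this would keep the highest-scoring sentences; sentence position (Baxendale's factor) is not modeled here.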

  5. Paice on Cue Phrases • Edmundson 1969: compared different strategies • Location method: sentences at the beginning or end of the paper, the first sentence in a paragraph, or the sentence right under a significant heading • Cue method: used cue phrases that mark important sentences • Key method: the Luhn 1958 approach • Title method: weighted words higher if they appeared in the title or in a significant heading
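One way to picture how the four methods on this slide could be compared or combined is a weighted sum of four per-sentence scores. The sketch below is only an illustration under stated assumptions: the cue list, the equal default weights, and the reduced location rule (first or last sentence of the document only, ignoring paragraphs and headings) are all hypothetical simplifications.

```python
import re

# Hypothetical cue list, for illustration only.
CUE_PHRASES = ("in this paper", "we conclude", "our results")

def words(text):
    """Return the set of lowercase alphabetic tokens in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))

def combined_score(sentence, position, n_sentences, title, keywords,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine cue, key, title, and location evidence as a weighted sum."""
    w_cue, w_key, w_title, w_loc = weights
    toks = words(sentence)
    lowered = sentence.lower()
    cue = sum(1 for phrase in CUE_PHRASES if phrase in lowered)
    key = len(toks & keywords)
    title_overlap = len(toks & words(title))
    # Simplified location rule: bonus for the first or last sentence only.
    location = 1 if position in (0, n_sentences - 1) else 0
    return w_cue * cue + w_key * key + w_title * title_overlap + w_loc * location

title = "Automatic abstracting with cue phrases"
sentences = [
    "In this paper we describe automatic abstracting.",
    "Some background follows.",
    "We conclude that cue phrases help.",
]
keywords = {"abstracting", "cue"}
scores = [
    combined_score(s, i, len(sentences), title, keywords)
    for i, s in enumerate(sentences)
]
```

Setting one weight to 1 and the others to 0 isolates a single method, which is a convenient way to compare the strategies on the same text.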

  6. Paice on Cue Phrases • Earl 1970: investigated whether syntax made predictions, but her syntactic patterns were too specific and she didn’t get any generalization • Skorokhod’ko 1972: pointed out that different genres structure their texts differently, and so the approach needs to vary from one genre to another • Taylor 1977: construct a semantic network from the text and then generate a summary from a maximally connected subnetwork

  7. Paice on Cue Phrases • Rush et al., 1971 and Pollock and Zamora, 1975: mark words with status as likely to be important or not, and remove the sentences most likely to be unimportant • Care taken to avoid dangling references • Trim off extraneous text • Aggregate sentences where possible to remove redundancies • Karasev 1978: similar cue phrase approach

  8. Types of Abstracts • Indicative Abstracts: just tell you what the article is about • Informative Abstracts: give you an overview of the content • Critical and Comparative Abstracts: like a book review, etc.

  9. Exophoric Links • These are links that show that two sentences “go together” • Extracts that include chunks of text where the sentences were adjacent already in the initial text will be more coherent

  10. 4 Stage Process • Identify and weight indicator phrases • Aggregate regions of text with exophoric references • Most highly weighted aggregates are selected • Texts are trimmed to remove extraneous text, etc. • ** Uses techniques from prior work, but only first two have been evaluated
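Stages two and three of the process above can be sketched roughly as follows. The sketch assumes that stage one has already produced a weight per sentence and that a separate step has flagged which sentences refer back to their predecessor. Summing weights within an aggregate and the parameter `n_aggregates` are assumptions for the example, and the trimming stage is left out.

```python
def aggregate(weights, exophoric):
    """Group sentence indices so that each flagged sentence joins the
    aggregate of the sentence before it (backward references only)."""
    groups = []
    for i, flag in enumerate(exophoric):
        if flag and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    # Pair each group with its total weight (summing is an assumption).
    return [(sum(weights[i] for i in g), g) for g in groups]

def select(weights, exophoric, n_aggregates=1):
    """Pick the n highest-weighted aggregates; return indices in text order."""
    ranked = sorted(aggregate(weights, exophoric),
                    key=lambda pair: pair[0], reverse=True)
    chosen = [i for _, group in ranked[:n_aggregates] for i in group]
    return sorted(chosen)

# Hypothetical indicator weights and reference flags for five sentences.
weights = [0, 3, 1, 0, 2]
exophoric = [False, False, True, False, False]
extract = select(weights, exophoric, n_aggregates=2)
```

Because sentence 2 depends on sentence 1, the two are selected or dropped together, which is what keeps the extract from containing a dangling reference.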

  11. Notes on Cue Phrases • If they were listed exhaustively, there would be several thousand • We use templates that represent “paradigm cases” • Work like “semantic grammars” • The actual phrases typically include some extra “fluff”, so each template comes with a “skip limit” • Stemming also helps • Some words in a template may carry more weight
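The template idea on this slide (paradigm words in order, a skip limit for the "fluff" between them, stemming, and per-word weights) might look roughly like this. It is a sketch under assumptions: `crude_stem` is a stand-in for a real stemmer, the matcher is greedy (it takes the first occurrence of each word inside the window), and the template and weights are invented for the example.

```python
def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def match_template(tokens, template, skip_limit):
    """Return the summed weight if the template matches, else 0.

    template is a list of (word, weight) pairs that must occur in order,
    with at most skip_limit unmatched tokens between consecutive items.
    """
    stems = [crude_stem(t.lower()) for t in tokens]
    wanted = [(crude_stem(w.lower()), weight) for w, weight in template]
    first_stem, first_weight = wanted[0]
    for start, stem in enumerate(stems):
        if stem != first_stem:
            continue
        pos, total, matched = start, first_weight, True
        for want, weight in wanted[1:]:
            # Tokens at pos+1 .. pos+1+skip_limit are within reach.
            window = stems[pos + 1 : pos + 2 + skip_limit]
            if want not in window:
                matched = False
                break
            pos = pos + 1 + window.index(want)
            total += weight
        if matched:
            return total
    return 0

tokens = "This paper briefly presents a new method".split()
# Hypothetical template: "paper ... present ... method" with word weights.
template = [("paper", 1), ("present", 2), ("method", 2)]
loose = match_template(tokens, template, skip_limit=2)
strict = match_template(tokens, template, skip_limit=1)
```

With a skip limit of 2 the two filler words "a new" are tolerated and the template matches; with a limit of 1 they are not, so the same sentence is rejected.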

  12. Aggregating Sentences • Sentences more strongly related to other sentences within the same paragraph • Less to adjacent paragraphs • Even less to those in more distant paragraphs
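The ordering described on this slide can be expressed as a weight that falls off with paragraph distance. The specific numbers below are assumptions; the only property taken from the slide is that the same paragraph scores higher than an adjacent one, which scores higher than a more distant one.

```python
def paragraph_affinity(par_a, par_b, same=1.0, adjacent=0.5, decay=0.5):
    """Illustrative relatedness weight for sentences in two paragraphs.

    par_a and par_b are paragraph indices; the default values are
    arbitrary and only preserve the ordering same > adjacent > distant.
    """
    distance = abs(par_a - par_b)
    if distance == 0:
        return same
    return adjacent * (decay ** (distance - 1))

same_par = paragraph_affinity(2, 2)
next_par = paragraph_affinity(2, 3)
far_par = paragraph_affinity(2, 5)
```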

  13. Exophoric References • Reference resolution is necessary • “this” in “this paper” (not exophoric since it refers to the paper rather than something in the paper) • We did … and this was a good thing (this not exophoric because it’s resolved within the sentence) • Cataphora versus anaphora: both exophoric, but point in different directions • Discourse connectives such as “First”, “However”, “Moreover” are also exophoric
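A very rough heuristic for the cases listed on this slide could look like the sketch below. It is not real reference resolution: it only checks the first one or two tokens of a sentence. The connectives come from the slide, while the demonstrative list and the list of self-referential nouns are assumptions made for the example.

```python
import re

CONNECTIVES = {"first", "however", "moreover"}           # from the slide
DEMONSTRATIVES = {"this", "these", "that", "those"}      # assumed list
SELF_REFERENTIAL_NOUNS = {"paper", "article", "report"}  # assumed list

def looks_exophoric(sentence):
    """Heuristically flag a sentence that seems to depend on its neighbours."""
    toks = re.findall(r"[a-z]+", sentence.lower())
    if not toks:
        return False
    if toks[0] in CONNECTIVES:
        return True
    # A sentence-initial demonstrative has no antecedent inside the sentence,
    # unless it points at the document itself ("this paper").
    if toks[0] in DEMONSTRATIVES:
        return len(toks) < 2 or toks[1] not in SELF_REFERENTIAL_NOUNS
    return False

examples = [
    "However, the method fails on long texts.",
    "This paper describes a method.",
    "This was a good thing.",
    "We ran the test and this was a good thing.",
]
flags = [looks_exophoric(s) for s in examples]
```

The last example is left unflagged because the demonstrative is not sentence-initial, which loosely mirrors the slide's point that a reference resolved inside the sentence does not count.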

  14. Neutralizers • References to figures and tables • References to other documents • An algorithm of this kind is found in (XXX, 1980) • What makes references hard is that people have such a variety of styles of using them, not all of which conform to any standard of “correctness”

  15. Student Comment • I knew the facts the automatic abstract process is extracting, but they were not necessarily the most salient facts for me when I selected this paper as important for reading. Surprisingly, very little of what made the paper interesting to me in the first place was captured in the abstract by this technique.

  16. Student Comment • I hypothesize this is because this paper contained lots of useful "background facts" which were relevant, while the specific test results were of only secondary importance. If I already knew everything about the subject, the abstracts would probably contain the information that I was most interested in. This could be an example of different perspectives on a document.

  17. Student Comment • After reading the paper on literature abstract generation, I feel that most of the techniques cannot be generalized over a long period of time as the scientific jargon changes. Some of the templates might not be able to keep up with the ever evolving method of scientific article meta-discourse. • How much time was there between when the paper was written and when your example paper was written?

  18. Student Comment • Cue phrases worked pretty well. In fact all the sentences in the abstract matched the templates as presented in the paper. I could still find the cue phrases in other parts of the paper. Hence the generated abstract would contain almost all the sentences of the original abstract. • But what’s the problem?

  19. Student Comment • The cue phrase technique has worked well since the list seems quite exhaustive. The original abstract mentions almost the same points as this abstract. There is a problem in tense usage and repetition of cue phrases. Another major point is that the original abstract has a natural flow of sentences while this one lacks good coherence.

  20. Critique

  21. Homework 1 (Due Jan 25, 8pm) • 1 page write-up, posted to Drupal • Feel free to post comments in response to write-ups submitted by your classmates • Select one of the Grand Challenges • Describe the scenario you are targeting • What is the main problem in connection with information overload here? • What is your proposed solution and why do you think it will work? • Mock up an example summary to illustrate your idea

  22. Critique • Feedback from your peers should help you decide how to formulate your project proposal • Due one week from today • Same format as homework assignment

  23. Questions?
