Speech Summarization Julia Hirschberg (thanks to Sameer Maskey for some slides) CS4706
Summarization as Distillation • '…the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks)' [Mani and Maybury, 1999] • Why summarize? Too much data!
Types of Summarization • Indicative • Describes the document and its contents • Informative • 'Replaces' the document • Extractive • Concatenates pieces of the existing document • Generative • Creates a new document • Document compression
Some Summarization Techniques Based on Text (Lexical Features) • [Salton et al., 1995] Sentence extraction with similarity measures • [McKeown et al., 2001] Extraction trained with manual summaries • [Hovy & Lin, 1999] Concept-level extraction of concept units • [Witbrock & Mittal, 1999] Generation of words/phrases • [Maybury, 1995] Use of structured data
Sentence Extraction/Similarity Measures [Salton et al., 1995] • Extract sentences by their similarity to a topic sentence and their dissimilarity to sentences already in the summary (Maximal Marginal Relevance) • Similarity measures • Cosine measure (sketched below) • Vocabulary overlap • Topic word overlap • Content signatures overlap
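A minimal sketch of the cosine measure from the list above, over raw bag-of-words counts (a real system would typically use tf-idf weights; the example sentences are invented for illustration):

```python
from collections import Counter
import math

def cosine_similarity(sent_a, sent_b):
    """Bag-of-words cosine similarity between two tokenized sentences."""
    va, vb = Counter(sent_a), Counter(sent_b)
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

topic = "cattle industry sues oprah winfrey over beef remarks".split()
candidate = "texas cattle producers sued winfrey over her beef remarks".split()
print(round(cosine_similarity(topic, candidate), 2))  # -> 0.59, moderately similar
```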
Concept/Content-Level Extraction [Hovy & Lin, 1999] • Present keywords as the summary • Build concept signatures by finding relevant words in 30,000 WSJ documents, each categorized into different topics • Phrase concatenation of relevant concepts/content • Sentence planning for generation
Feature-Based Statistical Models [Kupiec, et al., 1995] • Create manual summaries • Extract features • Train a statistical model using various ML techniques • Use the trained model to score each sentence in the test data • Extract the N highest-scoring sentences • Under a naive Bayes independence assumption, the probability that sentence s belongs in summary S given its k features is P(s ∈ S | F1, …, Fk) = P(s ∈ S) ∏j P(Fj | s ∈ S) / ∏j P(Fj), where P(Fj) and P(Fj | s ∈ S) can be computed by counting occurrences in the training corpus
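A minimal sketch of a Kupiec-style scorer, assuming binary features and sentences already labeled against aligned manual summaries (the feature functions at the bottom are invented illustrations):

```python
import math

def train(sentences, in_summary, feature_fns):
    """Estimate P(F_j) and P(F_j | s in S) by counting, with add-one smoothing."""
    n, n_pos = len(sentences), sum(in_summary)
    p_summary = n_pos / n
    stats = []
    for f in feature_fns:
        fired = [f(s) for s in sentences]
        p_f = (sum(fired) + 1) / (n + 2)
        p_f_pos = (sum(v for v, y in zip(fired, in_summary) if y) + 1) / (n_pos + 2)
        stats.append((p_f, p_f_pos))
    return p_summary, stats

def score(sentence, p_summary, stats, feature_fns):
    """log P(s in S | F1..Fk) via the naive Bayes formula on the slide."""
    log_p = math.log(p_summary)
    for f, (p_f, p_f_pos) in zip(feature_fns, stats):
        if f(sentence):
            log_p += math.log(p_f_pos / p_f)
        else:  # binary features: use the complements when the feature is absent
            log_p += math.log((1 - p_f_pos) / (1 - p_f))
    return log_p

# Illustrative binary features (invented): sentence length and a cue phrase
feature_fns = [
    lambda s: len(s.split()) > 10,
    lambda s: "in conclusion" in s.lower(),
]
```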
Structured Database [Maybury, 1995] • Summarize text represented in structured form: database, templates • E.g., generation of a medical history from a database of medical 'events' • Link analysis (semantic relations within the structure) • Domain-dependent importance of events
Comparing Speech and Text Summarization • Alike • Identifying important information • Some lexical, discourse features • Extraction or generation or compression • Different • Speech Signal • Prosodic features • NLP tools? • Segments – sentences? • Generation? • Errors • Data size
Text vs. Speech Summarization (News) • Text • Error-free manual transcript • Lexical features • Segmentation into sentences • NLP tools • Speech • Speech signal over varied channels (phone, remote satellite, station) • Transcripts from ASR or closed captions • Many speakers with different speaking styles • Only some lexical features • Structure: anchor/reporter interaction, story presentation style • Prosodic features: pitch, energy, duration • Commercials, weather reports
Speech Summarization Today • Mostly extractive: • Words, sentences, content units • Some compression methods • Generation-based summarization is difficult • Output as text or synthesized speech?
Generation or Extraction? • SENT27 a trial that pits the cattle industry against tv talk show host oprah winfrey is under way in amarillo , texas. • SENT28 jury selection began in the defamation lawsuit began this morning . • SENT29 winfrey and a vegetarian activist are being sued over an exchange on her April 16, 1996 show . • SENT30 texas cattle producers claim the activists suggested americans could get mad cow disease from eating beef . • SENT31 and winfrey quipped , this has stopped me cold from eating another burger • SENT32 the plaintiffs say that hurt beef prices and they sued under a law banning false and disparaging statements about agricultural products • SENT33 what oprah has done is extremely smart and there's nothing wrong with it she has moved her show to amarillo texas , for a while • SENT34 people are lined up , trying to get tickets to her show so i'm not sure this hurts oprah . • SENT35 incidentally oprah tried to move it out of amarillo . she's failed and now she has brought her show to amarillo . • SENT36 the key is , can the jurors be fair • SENT37 when they're questioned by both sides, by the judge , they will be asked, can you be fair to both sides • SENT38 if they say , there's your jury panel • SENT39 oprah winfrey's lawyers had tried to move the case from amarillo , saying they couldn't get an impartial jury • SENT40 however, the judge moved against them in that matter …
Speech Summarization Techniques • [Christensen et al., 2004] Sentence extraction with similarity measures • [Hori C. et al., 1999, 2002; Hori T. et al., 2003] Word scoring with dependency structure • [Koumpis & Renals, 2004] Classification • [He et al., 1999] User access information • [Zechner, 2001] Removing disfluencies • [Hori T. et al., 2003] Weighted finite-state transducers
Content/Context Sentence-Level Extraction for Speech Summaries [Christensen et al., 2004] • Find sentences similar to the lead topic sentences • Use position features to find relevant nearby sentences after detecting a topic sentence • Score each candidate as score(s) = λ · Sim(s, D) − (1 − λ) · max over e ∈ E of Sim(s, e), where Sim is a similarity measure between two sentences or between a sentence and the document D, and E is the set of sentences already in the summary • Choose the new sentence that is most like D and most different from E (sketched below)
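A minimal sketch of that greedy selection loop, reusing the bag-of-words cosine_similarity function from earlier as the Sim measure (λ and the summary length are illustrative choices):

```python
def mmr_extract(sentences, n_select, lam=0.7):
    """Greedy selection: repeatedly pick the sentence most similar to the
    whole document D and least similar to the summary-so-far E."""
    doc = [w for s in sentences for w in s]  # D: all words in the document
    summary = []                             # E: sentences already chosen
    remaining = list(sentences)
    while remaining and len(summary) < n_select:
        def mmr_score(s):
            redundancy = max((cosine_similarity(s, e) for e in summary),
                             default=0.0)
            return lam * cosine_similarity(s, doc) - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        summary.append(best)
        remaining.remove(best)
    return summary
```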
Weighted Finite-State Transducers for Speech Summarization [Hori T. et al., 2003] • Speech recognition, paraphrasing, and sentence compaction are integrated into a single weighted finite-state transducer (WFST) • The decoder can use all knowledge sources in a one-pass strategy • Speech recognition as WFST composition H ∘ C ∘ L ∘ G, where H is the state network of triphone HMMs, C is the triphone connection rules, L is the pronunciation lexicon, and G is the trigram language model • Paraphrasing can be viewed as a kind of machine translation with translation probability P(W|T), where W is the source language and T is the target language • If S is the WFST representing translation rules and D is the language model of the target language, speech summarization is the composition H ∘ C ∘ L ∘ G ∘ S ∘ D (speech recognizer ∘ translator = speech translator)
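The actual composition requires a WFST toolkit; the effect of the final S ∘ D stage can be approximated sequentially: apply weighted paraphrase rules (S), then rescore the candidates with a target-side language model (D). A toy sketch of that simplification, not the authors' implementation, with invented rule and LM tables:

```python
# S: weighted paraphrase/compaction rules, source phrase -> (target, cost)
RULES = {
    ("is", "under", "way"): (("underway",), 0.2),
    ("at", "this", "point", "in", "time"): (("now",), 0.1),
}

# D: toy target-side bigram log-probabilities (invented)
BIGRAM_LOGP = {("trial", "underway"): -0.5, ("trial", "is"): -1.5}

def apply_rules(words):
    """Generate (candidate, rule_cost) pairs by applying at most one rule."""
    yield list(words), 0.0
    for i in range(len(words)):
        for src, (tgt, cost) in RULES.items():
            if tuple(words[i:i + len(src)]) == src:
                yield words[:i] + list(tgt) + words[i + len(src):], cost

def lm_logp(words):
    return sum(BIGRAM_LOGP.get((a, b), -3.0) for a, b in zip(words, words[1:]))

def best_paraphrase(words):
    # combine rule cost (S) and target LM score (D), as S ∘ D would
    return max(apply_rules(words), key=lambda c: lm_logp(c[0]) - c[1])[0]

print(best_paraphrase("the trial is under way".split()))
# -> ['the', 'trial', 'underway']
```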
User Access Identifies What to Include [He et al., 1999] • Summarize lectures or shows by extracting the parts that have been viewed the longest • Needs multiple users of the same show, meeting, or lecture for training • E.g., to summarize lectures, compute the time spent on each slide (sketched below) • A summarizer based on user access logs did as well as summarizers that used linguistic and acoustic features • Average score of 4.5 on a scale of 1 to 8 for the summarizer (subjective evaluation)
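A minimal sketch of that idea, assuming access logs recorded as (user, slide_id, seconds_viewed) tuples (the log format and example data are invented):

```python
from collections import defaultdict

def summarize_by_access(logs, n_slides):
    """Rank slides by total viewing time across all users; return the top N."""
    total_time = defaultdict(float)
    for user, slide_id, seconds in logs:
        total_time[slide_id] += seconds
    ranked = sorted(total_time, key=total_time.get, reverse=True)
    return sorted(ranked[:n_slides])  # present chosen slides in original order

logs = [("u1", 3, 120.0), ("u2", 3, 90.0), ("u1", 7, 200.0), ("u2", 1, 15.0)]
print(summarize_by_access(logs, 2))  # -> [3, 7]
```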
Word-Level Extraction by Scoring/Classifying Words [Hori C. et al., 1999, 2002] • Score each word in the sentence and extract a set of words forming a sentence whose total score is the product of the word probabilities (i.e., the sum of per-word log scores) • Component scores: • Word significance score I (topic words) • Linguistic score L (bigram probability) • Confidence score C (from ASR) • Word concatenation score T (dependency structure grammar) • The extracted word sequence v1 … vM maximizes Σm [ L(vm | vm−1) + λI I(vm) + λC C(vm) + λT T(vm−1, vm) ], where M is the number of words to be extracted and λI, λC, λT are weighting factors for balancing among L, I, C, and T (a dynamic-programming sketch follows)
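A simplified dynamic-programming sketch of that search, assuming the per-position significance (I) and confidence (C) scores and the pairwise bigram (L) and dependency (T) scorers are supplied by a topic model, the recognizer, a language model, and a dependency grammar; sentence-boundary terms are omitted for brevity:

```python
def extract_words(words, M, sig, conf, bigram_logp, dep_logp,
                  lam_i=1.0, lam_c=1.0, lam_t=1.0):
    """Pick M of the sentence's words (order preserved) maximizing the summed
    score L + lam_i*I + lam_c*C + lam_t*T, via dynamic programming.
    sig/conf are per-position scores; bigram_logp/dep_logp score word pairs."""
    n = len(words)
    M = min(M, n)
    NEG = float("-inf")
    # best[m][i]: best score of an m-word extraction ending at position i
    best = [[NEG] * n for _ in range(M + 1)]
    back = [[None] * n for _ in range(M + 1)]
    for i in range(n):
        best[1][i] = lam_i * sig[i] + lam_c * conf[i]
    for m in range(2, M + 1):
        for i in range(m - 1, n):
            word_score = lam_i * sig[i] + lam_c * conf[i]
            for j in range(m - 2, i):  # previous extracted word at position j
                if best[m - 1][j] == NEG:
                    continue
                pair = (bigram_logp(words[j], words[i])
                        + lam_t * dep_logp(words[j], words[i]))
                if best[m - 1][j] + word_score + pair > best[m][i]:
                    best[m][i] = best[m - 1][j] + word_score + pair
                    back[m][i] = j
    # trace back from the best final position
    i = max(range(n), key=lambda k: best[M][k])
    picked = []
    for m in range(M, 0, -1):
        picked.append(i)
        i = back[m][i]
    return [words[k] for k in reversed(picked)]
```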
Segmentation Using Discourse Cues [Maybury, 1998] • Discourse cue-based story segmentation • Discourse cues in CNN • Start and end of broadcast • Anchor/reporter handoff, reporter/anchor handoff • Cataphoric segment ("still ahead …") • Time-enhanced finite-state machine representing discourse states such as anchor segment, reporter segment, advertisement • Other features: named entities, part of speech, discourse shifts: ">>" speaker change, ">>>" subject change (toy sketch below)
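A toy sketch of cue-based segmentation as a finite-state machine over transcript lines, using the ">>" / ">>>" conventions above (the cue list and transition logic are simplified assumptions, not Maybury's full system):

```python
def segment_by_cues(lines):
    """Cut a closed-caption transcript into segments whenever a discourse cue
    signals a state change; a subject change starts a new story segment."""
    segments, current, state = [], [], "anchor"
    for line in lines:
        if line.startswith(">>>"):            # subject change: new story
            if current:
                segments.append((state, current))
            current, state = [], "anchor"
        elif line.startswith(">>"):           # speaker change: handoff
            state = "reporter" if state == "anchor" else "anchor"
        elif "still ahead" in line.lower():   # cataphoric segment cue
            state = "teaser"
        current.append(line)
    if current:
        segments.append((state, current))
    return segments
```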
CU: Summarization Without Words: Does the importance of 'what' is said correlate with 'how' it is said? • Hypothesis: "Speakers change their amplitude, pitch, and speaking rate to signify the importance of words, phrases, and sentences." • If so, then the sentence labels predicted using acoustic features (A) should correlate with the labels predicted using lexical features (L) • In fact, this seems to be true (correlation of .74 between the predictions of A and L)
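A minimal sketch of that check, assuming the two classifiers' per-sentence scores are already available as parallel lists (the numbers below are invented):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

acoustic_scores = [0.9, 0.2, 0.7, 0.1, 0.8]  # invented A predictions
lexical_scores = [0.8, 0.3, 0.9, 0.2, 0.6]   # invented L predictions
print(round(pearson(acoustic_scores, lexical_scores), 2))  # -> 0.89 on toy data
```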
Is It Possible to Build 'Good' Automatic Speech Summarization Without Any Transcripts? • Using only acoustic and structural features (A+S), with no lexical features, yields a 6% higher F-measure and 18% higher average ROUGE than the baseline
Evaluation Using ROUGE • F-measure is too strict • Predicted summary sentences must exactly match the reference summary sentences • What if content is similar but not identical? • ROUGE(s)…
ROUGE Metric • Recall-Oriented Understudy for Gisting Evaluation (ROUGE) • ROUGE-N (n-gram overlap, N = 1, 2, 3, 4; sketched below) • ROUGE-L (longest common subsequence) • ROUGE-S (skip bigram) • ROUGE-SU (skip bigram, counting unigrams as well) • Does ROUGE solve the problem?
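A minimal sketch of ROUGE-N as clipped n-gram recall against a single reference summary (the official ROUGE tool also handles multiple references and stemming; this shows only the core computation, on invented example sentences):

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    """Clipped n-gram recall: matched reference n-grams / total reference n-grams."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not ref:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / sum(ref.values())

ref = "texas cattle producers sued winfrey over disparaging beef remarks".split()
cand = "cattle producers in texas sued oprah winfrey over her beef remarks".split()
print(round(rouge_n(cand, ref, 1), 2), round(rouge_n(cand, ref, 2), 2))
# -> 0.89 0.38: high unigram recall, much lower bigram recall
```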
Next Class • Emotional speech • HW 4 assigned