1 / 20

Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization

Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization. Sameer Maskey, Julia Hirschberg Department of Computer Science Columbia University, New York, NY Eurospeech 2005 Presented by Yi-Ting. References.

Download Presentation

Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization Sameer Maskey, Julia Hirschberg Department of Computer Science Columbia University, New York, NY Eurospeech 2005 Presented by Yi-Ting

  2. References • Sameer Maskey, Julia Hirschberg, “Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization”, Eurospeech 2005. • Inoue, A., Mikami, T., Yamashita, Y. “Improvement of Speech Summarization Using Prosodic Information”, Proc. Speech Prosody, 2004, Japan. • Marcu, D. “Discourse trees are good indicators of importance in text” In Advances in Automatic Text Summarization,pages 123-136, 1999.

  3. Outline • Introduction • The Corpus and Annotations • Features and Method • Experiments and Results • Conclusion

  4. Introduction • Most text-based summarization systems rely upon lexical, syntactic, and positional information. • News broadcasts contain additional source of information that text news stories typically do not. • No study has examined the relative contribution of different feature classes. • Proposed new types of features in some categories and compare and contrast the utility of the different feature types for the summarization of Broadcast News stories.

  5. The Corpus and Annotations • Summarization system currently operates on Broadcast News shows from the TDT-2 corpus. • To build this system, used 216 stories from 20 CNN shows from the TDT-2 corpus. • Used manual transcripts, Dragon ASR transcripts. • Manual transcriptions of 96 shows were further annotated with named entities, openings and closings, headlines, interviews and sound-bytes.

  6. Features and Method • Lexical Features • Person names (NEⅠ)、organization names (NEⅡ)、and place names (NEⅢ) • Total number of named entities. • Number of words in the current, previous and following sentence.

  7. Finds that timeLen, minDB and maxDB are particular discriminatory Features and Method • Prosodic/Acoustic Features • speaking rate (the ratio of voiced/total frames) • F0 minimum, maximum, and mean • F0 range and slope • minimum, maximum, and mean RMS energy(minDB, maxDB, meanDB) • RMS slope (slopeDB) • Sentence duration (timeLen=endtime)

  8. Features and Method • Prosodic/Acoustic Features-F0

  9. Features and Method • Prosodic/Acoustic Features-RMS Calculation • Root mean square (RMS) is a fundamental measurement of the magnitude of an ac signal.

  10. Features and Method • Prosodic/Acoustic Features • Normalized features were produced by dividing each feature by the average of the feature values for each speaker. • Normalized acoustic features performed better than raw values. • Motivation for including duration feature is twofold:very short segments are not likely to contain important information and very long segments may not be useful to include in a summaries.

  11. Features and Method • Structural Features • Broadcast News programs exhibit similar structure, particularly broadcasts of the show from the same news channel. • normalized/sentence position in turn, speaker type, next-speaker, previous-speaker type, speaker change, turn position in the show and sentence position in the show. • speaker type feature is binary, “reporter or not”.

  12. Features and Method • Discourse Features • Having explored a discourse feature by computing a measure of ‘givenness’ in stories. • This feature is a score that ranges between -1 and 1. • Equation: ni is the number of ‘new’ noun stems in sentence i, d is the total number of unique noun stems in the story; si is the number of noun stems in the sentence I that have already been seen in the story; t is the total nmber of num stems in the story.

  13. Experiments and Results • Found that the contribution of our various feature classes could best be examined using a Bayesian Network classifier with 10-fold cross validation, on features selected by a procedure that combines subset evaluation with rank search and best-first search. • Constructed a baseline for this task by concatenating the first 23% of sentences from each show.

  14. Experiments and Results

  15. Experiments and Results • Performed feature selection on our entire set of features using a selection algorithm. • F-measure with just these features is 0.53 which is only 1% lower than the highest F-measure shown in Figure1.

  16. Experiments and Results • ROUGE (Recall-Oriented Understudy for Gisting Evaluation): • Rouge measures overlap units between automatic and manual summaries. • ROUGE-N、ROUGE-L、ROUGE-S、ROUGE-SU

  17. Experiments and Results

  18. Conclusion • Presented results of an empirical study comparing different types of features that may be useful for speech summarization. • Accurate speech summarization is possible in the absence of transcription, since our acoustic/prosodic and structural features alone obtain an F-measure of 0.50 and ROUGE score of 0.76.

More Related