
Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News

Gina-Anne Levow, University of Chicago. SIGHAN, July 25, 2004.


Presentation Transcript


  1. Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News Gina-Anne Levow University of Chicago SIGHAN July 25, 2004

  2. Roadmap • The Problem: Mandarin Story Segmentation • The Tools: Prosodic and Text Cues • Mandarin Chinese • Individual Results • Integrating Cues • Conclusion & Future Work

  3. The Problem: Mandarin Speech Topic Segmentation • Separate audio stream into component topics

  4. Why Segment? • Enables language understanding tasks • Information Retrieval • Only regions of interest • Summarization • Cover all main topics • Reference Resolution • Pronouns tend to refer within segments

  5. The Challenge • How do we define/measure topicality? • Are two regions on the same topic? • Fundamentally requires full understanding • How can we approach with partial understanding? • How do we identify boundaries sharply? • Association of sentences may be ambiguous • Especially, “filler”

  6. The Tools: Prosodic and Text Cues • Represent local changes at boundaries with audio • Silence, speaker change, pitch, loudness, rate (GHN, AT&T00) • Represent topicality with text • Component words in audio stream • Possibly noisy • Many possible models (Hearst 94, Beeferman 99, ...) • Combining Prosody and Text • Human annotators are more accurate and more confident when they use BOTH transcribed text and original audio (Swerts 97) • English broadcast news (Tur et al., 2001)

  7. Data and Processing • Broadcast News • Topic Detection and Tracking TDT3 corpus • Voice of America broadcast news • ASR transcription • Manually segmented – known boundaries • ~4,000 stories, ~750K words • Acoustic analysis (Praat) • Automatic pitch, intensity tracking • Smoothed, speaker-normalized, per-word
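
The slide says the pitch and intensity tracks are speaker-normalized per word but does not say how. A minimal sketch, assuming a per-speaker z-score over per-word mean values; the function name `speaker_normalize` and the `speaker`/`pitch` keys are hypothetical, not taken from the talk:

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_normalize(words):
    """Z-score each word's mean pitch against its own speaker's distribution.

    `words` is a list of dicts with hypothetical keys 'speaker' and 'pitch'
    (mean pitch over the word). The z-score is an assumed normalization; the
    talk only states that features were speaker-normalized.
    """
    by_speaker = defaultdict(list)
    for w in words:
        by_speaker[w["speaker"]].append(w["pitch"])

    # Per-speaker mean and population standard deviation.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    normalized = []
    for w in words:
        mu, sigma = stats[w["speaker"]]
        # A speaker with no pitch variation gets a neutral score of 0.0.
        normalized.append((w["pitch"] - mu) / sigma if sigma > 0 else 0.0)
    return normalized
```

The same function could be applied to per-word intensity; normalizing within speaker keeps a naturally low-pitched speaker from looking like a permanent segment ending.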

  8. Acoustic-Prosodic Cues • Languages differ in use of intonation • E.g. English: declarative fall, question rise • Chinese: pitch contour determines word meaning • At segment boundaries??? • Surprisingly similar, though not identical • Significantly lower pitch at end of segment • Significantly lower amplitude at end of segment • Significantly longer duration at end of segment

  9. Acoustic-Prosodic Contrasts • [Figure panels: Mandarin Normalized Pitch; Mandarin Normalized Intensity]

  10. Learning Boundaries • Decision tree classifier (Quinlan C4.5) • Classification problem • For each word, classify as final/non-final • Features • Acoustic-Prosodic: • Duration, Pitch, Loudness, Silence • Word average, Between-word difference
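
The two feature types named above (word average, between-word difference) can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the choice of "next word minus current word" as the difference are assumptions, since the slide does not define them.

```python
def boundary_features(word_values):
    """Build per-word features from a sequence of per-word averages.

    `word_values` holds one normalized value per word (e.g. pitch or
    intensity), in spoken order. For each word we emit its own average and
    the difference to the following word. The last word has no successor,
    so its difference is set to 0.0 (an arbitrary choice for this sketch).
    """
    features = []
    for i, value in enumerate(word_values):
        has_next = i + 1 < len(word_values)
        next_value = word_values[i + 1] if has_next else value
        features.append({"avg": value, "diff_next": next_value - value})
    return features
```

Each resulting feature dict would be one training instance for the final/non-final classifier, with one such pair of features per acoustic cue.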

  11. Text Boundary Features • Text • Information retrieval style • Cosine similarity between weighted term vectors • tf*idf in 50-word windows • Cue phrases • N-gram features • Identified by BoosTexter (Schapire & Singer, 2000) • E.g. “Voice of America”, “Audience”, “Reporting”
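
The text similarity feature above (cosine similarity between tf*idf vectors over 50-word windows) can be sketched like this. The `idf` dictionary is a hypothetical precomputed table, and comparing the window before a candidate boundary with the window after it is an assumption about how the windows are placed:

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Weight raw term counts by idf; terms missing from `idf` get weight 0."""
    counts = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def window_similarity(tokens, i, idf, window=50):
    """Similarity of the `window` tokens ending at index i (inclusive)
    to the `window` tokens that follow it."""
    left = tokens[max(0, i + 1 - window): i + 1]
    right = tokens[i + 1: i + 1 + window]
    return cosine(tfidf_vector(left, idf), tfidf_vector(right, idf))
```

A low similarity at position `i` suggests a topic change somewhere nearby, which fits the later observation that text captures topicality but does not localize the boundary sharply.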

  12. Classification Results • Balanced training and test sets • Results on held-out subsets • Acoustic cues only • 95.6% accuracy • Text cues (+ silence) • 95.6% accuracy • Combined text and prosody • 96.4% accuracy • Typically, false alarms are twice as common as misses

  13. Joint Decision Tree

  14. Feature Assessment • Role of silence • Useful in both text and acoustic classifiers • More necessary for text • Text captures topicality, not locality • Cannot identify boundaries sharply • Prosodic cues: • Localize boundaries • Multiple supporting cues: intensity, pitch: contrastive use

  15. Issue: False Alarms • Evaluate representative sample • Boundary <<< Non-boundary • 95.6% accuracy • 2% miss, 4.4% false alarms • Non-boundary frequent • False alarms frequent
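
One way to read the numbers above: when boundaries are far rarer than non-boundaries, overall accuracy is dominated by the false alarm rate. A small sketch of that arithmetic, assuming the miss rate is measured over true boundaries and the false alarm rate over non-boundaries (the slide does not state these definitions, and `boundary_fraction` is a hypothetical parameter):

```python
def overall_accuracy(miss_rate, false_alarm_rate, boundary_fraction):
    """Accuracy on a sample where `boundary_fraction` of words are boundaries.

    Assumes miss_rate = missed boundaries / true boundaries and
    false_alarm_rate = false alarms / true non-boundaries.
    """
    error = (boundary_fraction * miss_rate
             + (1.0 - boundary_fraction) * false_alarm_rate)
    return 1.0 - error


# With a vanishing share of boundaries, accuracy approaches
# 1 - false_alarm_rate, i.e. 1 - 0.044 = 0.956 for the figures on the slide.
limit = overall_accuracy(miss_rate=0.02, false_alarm_rate=0.044,
                         boundary_fraction=0.0)
```

Under these assumed definitions the reported 95.6% accuracy and 4.4% false alarm rate are consistent with each other, which is why reducing false alarms is the natural next target.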

  16. Voting Against False Alarms • Error analysis: • Construct per-feature classifiers: • Prosody-only, text-only, silence-only • Compare classifiers: per-feature, joint • Joint + 0 or 1 per-feature classifiers: FALSE ALARM • Approach: Voting • Require joint + 2 per-feature classifiers • Result: 1/3 reduction in false alarms • ~97% accuracy: 2.8% miss, 3.15% false alarm
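
The voting rule above can be sketched as a simple agreement check. The function name, the dictionary keys, and the `min_support` parameter are illustrative; only the rule itself (joint classifier plus at least two per-feature classifiers) comes from the slide.

```python
def vote_boundary(joint, per_feature, min_support=2):
    """Accept a boundary only if the joint classifier proposes it AND at
    least `min_support` of the per-feature classifiers agree.

    `joint` is the joint classifier's decision (True = boundary).
    `per_feature` maps a classifier name (e.g. 'prosody', 'text',
    'silence') to that classifier's decision.
    """
    support = sum(1 for decision in per_feature.values() if decision)
    return bool(joint) and support >= min_support
```

For example, a joint decision backed by only the prosody classifier is rejected, while one backed by prosody and text is kept:

```python
vote_boundary(True, {"prosody": True, "text": False, "silence": False})  # False
vote_boundary(True, {"prosody": True, "text": True, "silence": False})   # True
```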

  17. Conclusion • Mandarin broadcast news segmentation • Identify topicality and boundary locality • Integrate text and acoustic cues • Text similarity: vector space model, n-gram cues • Prosodic cues: Silence, intensity, pitch, duration • Robust across range of languages • Provide supporting and orthogonal information • Majority agreement of per-feature classifiers: • 1/3 fewer false alarms

  18. Current & Future Work • Improving the model of topicality • Richer text similarity models; broader acoustic models • Alternative classifiers • Preliminary experiments: • Boosting, Boosted Decision trees, MaxEnt • Comparable • Alternative integration strategies • Hierarchical subtopic segmentation • Broadcast news • Dialogue: human-computer, human-human • Integration with multi-modal features: e.g. gesture, gaze

  19. Acoustic-Prosodic Contrasts • [Figure panels: English Normalized Pitch; English Normalized Intensity; Mandarin Normalized Pitch; Mandarin Normalized Intensity]

  20. Text Decision Tree

  21. Prosodic Decision Tree

  22. The Problem: Speech Topic Segmentation • Separate audio stream into component topics • Example: On "World News Tonight" this Thursday, another bad day on stock markets, all over the world global economic anxiety. || Another massacre in Kosovo, the U.S. and its allies prepare to do something about it. Very slowly. || And the millennium bug, Lubbock Texas prepares for catastrophe, India sees only profit. ||
