
Recognizing Structure: Sentence, Speaker, and Topic Segmentation



Presentation Transcript


  1. Recognizing Structure: Sentence, Speaker, and Topic Segmentation Julia Hirschberg CS 4706

  2. Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora

  3. Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora

  4. Recall: Discourse Structure for Speech Generation • Theoretical accounts (e.g. Grosz & Sidner ’86) • Empirical studies • Text vs. speech • How can they help in recognition? • Features to test • Acoustic/prosodic features • Lexical features

  5. Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora

  6. Indicators of Structure in Text • Cue phrases: now, well, first • Pronominal reference • Orthography and formatting -- in text • Lexical information (Hearst ‘94, Reynar ’98, Beeferman et al ‘99): • Domain dependent • Domain independent

  7. Methods of Text Segmentation • Lexical cohesion methods vs. multiple-source methods • Vocabulary similarity indicates topic cohesion • Intuition from Halliday & Hasan ’76 • Features: • Stem repetition • Entity repetition • Word frequency • Context vectors • Semantic similarity • Word distance • Methods: • Sliding window (sketched below)

  8. Lexical chains • Clustering • Combine lexical cohesion with other cues • Features • Cue phrases • Reference (e.g. pronouns) • Syntactic features • Methods • Machine Learning from labeled corpora
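To make the sliding-window idea on the two slides above concrete, here is a minimal Python sketch, not any of the cited systems: each gap between sentences is scored by the cosine overlap of fixed-size blocks of words on either side, and low-cohesion gaps are proposed as topic boundaries. The window size, threshold, and function names are illustrative assumptions, and stemming/stop-word removal is omitted.

```python
# Sliding-window lexical cohesion (in the spirit of Hearst '94): score each
# candidate gap by the word overlap of the blocks on either side; valleys in
# the cohesion curve are proposed as topic boundaries.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    denom = math.sqrt(sum(v * v for v in a.values())) * \
            math.sqrt(sum(v * v for v in b.values()))
    return num / denom if denom else 0.0

def gap_scores(sentences: list[list[str]], window: int = 3) -> list[float]:
    """Cohesion score at each gap i (between sentence i-1 and sentence i)."""
    scores = []
    for i in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, i - window):i] for w in s)
        right = Counter(w for s in sentences[i:i + window] for w in s)
        scores.append(cosine(left, right))
    return scores

def propose_boundaries(scores: list[float], threshold: float) -> list[int]:
    """Gaps whose cohesion falls below the threshold become candidate boundaries."""
    return [i + 1 for i, s in enumerate(scores) if s < threshold]
```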

  9. Choi 2000: Text Segmentation • Implements leading methods and compares a new algorithm to them on a corpus of 700 concatenated documents • Comparison algorithms: • Baselines: • No boundaries • All boundaries • Regular partition • Random partitions with a random # of boundaries • Random partitions with the actual # of boundaries
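The degenerate baselines in this comparison can be generated trivially. The sketch below returns boundary positions as gap indices; the names and the exact regular/random schemes are assumptions and may differ in detail from the paper.

```python
# Baseline segmenters: no boundaries, every gap, evenly spaced boundaries,
# and random boundaries with either a random count or the true count.
# Positions are gap indices between sentences (1 .. n-1).
import random

def baseline_segmentations(n_sentences: int, true_k: int) -> dict[str, list[int]]:
    gaps = list(range(1, n_sentences))
    return {
        "none": [],
        "all": gaps,
        "regular": [round(i * n_sentences / (true_k + 1)) for i in range(1, true_k + 1)],
        "random_true_k": sorted(random.sample(gaps, min(true_k, len(gaps)))),
        "random_random_k": sorted(random.sample(gaps, random.randint(0, len(gaps)))),
    }
```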

  10. TextTiling Algorithm (Hearst ’94) • DotPlot algorithms (Reynar ’98) • Segmenter (Kan et al ’98) • Choi ’00 proposal • Cosine similarity measure • Identical sentences: 1; no overlap: 0
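A small sketch of the sentence-by-sentence cosine similarity matrix these methods start from, so that identical sentences score 1 and sentences with no shared words score 0. Stemming and stop-word removal are omitted, and the numpy formulation is an implementation choice, not taken from Choi ’00.

```python
# Sentence-by-sentence cosine similarity matrix: entry (i, j) is the cosine
# of the bag-of-words vectors of sentences i and j.
import numpy as np
from collections import Counter

def similarity_matrix(sentences: list[list[str]]) -> np.ndarray:
    vocab = sorted({w for s in sentences for w in s})
    index = {w: k for k, w in enumerate(vocab)}
    vecs = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w, c in Counter(s).items():
            vecs[i, index[w]] = c
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty sentences
    unit = vecs / norms
    return unit @ unit.T             # cosine similarities in [0, 1]
```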

  11. Similarity matrix → rank matrix • Minimize effect of outliers • How likely is this sentence to be a boundary, compared to other sentences? • Divisive clustering based on • D(n) = (sum of rank values s(i,j) within segment n) / (inside area of segment n, j − i + 1), where i and j are the first and last sentences of segment n • Keep dividing the corpus • until the change δD(n) = D(n) − D(n−1) becomes small • Choi’s algorithm has the best performance (9-12% error)
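A sketch of the rank transform and the segment density D(n) described above. The neighbourhood radius is an assumption, and the area normalization follows the slide's j − i + 1 formula, which may differ slightly from Choi ’00's exact definition, so treat this as an illustration of the idea rather than the published algorithm.

```python
# Rank transform: each similarity is replaced by the proportion of cells in
# its local neighbourhood that have a *lower* value, which damps outliers.
# segment_density then scores a candidate segment as on the slide.
import numpy as np

def rank_matrix(sim: np.ndarray, radius: int = 1) -> np.ndarray:
    n = sim.shape[0]
    rank = np.zeros_like(sim, dtype=float)
    for i in range(n):
        for j in range(n):
            win = sim[max(0, i - radius):i + radius + 1,
                      max(0, j - radius):j + radius + 1]
            rank[i, j] = (win < sim[i, j]).sum() / max(win.size - 1, 1)
    return rank

def segment_density(rank: np.ndarray, i: int, j: int) -> float:
    """D(n) for a segment spanning sentences i..j: sum of the rank values
    inside the segment, divided by its inside area (here j - i + 1)."""
    return rank[i:j + 1, i:j + 1].sum() / (j - i + 1)
```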

  12. Utiyama & Isahara ’02: What if we have no labeled data for our domain?

  13. Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora

  14. Types of Discourse Structure in Spoken Corpora • Domain independent • Sentence/utterance boundaries • Speaker turn segmentation • Topic segmentation • Domain dependent • Broadcast news • Meetings • Telephone conversations

  15. Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora

  16. Spoken Cues to Discourse Structure • Pitch range Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

  17. Rate Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour Brown et al ’83, Woodbury ’87, Swerts et al ’92

  18. Finding Sentence and Topic Boundaries • Statistical, Machine Learning approaches with large segmented corpora • Features: • Lexical cues • Domain dependent • Sensitive to ASR performance • Acoustic/prosodic cues • Domain independent • Sensitive to speaker identity

  19. Shriberg et al ’00: Prosodic Cues • Prosodic cues perform as well as or better than text-based cues at sentence and topic segmentation -- and generalize better? • Goal: identify sentence and topic boundaries at ASR-defined word boundaries • CART decision trees provided boundary predictions • An HMM combined these with lexical boundary predictions from the LM
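A simplified sketch of this two-source setup: a decision tree is trained on prosodic feature vectors at each candidate boundary, and its posterior is fused with a language-model boundary posterior. The paper combines the two streams with an HMM over the word sequence; the renormalized interpolation below is only a stand-in for that step, and the hyperparameters are assumptions.

```python
# Prosodic decision tree + LM fusion (simplified). X_prosody has one row of
# pause/duration/F0 features per inter-word position; y marks true boundaries.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_prosodic_tree(X_prosody: np.ndarray, y: np.ndarray) -> DecisionTreeClassifier:
    # CART-style tree over prosodic features (depth and leaf size are guesses)
    return DecisionTreeClassifier(max_depth=6, min_samples_leaf=50).fit(X_prosody, y)

def combine(p_prosody: np.ndarray, p_lm: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Log-linear interpolation of the two boundary posteriors, renormalized
    over {boundary, no boundary}; a stand-in for the HMM combination."""
    yes = (p_prosody ** lam) * (p_lm ** (1 - lam))
    no = ((1 - p_prosody) ** lam) * ((1 - p_lm) ** (1 - lam))
    return yes / (yes + no + 1e-12)
```

In use, p_prosody would come from `tree.predict_proba(X_prosody)[:, 1]` and p_lm from the language model's boundary probability at each word position.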

  20. Features • For each potential boundary location: • Pause at boundary (raw and normalized by speaker) • Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?) • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?) • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity

  21. Voice quality (halving/doubling estimates as correlates of creak or glottalization) • Speaker change, time from start of turn, # turns in conversation and gender • Trained/tested on Switchboard and Broadcast News
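A toy extractor for a few of the features on the last two slides, computed from word-aligned timing, speaker labels, and per-word F0 estimates. The field names and the simple speaker-mean normalization are illustrative assumptions, not the exact feature definitions of Shriberg et al. ’00.

```python
# Toy prosodic feature extraction at each inter-word boundary.
from dataclasses import dataclass
import statistics

@dataclass
class Word:
    start: float      # seconds
    end: float
    speaker: str
    mean_f0: float    # Hz (0.0 if unvoiced / unavailable)

def boundary_features(words: list[Word]) -> list[dict]:
    # per-speaker pause statistics, used to normalize raw pauses
    pauses_by_spk: dict[str, list[float]] = {}
    for prev, nxt in zip(words, words[1:]):
        pauses_by_spk.setdefault(prev.speaker, []).append(max(0.0, nxt.start - prev.end))

    feats = []
    for prev, nxt in zip(words, words[1:]):
        pause = max(0.0, nxt.start - prev.end)
        spk_mean = statistics.mean(pauses_by_spk[prev.speaker]) or 1.0
        feats.append({
            "pause_raw": pause,
            "pause_norm": pause / spk_mean,                  # normalized by speaker
            "f0_reset": nxt.mean_f0 - prev.mean_f0,          # F0 jump across the boundary
            "speaker_change": int(prev.speaker != nxt.speaker),
            "word_dur": prev.end - prev.start,               # crude lengthening proxy
        })
    return feats
```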

  22. Sentence segmentation results • Prosodic features • Better than the LM for BN • Worse than the LM on human transcriptions and comparable on ASR transcripts for SB • All better than chance • Useful features for BN • Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration • Useful features for SB • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

  23. Topic segmentation results (BN only): • Useful features • Pause at boundary, f0 range, turn/no turn, gender, time in turn • Prosody alone better than LM • Combined model improves significantly

  24. Next Class • Identifying Speech Acts • Reading: • This chapter of J&M is a beta version • Please keep a diary for: • Any typos • Any passages you think are hard to follow • Any suggestions • HW 3a due by class (2:40pm)
