Recognizing Structure: Sentence, Speaker, and Topic Segmentation. Julia Hirschberg CS 4706. Today. Recognizing structural information in speech Learning from generation Learning from text segmentation Types of structural information Segmentation in spoken corpora.
Recognizing Structure: Sentence, Speaker, and Topic Segmentation Julia Hirschberg CS 4706
Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora
Recall: Discourse Structure for Speech Generation • Theoretical accounts (e.g. Grosz & Sidner ’86) • Empirical studies • Text vs. speech • How can they help in recognition? • Features to test • Acoustic/prosodic features • Lexical features
Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora
Indicators of Structure in Text • Cue phrases: now, well, first • Pronominal reference • Orthography and formatting (in text) • Lexical information (Hearst ’94, Reynar ’98, Beeferman et al ’99): • Domain dependent • Domain independent
Methods of Text Segmentation • Lexical cohesion methods vs. multiple source • Vocabulary similarity indicates topic cohesion • Intuition from Halliday & Hasan ’76 • Features: • Stem repetition • Entity repetition • Word frequency • Context vectors • Semantic similarity • Word distance • Methods: • Sliding window
Lexical chains • Clustering • Combine lexical cohesion with other cues • Features • Cue phrases • Reference (e.g. pronouns) • Syntactic features • Methods • Machine Learning from labeled corpora
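The sliding-window lexical cohesion method listed above can be sketched as follows. This is a minimal illustration, not any of the cited systems: it compares raw word counts (no stemming) on either side of each sentence gap using cosine similarity, and hypothesizes a boundary where similarity dips. The function names, the `window` size, and the `threshold` value are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(sentences, window=2):
    """Similarity of the `window` sentences before and after each sentence gap."""
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def pick_boundaries(scores, threshold=0.1):
    """Hypothesize a boundary at each gap that is a local minimum below threshold."""
    boundaries = []
    for k, s in enumerate(scores):
        left_ok = k == 0 or s <= scores[k - 1]
        right_ok = k == len(scores) - 1 or s <= scores[k + 1]
        if s < threshold and left_ok and right_ok:
            boundaries.append(k + 1)  # boundary falls before sentence k + 1
    return boundaries
```

On a toy input of two sentences about a cat followed by two about stocks, the only gap with no shared vocabulary is the one between the two topics, so that is where the sketch places a boundary. Real systems add stemming, stop-word removal, and smoothing of the score curve.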
Choi 2000: Text Segmentation • Implements leading methods and compares new algorithm to them on corpus of 700 concatenated documents • Comparison algorithms: • Baselines: • No boundaries • All boundaries • Regular partition • Random # of random partitions • Actual # of random partitions
TextTiling algorithm (Hearst ’94) • DotPlot algorithms (Reynar ’98) • Segmenter (Kan et al ’98) • Choi ’00 proposal • Cosine similarity measure • Identical: 1; no overlap: 0
Similarity matrix → rank matrix • Minimize effect of outliers • How likely is this sentence to be a boundary, compared to other sentences? • Divisive clustering based on inside density • $D(n) = s_{i,j} / (j-i+1)^2$: the sum of rank values $s_{i,j}$ inside segment $n$ divided by the inside area of segment $n$, for $i, j$ the sentences at the beginning and end of segment $n$ • Keep dividing the corpus • until $\delta D(n) = D(n) - D(n-1)$ shows little change • Choi’s algorithm has best performance (9-12% error)
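The ranking and divisive clustering steps above can be sketched as follows, assuming a precomputed sentence-by-sentence similarity matrix. This is a simplified illustration, not Choi's implementation: the neighbourhood `radius`, the fixed `min_gain` stopping threshold, and all function names are assumptions. Density here is computed over the whole segmentation (sum of ranks inside all segments divided by the sum of their areas), and splitting stops when the density gain falls below `min_gain`.

```python
def rank_matrix(sim, radius=1):
    """Replace each similarity by the fraction of its neighbours with a lower value."""
    n = len(sim)
    ranks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = examined = 0
            for a in range(max(0, i - radius), min(n, i + radius + 1)):
                for b in range(max(0, j - radius), min(n, j + radius + 1)):
                    if (a, b) == (i, j):
                        continue
                    examined += 1
                    lower += sim[a][b] < sim[i][j]
            ranks[i][j] = lower / examined if examined else 0.0
    return ranks


def inside_density(ranks, boundaries):
    """(Sum of ranks inside all segments) / (sum of squared segment lengths)."""
    n = len(ranks)
    edges = [0] + sorted(boundaries) + [n]
    total = area = 0.0
    for start, end in zip(edges, edges[1:]):
        total += sum(ranks[i][j] for i in range(start, end) for j in range(start, end))
        area += (end - start) ** 2
    return total / area


def divisive_segmentation(ranks, min_gain=0.1):
    """Greedily add the boundary that most increases density; stop on a small gain."""
    n = len(ranks)
    boundaries = []
    density = inside_density(ranks, boundaries)
    while len(boundaries) < n - 1:
        candidates = [b for b in range(1, n) if b not in boundaries]
        best = max(candidates, key=lambda b: inside_density(ranks, boundaries + [b]))
        new_density = inside_density(ranks, boundaries + [best])
        if new_density - density < min_gain:
            break  # density change is small: stop dividing
        boundaries.append(best)
        density = new_density
    return sorted(boundaries)


# Toy input: sentences 0-1 resemble each other, as do sentences 2-3.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
print(divisive_segmentation(rank_matrix(sim)))
```

Because the diagonal always ranks highly, density keeps rising as segments shrink, so the stopping threshold matters: too small a `min_gain` splits every sentence apart.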
Utiyama & Isahara ’02: What if we have no labeled data for our domain?
Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora
Types of Discourse Structure in Spoken Corpora • Domain independent • Sentence/utterance boundaries • Speaker turn segmentation • Topic segmentation • Domain dependent • Broadcast news • Meetings • Telephone conversations
Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora
Spoken Cues to Discourse Structure • Pitch range: Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause: Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude: Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour: Brown et al ’83, Woodbury ’87, Swerts et al ’92
Finding Sentence and Topic Boundaries • Statistical, Machine Learning approaches with large segmented corpora • Features: • Lexical cues • Domain dependent • Sensitive to ASR performance • Acoustic/prosodic cues • Domain independent • Sensitive to speaker identity
Shriberg et al ’00: Prosodic Cues • Prosodic cues perform as well as or better than text-based cues at sentence and topic segmentation -- and generalize better? • Goal: identify sentence and topic boundaries at ASR-defined word boundaries • CART decision trees provided boundary predictions • HMM combined these with lexical boundary predictions from LM
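To make the idea of combining two knowledge sources concrete, here is a minimal sketch that merges a prosodic boundary posterior with a language-model boundary posterior at a single word boundary. It uses log-linear interpolation as a simple stand-in, which is not the HMM combination named above. The function name and the `weight` parameter are assumptions for illustration.

```python
import math


def interpolate_posteriors(p_prosody, p_lm, weight=0.5):
    """Log-linear interpolation of two boundary posteriors, renormalized to sum to 1.

    weight = 1.0 trusts only the prosodic model; weight = 0.0 trusts only the LM.
    """
    eps = 1e-12  # guards against log(0)

    def combine(a, b):
        return math.exp(weight * math.log(max(a, eps))
                        + (1.0 - weight) * math.log(max(b, eps)))

    boundary = combine(p_prosody, p_lm)
    no_boundary = combine(1.0 - p_prosody, 1.0 - p_lm)
    return boundary / (boundary + no_boundary)
```

In practice the weight would be tuned on held-out data, since the relative reliability of the two models depends on the corpus and on ASR quality.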
Features • For each potential boundary location: • Pause at boundary (raw and normalized by speaker) • Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?) • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?) • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
Voice quality (halving/doubling estimates as correlates of creak or glottalization) • Speaker change, time from start of turn, # turns in conversation and gender • Trained/tested on Switchboard and Broadcast News
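A few of the timing-based features above can be derived from time-aligned words alone. The sketch below is a hedged illustration: the `Word` record, the feature names, and the choice to normalize each pause by the speaker's mean pause are assumptions, not the normalization used in the cited study. Duration, F0, and voice-quality features need acoustic analysis and are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    text: str
    speaker: str
    start: float  # seconds
    end: float    # seconds


def boundary_features(words):
    """One feature dict per inter-word boundary (after words[k], before words[k + 1])."""
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]

    # Mean pause per speaker, used here as a hypothetical normalizer.
    by_speaker = defaultdict(list)
    for cur, pause in zip(words, pauses):
        by_speaker[cur.speaker].append(pause)
    speaker_mean = {spk: mean(vals) for spk, vals in by_speaker.items()}

    features = []
    turn_start = words[0].start if words else 0.0
    for k, pause in enumerate(pauses):
        cur, nxt = words[k], words[k + 1]
        if k > 0 and words[k - 1].speaker != cur.speaker:
            turn_start = cur.start  # cur opens a new turn
        norm = speaker_mean[cur.speaker]
        features.append({
            "pause": pause,                                 # raw pause at boundary
            "pause_norm": pause / norm if norm > 0 else 0.0,
            "prev_pause": pauses[k - 1] if k > 0 else 0.0,  # pause before cur
            "speaker_change": cur.speaker != nxt.speaker,
            "time_in_turn": cur.end - turn_start,
        })
    return features
```

Each feature dict could then serve as one training row for a boundary classifier such as a decision tree.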
Sentence segmentation results • Prosodic features • Better than LM for BN • For SB: worse than LM on human transcriptions, same as LM on ASR transcripts • All better than chance • Useful features for BN • Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration • Useful features for SB • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn
Topic segmentation results (BN only): • Useful features • Pause at boundary, f0 range, turn/no turn, gender, time in turn • Prosody alone better than LM • Combined model improves significantly
Next Class • Identifying Speech Acts • Reading: • This chapter of J&M is a beta version • Please keep a diary for: • Any typos • Any passages you think are hard to follow • Any suggestions • HW 3a due by class (2:40pm)