Recognizing Structure: Sentence, Speaker, and Topic Segmentation

Recognizing Structure: Sentence, Speaker, and Topic Segmentation Julia Hirschberg CS 4706

Today • Recognizing structural information in speech • Learning from generation • Learning from text segmentation • Types of structural information • Segmentation in spoken corpora

Recall: Discourse Structure for Speech Generation • Theoretical accounts (e.g. Grosz & Sidner ’86) • Empirical studies • Text vs. speech • How can they help in recognition? • Features to test • Acoustic/prosodic features • Lexical features

Indicators of Structure in Text • Cue phrases: now, well, first • Pronominal reference • Orthography and formatting -- in text • Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99): • Domain dependent • Domain independent

Methods of Text Segmentation • Lexical cohesion methods vs. multiple source • Vocabulary similarity indicates topic cohesion • Intuition from Halliday & Hasan ’76 • Features: • Stem repetition • Entity repetition • Word frequency • Context vectors • Semantic similarity • Word distance • Methods: • Sliding window

Lexical chains • Clustering • Combine lexical cohesion with other cues • Features • Cue phrases • Reference (e.g. pronouns) • Syntactic features • Methods • Machine Learning from labeled corpora
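A minimal sketch of the sliding-window idea from the lexical cohesion methods above: compare the vocabulary on either side of each sentence gap and treat the gap with the least overlap as the most likely topic boundary. The Jaccard overlap, the window size, and the function names are illustrative assumptions, not any particular published algorithm; a real system would also stem words and drop stop words.

```python
import re

def tokens(sentence):
    """Lowercased word types in a sentence (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def gap_scores(sentences, window=2):
    """Jaccard vocabulary overlap between the `window` sentences on each side of every gap.

    scores[k] describes the gap between sentences k and k+1.
    """
    scores = []
    for gap in range(1, len(sentences)):
        left = set().union(*map(tokens, sentences[max(0, gap - window):gap]))
        right = set().union(*map(tokens, sentences[gap:gap + window]))
        union = left | right
        scores.append(len(left & right) / len(union) if union else 0.0)
    return scores

def lowest_cohesion_gap(sentences, window=2):
    """Index of the sentence that starts the second segment at the weakest gap."""
    scores = gap_scores(sentences, window)
    return scores.index(min(scores)) + 1

sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stocks fell sharply today.",
    "Investors sold stocks quickly.",
]
boundary = lowest_cohesion_gap(sentences)  # sentence index where the new topic starts
```

In this toy input the two halves share no vocabulary, so the weakest gap falls between the second and third sentences.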

Choi 2000: Text Segmentation • Implements leading methods and compares new algorithm to them on corpus of 700 concatenated documents • Comparison algorithms: • Baselines: • No boundaries • All boundaries • Regular partition • Random # of random partitions • Actual # of random partitions

TextTiling algorithm (Hearst ’94) • DotPlot algorithms (Reynar ’98) • Segmenter (Kan et al. ’98) • Choi ’00 proposal • Cosine similarity measure • Same: 1; no overlap: 0
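The cosine similarity measure can be sketched as follows, assuming each sentence is represented as a bag of (ideally stemmed) word tokens. The function name and the token-list representation are illustrative choices; the two extreme cases match the slide: identical sentences score 1, sentences with no shared words score 0.

```python
import math
from collections import Counter

def cosine(tokens_a, tokens_b):
    """Cosine similarity between two sentences given as lists of word tokens."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the words the two sentences share.
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

same = cosine(["topic", "shift"], ["topic", "shift"])   # close to 1 (floating point)
disjoint = cosine(["topic", "shift"], ["pitch", "range"])  # 0.0
```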

Similarity matrix → rank matrix • Minimize effect of outliers • How likely is this sentence to be a boundary, compared to other sentences? • Divisive clustering based on • D(n) = (sum of rank values s_{i,j} of segment n) / (inside area of segment n, the square block of side j−i+1 in the rank matrix), for i, j the sentences at the beginning and end of segment n • Keep dividing the corpus • until δD(n) = D(n) − D(n−1) shows little change • Choi’s algorithm has best performance (9-12% error)
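A rough sketch of the two steps above, under stated assumptions: the rank of a cell is taken to be the fraction of its neighbours (within a small mask) that have lower similarity, and the area of a segment running from sentence i to sentence j is taken to be the (j−i+1) by (j−i+1) block of the matrix it covers. The mask radius, function names, and the single greedy split are illustrative and are not Choi's exact implementation.

```python
def rank_matrix(sim, radius=1):
    """Replace each similarity by the share of neighbouring cells with a lower value."""
    n = len(sim)
    rank = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = total = 0
            for a in range(max(0, i - radius), min(n, i + radius + 1)):
                for b in range(max(0, j - radius), min(n, j + radius + 1)):
                    if (a, b) != (i, j):
                        total += 1
                        lower += sim[a][b] < sim[i][j]
            rank[i][j] = lower / total if total else 0.0
    return rank

def inside_density(rank, segments):
    """Sum of rank values inside all segments divided by their total area.

    `segments` is a list of inclusive (first_sentence, last_sentence) pairs.
    """
    total = sum(rank[a][b]
                for i, j in segments
                for a in range(i, j + 1)
                for b in range(i, j + 1))
    area = sum((j - i + 1) ** 2 for i, j in segments)
    return total / area

def best_split(rank, segments):
    """One divisive step: the single cut that maximises inside density."""
    best = None
    for k, (i, j) in enumerate(segments):
        for cut in range(i, j):
            candidate = segments[:k] + [(i, cut), (cut + 1, j)] + segments[k + 1:]
            density = inside_density(rank, candidate)
            if best is None or density > best[0]:
                best = (density, candidate)
    return best

# Toy block-diagonal matrix: sentences 0-1 and 2-3 form two cohesive groups.
block = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
density, segmentation = best_split(block, [(0, 3)])
```

Repeating `best_split` and stopping once the gain in density between successive steps becomes small corresponds to the "keep dividing until δD(n) shows little change" rule. The toy matrix is passed directly as if it were already a rank matrix to keep the example easy to check by hand.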

Utiyama & Isahara ’02: What if we have no labeled data for our domain?

Types of Discourse Structure in Spoken Corpora • Domain independent • Sentence/utterance boundaries • Speaker turn segmentation • Topic segmentation • Domain dependent • Broadcast news • Meetings • Telephone conversations

Spoken Cues to Discourse Structure • Pitch range: Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause: Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude: Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour: Brown et al. ’83, Woodbury ’87, Swerts et al. ’92

Finding Sentence and Topic Boundaries • Statistical, Machine Learning approaches with large segmented corpora • Features: • Lexical cues • Domain dependent • Sensitive to ASR performance • Acoustic/prosodic cues • Domain independent • Sensitive to speaker identity

Shriberg et al ’00: Prosodic Cues • Prosodic cues perform as well as or better than text-based cues at sentence and topic segmentation -- and generalize better? • Goal: identify sentence and topic boundaries at ASR-defined word boundaries • CART decision trees provided boundary predictions • HMM combined these with lexical boundary predictions from LM

Features • For each potential boundary location: • Pause at boundary (raw and normalized by speaker) • Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?) • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?) • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity

Voice quality (halving/doubling estimates as correlates of creak or glottalization) • Speaker change, time from start of turn, # turns in conversation and gender • Trained/tested on Switchboard and Broadcast News
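To make "raw and normalized by speaker" concrete, here is a small sketch that pairs each raw pause duration with a per-speaker z-score. The z-score is an assumed normalization chosen for illustration; the slides do not say which normalization Shriberg et al. used, and the record format and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def pause_features(records):
    """Map (speaker, pause_seconds) records to (raw_pause, speaker_normalized_pause) pairs."""
    by_speaker = defaultdict(list)
    for speaker, pause in records:
        by_speaker[speaker].append(pause)
    # Per-speaker mean and population standard deviation of pause length.
    stats = {spk: (mean(vals), pstdev(vals)) for spk, vals in by_speaker.items()}
    features = []
    for speaker, pause in records:
        mu, sd = stats[speaker]
        z = (pause - mu) / sd if sd > 0 else 0.0  # no spread: treat as average
        features.append((pause, z))
    return features

records = [("A", 0.1), ("A", 0.3), ("B", 1.0), ("B", 1.0)]
features = pause_features(records)
```

The point of the normalized value is that a 0.3 s pause can be long for a fast talker and unremarkable for a slow one, which the raw duration alone cannot express.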

Sentence segmentation results • Prosodic features • Better than LM for BN • On SB: worse than LM on reference transcripts, same as LM on ASR transcripts • All better than chance • Useful features for BN • Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration • Useful features for SB • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic segmentation results (BN only): • Useful features • Pause at boundary, F0 range, turn/no turn, gender, time in turn • Prosody alone better than LM • Combined model improves significantly

Next Class • Identifying Speech Acts • Reading: • This chapter of J&M is a beta version • Please keep a diary for: • Any typos • Any passages you think are hard to follow • Any suggestions • HW 3a due by class (2:40pm)
