Modeling Prosodic Sequences with K-Means and Dirichlet Process GMMs Andrew Rosenberg Queens College / CUNY Interspeech 2013 August 26, 2013
Prosody • Prosody – Pitch, Intensity, Rhythm, Silence • Prosody carries information about a speaker’s intent and identity. • Here: prosodic recognition of • Speaking Style • Nativeness • Speaker
Approach • Unsupervised clustering of acoustic/prosodic features. • Sequence modeling of cluster identities
K-Means • K-means is a simple distance-based clustering algorithm. • Iterative and non-deterministic (sensitive to initialization). • Must specify K. • We evaluate K between 2 and 100; the optimal value is selected by cross-validation for each task.
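The clustering step above can be sketched as follows; the feature matrix here is random stand-in data, not the paper's actual syllable features, and K=10 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for syllable-level acoustic/prosodic feature vectors
# (the paper uses 7 features per pseudosyllable).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))

# K must be specified up front; the paper sweeps K in [2, 100] and
# picks the best value per task via cross-validation.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Each syllable is replaced by its cluster identity, yielding a
# discrete symbol sequence suitable for n-gram modeling.
cluster_sequence = km.labels_
print(cluster_sequence[:20])
```

Because K-means is sensitive to initialization, `n_init=10` restarts the algorithm and keeps the best run.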
Dirichlet Process GMMs • Non-parametric infinite mixture model • needs a prior over π – the Dirichlet process • and a prior over the component means – a zero-mean Gaussian • still need to set the hyperparameters α and G0 • Stick-breaking & Chinese Restaurant metaphors • Blei and Jordan (2005): variational inference • “Rich get Richer” Plate notation from M. Jordan 2005 NIPS tutorial
DPGMM “Rich get Richer” • Artificially omit the largest cluster • α = 0.25
Prosodic Event Distribution • ToBI Prosodic Labels • Pitch Accents, Phrase Accent/Boundary Tones Accent Type Distribution Phrase Ending Distribution
Sequence Modeling • SRILM 3-gram model • Backoff & Good-Turing smoothing • Clusters learned over all material • Sequence models trained on the training sets
Experiments • Classification • Train one SRILM model per class. • Classify by lowest perplexity. • Outlier Detection • Train a single model. • A classifier learns a perplexity threshold. • Tasks: Speaking Style, Nativeness, Speaker Recognition • Evaluation • 500 samples of 10–100 syllables (~2–20 seconds) • ToBI, K-Means, DPGMM, DPGMM′ (removing the largest cluster) • 5-fold cross-validation to learn hyperparameters
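The classify-by-lowest-perplexity setup can be sketched with a toy smoothed bigram model; the paper uses SRILM 3-gram models with backoff and Good-Turing smoothing, so the add-one-smoothed bigram here is only a self-contained stand-in, and the cluster-ID sequences are invented for illustration.

```python
import math
from collections import Counter

class BigramModel:
    """Add-one-smoothed bigram model over cluster-ID sequences."""
    def __init__(self, sequences, vocab_size):
        self.V = vocab_size
        self.bigrams = Counter()
        self.unigrams = Counter()
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1

    def perplexity(self, seq):
        log_prob, n = 0.0, 0
        for a, b in zip(seq, seq[1:]):
            p = (self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.V)
            log_prob += math.log(p)
            n += 1
        return math.exp(-log_prob / n)

def classify(models, seq):
    # One model per class; lowest perplexity wins.
    return min(models, key=lambda label: models[label].perplexity(seq))

# Hypothetical cluster-ID sequences for two "styles".
models = {
    "READ": BigramModel([[0, 1, 0, 1, 0, 1, 0, 1]], vocab_size=3),
    "SPON": BigramModel([[2, 2, 0, 2, 2, 0, 2, 2]], vocab_size=3),
}
print(classify(models, [0, 1, 0, 1]))  # → "READ"
```

Outlier detection replaces the per-class comparison with a single model plus a learned perplexity threshold.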
Data • Boston Directions Corpus • READ, SPONTANEOUS • 4 speakers (used for Speaker Classification) • Boston University Radio News Corpus • BROADCAST NEWS • 6 speakers • Columbia Games Corpus • SPONTANEOUS DIALOG • 13 speakers • Native Mandarin Chinese Speakers reading BURNC stories. • 4 speakers • All ToBI Labeled
Features • Villing (2004) pseudosyllabification • Syllables with mean intensity below 10dB are considered “silent” • 7 Features • Mean range normalized intensity • Mean range normalized delta intensity • Mean z-score normalized log f0 • Mean z-score normalized delta log f0 • Syllable duration • Duration of previous silence (if any) • Duration of following silence (if any)
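The two normalizations named in the feature list can be sketched as below; the frame-level pitch and intensity tracks here are hypothetical random data, not output of the Villing (2004) pseudosyllabifier.

```python
import numpy as np

# Range normalization maps intensity into [0, 1]; log f0 is
# z-score normalized, as in the slide's feature list.
def range_normalize(x):
    return (x - x.min()) / (x.max() - x.min())

def zscore(x):
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)
intensity = rng.uniform(40, 80, size=100)  # dB, hypothetical track
f0 = rng.uniform(80, 300, size=100)        # Hz, hypothetical track

norm_intensity = range_normalize(intensity)
norm_log_f0 = zscore(np.log(f0))

# Syllable-level features are means of these normalized tracks,
# plus the syllable duration and adjacent silence durations.
print(norm_intensity.mean().round(3), norm_log_f0.mean().round(3))
```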
Consistency with ToBI labels • V-Measure between • ToBI Accent Types and clusters • ToBI Intonational Phrase-ending Tones and clusters • K-means: solid line • DPGMM: gray line for reference (doesn’t vary by more than 0.001) Accenting Phrasing
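The V-Measure comparison can be sketched with scikit-learn; the label and cluster assignments below are invented toy data, not the paper's ToBI annotations.

```python
from sklearn.metrics import v_measure_score

# V-measure is the harmonic mean of homogeneity and completeness,
# comparing a clustering against reference labels (here, hypothetical
# ToBI accent-type labels vs. learned cluster IDs).
tobi_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [0, 0, 1, 1, 1, 1]
score = v_measure_score(tobi_labels, cluster_ids)
print(round(score, 3))
```

A score of 1.0 would mean the clusters reproduce the ToBI labels exactly; merging two label classes into one cluster, as above, lowers homogeneity and hence the score.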
Speaking Style Recognition • 4 styles: READ, SPON, BN, DIALOG • Single speaker for evaluation. Outlier Detection - Dialog Classification
Nativeness Recognition • Native (BURNC) vs. Non-Native • Single speaker for evaluation. Outlier Detection - Native Classification
Speaker Recognition • 6 BURNC Speakers • Detect f2b vs. others • 4 BDC Speakers • 6 tasks for training, 3 for testing Outlier Detection Classification
Conclusions • K-means works well to represent prosodic information • DPGMM does not work so well out-of-the-box. • Despite being non-parametric, hyperparameter setting is still critically important • Future Work • Larger acoustic/prosodic feature set. • requires pre-processing • Evaluating the universality of prosodic representations • Integration of K-means and DPGMM. • Use one to seed the other.
Thank you andrew@cs.qc.cuny.edu http://speech.cs.qc.cuny.edu