Look Who’s Talking Now: The Challenge of Conversational Data

Look Who’s Talking Now:The Challenge of Conversational Data Uche O. Abanulo Physics, Engineering And Geosciences Uchechukwu Abanulo

Presentation Outline • Introduction • Speaker Recognition • Challenges of Conversational Data • General Applications of Research • Modeling and Comparing Speakers • Traditional Speaker Modeling • Proposed Method • Features Used • Distances Used • Application Systems • Speaker Count • Generalized Speaker Indexing • Enhancement of Results - fusion • Summary Uche O. Abanulo Physics, Engineering And Geosciences

Introduction Introduction Modeling/Comparing Speakers Application Systems Summary

Reference Speech Feature Extraction Model Building Test Speech Feature Extraction Recognition Decision Comparison Speaker Recognition • Speaker Identification • Who is this speaker? • Speaker Verification • Is he who he claims to be? Introduction Modeling/Comparing Speakers Application Systems Summary System Output

Speaker Segmentation • Broadcast News/Conference Data • Conversational Data Introduction Modeling/Comparing Speakers Application Systems Summary

Challenges of Conversational Data • No a priori information available from participating speakers. • Training is impossible • No a priori knowledge of change points • Speakers alternate very rapidly. • Limited amounts of data for single speaker representations • Distortion • Channel noise, co-channel data Introduction Modeling/Comparing Speakers Application Systems Summary

Proposed Solutions • Selective creation of data models • Distance-Based Model Comparison • Development of application-specific system Introduction Modeling/Comparing Speakers Application Systems Summary

Applications • Criminal Activity Detection • Monitoring inmate conversations • Prevention of 3-way calls • Notification of suspicious contacts • Enhancement of keyword detection • Development of speaker databases for uncooperative people • Forensics • Voiceprints Introduction Modeling/Comparing Speakers Application Systems Summary

Applications • Commercial Services • Personalized contact with customers • Storage/Search/Retrieval of Audio Data • Conference calls Introduction Modeling/Comparing Speakers Application Systems Summary

Applications • Military Activities • Pilot-control tower communications • Detection of unidentified speakers on pilot radio channels Introduction Modeling/Comparing Speakers Application Systems Summary

Modeling and Comparing Speakers Introduction Modeling/Comparing Speakers Application Systems Summary

Traditional Speaker Modeling • Examples • Gaussian Mixture Models • Hidden Markov Models • Neural Networks • Prosody-Based Models • Disadvantages • Require large amounts of speech data • Sometimes require training procedure Introduction Modeling/Comparing Speakers Application Systems Summary

Conversational Data Modeling • Current Method • Equal segmentation of data • Indiscriminate use of data • Problems • Change points unknown • Not all speech is useful • Poor performance Introduction Modeling/Comparing Speakers Application Systems Summary

S V U V U V … U V U V S V . . . V V V V V V MEAN AND COVARIANCE MATRIX COMPUTATION MEAN AND COVARIANCE MATRIX COMPUTATION Novel Speaker Modeling Introduction Modeling/Comparing Speakers Application Systems Summary SEGMENT 1 SEGMENT M FEATURE COMPUTATION FEATURE COMPUTATION . . . MODEL 1 MODEL M

Proposed Speaker Modeling • Why voiced only? • Same speech class compared • Contains the most information • What’s the appropriate number of phonemes? • Large enough to sufficiently represent speakers • Small enough to avoid speaker overlap Introduction Modeling/Comparing Speakers Application Systems Summary

Features Considered • Linear Predictive Cepstral Coefficients • Model the vocal tract • Mel-Scale Frequency Cepstral Coefficients • Model the human auditory system Introduction Modeling/Comparing Speakers Application Systems Summary

Cepstral Analysis Frequency Analysis of Speech Excitation Component Vocal Tract Component STFT of Speech Slowly varying formants Fast varying harmonics = X Log of STFT Log of Excitation Log of Vocal Tract Component = + IDFT of Log of STFT Excitation Vocal tract + =

Distance Measurements Introduction Modeling/Comparing Speakers Application Systems Summary Different speaker distances Same speaker distances

Distance Measures • Mahalanobis Distance • Measures the separation between the means of both classes • Hotelling’s T-Square Statistics • Measures the separation between the means of both classes and takes into consideration the data lengths • Kullback-Leibler Distance • Measures the separation between the distribution of both classes • Bhattacharyya Distance • Derived from measuring the classification error between both classes • Levene’s Test • Measures absolute deviation from the center of the class distribution Introduction Modeling/Comparing Speakers Application Systems Summary

Modeling Analysis N = 20 – 4 seconds of voiced speech Introduction Modeling/Comparing Speakers Application Systems Summary

Best Number of Phonemes? Introduction Modeling/Comparing Speakers Application Systems Summary Number of Phonemes Features Used - LPCC

Application Systems Introduction Modeling/Comparing Speakers Application Systems Summary

Reference Model Selected Randomly Reference Model Selected Randomly Reference Model Selected Randomly Speaker Count System • The Residual Ratio Algorithm (RRA) • Process is repeated K-1 times for counting up to K speakers Too little data Removed, select Another model Introduction Modeling/Comparing Speakers Application Systems Summary DLR-based Model Comparison DLR-based Model Comparison . . .

RRA Examples – 2 Speakers Introduction Modeling/Comparing Speakers Application Systems Summary

RRA Examples – 3 Speakers Introduction Modeling/Comparing Speakers Application Systems Summary

Comparison TWO-SPEAKER RESIDUAL THREE-SPEAKER RESIDUAL Introduction Modeling/Comparing Speakers Application Systems Summary Residual Ratio after 2nd round of RRA Residual Ratio after 2nd round of RRA Speaker 2

Speaker Count • Experiments • HTIMIT Database • 1000 statistically generated K-speaker conversations (each) for K=1-4 • Average conversation length = 1min Introduction Modeling/Comparing Speakers Application Systems Summary

Speaker Count • Added Residual Ratio: • Sum of the residual ratios in all elimination stages. • Should be higher for greater number of speakers. Introduction Modeling/Comparing Speakers Application Systems Summary

Speaker Count Introduction Modeling/Comparing Speakers Application Systems Summary

Speaker Counting-Indexing • Models that initially matched the valid reference models are considered to be of the same speaker as the reference models. • Unmatched models are assigned to the reference models from which it has the minimum distance Introduction Modeling/Comparing Speakers Application Systems Summary

Speaker Counting /Indexing Introduction Modeling/Comparing Speakers Application Systems Summary

System Enhancement - Fusion • Distance Measures • Mahalanobis Distance • Hotelling’s T-Square Statistics • Kullback-Leibler Distance • Bhattacharyya Distance • Levene’s Test Introduction Modeling/Comparing Speakers Application Systems Summary

Correlation Analysis Draftsman’s Display - LPCC Introduction Modeling/Comparing Speakers Application Systems Summary

“Best Distance” • Optimized Fusion of Distances • Maximize inter-speaker variation • Minimize intra-speaker variation • Maximize T-test value between inter-class distance distributions Introduction Modeling/Comparing Speakers Application Systems Summary Ti = T-value corresponding to each distance

Decision Level Fusion Introduction Modeling/Comparing Speakers Application Systems Summary D1 => match D2 => no match Match = ¾ No Match = ¼ Final Decision = Match D3 => match D4 => match

Speaker Count Results Introduction Modeling/Comparing Speakers Application Systems Summary

Speaker Counting /Indexing Results Introduction Modeling/Comparing Speakers Application Systems Summary

Summary Introduction Modeling/Comparing Speakers Application Systems Summary

Research Goal To overcome the following challenges faced in differentiating between speakers participating in conversations: • No a priori information • Limited data size • No knowledge of change points • Co-channel speech Introduction Modeling/Comparing Speakers Application Systems Summary

Summary • Novel model formation technique • Conversations-based speaker differentiation systems • Distance combination techniques to enhance performance Introduction Modeling/Comparing Speakers Application Systems Summary

Conclusion A state-of-the-art speaker discrimination system for conversations has been developed which yields results which are comparable to non-conversational systems. Introduction Modeling/Comparing Speakers Application Systems Summary

Publications • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006 • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy conversations with Short Speaker Utterances”, IEEE Aerospace Conference. March, 2007 • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS. 2006. • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS. 2006. Uche O. Abanulo Physics, Engineering And Geosciences

Acknowledgment • Dr. Robert Yantorno • Dr. Ananth Iyer • Air Force Research Laboratory, Rome, NY Uche O. Abanulo Physics, Engineering And Geosciences

Uche O. Abanulo Physics, Engineering And Geosciences

Look Who’s Talking Now: The Challenge of Conversational Data