440 likes | 451 Views
This presentation explores the challenges of modeling and comparing speakers using conversational data. It discusses traditional speaker modeling methods, proposed solutions, and applications in various domains. The speaker also discusses the features and distances used in speaker recognition systems.
E N D
Look Who’s Talking Now:The Challenge of Conversational Data Uche O. Abanulo Physics, Engineering And Geosciences Uchechukwu Abanulo
Presentation Outline • Introduction • Speaker Recognition • Challenges of Conversational Data • General Applications of Research • Modeling and Comparing Speakers • Traditional Speaker Modeling • Proposed Method • Features Used • Distances Used • Application Systems • Speaker Count • Generalized Speaker Indexing • Enhancement of Results - fusion • Summary Uche O. Abanulo Physics, Engineering And Geosciences
Introduction Introduction Modeling/Comparing Speakers Application Systems Summary
Reference Speech Feature Extraction Model Building Test Speech Feature Extraction Recognition Decision Comparison Speaker Recognition • Speaker Identification • Who is this speaker? • Speaker Verification • Is he who he claims to be? Introduction Modeling/Comparing Speakers Application Systems Summary System Output
Speaker Segmentation • Broadcast News/Conference Data • Conversational Data Introduction Modeling/Comparing Speakers Application Systems Summary
Challenges of Conversational Data • No a priori information available from participating speakers. • Training is impossible • No a priori knowledge of change points • Speakers alternate very rapidly. • Limited amounts of data for single speaker representations • Distortion • Channel noise, co-channel data Introduction Modeling/Comparing Speakers Application Systems Summary
Proposed Solutions • Selective creation of data models • Distance-Based Model Comparison • Development of application-specific system Introduction Modeling/Comparing Speakers Application Systems Summary
Applications • Criminal Activity Detection • Monitoring inmate conversations • Prevention of 3-way calls • Notification of suspicious contacts • Enhancement of keyword detection • Development of speaker databases for uncooperative people • Forensics • Voiceprints Introduction Modeling/Comparing Speakers Application Systems Summary
Applications • Commercial Services • Personalized contact with customers • Storage/Search/Retrieval of Audio Data • Conference calls Introduction Modeling/Comparing Speakers Application Systems Summary
Applications • Military Activities • Pilot-control tower communications • Detection of unidentified speakers on pilot radio channels Introduction Modeling/Comparing Speakers Application Systems Summary
Modeling and Comparing Speakers Introduction Modeling/Comparing Speakers Application Systems Summary
Traditional Speaker Modeling • Examples • Gaussian Mixture Models • Hidden Markov Models • Neural Networks • Prosody-Based Models • Disadvantages • Require large amounts of speech data • Sometimes require training procedure Introduction Modeling/Comparing Speakers Application Systems Summary
Conversational Data Modeling • Current Method • Equal segmentation of data • Indiscriminate use of data • Problems • Change points unknown • Not all speech is useful • Poor performance Introduction Modeling/Comparing Speakers Application Systems Summary
S V U V U V … U V U V S V . . . V V V V V V MEAN AND COVARIANCE MATRIX COMPUTATION MEAN AND COVARIANCE MATRIX COMPUTATION Novel Speaker Modeling Introduction Modeling/Comparing Speakers Application Systems Summary SEGMENT 1 SEGMENT M FEATURE COMPUTATION FEATURE COMPUTATION . . . MODEL 1 MODEL M
Proposed Speaker Modeling • Why voiced only? • Same speech class compared • Contains the most information • What’s the appropriate number of phonemes? • Large enough to sufficiently represent speakers • Small enough to avoid speaker overlap Introduction Modeling/Comparing Speakers Application Systems Summary
Features Considered • Linear Predictive Cepstral Coefficients • Model the vocal tract • Mel-Scale Frequency Cepstral Coefficients • Model the human auditory system Introduction Modeling/Comparing Speakers Application Systems Summary
Cepstral Analysis Frequency Analysis of Speech Excitation Component Vocal Tract Component STFT of Speech Slowly varying formants Fast varying harmonics = X Log of STFT Log of Excitation Log of Vocal Tract Component = + IDFT of Log of STFT Excitation Vocal tract + =
Distance Measurements Introduction Modeling/Comparing Speakers Application Systems Summary Different speaker distances Same speaker distances
Distance Measures • Mahalanobis Distance • Measures the separation between the means of both classes • Hotelling’s T-Square Statistics • Measures the separation between the means of both classes and takes into consideration the data lengths • Kullback-Leibler Distance • Measures the separation between the distribution of both classes • Bhattacharyya Distance • Derived from measuring the classification error between both classes • Levene’s Test • Measures absolute deviation from the center of the class distribution Introduction Modeling/Comparing Speakers Application Systems Summary
Modeling Analysis N = 20 – 4 seconds of voiced speech Introduction Modeling/Comparing Speakers Application Systems Summary
Best Number of Phonemes? Introduction Modeling/Comparing Speakers Application Systems Summary Number of Phonemes Features Used - LPCC
Application Systems Introduction Modeling/Comparing Speakers Application Systems Summary
Reference Model Selected Randomly Reference Model Selected Randomly Reference Model Selected Randomly Speaker Count System • The Residual Ratio Algorithm (RRA) • Process is repeated K-1 times for counting up to K speakers Too little data Removed, select Another model Introduction Modeling/Comparing Speakers Application Systems Summary DLR-based Model Comparison DLR-based Model Comparison . . .
RRA Examples – 2 Speakers Introduction Modeling/Comparing Speakers Application Systems Summary
RRA Examples – 3 Speakers Introduction Modeling/Comparing Speakers Application Systems Summary
Comparison TWO-SPEAKER RESIDUAL THREE-SPEAKER RESIDUAL Introduction Modeling/Comparing Speakers Application Systems Summary Residual Ratio after 2nd round of RRA Residual Ratio after 2nd round of RRA Speaker 2
Speaker Count • Experiments • HTIMIT Database • 1000 statistically generated K-speaker conversations (each) for K=1-4 • Average conversation length = 1min Introduction Modeling/Comparing Speakers Application Systems Summary
Speaker Count • Added Residual Ratio: • Sum of the residual ratios in all elimination stages. • Should be higher for greater number of speakers. Introduction Modeling/Comparing Speakers Application Systems Summary
Speaker Count Introduction Modeling/Comparing Speakers Application Systems Summary
Speaker Counting-Indexing • Models that initially matched the valid reference models are considered to be of the same speaker as the reference models. • Unmatched models are assigned to the reference models from which it has the minimum distance Introduction Modeling/Comparing Speakers Application Systems Summary
Speaker Counting /Indexing Introduction Modeling/Comparing Speakers Application Systems Summary
System Enhancement - Fusion • Distance Measures • Mahalanobis Distance • Hotelling’s T-Square Statistics • Kullback-Leibler Distance • Bhattacharyya Distance • Levene’s Test Introduction Modeling/Comparing Speakers Application Systems Summary
Correlation Analysis Draftsman’s Display - LPCC Introduction Modeling/Comparing Speakers Application Systems Summary
“Best Distance” • Optimized Fusion of Distances • Maximize inter-speaker variation • Minimize intra-speaker variation • Maximize T-test value between inter-class distance distributions Introduction Modeling/Comparing Speakers Application Systems Summary Ti = T-value corresponding to each distance
Decision Level Fusion Introduction Modeling/Comparing Speakers Application Systems Summary D1 => match D2 => no match Match = ¾ No Match = ¼ Final Decision = Match D3 => match D4 => match
Speaker Count Results Introduction Modeling/Comparing Speakers Application Systems Summary
Speaker Counting /Indexing Results Introduction Modeling/Comparing Speakers Application Systems Summary
Summary Introduction Modeling/Comparing Speakers Application Systems Summary
Research Goal To overcome the following challenges faced in differentiating between speakers participating in conversations: • No a priori information • Limited data size • No knowledge of change points • Co-channel speech Introduction Modeling/Comparing Speakers Application Systems Summary
Summary • Novel model formation technique • Conversations-based speaker differentiation systems • Distance combination techniques to enhance performance Introduction Modeling/Comparing Speakers Application Systems Summary
Conclusion A state-of-the-art speaker discrimination system for conversations has been developed which yields results which are comparable to non-conversational systems. Introduction Modeling/Comparing Speakers Application Systems Summary
Publications • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006 • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy conversations with Short Speaker Utterances”, IEEE Aerospace Conference. March, 2007 • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS. 2006. • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS. 2006. Uche O. Abanulo Physics, Engineering And Geosciences
Acknowledgment • Dr. Robert Yantorno • Dr. Ananth Iyer • Air Force Research Laboratory, Rome, NY Uche O. Abanulo Physics, Engineering And Geosciences
Uche O. Abanulo Physics, Engineering And Geosciences