Model Formation and Classification Techniques for Conversation-Based Speaker Discrimination
Uchechukwu O. Ofoegbu
Advisor: Robert Yantorno, Ph.D.
Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.
Acknowledgement
• My committee members, for your time and commitment to my research
• The Air Force Research Labs, for financially supporting most of this research work
• My family, for being there
• Dr. Y, the best advisor one could hope for
• Members and friends of the Speech Lab, for your valuable contributions
• ECE faculty and staff, for your great support
• The audience, for being a part of this
Presentation Outline
• Introduction
  • Challenges of Conversational Data
  • General Applications of Research
  • Novelty of Research
• Evaluation Databases
  • HTIMIT
  • SWITCHBOARD
  • New Conversations Database
• Modeling Speakers
  • Traditional Speaker Modeling
  • Proposed Method
  • Features Used
  • Distance Used
• Application Systems
  • Unsupervised Speaker Indexing
  • Speaker Count
  • Generalized Speaker Indexing
• Fusion of Distance Measures
  • “Optimized T” Distance
  • Decision-Based Combination
  • Weighted Decision-Based Combination
• Summary
• Further Research
Introduction
Challenges of Conversational Data
• No a priori information available from participating speakers
  • Training is impossible
• No a priori knowledge of change points
• Speakers alternate very rapidly
  • Limited amounts of data for single-speaker representations
• Distortion
  • Channel noise, co-channel data
Proposed Solutions
• Selective creation of data models
• Distance-based model comparison
• Development of application-specific systems
Novelty of this Research
• Selective creation of data models
• Distance-based model comparison
• Development of application-specific systems
Applications
• Monitoring criminal conversations
• Forensics
• Automated customer services
• Storage/search/retrieval of audio data
• Military activities
• Conference calls
Databases
• Standard Speaker Discrimination Databases
  • HTIMIT
  • Switchboard
• Temple Conversations Database (TCD)
Modeling Speakers
Traditional Speaker Modeling
• Examples
  • Gaussian Mixture Models
  • Hidden Markov Models
  • Neural Networks
  • Prosody-Based Models
• Disadvantages
  • Require large amounts of data
  • Sometimes require a training procedure
  • Relatively complex
Conversational Data Modeling
• Current Method
  • Equal segmentation of data
  • Indiscriminate use of data
• Problems
  • Change points unknown
  • Not all speech is useful
  • Poor performance
Proposed Speaker Modeling
(block diagram: the data is divided into Segment 1 through Segment M; within each segment the frames labeled S, V and U are reduced to the V (voiced) frames only; each segment then goes through feature computation and mean and covariance matrix computation, producing Model 1 through Model M)
Proposed Speaker Modeling
• Why voiced only?
  • Same speech class compared
  • Contains the most information
• What’s the appropriate number of phonemes?
  • Large enough to sufficiently represent speakers
  • Small enough to avoid speaker overlap
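The model formation described on these slides (voiced frames only, then a mean and covariance matrix per segment) can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the function name `form_models`, the boolean `voiced` mask, and the use of a fixed number of voiced frames per model (`frames_per_model`, standing in for "number of phonemes") are all assumptions made for the example.

```python
import numpy as np


def form_models(features, voiced, frames_per_model):
    """Build one (mean, covariance) model per group of voiced frames.

    features: (n_frames, n_coeffs) array of cepstral vectors (e.g. LPCC or MFCC).
    voiced: boolean array of length n_frames, True where a frame is voiced.
    frames_per_model: hypothetical stand-in for the number of phonemes per model.
    """
    feats = np.asarray(features, dtype=float)
    mask = np.asarray(voiced, dtype=bool)
    voiced_feats = feats[mask]  # discard everything that is not voiced speech

    models = []
    last_start = len(voiced_feats) - frames_per_model
    for start in range(0, last_start + 1, frames_per_model):
        chunk = voiced_feats[start:start + frames_per_model]
        mean = chunk.mean(axis=0)
        cov = np.cov(chunk, rowvar=False)  # rows are observations
        models.append((mean, cov))
    return models
```

A trailing group with fewer than `frames_per_model` voiced frames is dropped in this sketch, which mirrors the slide's point that a model needs enough data to represent a speaker.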
Features Considered
• Linear Predictive Cepstral Coefficients (LPCC)
  • Model the vocal tract
• Mel-Scale Frequency Cepstral Coefficients (MFCC)
  • Model the human auditory system
Distance Measurements
(figure: distributions of same-speaker distances and different-speaker distances)
Distances Used
• Mahalanobis Distance
• Hotelling’s T-Square Statistics
• Kullback-Leibler Distance
• Bhattacharyya Distance
• Levene’s Test
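For readers who want to see how four of the listed measures compare two mean-and-covariance models, here is a hedged sketch using the standard textbook definitions for Gaussian models. The slides do not say which covariance the Mahalanobis distance uses or whether the Kullback-Leibler distance is symmetrized; the pooled covariance and the symmetric sum below are assumptions, and all function names are illustrative. Levene's test is omitted.

```python
import numpy as np


def pooled_cov(c1, n1, c2, n2):
    """Pooled covariance of two models built from n1 and n2 frames."""
    return ((n1 - 1) * c1 + (n2 - 1) * c2) / (n1 + n2 - 2)


def mahalanobis(m1, c1, n1, m2, c2, n2):
    """Mahalanobis distance between means (pooled covariance is an assumption)."""
    diff = m1 - m2
    return float(np.sqrt(diff @ np.linalg.solve(pooled_cov(c1, n1, c2, n2), diff)))


def hotelling_t2(m1, c1, n1, m2, c2, n2):
    """Two-sample Hotelling's T-square statistic."""
    diff = m1 - m2
    quad = diff @ np.linalg.solve(pooled_cov(c1, n1, c2, n2), diff)
    return float(n1 * n2 / (n1 + n2) * quad)


def kl_divergence(m1, c1, m2, c2):
    """KL divergence KL(N1 || N2) between two Gaussian models."""
    k = len(m1)
    diff = m2 - m1
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    trace_term = np.trace(np.linalg.solve(c2, c1))
    quad = diff @ np.linalg.solve(c2, diff)
    return float(0.5 * (trace_term + quad - k + logdet2 - logdet1))


def symmetric_kl(m1, c1, m2, c2):
    """Symmetrized KL distance (the symmetrization is an assumption)."""
    return kl_divergence(m1, c1, m2, c2) + kl_divergence(m2, c2, m1, c1)


def bhattacharyya(m1, c1, m2, c2):
    """Bhattacharyya distance between two Gaussian models."""
    c = 0.5 * (c1 + c2)
    diff = m1 - m2
    _, logdet = np.linalg.slogdet(c)
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    mean_term = 0.125 * (diff @ np.linalg.solve(c, diff))
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(mean_term + cov_term)
```

All four functions return zero for identical models and grow as the means separate, which is the property the same-speaker versus different-speaker comparison relies on.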
Analysis of Cepstral Features
• Mahalanobis Distance
Best Number of Phonemes?
(figure: results plotted against Number of Phonemes; Features Used: LPCC)
Application Systems
Unsupervised Speaker Indexing
• The Restrained-Relative Minimum Distance (RRMD) Approach
(figure: reference models are selected from the matrix of pairwise distances between models)
    0     D1,2  D1,3  …
    D2,1  0     D2,3  …
    D3,1  D3,2  0     …
    …
Unsupervised Speaker Indexing
• The Restrained-Relative Minimum Distance (RRMD) Approach
(flowchart: observe the distances to Reference 1 and Reference 2; take the minimum distance; apply the restraining condition; if it passes, same speaker; if it fails, apply the relative distance condition; if that passes, same speaker; if it fails, unusable data)
RRMD Approach
• Restraining Condition
  • Distance Likelihood Ratio (DLR)
  • DLR > 1: same speaker
  • DLR < 1: check the Relative Distance Condition
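The slides do not give the formula for the distance likelihood ratio. One common reading, shown here purely as an assumption, is the ratio of the likelihood of an observed distance under a same-speaker distance distribution to its likelihood under a different-speaker distribution. The Gaussian fit and the names `fit_distance_pdf` and `distance_likelihood_ratio` are hypothetical choices for this sketch.

```python
from statistics import NormalDist


def fit_distance_pdf(distances):
    """Fit a Gaussian to a set of training distances (Gaussian form is assumed)."""
    return NormalDist.from_samples(distances)


def distance_likelihood_ratio(distance, same_pdf, diff_pdf):
    """Likelihood of the distance under 'same speaker' over 'different speaker'."""
    return same_pdf.pdf(distance) / diff_pdf.pdf(distance)


# Hypothetical training distances; real parameters would come from labeled data.
same_pdf = fit_distance_pdf([0.8, 1.0, 1.1, 0.9, 1.2])
diff_pdf = fit_distance_pdf([2.7, 3.1, 3.0, 2.9, 3.3])

small = distance_likelihood_ratio(1.0, same_pdf, diff_pdf)   # well above 1
large = distance_likelihood_ratio(3.0, same_pdf, diff_pdf)   # well below 1
```

Under this reading a ratio above 1 means the observed distance is better explained by the same-speaker distribution, which matches the restraining condition stated on the slide.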
RRMD Approach
• Relative Distance Condition
  • Relative distance: Drel = dmax - dmin, where dmin and dmax are the smaller and larger of the distances to Reference 1 and Reference 2
  • Drel > threshold: same speaker
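Putting the two RRMD conditions together, the decision for one test model might look like the sketch below. It assumes the DLR of the smaller distance has already been computed, that a passed condition assigns the model to the nearer reference, and that failing both conditions marks the data as unusable. The function name, argument names, and return convention are illustrative, not taken from the slides.

```python
def rrmd_assign(d_ref1, d_ref2, dlr_of_min, rel_threshold):
    """Return 1 or 2 for the matched reference, or None for unusable data.

    d_ref1, d_ref2: distances from the test model to Reference 1 and Reference 2.
    dlr_of_min: distance likelihood ratio of the smaller distance (assumed given).
    rel_threshold: hypothetical threshold on the relative distance Drel.
    """
    d_min = min(d_ref1, d_ref2)
    d_max = max(d_ref1, d_ref2)
    nearest = 1 if d_ref1 <= d_ref2 else 2

    # Restraining condition: DLR > 1 means same speaker as the nearer reference.
    if dlr_of_min > 1.0:
        return nearest

    # Relative distance condition: Drel = dmax - dmin must exceed the threshold.
    if d_max - d_min > rel_threshold:
        return nearest

    # Neither condition passed: treat the model as unusable.
    return None
```

For example, `rrmd_assign(6.0, 2.0, 0.5, 1.0)` fails the restraining condition but passes the relative distance condition, so the model goes to Reference 2.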
Experiments and Results
• Experiments
  • HTIMIT used for obtaining likelihood ratio parameters
    • 1000 same-speaker and 1000 different-speaker utterances computed
  • 100 conversations from Switchboard database used for evaluation
Indexing Results - Mahalanobis (figure panels: LPCC, MFCC)
Indexing Results - T-Square (figure panels: LPCC, MFCC)
Indexing Results - Bhattacharyya (figure panels: LPCC, MFCC)
Indexing Results - Summary
• Mahalanobis distance yielded best results
• LPCCs outperformed MFCCs
Speaker Count System
• The Residual Ratio Algorithm (RRA)
• Process is repeated K-1 times for counting up to K speakers
(flowchart labels: "Reference Model Selected Randomly", then "DLR-based Model Comparison", repeated at each stage; "Too little data: removed, select another model")
Speaker Count
• Added Residual Ratio:
  • Is the sum of the residual ratios in all elimination stages
  • Should be higher for greater number of speakers
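The formula for the residual ratio did not survive on these slides, so the following is only one plausible reading, offered as an assumption: at each elimination stage a reference model is picked at random, every model judged to match it is eliminated, and the residual ratio is the fraction of models left over. The function `added_residual_ratio`, the `same_speaker` predicate (which in practice would be the DLR-based comparison), and the stopping rule are all hypothetical.

```python
import random


def added_residual_ratio(models, same_speaker, max_speakers, rng=None):
    """Sum of per-stage residual ratios over up to max_speakers - 1 stages.

    models: list of speaker models (any objects accepted by same_speaker).
    same_speaker: predicate(reference, model) -> bool; assumed to be True
        when a model is compared with itself.
    max_speakers: K, the largest speaker count to be distinguished.
    """
    rng = rng or random.Random(0)
    remaining = list(models)
    total = 0.0
    for _ in range(max_speakers - 1):
        if not remaining:
            break
        reference = rng.choice(remaining)  # reference model selected randomly
        survivors = [m for m in remaining if not same_speaker(reference, m)]
        # Assumed definition: fraction of models not explained by the reference.
        total += len(survivors) / len(remaining)
        remaining = survivors
    return total
```

With toy "models" that are just speaker labels and equality as the predicate, a one-speaker conversation gives 0.0 and a balanced two-speaker conversation gives 0.5, consistent with the slide's statement that the added residual ratio should grow with the number of speakers.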
Experiments and Results
• Experiments
  • 4000 conversations generated from HTIMIT
  • All 40 conversations from new database used
Speaker Count Results - HTIMIT (figure panels: LPCC, MFCC)
Speaker Count Results - HTIMIT (figure panels: LPCC, MFCC)
Speaker Count Results - TCD (figure panels: LPCC, MFCC)
Speaker Count Results - TCD (figure panels: LPCC, MFCC)
Cross Evaluation
(figure panels: HTIMIT, LPCCs with the WDBC; TCD, MFCCs with the T-Square)
Speaker Counting-Indexing
• The Residual Ratio speaker count algorithm is applied
• Test models are associated with their matching reference models
• Each unmatched model is assigned to the reference from which it has the minimum distance
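The last step on this slide, assigning unmatched models to the nearest reference, amounts to an argmin over reference distances. A minimal sketch, assuming a precomputed distance table and a list recording which models were already matched (both names are illustrative):

```python
def index_models(distances, matched):
    """Label every test model with a reference index.

    distances[i][j]: distance from test model i to reference j.
    matched[i]: reference index already matched to model i, or None if unmatched.
    """
    labels = []
    for row, match in zip(distances, matched):
        if match is not None:
            labels.append(match)  # keep the existing match
        else:
            # Fall back to the reference at minimum distance.
            labels.append(min(range(len(row)), key=row.__getitem__))
    return labels


labels = index_models(
    [[0.2, 3.0], [2.5, 2.0], [4.0, 0.1]],
    [0, None, 1],
)
```

Here the second model has no match, so it is assigned to reference 1, the closer of its two references.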
Speaker Counting/Indexing Results
(figure legend: solid, HTIMIT; patterned, TCD)
Fusion of Distance Measures
Correlation Analysis
(figure: Draftsman’s Display, LPCC)
“Best Distance”
• Optimal Criteria for Fusion of Distances
  • Maximize inter-speaker variation
  • Minimize intra-speaker variation
• Maximize T-test value between inter-class distance distributions
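The criterion on this slide can be made concrete with a two-sample t statistic computed between same-speaker and different-speaker distances: the larger the value, the better a distance measure (or a fused distance) separates the two classes. The slides do not say which form of the t-test was used; the unequal-variance (Welch) form and the name `t_value` below are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def t_value(same_distances, diff_distances):
    """Absolute Welch t statistic between the two distance distributions."""
    n_same = len(same_distances)
    n_diff = len(diff_distances)
    standard_error = sqrt(
        variance(same_distances) / n_same + variance(diff_distances) / n_diff
    )
    return abs(mean(diff_distances) - mean(same_distances)) / standard_error


# Hypothetical distances from two candidate measures on the same model pairs.
separation_a = t_value([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
separation_b = t_value([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```

In this toy case the first candidate would be preferred, since its same-speaker and different-speaker distances are further apart relative to their spread.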
Decision Level Fusion
• D1 => match
• D2 => no match
• D3 => match
• D4 => match
• Match = ¾, No Match = ¼
• Final Decision = Match
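The voting example on this slide can be reproduced with a few lines. This sketch assumes a simple majority rule in which a tie counts as no match; the slides show only the three-to-one case, so the tie handling and the name `majority_vote` are assumptions.

```python
def majority_vote(decisions):
    """Fuse per-distance match decisions by unweighted voting.

    decisions: list of booleans, True where a distance measure says 'match'.
    Returns (final_decision, fraction_voting_match).
    """
    match_fraction = sum(decisions) / len(decisions)
    return match_fraction > 0.5, match_fraction


# D1, D3 and D4 say match; D2 says no match.
final, fraction = majority_vote([True, False, True, True])
```

With three of four measures voting for a match, the fraction is 0.75 and the final decision is a match, as on the slide.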
Weighted Decision Level Fusion
• Ti = T-value corresponding to each distance
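The weighting formula itself is missing from the transcript. A natural reading, given that Ti is the T-value of each distance, is a vote in which each distance's decision counts in proportion to its T-value. The sketch below implements that reading as an assumption; the normalization by the sum of T-values, the 0.5 cutoff, and the name `weighted_vote` are not taken from the slides.

```python
def weighted_vote(decisions, t_values):
    """Fuse match decisions, weighting each by its distance's T-value.

    decisions: list of booleans, True where a distance measure says 'match'.
    t_values: positive T-values, one per distance measure (assumed weights).
    Returns (final_decision, weighted_fraction_voting_match).
    """
    total = sum(t_values)
    match_weight = sum(t for d, t in zip(decisions, t_values) if d)
    score = match_weight / total
    return score > 0.5, score


# One strongly discriminating distance outvotes two weaker ones.
final, score = weighted_vote([True, False, False], [10.0, 2.0, 3.0])
```

In this example an unweighted majority would say no match, but the distance with the largest T-value carries two thirds of the weight, so the fused decision is a match.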
Summary
Research Goal
• To differentiate between speakers in a conversation
  • To determine the number of speakers present
  • To determine who is speaking when
• To overcome the following challenges
  • No a priori information
  • Limited data size
  • No knowledge of change points
  • Co-channel speech
Summary of Accomplishments
• Novel model formation technique
• Three novel approaches for conversation-based speaker differentiation
• Distance combination techniques to enhance performance
Observations
• Mahalanobis distance with LPCCs was optimal for the standard databases
• T-Square distance with MFCCs was optimal for the new database
• Weighted voting combination was the most efficient fusion technique
Conclusion
• The developed system yields about 6% EER, whereas state-of-the-art speaker indexing systems yield about a 10% error rate.
• Methods for discriminating between speakers (speaker count or indexing) in conversations with more than two speakers have been introduced.
Further Research