This dissertation presentation explores the challenges of speaker discrimination in conversational data, covering distance analysis, feature analysis, data modeling, and application systems. It addresses conversation segmentation, speaker recognition, and applications such as criminal activity detection, and proposes solutions to the limitations of conventional methods.
Speaker Discrimination: The Challenge of Conversational Data
Uchechukwu O. Ofoegbu
Dissertation Committee
Advisor: Robert Yantorno, Ph.D.
Members: Dennis Silage, Ph.D.; Brian Butz, Ph.D.; Iyad Obeid, Ph.D.; Eugene Kwatny, Ph.D.
Presentation Outline
• Problem Statement and Research Goal
• Scope of Research
• Distance Analysis
• Feature Analysis
• Data Analysis
• Application Systems
• Fusion of Distances
• Proposal Summary
Problem Statement and Research Goal
Conventional Speaker Recognition
• Speaker Identification
  • Who is this speaker?
• Speaker Verification
  • Is he who he claims to be?
[Diagram: Reference Speech → Feature Extraction → Model Building; Test Speech → Feature Extraction → Comparison → Recognition Decision → System Output]
Conversation Segmentation
• Broadcast News/Conference Data
• Conversational Data
Problems with Conversational Data
• No a priori information available from participating speakers
  • Training is impossible
  • No a priori knowledge of change points
• Speakers alternate very rapidly
  • Limited amounts of data for single-speaker representations
• Distortion
  • Channel noise, co-channel data
Proposed Solutions
• Selective creation of data models
• Development of an “optimal” distance measure
• Decision-level fusion of distance measures
• Development of application-specific systems
Scope of Research
Criminal Activity Detection
• Monitoring inmate conversations
  • Prevention of 3-way calls
  • Notification of suspicious contacts
  • Enhancement of keyword detection
• Uncooperative data collection
• Forensics
  • Voiceprints
Commercial Services
• Automated Customer Services
  • Personalized contact with customers
• Search/Retrieval of Audio Data
Homeland Security
• Military Activities
  • Pilot-control tower communications
  • Detection of unidentified speakers on pilot radio channels
• Terrorist Identification
Distance Analysis
Distance Measures
• Univariate vs. Multivariate Analysis
[Figure labels: difference between means; standard deviation of each of the two distributions]
Distance Measures
• Notation
  • Random variables being compared:
    $X = [X_1, X_2, \ldots, X_p]$: $n_x \times p$ matrix
    $Y = [Y_1, Y_2, \ldots, Y_p]$: $n_y \times p$ matrix
• Properties
  • $Q(X, Y) \geq 0$
  • $Q(X, Y) = 0$ iff $X = Y$
  • $Q(X, Y) = Q(Y, X)$
  • $Q(X, Y) \leq Q(X, Z) + Q(Z, Y)$
Distance Measures
• Mahalanobis Distance
  $Q_{\text{Mahalanobis}}(X, Y) = (\mu_x - \mu_y)^T \Sigma^{-1} (\mu_x - \mu_y)$
  $\Sigma$ = combined covariance matrix of $X$ and $Y$
• Hotelling’s T-Square Statistic [equation not preserved in this transcript]
  $C^{ik}$ = element in the $i$th row and $k$th column of the inverse of $C$
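The two measures on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the author's implementation: it assumes that the "combined covariance matrix" is the pooled sample covariance, and it uses the standard two-sample form of Hotelling's statistic, $T^2 = \frac{n_x n_y}{n_x + n_y} (\mu_x - \mu_y)^T C^{-1} (\mu_x - \mu_y)$, since the slide's own equation is not preserved. The function names are hypothetical.

```python
import numpy as np

def pooled_covariance(X, Y):
    """Pooled sample covariance of X (nx by p) and Y (ny by p); rows are frames.

    Assumption: this is what the slide calls the 'combined covariance matrix'.
    """
    nx, ny = X.shape[0], Y.shape[0]
    Sx = np.atleast_2d(np.cov(X, rowvar=False))
    Sy = np.atleast_2d(np.cov(Y, rowvar=False))
    return ((nx - 1) * Sx + (ny - 1) * Sy) / (nx + ny - 2)

def mahalanobis_distance(X, Y):
    """(mu_x - mu_y)^T Sigma^{-1} (mu_x - mu_y) with Sigma the pooled covariance."""
    diff = X.mean(axis=0) - Y.mean(axis=0)
    sigma = pooled_covariance(X, Y)
    # Solve a linear system instead of forming the explicit inverse.
    return float(diff @ np.linalg.solve(sigma, diff))

def hotelling_t2(X, Y):
    """Standard two-sample Hotelling T-square: the Mahalanobis form scaled by sample sizes."""
    nx, ny = X.shape[0], Y.shape[0]
    return (nx * ny) / (nx + ny) * mahalanobis_distance(X, Y)
```

In use, `X` and `Y` would be the per-frame feature matrices (for example, 14th-order LPCC frames) of the two utterances being compared.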
Distance Measures
• Kullback-Leibler (KL) Distance
• Bhattacharyya Distance
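The equations for these two measures are not preserved in the transcript. As a hedged illustration, the sketch below uses the standard closed forms that hold when each speech segment is summarized by a single multivariate Gaussian (mean vector and covariance matrix); that modeling choice, the symmetrized KL variant, and all function names are assumptions rather than the presentation's stated definitions.

```python
import numpy as np

def fit_gaussian(X):
    """Mean vector and covariance matrix of a feature matrix (rows are frames)."""
    return X.mean(axis=0), np.atleast_2d(np.cov(X, rowvar=False))

def gaussian_kl(mu0, S0, mu1, S1):
    """KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    p = mu0.shape[0]
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    trace_term = np.trace(np.linalg.solve(S1, S0))
    quad_term = diff @ np.linalg.solve(S1, diff)
    return 0.5 * (trace_term + quad_term - p + logdet1 - logdet0)

def symmetric_kl(mu0, S0, mu1, S1):
    """Symmetrized KL, so that Q(X, Y) = Q(Y, X) as required of a distance."""
    return gaussian_kl(mu0, S0, mu1, S1) + gaussian_kl(mu1, S1, mu0, S0)

def bhattacharyya(mu0, S0, mu1, S1):
    """Bhattacharyya distance between two multivariate Gaussians."""
    S = 0.5 * (S0 + S1)
    diff = mu1 - mu0
    _, logdet = np.linalg.slogdet(S)
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    mean_term = 0.125 * (diff @ np.linalg.solve(S, diff))
    cov_term = 0.5 * (logdet - 0.5 * (logdet0 + logdet1))
    return mean_term + cov_term
```

The plain KL divergence is not symmetric, which is why a symmetrized form is shown alongside it; the Bhattacharyya distance is symmetric by construction.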
Distance Measures
• Levene’s Test
  • Derived from the T-Square statistic as follows:
    • Each set of points is transformed along each vector into its absolute divergence from the mean vector
    • The T-Square statistic is then applied to the transformed features
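The two steps above can be sketched as follows. This is an illustrative reading of the slide, assuming the standard two-sample Hotelling T-square with a pooled covariance matrix; the function names are hypothetical.

```python
import numpy as np

def hotelling_t2(X, Y):
    """Two-sample Hotelling T-square with pooled covariance (rows are frames)."""
    nx, ny = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Sx = np.atleast_2d(np.cov(X, rowvar=False))
    Sy = np.atleast_2d(np.cov(Y, rowvar=False))
    pooled = ((nx - 1) * Sx + (ny - 1) * Sy) / (nx + ny - 2)
    return (nx * ny) / (nx + ny) * float(diff @ np.linalg.solve(pooled, diff))

def levene_distance(X, Y):
    """Levene-style distance: T-square applied to absolute deviations from the mean.

    Step 1: replace every frame by its absolute divergence from its own set's mean vector.
    Step 2: apply the T-square statistic to the transformed features.
    """
    X_dev = np.abs(X - X.mean(axis=0))
    Y_dev = np.abs(Y - Y.mean(axis=0))
    return hotelling_t2(X_dev, Y_dev)
```

Because the mean is removed before the comparison, this measure responds to differences in spread rather than location: shifting one set by a constant leaves the result unchanged, while rescaling it does not.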
Procedural Set-up
• HTIMIT database used
• Average utterance length = 5 seconds
Intra-speaker distance computations
[Diagram: 10 utterances from Speaker A → randomly select 2 utterances → for each of Utterance 1 and Utterance 2: window data → compute 14th-order LPCC → compute distance]
Procedural Set-up
Inter-speaker, different-utterance distance computations
[Diagram: randomly select one utterance from Speaker A and one from Speaker B → for each: window data → compute 14th-order LPCC → compute distance]
Analysis of Distance Measures
• Mahalanobis Distance: Gaussian Estimate
Analysis of Distance Measures
• Levene’s Test: Gamma Estimate
Feature Analysis
Cepstral Analysis
• Frequency analysis of speech: STFT of speech = excitation component (fast-varying harmonics) × vocal tract component (slowly varying formants)
• Log of STFT = log of excitation component + log of vocal tract component
• IDFT of log of STFT = excitation + vocal tract
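A minimal sketch of the chain described on this slide (spectrum, then logarithm, then inverse DFT) for a single frame is shown below. The Hamming window, the FFT size, and the small constant added before the logarithm are assumptions made for illustration, and `real_cepstrum` is a hypothetical name.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Real cepstrum of one speech frame: IDFT of the log magnitude spectrum.

    The logarithm turns the product (excitation x vocal tract) into a sum, so the
    two components become additive in the cepstral domain.
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    # Small offset avoids log(0) for empty spectral bins (illustrative choice).
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)
    return np.fft.irfft(log_magnitude, n=n_fft)
```

The low-order coefficients of the result mainly reflect the slowly varying vocal tract envelope, while the fast-varying excitation appears at higher indices, which is why low-order cepstral coefficients are commonly kept as speaker features.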
Cepstral Features
• Linear Predictive Cepstral Coefficients
  • Obtained recursively from LPC coefficients
• Mel-Scale Frequency Cepstral Coefficients
  • Nonlinear warping of the frequency axis to model the human auditory system
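The recursion from LPC coefficients to cepstral coefficients is not spelled out on the slide. A common form is sketched below, assuming the all-pole model $H(z) = 1 / (1 - \sum_{k=1}^{p} a_k z^{-k})$ (sign conventions differ between texts) and omitting the gain term $c_0$. The function name is hypothetical.

```python
import numpy as np

def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients a_1..a_p into cepstral coefficients c_1..c_n_ceps.

    Recursion (for H(z) = 1 / (1 - sum_k a_k z^-k)):
        c_n = a_n + sum_{k=1}^{n-1} (k / n) * c_k * a_{n-k},  with a_n = 0 for n > p.
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

For a single-pole model with $a_1 = 0.5$, the recursion gives $c_n = 0.5^n / n$, which matches the power-series expansion of $-\ln(1 - 0.5 z^{-1})$. The 14th-order LPCC features used in the experiments would correspond to `n_ceps=14`.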
Cepstral Features
• Delta Cepstral Coefficients
  • First and second derivatives of the cepstral coefficients
  • Reflect dynamic information
  • Used as a supplement to the original cepstral features
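One common way to approximate these derivatives is a regression over neighboring frames, sketched below. The regression window `N=2` and the edge padding are assumptions made for illustration, not values taken from the presentation; applying the same function twice gives the second derivative (delta-delta).

```python
import numpy as np

def delta(features, N=2):
    """Regression-based delta coefficients for a (frames x coefficients) matrix.

    d_t = sum_{n=1}^{N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^{N} n^2)
    Frames beyond the ends are replaced by repeating the first/last frame.
    """
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom

def add_deltas(features, N=2):
    """Stack static, delta, and delta-delta coefficients column-wise."""
    d1 = delta(features, N)
    d2 = delta(d1, N)
    return np.hstack([features, d1, d2])
```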
Analysis of Cepstral Features
• Mahalanobis Distance
Analysis of Cepstral Features
• Levene’s Test
Feature Combination
• Proposed Investigation
  • What is the best feature combination?
  • Will the delta and delta-delta coefficients contribute to the speaker-differentiating ability of the features?
Feature Combination Analysis
• T-test Based Evaluation
  • Why?
    • Robust to departures from the Gaussian distribution, especially for large data sizes and when the two samples being compared are of approximately equal size
    • Unaffected by differences in the variances of the compared variables
Data Analysis
Traditional Speaker Modeling
• Examples
  • Gaussian Mixture Models
  • Hidden Markov Models
  • Neural Networks
  • Prosody-Based Models
• Disadvantages
  • Require large amounts of data
  • Sometimes require a training procedure
  • Relatively complex
Conversational Data Modeling
• Current Method
  • Equal segmentation of data
  • Indiscriminate use of data
  • Poor performance
• Problems
  • Change points unknown
  • Not all speech is useful
Proposed Speaker Modeling
[Diagram: speech labeled frame by frame as S, V, or U (S V U V U V … U V U V S V) → only the V portions are kept and grouped into SEGMENT 1 … SEGMENT M → feature computation → mean and covariance matrix computation → MODEL 1 … MODEL M]
Proposed Speaker Modeling
• Why voiced only?
  • Same speech class compared
  • Contains the most information
• What is the appropriate number of phonemes?
  • Large enough to sufficiently represent speakers
  • Small enough to avoid speaker overlap
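The proposed modeling can be sketched as below: consecutive voiced phonemes are grouped N at a time, and each group is summarized by a mean vector and a covariance matrix. This is a minimal illustration that assumes voiced/unvoiced detection and feature computation have already been done; `build_models` and its arguments are hypothetical names.

```python
import numpy as np

def build_models(voiced_phonemes, n_per_model):
    """Group consecutive voiced phonemes and model each group by (mean, covariance).

    voiced_phonemes: list of (frames x coefficients) feature matrices, one per
                     voiced phoneme, in time order (assumed to be precomputed).
    n_per_model:     N, the number of voiced phonemes pooled into one model.
    Any leftover phonemes that do not fill a complete group are dropped.
    """
    models = []
    last_start = len(voiced_phonemes) - n_per_model
    for start in range(0, last_start + 1, n_per_model):
        block = np.vstack(voiced_phonemes[start : start + n_per_model])
        mean = block.mean(axis=0)
        cov = np.atleast_2d(np.cov(block, rowvar=False))
        models.append((mean, cov))
    return models
```

The choice of N is the trade-off named on the slide: each group must contain enough frames for a stable covariance estimate, yet be short enough that it is unlikely to span a speaker change.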
Modeling Analysis
N = 20 (4 seconds of voiced speech)
Modeling Analysis
Modeling Analysis
N = 5 (1 second of voiced speech)
Application Systems
Unsupervised Speaker Indexing
• The Restrained-Relative Minimum Distance (RRMD) Approach
Reference models, with distance matrix:
$$\begin{bmatrix} 0 & D_{1,2} & D_{1,3} & \cdots \\ D_{2,1} & 0 & D_{2,3} & \cdots \\ D_{3,1} & D_{3,2} & 0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
Unsupervised Speaker Indexing
• The Restrained-Relative Minimum Distance (RRMD) Approach
[Flow diagram: observe the distances to Reference 1 and Reference 2 → minimum distance → restraining condition: if passed, same speaker; if failed → relative distance condition: if passed, same speaker; if failed, unusable data]
RRMD Approach
• Restraining Condition
  • Distance Likelihood Ratio (DLR)
    • DLR > 1 → same speaker
    • DLR < 1 → check relative distance condition
RRMD Approach
• Relative Distance Condition
  • Relative distance: $D_{rel} = d_{max} - d_{min}$
  • $D_{rel} >$ threshold → same speaker
[Figure labels: Reference 1, Reference 2, $d_{min}$, $d_{max}$]
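The decision logic of the two RRMD conditions can be sketched as follows. This is an interpretation of the slides rather than the author's code: the formula for the distance likelihood ratio is not preserved in the transcript, so the DLR is taken as an already computed input, and the function name, argument names, and return convention are hypothetical.

```python
def rrmd_decide(d_ref1, d_ref2, dlr, rel_threshold):
    """Assign a test segment to reference 1 or 2, or mark it unusable (None).

    d_ref1, d_ref2: distances from the test segment to the two reference models.
    dlr:            distance likelihood ratio for the closer reference
                    (computed elsewhere; its definition is not shown here).
    rel_threshold:  threshold on the relative distance D_rel.
    """
    closest = 1 if d_ref1 <= d_ref2 else 2

    # Restraining condition: DLR > 1 means same speaker as the closest reference.
    if dlr > 1:
        return closest

    # Otherwise check the relative distance condition: D_rel = d_max - d_min.
    d_rel = max(d_ref1, d_ref2) - min(d_ref1, d_ref2)
    if d_rel > rel_threshold:
        return closest

    # Both conditions failed: treat the segment as undecided/unusable data.
    return None
```

The slides only state the cases DLR > 1 and DLR < 1; in this sketch a DLR of exactly 1 falls through to the relative distance check.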
Preliminary Results
• Experiments
  • 245 telephone conversations from the SWITCHBOARD database, with an average length of 400 seconds
  • T-Square statistic used
  • Ground truth obtained from Mississippi State transcriptions
Preliminary Results
• Best N Estimation
N = 5
Preliminary Results
• RRMD Experiments
  • $D_{rel}$ varied from 0 to 200
  • Two errors defined
    • Indexing error: $I_{err} = 100 - \text{Accuracy}$
    • Undecided error, where $N_u$ = number of detected undecided/unusable samples and $N_c$ = number labeled as co-channel data [equation not preserved in this transcript]
Preliminary Results
Speaker Count System
• The Residual Ratio Algorithm (RRA)
• Process is repeated K-1 times for counting up to K speakers
[Diagram: reference model selected randomly → DLR-based model comparison → … (repeated); if too little data is removed, select another model]
RRA Examples: 2 Speakers
RRA Examples: 3 Speakers
Comparison
[Figure panels: two-speaker residual and three-speaker residual, each showing the residual ratio after the 2nd round of RRA; label: Speaker 2]