
Look Who’s Talking Now: The Challenge of Conversational Data

This presentation explores the challenges of modeling and comparing speakers using conversational data. It discusses traditional speaker modeling methods, proposed solutions, and applications in various domains. The speaker also discusses the features and distances used in speaker recognition systems.


Presentation Transcript


  1. Look Who’s Talking Now: The Challenge of Conversational Data • Uche O. Abanulo (Uchechukwu Abanulo) • Physics, Engineering and Geosciences

  2. Presentation Outline • Introduction: Speaker Recognition; Challenges of Conversational Data; General Applications of Research • Modeling and Comparing Speakers: Traditional Speaker Modeling; Proposed Method; Features Used; Distances Used • Application Systems: Speaker Count; Generalized Speaker Indexing; Enhancement of Results (fusion) • Summary

  3. Introduction Introduction Modeling/Comparing Speakers Application Systems Summary

  4. Speaker Recognition • Speaker Identification: Who is this speaker? • Speaker Verification: Is he who he claims to be? [Block diagram labels: Reference Speech, Feature Extraction, Model Building; Test Speech, Feature Extraction, Comparison, Recognition Decision, System Output]

  5. Speaker Segmentation • Broadcast News/Conference Data • Conversational Data

  6. Challenges of Conversational Data • No a priori information available from participating speakers • Training is impossible • No a priori knowledge of change points • Speakers alternate very rapidly • Limited amounts of data for single-speaker representations • Distortion • Channel noise, co-channel data

  7. Proposed Solutions • Selective creation of data models • Distance-based model comparison • Development of application-specific systems

  8. Applications • Criminal Activity Detection • Monitoring inmate conversations • Prevention of 3-way calls • Notification of suspicious contacts • Enhancement of keyword detection • Development of speaker databases for uncooperative people • Forensics • Voiceprints

  9. Applications • Commercial Services • Personalized contact with customers • Storage/Search/Retrieval of Audio Data • Conference calls

  10. Applications • Military Activities • Pilot-control tower communications • Detection of unidentified speakers on pilot radio channels

  11. Modeling and Comparing Speakers

  12. Traditional Speaker Modeling • Examples • Gaussian Mixture Models • Hidden Markov Models • Neural Networks • Prosody-Based Models • Disadvantages • Require large amounts of speech data • Sometimes require a training procedure

  13. Conversational Data Modeling • Current Method • Equal segmentation of data • Indiscriminate use of data • Problems • Change points unknown • Not all speech is useful • Poor performance

  14. Novel Speaker Modeling [Diagram: the speech stream is labeled as a sequence of S, V, and U units (presumably silence, voiced, and unvoiced); only the V units are kept and grouped into Segment 1 through Segment M; each segment goes through feature computation and then mean and covariance matrix computation, yielding Model 1 through Model M]

  15. Proposed Speaker Modeling • Why voiced only? • Same speech class compared • Contains the most information • What’s the appropriate number of phonemes? • Large enough to sufficiently represent speakers • Small enough to avoid speaker overlap
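
As a rough illustration of slides 14 and 15, the sketch below groups voiced feature vectors into fixed-size segments and summarizes each segment by a mean vector and a covariance matrix. The function names, the `segment_size` parameter, the decision to drop a short trailing segment, and the toy 2-D features are all assumptions made for illustration; the slides do not give an implementation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_covariance(frames: List[Vector]) -> Tuple[Vector, Matrix]:
    """Return the sample mean and unbiased sample covariance of the frames."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    cov = [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov


def build_segment_models(
    voiced_frames: List[Vector], segment_size: int
) -> List[Tuple[Vector, Matrix]]:
    """Split voiced frames into consecutive segments and model each one.

    A trailing segment shorter than segment_size is dropped (an assumption).
    """
    models = []
    last_start = len(voiced_frames) - segment_size
    for start in range(0, last_start + 1, segment_size):
        segment = voiced_frames[start:start + segment_size]
        models.append(mean_and_covariance(segment))
    return models


# Toy 2-D "features" standing in for cepstral vectors of voiced speech.
frames = [
    [1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
    [2.0, 0.0], [4.0, 2.0], [6.0, 4.0],
    [9.0, 9.0],
]
models = build_segment_models(frames, segment_size=3)
```

Here `segment_size` plays the role of the "number of phonemes" trade-off on slide 15: larger segments give more stable statistics, while smaller ones are less likely to straddle a speaker change.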

  16. Features Considered • Linear Predictive Cepstral Coefficients (LPCC) • Model the vocal tract • Mel-Scale Frequency Cepstral Coefficients • Model the human auditory system

  17. Cepstral Analysis • Frequency analysis of speech: STFT of speech = excitation component (fast-varying harmonics) × vocal tract component (slowly varying formants) • Log of STFT = log of excitation component + log of vocal tract component • IDFT of log of STFT = excitation + vocal tract
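
In symbols, the three steps on this slide can be written as below. The notation is an assumption borrowed from the usual source-filter model and is not in the slides: $X(\omega)$ is the short-time spectrum of the speech frame, $E(\omega)$ the excitation spectrum, $H(\omega)$ the vocal tract response, and $c[n]$ the cepstrum. Log magnitudes are used.

```latex
\begin{align}
|X(\omega)| &= |E(\omega)|\,|H(\omega)| \\
\log|X(\omega)| &= \log|E(\omega)| + \log|H(\omega)| \\
c[n] = \mathrm{IDFT}\{\log|X(\omega)|\} &= c_e[n] + c_h[n]
\end{align}
```

Because the IDFT is linear, the sum survives the transform: the slowly varying vocal tract term $c_h[n]$ concentrates at low quefrency and the fast-varying excitation term $c_e[n]$ at higher quefrency, which is what allows the two to be separated.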

  18. Distance Measurements [Figure: same-speaker distances versus different-speaker distances]

  19. Distance Measures • Mahalanobis Distance: measures the separation between the means of both classes • Hotelling’s T-Square Statistics: measures the separation between the means of both classes and takes into consideration the data lengths • Kullback-Leibler Distance: measures the separation between the distributions of both classes • Bhattacharyya Distance: derived from measuring the classification error between both classes • Levene’s Test: measures absolute deviation from the center of the class distribution
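
For reference, the textbook forms of four of these measures are given below for two models summarized by means $\mu_1, \mu_2$, covariances $\Sigma_1, \Sigma_2$, and frame counts $n_1, n_2$ in $d$ dimensions. The Gaussian assumption and the pooled covariance $\Sigma_p$ are assumptions made here; the slides do not state which exact variants were used. Levene's test is omitted.

```latex
\begin{align}
\Sigma_p &= \frac{(n_1 - 1)\,\Sigma_1 + (n_2 - 1)\,\Sigma_2}{n_1 + n_2 - 2} \\
D_M^2 &= (\mu_1 - \mu_2)^{\top} \Sigma_p^{-1} (\mu_1 - \mu_2) \\
T^2 &= \frac{n_1 n_2}{n_1 + n_2}\,(\mu_1 - \mu_2)^{\top} \Sigma_p^{-1} (\mu_1 - \mu_2) \\
D_{KL}(1\,\|\,2) &= \tfrac{1}{2}\Big[\operatorname{tr}\!\big(\Sigma_2^{-1}\Sigma_1\big)
  + (\mu_2 - \mu_1)^{\top} \Sigma_2^{-1} (\mu_2 - \mu_1) - d
  + \ln\frac{|\Sigma_2|}{|\Sigma_1|}\Big] \\
D_B &= \tfrac{1}{8}\,(\mu_1 - \mu_2)^{\top} \bar{\Sigma}^{-1} (\mu_1 - \mu_2)
  + \tfrac{1}{2}\ln\frac{|\bar{\Sigma}|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}},
  \qquad \bar{\Sigma} = \tfrac{1}{2}(\Sigma_1 + \Sigma_2)
\end{align}
```

Written this way, $T^2$ is the Mahalanobis term scaled by $n_1 n_2 / (n_1 + n_2)$, which is how it "takes into consideration the data lengths". The Kullback-Leibler form is asymmetric; a symmetric version, $D_{KL}(1\,\|\,2) + D_{KL}(2\,\|\,1)$, is often used when a distance is needed.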

  20. Modeling Analysis • N = 20 – 4 seconds of voiced speech

  21. Best Number of Phonemes? [Plot: x-axis is Number of Phonemes; features used: LPCC]

  22. Application Systems

  23. Speaker Count System • The Residual Ratio Algorithm (RRA) • Process is repeated K-1 times for counting up to K speakers [Flow diagram, repeated per stage: reference model selected randomly; DLR-based model comparison; if too little data is removed, another model is selected]

  24. RRA Examples – 2 Speakers

  25. RRA Examples – 3 Speakers

  26. Comparison [Figure panels: Two-Speaker Residual and Three-Speaker Residual, each showing the residual ratio after the 2nd round of RRA; label: Speaker 2]

  27. Speaker Count • Experiments • HTIMIT Database • 1000 statistically generated K-speaker conversations for each K from 1 to 4 • Average conversation length = 1 min

  28. Speaker Count • Added Residual Ratio: • Sum of the residual ratios in all elimination stages • Should be higher for a greater number of speakers
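
The sketch below is one plausible reading of the elimination stages and the added residual ratio, not the authors' implementation. It assumes that each stage picks a reference model, removes every model within a distance `threshold` of it, and records the fraction of the original models still left. The function name, the threshold rule, and the use of the first remaining model in place of a random pick are all hypothetical, and the "too little data removed" re-selection step from the diagram is left out.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")


def residual_ratios(
    models: Sequence[Model],
    distance: Callable[[Model, Model], float],
    threshold: float,
    rounds: int,
) -> List[float]:
    """Run `rounds` elimination stages and return each stage's residual ratio."""
    total = len(models)
    remaining = list(models)
    ratios = []
    for _ in range(rounds):
        if not remaining:
            ratios.append(0.0)
            continue
        reference = remaining[0]  # deterministic stand-in for a random pick
        # Drop the reference and every model that matches it.
        remaining = [m for m in remaining[1:] if distance(reference, m) > threshold]
        ratios.append(len(remaining) / total)
    return ratios


def abs_diff(a: float, b: float) -> float:
    """Toy distance between 1-D stand-ins for speaker models."""
    return abs(a - b)


# Toy conversations: values near 0, 5 and 9 stand for three different speakers.
two_speakers = [0.0, 0.1, 5.0, 5.1]
three_speakers = [0.0, 0.1, 0.2, 5.0, 5.1, 0.05, 9.0]

# K - 1 = 2 rounds, enough to count up to K = 3 speakers.
added_two = sum(residual_ratios(two_speakers, abs_diff, threshold=1.0, rounds=2))
added_three = sum(residual_ratios(three_speakers, abs_diff, threshold=1.0, rounds=2))
```

In this toy setup the three-speaker conversation leaves models behind after the second round while the two-speaker one does not, so `added_three` exceeds `added_two`, in line with the slide's expectation.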

  29. Speaker Count

  30. Speaker Counting-Indexing • Models that initially matched the valid reference models are considered to be of the same speaker as those reference models. • Each unmatched model is assigned to the reference model from which it has the minimum distance.
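
The second rule is a nearest-reference assignment, sketched below. The function name and the toy 1-D models with an absolute-difference distance are illustrative assumptions; in the presentation the models are mean and covariance summaries compared with the distance measures listed earlier.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")


def assign_to_nearest_reference(
    unmatched: Sequence[Model],
    references: Sequence[Model],
    distance: Callable[[Model, Model], float],
) -> List[int]:
    """Return, for each unmatched model, the index of its closest reference."""
    if not references:
        raise ValueError("at least one reference model is required")
    labels = []
    for model in unmatched:
        distances = [distance(model, ref) for ref in references]
        labels.append(distances.index(min(distances)))
    return labels


# Toy example: two reference "speakers" and four unmatched models.
references = [0.0, 5.0]
unmatched = [0.4, 4.2, 2.0, 6.5]
labels = assign_to_nearest_reference(unmatched, references, lambda a, b: abs(a - b))
```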

  31. Speaker Counting/Indexing

  32. System Enhancement - Fusion • Distance Measures • Mahalanobis Distance • Hotelling’s T-Square Statistics • Kullback-Leibler Distance • Bhattacharyya Distance • Levene’s Test

  33. Correlation Analysis • Draftsman’s Display - LPCC

  34. “Best Distance” • Optimized Fusion of Distances • Maximize inter-speaker variation • Minimize intra-speaker variation • Maximize T-test value between inter-class distance distributions • Ti = T-value corresponding to each distance
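
The fusion formula itself did not survive in this transcript, so the sketch below should be read only as one possible interpretation. `t_value` computes Welch's t statistic between the inter-speaker and intra-speaker distance samples for one measure. `fuse_distances` is a hypothetical weighting that combines several distances with weights proportional to their t-values; it assumes positive t-values and distances on comparable scales. Both function names are invented for this example.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List


def t_value(inter: List[float], intra: List[float]) -> float:
    """Welch's t statistic between inter- and intra-speaker distance samples."""
    standard_error = sqrt(variance(inter) / len(inter) + variance(intra) / len(intra))
    return (mean(inter) - mean(intra)) / standard_error


def fuse_distances(distances: Dict[str, float], t_values: Dict[str, float]) -> float:
    """Hypothetical fusion: weighted sum with weights proportional to t-values."""
    total = sum(t_values[name] for name in distances)
    return sum(t_values[name] / total * distances[name] for name in distances)


# Toy distance samples for one measure: different-speaker pairs should be larger.
inter_speaker = [4.0, 5.0, 6.0]
intra_speaker = [1.0, 2.0, 3.0]
t_example = t_value(inter_speaker, intra_speaker)

# Toy fusion of two measures, "a" and "b", with made-up t-values.
fused = fuse_distances({"a": 2.0, "b": 4.0}, {"a": 1.0, "b": 3.0})
```

A larger t-value means the measure separates same-speaker from different-speaker pairs more cleanly, which is why it is a natural candidate for a fusion weight.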

  35. Decision Level Fusion • Example: D1 => match; D2 => no match; D3 => match; D4 => match • Match = ¾, No Match = ¼ • Final Decision = Match
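
The slide's example amounts to a majority vote over the per-distance decisions, which can be sketched as follows. The function name and the choice to treat a tie as "no match" are assumptions; the slide shows only the four-decision case.

```python
from typing import List


def majority_vote(decisions: List[bool]) -> bool:
    """Return True (match) when more than half of the decisions are matches."""
    if not decisions:
        raise ValueError("at least one decision is required")
    return sum(decisions) * 2 > len(decisions)


# The slide's example: D1, D3 and D4 say match, D2 says no match.
decisions = [True, False, True, True]
match_fraction = sum(decisions) / len(decisions)
final_decision = majority_vote(decisions)
```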

  36. Speaker Count Results

  37. Speaker Counting/Indexing Results

  38. Summary

  39. Research Goal • To overcome the following challenges faced in differentiating between speakers participating in conversations: • No a priori information • Limited data size • No knowledge of change points • Co-channel speech

  40. Summary • Novel model formation technique • Conversation-based speaker differentiation systems • Distance combination techniques to enhance performance

  41. Conclusion • A state-of-the-art speaker discrimination system for conversations has been developed; it yields results comparable to those of non-conversational systems.

  42. Publications • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH, 2006. • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, and S. Wenndt, “Unsupervised Indexing of Noisy Conversations with Short Speaker Utterances”, IEEE Aerospace Conference, March 2007. • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS, 2006. • U. Ofoegbu (now Abanulo), A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS, 2006.

  43. Acknowledgment • Dr. Robert Yantorno • Dr. Ananth Iyer • Air Force Research Laboratory, Rome, NY

