
Look who’s talking? Project 3.1


Presentation Transcript


  1. Look who’s talking? Project 3.1 • Yannick Thimister, Han van Venrooij, Bob Verlinden • 27-01-2011 • DKE, Maastricht University

  2. Contents • Speaker recognition • Speech samples • Voice activity detection • Feature extraction • Speaker recognition • Multi speaker recognition • Experiments and results • Discussion • Conclusion

  3. Speaker Recognition • Speech contains several layers of info • Spoken words • Speaker identity • Speaker-related differences are a combination of anatomical differences and learned speaking habits

  4. Speech samples • Self-recorded database • 55 sentences from 11 different people • 2x2 predefined and 1 random • Professional recording and built-in laptop microphone • Database via Voxforge.org • 610 sentences from 61 different people • Varying recording microphones and environments

  5. Voice activity detection • Adaptive noise estimation • Power-based • Entropy-based • Long term spectral divergence • Frames • Initial frames are noise • Hangover
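
The framing and hangover steps are shared by all three detectors. Below is a minimal sketch in Python/NumPy; the frame length, hop size and hangover length are illustrative assumptions, not values from the slides.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def apply_hangover(speech_flags, hang=5):
    """Keep labelling frames as speech for `hang` frames after the raw
    detector switches back to noise, to avoid clipping word endings."""
    out = np.array(speech_flags, dtype=bool)
    counter = 0
    for j, flag in enumerate(speech_flags):
        if flag:
            counter = hang
        elif counter > 0:
            out[j] = True
            counter -= 1
    return out
```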

  6. Voice activity detection • Power-based • Assumes that the noise is normally distributed • Calculate the mean and standard deviation of the initial noise frames • For each sample n, test whether it deviates from the mean by more than a multiple of the standard deviation • For each frame j, mark it as speech if the majority of its samples do
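
A minimal sketch of a power-based detector along these lines; the threshold constant c, the number of initial noise frames and the simple majority criterion are assumptions made here for illustration.

```python
import numpy as np

def vad_power(frames, n_noise_frames=10, c=3.0):
    """Power-based VAD: model the noise samples as Gaussian and flag a frame
    as speech when the majority of its samples lie outside mean +/- c*std."""
    noise = frames[:n_noise_frames].ravel()     # initial frames assumed to be noise
    mu, sigma = noise.mean(), noise.std()
    # Per-sample test: deviation of more than c standard deviations from the mean.
    outliers = np.abs(frames - mu) > c * sigma
    # Per-frame decision: the majority of the samples must be outliers.
    return outliers.mean(axis=1) > 0.5
```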

  7. Voice activity detection • Entropy-based • Scale the DFT coefficients of each frame so that they sum to one: p(k) = |X(k)| / Σj |X(j)| • The entropy equals H = −Σk p(k) log p(k)
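
A sketch of that entropy computation, assuming the standard spectral-entropy formulation; the epsilon guarding against log(0) is an implementation detail added here.

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Scale the DFT magnitudes to a probability distribution and return its
    entropy; speech tends to have a more structured (lower-entropy) spectrum
    than broadband noise."""
    spectrum = np.abs(np.fft.rfft(frame))
    p = spectrum / (spectrum.sum() + eps)
    return -np.sum(p * np.log(p + eps))
```

A per-frame decision would then threshold this entropy, for example against a value estimated from the initial noise frames, analogous to the power-based detector.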

  8. Voice activity detection • Long term spectral divergence • L-frame window • Estimation: the long-term spectral envelope is the per-bin maximum of the DFT magnitudes over the window, LTSE(k) = max over l in [−L, L] of |X(k, j+l)| • Divergence against the noise spectrum N(k): LTSD(j) = 10 log10( (1/K) Σk LTSE(k)² / N(k)² )

  9. Voice activity detection • Long term spectral divergence • Estimate the noise spectrum as the average of the DFT coefficients of the initial noise frames • Calculate the mean (μ) LTSD of the noise frames • For each frame f: calculate the LTSD and mark f as speech if LTSD > cμ • Update the noise estimate on frames classified as noise
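
Combining slides 8 and 9, a sketch of the LTSD decision, assuming the standard formulation in which the long-term spectral envelope is the per-bin maximum over a window of 2L+1 frames; the window length L and the constant c are illustrative, and the adaptive update of the noise spectrum is left out for brevity.

```python
import numpy as np

def ltsd_vad(frames, n_noise_frames=10, L=3, c=1.2):
    """Long term spectral divergence VAD (sketch, no noise-spectrum update)."""
    spectra = np.abs(np.fft.rfft(frames, axis=1))        # |X(k, j)| per frame j
    noise_spec = spectra[:n_noise_frames].mean(axis=0)   # averaged noise spectrum N(k)

    def ltsd(j):
        lo, hi = max(0, j - L), min(len(spectra), j + L + 1)
        envelope = spectra[lo:hi].max(axis=0)            # long-term spectral envelope
        return 10 * np.log10(np.mean(envelope**2 / (noise_spec**2 + 1e-12)) + 1e-12)

    mu = np.mean([ltsd(j) for j in range(n_noise_frames)])   # mean LTSD of noise frames
    return np.array([ltsd(j) > c * mu for j in range(len(frames))])
```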

  10. Feature extraction • Representation of speakers • Mel frequency cepstral coefficients • Imitates human hearing • Linear predictive coding • Linear function of previous samples

  11. MFCC Hamming window → FFT → Mel-scale → Log → FFT
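
A sketch of that pipeline with NumPy and SciPy. The closing transform is written as a DCT, the usual realization of the final cepstral step; the number of mel filters is an assumption, and keeping 10 coefficients follows the optimum reported later on slide 19.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frame, sample_rate=16000, n_filters=26, n_coeffs=10):
    """Hamming window -> FFT -> mel filterbank -> log -> DCT."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    fb = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = np.log(fb @ power + 1e-12)
    return dct(log_energies, norm='ortho')[:n_coeffs]
```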

  12. LPC A Pth-order linear function of the previous samples is estimated
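
A sketch of Pth-order LPC via the autocorrelation method, solving the Toeplitz normal equations directly with NumPy (a Levinson-Durbin recursion would be the more efficient textbook route); order 8 follows the optimum reported later on slide 19.

```python
import numpy as np

def lpc(frame, order=8):
    """Estimate coefficients a_1..a_P so that x[n] is approximated by
    sum_i a_i * x[n - i], using the autocorrelation (Yule-Walker) method."""
    x = frame - frame.mean()
    # Autocorrelation up to lag `order`.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz system R a = r[1:], solved directly for simplicity.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-9 * np.eye(order), r[1:])
```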

  13. Speaker recognition • Nearest neighbor • Euclidean distance • Neural network • Multilayer perceptron

  14. Nearest neighbor Features are compared pairwise
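
A sketch of that comparison, assuming each speaker is enrolled as a matrix of per-frame feature vectors and the test utterance is assigned to the speaker with the smallest average Euclidean distance to its closest reference frames; that scoring rule is an assumption, the slides only state that features are compared pairwise.

```python
import numpy as np

def nearest_neighbor_speaker(test_features, enrolled):
    """enrolled: dict mapping speaker name -> (n_frames, n_coeffs) feature array.
    Returns the enrolled speaker whose features lie closest to the test features."""
    best_speaker, best_score = None, np.inf
    for speaker, ref in enrolled.items():
        # Pairwise Euclidean distances between every test frame and reference frame.
        dists = np.linalg.norm(test_features[:, None, :] - ref[None, :, :], axis=-1)
        score = dists.min(axis=1).mean()   # average distance to the nearest reference frame
        if score < best_score:
            best_speaker, best_score = speaker, score
    return best_speaker
```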

  15. Neural network
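
No transcript text survives for this slide. As a rough stand-in, a multilayer perceptron classifier can be set up with scikit-learn; the 25 hidden nodes follow the optimum reported for the self-recorded database on slide 21, and everything else here (the library itself, the iteration count, the placeholder variable names) is an assumption.

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer; 25 nodes matches the optimum reported on slide 21 for the
# self-recorded database. The iteration count is an arbitrary choice.
mlp = MLPClassifier(hidden_layer_sizes=(25,), max_iter=500)

# X_train / y_train would hold feature vectors (e.g. MFCCs) and speaker labels;
# they are placeholders here, not data from the project.
# mlp.fit(X_train, y_train)
# predicted_speakers = mlp.predict(X_test)
```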

  16. Multi speaker recognition • Preprocessing using VAD • Consecutive speech frames form segments • Single speaker recognition per segment
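
A sketch of that pipeline: per-frame VAD flags, runs of consecutive speech frames grouped into segments, and a single speaker recognizer applied to each segment. The callable `recognize_single` is a placeholder for any of the classifiers above.

```python
def speech_segments(speech_flags):
    """Group runs of consecutive speech frames into (start, end) index pairs."""
    segments, start = [], None
    for j, flag in enumerate(speech_flags):
        if flag and start is None:
            start = j
        elif not flag and start is not None:
            segments.append((start, j))
            start = None
    if start is not None:
        segments.append((start, len(speech_flags)))
    return segments

def recognize_multi(frames, speech_flags, recognize_single):
    """Run single speaker recognition on every VAD segment."""
    return [recognize_single(frames[s:e]) for s, e in speech_segments(speech_flags)]
```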

  17. Experiments: VAD • Hand-labeled samples • Percentage correctly classified • False negatives

  18. Results: VAD • Entropy-based • Correctly classified: 65.3% • False negatives: 9.3% • Power-based • Correctly classified: 76.3% • False negatives: 6.2% • Long term spectral divergence • Correctly classified: 79.0% • False negatives: 1.6%

  19. Experiments: feature extraction • Number of coefficients • MFCC: optimal 10 (90.9%) • LPC: optimal 8 (77.3%)

  20. Experiments: single speaker recognition • Professional vs. built-in laptop microphone • Silence removal

  21. Experiments: neural network • Optimal number of nodes • Self-recorded database: 25 nodes • Voxforge database: 100 nodes

  22. Experiments: neural network • Training cycles

  23. Experiments: multi speaker recognition • Self-made samples • Optimal settings used • Neural network: 66.7% • Nearest neighbor: 76.5%

  24. Discussion • Is nearest neighbor better than the neural network? • The neural network is more broadly applicable • VAD gives no improvement

  25. Conclusions • LTSD is the best VAD method • MFCC outperforms LPC • Training and testing with different microphones gives significantly less accuracy • Nearest neighbor works better than an optimized neural network

  26. Questions?
