Speech and Face Recognition Semester Project Speaker Segmentation

1. Speaker Segmentation. Speech and Face Recognition Semester Project, CPSC 689-604. Pedro Davalos & Hassan Kingravi, May 9, 2007.

2. Outline: Goal, Approach, Results, Conclusion

3. Automatic Speaker Segmentation: Goal. Input: a speech signal containing a spoken conversation between an unknown number of people (single channel, no overlapping or simultaneous speakers, minimal background noise). Output: the number of distinct speakers, and the segments (times) where each speaker is talking.

4. Approach: Algorithm

5. Algorithm: Pre-Processing

6. LPC Filter (inverse)
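The slide gives only the name of this step. As a minimal sketch, assuming the common autocorrelation method with the Levinson-Durbin recursion, the inverse LPC filter A(z) = 1 + a1 z^-1 + ... + ap z^-p can be estimated from a frame and applied to it to obtain the prediction residual. The helper names `lpc` and `inverse_lpc_filter` are hypothetical and are not taken from the project.

```python
import numpy as np


def lpc(frame, order):
    """LPC coefficients [1, a1, ..., ap] of A(z) = 1 + sum_j a_j z^-j,
    computed with the autocorrelation method and Levinson-Durbin recursion."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation r[0..order] (biased estimate)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update a_1..a_{i-1}; the right-hand side is evaluated before assignment
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_lpc_filter(frame, a):
    """Apply the FIR inverse filter A(z) to get the prediction residual
    e[n] = x[n] + a1 x[n-1] + ... + ap x[n-p]."""
    return np.convolve(frame, a)[: len(frame)]
```

In practice a frame would typically be windowed (for example by multiplying it with `np.hamming(len(frame))`) before calling `lpc`; the LPC order and window length are among the tunable parameters listed in the conclusions.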

7. Features: MFCCs & F0
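The slide names MFCCs and the fundamental frequency (F0) as features but does not say how F0 was estimated. A minimal sketch of one common approach, assuming autocorrelation peak picking restricted to a plausible pitch range (60 to 400 Hz here, an illustrative choice), is shown below; `estimate_f0` and its parameters are hypothetical.

```python
import numpy as np


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the F0 (Hz) of one voiced frame by autocorrelation peak picking.

    f0_min/f0_max bound the search to plausible pitch periods; the frame must
    be longer than sample_rate / f0_max samples."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags 0..len(frame)-1
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    # The lag with the strongest self-similarity is taken as the pitch period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / lag
```

For a 40 ms frame at 16 kHz, a 200 Hz tone has a period of 80 samples, so the autocorrelation peaks at lag 80 and the estimate is 16000 / 80 = 200 Hz. Unvoiced frames have no meaningful F0, so a real system would also need a voicing decision, which this sketch omits.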

8. Speaker Change Detection. Find sudden changes in the features: for each point in time, compute the difference (distance) between the current window and the next window. High distances represent "possible" speaker changes, and each resulting segment is treated as a possible speaker. Distance measure: the Kullback-Leibler (KL) distance.
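The exact KL formula used in the project is not given here. A minimal sketch, assuming each window of feature vectors is modelled by a Gaussian with diagonal covariance and that the symmetric divergence KL(p||q) + KL(q||p) serves as the distance, is shown below. For diagonal Gaussians the log-determinant terms cancel, leaving 0.5 * sum over features of [v1/v2 + v2/v1 - 2 + (m1 - m2)^2 (1/v1 + 1/v2)]. The peak rule (largest value within plus or minus `win` frames and above `threshold`) and all names are hypothetical.

```python
import numpy as np


def symmetric_kl(x, y, eps=1e-8):
    """Symmetric KL divergence between diagonal Gaussians fitted to x and y.

    x, y: arrays of shape (frames, features)."""
    m1, v1 = x.mean(axis=0), x.var(axis=0) + eps
    m2, v2 = y.mean(axis=0), y.var(axis=0) + eps
    d2 = (m1 - m2) ** 2
    return 0.5 * np.sum(v1 / v2 + v2 / v1 - 2.0 + d2 * (1.0 / v1 + 1.0 / v2))


def change_points(features, win=50, threshold=5.0):
    """Return (peaks, dist): frames where the distance between the window
    just before t and the window starting at t peaks above `threshold`."""
    features = np.asarray(features, dtype=float)
    n = len(features)
    dist = np.zeros(n)
    for t in range(win, n - win + 1):
        dist[t] = symmetric_kl(features[t - win:t], features[t:t + win])
    peaks = []
    for t in range(win, n - win + 1):
        # Keep t only if it is the largest distance within +/- win frames
        lo, hi = max(0, t - win), min(n, t + win + 1)
        if dist[t] > threshold and dist[t] == dist[lo:hi].max():
            if not peaks or t - peaks[-1] > win:
                peaks.append(t)
    return peaks, dist
```

Each span between consecutive detected peaks then becomes a candidate segment. `win` and `threshold` play the role of the "distance window" and "peak threshold" parameters listed in the conclusions; the default values here are arbitrary.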

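9. Speaker Modeling & Identification. Goal: find the segments that come from the same speaker. Each segment is characterized by the Gaussian mean of each feature over that segment. A speaker label can then be assigned to each segment through k-means clustering. K-means clustering also addresses false-positive transitions: when a spurious change splits one speaker's turn, both pieces have similar means, fall into the same cluster, and receive the same label.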
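A minimal sketch of this step, assuming the segment boundaries come from change detection and that the number of speakers `k` is known in advance (the conclusions list estimating it as future work). It uses plain Lloyd's k-means with a deterministic farthest-point initialization, written with NumPy; the function names and the initialization choice are hypothetical, not the project's.

```python
import numpy as np


def segment_means(features, boundaries):
    """Mean feature vector of each segment.

    features: (frames, dims) array; boundaries: sorted frame indices of the
    detected speaker changes (excluding 0 and len(features))."""
    edges = [0, *boundaries, len(features)]
    return np.array([features[s:e].mean(axis=0) for s, e in zip(edges[:-1], edges[1:])])


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with farthest-point initialization; one label per point."""
    points = np.asarray(points, dtype=float)
    # Farthest-point initialization: start from the first point, then keep
    # adding the point farthest from every center chosen so far
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance)
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        # Move each non-empty cluster's center to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels
```

For example, with speaker A in frames 0 to 99, speaker B in frames 100 to 199, and speaker A again in frames 200 to 299, boundaries `[50, 100, 200]` include a false positive at frame 50. `kmeans(segment_means(feats, [50, 100, 200]), k=2)` gives the first, second, and fourth segments the same label, so the spurious split does no harm. A missed boundary, by contrast, would produce a single segment whose mean blends two speakers.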

10. Results Summary

11. Results 1: "News7"

12. Results 2: "news7half"

13. Results 3: "mtc_se"

14. Results 4: "Mtc-SE-3"

15. Results 5: "Mtc_se_3b"

16. Results 6: "Mtc_se_3d"

17. Results 7: "npr3a"

18. Results 8: "npr4c"

19. Results 9: "npr4d"

20. Results 10: "npr3g"

21. Conclusions (1/2). Feature extraction: accurate speaker-dependent features are required. Ideal features would vary more between speakers than between phonemes. Many of the available speech features did not yield adequate speaker separation. Thresholds/parameters: the segmentation process involves numerous thresholds and parameters, including LPC order and window, pitch estimation, number of MFCCs, MFCC window, distance function, distance window, peak estimation, peak threshold window, …

22. Conclusions (2/2). Performance: segmentation proved successful under ideal conditions. False-positive transitions are handled by clustering, but missed true transitions degrade performance by mixing speakers within a segment. Future work: estimating the number of speakers (clustering optimization).
