1. Pedro Davalos & Hassan Kingravi
May 9, 2007
CPSC 689-604 Speech and Face Recognition Semester Project: “Speaker Segmentation”
2. Outline
Goal
Approach
Results
Conclusion
3. Automatic Speaker Segmentation Goal
Input:
Speech signal containing a spoken conversation between an unknown number of people
Single channel
No overlapping (simultaneous) speakers
Minimal background noise
Output:
Find the number of distinct speakers
Identify the segments (times) where each speaker is talking
4. Approach: Algorithm
5. Algorithm: Pre-Processing
6. LPC Filter (inverse)
7. Features: MFCCs & F0
8. Speaker Change Detection
Find sudden changes in features
For each point in time, find the difference (distance) between the current window and the next window
High distances represent “possible” speaker changes
Treat each segment as a possible speaker
KL Distance:
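The change-detection step can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes each window of feature frames is modeled as a diagonal-covariance Gaussian and that the distance is the symmetric KL divergence between the two Gaussians. The function names, the variance floor, and the simple local-peak rule are all hypothetical choices made for the example.

```python
def gaussian_stats(frames):
    """Per-dimension mean and variance of a window of feature frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Assumed variance floor keeps the divisions below well defined.
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def symmetric_kl(window_a, window_b):
    """KL(a||b) + KL(b||a) for diagonal Gaussians fit to two windows."""
    mean_a, var_a = gaussian_stats(window_a)
    mean_b, var_b = gaussian_stats(window_b)
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff_sq = (ma - mb) ** 2
        # The log terms of the two directed divergences cancel in the sum.
        total += 0.5 * (va / vb + vb / va - 2.0)
        total += 0.5 * diff_sq * (1.0 / va + 1.0 / vb)
    return total


def distance_curve(frames, window):
    """Distance between the window ending at t and the window starting at t."""
    curve = []
    for t in range(window, len(frames) - window + 1):
        curve.append(symmetric_kl(frames[t - window:t], frames[t:t + window]))
    return curve


def candidate_changes(curve, window, threshold):
    """Frame indices where the curve has a local peak above the threshold."""
    peaks = []
    for i in range(1, len(curve) - 1):
        is_peak = curve[i] >= curve[i - 1] and curve[i] > curve[i + 1]
        if is_peak and curve[i] > threshold:
            peaks.append(i + window)  # curve[i] belongs to frame i + window
    return peaks


# Toy one-dimensional "feature" that jumps at frame 20.
frames = [[0.0 + 0.1 * (k % 2)] for k in range(20)]
frames += [[5.0 + 0.1 * (k % 2)] for k in range(20)]
curve = distance_curve(frames, window=4)
changes = candidate_changes(curve, window=4, threshold=100.0)
```

Each index in `changes` marks a possible speaker change; the frames between consecutive indices form the segments that are treated as possible speakers. The window length and threshold here are arbitrary and would need tuning on real features.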
9. Speaker Modeling & Identification
Goal: find segments from the same speaker
Find characteristics of each segment
Gaussian means for each feature on each segment
A speaker label can be assigned to each segment through k-means clustering
K-means clustering addresses false positive transitions by giving the segments on both sides the same speaker label
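The labeling step might look like the sketch below. It is an illustration under stated assumptions rather than the authors' code: each segment is summarized by the mean of its feature frames, the number of speakers `k` is supplied by the caller, and a plain Lloyd-style k-means with random initialization is used. All function names are hypothetical.

```python
import random


def segment_mean(frames):
    """Mean feature vector of a list of frames (or of a list of points)."""
    dims = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dims)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iterations=50, seed=0):
    """Assign each point to one of k clusters with basic Lloyd iterations."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every segment mean.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid from its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = segment_mean(members)
    return labels


def merge_adjacent(labels):
    """Collapse runs of equal labels into (label, first_segment, last_segment)."""
    runs = []
    for index, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1], index)
        else:
            runs.append((label, index, index))
    return runs


# Five segments; segments 0 and 1 were split by a false positive transition.
segment_means = [[0.0], [0.1], [5.0], [5.1], [0.05]]
labels = kmeans_labels(segment_means, k=2)
runs = merge_adjacent(labels)
```

Segments 0 and 1 receive the same label and collapse into one run, which is how clustering absorbs a false positive transition. A missed true transition cannot be repaired this way, because the mixed segment is clustered as a single unit.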
10. Results Summary
11. Results 1 – “News7”
12. Results 2 – “news7half”
13. Results 3 – “mtc_se”
14. Results 4 – “Mtc-SE-3”
15. Results 5 – “Mtc_se_3b”
16. Results 6 – “Mtc_se_3d”
17. Results 7 – “npr3a”
18. Results 8 – “npr4c”
19. Results 9 – “npr4d”
20. Results 10 – “npr3g”
21. Conclusions (1/2)
Feature Extraction
Accurate speaker-dependent features are required
Ideal features would have greater variability between speakers than between phonemes
Numerous available speech features did not yield adequate speaker separation
Thresholds / Parameters
The segmentation process involves numerous thresholds
LPC order and window, pitch estimation, number of MFCCs, MFCC window, distance function, distance window, peak estimation, peak threshold window …
22. Conclusions (2/2)
Performance
Segmentation proved successful under ideal conditions
False positive transitions are handled by clustering
Missed true transitions degrade performance by mixing speakers
Future Work
Estimating the number of speakers (clustering optimization)