A Linked-HMM for Robust Voicing and Speech Detection Presented by: Emiliano Miluzzo
Why is the mic an important sensor for a people-centric sensing approach?
In a few words…
• Linked-HMM for simultaneous and robust voicing and speech detection.
• Targets challenging experimental settings: low sampling rates, far-field mics, ambient noise.
• Features are independent of signal energy.
• Exploits speech patterns, which are usually combinations of talking and silence segments.
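To make "features independent of energy" concrete, here is a minimal sketch of one such feature, the entropy of a frame's normalized power spectrum. The slides do not list the paper's exact features, so the choice of spectral entropy, the function name `spectral_entropy`, the 256-sample frame, and the 8 kHz sampling rate are all assumptions made for illustration.

```python
import numpy as np

def spectral_entropy(frame):
    """Entropy (in nats) of the frame's normalized power spectrum.

    Illustrative feature only; not taken from the paper.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total == 0.0:
        return 0.0               # silent frame: define the entropy as zero
    p = power / total            # normalizing removes the overall gain
    p = p[p > 0]                 # skip empty bins so log() is defined
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
n = np.arange(256)
tone = np.sin(2 * np.pi * 200 * n / 8000)   # hypothetical 200 Hz tone at 8 kHz
noise = rng.standard_normal(256)            # unstructured broadband noise

h_tone = spectral_entropy(tone)
h_tone_loud = spectral_entropy(10.0 * tone)  # same tone, 20 dB louder
h_noise = spectral_entropy(noise)
```

Because the spectrum is divided by its own sum, scaling the frame leaves the value unchanged, so `h_tone` and `h_tone_loud` agree up to rounding, while the structured tone scores lower than the noise. A far-field mic mostly changes the gain, which is why a feature of this kind would be attractive in that setting.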
What’s nice about the paper
• The first paper to apply a linked-HMM to speech and voicing detection.
• “Simple” algorithm: forward-backward algorithm plus feature extraction.
• Experimental evaluation of some aspects of the proposed algorithm.
• I learned something useful, namely how to remove the impact of a constant noise source (fan, wind blowing, etc.).
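For readers who have not seen the forward-backward algorithm mentioned above, the following is a minimal sketch for a plain two-state HMM (say, state 0 = non-speech, state 1 = speech). It is not the paper's linked-HMM, which couples two chains; the transition matrix, the per-frame likelihoods, and the function name `forward_backward` are made-up values and names for illustration.

```python
import numpy as np

def forward_backward(pi, A, lik):
    """Posterior state probabilities for a plain HMM (scaled recursions).

    pi:  (S,)   initial state distribution
    A:   (S, S) transitions, A[i, j] = P(next state j | current state i)
    lik: (T, S) likelihood of each frame's observation under each state
    """
    T, S = lik.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    scale = np.zeros(T)

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha[0] = pi * lik[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * lik[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scale factors.
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (lik[t + 1] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Hypothetical numbers: "sticky" states and seven frames of likelihoods.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
lik = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
                [0.2, 0.8], [0.1, 0.9], [0.1, 0.9]])
posterior = forward_backward(pi, A, lik)
```

In this toy input the third frame, taken alone, slightly favours state 1, but the smoothed posterior keeps it in state 0 because its neighbours and the sticky transitions disagree. That temporal smoothing is the kind of "speech pattern" information an HMM can exploit.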
How about the cons?
• Conceptually dense for a short paper.
• Consequently, clear explanations are often lacking.
• Is it generally applicable, for example to mobile devices such as cell phones?
• Trained on too few individuals (just 2), and this is a supervised ML method!
• The experimental protocol is unclear: what does “noisy conditions” mean?
• Is the comparison in Fig. 3 enough to show the improvement over the HMM?
Is the noisy autocorrelation always effective?
• What if the noise comes from a high-energy periodic source such as a motor?
• This suggests that the proposed technique might work better in indoor environments and perform more poorly on mobile devices?
• It is not clear how variations in one of the features (particularly the noisy autocorrelation) would affect the overall classification result.
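The concern about high-energy periodic noise can be illustrated with a small experiment. The sketch below assumes that "noisy autocorrelation" means adding low-level random noise to a frame before taking its normalized autocorrelation, so that faint periodic sources are masked; the slides do not spell this out. The lag range, noise level, 120 Hz hum, and function names are hypothetical.

```python
import numpy as np

def autocorr_peak(frame, min_lag=20, max_lag=120):
    """Largest normalized autocorrelation value over non-initial lags."""
    frame = frame - frame.mean()
    # Keep non-negative lags only; r[0] is the frame energy.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if r[0] == 0.0:
        return 0.0
    return float((r[min_lag:max_lag + 1] / r[0]).max())

def noisy_autocorr_peak(frame, noise_std, rng):
    """Same feature, computed after adding masking noise (assumed reading)."""
    masked = frame + noise_std * rng.standard_normal(len(frame))
    return autocorr_peak(masked)

rng = np.random.default_rng(0)
n = np.arange(256)
hum = np.sin(2 * np.pi * 120 * n / 8000)   # hypothetical 120 Hz hum at 8 kHz

quiet_clean = autocorr_peak(0.01 * hum)                    # faint fan, no masking
quiet_masked = noisy_autocorr_peak(0.01 * hum, 0.1, rng)   # faint fan, masked
loud_masked = noisy_autocorr_peak(1.0 * hum, 0.1, rng)     # loud motor, masked
```

Without masking, even the faint hum produces a strong periodicity peak because the feature is normalized by energy. The added noise suppresses that peak for the faint hum, but the loud hum still dominates the masking noise and keeps a high peak, which is exactly the case the question above raises.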
A few questions
• How does the algorithm differentiate a singer singing a song from an actual conversation?
• Maybe checking whether the spectral content of the voiced part changes over time would indicate that multiple people are talking.
• Does the system distinguish a conversation between speaker pair A from one between speaker pair B?
• Same as above; in addition, knowledge of the device owner's voice spectral pattern would help to filter out outliers.
Overall
• A nice technique that could be applied to a broad set of scenarios, in my opinion mainly where computational resources are available and few sources of (periodic) noise are present. In these cases the error is small.
• Not sure about its applicability to mobile devices for real-time speech detection. Some aspects might be reusable, though.
• Can a scheme oriented to mobile devices trade off accuracy against speed?