Compensating speaker-to-microphone playback system for robust speech recognition
So-Young Jeong and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
Motivation
• ASR in mismatched environments
• Environmental information
  • Background noise, acoustic/transmission channel
• Assume environment degradation model
[Diagram labels: Clean speech, Channel, Additive noise, Distorted speech]
Channel impacts on feature
Channel Assumption 1
• P.S.
• F.B.
• L.S.
• C.S.
Channel Assumption 2
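The equations for the two channel assumptions did not survive in this transcript. As a rough illustration only, the sketch below shows the standard property that a linear channel is multiplicative at the power-spectrum level and becomes an additive offset at the log-spectrum level. It assumes that P.S. and L.S. stand for power spectrum and log spectrum; the impulse response `channel`, the random stand-in frame, and the helper `power_spectrum` are hypothetical and are not taken from the slides.

```python
import numpy as np

def power_spectrum(signal):
    """Power spectrum |FFT|^2 of one frame."""
    return np.abs(np.fft.fft(signal)) ** 2

rng = np.random.default_rng(0)
frame_len = 256
clean = rng.standard_normal(frame_len)   # stand-in for one clean speech frame
channel = np.array([1.0, 0.5, 0.25])     # hypothetical short channel impulse response

# Circular convolution, so the frequency-domain product holds exactly for this frame.
H = np.fft.fft(channel, n=frame_len)
distorted = np.real(np.fft.ifft(np.fft.fft(clean) * H))

ps_clean = power_spectrum(clean)
ps_distorted = power_spectrum(distorted)
ps_channel = np.abs(H) ** 2

# Power-spectrum level: the channel acts as a multiplicative factor.
multiplicative_ok = np.allclose(ps_distorted, ps_clean * ps_channel)

# Log-spectrum level: the same channel acts as an additive offset.
log_offset = np.log(ps_distorted) - np.log(ps_clean)
additive_ok = np.allclose(log_offset, np.log(ps_channel))

print(multiplicative_ok, additive_ok)
```

With real framed speech the convolution is linear rather than circular, so these relations hold only approximately. The nonlinear speaker and microphone distortions are a separate effect that this linear sketch does not model.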
Speaker-to-Microphone compensation
• Speaker-to-Microphone playback
  • Speaker distortion
    • Nonlinearity caused by voice coil
  • Microphone distortion
    • Frequency response caused by different fabrication
    • Nonlinearity caused by dynamic range
    • Ambient noise by directionality
Speaker-to-Microphone mapping
• Mapper train
  [Diagram labels: distorted, F.E., Mapper, clean, F.E., +, Error]
• Where and which type of mapper should be deployed?
• Mapper apply
  [Diagram labels: distorted, F.E., Trained Mapper, To recognizer]
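The train-then-apply flow can be sketched as follows. This is a minimal illustration, not the mapper used in the slides: the conclusion slide refers to a feature mapping network, whereas this sketch swaps in an affine least-squares fit as a simpler stand-in. The function names `train_linear_mapper` and `apply_mapper`, the feature dimensions, and the synthetic paired features are all hypothetical.

```python
import numpy as np

def train_linear_mapper(distorted_feats, clean_feats):
    """Fit weights and bias minimizing ||distorted @ W + b - clean||^2."""
    n_frames = distorted_feats.shape[0]
    # Append a column of ones so the bias is learned with the weights.
    augmented = np.hstack([distorted_feats, np.ones((n_frames, 1))])
    solution, *_ = np.linalg.lstsq(augmented, clean_feats, rcond=None)
    return solution[:-1], solution[-1]

def apply_mapper(distorted_feats, weights, bias):
    """Map distorted features toward the clean feature space."""
    return distorted_feats @ weights + bias

# Synthetic paired features (frames x dimensions); purely illustrative.
rng = np.random.default_rng(0)
n_frames, n_dims = 500, 8
clean_feats = rng.standard_normal((n_frames, n_dims))
true_offset = rng.standard_normal(n_dims)   # hypothetical per-band channel offset
noise = 0.01 * rng.standard_normal((n_frames, n_dims))
distorted_feats = clean_feats + true_offset + noise

# Mapper train: minimize the error between mapped distorted and clean features.
weights, bias = train_linear_mapper(distorted_feats, clean_feats)

# Mapper apply: mapped features would then be passed to the recognizer.
mapped = apply_mapper(distorted_feats, weights, bias)

error_before = np.mean((distorted_feats - clean_feats) ** 2)
error_after = np.mean((mapped - clean_feats) ** 2)
print(error_before, error_after)
```

Training needs frame-aligned clean and distorted features, which a re-recorded corpus provides. A nonlinear network would replace the least-squares fit where the speaker or microphone distortion is itself nonlinear.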
Mapping error at L.S. • Diamond, plus, and cross markers denote the P.S., F.B., and L.S. levels, respectively
Recognition Experiments
• Task
  • Phoneme recognition for 40 TIMIT phone sets
  • Phone accuracy = (N - D - S - I) * 100 / N
• Database
  • HTIMIT: TIMIT sentences re-recorded through 10 different telephone handsets
  • Training: 246 speakers * 8 sentences = 1968 sentences
  • Test: 48 speakers * 8 sentences = 384 sentences
• Baseline
  • 3-state monophone HMM with 16 Gaussian mixtures
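The phone accuracy formula can be written as a small helper. The slides do not define the symbols, so this sketch assumes the usual convention: N is the number of phones in the reference transcription, and D, S, and I are the deletion, substitution, and insertion counts. The function name `phone_accuracy` and the counts in the example are hypothetical, not results from the experiments.

```python
def phone_accuracy(num_ref, deletions, substitutions, insertions):
    """Phone accuracy in percent: (N - D - S - I) * 100 / N."""
    if num_ref <= 0:
        raise ValueError("num_ref must be a positive number of reference phones")
    return (num_ref - deletions - substitutions - insertions) * 100 / num_ref

# Hypothetical counts, for illustration only.
accuracy = phone_accuracy(num_ref=1000, deletions=50, substitutions=120, insertions=30)
print(accuracy)  # 80.0
```

Because insertions are subtracted, this measure can go negative when the recognizer inserts many spurious phones.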
Conclusion
• A speech signal distorted by a low-quality speaker-to-microphone playback system can be compensated with a feature mapping network
• A feature mapping scheme would be useful when environmental conditions make it difficult to collect a database