NOISE DETECTION AND CLASSIFICATION IN SPEECH SIGNALS WITH BOOSTING
Nobuyuki Miyake, Tetsuya Takiguchi and Yasuo Ariki
Department of Computer and System Engineering, Kobe University
Research purpose

Background
• Sudden, short-period noises often degrade speech recognition systems in real environments.
• Noise reduction improves speech recognition performance.
• Sudden, short-period noises are difficult to remove because we do not know where the noise overlaps the speech or what kind of noise it is.

Purpose: detecting and classifying sudden and short-period noises
• If a speech recognition system can detect sudden noises, it can ask the speaker to repeat the utterance.
• If the system can determine which noise overlaps the speech and where, this information is useful for noise reduction or model composition.
• We use AdaBoost for noise detection and classification because it can form a complex decision boundary.

System overview
[Figure: system overview. Noisy speech overlapped by sudden noises (example utterance "Well, I believe that you will ・・・・" with a clatter; telephone calling) → feature extraction → noise detection using AdaBoost → noise classification using AdaBoost → smoothing → final results.]

AdaBoost
• AdaBoost is a boosting method.
• AdaBoost selects the weak classifiers and their weights.
• Algorithm: the data weights are changed at each step. The weights of wrongly classified data become bigger, and the weights of correctly classified data become smaller.
• The weak classifiers are combined into a strong classifier, each weighted according to its performance. Each weak classifier is a one-dimensional linear classifier.
[Figure: AdaBoost illustration with two classes of data (blue and red), the weak classifiers, the classifiers' weights, and the resulting strong classifier.]

Noise detection using AdaBoost
Learning
• The training data are labeled {-1, +1}: +1 is the label for noisy speech data and -1 is the label for clean speech data.
• AdaBoost builds a strong classifier that separates clean speech frames from noisy speech frames using these data.
Detection
• AdaBoost determines whether each frame is clean speech or is overlapped by noise.
• By changing η in the detection equation, we adjust the number of positive errors and negative errors.
[Figure: detection output, clean speech or noise overlapped.]

Multi-class classification using AdaBoost
• We perform multi-class classification using AdaBoost in order to determine the noise class.
• AdaBoost must be extended to handle more than two classes.
• Multiple two-class classifiers are created, each of which distinguishes one class from all other classes.
• The class whose classifier gives the largest output value is selected.
[Figure: a feature vector with the label of class 1, class 2, class 3, … is passed to K AdaBoost classifiers (class 1 or other class, …, class K or other class), and the maximum value among their outputs is found.]
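The learning and detection steps above, together with the one-against-others extension, might be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function names, the exhaustive threshold search, the toy data, and the decision rule "weighted vote > η" are all assumptions, since the original equation containing η did not survive in this transcript. The weak classifier is a threshold on a single feature dimension, which is one way to read "one-dimensional linear classifier".

```python
import math

def train_stump(X, y, w):
    """Find the single-feature threshold classifier with the lowest weighted error."""
    best = None
    for d in range(len(X[0])):
        for thr in sorted({x[d] for x in X}):
            for sign in (1, -1):
                # Predict `sign` at or above the threshold, `-sign` below it.
                pred = [sign if x[d] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, d, thr, sign)
    return best

def stump_predict(stump, x):
    d, thr, sign = stump
    return sign if x[d] >= thr else -sign

def adaboost_train(X, y, n_rounds):
    """Return a list of (classifier weight, weak classifier) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform data weights
    model = []
    for _ in range(n_rounds):
        err, d, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # weight based on performance
        stump = (d, thr, sign)
        model.append((alpha, stump))
        # Wrongly classified data get bigger weights, correct data smaller ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Weighted vote of all weak classifiers."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)

def detect(model, x, eta=0.0):
    """Assumed decision rule: +1 (noisy) if the vote exceeds eta, else -1 (clean)."""
    return 1 if score(model, x) > eta else -1

def train_one_vs_rest(X, labels, classes, n_rounds):
    """One two-class AdaBoost model per class: that class (+1) against all others (-1)."""
    return {c: adaboost_train(X, [1 if l == c else -1 for l in labels], n_rounds)
            for c in classes}

def classify(models, x):
    """Select the class whose classifier gives the largest output value."""
    return max(models, key=lambda c: score(models[c], x))

# Toy two-dimensional frames (hypothetical values, not the paper's features).
frames = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
frame_labels = [-1, -1, -1, 1, 1, 1]
detector = adaboost_train(frames, frame_labels, n_rounds=5)
print([detect(detector, x) for x in frames])
```

Raising `eta` makes the detector report fewer frames as noisy, and lowering it makes it report more, which is how a threshold of this kind trades one type of error against the other.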
Classification
• Noises are separated into classes in advance.
• Learning: classifiers are trained with AdaBoost to distinguish these classes.
• Classification: the classifiers decide the class of each noisy speech frame. Classification is applied only to the frames that are determined to be noisy in detection.
[Figure: a frame found to be noisy in detection is passed to K classifiers (noise class 1 or other class, noise class 2 or other class, …, noise class K or other class), which yield noise class k.]

Smoothing
• A signal interval detected by AdaBoost may consist of only a few frames.
• These frames are removed by smoothing.
• We use majority voting for smoothing.
• When smoothing one frame, the three preceding and the three following frames are also considered.
• The input to the vote for frame i is the i-th frame's classification output.
[Figure: example frame sequence labeled noise1 and noise2.]

Comparative approach
Detection
• We use the log-likelihood ratio of GMMs.
• This is a popular method for VAD (voice activity detection).
• η in the detection equation adjusts the number of positive errors and negative errors.
Classification
• We select the class that has the maximum likelihood among the noisy speech GMMs.

Experiments
Experimental condition
• Speech data: 16 kHz
  • Training: 210 utterances of 21 men
  • Testing: 2104 utterances of 5 men
• Noise data
  • 6 kinds of noise: "spray," "telephone," "tearing paper," "pouring of a granular substance," "bell-ringing," "horn"
  • Each kind has 50 sources: 20 for training and 30 for testing.
• Window: 20 ms Hamming window, shifted every 10 ms
• Features: 24-order log-Mel filter bank and 12-order MFCC
• Number of AdaBoost weak classifiers: 500
• SNR of training data: -5 dB to 5 dB
• Criteria of evaluation

[Figure: experimental results in three panels (SNR of -5 dB, SNR of 0 dB, SNR of 5 dB); vertical axis from 0.70 to 1.00.]
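The majority-vote smoothing described in the Smoothing section can be sketched as below. This is an illustrative sketch only: the function name `smooth_labels`, the shortened window at the start and end of the sequence, and the tie-breaking rule (keep the frame's own label when it ties for the majority) are assumptions that the slides do not specify.

```python
from collections import Counter

def smooth_labels(labels, half_width=3):
    """Replace each frame label by the majority label in a window of
    the frame itself plus `half_width` frames on each side."""
    smoothed = []
    for i in range(len(labels)):
        # Assumption: the window is simply truncated at the sequence edges.
        lo = max(0, i - half_width)
        hi = min(len(labels), i + half_width + 1)
        counts = Counter(labels[lo:hi])
        top = max(counts.values())
        winners = [label for label, count in counts.items() if count == top]
        # Assumption: on a tie, keep the frame's own label if it is a winner.
        smoothed.append(labels[i] if labels[i] in winners else winners[0])
    return smoothed

# Hypothetical per-frame classification outputs with one isolated frame.
outputs = ["noise1"] * 4 + ["noise2"] + ["noise1"] * 4
print(smooth_labels(outputs))
```

With the default `half_width=3`, each vote covers seven frames, so an isolated label lasting one or two frames is outvoted by its neighbors, while a longer run keeps its label.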
Experimental results

Summary
• We proposed sudden noise detection and classification with boosting.
• Detection and classification achieve high performance at low SNR.
• The AdaBoost-based method performs better than the GMM-based method.

Future work
• We will detect more kinds of noise by combining this method with a clustering method such as k-means.
• We will combine noise detection and classification with a noise reduction method.