Modeling of Mel Frequency Features for Non Stationary Noise
I. Andrianakis, P. R. White
Signal Processing and Control Group, Institute of Sound and Vibration Research, University of Southampton
Outline
• Introduction.
• Mel Frequency Log Spectrum and Cepstrum.
• Distribution of the MFLS and MFC coefficients.
• Physical Interpretation of the distributions.
• Modeling of data with Gaussian Mixture Models and the EM algorithm.
• Results.
• Summary & Further work.
Introduction
When working with speech or noise, one often wishes to extract some salient features of the signals, so that instead of working with the whole data set one can concentrate on a smaller set that conveys the most significant information. Such features are the Mel Frequency Log Spectral Coefficients (MFLSCs) and the Mel Frequency Cepstral Coefficients (MFCCs). Their favourable property is that they focus mostly on the low frequency components, where most of the car or train noise energy lies, while compacting the (usually lower energy) higher frequencies. We shall present some results from our research on the application of MFLSCs and MFCCs to noise signals and their modelling with Gaussian Mixture Models.
Mel Frequency Log Spectrum and Cepstrum
[Block diagram: Noise → STFT → |·|² → Mel Frequency Filter Banks → Log(·) → Mel Frequency Log Spectrum → DCT(·) → Mel Frequency Cepstrum]
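The processing chain on this slide can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the authors' implementation: the triangular filter shape, the analytic mel mapping, and the defaults `n_filters=20` and `n_fft=512` are all assumptions, as are the function names.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import stft


def hz_to_mel(f):
    # Assumed mel mapping (a common analytic form, not stated in the slides).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)  # STFT bin frequencies
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_features(x, fs, n_filters=20, n_fft=512):
    """Return (MFLS, MFCC), each of shape (n_filters, n_frames)."""
    _, _, Z = stft(x, fs=fs, nperseg=n_fft)           # STFT
    power = np.abs(Z) ** 2                            # |.|^2
    mel_energy = mel_filter_bank(n_filters, n_fft, fs) @ power
    mfls = np.log(mel_energy + 1e-12)                 # Log(.), guarded against log(0)
    mfcc = dct(mfls, type=2, axis=0, norm="ortho")    # DCT(.) across the filters
    return mfls, mfcc
```

Each column of the two returned arrays is one analysis frame, so a row traces the evolution of one coefficient over time.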
Rationale Behind the Use of Mel Frequency Features
Mel frequency warping focuses on the low frequencies (< 1 kHz), where the filter bank spacing is linear. Energy above 1 kHz is compacted, as the filters have logarithmically increasing pass bands. This makes the features suitable for representing ambient noise (e.g. in cars and trains), whose energy is concentrated in the lower frequencies.
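The linear-then-logarithmic behaviour can be seen directly from an analytic mel mapping. The slides do not state which mapping was used; the form below is one commonly used approximation and is an assumption here, with $f$ in Hz and $m$ in mel.

```latex
m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)

% Low-frequency limit (f << 700 Hz): approximately linear in f
m(f) \approx \frac{2595}{700 \ln 10}\, f \approx 1.61\, f

% High-frequency limit (f >> 700 Hz): logarithmic in f
m(f) \approx 2595 \log_{10}\!\left(\frac{f}{700}\right)
```

Filters placed at equal steps in $m$ are therefore almost equally spaced in Hz at low frequencies and become progressively wider at high frequencies, which is the compaction described above.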
Rationale Behind the Use of Mel Frequency Features (II)
Filter banks are closely spaced where the signal’s energy is higher.
Comparison With LPC
[Figure: PSD of train and car noise compared with the "13 LPC Spectrum" and "20 Mel Spectrum" curves; horizontal axis: Frequency [Hz].]
Distribution of the Mel Frequency Coefficients
We are concerned with the form of the probability distribution of the Mel Frequency features, that is, the Mel Log Spectrum and the Mel Cepstrum. In the following, we shall present the distribution of the MF Log Spectrum Coefficients and the MF Cepstral Coefficients for various types of signals. We shall also try to give a physical explanation for the form of the distribution in each case.
‘Stationary’ Noise
This is a segment of car noise and its respective spectrogram. The signal looks fairly stationary in its mean and variance, while the spectrogram shows that its frequency components do not vary with time either. We shall now proceed to examine the distribution of its Mel Frequency Features.
Mel Log Spectrum
Below we can see the evolution with time of the previous signal’s Mel Log Spectrum, the kurtosis of its coefficients and some characteristic distributions. The coefficients follow an almost Gaussian distribution.
[Figure: distributions of coefficients 1, 5, 16 and 20.]
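Kurtosis is used here as a quick check of how close each coefficient is to Gaussian. A minimal sketch of that check follows, assuming the features are stored as an array of shape `(n_coeffs, n_frames)`; the function name is hypothetical and the slides do not say which kurtosis convention was plotted. The sketch uses excess kurtosis, which is 0 for a Gaussian and positive for longer tails.

```python
import numpy as np
from scipy.stats import kurtosis


def coefficient_kurtosis(features):
    """Excess kurtosis of each coefficient over time.

    features: array of shape (n_coeffs, n_frames).
    Returns an array of shape (n_coeffs,); values near 0 suggest a
    Gaussian-like distribution, larger values suggest longer tails.
    """
    return kurtosis(np.asarray(features, dtype=float),
                    axis=1, fisher=True, bias=False)


# Synthetic illustration only: one Gaussian row and one long-tailed (Laplace) row.
rng = np.random.default_rng(0)
demo = np.vstack([rng.standard_normal(50000), rng.laplace(size=50000)])
print(coefficient_kurtosis(demo))
```

A few outliers can inflate the estimate considerably, so it is worth inspecting the histogram as well before concluding that a coefficient is non-Gaussian.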
Mel Cepstrum
This is the evolution with time of the Mel Cepstrum, the kurtosis of its coefficients and some selected distributions. The coefficients are again almost Gaussian. The high kurtosis of coefficients 1 and 2 is due to a few outliers.
[Figure: distributions of coefficients 1, 2, 12 and 15.]
Non-Stationary Noise
We shall now proceed to examine how the distributions vary in the case of non-stationary noise. This is a segment of train noise, in which a number of amplitude fluctuations occur due to events such as rail changes and other trains passing by.
Mel Log Spectrum
The Mel Log Spectrum now varies with time, reflecting the different sound events. The kurtosis also increases for the higher coefficients. The first few coefficients are close to Gaussian, but the higher ones develop longer tails.
[Figure: distributions of coefficients 1, 7, 11 and 19.]
Mel Cepstrum
The sound events are now reflected in the first few Cepstrum coefficients. Unlike the Log Spectrum, it is now the first coefficients that have longer tails, while the higher ones tend to Gaussian.
[Figure: distributions of coefficients 1, 2, 4 and 11.]
Log Spectrum Distribution - Physical Interpretation
The lower Mel Log Spectrum coefficients represent the lower frequencies of the spectrum, where noise energy is always present. They therefore take consistently high values with few fluctuations, which keeps their distributions close to Gaussian. The higher coefficients take high values only temporarily, due to non-stationary events, so their distributions have longer tails. When energy is present at high frequencies for prolonged periods, the distributions can even be bimodal.
[Figure: distributions of coefficients 1 and 19.]
Cepstrum Distribution - Physical Interpretation
The lower Cepstrum coefficients reflect fluctuations in the amplitude and in the spectral envelope. As both of these vary in non-stationary signals, so do the lower MFCCs, resulting in distributions with long tails. The higher coefficients, however, convey mostly information about harmonic components, which are not as dominant in the more broadband noise of trains and cars and are definitely not fast fluctuating.
[Figure: distributions of coefficients 1 and 11.]
Modelling the Data
The previous analysis showed that the distribution of the Mel Log Spectrum and Mel Cepstrum coefficients deviates from the normal, especially in the case of non-stationary noise, which is of most interest. To model the coefficients successfully we used Gaussian Mixture Models, which are capable of approximating irregularly shaped distributions. An algorithm that allows us to fit mixtures of Gaussians to our data is the Expectation Maximization (EM) algorithm.
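The Expectation Maximization Algorithm for Gaussian Mixture Models
We assume the probabilistic model:
$$p(x \mid \Theta) = \sum_{i=1}^{M} \alpha_i \, p_i(x \mid \theta_i)$$
where:
$$\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M), \qquad \sum_{i=1}^{M} \alpha_i = 1,$$
and each $p_i$ is a Gaussian density with parameters $\theta_i = (\mu_i, \Sigma_i)$.
We assume a latent random variable $Y$ that determines which distribution each observation comes from. We then find the expected value of the complete-data log likelihood $\log p(X, Y \mid \Theta)$ with respect to $Y$, given the observations $X$ and an initial guess of the parameters $\Theta^{g}$. That is:
$$Q(\Theta, \Theta^{g}) = E\left[ \log p(X, Y \mid \Theta) \mid X, \Theta^{g} \right]$$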
The Expectation Maximization Algorithm for Gaussian Mixture Models (II)
This was the Expectation step. In the Maximization step we maximize the expected value $Q(\Theta, \Theta^{g})$ with respect to the parameters $\Theta$, where $\Theta^{g}$ is the current guess, i.e.
$$\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{g})$$
The two steps are repeated until convergence.
For an excellent tutorial on EM see: J. Bilmes, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models.
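For a single coefficient (one-dimensional data) the two steps reduce to closed-form updates. The sketch below uses the standard updates for a Gaussian mixture and is not taken from the slides. The function name, the quantile-based initialisation, the variance floor and the stopping tolerance are all assumptions made for illustration.

```python
import numpy as np


def fit_gmm_em(x, n_components, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log-likelihood at the last E-step).
    """
    x = np.asarray(x, dtype=float)
    # Initial guess (assumption): means at evenly spaced quantiles,
    # pooled variance, equal weights.
    q = (np.arange(n_components) + 0.5) / n_components
    mu = np.quantile(x, q)
    var = np.full(n_components, x.var())
    alpha = np.full(n_components, 1.0 / n_components)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        diff = x[:, None] - mu
        log_pdf = -0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var)
        log_joint = np.log(alpha) + log_pdf                 # shape (N, M)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)   # log p(x_n)
        resp = np.exp(log_joint - log_norm[:, None])

        # M-step: closed-form maximisers of the expected log likelihood.
        nk = resp.sum(axis=0)
        alpha = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9

        ll = log_norm.sum()
        if ll - prev_ll < tol:   # stop once the likelihood no longer improves
            break
        prev_ll = ll
    return alpha, mu, var, ll
```

Calling `fit_gmm_em(coeff, n_components=2)` on the time series of one coefficient returns the mixture weights, means and variances. EM only finds a local maximum, so in practice several initialisations are often compared.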
Fitting GMM to the Data
Here we present some results of fitting GMMs to various distributions.
[Figure panels: Single Gaussian, Two Gaussians, Three Gaussians.]
Summary
Today we have discussed:
• The distribution of the Mel Frequency Log Spectral and Cepstral Coefficients.
• The form this distribution assumes in the presence of non-stationary noise, together with a physical explanation.
• How it can be modeled with Gaussian Mixture Models via the EM algorithm.
• Some results of fitting GMMs to our data.
Further Work
• Examine the distribution of Mel Frequency features for noisy speech and see how these are altered by the presence of different noise types.
• Construct optimal estimators for clean speech Mel features, given the noisy ones and the noise models.
• Use HMMs with Gaussian Mixture Models to accommodate the different noise states.