Wen-Yi Chu Department of Computer Science & Information Engineering

DCT-based Processing of Dynamic Features for Robust Speech Recognition • Wen-Chi LIN, Hao-Teng FAN, Jeih-Weih HUNG • Wen-Yi Chu, Department of Computer Science & Information Engineering, National Taiwan Normal University

Outline • Introduction • Discrete Cosine Transform • The New DCT-related Temporal Filtering Techniques • The Recognition Experimental Results and Discussions • Conclusion and Future Works

Introduction • In this paper, we explore the properties of cepstral time coefficients (CTC) in speech recognition and propose several methods to refine the CTC construction process. • We find that CTC are filtered versions of mel-frequency cepstral coefficients (MFCC), where the filters are taken from the discrete cosine transform (DCT) matrix. We modify these DCT-based filters by windowing, removing the DC gain, and varying the filter length.

Discrete Cosine Transform • For a real-valued N-point sequence , its DFT and DCT are obtained by the following two equations, respectively:
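As a rough illustration of the two transforms named on this slide, the sketch below builds both with NumPy. The symbol names (x, n, k), the DCT-II form, and the orthonormal scaling are assumptions made for the example; the slide's own equations may use a different normalization.

```python
import numpy as np

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def dct_matrix(N):
    """N x N DCT-II matrix with orthonormal scaling (an assumed convention).

    Before scaling, row k holds cos(pi * (2n + 1) * k / (2N)) for n = 0..N-1.
    """
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] *= np.sqrt(1.0 / N)   # row 0 is the constant (DC) basis vector
    C[1:, :] *= np.sqrt(2.0 / N)  # remaining rows are cosines of rising frequency
    return C

def dct(x):
    """DCT of a real sequence as a matrix-vector product."""
    x = np.asarray(x, dtype=float)
    return dct_matrix(len(x)) @ x
```

With this scaling the DCT matrix is orthogonal, so its transpose inverts the transform; unlike the DFT, the output stays real for a real input.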

The New DCT-related Temporal Filtering Techniques (1/5) • For an MFCC feature sequence , the second and third CTC features from are represented as:

The New DCT-related Temporal Filtering Techniques (2/5) • The first four rows of an N × N DCT matrix (N = 15 here) are shown in Fig. 2. • The second and third CTC features in eq. (5) can be viewed as filtered versions of the MFCC features, in which the two temporal filters, and in eqs. (8) and (9), are used, respectively. Figs. 3 and 4 show the frequency responses of the two DCT filters.
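The filtering view can be checked numerically. The sketch below is illustrative only: it assumes the second and third CTC features correspond to DCT rows k = 1 and k = 2, uses unscaled cosine rows, and the function names are hypothetical. It computes a CTC stream two ways, block by block and by sliding the DCT row along one MFCC trajectory as an FIR filter.

```python
import numpy as np

def dct_row(k, N):
    """Unscaled row k of an N-point DCT-II matrix (scaling is an assumption)."""
    n = np.arange(N)
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N))

def ctc_blockwise(c, k, N=15):
    """k-th DCT coefficient of every N-frame block of the trajectory c."""
    h = dct_row(k, N)
    return np.array([np.dot(c[t:t + N], h) for t in range(len(c) - N + 1)])

def ctc_filtered(c, k, N=15):
    """Same values, obtained by correlating c with the DCT row (FIR filtering)."""
    return np.correlate(c, dct_row(k, N), mode="valid")

def magnitude_response(h, nfft=1024):
    """|H(f)| sampled from 0 to 0.5 cycles per frame."""
    return np.abs(np.fft.rfft(h, nfft))

rng = np.random.default_rng(0)
c = rng.standard_normal(100)  # stand-in for one MFCC coefficient over 100 frames
same = np.allclose(ctc_blockwise(c, 1), ctc_filtered(c, 1))
```

Both routes give the same stream, and `magnitude_response(dct_row(k, 15))` gives curves of the kind the slide refers to as Figs. 3 and 4, under the scaling assumed here.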

The New DCT-related Temporal Filtering Techniques (3/5) • The two filters and used in deriving CTC may have two problems: 1) Their relatively high side-lobes make and emphasize undesired non-speech components. 2) The inappropriate passband location and width of and may cause them to filter out some speech components. • We try several well-known window functions: Hamming, Hanning, Blackman and rectangular windows. Note that the rectangular window used here is: • By multiplying any one of these window functions with each of the two original DCT-based filters, we create two new filters as
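A minimal sketch of the windowing step, assuming unscaled DCT rows k = 1 and k = 2 and NumPy's standard window definitions. The rectangular window is left out because its definition is not reproduced here, and the 0.25 cycles/frame stopband edge is an arbitrary value chosen only to compare side-lobe levels.

```python
import numpy as np

# NumPy's window definitions; assumed to match the ones used on the slide.
WINDOWS = {"hamming": np.hamming, "hanning": np.hanning, "blackman": np.blackman}

def dct_row(k, N):
    """Unscaled row k of an N-point DCT-II matrix."""
    n = np.arange(N)
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N))

def windowed_filter(k, N=15, window="hamming"):
    """Multiply the DCT-based filter by a window, tap by tap."""
    return WINDOWS[window](N) * dct_row(k, N)

def stopband_peak_db(h, f_cut=0.25, nfft=4096):
    """Largest |H(f)| at or above f_cut, in dB relative to the overall peak."""
    H = np.abs(np.fft.rfft(h, nfft))
    f = np.fft.rfftfreq(nfft)  # modulation frequency in cycles per frame
    return 20 * np.log10(H[f >= f_cut].max() / H.max())

raw_db = stopband_peak_db(dct_row(1, 15))
win_db = stopband_peak_db(windowed_filter(1, 15, "hamming"))
dc_gain_k1 = float(np.sum(windowed_filter(1, 15)))  # antisymmetric taps: stays ~0
dc_gain_k2 = float(np.sum(windowed_filter(2, 15)))  # symmetric taps: no longer 0
```

In this sketch the windowed filter has lower relative side-lobes than the raw DCT row. The windowed k = 2 filter also picks up a nonzero DC gain, while the windowed k = 1 filter does not.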

The New DCT-related Temporal Filtering Techniques (4/5) • We find that the new becomes a low-pass filter, and thus retains the DC component of the feature stream, which often carries the channel mismatch and may degrade recognition performance. • We therefore convolve with a simple high-pass filter as follows:
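The DC-removal step can be sketched as follows. The high-pass filter here is assumed to be a first difference, [1, -1], purely for illustration, and the windowed filter is assumed to be the Hamming-windowed DCT row k = 2; the slide's actual choices may differ.

```python
import numpy as np

def dct_row(k, N):
    """Unscaled row k of an N-point DCT-II matrix."""
    n = np.arange(N)
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N))

def remove_dc_gain(h, highpass=(1.0, -1.0)):
    """Cascade h with a high-pass FIR filter (assumed first difference).

    The DC gain of a cascade is the product of the individual DC gains,
    so any high-pass filter whose taps sum to zero forces the result to zero.
    """
    return np.convolve(h, np.asarray(highpass, dtype=float))

N = 15
h_windowed = np.hamming(N) * dct_row(2, N)  # symmetric taps, nonzero DC gain
h_no_dc = remove_dc_gain(h_windowed)        # N + 1 taps, zero DC gain
dc_before = float(np.sum(h_windowed))
dc_after = float(np.sum(h_no_dc))
```

The cascade is one tap longer than the windowed filter, and its response at zero modulation frequency vanishes, so a constant channel offset in the feature stream is blocked.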

The New DCT-related Temporal Filtering Techniques (5/5) • In order to further tune the main-lobe (passband) width, we propose to vary the filter length N in , and in eqs. (11), (12) and (13).
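To see how the filter length affects the passband, the sketch below measures a -3 dB main-lobe width for a Hamming-windowed DCT row k = 1 at several lengths. The -3 dB criterion, the choice of row and window, and the function names are assumptions made for this example.

```python
import numpy as np

def windowed_dct_filter(k, N, window=np.hamming):
    """DCT row k of length N multiplied by a window (Hamming by default)."""
    n = np.arange(N)
    return window(N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))

def mainlobe_width(h, nfft=4096):
    """Width of the region around the response peak that stays within -3 dB.

    Returned in cycles per frame; the -3 dB criterion is an assumed measure.
    """
    H = np.abs(np.fft.rfft(h, nfft))
    peak = int(np.argmax(H))
    threshold = H[peak] / np.sqrt(2.0)
    lo = hi = peak
    while lo > 0 and H[lo - 1] >= threshold:
        lo -= 1
    while hi < len(H) - 1 and H[hi + 1] >= threshold:
        hi += 1
    return (hi - lo) / nfft

# Filter lengths taken from the experiments reported later in the slides.
widths = {N: mainlobe_width(windowed_dct_filter(1, N)) for N in (9, 11, 13, 15, 17)}
```

A shorter filter gives a wider main lobe and a longer filter a narrower one, which is the tuning handle this slide describes.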

The Recognition Experimental Results and Discussions (1/3) • Experimental results for MFCC and CTC • CTC is more helpful in handling the non-stationary noise cases (Set B), possibly because the DCT-based filters attenuate the higher modulation frequency components caused by non-stationary noise.

The Recognition Experimental Results and Discussions (2/3) • Experimental results of the proposed new DCT-based filtering approach

The Recognition Experimental Results and Discussions (3/3) • Experimental results for these windowed filters with different filter lengths (N = 9, 11, 13, 15, and 17)

Conclusion and Future Works • We find that the original DCT-based filters have some problems, including high side-lobes and inappropriate passband locations in their frequency responses. • We then present several directions to solve or alleviate these problems: “windowing the filter coefficients”, “removing DC gain” and “varying the filter length”. • In the future, we will work along the following directions: 1) Creating adaptive DCT-based filters 2) Combining the various DCT-based filter outputs linearly or nonlinearly
