2.5.4.1 Basics of Neural Networks. 2.5.4.2 Neural Network Topologies. TDNN. 2.5.4.6 Neural Network Structures for Speech Recognition.
2.5.4.6 Neural Network Structures for Speech Recognition
3.2.2 Implementations of Filter Banks • Instead of direct convolution, which is computationally expensive, we assume each bandpass filter impulse response to be represented by: $h_i(n) = w(n)\, e^{j\omega_i n}$, where w(n) is a fixed lowpass filter and $\omega_i$ is the center frequency of the i-th bandpass filter
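To make the idea concrete, here is a minimal sketch that assumes each bandpass filter has the modulated-lowpass form h_i(n) = w(n) e^{jω_i n}. Under that assumption, filtering with h_i gives the same result as demodulating the signal to baseband, lowpass filtering with w(n), and modulating back, which is the structure that leads to the short-time Fourier transform view. The function names, the Hamming window, and the test frequency are illustrative choices, not taken from the slides. The sketch only checks the equivalence; it does not by itself reduce the computation.

```python
import numpy as np

def bandpass_direct(x, w, omega):
    """Filter x by direct convolution with h(n) = w(n) * exp(j*omega*n)."""
    n = np.arange(len(w))
    h = w * np.exp(1j * omega * n)
    return np.convolve(x, h)

def bandpass_modulated(x, w, omega):
    """Same output via demodulate -> lowpass with w(n) -> remodulate."""
    m = np.arange(len(x))
    baseband = x * np.exp(-1j * omega * m)   # shift the band of interest to 0 Hz
    lowpassed = np.convolve(baseband, w)     # fixed lowpass filter w(n)
    n = np.arange(len(lowpassed))
    return lowpassed * np.exp(1j * omega * n)  # shift back to omega

# Illustrative input: white noise, 16-tap Hamming window as w(n).
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
w = np.hamming(16)
omega = 0.3 * np.pi

y_direct = bandpass_direct(x, w, omega)
y_modulated = bandpass_modulated(x, w, omega)
```

Both outputs have length `len(x) + len(w) - 1` and agree to numerical precision, since the modulated form is just the convolution sum with the factor e^{jωn} pulled outside.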
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks
Mel-cepstrum method • Time signal → Framing → |FFT|² → Mel-scaling → Logarithm → IDCT → Cepstra → Low-order coefficients → Differentiator → Delta & Delta-Delta Cepstra
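The chain of blocks in the mel-cepstrum method can be sketched roughly as follows. Everything numeric here is an assumption for illustration: the Hamming window, the frame length and step, the number of filters and coefficients, the 2595·log10(1 + f/700) mel formula, and the triangular filter shape are common choices but are not stated in the slides. The IDCT block is realized as a DCT-II of the log mel energies, which is a widespread convention, and the differentiator is approximated with `np.gradient`.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_out, n_in):
    """First n_out rows of an (unnormalized) DCT-II basis of size n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_in)

def mel_cepstra(x, sr, frame_len=256, step=128, n_filters=20, n_ceps=13):
    # Framing: overlapping windowed frames of the time signal.
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // step
    frames = np.stack([x[m * step:m * step + frame_len] * window
                       for m in range(n_frames)])
    # |FFT|^2: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel-scaling: integrate the power spectrum in mel-spaced bands.
    mel_energy = power @ mel_filterbank(n_filters, frame_len, sr).T
    # Logarithm (small floor avoids log(0)).
    log_mel = np.log(mel_energy + 1e-10)
    # Cosine transform, keeping only the low-order coefficients.
    ceps = log_mel @ dct_matrix(n_ceps, n_filters).T
    # Differentiator: delta and delta-delta along the time (frame) axis.
    delta = np.gradient(ceps, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, delta2])

# Illustrative input: 2048 samples of noise at an assumed 8 kHz sampling rate.
rng = np.random.default_rng(0)
features = mel_cepstra(rng.standard_normal(2048), sr=8000)
```

Each row of `features` holds the static cepstra followed by their delta and delta-delta values for one frame.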
Time-Frequency analysis • Short-time Fourier Transform (STFT) • Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components. • w(n): windowing function • N: frame length • p: step size
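A minimal sketch of the short-time Fourier transform using the three quantities named above: the window w(n), the frame length N, and the step size p. It assumes the usual discrete form X(m, k) = Σ_{n=0}^{N-1} x(n + mp) w(n) e^{-j2πkn/N}, which is not written out in the slides, and the function name `stft` and the test signal are illustrative.

```python
import numpy as np

def stft(x, w, p):
    """Short-time Fourier transform.

    x : real input signal
    w : window of length N (the frame length)
    p : step size between successive frames, in samples
    Returns an array of shape (n_frames, N // 2 + 1).
    """
    N = len(w)
    n_frames = 1 + (len(x) - N) // p
    frames = np.stack([x[m * p:m * p + N] * w for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Illustrative input: a sinusoid with exactly 8 cycles per 64-sample frame.
N, p = 64, 32
n = np.arange(256)
x = np.cos(2 * np.pi * 8 * n / N)
S = stft(x, np.hanning(N), p)
peak_bins = np.argmax(np.abs(S), axis=1)
```

With this input every frame has its largest magnitude at frequency bin 8, the bin that matches the sinusoid.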
Critical band integration • Related to the masking phenomenon: the detection threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise • Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole
Feature orthogonalization • Spectral values in adjacent frequency channels are highly correlated • Because of this correlation, a Gaussian model needs many parameters: every element of the full covariance matrix has to be estimated • Decorrelating the features allows a diagonal covariance matrix, which makes parameter estimation more reliable.
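The effect of decorrelation can be illustrated with synthetic data. The sketch below assumes 20 channels whose correlation decays as 0.9^|i−j| (a stand-in for correlated spectral channels, not real speech data) and uses an orthonormal DCT-II as the decorrelating transform, which is one common choice rather than something the slides prescribe. It compares how much of the covariance energy sits off the diagonal before and after the transform, and counts the parameters of a full versus a diagonal covariance matrix.

```python
import numpy as np

def orthonormal_dct(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.cos(np.pi * k * (m + 0.5) / n) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)
    return C

def offdiag_energy_fraction(cov):
    """Fraction of squared covariance entries lying off the diagonal."""
    total = np.sum(cov ** 2)
    diag = np.sum(np.diag(cov) ** 2)
    return (total - diag) / total

D = 20
idx = np.arange(D)
# Assumed synthetic covariance: neighbouring channels strongly correlated.
cov_true = 0.9 ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(D), cov_true, size=5000)
C = orthonormal_dct(D)
Y = X @ C.T  # decorrelated features

before = offdiag_energy_fraction(np.cov(X, rowvar=False))
after = offdiag_energy_fraction(np.cov(Y, rowvar=False))

full_cov_params = D * (D + 1) // 2  # symmetric full covariance
diag_cov_params = D                 # diagonal covariance only
```

After the transform most of the covariance energy lies on the diagonal, so a diagonal Gaussian with `diag_cov_params` variances becomes a reasonable model in place of the `full_cov_params` entries of a full covariance matrix.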