Speech Processing Using HTK Trevor Bowden 12/08/2008
Outline • Concept of Project • HTK Feature Extraction Capabilities • Details of Feature Extraction Script • Future Development
Concept of Project • Explore HTK Feature Extraction Capabilities • Feature Output Types • Additional Feature Parameters • Ideal Solution • Derive Any Feature Type from Any Corpus
HTK Feature Extraction Models • Linear Prediction Analysis • Cepstral Analysis [Figure: block diagrams of the two models; surviving labels: Hamming Window, Hamming Window, FFT(), Log()]
HTK Feature Extraction Capabilities • Feature Extraction Methods • Linear Prediction Analysis • Cepstral Analysis • Mel-Scaling • Perceptual Linear Prediction Analysis • Additional Feature Information • Signal Energy • Derivative Information
Linear Prediction Analysis • Vocal Tract Transfer Function • Transfer Function Coefficients Solution • Autocorrelation Matrices • Autocorrelation of Speech • Amplitude of Model
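The bullets above can be sketched in a few lines. This is a minimal illustration of the textbook autocorrelation method, not HTK's own code: the frame is windowed, its autocorrelation is computed, and the Levinson-Durbin recursion solves the resulting Toeplitz system for the predictor coefficients. The function names (`hamming`, `autocorrelation`, `levinson_durbin`) are hypothetical, and treating the residual energy as the squared model gain is an assumption about what "Amplitude of Model" means.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a (windowed) speech frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for an all-pole model.

    Returns (a, err): the predictor is x[n] ~ sum_j a[j] * x[n - 1 - j],
    and err is the residual energy (assumed here to be the squared gain).
    """
    a = []
    err = r[0]
    for i in range(order):
        # Reflection coefficient for this stage of the recursion.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the earlier coefficients, then append the new one.
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a, err
```

For a frame of samples `x`, `levinson_durbin(autocorrelation([s * w for s, w in zip(x, hamming(len(x)))], p), p)` yields `p` transfer-function coefficients and the residual energy.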
Cepstral Analysis • Logarithmic Spectral Domain (Cepstral Domain) • Allows for Separation of Convolved Signals
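To illustrate why the cepstral domain separates convolved signals, the sketch below computes a textbook real cepstrum (inverse DFT of the log magnitude spectrum) with a naive DFT. Convolution becomes multiplication in the spectrum and addition after the log, so the cepstrum of a circular convolution is the sum of the two cepstra. The helper names are hypothetical, and this is not how HTK computes its cepstral features, which apply a cosine transform to log filterbank outputs.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(x):
    """Inverse DFT of log|DFT(x)|; assumes no spectral bin is exactly zero."""
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n)]

def circular_convolve(x, y):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * y[(t - m) % n] for m in range(n)) for t in range(n)]

# Two short example sequences whose spectra have no zeros.
source = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
filt = [1.0, 0.0, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0]
mixed = circular_convolve(source, filt)
summed = [a + b for a, b in zip(real_cepstrum(source), real_cepstrum(filt))]
# real_cepstrum(mixed) matches `summed` up to floating-point error.
```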
Mel-Scaling • Human perception of pitch is non-linear: pitches that listeners judge to be equally spaced are not equally spaced in frequency. • The Mel scale warps the frequency axis so that equal steps correspond to equal perceived pitch intervals.
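A small sketch of the warping, using the common formula Mel(f) = 2595 log10(1 + f/700). The function names are hypothetical, and `mel_spaced_centres` is only an illustration of how filter centre frequencies end up denser at low frequencies; it is not a full filterbank.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_centres(low_hz, high_hz, count):
    """Centre frequencies (Hz) equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (count + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(count)]
```

Centres returned by `mel_spaced_centres(0.0, 8000.0, 26)` are evenly spaced in Mel but grow further apart in Hz as frequency increases.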
Perceptual Linear Prediction Analysis • Perceptual linear prediction combines linear prediction and cepstral analysis. • The spectrum of the speech data is first warped onto the Mel scale. • The warped spectrum is then compressed by taking its cube root, and linear prediction coefficients are computed from the result. • Cepstral analysis is then performed on these coefficients.
Signal Energy and Derivatives • Signal Energy • Delta Coefficients • Acceleration Coefficients • Third Differential Coefficients
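Delta coefficients are commonly computed as a regression over neighbouring frames, d_t = Σ θ(c_{t+θ} − c_{t−θ}) / (2 Σ θ²) for θ = 1..Θ. The sketch below follows that formula; the function name, the default window of 2, and replicating the first and last frames at the edges are assumptions for illustration. Applying the function twice gives acceleration coefficients, and three times gives third differentials.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a sequence of feature vectors.

    frames: list of equal-length lists of floats, one per frame.
    Frames beyond either end are replaced by the first or last frame.
    """
    n = len(frames)
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n):
        vec = []
        for d in range(len(frames[t])):
            num = 0.0
            for th in range(1, window + 1):
                ahead = frames[min(t + th, n - 1)][d]
                behind = frames[max(t - th, 0)][d]
                num += th * (ahead - behind)
            vec.append(num / denom)
        out.append(vec)
    return out

# Example: static features, then first, second and third derivatives.
static = [[3.0 * t] for t in range(10)]
delta = deltas(static)
accel = deltas(delta)
third = deltas(accel)
```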
Speech Processing of the AMI Corpus • Ideal Solution Yields Generic Feature Types from Generic Corpus • Corpora Have Varying Audio File Types and Varying Organizational Structures • Corpora Have Varying Methods for Annotation
Speech Processing of the AMI Corpus • Project Solution Yields Generic Feature Types from Corpora with RIFF-Format WAV Audio Files • Two Main Functions of Script • Traverse Corpus Directory Tree • Generate List of Audio Files • Produce Feature Data • Using User-Defined Configuration File
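The two functions of the script can be sketched as follows. This is not the author's script, whose language and layout the slides do not show; it is a minimal illustration assuming the features are produced by HTK's `HCopy` tool, which accepts a configuration file with `-C` and a script file of "source target" pairs with `-S`. The function names, the `.mfc` extension and the mirrored output tree are hypothetical choices, and paths containing spaces are not handled.

```python
import os

def find_wav_files(corpus_root):
    """Walk the corpus directory tree and return a sorted list of WAV paths."""
    wavs = []
    for dirpath, _dirnames, filenames in os.walk(corpus_root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                wavs.append(os.path.join(dirpath, name))
    return sorted(wavs)

def write_hcopy_script(wav_paths, corpus_root, out_root, scp_path, ext=".mfc"):
    """Write one 'source target' line per audio file, mirroring the tree."""
    with open(scp_path, "w") as scp:
        for src in wav_paths:
            rel = os.path.relpath(src, corpus_root)
            dst = os.path.join(out_root, os.path.splitext(rel)[0] + ext)
            # Create the output directory so the target file can be written.
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            scp.write(f"{src} {dst}\n")

def hcopy_command(config_path, scp_path):
    """Argument list for running HCopy with a user-defined configuration."""
    return ["HCopy", "-C", config_path, "-S", scp_path]
```

The returned list can be passed to `subprocess.run` on a machine with HTK installed. For RIFF WAV input the configuration file would typically set `SOURCEFORMAT = WAV` and a `TARGETKIND` such as `MFCC_E_D_A`.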
Future Development • Expand Script to Handle Audio Inputs of Any File Type • Include Processing for Specific Corpus Annotations