
Speech Processing Using HTK

Trevor Bowden, 12/08/2008


Presentation Transcript


  1. Speech Processing Using HTK Trevor Bowden 12/08/2008

  2. Outline • Concept of Project • HTK Feature Extraction Capabilities • Details of Feature Extraction Script • Future Development

  3. Concept of Project • Explore HTK Feature Extraction Capabilities • Feature Output Types • Additional Feature Parameters • Ideal Solution • Derive Any Feature Type from Any Corpus

  4. HTK Feature Extraction Models • Linear Prediction Analysis • Cepstral Analysis [Block diagram labels: Hamming Window, Hamming Window, FFT(), Log()]

  5. HTK Feature Extraction Capabilities • Feature Extraction Methods • Linear Prediction Analysis • Cepstral Analysis • Mel-Scaling • Perceptual Linear Prediction Analysis • Additional Feature Information • Signal Energy • Derivative Information

  6. Linear Prediction Analysis • Vocal Tract Transfer Function • Transfer Function Coefficients Solution • Autocorrelation Matrices • Autocorrelation of Speech • Amplitude of Model
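The autocorrelation method named on this slide can be sketched as follows. This is a minimal illustration, not the HTK implementation: the function name `lpc_autocorrelation`, the sign convention (predicting s[n] as a weighted sum of the previous samples), and the definition of the gain as the square root of the residual error energy are all assumptions made for the example.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate linear prediction coefficients for one frame (illustrative sketch).

    Assumed convention: s[n] is predicted as sum_k a[k] * s[n - k], k = 1..order.
    Returns the coefficients a[1..order] and a gain term for the all-pole model.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))

    # Autocorrelation of the windowed speech for lags 0..order.
    n = len(windowed)
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])

    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|]; solve R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])

    # Residual prediction-error energy gives the amplitude (gain) of the model.
    error = r[0] - np.dot(a, r[1:])
    gain = np.sqrt(error)
    return a, gain
```

Solving the Toeplitz system directly keeps the sketch short; a Levinson-Durbin recursion would give the same coefficients with less work for larger orders.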

  7. Cepstral Analysis • Logarithmic Spectral Domain (Cepstral Domain) • Allows for Separation of Convolved Signals
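A minimal sketch of the cepstral chain shown on the earlier diagram slide (Hamming window, FFT, log), followed by an inverse transform back to the cepstral domain. The function name `real_cepstrum` and the small constant added before the logarithm are assumptions for this example; HTK's own cepstral features are computed differently in detail.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed)
    # A tiny offset (an assumption of this sketch) avoids log(0) on silent bins.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.irfft(log_magnitude, n=len(frame))
```

Because convolution in time becomes multiplication in the spectrum, and the logarithm turns that product into a sum, the excitation and the vocal tract response end up as additive components that occupy different regions of the cepstrum.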

  8. Mel-Scaling • Human perception of pitch is non-linear: pitches that listeners judge to be equally spaced lie on a non-linear scale in the frequency domain.
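As a rough illustration of the Mel scale, the sketch below uses the common formula Mel(f) = 2595 log10(1 + f/700), which is the form given in the HTK Book. The helper names and the idea of placing filter centers at equal Mel spacing between two band edges are assumptions made for the example.

```python
import math

def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to Mels."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Convert a Mel value back to Hz (inverse of hz_to_mel)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_channels, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced equally on the Mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_channels + 1)
    return [mel_to_hz(low_mel + step * i) for i in range(1, num_channels + 1)]
```

Centers that are evenly spaced in Mels sit close together at low frequencies and spread out at high frequencies, which mirrors the perceptual effect described on the slide.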

  9. Perceptual Linear Prediction Analysis • Perceptual linear prediction combines linear prediction and cepstral analysis. • The spectrum of the speech data is first warped onto the Mel scale. • The Mel-scaled spectrum is then weighted by an equal-loudness curve and compressed by taking its cube root, and linear prediction coefficients are computed from the result. • These coefficients are then converted to cepstral coefficients.

  10. Signal Energy and Derivatives • Signal Energy • Delta Coefficients • Acceleration Coefficients • Third Differential Coefficients
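Delta coefficients are commonly computed with a regression over neighboring frames, d_t = sum(theta * (c[t+theta] - c[t-theta])) / (2 * sum(theta^2)), and the acceleration and third differential coefficients reapply the same formula to the previous order. The sketch below assumes a window of 2 frames and pads the edges by repeating the first and last frame; both choices, and the function name `delta`, are assumptions for illustration.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based time derivative of a (num_frames, dim) feature matrix."""
    features = np.asarray(features, dtype=float)
    num_frames = len(features)

    # Pad by repeating the first and last frame so every frame has neighbors.
    padded = np.concatenate([
        np.repeat(features[:1], window, axis=0),
        features,
        np.repeat(features[-1:], window, axis=0),
    ])

    denominator = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    result = np.zeros_like(features)
    for theta in range(1, window + 1):
        ahead = padded[window + theta : window + theta + num_frames]
        behind = padded[window - theta : window - theta + num_frames]
        result += theta * (ahead - behind)
    return result / denominator
```

Acceleration coefficients would then be `delta(delta(static))`, and a full observation vector is typically the static features, deltas, and accelerations stacked side by side.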

  11. Speech Processing of the AMI Corpus • Ideal Solution Yields Generic Feature Types from Generic Corpus • Corpora Have Varying Audio File Types and Varying Organizational Structures • Corpora Have Varying Methods for Annotation

  12. Speech Processing of the AMI Corpus • Project Solution Yields Generic Feature Types from Corpora with RIFF-Format WAV Audio Files • Two Main Functions of Script • Traverse Corpus Directory Tree • Generate List of Audio Files • Produce Feature Data • Using User-Defined Configuration File
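The two functions of the script could look roughly like the sketch below. The original script is not shown in the slides, so every name here (`find_wav_files`, `write_hcopy_script`, `hcopy_command`, the `.mfc` extension, and the example configuration values) is hypothetical. The sketch relies only on HTK's `HCopy` tool accepting a configuration file with `-C` and a script file of "source target" pairs with `-S`.

```python
import os

# Hypothetical user-defined configuration; times are in units of 100 ns.
EXAMPLE_CONFIG = """\
SOURCEFORMAT = WAV
TARGETKIND = MFCC_E_D_A
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
NUMCHANS = 26
NUMCEPS = 12
"""

def find_wav_files(corpus_root):
    """Traverse the corpus directory tree and list every .wav file."""
    wav_paths = []
    for directory, _subdirs, filenames in os.walk(corpus_root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                wav_paths.append(os.path.join(directory, name))
    return sorted(wav_paths)

def write_hcopy_script(wav_paths, corpus_root, feature_root, script_path,
                       extension=".mfc"):
    """Write one 'source target' line per audio file, mirroring the corpus layout."""
    with open(script_path, "w") as script:
        for wav in wav_paths:
            relative = os.path.relpath(wav, corpus_root)
            target = os.path.join(feature_root,
                                  os.path.splitext(relative)[0] + extension)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            script.write(f"{wav} {target}\n")

def hcopy_command(config_path, script_path):
    """Build the HCopy invocation; pass the list to subprocess.run to execute it."""
    return ["HCopy", "-C", config_path, "-S", script_path]
```

Changing `TARGETKIND` in the configuration (for example to a linear prediction or PLP kind with different qualifiers) changes the feature type without touching the traversal code, which is the separation the slide describes. Paths containing spaces would need extra handling, since the script file separates source and target by whitespace.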

  13. Future Development • Expand Script to Handle Audio Inputs of Any File Type • Include Processing for Specific Corpus Annotations
