This research aims to develop a structure-based speech classification system using nonlinear embedding techniques. The study focuses on voiced and unvoiced speech, usable and unusable speech, and nonlinearities in speech. The proposed research includes feature extraction methods and classification algorithms based on difference-mean comparison and nodal density measures.
Structure-Based Speech Classification Using Nonlinear Embedding Techniques • Uchechukwu Ofoegbu • Advisor: Dr. Robert E. Yantorno • Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula
Acknowledgment • Dr. Robert Yantorno • Dr. Saroj Biswas • Dr. Henry Sendaula • Speech Lab Members • Air Force Research Laboratory, Rome, NY
Overview • Voiced and Unvoiced Speech • Usable and Unusable Speech • Nonlinearities in Speech • Nonlinear Embedding • Research Goal • Proposed Research
Voiced/Unvoiced Characteristics • Voiced • Quasi-periodic excitation • Modulation by the vocal tract • Production of vowels, voiced fricatives & plosives • Unvoiced • No periodic vibration of the vocal cords • Noise-like nature • Production of unvoiced fricatives and plosives
Usable Speech • Portions of co-channel speech that remain usable for applications such as speaker identification (Speaker ID) and speech recognition • Low-energy (unvoiced/silence) segments of one speaker overlap with high-energy (voiced) segments of the other • Usability criterion: Target-to-Interferer Ratio (TIR) > 20 dB
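The TIR criterion can be made concrete with a short frame-labeling sketch. It assumes the usual energy-ratio definition, TIR = 10·log10(E_target / E_interferer), computed per frame, and that the target and interferer signals are available separately (as when co-channel mixtures are built artificially). The function names `tir_db` and `is_usable` are hypothetical; only the 20 dB threshold comes from the slide.

```python
import math

def tir_db(target, interferer):
    """Frame TIR in dB: 10*log10(target energy / interferer energy)."""
    e_target = sum(s * s for s in target)
    e_interf = sum(s * s for s in interferer)
    if e_interf == 0:
        return math.inf          # no interference at all
    if e_target == 0:
        return -math.inf         # target is silent in this frame
    return 10.0 * math.log10(e_target / e_interf)

def is_usable(target, interferer, threshold_db=20.0):
    """Label a frame usable when its TIR exceeds the threshold (20 dB on the slide)."""
    return tir_db(target, interferer) > threshold_db

# Hypothetical frames: the interferer is 20x weaker in amplitude (about 26 dB TIR).
frame_target = [1.0, -1.0, 1.0, -1.0]
frame_interf = [0.05, 0.05, -0.05, -0.05]
label = is_usable(frame_target, frame_interf)
```

Because the label depends on knowing each speaker's energy, a practical system has to predict usability from features of the mixture, which is what the measures in this proposal are aimed at.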
Nonlinearities in Speech • Glottal waveform changes • Shape varies with amplitude • Physical observations • Flow in the vocal tract is non-laminar • Coupling between the vocal tract and the vocal folds • When the glottis is open, prominent changes are observed in formant characteristics
Nonlinear Embedding • Nonlinear Systems • The system state is a point moving along a trajectory in an abstract state space • The coordinates of the point are the independent degrees of freedom of the system • The state space can be reconstructed from a scalar signal
Nonlinear Embedding (cont’d) • Takens’ Method of Delays • A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension • Vectors in m-dimensional state space are formed from time-delayed values of a signal
Nonlinear Embedding (cont’d) • m = embedding dimension • d = delay value
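A minimal sketch of the method of delays, assuming the forward-delay convention x(n) = [s(n), s(n+d), ..., s(n+(m-1)d)] with d measured in samples. The slides define m and d but do not show the vector layout, so this convention and the function name `delay_embed` are assumptions.

```python
def delay_embed(signal, m=3, d=1):
    """Build m-dimensional delay vectors [s(n), s(n+d), ..., s(n+(m-1)d)].

    Returns a list of rows, one row per reconstructed state-space point.
    """
    n_vectors = len(signal) - (m - 1) * d
    if m < 1 or d < 1 or n_vectors <= 0:
        raise ValueError("signal too short for this (m, d)")
    return [[signal[n + k * d] for k in range(m)] for n in range(n_vectors)]

vectors = delay_embed([0, 1, 2, 3, 4, 5], m=3, d=2)
# vectors == [[0, 2, 4], [1, 3, 5]]
```

Each row is one point of the reconstructed trajectory; with m = 3 the rows can be plotted directly as a 3-D curve.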
Nonlinear Embedding (Cont’d) • Delay value, d: • Dependent on sampling rate and signal properties • Large enough such that nonlinearities are taken into account by the reconstructed trajectory • Small enough to retain reasonable time resolution
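The slides do not say how d was chosen. One widely used heuristic in nonlinear time-series analysis (swapped in here for illustration, not claimed to be the method of this work) takes d as the first lag at which the signal's autocorrelation drops to zero, which for a periodic signal is roughly a quarter period. The function name `first_zero_crossing_delay` and the test tone are hypothetical.

```python
import math

def first_zero_crossing_delay(signal, max_lag):
    """Smallest lag (samples) where the autocorrelation of the mean-removed
    signal falls to zero or below; None if that does not happen by max_lag."""
    n = len(signal)
    if n < 2:
        raise ValueError("signal too short")
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    for lag in range(1, min(max_lag, n - 1) + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r <= 0:
            return lag
    return None

# Hypothetical test tone: a sinusoid with a 40-sample period.
tone = [math.sin(2 * math.pi * k / 40) for k in range(400)]
d = first_zero_crossing_delay(tone, max_lag=100)
```

Because d is in samples, the same acoustic time scale maps to a larger d at a higher sampling rate, consistent with the slide's note that d depends on the sampling rate.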
Nonlinear Embedding (Cont’d) • Dimension, m: • Generation of voiced speech constitutes a low-dimensional system • Generation of unvoiced speech constitutes a relatively high-dimensional system • A low embedding dimension (such as m = 3) is sufficient to reconstruct voiced speech but not unvoiced speech
Research Goal • Feature Extraction • Difference-Mean Comparison (DMC) Measure • Voiced/unvoiced classification • Nodal Density Measure • Voiced/unvoiced classification • Usable/unusable classification
Difference-Mean Comparison (DMC) Measure Voiced/Unvoiced Classification
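Introduction • 3rd-order difference computed along the first non-singleton dimension • 1st-order difference of an N×N matrix X: $D(i,j) = X(i+1,j) - X(i,j)$, for $i = 1, \ldots, N-1$ • Length(3rd-order diff. > mean), i.e. the number of 3rd-order difference values exceeding the mean, is observed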
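The DMC steps can be sketched as three row-wise differences followed by a count against the mean. Two points are assumptions not fixed by the slide: "mean" is taken as the mean of the 3rd-order difference values themselves, and the input is a generic matrix (for example, the rows of an embedded trajectory or a single-column signal). The names `diff_rows` and `dmc_count` are hypothetical.

```python
def diff_rows(x):
    """1st-order difference along the first dimension: D[i][j] = X[i+1][j] - X[i][j]."""
    return [[b - a for a, b in zip(row, nxt)] for row, nxt in zip(x, x[1:])]

def dmc_count(x, order=3):
    """Hypothetical DMC feature: how many order-th differences exceed their mean."""
    if len(x) <= order:
        raise ValueError("need more rows than the difference order")
    d = x
    for _ in range(order):
        d = diff_rows(d)            # each pass removes one row
    values = [v for row in d for v in row]
    mean = sum(values) / len(values)
    return sum(1 for v in values if v > mean)

# A single-column impulse: its 3rd-order difference is [1, -3, 3, -1].
impulse = [[0], [0], [0], [1], [0], [0], [0]]
count = dmc_count(impulse)
```

A smooth trajectory (such as samples of a cubic) has a constant 3rd-order difference and yields a count of zero, so the feature responds to irregular, rapidly changing structure.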
Nodal Density Measure Voiced/Unvoiced Classification Usable/Unusable Classification
Introduction • The smallest cube that encloses the embedded signal is determined • This cube is divided into N smaller cubes • The edges of the smaller cubes are defined as nodes • The number of nodes spanned by the signal is determined • The ratio of the number of nodes spanned to the total number of nodes is defined as the nodal density
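A minimal sketch of the nodal density, under stated assumptions: the enclosing cube is split into k sub-cubes per side (so N = k^m), "nodes" are read as the grid points at the corners of the sub-cubes, and a node counts as spanned when it is the nearest grid point to at least one trajectory point. The slide does not define "spanned" precisely, so this quantization rule and the name `nodal_density` are hypothetical.

```python
def nodal_density(points, cubes_per_side):
    """Fraction of grid nodes of the enclosing cube that are nearest to some point."""
    if not points or cubes_per_side < 1:
        raise ValueError("need points and at least one sub-cube per side")
    dims = len(points[0])
    lows = [min(p[i] for p in points) for i in range(dims)]
    # Smallest enclosing cube: its side is the largest per-axis extent.
    side = max(max(p[i] for p in points) - lows[i] for i in range(dims))
    total_nodes = (cubes_per_side + 1) ** dims
    if side == 0:
        return 1 / total_nodes      # every point sits on the same node
    step = side / cubes_per_side
    spanned = {
        tuple(round((p[i] - lows[i]) / step) for i in range(dims))
        for p in points
    }
    return len(spanned) / total_nodes

# Hypothetical 3-D trajectory along the cube diagonal, 2 sub-cubes per side.
diagonal = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
density = nodal_density(diagonal, cubes_per_side=2)   # 3 of 27 nodes
```

Under this reading, a low-dimensional (voiced) trajectory would be expected to visit relatively few nodes, while a noise-like (unvoiced) trajectory spreads through more of the cube.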
Filtering • Moving Average Filter • Order, M = 10
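A moving-average filter of order M = 10 can be sketched as a causal M-point running mean with zero initial conditions, equivalent to an FIR filter with M equal taps of 1/M. Whether "order 10" means 10 or 11 taps, and which sequence the filter is applied to, are not stated on the slide; both are assumptions here, as is the name `moving_average`.

```python
def moving_average(x, order=10):
    """Causal M-point moving average: y[n] = (1/M) * sum_{k=0}^{M-1} x[n-k],
    treating samples before the start of x as zero."""
    if order < 1:
        raise ValueError("order must be at least 1")
    y = []
    acc = 0.0
    for n, v in enumerate(x):
        acc += v                  # add the newest sample
        if n >= order:
            acc -= x[n - order]   # drop the sample leaving the window
        y.append(acc / order)
    return y

smoothed = moving_average([10] * 12, order=10)
```

The output ramps up over the first M samples and then settles at the input level, which shows the delay-versus-smoothing trade-off in choosing M.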
Proposed Research Usable/Unusable Classification
[Figure: Nodes Spanned by Embedded Usable and Unusable Speech Frames. Panels: 3-D plots titled "Nodes Spanned by Embedded Co-channel Speech of 30 dB TIR"]
Summary • Speech → Nonlinear Embedding → Difference-Mean Comparison → V/UV Classification • Speech → Nonlinear Embedding → Nodal Density → V/UV Classification and Usable/Unusable Classification
Future Proposed Research • Determine the optimum filter for nodal density-based voiced/unvoiced classification • Develop the nodal density measure for usable/unusable classification • Investigate whether the two features (DMC and nodal density) carry complementary information for voiced/unvoiced classification • Perform decision-level fusion of both features