Structure-Based Speech Classification Using Nonlinear Embedding Techniques

Uchechukwu Ofoegbu • Advisor: Dr. Robert E. Yantorno • Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula

Acknowledgment • Dr. Robert Yantorno • Dr. Saroj Biswas • Dr. Henry Sendaula • Speech Lab Members • Air Force Research Laboratory, Rome, NY

Overview • Voiced and Unvoiced Speech • Usable and Unusable Speech • Nonlinearities in Speech • Nonlinear Embedding • Research Goal • Proposed Research

Voiced and Unvoiced Speech

Voiced/Unvoiced Characteristics • Voiced • Quasi-periodic excitation • Modulation by the vocal tract • Production of vowels, voiced fricatives & plosives • Unvoiced • No periodic vibration of the vocal folds • Noise-like nature • Production of unvoiced fricatives and plosives

Usable Speech • Portions of co-channel speech that remain usable for applications such as speaker identification and speech recognition • Arise where low-energy (unvoiced/silence) segments of one speaker overlap with high-energy (voiced) segments of the other • Criterion: target-to-interferer ratio (TIR) > 20 dB
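The 20 dB usability criterion can be made concrete with a short sketch. It assumes the target and interfering signals are available separately for each frame (as when labeling artificially mixed co-channel data) and that TIR is the ratio of frame energies in dB; the names `tir_db` and `is_usable` are hypothetical, not taken from the slides.

```python
import numpy as np

USABLE_TIR_DB = 20.0  # usability threshold stated on the slide


def tir_db(target: np.ndarray, interferer: np.ndarray) -> float:
    """Frame target-to-interferer ratio in dB, computed as an energy ratio (assumption)."""
    e_target = float(np.sum(np.asarray(target, dtype=float) ** 2))
    e_interf = float(np.sum(np.asarray(interferer, dtype=float) ** 2))
    if e_interf == 0.0:
        return float("inf")  # no interference in this frame
    return 10.0 * np.log10(e_target / e_interf)


def is_usable(target, interferer, threshold_db: float = USABLE_TIR_DB) -> bool:
    """Label a co-channel frame usable when the target exceeds the interferer by > threshold_db."""
    return tir_db(target, interferer) > threshold_db
```

A target 20 times larger in amplitude than the interferer gives an energy ratio of 400 (about 26 dB), so that frame would be labeled usable.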

Nonlinearities in Speech • Glottal waveform changes • Shape varies with amplitude • Physical observations • Flow in the vocal tract is non-laminar • Coupling between the vocal tract and the vocal folds • When the glottis is open, prominent changes are observed in formant characteristics

Nonlinear Embedding • Nonlinear Systems • The system state is a point moving along a trajectory in an abstract state space • The coordinates of the point are the independent degrees of freedom of the system • The state space can be reconstructed from a single scalar signal

Nonlinear Embedding (cont’d) • Takens’ Method of Delays • A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension • Vectors in m-dimensional state space are formed from time-delayed values of a signal

Nonlinear Embedding (cont’d) • m = embedding dimension • d = delay value
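The method of delays with embedding dimension m and delay d can be sketched as below. This sketch assumes the common forward-delay convention, in which the n-th state vector is [s(n), s(n+d), ..., s(n+(m-1)d)]; the backward form s(n), s(n-d), ... differs only by an index shift. The function name `delay_embed` is hypothetical.

```python
import numpy as np


def delay_embed(s: np.ndarray, m: int = 3, d: int = 1) -> np.ndarray:
    """Build the (L - (m-1)d) x m matrix of delay vectors from a length-L signal.

    Row n is [s(n), s(n+d), ..., s(n+(m-1)d)].
    """
    s = np.asarray(s, dtype=float)
    n_vectors = len(s) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen m and d")
    # Column k holds the signal shifted by k*d samples.
    return np.column_stack([s[k * d : k * d + n_vectors] for k in range(m)])
```

For example, `delay_embed(np.arange(10), m=3, d=2)` yields six 3-D points, the first being [0, 2, 4] and the last [5, 7, 9].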

Nonlinear Embedding (Cont’d) • Delay value, d: • Dependent on sampling rate and signal properties • Large enough for the reconstructed trajectory to capture the nonlinearities • Small enough to retain reasonable time resolution
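The slides do not say how d was chosen. One widely used heuristic, shown here only as an illustration and not as the author's method, picks the first lag at which the signal's autocorrelation falls to zero or below; the name `first_zero_crossing_delay` is hypothetical.

```python
from typing import Optional

import numpy as np


def first_zero_crossing_delay(s: np.ndarray, max_lag: Optional[int] = None) -> int:
    """Return the first lag at which the (unnormalized) autocorrelation is <= 0."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()  # remove DC so the autocorrelation can cross zero
    if max_lag is None:
        max_lag = len(s) // 2
    for lag in range(1, max_lag + 1):
        r = np.dot(s[:-lag], s[lag:])  # autocorrelation at this lag
        if r <= 0.0:
            return lag
    return max_lag  # no crossing found within max_lag
```

For a sinusoid with a period of 40 samples the crossing lands near a quarter period (about 10 samples). Because the lag is measured in samples, the same signal sampled at a higher rate yields a proportionally larger d, consistent with the dependence on sampling rate noted above.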

Nonlinear Embedding (Cont’d) • Dimension, m: • Voiced speech is generated by a low-dimensional system • Unvoiced speech is generated by a relatively high-dimensional system • A low dimension (such as m = 3) is sufficient to reconstruct voiced speech but not unvoiced speech

Embedded Voiced and Unvoiced Speech

Embedded Usable and Unusable Speech

Research Goal • Feature Extraction • Difference-Mean Comparison (DMC) Measure • Voiced/unvoiced classification • Nodal Density Measure • Voiced/unvoiced classification • Usable/unusable classification

Difference-Mean Comparison (DMC) Measure Voiced/Unvoiced Classification

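Introduction • 3rd-order difference computed along the first non-singleton dimension • 1st-order difference of an N×N matrix $X$ along its first dimension is given by $Y_{i,j} = X_{i+1,j} - X_{i,j}$, $i = 1, \dots, N-1$ • The number of 3rd-order difference values greater than their mean, Length(3rd order diff. > mean), is observed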
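The DMC steps above can be sketched with NumPy. This is an illustration under stated assumptions: the measure is applied to one frame (for instance an embedded frame), "first non-singleton dimension" follows MATLAB `diff` semantics, and the mean is taken over all third-order difference values in the frame. The name `dmc_count` is hypothetical.

```python
import numpy as np


def dmc_count(x: np.ndarray, order: int = 3) -> int:
    """Count the order-th differences of x that exceed their mean (DMC-style measure)."""
    x = np.asarray(x, dtype=float)
    # First non-singleton dimension, as in MATLAB's diff (defaults to axis 0).
    axis = next((i for i, size in enumerate(x.shape) if size > 1), 0)
    diffs = np.diff(x, n=order, axis=axis)  # repeated 1st-order differencing
    return int(np.count_nonzero(diffs > diffs.mean()))
```

For the impulse-like sequence [0, 0, 0, 1, 0, 0, 0], the third-order differences are [1, -3, 3, -1], whose mean is 0, so the count is 2.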

Embedded Voiced and Unvoiced Speech

Difference-Mean Comparison Distribution

DMC-Based Decisions

Results

Results (Cont’d)

Nodal Density Measure Voiced/Unvoiced Classification Usable/Unusable Classification

Introduction • The smallest cube that encloses the embedded signal is determined • This cube is divided into N smaller cubes • The corners (vertices) of the smaller cubes are defined as nodes • The number of nodes spanned by the signal is determined • The ratio of the number of nodes spanned to the total number of nodes is defined as the nodal density
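A minimal sketch of the nodal density follows. The slides do not define when a node is "spanned", so this code assumes a node counts as spanned if it is the nearest grid node to at least one embedded point. It also assumes the enclosing cube is anchored at the per-axis minimum with side equal to the largest extent, and that it is split into k parts per axis (N = k³ small cubes, (k+1)³ nodes). The name `nodal_density` and the parameter `k` are hypothetical.

```python
import numpy as np


def nodal_density(points: np.ndarray, k: int = 10) -> float:
    """Fraction of grid nodes spanned by an embedded frame (rows = points)."""
    pts = np.asarray(points, dtype=float)
    lo = pts.min(axis=0)
    side = float((pts.max(axis=0) - lo).max())  # smallest enclosing cube's side
    if side == 0.0:
        side = 1.0  # degenerate frame: all points identical
    step = side / k  # edge length of each small cube
    # Nearest grid node index along each axis, in 0..k (assumed "spanned" rule).
    idx = np.clip(np.rint((pts - lo) / step).astype(int), 0, k)
    spanned = {tuple(row) for row in idx}
    total_nodes = (k + 1) ** pts.shape[1]
    return len(spanned) / total_nodes
```

Under these assumptions a closed, low-dimensional trajectory such as a circle touches few nodes, while noise-like points spread through the cube and touch many more, which matches the voiced/unvoiced intuition from the embedding-dimension slide.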

Voiced/Unvoiced Classification

Embedded Voiced and Unvoiced Speech Frames with Grids

Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames

Nodal-Density Distribution

Filtering • Moving Average Filter • Order, M = 10
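The moving average filter can be sketched as below, assuming it smooths the frame-by-frame sequence of nodal-density values. It also assumes "order M = 10" means an M-point causal window, y[n] = (1/M) Σ x[n-k] for k = 0..M-1, with zero initial conditions; under the convention that order M implies M+1 taps, the window would be one sample longer. The name `moving_average` is hypothetical.

```python
import numpy as np


def moving_average(x: np.ndarray, M: int = 10) -> np.ndarray:
    """Causal M-point moving average with zero initial conditions."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(M) / M
    # Full convolution truncated to len(x): y[n] averages x[n-M+1..n].
    return np.convolve(x, kernel)[: len(x)]
```

With a constant input of 1, the output ramps up from 0.1 and settles at 1 from the tenth sample onward, illustrating the M-1 sample start-up transient of this causal form.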

Nodal-Density Distributions after Filtering

Nodal-Density Distributions After Filtering

Results

Results (Cont’d)

Proposed Research Usable/Unusable Classification

Embedded Usable and Unusable Speech Frames with Grids

Nodes Spanned by Embedded Usable and Unusable Speech Frames [Figure: three panels, each titled "Nodes Spanned by Embedded Co-channel Speech of 30dB TIR"]

Preliminary Results

Summary • Speech → Nonlinear Embedding • Difference-Mean Comparison → V/UV Classification • Nodal Density → V/UV Classification and Usable/Unusable Classification

Future Proposed Research • Determine the optimum filter for nodal-density-based voiced/unvoiced classification • Develop a nodal density measure for usable/unusable classification • Investigate whether the two features (DMC and nodal density) carry complementary information for voiced/unvoiced classification • Perform decision-level fusion of both features
