Harmonic-Temporal Clustering of Speech

Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné, Shigeki Sagayama

Motivation and Approach
• Precise and robust F0 analysis
  • Analysis of complex and varied acoustical scenes
  • For speech, applications in speech recognition, prosody analysis, speech enhancement, speaker identification…
• Desirable features of a new pitch determination algorithm (PDA)
  • Performance should stay high in a wide range of background noises (white noise, pink noise, noise bursts, music, other speech)
  • Simultaneous extraction of the pitch contours of several concurrent voices should be possible
  • Overall speech model: a spectro-temporal model with constraints
• Several existing multi-pitch tracking algorithms perform an initial frame-by-frame analysis, then post-processing to reduce errors and obtain a smooth pitch contour (for example using HMMs)
• We propose to perform estimation and model-based interpolation simultaneously:
  • Parametric model of the voiced parts of the power spectrum of speech
  • Introduction of a noise model to extract harmonically structured “islands” within a “sea” of unstructured noise

Overview of the method
• Express the whole pitch contour as a smooth curve → cubic spline
• Distribute audio objects with different acoustical properties
• Express the harmonic structure as a parametric function: GMM
• Express the power envelope in the time direction as a parametric function: GMM
• Characteristic:
  • Through the harmonicity assumption, the method models the voiced parts of speech
[Figure: spectro-temporal model; axes: time, log-frequency; label: simultaneous optimization of the parameters]
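The two parametric pieces listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the evenly spaced temporal Gaussians, the shared widths `sigma` and `phi`, and the sinusoidal stand-in for the spline pitch contour are all hypothetical choices made for the example. The one property taken from the log-frequency formulation is that the n-th harmonic lies at log F0 + log n.

```python
import numpy as np

def harmonic_gmm(x, log_f0, weights, sigma):
    """Gaussian mixture over log-frequency x for a single frame.

    Harmonic n (n = 1..N) is centred at log_f0 + log(n); `weights` holds
    the relative power of each harmonic (assumed parameterisation).
    """
    n = np.arange(1, len(weights) + 1)
    centres = log_f0 + np.log(n)                      # shape (N,)
    diff = x[:, None] - centres[None, :]              # shape (len(x), N)
    gauss = np.exp(-0.5 * (diff / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return gauss @ weights                            # shape (len(x),)

def temporal_gmm(t, onset, spacing, weights, phi):
    """Power envelope in time as a sum of Gaussians.

    The Gaussians are placed at onset + k * spacing (an assumption made
    here to keep the sketch short).
    """
    centres = onset + spacing * np.arange(len(weights))
    diff = t[:, None] - centres[None, :]
    gauss = np.exp(-0.5 * (diff / phi) ** 2) / (np.sqrt(2 * np.pi) * phi)
    return gauss @ weights                            # shape (len(t),)

def model_spectrogram(x, log_f0_contour, harm_weights, sigma, envelope):
    """Model power S[i, j] = envelope[j] * harmonic_gmm(x; log_f0_contour[j])[i]."""
    frames = [harmonic_gmm(x, lf0, harm_weights, sigma) for lf0 in log_f0_contour]
    return np.stack(frames, axis=1) * envelope[None, :]

# Example: one source on a 50 Hz to 8 kHz log-frequency axis over 1.3 s.
x = np.linspace(np.log(50.0), np.log(8000.0), 400)
t = np.linspace(0.0, 1.3, 130)
# Smooth stand-in for the pitch contour (the slides use a cubic spline).
log_f0 = np.log(200.0) + 0.1 * np.sin(2 * np.pi * t / 1.3)
env = temporal_gmm(t, onset=0.2, spacing=0.15, weights=np.full(6, 1.0 / 6.0), phi=0.1)
S = model_spectrogram(x, log_f0, np.array([1.0, 0.5, 0.25]), 0.02, env)
```

In a full system the contour, harmonic weights, and envelope weights would be fitted jointly to the observed power spectrum; the sketch only evaluates the model for fixed parameters.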

F0 estimation in noisy environments
• Speech mixed with broadband background noise:
• Voiced speech with several types of interference:
Accuracy (%) of the F0 estimation:

Multi-pitch estimation
• Co-channel speech of two speakers speaking simultaneously with equal average power
• Test data
  • Bagshaw database, 150 mixtures
  • 16 kHz, monaural signal
• Results
[Figure: spectrograms; frequency axis 50 Hz to 8 kHz, time axis 0 s to 1.3 s; utterances "a-o-i" and "o-i-o-o-u"; annotation: "No second sound here"]
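A co-channel mixture with equal average power can be built as in the sketch below. It is a hypothetical helper, not the procedure used for the reported test data: the function name is invented, and truncating both signals to their common length and rescaling only the second speaker are assumptions made for illustration.

```python
import numpy as np

def mix_equal_power(a, b):
    """Mix two mono signals so that both contribute the same average power.

    Both inputs are truncated to their common length (an assumption), and
    the second signal is rescaled to match the mean power of the first.
    Returns the mixture and the rescaled second signal.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    power_a = np.mean(a ** 2)
    power_b = np.mean(b ** 2)
    if power_a == 0.0 or power_b == 0.0:
        raise ValueError("both signals must have non-zero power")
    b_scaled = b * np.sqrt(power_a / power_b)
    return a + b_scaled, b_scaled

# Example with synthetic signals standing in for two 16 kHz utterances.
rng = np.random.default_rng(0)
speaker_1 = rng.standard_normal(16000)
speaker_2 = 0.1 * rng.standard_normal(20000)
mixture, speaker_2_scaled = mix_equal_power(speaker_1, speaker_2)
```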
