
Speech and Music Discrimination using Gaussian Mixture Model
Seminar Program
Project Team
Dr. Deep Sen (Supervisor)
CHOI Arthur, Tsz Kin (3015809)
CHENG Derek, Ka Chun (3015631)

Speech and Music Discrimination using GMM

Motivations
• Much research uses HMMs; comparatively little uses GMMs
• GMMs reduce complexity compared with HMMs
• Our feature extraction methods will reduce complexity
• Multimedia file search and storage are still under development
• Fits the University requirement

Applications
• Audio Database Indexing
• Automatic Bandwidth Allocation
• Broadcast Browsing
• Intelligent Signal Processing
• Intelligent Audio Coding
• Audio File Compression
• Audio Clip Editing

Approaches
Deterministic signals can be analysed as completely specified functions of time.
Non-deterministic signals must be analysed probabilistically. [Tele3013 notes]

Procedures
• Read a signal
• Segment it into small frames
• Extract features from each frame
• Classify each frame
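The four steps above can be sketched in a few lines. This is only an illustrative outline in Python (the project itself mentions Matlab): the function names `split_into_frames` and `classify_frames` are hypothetical, the frames are assumed to be non-overlapping, and the feature extractor and classifier are passed in as placeholders.

```python
def split_into_frames(samples, sample_rate, frame_ms=20):
    """Split a list of samples into consecutive non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Drop the trailing partial frame so that every frame has equal length.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def classify_frames(frames, extract_features, classify):
    """Apply a feature extractor, then a classifier, to each frame in turn."""
    return [classify(extract_features(frame)) for frame in frames]


# Usage: one second of a dummy signal at 8 kHz gives 50 frames of 160 samples.
signal = [0.0] * 8000
frames = split_into_frames(signal, sample_rate=8000)
print(len(frames), len(frames[0]))
```

Keeping the extractor and classifier as arguments lets the same loop be reused for any of the features discussed in the following slides.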

Feature Extraction

Classification

[Figure: signal segments labelled music, speech, silence, speech]

Segmentation
• Reasons
  • Get a better estimation result
  • Achieve real-time behavior
• Problems and solutions
  • Frames too big: classification accuracy decreases
  • Frames too small: feature extraction accuracy decreases
  • Chosen frame size: ~20 ms
[Figure: Music Signal]

4 Hz modulation energy
• Speech energy has a characteristic energy modulation peak around the 4 Hz syllabic rate. [Houtgast & Steeneken 1985]
• Reasons
  • Accurately separates speech and music signals (~94%)
  • Easy to implement in Matlab
  • Novel and robust
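One way to picture this feature is to measure how much of the variation in a signal's energy envelope sits at 4 Hz. The sketch below is a deliberate simplification and not the project's method: it assumes a single full-band energy envelope sampled once per 20 ms frame and evaluates one DFT bin at 4 Hz, whereas practical systems usually band-pass filter sub-band energies. The function names are hypothetical.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each consecutive non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def modulation_energy_ratio(samples, sample_rate, mod_hz=4.0, frame_ms=20):
    """Fraction of the energy envelope's variation that sits at mod_hz."""
    frame_len = int(sample_rate * frame_ms / 1000)
    env = frame_energies(samples, frame_len)
    env_rate = sample_rate / frame_len           # envelope samples per second
    mean = sum(env) / len(env)
    env = [e - mean for e in env]                # remove the constant (DC) part
    total = sum(e * e for e in env)
    if total == 0.0:
        return 0.0                               # perfectly flat envelope
    # Correlate the envelope with a sinusoid at mod_hz (a single DFT bin).
    re = sum(e * math.cos(2 * math.pi * mod_hz * n / env_rate) for n, e in enumerate(env))
    im = sum(e * math.sin(2 * math.pi * mod_hz * n / env_rate) for n, e in enumerate(env))
    # A pure sinusoid at mod_hz over whole periods gives a ratio close to 1.
    return 2.0 * (re * re + im * im) / (len(env) * total)
```

A tone whose amplitude is modulated at 4 Hz should score close to 1 with this measure, while the same tone modulated at 10 Hz should score close to 0.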

[Figure: Music Signal and Speech Signal]

[Figure: Energy vs. Time for Music Signal and Speech Signal]

Zero-Crossing Count (ZCC)
• The zero-crossing count is the total number of times that a signal crosses the time axis (zero amplitude) within a given interval.
• Speech signals: high ZCC
• Music signals: low ZCC
• Reasons
  • The ZCC of a speech signal is significantly higher
  • Very easy to implement in Matlab
  • Mature and robust
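The definition above translates almost directly into code. The following is a minimal sketch in Python rather than Matlab; the function name is hypothetical, and treating a sample that is exactly zero as positive is an assumption made here to keep the rule simple.

```python
def zero_crossing_count(frame):
    """Count sign changes between consecutive samples of a frame."""
    count = 0
    for prev, curr in zip(frame, frame[1:]):
        # A crossing occurs when two neighbouring samples have opposite signs
        # (a sample equal to zero is treated as positive).
        if (prev >= 0) != (curr >= 0):
            count += 1
    return count
```

Applied per 20 ms frame, the count can be used directly as one element of the feature vector.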


Spectral Roll-off Point
• The spectral roll-off point measures the “skewness” of the spectrum.
• Reasons
  • Music usually has more energy in the high-frequency range
  • Useful for separating different kinds of speech later

Spectral Roll-off Point
Spectral Roll-off Point = SR where,

[Figure: power vs. frequency for Music Signal and Speech Signal]
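The defining equation for SR is not recoverable from the slide text, so the sketch below falls back on a commonly used definition: the frequency below which a fixed fraction of the spectral power lies. The 0.95 default, the function names, and the direct O(N²) DFT are all assumptions made for illustration, not the project's stated method.

```python
import cmath


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 via a direct O(N^2) DFT; fine for short frames."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_rolloff(frame, sample_rate, fraction=0.95):
    """Frequency in Hz below which `fraction` of the spectral power lies."""
    power = power_spectrum(frame)
    threshold = fraction * sum(power)
    cumulative = 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= threshold:
            # Convert the bin index to a frequency in Hz.
            return k * sample_rate / len(frame)
    return (len(power) - 1) * sample_rate / len(frame)
```

Under this definition, a frame with more high-frequency energy yields a higher roll-off frequency, which matches the observation on the slide about music.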

Entropy Modulation
• Music appears to be more “ordered” than speech [J. Pinquier, J.L. Rouas, R. André-Obrecht 2002]
• Higher entropy means less “ordered”
• Higher dynamism means a higher rate of change
• Reasons
  • Accurately separates speech and music signals (~90%)
  • Novel and robust

[Figure: Music Signal and Speech Signal]

[J. Ajmera, I.A. McCowan, H. Bourlard 2002]

[Figure labels: instantaneous entropy; average entropy; average instantaneous entropy]

Pulse Metric
The beat of a piece of music is one of the clearest features of the music. [K.D. Martin, E.D. Scheirer, B.L. Vercoe 1988]

Other Features
• Spectral Centroid
• Spectral Flux
• Silence Ratio
• Short-Time Energy Ratio
• Volume Dynamic Change
• Number of Segments
• Segment Duration
• … etc.

Introduction to Gaussian Mixture Model (GMM)
• Differentiates speech and music from a sound source
• Used for speech processing, mostly for speech recognition, speaker identification and voice conversion
• Models densities and represents general spectral features

Why did we choose GMM?
• Low complexity
• Rate independence
• Bit scalability
• Short computation time

What is a Gaussian Mixture Model?
• A Gaussian Mixture Model consists of a set of local Gaussian modes and an integrating network. Different Gaussian distributions represent different domains of the feature space and have different output characteristics
• A GMM tries to describe a complex system using a combination of all the Gaussian clusters, instead of using a single model

Gaussian mixtures or clusters
• Used to describe a complex system instead of using a single model
• Represent a dataset by a set of means and covariances

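Gaussian Mixture Model
A Gaussian Mixture Model with $M$ components is represented by:
$$p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(\vec{x})$$
• $\vec{x}$ is the P-dimensional input vector
• $w_i$ are the mixture weights
• $b_i(\vec{x})$ are the component densities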

Clustering
• ‘Clustering’ is a technique from pattern classification
• A technique to group samples
• Each P-dimensional feature vector is considered as a point in space, and all points ‘near’ one another are clustered together

Clustering
• The grey circle represents the variance of the distribution

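Gaussian component density
Each component density is a P-variate Gaussian function of the form:
$$b_i(\vec{x}) = \frac{1}{(2\pi)^{P/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)^{T} \, \Sigma_i^{-1} \, (\vec{x} - \vec{\mu}_i) \right\}$$
• $\vec{\mu}_i$ is the mean vector
• $\Sigma_i$ is the covariance matrix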
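To make the mixture density concrete, the sketch below evaluates a weighted sum of Gaussian component densities at a single point. It is an illustration only: it assumes diagonal covariance matrices (so each component is described by a list of per-dimension variances), and the function names are hypothetical.

```python
import math


def gaussian_density(x, mean, variances):
    """P-variate Gaussian density with a diagonal covariance matrix."""
    p = len(x)
    # For a diagonal covariance the determinant is the product of the variances.
    det = 1.0
    for v in variances:
        det *= v
    # The quadratic form reduces to a sum of scaled squared differences.
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** p * det)


def gmm_density(x, weights, means, variances):
    """Weighted sum of the component densities evaluated at x."""
    return sum(
        w * gaussian_density(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Usage: a two-component mixture in two dimensions.
value = gmm_density(
    [0.0, 0.0],
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [3.0, 3.0]],
    variances=[[1.0, 1.0], [2.0, 0.5]],
)
```

With full covariance matrices the determinant and the inverse would have to be computed explicitly, which is part of the extra cost that full-covariance models carry.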

Covariance matrix
• Indicates the dispersion of the distribution
• In mathematics, it is defined as the matrix whose ij-th element $\sigma_{ij}$ is the covariance of $x_i$ and $x_j$, for $i, j = 1, \dots, d$

Covariance matrix
• The diagonal components of the covariance matrix are the variances of the individual random variables
• Off-diagonal components are the covariances of two random variables, $x_i$ and $x_j$
• Symmetric matrix
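These properties are easy to check numerically. The sketch below estimates a covariance matrix from a small set of feature vectors; the function name is hypothetical, and dividing by n − 1 (the unbiased sample estimate) is a choice made here, not something the slides specify.

```python
def covariance_matrix(vectors):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n = len(vectors)
    d = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # Element (i, j) is the covariance of variables i and j.
            cov[i][j] = sum(
                (v[i] - means[i]) * (v[j] - means[j]) for v in vectors
            ) / (n - 1)
    return cov


# Usage: the second variable is exactly twice the first.
cov = covariance_matrix([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(cov)
```

In the printed matrix, the diagonal holds the two variances and the two off-diagonal entries are equal, as the symmetry property requires.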

Full covariance matrix
• The most powerful Gaussian model, as it fits the data best
• Drawbacks
  • Needs a lot of data to estimate the parameters
  • Costly in high-dimensional feature spaces

Diagonal covariance matrix
• Good compromise between quality and model size
• Gaussian components can act together to model the overall probability density function
• Capable of modelling the correlations between feature vector elements

Review of the Gaussian mixture density
• The mixture weights must satisfy the conditions $\sum_{i=1}^{M} w_i = 1$ and $w_i \ge 0$, where $M$ is the number of components
• Three sets of parameters compose the Gaussian mixture density: mean vectors, covariance matrices and mixture weights

Expectation-maximization (EM)
• Estimates the mean vectors, covariance matrices and mixture weights
• Iteratively updates the distribution of each Gaussian component and the conditional probabilities
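The two bullets above correspond to the E-step (conditional probabilities) and the M-step (parameter updates). The following is a minimal sketch under simplifying assumptions: the data are one-dimensional, the number of iterations is fixed rather than tested for convergence, and a small variance floor is added as a safeguard. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D GMM.

    Returns the updated parameters and the log-likelihood of the data
    under the parameters that were passed in.
    """
    k = len(weights)
    # E-step: conditional probability of each component given each sample.
    resp = []
    log_likelihood = 0.0
    for x in data:
        joint = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
        total = sum(joint)
        log_likelihood += math.log(total)
        resp.append([j / total for j in joint])
    # M-step: re-estimate weights, means and variances from those probabilities.
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        n_i = sum(r[i] for r in resp)
        mean = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var = sum(r[i] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # variance floor (an added safeguard)
    return new_w, new_m, new_v, log_likelihood


def fit_gmm(data, weights, means, variances, iterations=50):
    """Run a fixed number of EM iterations and record the log-likelihoods."""
    history = []
    for _ in range(iterations):
        weights, means, variances, ll = em_step(data, weights, means, variances)
        history.append(ll)
    return weights, means, variances, history
```

Calling `fit_gmm` on data drawn from two well-separated groups, with rough initial means, should move the component means towards the group centres while the recorded log-likelihood does not decrease from one iteration to the next.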

Idea of Expectation-maximization
Instead of starting with a random configuration of all components and improving upon this configuration with expectation-maximization, we start with the optimal one-component mixture. We then repeat two steps until convergence:
• Insert a new component
• Apply EM until convergence

Convergence Theorem
The sequence of likelihoods is monotonically increasing and bounded, so the likelihood converges to a local maximum

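EM algorithm
Let $L_k$ denote the log-likelihood of the dataset under the k-component mixture $f_k$
1. Compute the optimal one-component mixture $f_1$. Set $k = 1$
2. Find the optimal new component $\phi(x; \theta^*)$ and corresponding mixture weight $\alpha^*$ while keeping $f_k$ fixed

EM algorithm
3. Set $f_{k+1}(x) = (1 - \alpha^*) f_k(x) + \alpha^* \phi(x; \theta^*)$ and $k = k + 1$
4. Update $f_k$ using EM until convergence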

Speech/music discrimination using GMM
An interesting feature of GMMs is that the component densities of a mixture may represent:
• Different phonetic events when modelling speech
• Different portions of the sound when modelling the spectra of sound from a musical instrument

Achievements
• Identified an optimized frame size
• Obtained robust features
• Performed a few tests
• Implemented some Matlab code
• Studied Gaussian Mixture Models (GMMs) and some of their mathematical expressions

Plans for next year
• Comprehensive and more in-depth research on GMMs
• Model the sound source based on GMMs
• Evaluate the effect of noise
• Matlab implementation for speech/music separation

Plans for next year
• Investigate a novel classification method: Support Vector Machine (SVM)
• Differentiate male and female speech
• Differentiate classical and non-classical music
• Produce a final thesis report
