Speech and Music Discrimination using Gaussian Mixture Model
Seminar Program
Project Team
• Dr. Deep Sen (Supervisor)
• CHOI Arthur, Tsz Kin (3015809)
• CHENG Derek, Ka Chun (3015631)
Motivations
• Much research uses HMMs; comparatively little uses GMMs
• GMMs have lower complexity than HMMs
• Our feature extraction methods will reduce complexity
• Multimedia file search and storage are still under development
• Meets university project requirements
Applications
• Audio Database Indexing
• Automatic Bandwidth Allocation
• Broadcast Browsing
• Intelligent Signal Processing
• Intelligent Audio Coding
• Audio File Compression
• Audio Clip Editing
Approaches
• Deterministic signals can be analysed as completely specified functions of time
• Non-deterministic (random) signals must be analysed probabilistically [Tele3013 notes]
Procedures
• Read a signal
• Segment it into short frames
• Extract features from each frame
• Classify each frame
Feature Extraction
Classification
[Figure: example signal labelled by segment: music, speech, silence, speech]
Segmentation
• Reasons
  • Better estimation results
  • Real-time behaviour
• Problems and solutions
  • Frames too large: classification accuracy decreases
  • Frames too small: feature extraction accuracy decreases
  • Chosen frame size: ~20 ms
[Figure: Music Signal]
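The segmentation step can be sketched in a few lines. This is a minimal NumPy sketch, not the project's Matlab code; the helper name `frame_signal`, the 16 kHz sampling rate and the default of non-overlapping frames are assumptions.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=None):
    """Split a 1-D signal into frames of frame_ms milliseconds.

    hop_ms defaults to frame_ms (non-overlapping frames). Trailing samples
    that do not fill a whole frame are dropped.
    """
    x = np.asarray(x)
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = frame_len if hop_ms is None else int(round(fs * hop_ms / 1000.0))
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # row r holds samples r*hop .. r*hop + frame_len - 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

fs = 16000                                          # assumed sampling rate
x = np.random.default_rng(0).standard_normal(fs)    # 1 s of placeholder audio
frames = frame_signal(x, fs)                        # 20 ms = 320 samples
print(frames.shape)                                 # (50, 320)
```

Passing `hop_ms=10` gives 50% overlapping frames, a common alternative when smoother frame-to-frame features are wanted.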
4 Hz modulation energy
• Speech energy has a characteristic modulation peak around the 4 Hz syllabic rate [Houtgast & Steeneken 1985]
• Reasons
  • Accurately separates speech signals and music signals (~94%)
  • Easy to implement in Matlab
  • Novel and robust
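A simplified single-band sketch of a 4 Hz modulation energy feature: build a short-time energy envelope, take its spectrum, and measure how much of the modulation lies near 4 Hz. The 10 ms energy hop, the 3 to 5 Hz band, the ratio-to-total normalisation and the helper name are assumptions for illustration; the cited work and the project may compute the feature differently (for example per frequency band).

```python
import numpy as np

def modulation_energy_4hz(x, fs, hop_ms=10.0, band=(3.0, 5.0)):
    """Fraction of the energy-envelope modulation that lies near 4 Hz.

    1. Short-time energy over non-overlapping hop_ms blocks gives an energy
       envelope sampled at 1000 / hop_ms Hz (100 Hz by default).
    2. The mean-removed envelope is transformed with an FFT.
    3. Power in `band` is divided by the total modulation power (DC excluded).
    """
    hop = int(round(fs * hop_ms / 1000.0))
    n = len(x) // hop
    env = (np.asarray(x[: n * hop], dtype=float).reshape(n, hop) ** 2).sum(axis=1)
    env = env - env.mean()
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(n, d=hop_ms / 1000.0)
    total = spec[1:].sum()
    if total == 0:
        return 0.0
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(spec[in_band].sum() / total)
```

A signal whose loudness rises and falls about four times per second, as syllables do in speech, gives a value close to 1; a signal modulated at, say, 10 Hz gives a value close to 0.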
[Figure panels: Music Signal, Speech Signal]
[Figure: energy vs. time; panels: Music Signal, Speech Signal]
Zero-Crossing Count (ZCC)
• The zero-crossing count is the number of times a signal crosses the time axis (changes sign) within a given interval
• Speech signals: high ZCC
• Music signals: low ZCC
• Reasons
  • The ZCC of a speech signal is significantly high
  • Very easy to implement in Matlab
  • Mature and robust
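Counting zero crossings in one frame is short in NumPy. Treating samples that are exactly zero as positive is an assumption (one of several possible conventions), and `zero_crossing_count` is a hypothetical helper name.

```python
import numpy as np

def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples in one frame.

    Samples that are exactly zero are treated as positive (assumed convention).
    """
    signs = np.where(np.asarray(frame) >= 0, 1, -1)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))
```

Dividing the count by the frame length gives a zero-crossing rate that does not depend on the frame size.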
Spectral Roll-off Point
• The spectral roll-off point measures the “skewness” of the spectrum
• Reasons
  • Music usually has more energy in the high-frequency range
  • Useful for separating different kinds of speech later
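Spectral Roll-off Point
The spectral roll-off point SR corresponds to the smallest frequency bin K that satisfies
$$\sum_{k=0}^{K} \lvert X[k] \rvert^2 \ge c \sum_{k=0}^{N/2} \lvert X[k] \rvert^2$$
where $X[k]$ is the DFT of an N-sample frame, c is a fixed fraction close to 1 (values such as 0.85 or 0.95 are common), and in Hz $SR = K f_s / N$ for sampling rate $f_s$.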
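A sketch of the spectral roll-off point for one frame, taken here as the smallest bin K whose cumulative power reaches a fraction c of the frame's total power, converted to Hz as K·fs/N. The default `c=0.95` and the helper name `spectral_rolloff` are assumptions; the slides do not state which threshold the project used.

```python
import numpy as np

def spectral_rolloff(frame, fs, c=0.95):
    """Roll-off frequency (Hz): smallest bin K whose cumulative power reaches c * total."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    cumulative = np.cumsum(power)
    if cumulative[-1] == 0:
        return 0.0                                   # silent frame
    # first index at which the cumulative power reaches the target
    K = int(np.searchsorted(cumulative, c * cumulative[-1]))
    return K * fs / len(frame)                       # bin index -> Hz
```

For a 20 ms frame at 16 kHz the bin spacing is 50 Hz, so the result is quantised to multiples of 50 Hz.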
[Figure: power vs. frequency; panels: Music Signal, Speech Signal]
Entropy Modulation
• Music appears to be more “ordered” than speech [J. Pinquier, J.L. Rouas, R. André-Obrecht 2002]
• Higher entropy means less “order” (more disorder)
• Higher dynamism means a higher rate of change
• Reasons
  • Accurately separates speech signals and music signals (~90%)
  • Novel and robust
[Figure panels: Music Signal, Speech Signal]
[J. Ajmera, I.A. McCowan, H. Bourlard 2002]
[Equations: instantaneous entropy, average entropy, average instantaneous entropy]
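A sketch of an entropy-based feature: the Shannon entropy of each frame's normalised power spectrum, and its variance across a segment as a simple measure of entropy modulation. Both definitions are assumptions for illustration, not necessarily those used in the cited papers; the helper names are hypothetical.

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (bits) of a frame's power spectrum normalised to sum to one."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total <= eps:
        return 0.0                        # treat silent frames as zero entropy
    p = power / total
    p = p[p > eps]                        # ignore (near-)empty bins: 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

def entropy_modulation(frames):
    """Variance of the per-frame entropy over a segment (assumed modulation measure)."""
    h = np.array([spectral_entropy(f) for f in frames])
    return float(h.var())
```

A pure tone concentrated in one bin gives an entropy near 0 bits, while white noise approaches the logarithm (base 2) of the number of bins.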
Pulse Metric
• The beat of a piece of music is one of its clearest features [K.D. Martin, E.D. Scheirer, B.L. Vercoe 1988]
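A simplified pulse-strength sketch: the autocorrelation of the frame-energy envelope is normalised, and its largest value at lags between 0.25 s and 2 s is taken as the strength of a regular beat. The lag range, the 10 ms hop and the helper name `pulse_strength` are assumptions; the cited work may define the pulse metric differently.

```python
import numpy as np

def pulse_strength(x, fs, hop_ms=10.0, min_period=0.25, max_period=2.0):
    """Peak normalised autocorrelation of the energy envelope at beat-like lags.

    Values near 1 indicate a strongly periodic envelope (a regular beat);
    values near 0 indicate little periodicity between min_period and max_period seconds.
    """
    hop = int(round(fs * hop_ms / 1000.0))
    n = len(x) // hop
    env = (np.asarray(x[: n * hop], dtype=float).reshape(n, hop) ** 2).sum(axis=1)
    env = env - env.mean()                                  # remove the DC level
    ac = np.correlate(env, env, mode="full")[n - 1:]        # lags 0 .. n-1
    if ac[0] <= 0:
        return 0.0                                          # constant envelope
    ac = ac / ac[0]
    lo = int(min_period * 1000.0 / hop_ms)
    hi = min(int(max_period * 1000.0 / hop_ms), n - 1)
    return float(ac[lo: hi + 1].max()) if lo <= hi else 0.0
```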
Other Features
• Spectral Centroid
• Spectral Flux
• Silence Ratio
• Short-Time Energy Ratio
• Volume Dynamic Change
• Number of Segments
• Segment Duration
• etc.
Introduction to Gaussian Mixture Model (GMM)
• Used to differentiate speech and music from a sound source
• Used in speech processing, mostly for speech recognition, speaker identification and voice conversion
• Models probability densities and represents general spectral features
Why did we choose GMM?
• Low complexity
• Rate independence
• Bit scalability
• Short computation time
What is a Gaussian Mixture Model?
• A Gaussian Mixture Model consists of a set of local Gaussian modes and an integrating network. Different Gaussian distributions represent different regions of the feature space and have different output characteristics
• A GMM describes a complex system using a combination of all the Gaussian clusters, instead of a single model
Gaussian mixtures or clusters
• Used to describe a complex system instead of a single model
• Represent a dataset by a set of means and covariances
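Gaussian Mixture Model
A Gaussian Mixture Model is represented by:
$$p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(\mathbf{x})$$
where $\mathbf{x}$ is the P-dimensional input vector, $w_i$ ($i = 1, \dots, M$) are the mixture weights, $b_i(\mathbf{x})$ are the component densities and $\lambda$ denotes the model parameters.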
Clustering
• ‘Clustering’ is a technique from pattern classification for grouping samples
• Each P-dimensional feature vector is considered a point in space, and points that are ‘near’ one another are clustered together
[Figure: clustering; each grey circle represents the variance of a distribution]
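Clustering can be illustrated with plain k-means, one common clustering method and a common way to initialise GMM means. Whether the project used k-means is not stated; `kmeans` here is a hypothetical helper.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: group P-dimensional points around k centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                     # assign to the nearest centroid
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```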
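Gaussian component density
Each component is a P-variate Gaussian function of the form:
$$b_i(\mathbf{x}) = \frac{1}{(2\pi)^{P/2} \lvert \Sigma_i \rvert^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\mathsf{T}} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)$$
where $\boldsymbol{\mu}_i$ is the mean vector and $\Sigma_i$ is the covariance matrix.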
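The component density and the mixture density p(x | λ) = Σ w_i b_i(x) can be evaluated directly. This is a minimal NumPy sketch with hypothetical helper names; in practice log-densities are preferred to avoid numerical underflow in high dimensions.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """P-variate Gaussian density b_i(x) with mean vector `mean` and covariance `cov`."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    P = mean.shape[0]
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)          # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (P / 2) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

def gmm_pdf(x, weights, means, covs):
    """Mixture density p(x | lambda) = sum_i w_i b_i(x)."""
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to one")
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))
```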
Covariance matrix
• Indicates the dispersion of a distribution
• Mathematically, it is the matrix whose (i, j)th element is the covariance of $x_i$ and $x_j$, $\operatorname{cov}(x_i, x_j)$, for $i, j = 1, \dots, P$
Covariance matrix
• The diagonal elements of the covariance matrix are the variances of the individual random variables
• The off-diagonal elements are the covariances of pairs of random variables $x_i$ and $x_j$
• It is a symmetric matrix, since $\operatorname{cov}(x_i, x_j) = \operatorname{cov}(x_j, x_i)$
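A sketch showing how the sample covariance matrix is built from feature vectors, with the properties from the covariance slides (symmetry, variances on the diagonal) easy to check. The placeholder data and the helper `covariance_matrix` are assumptions.

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance of the rows of X: element (i, j) is cov(x_i, x_j)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre every feature
    return Xc.T @ Xc / (len(X) - 1)         # unbiased (N - 1) normalisation

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # placeholder 3-D feature vectors
X[:, 2] += 0.8 * X[:, 0]                    # make features 0 and 2 correlated
S = covariance_matrix(X)
print(np.round(S, 2))
```

The (0, 2) entry is clearly non-zero because of the correlation introduced above, while the other off-diagonal entries stay close to zero.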
Full covariance matrix
• The most powerful Gaussian model, as it fits the data best
• Drawbacks
  • Needs a lot of data to estimate the parameters
  • Costly in high-dimensional feature spaces
Diagonal covariance matrix
• Good compromise between quality and model size
• The Gaussian components can act together to model the overall probability density function
• A mixture of diagonal Gaussians can still model correlations between the elements of the feature vector
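The model-size trade-off between full and diagonal covariances can be made concrete by counting free parameters: M − 1 mixture weights (they sum to one), M·P mean values, and either M·P(P+1)/2 (full, symmetric) or M·P (diagonal) covariance values. The sizes M = 16 and P = 13 in the example are hypothetical.

```python
def gmm_param_count(M, P, covariance="diag"):
    """Number of free parameters in an M-component GMM over P-dimensional features."""
    weights = M - 1                      # weights sum to one
    means = M * P
    if covariance == "full":
        covs = M * P * (P + 1) // 2      # symmetric: upper triangle including diagonal
    elif covariance == "diag":
        covs = M * P                     # one variance per feature
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    return weights + means + covs

# hypothetical sizes: 16 components, 13-dimensional feature vectors
print(gmm_param_count(16, 13, "full"), gmm_param_count(16, 13, "diag"))  # 1679 431
```

The full-covariance term grows quadratically with P, which is why full covariances need much more training data in high-dimensional feature spaces.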
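Review of the Gaussian mixture density
• The M mixture weights $w_i$ must satisfy $\sum_{i=1}^{M} w_i = 1$ and $w_i \ge 0$
• Three kinds of parameters compose the Gaussian mixture density: mean vectors, covariance matrices and mixture weights, written together as $\lambda = \{w_i, \boldsymbol{\mu}_i, \Sigma_i\}$, $i = 1, \dots, M$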
Expectation-maximization (EM)
• Estimates the mean vectors, covariance matrices and mixture weights
• Iteratively updates the parameters of each Gaussian component and the conditional (posterior) probability of each component given each sample
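A compact EM sketch for a GMM with diagonal covariances, working in the log domain for numerical stability. The initialisation (means at evenly spaced quantiles of the first feature, variances set to the global variance) and the small variance floor are assumptions chosen for reproducibility; random or k-means initialisation is also common. The recorded log-likelihoods illustrate the convergence behaviour: apart from rounding, they do not decrease from one iteration to the next.

```python
import numpy as np

def em_gmm_diag(X, M, n_iter=200, tol=1e-8):
    """Fit an M-component GMM with diagonal covariances by EM.

    Returns mixture weights (M,), means (M, P), variances (M, P) and the
    total log-likelihood recorded at the start of each iteration.
    """
    X = np.asarray(X, dtype=float)
    N, P = X.shape
    # assumed initialisation: means at evenly spaced quantiles of feature 0
    order = np.argsort(X[:, 0])
    mu = X[order[((np.arange(M) + 0.5) * N / M).astype(int)]].copy()
    var = np.tile(X.var(axis=0), (M, 1)) + 1e-6
    w = np.full(M, 1.0 / M)
    history = []
    for _ in range(n_iter):
        # E-step: log(w_i) + log b_i(x_n) for every sample n and component i
        log_wb = np.log(w) - 0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                                    + ((X[:, None, :] - mu) ** 2 / var).sum(axis=2))
        log_px = np.logaddexp.reduce(log_wb, axis=1)     # log p(x_n | lambda)
        gamma = np.exp(log_wb - log_px[:, None])         # posterior of component i
        history.append(float(log_px.sum()))
        # M-step: re-estimate weights, means and (floored) variances
        Nk = gamma.sum(axis=0) + 1e-12
        w = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        var = (gamma.T @ X ** 2) / Nk[:, None] - mu ** 2 + 1e-6
        if len(history) > 1 and history[-1] - history[-2] < tol * abs(history[-2]):
            break
    return w, mu, var, history
```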
Idea of Expectation-maximization
Instead of starting with a random configuration of all components and improving it with expectation-maximization, we start with the optimal one-component mixture and then repeat two steps until convergence:
• Insert a new component
• Apply EM until convergence
Convergence Theorem
The sequence of likelihoods is monotonically increasing and bounded above, so the likelihood converges, generally to a local (not necessarily global) maximum.
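EM algorithm
Let $L_k$ denote the log-likelihood of the dataset $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ under the k-component mixture $f_k$, i.e. $L_k = \sum_{n=1}^{N} \log f_k(\mathbf{x}_n)$.
1. Compute the optimal one-component mixture $f_1$. Set k = 1.
2. Find the optimal new component $\phi(\mathbf{x}; \theta^*)$ and corresponding mixture weight $\alpha^*$ while keeping $f_k$ fixed:
$$\{\theta^*, \alpha^*\} = \arg\max_{\theta,\, \alpha} \sum_{n=1}^{N} \log\left[ (1 - \alpha) f_k(\mathbf{x}_n) + \alpha\, \phi(\mathbf{x}_n; \theta) \right]$$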
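EM algorithm (continued)
3. Set $f_{k+1}(\mathbf{x}) = (1 - \alpha^*) f_k(\mathbf{x}) + \alpha^* \phi(\mathbf{x}; \theta^*)$, i.e. add the new component $\phi(\mathbf{x}; \theta^*)$ with weight $\alpha^*$ to the current mixture $f_k$, and set k = k + 1.
4. Update $f_k$ with EM until convergence.
5. If a stopping criterion is met, stop; otherwise go to step 2.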
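A simplified 1-D sketch of the component-insertion scheme in the EM algorithm steps (usually called greedy EM). Step 2 is approximated by a grid search over candidate means taken from the data and a few weights α, with a fixed candidate variance; these choices and all helper names are assumptions, not the exact search of the published algorithm.

```python
import numpy as np

def log_gauss(x, m, v):
    # log of a 1-D Gaussian density with mean m and variance v
    return -0.5 * (np.log(2 * np.pi * v) + (x - m) ** 2 / v)

def mixture_loglik(x, w, m, v):
    # total log-likelihood L_k of the data under the mixture (w, m, v)
    return float(np.logaddexp.reduce(np.log(w) + log_gauss(x[:, None], m, v), axis=1).sum())

def em(x, w, m, v, n_iter=200, tol=1e-8):
    # step 4: ordinary EM on all components until the log-likelihood stalls
    prev = -np.inf
    for _ in range(n_iter):
        log_wb = np.log(w) + log_gauss(x[:, None], m, v)
        log_px = np.logaddexp.reduce(log_wb, axis=1)
        g = np.exp(log_wb - log_px[:, None])
        nk = g.sum(axis=0) + 1e-12
        w = nk / len(x)
        m = (g * x[:, None]).sum(axis=0) / nk
        v = (g * (x[:, None] - m) ** 2).sum(axis=0) / nk + 1e-6
        ll = log_px.sum()
        if ll - prev < tol * abs(ll):
            break
        prev = ll
    return w, m, v

def greedy_em(x, k_max, alphas=(0.1, 0.2, 0.3, 0.4, 0.5)):
    x = np.asarray(x, dtype=float)
    # step 1: the optimal one-component mixture is the sample mean and variance
    w, m, v = np.array([1.0]), np.array([x.mean()]), np.array([x.var()])
    lls = [mixture_loglik(x, w, m, v)]
    while len(w) < k_max:
        # step 2 (approximate): keep the current mixture fixed and search a grid
        # of candidate means (data points) and weights alpha for the new component
        width = x.var() / (len(w) + 1) ** 2      # assumed candidate variance
        best = None
        for c in x[:: max(1, len(x) // 50)]:
            for a in alphas:
                cand = (np.append((1 - a) * w, a), np.append(m, c), np.append(v, width))
                ll = mixture_loglik(x, *cand)
                if best is None or ll > best[0]:
                    best = (ll, cand)
        # step 3: accept the best insertion (k = k + 1)
        w, m, v = best[1]
        # step 4: refine all components with EM
        w, m, v = em(x, w, m, v)
        lls.append(mixture_loglik(x, w, m, v))
    return w, m, v, lls
```

The list `lls` holds the log-likelihood after each insertion, so its growth shows how much each added component helps.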
Speech/music discrimination using GMM
An interesting feature of GMMs is that the component densities of the mixture may represent:
• Different phonetic events, when modelling speech
• Different portions of the sound, when modelling the spectra of sounds from musical instruments
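A sketch of speech/music discrimination with GMMs: fit one diagonal-covariance GMM per class on labelled feature vectors, then label each frame by the class whose model gives the higher log-likelihood. The maximum-likelihood decision rule, the synthetic 2-D features and all helper names are assumptions; the slides do not specify the project's feature set or decision rule.

```python
import numpy as np

def _log_wb(X, w, mu, var):
    # log(w_i) + log b_i(x_n) for a diagonal-covariance GMM, shape (N, M)
    return np.log(w) - 0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                              + ((X[:, None, :] - mu) ** 2 / var).sum(axis=2))

def gmm_loglik(X, w, mu, var):
    """Per-frame log-likelihood log p(x_n | lambda)."""
    return np.logaddexp.reduce(_log_wb(X, w, mu, var), axis=1)

def fit_gmm_diag(X, M, n_iter=100, seed=0):
    """Compact EM fit (fixed number of iterations, random initial means)."""
    X = np.asarray(X, dtype=float)
    N, P = X.shape
    rng = np.random.default_rng(seed)
    w = np.full(M, 1.0 / M)
    mu = X[rng.choice(N, size=M, replace=False)].copy()
    var = np.tile(X.var(axis=0), (M, 1)) + 1e-6
    for _ in range(n_iter):
        log_wb = _log_wb(X, w, mu, var)
        gamma = np.exp(log_wb - np.logaddexp.reduce(log_wb, axis=1)[:, None])
        Nk = gamma.sum(axis=0) + 1e-12
        w = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        var = (gamma.T @ X ** 2) / Nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

def classify(X, speech_model, music_model):
    """Label each feature vector by the class whose GMM gives the higher log-likelihood."""
    ls = gmm_loglik(X, *speech_model)
    lm = gmm_loglik(X, *music_model)
    return np.where(ls > lm, "speech", "music")

rng = np.random.default_rng(3)
# hypothetical 2-D feature vectors (e.g. normalised ZCC and roll-off) for each class
speech_model = fit_gmm_diag(rng.normal([2, 2], 1, (400, 2)), M=2)
music_model = fit_gmm_diag(rng.normal([-2, -2], 1, (400, 2)), M=2)
labels = classify(np.array([[2.0, 1.5], [-1.8, -2.2]]), speech_model, music_model)
```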
Achievements
• Identified an optimised frame size
• Obtained robust features
• Performed a few tests
• Implemented some Matlab code
• Studied Gaussian Mixture Models (GMMs) and some of their mathematical expressions
Plans for next year
• Comprehensive, more in-depth research on GMMs
• Model the sound source based on GMMs
• Evaluate the effect of noise
• Matlab implementation of speech/music separation
Plans for next year
• Investigate a novel classification method: Support Vector Machine (SVM)
• Differentiate male and female speech
• Differentiate classical and non-classical music
• Write a final thesis report