Speech and Music Discrimination using Gaussian Mixture Model
Seminar Program
Project Team: Dr. Deep Sen (Supervisor); CHOI Arthur, Tsz Kin (3015809); CHENG Derek, Ka Chun (3015631)

Presentation Transcript


  1. Speech and Music Discrimination using Gaussian Mixture Model
     Seminar Program
     Project Team: Dr. Deep Sen (Supervisor); CHOI Arthur, Tsz Kin (3015809); CHENG Derek, Ka Chun (3015631)

  3. Motivations
     • Much research has been done on HMMs, comparatively little on GMMs
     • GMMs reduce complexity compared with HMMs
     • Our feature extraction methods will reduce complexity
     • Search and storage of multimedia files are still under development
     • Fits the University requirement

  5. Applications
     • Audio Database Indexing
     • Automatic Bandwidth Allocation
     • Broadcast Browsing
     • Intelligent Signal Processing
     • Intelligent Audio Coding
     • Audio File Compression
     • Audio Clip Editing

  6. Approaches
     • Deterministic: signals can be analysed as completely specified functions of time
     • Non-deterministic: signals must be analysed probabilistically [Tele3013 notes]

  7. Procedures
     • Read a signal
     • Segment it into small frames
     • Extract features from each frame
     • Classify each frame

  8. Feature Extraction

  9. Classification

  10. [Figure: a signal with segments labelled music, speech, silence, speech]

  11. Segmentation
     • Reasons
       - Get a better estimation result
       - Achieve real-time behaviour
     • Problems and solutions
       - Frames too big: classification accuracy decreases
       - Frames too small: feature extraction accuracy decreases
       - Chosen frame size: ~20 ms
     [Figure: music signal]
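The segmentation step can be illustrated with a minimal Python sketch (the project itself mentions Matlab; Python is used here only for illustration). The function name `frame_signal`, the non-overlapping framing, and the 16 kHz synthetic tone are assumptions, not details from the slides; only the ~20 ms frame size comes from the deck.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=20.0):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds.

    Hypothetical helper: the slides do not say whether frames overlap.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len  # an incomplete final frame is dropped
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

# One second of a 440 Hz tone at 16 kHz, a synthetic stand-in for real audio.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
frames = frame_signal(np.sin(2 * np.pi * 440 * t), sample_rate)
print(frames.shape)  # (50, 320)
```

Each row of `frames` is one 20 ms frame, ready for per-frame feature extraction.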

  13. 4 Hz modulation energy
     • Speech has a characteristic energy modulation peak around the 4 Hz syllabic rate. [Houtgast & Steeneken 1985]
     • Reasons
       - Accurately separates speech signals from music signals (~94%)
       - Easy to implement in Matlab
       - Novel and robust
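A rough Python sketch of the idea follows. It is a simplification under stated assumptions, not the method used in the project: the short-time energy of 20 ms frames serves as the envelope, and the feature is the share of envelope-spectrum energy in an assumed 3 to 5 Hz band. The function name `modulation_energy_4hz`, the band limits, and the synthetic test signals are all hypothetical.

```python
import numpy as np

def modulation_energy_4hz(signal, sample_rate, frame_ms=20.0, band=(3.0, 5.0)):
    """Fraction of energy-envelope modulation lying in a band around 4 Hz."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Short-time energy envelope, one value per frame (50 values/s for 20 ms frames).
    envelope = np.sum(frames ** 2, axis=1)
    envelope = envelope - envelope.mean()  # remove the DC component
    spectrum = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(n_frames, d=frame_ms / 1000.0)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = spectrum.sum()
    return float(spectrum[in_band].sum() / total) if total > 0 else 0.0

# Synthetic stand-ins: a tone amplitude-modulated at 4 Hz, and white noise.
sr = 8000
t = np.arange(2 * sr) / sr
modulated = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(0).standard_normal(2 * sr)
print(modulation_energy_4hz(modulated, sr), modulation_energy_4hz(noise, sr))
```

The modulated tone should score much higher than the noise, which mimics the speech-versus-music contrast the slide describes.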

  14. [Figure: music signal and speech signal]

  16. [Figure: energy vs. time for a music signal and a speech signal]

  17. Zero-Crossing Count (ZCC)
     • The zero-crossing count is the total number of times that a signal crosses the x-axis over a given time interval.
     • Speech signals: high ZCC
     • Music signals: low ZCC
     • Reasons
       - The ZCC of a speech signal is significantly higher
       - Very easy to implement in Matlab
       - Mature and robust
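The definition above can be sketched in a few lines of Python. The function name `zero_crossing_count` and the choice to treat exact zeros as positive are assumptions made for illustration.

```python
import numpy as np

def zero_crossing_count(frame):
    """Count sign changes between consecutive samples of one frame."""
    signs = np.sign(np.asarray(frame, dtype=float))
    signs[signs == 0] = 1  # assumption: an exact zero counts as positive
    return int(np.sum(signs[1:] != signs[:-1]))

# Two 20 ms frames at 8 kHz: a 100 Hz tone and a 1000 Hz tone.
sr = 8000
t = np.arange(160) / sr
zcc_low = zero_crossing_count(np.sin(2 * np.pi * 100 * t))
zcc_high = zero_crossing_count(np.sin(2 * np.pi * 1000 * t))
print(zcc_low, zcc_high)
```

The higher-frequency tone crosses zero far more often within the same frame, which is the property the feature exploits.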

  20. Spectral Roll-off Point
     • The spectral roll-off point measures the “skewness” of the spectrum.
     • Reasons
       - Music usually has more energy in the high-frequency range
       - Useful for separating different kinds of speech later

  21. Spectral Roll-off Point
     Spectral Roll-off Point = SR, where [definition not preserved in the transcript]
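Since the slide's formula did not survive, the sketch below uses one common definition as an assumption: the roll-off point is the frequency below which a fixed fraction (here 95%) of the frame's spectral power lies. The function name `spectral_rolloff` and the 95% threshold are not taken from the slides.

```python
import numpy as np

def spectral_rolloff(frame, sample_rate, fraction=0.95):
    """Frequency (Hz) below which `fraction` of the frame's spectral power lies."""
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    cumulative = np.cumsum(power)
    if cumulative[-1] == 0:
        return 0.0  # silent frame: no meaningful roll-off
    # First bin at which the cumulative power reaches the threshold.
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[idx])

# 20 ms frames at 8 kHz: a single tone, and a mix of a low and a high tone.
sr = 8000
t = np.arange(160) / sr
tone = np.sin(2 * np.pi * 1000 * t)
mix = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3000 * t)
print(spectral_rolloff(tone, sr), spectral_rolloff(mix, sr))
```

A frame with more high-frequency energy yields a higher roll-off point.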

  22. [Figure: power vs. frequency for a music signal and a speech signal]

  23. Entropy Modulation
     • Music appears to be more “ordered” than a speech signal [J. Pinquier, J.L. Rouas, R. André-Obrecht 2002]
     • Higher entropy means a less “ordered” signal
     • Higher dynamism means a higher rate of change
     • Reasons
       - Accurately separates speech signals from music signals (~90%)
       - Novel and robust

  24. [Figure: music signal and speech signal]

  25. [J. Ajmera, I.A. McCowan, H. Bourlard 2002]

  26. [Figure labels: instantaneous entropy, average entropy]
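The slides' equations for entropy and dynamism are not preserved. As a hedged illustration, the Python sketch below assumes the usual formulation in which both are computed from per-frame class posterior probabilities: instantaneous entropy is the Shannon entropy of one frame's posteriors, average entropy is its mean over a window, and dynamism is the mean squared change between consecutive frames. The `posteriors` array and all function names are hypothetical.

```python
import numpy as np

def instantaneous_entropy(posteriors):
    """Shannon entropy (bits) of each frame's posterior distribution."""
    p = np.clip(np.asarray(posteriors, dtype=float), 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log2(p), axis=1)

def average_entropy(posteriors):
    """Mean instantaneous entropy over all frames in the window."""
    return float(np.mean(instantaneous_entropy(posteriors)))

def dynamism(posteriors):
    """Mean squared change of the posteriors between consecutive frames."""
    diffs = np.diff(np.asarray(posteriors, dtype=float), axis=0)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Made-up posteriors: 5 frames, 4 classes, all equally likely.
uniform = np.full((5, 4), 0.25)
print(average_entropy(uniform), dynamism(uniform))  # 2.0 0.0
```

Uniform posteriors give the maximum entropy for four classes (2 bits) and zero dynamism, because nothing changes from frame to frame.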

  27. Pulse Metric
     The beat of a piece of music is one of the clearest features of the music. [K.D. Martin, E.D. Scheirer, B.L. Vercoe 1988]

  28. Other Features
     • Spectral Centroid
     • Spectral Flux
     • Silence Ratio
     • Short-Time Energy Ratio
     • Volume Dynamic Change
     • Number of Segments
     • Segment Duration
     • …etc

  29. Introduction to Gaussian Mixture Model (GMM)
     • Differentiates speech and music from a sound source
     • Used in speech processing, mostly for speech recognition, speaker identification and voice conversion
     • Models densities and represents general spectral features

  30. Why did we choose GMM?
     • Low complexity
     • Rate independence
     • Bit scalability
     • Short computation time

  31. What is a Gaussian Mixture Model?
     • A Gaussian Mixture Model consists of a set of local Gaussian modes and an integrating network. Different Gaussian distributions represent different domains of the feature space and have different output characteristics
     • A GMM tries to describe a complex system using a combination of all the Gaussian clusters, instead of using a single model

  32. Gaussian mixtures or clusters
     • Used to describe a complex system instead of using a single model
     • Represent a dataset by a set of means and covariances

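  33. Gaussian Mixture Model
     A Gaussian Mixture Model with $M$ components is represented by:
     $$p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(x)$$
     where $x$ is the P-dimensional input vector, $w_i$ are the mixture weights, and $b_i(x)$ are the component densities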

  34. Clustering
     • ‘Clustering’ is a technique from pattern classification
     • A technique to group samples
     • Each P-dimensional feature vector is considered a point in space, and all points ‘near’ one another are clustered together

  35. Clustering
     [Figure: clusters; the grey circle represents the variance of the distribution]

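  36. Gaussian component density
     Each component density is a P-variate Gaussian function of the form:
     $$b_i(x) = \frac{1}{(2\pi)^{P/2} \, |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right)$$
     where $\mu_i$ is the mean vector and $\Sigma_i$ is the covariance matrix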
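As a hedged illustration, a P-variate Gaussian density and a weighted mixture of such densities can be evaluated in Python as follows. The function names `gaussian_density` and `mixture_density` and the example parameters are assumptions chosen for the sketch, not values from the project.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """P-variate Gaussian density with mean vector `mean` and covariance `cov`."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    P = mean.shape[0]
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** P * np.linalg.det(cov))
    # solve(cov, diff) computes cov^{-1} diff without forming the inverse.
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def mixture_density(x, weights, means, covs):
    """Weighted sum of component densities (the weights should sum to one)."""
    return sum(w * gaussian_density(x, m, c) for w, m, c in zip(weights, means, covs))

# Made-up two-component mixture in two dimensions.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 3.0]]
covs = [np.eye(2), [[2.0, 0.5], [0.5, 1.0]]]
print(mixture_density([1.0, 1.0], weights, means, covs))
```

For many frames or high dimensions, working with log-densities is numerically safer; plain densities are used here to stay close to the formula.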

  37. Covariance matrix
     • Indicates the dispersion of the distribution
     • In mathematics, it is defined as the matrix whose $(i, j)$-th element is the covariance of $x_i$ and $x_j$, for $i, j = 1, \dots, d$

  38. Covariance matrix
     • The diagonal components of the covariance matrix are the variances of the individual random variables
     • The off-diagonal components are the covariances of two random variables, $x_i$ and $x_j$
     • It is a symmetric matrix
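These properties can be checked numerically. The sketch below uses NumPy's `np.cov` on made-up random data (an assumption for illustration) and confirms that the diagonal holds the per-variable variances and that the matrix is symmetric.

```python
import numpy as np

# Made-up data: 500 feature vectors of dimension 3, with feature 2 tied to feature 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * X[:, 2]

# rowvar=False: each column is a variable, each row an observation.
cov = np.cov(X, rowvar=False)

print(np.allclose(np.diag(cov), X.var(axis=0, ddof=1)))  # diagonal = variances
print(np.allclose(cov, cov.T))                           # symmetric
print(cov[0, 2])                                         # covariance of features 0 and 2
```

The off-diagonal entry for features 0 and 2 is clearly non-zero because the data were constructed to be correlated.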

  39. Full covariance matrix
     • The most powerful Gaussian model, as it fits the data best
     • Drawbacks
       - Needs a lot of data to estimate the parameters
       - Costly in high-dimensional feature spaces

  40. Diagonal covariance matrix
     • Good compromise between quality and model size
     • Gaussian components can act together to model the overall probability density function
     • A mixture of diagonal-covariance Gaussians is still capable of modelling correlations between the elements of the feature vector
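The trade-off in model size between full and diagonal covariances can be made concrete by counting free parameters. The helper `n_parameters` below is a hypothetical illustration; it assumes P-dimensional features, M components, and M − 1 free mixture weights (since the weights sum to one).

```python
def n_parameters(P, M, covariance="full"):
    """Free parameters of an M-component GMM over P-dimensional features."""
    if covariance == "full":
        cov_params = P * (P + 1) // 2  # symmetric P x P matrix
    elif covariance == "diag":
        cov_params = P                 # variances only
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    # Per component: P mean values plus covariance parameters; plus M - 1 weights.
    return M * (P + cov_params) + (M - 1)

for P in (4, 13, 39):
    print(P, n_parameters(P, 8, "full"), n_parameters(P, 8, "diag"))
```

The full-covariance count grows quadratically with P while the diagonal count grows linearly, which is why full covariances need much more training data.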

  41. Review of the Gaussian mixture density
     • The mixture weights $w_i$ must satisfy the conditions $\sum_{i=1}^{M} w_i = 1$ and $w_i \ge 0$
     • Three sets of parameters compose the Gaussian mixture density: mean vectors, covariance matrices and mixture weights

  42. Expectation-maximization (EM)
     • Estimates the mean vectors, covariance matrices and mixture weights
     • Iteratively updates the distribution of each Gaussian component and the conditional probabilities
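A single EM iteration can be sketched in Python as below. This is a minimal sketch that assumes diagonal covariances and a small variance floor; the function name `em_step`, the initial values, and the synthetic one-dimensional data are hypothetical and not taken from the project.

```python
import numpy as np

def em_step(X, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM.

    Returns updated (weights, means, variances) and the log-likelihood
    of X under the parameters that were passed in.
    """
    N = X.shape[0]
    # E-step: log(w_i * b_i(x_n)) for every sample n and component i.
    diff = X[:, None, :] - means[None, :, :]                       # (N, M, P)
    log_b = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_b                            # (N, M)
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    resp = np.exp(log_joint - log_norm)                            # conditional probabilities
    # M-step: re-estimate weights, means and variances from the responsibilities.
    Nk = resp.sum(axis=0)
    new_weights = Nk / N
    new_means = (resp.T @ X) / Nk[:, None]
    new_vars = (resp.T @ X ** 2) / Nk[:, None] - new_means ** 2
    new_vars = np.maximum(new_vars, 1e-6)  # assumption: floor to avoid collapse
    return new_weights, new_means, new_vars, float(log_norm.sum())

# Synthetic data: two 1-D clusters centred at -3 and +3.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, (200, 1)), rng.normal(3, 1, (200, 1))])
weights = np.array([0.5, 0.5])
means = np.array([[-1.0], [1.0]])
variances = np.ones((2, 1))
log_likelihoods = []
for _ in range(20):
    weights, means, variances, ll = em_step(X, weights, means, variances)
    log_likelihoods.append(ll)
print(means.ravel(), log_likelihoods[0], log_likelihoods[-1])
```

The recorded log-likelihood should never decrease from one iteration to the next.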

  43. Idea of Expectation-maximization
     Instead of starting with a random configuration of all components and improving upon it with expectation-maximization, we start with the optimal one-component mixture and then repeat two steps until convergence:
     • Insert a new component
     • Apply EM until convergence

  44. Convergence Theorem
     The sequence of likelihoods is monotonically increasing and bounded, so the likelihood converges to a local maximum

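  45. EM algorithm
     Let $L_k$ denote the log-likelihood of the dataset under the k-component mixture $f_k$.
     1. Compute the optimal one-component mixture $f_1$. Set $k = 1$.
     2. Find the optimal new component $\phi(x; \theta^*)$ and the corresponding mixing weight $\alpha^*$ while keeping $f_k$ fixed.

  46. EM algorithm
     3. Set $f_{k+1}(x) = (1 - \alpha^*) f_k(x) + \alpha^* \phi(x; \theta^*)$ and $k = k + 1$.
     4. Update $f_k$ using EM until convergence.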

  47. Speech/music discrimination using GMM
     An interesting feature of GMMs is that the component densities of the mixture may represent:
     • Different phonetic events when modelling speech
     • Different portions of the sound when modelling the spectra of sound from a musical instrument
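One common way to use GMMs for discrimination, given here only as an assumption about how the planned classifier might work, is to fit one GMM to speech features and another to music features, then label each frame by whichever model gives the higher log-likelihood. In this Python sketch the function names, the two-dimensional feature space, and every parameter value are made up for illustration.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]
    log_b = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1))
    return np.logaddexp.reduce(np.log(weights) + log_b, axis=1)

def classify_frames(X, speech_gmm, music_gmm):
    """Label each frame by the model with the higher log-likelihood."""
    ll_speech = gmm_log_likelihood(X, *speech_gmm)
    ll_music = gmm_log_likelihood(X, *music_gmm)
    return np.where(ll_speech > ll_music, "speech", "music")

# Hypothetical, hand-picked models over two normalised features.
speech_gmm = (np.array([0.5, 0.5]),
              np.array([[2.0, 0.0], [3.0, 1.0]]),
              np.ones((2, 2)))
music_gmm = (np.array([1.0]),
             np.array([[-2.0, 0.0]]),
             np.ones((1, 2)))

features = np.array([[2.5, 0.5], [-2.0, 0.0]])
print(classify_frames(features, speech_gmm, music_gmm))  # ['speech' 'music']
```

In practice the model parameters would be estimated with EM from labelled training frames rather than set by hand.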

  48. Achievements
     • Identified an optimal frame size
     • Obtained robust features
     • Performed a few tests
     • Implemented some Matlab code
     • Studied Gaussian Mixture Models (GMMs) and some of their mathematical expressions

  49. Next year planning
     • Comprehensive and more in-depth research on GMMs
     • Model the sound source based on GMMs
     • Evaluate the effect of noise
     • Matlab implementation of speech/music separation

  50. Next year planning
     • Investigate a novel classification method: Support Vector Machine (SVM)
     • Differentiate male and female speech
     • Differentiate classical and non-classical music
     • Generate a final thesis report