VOICE QUALITY: FUNCTIONS, ANALYSIS AND SYNTHESIS
GENEVA - AUGUST 27-29, 2003
ISCA Tutorial and Research Workshop, International Speech Communication Association
Entropy and Dynamism Criteria for Voice Quality Classification Applications
Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis L. Mitrofanov
Belarusian State University, Radiophysics Department, Minsk, Belarus
Voice Quality Classification Applications
• Introduction
• System design
• Experiment
• Conclusion
Introduction
• Audio is a large and extremely variable class of data.
• The range of sounds is wide, from music genres to animal cries to synthesizer samples.
• Any of these sounds can, and in practice will, occur in combination.
Existing Approaches
• Signal processing techniques
  • Spectrum
  • Modulation spectrum
  • Temporal information
• Decision making
  • Bayesian Information Criterion (BIC)
  • Log-likelihood ratio
  • Hidden Markov Model (HMM)
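The BIC entry above refers to a well-known change-point test. A minimal sketch, assuming the standard ΔBIC formulation with full-covariance Gaussians and a penalty weight λ (the hypothetical parameter `lam`); it illustrates one of the existing approaches, not the authors' method:

```python
import numpy as np

def delta_bic(x: np.ndarray, split: int, lam: float = 1.0) -> float:
    """ΔBIC for a change point at frame `split` in x (frames x dims); > 0 favours a change."""
    n, d = x.shape

    def logdet_cov(seg: np.ndarray) -> float:
        # Maximum-likelihood full covariance; slogdet avoids overflow in the determinant.
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        return float(np.linalg.slogdet(cov)[1])

    x1, x2 = x[:split], x[split:]
    # Log-likelihood gain of modelling the window with two Gaussians instead of one.
    gain = 0.5 * (n * logdet_cov(x) - len(x1) * logdet_cov(x1) - len(x2) * logdet_cov(x2))
    # Penalty for the extra mean and covariance parameters of the second Gaussian.
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return gain - lam * penalty

rng = np.random.default_rng(0)
# Toy signal: 2-D features whose mean jumps at frame 200.
x = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))])
scores = [delta_bic(x, i) for i in range(20, 380)]
best = 20 + int(np.argmax(scores))
# Homogeneous data for comparison: no change point.
y = rng.normal(0.0, 1.0, (400, 2))
```

A positive ΔBIC favours a change point at that split. In practice the split is scanned over a sliding window, and each side needs comfortably more frames than feature dimensions for its covariance to be invertible.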
Block diagram of the proposed system
Input data (wave file) → Feature vector extraction → [mel cepstra vectors] → Neural network → [probabilities of Russian phonemes] → Entropy & dynamism computation → [entropy and dynamism values] → HMM → Segments
Definitions - Entropy and averaged entropy
Entropy is a measure of the uncertainty or disorder in a given probability distribution. We use N = 40.
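The entropy formulas on this slide did not survive extraction. The sketch below assumes the common definition H(n) = -Σ_k P(q_k | x_n) log2 P(q_k | x_n) over the network's phoneme probabilities, and assumes N is the length, in frames, of the window over which H(n) is averaged. Both are assumptions, and `frame_entropy` and `averaged` are hypothetical names:

```python
import numpy as np

def frame_entropy(posteriors: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Entropy in bits of each row of `posteriors` (frames x classes, rows sum to 1)."""
    # Clip only inside the log so that 0 * log(0) contributes 0, as by convention.
    return -np.sum(posteriors * np.log2(np.clip(posteriors, eps, 1.0)), axis=1)

def averaged(values: np.ndarray, n: int = 40) -> np.ndarray:
    """Mean over every run of n consecutive frames (sliding window, no padding)."""
    return np.convolve(values, np.ones(n) / n, mode="valid")

# Toy posteriors over 4 classes: 50 maximally uncertain frames, then 50 certain ones.
uniform = np.full((50, 4), 0.25)
certain = np.eye(4)[np.zeros(50, dtype=int)]
h = frame_entropy(np.vstack([uniform, certain]))
h_avg = averaged(h, n=40)
```

Uniform posteriors give the maximum entropy of log2(4) = 2 bits and one-hot posteriors give 0, so the averaged entropy falls as the network becomes more confident.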
Definitions - Dynamism and averaged dynamism
Dynamism is a measure of the rate of change of a quantity, here the phoneme probabilities produced by the neural network.
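Similarly, a sketch assuming dynamism is the squared Euclidean distance between the probability vectors of consecutive frames, D(n) = Σ_k (P(q_k | x_n) - P(q_k | x_{n+1}))^2, averaged over N frames like the entropy. The exact formula is not recoverable from the slide, and the function names are hypothetical:

```python
import numpy as np

def frame_dynamism(posteriors: np.ndarray) -> np.ndarray:
    """Squared distance between consecutive posterior rows; length is frames - 1."""
    diff = np.diff(posteriors, axis=0)
    return np.sum(diff ** 2, axis=1)

def averaged_dynamism(posteriors: np.ndarray, n: int = 40) -> np.ndarray:
    """Sliding mean of frame dynamism over n consecutive values (no padding)."""
    return np.convolve(frame_dynamism(posteriors), np.ones(n) / n, mode="valid")

# A steady distribution never changes; a classifier flipping between two classes changes maximally.
steady = np.tile([0.7, 0.2, 0.1], (60, 1))
flipping = np.eye(3)[np.arange(60) % 2]
d_flip = averaged_dynamism(flipping)
```

For one-hot vectors alternating between two classes, every frame-to-frame change contributes 1 + 1 = 2, the largest value this measure can take.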
Feature vector extraction
We use 12 mel-cepstral coefficients computed in a 30 ms window with a 10 ms frame shift, for 4-15 min wave files of Russian speech, non-Russian speech, and music.
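A NumPy-only sketch of the framing described above (30 ms window, 10 ms shift, 12 coefficients). The 16 kHz sample rate, Hamming window, 512-point FFT, 26 mel filters, and dropping c0 are all assumptions not stated on the slide; production code would more likely rely on an established audio library:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal: np.ndarray, sr: int = 16000, win_ms: float = 30.0, hop_ms: float = 10.0,
         n_ceps: int = 12, n_filters: int = 26, n_fft: int = 512) -> np.ndarray:
    win = int(round(sr * win_ms / 1000))   # 480 samples at 16 kHz
    hop = int(round(sr * hop_ms / 1000))   # 160 samples at 16 kHz
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II written out explicitly to avoid a SciPy dependency; rows 1..n_ceps skip c0.
    k = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(1, n_ceps + 1)[:, None] * (2 * k + 1) / (2 * n_filters))
    return log_mel @ basis.T               # shape (n_frames, n_ceps)

t = np.arange(16000) / 16000
feats = mfcc(np.sin(2 * np.pi * 440 * t))  # one second of a 440 Hz tone
```

One second of audio at a 10 ms shift yields 98 full 30 ms frames, each described by a 12-dimensional vector.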
Hidden Markov Model
• Define an HMM for the signal, with one HMM state for every segment we want to find.
• Perform a Viterbi search for the optimal path using the probabilities from the previous step.
• Determine segment boundaries as the moments when the HMM state changes.
[Figure: HMM with states S0-S6]
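The three steps above can be sketched as a log-domain Viterbi decoder followed by boundary extraction. The emission scores would come from models of the entropy and dynamism features; here they are a hypothetical input matrix, and the two-state toy (for example, speech vs. music) is illustrative only:

```python
import numpy as np

def viterbi(log_emis: np.ndarray, log_trans: np.ndarray, log_init: np.ndarray) -> np.ndarray:
    """Most likely state path. log_emis: (T, S); log_trans: (S, S), from -> to; log_init: (S,)."""
    t_len, s = log_emis.shape
    score = log_init + log_emis[0]
    back = np.zeros((t_len, s), dtype=int)
    for t in range(1, t_len):
        cand = score[:, None] + log_trans      # cand[i, j]: best path in state i, then i -> j
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each state j
        score = cand[back[t], np.arange(s)] + log_emis[t]
    path = np.empty(t_len, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(t_len - 1, 0, -1):          # trace back through stored predecessors
        path[t - 1] = back[t, path[t]]
    return path

def boundaries(path: np.ndarray) -> list[int]:
    """Frame indices where the decoded state changes, i.e. segment boundaries."""
    return [int(t) for t in np.flatnonzero(np.diff(path)) + 1]

T = 60
log_emis = np.zeros((T, 2))
log_emis[:30] = np.log([0.8, 0.2])    # first half favours state 0
log_emis[30:] = np.log([0.2, 0.8])    # second half favours state 1
log_emis[10] = np.log([0.4, 0.6])     # one isolated ambiguous frame
log_trans = np.log([[0.99, 0.01], [0.01, 0.99]])
log_init = np.log([0.5, 0.5])
path = viterbi(log_emis, log_trans, log_init)
```

The sticky transition matrix is what suppresses the isolated ambiguous frame at index 10, so only the real change at frame 30 becomes a boundary.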
Neural Network
• Neural network for probability generation: rationale
  • Neural networks can model probability distributions with high accuracy, owing to their ability to approximate a wide variety of functions.
  • If training does not stop in a local minimum, the outputs can be considered class probabilities.
Multilayer Perceptron
• Neural network for probability generation: structure
  • Fully connected multilayer perceptron
  • Input layer size equals the feature vector size.
  • Output layer size equals the number of phonemes (one output per phoneme probability).
  • The number and sizes of hidden layers vary.
  • Hyperbolic tangent (tanh) activation for hidden neurons
  • Softmax activation for output neurons
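A forward-pass sketch of the structure listed above: tanh hidden layers and a softmax output whose rows sum to 1, so they can be read as phoneme probabilities. The layer sizes (12 inputs, one hidden layer of 64, 40 outputs) are hypothetical, the weights are random, and training is not shown:

```python
import numpy as np

def init_mlp(sizes: list[int], rng: np.random.Generator) -> list[tuple[np.ndarray, np.ndarray]]:
    """Random (weights, bias) pairs for a fully connected MLP, e.g. sizes=[12, 64, 40]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(params, x: np.ndarray) -> np.ndarray:
    """tanh on every hidden layer, softmax on the output layer."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return softmax(h @ w + b)

rng = np.random.default_rng(0)
params = init_mlp([12, 64, 40], rng)
probs = mlp_forward(params, rng.normal(size=(98, 12)))   # 98 frames of 12-D features
```

Each row of `probs` is a distribution over the (hypothetical) 40 phoneme classes for one frame, which is exactly the input the entropy and dynamism measures expect.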
Results - Music: entropy histogram
Results - Russian speech
Results - Foreign speech
Results - Russian and foreign speech
Blue is Russian, pink is French.
Results
• Russian speaker (blue) and music (pink)
• Two Russian speakers (blue and brown) and music (other colors)
Results - Pure Russian and “Czech” Russian
There is some difference even between native Russian speech and Russian spoken with a Czech accent.
Results - Entropy histograms of “normal” (brown) and “rough” (blue) French speech
Results - Entropy histograms of “normal” (brown), “rough” (blue), and “lips” French speech
Conclusion
• Further research
  • Parameter vectors: their size and the number of context frames
  • Specialized HMM structures for particular types of speech signals
• Conclusion
  • As the experiments show, entropy and dynamism features can be used successfully for automatic signal segmentation. Further research in this area may lead to better practical results.