Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization. Mohamad Hasan Bahari Hugo Van hamme. Outline. Introduction and Motivations Age and Gender Recognition Corpora Supervised Non-negative Matrix Factorization Proposed Method Results
Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization Mohamad Hasan Bahari Hugo Van hamme July 2011
Outline • Introduction and Motivation • Age and Gender Recognition • Corpora • Supervised Non-negative Matrix Factorization • Proposed Method • Results • Conclusions and Future Research
Introduction • Confirming the identity of individuals • Biometric Characteristics • Fingerprint • Face • Iris • Hand Geometry • Ear Shape • Voice pattern • … • Choosing a characteristic • Availability • Reliability
Motivation • In many real-world cases, only speech patterns are available (kidnapping, threatening calls, …) • Speech patterns can carry much useful information • Gender • Age • Dialect (original or previous regions) • Membership of a particular social group • … • To facilitate identifying a criminal • To narrow down the number of suspects
Goal Goal: To extract different physical and psychological characteristics of the speaker from his/her voice patterns (Speaker Profiling). • Physical: • Gender • Age • Accent • … • Psychological: • Anxiousness • Stress • Confidence • …
Age and Gender Recognition Three approaches: Directly from speech signal. Modeling the speech generation system. Modeling the hearing system.
Age and Gender Recognition • Directly from speech signal. • Different acoustic features vary with age. • Fundamental frequency • Speech rate • Sound pressure level • … • Approach: find all acoustic features that vary with age and determine their exact relation to the speaker's age. • Conceptually simple and computationally inexpensive • These features are affected by many other parameters, such as weight, height, voice quality, emotional condition, …
Age and Gender Recognition Effect of Age and Gender on speech (Fundamental frequency) [1] • Age is only one of the inputs affecting speech and, consequently, the acoustic features. • It is impossible to estimate age without considering the remaining inputs. • Perceptions of gender and age have a significant mutual impact on each other. [1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315, 1991.
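To make the first listed feature concrete, here is a minimal sketch of estimating the fundamental frequency of a single voiced frame by autocorrelation. The function name `estimate_f0`, the 50 to 500 Hz search range, and the synthetic 200 Hz test tone are all assumptions for illustration; the slides do not say how fundamental frequency was measured.

```python
import numpy as np

def estimate_f0(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by autocorrelation.

    f_min and f_max bound the pitch search range (assumed values).
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    n = len(frame)
    # Autocorrelation for non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[n - 1:]
    # Restrict the search to lags that correspond to plausible pitch values.
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: a pure 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(1024) / sample_rate
frame = np.sin(2 * np.pi * 200.0 * t)
f0 = estimate_f0(frame, sample_rate)
```

On real speech the estimate depends on voicing, frame length, and noise, which is one reason a single feature like this is not enough to determine age.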
Age and Gender Recognition • Modeling the speech generation system. • It is an input estimation problem. • Modeling the speech generation system of the speaker is very difficult.
Age and Gender Recognition • Modeling the hearing system • To solve the speech recognition problem, the hearing system is modeled using Hidden Markov Models (HMMs). • Using the tools applied in speech recognition problems (HMMs). • Well established. • Accurate in recognizing content. • There exists a difference between the perceived age of a speaker and their actual age. • Computationally complex
Corpora • 555 speakers from the N-best evaluation corpus [1] • The corpus contains live and read commentaries, news, interviews, and reports broadcast in Belgium • Different age groups and genders [1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, In proc. Interspeech, pp. 2571-2574, 2009.
SNMF • Non-negative Matrix Factorization (NMF) is a popular machine learning algorithm [1] • It can be used in supervised or unsupervised modes. • Supervised NMF (SNMF) is a pattern recognition method [1] • It is very effective for high-dimensional input spaces. • It is a generative classifier. • It can directly classify patterns into multiple classes (no need to recast the problem as multiple binary classifications). [1] H. Van hamme, In proc. Interspeech, Australia, pp. 2554-2557, 2008.
SNMF Problem Statement: Given a training data-set: S_tr = {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)} • x_n is a vector of observed characteristics for the data item • y_n denotes a label vector which represents the class that x_n belongs to Goal: Approximation of a classifier function g, such that ŷ = g(x_tst) is as close as possible to the true label, where x_tst is an unseen observation.
SNMF SNMF in Training Phase: First step: Second step: Extended Kullback-Leibler divergence: Multiplicative updating formula:
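The formulas for this slide are not reproduced in the text. For orientation, the standard extended Kullback-Leibler divergence between a non-negative matrix V and its factorization WH, together with the usual multiplicative updates that decrease it, can be written as below. The symbols V, W, H and the stacking of labels Y on top of data X in the supervised case are assumptions about notation, not a transcription of the slide.

```latex
% Assumed supervised setup: labels stacked on top of the data
V = \begin{bmatrix} Y \\ X \end{bmatrix} \approx
    \begin{bmatrix} W_y \\ W_x \end{bmatrix} H = W H,
\qquad W \ge 0,\; H \ge 0

% Extended Kullback-Leibler divergence
D(V \,\|\, WH) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)

% Multiplicative updates
H_{kj} \leftarrow H_{kj} \,
  \frac{\sum_i W_{ik}\, V_{ij} / (WH)_{ij}}{\sum_i W_{ik}},
\qquad
W_{ik} \leftarrow W_{ik} \,
  \frac{\sum_j H_{kj}\, V_{ij} / (WH)_{ij}}{\sum_j H_{kj}}
```

Because each update only multiplies by a non-negative ratio, W and H stay non-negative when initialized with positive values.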
SNMF SNMF in Testing Phase: First step: Second step: Extended Kullback-Leibler divergence: Multiplicative updating formula:
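The two phases can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes training factorizes the stacked matrix [Y; X] into [W_y; W_x] H with multiplicative Kullback-Leibler updates, and that testing keeps W_x fixed, estimates activations for the unseen data, and reads class scores from W_y H. The function names, the rank, the iteration count, and the random initialization are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def kl_updates(V, W, H, n_iter, update_W=True):
    """Multiplicative updates that decrease the extended KL divergence D(V || WH)."""
    for _ in range(n_iter):
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        if update_W:
            R = V / (W @ H + EPS)
            W *= (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H

def train_snmf(X, Y, rank, n_iter=200, seed=0):
    """Training: factorize the stacked matrix [Y; X] and split W into W_y and W_x.

    X : (n_features, n_items) non-negative data, one column per item
    Y : (n_classes, n_items) one-hot label vectors, one column per item
    """
    rng = np.random.default_rng(seed)
    V = np.vstack([Y, X])
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    W, _ = kl_updates(V, W, H, n_iter)
    n_classes = Y.shape[0]
    return W[:n_classes], W[n_classes:]

def predict_snmf(W_y, W_x, X_test, n_iter=200, seed=0):
    """Testing: fit activations with W_x fixed, then score classes with W_y."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_x.shape[1], X_test.shape[1])) + 0.1
    _, H = kl_updates(X_test, W_x, H, n_iter, update_W=False)
    scores = W_y @ H  # estimated label vectors, one column per test item
    return scores.argmax(axis=0)
```

Each column of the score matrix plays the role of ŷ, and taking the largest entry yields a multi-class decision without training separate binary classifiers.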
Proposed Method • Feature selection • Acoustic modeling • Supervector making procedure • Training phase • Testing phase
Proposed Method • Feature selection • Mel spectra • Mean normalization • Vocal tract length normalization • Augmented with their first- and second-order time derivatives. [Diagram: Speech Signal → Feature selection → Feature Vectors]
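Two of the listed steps, mean normalization and augmentation with time derivatives, can be sketched as follows. This is only an illustration: the input is assumed to be a (frames × coefficients) array of Mel-spectral features, the derivative is a simple central difference with edge padding (the slides do not give the delta formula), and vocal tract length normalization is not covered. All function names are hypothetical.

```python
import numpy as np

def mean_normalize(feats):
    """Subtract the per-coefficient mean over all frames."""
    return feats - feats.mean(axis=0, keepdims=True)

def deltas(feats):
    """First-order time derivative by central difference (assumed formula)."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

def augment(feats):
    """Append first- and second-order time derivatives to each frame."""
    d1 = deltas(feats)
    d2 = deltas(d1)
    return np.hstack([feats, d1, d2])

# Toy input: 4 frames of 3 coefficients (placeholder values, not real Mel spectra).
toy = np.arange(12, dtype=float).reshape(4, 3)
features = augment(mean_normalize(toy))
```

The augmented vectors have three times as many dimensions as the input frames.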
Proposed Method • Acoustic modeling Speaker-independent model: • An HMM with a shared pool of 49740 Gaussians to model the observations in 3873 cross-word context-dependent tied triphone states. Adaptation method: • The speaker-dependent mixture weights for each speaker result from a re-estimation of the speaker-independent weights, based on a forced alignment of that speaker's training data with the speaker-independent acoustic model. The result of this step is 555 speaker-adapted models. [Diagram: Speaker Independent Model → Speaker Adaptation Method → Model of the Speaker]
Proposed Method • Supervector making procedure The Gaussian Mixture Model (GMM) of each speaker-adapted HMM is: Three types of supervectors: • Means • Variances • Weights Weights supervectors: The result of this step is 555 supervectors, one for each of the 555 speakers.
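The weight re-estimation described above can be illustrated with the standard maximum-likelihood formula: each state's weights are its accumulated Gaussian occupancies, normalized to sum to one. The sketch below is a simplification under several assumptions. Every state is given its own K weights rather than a subset of a shared pool, the alignment and Gaussian posteriors are taken as given inputs, and states never visited by the speaker keep their speaker-independent weights (the slides do not say how such states are handled). The function name and argument layout are hypothetical.

```python
import numpy as np

def reestimate_weights(state_ids, posteriors, si_weights):
    """ML re-estimate of per-state mixture weights from one speaker's aligned frames.

    state_ids  : (T,) state index of each frame, from the forced alignment
    posteriors : (T, K) posterior of each Gaussian given the frame and its state
    si_weights : (S, K) speaker-independent weights, kept for unseen states
    """
    state_ids = np.asarray(state_ids, dtype=int)
    posteriors = np.asarray(posteriors, dtype=float)
    si_weights = np.asarray(si_weights, dtype=float)
    occupancy = np.zeros_like(si_weights)
    # Accumulate Gaussian occupancy counts for each aligned state.
    np.add.at(occupancy, state_ids, posteriors)
    totals = occupancy.sum(axis=1, keepdims=True)
    seen = totals[:, 0] > 0
    weights = si_weights.copy()
    weights[seen] = occupancy[seen] / totals[seen]
    return weights
```

The resulting weights are non-negative and sum to one per state, which makes them a natural input for a non-negative factorization.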
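The formulas on this slide are not reproduced in the text. A conventional way to write a state-level GMM with a shared Gaussian pool, and the weight supervector built from it, is given below. The symbols (o for an observation, s for a state, K for the number of Gaussians, S for the number of states) are assumed notation, and in a shared-pool system each state would typically use only a subset of the pool.

```latex
% GMM of state s: shared Gaussians, speaker-dependent weights
p(\mathbf{o} \mid s) = \sum_{k=1}^{K} w_{s,k}\,
  \mathcal{N}(\mathbf{o};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad w_{s,k} \ge 0, \quad \sum_{k=1}^{K} w_{s,k} = 1

% Weight supervector of one speaker: all state-level weights stacked
\mathbf{x} = \left[\, w_{1,1}, \dots, w_{1,K},\; \dots,\; w_{S,1}, \dots, w_{S,K} \,\right]^{\mathsf{T}}
```

Since all weights are non-negative, such a supervector satisfies the non-negativity requirement of NMF without further transformation.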
Proposed Method • Training phase • Testing phase
Results Evaluation Methodology • 5-fold cross-validation (five independent runs) • In each of the five runs: • Training set is speech data of 444 speakers • Testing set is speech data of 111 speakers [Diagram: database split for Run 1, Run 2, …]
Results • Gender recognition accuracy is 96%. • Age group recognition: relative confusion matrix [figure not recovered]
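The split sizes follow directly from dividing 555 speakers into five folds of 111. A minimal sketch of such a speaker-level split is shown below; the random, unstratified assignment and the function name `five_fold_splits` are assumptions, since the slides do not say how speakers were assigned to runs.

```python
import random

def five_fold_splits(speaker_ids, n_folds=5, seed=0):
    """Yield (train, test) speaker lists so that each speaker is tested exactly once."""
    ids = list(speaker_ids)
    random.Random(seed).shuffle(ids)
    # Deal speakers into folds round-robin after shuffling.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, test

# 555 placeholder speaker indices give 444 training and 111 test speakers per run.
for run, (train, test) in enumerate(five_fold_splits(range(555)), start=1):
    print(f"Run {run}: {len(train)} training speakers, {len(test)} test speakers")
```

Splitting by speaker rather than by utterance keeps every test speaker unseen during training.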
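A relative confusion matrix is commonly obtained by dividing each row of the count matrix by the number of items in that true class. The sketch below assumes this row-normalized definition; the class names and labels in the example are hypothetical placeholders, not results from the study.

```python
def relative_confusion(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    items of true class i that were assigned to class j."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    relative = []
    for row in counts:
        total = sum(row)
        relative.append([c / total if total else 0.0 for c in row])
    return relative

# Hypothetical labels for illustration only.
classes = ["young", "old"]
matrix = relative_confusion(
    ["young", "young", "old", "old"],
    ["young", "old", "old", "old"],
    classes,
)
```

With this normalization the diagonal holds the per-class recall, which is easier to compare across age groups of different sizes than raw counts.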
Conclusions and Future Research Conclusions: • A new age-gender recognition method based on SNMF • Supervectors of GMM weights were used • Evaluated on the N-best corpus • Gender recognition accuracy is 96% • Age group recognition accuracy is significantly higher than chance level Future Research: • Age estimation instead of age group recognition. • Using supervectors of GMM means and variances and combining these features