IMCSIT-AAIA, October 18-20, 2010 – Wisla, Poland. Jana Tuckova & Martin Sramka, Department of Circuit Theory, CTU – FEE in Prague, Laboratory of Artificial Neural Network Applications, tuckova@fel.cvut.cz
Emotional Speech Analysis using Artificial Neural Networks
Jana Tuckova & Martin Sramka
Department of Circuit Theory, CTU – FEE in Prague
Laboratory of Artificial Neural Network Applications
tuckova@fel.cvut.cz
http://amber.feld.cvut.cz/user/tuckova
IMCSIT-AAIA, October 18-20, 2010 – Wisla, Poland
1/14
Overview
• Introduction
• Method
  - The patterns based on time and frequency characteristics
  - The patterns based on musical theory
  - Combination of both previous approaches
• Experiments and Results
• Conclusion and future work
Acknowledgment: This work was supported by the Czech Science Foundation, grant 102/09/0989.
2/14
Introduction
• Our aim: a classification of speech emotions.
• Why ANN?
  - The robustness of ANN solutions to real-world problems is a great advantage, for example in the processing of noisy signals.
  - It is possible to treat various types of input data at the same time.
3/14
Introduction
• Which way?
• By a description of speech signals formulated by:
  - standard speech processing methods
  - music theory
  - a combination of both methods
• By an ANN approach: MLNN, KSOM
4/14
Introduction
MLNN
– with one hidden layer
– the input layer is given by the key linguistic parameters
– the outputs are the various classes of emotions
– the training algorithm: Scaled Conjugate Gradient, with a superlinear convergence rate
KSOM – SSOM
5/14
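As a rough illustration of the MLNN structure listed above, the following sketch runs a forward pass through a network with one hidden layer and four emotion outputs. The layer sizes, the tanh hidden activation, the softmax output and the random weights are all assumptions made for illustration; the slides do not state them, and the Scaled Conjugate Gradient training step is not shown.

```python
import math
import random

EMOTIONS = ["anger", "boredom", "pleasure", "sadness"]

def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases):
    """Weighted sum of the inputs for every unit in the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    """Turn raw output scores into class probabilities."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, hidden, output):
    """One hidden layer (tanh, assumed) followed by a softmax output layer (assumed)."""
    h = [math.tanh(v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))

rng = random.Random(0)
n_inputs, n_hidden = 6, 10          # hypothetical sizes, not from the slides
hidden = init_layer(n_inputs, n_hidden, rng)
output = init_layer(n_hidden, len(EMOTIONS), rng)

probs = forward([0.1] * n_inputs, hidden, output)   # untrained network
print(EMOTIONS[probs.index(max(probs))])
```

With untrained weights the predicted class is arbitrary; the sketch only shows how an input pattern is mapped to one probability per emotion class.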
Introduction
KSOM – SSOM
KSOM: combines aspects of the VQ method with the topology-preserving ordering of the quantization vectors.
SSOM: only for well-known input data, for well-known classes of input data.
The database for the ANN: 216 patterns for training, 72 for validation, 72 for test.
6/14
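The three subset sizes on this slide add up to 360 patterns. A minimal sketch of such a split is given below; the random shuffling, the seed and the function name `split_patterns` are assumptions, since the slides do not say how the patterns were assigned to the subsets.

```python
import random

def split_patterns(n_patterns=360, n_train=216, n_val=72, seed=0):
    """Shuffle pattern indices and cut them into train/validation/test subsets."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)   # random assignment is an assumption
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # the remainder, 72 patterns here
    return train, val, test

train_idx, val_idx, test_idx = split_patterns()
print(len(train_idx), len(val_idx), len(test_idx))  # 216 72 72
```

This corresponds to a 60 % / 20 % / 20 % division of the 360 patterns.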
Corpus creation
Database of Utterances
7/14
Corpus creation
The sentences were read by professional actors (2 female + 1 male).
Speech recording: in a professional recording studio; format "wav", sampling frequency 44.1 kHz, 24 bit.
The recorded emotional speech was subjectively evaluated by 4 persons.
The final database contained 720 patterns:
- 360 patterns for one-word sentences
- 360 patterns for multiword sentences
Emotions: 1 - anger (H), 2 - boredom (N), 3 - pleasure (R), 4 - sadness (S)
8/14
Method: The Patterns Based on Music Theory
The method is based on the idea of the musical interval: the difference in pitch, expressed as a frequency ratio, between a specific n-th tone and a reference tone.
Example: the quint (perfect fifth) is the frequency of the fifth tone divided by the frequency of the first tone = 1.498.
9/14
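The value 1.498 quoted for the fifth can be reproduced under the assumption of equal temperament, where the octave is divided into 12 half tones and the fifth spans 7 of them. The function name `interval_ratio` is introduced here only for illustration.

```python
def interval_ratio(semitones):
    """Frequency ratio of an equal-tempered interval spanning the given number of half tones."""
    return 2.0 ** (semitones / 12.0)

fifth = interval_ratio(7)    # the perfect fifth is 7 half tones above the reference
print(round(fifth, 3))       # 1.498
```

Multiplying a reference frequency by this ratio gives the frequency of the tone a fifth above it.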
Method: The Patterns Based on Musical Theory
The reference frequency (F0) is given by the choices in each utterance feature.
The frequency ratios are compared with the musical intervals.
fifth = f3/f2; geometric series; circle of fifths
tone affinity: decreases from n=1 to n=7, increases from n=8 to n=13
10/14
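One possible way to compare a measured frequency ratio with the musical intervals is to convert it to half tones and take the nearest one. The sketch below assumes an equal-tempered scale (a geometric series with factor 2^(1/12)); the function name and the example frequencies are hypothetical and not taken from the slides.

```python
import math

def nearest_semitone(f, f_ref):
    """Return the nearest equal-tempered interval (in half tones) for f / f_ref,
    together with the deviation from that interval in cents."""
    semitones = 12.0 * math.log2(f / f_ref)
    nearest = round(semitones)
    cents_off = 100.0 * (semitones - nearest)
    return nearest, cents_off

# Example: a ratio of 3/2 (fifth = f3/f2) relative to a 220 Hz reference
n, dev = nearest_semitone(330.0, 220.0)
print(n, round(dev, 1))   # 7 2.0
```

The ratio 3/2 lands on the interval of 7 half tones, about 2 cents above the equal-tempered fifth.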
Experimental Results
[Figure: U-matrix for one-word sentences and for multi-word sentences. Labels: H - anger, N - boredom, R - pleasure, S - sadness]
11/14
Conclusion – for music theory
Our results:
- MLNN: 74% successful classifications
- SSOM: QE / TE = 0.274 / 0.014 (one-word sentences), 0.275 / 0.017 (multiword sentences)
Comparison with some publications (successful classifications):
- 54-64%: standard classifier
- 81%: ANN, high note versus 12 half tones, Korean language
12/14
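Assuming QE and TE stand for the quantization error and the topographic error commonly reported for self-organizing maps, the sketch below shows how the two measures can be computed. The rectangular grid, the 4-neighbour adjacency rule and the toy data are assumptions; the slides do not describe the map lattice or the tool used.

```python
import math

def som_errors(data, weights):
    """data: list of input vectors; weights: rows x cols grid of codebook vectors.
    Returns (QE, TE): the mean distance to the best-matching unit, and the
    fraction of inputs whose two best-matching units are not grid neighbours."""
    units = [(r, c, w) for r, row in enumerate(weights) for c, w in enumerate(row)]
    total_dist = 0.0
    n_errors = 0
    for x in data:
        ranked = sorted(units, key=lambda u: math.dist(x, u[2]))
        (r1, c1, w1), (r2, c2, _) = ranked[0], ranked[1]
        total_dist += math.dist(x, w1)
        # 4-neighbourhood on a rectangular grid (assumed lattice)
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            n_errors += 1
    return total_dist / len(data), n_errors / len(data)

# Toy 1 x 3 map with one-dimensional codebook vectors
toy_map = [[[0.0], [1.0], [10.0]]]
qe, te = som_errors([[0.1], [0.9]], toy_map)
print(qe, te)
```

A low QE means the codebook vectors lie close to the data, and a low TE means the map preserves the topology of the input space well.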
Conclusion – future work
• Our effort in future work:
  - ANN application in prosody modelling: we want to apply results from the described experiments with emotional speech to the improvement of synthetic speech naturalness
  - ANN application in the analysis of children's disordered speech (developmental dysphasia)
13/14
Conclusion – future work
• These different application domains influence the database creation.
• Multiword sentences are more suitable for prosody modelling.
• One-word sentences are suitable for the analysis of children's disordered speech.
• Why? A speech disorder often manifests as an inability to pronounce whole sentences.
14/14
The End
Thank you for your attention