Emotional Speech Analysis using Artificial Neural Networks

IMCSIT-AAIA, October 18-20, 2010 – Wisla, Poland. Jana Tuckova & Martin Sramka, Department of Circuit Theory, CTU – FEE in Prague, Laboratory of Artificial Neural Network Applications, tuckova@fel.cvut.cz


Presentation Transcript


  1. IMCSIT-AAIA, October 18-20, 2010 – Wisla, Poland. Jana Tuckova & Martin Sramka, Department of Circuit Theory, CTU – FEE in Prague, Laboratory of Artificial Neural Network Applications. tuckova@fel.cvut.cz, http://amber.feld.cvut.cz/user/tuckova. Emotional Speech Analysis using Artificial Neural Networks

  2. Overview
     • Introduction
     • Method
       - The patterns based on time and frequency characteristics
       - The patterns based on music theory
       - Combination of both previous approaches
     • Experiments and Results
     • Conclusion and future work
     Acknowledgment: This work was supported by the Czech Science Foundation, grant 102/09/0989.

  3. Introduction
     • Our aim: a classification of speech emotions.
     • Why ANN?
       - The robustness of ANN solutions in real applications is a great advantage, for example in the area of noisy signal processing.
       - It is possible to treat various input data types concurrently.

  4. Introduction
     • Which way?
     • By a description of speech signals which are formulated by:
       - standard speech processing methods
       - music theory
       - combination of both methods
     • By an ANN approach: MLNN, KSOM

  5. Introduction
     MLNN
     – with one hidden layer
     – the input layer is given by the key linguistic parameters
     – the outputs are the various classes of emotions
     – the training algorithm: Scaled Conjugate Gradient with superlinear convergence rate
     KSOM - SSOM
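To make the MLNN description concrete, here is a minimal sketch of the forward pass of a network with one hidden layer and one output per emotion class. It is an illustration only. The layer sizes, the tanh and softmax activations, the random weights and the input pattern are all assumptions that the slides do not specify, and the Scaled Conjugate Gradient training step is not implemented here.

```python
import math
import random

# Emotion classes as listed in the corpus description of this talk.
EMOTIONS = ["anger", "boredom", "pleasure", "sadness"]


def init_layer(n_in, n_out, rng):
    """Random weights, one row per unit; the last entry of a row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def affine(weights, x):
    """Weighted sum plus bias for every unit of a layer."""
    return [sum(w * v for w, v in zip(row[:-1], x)) + row[-1]
            for row in weights]


def softmax(z):
    """Turn raw output scores into class probabilities."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(hidden_w, output_w, x):
    """One hidden tanh layer followed by a softmax output layer."""
    hidden = [math.tanh(a) for a in affine(hidden_w, x)]
    return softmax(affine(output_w, hidden))


rng = random.Random(0)
N_FEATURES, N_HIDDEN = 6, 8  # hypothetical sizes, not given on the slides
hidden_w = init_layer(N_FEATURES, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, len(EMOTIONS), rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]  # made-up input pattern
probs = forward(hidden_w, output_w, x)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

With untrained weights the predicted class is meaningless; the point is only the data flow from the input parameters through one hidden layer to four class outputs.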

  6. Introduction
     KSOM - SSOM
     - combines aspects of the VQ method with the topology-preserving ordering of the quantization vectors
     - only for well-known input data, for well-known classes of input data
     The database for ANN: 216 patterns for training, 72 for validation, 72 for test
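The combination of vector quantization with a topology-preserving ordering can be illustrated with one step of the standard online Kohonen update: the winning unit moves towards the sample (the VQ part), and its neighbours on the map grid follow with a smaller step (the ordering part). This is a generic textbook sketch, not necessarily the procedure the authors used. The 2x2 map, the learning rate, the Gaussian neighbourhood width and the sample are assumed values.

```python
import math


def best_matching_unit(codebook, x):
    """Grid position of the unit whose weight vector is closest to x."""
    return min(codebook, key=lambda pos: math.dist(codebook[pos], x))


def som_update(codebook, x, lr=0.5, sigma=1.0):
    """Move the winner and its grid neighbours towards x (one online step)."""
    bmu = best_matching_unit(codebook, x)
    for pos, w in list(codebook.items()):
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        codebook[pos] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


# Hypothetical 2x2 map with 2-D weight vectors, keyed by (row, col).
codebook = {
    (0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0],
    (1, 0): [1.0, 0.0], (1, 1): [0.9, 0.9],
}
winner = som_update(codebook, [1.0, 1.0])
print(winner, codebook[winner])
```

Units far from the winner on the grid receive a smaller neighbourhood factor, so they move less; this is what keeps neighbouring units responsive to similar inputs.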

  7. Corpus creation: Database of Utterances

  8. Corpus creation
     The sentences were read by professional actors (2 female + 1 male).
     Speech recording: in a professional recording studio, format "wav", sampling frequency 44.1 kHz, 24 bit.
     Recorded emotional speech was subjectively evaluated by 4 persons.
     The final database contained 720 patterns:
     - 360 patterns for one-word sentences
     - 360 patterns for multiword sentences
     Emotions: 1 - anger (H), 2 - boredom (N), 3 - pleasure (R), 4 - sadness (S)

  9. Method: The Patterns Based on Music Theory
     The method is based on the idea of the musical interval: the frequency distance between a specific n-th tone and a reference tone, expressed as a frequency ratio.
     Example: the quint (fifth) is the frequency of the fifth tone divided by the frequency of the first tone = 1.498
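The value 1.498 quoted for the fifth agrees with the equal-tempered scale, in which each half tone multiplies the frequency by the twelfth root of two and a fifth spans seven half tones. The short sketch below checks that arithmetic. The function name is illustrative, and the use of equal temperament is inferred from the quoted number rather than stated on the slide.

```python
def equal_tempered_ratio(half_tones: int) -> float:
    """Frequency ratio of a tone `half_tones` half tones above the reference."""
    return 2.0 ** (half_tones / 12.0)


FIFTH_HALF_TONES = 7  # a perfect fifth spans seven half tones

fifth = equal_tempered_ratio(FIFTH_HALF_TONES)
print(round(fifth, 3))  # 1.498
```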

  10. Method: The Patterns Based on Music Theory
     The reference frequency (F0) is given by the choices in each utterance feature.
     The frequency ratios are compared with the music intervals.
     - fifth = f3/f2
     - geometric series
     - circle of fifths
     - tone affinity: decreases from n=1 to n=7, increases from n=8 to n=13
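One way to compare a measured frequency ratio with the musical intervals is to convert it to half tones on a logarithmic scale and round to the nearest one. The sketch below does this for a hypothetical pair of frequencies. It assumes the equal-tempered scale (a geometric series with factor 2^(1/12)) as the comparison grid, and it reads "f3/f2" as the 3:2 ratio of a pure fifth; both are interpretations, not details confirmed by the slides.

```python
import math


def nearest_interval(f, f_ref):
    """Return (half tones, deviation in cents) of f relative to f_ref."""
    half_tones_exact = 12.0 * math.log2(f / f_ref)
    nearest = round(half_tones_exact)
    cents_off = 100.0 * (half_tones_exact - nearest)
    return nearest, cents_off


# Hypothetical frequencies in Hz with an exact 3:2 ratio.
half_tones, cents = nearest_interval(330.0, 220.0)
print(half_tones, round(cents, 2))
```

For a 3:2 ratio the nearest interval is seven half tones, a fifth, with a deviation of about two cents from the equal-tempered value.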

  11. Experimental Results
     [Figure: U-matrix for one-word sentences and for multi-word sentences. Labels: H - anger, N - boredom, R - pleasure, S - sadness]

  12. Conclusion – for music theory
     Our results:
     - success classifications: 74% (MLNN)
     - QE / TE (SSOM): 0.274 / 0.014 for one-word sentences, 0.275 / 0.017 for multiword sentences
     Comparison to some publications:
     - success classifications 54-64%: standard classifier
     - 81%: ANN, hight note versus 12 half tones, Korean language
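QE and TE are assumed here to stand for quantization error and topographic error, the two usual quality measures of a self-organizing map; the slide does not expand the abbreviations. Under that assumption, the sketch below shows how both can be computed for a small map. The codebook, the data and the 4-neighbourhood definition of adjacency are hypothetical.

```python
import math


def two_best_units(codebook, x):
    """Grid positions of the closest and second-closest units to x."""
    ranked = sorted(codebook, key=lambda pos: math.dist(codebook[pos], x))
    return ranked[0], ranked[1]


def quantization_error(codebook, data):
    """Mean distance between each sample and its best-matching unit."""
    total = sum(math.dist(codebook[two_best_units(codebook, x)[0]], x)
                for x in data)
    return total / len(data)


def topographic_error(codebook, data):
    """Share of samples whose two best units are not adjacent on the grid."""
    errors = 0
    for x in data:
        (r1, c1), (r2, c2) = two_best_units(codebook, x)
        if abs(r1 - r2) + abs(c1 - c2) != 1:  # assumed 4-neighbourhood
            errors += 1
    return errors / len(data)


# Hypothetical 1x3 map that is deliberately folded: units (0, 0) and (0, 2)
# have similar weights although they are not neighbours on the grid.
codebook = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [0.1]}
data = [[0.04], [0.9]]

qe = quantization_error(codebook, data)
te = topographic_error(codebook, data)
print(round(qe, 3), te)
```

A low QE means the units represent the data closely, and a low TE means that units which are close in the input space are also close on the map.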

  13. Conclusion – future work
     • Our effort in future work:
       - ANN application in prosody modelling: we want to apply results from the described experiments with emotional speech to the improvement of synthetic speech naturalness
       - ANN application in children's disordered speech analysis (developmental dysphasia)

  14. Conclusion – future work
     • These different application domains influence the database creation.
     • Multiword sentences are more acceptable for prosody modelling.
     • One-word sentences are suitable for the analysis of children's disordered speech.
     • WHY? Often a speech malfunction is manifested in an inability to pronounce whole sentences.

  15. The End. Thank you for your attention.
