Modelling Personality Features by Changing Prosody in Synthetic Speech
Jürgen Trouvain (1, 2), Sarah Schmidt (3), Marc Schröder (4), Michael Schmitz (3) & Bill Barry (2)
(1) Phonetik-Büro Trouvain, Saarbrücken
(2) Institute of Phonetics, Saarland University
(3) Institute of Computer Science, Saarland University
(4) DFKI GmbH, Saarbrücken
Dimensions of human personality Five factor model:
Features of personality in synthetic speech
• Nass & Lee (2001)
• "introverted~extroverted" (among others)
• manipulated parameters in synthetic speech:
  • F0 range
  • F0 mean
  • tempo
• listeners perceive degree of introversion as predicted
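The three manipulated parameters (F0 mean, F0 range, tempo) can be illustrated with a minimal sketch. It is not the procedure used by Nass & Lee or by the authors. The function names, the linear scaling in Hz, and the convention that unvoiced frames are stored as 0 are all assumptions made for illustration.

```python
from statistics import mean


def reshape_f0(f0_hz, mean_shift=1.0, range_scale=1.0):
    """Return a copy of an F0 contour with its mean and range rescaled.

    Assumptions (hypothetical, for illustration only):
    - f0_hz is a list of F0 values in Hz, one per frame
    - unvoiced frames are stored as 0 and are left untouched
    - scaling is linear in Hz around the mean of the voiced frames
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)
    old_mean = mean(voiced)
    new_mean = old_mean * mean_shift  # raise or lower the overall pitch level
    return [
        new_mean + (f - old_mean) * range_scale if f > 0 else 0.0
        for f in f0_hz
    ]


def scale_durations(durations_ms, tempo=1.0):
    """Speed up (tempo > 1) or slow down (tempo < 1) segment durations."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return [d / tempo for d in durations_ms]
```

A wider range with a higher mean and a faster tempo would correspond to `range_scale > 1`, `mean_shift > 1` and `tempo > 1`. Which settings listeners associate with which trait is an empirical question that the sketch does not answer.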
Dimensions of brand personality Aaker (1997)
Prosody of brand personality: findings of possible correlates in literature
Synthetic speech
• MARY speech synthesis (mary.dfki.de)
• two voices
  • male voice (Mbrola de6)
  • female voice (Mbrola de7)
• one utterance
  • "Hallo, ich bin Produkt XY. Ich möchte mich kurz vorstellen. Ich werde nun meine Eigenschaften erläutern." (English: "Hello, I am product XY. I would like to introduce myself briefly. I will now explain my features.")
Parametrisation of prosody * default rather slow
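One common way to hand prosody settings to a synthesiser is prosody markup wrapped around the text. The sketch below builds an SSML-style `prosody` element. It is an assumption that markup of this form matches what was fed to MARY in the study, and the function name and default values are hypothetical. The accepted value syntax (percentages, semitones, keywords) differs between engines and SSML versions.

```python
import xml.etree.ElementTree as ET


def prosody_markup(text, pitch="+0%", pitch_range="+0%", rate="+0%"):
    """Wrap text in an SSML-style <prosody> element.

    Hypothetical helper: attribute values are passed through unchanged,
    so the caller must use whatever syntax the target engine accepts.
    """
    element = ET.Element(
        "prosody",
        {"pitch": pitch, "range": pitch_range, "rate": rate},
    )
    element.text = text  # ElementTree escapes &, < and > on serialisation
    return ET.tostring(element, encoding="unicode")
```

Calling `prosody_markup("Hallo", pitch="+10%", pitch_range="+20%", rate="-5%")` yields a single `prosody` element with the three attributes set. The concrete values per personality dimension are not recoverable from this slide and are not filled in here.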
Listening test
Schmidt (2005)
• judging on a scale from 1 ("does not fit at all") to 5 ("fits very well")
• 36 native speakers of German
• online test
Judgements female voice
[figure not reproduced] Scale: 1 = "does not fit at all", 5 = "fits very well". Significance markers: ** = p < 0.01; * = p < 0.05; (*) = p < 0.06
Judgements male voice
[figure not reproduced] Scale: 1 = "does not fit at all", 5 = "fits very well". Significance markers: ** = p < 0.01; * = p < 0.05; (*) = p < 0.06
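The significance markers compare ratings of a baseline version with ratings of a modified version. The slides do not name the statistical test that was used, so the sketch below swaps in a paired sign-flip permutation test as one plausible option for ratings from the same listeners. The function names, the Monte Carlo resampling and the fixed seed are assumptions. Only the three thresholds are taken from the legend.

```python
import random


def paired_permutation_p(baseline, modified, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired rating differences.

    Not the authors' test (which the slides do not specify); a sketch.
    baseline[i] and modified[i] must be ratings from the same listener.
    """
    if len(baseline) != len(modified):
        raise ValueError("paired samples must have equal length")
    diffs = [m - b for b, m in zip(baseline, modified)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis each difference is equally likely
        # to have either sign, so flip signs at random.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    # Add-one correction: a Monte Carlo estimate should never be exactly 0.
    return (hits + 1) / (n_resamples + 1)


def stars(p):
    """Map a p-value to the markers used in the legend."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.06:
        return "(*)"
    return ""
```

With 36 listeners, `baseline` and `modified` would each hold 36 ratings between 1 and 5 for one personality dimension and one voice, and `stars(paired_permutation_p(baseline, modified))` would give the marker for that comparison.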
Summary
• tendency for statistically significant differences
  • between baseline and models
  • between baseline and best versions
• different preferences for different voices
  • "excited": 3.4 (male) vs. 4.1 (female)
  • "rugged": 4.1 (male) vs. 3.3 (female)
• improved default settings for synthesis
  • male: "sophisticated" model
  • female: "sincere" model
Conclusions
• modelling personality in synthesis is possible
• more research needed, e.g. with respect to "excited" (also important for emotional synthesis)
• parametric synthesis vs. unit selection
• applications:
  • talking objects
  • speech prostheses for voice-handicapped
  • tuning of a synthetic corporate voice
Outlook www.icphs2007.de