The role of prosody in dialect synthesis and authentication
Kyuchul Yoon
Division of English, Kyungnam University
Spring 2008 Joint Conference of KSPS & KASS
Goals
• Synthesize Masan utterances from matching Seoul utterances by prosody cloning
• Examine the role of prosody in the authentication of synthetic Masan utterances (listening experiment)
Background
• Differences among dialects
  • Segmental differences
    • Fricative differences in the time domain (Lee, 2002)
      • Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives
    • Fricative differences in the frequency domain (Kim et al., 2002)
      • The low cutoff frequency of Kyungsang fricatives was higher than that of Cholla fricatives (> 1,000 Hz)
  • Non-segmental or prosodic differences
    • Intonation or fundamental frequency (F0) contour difference
    • Intensity contour difference
    • Segment durational difference
    • Voice quality difference
Synthesis
• Simulating (by prosody cloning) Masan dialect from Seoul dialect
• The simulated Masan utterances will have
  • the speech segments of Seoul dialect
  • the prosody of Masan dialect
    • F0 contour
    • Intensity contour
    • Segmental duration
Evaluation
• Through a listening experiment
• Stimuli consist of
  • #1. Authentic, but synthetic, Masan utterance
  • #2. Seoul utterance with Masan segmental durations (D)
  • #3. Seoul utterance with Masan F0 contour (F)
  • #4. Seoul utterance with Masan intensity contour (I)
  • #5. Seoul utterance with Masan durations and F0 contour (D+F)
  • #6. Seoul utterance with Masan durations and intensity contour (D+I)
  • #7. Seoul utterance with Masan F0 contour and intensity contour (F+I)
  • #8. Seoul utterance with Masan durations, F0 contour and intensity contour (D+F+I)
• Sentences
  • (1) 동대구에 볼 일이 없습니다. ("I have no business in Dongdaegu.")
  • (2) 바다에 보물섬이 없다 ("There is no treasure island in the sea.")
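The eight stimulus types are the Masan utterance plus every non-empty combination of the three cloned features. A minimal sketch of that enumeration follows; the function name `stimulus_conditions` and the label "Masan" for stimulus #1 are illustrative choices, not names from the talk.

```python
from itertools import combinations

# Prosodic features that can be cloned from the Masan utterance:
# D = segmental durations, F = F0 contour, I = intensity contour.
FEATURES = ["D", "F", "I"]


def stimulus_conditions():
    """Return the stimulus labels for one sentence, in slide order #1 to #8."""
    labels = ["Masan"]  # stimulus #1: the Masan utterance itself
    # Every non-empty subset of the features, smallest subsets first.
    for size in range(1, len(FEATURES) + 1):
        for subset in combinations(FEATURES, size):
            labels.append("+".join(subset))
    return labels


conditions = stimulus_conditions()
print(conditions)
```

With two sentences, the eight labels per sentence give the 16 stimuli used in the listening experiment.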
Prosody transfer (PSOLA algorithm)
• Three aspects of the prosody
  • Fundamental frequency (F0) contour
  • Intensity contour
  • Segmental durations
• Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990)
  • Implemented in Praat (Boersma, 2005)
  • Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2007)
Prosody transfer (PSOLA algorithm)
• Procedures for full prosody transfer
  • Align segments between Masan and Seoul utterances
  • Make the segment durations of the two identical
  • Make the two F0 contours identical
  • Make the two intensity contours identical
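Prosody transfer (PSOLA algorithm)
• Align segments between Masan and Seoul utterances
• Make the segment durations of the two utterances identical
[Figure: the segments ㅂ ㅏ ㄹ ㅏ ㅁ of "…바람…" ("wind") in the Masan and Seoul utterances, aligned one to one; each Seoul segment is stretched or shrunk to the duration of its Masan counterpart]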
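The stretch/shrink step can be sketched as a per-segment scaling factor: target duration divided by source duration. The function `duration_factors` and the segment boundaries below are hypothetical illustrations, not the author's script or measured data.

```python
def duration_factors(source_segments, target_segments):
    """Return (label, factor) pairs for time-scaling each source segment.

    Both arguments are lists of (label, start_s, end_s) tuples that are
    already aligned one to one. A factor above 1 means the source segment
    must be stretched; below 1 means it must be shrunk.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("segment lists must be aligned one to one")
    factors = []
    for (s_label, s_start, s_end), (t_label, t_start, t_end) in zip(
        source_segments, target_segments
    ):
        if s_label != t_label:
            raise ValueError(f"label mismatch: {s_label} vs {t_label}")
        factors.append((s_label, (t_end - t_start) / (s_end - s_start)))
    return factors


# Hypothetical boundaries in seconds for the word 바람; not measured data.
seoul = [("ㅂ", 0.00, 0.05), ("ㅏ", 0.05, 0.15), ("ㄹ", 0.15, 0.20),
         ("ㅏ", 0.20, 0.30), ("ㅁ", 0.30, 0.40)]
masan = [("ㅂ", 0.00, 0.04), ("ㅏ", 0.04, 0.19), ("ㄹ", 0.19, 0.23),
         ("ㅏ", 0.23, 0.31), ("ㅁ", 0.31, 0.43)]

factors = duration_factors(seoul, masan)
for label, factor in factors:
    print(label, round(factor, 2))
```

In this made-up example the first vowel is stretched (factor about 1.5) while the second is shrunk (about 0.8), which is the kind of segment-by-segment adjustment the figure depicts.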
Prosody transfer (PSOLA algorithm)
• Make the two F0 contours identical
[Figure: Masan F0 and Seoul F0 contours over the aligned segments ㅂ ㅏ ㄹ ㅏ ㅁ of the Masan and Seoul utterances]
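When the F0 contour is transferred without equalizing durations first (condition F), the Masan F0 points have to be placed on the Seoul time axis. One way to sketch this, assuming linear time-warping within each aligned segment, is shown below; `warp_time`, `transfer_f0`, and all values are hypothetical and not taken from the author's script.

```python
def warp_time(t, from_segments, to_segments):
    """Map time t from one utterance onto the other.

    Segments are (start_s, end_s) pairs aligned one to one; within a
    segment the relative position of t is preserved (linear warping).
    """
    for (f_start, f_end), (t_start, t_end) in zip(from_segments, to_segments):
        if f_start <= t <= f_end:
            rel = (t - f_start) / (f_end - f_start)
            return t_start + rel * (t_end - t_start)
    raise ValueError(f"time {t} lies outside the aligned segments")


def transfer_f0(masan_f0, masan_segments, seoul_segments):
    """Re-time Masan (time_s, hz) F0 points onto the Seoul utterance."""
    return [
        (warp_time(t, masan_segments, seoul_segments), hz)
        for t, hz in masan_f0
    ]


# Hypothetical two-segment example; not measured data.
masan_segments = [(0.0, 0.2), (0.2, 0.5)]
seoul_segments = [(0.0, 0.1), (0.1, 0.5)]
masan_f0 = [(0.1, 120.0), (0.35, 180.0)]

seoul_f0 = transfer_f0(masan_f0, masan_segments, seoul_segments)
print(seoul_f0)
```

Each Masan F0 point keeps its frequency and its relative position inside its segment, so the midpoint of a Masan segment lands on the midpoint of the matching Seoul segment.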
Prosody transfer (PSOLA algorithm)
• Make the two intensity contours identical
[Figure: Masan intensity and Seoul intensity contours over the aligned segments ㅂ ㅏ ㄹ ㅏ ㅁ of the Masan and Seoul utterances]
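Matching intensity contours amounts to scaling the Seoul signal by the dB difference between the two contours at each point in time. The sketch below assumes both contours are sampled at the same (already aligned) time points; `gain_factors` and the dB values are hypothetical.

```python
def gain_factors(seoul_db, masan_db):
    """Return the amplitude factor per frame that moves Seoul to Masan level.

    A difference of d dB corresponds to an amplitude ratio of 10 ** (d / 20).
    """
    if len(seoul_db) != len(masan_db):
        raise ValueError("contours must be sampled at the same time points")
    return [10 ** ((m - s) / 20) for s, m in zip(seoul_db, masan_db)]


# Hypothetical intensity values in dB; not measured data.
seoul_db = [60.0, 66.0, 70.0]
masan_db = [66.0, 66.0, 50.0]

gains = gain_factors(seoul_db, masan_db)
for s, m, g in zip(seoul_db, masan_db, gains):
    print(f"{s:.0f} dB -> {m:.0f} dB: multiply amplitude by {g:.3f}")
```

A frame that is already at the target level gets a factor of 1, a +6 dB change roughly doubles the amplitude, and a -20 dB change reduces it to one tenth.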
Listening experiment
• 16 stimuli (8 + 8)
• Presented to 13 Masan/Changwon listeners
• Rated on a scale of 1 (worst) to 10 (best)
• Run with the Praat ExperimentMFC object
• Each stimulus could be replayed up to 10 times
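Results & Conclusion
[Figure: histogram of listener responses]
Results & Conclusion
[Figure: F0 contour transfer; axis: listener responses, 1 to 10]
Results & Conclusion
[Figure: listener responses by stimulus type; axis labels in the order shown: Masan, FI, F, DFI, DF, D, DI, I; group label: Seoul utterances with Masan prosody]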
Results & Conclusion
• Main effects of
  • Segmental durations: F(1,12) = 11.53, p = 0.005
  • F0 contour: F(1,12) = 141.12, p = 0.00000005
• Regression analysis
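The degrees of freedom (1, 12) are what a one-degree-of-freedom within-listener contrast gives with 13 listeners. As a rough sketch of how such an F value arises, and not a reproduction of the analysis behind the slide, the code below computes F as the squared paired t statistic on each listener's mean rating with and without a given feature. The function `main_effect_f`, the grouping of conditions, and the ratings are all hypothetical.

```python
from statistics import mean, variance


def main_effect_f(with_feature, without_feature):
    """Return (F, (df1, df2)) for a one-df within-listener contrast.

    Each list holds one mean rating per listener. For paired differences d,
    F = n * mean(d) ** 2 / var(d), which equals the squared paired t value.
    """
    diffs = [w - wo for w, wo in zip(with_feature, without_feature)]
    n = len(diffs)
    f_value = n * mean(diffs) ** 2 / variance(diffs)
    return f_value, (1, n - 1)


# Hypothetical mean ratings for 13 listeners; not the experimental data.
with_f0 = [6, 7, 5, 6, 8, 7, 6, 5, 7, 6, 6, 7, 5]
without_f0 = [3, 4, 3, 2, 4, 5, 3, 3, 4, 2, 3, 4, 3]

f_value, df = main_effect_f(with_f0, without_f0)
print(f"F({df[0]},{df[1]}) = {f_value:.2f}")
```

A larger and more consistent per-listener difference yields a larger F, which is in line with the F0 contour showing a far stronger effect than segmental durations on the slide.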
Results & Conclusion
• Prosody cloning not sufficient for dialect simulation
  • (Sub)segmental differences may be at work
  • Quality of synthetic stimuli
• F0 contour transfer (from Masan to Seoul)
  • Most influential in shifting perception from Seoul to Masan utterances
References
[1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol.9/3, pp.223-235, 2002.
[2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol.9/2, pp.25-47, 2002.
[3] Kyuchul Yoon, “Imposing native speakers’ prosody on non-native speakers’ utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature, Vol.25/4, pp.197-215, 2007.
[4] E. Moulines and F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol.9/5-6, 1990.
[5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol.5/9-10, pp.341-345, 2005.