Current Interests 2007~2008 ( Unfinished papers & Premature ideas )


Presentation Transcript


  1. Current Interests 2007~2008 (Unfinished papers & Premature ideas) • Identifying frication & aspiration noise in the frequency domain: The case of Korean alveolar lax fricatives • The role of prosody in dialect synthesis and authentication • Synthesis & evaluation of prosodically exaggerated utterances • Determining the weights of prosodic components in prosody evaluation • Difference database of prosodic features for automatic prosody evaluation • Transforming Korean alveolar lax fricatives into tense • Gender transformation of utterances

  2. 1. Identifying frication & aspiration noise in the frequency domain: The case of Korean alveolar lax fricatives Kyuchul Yoon School of English Language & Literature Yeungnam University Spring 2008 Joint Conference of KSPS & KASS

  3. Korean lax alveolar fricatives • Two different types of noise

  4. Algorithm

  5. Algorithm • Change of energy distribution in the frequency domain over time • Energy distribution on a frame-by-frame basis (e.g. 5 msec) • Sums of band energy on either side of the reference (e.g. low cutoff) frequency • criterionValue variable determines the boundary • Assumption: Same criterionValue for the same speaker
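
The steps on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not the Praat script shown in the talk: the function names, the naive DFT, and the rule "the boundary is the first frame whose low-band share of energy reaches `criterion_value`" are all assumptions made for the example. The slide only states that band energies are summed on either side of a reference frequency, frame by frame, and that criterionValue determines the boundary.

```python
import cmath
import math

def band_energies(frame, sample_rate, ref_hz):
    """Return (low, high) spectral energy of one frame, split at ref_hz (naive DFT)."""
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energy = abs(coeff) ** 2
        if k * sample_rate / n < ref_hz:
            low += energy
        else:
            high += energy
    return low, high

def find_boundary(samples, sample_rate, ref_hz, criterion_value, frame_sec=0.005):
    """Start time (s) of the first frame whose low-band energy share
    reaches criterion_value, or None if no frame does (assumed rule)."""
    size = int(round(frame_sec * sample_rate))
    for start in range(0, len(samples) - size + 1, size):
        low, high = band_energies(samples[start:start + size], sample_rate, ref_hz)
        total = low + high
        if total > 0 and low / total >= criterion_value:
            return start / sample_rate
    return None
```

Under the same-speaker assumption from the slide, one would pick `criterion_value` once per speaker and reuse it across that speaker's tokens.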

  6. How Praat script works See Demo

  7. How Praat script works

  8. Experiment <Table 1> The list of words used in the experiment. The words marked with * were also used in the repeated series experiment. The numbers in parentheses represent the number of repetitions during the recording.

  9. Results & Conclusion Human 1 vs. Script 1 Repeated <Histogram 1> The histogram of differences between the manually inserted and automatically inserted boundaries for the repeated series experiment. X-axis in msec.

  10. Results & Conclusion The outlier from <Histogram 1>. The difference was 6.4 msec. The m and a represent manual and automatic respectively.

  11. Results & Conclusion The same-speaker-same-criterionValue assumption holds! Human 1 vs. Script 1 Non-repeated Human 2 vs. Script 2 Non-repeated <Histogram 2> The histogram of differences between the manually inserted and automatically inserted boundaries for the non-repeated series experiment with 53 words. X-axis in msec.

  12. Results & Conclusion Human 1 vs. Human 2 Non-repeated Script 1 vs. Script 2 Non-repeated <Histogram 3> The histogram of differences between the two phoneticians and the two automated scripts for the non-repeated series experiment with 53 words. X-axis in msec.

  13. Results & Conclusion <Table 2> The summary of the means and the standard deviations of the differences from the two experiments. The numbers are given in msec.

  14. Results & Conclusion The automated identification of the boundary (labeled auto) between /s/ and /h/ in the phrase Miss Henry produced by a female native speaker of English. The f and v represent the beginnings of /s/ and the vowel following /h/.

  15. References [1] Boersma, Paul. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp.341-345. [2] Yoon, Kyuchul. 2002. A production and perception experiment of Korean alveolar fricatives. Speech Sciences. 9(3). pp.169-184. [3] Yoon, Kyuchul. 2005. Durational correlates of prosodic categories: The case of two Korean voiceless coronal fricatives. Speech Sciences. 12(1). pp.89-105.

  16. 2. The role of prosody in dialect synthesis and authentication Kyuchul Yoon School of English Language & Literature Yeungnam University Spring 2008 Joint Conference of KSPS & KASS

  17. Goals • Synthesize Masan utterances from matching Seoul utterances by prosody cloning • Examine the role of prosody in the authentication of synthetic Masan utterances (Listening experiment)

  18. Background • Differences among dialects • Segmental differences • Fricative differences in the time domain (Lee, 2002) • Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives • Fricative differences in the frequency domain (Kim et al., 2002) • The low cutoff frequency of Kyungsang fricatives was higher than that of Cholla fricatives (> 1,000 Hz) • Non-segmental or prosodic differences • Intonation or fundamental frequency (F0) contour difference • Intensity contour difference • Segment durational difference • Voice quality difference

  19. Synthesis • Simulating (by prosody cloning) Masan dialect from Seoul dialect • The simulated Masan utterances will have • the speech segments of Seoul dialect • the prosody of Masan dialect • F0 contour • Intensity contour • Segmental duration

  20. Evaluation • Through a listening experiment • Stimuli consist of • #1. Authentic, but synthetic, Masan utterance • #2. Seoul utterance with Masan segmental durations (D) • #3. Seoul utterance with Masan F0 contour (F) • #4. Seoul utterance with Masan intensity contour (I) • #5. Seoul utterance with Masan durations and F0 contour (D+F) • #6. Seoul utterance with Masan durations and intensity contour (D+I) • #7. Seoul utterance with Masan F0 contour and intensity contour (F+I) • #8. Seoul utterance with Masan durations, F0 contour and intensity contour (D+F+I) • Sentences: (1) 동대구에 볼 일이 없습니다. (“I have no business in Dongdaegu.”) (2) 바다에 보물섬이 없다 (“There is no treasure island in the sea.”)
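
Stimuli #2 to #8 are the non-empty combinations of the three Masan prosodic features imposed on a Seoul utterance. A small Python sketch makes the design explicit; the names `FEATURES` and `stimulus_conditions` are hypothetical and appear nowhere in the talk.

```python
from itertools import combinations

# D = segmental durations, F = F0 contour, I = intensity contour
FEATURES = ["D", "F", "I"]

def stimulus_conditions():
    """List every non-empty subset of Masan features, smallest subsets first."""
    conditions = []
    for size in range(1, len(FEATURES) + 1):
        for combo in combinations(FEATURES, size):
            conditions.append("+".join(combo))
    return conditions
```

Together with the authentic (resynthesized) Masan utterance, the seven conditions give eight stimuli per sentence.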

  21. Prosody transfer (PSOLA algorithm) • Three aspects of the prosody • Fundamental frequency (F0) contour • Intensity contour • Segmental durations • Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990) • Implemented in Praat (Boersma, 2005) • Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2007)

  22. Prosody transfer (PSOLA algorithm) • Procedures for full prosody transfer • Align segments between Masan and Seoul utterances • Make the segment durations of the two identical • Make the two F0 contours identical • Make the two intensity contours identical
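
The duration step of the procedure above amounts to computing one stretch or shrink factor per aligned segment. The sketch below shows only that arithmetic, under the assumption that both utterances are already labeled with matching segments given as `(label, start_sec, end_sec)` tuples; the function name is hypothetical, and the actual resynthesis in the talk is done with PSOLA in Praat, which this code does not reproduce.

```python
def duration_scale_factors(seoul_segments, masan_segments):
    """Per-segment factors that stretch (>1) or shrink (<1) Seoul segments
    to the durations of the matching Masan segments.

    Each argument is a list of (label, start_sec, end_sec) tuples in order.
    """
    if len(seoul_segments) != len(masan_segments):
        raise ValueError("utterances must be aligned segment by segment")
    factors = []
    for (s_label, s_start, s_end), (m_label, m_start, m_end) in zip(
            seoul_segments, masan_segments):
        if s_label != m_label:
            raise ValueError(f"segment mismatch: {s_label} vs {m_label}")
        factors.append((s_label, (m_end - m_start) / (s_end - s_start)))
    return factors
```

Once the durations match, the F0 and intensity contours of the two utterances share a time axis, so they can be copied point by point in the remaining steps.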

  23. Prosody transfer (PSOLA algorithm) • Align segments between Masan and Seoul utterances • Make the segment durations of the two utterances identical [Figure: the segments ㅂ ㅏ ㄹ ㅏ ㅁ of “…바람…” (baram) in the Masan and Seoul utterances, with “stretch” and “shrink” labels between the two rows]

  24. Prosody transfer (PSOLA algorithm) • Make the two F0 contours identical [Figure: Masan F0 and Seoul F0 contours over the segments ㅂ ㅏ ㄹ ㅏ ㅁ]

  25. Prosody transfer (PSOLA algorithm) • Make the two intensity contours identical [Figure: Masan intensity and Seoul intensity contours over the segments ㅂ ㅏ ㄹ ㅏ ㅁ]

  26. Synthetic (simulated) Masan stimulus

  27. Synthetic authentic Masan stimulus

  28. Listening experiment • 16 stimuli (8 + 8) • Presented to 13 Masan/Changwon listeners • On a scale of 1 (worst) to 10 (best) • Used Praat ExperimentMFC object • Allowed repetition of stimulus: up to 10 times

  29. Listening experiment See Demo

  30. Results & Conclusion Histogram of listener responses

  31. Results & Conclusion • F0 contour transfer [Figure: listener responses, axis from 1 to 10]

  32. Results & Conclusion [Figure: stimuli in the order Masan, FI, F, DFI, DF, D, DI, I; label: Seoul utterances with Masan prosody]

  33. Results & Conclusion • Main effects of • Segmental durations; F(1,12)=11.53, p=0.005 • F0 contour; F(1,12)=141.12, p=0.00000005 • Regression analysis

  34. Results & Conclusion • Prosody cloning not sufficient for dialect simulation • (Sub)Segmental differences may be at work • Quality of synthetic stimuli • F0 contour transfer (from Masan to Seoul) • Most influential on shifting perception from Seoul to Masan utterances

  35. References [1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol.9(3), pp.223-235, 2002. [2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol.9(2), pp.25-47, 2002. [3] Kyuchul Yoon, “Imposing native speakers’ prosody on non-native speakers’ utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature, Vol.25(4), pp.197-215, 2007. [4] E. Moulines and F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol.9(5-6), 1990. [5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol.5(9/10), pp.341-345, 2005.

  36. 3. Synthesis & evaluation of prosodically exaggerated utterances: A preliminary study Kyuchul Yoon School of English Language & Literature Yeungnam University Spring 2008 Joint Conference of KSPS & KASS

  37. Contents • Synthesis & evaluation of human utterances with exaggerated prosody • Synthesis of exaggerated prosody • Useful for presenting native utterances to students • The definition of prosody “exaggeration” • The algorithm • Evaluation of exaggerated prosody • Useful for evaluating learner utterances • The algorithm & an experiment

  38. Teaching & evaluating prosody • Teaching language prosody • The need for “exaggeration” of native utterances • How to define “exaggeration” • Evaluating language prosody • Given the native version of an utterance, evaluate learner’s atypical prosody • How to measure the differences between the native and learner utterances

  39. Exaggerating native prosody • Exaggeration of the F0 contour • One way would be to make the pitch peaks/valleys higher/lower • Exaggeration of the intensity contour • One way would be to manipulate the intensity contour of the pitch peaks (or valleys) • Exaggeration of the segmental durations • One way would be to manipulate the segmental durations of the pitch peaks (or valleys) See Demo

  40. Exaggerating native prosody F0 The fundamental frequency (F0) contour of an utterance Marianna!.

  41. Exaggerating native prosody Intensity The intensity contour of an utterance Marianna!.

  42. Exaggerating native prosody Duration The segmental durations of an utterance Marianna! before and after the exaggeration.

  43. Algorithm: prosody exaggeration • Definition of prosody exaggeration • F0 contour • Make pitch peaks/valleys higher/lower in Hz values • Intensity contour • Make pitch peaks higher in dB values • Segmental durations • Make pitch peaks longer in time values
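
One simple way to realize the F0 part of this definition is to scale each F0 value's distance from the utterance mean, so that peaks move up and valleys move down. The Python sketch below is an assumption for illustration only: the slides do not say how peaks and valleys are located or scaled, and `exaggerate_f0` and its `factor` parameter are hypothetical names.

```python
def exaggerate_f0(f0_hz, factor=1.5):
    """Scale each voiced F0 value's distance from the utterance mean by `factor`.

    Unvoiced frames are passed as None and left untouched.
    """
    voiced = [v for v in f0_hz if v is not None]
    if not voiced:
        return list(f0_hz)
    mean = sum(voiced) / len(voiced)
    return [None if v is None else mean + factor * (v - mean) for v in f0_hz]
```

With `factor` above 1, values above the mean rise and values below it fall; a factor of 1 returns the contour unchanged. The intensity and duration manipulations would need the locations of the pitch peaks, which this sketch does not attempt to find.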

  44. Algorithm: prosody exaggeration F0

  45. Algorithm: prosody exaggeration Intensity

  46. Algorithm: prosody exaggeration Durations

  47. How Praat script works

  48. How Praat script works F0 Intensity Durations

  49. How Praat script works Original F0 Durations F0 Durations Intensity

  50. Evaluating learner prosody • Assumes the existence of the native version • Evaluates the learner versions • Evaluation of the F0 & intensity contours • Is preceded by duration manipulation: the durations of the matching segments of the two utterances are made identical [3] • Is preceded by F0/intensity normalization & F0 smoothing: the mean difference is added to or subtracted from the learner utterance • Is followed by point-to-point comparison of pitch/intensity • Evaluation of segmental durations • Done without any duration manipulation, by segment-to-segment comparison • Evaluation measure: Euclidean distance metric
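
The contour comparison described on this slide can be sketched as mean normalization followed by a Euclidean distance. The Python below is a minimal illustration under stated assumptions: both contours are already time-aligned and sampled at the same points (the duration step), smoothing is omitted, and the function name `contour_distance` is hypothetical.

```python
import math

def contour_distance(native, learner):
    """Euclidean distance between two equal-length contours after shifting
    the learner contour by the difference of the two means."""
    if len(native) != len(learner) or not native:
        raise ValueError("contours must be non-empty and sampled at the same points")
    offset = sum(native) / len(native) - sum(learner) / len(learner)
    return math.sqrt(sum((n - (l + offset)) ** 2 for n, l in zip(native, learner)))
```

Removing the mean difference first means that a learner who speaks in a higher or lower register than the native speaker is not penalized for it; only the shape of the contour contributes to the distance. For segmental durations, the same distance could be taken over the two lists of segment durations without the shift.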
