This preliminary study explores how to synthesize and evaluate human utterances with exaggerated prosody. Exaggerating the F0 contour, the intensity contour, and the segmental durations of native utterances can support the teaching of language prosody, while comparing learner utterances against a native version on the same three dimensions yields evaluation measures. The slides cover the definition of prosody exaggeration, the Praat script operations, the evaluation metric, a pilot experiment, and the limitations and potential applications of the approach.
Synthesis & evaluation of prosodically exaggerated utterances: A preliminary study
Kyuchul Yoon, Division of English, Kyungnam University
Spring 2008 Joint Conference of KSPS & KASS
Contents • Synthesis & evaluation of human utterances with exaggerated prosody • Synthesis of exaggerated prosody • Useful for native utterances • The definition of prosody “exaggeration” • The algorithm • Evaluation of exaggerated prosody • Useful for evaluating learner utterances • The algorithm & an experiment
Teaching & evaluating prosody • Teaching language prosody • The need for “exaggeration” of native utterances • How to define “exaggeration” • Evaluating language prosody • Given the native version of an utterance, evaluate learner’s utterances w/ atypical prosody • How to measure the differences btw/ the native and learner utterances
Exaggerating native prosody • Exaggeration of the F0 contour • One way would be to make the pitch peaks/valleys higher/lower • Exaggeration of the intensity contour • One way would be to manipulate the intensity contour of the pitch peaks/valleys • Exaggeration of the segmental durations • One way would be to manipulate the segmental durations of the pitch peaks/valleys
Exaggerating native prosody: F0 [Figure: the fundamental frequency (F0) contour of the utterance "Marianna!"]
Exaggerating native prosody: Intensity [Figure: the intensity contour of the utterance "Marianna!"]
Exaggerating native prosody: Duration [Figure: the segmental durations of the utterance "Marianna!" before and after the exaggeration]
Algorithm: prosody exaggeration • Definition of prosody exaggeration • F0 contour • Make pitch peaks/valleys higher/lower in Hz • Intensity contour • Make the intensity at pitch peaks higher in dB • Segmental durations • Make the segments at pitch peaks longer in time
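The F0 part of the definition above can be illustrated with a minimal Python sketch. It is not the Praat script used in the study: the function name `exaggerate_f0`, the `shift_hz` parameter, the three-point test for local peaks and valleys, and the sample contour are all assumptions made for illustration. The slides do not state how peaks are located or how far they are moved, and resynthesis of the audio is not shown.

```python
def exaggerate_f0(f0_hz, shift_hz=20.0):
    """Return a copy of an F0 contour with local peaks raised and valleys lowered.

    f0_hz    -- list of pitch values in Hz, one per pitch point (hypothetical input)
    shift_hz -- amount of exaggeration in Hz (hypothetical parameter)
    """
    out = list(f0_hz)
    for i in range(1, len(f0_hz) - 1):
        prev, cur, nxt = f0_hz[i - 1], f0_hz[i], f0_hz[i + 1]
        if cur > prev and cur > nxt:
            # Local pitch peak: make it higher.
            out[i] = cur + shift_hz
        elif cur < prev and cur < nxt:
            # Local pitch valley: make it lower.
            out[i] = cur - shift_hz
    return out


# Made-up contour with two peaks and one valley.
contour = [180.0, 220.0, 200.0, 160.0, 190.0, 170.0]
exaggerated = exaggerate_f0(contour, shift_hz=20.0)
print(exaggerated)
```

The same pattern could be applied to intensity (in dB) or to segment durations (as a multiplying factor) at the positions of the pitch peaks, although the slides leave those details open.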
Algorithm: prosody exaggeration [Figure: intensity]
Algorithm: prosody exaggeration [Figure: durations]
How Praat script works [Figure; panel labels: F0, Intensity, Durations]
How Praat script works [Figure; panel labels: Original, F0, Durations, Intensity]
Evaluating learner prosody • Assumes the existence of the native version • Evaluates the learner versions • Evaluation of the F0 & intensity contours • Is preceded by duration manipulation: • The durations of the matching segments of the two utterances are made identical [3] • Is preceded by F0/intensity normalization & F0 smoothing • The mean difference between the two utterances is added to or subtracted from the learner utterance • Is followed by pitch/intensity point-to-point comparison • Evaluation of segmental durations • Done without any duration manipulation, by segment-to-segment comparison • Evaluation measure: Euclidean distance metric
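The normalization and point-to-point comparison steps listed above might look roughly like the following Python sketch. It assumes the two contours have already been time-aligned by the duration manipulation and sampled at the same points, and it omits F0 smoothing. The function names and the sample values are hypothetical, not taken from the study.

```python
def mean_normalize(learner, native):
    """Shift the learner contour so that its mean equals the native mean.

    The mean difference is added when the learner mean is lower and
    effectively subtracted (a negative shift) when it is higher.
    """
    mean_diff = sum(native) / len(native) - sum(learner) / len(learner)
    return [value + mean_diff for value in learner]


def pointwise_differences(native, learner):
    """Point-to-point differences between two equally long contours."""
    if len(native) != len(learner):
        raise ValueError("contours must have the same number of points")
    return [n - l for n, l in zip(native, learner)]


# Made-up F0 values in Hz at matching time points.
native_f0 = [200.0, 240.0, 180.0, 220.0]
learner_f0 = [150.0, 170.0, 150.0, 170.0]

normalized = mean_normalize(learner_f0, native_f0)
differences = pointwise_differences(native_f0, normalized)
print(normalized, differences)
```

After normalization the two contours share the same mean, so the remaining differences reflect contour shape rather than the overall pitch level of the speaker.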
Algorithm: prosody evaluation Before & after duration manipulation [Figure panels: native, learner before, learner after]
Algorithm: prosody evaluation F0 point-to-point comparison btw/ native and learner [Figure panels: native, learner after]
Algorithm: prosody evaluation Intensity point-to-point comparison btw/ native and learner [Figure panels: native, learner after]
Algorithm: prosody evaluation Duration segment-to-segment comparison btw/ native and learner [Figure panels: native, learner before] Euclidean distance metric as the evaluation measure: for P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-space, d(P, Q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
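As a small worked example of the Euclidean distance measure applied to segment durations, the Python sketch below compares two made-up duration vectors. The values and variable names are illustrative assumptions; `math.dist` is the standard-library Euclidean distance function available from Python 3.8.

```python
import math

# Hypothetical durations (in seconds) of four matching segments.
native_dur = [0.08, 0.12, 0.10, 0.15]
learner_dur = [0.11, 0.16, 0.10, 0.15]

# Segment-to-segment comparison: sqrt of the summed squared differences.
duration_distance = math.dist(native_dur, learner_dur)
print(round(duration_distance, 3))  # about 0.05 s
```

A distance of zero would mean the learner's segment durations match the native ones exactly; larger values indicate a greater overall deviation.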
A pilot experiment [Figure panels: native, learner after] Euclidean distance should be minimal
A pilot experiment [Figure panels: native, learner after] Stimuli: F0: -100 Hz to +100 Hz in 10 Hz steps (21 stimuli); Intensity: -25 dB to +25 dB in 5 dB steps (11 stimuli); Duration: 0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00 times the original (8 stimuli)
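The stimulus counts follow directly from the ranges and step sizes, as this short Python sketch of the parameter grid shows. Only the grid itself is reproduced; the variable names are hypothetical, and how each value was applied to the recordings is not covered here.

```python
# F0 manipulation: -100 Hz to +100 Hz in 10 Hz steps.
f0_shifts_hz = list(range(-100, 101, 10))

# Intensity manipulation: -25 dB to +25 dB in 5 dB steps.
intensity_shifts_db = list(range(-25, 26, 5))

# Duration manipulation: factors applied to the original durations.
duration_factors = [0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00]

print(len(f0_shifts_hz), len(intensity_shifts_db), len(duration_factors))
```

Each grid includes the unmanipulated case (0 Hz, 0 dB, factor 1.00).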
Results & Conclusion • Prosody exaggeration • Can be a tool for teaching language prosody • Can be used to test measures for evaluating prosody • Limitation of the current prosody evaluation • Native utterances should exist to yield measures • TTS systems with advanced prosody models could be helpful • “Weights” of the three separate measures (F0/intensity/duration) need to be determined • Experiments with human evaluators could provide the weights
References
[1] Boersma, Paul. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp. 341-345.
[2] Moulines, E. & F. Charpentier. 1990. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9. pp. 453-467.
[3] Yoon, K. 2007. Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody. Journal of the Modern British & American Language & Literature 25(4). pp. 197-215.