
Building a sentential model for automatic prosody evaluation

This study aims to develop a sentential model for automatic evaluation of prosody in English pronunciation, focusing on the suprasegmental level. The model uses multivariate statistical analysis to compare a target utterance with native speakers' utterances, considering three prosodic aspects: F0, intensity, and durations. Results show a valid separation of groups with different manual scores, indicating the effectiveness of the model.

Presentation Transcript


  1. Building a sentential model for automatic prosody evaluation (Part A). Kyuchul Yoon, School of English Language & Literature, Yeungnam University. 2009.06.19, Korea University

  2. Introduction: English pronunciation evaluation • English pronunciation proficiency evaluation • Ultimate goals: evaluation at both the segmental and the suprasegmental level • Current goal: evaluation at the suprasegmental level

  3. Introduction: English pronunciation evaluation • The goal of the present study: prosody evaluation of a single target utterance produced by a Korean student, given an English target sentence and a sentential model for prosody evaluation

  4. Introduction: Manual vs. automatic • Problems of manual evaluation: what to evaluate, how to evaluate, consistency • Problem of automatic evaluation: how to reflect human knowledge

  5. Introduction: Manual vs. automatic • A possible solution? • Avoid knowledge-based abstraction • Compare a target utterance with native speakers’ utterances • Use multiple utterances for comparison: multiple “good” utterances from native speakers • Adopt raw values • Calculate difference values between the target and the “good” utterances in terms of the three prosodic aspects (F0, intensity, durations) → 3D coordinates

  6. Introduction: How to build the model • Use multivariate statistical analysis: a discriminant analysis • The components of the model (the segmental proficiency scores controlled): the manual prosody evaluation scores (response) and the automatic prosody evaluation scores (factors) • The requirement of the model: a correlation between the two levels, i.e. manual scores vs. automatic scores

  7. Introduction: How to build the model • The manual prosody scores (an ideal case) • The “good” utterance versions (point 5) by many native speakers of English • The utterance versions by Korean students whose prosodic proficiencies are high (point 5), intermediate (point 3), or low (point 1) • On a scale of 1 (worst) to 5 (best)

  8. Introduction How to build the model • The automatic prosody scores • Use of Praat scripts • Comparison between a single target utterance & multiple native speakers’ utterances to yield scores for • The F0 difference • The intensity difference • The duration difference in the form of 3D coordinates (x, y, z) = (F0, Int, Dur) • One utterance yields as many coordinates as the number of “good” native speakers

  9. Introduction How to build the model • Evaluation by comparisons

  10. Introduction: A 3D sentential model for prosody evaluation • A 3D model • 3D axes: F0, intensity, durations; (F0, Int, Dur) coordinates = (x, y, z) • Automatic scores as scatterplot points • Manually evaluated scores group the points
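A minimal sketch of the 3D model described in slide 10: automatic (F0, Int, Dur) difference coordinates plotted as a 3D scatterplot and colored by manual score group. The arrays, group means, and labels below are invented for illustration; only the three axes follow the slides.

```python
# Hypothetical 3D sentential model: each point is one (F0, Int, Dur) difference
# coordinate; the color (group) is the manual prosody score of the utterance
# the coordinate came from. All numbers below are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
score_arrays = {
    5: rng.normal(loc=(300, 80, 150), scale=60, size=(10, 3)),   # near-native
    3: rng.normal(loc=(600, 130, 300), scale=80, size=(10, 3)),  # intermediate
    1: rng.normal(loc=(900, 180, 450), scale=100, size=(10, 3)), # low proficiency
}

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for manual_score, coords in score_arrays.items():
    ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2],
               label=f"manual score {manual_score}")
ax.set_xlabel("F0 difference (x)")
ax.set_ylabel("Intensity difference (y)")
ax.set_zlabel("Duration difference (z)")
ax.legend()
plt.show()
```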

  11. Introduction: A 3D sentential model for prosody evaluation • Validity of the model: sufficient separation of groups with different manual scores • Colors: manual scores • Arrowheads: automatic scores

  12. Methods: Sentential prosody evaluation [7] • Before & after duration manipulation (figure panels: native, learner before, learner after)

  13. Methods: Sentential prosody evaluation [7] • F0: point-to-point comparison between native and learner after normalization (figure panels: native, learner after) • Automatic score: (F0, Int, Dur) = (x, y, z)

  14. Methods: Sentential prosody evaluation [7] • Intensity: point-to-point comparison between native and learner after normalization (figure panels: native, learner after) • Automatic score: (F0, Int, Dur) = (x, y, z)

  15. Methods: Sentential prosody evaluation [7] • Duration: segment-to-segment comparison between native and learner (figure panels: native, learner before) • Automatic score: (F0, Int, Dur) = (x, y, z) • Euclidean distance metric as the evaluation measure: for P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space, d(P, Q) = √((p1 − q1)² + (p2 − q2)² + ... + (pn − qn)²)
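A minimal sketch of how one (F0, Int, Dur) coordinate could be computed for a learner/native pair, assuming the F0 and intensity contours have already been extracted and time-normalized so that a point-to-point comparison is possible, and the segment durations are aligned segment to segment. The helper and key names are hypothetical, not taken from the original Praat scripts [1].

```python
import numpy as np

def euclidean_distance(p, q):
    """Standard Euclidean distance d(P, Q) = sqrt(sum_i (p_i - q_i)^2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

def prosody_coordinate(native, learner):
    """One (F0, Int, Dur) difference coordinate for a learner/native pair.

    `native` and `learner` are dicts with keys 'f0' and 'intensity'
    (equal-length, time-normalized contours) and 'durations' (per-segment
    durations, aligned segment to segment). The exact normalization used
    in the study is not reproduced here.
    """
    x = euclidean_distance(native["f0"], learner["f0"])                 # F0
    y = euclidean_distance(native["intensity"], learner["intensity"])  # intensity
    z = euclidean_distance(native["durations"], learner["durations"])  # duration
    return (x, y, z)
```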

  16. Methods: Manual evaluation of sentential prosody • Manual scores for Set B utterances: “The dancing queen likes only the apple pies”

  17. Methods: Sentential prosody evaluation [7] • A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances • Automatic prosody score for K5.U1 = {(899, 142, 408), (360, 92, 190), …, (716, 178, 183)}
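Continuing the sketch above, the score array for one learner utterance could be assembled by pairing it with each of the model native utterances, mirroring the K5.U1 example on this slide; the function name is hypothetical and the values are only illustrative.

```python
# One learner utterance vs. N model native utterances -> N coordinates,
# e.g. [(899, 142, 408), (360, 92, 190), ..., (716, 178, 183)] for N = 10.
def score_array(learner, native_utterances):
    return [prosody_coordinate(native, learner) for native in native_utterances]
```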

  18. Results: A prosody evaluation model by a Korean phonetician (figure: Korean phonetician’s model)

  19. Results: A prosody evaluation model by a Korean phonetician (figure: Korean phonetician’s model)

  20. Results: A sample prosody evaluation with a discriminant analysis
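The transcript does not show the discriminant analysis itself; the sketch below is a stand-in using scikit-learn's LinearDiscriminantAnalysis, with the manual scores as the response and the automatic (F0, Int, Dur) coordinates as the factors, and a simple averaging of class probabilities over an utterance's score array. It approximates the setup described in slides 6 and 8 rather than reproducing the study's exact analysis.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_prosody_model(coords_by_manual_score):
    """Fit an LDA model on (F0, Int, Dur) coordinates labeled with manual scores.

    coords_by_manual_score: dict mapping a manual score (e.g. 1, 3, 5) to a
    list of (F0, Int, Dur) coordinates from utterances with that score.
    """
    X, y = [], []
    for manual_score, coords in coords_by_manual_score.items():
        X.extend(coords)
        y.extend([manual_score] * len(coords))
    return LinearDiscriminantAnalysis().fit(np.asarray(X), np.asarray(y))

def evaluate_utterance(model, coords):
    """Average class probabilities over an utterance's score array and
    return the predicted manual-score group plus the probabilities."""
    probs = model.predict_proba(np.asarray(coords)).mean(axis=0)
    return model.classes_[int(np.argmax(probs))], probs
```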

  21. Discussion: To make this fully automatic • For manual evaluation of the training model • The number of Korean learners: the more the better • The levels of English proficiency: the more diverse the better (scores 1 through 5) • For automatic evaluation of the trainees • Need automatic segmentation (ASR) • Need to deal with redundant/missing segments

  22. Building a sentential model for automatic evaluation of pronunciation proficiency (Part B): What about segmental evaluation?

  23. Methods: Segmental evaluation by spectral comparison • Sex/age controlled (no normalization was used): adult male (native/Korean) speakers were selected • Spectral comparison: three equally spaced spectral slices were used for each matching segment; a Euclidean distance measure was computed from each pair of matching spectral envelopes • Four coordinates for pronunciation proficiency evaluation: segments, F0, intensity, durations • (w, x, y, z) becomes one element of the score array
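A rough sketch of the segmental comparison described in slide 23, reusing the euclidean_distance helper from the earlier prosody sketch: three equally spaced spectral slices per matching segment are compared pairwise and the distances aggregated into the fourth coordinate w. Summation over slices and segments is an assumption; the slide only states that a Euclidean distance measure was used on matching spectral envelopes.

```python
def segment_spectral_distance(native_slices, learner_slices):
    """Spectral distance for one matching segment pair.

    Each argument is a list of three equally spaced spectral envelopes
    (1-D arrays of equal length) taken from the segment; the distances of
    the matching slices are summed (an assumption, see lead-in).
    """
    return sum(euclidean_distance(n, l)
               for n, l in zip(native_slices, learner_slices))

def segmental_score(native_segments, learner_segments):
    """Aggregate spectral distances over all matching segments -> the w coordinate.

    Combined with the prosodic (x, y, z), one learner/native pair yields
    (w, x, y, z) = (segments, F0, intensity, durations).
    """
    return sum(segment_spectral_distance(n, l)
               for n, l in zip(native_segments, learner_segments))
```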

  24. Methods: Manual evaluation of overall proficiency • Manual scores for Set C utterances: “Put your toys away right now” • <Table 4> The overall scores of the 34 utterances for the Set C sentence “Put your toys away right now”. The manual evaluation was performed by a Korean phonetician. Note that the subjects were all male adults.

  25. Results: A pronunciation proficiency evaluation model by a Korean phonetician (figure: Korean phonetician’s models, intensity axis not shown)

  26. Results: A prosody evaluation model by a Korean phonetician (figure: Korean phonetician’s model)

  27. Results: A discriminant analysis • <Table 5> The classification table from the discriminant analysis of one test utterance. The number in each cell represents the probability of the automatic pronunciation proficiency score being classified into the predicted group. • <Table 6> The confusion matrix for the classification table.

  28. Results: Discriminant analyses with leave-one-out cross-validation • Testing for score 4: 6 out of 9 correct • Testing for score 2: 12 out of 15 correct

  29. Results: Discriminant analyses with leave-one-out cross-validation • For the N4 & K2 groups, evaluation models were built using discriminant analysis with leave-one-out cross-validation • The number of models (built by discriminant analyses) was 24: group N4, 9 subjects; group K2, 15 subjects • Success rate: group N4, 6 out of 9 predicted correctly; group K2, 12 out of 15 predicted correctly
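A minimal sketch of the leave-one-out cross-validation over the N4 and K2 groups, using scikit-learn's LeaveOneOut with an LDA classifier as in the earlier stand-in; representing each subject by a single feature vector (e.g. an averaged (w, x, y, z) coordinate) is a simplification, not the study's exact procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut

def loo_per_group_accuracy(X, y):
    """Leave-one-out CV over all subjects (here 9 N4 + 15 K2 = 24 models):
    fit on the remaining subjects, predict the held-out one, and tally the
    correct predictions per group.

    X: (n_subjects, n_features) array, e.g. one averaged (w, x, y, z) vector
       per subject (a simplification); y: group labels such as 4 (N4) or 2 (K2).
    """
    X, y = np.asarray(X), np.asarray(y)
    correct = {label: 0 for label in np.unique(y)}
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = LinearDiscriminantAnalysis().fit(X[train_idx], y[train_idx])
        if model.predict(X[test_idx])[0] == y[test_idx][0]:
            correct[y[test_idx][0]] += 1
    return correct  # e.g. {4: 6, 2: 12} would match the success rates above
```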

  30. Discussion: Automatic evaluation of pronunciation proficiency • Viability of sentential models for the evaluation of segmental proficiency (spectral comparison) and prosodic proficiency (F0/intensity/durations), in the form of multiple score array coordinates (segments, F0, intensity, durations) = (w, x, y, z) • Comparison seems to work: a target utterance vs. multiple model native utterances • Better models can be built with more (controlled) utterances and more score resolution • Current: score 2 (bad) to score 4 (good) • Future: score 1 (worst), score 3 (fair), score 5 (best)

  31. References [1] Boersma, Paul, “Praat, a system for doing phonetics by computer”, Glot International 5(9/10), pp. 341-345, 2001. [2] Mahalanobis, P. C., “On the generalized distance in statistics”, Proceedings of the National Institute of Science of India 12, pp. 49-55, 1936. [3] Moulines, E. & F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9, pp. 453-467, 1990. [4] Ramus, F., M. Nespor & J. Mehler, “Correlates of linguistic rhythm in the speech signal”, Cognition 73, pp. 265-292, 1999. [5] Rhee, S., S. Lee, Y. Lee & S. Kang, “Design and construction of Korean-Spoken English Corpus (K-SEC)”, Malsori 46, pp. 159-174, 2003. [6] Yoon, K., “Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature 25(4), pp. 197-215, 2007. [7] Yoon, K., “Synthesis and evaluation of prosodically exaggerated utterances”, unpublished manuscript, 2008.
