This study aims to develop a sentential model for the automatic evaluation of prosody in English pronunciation, focusing on the suprasegmental level. The model uses multivariate statistical analysis to compare a target utterance with native speakers' utterances in terms of three prosodic aspects: F0, intensity, and durations. The results show that groups with different manual scores are sufficiently separated, which suggests that the model is effective.
Building a sentential model for automatic prosody evaluation
Part A
Kyuchul Yoon
School of English Language & Literature, Yeungnam University
2009.06.19, Korea University
Introduction: English pronunciation evaluation
• English pronunciation proficiency evaluation
  • Ultimate goal: evaluation at
    • the segmental level
    • the suprasegmental level
  • Current goal: evaluation at
    • the suprasegmental level
Introduction: English pronunciation evaluation
• Goal of the present study: prosody evaluation of a single target utterance
  • produced by a Korean student
  • given
    • an English target sentence
    • a sentential model for prosody evaluation
Introduction: Manual vs. automatic evaluation
• Problems of manual evaluation
  • What to evaluate
  • How to evaluate
  • Consistency
• Problems of automatic evaluation
  • How to reflect human knowledge
Introduction: Manual vs. automatic evaluation
• A possible solution?
  • Avoid knowledge-based abstraction
  • Compare a target utterance with native speakers' utterances
  • Use multiple utterances for comparison: multiple "good" utterances from native speakers
  • Adopt raw values
  • Calculate difference values between the target and each "good" utterance in terms of three prosodic aspects (F0, intensity, durations), which yields 3D coordinates
Introduction: How to build the model
• Use multivariate statistical analysis: a discriminant analysis
• Components of the model (segmental proficiency scores controlled)
  • Manual prosody evaluation scores (response)
  • Automatic prosody evaluation scores (predictors)
• Requirement of the model
  • Correlation between the two levels: manual scores vs. automatic scores
Introduction: How to build the model
• Manual prosody scores (an ideal case), on a scale from 1 (worst) to 5 (best)
  • "Good" utterance versions (score 5) by many native speakers of English
  • Utterance versions by Korean students whose prosodic proficiency is
    • high (score 5)
    • intermediate (score 3)
    • low (score 1)
Introduction: How to build the model
• Automatic prosody scores
  • Computed with Praat scripts [1]
  • A single target utterance is compared with multiple native speakers' utterances to yield scores for
    • the F0 difference
    • the intensity difference
    • the duration difference
  • The scores take the form of 3D coordinates (x, y, z) = (F0, Int, Dur)
  • One utterance yields as many coordinates as there are "good" native speakers
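The score-array idea above can be sketched in Python. This is a minimal sketch, not the author's Praat scripts: it assumes the learner's F0 and intensity contours have already been duration-matched and normalized so that they are sampled at the same points as each native contour, and it uses a plain Euclidean distance for every aspect. The dictionary keys `f0`, `int`, and `dur` and both function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def score_array(learner, natives):
    """Return one (F0, Int, Dur) difference coordinate per native utterance.

    Each utterance is a dict with hypothetical keys:
      'f0'  : F0 contour sampled at the same points as the other utterance
      'int' : intensity contour sampled at the same points
      'dur' : durations of the matched segments
    Normalization and duration manipulation are assumed to be done already.
    """
    return [
        (euclidean(learner["f0"], nat["f0"]),    # x: F0 difference
         euclidean(learner["int"], nat["int"]),  # y: intensity difference
         euclidean(learner["dur"], nat["dur"]))  # z: duration difference
        for nat in natives
    ]
```

With 10 native model utterances, `score_array` returns 10 coordinates for one learner utterance, matching the "as many coordinates as native speakers" description.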
Introduction: How to build the model
• Evaluation by comparisons
Introduction: A 3D sentential model for prosody evaluation
• A 3D model
  • 3D axes: F0, intensity, durations, so that (F0, Int, Dur) coordinates = (x, y, z)
  • Automatic scores are plotted as scatterplot points
  • Manual evaluation scores group the points
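To hand the scatterplot points to a discriminant analysis, each coordinate in an utterance's score array can be labeled with that utterance's manual score. A minimal sketch, assuming score arrays are lists of (x, y, z) tuples; the function name is hypothetical. The utterance index is kept so that all points from one speaker can later be held out together.

```python
import numpy as np

def build_dataset(score_arrays, manual_scores):
    """Stack per-utterance score arrays into points, labels, and utterance ids.

    score_arrays  : list of score arrays, one per utterance, each a list of (x, y, z)
    manual_scores : manual score of each utterance, in the same order
    """
    points, labels, utterance_ids = [], [], []
    for utt_id, (coords, score) in enumerate(zip(score_arrays, manual_scores)):
        for c in coords:
            points.append(c)
            labels.append(score)            # every point inherits its utterance's manual score
            utterance_ids.append(utt_id)
    return np.array(points, dtype=float), np.array(labels), np.array(utterance_ids)
```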
Introduction: A 3D sentential model for prosody evaluation
• Validity of the model: sufficient separation of groups with different manual scores
  • Colors: manual scores
  • Arrowheads: automatic scores
Methods: Sentential prosody evaluation [7]
Before and after duration manipulation
[Figure panels: native; learner (before); learner (after)]
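The slides do not say how the duration manipulation was done; it may have involved resynthesis along the lines of [3] and [6], which is beyond a short example. As a hedged stand-in, the sketch below only time-warps a sampled learner contour, segment by segment, onto the native time axis with piecewise-linear interpolation, so that the two contours can then be compared point to point. The function and parameter names are hypothetical, and segment boundaries are assumed to be matched one-to-one.

```python
import numpy as np

def warp_to_native(times, values, learner_bounds, native_bounds, native_times):
    """Resample a learner contour so that its segments take native durations.

    times, values  : learner contour (time points in s, contour values)
    learner_bounds : learner segment boundaries (s), increasing
    native_bounds  : matching native segment boundaries (s), same length
    native_times   : time points (s) on the native axis to resample at
    """
    # Piecewise-linear map from native time to learner time through the matched
    # boundaries: each native segment is stretched or compressed onto the
    # corresponding learner segment.
    learner_t = np.interp(native_times, native_bounds, learner_bounds)
    # Read the learner contour at those warped time points.
    return np.interp(learner_t, times, values)
```

The output has one value per native time point, which is what a point-to-point comparison needs.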
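Methods: Sentential prosody evaluation [7]
F0: point-to-point comparison between the native utterance and the learner utterance (after duration manipulation), after normalization
[Figure panels: native; learner (after)]
The F0 difference is the x coordinate of the automatic score (F0, Int, Dur) = (x, y, z).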
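The slides do not specify the normalization. Assuming per-utterance z-score normalization (a common way to factor out a speaker's pitch range, not necessarily the author's choice), the point-to-point F0 comparison reduces to a Euclidean distance between two equal-length normalized vectors. Both contours are assumed to be voiced at every sample point (unvoiced frames removed or interpolated beforehand); the function names are hypothetical.

```python
import numpy as np

def zscore(x):
    """Per-utterance z-score normalization (assumed, not taken from the slides)."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()

def f0_difference(native_f0, learner_f0):
    """Point-to-point F0 distance between two normalized, equally sampled contours."""
    a, b = zscore(native_f0), zscore(learner_f0)
    if a.shape != b.shape:
        raise ValueError("contours must be sampled at the same points")
    return float(np.linalg.norm(a - b))
```

For example, a contour at twice the native pitch but with the same shape, such as [200, 240, 220] against [100, 120, 110], gives a distance of zero, whereas the raw values differ by at least 100 Hz at each point.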
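Methods: Sentential prosody evaluation [7]
Intensity: point-to-point comparison between the native utterance and the learner utterance (after duration manipulation), after normalization
[Figure panels: native; learner (after)]
The intensity difference is the y coordinate of the automatic score (F0, Int, Dur) = (x, y, z).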
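For intensity, one plausible normalization (again an assumption, not stated in the slides) is to subtract each utterance's mean dB level. This cancels overall recording-level differences such as microphone distance while keeping the contour shape. The function name is hypothetical.

```python
import numpy as np

def intensity_difference(native_db, learner_db):
    """Point-to-point intensity distance after removing each utterance's mean level."""
    a = np.asarray(native_db, dtype=float)
    b = np.asarray(learner_db, dtype=float)
    if a.shape != b.shape:
        raise ValueError("contours must be sampled at the same points")
    # Subtracting the mean dB cancels a constant gain offset between recordings.
    return float(np.linalg.norm((a - a.mean()) - (b - b.mean())))
```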
Methods: Sentential prosody evaluation [7]
Duration: segment-to-segment comparison between the native utterance and the learner utterance (before duration manipulation)
[Figure panels: native; learner (before)]
The duration difference is the z coordinate of the automatic score (F0, Int, Dur) = (x, y, z).
Evaluation measure: the Euclidean distance metric between P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space
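Written out, the Euclidean distance between P and Q, with a worked duration example using hypothetical segment durations in seconds for three matched segments, is:

```latex
d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

P = (0.10,\ 0.20,\ 0.15), \qquad Q = (0.13,\ 0.24,\ 0.15)

d(P, Q) = \sqrt{0.03^2 + 0.04^2 + 0^2} = \sqrt{0.0025} = 0.05
```

Here P holds the native segment durations and Q the learner's durations before manipulation, so the resulting distance would serve as the z coordinate for that native model.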
Methods: Manual evaluation of sentential prosody
Manual scores for Set B utterances: "The dancing queen likes only the apple pies"
Methods: Sentential prosody evaluation [7]
A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances
Automatic prosody score for K5.U1 = {(899, 142, 408), (360, 92, 190), …, (716, 178, 183)}
Results: A prosody evaluation model by a Korean phonetician
[Figure: Korean phonetician's model]
Results: A sample prosody evaluation with a discriminant analysis
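The slides do not name the software or settings used for the discriminant analysis. The sketch below is a plain linear discriminant classifier in NumPy, assuming equal group priors and one covariance matrix pooled across groups: each point is assigned to the group whose mean is nearest in Mahalanobis distance (cf. [2]), and posterior group probabilities are derived from the same distances. The class name is hypothetical.

```python
import numpy as np

class PooledLDA:
    """Linear discriminant classifier: Gaussian groups with a shared (pooled)
    covariance and equal priors, classified by Mahalanobis distance."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == k].mean(axis=0) for k in self.classes_])
        # Pooled within-group covariance: summed within-group scatter / (n - K)
        scatter = sum(
            (X[y == k] - m).T @ (X[y == k] - m)
            for k, m in zip(self.classes_, self.means_)
        )
        cov = scatter / (len(X) - len(self.classes_))
        self.inv_cov_ = np.linalg.pinv(cov)  # pseudo-inverse guards against singularity
        return self

    def mahalanobis_sq(self, X):
        """Squared Mahalanobis distance from each point to each group mean, shape (n, K)."""
        X = np.atleast_2d(np.asarray(X, dtype=float))
        diff = X[:, None, :] - self.means_[None, :, :]  # (n, K, d)
        return np.einsum("nkd,de,nke->nk", diff, self.inv_cov_, diff)

    def predict_proba(self, X):
        # With a shared covariance and equal priors, the posterior of group k
        # is proportional to exp(-d_k^2 / 2).
        logits = -0.5 * self.mahalanobis_sq(X)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmin(self.mahalanobis_sq(X), axis=1)]
```

The inputs would be the labeled (x, y, z) score-array points, with the manual scores as group labels.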
Discussion: Making this fully automatic
• Manual evaluation for the training model
  • Number of Korean learners: the more the better
  • Levels of English proficiency: the more diverse the better (scores 1 through 5)
• Automatic evaluation of the trainees
  • Automatic segmentation (ASR) is needed
  • Redundant or missing segments must be handled
Building a sentential model for automatic evaluation of pronunciation proficiency
Part B
What about segmental evaluation?
Methods: Segmental evaluation by spectral comparison
• Sex and age controlled (no normalization was used)
  • Adult male speakers (native and Korean) were selected
• Spectral comparison
  • Three equally spaced spectral slices were taken from each pair of matching segments
  • The Euclidean distance was computed between each pair of matching spectral envelopes
• Four coordinates for pronunciation proficiency evaluation
  • Segments, F0, intensity, durations
  • Each (w, x, y, z) tuple becomes one entry of the score array
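The spectral comparison can be sketched as follows, assuming each segment is available as a NumPy array of samples (a 16 kHz sampling rate is assumed only to size the 400-sample, 25 ms frame). The slides speak of spectral envelopes; this sketch uses plain log-magnitude FFT spectra as a simpler stand-in, places the three slices at 1/4, 1/2, and 3/4 of each segment, and averages the three slice distances. How the per-segment values were combined into the single w coordinate is not stated, so summing them is an assumption, as are all function names.

```python
import numpy as np

def log_spectrum(frame, n_fft=512):
    """Log-magnitude spectrum in dB of one Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    return 20.0 * np.log10(magnitude + 1e-10)  # small floor avoids log(0)

def slice_frames(segment, frame_len=400, n_slices=3):
    """Frames centred at equally spaced points inside a segment (1/4, 2/4, 3/4)."""
    segment = np.asarray(segment, dtype=float)
    half = frame_len // 2
    padded = np.pad(segment, half)  # zero padding keeps edge slices full length
    centres = [len(segment) * (i + 1) // (n_slices + 1) for i in range(n_slices)]
    # padded[c : c + frame_len] is centred on original sample c
    return [padded[c:c + frame_len] for c in centres]

def segment_distance(native_seg, learner_seg, frame_len=400):
    """Mean Euclidean distance between matching spectral slices of two segments."""
    pairs = zip(slice_frames(native_seg, frame_len), slice_frames(learner_seg, frame_len))
    return float(np.mean([np.linalg.norm(log_spectrum(a) - log_spectrum(b)) for a, b in pairs]))

def segmental_score(native_segs, learner_segs):
    """Hypothetical w coordinate: sum of per-segment distances over matched segments."""
    if len(native_segs) != len(learner_segs):
        raise ValueError("segments must be matched one-to-one")
    return sum(segment_distance(n, l) for n, l in zip(native_segs, learner_segs))
```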
Methods: Manual evaluation of overall proficiency
Manual scores for Set C utterances: "Put your toys away right now"
<Table 4> The overall scores of the 34 utterances for the Set C sentence "Put your toys away right now". The manual evaluation was performed by a Korean phonetician. Note that the subjects were all adult males.
Results: A pronunciation proficiency evaluation model by a Korean phonetician
[Figure: Korean phonetician's models (intensity axis not shown)]
Results: A prosody evaluation model by a Korean phonetician
[Figure: Korean phonetician's model]
Results: A discriminant analysis
<Table 5> The classification table from the discriminant analysis of one test data set. Each cell gives the probability of the automatic pronunciation proficiency score being classified into the predicted group.
<Table 6> The confusion matrix derived from the classification table.
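The relation between the two tables can be sketched as follows: each row of a probability table like Table 5 is turned into a predicted group by taking the most probable group, and a confusion matrix like Table 6 counts actual against predicted groups. This assumes each test item is assigned to its most probable group, which the slides do not state explicitly; the function names are hypothetical.

```python
from collections import Counter

def predicted_groups(prob_rows, labels):
    """Pick the most probable group for each row of a classification table."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]

def confusion_matrix(actual, predicted, labels):
    """Rows: actual group; columns: predicted group; cells: counts."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]
```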
Results: Discriminant analyses with leave-one-out cross-validation
• Testing for score 4: 6 out of 9 correct
• Testing for score 2: 12 out of 15 correct
Results: Discriminant analyses with leave-one-out cross-validation
• For the N4 and K2 groups, evaluation models were built using discriminant analysis with leave-one-out cross-validation
• Number of models (one discriminant analysis per held-out subject): 24
  • Group N4: 9 subjects
  • Group K2: 15 subjects
• Success rate
  • Group N4: 6 out of 9 predicted correctly (about 67%)
  • Group K2: 12 out of 15 predicted correctly (80%)
  • Overall: 18 out of 24 (75%)
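The leave-one-out procedure above can be sketched as follows, holding out all score-array points of one subject at a time. Two parts are assumptions: a simple nearest-centroid classifier stands in for the discriminant analysis (any function with the same signature can be passed instead), and a subject's predicted group is the majority vote over that subject's points. The function returns (correct, total) per actual group, the same form as "6 out of 9"; all names are hypothetical.

```python
from collections import Counter

import numpy as np

def nearest_centroid(train_X, train_y, test_X):
    """Stand-in classifier: assign each test point to the closest group mean."""
    classes = np.unique(train_y)
    means = np.array([train_X[train_y == k].mean(axis=0) for k in classes])
    d = np.linalg.norm(test_X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def leave_one_subject_out(X, y, subject, classify=nearest_centroid):
    """Hold out each subject in turn, train on the rest, predict by majority vote.

    Returns {actual_group: (correct, total)}.
    """
    results = {}
    for s in np.unique(subject):
        test = subject == s
        preds = classify(X[~test], y[~test], X[test])
        vote = Counter(preds.tolist()).most_common(1)[0][0]  # majority over points
        true = y[test][0].item()  # all points of one subject share its manual score
        hit, total = results.get(true, (0, 0))
        results[true] = (hit + int(vote == true), total + 1)
    return results
```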
Discussion: Automatic evaluation of pronunciation proficiency
• Sentential models appear viable for the evaluation of
  • segmental proficiency: spectral comparison
  • prosodic proficiency: F0, intensity, and durations
  • both combined as multiple score-array coordinates (segments, F0, intensity, durations) = (w, x, y, z)
• Comparing a target utterance with multiple model native utterances seems to work
• Better models could be built with
  • more (controlled) utterances
  • finer score resolution
    • Current: score 2 (bad) to score 4 (good)
    • Future: score 1 (worst), score 3 (fair), score 5 (best)
References
[1] Boersma, P., "Praat, a system for doing phonetics by computer", Glot International 5(9/10), pp.341-345, 2001.
[2] Mahalanobis, P.C., "On the generalized distance in statistics", Proceedings of the National Institute of Sciences of India 12, pp.49-55, 1936.
[3] Moulines, E. & F. Charpentier, "Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones", Speech Communication 9, pp.453-467, 1990.
[4] Ramus, F., M. Nespor & J. Mehler, "Correlates of linguistic rhythm in the speech signal", Cognition 73, pp.265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, "Design and construction of Korean-Spoken English Corpus (K-SEC)", Malsori 46, pp.159-174, 2003.
[6] Yoon, K., "Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody", Journal of the Modern British & American Language & Literature 25(4), pp.197-215, 2007.
[7] Yoon, K., "Synthesis and evaluation of prosodically exaggerated utterances", unpublished manuscript, 2008.