Phone-level pronunciation scoring and assessment for interactive language learning
S.M. Witt, S.J. Young
Speech Communication 30 (2000) 95-108
Chun-Yu Chen
Outline
- Introduction
- GOP scoring
  - Basic GOP algorithm
  - Phone dependent thresholds
  - Explicit error modelling
- Performance measures
  - The transcription of pronunciation errors
  - Performance measures
- Collection of a non-native database
- The labelling consistency of the human judges
- Experimental results
- Conclusions
Introduction
- A computer-assisted language learning (CALL) system must be able to measure pronunciation quality accurately.
- The system described here focuses on measuring the pronunciation quality of non-native speech at the phone level and on locating pronunciation errors.
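Basic GOP algorithm
- The aim of the GOP (goodness of pronunciation) measure is to provide a score for each phone of an utterance.
- The individual GOP scores are calculated from two recognition passes: a forced alignment pass, and a phone recognition pass in which each phone can follow the previous one with equal probability.

$$
GOP_1(p) = \frac{\left|\log P\left(p \mid O^{(p)}\right)\right|}{NF(p)} \approx \frac{1}{NF(p)} \left| \log \frac{p\left(O^{(p)} \mid p\right)}{\max_{q \in Q} p\left(O^{(p)} \mid q\right)} \right|
$$

Here O^{(p)} is the acoustic segment aligned to phone p, NF(p) is the number of frames in that segment, and Q is the set of phone models. The numerator likelihood comes from the forced alignment pass and the denominator from the phone recognition pass. The second form assumes equal phone priors and approximates the sum over all phones by its maximum.
- The quality of the GOP scoring procedure described above depends on the quality of the acoustic models used.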
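The score described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gop_score`, its parameters, and the numbers are hypothetical, and the phone recognition pass is simplified to a table of per-phone log-likelihoods for the same segment rather than a real phone-loop decoding.

```python
from typing import Mapping


def gop_score(target_loglik: float,
              loop_logliks: Mapping[str, float],
              num_frames: int) -> float:
    """Duration-normalised absolute log-likelihood ratio for one phone segment.

    target_loglik: log p(O | p) from the forced alignment (assumed input).
    loop_logliks:  log p(O | q) for each candidate phone q, standing in for
                   the phone recognition pass (a simplifying assumption).
    num_frames:    number of frames in the segment, NF(p).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not loop_logliks:
        raise ValueError("loop_logliks must not be empty")
    # The best-scoring phone approximates the sum over all phones.
    best_competitor = max(loop_logliks.values())
    return abs(target_loglik - best_competitor) / num_frames


# Hypothetical numbers for one 10-frame segment whose target phone is "ae".
scores = {"ae": -120.0, "eh": -105.0, "ah": -130.0}
print(gop_score(scores["ae"], scores, num_frames=10))  # 1.5
```

A score of 0 means the target phone was also the best-scoring phone; larger values mean some other phone matched the segment better.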
Phone dependent thresholds
- A simple phone-specific threshold can be computed from the global GOP statistics: the threshold for a phone p is defined in terms of the mean and variance of all the GOP scores for that phone.
- The other way to approximate human performance is to learn from human labelling behaviour: the phone dependent threshold is defined by averaging the normalised rejection counts over all speakers.
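As a rough illustration of the first kind of threshold, the sketch below derives a per-phone threshold from the mean and spread of that phone's GOP scores. The exact formula is not preserved on the slide, so the form `mean + alpha * std + beta`, the scaling constants `alpha` and `beta`, and the function names are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import Dict, List


def phone_thresholds(scores_by_phone: Dict[str, List[float]],
                     alpha: float = 1.0,
                     beta: float = 0.0) -> Dict[str, float]:
    """Per-phone threshold from the mean and standard deviation of GOP scores.

    alpha and beta are hypothetical tuning constants, not values from the source.
    """
    return {
        phone: mean(scores) + alpha * pstdev(scores) + beta
        for phone, scores in scores_by_phone.items()
        if scores  # skip phones with no observations
    }


def is_rejected(phone: str, gop: float, thresholds: Dict[str, float]) -> bool:
    """A phone is marked as mispronounced when its GOP exceeds its threshold."""
    return gop > thresholds[phone]


# Hypothetical GOP scores collected for two phones.
thresholds = phone_thresholds({"ae": [1.0, 3.0], "s": [2.0, 2.0, 2.0]})
print(thresholds)                           # {'ae': 3.0, 's': 2.0}
print(is_rejected("ae", 3.5, thresholds))   # True
```

Because each phone gets its own threshold, a phone whose scores are typically high is not rejected more often than one whose scores are typically low.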
Explicit error modelling
- Pronunciation errors can be grouped into two main error classes:
  - Individual mispronunciations, which occur when the speaker is not familiar with the pronunciation of a specific word.
  - Substitutions of native sounds for sounds of the target language that do not exist in the native language. These are also called systematic mispronunciations.
- Knowledge of the learner's native tongue can be included in the GOP scoring to improve the detection of errors, by using phone model sets of both the target language and the speaker's native language.
- The posterior probability of the target phones is calculated and scores for systematic mispronunciations are defined (equations not preserved in this transcript).
- The basic GOP score is then combined with the systematic-mispronunciation score.
Performance measures
Performance measures are only concerned with the detection of pronunciation errors, and four different dimensions are considered:
- Strictness: how strict the judge was in marking pronunciation errors.
- Agreement: the overall agreement between the reference transcription and the automatically derived transcription.
- Cross-correlation: the overall agreement between the errors marked in the reference and the automatically detected errors.
- Overall phone correlation: how well the overall rejection statistics for each phone correlate between the reference and the automatic system.
The transcription of pronunciation errors
All performance measures compare transcriptions on a frame-by-frame basis, as follows:
- The acoustic waveform is force-aligned with the corrected transcriptions.
- Substituted, inserted or deleted phones are marked with "1" and all other phones with "0", which yields a vector x.
- The vectors representing the corrected transcriptions are smoothed by a Hamming window.
- If rejected frames in one transcription are immediately followed by rejected frames in the other transcription, the rejections can be considered to have been caused by the same pronunciation error.
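The frame-level encoding and smoothing described above might look like the following sketch. The segment format `(start_frame, end_frame, rejected)`, the window length of 5 and the lack of normalisation are all assumptions for illustration; the slide does not state them. The window uses the standard Hamming definition `0.54 - 0.46 * cos(2*pi*n / (N - 1))`.

```python
import math
from typing import List, Tuple

Segment = Tuple[int, int, bool]  # (start_frame, end_frame_exclusive, rejected)


def rejection_vector(segments: List[Segment], num_frames: int) -> List[float]:
    """Mark every frame of a rejected phone with 1.0 and all others with 0.0."""
    x = [0.0] * num_frames
    for start, end, rejected in segments:
        if rejected:
            for t in range(max(start, 0), min(end, num_frames)):
                x[t] = 1.0
    return x


def hamming(length: int) -> List[float]:
    """Standard Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def smooth(x: List[float], length: int = 5) -> List[float]:
    """Convolve x with a centred Hamming window (odd length, zero-padded)."""
    if length % 2 == 0:
        raise ValueError("length must be odd")
    w, half = hamming(length), length // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = t + k - half
            if 0 <= j < len(x):
                acc += wk * x[j]
        out.append(acc)
    return out


# Hypothetical alignment: the phone covering frames 2-3 was marked as an error.
x = rejection_vector([(0, 2, False), (2, 4, True), (4, 8, False)], num_frames=8)
print(x)          # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(smooth(x))  # non-zero values now extend to frames next to the rejection
```

After smoothing, a rejection leaks into adjacent frames, so two transcriptions that reject neighbouring frames still overlap when their vectors are compared.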
Performance measures
- Strictness: the difference between the strictness levels of the two transcriptions.
- Agreement: based on the distance between the corresponding transcription vectors.
- Cross-correlation: takes into account only those frames where there is a rejection in either transcription.
- Phoneme correlation: the overall similarity of the phone rejection statistics.
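One possible reading of these four measures is sketched below. The exact formulas are not preserved on the slide, so every definition here is an assumption: strictness as the fraction of rejected frames, agreement as one minus the mean absolute difference, cross-correlation as a normalised inner product, and phoneme correlation as the Pearson correlation of per-phone rejection counts. All function names are hypothetical.

```python
import math
from typing import List


def strictness(x: List[float]) -> float:
    """Fraction of frames marked as rejected (assumed definition)."""
    return sum(1 for v in x if v > 0) / len(x)


def strictness_difference(x1: List[float], x2: List[float]) -> float:
    return abs(strictness(x1) - strictness(x2))


def agreement(x1: List[float], x2: List[float]) -> float:
    """1 minus the mean absolute (city-block) distance between the vectors."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x1, x2)) / len(x1)


def cross_correlation(x1: List[float], x2: List[float]) -> float:
    """Normalised inner product; frames with no rejection contribute nothing."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm = math.sqrt(sum(a * a for a in x1)) * math.sqrt(sum(b * b for b in x2))
    return dot / norm if norm > 0 else 0.0


def phoneme_correlation(c1: List[float], c2: List[float]) -> float:
    """Pearson correlation between per-phone rejection counts."""
    m1, m2 = sum(c1) / len(c1), sum(c2) / len(c2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(c1, c2))
    var = math.sqrt(sum((a - m1) ** 2 for a in c1) * sum((b - m2) ** 2 for b in c2))
    return cov / var if var > 0 else 0.0


# Hypothetical unsmoothed rejection vectors from two transcriptions.
x1, x2 = [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]
print(strictness_difference(x1, x2))  # 0.25
print(agreement(x1, x2))              # 0.75
print(cross_correlation(x1, x2))
print(phoneme_correlation([1, 2, 3], [2, 4, 6]))
```

Agreement is dominated by the many frames that both transcriptions accept, whereas cross-correlation is driven only by frames that at least one of them rejects, which is why the two are reported separately.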
Collection of a non-native database
- In order to evaluate the pronunciation scoring, a database of non-native speech from second-language learners was recorded and annotated.
- The speakers understood the prompting texts, but their competence level was low enough to produce easily detectable mispronunciations.
- The annotation of the database was performed at three different levels:
  - The original transcriptions were annotated with all substitution, deletion and insertion errors made by the non-native speaker.
  - Each word was scored on a scale of 1 to 4.
  - Each sentence was scored on the same scale.
The labelling consistency of the human judges
- The four performance measures described above are used to determine the labelling consistency of the human judges.
- The results were calculated by averaging A, CC, PC and the strictness difference between the respective judge and all the other judges.
This table shows the similarity between the human judges and the baseline GOP scoring method for each non-native speaker in that judge's group
This figure shows CC and PC results grouped according to each student's mother-tongue
Experimental results
Human and machine judgements agree on which phones to accept and which to reject, with two exceptions.
This table shows the effects of incorporating error modelling into the GOP algorithm, and of adaptation and judge-based individual thresholds.
Conclusions
- Using a specially recorded database of non-native speech, the basic GOP method was investigated and the effectiveness of the performance measures studied.
- Combining the baseline method with several refinements gave performance comparable to the human-human benchmark values.
- A computer-based pronunciation scoring system can judge which phonetic segments in an utterance are acceptable as correct, much as a human judge would.