
Phone-level pronunciation scoring and assessment for interactive language learning

Phone-level pronunciation scoring and assessment for interactive language learning. S.M. Witt *, S.J. Young. Speech Communication 30 (2000) 95-108. Chun-Yu Chen. Introduction; GOP scoring; Basic GOP algorithm; Phone dependent thresholds; Explicit error modelling; Performance measures


Presentation Transcript


  1. Phone-level pronunciation scoring and assessment for interactive language learning. S.M. Witt *, S.J. Young. Speech Communication 30 (2000) 95-108. Chun-Yu Chen

  2. Outline: Introduction; GOP scoring (basic GOP algorithm, phone dependent thresholds, explicit error modelling); Performance measures (the transcription of pronunciation errors, performance measures); Collection of a non-native database; The labelling consistency of the human judges; Experimental results; Conclusions

  3. Introduction: A computer-assisted language learning (CALL) system requires the ability to measure pronunciation accurately. The system described here focuses on measuring the pronunciation quality of non-native speech at the phone level and on locating pronunciation errors.

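  4. Basic GOP algorithm: The aim of the GOP measure is to provide a score for each phone of an utterance. The individual GOP scores are calculated from two passes: the forced alignment pass and the phone recognition pass, in which each phone can follow the previous one with equal probability. $\mathrm{GOP}_1(p) = \left|\log P(p \mid O^{(p)})\right| / NF(p) \approx \left|\log \frac{p(O^{(p)} \mid p)}{\max_{q \in Q} p(O^{(p)} \mid q)}\right| / NF(p)$, where $O^{(p)}$ is the acoustic segment aligned to phone $p$, $Q$ is the set of all phone models and $NF(p)$ is the number of frames in the segment.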
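A minimal sketch of the scoring step, assuming the usual formulation in which the GOP of a phone is the absolute log-likelihood ratio between the forced alignment pass and the best phone from the phone recognition pass, divided by the number of frames. The function name and the example numbers are hypothetical, and the log-likelihoods are assumed to come from an external HMM recogniser.

```python
def gop_score(forced_loglik: float, best_loglik: float, num_frames: int) -> float:
    """Duration-normalised absolute log-likelihood ratio for one phone segment.

    forced_loglik: log-likelihood of the segment under the expected phone
                   (from the forced alignment pass).
    best_loglik:   highest log-likelihood of the same frames under any phone
                   (from the unconstrained phone recognition pass).
    num_frames:    number of acoustic frames in the segment.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return abs(forced_loglik - best_loglik) / num_frames


# Hypothetical values: the expected phone scores slightly worse than the best phone.
score = gop_score(forced_loglik=-310.0, best_loglik=-295.0, num_frames=12)
print(score)  # 1.25
```

A score of 0 means the expected phone was also the most likely one; larger values indicate a poorer match.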

  5. The quality of the GOP scoring procedure described above depends on the quality of the acoustic models used

  6. Phone dependent thresholds: A simple phone-specific threshold can be computed from the global GOP statistics. The threshold for a phone p can be defined in terms of the mean and variance of all the GOP scores for that phone. The other way to approximate human performance is to learn from human labelling behaviour. In that case the phone dependent threshold can be defined by averaging the normalised rejection counts over all speakers.
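The first kind of threshold can be sketched as follows. This is only an illustration under stated assumptions: the slide does not give the exact formula, so the form mean + alpha * spread + beta, the use of the standard deviation as the spread, and the constants `alpha` and `beta` are all hypothetical.

```python
from statistics import mean, pstdev


def global_stat_threshold(gop_scores, alpha=1.0, beta=0.0):
    """Threshold for one phone from the statistics of its GOP scores.

    Assumed form: mean + alpha * standard deviation + beta, where alpha and
    beta are hypothetical scaling constants that would be tuned empirically.
    """
    return mean(gop_scores) + alpha * pstdev(gop_scores) + beta


def is_rejected(gop, threshold):
    """A phone is rejected when its GOP score exceeds the threshold."""
    return gop > threshold


# Hypothetical GOP scores collected for a single phone.
scores_for_phone = [1.0, 2.0, 3.0]
threshold = global_stat_threshold(scores_for_phone)
print(is_rejected(3.0, threshold), is_rejected(2.0, threshold))  # True False
```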

  7. Explicit error modelling: Pronunciation errors can be grouped into two main classes. (1) Individual mispronunciations, which occur when the speaker is not familiar with the pronunciation of a specific word. (2) Substitutions of native sounds for sounds of the target language that do not exist in the native language; this type is also called systematic mispronunciation. Knowledge of the learner's native tongue can be included in the GOP scoring to improve the detection of errors, by using phone model sets of both the target language and the speaker's native language. The posterior probability of the target phones can then be calculated (the equation is not preserved in this transcript).

  8. Scores for systematic mispronunciations are then defined, and the basic GOP score is combined with them (the equations are not preserved in this transcript).

  9. Performance measures: The performance measures are concerned only with the detection of pronunciation errors, and four different dimensions are considered. Strictness: how strict the judge was in marking pronunciation errors. Agreement: the overall agreement between the reference transcription and the automatically derived transcription. Cross-correlation: the overall agreement between the errors marked in the reference and the automatically detected errors. Overall phone correlation: how well the overall rejection statistics for each phone correlate between the reference and the automatic system.

  10. The transcription of pronunciation errors: All performance measures compare transcriptions on a frame-by-frame basis, as follows. The acoustic waveform is force-aligned with the corrected transcriptions. Frames of substituted, inserted or deleted phones are marked with "1" and all other frames with "0", which yields a vector x. The vectors representing the corrected transcriptions are then smoothed by a Hamming window.

  11. If rejected frames in one transcription are immediately followed by rejected frames in the other transcription, the rejections can be considered to have been caused by the same pronunciation error.
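The frame-level marking and smoothing can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the helper names, the per-phone frame counts and the window length of 5 are hypothetical.

```python
import numpy as np


def error_vector(phone_frames, phone_is_error):
    """Build a frame vector with 1 for every frame of a rejected phone, else 0.

    phone_frames:   number of frames of each phone, from the forced alignment.
    phone_is_error: True where the phone was substituted, inserted or deleted.
    """
    return np.concatenate([
        np.full(n, 1.0 if err else 0.0)
        for n, err in zip(phone_frames, phone_is_error)
    ])


def smooth(x, window_len=5):
    """Smooth with a Hamming window scaled to unit sum (window_len is hypothetical)."""
    w = np.hamming(window_len)
    return np.convolve(x, w / w.sum(), mode="same")


# Three phones of 3, 4 and 2 frames; only the middle one is marked as an error.
x = error_vector([3, 4, 2], [False, True, False])
print(x)          # [0. 0. 0. 1. 1. 1. 1. 0. 0.]
print(smooth(x))  # the rejection now spreads into the neighbouring frames
```

After smoothing, frames adjacent to a rejection carry a non-zero value, so rejections that sit next to each other in two transcriptions still overlap when the vectors are compared.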

  12. Performance measures: Strictness: the difference between the strictness levels of the two transcriptions is used. Agreement: measured via the distance between the corresponding transcription vectors. Cross-correlation: takes into account only those frames where a rejection exists in either of the two transcriptions.

  13. Phone correlation (PC): the overall similarity of the phone rejection statistics.
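The four measures can be illustrated with the sketch below. The exact formulas are not preserved in the transcript, so every definition here is an assumption chosen to match the verbal descriptions: strictness as the fraction of rejected frames, agreement as one minus the mean absolute difference of the vectors, cross-correlation as a normalised inner product, and phone correlation as the Pearson correlation of per-phone rejection counts. All function names are hypothetical.

```python
import numpy as np


def strictness(x):
    """Assumed form: fraction of frames marked as rejected."""
    return float(np.mean(np.asarray(x, dtype=float) > 0))


def strictness_difference(x1, x2):
    """Absolute difference between the strictness levels of two transcriptions."""
    return abs(strictness(x1) - strictness(x2))


def agreement(x1, x2):
    """Assumed form: 1 minus the mean absolute difference (values in [0, 1])."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return 1.0 - float(np.mean(np.abs(x1 - x2)))


def cross_correlation(x1, x2):
    """Normalised inner product; frames that are 0 in both vectors contribute nothing."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    denom = np.linalg.norm(x1) * np.linalg.norm(x2)
    return float(x1 @ x2 / denom) if denom > 0 else 0.0


def phone_correlation(counts1, counts2):
    """Pearson correlation of per-phone rejection counts from two transcriptions."""
    return float(np.corrcoef(counts1, counts2)[0, 1])


# Hypothetical frame vectors from two judges for the same utterance.
judge_a = [0, 1, 1, 0]
judge_b = [0, 1, 0, 0]
print(strictness_difference(judge_a, judge_b))  # 0.25
print(agreement(judge_a, judge_b))              # 0.75
print(cross_correlation(judge_a, judge_b))
```

Agreement counts frames on which both transcriptions say "accept", whereas the cross-correlation ignores them, which is why the two are reported separately.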

  14. Collection of a non-native database: In order to evaluate the pronunciation scoring, a database of non-native speech from second-language learners has been recorded and annotated. The speakers were able to understand the prompting texts, but their competence level was low enough to produce easily detectable mispronunciations. The database was annotated at three different levels. (1) The original transcriptions were annotated with all substitution, deletion and insertion errors made by the non-native speaker. (2) Each word was scored on a scale of 1 to 4. (3) Each sentence was scored on the same scale.

  15. The labelling consistency of the human judges: The four performance measures described above are used to determine the labelling consistency of the judges. The results have been calculated by averaging A, CC, PC and the strictness difference between the respective judge and all the other judges.

  16. This table shows the similarity between the human judges and the baseline GOP scoring method for each non-native speaker in that judge's group

  17. This figure shows CC and PC results grouped according to each student's mother-tongue

  18. Experimental results: Human and machine judgements agree on which phones to accept and which to reject, with two exceptions.

  19. This table shows the effects of incorporating error modelling into the GOP algorithm and, in addition, of using judge-based individual thresholds.

  20. Conclusions: Using a specially recorded database of non-native speech, the basic GOP method has been investigated and the effectiveness of the performance measures studied. Combining the baseline method with several refinements gave results comparable to the human-human benchmark values. A computer-based pronunciation scoring system can judge, much as a human does, which phonetic segments in an utterance can be accepted as correct.
