Phone-level pronunciation scoring and assessment for interactive language learning

S.M. Witt, S.J. Young, Speech Communication 30 (2000) 95-108. Presented by Chun-Yu Chen

Outline
Introduction
GOP scoring: basic GOP algorithm, phone dependent thresholds, explicit error modelling
Performance measures: the transcription of pronunciation errors, performance measures
Collection of a non-native database
The labelling consistency of the human judges
Experimental results
Conclusions

Introduction: A computer-assisted language learning (CALL) system requires the ability to measure pronunciation accurately. The system described here focuses on measuring the pronunciation quality of non-native speech at the phone level and on locating pronunciation errors.

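Basic GOP algorithm: The aim of the goodness of pronunciation (GOP) measure is to provide a score for each phone of an utterance. The individual GOP scores are calculated from two passes: a forced alignment pass, and a phone recognition pass in which each phone can follow the previous one with equal probability. For a phone p with acoustic segment O^{(p)}:

$$GOP_1(p) = \frac{\left|\log P(p \mid O^{(p)})\right|}{NF(O^{(p)})} = \frac{1}{NF(O^{(p)})}\left|\log \frac{p(O^{(p)} \mid p)\,P(p)}{\sum_{q \in Q} p(O^{(p)} \mid q)\,P(q)}\right| \approx \frac{1}{NF(O^{(p)})}\left|\log \frac{p(O^{(p)} \mid p)}{\max_{q \in Q} p(O^{(p)} \mid q)}\right|$$

where Q is the set of all phone models and NF(O^{(p)}) is the number of frames in the segment. The approximation assumes equal phone priors and replaces the sum by its largest term; the numerator likelihood comes from the forced alignment and the denominator from the phone recognition pass.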
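A minimal Python sketch of the GOP1 score, |log P(p|O)| divided by the number of frames NF, assuming the log-likelihood of the segment under every phone model is already available as a dictionary. The name `segment_logliks` and the toy values are hypothetical. It computes both the exact log posterior under equal priors and the max approximation; in the actual method the numerator comes from the forced alignment and the denominator from the phone-loop pass, whose segment boundaries need not match exactly.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gop1(segment_logliks, target, num_frames):
    """Return (exact, approx) GOP1 scores for one phone segment.

    segment_logliks: hypothetical dict mapping each phone model q to
        log p(O | q) for the frames aligned to the target phone.
    target: the phone p expected by the forced alignment.
    num_frames: NF(O), the number of frames in the segment.
    Equal priors P(q) are assumed, so they cancel out.
    """
    if num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    ll_target = segment_logliks[target]
    values = list(segment_logliks.values())
    # |log P(p | O)| / NF using the full sum over all phone models
    exact = abs(ll_target - log_sum_exp(values)) / num_frames
    # Approximation: replace the sum by the best-scoring phone model
    approx = abs(ll_target - max(values)) / num_frames
    return exact, approx


scores = {"ae": -100.0, "eh": -95.0, "ih": -110.0}
print(gop1(scores, "ae", 10))  # exact is slightly above the approximate 0.5
```

A large score means the expected phone fits poorly compared with its best competitor; a phone whose own model wins gets an approximate score of 0.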

The quality of the GOP scoring procedure described above depends on the quality of the acoustic models used

Phone dependent thresholds: A simple phone-specific threshold can be computed from the global GOP statistics: the threshold for a phone p can be defined in terms of the mean and variance of all the GOP scores for that phone. Another way to approximate human performance is to learn from human labelling behaviour: the phone dependent threshold can then be defined by averaging the normalised rejection counts over all speakers.
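The two threshold strategies can be sketched in Python under stated assumptions. The slide only says the first threshold is defined from the mean and variance of the GOP scores, so the form T_p = mu_p + alpha * sigma_p + beta with tuning constants `alpha` and `beta` is hypothetical. Likewise, turning the averaged human rejection rate into a threshold by picking a score quantile is an illustrative choice, not necessarily the paper's mapping. All function names are made up for this example.

```python
import statistics


def stats_threshold(gop_scores, alpha=1.0, beta=0.0):
    """Hypothetical threshold T_p = mu_p + alpha * sigma_p + beta.

    gop_scores: GOP scores of phone p over the training data.
    alpha, beta: assumed tuning constants, not values from the paper.
    """
    mu = statistics.mean(gop_scores)
    sigma = statistics.pstdev(gop_scores)  # square root of the variance
    return mu + alpha * sigma + beta


def average_rejection_rate(per_speaker_counts):
    """Average over speakers of (times p was rejected) / (times p occurred)."""
    rates = [rej / total for rej, total in per_speaker_counts if total > 0]
    if not rates:
        raise ValueError("phone never occurs")
    return sum(rates) / len(rates)


def threshold_from_rate(gop_scores, reject_rate):
    """Assumed mapping: choose T so that about reject_rate of tokens have GOP > T.

    Tied scores can make the actual number of rejections smaller.
    """
    ranked = sorted(gop_scores)
    n_reject = round(reject_rate * len(ranked))
    if n_reject >= len(ranked):
        return float("-inf")  # every token is rejected
    return ranked[len(ranked) - n_reject - 1]  # largest score still accepted


def is_rejected(gop, threshold):
    """A phone is rejected when its GOP score exceeds the threshold."""
    return gop > threshold


rate = average_rejection_rate([(1, 4), (3, 6)])  # judges rejected p 25% and 50% of the time
print(threshold_from_rate([1.2, 0.4, 2.5, 0.9, 3.1, 0.7, 1.8, 0.5], rate))
```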

Explicit error modelling: Pronunciation errors can be grouped into two main classes:
- Individual mispronunciations, which occur when the speaker is not familiar with the pronunciation of a specific word.
- Substitutions of native sounds for sounds of the target language that do not exist in the native language, also called systematic mispronunciations.
Knowledge of the learner's native tongue can be included in GOP scoring to improve error detection, by using phone model sets of both the target language and the speaker's native language. The posterior probability of the target phones can be calculated by

Separate scores are defined for systematic mispronunciations, and these are combined with the basic GOP1 score.
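A rough Python illustration of why adding native-language models helps. This is not the paper's GOP2 definition or its combination rule; it simply assumes the max approximation of GOP1 is taken over the union of the target and native phone sets, and reports the native phone when it outscores every target model. The model names (such as `de:s` for a German /s/) and the log-likelihoods are hypothetical.

```python
def combined_set_gop(target, target_logliks, native_logliks, num_frames):
    """Hypothetical GOP over the union of target- and native-language models.

    target_logliks: log p(O | q) for each target-language phone model q.
    native_logliks: log p(O | q) for each native-language phone model q.
    Returns (score, substitute): substitute is the best native phone when it
    beats every target-language model (a candidate systematic error), else None.
    """
    best_target = max(target_logliks.values())
    native_phone, best_native = max(native_logliks.items(), key=lambda kv: kv[1])
    denominator = max(best_target, best_native)
    score = abs(target_logliks[target] - denominator) / num_frames
    substitute = native_phone if best_native > best_target else None
    return score, substitute


# Toy example: an English /th/ segment where a German /s/ model fits best.
print(combined_set_gop("th", {"th": -120.0, "s": -118.0}, {"de:s": -100.0}, 10))
# (2.0, 'de:s')
```

Returning the winning native phone, not just a score, is what makes systematic errors explicit: the system can say which native sound was substituted.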

Performance measures: The performance measures are concerned only with the detection of pronunciation errors, and four different dimensions are considered:
- Strictness: how strict the judge was in marking pronunciation errors.
- Agreement: the overall agreement between the reference transcription and the automatically derived transcription.
- Cross-correlation: the overall agreement between the errors marked in the reference and the automatically detected errors.
- Overall phone correlation: how well the overall rejection statistics for each phone correlate between the reference and the automatic system.

The transcription of pronunciation errors: All performance measures compare transcriptions on a frame-by-frame basis, as follows:
- The acoustic waveform is force-aligned with the corrected transcriptions.
- Frames of substituted, inserted or deleted phones are marked with "1" and all other frames with "0", which yields a vector x.
- The vectors representing the corrected transcriptions are smoothed with a Hamming window.

This is the motivation for smoothing: if rejected frames in one transcription are immediately followed by rejected frames in the other transcription, the rejections can be considered to have been caused by the same pronunciation error.
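The frame-vector construction and smoothing steps can be sketched in pure Python. The per-phone frame counts stand in for a forced alignment, and the window length of 5 frames is an assumed value, not one taken from the paper. The smoothing spreads each rejection into neighbouring frames, so adjacent but non-overlapping rejections in two transcriptions can still match.

```python
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_vector(phone_frames, phone_is_error):
    """Expand per-phone error flags into a per-frame 0/1 vector x.

    phone_frames: number of aligned frames for each phone (hypothetical input
        from a forced alignment); phone_is_error: True for substituted,
        inserted or deleted phones.
    """
    x = []
    for frames, is_error in zip(phone_frames, phone_is_error):
        x.extend([1.0 if is_error else 0.0] * frames)
    return x


def smooth(x, win_len=5):
    """Hamming-window smoothing (centred for odd win_len); frames past the edges count as 0."""
    w = hamming(win_len)
    half = win_len // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            i = t + k - half
            if 0 <= i < len(x):
                acc += wk * x[i]
        out.append(acc)
    return out


x = frame_vector([3, 2, 3], [False, True, False])
print(x)          # [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(smooth(x))  # non-zero values now spread into the neighbouring frames
```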

Performance measures (definitions):
- Strictness: uses the difference between the strictness levels of the two transcriptions.
- Agreement: based on the distance between the corresponding transcription vectors.
- Cross-correlation: takes into account only those frames where a rejection exists in at least one of the two transcriptions.

- Phone correlation: the overall similarity of the per-phone rejection statistics.
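The four measures can be sketched as plain Python functions. The exact formulas are assumptions: strictness as the fraction of rejected phones, agreement as 1 minus the city-block distance between frame vectors divided by their length, cross-correlation as cosine similarity over frames where either vector marks a rejection, and phone correlation as a Pearson correlation of per-phone rejection rates. Note that for cosine similarity, dropping frames where both vectors are zero does not change the value; the filter is kept only to mirror the definition.

```python
import math


def strictness(phone_rejections):
    """Fraction of phones marked as rejected (1) in one transcription."""
    return sum(phone_rejections) / len(phone_rejections)


def strictness_difference(ref_phones, auto_phones):
    return abs(strictness(ref_phones) - strictness(auto_phones))


def agreement(x, y):
    """Assumed form: 1 - city-block distance / number of frames."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def cross_correlation(x, y):
    """Cosine similarity over frames where either vector marks a rejection."""
    pairs = [(a, b) for a, b in zip(x, y) if a > 0 or b > 0]
    dot = sum(a * b for a, b in pairs)
    norm_x = math.sqrt(sum(a * a for a, _ in pairs))
    norm_y = math.sqrt(sum(b * b for _, b in pairs))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: no rejections in one vector gives zero correlation
    return dot / (norm_x * norm_y)


def phone_correlation(ref_rates, auto_rates):
    """Pearson correlation of per-phone rejection rates (same phone order)."""
    n = len(ref_rates)
    mean_r, mean_a = sum(ref_rates) / n, sum(auto_rates) / n
    cov = sum((r - mean_r) * (a - mean_a) for r, a in zip(ref_rates, auto_rates))
    var_r = sum((r - mean_r) ** 2 for r in ref_rates)
    var_a = sum((a - mean_a) ** 2 for a in auto_rates)
    if var_r == 0.0 or var_a == 0.0:
        return 0.0
    return cov / math.sqrt(var_r * var_a)


ref, auto = [0, 1, 1, 0], [0, 1, 0, 0]
print(agreement(ref, auto), cross_correlation(ref, auto))
```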

Collection of a non-native database: To evaluate the pronunciation scoring, a database of non-native speech from second-language learners was recorded and annotated. The speakers understood the prompt texts, and their competence level was low enough to produce easily detectable mispronunciations. The database was annotated at three different levels:
- The original transcriptions were annotated with all substitution, deletion and insertion errors made by the non-native speaker.
- Each word was scored on a scale of 1 to 4.
- Each sentence was scored on the same scale.

The labelling consistency of the human judges: The four performance measures described above are used to determine the labelling consistency of the judges. The results were calculated by averaging A, CC, PC and the strictness difference between each judge and all the other judges.
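The averaging step can be sketched in Python: for one judge, a pairwise measure is computed against every other judge and the results are averaged. The judge names and frame vectors are hypothetical, and agreement (with the same assumed form, 1 minus the normalised city-block distance) stands in for any of the four measures.

```python
def agreement(x, y):
    """Hypothetical pairwise measure: 1 - normalised city-block distance."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def judge_vs_others(judges, judge, measure=agreement):
    """Average a pairwise measure between one judge and every other judge.

    judges: dict mapping judge name -> frame vector for the same utterances.
    """
    others = [name for name in judges if name != judge]
    if not others:
        raise ValueError("need at least two judges")
    return sum(measure(judges[judge], judges[o]) for o in others) / len(others)


judges = {"J1": [0, 1, 1, 0], "J2": [0, 1, 0, 0], "J3": [0, 1, 1, 0]}
for name in judges:
    print(name, judge_vs_others(judges, name))
```

Passing the measure as a parameter lets the same loop produce the averaged A, CC, PC and strictness values for each judge.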

This table shows the similarity between the human judges and the baseline GOP scoring method for each non-native speaker in that judge's group

This figure shows CC and PC results grouped according to each student's mother-tongue

Experimental results: Human and machine judgements agree on which phones to accept and which to reject, with two exceptions.

This table shows the effects of incorporating error modelling and adaptation into the GOP algorithm, together with judge-based individual thresholds.

Conclusions: Using a specially recorded database of non-native speech, the basic GOP method has been investigated and the effectiveness of the performance measures studied. Combining the baseline method with several refinements gave results comparable to the human-human benchmark values. A computer-based pronunciation scoring system can thus judge, much like a human, which phonetic segments in an utterance should be accepted as correct.
