Automatic phonetic transcription of large speech corpora
Christophe Van Bael
Nijmegen, 09-06-06
Annual Symposium of the Dutch Association for Phonetic Sciences: Corpus-based Research
overview
• automatic phonetic transcription of large speech corpora (LSC) [Christophe Van Bael, Lou Boves, Henk van den Heuvel, Helmer Strik]
• background
• aim of our study
• material - method
• generation of phonetic transcriptions
• evaluation of phonetic transcriptions
• results
• conclusions
background
• increased availability of LSC
• data annotation required
• phonetic transcription: added value
• manual transcription: expensive, inconsistent
• semi-automatic transcription: cheaper, potential bias
• automatic transcription: cheap, consistent
aim
• test whether
  • automatic transcription procedures can approximate the manual transcriptions that are usually delivered with present-day corpora
  • a combination of automatic transcription procedures yields ‘better’ transcription results
material - method
• Spoken Dutch Corpus
  • read speech and telephone dialogues
  • reference transcriptions
  • 7K development set: optimise procedures
  • 7K evaluation set: test procedures
• standard canonical lexicon
• continuous speech recogniser
• ADAPT: alignment algorithm
generation
• 10 transcription procedures
  • no multiple pronunciation lexicon: (1) CAN-PT, (2) DD-PT
  • multiple pronunciation lexicon: (3) KB-PT
  • combined lexicon: (4) CAN/DD-PT, (5) KB/DD-PT
  • d-trees: [1-5]d, i.e. each of procedures 1-5 followed by decision trees
generation
• lexicon-lookup procedure: CAN-PT
  • orthographic transcription -> lexicon-lookup in the canonical lexicon -> CAN-PT
  • example: op een gegeven moment -> Op @n x@xev@ mOmEnt
    op -> Op
    een -> @n
    gegeven -> x@xev@
    moment -> mOmEnt
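
The lexicon-lookup step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary holds only the four entries shown on the slide, and the names `CANONICAL_LEXICON` and `canonical_transcription` are hypothetical.

```python
# Toy canonical lexicon: only the four entries shown on the slide.
CANONICAL_LEXICON = {
    "op": "Op",
    "een": "@n",
    "gegeven": "x@xev@",
    "moment": "mOmEnt",
}


def canonical_transcription(orthography, lexicon=CANONICAL_LEXICON):
    """Replace every orthographic word by its canonical pronunciation.

    Raises KeyError for a word that is missing from the lexicon.
    """
    return " ".join(lexicon[word] for word in orthography.lower().split())


print(canonical_transcription("op een gegeven moment"))
# prints: Op @n x@xev@ mOmEnt
```

A real system would also need a policy for out-of-vocabulary words; the sketch simply fails on them.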
generation
• data-driven transcription: DD-PT
  • acoustic models + phonotactic models -> constrained recognition -> DD-PT
generation
• knowledge-based transcription: KB-PT
  • rule extraction: linguistic literature -> phonological rules
  • rule application: phonological rules applied to the canonical transcription, e.g. buk dAt -> bug dAt
  • forced recognition: orthographic transcription + multiple pronunciation lexicon (KB-PT) + acoustic models -> KB-PT
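
Rule application can be pictured as rewriting the canonical transcription and keeping both the original and the rewritten form as candidate variants. The sketch below is an assumption-laden illustration: the single rule (k becomes g before a word-initial d) is inferred from the slide's example only, and the regex encoding, `RULES`, and `pronunciation_variants` are hypothetical, not the rule formalism used in the study.

```python
import re

# Hypothetical rule set: (description, pattern on the canonical string, replacement).
# The one rule here is inferred from the slide example "buk dAt" -> "bug dAt".
RULES = [
    ("k -> g before word-initial d", re.compile(r"k(?= d)"), "g"),
]


def pronunciation_variants(canonical, rules=RULES):
    """Return the canonical form plus every form obtained by applying rules.

    Rules are treated as optional, so the unmodified form is always kept.
    """
    variants = [canonical]
    for _description, pattern, replacement in rules:
        for variant in list(variants):
            rewritten = pattern.sub(replacement, variant)
            if rewritten not in variants:
                variants.append(rewritten)
    return variants


print(pronunciation_variants("buk dAt"))
# prints: ['buk dAt', 'bug dAt']
```

The resulting variants would populate the multiple pronunciation lexicon, and forced recognition then selects the variant that best matches the speech signal.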
generation
• combined lexicon: CAN/DD-PT
  • combination of variants: CAN-PT (d@ Ap@ltart) and DD-PT (d Ap@ltat) are aligned phone by phone
    d @ A p @ l t a r t
    d - A p @ l t a - t
  • the word variants are recombined into a multiple pronunciation lexicon: d@ Ap@ltart, d Ap@ltat, d@ Ap@ltat, d Ap@ltart
  • forced recognition: orthographic transcription + multiple pronunciation lexicon (CAN/DD-PT) + acoustic models -> CAN/DD-PT
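
The recombination of variants in the example can be reproduced with a short Python sketch. It assumes that both transcriptions contain the same number of words and that variants are pooled per word position; the function names are hypothetical, and the actual procedure relied on a phone-level alignment rather than simple word pairing.

```python
from itertools import product


def combine_variants(*transcriptions):
    """Pool, per word position, the distinct variants found in each transcription.

    Assumes all transcriptions have the same number of words.
    """
    per_word = []
    for variants in zip(*(t.split() for t in transcriptions)):
        # dict.fromkeys removes duplicates while keeping first-seen order.
        per_word.append(list(dict.fromkeys(variants)))
    return per_word


def utterance_variants(per_word):
    """Enumerate every utterance-level combination of the word variants."""
    return [" ".join(choice) for choice in product(*per_word)]


lexicon = combine_variants("d@ Ap@ltart", "d Ap@ltat")
print(lexicon)
# prints: [['d@', 'd'], ['Ap@ltart', 'Ap@ltat']]
print(utterance_variants(lexicon))
# prints: ['d@ Ap@ltart', 'd@ Ap@ltat', 'd Ap@ltart', 'd Ap@ltat']
```

These are the four variants listed on the slide; forced recognition then picks one of them per utterance.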
generation
• combined lexicon: KB/DD-PT
  • combination of variants: KB-PT + DD-PT -> multiple pronunciation lexicon
  • forced recognition: orthographic transcription + multiple pronunciation lexicon (KB/DD-PT) + acoustic models -> KB/DD-PT
generation
• decision trees (basic idea)
  • automatic transcription (APT) + reference transcription (RT) -> phone-level alignment -> decision trees
  • the trees estimate P(RT_phone | APT_phone, APT_context_phones)
  • variant generation on an automatic transcription: P(pron_variants | APT_phone, APT_context_phones), e.g. P(pron_variants | k, u _ # d):
    P(g | k, u _ # d) = 0.7
    P(k | k, u _ # d) = 0.2
  • forced recognition: orthographic transcription + multiple pronunciation lexicon ([APT]d) + acoustic models -> [APT]d
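
To make the conditional probabilities concrete, the sketch below estimates P(RT_phone | APT_phone, APT_context_phones) by relative frequency over aligned phone pairs. It is deliberately not a decision tree: a plain count table is swapped in to show what is being estimated. The toy observations are invented so that the two probabilities from the slide come out (0.7 and 0.2); the third outcome ("-", a deletion) is purely illustrative, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_variant_probabilities(aligned):
    """Relative-frequency estimate of P(rt_phone | apt_phone, left, right).

    `aligned` yields (apt_phone, left_context, right_context, rt_phone) tuples
    taken from a phone-level alignment of automatic and reference transcriptions.
    """
    counts = defaultdict(Counter)
    for apt_phone, left, right, rt_phone in aligned:
        counts[(apt_phone, left, right)][rt_phone] += 1
    return {
        context: {rt: n / sum(outcomes.values()) for rt, n in outcomes.items()}
        for context, outcomes in counts.items()
    }


# Invented toy data: phone k, preceded by u, followed by a word boundary and d.
toy_alignment = (
    [("k", "u", "# d", "g")] * 7
    + [("k", "u", "# d", "k")] * 2
    + [("k", "u", "# d", "-")] * 1
)

probabilities = estimate_variant_probabilities(toy_alignment)
print(probabilities[("k", "u", "# d")])
# prints: {'g': 0.7, 'k': 0.2, '-': 0.1}
```

A decision tree generalises this idea by sharing statistics across similar contexts instead of requiring every exact context to be observed.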
evaluation
• evaluation of phonetic transcriptions
  • automatic transcription + reference transcription -> phone-level alignment -> quality measure
  • example: automatic transcription d@ Ap@l vAlt, reference transcription d Ap@l fAlt
    d @ A p @ l v A l t
    d - A p @ l f A l t
  • %dis = (1 ins + 1 sub + 0 del) / 9 phones in reference * 100 = 22%
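
The percentage disagreement in the example can be checked with a small Python sketch. It uses a plain Levenshtein alignment with unit costs as a stand-in for ADAPT, the alignment algorithm named in the material, so it reports only the total number of insertions, substitutions and deletions; the function name is hypothetical.

```python
def percent_disagreement(reference, automatic):
    """Edit operations needed to turn `reference` into `automatic`,
    as a percentage of the number of reference phones.

    Both arguments are lists of phone symbols. Unit-cost Levenshtein distance
    is an assumption made for this sketch.
    """
    n, m = len(reference), len(automatic)
    # cost[i][j]: minimal edits between reference[:i] and automatic[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = cost[i - 1][j - 1] + (reference[i - 1] != automatic[j - 1])
            deletion = cost[i - 1][j] + 1
            insertion = cost[i][j - 1] + 1
            cost[i][j] = min(substitution, deletion, insertion)
    return 100.0 * cost[n][m] / n


reference = "d A p @ l f A l t".split()    # 9 phones
automatic = "d @ A p @ l v A l t".split()  # 10 phones
print(round(percent_disagreement(reference, automatic)))
# prints: 22
```

The two edits are the inserted @ and the substitution of v for f, which gives 2 / 9 * 100, rounded to 22%.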
results
• [results per transcription procedure (CAN-PT, DD-PT, KB-PT, CAN/DD-PT, KB/DD-PT, each with and without decision trees); the values are not preserved in this transcript]
results
• what about the remaining discrepancies?
  • slightly higher number than between human experts
  • mainly voiced/voiceless contrast
recap: aim
• test whether
  • automatic transcription procedures can approximate the manual transcriptions that are usually delivered with present-day corpora
  • a combination of automatic transcription procedures yields ‘better’ transcription results
conclusions
• canonical transcription
  • good, not optimal
• knowledge-based transcription
  • added value for spontaneous speech
• data-driven transcription
  • suboptimal
• combination methods (combination with DD)
  • suboptimal
• decision trees
  • general tendency: improved transcription accuracy
conclusions
• [CAN-PT]d best
  • approximated human inter-labeller agreement
  • ‘disagreements’ human-like
  • don’t try to model remaining inconsistencies (if possible at all, with an automatic procedure)
therefore…
conclusions
• thoroughly assess the added value of manual labour in transcription projects
because…
• the ‘added value’ of manual verification may be largely reproduced by means of a quick, cheap, consistent and adequate procedure
questions
Christophe Van Bael
CLST, Radboud University Nijmegen
c.v.bael@let.ru.nl