
Automatic phonetic transcription of large speech corpora

Christophe Van Bael. Nijmegen, 09-06-06. Annual Symposium of the Dutch Association for Phonetic Sciences: Corpus-based Research.


Presentation Transcript


  1. Automatic phonetic transcription of large speech corpora
     Christophe Van Bael
     Nijmegen, 09-06-06
     Annual Symposium of the Dutch Association for Phonetic Sciences: Corpus-based Research

  2. overview
     • automatic phonetic transcription of LSC (large speech corpora) [Christophe Van Bael, Lou Boves, Henk van den Heuvel, Helmer Strik]
     • background
     • aim of our study
     • material - method
     • generation of phonetic transcriptions
     • evaluation of phonetic transcriptions
     • results
     • conclusions

  3. background
     • increased availability of LSC
     • data annotation required
     • phonetic transcription: added value
     • manual transcription: expensive, inconsistent
     • semi-automatic transcription: cheaper, potential bias
     • automatic transcription: cheap, consistent

  4. aim
     • test whether
        • automatic transcription procedures can approximate the manual transcriptions that are usually delivered with present-day corpora
        • a combination of automatic transcription procedures yields ‘better’ transcription results

  5. material - method
     • Spoken Dutch Corpus
     • read speech and telephone dialogues
     • reference transcriptions
     • 7K development set: optimise procedures
     • 7K evaluation set: test procedures
     • standard canonical lexicon
     • continuous speech recogniser
     • ADAPT: alignment algorithm

  6-7. generation
     • 10 transcription procedures
     [diagram: procedures 1 CAN-PT, 2 DD-PT, 3 KB-PT; combined lexicon: 4 CAN/DD-PT, 5 KB/DD-PT; d-trees: [1-5]d; grouped under "no multiple pronunciation lexicon" and "multiple pronunciation lexicon"]

  8. generation
     • lexicon-lookup procedure: CAN-PT
     [diagram labels: orthographic transcription; canonical lexicon; lexicon-lookup; CAN-PT]

  9. generation
     • lexicon-lookup procedure: CAN-PT
     • example: "op een gegeven moment" → lexicon-lookup → "Op @n x@xev@ mOmEnt"
     • lexicon entries: op → Op; een → @n; gegeven → x@xev@; moment → mOmEnt
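The lexicon-lookup step on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `canonical_transcription` and the dictionary `CANONICAL_LEXICON` are assumptions, and the four entries are the ones shown on the slide, not the full canonical lexicon.

```python
# Toy canonical lexicon: only the four entries shown on the slide.
CANONICAL_LEXICON = {
    "op": "Op",
    "een": "@n",
    "gegeven": "x@xev@",
    "moment": "mOmEnt",
}


def canonical_transcription(orthographic, lexicon):
    """Replace every orthographic word by its single canonical pronunciation.

    Hypothetical helper: raises KeyError for a word missing from the lexicon.
    """
    return " ".join(lexicon[word] for word in orthographic.lower().split())


print(canonical_transcription("op een gegeven moment", CANONICAL_LEXICON))
```

Because each word has exactly one entry, the procedure needs no acoustic information, which is why the overview diagram places CAN-PT apart from the recognition-based procedures.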

  10. generation
      • data-driven transcription: DD-PT
      [diagram labels: acoustic models; phonotactic models; constrained recognition; DD-PT]

  11-15. generation
      • knowledge-based transcription: KB-PT
      [diagram labels, built up over five slides: linguistic literature; rule extraction; phonological rules; canonical transcription; rule application]
      • example: canonical transcription "buk dAt"; after rule application "bug dAt"
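A minimal sketch of the rule-application step, assuming the phonological rules can be written as context-sensitive rewrite patterns. The single rule below (/k/ becomes /g/ before a word boundary followed by /d/) is inferred from the slide's "buk dAt" → "bug dAt" example and is not taken from the authors' rule set; `apply_rules` and `VOICING_RULE` are hypothetical names.

```python
import re

# Hypothetical rule inferred from the slide example:
# /k/ -> /g/ when followed by a word boundary (space) and /d/.
VOICING_RULE = (re.compile(r"k(?= d)"), "g")


def apply_rules(canonical, rules):
    """Return the canonical transcription plus every variant the rules produce.

    Each rule is treated as optional, so the input is always kept; the
    resulting set could be used to fill a multiple pronunciation lexicon.
    """
    variants = {canonical}
    for pattern, replacement in rules:
        variants |= {pattern.sub(replacement, v) for v in list(variants)}
    return sorted(variants)


print(apply_rules("buk dAt", [VOICING_RULE]))
```

Keeping both the canonical form and the rewritten form matches the next slide, where forced recognition selects among the variants in the multiple pronunciation lexicon.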

  16. generation
      • knowledge-based transcription: KB-PT
      [diagram labels: orthographic transcription; mult. pron. lexicon; acoustic models; forced recognition; KB-PT]

  17. generation
      • combined lexicon: CAN/DD-PT
      [diagram labels: CAN-PT; DD-PT; combination of variants]

  18-19. generation
      • combined lexicon: CAN/DD-PT
      • example transcriptions: "d@ Ap@ltart" and "d Ap@ltat"
      • phone-level alignment:
           d @ A p @ l t a r t
           d - A p @ l t a - t
      • variants for the mult. pron. lexicon: d@ Ap@ltart; d Ap@ltat; d@ Ap@ltat; d Ap@ltart
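The slides do not spell out how the variants are combined. The sketch below assumes free recombination: at every aligned position where the two transcriptions differ, either symbol may be chosen, word by word. Under that assumption it reproduces the four variants listed on the slide. The names `word_variants` and `GAP` are illustrative.

```python
from itertools import product

GAP = "-"  # gap symbol used in the slide's phone-level alignment


def word_variants(aligned_a, aligned_b):
    """All phone strings obtained by taking, at each aligned position,
    the symbol from either transcription; gaps are dropped on output."""
    choices = [[a] if a == b else [a, b] for a, b in zip(aligned_a, aligned_b)]
    variants = []
    for combo in product(*choices):
        variant = "".join(p for p in combo if p != GAP)
        if variant not in variants:
            variants.append(variant)
    return variants


# Alignment from the slide, split per word.
first_word = word_variants("d @".split(), "d -".split())
second_word = word_variants("A p @ l t a r t".split(), "A p @ l t a - t".split())
utterance_variants = [" ".join(words) for words in product(first_word, second_word)]
print(utterance_variants)
```

Two differing positions give 2 x 2 = 4 utterance-level variants, two of which are the original input transcriptions.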

  20. generation
      • combined lexicon: CAN/DD-PT
      [diagram labels: orthographic transcription; mult. pron. lexicon; acoustic models; forced recognition; CAN/DD-PT]

  21. generation
      • combined lexicon: KB/DD-PT
      [diagram labels: KB-PT mult. pron. lexicon; DD-PT; combination of variants]

  22. generation
      • combined lexicon: KB/DD-PT
      [diagram labels: orthographic transcription; mult. pron. lexicon; acoustic models; forced recognition; KB/DD-PT]

  23. generation
      • decision trees (basic idea)
      [diagram labels: automatic transcription; reference transcription; phone-level alignment; decision trees; automatic transcription; variant generation]
      • P(RT_phone | APT_phone, APT_context_phones), with RT the reference transcription and APT the automatic phonetic transcription
      • P(pron_variants | APT_phone, APT_context_phones)
      • example: P(pron_variants | k, u _ # d), with P(g | k, u _ # d) = 0.7 and P(k | k, u _ # d) = 0.2
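The conditional probabilities on this slide can be illustrated with a plain relative-frequency table. Note that this is a simplification and not a decision tree: a tree would additionally generalise over similar contexts. The function name and the ten toy observations are invented here so that the output matches the slide's 0.7 and 0.2; the remaining observation (a deletion, written "-") is an arbitrary filler.

```python
from collections import Counter, defaultdict


def estimate_variant_probs(aligned_observations):
    """Estimate P(RT_phone | APT_phone, context) by relative frequency.

    aligned_observations: iterable of (apt_phone, context, rt_phone) triples
    taken from a phone-level alignment of automatic and reference transcriptions.
    """
    counts = defaultdict(Counter)
    for apt_phone, context, rt_phone in aligned_observations:
        counts[(apt_phone, context)][rt_phone] += 1
    return {
        key: {phone: n / sum(counter.values()) for phone, n in counter.items()}
        for key, counter in counts.items()
    }


# Invented toy data: APT phone /k/ in the context "u _ # d", seen ten times.
CONTEXT = "u _ # d"
toy = [("k", CONTEXT, "g")] * 7 + [("k", CONTEXT, "k")] * 2 + [("k", CONTEXT, "-")]
probs = estimate_variant_probs(toy)
print(probs[("k", CONTEXT)])
```

Such a table could then be used to propose pronunciation variants for each phone of an automatic transcription, with the final choice left to forced recognition as on the following slide.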

  24. generation
      • decision trees (basic idea)
      [diagram labels: variant generation; orthographic transcription; mult. pron. lexicon; acoustic models; forced recognition; [APT]d]

  25. evaluation
      • evaluation of phonetic transcriptions
      [diagram labels: automatic transcription; reference transcription; phone-level alignment; quality measure]

  26. evaluation
      • evaluation of phonetic transcriptions
      • example alignment of "d@ Ap@l vAlt" with "d Ap@l fAlt":
           d @ A p @ l v A l t
           d - A p @ l f A l t
      • %dis = (1 ins + 1 sub + 0 del) / (9 phones in reference) * 100 = 22%
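The %dis measure can be reproduced with a short sketch. The slides name ADAPT as the alignment algorithm but give no details, so this example substitutes a plain unit-cost edit-distance alignment; the function name `disagreement` is an assumption. The 9-phone transcription "d Ap@l fAlt" is treated as the reference, as the slide's denominator implies.

```python
def disagreement(reference, hypothesis):
    """Return (%dis, insertions, substitutions, deletions) for two phone lists,
    using a unit-cost edit-distance alignment (a stand-in for ADAPT)."""
    n, m = len(reference), len(hypothesis)
    # Each cell holds (total_errors, ins, sub, dele) for reference[:i] vs hypothesis[:j].
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, 0, i)  # only deletions
    for j in range(1, m + 1):
        table[0][j] = (j, j, 0, 0)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, ins, sub, dele = table[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (t, ins, sub, dele)
            else:
                diag = (t + 1, ins, sub + 1, dele)
            t, ins, sub, dele = table[i - 1][j]
            up = (t + 1, ins, sub, dele + 1)
            t, ins, sub, dele = table[i][j - 1]
            left = (t + 1, ins + 1, sub, dele)
            table[i][j] = min(diag, up, left)  # fewest total errors wins
    total, ins, sub, dele = table[n][m]
    return 100.0 * total / n, ins, sub, dele


reference = list("dAp@lfAlt")    # 9 phones, one character per phone
automatic = list("d@Ap@lvAlt")   # 10 phones
pct, ins, sub, dele = disagreement(reference, automatic)
print(round(pct), ins, sub, dele)
```

Normalising by the number of phones in the reference means that insertions can push %dis above 100% in extreme cases, which is worth keeping in mind when comparing procedures.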

  27-34. results
      [figure: only the overview diagram of the procedures survives on these slides: 1 CAN-PT, 2 DD-PT, 3 KB-PT, 4 CAN/DD-PT, 5 KB/DD-PT, and [1-5]d]

  35. results
      • what about the remaining discrepancies?
         • slightly higher number than between human experts
         • mainly voiced/voiceless contrast

  36. recap: aim
      • test whether
         • automatic transcription procedures can approximate the manual transcriptions that are usually delivered with present-day corpora
         • a combination of automatic transcription procedures yields ‘better’ transcription results

  37. conclusions
      • canonical transcription
         • good, not optimal
      • knowledge-based transcription
         • added value for spontaneous speech
      • data-driven transcription
         • suboptimal
      • combination methods (combination with DD)
         • suboptimal
      • decision trees
         • general tendency: improved transcription accuracy

  38. conclusions
      • [CAN-PT]d best
         • approximated human inter-labeller agreement
         • ‘disagreements’ human-like
      • don’t try to model remaining inconsistencies (if possible at all, with an automatic procedure)
      therefore…

  39. conclusions
      • thoroughly assess added value of manual labour in transcription projects
      because …
      • the ‘added value’ of manual verification may be largely reproduced by means of a quick, cheap, consistent and adequate procedure

  40. questions
      Christophe Van Bael
      CLST, Radboud University Nijmegen
      c.v.bael@let.ru.nl
