
Accent & Dialect Identification

Accent & Dialect Identification. Chuck Curtis, LING575 – Discourse & Dialogue, 6/1/2011. HTK: Hidden Markov Model Toolkit. Library of modules and tools written in C. First release was in 1989. Eventually wound up in Microsoft’s hands, but it is publicly available.

Presentation Transcript


  1. Accent & Dialect Identification Chuck Curtis LING575 – Discourse & Dialogue 6/1/2011

  2. HTK • Hidden Markov Model Toolkit • Library of modules and tools written in C • First release was in 1989 • Eventually wound up in Microsoft’s hands, but it is publicly available • http://htk.eng.cam.ac.uk/ • Ended up being too difficult to implement • Too much background and theory that was unfamiliar • Many incremental steps that were confusing

  3. TIMIT • LDC Acoustic-Phonetic Continuous Speech Corpus (1993) • 8 American English dialect groups • 630 speakers total, 10 read sentences per speaker • 70% male, 30% female • ARPABET phonetic transcriptions • On Patas at /corpora/LDC/LDC93S1/

  4. Example Sentence
     0 63488 She had your dark suit in greasy wash water all year.
     7470 11362 she
     11362 16000 had
     15420 17503 your
     17503 23360 dark
     23360 28360 suit
     28360 30960 in
     30960 36971 greasy
     36971 42290 wash
     43120 47480 water
     49021 52184 all
     52184 58840 year

  5. Example Phone Sequence
     0 7470 h#
     7470 9840 sh
     9840 11362 iy
     11362 12908 hv
     12908 14760 ae
     14760 15420 dcl
     15420 16000 jh
     16000 17503 axr
     17503 18540 dcl
     18540 18950 d
     18950 21053 aa
     21053 22200 r
     22200 22740 kcl
     … …
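
The word and phone listings above share the same sample offsets, so phones can be grouped under the word whose span contains them. The following is a minimal sketch of that grouping, not the presenter's code: the function names `parse_segments` and `phones_for_word` are hypothetical, and the containment rule (a phone belongs to a word if it lies entirely inside the word's span) is an assumption. Only the first three words and eight phones from the slides are used.

```python
def parse_segments(text):
    """Parse lines of the form '<start> <end> <label>' into (int, int, str) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split(maxsplit=2)
        segments.append((int(start), int(end), label))
    return segments


def phones_for_word(word_seg, phone_segs):
    """Return labels of phones lying entirely inside the word's span (assumed rule)."""
    w_start, w_end, _ = word_seg
    return [label for start, end, label in phone_segs
            if start >= w_start and end <= w_end]


# Excerpts copied from the example sentence and phone sequence slides.
WORDS = """7470 11362 she
11362 16000 had
15420 17503 your"""

PHONES = """0 7470 h#
7470 9840 sh
9840 11362 iy
11362 12908 hv
12908 14760 ae
14760 15420 dcl
15420 16000 jh
16000 17503 axr"""

words = parse_segments(WORDS)
phones = parse_segments(PHONES)
aligned = {w[2]: phones_for_word(w, phones) for w in words}

for word, seq in aligned.items():
    print(word, seq)
```

Because the spans of "had" and "your" overlap in the transcription, the phone `jh` falls inside both words under this rule.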

  6. What’s our vector, Victor? • As a starting point, I’m looking at phone sequences for each word as separate features
     /corpora/LDC/LDC93S1/TIMIT/TRAIN/DR4/MBMA0_2 South_Midland ask_ae_s 1 an_ix_n 1 rag_r_ae_gcl_g 1 like_l_ay_kcl 1 that_dh_ae_tcl 1 oily_oy_l_iy 1 me_m_iy 1 carry_kcl_k_eh_r_iy 1 don't_d_ow_nx 1 to_dx_ix 1
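
A vector line like the one above (instance name, dialect label, then feature/value pairs) could be assembled roughly as follows. This is a sketch under assumptions: `word_feature` and `format_instance` are hypothetical names, the instance name `example_instance` is a placeholder, and every feature is given the value 1 as in the slide.

```python
def word_feature(word, phones):
    """Join a word and its phone sequence with underscores, e.g. ask_ae_s."""
    return "_".join([word] + list(phones))


def format_instance(name, label, aligned_words):
    """Build one text line: '<name> <label> <feature> 1 <feature> 1 ...'."""
    feats = " ".join(f"{word_feature(w, p)} 1" for w, p in aligned_words)
    return f"{name} {label} {feats}"


# Word/phone pairs taken from the features shown on the slide.
pairs = [("ask", ["ae", "s"]), ("an", ["ix", "n"]), ("me", ["m", "iy"])]
line = format_instance("example_instance", "South_Midland", pairs)
print(line)
```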

  7. Using MALLET and TBL • MaxEnt classifier • TBL algorithm that we implemented for 572

  8. Features = words, vectors = speakers

  9. Features = words, vectors = sentences

  10. Features = monophones, vectors = sentences

  11. Features = diphones, vectors = sentences

  12. Features = triphones, vectors = sentences
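
The slides do not say how the monophone, diphone, and triphone features were extracted. One plausible reading, assumed here, is that they are n-grams of consecutive phone labels with n = 1, 2, 3. The sketch below counts such n-grams for one sentence; `phone_ngrams` and `ngram_counts` are hypothetical names.

```python
from collections import Counter


def phone_ngrams(phones, n):
    """Return all runs of n consecutive phone labels, joined with underscores."""
    return ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def ngram_counts(phones, n):
    """Count each phone n-gram in a sentence to form a sparse feature vector."""
    return Counter(phone_ngrams(phones, n))


# First phones of the example sentence (leading silence h# omitted).
sentence = ["sh", "iy", "hv", "ae", "dcl", "jh", "axr"]

monophones = ngram_counts(sentence, 1)
diphones = ngram_counts(sentence, 2)
triphones = ngram_counts(sentence, 3)
print(sorted(diphones))
```

In speech recognition, "diphone" and "triphone" often mean context-dependent phone models, so this n-gram interpretation should be treated as a guess.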

  13. TODO • Trigrams w/ word boundaries • Try DecisionTree classifier (which uses InfoGain) • Possibly add gender to feature vectors
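
The first TODO item, trigrams with word boundaries, might look something like the sketch below. It assumes a boundary marker `#` inserted before, between, and after words, which is an illustrative choice and not something the slides specify; `with_boundaries` and `trigrams` are hypothetical names.

```python
def with_boundaries(words, boundary="#"):
    """Flatten per-word phone lists into one sequence with boundary markers."""
    seq = [boundary]
    for phones in words:
        seq.extend(phones)
        seq.append(boundary)
    return seq


def trigrams(seq):
    """Return all runs of three consecutive symbols as tuples."""
    return [tuple(seq[i:i + 3]) for i in range(len(seq) - 2)]


# Per-word phone lists for "she had", following the example slides.
words = [["sh", "iy"], ["hv", "ae", "dcl", "jh"]]
seq = with_boundaries(words)
feats = trigrams(seq)
print(feats)
```

Trigrams such as `("iy", "#", "hv")` then capture which phones meet across a word boundary, which plain within-word sequences cannot.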

  14. Questions / Comments?
