Language modeling for speaker recognition

Explore character n-gram models and key-list models for author identification in speaker recognition research, with results and insights from experiments. Access the full thesis at http://www.dgillick.com/resource/thesis.pdf.

Presentation Transcript


  1. Dan Gillick January 20, 2004 Language modeling for speaker recognition

  2. Outline
     • Author identification
     • Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)
     • My next project

  3. Author ID (undergrad. thesis)
     Problem:
     • train models for each of k authors
     • given some test text written by 1 of those authors, identify the correct author
     Variations:
     • different kinds of models
     • different size test samples
     • different k

  4. Character n-gram models
     What?
     • 27 tokens: a-z, <space>
     • some text generated from such a trigram model: “you orthad gool of anythilly uncand or prafecaustiont and to hing that put ably”
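A character trigram model of this kind can be sketched in a few lines. This is only an illustration, not the thesis code: the function names (`normalize`, `train_trigrams`, `generate`), the mapping of every non-letter to a space, and the restart rule for unseen histories are all assumptions.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 27 tokens: a-z plus space


def normalize(text):
    """Lowercase, map every non-letter to a space, collapse runs of spaces."""
    chars = [c if c in ALPHABET else " " for c in text.lower()]
    return " ".join("".join(chars).split())


def train_trigrams(text):
    """Count which character follows each two-character history."""
    counts = defaultdict(Counter)
    s = normalize(text)
    for i in range(len(s) - 2):
        counts[s[i:i + 2]][s[i + 2]] += 1
    return counts


def generate(counts, length, seed=0):
    """Sample text one character at a time from the trigram counts."""
    rng = random.Random(seed)
    out = rng.choice(sorted(counts))  # start from a random seen history
    while len(out) < length:
        followers = counts.get(out[-2:])
        if not followers:
            # Assumed fallback: unseen history, so restart from a seen one.
            out += rng.choice(sorted(counts))
            continue
        chars, weights = zip(*sorted(followers.items()))
        out += rng.choices(chars, weights=weights)[0]
    return out[:length]
```

Trained on a full novel, `generate(train_trigrams(novel_text), 80)` should produce word-like gibberish similar to the sample on the slide.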

  5. Character n-gram models
     Why?
     • very simple
     • data sparseness less troublesome than with word n-grams
     • supposed to be state-of-the-art or at least close to it (Khmelev, D. and Tweedie, F. J., “Using Markov Chains for the Identification of Writers,” Literary and Linguistic Computing, 16(4): 299-307, 2001)

  6. Character n-grams: Setup
     • task: pick correct author from 10 possible authors
     • training data: 3 novels for each author
     • test data: text from a held-out novel
     • jack-knifing: 4 novels for each of 20 authors
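The setup above amounts to training one model per author and picking the author whose model gives the held-out text the highest log-probability. A minimal sketch follows; the class and function names are hypothetical, and the add-alpha smoothing is an assumption, since the slides do not say how the thesis smoothed its counts.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 27 tokens: a-z plus space


def normalize(text):
    """Lowercase, map every non-letter to a space, collapse runs of spaces."""
    chars = [c if c in ALPHABET else " " for c in text.lower()]
    return " ".join("".join(chars).split())


class CharNgramModel:
    """Character n-gram model with add-alpha smoothing (assumed)."""

    def __init__(self, text, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha
        s = normalize(text)
        spans = range(len(s) - n + 1)
        self.ngrams = Counter(s[i:i + n] for i in spans)
        self.histories = Counter(s[i:i + n - 1] for i in spans)

    def logprob(self, text):
        """Sum of log P(char | previous n-1 chars) over the text."""
        s = normalize(text)
        total = 0.0
        for i in range(len(s) - self.n + 1):
            gram = s[i:i + self.n]
            num = self.ngrams[gram] + self.alpha
            den = self.histories[gram[:-1]] + self.alpha * len(ALPHABET)
            total += math.log(num / den)
        return total


def identify(models, test_text):
    """Return the author whose model scores the test text highest."""
    return max(models, key=lambda author: models[author].logprob(test_text))
```

In the experiment described here, `models` would hold 10 entries, each trained on the concatenation of 3 novels, and `test_text` would come from the held-out fourth novel.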

  7. Character n-grams: Results
     • task: picking 1 author from 10 possible authors
     • training data size: 3 novels

  8. Character n-gram models
     Why does it work?
     • captures some word choice information
     • picks up word endings (-ing, -tion, -ly, etc.)
     • not hurt much by data sparseness issues

  9. Key-list models
     Incentive:
     • ought to be able to beat character n-grams
     • develop a new modeling method more focused on what differentiates authors (characters and words are both useful for topic recognition, but that doesn’t mean they are best for author recognition)

  10. Key-list models
      Idea:
      • convert the text stream into a stream of only authorship-relevant symbols (I called these lists of symbols key-lists)
      • each symbol is a regular expression to allow for broad definitions (/*tion/ captures any nounification)
      • text not accounted for by the key-list is represented by <short>, <med>, or <long> markers
      • build n-gram models from these new streams
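The conversion described above can be sketched as follows. Everything specific here is an assumption made for illustration: the entries in `KEY_LIST`, the tokenizer, and the word-length thresholds behind `<short>`, `<med>`, and `<long>` are hypothetical, since the slides do not preserve the actual key-lists or cutoffs.

```python
import re
from collections import Counter

# Hypothetical key-list: (symbol, regular expression) pairs, tried in order.
KEY_LIST = [
    ("<comma>", re.compile(r",")),
    ("<period>", re.compile(r"\.")),
    ("<tion>", re.compile(r"\w*tion")),
    ("<ing>", re.compile(r"\w+ing")),
    ("<the>", re.compile(r"the")),
]


def length_marker(token, short_max=3, med_max=6):
    """Stand-in symbol for a token the key-list does not cover.
    The length thresholds are assumptions, not values from the thesis."""
    if len(token) <= short_max:
        return "<short>"
    if len(token) <= med_max:
        return "<med>"
    return "<long>"


def to_symbols(text):
    """Convert raw text into a stream of key-list symbols."""
    tokens = re.findall(r"[a-z']+|[^\sa-z']", text.lower())
    symbols = []
    for token in tokens:
        for symbol, pattern in KEY_LIST:
            if pattern.fullmatch(token):
                symbols.append(symbol)
                break
        else:  # no key-list entry matched this token
            symbols.append(length_marker(token))
    return symbols


def ngram_counts(symbols, n=3):
    """Count n-grams over the symbol stream."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
```

With this key-list, `to_symbols("Well, no.")` yields `<med> <comma> <short> <period>`, so punctuation-driven phrasing patterns survive while the actual words are discarded.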

  11. Key-list models
      Sample key-list:
      Sample trigram: <comma> <short> <period>

  12. Key-list models: Results
      • task: picking 1 author from 10 possible authors
      • training data size: 3 novels

  13. Key-list models: Results
      Some other interesting results:
      • key-lists with just punctuation (as well as <short>, <med>, <long>) performed almost as well as the best key-lists
      • all key-lists were outperformed by the best n-letter model when the test sample was under 10,000 characters, but with larger test samples all key-list models eventually surpassed the n-letter models

  14. Key-list models
      Things I didn’t do:
      • vary amount of training data
      • spend a long time trying different key-lists
      • combine key-list results with each other or with the character results
      • a lot of other stuff
      The thesis is available on the web: http://www.dgillick.com/resource/thesis.pdf

  15. Outline
      • Author identification
      • Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)
      • My next project

  16. G. Doddington’s LM strategy
      • create LMs with a limited vocabulary of the most commonly occurring 2000 bigrams
      • to smooth out zeroes, boost each bigram prob. by 0.001
      • score by calculating: logprob(test|target) – logprob(test|bkg)
      • logprobs are joint probabilities: logprob(AB) = logprob(A) + logprob(B|A)
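The scoring recipe above can be sketched as below. This is a rough reading of the slide, not Doddington's code: the function names are hypothetical, and two details are assumptions, namely that probabilities are relative frequencies over in-vocabulary bigrams only, and that test bigrams outside the vocabulary are skipped.

```python
import math
from collections import Counter


def bigrams(words):
    """Adjacent word pairs from a list of words."""
    return list(zip(words, words[1:]))


def top_bigrams(words, k=2000):
    """Limited vocabulary: the k most frequent bigrams."""
    return {bg for bg, _ in Counter(bigrams(words)).most_common(k)}


def joint_probs(words, vocab, boost=0.001):
    """Joint probability of each vocabulary bigram, boosted to avoid zeroes.
    Normalizing over in-vocabulary bigrams only is an assumption."""
    counts = Counter(bg for bg in bigrams(words) if bg in vocab)
    total = sum(counts.values()) or 1
    return {bg: counts[bg] / total + boost for bg in vocab}


def llr_score(test_words, target, background):
    """logprob(test|target) - logprob(test|bkg), over in-vocabulary bigrams."""
    score = 0.0
    for bg in bigrams(test_words):
        if bg in target:
            score += math.log(target[bg]) - math.log(background[bg])
    return score
```

A positive score means the test conversation side looks more like the target speaker than like the background population. In practice the score would likely also be normalized by the number of scored bigrams so that conversations of different lengths are comparable.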

  17. G. Doddington’s LM: Setup
      Switchboard 1 data:
      • collected in early ’90s from all over the US
      • 2,400 (~5 min.) conversations among 543 speakers
      • corpus divided into 6 splits and tested using jack-knifing through the splits
      • manual transcripts provided by Mississippi State
      Task:
      • 8 conversation sides used as training data to build models for each target speaker
      • 1 conversation side used as test data
      • background model built from 3 splits of held-out data
      • jack-knifing allowed for almost 10,000 trials

  18. G. Doddington’s LM: Results
      Notes:
      • these results are my own attempt to replicate the original experiments
      • SRI reported EER = 8.65% for this same experiment
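EER (equal error rate) is the operating point where the miss rate on target trials equals the false-alarm rate on impostor trials. The slides do not say how it was computed for these experiments, so the following is only a simple threshold sweep over the observed scores, with a hypothetical function name.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: sweep thresholds over all observed scores and
    return the mean of miss and false-alarm rates where they are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # Miss: a true-speaker trial scored below the threshold.
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        false_alarm = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

With only a handful of trials the result is coarse; with the roughly 10,000 trials available here, the sweep gives a reasonably fine-grained estimate.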

  19. Adapted bigram models
      Incentive:
      • adapting target models from a much larger background model should yield better estimates of probabilities in the language models
      Specifically:
      • use same 2000 bigram vocabulary
      • target probabilities are a mixture of training probabilities and background probabilities
      • mixture weight is 2:1 target data:bkg. data
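One way to read the 2:1 mixture is as a linear interpolation of the two probability tables with weights 2/3 and 1/3. That reading is an assumption (the weights could instead have been applied to counts), and the `adapt` function below is a hypothetical sketch of it.

```python
def adapt(target_probs, background_probs, target_weight=2.0, background_weight=1.0):
    """Mix target and background bigram probabilities over the shared vocabulary.
    Bigrams the target speaker never used fall back on the background estimate."""
    total = target_weight + background_weight
    return {
        bg: (target_weight * target_probs.get(bg, 0.0) + background_weight * p_bkg) / total
        for bg, p_bkg in background_probs.items()
    }
```

Because every vocabulary bigram has a nonzero background probability, the adapted model needs no separate boost for unseen bigrams, and if both inputs sum to 1 over the vocabulary the mixture does too.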

  20. Adapted bigram models: Results
      Notes:
      • nearly identical performance
      • combination of the 2 systems yields almost no improvement
      • why isn’t the adapted version better?

  21. Can anything improve on 8.68%?
      Trigrams?
      • use same count threshold to make a list of the top 700 trigrams (“a lot of”, “I don’t know” were among the most common)
      Character models?
      • worked well for authorship…
      • included all character combinations (no limited vocabulary)
      • tried bigram and trigram models

  22. Scores and combinations
      • adapt. word bigrams: EER = 8.89%
      • adapt. word trigrams: EER = 11.88%
      • adapt. char. bigrams: EER = 13.73%
      • adapt. char. trigrams: EER = 17.92%
      • adapted words: EER = 8.46%
      • adapted characters: EER = 13.24%
      • adapted words + adapted characters: EER = 7.89%
      • GD bigrams: EER = 8.68%

  23. Final Comparison

  24. What about less training data?
      1 conversation-side training
      • character models might provide more of an advantage with less data?
      • not so.
      • GD EER = 22.5%
      • adapted character EER = 30%
      • adapted word EER = 20%
      • maybe these character models pick up on the topic of that 1 conversation
      • haven’t tried any other size training data

  25. Outline
      • Author identification
      • Trying to beat GD’s result
      • My next project

  26. Key-lists for speaker recognition
      • key-list n-grams picked up on phrasing (comma and period were valuable tokens)
      • automatic transcripts don’t have punctuation but they do have pause and duration information
      • use reg. exps. and duration info. to capture idiosyncratic speaker phrasing
      • capture other speech information in key-lists? (energy, f0, etc.)

  27. Acknowledgements
      Thanks to:
      • Anand and Luciana at SRI for trying to help me replicate their results
      • Barbara for providing advice
      • Barry and Kofi for helping with computers and stuff
      • George
