
Language Identification

Language Identification. Oldřich Plchot, Pavel Matějka, Speech@FIT, Brno University of Technology, Czech Republic, matejkap@fit.vutbr.cz. IKR Brno 2012. Outline: Why do we need LID? Evaluations. Acoustic LID. Phonotactic LID. Fusion. Conclusion. Why do we need language identification?


Presentation Transcript


  1. Language Identification Oldřich Plchot, Pavel Matějka Speech@FIT, Brno University of Technology, Czech Republic matejkap@fit.vutbr.cz IKR Brno 2012

  2. Outline
     • Why do we need LID?
     • Evaluations
     • Acoustic LID
     • Phonotactic LID
     • Fusion
     • Conclusion
     Language Identification IKR, Brno, 2012

  3. Why do we need language identification? 1) Route phone calls to human operators: emergency lines (112, 155, 911), call centers, police (158), fire brigade (150).

  4. Why do we need language identification? 2) Pre-select a suitable recognition system. [Diagram: a language identification block routing audio among languages (CHN, SPA, CZE, VIE, ENG) and downstream systems (KWS, Translate, Speech2Text, Connect).]

  5. Why do we need language identification? 3) Security applications, to narrow the search space.

  6. Two main approaches to LID
     • Acoustic: Gaussian Mixture Model
     • Phonotactic: Phoneme Recognition followed by Language Model

  7. Acoustic approach
     • Gaussian Mixture Model
     • Good for short speech segments and dialect recognition, because it relies on the sounds themselves

  8. Spectral features - MFCC
     • Pipeline: 20 ms frames, 10 ms shift → short-time FFT → mel filter bank → log() → discrete cosine transform
     [Figure: example feature vectors, one per frame: -12.8 -0.3 -5.7 -22.4 8.9 6.8 … and -11.2 0.4 -4.7 -13.0 2.3 4.5 …]
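
The pipeline on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' front end: the 8 kHz sample rate, Hamming window, 23 mel filters, and the helper names (`mel_filterbank`, `mfcc`) are assumptions; only the 20 ms / 10 ms framing, the order of the steps, and the 7 cepstral coefficients (from the next slide) come from the deck.

```python
import numpy as np

def mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    edges_hz = inv_mel(np.linspace(0.0, mel(sample_rate / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):          # rising edge of the triangle
            fbank[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):          # falling edge of the triangle
            fbank[i, k] = (hi - k) / max(hi - mid, 1)
    return fbank

def mfcc(signal, sample_rate=8000, win_ms=20, shift_ms=10,
         n_filters=23, n_ceps=7):
    win = int(sample_rate * win_ms / 1000)       # samples per frame
    shift = int(sample_rate * shift_ms / 1000)   # samples between frames
    n_fft = 1 << (win - 1).bit_length()          # next power of two
    n_frames = 1 + (len(signal) - win) // shift
    idx = np.arange(win)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    # Short-time FFT -> power spectrum
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Mel filter bank -> log (small floor avoids log(0))
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Discrete cosine transform (DCT-II), keeping the first n_ceps coefficients
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (m[None, :] + 0.5) / n_filters)
    return log_mel @ basis.T

# One second of synthetic noise stands in for real speech.
signal = np.random.default_rng(0).standard_normal(8000)
feats = mfcc(signal)   # one row of 7 coefficients per 10 ms frame
```

Each row of `feats` corresponds to one of the per-frame vectors shown in the figure.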

  9. Shifted delta cepstra
     • Shifted delta cepstra (SDC) capture how the speech evolves around the current frame (±0.1 s)
     • Size of the final feature vector: 7 MFCC + 7 × 7 SDC = 56
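
The 7 + 7 × 7 = 56 count can be reproduced with a short sketch. It assumes the common N-d-P-k = 7-1-3-7 SDC parameterization (delta spread `d=1`, block shift `p=3`, `k=7` blocks) and edge padding at the utterance boundaries; the slide gives only the dimensions, so the exact frame offsets the authors used are not known.

```python
import numpy as np

def shifted_delta_cepstra(ceps, d=1, p=3, k=7):
    """Stack static cepstra with k shifted delta blocks.

    Block i at frame t is ceps[t + i*p + d] - ceps[t + i*p - d].
    Parameters d, p, k are assumed values, not taken from the slides.
    """
    n_frames, _ = ceps.shape
    # Repeat edge frames so every offset stays inside the array.
    padded = np.pad(ceps, ((d, d + (k - 1) * p), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = d + i * p            # padded index of frame t + i*p at t = 0
        ahead = padded[start + d : start + d + n_frames]
        behind = padded[start - d : start - d + n_frames]
        blocks.append(ahead - behind)
    return np.hstack([ceps] + blocks)

# Toy input: 20 frames of 7 cepstral coefficients.
ceps = np.arange(20 * 7, dtype=float).reshape(20, 7)
sdc_feats = shifted_delta_cepstra(ceps)   # 7 static + 7 blocks * 7 deltas per frame
```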

  10. Acoustic systems - GMM based
     • Maximum likelihood (generative)
       • The objective function to maximize is the likelihood of the training data given the transcription
     • Maximum Mutual Information (discriminative)
       • The objective function to maximize is the posterior probability of all training utterances being correctly recognized
     • Advantages of discriminative training:
       • Lower error rates
       • Fewer parameters
     • Disadvantages of discriminative training:
       • Overtraining
       • Sometimes computationally expensive
     • Channel compensation: covered in the previous presentation
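
The two objective functions named on this slide can be written out as follows. The notation is not from the slides and is assumed here: $O_r$ is the $r$-th of $R$ training utterances, $l_r$ its language label, $\lambda$ the GMM parameters, and $P(l)$ a language prior.

```latex
\begin{align}
F_{\mathrm{ML}}(\lambda)  &= \sum_{r=1}^{R} \log p_\lambda(O_r \mid l_r) \\
F_{\mathrm{MMI}}(\lambda) &= \sum_{r=1}^{R} \log
  \frac{p_\lambda(O_r \mid l_r)\, P(l_r)}{\sum_{l} p_\lambda(O_r \mid l)\, P(l)}
\end{align}
```

The denominator in the MMI objective is what makes it discriminative: raising the likelihood of competing languages for an utterance lowers the objective, whereas maximum likelihood trains each language model in isolation.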

  11. Highly overlapped distributions

  12. Results on LRE 2007 (14 languages)
     The best acoustic system combines:
     • Many Gaussians
     • Eigen-channel compensation of features
     • MMI

  13. Phonotactic approach
     • Phoneme Recognition followed by Language Model (PRLM)
     • Good for longer speech segments
     • Robust to dialects within one language
     • Eliminates speech characteristics of the speaker's native language

  14. Phone recognizer
     • Investigation of different phone recognizers for LID: a better phone recognizer ≈ a better LID system
     • 3 neural networks produce the phone posterior probabilities
     • 310 ms long temporal trajectory around the current frame

  15. Phone recognition output: one-best phone string

  16. Phonotactic modeling - example
     [Figure labels: German, English, Test]
     • N-gram language models: discounting, backoff
     • Support Vector Machines: vectors with counts
     • PCA + LDA
     • Neural Networks
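
The first option above, an n-gram language model per language, can be illustrated with a toy sketch. It assumes bigrams with add-one smoothing as a simple stand-in for the discounting and backoff mentioned on the slide, and the phone symbols, language names, and function names are all hypothetical.

```python
import math
from collections import Counter

def bigrams(phones):
    """Phone bigrams with sentence-boundary markers."""
    seq = ["<s>"] + list(phones) + ["</s>"]
    return list(zip(seq[:-1], seq[1:]))

def train_bigram_lm(phone_strings, vocab):
    """Count bigrams and their left contexts for one language."""
    pair_counts, context_counts = Counter(), Counter()
    for phones in phone_strings:
        for a, b in bigrams(phones):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    n_outcomes = len(vocab) + 1   # every phone plus the end marker
    return pair_counts, context_counts, n_outcomes

def log_likelihood(phones, model):
    """Add-one smoothed bigram log-likelihood (assumed smoothing scheme)."""
    pair_counts, context_counts, n_outcomes = model
    return sum(
        math.log((pair_counts[(a, b)] + 1) / (context_counts[a] + n_outcomes))
        for a, b in bigrams(phones)
    )

def identify(phones, models):
    """Pick the language whose model scores the phone string highest."""
    return max(models, key=lambda lang: log_likelihood(phones, models[lang]))

# Toy phone strings standing in for phone recognizer output.
vocab = {"a", "b", "c"}
models = {
    "lang_A": train_bigram_lm([["a", "b", "a", "b"], ["b", "a", "b"]], vocab),
    "lang_B": train_bigram_lm([["a", "c", "a", "c"], ["c", "a", "c"]], vocab),
}
guess = identify(["a", "b", "a"], models)
```

In a PRLM system the phone strings would come from the phone recognizer of slide 14, and one such model would be trained per target language.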

  17. Phone recognition output: one-best phone string versus phone lattice [Figure: lattice with example arc weights 0.6, 0.3, 0.1]

  18. Results on LRE 2007 (14 languages)
     Conclusion:
     • Build the best phone recognizer you can
     • Gather as much data for each language as you can
     • Different approaches to modeling the counts do not seem to have a large influence on the results

  19. Fusion - LRE 2007 (14 languages)
     Note:
     • Fusion weights have to be trained on a separate set of files that is as close as possible to the target data
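
The note about training fusion weights on a separate set can be illustrated with a small sketch. The slides do not say which fusion method was used; this example assumes a linear fusion (one weight per system plus a per-language bias) trained by multiclass logistic regression with plain gradient descent, on synthetic development scores. All names and data are hypothetical.

```python
import numpy as np

def fuse(scores, weights, bias):
    """scores: (n_systems, n_utts, n_langs) -> fused (n_utts, n_langs)."""
    return np.tensordot(weights, scores, axes=1) + bias

def softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def train_fusion(dev_scores, dev_labels, n_iter=500, lr=0.1):
    """Fit fusion weights on a held-out development set (cross-entropy loss)."""
    n_systems, n_utts, n_langs = dev_scores.shape
    weights = np.ones(n_systems) / n_systems
    bias = np.zeros(n_langs)
    onehot = np.eye(n_langs)[dev_labels]
    for _ in range(n_iter):
        err = softmax(fuse(dev_scores, weights, bias)) - onehot
        # Gradient of the mean cross-entropy w.r.t. each system weight
        weights -= lr * np.tensordot(dev_scores, err, axes=([1, 2], [0, 1])) / n_utts
        bias -= lr * err.mean(axis=0)
    return weights, bias

# Synthetic development set: system 0 is informative, system 1 is pure noise.
rng = np.random.default_rng(0)
dev_labels = rng.integers(0, 3, size=200)
informative = 2.0 * np.eye(3)[dev_labels] + 0.3 * rng.standard_normal((200, 3))
noise = rng.standard_normal((200, 3))
dev_scores = np.stack([informative, noise])

weights, bias = train_fusion(dev_scores, dev_labels)
fused = fuse(dev_scores, weights, bias)
accuracy = (fused.argmax(axis=1) == dev_labels).mean()
```

The learned weights favor the informative system. In practice the accuracy would be measured on evaluation data that was not used to fit the weights, which is why the development set should resemble the target data.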

  20. Thanks for your attention and I hope you enjoyed it ;)
