
Spoken Language Identification Using the Speechdat-M Corpus

Diamantino Caseiro, Isabel Trancoso (INESC/IST)


Presentation Transcript


  1. Spoken Language Identification Using the Speechdat-M Corpus. Diamantino Caseiro, Isabel Trancoso, INESC/IST

  2. Language Identification • The best systems use multiple large-vocabulary continuous speech recognisers. • But they are hard to extend to new languages because they require large amounts of hard-to-get linguistic data (such as transcribed speech). • Phonotactic approaches • Published systems still require some linguistic data

  3. Phonotactic Approaches - PRLM-P (phone recognition followed by language modelling, parallel): multiple language-specific phone recognisers
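In PRLM-P, each language-specific phone recogniser tokenises the utterance and each target language's phone model scores every token stream; the scores are then combined into one decision. The slide does not say how the combination is done, so the sketch below assumes a plain sum of log-likelihoods over recognisers followed by a maximum-likelihood choice. The function name `prlm_p_decide`, the recogniser names and all scores are hypothetical.

```python
from typing import Dict


def prlm_p_decide(scores: Dict[str, Dict[str, float]]) -> str:
    """Choose a language from parallel PRLM scores.

    scores[recogniser][language] is the log-likelihood that `language`'s
    phone model assigns to the phone string produced by `recogniser`.
    Assumption: per-recogniser scores are combined by a plain sum.
    """
    languages = list(next(iter(scores.values())))
    totals = {
        lang: sum(per_rec[lang] for per_rec in scores.values())
        for lang in languages
    }
    # Maximum-likelihood decision on the combined score.
    return max(totals, key=totals.get)


# Hypothetical log-likelihoods from two front-end recognisers.
example = {
    "pt_recogniser": {"pt": -120.0, "es": -125.0, "en": -140.0},
    "en_recogniser": {"pt": -130.0, "es": -128.0, "en": -135.0},
}
print(prlm_p_decide(example))  # pt
```

Summing log-likelihoods treats the recognisers as independent evidence; weighted sums or a trained back-end classifier are common alternatives.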

  4. Phonotactic Approaches - DBD (double bigram decoding): one language-independent phone recogniser

  5. Corpus • Speechdat-M • Multilingual: 6 languages (English, Spanish, German, Portuguese, Italian, French) • Telephone speech • Includes: numbers/digits/hours/dates/money/commands, phonetically rich sentences, etc. • Orthographic transcriptions • Subset used: phonetically rich sentences • 6 languages x 1000 speakers x 9 utterances • The same sentence is read by more than one speaker • Utterances with an average duration of 5 seconds
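Taking the nominal figures at face value (assuming every speaker contributed all 9 utterances, which the slide does not guarantee), the size of the subset works out as:

```latex
N = 6 \times 1000 \times 9 = 54\,000 \text{ utterances}, \qquad
T \approx 54\,000 \times 5\ \text{s} = 270\,000\ \text{s} = 75\ \text{h}
```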

  6. Corpus - Train/test selection • Criteria • Speakers: 70% train, 30% test • Sentences: 70% train, 30% test • Random selection
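Because the same sentence is read by several speakers, splitting speakers and sentences independently lets the test set contain only unseen speakers reading unseen sentences. The sketch below is one way to do this under stated assumptions: speaker and sentence identifiers are hypothetical strings, and utterances that mix a training speaker with a test sentence (or vice versa) are simply discarded, which the slide does not state.

```python
import random
from typing import List, Sequence, Tuple

Utterance = Tuple[str, str]  # (speaker_id, sentence_id), hypothetical identifiers


def split_corpus(utterances: Sequence[Utterance], train_frac: float = 0.7,
                 seed: int = 0) -> Tuple[List[Utterance], List[Utterance]]:
    rng = random.Random(seed)
    speakers = sorted({spk for spk, _ in utterances})
    sentences = sorted({sen for _, sen in utterances})
    # Random selection of speakers and, independently, of sentences.
    rng.shuffle(speakers)
    rng.shuffle(sentences)
    train_spk = set(speakers[:round(train_frac * len(speakers))])
    train_sen = set(sentences[:round(train_frac * len(sentences))])
    train, test = [], []
    for spk, sen in utterances:
        if spk in train_spk and sen in train_sen:
            train.append((spk, sen))
        elif spk not in train_spk and sen not in train_sen:
            test.append((spk, sen))
        # Assumption: utterances mixing a training speaker with a test sentence
        # (or vice versa) are discarded so the test set stays fully unseen.
    return train, test


utterances = [(f"spk{s:02d}", f"sen{t:02d}") for s in range(10) for t in range(10)]
train, test = split_corpus(utterances)
print(len(train), len(test))  # 49 9 for this full 10 x 10 grid
```

On a full grid, a 70/30 split on both axes keeps 0.7 x 0.7 = 49% of utterances for training and 0.3 x 0.3 = 9% for testing; the remaining 42% fall into neither set under this assumption.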

  7. Baseline System • Objective: create high-performance modules • PRLM architecture (phone recognition followed by language modelling)
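In PRLM, a single phone recogniser turns each utterance into a phone string, one phone language model per target language scores that string, and a maximum-likelihood classifier picks the best-scoring language. The slides say only "interpolated phone bigrams", so the sketch assumes linear interpolation of the bigram with an add-one unigram and a fixed weight `lam=0.7`; the class name `InterpolatedBigram`, the phone labels and the training strings are hypothetical toy data.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


class InterpolatedBigram:
    """Phone bigram linearly interpolated with an add-one unigram (assumed scheme)."""

    def __init__(self, phone_strings: Iterable[Sequence[str]],
                 inventory: Sequence[str], lam: float = 0.7):
        self.lam = lam  # assumed fixed interpolation weight
        self.vocab_size = len(inventory)
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        for phones in phone_strings:
            self.unigrams.update(phones)
            for a, b in zip(phones, phones[1:]):
                self.bigrams[(a, b)] += 1
                self.history[a] += 1
        self.total = sum(self.unigrams.values())

    def prob(self, a: str, b: str) -> float:
        p_uni = (self.unigrams[b] + 1) / (self.total + self.vocab_size)
        if self.history[a] == 0:
            return p_uni  # unseen history: use the unigram alone
        p_bi = self.bigrams[(a, b)] / self.history[a]
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def log_likelihood(self, phones: Sequence[str]) -> float:
        return sum(math.log(self.prob(a, b)) for a, b in zip(phones, phones[1:]))


def identify(phones: Sequence[str], models: Dict[str, InterpolatedBigram]) -> str:
    # Maximum-likelihood classifier: pick the language whose phone LM scores highest.
    return max(models, key=lambda lang: models[lang].log_likelihood(phones))


# Toy phone strings, as if produced by one (Portuguese) phone recogniser.
inventory = ["a", "b", "c"]
models = {
    "X": InterpolatedBigram([["a", "b", "a", "b", "a", "b"]], inventory),
    "Y": InterpolatedBigram([["a", "c", "a", "c", "a", "c"]], inventory),
}
print(identify(["a", "b", "a", "b"], models))  # X
```

Interpolating with a unigram keeps every bigram probability non-zero, so a phone pair never seen in one language's training data does not drive that language's score to minus infinity.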

  8. Baseline System - Modules • Parameter extraction • MFCC: 12 cepstral coefficients + 12 delta cepstral coefficients + energy + delta energy • Cepstral mean subtraction • Acoustic units • 80 units = 39 Portuguese phones x 2 sexes + silence + pause • Phone recogniser • Continuous HMMs with 8 mixtures • Language models • Interpolated phone bigrams • Classifier • Maximum likelihood
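The 26-dimensional parameter vector (12 + 12 + 1 + 1) can be assembled from per-frame cepstra and log-energy as sketched below. The sketch assumes the 12 MFCCs and the log-energy are already computed, applies cepstral mean subtraction per utterance to the cepstra only, and uses the common regression formula for deltas over ±2 frames with replicated edge frames; the window size, the edge handling and the function names are assumptions, not taken from the slides.

```python
import numpy as np


def deltas(feats: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression deltas over +/- `window` frames, replicating the edge frames."""
    T = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(theta * theta for theta in range(1, window + 1))
    out = np.zeros(feats.shape, dtype=float)
    for theta in range(1, window + 1):
        ahead = padded[window + theta : window + theta + T]    # frame t + theta
        behind = padded[window - theta : window - theta + T]   # frame t - theta
        out += theta * (ahead - behind)
    return out / denom


def baseline_features(cepstra: np.ndarray, log_energy: np.ndarray) -> np.ndarray:
    """12 cepstra + 12 delta cepstra + energy + delta energy -> (T, 26)."""
    if cepstra.ndim != 2 or cepstra.shape[1] != 12:
        raise ValueError("expected a (T, 12) array of cepstral coefficients")
    # Cepstral mean subtraction: remove the per-utterance mean of each coefficient.
    cms = cepstra - cepstra.mean(axis=0, keepdims=True)
    energy = log_energy.reshape(-1, 1).astype(float)
    return np.hstack([cms, deltas(cms), energy, deltas(energy)])


rng = np.random.default_rng(0)
feats = baseline_features(rng.normal(size=(50, 12)), rng.normal(size=50))
print(feats.shape)  # (50, 26)
```

Subtracting the utterance mean removes a constant offset in the cepstral domain, which corresponds to a fixed linear channel such as a telephone line; the deltas are unaffected by that offset.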

  9. Baseline System - Modules - Recogniser • Continuous HMMs with 8 mixtures • Training: used only Portuguese speech and orthographic transcriptions; flat start with embedded Baum-Welch re-estimation • Recogniser: Viterbi; recognises only all-male or all-female phone sequences • Phone recognition performance:
                         Correctness   Accuracy
       Train utterances     55.5%        52.5%
       Test utterances      54.1%        50.5%
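The slide does not define its two measures; assuming they follow the usual HTK HResults definitions, with N reference phones and S, D, I substitutions, deletions and insertions, Correctness = (N - D - S) / N and Accuracy = (N - D - S - I) / N. Under that reading the gap between the columns is the insertion rate: 55.5 - 52.5 = 3.0 points on training utterances and 54.1 - 50.5 = 3.6 points on test utterances. The sketch counts S, D and I with a unit-cost Levenshtein alignment; HResults uses its own alignment weights, so the split between error types may differ slightly.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a unit-cost alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (edit distance, S, D, I) for ref[:i] against hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d, s, dl, ins = cost[i - 1][j - 1]
            diag = (d + sub, s + sub, dl, ins)
            d, s, dl, ins = cost[i - 1][j]
            up = (d + 1, s, dl + 1, ins)        # reference phone deleted
            d, s, dl, ins = cost[i][j - 1]
            left = (d + 1, s, dl, ins + 1)      # extra phone inserted
            cost[i][j] = min(diag, up, left)
    _, s, dl, ins = cost[n][m]
    return s, dl, ins


def correctness_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    return (n - d - s) / n, (n - d - s - i) / n


print(correctness_accuracy(list("sila"), list("silaa")))  # (1.0, 0.75)
```

The example shows why accuracy is never above correctness: an inserted phone leaves every reference phone correctly recognised but still counts against accuracy.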

  10. Baseline System - Results • Global identification rate: 71.1% • Confusions reveal language proximity • Portuguese is identified better than the other languages
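A global identification rate and the "language proximity" effect are both read off a confusion matrix: the rate is the diagonal mass over all test utterances, and close languages show up as large symmetric off-diagonal entries. The sketch below uses hypothetical counts for three unnamed languages (not the paper's data), and the symmetrised, row-normalised confusion score used to pick the closest pair is an illustrative choice.

```python
from typing import Sequence, Tuple

import numpy as np


def identification_rate(confusion) -> float:
    """confusion[i, j] = number of test utterances of language i classified as j."""
    c = np.asarray(confusion, dtype=float)
    return float(np.trace(c) / c.sum())


def most_confused_pair(languages: Sequence[str], confusion) -> Tuple[str, str]:
    c = np.asarray(confusion, dtype=float)
    # Row-normalise so each language's errors are rates, then symmetrise.
    rates = c / c.sum(axis=1, keepdims=True)
    sym = rates + rates.T
    np.fill_diagonal(sym, -np.inf)  # ignore correct decisions
    i, j = np.unravel_index(np.argmax(sym), sym.shape)
    return languages[i], languages[j]


# Hypothetical confusion counts (rows: true language, columns: decision).
langs = ["L1", "L2", "L3"]
conf = [[8, 1, 1],
        [1, 6, 3],
        [0, 3, 7]]
print(identification_rate(conf))       # 0.7
print(most_confused_pair(langs, conf))  # ('L2', 'L3')
```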

  11. Proposed System - Bootstrapped double bigram decoding

  12. Proposed System - Results • Identification rate increased to 83.5% • Utterance duration is an important factor • 86.1% for utterances lasting [7,8[ seconds (at least 7 s and under 8 s)
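A per-duration figure such as the one for [7,8[ (the half-open interval 7 ≤ d < 8) comes from grouping test utterances into one-second bins and computing the identification rate within each bin. The sketch assumes one-second bins and a list of (duration, correct) pairs; the function name and the example results are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rate_by_duration(results: Iterable[Tuple[float, bool]]) -> Dict[int, float]:
    """Group (duration_seconds, correct) pairs into bins [k, k+1[ and return
    {k: identification rate within that bin}."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for duration, correct in results:
        k = math.floor(duration)  # 7.0 and 7.9 go to bin 7, 8.0 goes to bin 8
        totals[k] += 1
        hits[k] += int(correct)
    return {k: hits[k] / totals[k] for k in sorted(totals)}


example = [(7.0, True), (7.9, True), (8.0, False), (4.2, False), (4.5, True)]
print(rate_by_duration(example))  # {4: 0.5, 7: 1.0, 8: 0.0}
```

Longer bins usually hold fewer utterances (the average is about 5 seconds), so rates for long durations rest on smaller samples.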

  13. Conclusions • A language identification system that is easy to extend to new languages • Language proximity hurts identification
