Automatic Language Identification: Overview & Some Experiments on OGI-TS Corpus. National Tsing Hua University, Chi-Yueh Lin, 2006/1/19.
Automatic Language Identification: Overview & Some Experiments on OGI-TS Corpus
National Tsing Hua University
Chi-Yueh Lin
2006/1/19
NGSST 2006 Winter Workshop
Introduction to LID
• Language Identification (LID) applications
  • Pre-processing for machine systems
  • Pre-processing for human listeners
• Some authors prefer the abbreviation “ALI”, which stands for “Automatic Language Identification”.
Introduction to LID
• Pre-processing for machine systems
  • Multilingual information retrieval system in a hotel lobby or an international airport.
  • [Figure: a Language Identification System in front of English, French, Spanish, and Mandarin ASR, each returning information in its own language]
Introduction to LID
• Pre-processing for human listeners
  • AT&T Language Line was designed for handling emergency calls.
  • Delay on the order of minutes.
Introduction to LID
• AT&T Language Line
  • http://www.languageline.com
  • The service uses trained human interpreters to handle about 150 languages.
  • Correctly identifying “Tamil” takes about 3 minutes.
Introduction to LID: Human Perceptual Experiment
• [Figure] From “Reviewing Automatic Language Identification”, IEEE Signal Processing Magazine, Oct. 1994.
Introduction to LID: Human Perceptual Experiment
• Comments from the post-experiment interviews
  • Phoneme-spotting and word-spotting strategies
  • Prosodic cues
• Performance improved with increased exposure to each language.
Introduction to LID
• Papers found in IEEE Xplore
  • Keyword: “language identification”
• ICASSP 2006
  • 6 papers
• [Figure annotation: Golden Age of LID]
Introduction to LID
• Research on LID before 1980 was primarily done at Texas Instruments.
  • 1973~1980 (4 papers)
  • Reference template
• House and Neuberg (1977 JASA)
  • HMM trained on sequences of broad phonetic category labels
  • Near-perfect discrimination
  • No real speech data.
Language identification cues
• Phonology
  • Phone and phoneme sets differ from one language to another.
  • Phone and phoneme frequencies of occurrence may also differ.
  • Phonotactics.
• Prosody
  • Duration, pitch, and stress.
Language identification cues
• Morphology
  • Word roots
  • Lexicon
• Syntax
  • Sentence patterns differ among languages.
Language identification cues
• Most recent LID systems use these two kinds of cues
  • Phonology
  • Prosody
• These cues are seldom used
  • Morphology
  • Syntax
Language Identification System
LID systems
LID systems
• Systems vary primarily according to their method for modeling languages.
  • Spectral-similarity approaches
  • Prosody-based approaches
  • Phone-recognition approaches
  • Using multilingual speech units
  • Word-level approaches
  • Continuous speech recognition
LID systems
• System conditions
  • Content-independent
  • Speaker-independent
LID systems: Spectral-similarity approaches
• The earliest automatic LID systems.
• Use conventional spectral or cepstral feature vectors.
LID systems: Spectral-similarity approaches
• Cimarusti and Ives (1982 ICASSP)
  • Read speech
  • 5 speakers, 8 languages
  • 100-dim feature vector
    • 15 area functions, 15 autocorrelation coefficients, 5 bandwidths, 15 cepstral coefficients, 15 filter coefficients, 5 formant frequencies, 15 log area ratios, and 15 reflection coefficients.
LID systems: Spectral-similarity approaches
• Foil (1986 ICASSP)
  • Noisy radio signals (~5 dB)
  • 3 languages
  • Information from pitch, energy, and formants
  • 45-dim feature vector
    • 23-dim from energy
    • 22-dim from pitch
  • VQ codebook (10 clusters) for formants
LID systems: Spectral-similarity approaches
• Goodman et al. (1989 ICASSP)
  • Improved version of Foil’s work (~9 dB).
  • 6 languages
  • The formant-cluster algorithm used an LPC-12 autocorrelation analysis.
  • The parameters used were log-amplitude values A1, A2, A3, and formant values F1, F2, F3.
  • The formant-based method is superior to the LPCC-based method.
LID systems: Spectral-similarity approaches
• Sugiyama (1991 ICASSP)
  • 20 languages
  • VQ-based approach
    • Standard VQ algorithm
    • VQ histogram algorithm (common codebook)
  • Autocorrelation coefficients, LPC coefficients, delta-cepstrum coefficients.
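As a rough illustration of the standard VQ idea named on this slide, the sketch below trains one small codebook per language with plain k-means and labels a test utterance with the language whose codebook yields the lowest average quantization distortion. The toy 2-D vectors, the codebook size, and the function names (`train_codebook`, `avg_distortion`, `identify`) are assumptions made for illustration, not details taken from Sugiyama's paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means (Lloyd) codebook; returns a list of centroids."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, size)
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # Recompute centroids; keep the old centroid if a cell is empty.
        codebook = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else old
            for cell, old in zip(cells, codebook)
        ]
    return codebook

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Pick the language whose codebook quantizes the utterance best."""
    return min(codebooks, key=lambda lang: avg_distortion(vectors, codebooks[lang]))

# Toy stand-ins for per-language training features (hypothetical data).
rng = random.Random(1)
train = {
    "lang_a": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)],
    "lang_b": [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(200)],
}
codebooks = {lang: train_codebook(vecs, size=4) for lang, vecs in train.items()}
test_utt = [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
print(identify(test_utt, codebooks))
```

The histogram variant listed on the slide would instead share one common codebook and compare codeword-usage histograms; that variant is not shown here.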
LID systems: Spectral-similarity approaches
• Zissman (1993 ICASSP) applied GMMs to the LID task.
• [Table legend] C: cepstrum, D: delta-cepstrum
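A minimal sketch of GMM-based scoring, assuming one diagonal-covariance GMM per language: every frame is scored under each language's mixture, frame log-likelihoods are summed over the utterance, and the best-scoring language wins. The hand-set mixture parameters and the function names are hypothetical; a real system would train the mixtures (for example with EM) on cepstral and delta-cepstral features.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """log sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def utterance_loglik(frames, gmm):
    """Frames are treated as independent, so log-likelihoods add up."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames)

def identify(frames, gmms):
    return max(gmms, key=lambda lang: utterance_loglik(frames, gmms[lang]))

# Hand-set toy mixtures: (weight, mean, variance) per component (hypothetical).
gmms = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 0.0], [1.0, 1.0])],
    "lang_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [5.0, 3.0], [1.0, 1.0])],
}
frames = [[4.8, 4.9], [5.2, 3.1], [4.9, 4.0]]
print(identify(frames, gmms))
```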
LID systems: Prosody-based approaches
• Savic (1991 ICASSP)
  • Pitch information is useful for discriminating Spanish from Mandarin
• Humans can use prosodic features (Muthusamy, 1994 ICASSP)
  • Tonal languages (Mandarin, Vietnamese)
  • Speech rate (Spanish)
LID systems: Prosody-based approaches
• Itahashi (1994 ICSLP) argues that pitch estimation is more robust in noisy environments.
  • Based on fundamental frequency; 21 features in total.
  • Polygonal line approximation of the F0 pattern.
  • PCA is used to perform discriminant analysis.
LID systems: Prosody-based approaches
• Thyme-Gobbel (1996 ICSLP)
  • Syllable-based pitch contour
  • Syllable duration
  • Amplitude
  • Rhythm
  • Phrase location
• Pitch is the most discriminative feature.
LID systems: Prosody-based approaches
• Ramus (1999 JASA)
  • A study based on speech resynthesis.
  • Global intonation (aaaa, sasasa)
  • Syllabic rhythm (sasasa, flat sasasa)
  • Broad phonotactics (saltanaj)
LID systems: Prosody-based approaches
• Rouas (2003 ICASSP, 2005 Speech Comm.)
  • Rhythmic parameters
    • Duration of consonant and vowel
    • Complexity of CV segment.
  • Fundamental frequency parameters
    • Skewness and kurtosis of F0
    • Accent location
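The skewness and kurtosis of F0 listed above can be computed from the central moments of the voiced-frame F0 values. The sketch below uses population moments and reports excess kurtosis; whether the cited work uses these exact estimators is not stated on the slide, so treat both choices, and the sample F0 values, as assumptions.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical F0 values (Hz) for the voiced frames of one segment.
f0 = [110.0, 112.0, 115.0, 121.0, 140.0, 118.0, 113.0]
skew, kurt = skewness_kurtosis(f0)
print(skew, kurt)
```

A positive skewness here reflects the single high-pitched frame pulling the distribution's tail upward.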
LID systems: Prosody-based approaches
• Rouas (2005 Eurospeech)
  • Long-term and short-term prosody modeling.
  • N-gram model.
  • Long-term
    • Prosodic movements over several pseudo-syllables
  • Short-term
    • Prosodic movements inside a pseudo-syllable.
LID systems: Prosody-based approaches
LID systems: Prosody-based approaches
• Lin (2005, 2006 ICASSP)
  • Pseudo-syllable segmentation
  • Pitch contours were represented by a set of Legendre polynomials
  • Dynamic model instead of static model
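One way to represent a pitch contour with Legendre polynomials is to map the segment onto [-1, 1] and project it onto P_0..P_K, so that a handful of coefficients summarize level, slope, and curvature. The sketch below does this with a trapezoid-rule projection; the order, the normalization, and the function names are assumptions for illustration and need not match the procedure in the cited papers.

```python
def legendre(order, x):
    """Values P_0(x)..P_order(x) via Bonnet's recurrence."""
    vals = [1.0, x]
    for n in range(1, order):
        vals.append(((2 * n + 1) * x * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals[: order + 1]

def legendre_coeffs(contour, order=3):
    """Project a uniformly sampled contour, mapped to [-1, 1], onto P_0..P_order.

    Uses c_k = (2k + 1) / 2 * integral of f(x) P_k(x) dx with the trapezoid rule.
    """
    n = len(contour)
    h = 2.0 / (n - 1)
    basis = [legendre(order, -1.0 + h * i) for i in range(n)]
    coeffs = []
    for k in range(order + 1):
        terms = [f * b[k] for f, b in zip(contour, basis)]
        integral = h * (sum(terms) - 0.5 * (terms[0] + terms[-1]))
        coeffs.append((2 * k + 1) / 2.0 * integral)
    return coeffs

# Synthetic rising contour f(x) = 2 + 3x sampled on [-1, 1] (toy data).
n = 201
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
contour = [2.0 + 3.0 * x for x in xs]
coeffs = legendre_coeffs(contour, order=3)
print(coeffs)
```

For this linear contour the coefficients come out close to [2, 3, 0, 0]: the first captures the average level and the second the overall slope.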
LID systems: Prosody-based approaches
• However, Hazen (1993) showed that features derived from prosodic information provided little language discriminability when compared to a phonetic system.
• Performance of approaches based on prosodic information degrades in the N-way identification task as N becomes large.
LID systems: Prosody-based approaches
• Advantages of prosody-based systems
  • Robust to channel effects and noise.
  • Require few transcriptions and little training data.
LID systems: Phone-recognition approaches
• Different languages have different phone inventories and different phonotactics.
• Zissman (1994 ICASSP)
  • PRLM
  • PPRLM
LID systems: Phone-recognition approaches
• Phone recognition followed by language modeling (PRLM)
  • N-gram probability distributions are trained on the output of the single-language phone recognizer, not on human-supplied labels.
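The language-modeling half of PRLM can be sketched as follows: for each target language, a bigram model is estimated from phone strings produced by the single front-end recognizer, and a test utterance's decoded phone string is assigned to the language whose model gives it the highest log-likelihood. The phone recognizer itself is not modeled here; the decoded strings, the four-symbol phone set, and the add-one smoothing are assumptions for illustration.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Add-one smoothed bigram model over phone symbols; returns a log-prob function."""
    pair_counts, ctx_counts = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            ctx_counts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        return math.log((pair_counts[(prev, cur)] + 1) / (ctx_counts[prev] + v))

    return logprob

def score(seq, logprob):
    """Log-likelihood of a phone sequence under one language's bigram model."""
    return sum(logprob(p, c) for p, c in zip(seq, seq[1:]))

def identify(seq, models):
    return max(models, key=lambda lang: score(seq, models[lang]))

vocab = ["a", "k", "s", "t"]
# Hypothetical recognizer output for training speech in each language.
decoded = {
    "lang_a": [list("katakasa"), list("takasata")],
    "lang_b": [list("stkastka"), list("ktsakstk")],
}
models = {lang: train_bigram(seqs, vocab) for lang, seqs in decoded.items()}
print(identify(list("sataka"), models))
```

Because the models are trained on recognizer output rather than on hand labels, systematic recognizer errors appear in both training and test strings, which is part of why the approach works without transcriptions in the target languages.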
LID systems: Phone-recognition approaches
• Parallel PRLM (PPRLM, an extension of PRLM)
LID systems: Phone-recognition approaches
• PPRLM incorporates phones from more than one language into a PRLM-like system.
• The only limitation is the number of languages for which labeled training speech is available.
• Achieves the best performance among all methods in the LID task.
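A minimal sketch of how PPRLM could combine its parallel branches, assuming each front-end recognizer has already produced a length-normalized log-likelihood for every target language: the per-language scores are summed across front ends and the highest total wins. The front-end names, language names, and score values below are made-up placeholders, and plain summation is only one possible back-end.

```python
def fuse(scores_per_frontend):
    """Sum per-language scores over all front ends and pick the best language.

    scores_per_frontend: {frontend: {language: average log-likelihood per phone}}
    """
    languages = next(iter(scores_per_frontend.values())).keys()
    fused = {
        lang: sum(scores[lang] for scores in scores_per_frontend.values())
        for lang in languages
    }
    return max(fused, key=fused.get), fused

# Placeholder scores, not measured results.
scores = {
    "english_frontend": {"farsi": -2.1, "french": -2.6, "tamil": -2.9},
    "japanese_frontend": {"farsi": -2.4, "french": -2.3, "tamil": -3.0},
    "spanish_frontend": {"farsi": -1.9, "french": -2.5, "tamil": -2.8},
}
best, fused = fuse(scores)
print(best, fused)
```

Note that the target languages need not coincide with the front-end languages: each front end only needs labeled speech in its own language for training.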
LID systems: Phone-recognition approaches
• Yan (ICASSP 1995)
  • Forward bigram
  • Backward bigram
  • Combination
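To make the forward/backward distinction concrete: a forward bigram predicts each phone from its predecessor, while a backward bigram predicts it from its successor, which can be trained by reversing the sequences. The sketch below combines the two scores with a weighted sum of log-likelihoods; the slide does not give the combination rule, so the interpolation weight and the add-one smoothing are assumptions.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab_size):
    """Add-one smoothed bigram; returns logprob(context, symbol)."""
    pairs, ctx = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pairs[(prev, cur)] += 1
            ctx[prev] += 1
    return lambda prev, cur: math.log((pairs[(prev, cur)] + 1) / (ctx[prev] + vocab_size))

def seq_score(seq, logprob):
    return sum(logprob(p, c) for p, c in zip(seq, seq[1:]))

def train_fwd_bwd(sequences, vocab_size):
    """Forward model on the sequences, backward model on their reversals."""
    forward = train_bigram(sequences, vocab_size)
    backward = train_bigram([seq[::-1] for seq in sequences], vocab_size)
    return forward, backward

def combined_score(seq, model, weight=0.5):
    """Weighted sum of forward and backward log-likelihoods (assumed rule)."""
    forward, backward = model
    return weight * seq_score(seq, forward) + (1 - weight) * seq_score(seq[::-1], backward)

# Toy phone strings over a two-symbol inventory (hypothetical data).
model = train_fwd_bwd(["abab", "abba"], vocab_size=2)
print(combined_score("ab", model))
```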
LID systems: Phone-recognition approaches
• Torres-Carrasquillo (2002, ICASSP & ICSLP)
  • Variation of a PRLM-like system.
  • Uses a GMM tokenizer instead of a phone recognizer as front-end processing.
  • Language models are trained on the “token index” values.
  • Shifted delta cepstral features.
  • Does not need any transcription.
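Shifted delta cepstra are commonly described by four parameters N-d-P-k: for each frame t, k delta vectors c(t + iP + d) - c(t + iP - d), i = 0..k-1, are stacked into one long vector over the N cepstral coefficients. The sketch below follows that common definition; the small parameter values and the synthetic input are chosen for illustration only and are not taken from the slide.

```python
def shifted_delta_cepstra(cepstra, d=1, P=3, k=3):
    """Stack k shifted deltas per frame.

    cepstra: list of frames, each a list of N cepstral coefficients.
    Returns one k*N-dimensional vector for every frame t where all
    required indices t + i*P +/- d fall inside the utterance.
    """
    out = []
    last = len(cepstra) - 1
    for t in range(d, len(cepstra)):
        if t + (k - 1) * P + d > last:
            break
        vec = []
        for i in range(k):
            plus = cepstra[t + i * P + d]
            minus = cepstra[t + i * P - d]
            vec.extend(a - b for a, b in zip(plus, minus))
        out.append(vec)
    return out

# Synthetic 2-coefficient "cepstra" that grow linearly with time (toy data).
cepstra = [[float(t), 2.0 * t] for t in range(12)]
sdc = shifted_delta_cepstra(cepstra, d=1, P=3, k=3)
print(len(sdc), sdc[0])
```

Because each stacked vector looks ahead over (k - 1) * P + d frames, it captures longer-span temporal behavior than a single delta.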
LID systems: Phone-recognition approaches
• Feature vector Xn is represented by token index 2.
• Token sequence: 2221321113323111123213213…
• Apply language model.
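The tokenization step can be sketched as follows, assuming the token for a frame is the index of the mixture component that scores it highest; the resulting index sequence is then treated like a phone string by the language model. The one-dimensional components, the frame values, and the 1-based indexing are assumptions for illustration.

```python
import math

def log_weighted_gauss(x, weight, mean, var):
    """log(w_k) + log N(x; mean_k, diag(var_k)) for one mixture component."""
    return math.log(weight) + sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def tokenize(frames, components):
    """Map each frame to the 1-based index of its top-scoring component."""
    return [
        1 + max(range(len(components)), key=lambda j: log_weighted_gauss(x, *components[j]))
        for x in frames
    ]

# Hypothetical 3-component tokenizer: (weight, mean, variance).
components = [
    (1 / 3, [0.0], [1.0]),
    (1 / 3, [5.0], [1.0]),
    (1 / 3, [10.0], [1.0]),
]
frames = [[4.6], [5.3], [5.1], [0.4], [9.7], [4.9]]
tokens = tokenize(frames, components)
print("".join(str(t) for t in tokens))
```

No phonetic transcription is involved at any point: the tokenizer is trained on unlabeled feature vectors, which is what lets this variant drop the labeled-speech requirement of a phone recognizer.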
LID systems: Phone-recognition approaches
• To make phone-recognition-based LID systems easier to train, one can use a single-language phone recognizer as a front end to a system that uses phonotactic scores to perform LID.
• Language ID can be performed successfully even when the front-end phone recognizers were not trained on speech in the languages to be recognized.
LID systems: Using multilingual speech units
• Focus on the problem of identifying and processing only those phones that carry the most language-discriminating information.
• Mono-phonemes
  • Phonemes whose acoustic realizations in one language overlap little or not at all with those in another language.
• Poly-phonemes
  • Phonemes whose acoustic realizations are similar enough across many languages.
LID systems: Using multilingual speech units
• Dalsgaard (ICSLP 1994)
  • Four European languages
    • Danish, English, German, Italian
  • 134 phoneme models
  • [Figure labels: mono-phonemes, poly-phonemes]
LID systems: Using multilingual speech units
• Berkling (1994 ICASSP)
  • 3 languages (English, German, Japanese)
  • [Figure: the example “f (1.3) GE”, with annotations “Label”, “Ratio”, and “Language with larger frequency of occurrence”]
LID systems: Using multilingual speech units
• Köhler (1998)
  • Single multi-language (6 languages) front-end phone recognizer.
  • Features: 24 mel-scaled cepstral, 12 delta cepstral, 12 delta-delta cepstral, energy, delta energy, delta-delta energy.
  • Feature vectors were transformed by LDA.
  • Monophones -> multilingual phones
LID systems: Word-level approaches
• These systems use more sophisticated sequence modeling than the phonotactic models of the phone-level systems, but do not employ full speech-to-text systems.
LID systems: Word-level approaches
• Kadambe and Hieronymus (1995)
  • Trigram phonotactics & lexicon matching
  • 4 languages
LID systems: Word-level approaches
• Ramesh and Roe (1994)
  • Use of embedded word models of frequently occurring words and phrases.
  • Multiple-mixture left-to-right CDHMM, LPC-cepstrum-based features.
LID systems: Word-level approaches
• Lund and Gish (1995 Eurospeech)
  • Pseudo-word Language Model (PWLM)
  • Pseudo-words are the frequently occurring sub-sequences within the phoneme recognition output.
  • Finding pseudo-word candidates is a time-consuming task.
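A brute-force way to find pseudo-word candidates is to count every contiguous sub-sequence of the phone recognizer output within a length range and keep those that occur often enough. The sketch below does exactly that; the length limits, the count threshold, and the toy phone strings are assumptions, and the search procedure of the cited paper is not described on the slide.

```python
from collections import Counter

def pseudo_word_candidates(sequences, min_len=2, max_len=4, min_count=3):
    """Count contiguous phone sub-sequences and keep the frequent ones."""
    counts = Counter()
    for seq in sequences:
        for n in range(min_len, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i : i + n])] += 1
    return {sub: c for sub, c in counts.items() if c >= min_count}

# Toy recognizer output over a three-symbol inventory (hypothetical data).
sequences = [list("abcab"), list("cabab")]
candidates = pseudo_word_candidates(sequences)
print(candidates)
```

Even this naive count visits every window of every allowed length in every utterance, which hints at why candidate search becomes expensive on a real corpus with longer units.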
LID systems: Word-level approaches
• Gao (2005 Eurospeech)
  • Applied techniques from document retrieval.
  • Spoken document categorization
  • Latent semantic indexing
LID systems: Continuous speech recognition
• Several large-vocabulary continuous-speech recognition systems are used in parallel for language ID.
• The architecture is similar to PRLM and PPRLM.
• During testing, the recognizers run in parallel, and the one yielding output with the highest likelihood is selected as the winning recognizer.
• Sometimes called parallel phone recognition (PPR).