20 likes | 260 Views
8. Frequency (kHz). 4. False rejection (%). False Alarm (%). 0. el. a. m. E. . E. t. . e. b. . n. Amplitude. 0. 0.2. 0.4. 0.6. 0.8. 1.0. Time (s). Vowel. Pause. Non Vowel. Rhythm Modeling. Vowel System Modeling. Vowel System Modeling. Vowel System Models.
E N D
8 Frequency (kHz) 4 False rejection (%) False Alarm (%) 0 el a m E E t e b n Amplitude 0 0.2 0.4 0.6 0.8 1.0 Time (s) Vowel Pause Non Vowel Rhythm Modeling Vowel System Modeling Vowel System Modeling Vowel System Models Mean Identification Rate: 79% Discussion Jérôme FARINAS1, François PELLEGRINO2, Jean-Luc ROUAS1 and Régine ANDRÉ-OBRECHT1 {farinas, rouas, obrecht}@irit.fr; pellegrino@univ-lyon2.fr MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION 1Institut de Recherche en Informatique de Toulouse UMR 5505 CNRS - Université Paul Sabatier - INP 31062 Toulouse Cedex 4 - France 2Laboratoire Dynamique du Langage UMR 5596 CNRS - Université Lumière Lyon 2 69363 Lyon Cedex 7 - France Vowel / Non Vowel Segmentation • Speech segmentation: statistical segmentation (André-Obrecht, 1988) • Shorts segments (bursts and transient parts of sounds) • Longer segments (steady parts of sounds) • Speech Activity Detection and Vowel detection • Spectral analysis of the signal • Vowel detection (Pellegrino & Obrecht, 2000) • Language and speaker independent algorithm Vowel / Non Vowel Segmentation signal The speech signal is parsed in patterns matching the structure: Cn V (n integer, can be 0). (For the above example: CCVV.CCV.CV.CCCV.CV) Pseudo-syllable Segmentation Acoustic Modeling Each vowel segment is represented with a set of 8 Mel-Frequency Cepstral Coefficients and 8 delta-MFCC, augmented with the Energy and delta Energy of the segment. This parameter vector is extended with the duration of the underlying segment. Example for a .CCV. syllable: • 3 parameters are computed: • Global consonant cluster duration • Global vowel duration • Complexity of the consonantal cluster • With the same .CCV. example: Pseudo-syllable Modeling Rhythm Models Vowel System Likelihoods Rhythm Likelihoods For each language, a Gaussian Mixture Model (GMM) is trained using the EM algorithm. The number of components of the model is computed using the LBG-Rissanen algorithm. During the test, the decision lays on a Maximum Likelihood procedure. Merging A simple statistical merging is performed by adding the log-likelihoods of both the Rhythm model and the VSM for each language. Decision Rule L* Rhythm Modeling Merging Mean Identification Rate: 83% Mean Identification Rate: 70% We propose two algorithms dedicated to Automatic Language Identification. Experiments, performed with cross-validation, show that it is possible to achieve an efficient rhythmic modeling (78% of correct identification) in a way that requires no a priori knowledge of the rhythmic structure of the processed languages. Besides, the Vowel System Model reaches 70% of correct identification. With these read data, merging the two approaches improves the identification rate up to 83%.