10 likes | 178 Views
Automatic Modelling of Rhythm and Intonation for Language Identification. English. Jean-Luc ROUAS 1 , Jérôme FARINAS 1 , François PELLEGRINO 2 and Régine ANDR É -OBRECHT 1 {jean-luc.rouas, jerome.farinas, regine.andre-obrecht}@irit.fr; francois.pellegrino@univ-lyon2.fr.
E N D
Automatic Modelling of Rhythm and Intonation for Language Identification English Jean-Luc ROUAS1,Jérôme FARINAS1, François PELLEGRINO2 and Régine ANDRÉ-OBRECHT1 {jean-luc.rouas, jerome.farinas, regine.andre-obrecht}@irit.fr; francois.pellegrino@univ-lyon2.fr 1Institut de Recherche en Informatique de Toulouse UMR 5505 CNRS - Université Paul Sabatier - INP 31062 Toulouse Cedex 4 - France 2Laboratoire Dynamique du Langage UMR 5596 CNRS - Université Lumière Lyon 2 69363 Lyon Cedex 7 - France Pseudo Syllable Duration Parameters Recognition • Speech segmentation: statistical segmentation [André-Obrecht, 1988] • Shorts segments (bursts and transient parts of sounds) • Longer segments (steady parts of sounds) • Speech Activity Detection and Vowel detection [Pellegrino & Obrecht, 2000] • Spectral analysis of the signal • Language and speaker independent algorithm • Pseudo Syllable segmentation • Derived from the most frequent syllable structure in the world: CV • The speech signal is parsed in patterns matching the structure: • Cn V (n integer, can be 0). • 3 parameters are computed: • Global consonantal segments duration Dc • Global vocalic segment duration Dv • Syllable complexity (Nc: number of consonantal segments in the pseudo-syllable) • CCV = {DC DV NC} Each pseudo-syllable is characterized by two vectors, one characterizing rhythmic units and the other characterizing intonation on each of these rhythmic units. For each language two Gaussian Mixture Models are learned to characterize the language specific CCV and F0 distributions, using the EM algorithm with VQ initialization. Results of the rhythmic system on read speech (MULTEXT corpus) Model Item French Duration parameters extraction German Pseudo syllable generation Models Signal Speech activity detection Speech Segmentation Vowel detection Recognition Language Model Item Models Intonation parameters extraction Italian Japanese Spanish 8 : Results of the intonation system on read speech (MULTEXT corpus) Frequency (kHz) Intonation Parameters Experiments Conclusion 0 el a m E E t e b n • Fundamental frequency extraction: • « MESSIGNAIX » toolbox: combination of three methods (amdf, spectral comb, autocorrelation) • Fundamental frequency features: • 2 parameters are computed: • a measurement of the accent location (maximum F0 location regarding to vocalic onset F0) • the normalized fundamental frequency bandwidth on each syllable (F0). • F0 = {F0F0} • Rhythmic model: • Catch the rhythmic information. • Achieves a clear separation between languages of different rhythmic classes. • Don’t catch the languages’ rhythm but characterizes language-specific rhythmic units, • Model sequences of these units to get a complete rhythm model. • Fundamental frequency model: • Better characterizes languages which have well defined intonational rules like Japanese and English. • We need to test our system on many more languages to confirm (or not) the linguistic classes hypothesis. Experiments were previously made on the five languages of the MULTEXT database: English, French, German, Italian and Spanish. Japanese was added thanks to Mr. Kitasawa. The tests are made using 20 seconds read speech utterances and consist in a six-way identification task. On this read speech corpus, our rhythm system can achieve good performance (79 % of correct identification on six languages) while the intonation system allows us to reach up to 62 % of correct identifications. Amplitude 0 0.2 0.4 0.6 0.8 1.0 Time (s) CCVV CCV CV CCCV CV Vowel Non Vowel Pause • Rhythm : • Duration C • Duration V • Complexity C • Intonation : • F0 • F0