
Automatic Modelling of Rhythm and Intonation for Language Identification

Automatic Modelling of Rhythm and Intonation for Language Identification. English. Jean-Luc ROUAS¹, Jérôme FARINAS¹, François PELLEGRINO² and Régine ANDRÉ-OBRECHT¹. {jean-luc.rouas, jerome.farinas, regine.andre-obrecht}@irit.fr; francois.pellegrino@univ-lyon2.fr


Presentation Transcript


  1. Automatic Modelling of Rhythm and Intonation for Language Identification

Jean-Luc ROUAS¹, Jérôme FARINAS¹, François PELLEGRINO² and Régine ANDRÉ-OBRECHT¹
{jean-luc.rouas, jerome.farinas, regine.andre-obrecht}@irit.fr; francois.pellegrino@univ-lyon2.fr
¹ Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS - Université Paul Sabatier - INP, 31062 Toulouse Cedex 4, France
² Laboratoire Dynamique du Langage, UMR 5596 CNRS - Université Lumière Lyon 2, 69363 Lyon Cedex 7, France

Pseudo Syllable

• Speech segmentation: statistical segmentation [André-Obrecht, 1988]
  • Short segments (bursts and transient parts of sounds)
  • Longer segments (steady parts of sounds)
• Speech activity detection and vowel detection [Pellegrino & Obrecht, 2000]
  • Spectral analysis of the signal
  • Language- and speaker-independent algorithm
• Pseudo-syllable segmentation
  • Derived from the most frequent syllable structure in the world: CV
  • The speech signal is parsed into patterns matching the structure C^n V (n is an integer and can be 0).

Duration Parameters

• 3 parameters are computed for each pseudo-syllable:
  • Global duration of the consonantal segments, Dc
  • Global duration of the vocalic segment, Dv
  • Syllable complexity, Nc (number of consonantal segments in the pseudo-syllable)
• CCV = {Dc, Dv, Nc}

Recognition

Each pseudo-syllable is characterized by two vectors: one describes the rhythmic unit, the other describes the intonation on that rhythmic unit. For each language, two Gaussian Mixture Models are learned to characterize the language-specific CCV and F0 distributions, using the EM algorithm with VQ initialization.
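As a rough illustration of how per-language Gaussian Mixture Models can be used at recognition time, the sketch below scores a sequence of pseudo-syllable vectors against each model and picks the language with the highest total log-likelihood. Several things here are assumptions rather than details taken from the poster: the diagonal covariances, the maximum-likelihood decision rule, the function names, and the toy model parameters for the placeholder languages `lang_A` and `lang_B`. Training (EM with VQ initialization) is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, means, variances).
# Diagonal covariance is an assumption made for this sketch.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x: Sequence[float], gmm: GMM) -> float:
    """Log-likelihood of one vector under a GMM (log-sum-exp for stability)."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify(vectors: List[Sequence[float]], models: Dict[str, GMM]):
    """Sum frame log-likelihoods per language and return the best-scoring one."""
    scores = {
        lang: sum(gmm_loglik(x, gmm) for x in vectors)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical models over (Dc, Dv, Nc); the numbers are made up for illustration.
toy_models: Dict[str, GMM] = {
    "lang_A": [
        (0.5, (0.08, 0.10, 2.0), (0.001, 0.001, 0.5)),
        (0.5, (0.05, 0.08, 1.0), (0.001, 0.001, 0.5)),
    ],
    "lang_B": [(1.0, (0.15, 0.06, 3.0), (0.001, 0.001, 0.5))],
}
toy_vectors = [(0.08, 0.14, 2), (0.06, 0.08, 1)]
best_language, language_scores = identify(toy_vectors, toy_models)
```

The same scoring could be applied separately to the rhythm vectors and the intonation vectors, since the poster describes one model of each kind per language; how the two scores would be combined is not stated there and is left out of this sketch.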
[Figure: system block diagram. Blocks: Signal, Speech segmentation, Speech activity detection, Vowel detection, Pseudo-syllable generation, Duration parameters extraction, Intonation parameters extraction, Models, Recognition, Language.]

[Tables: Results of the rhythmic system on read speech (MULTEXT corpus); Results of the intonation system on read speech (MULTEXT corpus). Labels: Model, Item; French, German, Italian, Japanese, Spanish.]

[Figure: spectrogram, frequency axis in kHz, with phonetic labels.]

Intonation Parameters

• Fundamental frequency extraction:
  • « MESSIGNAIX » toolbox: combination of three methods (AMDF, spectral comb, autocorrelation)
• Fundamental frequency features: 2 parameters are computed:
  • a measurement of the accent location (location of the F0 maximum relative to the vocalic onset)
  • the normalized fundamental frequency bandwidth on each syllable
• F0 = {accent location, normalized F0 bandwidth}

Experiments

Experiments were previously made on the five languages of the MULTEXT database: English, French, German, Italian and Spanish. Japanese was added thanks to Mr. Kitasawa. The tests use 20-second read speech utterances and consist of a six-way identification task. On this read speech corpus, the rhythm system reaches 79 % correct identification on six languages, while the intonation system reaches up to 62 % correct identification.

Conclusion

• Rhythmic model:
  • Captures rhythmic information.
  • Achieves a clear separation between languages of different rhythmic classes.
  • Does not capture a language's rhythm as such, but characterizes language-specific rhythmic units.
  • Sequences of these units need to be modelled to obtain a complete rhythm model.
• Fundamental frequency model:
  • Better characterizes languages that have well-defined intonational rules, like Japanese and English.
• The system needs to be tested on many more languages to confirm (or not) the linguistic classes hypothesis.
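The two intonation parameters described in this transcript (accent location and normalized F0 bandwidth) might be computed per pseudo-syllable along the following lines. This is only a sketch under stated assumptions: the function name is hypothetical, unvoiced frames are assumed to be marked with F0 = 0, and the bandwidth is normalized by the mean F0 of the voiced frames, which is one plausible choice since the poster does not say how the normalization is done.

```python
from typing import List, Tuple


def f0_features(times: List[float], f0: List[float], vowel_onset: float) -> Tuple[float, float]:
    """Return (accent_location, normalized_bandwidth) for one pseudo-syllable.

    times       -- frame times in seconds
    f0          -- F0 values in Hz, 0 for unvoiced frames (assumption)
    vowel_onset -- time of the vocalic onset in seconds
    """
    voiced = [(t, f) for t, f in zip(times, f0) if f > 0]
    if not voiced:
        raise ValueError("no voiced frame in this pseudo-syllable")

    # Time and value of the F0 maximum (first one if there are ties).
    t_max, f_max = max(voiced, key=lambda tf: tf[1])
    f_min = min(f for _, f in voiced)
    f_mean = sum(f for _, f in voiced) / len(voiced)

    # Accent location: position of the F0 maximum relative to the vocalic onset.
    accent_location = t_max - vowel_onset
    # Bandwidth normalized by the mean F0 (assumed normalization).
    normalized_bandwidth = (f_max - f_min) / f_mean
    return accent_location, normalized_bandwidth
```

For example, with frames at 0.00 to 0.04 s, F0 values `[0, 100, 120, 140, 120]` and a vocalic onset at 0.01 s, the F0 maximum falls 0.02 s after the onset and the bandwidth is 40 Hz over a mean of 120 Hz.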
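[Figure: waveform (amplitude against time, 0 to 1.0 s) segmented into Vowel, Non Vowel and Pause, and grouped into the pseudo-syllables CCVV, CCV, CV, CCCV, CV. Parameters extracted per pseudo-syllable: rhythm (duration C, duration V, complexity C) and intonation (two F0 parameters).]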
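The pseudo-syllable parsing and the three duration parameters described in this transcript can be sketched as follows. The input format (a list of labelled segments with durations), the labels `"C"`, `"V"` and `"#"`, and the function name are assumptions made for illustration. Two further choices are also assumptions: consecutive vocalic segments are merged into one vocalic part (the figure shows a CCVV unit), and consonantal segments left dangling before a pause are discarded.

```python
from typing import List, Tuple

# (label, duration in seconds); "C" = consonantal, "V" = vocalic, "#" = pause.
Segment = Tuple[str, float]


def pseudo_syllables(segments: List[Segment]) -> List[Tuple[float, float, int]]:
    """Group segments into C^n V patterns and return (Dc, Dv, Nc) for each."""
    result = []
    dc, dv, nc = 0.0, 0.0, 0
    in_vowel = False
    for label, duration in segments:
        if label == "V":
            # Extend the vocalic part of the current pseudo-syllable.
            dv += duration
            in_vowel = True
            continue
        if in_vowel:
            # A non-vowel after a vowel closes the current pseudo-syllable.
            result.append((dc, dv, nc))
            dc, dv, nc, in_vowel = 0.0, 0.0, 0, False
        if label == "C":
            dc += duration
            nc += 1
        else:
            # Pause: drop consonants that were not followed by a vowel (assumption).
            dc, nc = 0.0, 0
    if in_vowel:
        result.append((dc, dv, nc))
    return result
```

Applied to the segment sequence `C C V V C V # C # V`, this yields three pseudo-syllables with Nc equal to 2, 1 and 0; the consonant isolated between the two pauses contributes to none of them.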
