Speech and Music Classification in Audio Documents

Julien PINQUIER, Christine SÉNAC and Régine ANDRÉ-OBRECHT
Institut de Recherche en Informatique de Toulouse
UMR 5505 CNRS – INP – UPS, FRANCE
118, Route de Narbonne, 31062 Toulouse Cedex 04
{pinquier, senac, obrecht}@irit.fr

Abstract

In order to index the soundtrack of multimedia documents efficiently, it is necessary to extract elementary and homogeneous acoustic segments. In this paper, we explore such a prior partitioning, which consists in detecting the two basic components: speech and music. The originality of this work is that music and speech are not treated as two classes of a single classifier. Instead, two classification systems are defined independently: a speech/non-speech system and a music/non-music system. This approach makes it possible to better characterize and discriminate each component: in particular, two different feature spaces are required, as well as two pairs of Gaussian mixture models. The acoustic signal is then classified into four types of segments: speech, music, speech-music and other. The experiments are performed on the soundtracks of audio-video documents (films, TV sports broadcasts). The performance demonstrates the interest of this approach, called the Differentiated Modeling Approach.

Introduction

• Audio indexing: finding appropriate audio descriptors in the signal to better characterize each document.

[Figure: General schema of the classification system — the signal goes through acoustic preprocessing (cepstral coefficients for the speech system, spectral coefficients for the music system), classification against the class and non-class models, and smoothing (maximum log-likelihood), producing the indexed audio file.]

[Figure: Training of the four GMMs — after signal preprocessing and manual indexing (speech, music), each model is initialized by VQ and estimated by EM: Speech and Non-Speech models (32 Gaussian laws, 18 cepstral coefficients), Music and Non-Music models (16 Gaussian laws, 29 spectral coefficients).]
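The class/non-class decision described in the schema can be sketched as follows: each frame is scored by the class GMM and the non-class GMM, and the log-likelihood difference is smoothed before taking the maximum. This is a minimal numpy sketch, assuming diagonal-covariance GMMs and a hypothetical 50-frame smoothing window; the function names and the exact smoothing used on the poster are assumptions.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Frame-wise log-likelihood under a diagonal-covariance GMM.

    x: (T, D) feature frames; weights: (K,); means, variances: (K, D).
    """
    T, D = x.shape
    diff = x[:, None, :] - means[None, :, :]              # (T, K, D)
    # log of the Gaussian normalization term, per component
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.log(variances).sum(axis=1))   # (K,)
    log_comp = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    # log sum_k w_k N(x | mu_k, var_k), computed stably
    return np.logaddexp.reduce(np.log(weights)[None, :] + log_comp, axis=1)  # (T,)

def classify(x, class_gmm, nonclass_gmm, win=50):
    """Label each frame Class / Non-Class after smoothing the log-likelihood
    difference over a sliding window of `win` frames (an assumed length)."""
    diff = gmm_loglik(x, *class_gmm) - gmm_loglik(x, *nonclass_gmm)
    smoothed = np.convolve(diff, np.ones(win) / win, mode="same")
    return smoothed > 0   # True -> Class (e.g. speech), False -> Non-Class
```

The same routine serves both systems: the speech system calls it with the Speech/Non-Speech pair on cepstral features, the music system with the Music/Non-Music pair on spectral features.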
Differentiated modeling

• Detect the two basic components (speech and music).
• The indexing system is based on differentiated modeling.
• The modeling is based on Gaussian Mixture Models (GMMs).
• Goal: detection of the two basic classes (speech and music) with equal performance.
• How: extraction of elementary and homogeneous acoustic components for each class.
• Speech: formant structure.
• Music: harmonic structure.
• 1 class = {representation space, class model, non-class model}
• Speech class = {cepstral space, Speech model, Non-Speech model}
• Music class = {spectral space, Music model, Non-Music model}

Experiment

• Corpus (114 mn): championship figure skating, sports report and TV movie.
• Contents: speech, music and 'mixed' zones.
• Speech: phone calls, outdoor recordings, crowd…
• Music: stringed and wind instruments, bass, drums…
• Main speakers: 9 (3 women and 6 men).
• Training corpus: 74 mn; test corpus: 40 mn.
• Models: Speech, Non-Speech, Music and Non-Music.

Results

                     Speech   Music
  Delay < 20 cs (1)   96 %    91 %
  Accuracy (2)        99 %    93 %

(1) Delay measured between manual and automatic labels.
(2) Accuracy = (length of test corpus – length of insertions – length of deletions) / (length of test corpus).

Example of speech and music classification

• (a) Manual indexing
• (b) Speech / Non-Speech indexing
• (c) Music / Non-Music indexing
• (d) Fusion of the speech and music classifications
• with P (speech), M (music), NP (non-speech), NM (non-music) and – (other).

ICASSP'2002, Orlando (Florida)

Acknowledgment

We would like to thank the CNRS (French national center for scientific research) for its support of this work under the RAIVES project.
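The fusion step (d) and the accuracy measure (2) can be sketched as follows. The output codes P, M and – follow the poster; "PM" is an assumed code for the speech-music label, and both function names are hypothetical.

```python
def fuse(is_speech, is_music):
    """Merge the independent speech/non-speech and music/non-music
    decisions into the four final labels of the poster."""
    if is_speech and is_music:
        return "PM"   # speech-music (assumed code)
    if is_speech:
        return "P"    # speech
    if is_music:
        return "M"    # music
    return "-"        # other

def accuracy(test_len, insertions, deletions):
    """Accuracy = (length of test corpus - insertions - deletions)
    / (length of test corpus), all in the same time unit (e.g. cs)."""
    return (test_len - insertions - deletions) / test_len
```

For instance, a segment labeled speech by the first system and non-music by the second fuses to P, while one labeled speech and music fuses to PM.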