Articulatory Talking Head driven by Automatic Speech Recognition INRIA, Parole Team

Articulatory Talking Head driven by Automatic Speech Recognition
INRIA, Parole Team
KTH, Centre for Speech Technology

Aim of the INRIA-KTH collaboration
Recreate in real time the articulatory movements of a speaker with a talking head, using the speech signal only.
Applications:
Communication help for hard-of-hearing (HOH) people
Second language learning
Speech therapy

Articulation
Displaying articulation is important for understanding English pronunciation and as a support for perception.
In this demonstration:
Voice Activity Detection is achieved (separation of speech and non-speech)
English phoneme recognition is performed
English articulation is analysed
Articulation is displayed

The new VAD is a combination of a GMM and an automaton
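A GMM plus automaton VAD can be pictured roughly as follows. This is only a minimal sketch under strong assumptions, not the system shown in the demonstration: two one-dimensional Gaussians over frame log-energy stand in for the GMM, and a two-state automaton with minimum run lengths smooths the frame decisions. All names and parameter values (`SPEECH`, `NONSPEECH`, `min_on`, `min_off`) are hypothetical.

```python
import math

# Hypothetical (mean, std) of frame log-energy in dB for each class.
# A real GMM would have several components over spectral features
# and would be trained on data.
SPEECH = (-20.0, 8.0)
NONSPEECH = (-55.0, 6.0)


def log_gauss(x, mean, std):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def frame_scores(log_energies):
    """Log-likelihood ratio (speech vs. non-speech) for each frame."""
    return [log_gauss(e, *SPEECH) - log_gauss(e, *NONSPEECH) for e in log_energies]


def smooth_with_automaton(scores, min_on=3, min_off=3):
    """Two-state automaton: the state changes only after min_on (or min_off)
    consecutive frames vote against the current state."""
    state = False   # start in non-speech
    run = 0         # consecutive frames contradicting the current state
    decisions = []
    for score in scores:
        vote = score > 0.0
        if vote != state:
            run += 1
            if run >= (min_on if vote else min_off):
                state = vote
                run = 0
        else:
            run = 0
        decisions.append(state)
    return decisions


def vad(log_energies):
    """Frame-level speech / non-speech decisions (True means speech)."""
    return smooth_with_automaton(frame_scores(log_energies))
```

With these settings a single low-energy frame inside a speech stretch does not end the segment, at the cost of a two-frame delay at each boundary.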

3D Reconstruction
The reconstruction was made using a semi-polar grid of 20 gridlines, with one contour per image. A polygon mesh of 420 vertices and about 800 polygons was constructed.

Qualisys optical motion tracking: 4 IR cameras, 28 reflectors, 3 reference reflectors on a headmount.
Audio and video recorders.
Movetrack electromagnetic articulograph: 6 coils (upper lip, upper and lower incisors, and three tongue coils at 8, 20 and 52 mm from the tip).
Models have been adapted to English.

Prosody
Prosody is important for understanding a message. It is present both in the speech sound and in facial expressions and gestures.
Some prosodic information is extracted from the signal and displayed with the talking head:
Fundamental frequency (F0)
Energy
Speech rate
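Of the three prosodic cues, energy is the simplest to compute. The sketch below shows one common definition, the mean squared amplitude of each frame expressed in dB. The frame length, hop size and floor value are assumptions chosen for illustration (25 ms frames and a 10 ms hop at 16 kHz), not values taken from the demonstration.

```python
import math


def frame_energy_db(samples, frame_len=400, hop=160, floor=1e-10):
    """Short-time energy in dB for each frame of a mono signal.

    samples: sequence of floats, nominally in [-1, 1].
    floor: lower bound that avoids log(0) on silent frames.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(max(mean_sq, floor)))
    return energies
```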

Pitch (F0): estimation with comb filters
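One way to read "comb filter estimation" is a bank of time-domain comb filters, one per candidate pitch period: the filter whose delay matches the period gives the strongest output. The sketch below follows that reading and is not necessarily the estimator used in the demonstration. The function name, the F0 search range and the 0.95 tolerance are assumptions.

```python
import math


def comb_f0(samples, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one voiced frame with a bank of comb filters."""
    t_min = int(fs / f_max)   # shortest candidate period, in samples
    t_max = int(fs / f_min)   # longest candidate period, in samples
    n = len(samples)
    if n <= t_max:
        raise ValueError("frame too short for the lowest F0 candidate")

    # Mean output power of y[k] = x[k] + x[k - T] for each candidate period T.
    # The power peaks when T matches the period, because the delayed copy
    # then adds in phase with the signal.
    scores = {}
    for period in range(t_min, t_max + 1):
        total = 0.0
        for k in range(period, n):
            y = samples[k] + samples[k - period]
            total += y * y
        scores[period] = total / (n - period)

    best = max(scores, key=scores.get)
    # Integer multiples of the true period score equally well, so prefer the
    # shortest sub-multiple whose score is close to the best one.
    for divisor in range(best // t_min, 1, -1):
        candidate = round(best / divisor)
        if candidate >= t_min and scores[candidate] >= 0.95 * scores[best]:
            return fs / candidate
    return fs / best
```

A real system would first decide whether the frame is voiced and would usually track F0 across frames; both steps are left out here.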

F0 comparison between a French speaker and a native speaker. Please note that the F0 and narrow-band spectrogram scales are different.

Speech rate
Speech rate can be computed as the average number of phonemes produced per second. Here we define it as the ratio between:
the average duration of the produced phonemes
the average duration of the same phonemes in the training database of the phoneme recognizer.
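Both definitions of speech rate fit in a few lines. This is a sketch under assumptions: the phoneme recognizer is taken to return (phoneme, duration) pairs, the reference durations are per-phoneme means from the training database, and the ratio is taken as produced over reference, so a value above 1 means slower speech than the reference. The function names and data layout are hypothetical.

```python
def phonemes_per_second(produced):
    """Average number of phonemes per second.

    produced: list of (phoneme, duration_in_seconds) pairs.
    """
    total = sum(duration for _, duration in produced)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return len(produced) / total


def speech_rate_ratio(produced, reference_durations):
    """Mean produced duration divided by the mean reference duration
    of the same phonemes.

    reference_durations: dict mapping phoneme -> mean duration in seconds
    in the recognizer's training database (assumed to be available).
    """
    if not produced:
        raise ValueError("no phonemes to measure")
    mean_produced = sum(d for _, d in produced) / len(produced)
    mean_reference = sum(reference_durations[p] for p, _ in produced) / len(produced)
    return mean_produced / mean_reference
```

For example, if every produced phoneme lasts twice as long as its reference, `speech_rate_ratio` returns a value close to 2.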

Speech rate

Usage Scenario
The teacher, the learner or the speech therapist speaks.
The talking head reproduces what has been uttered, showing the articulators.
The talking head shows what should have been articulated.
This is a first step towards an interactive learning loop.

A French student pronounces an English sentence…

The student and the teacher can have a closer look at the articulation and prosody…

The teacher can pronounce the sentence as it should be…

The student and the teacher can watch together the correct articulation and prosody…

And of course the teacher can give more detailed explanations and advice…

Thank you.
