Articulatory Talking Head driven by Automatic Speech Recognition
INRIA, Parole Team
KTH, Centre for Speech Technology
Aim of the INRIA-KTH collaboration
Recreate in real time the articulatory movements of a speaker with a talking head, using the speech signal only.
Applications: communication help for hard-of-hearing (HOH) people, second language learning, speech therapy.
Articulation
Displaying articulation is important for understanding English pronunciation and also supports perception. In this demonstration: voice activity detection is achieved (separation of speech and non-speech); English phoneme recognition is performed; English articulation is analysed; articulation is displayed.
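The slides do not say how speech is separated from non-speech. As a rough illustration only, the sketch below uses a simple short-time energy threshold, which is one common baseline and not necessarily the method used in the demonstration. The function names, the frame length and the `threshold_ratio` parameter are all assumptions made for this example.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def detect_speech(samples, frame_len=160, threshold_ratio=0.1):
    """Label each frame True (speech) or False (non-speech).

    Assumption: a frame counts as speech when its energy exceeds
    threshold_ratio times the largest frame energy in the signal.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    peak = max(energies)
    if peak == 0.0:
        return [False] * len(energies)
    return [e > threshold_ratio * peak for e in energies]

# Synthetic test signal at 16 kHz: silence, a 200 Hz tone, silence.
rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
labels = detect_speech(silence + tone + silence)
```

On this synthetic signal the ten frames covering the tone are labelled as speech and the rest as non-speech. A real recording would need a noise-adaptive threshold and some smoothing over neighbouring frames.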
3D Reconstruction
The reconstruction was made using a semi-polar grid of 20 gridlines, with one contour per image. A polygon mesh of 420 vertices and about 800 polygons was constructed.
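To make the mesh figures concrete, here is a minimal sketch of how a polygon mesh can be built by joining neighbouring contours, assuming each contour is sampled at its intersections with the 20 gridlines. The slides do not state the number of images or the polygon type: the value of 21 contours below is chosen only because 21 × 20 matches the 420 vertices mentioned, and the use of triangles is an assumption. `build_mesh` and the placeholder coordinates are hypothetical.

```python
import math

def build_mesh(contours):
    """Join consecutive contours into a triangle mesh.

    contours: list of contours, each a list of (x, y, z) points with the
    same number of points (one point per gridline).
    Returns (vertices, triangles), where triangles index into vertices.
    """
    n_points = len(contours[0])
    if any(len(c) != n_points for c in contours):
        raise ValueError("all contours must have the same number of points")
    vertices = [p for contour in contours for p in contour]
    triangles = []
    for i in range(len(contours) - 1):
        for j in range(n_points - 1):
            a = i * n_points + j   # point j on contour i
            b = a + 1              # point j + 1 on contour i
            c = a + n_points       # point j on contour i + 1
            d = c + 1              # point j + 1 on contour i + 1
            triangles.append((a, b, c))  # split each quad into two triangles
            triangles.append((b, d, c))
    return vertices, triangles

# Placeholder geometry: half-circle contours stacked along z (not real data).
n_contours, n_gridlines = 21, 20   # 21 is an assumption, see the text above
contours = [
    [(math.cos(math.pi * j / (n_gridlines - 1)),
      math.sin(math.pi * j / (n_gridlines - 1)),
      float(i))
     for j in range(n_gridlines)]
    for i in range(n_contours)
]
vertices, triangles = build_mesh(contours)
```

Under these assumptions the sketch yields 420 vertices and 760 triangles, which is in the range of the "about 800 polygons" quoted on the slide. The actual construction may differ.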
Qualisys optical motion tracking: 4 IR cameras, 28 reflectors, 3 reference reflectors on a headmount.
Audio & video recorders.
Movetrack Electromagnetic Articulograph: 6 coils (upper lip, upper and lower incisors, and three tongue coils at 8, 20 and 52 mm from the tip).
Models have been adapted to English.
Prosody
Prosody is important for message understanding. It is present both in the speech sound and in facial expressions and gestures. Some prosodic information is extracted from the signal, namely fundamental frequency (F0), energy and speech rate, and displayed with the talking head.
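The slides do not describe how F0 and energy are extracted. As an illustration, the sketch below computes frame energy as mean squared amplitude and estimates F0 by picking the peak of the autocorrelation within a plausible pitch range. This is a textbook approach, not necessarily the one used here. The function names and the `f0_min` / `f0_max` bounds are assumptions.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 in Hz from the autocorrelation peak.

    Only lags corresponding to f0_min..f0_max are searched (assumed pitch
    range). Returns None when no positive correlation is found, for
    example on a silent frame.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None
    return sample_rate / best_lag

# Synthetic 50 ms frame: a pure 100 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(frame, sample_rate)
energy = frame_energy(frame)
```

Real speech is noisier than this pure tone, so a practical tracker would add a voicing decision and guard against octave errors.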
F0 comparison between a French speaker and a native speaker. Please note that the F0 and narrow-band spectrogram scales are different.
Speech rate
Speech rate can be computed as the average number of phonemes produced per second. We define it as the ratio between the average duration of the produced phonemes and the average duration of the same phonemes in the phoneme recognizer training database.
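The two measures described above can be sketched as follows. The data layout (a list of phoneme and duration pairs from the recognizer, and a dictionary of mean durations from the training database) is an assumption, as are the function names. The phoneme labels and durations in the example are made up for illustration.

```python
def phonemes_per_second(produced):
    """Average number of phonemes produced per second.

    produced: list of (phoneme, duration_in_seconds) pairs.
    """
    total = sum(duration for _, duration in produced)
    return len(produced) / total

def speech_rate_ratio(produced, reference_durations):
    """Mean produced duration divided by the mean reference duration.

    reference_durations: dict mapping each phoneme to its average
    duration in the training database (hypothetical layout).
    A ratio above 1 means the speaker is slower than the reference.
    """
    if not produced:
        raise ValueError("no phonemes were produced")
    mean_produced = sum(d for _, d in produced) / len(produced)
    mean_reference = sum(reference_durations[p] for p, _ in produced) / len(produced)
    return mean_produced / mean_reference

# Made-up example values, not taken from the presentation.
produced = [("h", 0.10), ("e", 0.20), ("l", 0.10), ("o", 0.20)]
reference = {"h": 0.05, "e": 0.10, "l": 0.05, "o": 0.10}
ratio = speech_rate_ratio(produced, reference)
rate = phonemes_per_second(produced)
```

With these made-up values every phoneme lasts twice its reference duration, so the ratio comes out at about 2.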
Usage Scenario
The teacher, the learner or the speech therapist speaks. The talking head reproduces what has been uttered, showing the articulators. The talking head shows what should have been articulated. This is a first step towards an interactive learning loop.
The student and the teacher can have a closer look at the articulation and prosody…
The student and the teacher can watch the correct articulation and prosody together…
And of course the teacher can give more detailed explanations and advice…