Adaptation of orofacial clones to the morphology and control strategies

Adaptation of orofacial clones to the morphology and control strategies of target speakers for speech articulation
Julián Andrés VALDÉS VARGAS
Jury: Michel DESVIGNES (President), Yves LAPRIE (Reviewer), Rudolph SOCK (Reviewer), Thierry LEGOU (Examiner), Pierre BADIN (Thesis Director)

Summary
• Context of visual articulatory feedback
• Articulatory data
• Individual models and characterisation
• Multi-speaker models
• Conclusions and perspectives

Context
• Mastery of articulators for speech production
• Skill maintained and improved by the perception-action loop (Matthies et al., 1996)
• Feedback in speech
  • Auditory
  • Proprioceptive

Vision of articulators
• Augmented speech → visual feedback
• Display of articulators
  • Vision of lips and face
    • Improves speech intelligibility (Sumby and Pollack, 1954)
    • Speech imitation is faster (Fowler et al., 2003)
  • Vision of hidden articulators
    • Increases intelligibility (Badin et al., 2010)

Visual articulatory feedback system
• System of visual articulatory feedback (Ben Youssef et al., 2011)
• Applications
  • Speech rehabilitation
  • Computer Aided Pronunciation Training (CAPT)
Diagram: speech sound signal of a given speaker → visual articulatory feedback system → clone's animation

Problem of articulatory adaptation
• Animation of clone based on a single speaker
• Adaptation to several speakers
• Mismatch between the clone's animation and real speakers
Diagram labels: speech sound of speaker 1, speaker 2, ..., speaker n; visual articulatory feedback system; acoustic adaptation (Atef BEN YOUSSEF); articulatory adaptation; animation based on entry speaker; animation based on reference speaker

Inter-speaker variability
• Morphology: different vocal tracts
  • Size, vertical / horizontal length ratios
  • Shape (e.g. concave / flat palates)
• Articulatory control strategies
  • Cope with morphology → different articulatory strategies to achieve sounds considered equivalent for speech communication purposes

Illustration of speaker differences
Figure: speakers PB, AA and YL for /a/, /i/ and /u/

Objectives
• Articulatory adaptation (initial objective)
  → Normalization: extraction of common components (patterns) to control the articulators of several speakers
• To acquire knowledge about inter-speaker variability

Articulatory data
• Type of data → articulatory data → building articulatory models
• Inter-speaker variability:
  • 11 French speakers (6 males and 5 females)
• Articulatory phonetic coverage:
  • 13 vowels
  • 10 consonants in 5 vocalic contexts (vowel-consonant-vowel)
  • 63 articulations in total (13 + 10 × 5)

Recording methods
• Several recording methods considered:
  • X-ray (Meyer, 1907; Mosher, 1927)
    • Difficult to accurately identify the contours
  • Electro-Magnetic Articulography (EMA)
    • No recording of the whole vocal tract
  • Magnetic Resonance Imaging (MRI) (Rokkaku et al., 1986)
    • Tomographic (imaging by sections)
    • Maintained vocal tract positions
    • Speakers in supine position
    • Gravitational effect is moderate (Engwall, 2003; 2006)

Decision to use MRI
• Whole vocal tract information, unlike EMA
• Contours easier to identify compared to X-ray
• No health hazard compared to X-ray
• Recording parameters:
  • Midsagittal image of the vocal tract
  • Slice thickness: 4 mm
  • Spatial resolution: 1 mm / pixel
  • Acquisition time: 8-16 seconds

MRI recording
• The speaker is asked to go through several stages
  • Speakers lie in supine position
  • Bed shifted into the MRI machine
  • Setting up of alignment recording properties
  • Maintained pronunciation of articulations for 8-16 seconds
  • Speakers are asked not to move their heads

Processing of MRI
• Rigid contours are drawn once for a given speaker
  • Positioning of palate using skull bones as reference
    • Rotation and translation
  • Positioning of jaw by means of rototranslations
• Editing of deformable contours: lips, tongue, velum, etc.
• Palates of all articulations are aligned
  • Avoids noise introduced by head movement
• Midsagittal contours manually edited
Figure: /a/, /i/, /u/
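
Note: the "rotation and translation" step can be illustrated with a least-squares rigid fit between two sets of 2D landmark points. This is only a sketch under stated assumptions: the slides do not say how the rototranslation was obtained (it may well have been set by hand), the function names are hypothetical, and the landmarks below are synthetic.

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t.

    src, dst: (n_points, 2) arrays of corresponding landmarks (hypothetical,
    e.g. points on a rigid structure seen in two images).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def apply_rigid_2d(points, R, t):
    """Apply the rototranslation to every (x, y) row of points."""
    return points @ R.T + t

# Synthetic check: rotate and shift some points, then recover the transform
rng = np.random.default_rng(2)
src = rng.normal(size=(5, 2))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([1.0, -2.0])
dst = src @ R_true.T + t_true
R, t = fit_rigid_2d(src, dst)
aligned = apply_rigid_2d(src, R, t)
```

Once R and t are known for an image, the same transform can be applied to every contour of that image so that the rigid structures coincide across articulations.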

Contours modelled
• Upper tongue: 150 (x, y) points
• Lips: 100 (x, y) points
• Velum: 150 (x, y) points
• Static data → articulatory study/models

Universal control parameters
• Extraction of a common set of patterns (components)
• Goals:
  • Building individual-speaker articulatory models
  • Controlling all individual articulatory models from a universal set of components
Figure labels: articulator contours of individual speakers (Speaker 1, Speaker 2; /a/, /i/, /u/); individual articulatory models (Mspeaker1, Mspeaker2) with components CP /a/, CP /i/, CP /u/; universal model with a universal set of components and speaker-specific weights

Method for individual models of speakers
• Principal component analysis (PCA)
  • Dimensionality reduction → extraction of orthogonal components
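
Note: the PCA step named on this slide can be sketched as follows. It is a minimal illustration, assuming each row of X holds the flattened (x, y) contour coordinates of one articulation; the function name fit_pca and the random stand-in data are hypothetical and not taken from the thesis.

```python
import numpy as np

def fit_pca(X, n_components):
    """PCA via SVD of the centred data matrix.

    X: (n_observations, n_features), one row per articulation.
    Returns the mean contour, the orthonormal components (rows),
    the scores of each observation and the fraction of variance
    explained by each retained component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean, components, scores, explained

# Synthetic stand-in: 63 articulations, 150 (x, y) points flattened to 300 values
rng = np.random.default_rng(0)
X = rng.normal(size=(63, 300))
mean, components, scores, explained = fit_pca(X, 4)
```

The rows of components are mutually orthogonal, which is the property the slide refers to.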

Assessment of models
• Evaluation of the model for an individual speaker X
  • Variance explained
  • Root Mean Square Error (RMSE)
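
Note: the two criteria can be computed as in the sketch below. The exact definitions used in the thesis are not given on the slide, so this assumes the usual ones: variance explained as one minus the ratio of residual to total sum of squares, and RMSE taken over all coordinates. All names and data are hypothetical.

```python
import numpy as np

def reconstruct(X, mean, components):
    """Project X onto the components and map back to coordinates."""
    scores = (X - mean) @ components.T
    return mean + scores @ components

def variance_explained(X, X_hat):
    """Fraction of the data variance captured by the reconstruction."""
    total = np.sum((X - X.mean(axis=0)) ** 2)
    residual = np.sum((X - X_hat) ** 2)
    return 1.0 - residual / total

def rmse(X, X_hat):
    """Root mean square error over all coordinates (same unit as X)."""
    return np.sqrt(np.mean((X - X_hat) ** 2))

# Synthetic stand-in data and a PCA basis obtained by SVD
rng = np.random.default_rng(1)
X = rng.normal(size=(63, 40))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

X_hat_4 = reconstruct(X, mean, Vt[:4])    # 4 components
X_hat_all = reconstruct(X, mean, Vt)      # all components
varex_4 = variance_explained(X, X_hat_4)
rmse_4 = rmse(X, X_hat_4)
```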

Generalization properties of models
• Performance of models in reconstructing data that was not used for training
• Leave-one-out cross-validation procedure (a.k.a. jackknife)
  • One observation is left out → the left-out observation is reconstructed by inverting the model → validation of generalization properties → valuable predictors retained
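
Note: a leave-one-out loop of this kind can be sketched as below for a plain PCA model. "Inverting the model" is taken here to mean estimating the component scores of the left-out observation by projection, which is an assumption; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def loo_rmse(X, n_components):
    """Mean RMSE over observations, each reconstructed by a PCA model
    trained on all the other observations."""
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        train = np.delete(X, i, axis=0)          # leave observation i out
        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        comps = Vt[:n_components]
        # Invert the model: project the unseen observation, then rebuild it
        x_hat = mean + ((X[i] - mean) @ comps.T) @ comps
        errors[i] = np.sqrt(np.mean((X[i] - x_hat) ** 2))
    return errors.mean()

# Synthetic data with two underlying components plus small noise
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 2))
L = rng.normal(size=(2, 10))
X = S @ L + 0.01 * rng.normal(size=(20, 10))

err_1 = loo_rmse(X, 1)
err_2 = loo_rmse(X, 2)
```

On this synthetic data the error drops sharply once the second component is added, which is the kind of evidence used to decide which predictors are worth retaining.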

Individual tongue models
• First component extracted by linear regression
  • Jaw Height (predictor)
    → Three degrees of freedom: x, y translation and rotation (Edwards & Harris, 1990)
    → Normalized value of the y-coordinate of the lower incisor (Badin & Serrurier, 2006)
• Guided PCA model (Badin & Serrurier, 2006)
  • 4 components extracted
Figure annotation: Corr(Y, θ) ≈ 0.92; (X, Y)

Individual tongue models
• The other 3 components are extracted by PCA from the residue:
  • Tongue Body (TB)
  • Tongue Dorsum (TD)
  • Tongue Tip (TT)
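
Note: the two-stage extraction (a first component obtained by linear regression on jaw height, then PCA on the residue) can be sketched as below. This is an illustration under assumptions, not the implementation of Badin & Serrurier (2006): the jaw-height predictor is simply z-scored, and all names and data are hypothetical.

```python
import numpy as np

def guided_pca(X, jh, n_free):
    """Guided PCA sketch.

    X:  (n_observations, n_features) contour coordinates.
    jh: (n_observations,) jaw-height measurement used as predictor.
    Returns the mean contour, the scores (JH first, then the free
    components) and the matching loadings.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    jh_z = (jh - jh.mean()) / jh.std()            # normalized predictor
    # Least-squares regression of every coordinate on JH
    jh_loading = (jh_z @ Xc) / (jh_z @ jh_z)
    residue = Xc - np.outer(jh_z, jh_loading)
    # Remaining components: ordinary PCA on what JH does not explain
    _, _, Vt = np.linalg.svd(residue, full_matrices=False)
    free_loadings = Vt[:n_free]
    free_scores = residue @ free_loadings.T
    scores = np.column_stack([jh_z, free_scores])
    loadings = np.vstack([jh_loading, free_loadings])
    return mean, scores, loadings

# Synthetic stand-in: 63 articulations, 300 coordinates, partly driven by JH
rng = np.random.default_rng(3)
jh = rng.normal(size=63)
X = np.outer(jh, rng.normal(size=300)) + rng.normal(size=(63, 300))

mean, scores, loadings = guided_pca(X, jh, n_free=3)
X_hat = mean + scores @ loadings                  # 4-component reconstruction
```

Because the residue is orthogonal to the jaw-height predictor, the free components are uncorrelated with it, so each of the four scores can be varied independently.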

Comparison between components
Figure: speakers LD, RL and AK; Y-Tongue = Coefficients_LR * JH
• JH component:
  • Max. variance: LD
  • Min. variance: RL, MG, AK
  • Compensation strategy of MG
• TB component:
  • Represents more variance than the other components
  • Horizontal/diagonal back-front movement
• TD component:
  • Vertical/diagonal arching movement
• TT component:
  • Used in different proportions according to the speaker
• Nomograms: graphical representation of components
  • Variation between -3 and 3
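
Note: a nomogram of this kind can be generated by sweeping one component while the others stay at zero. The sketch below assumes the -3 to 3 range is expressed in standard deviations of the component score, which the slide does not state; names and data are hypothetical.

```python
import numpy as np

def nomogram(mean, loading, score_std=1.0, values=np.linspace(-3, 3, 7)):
    """Contours obtained by varying a single component.

    mean:      (n_features,) mean contour.
    loading:   (n_features,) loading vector of the component.
    score_std: standard deviation of the component score.
    Returns one contour per value, shape (len(values), n_features).
    """
    return np.array([mean + v * score_std * loading for v in values])

# Synthetic stand-in for a mean contour and one component
rng = np.random.default_rng(4)
mean = rng.normal(size=300)
loading = rng.normal(size=300)
contours = nomogram(mean, loading)
```

Plotting the rows of contours on top of each other gives the fan of shapes shown in a nomogram.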

Individual lips models
• 3 components extracted by guided PCA model (Badin et al., 2012)
  • Jaw Height
    • More influence on LL than UL
    • Little influence on UL for RL
  • Protrusion
    • ULP > LLP for speaker LD
    • LLP > ULP for speaker RL
  • Lip height
    • ULH > LLH for all speakers
    • Except for speaker LD
Figure: speakers LD and RL. Percentages shown in the figure (their assignment to components is not recoverable from the transcript): 25.2%, 52.7%, 12.7%, 28.6%, 15.4%, 44.6%, 1.7%, 21.9%, 55%, 20.5%, 31%, 34.8%

Individual velum models
• 2 components extracted by PCA (Serrurier & Badin, 2008):
  • Velum levator (oblique movement): VL
  • Superior pharyngeal constrictor (horizontal movement): VS
Figure: VS and VL components

Individual velum models: consonant /ʁ/
Figure: VS and VL components for /ʁa/, speakers AA and HL

Conclusions: individual models
• Tongue PCA models: 4 components (JH, TB, TD, TT)
  • Variance explained: 93%, RMSE: 0.13 cm
• Lip models: 3 components (JH, protrusion, height)
  • Variance explained: 94%, RMSE: 0.04 cm
• Velum models: 2 components (VL, VS)
  • Variance explained: 90%, RMSE: 0.08 cm

Literature on multi-speaker models
• PARAFAC models: 2 components extracted
• Studies based on EMA (Hoole (1998), Geng (2000), Hu (2006))
  • 6-7 speakers, 10-15 vowels, 3-4 sensors on the tongue, 80%-96% of variance explained
• Study based on X-ray: Harshman (1977)
  • 5 speakers, 10 vowels, 13 points, 92.7% of variance explained
• Studies based on MRI (Hoole (2000), Zheng (2003), Ananth (2010))
  • 3-9 speakers, 7-13 vowels, 13-150 points, 71%-87% of variance explained

Multi-speaker decomposition methods
• Extraction of a common set of components
• PARAFAC (Harshman, 1970): three-way factor analysis, diagonal speaker adaptation matrix

Multi-speaker decomposition methods
• TUCKER 3
  • Extension of PARAFAC
  • Decomposition in all modes of variation
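
Note: for reference, the standard textbook forms of the PARAFAC and Tucker3 decompositions are written out below. The index convention is an assumption made for this note (i for articulations, j for contour coordinates, k for speakers); the slides themselves do not give the equations.

```latex
% PARAFAC: R components shared by all three modes
x_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} + e_{ijk}

% Equivalent slice form for speaker k, with a diagonal speaker matrix D_k
X_k = A\, D_k\, B^{\top} + E_k, \qquad D_k = \mathrm{diag}(c_{k1}, \dots, c_{kR})

% Tucker3: a separate number of components per mode (P, Q, R)
% linked by a core array G
x_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R}
          g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr} + e_{ijk}
```

PARAFAC is recovered from Tucker3 when P = Q = R and the core array G is superdiagonal, which is the sense in which Tucker3 extends it.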

Multi-speaker decomposition methods
• Joint PCA: two-way analysis adapted to multi-speaker models (Ananthakrishnan et al., 2010, KTH, Sweden)
  • The articulatory measurements of all speakers for one phoneme are considered as one set of data
  • Forces common components
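
Note: one reading of the joint PCA idea is sketched below: the measurements of all speakers for a given articulation are concatenated into a single row, so a single set of scores drives every speaker. This is an interpretation of the slide, not the published implementation; names and data are hypothetical.

```python
import numpy as np

def joint_pca(speaker_data, n_components):
    """Joint PCA sketch.

    speaker_data: list of (n_articulations, n_features_s) arrays, one per
                  speaker, with articulations in the same order.
    Returns the common scores, the per-speaker loadings and the
    per-speaker mean contours.
    """
    means = [X.mean(axis=0) for X in speaker_data]
    centred = [X - m for X, m in zip(speaker_data, means)]
    joint = np.hstack(centred)                    # one row per articulation
    U, s, Vt = np.linalg.svd(joint, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # shared by all speakers
    loadings = Vt[:n_components]
    # Cut the loading matrix back into one block per speaker
    splits = np.cumsum([X.shape[1] for X in speaker_data])[:-1]
    per_speaker = np.split(loadings, splits, axis=1)
    return scores, per_speaker, means

# Synthetic stand-in: 3 speakers, 63 articulations, 20 coordinates each
rng = np.random.default_rng(5)
data = [rng.normal(size=(63, 20)) for _ in range(3)]
scores, per_speaker, means = joint_pca(data, 4)

# Contours of speaker 0 rebuilt from the common scores
speaker0_hat = means[0] + scores @ per_speaker[0]
```

The common scores play the role of the universal control parameters, while the per-speaker loading blocks carry the speaker-specific part.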

Comparison of performance between methods
Figure: RMSE and variance explained (VarEx) of the multi-speaker models (red, green, black) vs. the average of the individual speakers' models (blue)

Multi-speaker tongue models
• Reference PCA model with 4 components
  • Total number of components: 11 x 4 = 44
• Student's t-test for RMSE at the 5% significance level
  • Determines whether the RMSE values of the models are significantly different from each other
• Joint PCA: 14-21 components (TUCKER)
• PARAFAC: 21 components
Figure: VarEx and RMSE

Multi-speaker tongue models
• Individual models: reference PCA model with 44 (11 x 4) components
  • VarEx: 93.23%, RMSE: 0.13 cm
• Multi-speaker models: joint PCA with 4 components
  • VarEx: 72.16%, RMSE: 0.27 cm
  • Interpretation of components: JH, TB, TD and TT
• Equivalent solution: joint PCA with 21 components
  • VarEx: 94.88%, RMSE: 0.12 cm
  • Lack of interpretation from the 5th component onwards
• Literature
  • No. of components: 2
  • VarEx: 71%-96%
  • Corpus: 7-15 vowels
  • Speakers: 3-9
• Present study
  • Corpus: 63 articulations (vowels and consonants)
  • Speakers: 11

Multi-speaker models: lips and velum
• Lips and velum models comparable with tongue models
• Lips
  → Individual models: 33 components (3 x 11)
  → Multi-speaker joint PCA models: equivalent with 21 components
  → Reduced no. of components: 3 interpretable components (JH, protrusion, lip height)
• Velum
  → Individual models: 22 components (2 x 11)
  → Multi-speaker joint PCA models: equivalent with 14 components
  → Reduced no. of components: 2 components (oblique, horizontal)

Conclusions
• Data
  • Unique set of articulatory data for French
  • MRI of the whole vocal tract for 11 French speakers
    • Contours
    • Vowels and consonants
    • More speakers compared to the literature
• Characterisation of different speakers' strategies
  • Tongue
  • Upper and lower lip
  • Velum
• Multi-speaker models (normalisation) of tongue, lips and velum contours
  • No work in the literature on lips and velum

Perspectives
• More speakers
• Relation between articulatory strategies and acoustics
• Cross-speaker velum variability
  • Influence of the tongue movement
  • Nasality
  → New modelling solutions
• Non-linear methods:
  • Kernel PCA
  • Artificial Neural Networks (ANN)
  • Support Vector Machines (SVM)

Acknowledgments
• Laurent Lamalle (IRMaGe, Grenoble)
• Speakers
• ARTIS project (GIPSA-lab, LORIA)

Thank you for your attention
Questions?

Grid system
• Maeda S. (1979) → fixed grid
• Busset J. (2013): adaptive grid system
  • Euclidean coordinates (intersections)
  • Distances and extreme angles
  • Polar coordinates (distances and angles for each grid line)
• Beautemps et al. (2001): adapted to each articulation
  → Euclidean coordinates
  → Distances and TngAdv + TngBot

Corr(Y-jaw, Angle_rotation) per speaker
• PB = 0.6611
• YL = 0.7385
• LH = 0.7174
• RL = 0.3946
• LD = 0.8423
• BR = 0.7764
• HL = 0.7913
• AA = 0.4952
• MG = 0.4151
• AK = 0.8317
• MGO = 0.9228
Figure label: (X, Y)

Acoustic simulation
• Grid system → midsagittal function
• Vocal tract area function (series of areas and lengths of each sagittal section)
  • α, β models (Beautemps et al., 1995; Heinz & Stevens, 1965): A = α · d^β
    • A = area of a given grid section, d = midsagittal distance
    • α, β: coefficients depending on subject and vocal tract location
    → α, β according to the speaker of reference: PB
• Vocal tract acoustic transfer function (Fant, 1960; Badin & Fant, 1984)
• Formants
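
Note: the conversion from midsagittal distances to areas can be sketched as below, using the power-law form A = α · d^β commonly associated with Heinz & Stevens (1965). The coefficient values in the example are placeholders chosen for illustration only; they are not the values fitted for speaker PB, and the function name is hypothetical.

```python
import numpy as np

def area_function(d, alpha, beta):
    """Cross-sectional areas from midsagittal distances, A = alpha * d**beta.

    d:     midsagittal distances, one per grid section.
    alpha: scalar or per-section coefficients.
    beta:  scalar or per-section exponents.
    """
    d = np.asarray(d, dtype=float)
    return alpha * d ** beta

# Placeholder distances and coefficients (illustrative values only)
d = np.array([1.0, 2.0, 0.5])
alpha = np.array([1.5, 1.5, 2.0])
beta = np.array([2.0, 2.0, 1.0])
areas = area_function(d, alpha, beta)
```

Passing arrays for alpha and beta lets the coefficients vary along the vocal tract, in line with the slide's remark that they depend on vocal tract location.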

Number of coefficients by method (table not recovered in this transcript)

“Essentially, all models are wrong, but some are useful”
George Edward Pelham Box
