Factor Analysis of MRI-Derived Tongue Shapes
Mark Hasegawa-Johnson
ECE Department and Beckman Institute, University of Illinois at Urbana-Champaign
Background
The vowel sounds of English are classified along two dimensions: “high/low” and “front/back.”
[Figure: vowel chart with axes High/Low and Front/Back, showing the vowels i, e, ae, u, o, a.]
Background
The tongue is composed of about 9 muscles (4 intrinsic, 5 extrinsic):
• Intrinsic: Superior Longitudinalis, Inferior Longitudinalis, Transversus, Verticalis
• Extrinsic: Genioglossus, Hyoglossus, Styloglossus, Palatoglossus, Superior Pharyngeal Constrictor
Theories of Motor Control
• Theory 1: Direct Control
• Theory 2: Hierarchical Control
Factor Analysis of X-Ray Images (Harshman, Ladefoged, & Goldstein, 1977)
• Finding: Two factors account for 92% of variance.
Factor loadings seem to represent distinctive features: $v_1 = [\alpha\ \text{front}]$, $v_2 = [\beta\ \text{high}]$
Can Three-Dimensional Tongue Shape Be Explained Using Shape Factors?
• Hypothesis 1: 3D tongue shape during speech is a weighted sum of 2-3 factors.
• Hypothesis 2: The shape of the factors, $t_1(i)$, $t_2(i)$, is speaker-dependent. (??)
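Hypothesis 1 can be written compactly as a factor model. In this sketch, $x(i)$ is the position of tongue-surface point $i$ for one vowel, $\bar{x}(i)$ is the mean shape over vowels, $t_k(i)$ are the factor shapes, and $v_k$ are the vowel-specific weights from the Harshman et al. slide; the bracketed third term covers the 3-factor case. The symbols $x$, $\bar{x}$, and $\lambda_k$ are our notation, and the variance ratio assumes a PCA-style decomposition in which $\lambda_k$ is the variance carried by factor $k$:

$$x(i) \approx \bar{x}(i) + v_1\, t_1(i) + v_2\, t_2(i) \;\bigl[\, + \, v_3\, t_3(i) \,\bigr]$$

$$\text{fraction of variance explained by } K \text{ factors} = \frac{\sum_{k=1}^{K} \lambda_k}{\sum_{k} \lambda_k}$$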
Why Is 3D Different from 2D?
Linear source-filter theory:
• Vowel quality is determined by the vocal-tract cross-sectional areas.
• Area is correlated with midsagittal width.
Do Shape Factors Exist in 3D?
• If inter-speaker shape similarity is governed by the goal of acoustic similarity, and...
• If acoustic similarity depends on cross-sectional area, not cross-sectional shape...
• Then variation in 3D shape may not have a shape-factor basis.
Factor Analysis of MRI-Derived Tongue Shapes: Methodology
1. Recruit subjects
2. Collect MRI images
3. Segment the images
4. Interpolate the ROI to create 3D tongue shapes for each vowel
5. Speaker-dependent factor analysis
6. Speaker-independent factor analysis
Subject Recruitment
• Ten subjects recruited; five successfully imaged (3 male, 2 female).
• Subjects were college undergraduate and graduate students with no metal fillings and no claustrophobia.
• Subjects were trained to sustain vowel sounds with little variation.
• Human-subjects approval was obtained from both UCLA and Cedars-Sinai Medical Center.
MRI Image Collection
• GE Signa 1.5T
• T1-weighted
• 3 mm slices
• 24 cm FOV
• 256 × 256 pixels
• Coronal and axial planes
• 11-18 sounds per subject
• Breath-hold in vowel position for 25 seconds
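The acquisition parameters above imply a strongly anisotropic voxel. A quick check of the arithmetic, assuming square pixels and no gap between the 3 mm slices (the slides do not state the slice gap):

```python
# In-plane pixel size and voxel volume implied by the MRI protocol.
# Assumption: square pixels and contiguous slices (no inter-slice gap).
fov_mm = 240.0    # 24 cm field of view
matrix = 256      # 256 x 256 pixels
slice_mm = 3.0    # slice thickness

pixel_mm = fov_mm / matrix                    # in-plane pixel edge length
voxel_mm3 = pixel_mm * pixel_mm * slice_mm    # volume of one voxel
anisotropy = slice_mm / pixel_mm              # slice thickness relative to pixel width

print(pixel_mm, voxel_mm3, anisotropy)  # 0.9375 2.63671875 3.2
```

Each slice is thus about 3.2 times thicker than a pixel is wide, so building a 3D surface requires filling in between slices.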
Image Viewing and Segmentation: the CTMRedit GUI and Toolbox
• Display a series of CT or MR image slices
• Segment the region of interest (ROI) manually or automatically
• Interpolate and reconstruct the ROI in 3D space
Calibration: Segmentation of Phantom (J. Cha)
• Test tubes of 3 sizes
• Absolute error of the radius estimated from manual segmentation:
  • typical case: 0.1 mm
  • worst case: 0.4 mm
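The slides do not say how the radius was computed from a segmented tube cross-section. One simple possibility, shown here purely as an assumption, is the area-equivalent radius $r = \sqrt{A/\pi}$, where $A$ is the segmented pixel count times the pixel area (0.9375 mm pixels follow from a 24 cm FOV on a 256-pixel matrix). The helper `equivalent_radius_mm` is hypothetical, and the disk below is synthetic, not phantom data.

```python
import numpy as np

def equivalent_radius_mm(mask, pixel_mm):
    """Radius of the circle whose area equals the segmented area (hypothetical helper)."""
    area_mm2 = mask.sum() * pixel_mm ** 2   # pixel count -> physical area in mm^2
    return np.sqrt(area_mm2 / np.pi)

# Synthetic tube cross-section: a disk of radius 10 pixels on a 64 x 64 grid.
pixel_mm = 0.9375
yy, xx = np.mgrid[0:64, 0:64]
mask = (xx - 32) ** 2 + (yy - 32) ** 2 <= 10 ** 2
true_r = 10 * pixel_mm
est_r = equivalent_radius_mm(mask, pixel_mm)
print(f"true {true_r:.3f} mm, estimated {est_r:.3f} mm, error {abs(est_r - true_r):.3f} mm")
```

On a clean synthetic disk only pixelization error remains; a manual tracing adds the human segmentation error on top of it.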
Calibration: Articulatory Speech Synthesis (J. Cha)
• /a/, /i/, /u/ synthesized using the Maeda articulatory synthesizer
• F1-F4 errors:
  • worst case: ±30%
  • mean error: +2.8%
  • std dev: 19.5%
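The three summary statistics above (worst case, mean, standard deviation) can be computed as sketched below. This assumes the errors are percent deviations from a reference formant, with positive meaning the synthesized formant is too high; the slides state neither the reference nor the sign convention, and the input values are made up for illustration only.

```python
import numpy as np

def formant_error_summary(synth_hz, ref_hz):
    """Percent errors of synthesized vs. reference formants (e.g. F1-F4 of /a/, /i/, /u/)."""
    synth_hz = np.asarray(synth_hz, dtype=float)
    ref_hz = np.asarray(ref_hz, dtype=float)
    err_pct = 100.0 * (synth_hz - ref_hz) / ref_hz   # signed relative error in percent
    return {
        "mean": err_pct.mean(),            # signed mean error (bias)
        "std": err_pct.std(),              # population standard deviation (ddof=0)
        "worst": np.abs(err_pct).max(),    # largest error magnitude
    }

# Illustrative numbers only, not the study's measurements.
summary = formant_error_summary([550, 1100, 2400, 3300], [500, 1000, 2500, 3300])
print(summary)
```

A small mean combined with a large standard deviation, as in the reported +2.8% and 19.5%, indicates errors that scatter in both directions rather than a consistent bias.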
Reconstruction of ROI
• Interpolate between image slices to create a 3D object.
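A minimal sketch of linear interpolation between slices, assuming the segmented ROI is stored as a stack of 2D masks (1 inside the tongue, 0 outside) sampled every 3 mm. `interpolate_slices` is a hypothetical helper, not CTMRedit's own code.

```python
import numpy as np

def interpolate_slices(volume, slice_mm, out_mm):
    """Linearly resample a (n_slices, H, W) stack along the slice axis.

    Requires at least two slices; the output covers the same extent as the input.
    """
    n = volume.shape[0]
    z_out = np.arange(0.0, (n - 1) * slice_mm + 1e-9, out_mm)  # new slice positions (mm)
    pos = z_out / slice_mm                                     # fractional input slice index
    lo = np.clip(np.floor(pos).astype(int), 0, n - 2)          # lower neighboring slice
    frac = (pos - lo)[:, None, None]                           # weight of the upper neighbor
    return (1.0 - frac) * volume[lo] + frac * volume[lo + 1]

# Two 4 x 4 masks 3 mm apart: empty, then full.
stack = np.stack([np.zeros((4, 4)), np.ones((4, 4))])
fine = interpolate_slices(stack, slice_mm=3.0, out_mm=1.0)
tongue = fine >= 0.5   # threshold back to a binary 3D mask
print(fine[:, 0, 0])
```

Thresholding at 0.5 places the reconstructed boundary halfway between the two measured slices, which is the implicit assumption of linear interpolation.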
Speaker Normalization: Vocal-Tract (VT) Length and Inter-Molar Width (S. Pizza)
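The slide names the two normalization measures but not the transform. One plausible sketch, assuming a coordinate frame with x lateral and y, z in the midsagittal plane measured from a common anatomical origin, scales lateral coordinates by the ratio of inter-molar widths and midsagittal coordinates by the ratio of VT lengths relative to a reference speaker. The function, its axis convention, and the example lengths are hypothetical.

```python
import numpy as np

def normalize_speaker(points_mm, vt_len_mm, molar_w_mm, ref_vt_len_mm, ref_molar_w_mm):
    """Map (N, 3) tongue points (x lateral; y, z midsagittal) onto a reference speaker.

    Hypothetical scheme: lateral axis scaled by inter-molar width,
    midsagittal axes scaled by vocal-tract length.
    """
    length_scale = ref_vt_len_mm / vt_len_mm
    scale = np.array([ref_molar_w_mm / molar_w_mm, length_scale, length_scale])
    return np.asarray(points_mm, dtype=float) * scale   # broadcasts over rows

# Illustrative values only.
pts = np.array([[10.0, 20.0, 30.0]])
out = normalize_speaker(pts, vt_len_mm=150.0, molar_w_mm=50.0,
                        ref_vt_len_mm=170.0, ref_molar_w_mm=40.0)
print(out)
```

Separate lateral and midsagittal scale factors let a narrow-mouthed and a long-tract speaker be normalized independently along the two directions.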
Speaker-Dependent Factor Analysis
• 12 tongue shapes from one speaker
• Each tongue shape modeled as a 25-point × 40-point rubber sheet
• Principal Components Analysis:
  • 11 non-zero factors (12 vowels - 1 mean vector = 11 degrees of freedom)
  • 2 factors: 78% of variance
  • 3 factors: 88% of variance
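A minimal PCA sketch for the speaker-dependent analysis, assuming each 25 × 40 rubber-sheet surface is flattened into a 1000-element row. It shows why 12 shapes yield at most 11 non-zero factors: subtracting the mean shape removes one degree of freedom. Random numbers stand in for the real shapes, so the variance fractions it prints say nothing about tongues; only the factor count carries over. `pca_factors` is a hypothetical name.

```python
import numpy as np

def pca_factors(shapes, tol=1e-10):
    """PCA of (n_shapes, n_points) data via SVD of the mean-removed matrix.

    Returns the number of non-zero factors, the cumulative fraction of
    variance explained, and the factor shapes (one per row).
    """
    centered = shapes - shapes.mean(axis=0)               # remove the mean shape
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2                                          # proportional to factor variances
    n_nonzero = int(np.sum(var > tol * var.max()))
    cumulative = np.cumsum(var) / var.sum()
    return n_nonzero, cumulative, vt[:n_nonzero]

rng = np.random.default_rng(0)
shapes = rng.normal(size=(12, 25 * 40))   # stand-in for 12 vowels x (25 x 40) points
n_factors, cum, factors = pca_factors(shapes)
print(n_factors)                           # 11
print(cum[1], cum[2])                      # variance explained by 2 and 3 factors
```

With real data, `cum[1]` and `cum[2]` correspond to the 2-factor and 3-factor percentages quoted above.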
“Excuses”: Why Didn’t It Work?
• Tongue length changes from /ao/ to /iy/.
• Human transcriber error?
• Interpolation to form the 3D image causes error:
  • Spline and sinc interpolation: very large errors
  • Linear interpolation: smaller errors, but still too large
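One plausible mechanism behind the ranking above: tongue boundaries are sharp edges, and band-limited (sinc) interpolation rings around an edge, overshooting above 1 and undershooting below 0, whereas linear interpolation is a convex combination of neighboring samples and can never leave their range. Cubic splines can overshoot at edges in a similar way. The 1D step below is a stand-in for a mask profile across the tongue boundary, not the study's data, and `sinc_interp` is a hypothetical helper.

```python
import numpy as np

def sinc_interp(samples, t):
    """Band-limited (Whittaker-Shannon) interpolation of unit-spaced samples."""
    n = np.arange(len(samples))
    return np.sinc(t[:, None] - n[None, :]) @ samples   # np.sinc is the normalized sinc

edge = np.array([0.0] * 10 + [1.0] * 10)   # sharp boundary: 0 outside, 1 inside the tongue
t = np.linspace(0, 19, 381)                # dense positions between samples

linear_vals = np.interp(t, np.arange(len(edge)), edge)
sinc_vals = sinc_interp(edge, t)

print("linear range:", linear_vals.min(), linear_vals.max())
print("sinc range:  ", sinc_vals.min(), sinc_vals.max())
```

Values outside [0, 1] have no physical meaning for a tissue mask, so ringing shows up directly as a displaced or spurious boundary after thresholding.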
New Approaches: Avoid Interpolation
General method: avoid interpolation by modeling the measured data directly.
• J. Huang: control factor shape using an a priori probability distribution.
• Y. Zheng: limit factors to the set of polynomial surfaces.
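The slide gives only the idea of constraining a shape with a prior instead of interpolating first. J. Huang's actual prior is not stated; a common choice, assumed here, is a Gaussian smoothness prior that penalizes second differences, which turns the MAP estimate into regularized least squares fitted only to the measured points. For brevity the sketch estimates a 1D profile; `map_profile` and `lam` are hypothetical names.

```python
import numpy as np

def map_profile(n, obs_idx, obs_val, lam=1.0):
    """MAP estimate of an n-point profile from sparse samples.

    Assumed prior: Gaussian smoothness prior, i.e. a penalty lam * ||D2 t||^2
    on second differences. Needs at least two distinct observed indices.
    """
    obs_idx = np.asarray(obs_idx)
    S = np.zeros((len(obs_idx), n))
    S[np.arange(len(obs_idx)), obs_idx] = 1.0      # selects the measured points
    D2 = np.diff(np.eye(n), n=2, axis=0)           # (n-2, n) second-difference operator
    A = S.T @ S + lam * D2.T @ D2                  # data term + prior precision
    b = S.T @ np.asarray(obs_val, dtype=float)
    return np.linalg.solve(A, b)

# Three measured points on a straight line; the prior fills in the rest.
profile = map_profile(10, obs_idx=[0, 4, 9], obs_val=[1.0, 9.0, 19.0])
print(np.round(profile, 6))
```

Because the prior assigns zero cost to straight lines, collinear measurements are reproduced exactly; for curved data `lam` trades fidelity to the measurements against smoothness.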
Polynomial Smoothing (Y. Zheng)
• Polynomial surface modeling
  • Tongue shape = polynomial surface
  • 4D surface model enforces smoothness constraints.
• Hybrid polynomial/factor model
  • Midsagittal tongue shape is as predicted by Harshman et al.
  • 3D shape = (midsagittal shape) × (polynomial)
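A least-squares sketch of the polynomial-surface idea: fit a height $z(x, y)$ directly to scattered measured points, so no interpolated grid is needed. The total-degree monomial basis, the single-valued $z(x, y)$ parameterization, and the function names are assumptions for illustration; the slides do not give Y. Zheng's basis or degree.

```python
import numpy as np
from itertools import product

def poly_terms(x, y, degree):
    """Design matrix of monomials x^p * y^q with p + q <= degree."""
    return np.column_stack([x ** p * y ** q
                            for p, q in product(range(degree + 1), repeat=2)
                            if p + q <= degree])

def fit_poly_surface(x, y, z, degree=3):
    """Least-squares polynomial surface through scattered (x, y, z) samples."""
    coef, *_ = np.linalg.lstsq(poly_terms(x, y, degree), z, rcond=None)
    return coef

def eval_poly_surface(coef, x, y, degree=3):
    return poly_terms(x, y, degree) @ coef

# Scattered samples of a known quadratic surface (synthetic, not tongue data).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)
z = 1 + 2 * x - y + 0.5 * x * y
coef = fit_poly_surface(x, y, z, degree=2)
z_new = eval_poly_surface(coef, np.array([0.5]), np.array([-0.5]), degree=2)
print(z_new)
```

Restricting the surface to a low-degree polynomial is itself the smoothness constraint: the fit cannot ring between measured slices the way a sinc interpolant can.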
Conclusions
• X-ray analysis suggests hierarchical motor control, but...
  • “Hierarchical control” might reflect the structure of the acoustic space.
• MRI analysis does not (yet) find hierarchical control, but...
  • The negative finding might be the result of methodological weakness.