
Factor Analysis of MRI-Derived Tongue Shapes



  1. Factor Analysis of MRI-Derived Tongue Shapes • Mark Hasegawa-Johnson • ECE Department and Beckman Institute, University of Illinois at Urbana-Champaign

  2. Background • The vowel sounds of English are classified in two dimensions: “high/low” and “front/back.” • Vowel chart, listed front to back: • High: i, u • Mid: e, o • Low: ae, a

  3. Background • The tongue is composed of about 9 muscles (4 intrinsic, 5 extrinsic). • Figure labels: Superior Longitudinalis, Palatoglossus, Styloglossus, Verticalis, Superior Pharyngeal Constrictor, Transversus, Genioglossus, Inferior Longitudinalis, Hyoglossus

  4. Theories of Motor Control • Theory 1: Direct Control • Theory 2: Hierarchical Control

  5. Factor Analysis of X-Ray Images (Harshman, Ladefoged, & Goldstein, 1977)

  6. Factor Analysis of X-Ray Images (Harshman, Ladefoged, & Goldstein, 1977)

  7. Factor Analysis of X-Ray Images (Harshman, Ladefoged, & Goldstein, 1977)

  8. Factor Analysis of X-Ray Images (Harshman, Ladefoged, & Goldstein, 1977) • Finding: Two factors account for 92% of variance.

  9. Factor loadings seem to represent distinctive features: • v1 = [a front] • v2 = [b high]

  10. Can Three-Dimensional Tongue Shape Be Explained Using Shape Factors? • Hypothesis 1: 3D tongue shape during speech = weighted sum of 2-3 factors. • Hypothesis 2: Shape of the factors t1(i), t2(i) is speaker-dependent. (??)
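Hypothesis 1 can be written compactly as a linear factor model. The form below is a sketch under assumed notation: only the factor shapes t_k(i) are named on the slide, while x_v(i) (the tongue surface coordinate at grid point i for vowel v), the mean shape, and the weights a_{vk} are symbols introduced here for illustration.

```latex
x_v(i) \approx \bar{x}(i) + \sum_{k=1}^{K} a_{vk}\, t_k(i), \qquad K = 2 \text{ or } 3
```

Under Hypothesis 2, the factor shapes t_k(i) would be estimated separately for each speaker rather than shared across speakers.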

  11. Why is 3D Different from 2D? • Linear Source-Filter Theory: • Vowel quality is determined by areas • Area is correlated with midsagittal width

  12. Do Shape Factors Exist in 3D? • If inter-speaker shape similarity is governed by desire for acoustic similarity, and... • If acoustic similarity depends on cross-sectional area, not cross-sectional shape... • Then Variation in 3D Shape May Not Have a Shape Factor Basis

  13. Factor Analysis of MRI-Derived Tongue Shapes: Methodology 1. Recruit Subjects 2. Collect MRI Images 3. Segment the Images 4. Interpolate ROI to Create 3D Tongue Shapes for Each Vowel 5. Speaker-Dependent Factor Analysis 6. Speaker-Independent Factor Analysis

  14. Subject Recruitment: • Ten subjects recruited; five successfully imaged (3 male, 2 female). • Subjects were college undergrads and grads with no metal fillings and no claustrophobia. • Subjects were trained to sustain vowel sounds with little variation. • Human subjects approval: both UCLA and Cedars-Sinai Medical Center.

  15. MRI Image Collection • GE Signa 1.5T • T1-weighted • 3 mm slices • 24 cm FOV • 256 x 256 pixels • Coronal, Axial • 11-18 sounds per subject • Breath-hold in vowel position for 25 seconds

  16. Image Viewing and Segmentation: the CTMRedit GUI and toolbox • Display series of CT or MR image slices • Segment ROI manually or automatically • Interpolate and reconstruct ROI in 3D space

  17. Calibration: Segmentation of Phantom (J. Cha) • Test tubes of 3 sizes • Absolute error of the radius estimated from manual segmentation: • typical case: 0.1 mm • worst case: 0.4 mm

  18. Calibration: Articulatory Speech Synthesis (J. Cha) • /a,i,u/ synthesized using Maeda articulatory synthesizer • F1-F4 errors: • worst case: +/- 30% • mean error: +2.8% • std dev: 19.5%

  19. Reconstruction of ROI • Interpolate between image slices to create 3D object.
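A minimal sketch of linear interpolation between slices, in Python with NumPy. It assumes each slice contour has already been resampled to the same number of corresponding points, which the slides do not describe; the function name `interpolate_slices` and the square test contours are hypothetical and stand in for real segmented ROI outlines.

```python
import numpy as np

def interpolate_slices(contours, z, z_new):
    """Linearly interpolate ROI contours between image slices.

    contours: array (n_slices, n_points, 2), one contour per slice, with
              point j on every slice assumed to correspond (an assumption).
    z:        array (n_slices,), increasing slice positions in mm.
    z_new:    positions (mm) at which to synthesize intermediate contours.
    Returns an array of shape (len(z_new), n_points, 2).
    Positions outside [z[0], z[-1]] are extrapolated from the nearest gap.
    """
    contours = np.asarray(contours, dtype=float)
    z = np.asarray(z, dtype=float)
    z_new = np.asarray(z_new, dtype=float)
    # Index of the slice at or below each requested position.
    idx = np.clip(np.searchsorted(z, z_new, side="right") - 1, 0, len(z) - 2)
    # Fractional position inside the gap between slice idx and idx + 1.
    t = (z_new - z[idx]) / (z[idx + 1] - z[idx])
    t = t[:, None, None]
    return (1.0 - t) * contours[idx] + t * contours[idx + 1]

# Toy example: two square contours on slices 3 mm apart (placeholder data).
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
contours = np.stack([square, 2.0 * square])
z = np.array([0.0, 3.0])
mid = interpolate_slices(contours, z, [1.5])[0]  # contour halfway between slices
```

Spline or sinc interpolation would replace the weighted average in the last line of the function; this sketch covers only the linear case.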

  20. Tongue Shape During /ae/

  21. Speaker Normalization: VT Length, Inter-Molar Width (S. Pizza)

  22. Speaker-Dependent Factor Analysis • 12 tongue shapes from one speaker: • Each tongue shape modeled as a 25 point x 40 point rubber sheet. • Principal Components Analysis: • 11 Non-Zero Factors (12 vowels - 1 mean vector = 11 degrees of freedom). • 2 Factors: 78% of variance • 3 Factors: 88% of variance
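The degrees-of-freedom count above can be checked with a short principal components sketch in Python with NumPy. The data here are random placeholders with the stated dimensions (12 shapes on a 25 x 40 grid), not MRI measurements, and the function name `shape_factors` is hypothetical; the slides do not say how the analysis was implemented.

```python
import numpy as np

def shape_factors(shapes):
    """Principal components analysis of a set of tongue shapes.

    shapes: array (n_vowels, n_rows, n_cols), one surface grid per vowel.
    Returns (mean, factors, weights, singular_values, explained), where
    explained[k-1] is the fraction of variance captured by the first k factors.
    """
    n = shapes.shape[0]
    X = shapes.reshape(n, -1)                 # one row vector per vowel
    mean = X.mean(axis=0)                     # mean tongue shape
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    return mean, Vt, U * s, s, explained

rng = np.random.default_rng(0)
shapes = rng.normal(size=(12, 25, 40))        # placeholder data, not real shapes
mean, factors, weights, s, explained = shape_factors(shapes)

# Subtracting the mean removes one degree of freedom: 12 shapes leave 11 factors.
n_nonzero = int(np.sum(s > 1e-10 * s[0]))
```

With real data, `explained[1]` and `explained[2]` would correspond to the two-factor and three-factor variance figures quoted on the slide; the random placeholder data will not reproduce those percentages.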

  23. “Excuses:” Why Didn’t it Work? • Tongue Length changes from /ao/ to /iy/. • Human Transcriber Error? • Interpolation to Form 3D Image Causes Error • Spline & Sinc interpolation: very large errors • Linear interpolation: smaller errors, but still too large.

  24. New Approaches: Avoid Interpolation • General Method: Avoid interpolation by modeling the measured data directly. • J. Huang: Control factor shape using an a priori probability distribution. • Y. Zheng: Limit factors to the set of polynomial surfaces.

  25. Polynomial Smoothing (Y. Zheng) • Polynomial Surface Modeling • Tongue shape = polynomial surface • 4D surface model enforces smoothness constraints. • Hybrid Polynomial/Factor model • Midsagittal tongue shape is as predicted by Harshman et al. • 3D shape = (midsagittal shape) × (polynomial)
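As a rough illustration of fitting a polynomial surface directly to measured points, the Python/NumPy sketch below does a least-squares fit of a height function z(x, y). This is a simplification chosen for illustration: the slides do not give the form of the "4D surface model", and the function names, polynomial degree, and synthetic points are all assumptions.

```python
import numpy as np

def design_matrix(x, y, degree):
    """Columns are the monomials x**i * y**j with i + j <= degree."""
    cols = [x**i * y**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def fit_surface(x, y, z, degree=2):
    """Least-squares polynomial surface through scattered points (x, y, z)."""
    coef, *_ = np.linalg.lstsq(design_matrix(x, y, degree), z, rcond=None)
    return coef

def eval_surface(coef, x, y, degree=2):
    """Evaluate the fitted surface at arbitrary (x, y) positions."""
    return design_matrix(x, y, degree) @ coef

# Synthetic points sampled from a known surface (placeholder, not MRI data).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50)
y = rng.uniform(-1.0, 1.0, size=50)
z = 1.0 + 2.0 * x - y + 0.5 * x * y

coef = fit_surface(x, y, z, degree=2)   # monomial order: 1, y, y^2, x, x*y, x^2
z_hat = eval_surface(coef, x, y, degree=2)
```

Because the fit uses only the measured points, no interpolated samples between slices are needed; the low polynomial degree is what imposes smoothness in this sketch.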

  26. Conclusions • X-ray analysis suggests hierarchical motor control, but... • “Hierarchical control” might reflect structure of the acoustic space. • MRI analysis does not find hierarchical control (yet), but... • Negative finding might be result of methodological weakness.

  27. Speaker-Dependent Factor Analysis
