Applying speech/language technologies to communication disorders: New challenges for basic research
Jan van Santen
Center for Spoken Language Understanding, OGI School of Science & Engineering, Oregon Health & Science University
Collaborators at CSLU: Lois Black, Peter Heeman, John-Paul Hosom, Alexander Kain, Esther Klabbers, Qi Miao, Taniya Mishra, Xiaochuan Niu; elsewhere: Melanie Fried-Oken, Robert Margolis, Larry Shriberg
Key Points
• A vast number of potential applications of speech and language technologies to communication disorders
• Some of these potential applications can be realized right now
• Other potential applications require applied technology research or basic technology research
• New opportunities for:
  • Speech/language technology research
  • Clinical research
  • Interdisciplinary research
• Today: some preliminary studies along these lines
Outline
• Defining terms
  • What are communication disorders?
  • What do we mean by “speech technology”?
• Classification of applications
• Examples of current work in progress
  • Dysarthria: intelligibility enhancement, diagnostics, personalized TTS system
  • Suspected apraxia of speech: diagnostics
  • Autism: clinical research
• Future plans
• Summary
What are communication disorders?
• Input examples
  • Hearing loss
  • Congenital deafness
  • Receptive aphasia
  • Receptive developmental language disorders
  • Autism
  • …
• Output examples
  • Apraxia of speech
  • Stuttering
  • Dysarthria
  • Aphasia
  • Expressive developmental language disorders
  • Autism
  • …
What do we mean by “speech technology”?
• Examples
  • Speech recognition
  • Speech compression / coding
  • Text-to-speech synthesis
  • Voice transformation
  • Speaker identification
  • Language identification
  • Summarization
  • Speech/language data mining
  • Text-to-text translation
  • …
Classification of Applications
• End user perspective
  • Assistive
  • Remedial
  • Diagnostic
  • Research tool for [basic or applied] clinical research
• Research perspective
  • No research needed: ready for use as-is
  • Applied technology research
  • Research on fundamentally new technologies
Dysarthria
• What is it?
  • Motor speech disorder
  • Associated with TBI, stroke, Parkinson’s, ALS, …
  • 1-2 million individuals in the US
  • ~6 different syndromes [ataxic, spastic, …]
  • Poor, at times highly variable, speech production: “tip” produced as “temf”, “sib”, … [poor control and coordination]
• Applications:
  • Intelligibility enhancement [Assistive]
  • Analysis of coarticulation [Diagnostic, Assistive]
  • Personalized TTS [Assistive]
Intelligibility enhancement
• What do we want: a speech enhancement device
  • Input: unrestricted vocabulary, unlabeled speech
  • Output: intelligibility- and quality-enhanced speech
• Preliminary data: what did not work
  • ASR → TTS
  • Standard [frame-based] voice transformation systems
• Perception experiments using hybridization: everything matters — consonants, vowels, prosody
• Why focus on vowels?
  • Perception experiments show they matter
  • “Easy”
Intelligibility enhancement
• Approach:
  1. Find robust “stable points” on formant trajectories
  2. Gaussian mixture mapping of stable points between dysarthric and control F1 × F2 × duration spaces [trained on parallel dysarthric/“normal” speaker recordings]
  3. Use mapped stable points to draw synthetic formant trajectories
  4. Draw synthetic pitch and energy contours
  5. Formant synthesis
Intelligibility enhancement
• How bad are dysarthric formants?
[Figure: formant measurements, control speaker vs. dysarthric speaker]
1. Stable Points
• Process:
  • Determine the central 50% region of the vowel
  • Shape decision
  • Median smoothing
  • Multi-tonic fit
• F1:
  • Shape decision: concave assumed
  • Estimation: maximum of the smooth multi-tonic fit
• F2:
  • Shape decision: mountain / valley / up / down [multi-tonic fit]
  • Estimation: max / min / median of the smooth multi-tonic fit
• Isotonic regression: given x1, …, xn, find y1, …, yn such that
  • y1 ≤ y2 ≤ … ≤ yn, and
  • Σi (xi − yi)² is minimized
• Multi-tonic: optimize over peak/valley locations
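The isotonic fit defined on this slide can be computed with the classic pool-adjacent-violators algorithm; a minimal sketch in plain Python (the multi-tonic case, per the slide, would additionally search over candidate peak/valley locations and fit monotone pieces on each side):

```python
def isotonic_fit(x):
    """Least-squares isotonic regression via pool-adjacent-violators:
    returns y with y1 <= y2 <= ... <= yn minimizing sum (xi - yi)^2."""
    merged = []  # each block holds [sum, count]; its fitted value is sum/count
    for v in x:
        merged.append([float(v), 1])
        # pool while the previous block's mean exceeds the new block's mean
        while len(merged) > 1 and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]:
            s, c = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += c
    y = []
    for s, c in merged:
        y.extend([s / c] * c)
    return y
```

For example, the violating pair (3, 2) in [1, 3, 2, 4] is pooled to its mean 2.5, yielding the non-decreasing fit [1, 2.5, 2.5, 4].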
2. Gaussian Mixture Mapping
• Gaussian mixture model:
  • Piece-wise linear, probabilistic transformation
  • Joint density of (Bark-scaled) stable points and durations
[Figure: stable points (control speaker) and stable points (dysarthric speaker)]
[2. Gaussian mixture mapping, cont’d]
• Smooth mapping of F1, F2, duration
• Mapped to perceptually appropriate regions
• Separability unchanged!
[Figure: mapped dysarthric stable points and control stable points]
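The system described here uses a Gaussian mixture; with a single Gaussian component the joint-density mapping reduces to the conditional mean, i.e. linear regression. A sketch under that simplification (function names and the feature layout are illustrative, not from the original system):

```python
import numpy as np

def fit_joint_gaussian(X, Y):
    """Fit one joint Gaussian to paired features, e.g. columns
    [F1_bark, F2_bark, duration] of dysarthric (X) and control (Y)
    stable points, and return the conditional-mean mapping E[y|x]."""
    Z = np.hstack([X, Y])
    mu = Z.mean(axis=0)
    S = np.cov(Z, rowvar=False)
    d = X.shape[1]
    mu_x, mu_y = mu[:d], mu[d:]
    S_xx, S_yx = S[:d, :d], S[d:, :d]
    W = S_yx @ np.linalg.inv(S_xx)  # regression matrix
    b = mu_y - W @ mu_x             # offset
    return W, b

def map_features(x, W, b):
    """Map one dysarthric stable point toward the control space."""
    return W @ x + b
```

A full GMM mapping computes this conditional mean per mixture component and blends the results with the posterior component probabilities, giving the piece-wise linear transformation mentioned on the slide.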
3, 4. Parameter Trajectories
• Formants:
  • Constant, horizontal lines at the target frequencies of the mapped stable points
• Pitch:
  • Single-peaked, by rule, based on a robust mean
• Energy:
  • Heavily smoothed
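Steps 3 and 4 might be sketched as follows. The constants here (peak position, excursion size, bump shape) are illustrative placeholders, not the rules used in the actual system:

```python
import numpy as np

def formant_track(target_hz, n_frames):
    """Constant, horizontal trajectory at the mapped stable-point frequency."""
    return np.full(n_frames, float(target_hz))

def pitch_contour(mean_f0, n_frames, peak_pos=0.3, excursion=0.15):
    """Single-peaked F0 contour by rule around a robust mean F0."""
    t = np.linspace(0.0, 1.0, n_frames)
    # raised-cosine bump centered at peak_pos (an assumed, not actual, rule)
    bump = np.cos(np.clip((t - peak_pos) / 0.5, -1.0, 1.0) * np.pi / 2) ** 2
    return mean_f0 * (1.0 + excursion * (bump - bump.mean()))
```

The point of the sketch is the shape constraints: formants are flat lines at the mapped targets, while pitch gets exactly one peak whose height is tied to a robust estimate of the speaker's mean F0.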
5. Formant Synthesis
• Klatt glottal source model
• Constant parameters
• Overlap-add at consonant boundaries
Evaluation
• Conditions:
  • “dysarthric”
  • “clean” (target: dysarthric stable points)
  • “map” (target: mapped stable points)
  • “oracle” (target: control stable points)
  • “control”
• Perceptual tests:
  • Quality of “dysarthric” versus “clean”
    • CMOS: −2 (much worse) … 0 (same) … +2 (much better)
  • Intelligibility [all 5 conditions]
    • Choose the correct vowel: heed / hid / heck / …
Results 1: Quality Test
• 10 listeners
• “dysarthric” vs. “clean”: 0.19 on the CMOS scale (p = 0.015)
• When vocal fry was present: 0.26 (unanimous, p < 0.001)
Results 2: Intelligibility Test
• 16 listeners, 100 stimuli / listener
[Figure: vowel identification by condition; differences of 33% (p < 0.001) and 7% (p < 0.015)]
[Results 2: Intelligibility test, cont’d]
• 16 listeners, 100 stimuli / listener
[Figure: vowel identification by condition; differences of 11% (p < 0.001) and −5% (p < 0.05)]
Intelligibility Enhancement: Summary
• Obtained a small [7%] but significant intelligibility enhancement
• Very good quality enhancement
• Future work:
  • Understand why “clean” was bad
  • Incorporate “de-coarticulation” to reduce overlap of stable point distributions [see next …]
Analysis of Coarticulation
[Figure: medians of vowel formants at vowel midpoints, control speaker vs. dysarthric speaker]
Explaining the compressed vowel space
• Coarticulation: average vowel formants are more strongly dependent on the average virtual formants of the surrounding consonants
• Variability: average vowel formants result from very broad distributions that are skewed by the boundaries of the vowel space, so that the averages move inward
• Bad targets: average vowel formants result from a tendency to move articulators in the wrong direction
• Approach: use the Linear Coarticulation Model
Linear Coarticulation Model

F(t | p, v, n) = A_{p,t} F_p + B_{n,t} F_n + (I − A_{p,t} − B_{n,t}) F_v

• F(t | p, v, n): observed formant vector
• F_p, F_v, F_n: target formants [unobserved, estimable parameters]
• A_{p,t}, B_{n,t}: weight matrices [unobserved, estimable parameters]
• t: time; p: preceding consonant; v: vowel; n: next consonant
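The forward model can be sketched in a few lines; this is a direct transcription of the weighted-target combination (variable names are illustrative):

```python
import numpy as np

def coarticulated_formants(F_p, F_v, F_n, A, B):
    """Linear coarticulation model: the observed formant vector at time t
    is a weighted combination of the preceding-consonant target F_p, the
    vowel target F_v, and the next-consonant target F_n, with
    time-dependent weight matrices A and B."""
    I = np.eye(len(F_v))
    return A @ F_p + B @ F_n + (I - A - B) @ F_v
```

With A = B = 0 the model returns the pure vowel target; as A or B grows, the observed formants are pulled toward the consonant targets, which is exactly the compression mechanism hypothesized on the previous slide.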
Linear Coarticulation Model
• Model based on earlier work by Broad, Öhman, Lindblom, Schouten, Pols, Stevens, …
Two uses of the coarticulation model
• De-coarticulation [intelligibility enhancement]
• Diagnostics
De-coarticulation
De-coarticulation

The model implies

F_v = (I − A_{p,t} − B_{n,t})⁻¹ (F(t | p, v, n) − A_{p,t} F_p − B_{n,t} F_n)

• F(t | p, v, n) is observed; F_p, F_n are virtual formants
• Use the estimated F_v as the de-coarticulated stable point
• Vowel recognition on de-coarticulated stable points: 86% correct [5-class]
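The inversion above amounts to one linear solve per observation; a minimal sketch (variable names illustrative), which recovers the vowel target exactly whenever the weight matrices and consonant targets are known:

```python
import numpy as np

def decoarticulate(F_obs, F_p, F_n, A, B):
    """Invert the linear coarticulation model: recover the vowel target
    F_v from the observed formant vector F_obs, the virtual formants
    F_p, F_n of the surrounding consonants, and weight matrices A, B."""
    I = np.eye(len(F_obs))
    # solve (I - A - B) F_v = F_obs - A F_p - B F_n
    return np.linalg.solve(I - A - B, F_obs - A @ F_p - B @ F_n)
```

Using `np.linalg.solve` rather than forming the explicit inverse is the standard numerically safer choice; the inversion fails only when the vowel weight matrix I − A − B is singular, i.e. when the observation carries no vowel-target information at all.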
Use for Diagnostics
Synchronous formants: Targets
[Figure: estimated formant targets, control speaker [jp] vs. dysarthric speaker [ll]]
Synchronous formants: Targets
[Figure: estimated formant targets, control speaker [00] vs. dysarthric speaker [09]]
Synchronous formants: Weights
[Figure: vowel-target weights 1 − a_t − b_t over time (axis runs from more to less coarticulation), control speaker [jp] vs. dysarthric speaker [ll]]
Synchronous formants: Weights
[Figure: vowel-target weights 1 − a_t − b_t over time (axis runs from more to less coarticulation), control speaker [00] vs. dysarthric speaker [09]]
Synchronous formants: Fit
[Figure: root mean square of the model fit residuals, per speaker]
Asynchronous formants: Targets
[Figure: estimated formant targets, control speaker [jp] vs. dysarthric speaker [ll]]
Asynchronous formants: Targets
[Figure: control speaker [00] vs. dysarthric speaker [09]; per-formant vowel-target weights F1: 1 − a_t − b_t, F2: 1 − a′_t − b′_t, F3: 1 − a″_t − b″_t (axis runs from more to less coarticulation)]
Asynchronous formants: Fit
[Figure: root mean square of the model fit residuals, per speaker]
Linear Coarticulation Model: Summary
• Two roles of the model:
  • De-coarticulation: better stable points for the enhancement device
  • Diagnostics: systematic parameter differences between dysarthric and control speakers
• New work:
  • Explore differences between syndromes
  • Explore other clinical uses, e.g. prognosis