

  1. Applying speech / language technologies to communication disorders: New challenges for basic research. Jan van Santen, Center for Spoken Language Understanding, OGI School of Science & Engineering, Oregon Health & Science University. Collaborators at CSLU: Lois Black, Peter Heeman, John-Paul Hosom, Alexander Kain, Esther Klabbers, Qi Miao, Taniya Mishra, Xiaochuan Niu; elsewhere: Melanie Fried-Oken, Robert Margolis, Larry Shriberg

  2. Key Points • A vast number of potential applications of speech and language technologies to communication disorders • Some of these potential applications can be realized right now • Other potential applications require applied technology research or basic technology research • New opportunities for speech/language technology research, clinical research, and interdisciplinary research • Today: some preliminary studies along these lines Center for Spoken Language Understanding

  3. Outline • Defining Terms • What are communication disorders • What do we mean by “speech technology” • Classification of applications • Examples of current work in progress • Dysarthria • Intelligibility enhancement • Diagnostic • Personalized TTS system • Suspected Apraxia of Speech • Diagnostic • Autism • Clinical research • Future plans • Summary


  5. What are communication disorders • Input examples • Hearing loss • Congenital deafness • Receptive Aphasia • Receptive Developmental Language Disorders • Autism • … • Output examples • Apraxia of speech • Stuttering • Dysarthria • Aphasia • Expressive Developmental Language Disorders • Autism • …

  6. What do we mean by “speech technology” • Examples • Speech recognition • Speech compression / coding • Text-to-speech synthesis • Voice transformation • Speaker identification • Language identification • Summarization • Speech/language data mining • Text-to-text translation • …

  7. Classification of Applications • End user perspective • Assistive • Remedial • Diagnostic • Research tool for [basic or applied] clinical research • Research perspective • No research – ready for use as-is • Applied technology research • Research on fundamentally new technologies



  10. Dysarthria • What is it? • Motor speech disorder • Associated with TBI, stroke, Parkinson’s, ALS, … • 1-2 million individuals in the US • ~6 different syndromes [ataxic, spastic, …] • Poor – at times highly variable – speech production • “tip” → “temf”, “sib”, … [poor control and coordination] • Applications: • Intelligibility enhancement [Assistive] • Analysis of coarticulation [Diagnostic, Assistive] • Personalized TTS [Assistive]


  12. Intelligibility enhancement • What do we want: a speech enhancement device • Input: unrestricted vocabulary, unlabeled speech • Output: intelligibility and quality enhanced speech • Preliminary data: what did not work: • ASR → TTS • Standard [frame-based] voice transformation systems • Perception experiments using hybridization: everything matters – consonants, vowels, prosody • Why focus on vowels: • Perception experiments show they matter • “Easy”

  13. Intelligibility enhancement • Approach: • Find robust “stable points” on formant trajectories • Gaussian mixture mapping of stable points between dysarthric and control F1 x F2 x Duration spaces [trained on parallel dysarthric/“normal” speaker recordings] • Use mapped stable points to draw synthetic formant trajectories • Draw synthetic pitch and energy contours • Formant synthesis

  14. Intelligibility enhancement • How bad are dysarthric formants? [Figure: formant plots, control speaker vs. dysarthric speaker]

  15. 1. Stable Points • Process: determine 50% central region of vowel → shape decision → median smooth → multi-tonic fit • F1: • Shape decision: concave assumed • Estimation: smooth → multi-tonic fit → maximum • F2: • Shape decision: mountain / valley / up / down [multi-tonic fit] • Estimation: smooth → multi-tonic fit → max / min / median • Isotonic: given x1, …, xn, find y1, …, yn such that y1 ≤ y2 ≤ … ≤ yn and Σ (xi − yi)² is minimized • Multitonic: optimize over peak/valley locations
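The isotonic step above has a standard least-squares solution (pool-adjacent-violators), and a multi-tonic fit can then be posed as a search over peak locations. A minimal sketch, shown for the single-peak ("mountain") case; function names are illustrative, not from the slides:

```python
import numpy as np

def isotonic_fit(x):
    """Pool-adjacent-violators: least-squares y with y1 <= y2 <= ... <= yn."""
    blocks = []                      # each block holds [sum, count]
    for xi in x:
        blocks.append([float(xi), 1])
        # merge backwards while block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    y = []
    for s, c in blocks:
        y.extend([s / c] * c)
    return np.array(y)

def unimodal_fit(x):
    """'Multi-tonic' fit, single-peak case: try every peak location,
    fit an increasing then a decreasing segment, keep the best SSE."""
    x = np.asarray(x, dtype=float)
    best_y, best_sse = None, np.inf
    for k in range(1, len(x) + 1):
        up = isotonic_fit(x[:k])
        down = isotonic_fit(x[k:][::-1])[::-1]   # decreasing via reversal
        y = np.concatenate([up, down])
        sse = np.sum((x - y) ** 2)
        if sse < best_sse:
            best_y, best_sse = y, sse
    return best_y
```

For the F2 "valley" shape the same search applies with the two segments swapped.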

  16. 2. Gaussian mixture mapping • Gaussian mixture model: piece-wise linear, probabilistic transformation • Joint density of (bark-scaled) stable points and duration [Figure: stable points of the control speaker vs. stable points of the dysarthric speaker]

  17. [2. Gaussian mixture mapping, cont’d] • Smooth mapping of F1, F2, Duration • Mapped to perceptually appropriate regions • Separability unchanged! [Figure: mapped dysarthric stable points vs. control stable points]
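The mapping step can be sketched as follows: fit a GMM to joint [dysarthric, control] vectors, then map each dysarthric point to the responsibility-weighted mixture of per-component conditional means. This yields exactly the piece-wise linear, probabilistic transformation the slide describes; function names and hyperparameters are my assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X, Y, n_components=2, seed=0):
    """Fit a GMM to joint [source, target] vectors, e.g. dysarthric
    F1/F2/duration stable points paired with control ones."""
    Z = np.hstack([X, Y])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          reg_covar=1e-4, random_state=seed)
    return gmm.fit(Z)

def gmm_map(gmm, X, dx):
    """Map source vectors (first dx dims) to expected target vectors:
    mix each component's conditional-mean regression by the component
    responsibilities under the source marginal."""
    X = np.atleast_2d(X)
    dy = gmm.means_.shape[1] - dx
    resp = np.zeros((X.shape[0], gmm.n_components))
    cond = np.zeros((gmm.n_components, X.shape[0], dy))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        resp[:, k] = gmm.weights_[k] * multivariate_normal(
            mu_x, Sxx, allow_singular=True).pdf(X)
        # E[target | source] for component k: mu_y + Syx Sxx^-1 (x - mu_x)
        cond[k] = mu_y + (X - mu_x) @ np.linalg.solve(Sxx, Sxy)
    resp /= resp.sum(axis=1, keepdims=True)
    return np.einsum("nk,knd->nd", resp, cond)
```

Usage would look like `gmm = fit_joint_gmm(dys_pts, ctl_pts); mapped = gmm_map(gmm, dys_pts, dx=dys_pts.shape[1])`.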

  18. 3, 4. Parameter trajectories • Formants: constant, horizontal lines at the target frequencies of the mapped stable points • Pitch: single-peaked, by rule, based on a robust mean • Energy: heavy smoothing
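The rule-based tracks could look like the sketch below; the sinusoidal shape and 10% F0 excursion are illustrative assumptions of mine, since the slide specifies only constant formants and a single-peaked pitch contour around a robust mean:

```python
import numpy as np

def vowel_trajectories(stable_f1, stable_f2, dur_s, f0_mean, frame_s=0.005):
    """Synthetic parameter tracks for one vowel: flat formants at the
    mapped stable-point targets, single-peaked pitch contour."""
    n = max(int(round(dur_s / frame_s)), 1)
    t = np.linspace(0.0, 1.0, n)
    f1 = np.full(n, stable_f1)        # constant, horizontal lines
    f2 = np.full(n, stable_f2)        # at the mapped target frequencies
    # single-peaked contour: rises +10% above the robust mean mid-vowel
    f0 = f0_mean * (1.0 + 0.1 * np.sin(np.pi * t))
    return f1, f2, f0
```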

  19. 5. Formant synthesis • Klatt glottal source model • constant parameters • overlap-add at consonant boundaries
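The Klatt synthesizer itself is considerably more elaborate; as a rough sketch of the cascade formant-synthesis idea it builds on, a pulse-train source can be passed through second-order digital resonators (the standard formant filter). This toy version is my illustration, not the system on the slide:

```python
import numpy as np

def resonator_coeffs(freq, bw, fs):
    """Second-order IIR resonator: the building block of cascade
    formant synthesizers such as Klatt's."""
    r = np.exp(-np.pi * bw / fs)
    a1 = -2.0 * r * np.cos(2.0 * np.pi * freq / fs)
    a2 = r * r
    b0 = 1.0 + a1 + a2               # unity gain at DC
    return b0, a1, a2

def synthesize(formants, bws, f0, dur_s, fs=16000):
    """Impulse-train source through cascaded formant resonators."""
    n = int(dur_s * fs)
    src = np.zeros(n)
    src[::int(fs / f0)] = 1.0        # glottal pulses at constant F0
    y = src
    for freq, bw in zip(formants, bws):   # cascade the resonators
        b0, a1, a2 = resonator_coeffs(freq, bw, fs)
        out = np.zeros(n)
        for i in range(n):
            out[i] = b0 * y[i] - a1 * out[i - 1] - a2 * out[i - 2]
        y = out
    return y
```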

  20. Evaluation • Conditions • “dysarthric” • “clean” (target: dysarthric stable points) • “map” (target: mapped stable points) • “oracle” (target: control stable points) • “control” • Perceptual Tests • Quality of “dysarthric” versus “clean” • CMOS: -2 (much worse) ... 0 (same) ... +2 (much better) • Intelligibility [all 5 conditions] • Choose correct vowel: heed / hid / heck / …

  21. Results 1: Quality Test • 10 listeners • “dysarthric” vs. “clean”: • 0.19 on the CMOS scale (p=0.015) • when vocal fry: 0.26 (unanimous, p<0.001)

  22. Results 2: Intelligibility test • 16 listeners, 100 stimuli / listener [Figure: vowel identification results; differences of 33% (p<0.001) and 7% (p<0.015)]

  23. Results 2: Intelligibility test • 16 listeners, 100 stimuli / listener [Figure: vowel identification results; differences of 11% (p<0.001) and −5% (p<0.05)]

  24. Intelligibility enhancement: Summary • Obtained small [7%] but significant intelligibility enhancement • Very good quality enhancement • Future work: • Understand why “clean” was bad • Incorporate “de-coarticulation” to reduce overlap of stable point distributions [see next …]

  25. Dysarthria • What is it? • Motor speech disorder • Associated with TBI, stroke, Parkinson’s, ALS, … • 1-2 million individuals in the US • ~6 different syndromes [ataxic, spastic, …] • Applications: • Intelligibility enhancement [Assistive] • Analysis of coarticulation [Diagnostic, Assistive] • Personalized TTS [Assistive]

  26. Analysis of coarticulation [Figure: medians of vowel formants at vowel midpoints, control speaker vs. dysarthric speaker]

  27. Explaining compressed vowel space • Coarticulation: average formants of vowels depend more strongly on the average of the virtual formants of the surrounding consonants • Variability: average formants of vowels result from very broad distributions that are skewed by the boundaries of vowel space, so that the averages move inward • Bad targets: average formants of vowels result from a tendency to move articulators in the wrong direction • Approach: use the Linear Coarticulation Model

  28. Linear Coarticulation Model • Observed formant vector at time t in a /p v n/ context: F(t | p, v, n) = A_pt F_p + B_nt F_n + (I − A_pt − B_nt) F_v • t: time; p: preceding consonant; v: vowel; n: next consonant • F_p, F_v, F_n: target formant vectors; A_pt, B_nt: weight matrices • Targets and weights are unobserved, estimable parameters
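The forward model is direct to evaluate in matrix form; a minimal sketch (names are mine, with fixed weight matrices standing in for their time-varying counterparts):

```python
import numpy as np

def coarticulated_formants(Fp, Fv, Fn, A, B):
    """Linear coarticulation model: the observed formant vector is a
    weighted mix of consonant and vowel target formants,
        F(t) = A_pt Fp + B_nt Fn + (I - A_pt - B_nt) Fv."""
    I = np.eye(len(Fv))
    return A @ Fp + B @ Fn + (I - A - B) @ Fv
```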

  33. Linear Coarticulation Model • Model based on earlier work by: • Broad, Oehman, Lindblom, Schouten, Pols, Stevens, …

  34. Two uses of coarticulation model • De-coarticulation [intelligibility enhancement] • Diagnostics

  35. De-coarticulation

  36. De-coarticulation • Inverting the model yields an estimate of the vowel target: F_v(est) = (I − A_pt − B_nt)⁻¹ (F(t | p, v, n) − A_pt F_p − B_nt F_n) • F(t | p, v, n) is observed; F_p and F_n are the virtual formants of the surrounding consonants • Use F_v(est) as a de-coarticulated stable point • 86% correct in 5-class vowel recognition
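The inversion can be sketched directly from the formula (a hypothetical helper of mine, assuming I − A_pt − B_nt is invertible):

```python
import numpy as np

def decoarticulate(F_obs, Fp, Fn, A, B):
    """Invert the linear coarticulation model to estimate the vowel target:
        Fv = (I - A - B)^-1 (F_obs - A Fp - B Fn),
    where F_obs is the observed formant vector and Fp, Fn are the
    (virtual) formant targets of the surrounding consonants."""
    I = np.eye(len(F_obs))
    return np.linalg.solve(I - A - B, F_obs - A @ Fp - B @ Fn)
```

Round-tripping through the forward model recovers the vowel target, which is what makes the estimate usable as a de-coarticulated stable point.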


  41. Use for Diagnostics

  42. Synchronous formants: Targets [Figure: estimated target formants, control speaker [jp] vs. dysarthric speaker [ll]]

  43. Synchronous formants: Targets [Figure: estimated target formants, control speaker [00] vs. dysarthric speaker [09]]

  44. Synchronous formants: Weights [Figure: weight curves 1 − a_nt − b_nt over time (more ↔ less coarticulation), control speaker [jp] vs. dysarthric speaker [ll]]

  45. Synchronous formants: Weights [Figure: weight curves 1 − a_nt − b_nt over time (more ↔ less coarticulation), control speaker [00] vs. dysarthric speaker [09]]

  46. Synchronous formants: Fit [Figure: root mean square of the model fit error]

  47. Asynchronous formants: Targets [Figure: estimated target formants, control speaker [jp] vs. dysarthric speaker [ll]]

  48. Asynchronous formants: Targets [Figure: per-formant weight curves F1: 1 − a_nt − b_nt, F2: 1 − a′_nt − b′_nt, F3: 1 − a″_nt − b″_nt (more ↔ less coarticulation), control speaker [00] vs. dysarthric speaker [09]]

  49. Asynchronous formants: Fit [Figure: root mean square of the model fit error]

  50. Linear Coarticulation Model: Summary • Two roles of model: • De-coarticulation • Better stable points for enhancement device • Diagnostics • Systematic parameter differences between dysarthric and control speakers • New work • Explore differences between syndromes • Explore other clinical uses, e.g. prognosis
