
STARDUST PROJECT – Speech Recognition for People with Severe Dysarthria



  1. STARDUST PROJECT – Speech Recognition for People with Severe Dysarthria Mark Parker Specialist Speech and Language Therapist Institute of General Practice

  2. Aims of the Presentation • How automatic speech recognition (ASR) could be made to work for people with physical disability • Basic understanding of how ASR works • Why it might not work with people with severe speech problems • How it may be possible to get ASR to work with people with severe speech problems • See some of the technology under development

  3. What Does It Stand for? • Speech Training And Recognition for Dysarthric Users of aSsisted Technology

  4. Project Team • DoH • NEAT • University of Sheffield • Barnsley District General Hospital • Prof P Enderby/ M Parker – Clinical Speech Therapy • Prof P Green/ Dr A Hatzis – Computer Sciences • Prof M Hawley/ Dr S Brownsall – Medical Physics

  5. Aim of the Project • ASR used to access other technology • Many people with severe dysarthria also have an associated severe physical disability • ECA operated with switching systems • slow, laborious, dependent on positioning • ASR to supplement or replace switching

  6. Implications for clinical staff • Role of OTs in the provision of environmental controls • Role of SLTs in enhancing performance to enable access to the technology

  7. Background • Speech recognition systems • commercially available packages - mobile phones, word-processing packages (e.g. Dragon Dictate™) • Continuous vs discrete • Normal speech - with recognition training can achieve >90% recognition rates (Rose and Galdo, 1999) • Dysarthric speech - mild: 10-15% lower recognition rates (Ferrier, 1992) • Declining rapidly as speech deteriorates: 30-40% for single words (Thomas-Stonell, 1998) - functionally useless

  8. ‘Normal’ spectrogram – “back”

  9. Normal speech “television”

  10. What is Dysarthria? • A neurological motor speech impairment characterised by slow, weak, imprecise and/or uncoordinated movements of the speech musculature. • May be congenital or acquired • 170/100 000 (Emerson & Enderby 1995)

  11. Examples of dysarthric speech • Picture description • Single words • Female • Male

  12. CC – Connected speech

  13. Picture description

  14. GR – single words

  15. ‘Dysarthric’ spectrogram – “back”

  16. Dysarthric speech “television”

  17. Severity Rating • Typically based on ‘intelligibility’ • ‘…the extent a listener understands the speech produced…’ (Yorkston et al, 1999) • Not a ‘pure’ measure – combination of articulatory accuracy and other variables – listener familiarity, topic knowledge …. • Mild 70-90% • Moderate 40-70% • Severe 10-40%
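The severity bands above can be expressed as a small lookup. The slide's ranges share their endpoints (70% and 40%) and leave scores above 90% or below 10% unlabelled, so the boundary handling in this sketch is an assumption: a score lying exactly on a boundary is placed in the milder band. The function name `severity_band` is hypothetical.

```python
def severity_band(intelligibility_pct: float) -> str:
    """Map an intelligibility score (0-100%) to the severity bands on the slide.

    Assumption: a score on a shared boundary (70% or 40%) is placed in the
    milder band; scores above 90% or below 10% fall outside the slide's bands.
    """
    if not 0 <= intelligibility_pct <= 100:
        raise ValueError("intelligibility must be a percentage from 0 to 100")
    if intelligibility_pct > 90:
        return "above mild range"
    if intelligibility_pct >= 70:
        return "mild"
    if intelligibility_pct >= 40:
        return "moderate"
    if intelligibility_pct >= 10:
        return "severe"
    return "below severe range"


for score in (85, 70, 55, 25):
    print(score, severity_band(score))
```

Because intelligibility mixes articulatory accuracy with listener familiarity and topic knowledge, the same speaker may fall into different bands for different listeners.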

  18. Human speech recognition vs Machine speech recognition • Human recognition: • Focuses on meaning • Filters out small differences in sounds • Is aided by the context of the conversation • Machine recognition: • Focuses on sound (acoustics) • Small differences make a difference • Has no sense of context or meaning

  19. Intelligibility and Consistency • ‘Normal’ speech will be 100% intelligible and with few articulatory differences over time (consistency). • ‘Severe’ dysarthria may be completely unintelligible to the naïve listener but may show consistency of key elements which will make it more intelligible to the familiar listener.

  20. Developing the Speech Recogniser • Built around HTK (the Hidden Markov Model Toolkit) from Cambridge University • Recogniser compares an incoming word with a stored template • The ‘heard’ word is given a statistical score to determine which target the recogniser believes was intended • The ‘by-product’ is a consistency score
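The idea of comparing an incoming word with stored templates and scoring each candidate can be illustrated with a minimal sketch. It swaps in dynamic time warping (DTW), a classic template-matching technique, and is not HTK's HMM-based method. Each word is assumed to be a sequence of per-frame feature vectors; the softmax scoring, the function names and the toy templates are hypothetical choices for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Frame = Sequence[float]  # assumed: one feature vector per analysis frame


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """DTW distance between two frame sequences, using Euclidean frame-to-frame cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest path: input advances alone, template advances alone, or both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalise by n + m so that long and short words are comparable.
    return cost[n][m] / (n + m)


def recognise(word: Sequence[Frame],
              templates: Dict[str, Sequence[Frame]]) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching label plus a score per label (scores sum to 1)."""
    dists = {label: dtw_distance(word, tpl) for label, tpl in templates.items()}
    # Hypothetical scoring: softmax over negative distances.
    weights = {label: math.exp(-d) for label, d in dists.items()}
    total = sum(weights.values())
    scores = {label: w / total for label, w in weights.items()}
    best = max(scores, key=scores.get)
    return best, scores


templates = {"on": [[0.0], [1.0], [2.0]], "off": [[2.0], [1.0], [0.0]]}
best, scores = recognise([[0.0], [0.1], [1.0], [2.1]], templates)
print(best)  # on
```

In this sketch the winning score could be read as a rough, hypothetical consistency indicator: tokens that stay close to their stored template receive a high score, while tokens that drift between productions spread their score across several candidates.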

  21. Current position • Software development – sophisticated recording and data-logging facility to be combined with a ‘consistency’ measure and spectrography package • Developing ‘user friendliness’ and the possibility of ‘remote’ usage • Identifying and recording EC commands • ‘Labelling’ the sample • Attempting to define measures of baseline consistency at an ‘acoustic’ level • Experimenting with the recognition accuracy of a commercially available product - Sicare

  22. Demonstration of recording

  23. Sicare Recognition Results

  24. Sicare testing • Recognition rates consistent with previous research • Begins to illustrate the points at which a recogniser becomes ‘confused’ • May illustrate the areas where distinctions have to be made • May start to illustrate some of the key acoustic factors that are crucial in dysarthric speech and voice recognition (VR) • Non-adapted commercial product functionally useless for this population

  25. Initial tests for consistency • Input 3 examples of a word to build a model • Input the remaining examples of the same word • Recogniser indicates the most probable match given the input • Initially mix normal and dysarthric speech
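The consistency test described above (build a model from three examples, then present the remaining examples and see which word the recogniser picks) can be sketched as follows. This is a simplified stand-in for the project's HTK set-up: each token is assumed to be already reduced to a fixed-length feature vector, a word "model" is simply the mean of its first three examples, and the "most probable match" is the nearest model by Euclidean distance. The per-word proportion of held-out tokens matched to their own word is offered as one hypothetical consistency indicator; the function names and the "lamp"/"door" data are invented.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]  # assumed: one fixed-length feature vector per spoken token


def build_models(tokens: Dict[str, List[Vector]], n_train: int = 3) -> Dict[str, List[float]]:
    """Average the first n_train examples of each word into a single model vector."""
    models = {}
    for word, examples in tokens.items():
        if len(examples) <= n_train:
            raise ValueError(f"need more than {n_train} examples of {word!r}")
        train = examples[:n_train]
        models[word] = [sum(col) / n_train for col in zip(*train)]
    return models


def most_probable(token: Vector, models: Dict[str, List[float]]) -> str:
    """Nearest model by Euclidean distance stands in for the recogniser's best match."""
    return min(models, key=lambda w: math.dist(token, models[w]))


def consistency_scores(tokens: Dict[str, List[Vector]], n_train: int = 3) -> Dict[str, float]:
    """Fraction of each word's held-out examples recognised as that word."""
    models = build_models(tokens, n_train)
    scores = {}
    for word, examples in tokens.items():
        held_out = examples[n_train:]
        hits = sum(most_probable(t, models) == word for t in held_out)
        scores[word] = hits / len(held_out)
    return scores


tokens = {
    "lamp": [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [1.0, 0.05], [0.2, 0.9]],
    "door": [[0.0, 1.0], [0.1, 1.1], [-0.1, 0.9], [0.0, 1.05], [0.05, 0.95]],
}
print(consistency_scores(tokens))  # {'lamp': 0.5, 'door': 1.0}
```

Mixing normal and dysarthric speech, as on the slide, would amount to running the same procedure on token sets from both groups and comparing the resulting scores.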

  26. Table 1: Test data for the consistency indicator

  27. GR – Initial Recogniser Scores

  28. CC – Initial Recogniser Scores

  29. Conclusions – so far • Recognition results are better than those of currently available hardware • Initial indications suggest that increasing the number of states in the HMM improves recognition for ‘normal’ speech, but NOT for dysarthric speech • May need independent recognisers for each speaker
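To show where "the number of states in the HMM" enters a recogniser, here is a minimal sketch of a left-to-right hidden Markov model over one-dimensional features, scored with the forward algorithm in the log domain. It is not HTK: parameters are estimated by a single uniform segmentation of the training examples (no Baum-Welch re-estimation), emissions are single Gaussians, and the self-loop probability of 0.6, the variance floor, the class name and the toy data are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


class LeftRightHMM:
    """Left-to-right HMM: each state either repeats (self-loop) or advances by one."""

    def __init__(self, n_states: int, examples: List[List[float]], self_loop: float = 0.6):
        if any(len(ex) < n_states for ex in examples):
            raise ValueError("each example needs at least n_states frames")
        self.n = n_states
        self.log_stay = math.log(self_loop)  # assumed fixed transition probabilities
        self.log_move = math.log(1.0 - self_loop)
        # Uniform segmentation: frame t of a T-frame example is assigned to state t*n//T.
        buckets: List[List[float]] = [[] for _ in range(n_states)]
        for ex in examples:
            T = len(ex)
            for t, x in enumerate(ex):
                buckets[t * n_states // T].append(x)
        self.means = [sum(b) / len(b) for b in buckets]
        # Variance floor (1e-3) stops a state with identical frames from collapsing.
        self.vars = [max(sum((x - mu) ** 2 for x in b) / len(b), 1e-3)
                     for b, mu in zip(buckets, self.means)]

    def _log_emit(self, s: int, x: float) -> float:
        mu, var = self.means[s], self.vars[s]
        return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

    def log_likelihood(self, obs: Sequence[float]) -> float:
        """Forward algorithm: start in state 0 and finish in the last state."""
        alpha = [-math.inf] * self.n
        alpha[0] = self._log_emit(0, obs[0])
        for x in obs[1:]:
            new = []
            for s in range(self.n):
                stay = alpha[s] + self.log_stay
                move = alpha[s - 1] + self.log_move if s > 0 else -math.inf
                new.append(logsumexp([stay, move]) + self._log_emit(s, x))
            alpha = new
        return alpha[-1]


rising = [[0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 2, 2], [0, 0, 1, 2, 2]]
falling = [list(reversed(ex)) for ex in rising]
token = [0, 0, 1, 1, 2]
for n_states in (2, 3):
    up, down = LeftRightHMM(n_states, rising), LeftRightHMM(n_states, falling)
    print(n_states, "up" if up.log_likelihood(token) > down.log_likelihood(token) else "down")
```

More states let the model describe finer sequential structure in a word. If that structure is not reproduced consistently from one production to the next, as may be the case in dysarthric speech, extra states add parameters without adding reliable information, which is one possible reading of the result on this slide.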

  30. Subsidiary Questions • Is dysarthric speech consistent? • Does the underlying acoustic/soundwave pattern contain consistent differences in contrasts that are not perceptually distinguishable? • Can consistency be trained in the absence of intelligibility? • Does increasing consistency increase intelligibility?

  31. STARDUST Web site • www.dcs.shef.ac.uk/~spandh/projects/stardust
