STARDUST PROJECT – Speech Recognition for People with Severe Dysarthria Mark Parker, Specialist Speech and Language Therapist, Institute of General Practice
Aims of the Presentation • How ASR could be made to work for people with physical disability • Basic understanding of how ASR works • Why it might not work with people with severe speech problems • How it may be possible to get ASR to work with people with severe speech problems • See some of the technology under development
What Does It Stand for? • Speech Training And Recognition for Dysarthric Users of aSsisted Technology
Project Team • DoH (Department of Health) • NEAT (New and Emerging Applications of Technology programme) • University of Sheffield • Barnsley District General Hospital • Prof P Enderby / M Parker – Clinical Speech Therapy • Prof P Green / Dr A Hatzis – Computer Science • Prof M Hawley / Dr S Brownsall – Medical Physics
Aim of the Project • ASR used to access other technology • Many people with severe dysarthria also have a severe physical disability • Environmental control aids (ECA) are typically operated with switching systems – slow, laborious and dependent on positioning • ASR to supplement or replace switching
Implications for clinical staff • Role of occupational therapists (OTs) in the provision of environmental controls • Role of speech and language therapists (SLTs) in enhancing speech performance to enable access to the technology
Background • Speech recognition systems • Commercially available packages: mobile phones, word-processing packages, Dragon Dictate™ • Continuous vs discrete recognition • Normal speech: with recognition training, recognition rates of >90% can be achieved (Rose and Galdo, 1999) • Dysarthric speech: mild dysarthria gives 10-15% lower recognition rates (Ferrier, 1992), declining rapidly as speech deteriorates, to 30-40% on single words (Thomas-Stonell, 1998) – functionally useless
What is Dysarthria? • A neurological motor speech impairment characterised by slow, weak, imprecise and/or uncoordinated movements of the speech musculature • May be congenital or acquired • Prevalence: 170 per 100,000 (Emerson & Enderby, 1995)
Examples of dysarthric speech • Picture description • Single words • Female • Male
Severity Rating • Typically based on 'intelligibility': '…the extent a listener understands the speech produced…' (Yorkston et al, 1999) • Not a 'pure' measure – a combination of articulatory accuracy and other variables: listener familiarity, topic knowledge, … • Mild: 70-90% • Moderate: 40-70% • Severe: 10-40%
Human speech recognition vs Machine speech recognition • Human recognition: • Focuses on meaning • Filters out small differences in sounds • Is aided by the context of the conversation • Machine recognition: • Focuses on sound (acoustics) • Small differences make a difference • Has no sense of context or meaning
Intelligibility and Consistency • 'Normal' speech will be 100% intelligible, with few articulatory differences over time (consistency). • 'Severe' dysarthria may be completely unintelligible to the naïve listener, but may show consistency of key elements, which makes it more intelligible to the familiar listener.
Developing the Speech Recogniser • Built around HTK (the Hidden Markov Model Toolkit) from Cambridge • The recogniser compares an incoming word with a stored template • The 'heard' word is given a statistical score to determine what the recogniser believes to have been the intended target • The 'by-product' is a consistency score
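The compare-and-score loop can be pictured with a short sketch. This is a minimal illustration only: it uses the hmmlearn library in place of HTK, random arrays as stand-ins for real acoustic features, and invented helper names (fake_mfcc, train_word_model, recognise); it is not the project's actual code.

```python
# Minimal sketch of whole-word HMM recognition (illustrative only:
# hmmlearn stands in for HTK, random arrays stand in for real frames).
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def fake_mfcc(n_frames=40, n_coeffs=13):
    """Stand-in for the acoustic features of one recorded utterance."""
    return rng.normal(size=(n_frames, n_coeffs))

def train_word_model(examples, n_states=5):
    """Fit one HMM per vocabulary word from a few example utterances."""
    X = np.vstack(examples)
    lengths = [len(e) for e in examples]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=25, random_state=0)
    model.fit(X, lengths)
    return model

# One stored model per environmental-control command word (hypothetical vocabulary).
vocab = {w: train_word_model([fake_mfcc() for _ in range(3)])
         for w in ("lights", "radio", "door")}

def recognise(utterance, models):
    """Score the utterance against every stored model; the winning word is
    the recognised target, and its per-frame log-likelihood doubles as a
    consistency score."""
    scores = {w: m.score(utterance) / len(utterance)
              for w, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

word, consistency = recognise(fake_mfcc(), vocab)
print(word, round(consistency, 2))
```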
Current position • Software development: a sophisticated recording and data-logging facility, to be combined with the 'consistency' measure and a spectrography package • Developing 'user-friendliness' and the possibility of 'remote' usage • Identifying and recording environmental control (EC) commands • 'Labelling' the sample • Attempting to define measures of baseline consistency at an 'acoustic' level • Experimenting with the recognition accuracy of a commercially available product – Sicare
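As a rough illustration of the spectrography component, a standard time-frequency view could be produced with SciPy; this is an assumption about tooling, not the project's actual package.

```python
# Hedged sketch: a spectrogram of a recorded command word, using
# scipy.signal.spectrogram as a stand-in for the project's spectrography
# package. The sine wave is a stand-in for real speech.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                               # typical speech sampling rate
t = np.arange(fs) / fs                   # one second of signal
signal = np.sin(2 * np.pi * 300 * t)     # stand-in for a recorded word
freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=512)
print(Sxx.shape)                         # (frequency bins, time frames)
```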
Sicare testing • Recognition rates compatible with previous research • Begins to illustrate the points at which a recogniser becomes 'confused' • May illustrate the areas where distinctions have to be made • May start to illustrate some of the key acoustic factors that are crucial in dysarthric speech and voice recognition • The non-adapted commercial product is functionally useless for this population
Initial tests for consistency • Input three examples of a word to build a model • Input the remaining examples of the same word • The recogniser indicates the most probable match for each input • Initially, mix normal and dysarthric speech
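A sketch of this held-out test, reusing the illustrative fake_mfcc, train_word_model and recognise helpers from the earlier recogniser sketch (again an assumption about the protocol, not the project's code):

```python
# Build the model from the first three examples, then check which stored
# model wins for every remaining example of the same word.
def consistency_test(word, examples, models):
    held_out = examples[3:]              # first three were used to train
    hits = sum(recognise(utt, models)[0] == word for utt in held_out)
    return hits / len(held_out)          # fraction matched to the target

examples = [fake_mfcc() for _ in range(10)]
vocab["lights"] = train_word_model(examples[:3])
print(consistency_test("lights", examples, vocab))
```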
Table 1: Test data for the consistency indicator (table data not reproduced)
Conclusions – so far • Recognition results are better than currently available hardware • Initial indications suggest that increasing the number of states in the HMM improves recognition for 'normal' speech, but NOT for dysarthric speech • May need independent recognisers for each speaker
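The states finding could be probed with a small parameter sweep like the one below, again reusing the earlier illustrative helpers; the state counts shown are arbitrary choices, not the project's settings.

```python
# Refit one word's model with different HMM state counts and compare the
# average per-frame log-likelihood on the held-out examples.
for n_states in (3, 5, 7, 9):
    m = train_word_model(examples[:3], n_states=n_states)
    avg = np.mean([m.score(u) / len(u) for u in examples[3:]])
    print(n_states, round(avg, 2))
```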
Subsidiary Questions • Is dysarthric speech consistent? • Does the underlying acoustic/soundwave pattern contain consistent differences in contrasts that are not perceptually distinguishable? • Can consistency be trained in the absence of intelligibility? • Does increasing consistency increase intelligibility?
STARDUST Web site • www.dcs.shef.ac.uk/~spandh/projects/stardust