STARDUST PROJECT – Speech Recognition for People with Severe Dysarthria Mark Parker Specialist Speech and Language Therapist
Project Team • DoH • NEAT • University of Sheffield • Barnsley District General Hospital • Prof P Enderby/ M Parker – Clinical Speech Therapy • Prof P Green/ Dr Athanassios Hatzis – Computer Sciences • Prof M Hawley/ Dr Simon Brownsall – Medical Physics
What is Dysarthria? • A neurological motor speech impairment characterised by slow, weak, imprecise and/or uncoordinated movements of the speech musculature. • May be congenital or acquired • Prevalence: 170 per 100,000 (Emerson & Enderby, 1995)
Severity Rating • Typically based on ‘intelligibility’ • ‘…the extent a listener understands the speech produced…’ (Yorkston et al., 1999) • Not a pure measure: an interaction of events • Mild: 70–90% • Moderate: 40–70% • Severe: 10–40%
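The intelligibility bands above can be sketched as a small lookup. This is an illustrative sketch only: the slide's bands share their boundary values (e.g. 70% sits in both mild and moderate), so the boundary handling and the labels outside the 10–90% range are assumptions.

```python
def severity_rating(intelligibility_pct):
    """Map an intelligibility score (% of speech a listener understands)
    to the severity bands quoted on the slide.

    Boundary handling and the labels outside 10-90% are assumptions,
    not taken from the source."""
    if intelligibility_pct > 90:
        return "within normal limits"
    if intelligibility_pct >= 70:
        return "mild"
    if intelligibility_pct >= 40:
        return "moderate"
    if intelligibility_pct >= 10:
        return "severe"
    return "profound"

print(severity_rating(25))  # severe
```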
Aim • Voice recognition system (VRS) used to access other technology • Many people with severe dysarthria will have an associated severe physical disability • ECA currently operated with switching systems: slow, laborious, dependent on positioning • VRS to supplement or replace switching
Background • Voice recognition systems • Commercially available packages: mobile phones, word-processing packages, Dragon Dictate • Continuous vs discrete recognition • Normal speech: with recognition training, recognition rates of >90% can be achieved (Rose and Galdo, 1999) • Dysarthric speech: mild dysarthria gives 10–15% lower recognition rates (Ferrier, 1992), declining rapidly as speech deteriorates, to 30–40% for single words (Thomas-Stonell, 1998): functionally useless
Intelligibility vs Consistency • There is a difference between machine recognition and human perception • ‘Normal’ speech may be 100% intelligible and show only a narrow band of variation across time (consistency) • ‘Severe’ dysarthria may be completely unintelligible yet still show consistency of key elements (or not)
Development of the system • 10-12 volunteers - severe dysarthria and physical disability • Speech <30% intelligibility rating • Video/DAT recording/computer sampling • Assessing for the range of phonetic contrasts that can be achieved
Development of a system (2) • Discrete system - the number of contrasts that can be achieved will determine the number of commands that the VRS can handle • Don’t need intelligibility - need consistency • Determine what word/sound/phonetic contrast will represent what command
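The discrete-system idea above, mapping each achievable phonetic contrast to one command, can be sketched as a simple lookup table. The tokens and command names below are invented for illustration; the project's actual contrasts and command set would come from the assessment described on the previous slide.

```python
# Hypothetical mapping from a speaker's reliably produced sounds to
# environmental-control commands. Tokens and commands are illustrative;
# the number of entries is bounded by the contrasts the speaker can
# produce consistently, as the slide notes.
command_map = {
    "ba": "LIGHT_ON",
    "doh": "LIGHT_OFF",
    "mm": "TV_CHANNEL_UP",
    "ee": "CALL_CARER",
}

def dispatch(recognised_token):
    """Return the command for a recognised token, or None if the
    recogniser produced something outside the trained set."""
    return command_map.get(recognised_token)

print(dispatch("ba"))  # LIGHT_ON
```

Note that intelligibility plays no role here: the tokens only need to be distinct and repeatable for the recogniser, not understandable to a listener.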
Development of a system (3) • Train the VRS: neural networks and hidden Markov modelling • Speech consistency training • Implement the system
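To make the hidden-Markov-modelling step above concrete, here is a minimal sketch of the forward algorithm, which scores how likely an observation sequence is under a given model; a discrete recogniser would pick the command whose model scores highest. The two-state model and all probabilities below are invented purely for illustration and are not the project's models.

```python
# Minimal forward algorithm for a discrete hidden Markov model.
# start[s]      = P(first state is s)
# trans[p][s]   = P(next state is s | current state is p)
# emit[s][o]    = P(observing symbol o | state s)
def forward(obs, start, trans, emit):
    """Return P(obs) under the HMM defined by (start, trans, emit)."""
    states = range(len(start))
    # Initialise with the first observation.
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    # Propagate through the remaining observations.
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

# Toy two-state model over a binary symbol alphabet (all values invented).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]

print(forward([0, 1, 1], start, trans, emit))
```

In a recogniser, one such model would be trained per command word, and an utterance assigned to the model giving the highest likelihood.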
Current position • Software development: a sophisticated recording and data-logging facility to be combined with a ‘consistency’ measure and spectrography package • Developing ‘user-friendliness’ and the possibility of ‘remote’ usage • Identifying & recording EC commands • ‘Labelling’ the sample • Attempting to define measures of baseline consistency at an ‘acoustic’ level • Experimenting with the recognition accuracy of a commercially available product, Sicare
Labelling • Breaking an utterance into component parts • To establish the extent of variance over time
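One simple way to quantify "variance over time" for a labelled component is to compare its duration across repeated utterances, e.g. via the coefficient of variation. This is an illustrative sketch, not the project's consistency measure, and the duration data below are invented.

```python
from statistics import mean, pstdev

def coefficient_of_variation(durations):
    """Relative spread of a labelled segment's durations across repeats.

    Lower values suggest the speaker reproduces the segment
    consistently, regardless of whether it is intelligible."""
    return pstdev(durations) / mean(durations)

# Durations (ms) of the same labelled segment across four repeated
# utterances -- hypothetical figures for illustration only.
repeats_of_segment = [210, 195, 220, 205]
cv = coefficient_of_variation(repeats_of_segment)
print(f"CV = {cv:.3f}")
```

Duration is only one dimension; a fuller measure would also compare spectral properties of the labelled segments, in line with the acoustic-level consistency work the slides describe.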
Sicare testing • Recognition rates compatible with previous research • Begins to illustrate the points at which a recogniser becomes ‘confused’ • May illustrate the areas where a distinction has to be made • May start to illustrate some of the key acoustic factors that are crucial in dysarthric speech and voice recognition • The non-adapted commercial product is functionally useless for this population
Subsidiary Questions • Is dysarthric speech consistent? • Does the underlying acoustic/soundwave pattern contain consistent differences in contrasts that are not perceptually distinguishable? • Can consistency be trained in the absence of intelligibility? • Does increasing consistency increase intelligibility?