Using Speech Recognition for Speech Therapy: A multimodal application for users with aphasia • Deborah A. Dahl • SpeechTEK, August 20-23, 2007
The Problem • Acquired aphasia is a disability that affects 1 million American adults • Aphasia is a general term that refers to difficulty with language because of an injury to the parts of the brain that control language • Aphasia leads to social isolation and inability to work • Insurance only pays for a limited amount of speech therapy • Speech recognition could provide feedback to these users about the accuracy of their words
Participants • Myrna Schwartz and Ruth Fink (MossRehab) • MossRehab is part of the Albert Einstein Healthcare Network, a member of the Jefferson Health System. MossRehab has consistently been named one of “America's Best Hospitals” by U.S. News & World Report and focuses on innovative research and outstanding clinical care in medical and physical rehabilitation • Deborah Dahl (Conversational Technologies) • Conversational Technologies provides consulting services in speech technology that enable its customers to apply speech recognition and text-to-speech to create innovative applications and products • Funding: This project is funded, in part, under a grant with the Pennsylvania Department of Health. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions. MossTalk Words® was developed under partial funding from the McLean Contributionship and MossRehab
Users • Users have aphasia, most often due to a stroke • They are very motivated to improve their speech and language abilities and are receptive to using computer programs • However, they vary in their ability to use a mouse or keyboard, read, speak, and understand spoken language
The Approach • An earlier program, MossTalk Words® (Fink et al., 2002), shows users a picture, and they try to say the word that corresponds to the picture • Cues are available to help the user remember the word if necessary • In the original program, users self-monitored the correctness of their words, or worked with a clinician • This project adds speech recognition to reduce the need for a clinician to be present • As soon as the user says the right word and the speech recognizer accepts it, the system plays a tone, says “that’s right”, says the target word, and shows the text of the target word • Advantages of the computer • Low cost • Available 24/7 • Consistent • User’s performance is automatically recorded
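The feedback sequence on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's code: the class name `NamingTrial`, the action names, and the log format are all hypothetical, and the real system leaves matching to the speech recognizer and its grammar instead of comparing strings.

```python
from dataclasses import dataclass, field


@dataclass
class NamingTrial:
    """One picture-naming trial (hypothetical structure, for illustration only)."""
    target: str                 # word that names the picture
    accepted: set               # target plus any accepted synonyms/variants
    log: list = field(default_factory=list)  # automatic performance record

    def on_recognition(self, heard: str) -> list:
        """Log the attempt and return the feedback actions to perform, in order."""
        correct = heard.strip().lower() in self.accepted
        self.log.append((self.target, heard, correct))
        if not correct:
            return []  # no feedback; the user can try again or ask for a cue
        return [
            ("play_tone", None),
            ("say", "that's right"),
            ("say", self.target),
            ("show_text", self.target),
        ]


trial = NamingTrial(target="couch", accepted={"couch", "sofa"})
for action in trial.on_recognition("sofa"):
    print(action)
```

Keeping the log inside the trial object mirrors the "automatically recorded" advantage listed above: every attempt is stored whether or not it was accepted.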
Architecture • [Diagram; component labels only] Web Browser • MossTalk Words (HTML pages) • Naming Exercise (applet) • custom exercises • therapy logic • logging • speech recognizer • speech grammar • recorded files
The User Interface © Albert Einstein Healthcare Network, 2001
Speech Recognition • System uses the Microsoft 6.1 speech recognition engine • Dynamic grammars are modified to include just the words for each picture • In addition to the correct word and its synonyms, the grammars also include minor “acceptable” phonological variations
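A minimal sketch of the per-picture grammar idea, assuming a grammar can be treated as a flat list of accepted phrases. The names `PictureItem` and `build_grammar` and the sample words are hypothetical; a real engine would take phonological variants as pronunciation entries in its own grammar format, which this sketch only approximates with spellings.

```python
from dataclasses import dataclass, field


@dataclass
class PictureItem:
    """Words associated with one picture (hypothetical structure)."""
    target: str
    synonyms: list = field(default_factory=list)
    variants: list = field(default_factory=list)  # "acceptable" near-misses


def build_grammar(item: PictureItem) -> list:
    """Return the phrases to activate for this picture: target first,
    then synonyms, then variants, lowercased and without duplicates."""
    grammar = []
    seen = set()
    for phrase in [item.target, *item.synonyms, *item.variants]:
        key = phrase.strip().lower()
        if key and key not in seen:
            seen.add(key)
            grammar.append(key)
    return grammar


item = PictureItem("Couch", synonyms=["sofa"], variants=["cowch"])
print(build_grammar(item))
```

Restricting the active grammar to one picture's words keeps the recognizer's choice set very small, which is presumably why the slide describes the grammars as dynamic.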
Testing • Pretest with 8 people without aphasia • Initial test with 7 users • Mild to moderate aphasia, specifically anomia, in which errors are word-based rather than sound-based • Good articulation
Test Results – Speech Recognition • Goal: measure accuracy of speech recognition in this application • Metric is correct acceptance: when the user says the correct word, the recognizer responds • If the recognizer doesn’t respond to a correct word, that counts as an error
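The correct-acceptance metric can be computed as below. This is a sketch under one assumption: each trial is recorded as a pair of booleans (did the user say the correct word, did the recognizer respond). The function name and the sample trials are hypothetical and are not data from the study.

```python
def correct_acceptance_rate(trials):
    """Fraction of correctly spoken words that the recognizer accepted.

    trials: iterable of (said_correct_word, recognizer_responded) booleans.
    Trials where the user did not say the correct word are excluded,
    because the metric only covers correct productions.
    Returns None when there are no correct productions to score.
    """
    outcomes = [responded for said_correct, responded in trials if said_correct]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Made-up trials: four correct productions, three of them accepted.
sample = [(True, True), (True, True), (True, False), (True, True), (False, False)]
print(correct_acceptance_rate(sample))
```

A non-response to a correct word lowers the rate, matching the error definition on the slide; false acceptances of wrong words are a separate question that this metric does not capture.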
Test Results -- User Satisfaction • Goal: assess users’ subjective response to the system and to speech recognition • Developed a 7-item questionnaire (5-point scale) • Example questions • I enjoyed talking to the computer • I would recommend this system to other people with aphasia • I would like to have this computer program for practicing • Tested with default speech profile and female profile
Results (seven users) -- User Satisfaction and Speech Recognizer Accuracy • [Chart] Annotation: same user with different profiles • Legend: blue = male with male profile; green = female with female profile; red = female with male profile
Users’ Comments • “when we said it again, it understood me, it was perfect” (S1) • “the computer was good at understanding me” (S2) • “I liked it a lot when the computer understood me” (S3) • “I enjoyed hearing the computer tell me I was correct” (S5) • “It didn’t bother me when the computer didn’t understand me, I knew I was right” (S6)
Conclusions • Users with good articulation can get satisfactory speech recognition • Could use for naming practice • Next test: Could this system be used for pronunciation practice? • Test with users with articulation problems • How closely does ASR match intuitions of speech therapists about acceptable/unacceptable pronunciations?
More Information • Contact: http://www.ncrrn.org/contact/fink • http://www.mosstalk.com • References to earlier versions of MossTalk Words® • Fink, R.B., Brecher, A.R., Montgomery, M., & Schwartz, M.F. (2001). MossTalk Words [computer software manual]. Philadelphia: Albert Einstein Healthcare Network. • Fink, R.B., Brecher, A., Schwartz, M.F., & Robey, R.R. (2002). A computer implemented protocol for treatment of naming disorders: Evaluation of clinician-guided and partially self-guided instruction. Aphasiology, 16, 1061-1086.