Can speech technology be useful for people with dysarthria? Speech technology & pathology. Helmer Strik, Language & Speech, Dept. of Linguistics, Radboud University Nijmegen
Outline • Speech technology & pathology • Applications: existing, possible • In practice • Target groups • Speech technology & dysarthria • Introduction • Speech recognition for dysarthric speech • Conclusions SPACE symposium
Applications • AAC (Augmentative & Alternative Communication): • Improve communication • Interactive tools: • Training, reading, listening • Assessment: • Diagnosis, monitoring • Therapy
AAC • Speaking problems • Speech generation • Speech manipulation • Speech recognition (of the impaired speaker) + output (text, speech, talking head, etc.) • Hearing problems • Hearing aids, cochlear implants, etc. • Speech recognition (of others) + output (text, sign language, talking head, etc.)
ASR & output channel [Figure: ASR output presented as text or via speech synthesis]
Interactive tools • Speech generation • Reading tools: screen readers, reading pen, text processors, etc. • Writing tools: word prediction, TTS, (dedicated) spell checking • Analysis, manipulation, training • Delayed Auditory Feedback (DAF) and Frequency Altered Feedback (FAF), for people who stutter • CAFET: Computer-Aided Fluency Establishment Training • CAPT: Computer-Assisted Pronunciation Training
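Word prediction, one of the writing tools listed above, can be illustrated with a minimal sketch. It ranks the known words that start with the typed prefix by frequency. The function name `predict_words` and the frequency list are hypothetical. Real predictors typically also use the preceding words as context.

```python
from collections import Counter


def predict_words(prefix: str, word_counts: Counter, k: int = 3) -> list[str]:
    """Return up to k known words that start with `prefix`, most frequent first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    candidates = [w for w in word_counts if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-word_counts[w], w))
    return candidates[:k]


# Hypothetical toy frequency list; a real tool would use a much larger corpus.
counts = Counter({"speech": 12, "speaker": 7, "spell": 5, "special": 3, "text": 9})
print(predict_words("spe", counts))  # ['speech', 'speaker', 'spell']
```

Each accepted prediction saves the user the remaining keystrokes. This matters most for users whose typing is slow.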
Delayed Auditory Feedback (DAF) • Frequency Altered Feedback (FAF)
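DAF plays the speaker's own voice back after a short delay. The core of that is a delay line, sketched below under some assumptions. The class name `DelayLine`, the 16 kHz sample rate and the 50 ms delay are all hypothetical choices, not settings from the slides. FAF, which shifts the frequency of the fed-back voice instead of delaying it, is not shown.

```python
from collections import deque


class DelayLine:
    """Streaming delay line for Delayed Auditory Feedback (DAF).

    Each call to process() returns the sample received `delay_samples` calls
    earlier; the first outputs are silence while the buffer fills.
    """

    def __init__(self, sample_rate: int, delay_ms: float) -> None:
        self.delay_samples = round(sample_rate * delay_ms / 1000)
        # Pre-fill with silence so the output lags the input by exactly delay_samples.
        self._buffer = deque([0.0] * self.delay_samples)

    def process(self, sample: float) -> float:
        if self.delay_samples == 0:
            return sample
        self._buffer.append(sample)
        return self._buffer.popleft()


# Hypothetical settings: 16 kHz audio, 50 ms delay -> 800 samples of lag.
daf = DelayLine(sample_rate=16000, delay_ms=50)
print(daf.delay_samples)  # 800
```

A real-time device would call `process()` for each incoming microphone sample and send the result to the headphones.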
Assessment, therapy • Assessment: diagnosis, monitoring • Therapy • Clinical setting, with an expert • Speech analysis + visualization, categorization, etc. • IBM speech viewer • … • Research
Applications • Number of applications differs (from most to fewest): • speech generation • speech analysis, manipulation, etc. • speech recognition
In practice • Many existing applications • Many more are possible • However, relatively little use • Why?
In practice • However, relatively little use. Why? • Needed: • Tailor-made, flexible applications • Tailor-made: taking into account the capabilities & desires of the user and their environment • Flexible: the capabilities & desires often change • More user tests & adequacy evaluation • instead of technology improvement & performance evaluation
Target groups • International Classification of Functioning, Disability and Health (ICF): • Mental functions: aphasia, dyslexia, mental disabilities • Sensory functions: blindness, deafness, or both • Voice & speech functions: dysarthria, anarthria, mutism, stuttering • Motor functions: dyspraxia, apraxia, RSI / UEMSD (Upper Extremity Musculoskeletal Disorders)
Speech technology & dysarthria • Dysarthria: speech disorder caused by dysfunction of nerves and muscles • Many different kinds of dysarthria
Can speech technology be useful for people with dysarthria? • Yes! • AAC • Interactive tools • Assessment • Therapy
Can speech technology be useful for people with dysarthria? • Speech generation • Users prefer a voice similar to their own (former) voice • Preferably: their own voice • AAC • Manipulation • Speech recognition + output channel • Pronunciation training: speech recognition, analysis, feedback, etc.
Speech technology & dysarthria: ASR for dysarthric speech • Questions: • How well can dysarthric speech be recognized by a standard (“non-dysarthric”) speech recognizer? • Will the recognition results improve if we train the recognizer on speech of dysarthric speakers?
Experimental setup: Speakers • Dysarthric: 2 Dutch males, DYS1 & DYS2 • Reference: 2 Dutch males, REF1 & REF2 • Total duration of the speech material (minutes) • DYS2 speaks more slowly
Experimental setup: Speech tasks • All four speakers read the same list of items, consisting of four different tasks: • 1. NUM: numbers 0-12 spoken in isolation • 2. PFU: the 50 most Frequent Utterances from Polyphone • 3. PMS: 130 Plomp-Mimpen Sentences (semantically unpredictable) • 4. PRS: 10 Phonetically Rich Sentences
Experimental setup: Speech tasks • Number of utterances & words per task • The NUM and PRS tasks were each read three times.
Experimental setup: Speech recognizer • General specifications • Standard phone-based recognizer • 37 context-independent phones • 3-state HMMs • 14 cepstral coefficients + deltas, from a mel filterbank covering 350-3400 Hz • 16 ms Hamming window, 10 ms step
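The front-end settings above (16 ms Hamming window, 10 ms step, cepstra plus deltas) can be sketched as follows. Several details are assumptions:

- The sampling rate is taken to be 8 kHz, which is plausible for telephone speech such as Polyphone but is not stated on the slide.
- The deltas use the common regression formula over ±2 frames; the slide does not specify how they were computed.
- The mel filterbank and cepstral transform are omitted.
- All function names are hypothetical.

```python
import math


def hamming(length: int) -> list[float]:
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_signal(samples: list[float], sample_rate: int = 8000,
                 win_ms: float = 16.0, step_ms: float = 10.0) -> list[list[float]]:
    """Cut a signal into overlapping Hamming-windowed frames.

    ASSUMPTION: 8 kHz sampling. Frames that would run past the end are dropped.
    """
    win = round(sample_rate * win_ms / 1000)    # 128 samples at 8 kHz
    step = round(sample_rate * step_ms / 1000)  # 80 samples at 8 kHz
    window = hamming(win)
    return [
        [s * w for s, w in zip(samples[start:start + win], window)]
        for start in range(0, len(samples) - win + 1, step)
    ]


def deltas(features: list[list[float]], n_context: int = 2) -> list[list[float]]:
    """Regression deltas (ASSUMED formula), edge frames replicated:
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)."""
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, n_context + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(features[t])):
            num = sum(
                n * (features[min(t + n, T - 1)][k] - features[max(t - n, 0)][k])
                for n in range(1, n_context + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


frames = frame_signal([0.0] * 8000)  # one second of (silent) audio
print(len(frames), len(frames[0]))  # 99 128
```

With a 10 ms step, one second of speech yields roughly 100 feature vectors. Each vector would then pass through the filterbank and cepstral steps, which are not shown here.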
Experimental setup: Experiments • Lexicon & language model (unigram and bigram) • Based on all words in the 4 tasks • Task-specific & the same for all speakers • Perplexity
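A task-specific bigram model and its perplexity can be sketched as follows. Lower perplexity means the model leaves the recognizer fewer plausible word choices at each point. Several details are assumptions, not taken from the slides:

- Add-one (Laplace) smoothing is used; the slides do not say which smoothing was applied.
- `<s>` and `</s>` are used as sentence markers.
- The toy training data is hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences: list[list[str]]):
    """Count bigram histories and bigrams, adding <s> and </s> sentence markers."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams, vocab


def perplexity(sentences: list[list[str]], history, bigrams, vocab) -> float:
    """Per-token perplexity under ASSUMED add-one smoothed bigram probabilities:
    P(w | v) = (c(v, w) + 1) / (c(v) + |V|),  PP = exp(-(1/N) * sum(log P))."""
    V = len(vocab)
    log_prob, n_tokens = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_prob += math.log((bigrams[(prev, cur)] + 1) / (history[prev] + V))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Hypothetical toy "task": isolated Dutch numbers, in the spirit of the NUM task.
model = train_bigram([["nul"], ["een"], ["twee"]])
print(round(perplexity([["een"]], *model), 3))  # 2.958
```

Because the lexicon and language model are built from the task items themselves, the model is very constrained. This is the kind of low-perplexity setting in which recognition of difficult speech is easier.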
Experimental setup: Speaker Independent & Speaker Dependent • SI: Speaker Independent training material • Polyphone (Dutch telephone speech database, 5000+ speakers) • 4022 connected digit strings • 3702 Polyphone most frequent items • 20,110 phonetically rich sentences • SD: Speaker Dependent training material • The speaker's own speech
Speaker Independent (SI): Results • Word Error Rates (WERs) for SI recognition
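The Word Error Rate (WER) reported in these results is conventionally defined as WER = (S + D + I) / N. Here S, D and I are the numbers of substituted, deleted and inserted words in a minimum-edit-distance alignment, and N is the number of words in the reference. The sketch below computes that standard definition with unit edit costs. The slides do not say which scoring tool was actually used, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via Levenshtein alignment over words (unit costs)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)


print(round(word_error_rate("een twee drie", "een drie drie"), 3))  # 0.333
print(word_error_rate("een", "een twee drie"))  # 2.0
```

The second example shows that insertions can push the WER above 100%. This is one reason short utterances, such as isolated numbers, can produce very high WERs.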
Speaker Independent (SI): Conclusions • REF better than DYS • DYS1 better than DYS2 on short utterances, because of speaking rate (table 1) • Results for DYS quite reasonable (especially for sentences), thanks to the tight language model
Speaker Dependent (SD) • Models (also) trained on the speech of the test speakers themselves • Jackknife procedure: a semi-randomly selected part of the material is the test set; the rest is the training set
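The jackknife idea is that each part of a speaker's material is tested once while being excluded from training at that time. The sketch below implements it as a k-fold rotation. Two details are assumptions: a seeded shuffle stands in for the "semi-random" selection, and the number of folds is k = 5. The slides do not give the exact selection rule or fold count. `jackknife_splits` is a hypothetical name.

```python
import random


def jackknife_splits(utterance_ids, k: int = 5, seed: int = 0):
    """Yield (train, test) lists so that each utterance is in exactly one test set.

    ASSUMPTION: a seeded shuffle replaces the original 'semi-random' selection.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set = all material not in the current test fold.
        train = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield train, test


train, test = next(jackknife_splits(range(20), k=5))
print(len(train), len(test))  # 16 4
```

Averaging the WERs over all k rounds gives a result for the whole test set, even though every model is trained on only part of a speaker's material.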
Speaker Dependent (SD): Results • Word Error Rates (WERs) for the whole test set • for different numbers of Gaussians (2^N)
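The "number of Gaussians" refers to the components of the mixture that models each HMM state's output distribution. The sketch below computes the log-likelihood of one feature vector under such a mixture, using log-sum-exp for numerical stability. It assumes diagonal covariances, which is common but not stated on the slides. `log_gmm_likelihood` is a hypothetical name.

```python
import math


def log_gmm_likelihood(x, weights, means, variances) -> float:
    """log p(x) for a diagonal-covariance Gaussian mixture (ASSUMED model form).

    weights: mixture weights summing to 1; means/variances: one vector per component.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(ll)
    top = max(log_terms)  # log-sum-exp: subtract the max before exponentiating
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


print(round(log_gmm_likelihood([0.0], [1.0], [[0.0]], [[1.0]]), 4))  # -0.9189
```

Each extra component adds its own weight, mean and variance parameters. More Gaussians can therefore model a speaker more precisely, but they also need more training data to estimate reliably.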
Speaker Dependent (SD): Results • Word Error Rates (WERs) for SD recognition
Speaker Dependent (SD): Results • Word Error Rates (WERs) for SD / SI recognition
Speaker Dependent (SD): Conclusions • For REF, SD results equal to or worse than SI (the benefit of speaker-specific models is counterbalanced by the much smaller amount of training material) • For DYS, SD results much better than SI • DYS2 better than DYS1, almost as good as REF
Conclusions: ASR for dysarthric speech • Results for DYS2 are remarkable • SI: high WERs, especially for NUM & PFU • SD: sometimes better than REF • Low speaking rate! • Automatic recognition of dysarthric speech is possible. Better results with: • Lower speaking rate • Speaker-dependent models • Even better: also a speaker-dependent lexicon
Conclusions: Speech technology & pathology • Applications: • Many already exist • Many more are possible • Needed: • Tailor-made, flexible applications • User tests, adequacy evaluation
References • http://lands.let.ru.nl/TSpublic/strik/pres/ (p97-SPACE.ppt) • E. Sanders, M. Ruiter, L. Beijer, H. Strik (2002). Automatic recognition of Dutch dysarthric speech: A pilot study. ICSLP-2002, Denver, USA, pp. 661-664. • T. Rietveld & I. Stolte (2005). Taal- en spraaktechnologie en communicatieve beperkingen [Language and speech technology and communicative impairments].