Development of the SPACE intelligibility assessment method
Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt
SPACE Symposium - 06/02/09
Introduction
• Intelligibility is a popular measure for assessing pathological speech
• Perceptual assessment is affected by non-speech information:
  • familiarity of the listener with the speaker and the type of disorder → this subjective bias is hard to eliminate
  • guessing on the basis of linguistic context → the design of the test material must eliminate this bias
• Replacing the human listener by an automatic speech recognizer (ASR) can solve both problems, but is the ASR sufficiently reliable?
• Test case: automation of the Dutch Intelligibility Assessment (DIA)
Dutch Intelligibility Assessment (DIA)
• 50 isolated (nonsense) words
• Intelligibility = percentage of phonemes correct
How to apply ASR in the DIA?
• Two approaches:
  • let the ASR recognize the words and count the percentage of correct decisions
  • let the ASR check how well, on average, the acoustics support the phonetic transcription of the target word (= alignment)
• Our experience:
  • the intelligibility obtained with the first approach is insufficiently reliable
  • therefore we developed a system based on alignment
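To make the alignment approach concrete, here is a minimal sketch of generic forced alignment (Viterbi) under stated assumptions: the recognizer is assumed to deliver, per frame, a posterior for each phone; every phone of the target transcription must cover at least one frame; phones cannot be skipped. The function name `force_align` and the input format are hypothetical. This is not the ESAT or ELIS aligner, which work on HMM states; it only shows how "how well the acoustics support the transcription" can be reduced to an average posterior along the best path.

```python
import math


def force_align(posteriors, phones):
    """Align T frames to a left-to-right phone sequence (hypothetical helper).

    posteriors: list of T dicts mapping phone label -> P(phone | X_t)
    phones: target transcription, e.g. ["p", "o", "l"]
    Returns (per-frame phone labels, mean posterior along the best path).
    """
    T, N = len(posteriors), len(phones)
    if T < N:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")

    def logp(t, j):
        # Clamp to avoid log(0) when a phone is absent from a frame's dict
        return math.log(max(posteriors[t].get(phones[j], 0.0), 1e-12))

    # score[t][j]: best log-probability of a path ending in phone j at frame t
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = logp(0, 0)  # the path must start in the first phone
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG
            if stay >= move:
                score[t][j], back[t][j] = stay, j
            else:
                score[t][j], back[t][j] = move, j - 1
            if score[t][j] > NEG:
                score[t][j] += logp(t, j)

    # Backtrack from the last phone at the last frame
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    labels = [phones[j] for j in path]
    mean_post = sum(posteriors[t].get(labels[t], 0.0) for t in range(T)) / T
    return labels, mean_post


frames = [{"p": 0.9, "o": 0.1}, {"p": 0.8, "o": 0.2},
          {"p": 0.3, "o": 0.7}, {"p": 0.1, "o": 0.9}]
labels, score = force_align(frames, ["p", "o"])
```

For this toy input the best path assigns the first two frames to /p/ and the last two to /o/. Because every word's transcription is known in the DIA, the aligner never has to choose between competing words, which is the key difference from the recognition approach.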
System architecture: flow chart
acoustic feature sequence Xt + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score
System architecture: flow chart (speech aligner)
• Two systems:
  • a complex, state-of-the-art HMM-based system (ASR-ESAT)
  • a simple system with a phonological layer (ASR-ELIS), which points more directly to articulatory problems
• Acoustic models trained on speech of normal adult speakers
ASR-ESAT
• Acoustic models:
  • state-of-the-art semi-continuous HMMs
  • triphone models trained on normal speech
  • states tied using decision trees with phonological questions
• Output:
  • each frame t is assigned to a state st
  • per frame: st, P(st|Xt)
ASR-ELIS
Flow chart: Xt → PLF extractor → P(K1|Xt), …, P(K24|Xt) → probability product model → P(S1|Xt), …, P(Sn|Xt) → Viterbi decoder (with the target speech transcription) → per frame: st, P(st|Xt) and P(K1|Xt), …, P(K24|Xt)
• 24 binary phonological features concerning:
  • voicing
  • manner of articulation
  • place of articulation
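The slide names a "probability product model" that turns the 24 feature probabilities P(K|Xt) into state probabilities P(S|Xt), but does not give its form. The sketch below assumes one plausible reading: a state's score is the product of P(K|Xt) for features present in its canonical specification and 1 − P(K|Xt) for features absent from it, with irrelevant features left out, followed by normalization over states. The function name and data layout are hypothetical; this is an illustration, not the ELIS model.

```python
def state_posteriors(feature_probs, state_specs):
    """Hypothetical probability product model.

    feature_probs: dict feature -> P(K | X_t) for one frame
    state_specs: dict state -> dict feature -> True (present) / False (absent);
                 features missing from a state's spec are treated as irrelevant
    Returns a dict state -> normalized P(S | X_t).
    """
    scores = {}
    for state, spec in state_specs.items():
        p = 1.0
        for feature, present in spec.items():
            pk = feature_probs[feature]
            # Assumed: multiply P(K) if the feature should be present, 1 - P(K) if absent
            p *= pk if present else 1.0 - pk
        scores[state] = p
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}


frame = {"voiced": 0.9, "burst": 0.8}
specs = {
    "b": {"voiced": True, "burst": True},
    "p": {"voiced": False, "burst": True},
    "m": {"voiced": True, "burst": False},
}
post = state_posteriors(frame, specs)
```

A caveat of this naive form: states with fewer relevant features multiply fewer terms and are therefore favoured, so a real system would need some normalization per state; the sketch ignores that.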
System architecture: flow chart (speaker features)
• Three feature sets:
  • phonemic features (the patient has trouble pronouncing a certain phoneme)
  • phonological features (the patient has problems with voicing, manner or place of articulation)
  • NEW: context-dependent features (the patient has problems realizing a required change of voicing, manner or place of articulation)
Extraction of phonemic features (PMF)
Speech aligner = ASR-ESAT
Phonemic features:
• # : (0.7 + 0.5 + 0.3) / 3 = 0.5
• /p/ : (0.4 + 0.8) / 2 = 0.6
• /o/ : (0.6 + 0.8) / 2 = 0.7
• /l/ : 0.6
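The slide's example averages the per-frame posteriors P(st|Xt) over all frames aligned to the same phone. A minimal sketch, assuming the aligner output is available as (phone, posterior) pairs, one per frame, pooled over the whole test; the function name is hypothetical.

```python
from collections import defaultdict


def phonemic_features(alignment):
    """Mean aligned posterior per phone (hypothetical helper).

    alignment: iterable of (phone, posterior) pairs, one per aligned frame,
               pooled over all words of the test.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phone, posterior in alignment:
        sums[phone] += posterior
        counts[phone] += 1
    return {phone: sums[phone] / counts[phone] for phone in sums}


# The frame values from the slide's example
frames = [("#", 0.7), ("#", 0.5), ("#", 0.3),
          ("p", 0.4), ("p", 0.8),
          ("o", 0.6), ("o", 0.8),
          ("l", 0.6)]
pmf = phonemic_features(frames)
```

A low value for a phone means that, on average, the acoustics of that speaker support the expected phone poorly.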
Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS
Phonological features:
• Burst : 0.6
• Back : (0.7 + 0.9) / 2
• Voiced : (0.8 + 0.6 + 0.5) / 3
Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS
Phonological features:
• Not burst : (0.2 + 0.1 + …
• Not back : (0.1 + 0.1 + …
• Not voiced : (0.1 + 0.1 + …
Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS
Phonological features: irrelevant features for these phones
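The three PLF slides suggest that each feature K yields two speaker features: the mean P(K|Xt) over frames aligned to phones where K should be present ("K"), and the mean over frames where K should be absent ("not K"), while frames of phones for which K is irrelevant are skipped. The sketch below assumes exactly that reading, including that the "not K" values are raw P(K|Xt) (the low example values suggest so); the function name and data layout are hypothetical.

```python
from collections import defaultdict


def phonological_features(frames, phone_specs):
    """Hypothetical PLF extraction.

    frames: list of (aligned phone, {feature: P(K | X_t)}) pairs
    phone_specs: phone -> {feature: True (present) / False (absent)};
                 features not listed for a phone are irrelevant and skipped
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phone, probs in frames:
        for feature, present in phone_specs[phone].items():
            key = feature if present else "not " + feature
            sums[key] += probs[feature]  # assumed: raw P(K|X_t) in both cases
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


specs = {
    "p": {"burst": True, "voiced": False},               # "back" irrelevant for /p/
    "o": {"back": True, "voiced": True, "burst": False},
}
frames = [
    ("p", {"burst": 0.6, "voiced": 0.1, "back": 0.3}),
    ("o", {"burst": 0.1, "voiced": 0.8, "back": 0.7}),
    ("o", {"burst": 0.2, "voiced": 0.6, "back": 0.9}),
]
plf = phonological_features(frames, specs)
```

For a well-articulating speaker the "K" values should be high and the "not K" values low.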
Extraction of context-dependent phonological features (CD-PLF)
• How well is a change in a PLF realized?
  • use the PLF target in the preceding/succeeding phone as context
  • binary features → two values for the target (present/absent)
  • binary features → restricted number of left and right contexts
• The left or right context can be:
  • present, absent, not relevant, silence
• Model selection (preliminary):
  • at most 4 × 2 × 4 = 32 CD-PLFs per PLF → 32 × 24 = 768 in total
  • select only those CD-PLFs occurring at least twice in every test → 123 in total
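The counts on this slide follow from enumerating (left context, target value, right context) triples per PLF. A minimal sketch of the enumeration and of the "at least twice in every test" filter, assuming per-test occurrence counts are available as dicts; the label format, feature names and function names are hypothetical.

```python
from itertools import product

CONTEXTS = ("present", "absent", "not relevant", "silence")
TARGETS = ("present", "absent")


def cd_plf_labels(feature):
    """All (left context, target value, right context) combinations for one PLF."""
    return [f"{left} | {feature}={target} | {right}"
            for left, target, right in product(CONTEXTS, TARGETS, CONTEXTS)]


def select_frequent(counts_per_test, min_count=2):
    """Keep CD-PLFs that occur at least min_count times in every test.

    counts_per_test: list of dicts label -> number of occurrences in one test
    """
    candidates = set().union(*counts_per_test)
    return sorted(label for label in candidates
                  if all(c.get(label, 0) >= min_count for c in counts_per_test))


# 24 PLFs with hypothetical names K1..K24
all_labels = [lab for i in range(1, 25) for lab in cd_plf_labels(f"K{i}")]
```

The enumeration gives the upper bound of 768; the frequency filter is what reduces the set in practice (to 123 according to the slide, which depends on the actual test material and cannot be reproduced from the enumeration alone).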
Extraction of context-dependent phonological features (CD-PLF)
Speech aligner = ASR-ELIS
CD-PLF features
Intelligibility prediction model (IPM)
• Objective: map the speaker features (PMF, PLF, CD-PLF or combinations) to a speaker intelligibility score
• Model training:
  • train on DIA recordings
  • pathological speakers (+ some normal control speakers)
• Model type and size:
  • limited number of pathological speakers + high number of features → linear regression model with feature selection
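A linear regression model with feature selection can be sketched in plain Python as ordinary least squares plus greedy forward selection. Forward selection is one common choice, assumed here; the slides do not say which selection procedure was used. The function names are hypothetical, and for brevity the sketch selects on training error, whereas the deck chooses models by five-fold cross-validation.

```python
def fit_ols(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... via the normal equations.

    X: list of feature rows, y: list of targets. Returns [b0, b1, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular system (collinear features)")
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, row):
    """Apply fitted coefficients (intercept first) to one feature row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def _sse(X, y, cols):
    """Training sum of squared errors using only the given feature columns."""
    sub = [[row[c] for c in cols] for row in X]
    try:
        coefs = fit_ols(sub, y)
    except ValueError:
        return float("inf")  # collinear subset: never preferred
    return sum((t - predict(coefs, r)) ** 2 for r, t in zip(sub, y))


def forward_select(X, y, max_features=15):
    """Greedily add the feature that most reduces the training error."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining and len(chosen) < max_features:
        best = min(remaining, key=lambda c: _sse(X, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [2 * row[0] + 1 for row in X]  # depends on feature 0 only
```

Here `forward_select(X, y, max_features=1)` picks feature 0, and `fit_ols` recovers intercept 1 and slope 2. Capping `max_features` is what keeps the model small when there are few speakers and many candidate features.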
Reference material (DIA)
• 211 speakers:
  • 51 normal speakers
  • 60 with dysarthria
  • 12 with a cleft (children)
  • 42 hearing impaired
  • 37 with laryngectomy
  • 7 with dysphonia
  • 2 others
• Pathological speakers: mean score of 78.7%
• Normal speakers: mean score of 93.3%
• Few speakers with a very low score
Solving microphone issues
• Two microphones were used.
• The difference shows up in the cepstral means, so cepstral mean subtraction was performed.
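Cepstral mean subtraction works because a fixed linear channel (such as a microphone) multiplies the spectrum, which becomes an additive constant in the cepstral domain; subtracting the mean of each coefficient removes that constant. A minimal sketch, assuming the mean is computed per recording (the slides do not state the granularity); the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-recording mean of each cepstral coefficient.

    frames: list of equal-length lists, one cepstral vector per frame.
    """
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


frames = [[1.0, 2.0], [3.0, 6.0]]
# Simulate a different microphone as a constant cepstral offset
other_mic = [[c + 5.0 for c in frame] for frame in frames]
```

Both versions give the same normalized output, which is the point of the operation: the constant channel offset disappears. Note that any constant speaker-specific component is removed as well.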
Training / validation
• Models chosen with five-fold cross-validation
• Measure = standard deviation (STD): if the errors are normally distributed, about 68% of the computed scores lie within one STD of the perceptual score
• More features = more chance of overfitting
  • Rule of thumb: take 1 feature for every 10 training examples → restrict the number of features to a maximum of 15
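The validation measure can be sketched as the standard deviation of the held-out errors (objective minus perceptual score) collected over five folds. The fold assignment (random shuffle with a fixed seed) and the function names are assumptions; `fit` and `predict` stand for any model, such as a linear regression. With 211 speakers, each training split holds 168 or 169 speakers, so the one-feature-per-ten-examples rule allows about 16 features, consistent with the cap of 15.

```python
import random
import statistics


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_std(X, y, fit, predict, k=5, seed=0):
    """STD of held-out errors (predicted - perceptual) over k folds.

    fit(X_train, y_train) -> model; predict(model, row) -> score.
    """
    errors = []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += [predict(model, X[i]) - y[i] for i in fold]
    return statistics.pstdev(errors)


folds = kfold_indices(211)
```

Using the STD of the errors assumes the predictions are roughly unbiased; if they are not, the root-mean-square error would be the more faithful measure of how far the computed scores lie from the perceptual ones.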
Results: individual systems (STD)
• PMF (elis): 9.52
• PMF (esat): 8.57
Results: individual systems (STD)
• PLF (elis): 9.35
• CD-PLF (elis): 8.48
Results: all systems
• New models with CD-PLF outperform the old PLF models
• CD-PLFs form the best system with a single feature set
• PMF (esat) + CD-PLF is the best system with combined feature sets
• Using the three ELIS feature sets yields the next best result and needs only one recognizer (the simplest one) → less complex system
Results: combined system
• CD-PLF + PMF (esat): STD = 7.34
Results: pathology-specific IPM
• Instead of creating one general IPM, one can create IPMs for specific pathologies:
  • trained on all speakers (to have enough speakers)
  • model selection based on the performance on speakers with that pathology (the importance of features depends on the type of disorder)
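The selection step on this slide can be sketched as follows: every candidate model is trained on all speakers, and the one with the smallest error STD on the speakers of the target pathology is kept. The speaker record layout, the candidate dict and the function name are hypothetical.

```python
import statistics


def select_for_pathology(candidates, speakers, pathology):
    """Pick the candidate model that works best for one pathology.

    candidates: dict name -> predict(features) function, each trained on ALL speakers
    speakers: list of dicts with keys "features", "score", "pathology"
    Returns the name with the smallest STD of (predicted - perceptual) on that pathology.
    """
    subset = [s for s in speakers if s["pathology"] == pathology]

    def error_std(predict):
        return statistics.pstdev([predict(s["features"]) - s["score"] for s in subset])

    return min(candidates, key=lambda name: error_std(candidates[name]))


candidates = {"model_a": lambda f: f[0], "model_b": lambda f: 2 * f[0]}
speakers = [
    {"features": [10], "score": 20, "pathology": "dysarthria"},
    {"features": [20], "score": 40, "pathology": "dysarthria"},
    {"features": [30], "score": 30, "pathology": "normal"},
    {"features": [40], "score": 40, "pathology": "normal"},
]
```

Training on all speakers keeps the regression estimable, while the pathology-specific selection lets different feature subsets win for different disorders.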
Results: pathology-specific IPM (2)
• Very good match when CD-PLFs are involved
• New models with CD-PLF outperform the old PLF models
• CD-PLFs form the best system with a single feature set
• Using the three ELIS feature sets yields (almost) the best result and needs only one recognizer (the simplest one) → less complex system
Results: pathology-specific IPM
• Dysarthria: STD = 6.32 (red circles)
• The dispersion of the other speakers increases
• Largest deviations in the low-intelligibility area:
  • scarce data in that area
  • can be solved by giving more weight to patients with very low intelligibility
Conclusions and future work
• PMF, PLF and CD-PLF can predict the intelligibility of pathological speech:
  • CD-PLFs seem to play an important role:
    • STD = 7.34 for the general model combining CD-PLF and PMF (esat)
    • STDs below 6.32 for the pathology-specific model using the 3 ELIS feature sets
    → not the articulation pattern but the change in the articulation pattern matters?
  • More research is needed before adding this feature set to the tool
• Results on the validation set are comparable to human inter-rater agreement.
• Future work:
  • more thorough articulatory assessment, which is directly related to determining the appropriate therapy
  • monitoring the effectiveness of the chosen therapy
  • using more natural speech (words, phrases) in the tests
Questions?