1 / 13

Modern speech synthesis: communication aid personalisation

Modern speech synthesis: communication aid personalisation. Sarah Creer Stuart Cunningham Phil Green Clinical Applications of Speech Technology University of Sheffield. Introduction. Building voices for VIVOCA (communication aids) Speech synthesis techniques

dalbaugh
Download Presentation

Modern speech synthesis: communication aid personalisation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modern speech synthesis: communication aid personalisation Sarah Creer Stuart Cunningham Phil Green Clinical Applications of Speech Technology University of Sheffield

  2. Introduction • Building voices for VIVOCA (communication aids) • Speech synthesis techniques • Future research: personalisation of synthetic voices

  3. Current speech synthesis: communication aids • High quality voices available • E.g. Toby Churchill Lightwriter • DECtalk™ (Fonix) for American English • Acapela for British English • Personalisation limited: age, gender, language

  4. Personalisation • Voice = identity • Gender • Age • Geographic background • Socio-economic background • Ethnic background • As that individual • Maintains social relationships • Maintains social closeness • Sets group membership

  5. VIVOCA • Voice Input Voice Output Communication Aid Dysarthric speech input Recognised text Speech Recognition Text-to-Speech Synthesis Intelligible and personalised synthesised speech output Intelligiblesynthesised speech output • Retain elements of clients’ identity for • synthesised speech output

  6. VIVOCA: personalisation • Sheffield/Barnsley user group • Retain local accent • geographic identity • Speaker database • Arctic database: 593 + 20 sentences • Professional local speakers • Ian McMillan • Christa Ackroyd

  7. Concatenative synthesis Festvox: http://festvox.org/ Speech recordings Unit segmentation Input data i a sh Unit database Text input Synthesised speech Unit selection Concatenation + smoothing … + + + …

  8. Concatenative synthesis • High quality • Natural sounding • Sounds like original speaker • Need a lot of data (~600 sentences) • Can be inconsistent • Difficult to manipulate prosody

  9. HMM synthesis y e s yes

  10. HMM synthesis procedure HTS http://hts.sp.nitech.ac.jp/ Speech recordings Text input Input data e Training Synthesis t Synthesised speech Speaker model

  11. HMM synthesis • Consistent • Intelligible • Needs relatively little input (~20 mins) • Can be adapted with small amount of data (>5 sentences) • Easier to manipulate • Buzzy quality • Less natural than concatenative

  12. Future research • Further personalisation for individuals with progressive speech disorders • Capturing the essence of a voice • Voice banking • Before deterioration • Adaptation using HMM synthesis • Before or during deterioration

  13. Thank you This work is sponsored by EPSRC Doctoral Training grant For further details of VIVOCA see: http://www.shef.ac.uk/cast/ Email: S.Creer@dcs.shef.ac.uk

More Related