
Data-driven approach to rapid prototyping Xhosa speech synthesis

Albert Visagie and Justus Roux, Centre for Language and Speech Technology, Stellenbosch University, South Africa.


Presentation Transcript


  1. Data-driven approach to rapid prototyping Xhosa speech synthesis. Albert Visagie, Justus Roux. Centre for Language and Speech Technology, Stellenbosch University, South Africa.

  2. Introduction • Japan-South African Intergovernmental Science and Technology Cooperation Programme. • Goals: • Understand what is needed from a linguistic and technology standpoint. • Build a text-analysis front-end. • Establish an experimental platform.

  3. Outline • Xhosa: • orthography, • phonetics, • tone • Approach: • Text analysis, • HTS.

  4. Xhosa • Xhosa is spoken in South Africa by about 8 million people. • It is one of the official languages of South Africa. • The writing system is relatively young and uses the Latin alphabet. • Many dialects. • Borrowed clicks from Khoisan.

  5. Xhosa: Orthography Agglutinative language. Nouns: • 15 classes (including plural & singular). • Nouns affixed for diminutive. Verbs: • Verbs affixed according to subject, tense, negative, etc. Examples: teach: -fund- • preacher (teacher): umfundisi → u + m(u) + fund + is + i • small preacher: umfundisana → u + m(u) + fund + is + ana • He/she will teach them: uzakubafundisa → u + za + ku + ba + fund + is + a
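
The segmentations on this slide can be checked mechanically. The sketch below is not part of the presented system; it assumes that a parenthesised letter such as the "(u)" in "m(u)" marks a vowel that is dropped in the surface form, and the function name `compose` is hypothetical.

```python
import re

def compose(segmentation: str) -> str:
    """Join a 'u + m(u) + fund + is + i' style segmentation into one word."""
    morphemes = [m.strip() for m in segmentation.split("+")]
    # Assumption: material in parentheses is an elided vowel, so drop it.
    surface = [re.sub(r"\([^)]*\)", "", m) for m in morphemes]
    return "".join(surface)

# The three examples given on the slide.
examples = {
    "u + m(u) + fund + is + i": "umfundisi",
    "u + m(u) + fund + is + ana": "umfundisana",
    "u + za + ku + ba + fund + is + a": "uzakubafundisa",
}

for segmentation, word in examples.items():
    print(segmentation, "->", compose(segmentation), compose(segmentation) == word)
```

Going the other way, from surface word to morphemes, is the hard and ambiguous direction that the morphological analyser has to solve.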

  6. Xhosa: Phonetics Consonants: • Implosive /b/ • Ejectives and aspirated versions of stops. • 15 clicks. Vowels: • Five basic vowels, including long versions.

  7. Xhosa: Tone • According to the literature, it’s a tone language. • High, Low, and Falling tones. • A recent dictionary has tone marked for root morphemes; rules can be constructed to predict tone movement under morphological composition. • Recent work: • Downing and Roux argue for accent. • Kuun: a statistical experiment suggests highly regular structure. • The observed regularity in pitch rises and duration increases gives a simple method to use in a first prototype.

  8. Approach Focus on language-dependent components: • Build the text analyser, • use an existing synthesiser. Choice: HTS 2.0 • Model-driven, trainable synthesiser. • Contains language-independent F0 and duration models. • Makes good use of the synthesis database by predicting spectrum, F0 and segment duration separately.

  9. HTS

  10. HTS: Symbolic Features Each segment of audio (HMM state) is labelled according to its linguistic context. Examples: • Phonetic context: labels of preceding and following phones. • Parts-of-speech. • Stress or canonical tone. • Counting.
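
To make the idea of context labels concrete, here is a minimal sketch that attaches the preceding phone, the following phone, and two position counts to each phone. The label layout `prev-cur+next@forward_backward` and the function name `context_labels` are simplifications chosen for illustration; this is not the full HTS label format, and the boundary symbol `sil` is an assumption.

```python
def context_labels(phones, boundary="sil"):
    """Return one context-dependent label per phone.

    Each label records the previous and next phone plus the phone's
    position counted from the start and from the end of the sequence.
    """
    padded = [boundary] + list(phones) + [boundary]
    n = len(phones)
    labels = []
    for pos in range(1, n + 1):
        prev_p, cur_p, next_p = padded[pos - 1], padded[pos], padded[pos + 1]
        forward = pos           # 1-based position from the start
        backward = n - pos + 1  # 1-based position from the end
        labels.append(f"{prev_p}-{cur_p}+{next_p}@{forward}_{backward}")
    return labels

print(context_labels(["u", "m", "f"]))
```

Further features such as part-of-speech or tone would be appended to the same label string in the same way.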

  11. Text Analyser Components • Orthographic to phonetic • Morphological analysis • Parts-of-speech • Canonical tone marks

  12. Orthographic to Phonetic • The orthography is very young, and highly consistent with the pronunciation. • Hand-written letter-to-sound rewrite rules. • Lexicon for loan words.
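
Because the orthography is so regular, letter-to-sound conversion can be sketched as greedy longest-match rewriting with a lexicon override for loan words. The rule table below is a small illustrative subset (the three basic click letters and a few digraphs), not the project's rule set; `to_phones`, `RULES` and the `lexicon` argument are hypothetical names.

```python
# Illustrative subset of grapheme-to-phone rules; letters with no rule map to themselves.
RULES = {
    "xh": "ǁʰ", "ph": "pʰ", "th": "tʰ", "kh": "kʰ",
    "sh": "ʃ", "ny": "ɲ", "hl": "ɬ",
    "c": "ǀ", "q": "ǃ", "x": "ǁ",
}
MAX_LEN = max(len(grapheme) for grapheme in RULES)

def to_phones(word, lexicon=None):
    """Convert a word to a phone list; lexicon entries (loan words) take priority."""
    word = word.lower()
    if lexicon and word in lexicon:
        return list(lexicon[word])
    phones = []
    i = 0
    while i < len(word):
        # Try the longest grapheme first so "xh" wins over "x".
        for n in range(MAX_LEN, 0, -1):
            chunk = word[i:i + n]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: pass the letter through unchanged.
            phones.append(word[i])
            i += 1
    return phones

print(to_phones("xhosa"))
```

Trying longer graphemes first is what keeps digraphs such as "xh" from being read as a plain click followed by "h".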

  13. Morphology • Specially bootstrapped from a Zulu version for this project. • Requires a lexicon of root morphemes. • Works with isolated words. • Ambiguous! • Ideal: root morpheme boundaries, affix types, POS tagger for disambiguation. • Implemented: None

  14. Parts-of-Speech • Morphological analysis. • Ideal: POS tagger. • Implemented: Exhaustive lists of closed sets – pronouns, conjunctions, prepositions, etc.
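
The closed-set approach amounts to a table lookup. In the sketch below the word lists hold only a handful of illustrative entries, not the exhaustive lists used in the project, and the tag names (`PRON`, `CONJ`, `OPEN`) and the function `tag` are assumptions.

```python
# Tiny illustrative closed-class lists; the real lists would be exhaustive.
CLOSED_CLASSES = {
    "PRON": {"mna", "wena", "yena", "thina", "nina"},
    "CONJ": {"kodwa", "kwaye", "okanye"},
}

def tag(word):
    """Return the closed-class tag for a word, or 'OPEN' if it is in no list."""
    w = word.lower()
    for pos, members in CLOSED_CLASSES.items():
        if w in members:
            return pos
    return "OPEN"

print([(w, tag(w)) for w in ["Thina", "kodwa", "umfundisi"]])
```

Everything that falls through to `OPEN` (nouns, verbs and so on) is what a proper POS tagger or the morphological analyser would eventually need to resolve.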

  15. Tone • A printed dictionary with canonical tone markings for root morphemes is available. • Rules can be constructed to determine movement of at least High tones, under morphological composition. • Highly regular structure: 3rd-from-last syllable starts high pitch excursion, 2nd-from-last syllable lengthened. • Ideal: Exhaustive specification of set tones • Implemented: Word-level syllable counts (3-1, 2-2, 1-3)
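
The regularity described on this slide (pitch excursion starting on the third-from-last syllable, lengthening on the second-from-last) can be written down directly. This is a sketch of that rule only, not the implemented feature set; it assumes the word has already been split into syllables, and `prosody_marks` is a hypothetical name.

```python
def prosody_marks(syllables):
    """Mark the antepenultimate syllable for a pitch rise and the penultimate as lengthened."""
    marks = [
        {"syllable": s, "pitch_rise": False, "lengthened": False}
        for s in syllables
    ]
    if len(marks) >= 3:
        marks[-3]["pitch_rise"] = True   # 3rd-from-last starts the high pitch excursion
    if len(marks) >= 2:
        marks[-2]["lengthened"] = True   # 2nd-from-last is lengthened
    return marks

# Assumed syllabification of "umfundisi", used only as example input.
for mark in prosody_marks(["u", "m", "fu", "ndi", "si"]):
    print(mark)
```

Short words simply receive fewer marks: a two-syllable word gets only the lengthening, and a one-syllable word gets neither.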

  16. Tests • Basic intelligibility test: listeners are asked to transcribe what they hear. • Incomplete phrases. • Two versions of the question set, and natural utterances (recoded). • Mother-tongue and second-language speakers. • Impressions: • “He’s from the townships.” • “That’s perfect, there’s nothing wrong with that.” • Also frowns and repeats.

  17. Next Steps • Comprehension test? • Impressions. • Baseline comparative/preference test. • Improvements • Question phrases. • Information from morphological analysis. • Canonical tone markings. • Zulu

  18. Conclusion • The system worked very well, considering the bare minimum of knowledge currently incorporated. • The data-driven approach with HTS is well suited to bootstrapping a new language. • An experimental platform is now in place.

  19. Demos • “Ubangele amadoda amaninzi kule lali,” (natural and synthesised versions) • “waqalisa ukunqwenela ukuba nomzi.” (natural and synthesised versions) • Click song
