Internationalising SSML
Perspectives from the Local Language Speech Technology Initiative
Ksenia Shalonova & Roger Tucker
Outside Echo Ltd
Local Language Speech Technology Initiative
Mission: provide tools, support and training for developing speech and language systems for indigenous languages, mainly in developing countries.
• kiSwahili: Ukimkirimu Mola wako hukosi fungu lako. (If you are generous to your God, you will not miss your share.)
• isiZulu: IMalaria idalwa amagciwane ahlasela igazi lomuntu. (Malaria is caused by parasites that infect human red blood cells.)
• Hindi: Maleria kithnaa gambheer hai? (How serious is malaria?)
Local Language TTS (Text to Speech)
• UK: Outside Echo, Bristol; CSTR, Edinburgh
• Germany: University of Bielefeld
• South Africa: Meraka, Pretoria
• India: IIIT Hyderabad; HP Labs; IISc Bangalore
• Nigeria: University of Uyo
• Kenya: University of Nairobi
Decomposition of a Word into its Constituents (1)
• Required for the following TTS modules:
  • Proper grapheme-to-sound rules (agglutinating languages)
  • Proper tone assignment (agglutinating tonal Bantu languages)
• <morph decompose_as="dep+pe">deppe</morph>
  (Ibibio reduplication: dep, “buying”, and deppe, “not buying”)
• The following Turkish word is quite easily decomposed into its constituents:
  osman+li+las+tir+ama+yabil+ecek+ler+imiz+den+mis+siniz+cesine
  (“as if you were of those whom we might consider converting into an Ottoman”)
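A minimal sketch of how a TTS front end might read the proposed decomposition markup. The <morph> element and its decompose_as attribute come from the slide above; the function name split_morph and the fallback to the whole word when the attribute is absent are assumptions made for illustration, not part of any existing SSML processor.

```python
import xml.etree.ElementTree as ET


def split_morph(markup: str) -> tuple[str, list[str]]:
    """Return the surface word and its morphemes from a <morph> element."""
    element = ET.fromstring(markup)
    if element.tag != "morph":
        raise ValueError(f"expected <morph>, got <{element.tag}>")
    surface = element.text or ""
    # Assumption: without decompose_as, the word is treated as one morpheme.
    morphemes = element.get("decompose_as", surface).split("+")
    return surface, morphemes


surface, morphemes = split_morph('<morph decompose_as="dep+pe">deppe</morph>')
print(surface, morphemes)  # deppe ['dep', 'pe']
```

Grapheme-to-sound or tone rules could then be applied per morpheme rather than to the unanalysed word.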
Decomposition of a Word into its Constituents (2)
(Schwa Deletion in Hindi Compound Words)
• Each consonant in Hindi carries an inherent schwa, which can be either deleted or preserved in pronunciation. To provide proper schwa deletion rules, the following decompositions are required:
  • Decomposition of Hindi compound words into single words
  • Decomposition of Hindi non-compound words into morphemes
• lok (“public”) + sabhA (“gathering”) => loksəbhA (“lower house of the parliament”)
• <morph decompose_as="lok+sabhA">loksabhA</morph>
• Another option for specifying Hindi schwa deletion could be <say-as>
Decomposition of a Word into its Constituents (3)
(Moving Lexical Stress in Russian)
Proper lexical stress assignment is based on the stem type and the morphological class. Possible solutions in SSML annotation:
1. Adapt a morphological lexicon (a pronunciation lexicon is not helpful, as the number of word forms in Russian is enormous).
2. Decompose a word into its constituents. For naïve speakers, decomposing words into constituents is much more difficult in inflecting languages than in agglutinating languages.
3. Insert an explicit <lexical_stress> tag, which would be the easiest way of handling moving lexical stress. E.g. b<lexical_stress>e</lexical_stress>gal
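The explicit stress tag can be read back with a few lines of code. The sketch below assumes one <lexical_stress> element per word and wraps the fragment in a hypothetical <w> element so that it parses as XML; the function name stressed_word is likewise illustrative.

```python
import xml.etree.ElementTree as ET


def stressed_word(fragment: str) -> tuple[str, int]:
    """Return the plain word and the character index where stress begins."""
    # Wrap the fragment so the mixed text and markup parse as one element.
    root = ET.fromstring(f"<w>{fragment}</w>")
    stress = root.find("lexical_stress")
    if stress is None:
        raise ValueError("no <lexical_stress> element found")
    before = root.text or ""
    word = before + (stress.text or "") + (stress.tail or "")
    return word, len(before)


print(stressed_word("b<lexical_stress>e</lexical_stress>gal"))  # ('begal', 1)
```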
Prosody (General Remarks)
• Prosody is mainly realised at the syllable level.
• Proper assignment of prosodic features requires tags either for all syllables or only for particular syllables.
• A tag for a particular syllable:
  good <syllable prosody_rate="+10%" stress="yes" emphasis_level="strong">mor</syllable>ning
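A sketch of how the per-syllable attributes might be extracted. The <syllable> element and its three attributes are taken from the slide; the wrapper element <s>, the function names, the default values, and the reading of "+10%" as a multiplier of 1.1 are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def parse_rate(value: str) -> float:
    """Convert a relative rate such as "+10%" into a multiplier such as 1.1."""
    return 1.0 + float(value.rstrip("%")) / 100.0


def syllable_prosody(fragment: str) -> list[dict]:
    """Collect the prosodic settings of every tagged syllable in a fragment."""
    root = ET.fromstring(f"<s>{fragment}</s>")
    result = []
    for syl in root.iter("syllable"):
        result.append({
            "text": syl.text or "",
            # Assumed defaults: unchanged rate, no stress, no emphasis.
            "rate": parse_rate(syl.get("prosody_rate", "+0%")),
            "stress": syl.get("stress", "no") == "yes",
            "emphasis": syl.get("emphasis_level", "none"),
        })
    return result


example = ('good <syllable prosody_rate="+10%" stress="yes" '
           'emphasis_level="strong">mor</syllable>ning')
tagged = syllable_prosody(example)
print(tagged[0]["text"], tagged[0]["stress"], tagged[0]["emphasis"])
```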
Prosody (African Tonal Languages)
1. Lexical tones (tones function as phonemes). These require phonemic tone markup as used for Mandarin.
1.1. Floating tone (a morpheme that contains only tone)
<tone floating_tone="yes">ba</tone>
2. Grammatical tones (tones define grammatical categories).
<morph decompose_as="dep+pe" tone="h+l">deppe</morph>
3. Terraced tones (realisations of grammatical tones on the basis of a finite-state model).
<morph decompose_as="dep+pe" terrace_pos="1+2">deppe</morph>
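The grammatical-tone example pairs each morpheme in decompose_as with one entry in tone. A minimal sketch of that alignment follows; the function name align_tones and the strict length check are assumptions, and "h" and "l" are presumably high and low tone.

```python
import xml.etree.ElementTree as ET


def align_tones(markup: str) -> list[tuple[str, str]]:
    """Pair each morpheme of a <morph> element with its tone label."""
    element = ET.fromstring(markup)
    morphemes = element.get("decompose_as", "").split("+")
    tones = element.get("tone", "").split("+")
    # Assumption: every morpheme must carry exactly one tone label.
    if len(morphemes) != len(tones):
        raise ValueError("decompose_as and tone have different lengths")
    return list(zip(morphemes, tones))


pairs = align_tones('<morph decompose_as="dep+pe" tone="h+l">deppe</morph>')
print(pairs)  # [('dep', 'h'), ('pe', 'l')]
```

The same pairing would apply to terrace_pos, with positions in place of tone labels.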
Dialects and Styles
1. Tags for the dialects
<lang= "kiSwahili" region="Comoros Islands">
<lang= "kiSwahili" dialect="Kingozi" normative="yes">
2. Tags for the styles
Ibibio culture requires the TTS to speak in a gentle voice. Is the available attribute volume="soft" enough?
<voice type="gentle">
Summary
1. Decomposition of words into either morphemes or syllables is required for:
  • tone assignment
  • pitch & duration assignment
  • grapheme-to-phoneme rules
2. Moving lexical stress may need to be tagged explicitly
3. Dialects and styles need to be supported