MAI Internship April-May 2002
What?
• The AST Project promotes the development of speech technology for the official languages of South Africa
• South African English (SAE), Afrikaans, Zulu, Xhosa, Sesotho
• Create reusable databases and software
• Prototype: hotel booking dialogue system
• 2000-2003
AST dialogue system: basics
[Diagram: Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Database, Speech Synthesis]
AST Speech Database
• Use?
  • ASR input: acoustic training
  • ASR output: dictionary
• Start from scratch, even for SAE
• Telephone data based on SpeechDat
• Datasheet utterances
• Hierarchical recruiting method
• Labeling tool: PRAAT
No.  Language Spoken  Speech variety           Code  No. of Speakers
1    English (E)      Total                          1500-2000
                      Mother-tongue English    EE    300-400
                      Black English            BE    300-400
                      Coloured English         CE    300-400
                      Asian English            ASE   300-400
                      Afrikaans English        AE    300-400
2    isiXhosa (X)                              XX    300-400
3    Sesotho (S)                               SS    300-400
4    isiZulu (Z)                               ZZ    300-400
5    Afrikaans (A)    Total                          900-1200
                      Mother-tongue Afrikaans  AA    300-400
                      Black Afrikaans          BA    300-400
                      Coloured Afrikaans       CA    300-400
AST Speech Database
Acoustic signal
→ (manual labour) Orthographic annotation
→ (rules & dictionary: Patana) Phonemic transcription
→ (forced alignment: HTK) Phonetic alignment
AST Speech Recognition
• Difficult:
  • Speaker-independent, noisy conditions
  • Medium-size vocabulary (10,000 words)
  • Sparse training data
• Not so difficult:
  • Dialogue Manager helps
  • Phoneme-based HMMs (diphones in future)
  • Finite-state language model
  • Pitch and clicks of the African languages are ignored
AST Natural Language Understanding
• Same finite-state network as the recogniser's language model
• +: all utterances are ‘understood’
• -: finite-state grammars (FSGs) are limited
• It makes no sense to recognise more than we can understand
• Semantic labels are activated
• Alternative: robust parsing (Phoenix, ATIS)
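To make the shared-network idea concrete, here is a minimal Python sketch of a finite-state grammar whose arcs carry optional semantic labels. The states, words, and the `ROOM_TYPE` label are invented for illustration and are not taken from the AST grammars; the point is only that an utterance accepted by the network is "understood" by construction, because the labels are collected while walking the same arcs.

```python
# Toy finite-state grammar (all states, words and labels are hypothetical).
# Each arc is (word, next_state, semantic_label_or_None).
FSG = {
    "start": [("a", "room", None), ("one", "room", None)],
    "room": [
        ("single", "type", ("ROOM_TYPE", "single")),
        ("double", "type", ("ROOM_TYPE", "double")),
    ],
    "type": [("room", "end", None)],
}
FINAL_STATES = {"end"}


def parse(utterance):
    """Walk the network word by word.

    Returns the activated semantic labels as a dict, or None when the
    utterance is not covered by the grammar.
    """
    state = "start"
    meaning = {}
    for word in utterance.lower().split():
        for arc_word, next_state, label in FSG.get(state, []):
            if arc_word == word:
                if label is not None:
                    key, value = label
                    meaning[key] = value  # label is activated on this arc
                state = next_state
                break
        else:
            return None  # no arc matches: outside the grammar
    return meaning if state in FINAL_STATES else None
```

For instance, `parse("a double room")` yields the `ROOM_TYPE` label, while an out-of-grammar utterance such as "a big room" is rejected outright, which illustrates the limitation noted on the slide.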
AST Natural Language Understanding
[Diagram: Speech Recognition, NLU (with FSG) and Dialogue Manager; arrow labels: Grammar ID, Grammar ID, Recognised utterance, Meaning]
AST Natural Language Understanding
• Embedded semantic tags:
  • ‘drie honderd duisend agt en neëntig’ (three hundred thousand and ninety-eight)
  • Digits: 3 0 0 0 9 8
  • t1=3 t2=0 t3=0
  • V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
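The digit-slot idea can be sketched in Python as follows. This is an illustration under assumptions, not the AST grammar itself: it reads V6 to V1 as the six decimal positions of the number, treats the words before "duisend" as a separate three-digit group (one plausible reading of t1 to t3), and covers only a small vocabulary (units, round tens, "honderd", "duisend", "en"); teens and "tien" are deliberately left out. The function names are hypothetical.

```python
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"twintig": 2, "dertig": 3, "veertig": 4, "vyftig": 5, "sestig": 6,
        "sewentig": 7, "tagtig": 8, "neëntig": 9, "negentig": 9}


def group_digits(words):
    """Return [hundreds, tens, units] for a word group below 1000."""
    hundreds = tens = units = 0
    pending = 0  # last unit word seen, not yet assigned to a position
    for word in words:
        if word in UNITS:
            pending = UNITS[word]
        elif word == "honderd":
            hundreds = pending or 1  # bare 'honderd' means one hundred
            pending = 0
        elif word in TENS:
            tens = TENS[word]
            units = pending  # in 'agt en neëntig' the unit comes first
            pending = 0
        elif word == "en":
            continue
        else:
            raise ValueError(f"word not in toy vocabulary: {word}")
    if pending:
        units = pending
    return [hundreds, tens, units]


def tag_number(utterance):
    """Map a spoken number below one million to the slots V6..V1."""
    words = utterance.lower().split()
    if "duisend" in words:
        i = words.index("duisend")
        # Bare 'duisend' means one thousand.
        thousands = group_digits(words[:i]) if i > 0 else [0, 0, 1]
        rest = group_digits(words[i + 1:])
    else:
        thousands, rest = [0, 0, 0], group_digits(words)
    digits = thousands + rest
    # V6 is the hundred-thousands position, V1 the units position.
    return {f"V{6 - k}": d for k, d in enumerate(digits)}
```

Calling `tag_number("drie honderd duisend agt en neëntig")` fills the slots with the same values as the example on the slide (V6=3, V5=0, V4=0, V3=0, V2=9, V1=8).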
AST Dialogue Manager
• Trade-off: naturalness versus response restriction
• System-directed: predictable user utterances, simple dialogues
• Mixed-initiative: shorter dialogues, more recognition errors
• User-initiative: unpopular
AST Dialogue Manager
• Design:
  • Early focus on users and task
  • Wizard-of-Oz: pay no attention to the man behind the curtain
  • System-in-the-loop
  • Finite-state structure, chosen for simplicity and functionality
  • Possibly a frame-based approach in future
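A finite-state, system-directed dialogue of the kind described above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the state names, prompts, slots, and retry limit are hypothetical, and the `understand` callback merely stands in for the telephone, recognition, and NLU components.

```python
# Hypothetical dialogue graph: each state has a prompt, a slot to fill,
# and a single successor state (system-directed, no user initiative).
DIALOGUE = {
    "ask_date": {"prompt": "On what date would you like to arrive?",
                 "slot": "date", "next": "ask_nights"},
    "ask_nights": {"prompt": "How many nights would you like to stay?",
                   "slot": "nights", "next": "ask_room"},
    "ask_room": {"prompt": "Would you like a single or a double room?",
                 "slot": "room_type", "next": "confirm"},
}


def run_dialogue(understand, start="ask_date", max_retries=2):
    """Drive the dialogue from `start` until the 'confirm' state.

    `understand(state, prompt)` plays the prompt, listens, and returns
    the understood value, or None when nothing was understood.
    Returns the filled slots, or None after repeated failures.
    """
    slots = {}
    state = start
    while state != "confirm":
        node = DIALOGUE[state]
        for _ in range(max_retries + 1):
            value = understand(state, node["prompt"])
            if value is not None:
                slots[node["slot"]] = value
                break
        else:
            return None  # every attempt in this state failed: give up
        state = node["next"]
    return slots
```

Because each state asks exactly one question, the set of expected answers per state stays small, which is the predictability benefit of a system-directed design; a frame-based approach would instead let the caller fill several slots in one utterance.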
AST Speech Synthesis
• Fixed machine utterances: pre-recorded speech
• Database queries: limited-domain synthesis (Festival platform)
Conclusion
• Finite-state approach in:
  • Recogniser
  • NLU component
  • Dialogue manager
• Workable prototype
• New funding in 2003