Language Resources and CALL Applications
Helmer Strik (1), Jozef Colpaert (2), Joost van Doremalen (1), and Catia Cucchiarini (1)
(1) Centre for Language and Speech Technology (CLST), Dept. of Linguistics, Radboud University Nijmegen, The Netherlands
(2) Linguapolis, University of Antwerp, Antwerp, Belgium
Language Resources and CALL • This presentation: • the relation between language resources and CALL systems • CALL: Computer Assisted Language Learning • Our focus here is the DISCO project: • Development and Integration of Speech technology into COurseware for language learning LREC 2010, Malta, 22-05-2010
Overview • A short introduction to DISCO • Resources used to develop a CALL system • Resources obtained during development of a CALL system • Resources obtained using a CALL system • Conclusions (image: Dr. Spraak, "Dr. Speech")
A short introduction to DISCO • DISCO project: • develop a prototype of a CALL system that can give feedback on spoken utterances • Feedback levels: • pronunciation (of sounds) • grammar (syntax & morphology)
Syntax exercise
Morphology exercise
Pronunciation exercise – with feedback
Menu: conversation environment report. The learner listens to their own speech in the complete conversation.
Menu: conversation environment report. The learner reviews pronunciation mistakes by listening to their own speech.
Menu: remediation environment. Overall scores per phoneme; the learner can start remediation by clicking on a phoneme.
Menu: remediation environment, pronunciation exercise
Menu: remediation environment. The learner reviews their progress.
Characters in DISCO
ASR: Automatic Speech Recognition (figure: a Decoder uses Acoustic Models, a Lexicon and a Language Model to convert the input speech signal into output words W1 W2 W3 W4)
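The decoder in the figure combines the three knowledge sources to pick the most likely word sequence. A minimal sketch of that idea, assuming we only rescore a handful of hypothetical candidate sentences with invented acoustic log-likelihoods and a toy bigram language model (a real toolkit such as SPRAAK searches a large network instead):

```python
import math

# Invented acoustic log-likelihoods log P(O | W) for a few candidate word sequences.
ACOUSTIC_LOGLIK = {
    ("ik", "loop", "naar", "huis"): -42.0,
    ("ik", "lopen", "naar", "huis"): -41.5,
    ("ik", "loop", "naar", "thuis"): -44.0,
}

# Invented bigram language model log-probabilities log P(w_i | w_{i-1}).
BIGRAM_LOGPROB = {
    ("<s>", "ik"): math.log(0.2),
    ("ik", "loop"): math.log(0.1),
    ("ik", "lopen"): math.log(0.001),
    ("loop", "naar"): math.log(0.3),
    ("lopen", "naar"): math.log(0.3),
    ("naar", "huis"): math.log(0.2),
    ("naar", "thuis"): math.log(0.01),
}
UNSEEN_LOGPROB = math.log(1e-6)  # crude floor for bigrams not in the table


def lm_logprob(words):
    """Sum of bigram log-probabilities, starting from the sentence-start symbol."""
    history, total = "<s>", 0.0
    for w in words:
        total += BIGRAM_LOGPROB.get((history, w), UNSEEN_LOGPROB)
        history = w
    return total


def decode(acoustic_loglik, lm_weight=1.0):
    """Return the candidate maximising log P(O|W) + lm_weight * log P(W)."""
    return max(
        acoustic_loglik,
        key=lambda words: acoustic_loglik[words] + lm_weight * lm_logprob(words),
    )


print(" ".join(decode(ACOUSTIC_LOGLIK)))
```

With these numbers the language model pulls the result to "ik loop naar huis", although "ik lopen naar huis" has the better acoustic score (it wins with `lm_weight=0.0`). A strong language model can thus smooth away what a learner actually said, which illustrates why a separate, strict check of the form is useful in CALL.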
ASR-based CALL • ASR: Automatic Speech Recognition • Standard ASR: from (native) speech to words • ASR for CALL, 2 phases: • 1. Content: what has been said; tolerant • recognize the words despite non-native variation • 2. Form: how it has been said; strict • error detection: find deviations from native …
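The contrast between a tolerant first phase and a strict second phase can be sketched at the word level. This is a simplified illustration under stated assumptions, not DISCO's implementation: phase 1 maps a noisy recognition result onto the closest of a hypothetical list of predicted utterances, and phase 2 lists every word-level deviation from the target (real pronunciation error detection would work on sounds, not words).

```python
from difflib import SequenceMatcher


def phase1_content(hypothesis, predicted):
    """Tolerant: map a noisy recognition result onto the closest predicted utterance."""
    hyp = hypothesis.lower().split()
    return max(
        predicted,
        key=lambda utt: SequenceMatcher(None, hyp, utt.lower().split()).ratio(),
    )


def phase2_form(utterance, target):
    """Strict: list every word-level deviation of the utterance from the target."""
    ref, said = target.lower().split(), utterance.lower().split()
    deviations = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, ref, said).get_opcodes():
        if op != "equal":
            deviations.append((op, ref[i1:i2], said[j1:j2]))
    return deviations


# Hypothetical exercise: target sentence and predicted (correct and incorrect) responses.
TARGET = "Ik loop naar huis"
PREDICTED = ["Ik loop naar huis", "Ik lopen naar huis", "Ik naar huis lopen"]

recognised = phase1_content("ik lopen naar uis", PREDICTED)
print(recognised, phase2_form(recognised, TARGET))
```

Phase 1 accepts the imperfect "uis" and still settles on "Ik lopen naar huis"; phase 2 then refuses to let the wrong verb form pass.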
Resources used to develop a CALL system (1) • More general, native resources: • ASR toolkit – e.g. SPRAAK [from Stevin] • Corpus with native speech – e.g. Spoken Dutch Corpus (CGN) [from TST-Centrale] • Native lexicon – e.g. e-Lex [from TST-Centrale]
Resources used to develop a CALL system (2) • More specific, non-native resources (often not available) to develop and improve the 2 phases: • Phases 1 + 2: corpora with non-native speech: JASMIN [from Stevin]; CITO, Triest, Dutch-CAPT • Phase 1 (word recognition, content): resources and information to model non-native 'behavior', in order to improve: • Acoustic Models: mainly by training on non-native audio (from speech corpora) • Lexicon & Language Model: data-driven (from non-native audio) or knowledge-based (from the literature, etc.)
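One knowledge-based way to make the lexicon more tolerant is to add non-native pronunciation variants generated by rules. A minimal sketch, assuming hypothetical SAMPA-like transcriptions and three illustrative rules inspired by the errors "lop", "nar" and "guis"; these are not DISCO's actual lexicon entries or rules:

```python
from itertools import product

# Hypothetical canonical transcriptions (SAMPA-like phone symbols).
CANONICAL = {
    "loop": ("l", "o:", "p"),
    "naar": ("n", "a:", "r"),
    "huis": ("h", "9y", "s"),
}

# Hypothetical rules: canonical phone -> possible non-native realisations.
RULES = {
    "o:": ["o"],  # vowel shortening, cf. "lop" for "loop"
    "a:": ["a"],  # vowel shortening, cf. "nar" for "naar"
    "h": ["G"],   # /h/ realised as a velar fricative, cf. "guis" for "huis"
}


def variants(phones, rules):
    """All transcriptions obtained by optionally applying a rule at each position."""
    options = [[p] + rules.get(p, []) for p in phones]
    return [tuple(v) for v in product(*options)]


def expand_lexicon(lexicon, rules):
    """Map each word to its canonical transcription followed by rule-generated variants."""
    return {word: variants(phones, rules) for word, phones in lexicon.items()}


lexicon = expand_lexicon(CANONICAL, RULES)
for word, prons in lexicon.items():
    print(word, [" ".join(p) for p in prons])
```

Every added variant also makes words easier to confuse with each other, so in practice such variants are usually selected rather than added wholesale; data-driven methods would derive the rules from non-native audio instead.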
Resources used to develop a CALL system (3) • More specific, non-native resources (often not available) to develop and improve the 2 phases: • Phase 2: error detection (classifiers), strict • A. Decide which errors to address: criteria + selection => error inventory, data-driven and/or knowledge-based • B. Develop classifiers: train and test, data-driven • A & B, data-driven => resources needed: annotated audio • Annotation levels: • Pronunciation: sounds [& prosody, not in DISCO] • Grammar: syntax & morphology
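Step B (train and test a classifier on annotated audio) can be illustrated with the simplest possible data-driven classifier: a threshold on a per-phone score. The score and all data values below are hypothetical (lower scores are assumed to suggest a mispronunciation); the annotation labels play the role of the human annotations the slide calls for.

```python
# Hypothetical annotated data: (per-phone score, annotated as error?).
TRAIN = [(-8.0, True), (-6.5, True), (-5.0, True), (-4.0, False),
         (-3.5, True), (-2.0, False), (-1.0, False), (-0.5, False)]
TEST = [(-7.0, True), (-4.5, True), (-3.0, False), (-6.0, False), (-1.5, False)]


def classify(score, threshold):
    """Flag a phone as mispronounced when its score is at or below the threshold."""
    return score <= threshold


def train_threshold(data):
    """Pick the candidate threshold with the highest accuracy on the annotated data."""
    candidates = sorted({s for s, _ in data})

    def accuracy(t):
        return sum(classify(s, t) == is_err for s, is_err in data) / len(data)

    return max(candidates, key=accuracy)  # ties go to the lowest threshold


def evaluate(data, threshold):
    """Precision and recall of the error class on held-out annotated data."""
    tp = sum(1 for s, e in data if classify(s, threshold) and e)
    fp = sum(1 for s, e in data if classify(s, threshold) and not e)
    fn = sum(1 for s, e in data if not classify(s, threshold) and e)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


threshold = train_threshold(TRAIN)
print(threshold, evaluate(TEST, threshold))
```

Even this toy version shows why annotations are the bottleneck: without labelled examples there is nothing to choose the threshold on or to test it against.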
Resources obtained during development • Blueprint of the design • Content: • specifications for exercises and feedback strategies • a list of predicted correct and incorrect utterances • Modules for the 2 phases: 1. word recognition, 2. error detection • The CALL system itself: the whole system, a prototype with content
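A list of predicted correct and incorrect utterances can be seeded mechanically for a combined syntax and morphology exercise. A minimal sketch, assuming a hypothetical exercise in which the learner orders given chunks and chooses a verb form; how DISCO actually compiled its lists is not described here.

```python
from itertools import permutations


def predict_utterances(chunks, verb_index, verb_forms):
    """Every order of the chunks, combined with every candidate verb form.

    Assumes chunks[verb_index] == verb_forms[0], i.e. the chunks are given in the
    correct order with the correct verb form. That utterance is labelled True
    (correct); all other combinations are predicted incorrect responses (False).
    """
    target = " ".join(chunks)
    predictions = {}
    for form in verb_forms:
        filled = list(chunks)
        filled[verb_index] = form
        for order in permutations(filled):
            utterance = " ".join(order)
            predictions[utterance] = utterance == target
    return predictions


preds = predict_utterances(["Ik", "loop", "naar huis"], 1, ["loop", "lopen", "loopt"])
print(len(preds), [u for u, ok in preds.items() if ok])
```

Such lists grow combinatorially (here 3 forms × 6 orders = 18 utterances), so a real exercise would presumably prune them to the errors learners actually make, which is where non-native data comes in again.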
Resources obtained using a CALL system • Audio recordings • Log files: user + system 'behavior' • Videos
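Log files of system behavior can be turned into new resources, for example per-phoneme statistics like the overall phoneme scores in the remediation environment. A minimal sketch, assuming a hypothetical tab-separated log format with one line per judged phoneme; DISCO's real log format is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log excerpt (not DISCO's actual format): user, exercise, phoneme, verdict.
SAMPLE_LOG = """user\texercise\tphoneme\tverdict
s01\tpron-03\tG\terror
s01\tpron-03\t9y\tok
s01\tpron-04\tG\tok
s02\tpron-03\tG\terror
s02\tpron-03\t9y\terror
s02\tpron-04\ta:\tok
"""


def phoneme_scores(log_file):
    """Percentage of attempts judged correct per phoneme, worst phoneme first."""
    counts = defaultdict(lambda: [0, 0])  # phoneme -> [correct, total]
    for row in csv.DictReader(log_file, delimiter="\t"):
        entry = counts[row["phoneme"]]
        entry[0] += int(row["verdict"] == "ok")
        entry[1] += 1
    scores = {ph: 100.0 * ok / total for ph, (ok, total) in counts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


print(phoneme_scores(io.StringIO(SAMPLE_LOG)))
```

Aggregated over many learners, the same counts could feed research questions (which sounds are hardest for which learners) as well as system development (which classifiers need improvement).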
Conclusions • Language Resources • play an important role in relation to CALL systems • Language Resources • are needed to develop a CALL system • can be obtained during development of a CALL system • can be obtained by using a CALL system • The Language Resources obtained give rise to new opportunities for: • research • system development
THE END • DISCO website: • lands.let.ru.nl/~strik/research/DISCO/
Stevin project DISCO • Training speaking skills • pronunciation, morphology, syntax • Correct • Example: Ik loop naar huis ("I walk home") • Errors • Pronunciation: Ik lop nar guis • Morphology: Ik lopen naar huis • Syntax: Ik naar huis lopen • Detect errors automatically • using speech technology
Figure (system modules): Display Logic, Feedback Generation, Error Detection, Grading, Prompt Generator, Segmentation, Words, ASR
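One way to read the module names in the figure is as a chain of processing steps for each learner turn. The sketch below is an assumption: the order of the modules, the shared `Turn` state and all function names are hypothetical. Recognition is replaced by a given word string, and segmentation is a naive one-to-one word alignment.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Hypothetical state passed from module to module for one learner response."""
    target: list
    words: list = field(default_factory=list)
    segments: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    grade: float = 0.0
    feedback: str = ""


def prompt_generator(sentence):
    return Turn(target=sentence.lower().split())


def asr(turn, hypothesis):
    # Stand-in for real recognition: the recognised word string is passed in directly.
    turn.words = hypothesis.lower().split()
    return turn


def segmentation(turn):
    # Naive one-to-one alignment of target and recognised words (assumes equal length).
    turn.segments = list(zip(turn.target, turn.words))
    return turn


def error_detection(turn):
    turn.errors = [(t, w) for t, w in turn.segments if t != w]
    return turn


def grading(turn):
    turn.grade = 1.0 - len(turn.errors) / len(turn.target)
    return turn


def feedback_generation(turn):
    if not turn.errors:
        turn.feedback = "Well done!"
    else:
        turn.feedback = "; ".join(f"you said '{w}', expected '{t}'" for t, w in turn.errors)
    return turn


def display_logic(turn):
    return f"score {turn.grade:.2f}: {turn.feedback}"


def run(sentence, hypothesis):
    turn = asr(prompt_generator(sentence), hypothesis)
    for stage in (segmentation, error_detection, grading, feedback_generation):
        turn = stage(turn)
    return display_logic(turn)


print(run("Ik loop naar huis", "ik lopen naar huis"))
```

Keeping each module behind a small function boundary means that, for instance, a better error detector trained on new annotated data could replace `error_detection` without touching the rest of the chain.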