
HLT R&D in South Africa

HLT Collaboration between South Africa and the Low Countries Workshop, 24 November 2008, Noordhoek, South Africa.


Presentation Transcript


  1. HLT R&D in South Africa. HLT Collaboration between South Africa and the Low Countries Workshop, 24 November 2008, Noordhoek, South Africa

  2. Overview • Specific R&D challenges • Areas of active research • Text processing • Speech processing • Applications of HLT • Main projects: current and recent • Research institutions active in HLT • Main R&D sponsors

  3. Specific R&D Challenges • Incompleteness of basic linguistic knowledge • Scarcity of resources • Linguistic data • Technology components • Uniqueness of user populations and languages

  4. Research areas (1) • Text processing: • Computational morphological analysis, POS tagging • Spelling checkers, grammar checkers • Machine translation, machine-aided translation • Computational lexicography • Wordnets • Research focus: • Development of basic required components and tools • Data collection and corpus development • Technology transfer, cross-language learning, bootstrapping, language distances • Morphological analysis (MA) for agglutinative languages

  5. Research areas (2) • Speech processing: • ASR, TTS, spoken dialogue systems • Phonetic investigations for HLT • Speaker verification, S-LID • Speech tools (diarization, channel normalisation, speech detection) • Research focus: • Development of basic required components and tools • Data collection and corpus development • Technology transfer, cross-language learning, bootstrapping, language distances • Timing information in speech • Multi-accent and multilingual acoustic modelling • Higher order Markov models and other non-standard acoustic models

  6. Research areas (3) • Applications of HLT • Telephone-based information systems • Computer assisted language learning • Document proofing tools • Accessibility devices • Mobile devices

  7. Main R&D initiatives • Department of Arts and Culture (DAC): Applications that support multilingualism, especially related to government service delivery • DAC A: Spelling checkers • DAC B: Machine-aided translation • DAC C: Lwazi: Multilingual telephony-based information delivery • Department of Science and Technology (DST): Directed research in HLT aimed at addressing SA national priorities • National HLT Network projects • International collaborative projects • Various individual research projects

  8. Main R&D projects • Text processing: • Computational morphological analysis: Unisa • Spellcheckers: DAC A • Machine translation: EtsaTrans, DAC B • Speech: • Phonetic investigations: NHN PAST • ASR/TTS/spoken dialogue systems: • AST, Limpopo ASR • OpenPhone, Lwazi (DAC C) • Mobile E-learning for Africa (MELFA)

  9. UNISA Computational Morphological Analysis • Development of parsing tools for Bantu languages: • computational morphological analysers • disambiguators • syntactic parsers • Development of supporting resources for development and testing, including extensive underlying machine-readable lexicons • Status: • Initiated in 2002 (for isiZulu morphological analyser) • Various prototypes under development (isiZulu, isiXhosa, Siswati, isiNdebele, Northern Sotho and Setswana) • Extended until 2010 • Principal researchers: • Sonja Bosch (Project Leader), Laurette Pretorius • Ansu Berg, Axel Fleisch, Albert Kotze, Petro Kotze, Memezi Mfusi, Lydia Mojapelo, Rigardt Pretorius, Linda van Huyssteen, Biffy Viljoen • Sponsor: NRF

  10. DAC A: Spelling checkers for public administration domain • Development of spelling checkers for 10 official SA languages • Specifically for use in government departments. • Spelling checkers for isiNdebele, isiXhosa, isiZulu and Siswati include morphological analysers for effective spellchecking of these agglutinative languages • Status: • Final evaluation by client in progress • Principal researchers: • MJ Puttkammer (NWU), S Pilon (NWU), DJ Prinsloo (UP), SE Bosch (Unisa) • Sponsor: Department of Arts and Culture, CText

  11. EtsaTrans Machine Translation • Development of a functional machine translation system. • Focus domain: mainly administrative documents • Main languages: English to Afrikaans, Afrikaans to English • Other languages: English to Xhosa, English to Southern Sotho • Harvesting previously translated information to create parallel corpora • Status: • Initiated in 2003, ongoing • Prototypes in use • Principal researchers: • JA Naudé, L Jordaan • Sponsor: UFS

  12. DAC B: Machine-aided translation tools • Development of translation tools: • An integrated translation environment (ITE) • Word translators • Machine translation systems for three language pairs • Terminology management system • Document management system • Status: • Under development (2007-2010) • All tools, data and research output to be made available publicly • Principal researchers: • HJ Groenewald, S Pilon (NWU) • DJ Prinsloo (UP) • Sponsor: DAC

  13. NHN PAST: Phonetics for Advanced Speech Technology • Technology-orientated investigation and description of the vowel system of the Sotho languages and of tone in the Sotho and Nguni languages • Status: • Initiated May 2008 • Due for completion June 2009 • Principal researchers: • E. Barnard (Meraka) • B. Khoali (independent consultant) • D. Wissing (NWU) • S. Zerbian (Wits) • Sponsor: National HLT Network (DST/Meraka)

  14. African Speech Technologies (AST) • Development of a multilingual telephone-based hotel reservation system • Developed corpora and technology components (TTS, ASR, dialogue systems) for South African English (SAE), Afrikaans, isiZulu, isiXhosa and Sesotho • Status: • Completed 2004 • Gave rise to commercial company: Catchword • Data available for research purposes (release imminent) • Principal researchers: • J.C. Roux, E.C. Botha, J. du Preez • Various collaborators • Sponsor: • DACST (Innovation Fund)

  15. Limpopo ASR • Development of baseline automatic speech recognition systems for the major languages of the Limpopo Province • Languages: Sepedi (Sesotho sa Leboa), Setswana, Tshivenda and Xitsonga. • Telephone speech data collection and manual annotation • Extension to text-to-speech synthesis and domain-specific prototype dialogue systems • Status: • Baseline ASR systems completed (2004-2006) • Extension ongoing • Principal researchers: • HJ Oosthuizen and MJD Manamela • Sponsor: Telkom and other industry partners

  16. OpenPhone • Demonstrated use of telephone-based information services in providing health information in a rural setting • Automated health information system that provides information to caregivers looking after HIV-positive children living in the vicinity of Gaborone in Botswana • Includes Setswana TTS and ASR development • Status: • Completed 2008, currently live • http://www.meraka.org.za/hlt_projects_ophone.htm • Principal researchers: • Etienne Barnard, Marelie Davel, Madelaine Plauche • Sponsor: • OSI/OSISA, DST

  17. Lwazi • Development and piloting of a fully Open Source multilingual telephone-based information system • ASR and TTS systems in 11 official languages • ASR and TTS integrated into a telephony platform • Open Source resources and tools • Various pilots: first significant pilot with DPSA Community Development Workers • Status: • Initiated September 2006 • On track for completion September 2009 • Principal researchers: • Etienne Barnard, Marelie Davel, Gerhard van Huyssteen • Sponsor: • DAC

  18. Mobile E-learning for Africa (MELFA) • Mobile solutions for on-site literacy training and skills development for workers in the building and construction industry • Includes text-to-speech and speech-to-speech translation • Initially, 30 test participants in the Western Cape are involved in testing the modules for interactive M-learning • Status: • Initiated in 2007, completing in 2009 • Principal researchers: • JC Roux (Project leader, SA), A Visagie, H Engelbrecht, A Magnusdottir, P Scholtz • Sponsor: Danida (Danish government organisation)

  19. Research institutions: Text [table not preserved; size given as senior researchers / postgraduate students]

  20. Research institutions: Speech

  21. Main R&D sponsors • Department of Arts and Culture (DAC): Applications that support multilingualism, especially related to government service delivery • Department of Science and Technology (DST): Directed research in HLT aimed at addressing SA national priorities • National Research Foundation (NRF): Support for individual researchers • Industry: Addressing industry-specific needs • ASR/TTS (Telkom, Intelleca, IBM, Google and others), spelling checkers (Microsoft) • Speech processing tools (Grintek, Armscor), speech-to-speech translation (Armscor) • International donor funding: Addressing developmental needs • Open Society Initiative (OSI/OSISA), Danida (Denmark) • UK Department for International Development (DfID) • Canada's International Development Research Centre (IDRC), and others • Host institutions (universities, CSIR, etc.)
