Dictionaries for the Human Language Technologies virtual network
Dr Mariëtta Alberts
Focus Area Manager: Standardisation and Terminology Development
Pan South African Language Board (PanSALB)

Outline of presentation
• Introduction
• Reviewing Human Language Technologies
    • Scope of HLT
    • Potential of HLT
    • Multilingualism and HLT
• The South African HLT initiative
    • History of South African HLT project
    • National Facility
    • South African HLT model
• Terminology Training initiative of PanSALB
• Conclusion
Afrilex, 13 - 15 July 2005, UFS, Bloemfontein

1. Introduction
• South Africa is on the verge of establishing a Human Language Technology (HLT) Centre
• The Centre will probably be managed as a national facility
• It will provide an appropriate and sustainable virtual (or otherwise) infrastructure conducive to the development and effective management of reusable electronic text and speech resources

2. Reviewing Human Language Technologies (HLT)
• Human Language Technologies are enabling technologies
• They enable human beings to interact with computers by using human language (text and speech)

Human Language Technologies range from:
• high-level parsing and machine translation
• applications in education and training
• public service (e-governance and e-commerce applications)
• voice-operated educational systems
• voice-operated commercial systems that can be used by illiterate people

Human Language Technologies:
• Provide interfaces that enable spoken human-machine interaction (telephone-based information systems, automated booking systems)
• Provide linguistic assistance (spelling and grammar checking)
• Provide access to multilingual polythematic information
• Empower people to actively participate in the Information Society

2.1 The scope of HLT:
• Text-based language processing
    • Text analysis (e.g. spellcheckers, term extraction, search engines)
    • Summarisation
    • Text translation
• Speech processing
    • Speech recognition (e.g. desktop or telephony environment)
    • Speech synthesis

2.2 Potential of HLT:
• Access for all to the information era
• Enhanced mother-tongue or first language teaching
• Affordable multilingual documents
• Improved functionality and quality of languages
• Contact with the developing-world context

Potential of HLT...
• Availability of multilingual words and polythematic terminology is an indicator of development
• Specialised communication has a central axle or hub in terminology
• Standardised terminology contributes to the quality of translations, interpreting and communication
• Streamlined translation and interpreting services provide competitive advantages

2.3 Multilingualism and HLT: South African situation
• South Africa has a high illiteracy rate
• Only 22% of the citizens can function through the medium of English
• A small percentage of South Africans have access to computers - fewer still are IT literate
• The divide is even greater between rural and urban areas

• Effective e-government is necessary (e.g. for birth certificates, identity documents, marriage and death certificates, telephone, electricity and water bills, and traffic fines)
• All citizens should have access to information in the languages they understand best (e.g. 11 official languages; South African Sign Language; Khoe and San languages)
• Government should communicate with citizens in their own languages regarding key services (e.g. health; safety and security; education; postal services; justice (courts); banks (economy); media (electronic and print); labour (jobs); social welfare (pensions))

Language Policy and Legislation
• Multilingual policy since 1994 - South African Constitution of 1996 (Act 108 of 1996)
• Mechanisms for protecting and promoting linguistic rights were put in place
• Section 6 of the South African Constitution specifically mentions the principles of language policy, which take into consideration the multilingual nature of South African society

Establishment of PanSALB
• The Pan South African Language Board (PanSALB) (Act 59 of 1995) was established:
    • to develop, promote and ensure use of South Africa’s eleven official languages, South African Sign Language (SASL) and the Khoe and San languages, and
    • to promote respect for other languages used in the country (e.g. heritage languages such as Dutch, French, German, Hindi, KiSwahili, Portuguese and Tamil)

PanSALB ensures the implementation of the National Language Policy Framework (NLPF) to ensure access to services to all citizens through:
• 9 Provincial Language Committees (PLCs)
    • Assist Provinces with language policy formulation and implementation
• 13 National Language Bodies (NLBs)
    • Standardisation (e.g. spelling and orthography rules)
    • Terminology development
    • Dictionary needs (general vocabulary)
    • Literacy and media
    • Research and Education
• 11 National Lexicography Units (NLUs)
    • Compilation of comprehensive monolingual and other types of dictionaries

3. The South African HLT initiative
3.1 History
• Lexinet research programme of HSRC (1988) (Wordnet, Termnet, Docnet, Transnet, Ailang, etc.)
• PanSALB and DACST (now DAC) initiated the HLT project in 1999
• The former Minister of DACST appointed a panel of experts to investigate the establishment of an HLT virtual network
• The HLT task team concluded that an HLT National Facility should be established
• The developers of the envisaged HLT National Facility should ensure that HLT advances multilingualism in different respects, i.e.:

• Key government documents in the languages the citizens can understand best
• Electronic systems to connect lexicographers and terminologists with other language practitioners
• Electronic systems to disseminate lexicographical and terminological data
• Electronic systems to connect translators and other language workers with word and term banks
• Central government assistance to meet communication needs of all its citizens
• Local and provincial governments to serve as focal points of information dissemination (e.g. multipurpose community centres)

3.2 National Facility
• Purpose of HLT project:
    • to fast track the use and development of indigenous languages
    • to promote the SA government’s policy of multilingualism
    • to facilitate better service delivery for citizens to access or supply information in any of the official languages

Basic premises for the development of HLT:
• development and effective management of reusable text and speech resources in all official languages of SA;
• capacity building with respect to research and development in the field of HLT; and
• stimulation of an HLT industry that will provide language-based electronic products which, in turn, will be applicable in all relevant sectors, especially in the government sector.

3.3 SA Human Language Technologies Model
• The South African HLT model is based on a model being implemented by the European Union (EU)
• The EU model is effectively implemented in the EU Framework Programmes (FP 3/4/5/6)
• The South African HLT model will grow exponentially as expertise and resources are developed

3.3.1 Aims of envisaged HLT virtual network
• An e-government process needs to provide citizens with:
    • Access to online facilities
    • Required and necessary service delivery
    • Infrastructure to make it work
• Two basic prerequisites are:
    • A technical infrastructure (IT access; proven and multipurpose IT systems; online language services)
    • Human capital (capacity building, e.g. trained and reskilled language practitioners)

3.3.2 Identified needs:
• Low general awareness level regarding HLT benefits
• Interdisciplinary curricula at tertiary level to advance HLT development
• Systematic presentation of short dedicated HLT courses
• Theoretical and practical training in the fields of lexicography and terminology
• Job creation should be carefully planned
• Upgrade and maintain a knowledge base on HLT

3.3.3 Proposed three-step strategy for development of HLT model:
• Step 1: Applied research and capacity building, production of language resources, development of enabling technologies and of an HLT industry.
• Step 2: Development of a legal framework to ensure systematic acquisition, administration and conservation of electronic language resources.
• Step 3: Development of an infrastructure to manage the implementation of the proposed HLT model.

3.3.4 Role players
• Government services: national, provincial and local (e.g. e-government, e-learning, e-commerce)
• Parastatal institutions (e.g. PanSALB)
• Private sector
• Academia (tertiary education)
• Education (primary and secondary education)

3.3.5 Progress
• Parsing (Zulu and other African languages) by Special Interest Group (SIG), African Languages Association of Southern Africa (ALASA)
• Speech recognition (Tourism: pilot booking service)
• Amalgamated Banks of South Africa (ABSA) multilingual pilot project: ATM screen prompts and telephone banking prompts in African languages (Zulu, Xhosa and South Sotho)

Progress...
• TISSA (Telephone Interpreting Service of South Africa) (all ports of entry; health services; police charge offices; etc.)
• Spellcheckers: Afrikaans developed by North-West University; African languages by University of Pretoria/North-West University; future development to be a combined effort
• Microsoft human/machine interface: combined effort regarding terminology development
• Afrilingo: e-learning tool for language acquisition (11 official SA languages)

Progress...
• TshwaneLex: dedicated computer software program for data capturing (lexicography)
• 11 National Lexicography Units (NLUs) of PanSALB: Monolingual dictionaries for each of the 11 official South African languages
• NLUs: Data collection and building of corpora
• NLUs: on-line dictionaries (e.g. Afrikaans, Northern Sotho (Sesotho sa Leboa))
• TshwaneTerm: dedicated computer software program for data capturing (terminology)??

Progress...
• National term bank (multilingual, polythematic): Terminology Coordination Section (TCS) of the National Language Service (NLS), Department of Arts and Culture (DAC)
• Latin terminology: interactive multilingual e-learning project (PanSALB, CLTAL, Trydian Interactive)
• Mathematics on-line dictionary project: South African Multilingual Mathematical Lexicon (SAMML)

Lexicographical and terminological information available on HLT virtual network
• The SA Government has approved the development of a human language technology (HLT) virtual network
• All lexicography and terminology endeavours are to be part of the HLT virtual network
• For multilingual words and terms to be available on the HLT virtual network to end-users (subject specialists, students, language practitioners, general public), dictionaries are needed!

4. New terminology training initiative from PanSALB:
• Members of TCs, NLBs: Guidelines to verify and authenticate terms
• Skills development: Language practitioners: terminologists, lexicographers (e.g. NLUs), translators, interpreters, linguists, teachers, journalists, language students, etc.
• Skills development: subject specialists
• Reskilling: Unemployed language workers

[Diagram: Human Language Technology Virtual Network. Labels shown: NLUs, NLBs, PLCs, TCS, NLS, LUs, University A, School for Languages, Lexicography, Terminology, Statistics, Zoology, Psychology]

5. Conclusion:
• Development of skills
• Enhancement of South African languages
• Development of languages into functional languages
• Dissemination of multilingual polythematic (speech and text) information within the South African community
• Better communication among all citizens in different spheres of life
• Improvement of computer literacy

“Utilising technology for the development of the South African languages and developing these languages for use with Human Language Technology applications such as spellcheckers, translation memories and speech-recognition systems will enhance the status of the indigenous languages and will result in increased job opportunities in the language field.”
Dr Ben Ngubane (former Minister of Arts, Culture, Science and Technology), 2003