Language Resources in Indonesia
Language Technology & Applied Information Laboratory
Directorate for Information Technology and Electronics
Agency for the Assessment & Application of Technology (BPPT), Indonesia
TBIT Laboratory - BPPT
• Apply, assess, and develop language technology and applied information technology in support of the Government's program for developing IT and electronics in Indonesia
• Advise on and help set national government policy for developing language technology and information technology
• Develop and deploy language technologies in the areas of language processing, text analysis and generation, information retrieval and extraction, and machine translation
• Develop and maintain language resources, i.e. grammar rules, electronic dictionaries, and annotated corpora
• Develop an Electronic Data Interchange (EDI) and electronic commerce suite for SMEs
Project Portfolio
• Multilingual Machine Translation System (CICC-MMTS)
• KEBI (Indonesian Electronic Dictionaries)
• UNL (Universal Networking Language)
• INCI (Indonesian National Corpus Initiative)
• Online I-E Dictionary on news portal (Detik.com)
• Multimedia Dictionary (including speech synthesizer)
• Yanetra (NLP tools for the blind)
• Others
• Manufacturing Technology supported by advanced and integrated information system through International Cooperation (MATIC) for Automotive, Apparel, and Electronics
• Web Information Gateway for Apparel
• Electronic Commerce Projects
Indonesian Electronic Dictionaries - KEBI
• Word dictionary (50K root words, ~250K derived words)
• Concept dictionary
• Co-occurrence dictionary
• Terminology dictionary (15K terms)
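The root-word / derived-word split in the word dictionary can be pictured with a small lookup structure. This is only a sketch under stated assumptions: the class `WordDictionary`, its method names, and the sample entry are hypothetical and do not reflect KEBI's actual storage format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One root word together with the derived forms listed under it."""
    root: str
    derived: list = field(default_factory=list)


class WordDictionary:
    """Hypothetical in-memory word dictionary (not the KEBI format)."""

    def __init__(self):
        self._by_root = {}   # root -> Entry
        self._root_of = {}   # any known form -> its root

    def add(self, root, derived=()):
        root = root.lower()
        entry = self._by_root.setdefault(root, Entry(root))
        self._root_of[root] = root
        for form in derived:
            form = form.lower()
            if form not in entry.derived:
                entry.derived.append(form)
            self._root_of[form] = root

    def root_of(self, word):
        """Return the root of a known form, or None if the form is unknown."""
        return self._root_of.get(word.lower())

    def derived_forms(self, root):
        entry = self._by_root.get(root.lower())
        return list(entry.derived) if entry else []


# Illustrative entry: the root "ajar" with three affixed forms.
kebi_sketch = WordDictionary()
kebi_sketch.add("ajar", ["belajar", "mengajar", "pelajaran"])
print(kebi_sketch.root_of("Mengajar"))     # ajar
print(kebi_sketch.derived_forms("ajar"))   # ['belajar', 'mengajar', 'pelajaran']
```

Keeping a reverse index from every form to its root makes lookups of derived words a single dictionary access, which matters when derived forms outnumber roots roughly five to one.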
Indonesian Dictionary Online (KEBI Online)
http://nlp.aia.bppt.go.id
Indonesian-English Online Dictionary
• Indonesian-English online dictionary on the Detik.com portal (number 1 for online breaking news)
[Diagram components: English Summarization, Online Dictionary, Content Management System, News Article, User, Mini Web Pages with English word links, Dynamic HTML Generator]
Indonesian National Corpus Initiative (INCI/KNBI)
• Sourced from the national news agency LKBN ANTARA
• 50,000 sentences
• ~1 million words
• Ambiguous word types
• Ambiguous word tokens
• POS and phrase attachment ambiguity
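The distinction between ambiguous word types and ambiguous word tokens can be made concrete on a tagged corpus: a type is ambiguous if it occurs with more than one tag, and every occurrence of such a type counts as an ambiguous token. The sketch below assumes a toy corpus and made-up tag names; neither comes from INCI.

```python
from collections import defaultdict

# Hypothetical toy data: (word, tag) pairs with illustrative tags.
# "bisa" can mean "can" (modal) or "venom" (noun).
tagged_tokens = [
    ("bisa", "MODAL"), ("bisa", "NOUN"), ("ular", "NOUN"),
    ("bisa", "MODAL"), ("makan", "VERB"),
]


def ambiguity_counts(tagged):
    """Return (ambiguous types, ambiguous tokens, total types, total tokens)."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged:
        tags_by_word[word.lower()].add(tag)

    # A type is ambiguous when it has been seen with two or more tags.
    ambiguous_types = {w for w, tags in tags_by_word.items() if len(tags) > 1}

    # Every occurrence of an ambiguous type is an ambiguous token.
    ambiguous_tokens = sum(
        1 for word, _ in tagged if word.lower() in ambiguous_types
    )
    return len(ambiguous_types), ambiguous_tokens, len(tags_by_word), len(tagged)


print(ambiguity_counts(tagged_tokens))  # (1, 3, 3, 5)
```

In this toy sample one of three types is ambiguous, but three of five tokens are, which illustrates why the two figures are usually reported separately.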
BIAS (Bahasa Indonesia Analysis System)
• Part of CICC-MMTS
• Improved using a stochastic-symbolic approach
• Supervised and unsupervised learning
• 15,000 sentences of annotated corpus (based on the GDA tagset)
• ISTAG (POS tagger)
• ISPARSE (skeleton parser)
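As a rough illustration of what combining statistical and rule-based evidence in a POS tagger can look like, the sketch below pairs a most-frequent-tag model learned from annotated sentences with hand-written affix rules for unknown words. It is not the ISTAG algorithm, which the slides do not describe; the function names, tags, training data, and affix rules are all assumptions, and the rules are deliberately crude.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Statistical part: learn the most frequent tag for each known word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def guess_by_rule(word):
    """Symbolic part: crude, illustrative affix rules for unknown words."""
    w = word.lower()
    if w.startswith(("meng", "mem", "men")):  # common verbal prefix variants
        return "VERB"
    if w.endswith("an"):                      # common nominalizing suffix
        return "NOUN"
    return "UNK"


def tag(words, model):
    """Use the learned tag when the word is known, otherwise the rules."""
    return [(w, model.get(w.lower()) or guess_by_rule(w)) for w in words]


# Hypothetical training data with illustrative tags (not the GDA tagset).
training = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("makan", "VERB")],
]
model = train_unigram(training)
print(tag(["saya", "membaca", "pelajaran"], model))
# [('saya', 'PRON'), ('membaca', 'VERB'), ('pelajaran', 'NOUN')]
```

The learned model takes precedence so that a known word such as "makan" is not mis-tagged by the "-an" suffix rule; the rules only fill in for words absent from the annotated data.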
Universal Networking Language (UNL)
- Deconverter and Enconverter system
- UNL graph displayer system
- UNL editor system
- Indonesian language server: http://unlserver.aia.bppt.go.id
Other resources
• Speech recognition system (Bandung Institute of Technology)
• Indonesian spelling checker for Microsoft Word (Gadjah Mada University)
• Computational lexicon research (National Language Center)
• Computational morphology (Atma Jaya University)