
Language Resources in Indonesia

Language Resources in Indonesia. Language Technology & Applied Information Laboratory, Directorate for Information Technology and Electronics, Agency for the Assessment & Application of Technology (BPPT), Indonesia. TBIT Laboratory - BPPT.



Presentation Transcript


  1. Language Resources in Indonesia. Language Technology & Applied Information Laboratory, Directorate for Information Technology and Electronics, Agency for the Assessment & Application of Technology (BPPT), Indonesia

  2. TBIT Laboratory - BPPT
     • Apply, assess, and develop language technology and applied information technology in support of the Government's program for developing IT and electronics in Indonesia
     • Advise on and set up national government policy for developing language technology and information technology
     • Develop and deploy language technologies for language processing, text analysis and generation, information retrieval and extraction, and machine translation
     • Develop and maintain language resources, i.e. grammar rules, electronic dictionaries, and annotated corpora
     • Develop an Electronic Data Interchange (EDI) and Electronic Commerce suite for SMEs

  3. Project Portfolio
     • Multilingual Machine Translation System (CICC-MMTS)
     • KEBI (Indonesian Electronic Dictionaries)
     • UNL (Universal Networking Language)
     • INCI (Indonesian National Corpus Initiative)
     • Online I-E Dictionary on news portal (Detik.com)
     • Multimedia Dictionary (including speech synthesizer)
     • Yanetra (NLP tools for the blind)
     • Others
     • Manufacturing Technology supported by advanced and integrated information system through International Cooperation (MATIC) for Automotive, Apparel, and Electronics
     • Web Information Gateway for Apparel
     • Electronic Commerce Projects

  4. Indonesian Electronic Dictionaries - KEBI
     • Word dictionary (50K root words, ~250K derived words)
     • Concept dictionary
     • Co-occurrence dictionary
     • Terminology dictionary (15K terms)

  5. Indonesian Dictionary Online – KEBI Online http://nlp.aia.bppt.go.id

  6. Indonesian-English Online Dictionary on Detik.com Portal (number 1 for online breaking news)
     [Diagram; labels: English Summarization, Online Dictionary, Content Management System, News Article, User, Mini Web Pages with English word links, Dynamic HTML Generator, Indonesian-English Online Dictionary]

  7. Indonesian National Corpus Initiative (INCI/KNBI)
     • Source: national news agency LKBN ANTARA
     • 50,000 sentences
     • ~1 million words
     • Ambiguous word types
     • Ambiguous word tokens
     • POS and phrase attachment ambiguity
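The corpus slide distinguishes ambiguous word types from ambiguous word tokens. As a rough illustration of that distinction (not the INCI tooling), the sketch below counts both on a tiny hand-made tagged sample. The sample sentences, the tag labels, and the function name `ambiguity_counts` are all assumptions made for the example; they are not taken from the corpus or its tagset.

```python
from collections import defaultdict

# Hypothetical toy sample of (word, POS tag) pairs; NOT data from INCI.
# "bisa" can mean "can" (modal) or "venom" (noun) in Indonesian.
tagged = [
    ("bisa", "MODAL"), ("ular", "NOUN"), ("bisa", "NOUN"),
    ("dia", "PRON"), ("bisa", "MODAL"), ("makan", "VERB"),
]

def ambiguity_counts(tagged_tokens):
    """Return (ambiguous word types, ambiguous word tokens).

    A word type is ambiguous if it occurs with more than one tag.
    Every occurrence of such a type counts as an ambiguous token.
    """
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word.lower()].add(tag)

    ambiguous_types = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    ambiguous_tokens = sum(
        1 for word, _ in tagged_tokens if word.lower() in ambiguous_types
    )
    return len(ambiguous_types), ambiguous_tokens

n_types, n_tokens = ambiguity_counts(tagged)
print(f"ambiguous types: {n_types}, ambiguous tokens: {n_tokens}")
```

In this toy sample only one type ("bisa") is ambiguous, but it accounts for three of the six tokens, which is why the two figures are usually reported separately.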

  8. BIAS (Bahasa Indonesia Analysis System)
     • Part of CICC-MMTS
     • Improvement using a stochastic-symbolic approach
     • Supervised and unsupervised learning
     • 15,000 sentences of annotated corpus (based on the GDA tagset)
     • ISTAG (POS tagger)
     • ISPARSE (skeleton parser)
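The slides do not describe how ISTAG works internally. Purely to illustrate what supervised learning from an annotated corpus can look like in its simplest form, here is a most-frequent-tag baseline tagger. This is a swapped-in textbook baseline, not the ISTAG method; the training sentences, the tag labels (which are not the GDA tagset), and the names `train_baseline` and `tag_words` are hypothetical.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag from annotated sentences."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Unknown words fall back to the overall most frequent tag.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag

def tag_words(words, lexicon, default_tag):
    """Tag a list of words using the learned lexicon."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

# Hypothetical training data; illustrative labels only.
train = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("makan", "VERB"), ("ikan", "NOUN")],
    [("ikan", "NOUN"), ("makan", "VERB"), ("nasi", "NOUN")],
]

lexicon, default_tag = train_baseline(train)
print(tag_words(["dia", "makan", "roti"], lexicon, default_tag))
```

A baseline like this ignores context entirely, so it cannot resolve words that take different tags in different sentences; that limitation is what context-aware stochastic and symbolic components are typically meant to address.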

  9. Universal Networking Language (UNL)
     - Deconverter & Enconverter System
     - UNL graph displayer system
     - UNL editor system
     - Indonesia Language Server: http://unlserver.aia.bppt.go.id

  10. Other resources
     • Speech recognition system (Bandung Institute of Technology)
     • Indonesian spelling checker for Microsoft Word (Gadjah Mada University)
     • Computational lexicon research (National Language Center)
     • Computational morphology (Atmajaya University)
