
NLP platform for EU-LINGUAL DIGITAL SINGLE MARKET

Alexander Rylov. LTi Summit 2013.


Presentation Transcript


  1. Alexander Rylov • NLP platform for EU-LINGUAL DIGITAL SINGLE MARKET • LTi Summit 2013 • Confidential

  2. Market fragmentation • By domains • By languages

  3. WHY should LT vendors share their resources? • Many LT vendors have their own LT • LTs are focused on particular domains and languages • Resources are critical for enabling such technologies • If they share, vendors may lose their competitive advantage

  4. Technology capabilities and restrictions • Language specific = language centric = limited by language • Difficulties: controlled links, anaphora, long-distance links, ellipsis • Ontologies, dictionaries, statistics = trained on a limited set of data = cover only a limited variety of meaning representations = achieving 40% recall is sometimes considered good (NER, US DoD track)

  5. WHAT IS BigData… • Multilingual • Covers more than one domain • 85–90% is in unstructured text documents • The same meaning can be expressed in language in countless ways

  6. A fundamental natural language technology is required • Scalable by domains and languages

  7. ABBYY Compreno as a proposal • Interlingua approach: the semantic model is based on a universal, language-independent representation of both lexis and grammar • Working languages: • Russian, English: at the stage of terminological and collocation expansion • German: full prototype (lexis, syntax) is completed; at the stage of main lexis expansion (from core to periphery) • French: full prototype is completed (tested on a controlled MT task) • Chinese: lexical system prototype is completed (a challenging task never carried out before) • It has been proved that Compreno is a scalable technology that can be used for any language

  8. Complete syntactic and semantic analysis • Example: "The bank was located at the bank of the river; it was closed." • Complete analysis helps overcome linguistic problems in the text, if any.

  9. Compreno's current achievements

  10. Applications • BigData analytics: analysis of facts, extraction of objects • Intelligence, eDiscovery (any kind) • Search by meaning rather than by concepts • Natural-language dialogue systems • Translation

  11. A few facts about Compreno • 18 years of development • About 350 people involved • More than 2000 man-years

  12. Barriers to wide implementation • At least 3 years per language • At least 30 linguists per language • At least 12M € per language • Then support and improvement

  13. EU project idea • Describe ALL EU languages • Describe major domains: healthcare, law, government, major industries • ABBYY commitment: methodology, management, instruments

  14. EU benefits: create a Single Digital LT Market • Operate not with a language but with a universal model of it (interlingual approach) • Describe one domain in one language, then apply it in all other languages • A platform for LT vendors to create solutions and products that are easily scalable by languages and domains
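
The scaling argument behind slides 7, 12 and 14 can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the slides: the function names are hypothetical, the comparison against direct language-pair modules is a standard textbook contrast rather than something the deck states, and the count of 24 languages in the usage example is an assumption (the slides say only "ALL EU languages"). The per-language figures are the "at least" values from slide 12, so the totals are lower bounds, and the linguist-years figure assumes that the 30 linguists work for the full 3 years.

```python
def direct_pair_modules(n_languages: int) -> int:
    """Directed language pairs needed if every language is linked to every other."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """One analysis plus one synthesis module per language around a shared model."""
    return 2 * n_languages


def description_budget(n_languages: int,
                       years_per_language: int = 3,
                       linguists_per_language: int = 30,
                       cost_meur_per_language: int = 12) -> dict:
    """Lower-bound totals from the 'at least' per-language figures on slide 12.

    Assumption (not stated in the slides): the linguists are engaged
    for the whole per-language period, so effort = years * linguists.
    """
    return {
        "linguist_years": n_languages * years_per_language * linguists_per_language,
        "cost_meur": n_languages * cost_meur_per_language,
    }


if __name__ == "__main__":
    n = 24  # assumed number of languages to describe; not given in the slides
    print("direct pair modules:", direct_pair_modules(n))
    print("interlingua modules:", interlingua_modules(n))
    print("budget (lower bound):", description_budget(n))
```

Under these assumptions the number of modules grows linearly with the number of languages in the interlingual setup and quadratically in the pairwise one, which is the sense in which slide 14 calls the platform scalable by languages.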
