Alexander Rylov. NLP platform for EU-LINGUAL DIGITAL SINGLE MARKET. LTi Summit 2013. Confidential
Market fragmentation
• By domains
• By languages
WHY should language technology (LT) vendors share their resources?
• Many LT vendors have their own LT
• Each LT is focused on particular domain(s) and language(s)
• Resources are critical for enabling such technologies
• If they share resources, vendors may lose their competitive advantage
Technology capabilities and limitations
• Language-specific = language-centric = limited by language
• Difficulties:
  • Controlled links
  • Anaphora
  • Long-distance links
  • Ellipsis
• Ontologies, dictionaries, statistics = trained on a limited set of data = cover only a limited variety of meaning representations = reaching 40% recall is sometimes considered a good result (NER, US DoD track)
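For readers less familiar with the metric quoted above, recall is the share of the entities actually present in a text that a system manages to find. The sketch below is only an illustration: the entity sets are made up for the example and are not taken from the NER track mentioned on the slide.

```python
def recall(gold_entities: set, found_entities: set) -> float:
    """Fraction of the gold (true) entities that the system found."""
    if not gold_entities:
        return 0.0
    return len(gold_entities & found_entities) / len(gold_entities)


# Hypothetical data: five entities are really in the text,
# the system finds two of them plus one spurious entity.
gold = {"ABBYY", "Moscow", "European Union", "Alexander Rylov", "LTi Summit"}
found = {"ABBYY", "Moscow", "Brussels"}

print(recall(gold, found))  # 2 of 5 gold entities found -> 0.4
```

At 40% recall, three out of every five entities in the text are missed, which is the limitation the slide points to.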
WHAT IS BigData…
• Multilingual
• Covers more than one domain
• 85–90% is in unstructured text documents
• The same meaning can be expressed in language in countless ways
A fundamental natural language technology is required
• Scalable by domains and languages
ABBYY Compreno as a proposal
• Interlingua approach:
  • The semantic model is based on a universal, language-independent representation of both lexis and grammar
• Working languages:
  • Russian, English: at the stage of terminological and collocation expansion
  • German: full prototype (lexis, syntax) is completed; at the stage of main lexis expansion (from core to periphery)
  • French: full prototype is completed (tested on a controlled MT task)
  • Chinese: lexical system prototype is completed (a challenging task never carried out before)
• Compreno has proved to be a scalable technology that can be used for any language
Complete syntactic and semantic analysis
Example: "The bank was located at the bank of the river; it was closed."
Complete analysis helps resolve any linguistic problems in the text, such as the two senses of "bank" and the reference of "it" in the example.
Compreno current achievements
Applications
• BigData analytics: analysis of facts, extraction of objects
• Intelligence, eDiscovery (of any kind)
• Search by meaning rather than by concepts
• Natural-language dialogue systems
• Translation
A few facts about Compreno
• 18 years of development
• About 350 people involved
• More than 2000 man-years
Barriers to wide implementation
• At least 3 years per language
• At least 30 linguists per language
• At least 12M € per language
• Followed by ongoing support and improvement
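The per-language minimums above can be turned into a rough lower bound for several languages. This is a minimal sketch under loud assumptions: costs and effort are taken to add up linearly across languages, all 30 linguists are taken to work for the full 3 years, support and improvement costs are excluded, and the figure of 10 languages is purely hypothetical.

```python
# Per-language minimums quoted on the slide.
YEARS_PER_LANGUAGE = 3
LINGUISTS_PER_LANGUAGE = 30
COST_PER_LANGUAGE_EUR = 12_000_000


def minimum_effort(n_languages: int) -> dict:
    """Lower bound on effort, assuming simple linear scaling (an assumption)."""
    return {
        "linguist_years": n_languages * LINGUISTS_PER_LANGUAGE * YEARS_PER_LANGUAGE,
        "cost_eur": n_languages * COST_PER_LANGUAGE_EUR,
    }


print(minimum_effort(1))   # one language
print(minimum_effort(10))  # hypothetical batch of 10 languages
```

Under these assumptions a single language needs at least 90 linguist-years, and any sharing of work between related languages would lower the totals.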
EU project idea
• Describe ALL EU languages
• Describe major domains: healthcare, law, government, major industries
• ABBYY commitment:
  • Methodology, management, instruments
EU benefits – create a Single Digital LT Market
• Operate not on a language but on a universal model of it (the interlingual approach)
• Describe one domain in one language, then apply it in all other languages
• A platform for LT vendors to create solutions and products that are easily scalable by languages and domains
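The idea of describing a domain once and applying it in every language can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the concept IDs, the tiny lexicon, and the function names are hypothetical and do not reflect how Compreno is actually built, and real analysis would involve full syntactic and semantic parsing rather than word lookup.

```python
# Hypothetical language-independent concept IDs (not taken from Compreno).
LEXICON = {
    ("en", "doctor"): "MEDICAL_DOCTOR",
    ("en", "physician"): "MEDICAL_DOCTOR",
    ("de", "arzt"): "MEDICAL_DOCTOR",
    ("fr", "médecin"): "MEDICAL_DOCTOR",
    ("en", "hospital"): "HOSPITAL",
    ("de", "krankenhaus"): "HOSPITAL",
    ("fr", "hôpital"): "HOSPITAL",
}

# The domain is described once, over concepts rather than over words.
HEALTHCARE_CONCEPTS = {"MEDICAL_DOCTOR", "HOSPITAL"}


def to_concepts(lang: str, text: str) -> list:
    """Map each known word of a text to its language-independent concept."""
    tokens = text.lower().split()
    return [LEXICON[(lang, tok)] for tok in tokens if (lang, tok) in LEXICON]


def mentions_healthcare(lang: str, text: str) -> bool:
    """Apply the single domain description to text in any covered language."""
    return any(c in HEALTHCARE_CONCEPTS for c in to_concepts(lang, text))


print(mentions_healthcare("en", "The doctor left the hospital"))
print(mentions_healthcare("de", "Der Arzt ist im Krankenhaus"))
print(mentions_healthcare("fr", "Le médecin arrive"))
print(mentions_healthcare("en", "The bank was closed"))
```

In this sketch, adding a new language only means extending the lexicon; the domain description itself stays unchanged, which is the scaling property the slide claims for the interlingual approach.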