
WP2: Multilingual NER and matching

Develop a language-neutral architecture for multilingual named entity recognition (NER), integrate existing NE recognisers, enable cross-lingual name matching, and adapt the system to new product types.







  1. WP2: Multilingual NER and matching
     • Start date: month 1
     • End date: month 26
     • Coordinator: Edinburgh
     CROSSMARC kick-off meeting: WP2

  2. Objectives
     • Develop an open, well-defined and language-neutral architecture for multilingual named entity recognition and matching.
     • Integrate existing NE recognisers for the project’s four languages according to this architecture.
     • Specify a methodology for cross-lingual name matching, and incorporate name matching into the project’s NE recognisers for the four languages.
     • Develop techniques to adapt NER and name matching to new product types, and extend the system to a second product type.
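The slides do not say how cross-lingual name matching will work. As a minimal sketch only, assuming matching can start from a normalised comparison key (accents stripped, case folded, punctuation collapsed) plus a hand-maintained alias table for transliterated forms, a matcher might look like this. `normalise_name`, `NameMatcher` and the alias entry used below are hypothetical illustrations, not CROSSMARC's methodology.

```python
from __future__ import annotations

import re
import unicodedata


def normalise_name(name: str) -> str:
    """Reduce a name to a comparison key: accents removed, case folded,
    runs of punctuation/whitespace collapsed to single spaces."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", base.casefold()).strip()


class NameMatcher:
    """Decides whether two surface forms name the same entity."""

    def __init__(self, aliases: dict[str, str] | None = None):
        # Hypothetical alias table: variant spelling -> canonical spelling,
        # e.g. a Greek-script form of a Latin-script brand name.
        self.aliases = {
            normalise_name(variant): normalise_name(canonical)
            for variant, canonical in (aliases or {}).items()
        }

    def key(self, name: str) -> str:
        # Normalise first, then map known variants onto their canonical key.
        k = normalise_name(name)
        return self.aliases.get(k, k)

    def same_entity(self, a: str, b: str) -> bool:
        return self.key(a) == self.key(b)
```

A real matcher for Greek would need systematic transliteration rather than a per-name alias table; the table only marks where that step would plug in.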

  3. Deliverables
     • D2.1 (month 6): Architecture of named entity recognition
     • D2.2 (month 12): Version 1 of named entity recognition
     • D2.3 (month 18): Version 2 of named entity recognition
     • D2.4 (month 26): Version 3 of named entity recognition

  4. D2.1 Architecture of named entity recognition
     • Starting from existing monolingual architectures, CROSSMARC will develop an open and clearly defined language-neutral architecture that allows named entity recognisers for different languages to be integrated seamlessly into a multilingual system.
     • The underlying techniques of the different named entity recognisers may differ (hand-crafted, entirely probabilistic, hybrid).
     • The architecture will provide a clear path for others to add recognisers for further languages.
     • It will be extended further in subsequent work packages, becoming CROSSMARC’s overall architecture.
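One way to read the language-neutral architecture is as a shared interface that every recogniser implements, whatever its internal technique, plus a registry through which new languages are added. The sketch below assumes that reading; `Recogniser`, `Entity`, `register`, `recognise`, `GazetteerRecogniser` and the `MANUFACTURER` label are hypothetical names, not part of any CROSSMARC specification.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical label set, e.g. "MANUFACTURER"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


class Recogniser(ABC):
    """Contract every language-specific recogniser satisfies; how it finds
    entities (hand-crafted rules, probabilistic model, hybrid) is its own business."""

    language: str

    @abstractmethod
    def recognise(self, text: str) -> list[Entity]:
        ...


_REGISTRY: dict[str, Recogniser] = {}


def register(recogniser: Recogniser) -> None:
    """Adding a language means registering one more recogniser."""
    _REGISTRY[recogniser.language] = recogniser


def recognise(text: str, language: str) -> list[Entity]:
    recogniser = _REGISTRY.get(language)
    if recogniser is None:
        raise ValueError(f"no recogniser registered for {language!r}")
    return recogniser.recognise(text)


class GazetteerRecogniser(Recogniser):
    """Toy hand-crafted recogniser: exact lookup of names from a word list."""

    def __init__(self, language: str, gazetteer: dict[str, str]):
        self.language = language
        self.gazetteer = gazetteer

    def recognise(self, text: str) -> list[Entity]:
        found = []
        for name, label in self.gazetteer.items():
            start = text.find(name)
            while start != -1:
                found.append(Entity(name, label, start, start + len(name)))
                start = text.find(name, start + len(name))
        return sorted(found, key=lambda e: e.start)
```

A probabilistic or hybrid recogniser would subclass `Recogniser` in the same way; callers only ever see `recognise(text, language)`.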

  5. Overall System Architecture
     • Browser-based user interface: the user enters search requirements, which are matched against the stored ‘facts’, and the results are presented back to the user.
     • Shopping agents (shopbots) that access both static and dynamically generated pages.
     • NER in four languages on the results from the shopping agents, followed by multilingual name matching and fact extraction, which produce the stored ‘facts’.
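The matching step behind the user interface can be pictured as filtering stored facts by the user's constraints. This is a hypothetical sketch: the `Fact` fields, the `match` function and the assumption that prices are already normalised to one currency are all illustrative. The `manufacturer` field is assumed to hold a canonical name, so that facts extracted from pages in different languages compare equal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fact:
    product: str
    manufacturer: str  # assumed canonical name, independent of page language
    price: float       # assumed already converted to a single currency
    language: str      # language of the page the fact was extracted from
    url: str


def match(
    facts: list[Fact],
    manufacturer: str | None = None,
    max_price: float | None = None,
) -> list[Fact]:
    """Return facts satisfying every constraint the user set, cheapest first.
    A constraint left as None is not applied."""
    results = []
    for fact in facts:
        if manufacturer is not None and fact.manufacturer != manufacturer:
            continue
        if max_price is not None and fact.price > max_price:
            continue
        results.append(fact)
    return sorted(results, key=lambda f: f.price)
```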

  6. NER: input and output
     • Format for output from shopbots: HTML, or will there be a module that ‘regularises’ the pages that are found? XML?
     • Format for input to NE recognisers: XML with a common DTD? An XML attribute to indicate the language, so that each page gets routed correctly?
     • Format for output from NE recognisers (= input to fact extraction): XML?
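Slide 6 asks whether an XML attribute should mark each page's language so that it reaches the right recogniser. Assuming the standard `xml:lang` attribute were used for this, with a hypothetical document shape of `<page xml:lang="..."><text>...</text></page>`, routing and stand-off annotation could be sketched with Python's `xml.etree.ElementTree`. The `entities`/`ne` output elements and the toy gazetteer recognisers are assumptions, not a CROSSMARC DTD.

```python
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def toy_recogniser(gazetteer):
    """Hypothetical stand-in for a language-specific recogniser:
    returns (start, end, type) spans for gazetteer names found in the text."""
    def recognise(text):
        spans = []
        for name, ne_type in gazetteer.items():
            start = text.find(name)
            while start != -1:
                spans.append((start, start + len(name), ne_type))
                start = text.find(name, start + len(name))
        return sorted(spans)
    return recognise


# Routing table: one recogniser per language code (only two shown).
RECOGNISERS = {
    "en": toy_recogniser({"Toshiba": "MANUFACTURER"}),
    "fr": toy_recogniser({"Toshiba": "MANUFACTURER"}),
}


def annotate(page_xml: str) -> str:
    """Route a page to its recogniser via xml:lang and append stand-off <ne> elements."""
    page = ET.fromstring(page_xml)
    lang = page.get(XML_LANG)
    recogniser = RECOGNISERS.get(lang)
    if recogniser is None:
        raise ValueError(f"no recogniser for language {lang!r}")
    text = page.findtext("text", default="")
    entities = ET.SubElement(page, "entities")
    for start, end, ne_type in recogniser(text):
        ne = ET.SubElement(entities, "ne", type=ne_type, start=str(start), end=str(end))
        ne.text = text[start:end]
    return ET.tostring(page, encoding="unicode")
```

Stand-off `<ne>` elements with character offsets leave the original text untouched; inline markup is the other common choice, and the slide leaves the format open.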

  7. NER: input and output
     [Diagram: shopbots → language-independent regulariser? → XML → Multilingual NER and Name Matching (English NER & NM, French NER & NM, Greek NER & NM, Italian NER & NM) → XML → Multilingual and multimedia fact extraction]

  8. NER: input and output
     [Diagram, without the regulariser: shopbots → Multilingual NER and Name Matching (English NER & NM, French NER & NM, Greek NER & NM, Italian NER & NM) → XML → Multilingual and multimedia fact extraction]
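The two diagrams differ only in whether a language-independent regulariser sits between the shopbots and NER. Treating each stage as a function from document to document, a hypothetical `build_pipeline` helper can express both variants; the stage functions are placeholders, not CROSSMARC components.

```python
from __future__ import annotations

from typing import Callable, Optional

# A stage takes a document (HTML or XML, as a string) and returns a new one.
Stage = Callable[[str], str]


def build_pipeline(
    ner_and_matching: Stage,
    fact_extraction: Stage,
    regulariser: Optional[Stage] = None,
) -> Stage:
    """Chain the stages left to right; the regulariser is optional,
    mirroring the two diagram variants."""
    stages = [s for s in (regulariser, ner_and_matching, fact_extraction) if s is not None]

    def run(document: str) -> str:
        for stage in stages:
            document = stage(document)
        return document

    return run
```

Keeping the regulariser optional means the NER stage's input contract (raw shopbot output or regularised XML) is the only thing that changes between the two designs.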
