Develop a language-neutral architecture for multilingual named entity recognition (NER), integrate existing NE recognisers into it, and enable cross-lingual name matching. Extend the system to new product types.
WP2: Multilingual NER and matching
• Start date: month 1
• End date: month 26
• Coordinator: Edinburgh
CROSSMARC kick-off meeting: WP2
Objectives
• Develop an open, well-defined and language-neutral architecture for multilingual named entity recognition and matching.
• Integrate existing NE recognisers for the project’s four languages according to this architecture.
• Specify a methodology for cross-lingual name matching, and incorporate name matching into the project’s NE recognisers for the four languages.
• Develop techniques to adapt NER and name matching to new product types, and extend the system to a second product type.
D2.1 (Month 6): Architecture of named entity recognition
D2.2 (Month 12): Version 1 of named entity recognition
D2.3 (Month 18): Version 2 of named entity recognition
D2.4 (Month 26): Version 3 of named entity recognition
D2.1 Architecture of named entity recognition
• Starting from existing monolingual architectures, CROSSMARC will develop an open and clearly defined language-neutral architecture that allows named entity recognisers for different languages to be integrated seamlessly into a multilingual system.
• The underlying techniques of the individual named entity recognisers may differ (hand-crafted, entirely probabilistic, hybrid).
• The architecture will provide a clear path for others to add recognisers for other languages.
• The architecture will be extended further in subsequent work packages, becoming CROSSMARC’s overall architecture.
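One way to picture the language-neutral idea is a single registry into which each language's recogniser is plugged, whatever technique it uses internally. The sketch below is only an illustration under assumptions: the class `MultilingualNER`, the `(text, type)` entity tuples, the `"en"` language code and the toy gazetteer recogniser are all hypothetical and are not part of the CROSSMARC design.

```python
from typing import Callable, Dict, List, Tuple

# An entity is (surface text, entity type). The tuple shape and the type
# labels are assumptions made for this sketch.
Entity = Tuple[str, str]
Recogniser = Callable[[str], List[Entity]]


class MultilingualNER:
    """Hypothetical language-neutral front end: one registry, many recognisers."""

    def __init__(self) -> None:
        self._recognisers: Dict[str, Recogniser] = {}

    def register(self, lang: str, recogniser: Recogniser) -> None:
        # Adding a new language only requires registering one more callable.
        self._recognisers[lang] = recogniser

    def recognise(self, lang: str, text: str) -> List[Entity]:
        if lang not in self._recognisers:
            raise KeyError(f"no recogniser registered for language {lang!r}")
        return self._recognisers[lang](text)


def toy_english_recogniser(text: str) -> List[Entity]:
    """Stand-in for a hand-crafted recogniser: a one-entry gazetteer lookup."""
    gazetteer = {"Edinburgh": "LOCATION"}
    return [(tok, gazetteer[tok]) for tok in text.split() if tok in gazetteer]


ner = MultilingualNER()
ner.register("en", toy_english_recogniser)
entities = ner.recognise("en", "Coordinator: Edinburgh")
```

Because the front end only depends on the callable's signature, a probabilistic or hybrid recogniser could be registered in the same way as the hand-crafted stand-in.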
Overall System Architecture
• Browser-based user interface: the user enters search requirements, these are matched against stored ‘facts’, and the results are presented back to the user.
• Shopping agents (shopbots), which access both static and dynamically generated pages.
• NER in four languages on the results from the shopping agents, followed by multilingual name matching and fact extraction, which yields the stored ‘facts’.
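The matching step in the user interface can be pictured as a filter over the stored facts. The following is a minimal sketch under assumptions: the slides do not define how facts are stored, so the dictionary layout, the field names and the sample values are all invented for illustration.

```python
# Hypothetical stored facts: field names and values are invented for illustration.
FACTS = [
    {"product": "laptop", "manufacturer": "Acme", "lang": "en"},
    {"product": "laptop", "manufacturer": "Exemple", "lang": "fr"},
    {"product": "printer", "manufacturer": "Acme", "lang": "it"},
]


def match_facts(requirements, facts):
    """Return the facts whose fields equal every requirement the user entered."""
    return [
        fact
        for fact in facts
        if all(fact.get(key) == value for key, value in requirements.items())
    ]


results = match_facts({"product": "laptop", "manufacturer": "Acme"}, FACTS)
```

Exact equality is the simplest possible criterion; cross-lingual name matching would replace it with a comparison that tolerates different surface forms of the same name.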
NER: input and output
• Format for output from shopbots: HTML, or will there be a module that ‘regularises’ the pages that are found? XML?
• Format for input to NE recognisers: XML with a common DTD? An XML attribute to indicate the language, so that each page is routed to the correct recogniser?
• Format for output from NE recognisers (= input to fact extraction): XML?
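If the input to the recognisers were XML with a language attribute, routing could look roughly like the sketch below. The slides leave the format open, so the element name `page`, the attribute name `lang`, the ISO 639-1 codes and the sample content are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document: element and attribute names are assumptions,
# since the slides leave the DTD open.
SAMPLE = """<pages>
  <page lang="en">Laptop with 14 inch screen</page>
  <page lang="fr">Ordinateur portable</page>
  <page lang="el">Φορητός υπολογιστής</page>
</pages>"""


def route_by_language(xml_text, supported=("en", "fr", "el", "it")):
    """Group page texts by their lang attribute; other languages go to 'unrouted'."""
    routed = {lang: [] for lang in supported}
    routed["unrouted"] = []
    for page in ET.fromstring(xml_text).iter("page"):
        lang = page.get("lang")
        key = lang if lang in supported else "unrouted"
        routed[key].append((page.text or "").strip())
    return routed


routed = route_by_language(SAMPLE)
```

Each list in `routed` would then be handed to the recogniser for that language; pages without a recognised language are kept aside rather than dropped.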
NER: input and output (diagram)
shopbots → language-independent regulariser? → XML → Multilingual NER and Name Matching (English NER & NM, French NER & NM, Greek NER & NM, Italian NER & NM) → XML → Multilingual and multimedia fact extraction
NER: input and output (diagram, without regulariser)
shopbots → Multilingual NER and Name Matching (English NER & NM, French NER & NM, Greek NER & NM, Italian NER & NM) → XML → Multilingual and multimedia fact extraction