The MOSES project. SPINN PhD course PISA, September 2004 Patrizia Paggio . Outline. Project goals and organisation Requirements: the questions Monolingual question answering Federated scenario Mapping ontologies Practical assignment. MOSES – the goals.
The MOSES project
SPINN PhD course, Pisa, September 2004
Patrizia Paggio
Outline
• Project goals and organisation
• Requirements: the questions
• Monolingual question answering
• Federated scenario
• Mapping ontologies
• Practical assignment
Pisa Sep 2004
MOSES – the goals
• To develop a knowledge grid in which:
  • the content of web pages can be managed in a modular and scalable way,
  • queries can be posed in natural language to extract relevant content based on the underlying ontologies, and
  • support for cross-lingual querying is provided.
MOSES – the testbed
• The project will produce a demonstrator to search university sites:
  • University of Copenhagen
  • University of Roma III
• Scenario: a researcher or student asking questions to a federation of university sites.
MOSES – the consortium
Requirements: the questions
• Questions identified by project users in Italian and Danish.
• Site-specific vs. federated questions:
  • Hvad hedder dekanen på det humanistiske fakultet på KU? (What is the name of the dean of the Faculty of Humanities at KU?)
  • På hvilke universiteter kan man læse tysk filologi? (At which universities can one study German philology?)
• Finding the answer requires more or less complex navigation.
• Application domains: people, courses, research.
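Syntactic variation
• Questions:
  • Yes/no (ja-nej): Underviser Nina Grønnum i efteråret? (Does Nina Grønnum teach in the autumn?)
  • Wh-pronoun as determiner: Hvilke arrangementer afholdes inden for …? (Which events are held within …?)
  • Wh-pronoun as NP: Hvornår kan man træffe Nina Grønnum? (When can one meet Nina Grønnum?)
• Relative clauses: Hvad er navnet på de værker der er blevet publiceret af …? (What is the name of the works that have been published by …?)
• Active/passive: Hvilke kurser afholder Grønnum/afholdes på IAAS? (Which courses does Grønnum hold / are held at IAAS?)
• ‘Der’ sentences: Hvilke emner forskes der på engelsk institut? (Which topics are researched at the English department?)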
Topics and associations
• Point of departure: the university ontology from the DAML+OIL ontology library.
• An Italian and a Danish version are defined in the Topic Maps formalism, with instances from the sites added.
• Danish ontology:
  • 200 concepts (topics)
  • 50 relations (associations)
• Can be shown in Protégé.
Single node architecture
[Diagram: a question (Q?) goes in and an answer (A!) comes out. Components: Query Analyzer, Query Matcher, Answer Generator, Knowledge Feeder. Resources: Lexical KB, Domain Ontology, Knowledge structure, www site.]
Multilingual querying
• Querying in Italian and in Danish.
• Each linguistic sub-system analyses questions in the relevant language and produces a semantic representation to be input to the content matcher.
• Matching is performed between semantic representations of questions and topic maps.
• An answer is generated in the language of the query.
Analysis in Danish: architecture
[Diagram: NL user query, then Preprocessing module (Tokeniser (STO), Named Entity Recogniser, POS-tagger, Lemmatiser), then Parsing module (Parser), then user query analysis. Resources: Tokeniser lex, NER lex, POS lex, Lemmatiser lex, Danish grm.]
Semantic analysis
• Input question: Hvem underviser i filmhistorie? (Who teaches film history?)
• Output feature structure:
  [ FOCUS-CONST #1: lærerstab,
    COUNT all,
    LOA < teacherOf [ TEACHER #1,
                      COURSE #2: kursus ],
          hasSubject [ WORK #2,
                       SUBJECT emne, [ NAME “filmhistorie” ] ] > ]
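The feature structure above can be pictured as nested data. The following is a minimal Python sketch, assuming plain dictionaries and treating the tags `#1` and `#2` as shared variable names. The dictionary layout and the helper `relations_for` are hypothetical illustrations, not the MOSES representation.

```python
# Hypothetical encoding of the feature structure for
# "Hvem underviser i filmhistorie?" as nested Python data.
# Reentrancy tags (#1, #2) are modelled as shared variable names.
analysis = {
    "FOCUS-CONST": {"var": "#1", "type": "lærerstab"},
    "COUNT": "all",
    "LOA": [
        {"rel": "teacherOf",
         "args": {"TEACHER": {"var": "#1"},
                  "COURSE": {"var": "#2", "type": "kursus"}}},
        {"rel": "hasSubject",
         "args": {"WORK": {"var": "#2"},
                  "SUBJECT": {"type": "emne", "NAME": "filmhistorie"}}},
    ],
}


def relations_for(fs, var):
    """Return (relation, role) pairs in which the tagged variable occurs."""
    return [(item["rel"], role)
            for item in fs["LOA"]
            for role, value in item["args"].items()
            if value.get("var") == var]


# The focus of the question is the TEACHER role of teacherOf.
focus = analysis["FOCUS-CONST"]["var"]
focus_roles = relations_for(analysis, focus)
# Variable #2 ties the course taught to the work that has the subject.
course_roles = relations_for(analysis, "#2")
```

Following a shared tag through the list of relations is what lets a matcher connect the thing being asked for (the teacher) with the constraint on it (a course whose subject is named "filmhistorie").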
From language to ontology: different words, same content
• Lexicon: words mapped onto concepts or relations.
• Content:
  TeacherOf & [ TEACHER #arg1
                COURSE #Course ]
  StudyWorkSubject & [ WORK #Course
                       SUBJECT #arg2 ]
• Syntax:
  undervise/undervisning/… (teach)
  [ ARG-ST < #arg1, #arg2 > ]
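The lexicon idea can be sketched in a few lines of Python: several word forms share one content description, and the syntactic arguments are bound to semantic roles through the `#arg1` and `#arg2` tags. The entry layout and the `instantiate` helper are assumptions made for illustration only.

```python
# Toy lexicon: several word forms map onto the same ontology content.
# The entry layout and the instantiate() helper are illustrative assumptions.
TEACH_ENTRY = {
    "content": [
        ("TeacherOf", {"TEACHER": "#arg1", "COURSE": "#Course"}),
        ("StudyWorkSubject", {"WORK": "#Course", "SUBJECT": "#arg2"}),
    ],
    "arg_st": ["#arg1", "#arg2"],  # syntactic argument structure
}

LEXICON = {
    "undervise": TEACH_ENTRY,      # verb
    "undervisning": TEACH_ENTRY,   # noun with the same content
}


def instantiate(word, syntactic_args):
    """Bind the syntactic arguments of `word` to its semantic roles."""
    entry = LEXICON[word]
    binding = dict(zip(entry["arg_st"], syntactic_args))
    # Tags without a syntactic filler (here #Course) stay as variables.
    return [(rel, {role: binding.get(var, var) for role, var in roles.items()})
            for rel, roles in entry["content"]]


verb_reading = instantiate("undervise", ["Hvem", "filmhistorie"])
noun_reading = instantiate("undervisning", ["Hvem", "filmhistorie"])
```

Both word forms yield the same pair of relations, which is the point of the slide: the variation stays in the lexicon and the ontology side sees a single content.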
Connecting a wordnet with the ontology
[Diagram: synsets in the Linguistic KB linked to concepts in the Domain Concept Hierarchy.]
• Domain concepts: person, student, employee, faculty, professor.
• Synsets: {person, individual, someone, somebody, mortal, human, soul}, {worker}, {adult, grownup}, {employee}, {professional}, {educator, pedagogue}, {academic, faculty member}, {professor}, {teacher, instructor}.
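One way to read the diagram is that a word is first located in a synset and the synset is then linked to a domain concept. The sketch below assumes a fallback that climbs hypernyms until a linked synset is found. The synset members come from the slide, but the hypernym chain, the choice of links and the function `concept_for` are assumptions for this example.

```python
# Synset members are taken from the slide; which synset links to which
# concept, and the hypernym chain, are assumptions made for this sketch.
HYPERNYM = {  # synset id -> more general synset id
    "professor": "academic",
    "academic": "educator",
    "educator": "professional",
    "professional": "adult",
    "adult": "person",
}
SYNSETS = {
    "professor": {"professor"},
    "academic": {"academic", "faculty member"},
    "educator": {"educator", "pedagogue"},
    "professional": {"professional"},
    "adult": {"adult", "grownup"},
    "person": {"person", "individual", "someone", "somebody",
               "mortal", "human", "soul"},
}
# Assumed links from synsets to domain concepts.
CONCEPT_LINK = {"professor": "professor", "academic": "faculty", "person": "person"}


def concept_for(word):
    """Find the word's synset, then climb hypernyms to the nearest linked concept."""
    synset = next((sid for sid, words in SYNSETS.items() if word in words), None)
    while synset is not None:
        if synset in CONCEPT_LINK:
            return CONCEPT_LINK[synset]
        synset = HYPERNYM.get(synset)
    return None
```

With these assumed links, `concept_for("faculty member")` resolves directly to `faculty`, while `concept_for("pedagogue")` has no direct link and falls back to `person` through the hypernym chain.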
The federated scenario
[Diagram: a question (Q) and an answer (A) exchanged with a federation of knowledge nodes K1, K2, Kn, Kn+1.]
Federated querying
• The analysis produced by one of the nodes must be passed on to the others.
• However, not all nodes have exactly the same conceptualisation of the domain!
Ontology mismatches
• DK: Lærerstab (Faculty)
  • Professorat (Professorship)
    • Professor (FullProfessor)
    • GæsteProfessor (GuestProfessor)
  • Lektor (Associate Professor)
  • Adjunkt (Assistant Professor)
  • …
• IT: Faculty
  • Professore (Tenured Professor)
    • Ordinario (FullProfessor)
    • ProfessoreAssociato (Associated Professor)
  • TitolareCorso (Teaching Assistant)
  • Ricercatore (Research Assistant)
  • …
• Query: hvilke lektorer er ansatte på Roma III? (Which associate professors are employed at Roma III?)
What to do?
• Define a mapping between the two ontologies based on equivalence or similarity between concepts and relations.
• This can be done manually, or algorithms based on structural measures (distance, density) and label similarity can be used.
• In a cross-lingual environment, bilingual dictionaries can be used to translate between concept labels.
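Label similarity can be illustrated with the concept names from the ontology mismatches slide. The sketch below assumes that the English glosses on that slide serve as a pivot instead of a bilingual dictionary, and it uses the character-based ratio from Python's standard `difflib` module as the similarity measure. This is a swapped-in measure chosen for brevity, not the algorithm used in MOSES, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Concept labels with the English glosses shown on the
# "Ontology mismatches" slide; the glosses act as a pivot language.
DK = {"Lektor": "Associate Professor", "Adjunkt": "Assistant Professor",
      "Professor": "FullProfessor", "GæsteProfessor": "GuestProfessor"}
IT = {"Professore": "Tenured Professor", "TitolareCorso": "Teaching Assistant",
      "Ricercatore": "Research Assistant", "Ordinario": "FullProfessor",
      "ProfessoreAssociato": "Associated Professor"}


def normalise(label):
    """Lower-case a label and drop spaces so spelling variants compare fairly."""
    return label.replace(" ", "").lower()


def similarity(a, b):
    """String similarity in [0, 1] between two normalised labels."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(dk_concept):
    """Return the Italian concept whose gloss is most similar, with its score."""
    gloss = DK[dk_concept]
    return max(((it, similarity(gloss, it_gloss)) for it, it_gloss in IT.items()),
               key=lambda pair: pair[1])


lektor_match = best_match("Lektor")        # candidate for the query about "lektorer"
professor_match = best_match("Professor")
```

Here `Lektor` is paired with `ProfessoreAssociato` and `Professor` with `Ordinario`. A purely string-based score also produces high values for concepts that merely share the word "professor", so the candidates still need to be checked against a manual mapping or combined with structural measures.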
Manual mapping
• A manual mapping can be used:
  • as a static resource by the system
  • as the basis on which to evaluate mapping algorithms.
• In MOSES, we are defining a manual mapping and experimenting with algorithms.
• Examples of class and relation mapping.
Assignment
• Based on the two university ontologies defined in the first assignment:
  • find examples of federated questions supported by the two ontologies
  • explain how a question answering system can find the answer in both ontologies by defining the necessary mappings of the type:
    input-word → ontology I → ontology II
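As a starting point for the assignment, a mapping of this type can be written down as a small table and used to look up instances in both nodes. The sketch below is only an illustration: the two rows are assumed equivalences read off the English glosses on the ontology mismatches slide, the instance names are invented placeholders, and `federated_lookup` is a hypothetical helper.

```python
# Each row: input word -> concept in ontology I -> concept in ontology II.
# Both rows are assumed equivalences based on the English glosses;
# the instance names below are invented placeholders.
MAPPINGS = [
    {"word": "lektor", "ontology_I": "Lektor", "ontology_II": "ProfessoreAssociato"},
    {"word": "professor", "ontology_I": "Professor", "ontology_II": "Ordinario"},
]

NODE_I = {"Lektor": ["person_a"], "Professor": ["person_b"]}      # Danish node
NODE_II = {"ProfessoreAssociato": ["person_c"], "Ordinario": []}  # Italian node


def federated_lookup(word):
    """Resolve the word in both ontologies and collect instances from each node."""
    row = next((m for m in MAPPINGS if m["word"] == word.lower()), None)
    if row is None:
        return {}  # no mapping defined for this input word
    return {
        "ontology_I": NODE_I.get(row["ontology_I"], []),
        "ontology_II": NODE_II.get(row["ontology_II"], []),
    }


answer = federated_lookup("Lektor")
```

A word without a row in the table returns an empty result, which makes gaps in the mapping easy to spot while collecting federated questions.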
Acknowledgements
• The following people have contributed to the MOSES project at CST: Dorte H. Hansen, Lina Henriksen and Lene Offersgaard.