KYOTO (ICT-211423)
Yielding Ontologies for Transition-Based Organization
Intelligent Content and Semantics
WordNet-LMF
Monica Monachini – CNR-ILC
Monica Monachini – 1° KYOTO review – Luxembourg 3/17/2009
Outline
• Background: a KYOTO format for lexical resources
• WordNet-LMF
• The KYOTO Lexical Grid
KYOTO: the lexical resource perspective
• KYOTO objectives
  • “… facilitating the exchange of information across languages, domains and cultures”
  • “… allow definition of word meaning in a shared Wiki platform”
• From the point of view of linguistic resources:
  • the need to share lexical and knowledge bases, both general and domain-related, in the form of lexical repositories and ontologies
A common representation format for WordNets
[Figure: the seven WordNets: WnJP, WnIT, WnNL, WnEN, WnES, WnEU, WnCH]
• Seven WordNets
  • similar but not identical, which hampers interoperability
  • to be accessed both intra- and inter-linguistically, hence the need to support easier integration
• Endow WordNet with a representation format that allows easy access, integration and interoperability among resources
Standards and Interoperability: State of the Art (SoA)
To be achieved
• Existing standards developed in isolation (not widely accepted)
• Disagreement concerning theories and linguistic annotation
• Lack of standard representation format(s)/framework(s)
• Lack of accessibility
Achievements
• Subcommittee devoted to standards for linguistic annotation
• Catalogues of linguistic categories and annotation schemas
• Interest group (ACL) for developing standard annotation of language data
• Efforts towards interlinked resources
• Harmonized systems and frameworks
• International conferences/workshops
• EU-funded common resources and technology infrastructure; roadmap for achieving interoperability
LMF (Lexical Markup Framework)
• Specifically designed to accommodate as many models of lexical representation as possible
• Its pros:
  • Meta-model: a high-level specification (ISO 24613)
  • Data Category Registry: low-level specifications (ISO 12620)
Main Features
• Not a monolithic model, but rather a modular framework
• The LMF library provides the hierarchy of lexical objects (with the structural relations among them)
• The Data Category Registry provides a library of descriptors to encode the linguistic information associated with lexical objects (N.B. Data Categories can also be user-defined)
• Structural skeleton to represent the basic hierarchy of a lexicon
• Components required to describe additional classes and relations
DCR
Centralized WordNet DC Registry
• A list of 85 semantic relations, resulting from a mapping of the KYOTO WordNet grid
[Figure labels: Intra-WN, Inter-WN]
Principles of WordNet-LMF
Balance between:
• Maintaining adherence to the architectural principles of LMF
  • The main conceptual building blocks and the structural relationships between them are maintained
  • The expression of linguistic information (synset relations) falls in the realm of Data Categories (DCs)
• Adapting standard LMF to suit efficiency needs
  • Promote feat-att structures to element attributes
  • Use of bracketing elements
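The "promote feat-att structures to element attributes" principle can be pictured with a small sketch. It assumes the standard-LMF serialization in which values are carried by `<feat att="..." val="..."/>` child elements; the sample entry and the helper name `promote_feats` are hypothetical and only illustrate the idea, they are not part of the KYOTO tooling.

```python
import xml.etree.ElementTree as ET

# Assumed standard-LMF style: values live in <feat att="..." val="..."/> children.
# The entry itself is an invented sample.
LMF_STYLE = """
<LexicalEntry>
  <Lemma>
    <feat att="writtenForm" val="fire"/>
    <feat att="partOfSpeech" val="n"/>
  </Lemma>
</LexicalEntry>
"""


def promote_feats(element):
    """Recursively replace <feat> children with attributes on their parent element."""
    for child in list(element):
        if child.tag == "feat":
            # The att/val pair becomes a plain XML attribute of the parent.
            element.set(child.get("att"), child.get("val"))
            element.remove(child)
        else:
            promote_feats(child)
    return element


entry = promote_feats(ET.fromstring(LMF_STYLE))
lemma = entry.find("Lemma")
print(lemma.attrib)
```

After the conversion the `Lemma` element carries `writtenForm` and `partOfSpeech` directly as attributes, which is the compact shape used by the WordNet-LMF declarations.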
Diagram of the WordNet-LMF format
[Figure: class diagram with cardinalities (0..1, 1..1, 0..*, 1..*) connecting LexicalResource, GlobalInformation, Lexicon, SenseAxes, LexicalEntry, Synset, SenseAxis, Lemma, Sense, Definition, Statement, SynsetRelations, SynsetRelation, MonolingualExternalRefs, MonolingualExternalRef, InterlingualExternalRefs, InterlingualExternalRef and Meta, with a "Data Categories" label]
WordNet-LMF administrative and core packages: representation of synset variants
<?xml version='1.0' encoding="UTF-8"?>
<!ELEMENT LexicalResource (GlobalInformation, Lexicon+, SenseAxes?)>
<!ELEMENT GlobalInformation EMPTY>
<!ATTLIST GlobalInformation
    label CDATA #IMPLIED>
<!ELEMENT Lexicon (LexicalEntry+, Synset*)>
<!ATTLIST Lexicon
    languageCoding CDATA #FIXED "ISO 639-3"
    label CDATA #IMPLIED
    language CDATA #REQUIRED
    owner CDATA #REQUIRED
    version CDATA #REQUIRED>
<!ELEMENT LexicalEntry (Meta?, Lemma, Sense*)>
<!ATTLIST LexicalEntry
    id ID #IMPLIED>
<!ELEMENT Lemma EMPTY>
<!ATTLIST Lemma
    writtenForm CDATA #IMPLIED
    partOfSpeech CDATA #REQUIRED>
<!ELEMENT Sense (Meta?, MonolingualExternalRefs?)>
<!ATTLIST Sense
    id ID #REQUIRED
    synset IDREF #REQUIRED>
<!ELEMENT MonolingualExternalRefs (MonolingualExternalRef+)>
<!ELEMENT MonolingualExternalRef (Meta?)>
<!ATTLIST MonolingualExternalRef
    externalSystem CDATA #REQUIRED
    externalReference CDATA #REQUIRED
    relType (at|plus|equal) #IMPLIED>
Callouts:
• The triplets encode the basic building blocks
• Example: WN3.0 <footprint_1 footmark_1> 06645039-n
• MonolingualExternalRef links a Sense to another resource
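To make the LexicalEntry / Lemma / Sense building blocks concrete, here is a minimal sketch that parses a hand-written fragment and regroups senses into synset variants. The fragment reuses the `footprint_1` / `footmark_1` example from the slide; the `eng-30-` prefix on the synset identifier, the `owner` value and the helper `variants_by_synset` are assumptions made for illustration (an XML `ID` cannot start with a digit). The `Synset` elements are omitted for brevity, and `xml.etree.ElementTree` does not validate against the DTD.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample; identifiers other than footprint_1 / footmark_1 are assumptions.
FRAGMENT = """
<Lexicon languageCoding="ISO 639-3" label="sample" language="eng"
         owner="hypothetical" version="3.0">
  <LexicalEntry id="footprint">
    <Lemma writtenForm="footprint" partOfSpeech="n"/>
    <Sense id="footprint_1" synset="eng-30-06645039-n"/>
  </LexicalEntry>
  <LexicalEntry id="footmark">
    <Lemma writtenForm="footmark" partOfSpeech="n"/>
    <Sense id="footmark_1" synset="eng-30-06645039-n"/>
  </LexicalEntry>
</Lexicon>
"""


def variants_by_synset(lexicon):
    """Map each synset identifier to the Sense ids (variants) that point to it."""
    groups = defaultdict(list)
    for entry in lexicon.iter("LexicalEntry"):
        for sense in entry.iter("Sense"):
            groups[sense.get("synset")].append(sense.get("id"))
    return dict(groups)


lexicon = ET.fromstring(FRAGMENT)
variants = variants_by_synset(lexicon)
print(variants)
```

The synset membership is not stored in one place: it is recovered by following the `synset` reference of each `Sense`, which is why two separate lexical entries end up as variants of the same synset.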
WordNet-LMF semantic level: representation of synsets and synset relations
<!ELEMENT Synset (Meta?, Definition?, SynsetRelations, MonolingualExternalRefs)>
<!ATTLIST Synset
    id ID #REQUIRED
    baseConcept (1|2|3) #REQUIRED>
<!ELEMENT Definition (Statement*)>
<!ATTLIST Definition
    gloss CDATA #REQUIRED>
<!ELEMENT Statement EMPTY>
<!ATTLIST Statement
    example CDATA #REQUIRED>
<!ELEMENT SynsetRelations (SynsetRelation+)>
<!ELEMENT SynsetRelation (Meta?)>
<!ATTLIST SynsetRelation
    target IDREF #REQUIRED
    relType CDATA #REQUIRED>
<!ELEMENT MonolingualExternalRefs (MonolingualExternalRef+)>
<!ELEMENT MonolingualExternalRef (Meta?)>
<!ATTLIST MonolingualExternalRef
    externalSystem CDATA #REQUIRED
    externalReference CDATA #REQUIRED
    relType (at|plus|equal) #IMPLIED>
Callouts:
• Synset clusters together senses of different Lexical Entries
• Example: WN3.0 <footprint_1 footmark_1> 06645039-n
• SynsetRelation represents the various relations holding between synsets
• Harmonized KYOTO Data Categories
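A small sketch of how a synset and its relations might be read from an instance that follows the declarations above. Everything in the sample is illustrative: the gloss and example text, the `relType` values, the target identifiers and the external reference are hypothetical placeholders, and `relations_by_type` is an invented helper name. The parser used here does not check the instance against the DTD.

```python
import xml.etree.ElementTree as ET

# Invented sample instance; relType values and target ids are placeholders.
SYNSET = """
<Synset id="eng-30-06645039-n" baseConcept="3">
  <Definition gloss="illustrative gloss text">
    <Statement example="illustrative example sentence"/>
  </Definition>
  <SynsetRelations>
    <SynsetRelation target="hypothetical-synset-a" relType="has_hyperonym"/>
    <SynsetRelation target="hypothetical-synset-b" relType="has_hyponym"/>
    <SynsetRelation target="hypothetical-synset-c" relType="has_hyponym"/>
  </SynsetRelations>
  <MonolingualExternalRefs>
    <MonolingualExternalRef externalSystem="hypothetical-resource"
                            externalReference="hypothetical-id" relType="at"/>
  </MonolingualExternalRefs>
</Synset>
"""


def relations_by_type(synset):
    """Group the targets of a synset's relations by their relType value."""
    grouped = {}
    for rel in synset.iter("SynsetRelation"):
        grouped.setdefault(rel.get("relType"), []).append(rel.get("target"))
    return grouped


synset = ET.fromstring(SYNSET)
relations = relations_by_type(synset)
print(synset.find("Definition").get("gloss"))
print(relations)
```

Because `relType` is declared as `CDATA`, the DTD itself does not restrict its values; the set of admissible relation names comes from the Data Category registry rather than from the schema.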
WordNet-LMF multilingual level: representation of cross-lingual synset relations
<!ELEMENT SenseAxes (SenseAxis+)>
<!ELEMENT SenseAxis (Meta?, Target+, InterlingualExternalRefs?)>
<!ATTLIST SenseAxis
    id ID #REQUIRED
    relType CDATA #REQUIRED>
<!ELEMENT Target EMPTY>
<!ATTLIST Target
    ID CDATA #REQUIRED>
<!ELEMENT InterlingualExternalRefs (InterlingualExternalRef+)>
<!ELEMENT InterlingualExternalRef (Meta?)>
<!ATTLIST InterlingualExternalRef
    externalSystem CDATA #REQUIRED
    externalReference CDATA #REQUIRED
    relType (at|plus|equal) #IMPLIED>
Callouts:
• SenseAxis groups together monolingual synsets that correspond to each other and share the same relations to English
• relType specifies the type of correspondence
• Link to ontology/(ies)
• Examples:
  • IWN <fuoco_1, fiamma_1> 00001251-n
  • SWN <fuego_3, llama_1> 09686541-n
  • WN3.0 <fire_1 flame_1 flaming_1> 13480848-n
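The cross-lingual grouping can be sketched with the three synsets quoted on the slide (Italian, Spanish and English "fire"). The language prefixes on the identifiers, the `sa_1` axis id, the `eq_synonym` value for `relType` and the helper `aligned_with` are assumptions introduced for this illustration only.

```python
import xml.etree.ElementTree as ET

# Offsets come from the slide; prefixes, axis id and relType value are assumed.
SENSE_AXES = """
<SenseAxes>
  <SenseAxis id="sa_1" relType="eq_synonym">
    <Target ID="ita-00001251-n"/>
    <Target ID="spa-09686541-n"/>
    <Target ID="eng-30-13480848-n"/>
  </SenseAxis>
</SenseAxes>
"""


def aligned_with(sense_axes, synset_id):
    """Return the synsets that share a SenseAxis with synset_id, in document order."""
    result = []
    for axis in sense_axes.iter("SenseAxis"):
        # Note the upper-case attribute name "ID", as declared for Target.
        targets = [target.get("ID") for target in axis.iter("Target")]
        if synset_id in targets:
            result.extend(t for t in targets if t != synset_id)
    return result


axes = ET.fromstring(SENSE_AXES)
print(aligned_with(axes, "ita-00001251-n"))
```

Starting from any one of the monolingual synsets, the axis gives access to its counterparts in the other WordNets without storing pairwise links between every language pair.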
Kyoto Knowledge Base
[Figure: the WordNets WnJP, WnIT, WnNL, WnES, WnEN, WnEU and WnCH, their Domain components, and the Ontology]