Computational Lexicons and the Semantic Web
Alessandro Lenci
Università di Pisa – Department of Linguistics & Istituto di Linguistica Computazionale - CNR
Bucharest, 30 July 2003
Tutorial Outline
• Computational lexicons for the Semantic Web (SW)
  • how they are
  • how they should be
• The SW for computational lexicons
  • lexicon design in the age of the SW
• Training session
  • case study – lexical modelling in RDF/S
The Semantic Web Vision
Turning the WWW into a machine-understandable knowledge base
[Diagram labels: Semantic Web, Ontologies, Knowledge Markup, Documents, Databases, Applications, Intelligent Agents]
Six Challenges for the SW (Benjamins et al. 2002)
• Content availability
• Ontology availability
• Multilinguality
• Scalability
• Visualization
• Stability of SW languages
[Slide callout: Human Language Technology (HLT)]
Lexical Information and HLT
• All language analysis involves determining meaning at some level
• Anything from groups of related words to a full-blown representation of each sentence
[Example, information retrieval: bank … account … money; Topic = financial]
[Example: John went to the store; GO, AGENT John, TARGET store]
Computational Lexicons and HLT
• Explicit representation of word meaning
  • word content accessible to computational agents
• Word meaning linked to word syntax and morphology
• Multilingual lexical links
Computational lexicons provide machine-understandable word knowledge
Computational Lexicons and HLT
• Contain the linguistic information required to build meaning representations
Lexicon (sentence analysis):
  went  vpast  GO
  go  v.  (NP_SUBJ ((role AGENT) (sem +animate)) (VP ((verb GO) (PP ((prep TO) (NP ((role TARGET) (sem +loc)))))
  John  n.  sem: human
  store  n.  sem: loc
Lexicon (topic detection):
  account  n.  domain: [financial]
  account  v.  …
  bank_1  n.  domain: [financial]
  bank_2  n.  domain: [geography]
  money  n.  domain: [financial]
[Example: bank … account … money; Topic = financial]
[Example: John went to the store; GO, AGENT John, TARGET store]
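
A minimal Python sketch of the topic example on this slide. Only the entries and their domain labels come from the slide; the dictionary layout and the function name `guess_topic` are assumptions made for illustration, not part of any of the lexicons discussed here.

```python
from collections import Counter

# Toy lexicon modelled on the slide's entries (layout is an assumption).
LEXICON = {
    "account": [{"pos": "n", "domain": "financial"}],
    "bank": [
        {"pos": "n", "sense": "bank_1", "domain": "financial"},
        {"pos": "n", "sense": "bank_2", "domain": "geography"},
    ],
    "money": [{"pos": "n", "domain": "financial"}],
}


def guess_topic(tokens):
    """Each token votes once for every domain found among its senses."""
    votes = Counter()
    for token in tokens:
        domains = {sense["domain"] for sense in LEXICON.get(token.lower(), [])}
        votes.update(domains)
    if not votes:
        return None  # no token is covered by the lexicon
    return votes.most_common(1)[0][0]


topic = guess_topic(["bank", "account", "money"])
```

On its own, "bank" votes for both domains; the co-occurring "account" and "money" tip the balance towards the financial reading.
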
Computational Lexicons and HLT
• Critical language resources for NLP systems
  • syntactic subcategorization frames for parsing
  • semantic selectional preferences for ambiguity reduction
  • semantic classes for WSD, semantic tagging, etc.
• Key components of HLT
  • monolingual lexicons – IE, QA, etc.
  • multilingual lexicons – MT, CLIR, etc.
Ontologies and Computational Lexicons
[Diagram labels: Access to Content, HLT, Semantic Web, Ontologies, Computational Lexicons, ?]
Ontologies
• An ontology is a system of concepts relevant for knowledge and action in (a portion of) the world
  • categorization of objects and processes
  • inference
  • action planning
  • …
“An ontology is a specification of a conceptualization” (Gruber 1993)
Ontologies
“A set of knowledge terms, including the vocabulary, the semantic interconnections, and some simple rules of inference and logic” (Hendler 2001)
[Diagram labels: ENTITY, OBJECT, ARTIFACT, ANIMAL, LOCATION, EVENT]
Types of Ontologies
Vertical typology:
• Foundational Ontology: OBJECT
• Domain Core Ontology: SOFTWARE
• Domain Specific Ontology: WORD_PROCESSOR
Horizontal typology:
• Information System ontology
• AI ontology
• Linguistic ontology
Linguistic Ontology
• A system of symbols representing the concepts (meanings) encoded by NL expressions (lexical units, terms, etc.)
  • specify semantic classes grouping semantically similar terms
  • semantic representation language
  • interlingua
[Diagram labels: car, van, truck / VEHICLE / ARTIFACT / OBJECT; dog, cat, horse / MAMMAL / ANIMAL; beach, spiaggia / BEACH / LOCATION; concert, rock concert / CONCERT / EVENT; piano; ENTITY]
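
A rough Python sketch of how a linguistic ontology can group similar terms and act as an interlingua. The concept labels and words come from the slide; the parent links between concepts are read off the flattened diagram and are therefore an assumption, as are the function names.

```python
# Concept hierarchy (child -> parent). The links are assumed from the diagram.
IS_A = {
    "VEHICLE": "ARTIFACT",
    "ARTIFACT": "OBJECT",
    "MAMMAL": "ANIMAL",
    "BEACH": "LOCATION",
    "CONCERT": "EVENT",
}

# (language, word) -> concept
WORD_TO_CONCEPT = {
    ("en", "car"): "VEHICLE",
    ("en", "van"): "VEHICLE",
    ("en", "truck"): "VEHICLE",
    ("en", "dog"): "MAMMAL",
    ("en", "beach"): "BEACH",
    ("it", "spiaggia"): "BEACH",
    ("en", "concert"): "CONCERT",
}


def ancestors(concept):
    """Walk up the hierarchy and return all more general concepts."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def same_class(lang, word_a, word_b):
    """Two words are semantically similar if they map to the same concept."""
    return WORD_TO_CONCEPT[(lang, word_a)] == WORD_TO_CONCEPT[(lang, word_b)]


def translate(word, source, target):
    """Use the shared concept as an interlingua between two languages."""
    concept = WORD_TO_CONCEPT[(source, word)]
    return sorted(
        w for (lang, w), c in WORD_TO_CONCEPT.items()
        if lang == target and c == concept
    )
```

Here `translate("beach", "en", "it")` goes through the language-independent concept BEACH rather than through a direct word-to-word link.
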
Ontologies and Computational Lexicons
[Diagram labels: Ontology: Concept Space; Computational Lexicon: Semantics, Syntax, Morphology, Multilinguality, Language/s]
Computational Lexicons: typology
• Monolingual vs. multilingual
• General purpose vs. domain (application) specific
• Content type
  • (Morpho)-Syntactic
  • Semantic
  • Mixed
  • Terminological
Syntactic Computational Lexicons
• Syntactic lexical information is distilled in subcategorization frames
  • ComLex, PAROLE, etc.
• Syntactic frames typically include:
  • number of selected arguments
  • syntactic categories of their realizations (PP, NP, etc.)
  • lexical constraints on argument realization (e.g. preposition heading a PP)
  • argument functional role (Subj, Obj, etc.)
  • optionality, control, auxiliary selection, etc.
Examples:
  hit [V: (Subj: NP) (Objd: NP)]
  answer [N: (Obji: PP_to)]
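
The frame properties listed above can be sketched as a small data structure. This is an illustrative Python model, not the ComLex or PAROLE format: the class names `Slot` and `Frame` and their fields are assumptions, and only the two example frames come from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    function: str                       # functional role: Subj, Objd, Obji
    category: str                       # syntactic category: NP, PP
    preposition: Optional[str] = None   # lexical constraint on a PP head
    optional: bool = False              # optionality of the argument


@dataclass(frozen=True)
class Frame:
    lemma: str
    pos: str
    slots: Tuple[Slot, ...]

    def arity(self) -> int:
        """Number of selected arguments."""
        return len(self.slots)

    def render(self) -> str:
        """Print the frame in the bracketed notation used on the slide."""
        parts = []
        for slot in self.slots:
            cat = slot.category
            if slot.preposition:
                cat = f"{cat}_{slot.preposition}"
            parts.append(f"({slot.function}: {cat})")
        return f"{self.lemma} [{self.pos}: {' '.join(parts)}]"


HIT = Frame("hit", "V", (Slot("Subj", "NP"), Slot("Objd", "NP")))
ANSWER = Frame("answer", "N", (Slot("Obji", "PP", preposition="to"),))
```
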
Semantic Computational Lexicons
• Representing the meaning of a word (minimally) requires
  • Distinguishing different senses of the word
    • E.g. bank: financial institution vs. geographical configuration
  • Capturing inferences
    • E.g. being human implies being animate
  • Representing similarity of meaning with other words
    • E.g. bank, account, money all related to finances
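
A small Python sketch of the "capturing inferences" requirement, combined with the selectional restriction from the earlier GO example (subject `+animate`, target `+loc`). The implication "human implies animate" and the noun features come from the slides; the table layout and function names are assumptions.

```python
# Feature implications: being human implies being animate.
IMPLIES = {"human": {"animate"}}

# Semantic feature of each noun, as in the earlier lexicon slide.
NOUN_SEM = {"John": "human", "store": "loc"}


def closure(feature):
    """Return the feature together with everything it implies."""
    seen = set()
    stack = [feature]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, ()))
    return seen


def satisfies(noun, required_feature):
    """Check a selectional restriction against the noun's inferred features."""
    return required_feature in closure(NOUN_SEM[noun])
```

With these tables, "John" is accepted as the `+animate` subject of GO even though its entry only says `human`, while "store" is accepted only as the `+loc` target.
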
Semantic Computational Lexicons
• Mikrokosmos (Nirenburg, Mahesh et al.)
• WordNet (Miller, Fellbaum et al.)
• EuroWordNet (Vossen et al.)
• SIMPLE (Calzolari, Lenci et al.)
• FrameNet (Fillmore et al.)
Computational Lexicons: design issues
• Network based
  • hierarchy (taxonomy)
    • WordNet
  • heterarchy
    • EuroWordNet
• Frame based
  • Mikrokosmos
  • FrameNet
• Hybrid
  • SIMPLE
EuroWordNet
EuroWordNet: Top Ontology
EuroWordNet
PAROLE-SIMPLE Lexicons
• 12 EU monolingual core lexicons built according to a harmonized model and further extended at the national level
• Integrated combinations of syntactic and semantic information:
  • syntactic subcategorization frames
  • semantic type (“Ontology”)
  • semantic frames linked to syntax
    • semantic roles
    • selectional preferences
    • etc.
  • semantic relations
    • Pustejovsky’s “qualia roles”, etc.
  • regular polysemy
  • event structure
SIMPLE Architecture
[Diagram labels: Language Independent Module, Ontology, Lexical Templates; Greek lexicon, Italian lexicon, Catalan lexicon; Italian lexicon: PAROLE Syntax, SemU, Semantic Frame (semantic roles, etc.), Semantic Relations, Event Structure, Polysemy, etc.]
SIMPLE: semantic relations
[Diagram labels: Top; Formal, Constitutive, Agentive, Telic; Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Activity, Instrumental, Is_the_habit_of, Used_for, Used_as, ...]
SIMPLE: semantic network for Ala (wing)
• SemU: 3232, Type: [Part], part of an airplane
• SemU: 3268, Type: [Part], part of a building
• SemU: D358, Type: [Body_part], organ of birds for flying
• SemU: 3467, Type: [Role], role in football
[Diagram nodes: <fabbricare> make, <volare> fly, <aeroplano> airplane, <parte> part, <edificio> building, <giocatore> player, <uccello> bird]
[Diagram relation labels: Agentive, Used_for, Is_a_part_of, Isa]
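
The network for "ala" can be sketched as subject-relation-object triples, which is also the shape of data that RDF works with. The SemU identifiers, relation names and node lemmas come from the slide; which edge belongs to which sense is reconstructed from the flattened diagram and should be read as an assumption, and a few edges whose endpoints are unclear are left out.

```python
# (subject, relation, object) triples for the four senses of "ala".
# Edge-to-sense assignment is an assumption (see the note above).
TRIPLES = [
    ("SemU:3232", "Isa", "parte"),
    ("SemU:3232", "Is_a_part_of", "aeroplano"),
    ("SemU:3232", "Used_for", "volare"),
    ("SemU:3232", "Agentive", "fabbricare"),
    ("SemU:3268", "Isa", "parte"),
    ("SemU:3268", "Is_a_part_of", "edificio"),
    ("SemU:D358", "Is_a_part_of", "uccello"),
    ("SemU:D358", "Used_for", "volare"),
    ("SemU:3467", "Isa", "giocatore"),
]


def objects(subject, relation):
    """All targets reachable from a sense through one relation."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """All senses that stand in the given relation to a target node."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]
```

A query such as `subjects("Used_for", "volare")` picks out the senses of "ala" that have to do with flying, which is the kind of discrimination a plain taxonomy cannot make.
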
SIMPLE: semantic frames
PREDemploy#1: Arg#1 <AGENT - HUMAN>, Arg#2 <PATIENT - HUMAN>
• master link: SemU to employ
• agent nominalization: SemU employer
• patient nominalization: SemU employee
• event nominalization: SemU employment
SIMPLE: semantic frames
Comprendere V
• SemU: 61725, Type: [Cognitive_event], to understand
• SemU: 6962, Type: [Constitutive_state], to include
Comprensione N
• SemU: 61726, Type: [Cognitive_event], understanding
Predicates:
• PREDComprendere#1 <Arg1 [+human]>, <Arg2 [+semiotic]>
• PREDComprendere#2 <Arg1 [+Entity]>, <Arg2 [Entity]>
SIMPLE: semantic frames
Difensore N
• SemU: 4125, Type: [Role], defender; agent nominalization of PREDDifendere#1 <Arg1>, <Arg2>
  • il difensore di Berlusconi (Berlusconi's defender)
• SemU: 3526, Type: [Role], fullback; Is_a_member_of <squadra> team
  • il difensore del Milan (the Milan fullback)
Semantic multidimensionality
• Identifying the semantic contribution of an NP requires access to a rich representation of the semantic content of its nominal head
• The “semantic structure” of the nominal head determines the semantic relation expressed by a modifying PP (in Italian):
  • la pagina del libro (the page of the book): PART-OF
  • il difensore del Milan (the Milan fullback): MEMBER-OF
  • il suonatore di liuto (the lute player): TELIC
  • il tavolo di legno (the wooden table): MADE-OF
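
As a hedged illustration of how the head noun's semantic structure can select the relation expressed by a modifying PP, the Python sketch below matches the semantic type of the modifier against what each sense of the head expects. The SemU numbers, glosses and the PART-OF and MEMBER-OF labels come from the slides; the `expects` types, the relation name `ARG2-OF-DIFENDERE` and the modifier type table are assumptions for illustration.

```python
# Each sense of a head noun offers one relation and the type of filler it expects.
HEAD_SENSES = {
    "difensore": [
        {"semu": "4125", "gloss": "defender",
         "relation": "ARG2-OF-DIFENDERE", "expects": "human"},
        {"semu": "3526", "gloss": "fullback",
         "relation": "MEMBER-OF", "expects": "team"},
    ],
    "pagina": [
        {"gloss": "page", "relation": "PART-OF", "expects": "artifact"},
    ],
}

# Semantic type of the noun inside the modifying PP (assumed values).
MODIFIER_TYPE = {"Berlusconi": "human", "Milan": "team", "libro": "artifact"}


def interpret(head, modifier):
    """Return the (sense gloss, relation) pairs compatible with the modifier."""
    wanted = MODIFIER_TYPE[modifier]
    return [
        (sense["gloss"], sense["relation"])
        for sense in HEAD_SENSES[head]
        if sense["expects"] == wanted
    ]
```

The same preposition thus yields a different reading of "difensore" depending on whether its complement denotes a person or a team.
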
SIMPLE: sample entries
[Figure labels: ontology, semantic relations, semantic frame]
Computational Lexicons: loose ends
• Non-compositional aspects in the lexicon
  • collocations, terms, MWEs, etc.
• Integration between lexicons and corpus data
  • lexical tuning, data-driven lexicon population, etc.
• Semantic dynamics (polysemy, lexical creativity, etc.)
  • “context-sensitivity” of meaning as a challenge for lexical semantics
  • sense enumeration vs. sense generation
  • heavy smoker, heavy book, heavy road, heavy sea, heavy wine, heavy sky, heavy artillery, etc.
Computational Lexicons: loose ends
• Semantic type system for lexical senses must account for a non-static kaleidoscope of senses
• Salience of aspects of meaning differs for different types
  • natural kinds: Is-a; artifacts: function
• Possible solutions:
  • multiple layers of representation
  • explicit identification of information so that NLP systems can access what is needed at a given time
  • “dynamic type systems”
Computational Lexicons: new challenges from the SW
• From language resources for HLT to knowledge resources for inferential engines
  • in-depth lexical description for better content understanding
• Content interoperability between computational lexicons
  • better integration between lexical information from different sources
• Beyond the lexical information bottleneck
  • automatic lexical knowledge acquisition
Lexical Inferences
“Midfielder Scott Sellars was sold to Blackburn for $35,000 and was bought back in the summer for $750,000.” (FrameNet Corpus)
after e1: OWN (buyer, goods), NOT(OWN (buyer, money))
after e2: NOT(OWN (seller, goods)), OWN (seller, money)
e1 < e2
TIME e2 = SUMMER
Hot Topics
To provide SW agents with high inferential capacities in accessing linguistic content
• In-depth lexical analysis
  • e.g. X buys Y from Z at t ==> Z owns Y before t & X owns Y after t
• Key issues at the lexicon-grammar interface
  • predicate event structure
    • states, processes, accomplishments, etc.
  • temporal adverbs and temporal expressions
    • e.g. in three years, etc.
  • quantificational expressions etc.
  • syntax-semantics argument linking
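
The rule "X buys Y from Z at t ==> Z owns Y before t & X owns Y after t" can be sketched in Python and applied to the Sellars sentence from the previous slide. The rule itself is from the slide; the `Owns` record, the function names and the placeholder `club_X` (the selling club is not named in the sentence) are assumptions.

```python
from typing import List, NamedTuple, Optional


class Owns(NamedTuple):
    owner: str
    thing: str
    phase: str   # "before" or "after" the event
    event: str


def buy_entailments(buyer: str, goods: str, seller: str, event: str) -> List[Owns]:
    """X buys Y from Z at t: Z owns Y before t, X owns Y after t."""
    return [
        Owns(seller, goods, "before", event),
        Owns(buyer, goods, "after", event),
    ]


def owner(facts: List[Owns], thing: str, phase: str, event: str) -> Optional[str]:
    """Look up who owns the thing in the given phase of the given event."""
    for fact in facts:
        if (fact.thing, fact.phase, fact.event) == (thing, phase, event):
            return fact.owner
    return None


# e1: Sellars is sold to Blackburn; e2: he is bought back by the original club.
# "club_X" is a hypothetical placeholder for the club left unnamed in the text.
facts = (
    buy_entailments("Blackburn", "Sellars", "club_X", "e1")
    + buy_entailments("club_X", "Sellars", "Blackburn", "e2")
)
```

The two events chain consistently: the owner after e1 is the owner before e2, which is what "bought back" presupposes.
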
Computational Lexicons and the Semantic Web
Part 2: Lexicon Design in the Age of the Semantic Web
Lexicons of the Future
• General purpose
  • portable over different domains
• Multilingual
  • relations among lexical entities in different languages
• Flexible and extensible
  • enable use of information at appropriate granularity for the application
  • enable continual extension: “dynamic”
• Integrated with Web technology
  • content interoperability
Lexical Content Interoperability
The Lexical Web: enable universal access to lexical information
[Diagram labels: FrameNet, SIMPLE, WordNet, EuroWordNet, Intelligent Agents]
Some Requirements for Lexical Content Interoperability
• Compatibility between different models of lexical analysis
  • relational semantic models (e.g. WordNet)
  • syntactic and semantic frames
  • …
• Compatibility between different degrees of lexical specification
  • deep lexical representations (e.g. PAROLE-SIMPLE)
  • shallow semantic descriptions
• Compatibility between different paradigms of multilinguality
  • lexicons for transfer-based MT
  • interlingua-based lexicons
  • …
The Need for Standards
• To represent common information … while keeping flexibility
• To enhance the sharing and reusability of multilingual lexical resources
• To establish an open environment for the development and integration of multilingual resources
• Information must be consistent with related technologies in order to take advantage of them
  • XML, RDF/S, etc.
Computational Lexicon Working Group (CLWG)
International Standards for Language Engineering (ISLE)
Definition of standards for multilingual computational lexicons, at both the content and the representational level
ISLE: MILE Lexical Model
[Diagram labels: GENELEX Model, EAGLES guidelines for syntactic and semantic lexicons, PAROLE-SIMPLE Lexicons, Multilingual Lexicons (EuroWordNet, etc.), ISLE, MILE Lexical Model]
The MILE Lexical Model
• A general architecture to foster content interoperability between multilingual computational lexicons
• Key issues:
  • Modularity
  • User-adaptability
  • Resource sharing
  • Reusability
SW technologies and standards applied to lexicon modelling
The MILE Lexical Model (MLM)
• The MLM core is the Multilingual ISLE Lexical Entry (MILE)
  • a general schema for multilingual lexical resources
  • a lexical meta-entry as a common representational layer for multilingual lexicons
• Computational lexicons can be viewed as different instances of the MILE schema
[Diagram labels: MILE Lexical Model; lexicon#1, lexicon#2, lexicon#3]
MILE: the building-block model
• The MILE architecture is designed according to the building-block model:
  • Lexical entries are obtained by combining various types of lexical objects (atomic and complex)
• Users design their lexicon by:
  • selecting and/or specifying the relevant lexical objects
  • combining the lexical objects into lexical entries
• Lexical objects may be shared:
  • within the same lexicon (intra-lexicon reusability)
  • among different lexicons (inter-lexicon reusability)
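
The building-block idea can be sketched in Python as entries assembled from shared lexical objects. This is an illustrative model under stated assumptions, not the MILE specification: the class names, the object identifiers (`sem:human`, `slot:subj_np`) and the choice of example entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LexicalObject:
    oid: str     # identifier under which the object can be shared
    kind: str    # e.g. "sem_feature", "syn_feature", "slot"
    value: str


@dataclass
class Lexicon:
    name: str
    entries: Dict[str, List[LexicalObject]] = field(default_factory=dict)

    def add_entry(self, lemma: str, *objects: LexicalObject) -> None:
        """Build an entry by combining already defined lexical objects."""
        self.entries[lemma] = list(objects)

    def object_ids(self):
        return {obj.oid for objs in self.entries.values() for obj in objs}


def shared_objects(a: Lexicon, b: Lexicon) -> List[str]:
    """Identifiers of lexical objects reused across two lexicons."""
    return sorted(a.object_ids() & b.object_ids())


HUMAN = LexicalObject("sem:human", "sem_feature", "+human")
SUBJ_NP = LexicalObject("slot:subj_np", "slot", "Subj: NP")

lex_en = Lexicon("english")
lex_en.add_entry("employer", HUMAN)           # intra-lexicon reuse of HUMAN
lex_en.add_entry("employee", HUMAN)
lex_en.add_entry("employ", SUBJ_NP)

lex_it = Lexicon("italian")
lex_it.add_entry("giocatore", HUMAN)          # inter-lexicon reuse of HUMAN
```
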
MILE: the building-block model
[Diagram labels: Lexical entry 1, Lexical entry 2, Lexical entry 3; Lexical Objects: Sem feature, Syn feature, syntactic frame, slot, phrase]
Modularity in MILE
Multiple levels of modularity
[Diagram labels: multi-MILE: multilingual correspondence conditions; mono-MILE (shown twice): morphological layer, syntactic layer, semantic layer, linking conditions]
The Mono-MILE
Each monolingual layer within Mono-MILE identifies a basic unit of lexical description
• semantic layer, SemU: basic unit to describe the semantic properties of the MU
• syntactic layer, SynU: basic unit to describe the syntactic behavior of the MU
• morphological layer, MU: basic unit to describe the inflectional and derivational morphological properties of the word
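
A minimal Python sketch of the three Mono-MILE layers, using the "comprendere" senses from the SIMPLE slides as data. The unit names MU, SynU and SemU come from the slide; the fields, the way SynUs and SemUs are attached to the MU, the SynU identifier and its frame string are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SemU:
    ident: str
    sem_type: str
    gloss: str


@dataclass
class SynU:
    ident: str
    frame: str


@dataclass
class MU:
    lemma: str
    pos: str
    synus: List[SynU] = field(default_factory=list)   # syntactic layer
    semus: List[SemU] = field(default_factory=list)   # semantic layer


comprendere = MU(
    lemma="comprendere",
    pos="V",
    # The SynU below is hypothetical; the slides give no syntactic entry.
    synus=[SynU("SynU:hypothetical-1", "(Subj: NP) (Objd: NP)")],
    semus=[
        SemU("61725", "Cognitive_event", "to understand"),
        SemU("6962", "Constitutive_state", "to include"),
    ],
)


def senses(mu: MU) -> List[Tuple[str, str]]:
    """List the semantic units linked to one morphological unit."""
    return [(semu.ident, semu.gloss) for semu in mu.semus]
```

One morphological unit here fans out into two semantic units, which is how the layered design keeps polysemy out of the morphological description.
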