Computational Lexicons and the Semantic Web

Presentation Transcript


  1. Computational Lexicons and the Semantic Web Alessandro Lenci Università di Pisa – Department of Linguistics & Istituto di Linguistica Computazionale - CNR Bucharest, 30 July 2003

  2. Tutorial Outline • Computational lexicons for the Semantic Web (SW): how they are, how they should be • The SW for computational lexicons: lexicon design in the age of the SW • Training session: case study – lexical modelling in RDF/S

  3. The Semantic Web Vision • Turning the WWW into a machine-understandable knowledge base • [Diagram labels: Ontologies, Knowledge Markup, Documents, Intelligent Agents, Semantic Web, Databases, Applications]

  4. Six Challenges for the SW (Benjamins et al. 2002) • Content availability • Ontology availability • Multilinguality • Scalability • Visualization • Stability of SW languages

  5. Six Challenges for the SW (Benjamins et al. 2002) • Content availability • Ontology availability • Multilinguality • Scalability • Visualization • Stability of SW languages • [Callout: Human Language Technology (HLT)]

  6. Lexical Information and HLT • All language analysis involves determining meaning at some level • Anything from groups of related words to a full-blown representation of each sentence • [Examples: information retrieval, “bank … account … money” → Topic = financial; “John went to the store” → GO, AGENT John, TARGET store]

  7. Computational Lexicons and HLT • Explicit representation of word meaning: word content accessible to computational agents • Word meaning linked to word syntax and morphology • Multilingual lexical links • Computational lexicons provide machine-understandable word knowledge

  8. Computational Lexicons and HLT • Contain the linguistic information required to build meaning representations • Lexicon: went v. past GO; go v. (NP_SUBJ ((role AGENT) (sem +animate)) (VP ((verb GO) (PP ((prep TO) (NP ((role TARGET) (sem +loc))))); John n. sem: human; store n. sem: loc • Lexicon: account n. domain: [financial]; account v. …; bank_1 n. domain: [financial]; bank_2 n. domain: [geography]; money n. domain: [financial] • [Examples: “bank … account … money” → Topic = financial; “John went to the store” → GO, AGENT John, TARGET store]
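The domain labels in the second lexicon on this slide are enough to drive a simple topic guess. The following is a minimal sketch in Python, not the method of any system named in the tutorial: the dictionary layout, the function name `guess_topic` and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter

# Toy lexicon mirroring the slide: each word form maps to its senses,
# and each sense carries a domain label. The dict layout is an assumption.
LEXICON = {
    "account": [{"pos": "n", "domain": "financial"}],
    "bank": [
        {"pos": "n", "domain": "financial"},  # bank_1
        {"pos": "n", "domain": "geography"},  # bank_2
    ],
    "money": [{"pos": "n", "domain": "financial"}],
}


def guess_topic(tokens):
    """Return the domain label with the most votes, or None if no word is known."""
    votes = Counter()
    for token in tokens:
        # Every sense of a known word votes for its domain (hypothetical rule).
        for sense in LEXICON.get(token.lower(), []):
            votes[sense["domain"]] += 1
    return votes.most_common(1)[0][0] if votes else None


topic = guess_topic("The bank opened an account for the money".split())
```

The ambiguous word "bank" votes for both of its domains, and the unambiguous neighbours "account" and "money" tip the balance towards the financial reading.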

  9. Computational Lexicons and HLT • Critical language resources for NLP systems: syntactic subcategorization frames for parsing; semantic selectional preferences for ambiguity reduction; semantic classes for WSD, semantic tagging, etc. • Key components of HLT: monolingual lexicons – IE, QA, etc.; multilingual lexicons – MT, CLIR, etc.

  10. Ontologies and Computational Lexicons • [Diagram labels: Access to Content, HLT, Semantic Web, Ontologies, Computational Lexicons, “?”]

  11. Ontologies • An ontology is a system of concepts relevant for knowledge and action in (a portion of) the world: categorization of objects and processes, inference, action planning, … • “An ontology is a specification of a conceptualization” (Gruber 1993)

  12. Ontologies • “A set of knowledge terms, including the vocabulary, the semantic interconnections, and some simple rules of inference and logic” (Hendler 2001) • [Diagram labels: ARTIFACT, OBJECT, ANIMAL, LOCATION, ENTITY, EVENT]

  13. Types of Ontologies • Vertical typology: Foundational Ontology (e.g. OBJECT), Domain Core Ontology (e.g. SOFTWARE), Domain Specific Ontology (e.g. WORD_PROCESSOR) • Horizontal typology: Information System ontology, AI ontology, Linguistic ontology

  14. Linguistic Ontology • A system of symbols representing the concepts (meanings) encoded by NL expressions (lexical units, terms, etc.) • specify semantic classes grouping semantically similar terms • semantic representation language • interlingua • [Diagram labels: car, van, truck; VEHICLE; ARTIFACT; OBJECT; dog, cat, horse; MAMMAL; ANIMAL; beach; BEACH; LOCATION; ENTITY; spiaggia; piano; concert, rock concert; CONCERT; EVENT]

  15. Ontologies and Computational Lexicons • [Diagram labels: Ontology, Concept Space, Semantics, Syntax, Multilinguality, Morphology, Language/s, Computational Lexicon]

  16. Computational Lexicons: typology • Monolingual vs. multilingual • General purpose vs. domain (application) specific • Content type: (Morpho)-Syntactic, Semantic, Mixed, Terminological

  17. Syntactic Computational Lexicons • Syntactic lexical information is distilled in subcategorization frames (ComLex, PAROLE, etc.) • Syntactic frames typically include: number of selected arguments; syntactic categories of their realizations (PP, NP, etc.); lexical constraints on argument realization (e.g. preposition heading a PP); argument functional role (Subj, Obj, etc.); optionality, control, auxiliary selection, etc. • Examples: hit [V: (Subj: NP) (Objd: NP)]; answer [N: (Obji: PP_to)]
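The ingredients of a subcategorization frame listed on this slide map naturally onto a small record type. The sketch below encodes the two slide examples (hit, answer) in Python. The class names `Slot` and `SubcatFrame`, the field names and the matching function `licenses` are hypothetical and do not reproduce the ComLex or PAROLE formats.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    function: str                      # functional role, e.g. "Subj", "Objd", "Obji"
    category: str                      # syntactic realization, e.g. "NP", "PP"
    preposition: Optional[str] = None  # lexical constraint on the head of a PP
    optional: bool = False             # whether the argument may be left out


@dataclass(frozen=True)
class SubcatFrame:
    lemma: str
    pos: str
    slots: Tuple[Slot, ...]


# The two examples from the slide.
HIT = SubcatFrame("hit", "V", (Slot("Subj", "NP"), Slot("Objd", "NP")))
ANSWER = SubcatFrame("answer", "N", (Slot("Obji", "PP", preposition="to"),))


def licenses(frame, observed):
    """True if the observed (function, category, preposition) triples fill the frame.

    Every obligatory slot must be matched and no unexpected argument may remain.
    """
    remaining = list(observed)
    for slot in frame.slots:
        key = (slot.function, slot.category, slot.preposition)
        if key in remaining:
            remaining.remove(key)
        elif not slot.optional:
            return False
    return not remaining
```

A parser could call `licenses(HIT, [("Subj", "NP", None), ("Objd", "NP", None)])` to check a candidate analysis against the frame; a missing object or a PP headed by the wrong preposition makes the check fail.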

  18. Semantic Computational Lexicons • Representing the meaning of a word (minimally) requires: • Distinguishing different senses of the word (e.g. bank: financial institution vs. geographical configuration) • Capturing inferences (e.g. being human implies being animate) • Representing similarity of meaning with other words (e.g. bank, account, money all related to finances)
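The three minimal requirements on this slide (sense distinction, inference, similarity) can be illustrated with a toy sense inventory and type hierarchy. This is only a sketch: the link "human is animate" comes from the slide, while the other type links, the sense identifiers and the function names are assumptions.

```python
# Child type -> parent type. Only "human is animate" comes from the slide;
# the remaining links are assumptions added to make the example run.
IS_A = {
    "human": "animate",
    "animate": "entity",
    "institution": "entity",
    "location": "entity",
}

# Sense inventory: word -> {sense id: (semantic type, domain)}.
SENSES = {
    "bank": {
        "bank_1": ("institution", "financial"),  # financial institution
        "bank_2": ("location", "geography"),     # geographical configuration
    },
    "account": {"account_1": ("entity", "financial")},
    "money": {"money_1": ("entity", "financial")},
}


def entails(sem_type, target):
    """True if sem_type equals target or inherits from it through IS_A."""
    current = sem_type
    while current is not None:
        if current == target:
            return True
        current = IS_A.get(current)
    return False


def shared_domains(word_a, word_b):
    """Domains that at least one sense of each word has in common."""
    domains_a = {domain for _, domain in SENSES.get(word_a, {}).values()}
    domains_b = {domain for _, domain in SENSES.get(word_b, {}).values()}
    return domains_a & domains_b
```

Here `entails("human", "animate")` holds but the reverse does not, and `shared_domains("bank", "money")` returns the financial domain that relates the two words.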

  19. Semantic Computational Lexicons • Mikrokosmos (Nirenburg, Mahesh et al.) • WordNet (Miller, Fellbaum et al.) • EuroWordNet (Vossen et al.) • SIMPLE (Calzolari, Lenci et al.) • FrameNet (Fillmore et al.)

  20. Computational Lexicons: design issues • Network based: hierarchy (taxonomy), e.g. WordNet; heterarchy, e.g. EuroWordNet • Frame based: Mikrokosmos, FrameNet • Hybrid: SIMPLE

  21. EuroWordNet

  22. EuroWordNet Top Ontology

  23. EuroWordNet

  24. PAROLE-SIMPLE Lexicons • 12 EU monolingual core lexicons built according to a harmonized model and further extended at the national level • Integrated combinations of syntactic and semantic information: • syntactic subcategorization frames • semantic type (“Ontology”) • semantic frames linked to syntax (semantic roles, selectional preferences, etc.) • semantic relations (Pustejovsky’s “qualia roles”, etc.) • regular polysemy • event structure

  25. SIMPLE Architecture • [Diagram labels: Greek lexicon, Italian lexicon, Catalan lexicon; Language Independent Module: Ontology, Lexical Templates; Italian lexicon: PAROLE Syntax, SemU, Semantic Frame (semantic roles, etc.), Semantic Relations, Event Structure, Polysemy, etc.]

  26. SIMPLE semantic relations • [Diagram labels: Top; Telic, Formal, Constitutive, Agentive; Is_a, Is_a_part_of, Property, Created_by, Agentive_cause, Indirect_telic, Activity, ..., Contains, ..., Instrumental, Is_the_habit_of, Used_for, Used_as]

  27. SIMPLE semantic network: Ala (wing) • SemU 3232, Type [Part]: part of an airplane • SemU 3268, Type [Part]: part of a building • SemU D358, Type [Body_part]: organ of birds for flying • SemU 3467, Type [Role]: role in football • [Diagram labels: relations Agentive, Used_for, Is_a_part_of, Isa; linked nodes <fabbricare> make, <volare> fly, <aeroplano> airplane, <parte> part, <edificio> building, <giocatore> player, <uccello> bird]
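A semantic network of this kind can be stored as subject-relation-object triples, in the spirit of the RDF/S modelling announced for the training session. The sketch below uses plain Python tuples rather than an RDF library. Note that the transcript flattens the diagram, so the assignment of each edge to a particular SemU is one plausible reading (it matches the number of times each relation label occurs on the slide), not a confirmed copy of the original figure; the identifier format `SemU:3232` is also an assumption.

```python
# (subject, relation, object) triples for the four senses of Italian "ala".
# Edge assignment is an assumed reading of the flattened slide diagram.
TRIPLES = [
    ("SemU:3232", "Isa", "parte"),               # wing of an airplane
    ("SemU:3232", "Is_a_part_of", "aeroplano"),
    ("SemU:3232", "Used_for", "volare"),
    ("SemU:3232", "Agentive", "fabbricare"),
    ("SemU:3268", "Isa", "parte"),               # wing of a building
    ("SemU:3268", "Is_a_part_of", "edificio"),
    ("SemU:3268", "Agentive", "fabbricare"),
    ("SemU:D358", "Isa", "parte"),               # wing of a bird
    ("SemU:D358", "Is_a_part_of", "uccello"),
    ("SemU:D358", "Used_for", "volare"),
    ("SemU:3467", "Isa", "giocatore"),           # winger in football
]


def objects(subject, relation):
    """All targets reached from a sense through the given relation."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """All senses that point at a target through the given relation."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]
```

A query such as `subjects("Used_for", "volare")` picks out the senses whose telic role is flying, which is the kind of information an agent needs to separate the airplane and bird readings from the building and football ones.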

  28. SIMPLE semantic frames • PRED employ#1: Arg#1 <AGENT - HUMAN>, Arg#2 <PATIENT - HUMAN> • [Diagram labels: links agent nominalization, master link, patient nominalization, event nominalization; nodes SemU employee, SemU employment, SemU to employ, SemU employer]

  29. SIMPLE semantic frames • Comprensione N: SemU 61726, Type [Cognitive_event], Understanding • Comprendere V: SemU 61725, Type [Cognitive_event], To understand; SemU 6962, Type [Constitutive_state], To include • PRED Comprendere#1 <Arg1 [+human]>, <Arg2 [+semiotic]> • PRED Comprendere#2 <Arg1 [+Entity]>, <Arg2 [Entity]>

  30. SIMPLE semantic frames • Difensore N • il difensore di Berlusconi (Berlusconi's defender): SemU 4125, Type [Role], Defender; agent nominalization of PRED Difendere#1 <Arg1>, <Arg2> • il difensore del Milan (the Milan fullback): SemU 3526, Type [Role], Fullback; Is_a_member_of <squadra> team

  31. Semantic multidimensionality • Identifying the semantic contribution of an NP requires access to a rich representation of the semantic content of its nominal head • The “semantic structure” of the nominal head determines the semantic relation expressed by a modifying PP (in Italian): • la pagina del libro (the page of the book): PART-OF • il difensore del Milan (the Milan fullback): MEMBER-OF • il suonatore di liuto (the lute player): TELIC • il tavolo di legno (the wooden table): MADE-OF

  32. SIMPLE sample entries • [Screenshot labels: semantic relations, ontology, semantic frame]

  33. Computational Lexicons: loose ends • Non-compositional aspects in the lexicon: collocations, terms, MWEs, etc. • Integration between lexicons and corpus data: lexical tuning, data-driven lexicon population, etc. • Semantic dynamics (polysemy, lexical creativity, etc.): “context-sensitivity” of meaning as a challenge for lexical semantics; sense enumeration vs. sense generation; e.g. heavy smoker, heavy book, heavy road, heavy sea, heavy wine, heavy sky, heavy artillery, etc.

  34. Computational Lexicons: loose ends • The semantic type system for lexical senses must account for a non-static kaleidoscope of senses • Salience of aspects of meaning differs for different types: natural kinds → Is-a; artifacts → function • Possible solutions: multiple layers of representation; explicit identification of information so that NLP systems can access what is needed at a given time; “dynamic type systems”

  35. Computational Lexicons: new challenges from the SW • From language resources for HLT to knowledge resources for inferential engines: in-depth lexical description for better content understanding • Content interoperability between computational lexicons: better integration between lexical information from different sources • Beyond the lexical information bottleneck: automatic lexical knowledge acquisition

  36. Lexical Inferences • “Midfielder Scott Sellars was sold to Blackburn for $35,000 and was bought back in the summer for $750,000.” (FrameNet Corpus) • after e1: OWN(buyer, goods), NOT(OWN(buyer, money)) • after e2: NOT(OWN(seller, goods)), OWN(seller, money) • e1 < e2 • TIME e2 = SUMMER

  37. Hot Topics • Goal: to provide SW agents with high inferential capacities in accessing linguistic content • In-depth lexical analysis, e.g. X buys Y from Z at t ==> Z owns Y before t & X owns Y after t • Key issues at the lexicon-grammar interface: predicate event structure (states, processes, accomplishments, etc.); temporal adverbs and temporal expressions (e.g. in three years, etc.); quantificational expressions, etc.; syntax-semantics argument linking
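The inference rule on this slide (X buys Y from Z at t, so Z owns Y before t and X owns Y after t) can be applied mechanically once buying events are represented explicitly. The Python sketch below is an illustration under stated assumptions: the `Buy` record, the `owner_at` function, the integer time points and the placeholder club name `club_A` (the selling club is not named in the corpus sentence) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Buy:
    buyer: str
    goods: str
    seller: str
    time: int  # abstract time point; only the ordering matters


def owner_at(events: Iterable[Buy], goods: str, t: int) -> Optional[str]:
    """Owner of the goods at time t according to the buy rule, or None if unknown."""
    relevant = sorted((e for e in events if e.goods == goods), key=lambda e: e.time)
    owner = None
    for event in relevant:
        if t < event.time:
            # Before this sale: the previous buyer, or the seller if this is the first sale.
            return owner if owner is not None else event.seller
        if t == event.time:
            return None  # the rule says nothing about the transaction instant itself
        owner = event.buyer  # after the sale the buyer owns the goods
    return owner


# The two transfers in the Sellars sentence; "club_A" is a placeholder name.
EVENTS = [
    Buy(buyer="Blackburn", goods="Sellars", seller="club_A", time=1),
    Buy(buyer="club_A", goods="Sellars", seller="Blackburn", time=3),
]
```

With these events, `owner_at(EVENTS, "Sellars", 2)` yields Blackburn and `owner_at(EVENTS, "Sellars", 4)` yields the original club, which is the chain of ownership facts an inferential agent would need to derive from the two verbs.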

  38. Computational Lexicons and the Semantic Web • Part 2: Lexicon Design in the Age of the Semantic Web

  39. Lexicons of the Future • General purpose: portable over different domains • Multilingual: relations among lexical entities in different languages • Flexible and extensible: enable use of information at appropriate granularity for the application; enable continual extension (“dynamic”) • Integrated with Web technology: content interoperability

  40. Lexical Content Interoperability • The Lexical Web: enable universal access to lexical information • [Diagram labels: FrameNet, SIMPLE, WordNet, EuroWordNet, Intelligent Agents]

  41. Some Requirements for Lexical Content Interoperability • Compatibility between different models of lexical analysis: relational semantic models (e.g. WordNet); syntactic and semantic frames; … • Compatibility between different degrees of lexical specification: deep lexical representations (e.g. PAROLE-SIMPLE); shallow semantic descriptions • Compatibility between different paradigms of multilinguality: lexicons for transfer-based MT; interlingua-based lexicons; …

  42. The Need for Standards • To represent common information while keeping flexibility • To enhance the sharing and reusability of multilingual lexical resources • To establish an open environment for the development and integration of multilingual resources • Information must be consistent with related technologies (XML, RDF/S, etc.) in order to take advantage of them

  43. Computational Lexicon Working Group (CLWG) • International Standards for Language Engineering (ISLE) • Definition of standards for multilingual computational lexicons at both the content and the representational level

  44. [Diagram labels: PAROLE-SIMPLE Lexicons; Multilingual Lexicons (EuroWordNet, etc.); ISLE; EAGLES guidelines for syntactic and semantic lexicons; GENELEX Model; MILE Lexical Model]

  45. The MILE Lexical Model • A general architecture to foster content interoperability between multilingual computational lexicons • Key issues: Modularity, User-adaptability, Resource sharing, Reusability • SW technologies and standards applied to lexicon modelling

  46. The MILE Lexical Model (MLM) • The MLM core is the Multilingual ISLE Lexical Entry (MILE): a general schema for multilingual lexical resources; a lexical meta-entry as a common representational layer for multilingual lexicons • Computational lexicons can be viewed as different instances of the MILE schema • [Diagram labels: MILE Lexical Model; lexicon#1, lexicon#2, lexicon#3]

  47. MILE: the building-block model • The MILE architecture is designed according to the building-block model: lexical entries are obtained by combining various types of lexical objects (atomic and complex) • Users design their lexicon by: selecting and/or specifying the relevant lexical objects; combining the lexical objects into lexical entries • Lexical objects may be shared: within the same lexicon (intra-lexicon reusability); among different lexicons (inter-lexicon reusability)
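The building-block idea described on this slide can be sketched as a shared repository of lexical objects from which entries are assembled by reference. The Python below is a minimal illustration and not the MILE specification: the class names, the object identifiers such as `syn:transitive`, and the sample entries are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class LexicalObject:
    """A reusable building block, e.g. a syntactic frame or a semantic feature."""
    obj_id: str
    kind: str
    value: str


@dataclass
class Lexicon:
    name: str
    # Repository of lexical objects; passing the same dict to several
    # lexicons models inter-lexicon reusability.
    objects: Dict[str, LexicalObject]
    entries: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    def add_entry(self, lemma, *object_ids):
        """Build an entry by combining already defined lexical objects."""
        missing = [oid for oid in object_ids if oid not in self.objects]
        if missing:
            raise KeyError(f"unknown lexical objects: {missing}")
        self.entries[lemma] = tuple(object_ids)

    def users_of(self, obj_id):
        """Lemmas in this lexicon whose entries reuse the given object."""
        return sorted(lemma for lemma, ids in self.entries.items() if obj_id in ids)


REPOSITORY = {
    "syn:transitive": LexicalObject("syn:transitive", "syntactic frame", "Subj NP, Objd NP"),
    "sem:human": LexicalObject("sem:human", "Sem feature", "+human"),
}

lexicon_1 = Lexicon("lexicon#1", REPOSITORY)
lexicon_1.add_entry("hit", "syn:transitive")
lexicon_1.add_entry("employ", "syn:transitive", "sem:human")  # intra-lexicon reuse

lexicon_2 = Lexicon("lexicon#2", REPOSITORY)                  # inter-lexicon reuse
lexicon_2.add_entry("assumere", "syn:transitive", "sem:human")
```

Storing identifiers rather than copies keeps each object defined once, so a change to the shared frame is visible to every entry and every lexicon that refers to it.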

  48. MILE: the building-block model • [Diagram labels: Lexical entry 1, Lexical entry 2, Lexical entry 3; Lexical Objects: Sem feature, syntactic frame, slot, Syn feature, phrase]

  49. Modularity in MILE • multiple levels of modularity • [Diagram labels: mono-MILE: semantic layer, linking conditions, syntactic layer, morphological layer; mono-MILE; multi-MILE: multilingual correspondence conditions]

  50. The Mono-MILE • Each monolingual layer within Mono-MILE identifies a basic unit of lexical description • SemU (semantic layer): basic unit to describe the semantic properties of the MU • SynU (syntactic layer): basic unit to describe the syntactic behavior of the MU • MU (morphological layer): basic unit to describe the inflectional and derivational morphological properties of the word
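The three basic units of Mono-MILE can be pictured as three record types, with SynU and SemU pointing back to the MU they describe. The sketch below is an assumption-laden illustration in Python, not the MILE data model: the field names, the MU and SynU identifiers and the frame string are hypothetical, while the two SemU numbers and types for comprendere are taken from slide 29.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MU:
    """Morphological unit: the word and its morphological properties."""
    mu_id: str
    lemma: str
    pos: str


@dataclass(frozen=True)
class SynU:
    """Syntactic unit: one syntactic behaviour of an MU."""
    synu_id: str
    mu_id: str
    frame: str


@dataclass(frozen=True)
class SemU:
    """Semantic unit: one sense of an MU."""
    semu_id: str
    mu_id: str
    sem_type: str
    gloss: str


def describe(mu, synus, semus):
    """Collect the syntactic and semantic units attached to one MU."""
    return {
        "lemma": mu.lemma,
        "syntax": [s.synu_id for s in synus if s.mu_id == mu.mu_id],
        "semantics": [s.semu_id for s in semus if s.mu_id == mu.mu_id],
    }


COMPRENDERE = MU("mu_comprendere", "comprendere", "V")  # hypothetical id
SYNUS = [SynU("synu_1", "mu_comprendere", "Subj NP, Objd NP")]  # hypothetical frame
SEMUS = [
    SemU("61725", "mu_comprendere", "Cognitive_event", "to understand"),
    SemU("6962", "mu_comprendere", "Constitutive_state", "to include"),
]

summary = describe(COMPRENDERE, SYNUS, SEMUS)
```

Keeping the layers as separate units lets one MU carry several SemUs, as with the two senses of comprendere, without duplicating its morphological description.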
