
Overview

Human Language Technology for the Semantic Web. Paul Buitelaar, DFKI GmbH, Language Technology Lab & DFKI Competence Center Semantic Web, Saarbrücken, Germany.


Presentation Transcript


  1. Human Language Technology for the Semantic Web. Paul Buitelaar, DFKI GmbH, Language Technology Lab & DFKI Competence Center Semantic Web, Saarbrücken, Germany

  2. Overview
      • Human Language Technology for the SW
        • HLT and the Ontology Lifecycle
        • Levels of Linguistic Analysis
      • SmartWeb: Applications of HLT for the SW
        • Knowledge Markup for Ontology Population
        • Ontology Learning from Text

  3. Human Language Technology for the Semantic Web

  4. Ontology Lifecycle
      • Populate: Knowledge Base Generation
      • Validate: Consistency Checks
      • Create: Ontology Development
      • Deploy: Knowledge Retrieval
      • Evolve: Ontology Extension
      • Maintain: Usability Tests

  5. HLT in the Ontology Lifecycle
      • Populate: Knowledge Base Generation
      • Validate: Consistency Checks
      • Create: Ontology Development
      • Deploy: Knowledge Retrieval
      • Evolve: Ontology Extension
      • Maintain: Usability Tests

  6. HLT in the Ontology Lifecycle
      • Ontology Learning (Creation and Evolution), for Ontology Creation & Development: Extract Classes & Relations
        Linguistic Analysis to Extract Classes & Relations from Text
      • Ontology Population, for Knowledge Base Generation: Extract (Annotate) Instances
        Linguistic Analysis to Annotate and Extract Instances from Text

  7. HLT for the SW
      • Semantic Web
      • Ontology Learning & Population
      • Information Extraction (from Text)
      • Dependency Structure & Discourse Analysis
      • Phrase Recognition, Grammatical Functions
      • Part-of-Speech Tagging, Morphological Analysis, Semantic Tagging

  8. “HLT Layer Cake”: Levels of Linguistic Analysis
      • Discourse Analysis: [[He:SUBJ] [booked:PRED] [[this] [table:HEAD] NP:DOBJ:X1] …] … [[It:SUBJ:X1] [was:PRED] still available …]
      • Dependency Structure (S): [[He:SUBJ] [booked:PRED] [[this] [table:HEAD] NP:DOBJ] S]
      • Dependency Structure (Phrases): [[the:SPEC] [large:MOD] [table:HEAD] NP]
      • Phrase Recognition / Chunking: [[the] [large] [table] NP] [[in] [the] [corner] PP]
      • Part-of-Speech & Semantic Tagging: [table:N:ART] [table:N:furniture_01] [table:N:ARTIFACT]
      • Morphological Analysis: [Sommer~schule:N] [work~ing:V]
      • Tokenization (incl. Named-Entity Rec.): [table] [2005-06-01] [John Smith]
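The chunking layer groups PoS-tagged tokens into non-recursive phrases. The sketch below is a minimal greedy chunker, not the tool behind these slides; it assumes a small hypothetical tag set (DET, ADJ, N, P, V) and the patterns NP := DET? ADJ* N+ and PP := P NP, and reproduces the "the large table / in the corner" example.

```python
from typing import List, Tuple

Token = Tuple[str, str]        # (word, PoS tag); tag set is a hypothetical toy one
Chunk = Tuple[str, List[str]]  # (chunk label, words)

def _np_end(tagged: List[Token], start: int) -> int:
    """Return the index just past an NP (DET? ADJ* N+) starting at `start`, or -1."""
    j = start
    if j < len(tagged) and tagged[j][1] == "DET":
        j += 1
    while j < len(tagged) and tagged[j][1] == "ADJ":
        j += 1
    k = j
    while k < len(tagged) and tagged[k][1] == "N":
        k += 1
    return k if k > j else -1  # at least one noun is required

def chunk(tagged: List[Token]) -> List[Chunk]:
    """Greedy left-to-right chunking into non-recursive NP and PP chunks.

    PP := P NP; tokens outside any chunk are emitted as single-word "O" chunks.
    """
    chunks: List[Chunk] = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] == "P":
            end = _np_end(tagged, i + 1)
            if end != -1:
                chunks.append(("PP", [w for w, _ in tagged[i:end]]))
                i = end
                continue
        end = _np_end(tagged, i)
        if end != -1:
            chunks.append(("NP", [w for w, _ in tagged[i:end]]))
            i = end
        else:
            chunks.append(("O", [tagged[i][0]]))
            i += 1
    return chunks

print(chunk([("the", "DET"), ("large", "ADJ"), ("table", "N"),
             ("in", "P"), ("the", "DET"), ("corner", "N")]))
# [('NP', ['the', 'large', 'table']), ('PP', ['in', 'the', 'corner'])]
```

Because the chunks are flat, the PP here holds its NP as plain words; recovering the nested [[in] [the] [corner] PP] structure with an inner NP is left to the higher analysis layers.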

  9. Levels of Linguistic Analysis
      • Lexical Analysis
        Word Class: Part-of-Speech
        Word Structure: Morphology
      • Phrase Analysis
        Analysis of Sentence Constituents, i.e. of its Structure
        Phrases (if non-recursive: Chunks)
      • Dependency Structure Analysis
        Analysis of the Meaning of a Sentence or Phrase
        Sentence-Internal: Predicate Argument Structure (Clause)
        Phrase-Internal: Head Modifier Structure

  10. Part-of-Speech, Morphology
      • Part-of-Speech (PoS)
        noun, verb, adjective, preposition, …
        PoS tag sets may have between 10 and 50 (or more) tags
      • Morphology
        Most languages have inflection (declension, conjugation), e.g.
        Singular, Plural: computer, computers
        Present, Past Tense: rejects, rejected
        Many languages also have complex (de)composition, e.g.
        Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
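The decomposition of Flachbildschirm can be sketched as lexicon-driven segmentation. This is an illustrative sketch, not a description of any specific analyzer: it assumes a hypothetical four-word lexicon and ignores linking elements (such as German "-s-") and inflection. The `finest` flag chooses between the two levels shown on the slide.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical toy lexicon; a real system would use a full morphological lexicon.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}

def split_compound(word: str, finest: bool = False) -> Optional[Tuple[str, ...]]:
    """Segment `word` into lexicon entries.

    finest=False prefers the fewest parts (flach + bildschirm);
    finest=True prefers the most parts (flach + bild + schirm).
    Returns None if no complete segmentation exists.
    """
    w = word.lower()
    pick = max if finest else min

    @lru_cache(maxsize=None)
    def best(i: int) -> Optional[Tuple[str, ...]]:
        # Best segmentation of the suffix w[i:], memoized per start position.
        if i == len(w):
            return ()
        options = []
        for j in range(i + 1, len(w) + 1):
            if w[i:j] in LEXICON:
                rest = best(j)
                if rest is not None:
                    options.append((w[i:j],) + rest)
        return pick(options, key=len) if options else None

    return best(0)

print(split_compound("Flachbildschirm"))               # ('flach', 'bildschirm')
print(split_compound("Flachbildschirm", finest=True))  # ('flach', 'bild', 'schirm')
```

Memoizing on the start position keeps the search linear in the number of (start, end) pairs instead of exploring every segmentation separately.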

  11. Phrases, Terms, Named Entities
      • Phrases, e.g. Nominal Phrase (NP), Prepositional Phrase (PP)
        NP (non-recursive): a flat screen
        PP: with a flat screen
        NP (recursive): a Dell computer with a flat screen
      • Terms: Domain-Specific Phrases
        Dell computer
        Dell computer with a flat screen
      • Named Entities: Person and Organization Names, Dates, …
        COMPANY: Dell
        COMPANY: Dell Computer Corporation
        PERSON: Michael Dell
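The named-entity examples show why overlapping names matter: "Dell" alone is a COMPANY, but inside "Michael Dell" it belongs to a PERSON. A minimal gazetteer lookup sketch, assuming a hypothetical toy gazetteer built from the slide's examples and greedy longest-match (one common strategy, not necessarily the one used in the presented systems), resolves this by trying the longest span first.

```python
from typing import Dict, List, Tuple

# Toy gazetteer built from the slide's examples (a real one would be large-scale).
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Dell",): "COMPANY",
    ("Dell", "Computer", "Corporation"): "COMPANY",
    ("Michael", "Dell"): "PERSON",
}
MAX_LEN = max(len(entry) for entry in GAZETTEER)

def tag_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Greedy left-to-right longest-match lookup; returns (entity, type) pairs."""
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in GAZETTEER:
                found.append((" ".join(key), GAZETTEER[key]))
                i += length
                break
        else:
            i += 1  # no entry starts here; move on
    return found

print(tag_entities("Michael Dell founded Dell Computer Corporation".split()))
# [('Michael Dell', 'PERSON'), ('Dell Computer Corporation', 'COMPANY')]
```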

  12. Dependency Structure Analysis
      The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
      [Figure: dependency graph over the concepts Dell computer, flat screen, motherboard, failure and animate-entity, linked by the relations has-a, reject and location-of]

  13. Dependency Structure
      • Dependencies between Predicates and Arguments
        the Dell computer with a flat screen had to be rejected
        PRED: reject, ARG1: ENTITY, ARG2: ‘the Dell computer with a flat screen’
        ‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …
      • Dependency Structure Analysis is based on
        Sub-categorization Frames: reject :: Subj:NP, Obj:NP
        Selection Restrictions: reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
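The subcategorization frame and selection restrictions for "reject" can be checked mechanically once each argument carries a phrase category and a semantic type. The sketch below assumes a hypothetical type hierarchy (PERSON under ANIMATE-ENTITY, COMPUTER under ARTIFACT, all under ENTITY); only the frame itself comes from the slide, and implicit arguments such as the unexpressed agent of a passive are not handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical type hierarchy (type -> parent type); ENTITY is the root.
PARENT: Dict[str, Optional[str]] = {
    "ENTITY": None,
    "ANIMATE-ENTITY": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "ARTIFACT": "ENTITY",
    "COMPUTER": "ARTIFACT",
}

# Frame from the slide: reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
FRAMES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "reject": {"Subj": ("NP", "ANIMATE-ENTITY"), "Obj": ("NP", "ENTITY")},
}

def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` or one of its descendants."""
    t: Optional[str] = specific
    while t is not None:
        if t == general:
            return True
        t = PARENT[t]
    return False

def satisfies(pred: str, args: Dict[str, Tuple[str, str]]) -> bool:
    """Check argument functions, phrase categories and semantic types against the frame."""
    frame = FRAMES[pred]
    if set(args) != set(frame):
        return False  # subcategorization frame not matched
    return all(args[fn][0] == cat and subsumes(sem, args[fn][1])
               for fn, (cat, sem) in frame.items())

print(satisfies("reject", {"Subj": ("NP", "PERSON"), "Obj": ("NP", "COMPUTER")}))  # True
print(satisfies("reject", {"Subj": ("NP", "COMPUTER"), "Obj": ("NP", "PERSON")}))  # False
```

The second call fails because a computer cannot fill the ANIMATE-ENTITY subject slot, which is exactly the kind of reading that selection restrictions rule out.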

  14. Dependency Structure
      The Dell computer that has been rejected was claimed to have suffered from handling.
      [Figure: f-structure with PRED claim<NULL, XCOMP>; its XCOMP has PRED suffer<SUBJ, OBL-from> with OBL-from handling; the subject y1 (PRED computer, MOD Dell, ADJUNCT with PRED reject<NULL, SUBJ>) is shared between claim and suffer]
      reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
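The logical form is a flat conjunction in which shared variables carry the dependencies: y1 is at once the thing rejected, the Dell computer, and the sufferer. A small sketch (an illustration, not part of any tool named here) parses such a conjunction into atoms and lists every predicate position where a variable occurs.

```python
import re
from typing import List, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)

LF = ("reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & "
      "claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)")

# One conjunct: predicate name, then a parenthesised, comma-separated argument list.
ATOM_RE = re.compile(r"\s*([\w-]+)\s*\(([^()]*)\)\s*")

def parse_lf(lf: str) -> List[Atom]:
    """Split a flat conjunction 'p(a,b) & q(c)' into (predicate, args) atoms."""
    atoms: List[Atom] = []
    for conjunct in lf.split("&"):
        m = ATOM_RE.fullmatch(conjunct)
        if m is None:
            raise ValueError(f"not an atom: {conjunct!r}")
        pred, args = m.groups()
        atoms.append((pred, tuple(a.strip() for a in args.split(","))))
    return atoms

def roles_of(atoms: List[Atom], var: str) -> List[Tuple[str, int]]:
    """(predicate, argument position) pairs in which `var` occurs."""
    return [(p, i) for p, args in atoms for i, a in enumerate(args) if a == var]

atoms = parse_lf(LF)
print(roles_of(atoms, "y1"))
# [('reject', 2), ('Dell_computer', 0), ('suffer_from', 1)]
```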

  15. Dependency Structure: Tools
      • HPSG: Head-Driven Phrase Structure Grammar
      • Main Development at Stanford Univ., DFKI, Tokyo Univ., others
      • Demo @ DFKI: http://www.dfki.de/hog/analyze.php
      • Resources (Grammars, Parsers): http://www.delph-in.net/
      • EN, FR, DE, other
      • Freely Available

  16. Dependency Structure: Tools
      • LFG: Lexical Functional Grammar
      • Main Development at Xerox (PARC, XRCE)
      • Demo @ DCU, Ireland: http://research.computing.dcu.ie/~acahill/get_lfg.html
      • EN, FR, DE
      • Not Freely Available

  17. Dependency Structure: Tools
      • CCG: Combinatory Categorial Grammar
      • Main Development at Univ. of Edinburgh
      • Demo: http://groups.inf.ed.ac.uk/ccg/software.html
      • EN
      • Open Source

  18. Dependency Structure: Tools
      • LG: Link Grammar
      • Main Development at CMU, USA
      • Demo: http://bobo.link.cs.cmu.edu/link/
      • EN
      • Open Source

  19. Applications of HLT for the SW: SmartWeb
      http://www.smartweb-projekt.de/
      “Mobile Access to the Semantic Web”
      Funded by BMB+F (German Ministry for Education and Research)

  20. SmartWeb: Knowledge Markup for Ontology Population

  21. Some History (in NLP/IR) and Current Work
      • Language Understanding
        Deep Semantic Text Analysis: Compositional Semantics, Sense Resolution
        Deep Grammars, Large-Scale Semantic Lexicons (WordNet)
      • Information Extraction
        Shallow Semantic Text Analysis
        Template Extraction: Lexical Trigger Spotting (incl. Word Sense Disambiguation), Named-Entity Recognition
        Large-Scale Gazetteers, NE Grammars/Models, Robust WSD
      • Knowledge Markup for Ontology Population
        Shallow/Deep Semantic Text Analysis
        Class & Relation Instantiation
        Large-Scale Ontologies > To be merged with: Semantic Lexicons, Gazetteers, NE Grammars/Models, WSD Algorithms

  22. Semantic Web
      [Figure: architecture with the components Ontologies, Knowledge Base, WWW Documents, Wrapping (Semi-Structured Data), Inference Engine, Named-Entity Recognition & Concept Annotation, Information Extraction, Linguistic Annotation]

  23. Information Extraction from Text
      [Figure: example templates MATCH-RESULT and FOOTBALL-PLAYER]

  24. Mark Crossley saved twice with his legs from Huckerby.
      • Named Entity Recognition & Concept Annotation:
        [Mark Crossley GOALKEEPER] [saved GOALKEEPER_ACTION] twice with his legs from [Huckerby PLAYER].
      • Linguistic Annotation:
        [Mark Crossley GOALKEEPER : SUBJ] [saved PRED : GOALKEEPER_ACTION] twice [with his legs PP_OBJ] [from [Huckerby PLAYER] PP_ADJUNCT].
      • Template:
        [GOALKEEPER_ACTION = 'save', GOALKEEPER = 'Mark Crossley', PLAYER = 'Huckerby', MANNER = 'legs']
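The step from the two annotation layers to the template can be sketched as a few slot-filling rules. Everything here is an assumption for illustration: the span format (text, concept, grammatical function, head lemma) and the rules themselves are hypothetical, chosen only to reproduce the slide's template for this one sentence.

```python
from typing import Dict, List, Optional, TypedDict

class Span(TypedDict):
    text: str               # surface string
    concept: Optional[str]  # concept annotation, e.g. "GOALKEEPER"
    gf: Optional[str]       # grammatical function, e.g. "SUBJ", "PP_OBJ"
    head: str               # lemma of the phrase head

# The slide's example sentence, annotated by hand in this assumed format.
SPANS: List[Span] = [
    {"text": "Mark Crossley", "concept": "GOALKEEPER", "gf": "SUBJ", "head": "Crossley"},
    {"text": "saved", "concept": "GOALKEEPER_ACTION", "gf": "PRED", "head": "save"},
    {"text": "with his legs", "concept": None, "gf": "PP_OBJ", "head": "legs"},
    {"text": "Huckerby", "concept": "PLAYER", "gf": "PP_ADJUNCT", "head": "Huckerby"},
]

def fill_template(spans: List[Span]) -> Dict[str, str]:
    """Fill GOALKEEPER / GOALKEEPER_ACTION / PLAYER / MANNER slots with hand-written rules."""
    template: Dict[str, str] = {}
    for s in spans:
        if s["concept"] == "GOALKEEPER_ACTION":
            template["GOALKEEPER_ACTION"] = s["head"]   # use the lemma: saved -> save
        elif s["concept"] in ("GOALKEEPER", "PLAYER"):
            template[s["concept"]] = s["text"]          # keep the full name
        elif s["gf"] == "PP_OBJ" and s["text"].startswith("with "):
            template["MANNER"] = s["head"]              # head noun of the with-PP
    return template

print(fill_template(SPANS))
# {'GOALKEEPER': 'Mark Crossley', 'GOALKEEPER_ACTION': 'save', 'MANNER': 'legs', 'PLAYER': 'Huckerby'}
```

The MANNER slot is the one that needs the linguistic layer: concept annotation alone does not mark "with his legs", but its grammatical function and preposition do.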

  25. Annotation/Extraction Format
      Example: Ballack shoots in the 25th minute but the goalkeeper gets it.

  26. SmartWeb: Ontology Learning from Text

  27. “Ontology Learning Layer Cake”
      • Rules
      • Relations: cure(dom:DOCTOR, range:DISEASE)
      • Taxonomy: is_a(DOCTOR, PERSON)
      • Concepts: DISEASE := <Int, Ext, Lex>
      • Synonyms & Multilingual Variants: {disease, illness, Krankheit}
      • Terms: disease, illness, hospital
      Introduced in: Philipp Cimiano, PhD Thesis, University of Karlsruhe, forthcoming
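The layers can be read as data structures that build on each other. The sketch below is one possible encoding, with assumptions labeled: it reads <Int, Ext, Lex> as intension (here just a gloss string), extension (instances) and lexical realizations, and it omits the Rules layer. It reuses the slide's DISEASE, DOCTOR and PERSON examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Concept:
    name: str
    intension: str = ""                                 # Int: e.g. a gloss (assumption)
    extension: Set[str] = field(default_factory=set)    # Ext: known instances
    lexicon: Set[str] = field(default_factory=set)      # Lex: synonyms, multilingual variants

@dataclass
class Ontology:
    concepts: Dict[str, Concept] = field(default_factory=dict)
    is_a: Set[Tuple[str, str]] = field(default_factory=set)              # (sub, super)
    relations: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # name -> (domain, range)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept

    def concepts_for_term(self, term: str) -> List[str]:
        """Terms layer -> Concepts layer: which concepts does this word lexicalize?"""
        return sorted(c.name for c in self.concepts.values() if term in c.lexicon)

    def ancestors(self, name: str) -> Set[str]:
        """Transitive closure over is_a (the Taxonomy layer)."""
        result: Set[str] = set()
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for sub, sup in self.is_a:
                if sub == current and sup not in result:
                    result.add(sup)
                    frontier.append(sup)
        return result

onto = Ontology()
onto.add(Concept("DISEASE", lexicon={"disease", "illness", "Krankheit"}))
onto.add(Concept("DOCTOR", lexicon={"doctor"}))
onto.add(Concept("PERSON", lexicon={"person"}))
onto.is_a.add(("DOCTOR", "PERSON"))
onto.relations["cure"] = ("DOCTOR", "DISEASE")

print(onto.concepts_for_term("Krankheit"))  # ['DISEASE']
print(onto.ancestors("DOCTOR"))             # {'PERSON'}
```

Keeping the lexicon on the concept is what lets the German "Krankheit" and the English "illness" resolve to the same DISEASE concept.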

  28. Some History (in NLP/IR)
      • Lexical Knowledge Extraction
        Extraction of lexical semantic representations (word meaning) from Machine-Readable Dictionaries: 70s/80s
        Extraction of semantic lexicons from corpora for Information Extraction systems: 80s/90s, e.g. CRYSTAL (Soderland)
        Answer Extraction in Question Answering: now, e.g. Webclopedia (Hovy)
      • Thesaurus Extraction
        Similar work: multilingual term extraction, clustering
        Sextant (Grefenstette), DR-Link (Liddy), other

  29. Some Current Work
      • AIFB, Univ. Karlsruhe: Taxonomy Extraction (TextToOnto); Ling. Preproc., Clustering (FCA)
      • CNTS, Univ. Antwerpen; Free Univ. Brussels (VUB): Term & Relation Extraction (DOGMA); Ling. & Statistical Analysis
      • DFKI: Term & Relation Extraction (OntoLT / RelExt); Ling. & Stat. Analysis
      • Free Univ. Amsterdam (VU): Term & Relation Extr. for Web Service Ontology; Linguistic Analysis
      • Univ. Paris: Term, Rel. & Taxonomy Extraction (ASIUM); Ling. Analysis, Clustering
      • Univ. Rome: Term Extr., Synonyms, Lexical Relations (OntoLearn); WordNet-based
      Overview in: Paul Buitelaar, Philipp Cimiano, Bernardo Magnini. Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.

  30. Ontology Learning from Text
      • Approaches
        Taxonomy Extraction, Document Clustering: String-based, Document Level
        “Unnamed” Relation Extraction, Word Clustering: Stemming & PoS, Token Level
        Extraction of Terms, “Named” Relations: Dependency Structure Analysis, Term Level
      • Linguistic Aspects
        Textual Grounding of Concepts: Retain Linguistic Contexts and Realizations
        Text-based Ontology Monitoring: Compare Language Use over Time

  31. Ontology Learning from Text
      • Approaches
        Taxonomy Extraction, Document Clustering: String-based, Document Level
        “Unnamed” Relation Extraction, Word Clustering: Stemming & PoS, Token Level
        Extraction of Terms, “Named” Relations: Dependency Structure Analysis, Term Level
      • Linguistic Aspects
        Textual Grounding of Concepts: Retain Linguistic Contexts and Realizations
        Text-based Ontology Monitoring: Compare Language Use over Time
      OntoLT & RelExt

  32. Protégé PlugIn OntoLT: Joint Work with Michael Sintek, Daniel Olejnik, Yasir Iqbal

  33. OntoLT
      What is it?
      OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.
      Version 1.0 available from: http://olp.dfki.de/OntoLT/OntoLT.htm
      How does it work?
      • based on automatic linguistic annotation
      • integrates statistical preprocessing options
      • manual definition of mapping rules
      • interactive user validation of candidates
      • automatic generation/extension of ontology fragments

  34. OntoLT: Architecture

  35. Mapping Rules: Map Text Elements to Classes/Slots
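A mapping rule turns linguistic annotation into ontology candidates. The slides do not show the rule language, so the following is a hypothetical rule in plain Python: subject and direct-object heads become candidate classes, and the predicate becomes a candidate slot whose domain is the subject class and whose range is the object class. The `Clause` shape and the rule name are invented for illustration; the candidates would still go to interactive user validation.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

class Clause(NamedTuple):
    """Hypothetical shape of one linguistically annotated clause."""
    subj_head: str
    pred: str
    dobj_head: str

Slots = Dict[str, Set[Tuple[str, str]]]  # slot name -> {(domain class, range class)}

def subj_pred_dobj_rule(clauses: List[Clause]) -> Tuple[Set[str], Slots]:
    """Hypothetical mapping rule: subject and object heads become candidate classes,
    the predicate becomes a candidate slot from the subject class to the object class."""
    classes: Set[str] = set()
    slots: Slots = {}
    for c in clauses:
        classes.update((c.subj_head, c.dobj_head))
        slots.setdefault(c.pred, set()).add((c.subj_head, c.dobj_head))
    return classes, slots

classes, slots = subj_pred_dobj_rule([
    Clause("doctor", "cure", "disease"),
    Clause("goalkeeper", "save", "shot"),
])
print(sorted(classes))  # ['disease', 'doctor', 'goalkeeper', 'shot']
print(slots["cure"])    # {('doctor', 'disease')}
```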

  36. Compute Statistical Relevance of Text Elements
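The slides do not specify the relevance measure. As one common choice, this sketch scores each word by the log-ratio of its relative frequency in the domain corpus to its relative frequency in a reference corpus (slide 41 names BNC and NZZ as frequency sources). Add-one smoothing on the reference side is an added assumption so that words absent from the reference corpus get a finite score.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def relevance(domain_tokens: Iterable[str], reference_tokens: Iterable[str]) -> Dict[str, float]:
    """Log-ratio of a word's relative frequency in the domain corpus vs. a reference corpus.

    Add-one smoothing on the reference side keeps words unseen there finite.
    Higher scores mean "more typical of the domain than of general language".
    """
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    n_dom = sum(dom.values())
    n_ref = sum(ref.values())
    vocab = len(set(dom) | set(ref))
    scores: Dict[str, float] = {}
    for word, freq in dom.items():
        p_dom = freq / n_dom
        p_ref = (ref[word] + 1) / (n_ref + vocab)
        scores[word] = math.log(p_dom / p_ref)
    return scores

scores = relevance("goalkeeper saves the shot the goalkeeper".split(),
                   "the cat sat on the mat the end".split())
print(max(scores, key=scores.get))  # goalkeeper
```

A function word like "the" is frequent in both corpora and therefore scores low, while "goalkeeper" stands out because the reference corpus never uses it.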

  37. Extract Class/Slot Candidates

  38. Inspect Extraction Contexts

  39. Extracted Ontology Fragments

  40. Corpus-based Relation Extraction (RelExt): Joint Work with Alexander Schutz

  41. Relation Extraction and Evaluation
      • Linguistic Processing: Linguistic Annotation, NER & Concept Tagging; result: Annotated Corpus
      • Statistical Processing: Corpus Relevance Measure (Frequencies in BNC, NZZ), giving Relevance Scores for Heads, Preds; Cooccurrence Measure, giving Cooccurrence Scores for Heads <> Preds
      • Triple Generation: Triples Head : Pred : Head
      • Manual Evaluation
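The co-occurrence and triple-generation steps can be sketched as follows. The slides name a co-occurrence measure without defining it, so pointwise mutual information (PMI) between a head noun and a predicate is an assumption here, as are the threshold of 0 and the toy observations. A Head : Pred : Head triple is kept only if both of its head/predicate links score above the threshold; the survivors are candidates for manual evaluation.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, pred, head)

def pmi_table(observations: List[Triple]) -> Dict[Tuple[str, str], float]:
    """PMI between head nouns and predicates, counting each (head, pred) link in a clause."""
    pairs: Counter = Counter()
    heads: Counter = Counter()
    preds: Counter = Counter()
    for subj, pred, obj in observations:
        for head in (subj, obj):
            pairs[(head, pred)] += 1
            heads[head] += 1
            preds[pred] += 1
    total = sum(pairs.values())
    return {
        (h, p): math.log((c / total) / ((heads[h] / total) * (preds[p] / total)))
        for (h, p), c in pairs.items()
    }

def generate_triples(observations: List[Triple], threshold: float = 0.0) -> Set[Triple]:
    """Keep head:pred:head triples whose two head-pred links both score above `threshold`."""
    pmi = pmi_table(observations)
    return {(s, p, o) for s, p, o in observations
            if pmi[(s, p)] > threshold and pmi[(o, p)] > threshold}

obs = [("doctor", "cure", "disease"), ("doctor", "cure", "disease"),
       ("doctor", "see", "patient"), ("player", "see", "ball")]
print(sorted(generate_triples(obs)))
# [('doctor', 'cure', 'disease'), ('player', 'see', 'ball')]
```

Here "doctor : see : patient" is dropped because "doctor" is more strongly associated with "cure" than with "see" in this toy data, so its link to "see" falls below chance.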
