Explore how Human Language Technology accelerates the Semantic Web process, from ontology creation to knowledge retrieval and extension. Learn about linguistic analysis, ontology learning, and population techniques for efficient SW development. Discover tools like HPSG and LFG for dependency structure analysis. Visit DFKI's demo links for a deeper understanding.
Human Language Technology for the Semantic Web
Paul Buitelaar
DFKI GmbH, Language Technology Lab & DFKI Competence Center Semantic Web
Saarbrücken, Germany
Overview
• Human Language Technology for the SW
  • HLT and the Ontology Lifecycle
  • Levels of Linguistic Analysis
• SmartWeb - Applications of HLT for the SW
  • Knowledge Markup for Ontology Population
  • Ontology Learning from Text
Ontology Lifecycle
• Populate: Knowledge Base Generation
• Validate: Consistency Checks
• Create: Ontology Development
• Deploy: Knowledge Retrieval
• Evolve: Ontology Extension
• Maintain: Usability Tests
HLT in the Ontology Lifecycle
• Ontology Learning (Creation and Evolution): Ontology Creation & Development
  Linguistic Analysis to Extract Classes & Relations from Text
• Ontology Population: Knowledge Base Generation
  Linguistic Analysis to Annotate and Extract Instances from Text
HLT for the SW
• Semantic Web
• Ontology Learning & Population
• Information Extraction (from Text)
• Dependency Structure & Discourse Analysis
• Part-of-Speech Tagging, Morphological Analysis, Semantic Tagging
• Phrase Recognition, Grammatical Functions
“HLT Layer Cake”: Levels of Linguistic Analysis
• Discourse Analysis
  [[He:SUBJ] [booked:PRED] [[this] [table:HEAD] NP:DOBJ:X1] …] … [[It:SUBJ:X1] [was:PRED] still available …]
• Dependency Struct. (S)
  [[He:SUBJ] [booked:PRED] [[this] [table:HEAD] NP:DOBJ] S]
• Dependency Struct. (Phrases)
  [[the:SPEC] [large:MOD] [table:HEAD] NP]
• Phrase Recognition / Chunking
  [[the] [large] [table] NP] [[in] [the] [corner] PP]
• Part-of-Speech & Semantic Tagging
  [table:N:ART] [table:N:furniture_01] [table:N:ARTIFACT]
• Morphological Analysis
  [Sommer~schule:N] [work~ing:V]
• Tokenization (incl. Named-Entity Rec.)
  [table] [2005-06-01] [John Smith]
Levels of Linguistic Analysis
• Lexical Analysis
  • Word Class: Part-of-Speech
  • Word Structure: Morphology
• Phrase Analysis
  • Analysis of Sentence Constituents, i.e. Sentence Structure
  • Phrases (if non-recursive: Chunks)
• Dependency Structure Analysis
  • Analysis of the Meaning of a Sentence or Phrase
  • Sentence-Internal: Predicate Argument Structure (Clause)
  • Phrase-Internal: Head Modifier Structure
Part-of-Speech, Morphology
Part-of-Speech (PoS)
• noun, verb, adjective, preposition, …
• PoS tag sets may have between 10 and 50 (or more) tags
Morphology
• Most languages have inflection and declension, e.g.
  Singular, Plural: computer, computers
  Present, Past Tense: rejects, rejected
• Many languages also have complex (de)composition, e.g.
  Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
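The decomposition of Flachbildschirm can be illustrated with a minimal Python sketch. It assumes a tiny hand-made lexicon and a greedy prefix search; the names LEXICON and split_compound are hypothetical, and real morphological analyzers also handle linking elements and ambiguity, which this sketch ignores.

```python
from typing import List, Optional, Set

# Toy lexicon: an assumption for illustration, not a real morphological resource.
LEXICON: Set[str] = {"flach", "bild", "schirm", "bildschirm"}


def split_compound(word: str, lexicon: Set[str] = LEXICON,
                   longest_first: bool = True) -> Optional[List[str]]:
    """Return one segmentation of `word` into lexicon entries, or None if there is none."""
    word = word.lower()
    if not word:
        return []
    # Longest-first prefers coarse splits (flach + bildschirm);
    # shortest-first yields the fully decomposed form (flach + bild + schirm).
    if longest_first:
        lengths = range(len(word), 0, -1)
    else:
        lengths = range(1, len(word) + 1)
    for i in lengths:
        prefix, rest = word[:i], word[i:]
        if prefix in lexicon:
            tail = split_compound(rest, lexicon, longest_first)
            if tail is not None:
                return [prefix] + tail
    return None


coarse = split_compound("Flachbildschirm")
fine = split_compound("Flachbildschirm", longest_first=False)
unknown = split_compound("Computer")
```

Here coarse corresponds to flach + Bildschirm and fine to flach + Bild + Schirm; a word that cannot be covered by the lexicon gives None.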
Phrases, Terms, Named Entities
Phrases, e.g. Nominal Phrase (NP), Prepositional Phrase (PP)
• NP (non-recursive): a flat screen
• PP: with a flat screen
• NP (recursive): a Dell computer with a flat screen
Terms: Domain-Specific Phrases
• Dell computer
• Dell computer with a flat screen
Named Entities: Person and Organization Names, Dates, …
• COMPANY: Dell
• COMPANY: Dell Computer Corporation
• PERSON: Michael Dell
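Non-recursive phrase recognition (chunking) can be sketched over a PoS-tagged token sequence. The sketch below is only an illustration: the tag names (DET, ADJ, NOUN, PREP), the patterns NP = DET? ADJ* NOUN+ and PP = PREP NP, and the function chunk are assumptions, not a tag set or grammar taken from the slides.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, PoS tag); tag names are assumed for this sketch


def chunk(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Greedy non-recursive chunker: NP = DET? ADJ* NOUN+, PP = PREP NP."""
    chunks: List[Tuple[str, str]] = []
    n = len(tokens)

    def match_np(start: int) -> int:
        """Return the end index of an NP starting at `start`, or `start` if none matches."""
        j = start
        if j < n and tokens[j][1] == "DET":
            j += 1
        while j < n and tokens[j][1] == "ADJ":
            j += 1
        k = j
        while k < n and tokens[k][1] == "NOUN":
            k += 1
        return k if k > j else start

    def text(start: int, end: int) -> str:
        return " ".join(word for word, _ in tokens[start:end])

    i = 0
    while i < n:
        if tokens[i][1] == "PREP":
            end = match_np(i + 1)
            if end > i + 1:
                chunks.append(("PP", text(i, end)))
                i = end
                continue
        end = match_np(i)
        if end > i:
            chunks.append(("NP", text(i, end)))
            i = end
        else:
            i += 1  # token outside any chunk
    return chunks


tagged = [("a", "DET"), ("Dell", "NOUN"), ("computer", "NOUN"),
          ("with", "PREP"), ("a", "DET"), ("flat", "ADJ"), ("screen", "NOUN")]
result = chunk(tagged)
```

The chunker returns the two flat chunks "a Dell computer" (NP) and "with a flat screen" (PP); attaching the PP to build the recursive NP is left to a later analysis step.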
Dependency Structure Analysis
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
[Figure: dependency graph over the nodes reject, animate-entity, Dell computer, flat screen, motherboard, failure, with edges labeled has-a, has-a, location-of]
Dependency Structure
Dependencies between Predicates and Arguments
• the Dell computer with a flat screen had to be rejected
• PRED: reject, ARG1: ENTITY, ARG2: ‘the Dell computer with a flat screen’
‘Logical Form’
• reject(x,y) & animate-entity(x) & computer(y) & …
Dependency Structure Analysis is based on
• Sub-categorization Frames
  reject :: Subj:NP, Obj:NP
• Selection Restrictions
  reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
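How selection restrictions constrain arguments and lead to a logical form can be sketched as follows. This is a minimal illustration under stated assumptions: the type hierarchy IS_A is invented for the example, and the helpers is_a, satisfies and logical_form are hypothetical names, not part of any of the parsers named in these slides. Only the frame for reject is taken from the slide.

```python
from typing import Dict, Optional

# Hypothetical toy type hierarchy (child -> parent), invented for illustration.
IS_A: Dict[str, str] = {
    "COMPUTER": "ARTIFACT",
    "ARTIFACT": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "ANIMATE-ENTITY": "ENTITY",
}

# Selection restrictions as on the slide:
# reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
FRAMES: Dict[str, Dict[str, str]] = {
    "reject": {"Subj": "ANIMATE-ENTITY", "Obj": "ENTITY"},
}


def is_a(sem_type: Optional[str], target: str) -> bool:
    """True if `sem_type` equals `target` or is one of its descendants."""
    while sem_type is not None:
        if sem_type == target:
            return True
        sem_type = IS_A.get(sem_type)
    return False


def satisfies(pred: str, args: Dict[str, str]) -> bool:
    """Check the semantic types of the given arguments against the frame of `pred`."""
    frame = FRAMES[pred]
    return all(role in frame and is_a(sem_type, frame[role])
               for role, sem_type in args.items())


def logical_form(pred: str, obj_type: str, subj_type: Optional[str] = None) -> str:
    """Build a flat logical form; an unexpressed subject gets the restriction type."""
    subj = subj_type or FRAMES[pred]["Subj"]
    return f"{pred}(x,y) & {subj.lower()}(x) & {obj_type.lower()}(y)"


ok = satisfies("reject", {"Obj": "COMPUTER"})
bad = satisfies("reject", {"Subj": "COMPUTER", "Obj": "COMPUTER"})
lf = logical_form("reject", "COMPUTER")
```

For the passive sentence on the slide the subject is not expressed, so the sketch fills x with the restriction type from the frame, which gives reject(x,y) & animate-entity(x) & computer(y).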
Dependency Structure
The Dell computer that has been rejected was claimed to have suffered from handling.
[Figure: feature structure and dependency graph with PRED claim <NULL, XCOMP>, PRED reject <NULL, SUBJ>, PRED suffer <SUBJ, OBL-from>, and y1 : computer (MOD Dell)]
reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
Dependency Structure: Tools
• HPSG: Head-Driven Phrase Structure Grammar
  • Main Development at Stanford Univ., DFKI, Tokyo Univ., others
  • Demo @ DFKI: http://www.dfki.de/hog/analyze.php
  • Resources (Grammars, Parsers): http://www.delph-in.net/
  • EN, FR, DE, other
  • Freely Available
• LFG: Lexical Functional Grammar
  • Main Development at Xerox (PARC, XRCE)
  • Demo @ DCU, Ireland: http://research.computing.dcu.ie/~acahill/get_lfg.html
  • EN, FR, DE
  • Not Freely Available
• CCG: Combinatory Categorial Grammar
  • Main Development at Univ. of Edinburgh
  • Demo: http://groups.inf.ed.ac.uk/ccg/software.html
  • EN
  • Open Source
• LG: Link Grammar
  • Main Development at CMU, USA
  • Demo: http://bobo.link.cs.cmu.edu/link/
  • EN
  • Open Source
Applications of HLT for the SW
SmartWeb
http://www.smartweb-projekt.de/
“Mobile Access to the Semantic Web”
Funded by BMB+F (German Ministry for Education and Research)
Some History (in NLP/IR) and Current Work
• Language Understanding
  • Deep Semantic Text Analysis
  • Compositional Semantics, Sense Resolution
  • Deep Grammars, Large-Scale Semantic Lexicons (WordNet)
• Information Extraction
  • Shallow Semantic Text Analysis
  • Template Extraction: Lexical Trigger Spotting (incl. Word Sense Disambiguation), Named-Entity Recognition
  • Large-Scale Gazetteers, NE Grammars/Models, Robust WSD
• Knowledge Markup for Ontology Population
  • Shallow/Deep Semantic Text Analysis
  • Class & Relation Instantiation
  • Large-Scale Ontologies > To be merged with: Semantic Lexicons, Gazetteers, NE Grammars/Models, WSD Algorithms
Semantic Web
[Figure: diagram with the components WWW Documents, Wrapping (Semi-Structured Data), Linguistic Annotation, Named-Entity Recognition & Concept Annotation, Information Extraction, Knowledge Base, Ontologies, Inference Engine]
Information Extraction from Text MATCH-RESULT FOOTBALL-PLAYER
Mark Crossley saved twice with his legs from Huckerby.
Named Entity Recognition & Concept Annotation
  [Mark Crossley GOALKEEPER] [saved GOALKEEPER_ACTION] twice with his legs from [Huckerby PLAYER].
Linguistic Annotation
  [Mark Crossley GOALKEEPER : SUBJ] [saved PRED : GOALKEEPER_ACTION] twice [with his legs PP_OBJ] [from [Huckerby PLAYER] PP_ADJUNCT].
Template
  [GOALKEEPER_ACTION = 'save', GOALKEEPER = 'Mark Crossley', PLAYER = 'Huckerby', MANNER = 'legs']
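The step from the annotated sentence to the template can be sketched in a few lines of Python. This is an illustration only: the Span class, the value field (a normalized head such as 'save' for 'saved') and the rule that a PP_OBJ fills the MANNER slot are assumptions made for this one example, not the extraction rules actually used in SmartWeb.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    text: str
    concept: Optional[str] = None   # concept annotation, e.g. GOALKEEPER
    function: Optional[str] = None  # grammatical function, e.g. SUBJ
    value: Optional[str] = None     # normalized head (assumed to be supplied upstream)


# The annotated example sentence, entered by hand for this sketch.
ANNOTATED: List[Span] = [
    Span("Mark Crossley", concept="GOALKEEPER", function="SUBJ"),
    Span("saved", concept="GOALKEEPER_ACTION", function="PRED", value="save"),
    Span("twice"),
    Span("with his legs", function="PP_OBJ", value="legs"),
    Span("Huckerby", concept="PLAYER", function="PP_ADJUNCT"),
]


def fill_template(spans: List[Span]) -> Dict[str, str]:
    """Fill one template slot per concept-annotated span."""
    template: Dict[str, str] = {}
    for span in spans:
        slot_value = span.value or span.text
        if span.concept:
            template[span.concept] = slot_value
        elif span.function == "PP_OBJ":
            # Assumption: the PP object of the action is read as its MANNER.
            template["MANNER"] = slot_value
    return template


template = fill_template(ANNOTATED)
```

Unannotated material such as "twice" is simply skipped, so the resulting dictionary holds the same four slots as the template on the slide.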
Annotation/Extraction Format
Ballack shoots in the 25th minute but the goalkeeper gets it.
“Ontology Learning Layer Cake”
• Rules
• Relations: cure(dom:DOCTOR,range:DISEASE)
• Taxonomy: is_a(DOCTOR,PERSON)
• Concepts: DISEASE:=<Int,Ext,Lex>
• Synonyms & Multilingual Variants: {disease, illness, Krankheit}
• Terms: disease, illness, hospital
Introduced in: Philipp Cimiano, PhD Thesis, University of Karlsruhe, forthcoming
Some History (in NLP/IR)
Lexical Knowledge Extraction
• Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries: 70's/80's
• Extraction of semantic lexicons from corpora for Information Extraction systems: 80's/90's, e.g. CRYSTAL (Soderland)
• Answer Extraction in Question Answering: now, e.g. Webclopedia (Hovy)
Thesaurus Extraction
• Similar work: multilingual term extraction, clustering
• Sextant (Grefenstette), DR-Link (Liddy), other
Some Current Work
• AIFB, Univ. Karlsruhe: Taxonomy Extraction; TextToOnto; Ling. Preproc., Clustering (FCA)
• CNTS Univ. Antwerpen, Free Univ. Brussels (VUB): Term & Relation Extraction; DOGMA; Ling. & Statistical Analysis
• DFKI: Term & Relation Extraction; OntoLT / RelExt; Ling. & Stat. Analysis
• Free Univ. Amsterdam (VU): Term & Relation Extr. for Web Service Ontology; Linguistic Analysis
• Univ. Paris: Term, Rel. & Taxonomy Extraction; ASIUM; Ling. Analysis, Clustering
• Univ. Rome: Term Extr., Synonyms, Lexical Relations; OntoLearn; WordNet based
Overview in: Paul Buitelaar, Philipp Cimiano, Bernardo Magnini, Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.
Ontology Learning from Text
• Approaches
  • Taxonomy Extraction, Document Clustering: String-based, Document Level
  • “Unnamed” Relation Extraction, Word Clustering: Stemming & PoS, Token Level
  • Extraction of Terms, “Named” Relations: Dependency Structure Analysis, Term Level
• Linguistic Aspects
  • Textual Grounding of Concepts: Retain Linguistic Contexts and Realizations
  • Text-based Ontology Monitoring: Compare Language Use over Time
Ontology Learning from Text: OntoLT & RelExt
Protégé PlugIn OntoLT
Joint Work with Michael Sintek, Daniel Olejnik, Yasir Iqbal
OntoLT
What is it?
• OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection
• Version 1.0 available from: http://olp.dfki.de/OntoLT/OntoLT.htm
How does it work?
• based on automatic linguistic annotation
• integrates statistical preprocessing options
• manual definition of mapping rules
• interactive user validation of candidates
• automatic generation/extension of ontology fragments
Mapping Rules
Map Text Elements to Classes/Slots
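The idea of mapping text elements to classes and slots can be sketched as below. This is not OntoLT's rule language or API: the Ontology container, the dictionary format for annotated phrases, and the two rules (head noun to class, modifier plus head to subclass, predicate to slot) are assumptions chosen to make the idea concrete. In OntoLT the generated candidates would still go through interactive user validation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Ontology:
    classes: Set[str] = field(default_factory=set)
    subclass_of: Set[Tuple[str, str]] = field(default_factory=set)  # (subclass, superclass)
    slots: Set[Tuple[str, str, str]] = field(default_factory=set)   # (domain, slot, range)


def class_name(*words: str) -> str:
    """Hypothetical naming convention: capitalized words joined by underscores."""
    return "_".join(word.capitalize() for word in words)


def map_np(np: Dict[str, str], onto: Ontology) -> str:
    """Rule 1 (assumed): head noun -> class; modifier + head -> subclass of that class."""
    head = class_name(np["head"])
    onto.classes.add(head)
    if np.get("mod"):
        sub = class_name(np["mod"], np["head"])
        onto.classes.add(sub)
        onto.subclass_of.add((sub, head))
    return head


def map_clause(clause: Dict, onto: Ontology) -> None:
    """Rule 2 (assumed): predicate -> slot between the subject class and the object class."""
    domain = map_np(clause["subj"], onto)
    range_ = map_np(clause["obj"], onto)
    onto.slots.add((domain, clause["pred"], range_))


onto = Ontology()
map_clause(
    {"subj": {"head": "computer", "mod": "Dell"},
     "pred": "have",
     "obj": {"head": "screen", "mod": "flat"}},
    onto,
)
```

After the call, the ontology fragment contains the classes Computer, Dell_Computer, Screen and Flat_Screen, two subclass links, and one slot (Computer, have, Screen).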
Corpus-based Relation Extraction (RelExt)
Joint Work with Alexander Schutz
Relation Extraction and Evaluation
[Figure: RelExt processing pipeline]
• Linguistic Processing: Corpus > NER & Concept Tagging, Linguistic Annotation > Annotated Corpus
• Statistical Processing: Relevance Measure (Frequencies in BNC, NZZ) > Relevance Scores (Heads, Preds); Cooccurrence Measure > Cooccurrence Scores (Heads <> Preds)
• Triple Generation > Triples (Head : Pred : Head) > Manual Evaluation
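The statistical part of such a pipeline can be sketched in Python. The slide does not name the relevance or cooccurrence measures, so the sketch swaps in simple stand-ins that are labeled as assumptions: relevance is a smoothed ratio of relative frequencies in a domain corpus versus a reference corpus, and cooccurrence is a raw count of observed Head : Pred : Head combinations. The function names, thresholds and counts are all invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, pred, head)


def relevance(domain: Counter, reference: Counter) -> Dict[str, float]:
    """Assumed measure: relative frequency in the domain corpus divided by the
    (add-one smoothed) relative frequency in the reference corpus."""
    d_total = sum(domain.values())
    r_total = sum(reference.values())
    return {
        word: (count / d_total) / ((reference[word] + 1) / (r_total + 1))
        for word, count in domain.items()
    }


def extract_triples(observed: Iterable[Triple], scores: Dict[str, float],
                    min_relevance: float = 1.0, min_cooc: int = 2) -> List[Triple]:
    """Keep triples that cooccur often enough and whose heads and predicate are all relevant."""
    cooc = Counter(observed)
    kept = [
        triple for triple, n in cooc.items()
        if n >= min_cooc and all(scores.get(w, 0.0) >= min_relevance for w in triple)
    ]
    return sorted(kept, key=lambda triple: -cooc[triple])


# Invented toy counts standing in for a football corpus and a general reference corpus.
domain_counts = Counter({"goalkeeper": 4, "save": 4, "shot": 4, "be": 8})
reference_counts = Counter({"be": 1000, "save": 10, "shot": 5, "goalkeeper": 1})
scores = relevance(domain_counts, reference_counts)

observed = [("goalkeeper", "save", "shot")] * 3 + [("goalkeeper", "be", "shot")] * 3
triples = extract_triples(observed, scores)
```

With these toy counts the general-language predicate "be" scores below the relevance threshold, so only the domain-specific triple goalkeeper : save : shot survives as a candidate for manual evaluation.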