Text Understanding Agents and the Semantic Web

Akshay Java, Tim Finin, Sergei Nirenburg. 01/04/2005.

Presentation Transcript


  1. Text Understanding Agents and the Semantic Web Akshay Java, Tim Finin, Sergei Nirenburg 01/04/2005

  2. Outline • Motivation: Language Understanding Agents • Ontological Semantics • Bridging the Knowledge Gap • Preliminary Evaluation • SemNews: An Application Testbed • Conclusion • Q&A

  3. Motivation • Intelligent agents need knowledge and information. • Most Web content is natural language (NL) text. • The Semantic Web (SW) can benefit NLP tools in their language understanding tasks.
     [Figure: NLP tools connect the WWW (a web of documents: natural language text, images, audio, video) with the Semantic Web (a web of data: RDF/OWL ontologies, instances, triples, structured information) by extracting facts from NL.]

  4. Motivation: Language Understanding Agents • Provides an RDF version of the news.

  5. Ontological Semantics • OntoSem is a natural language processing system that processes text and converts it into facts. • It is supported by a constructed world model encoded in a rich ontology.

  6. Ontological Semantics

  7. Static Knowledge Sources
     • Ontology: 8000 concepts, avg. 16 properties each
     • Lexicons: English 45000 entries; Spanish 40000 entries; Chinese 3000 entries
     • Fact repository: 20000 facts
     • Onomasticon: NNNNN names

  8. The OntoSem Ontology
     ONTOLOGY ::= CONCEPT+
     CONCEPT ::= ROOT | OBJECT-OR-EVENT | PROPERTY
     SLOT ::= PROPERTY | FACET | FILLER
     [Figure: a concept frame with callouts labeling PROPERTY, FACET, and FILLER.]
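
The frame structure above can be pictured with a small data model. This is a minimal sketch, not OntoSem's own code: it assumes each slot pairs a property with a facet and a list of fillers, as the make-frame examples in this deck suggest, and it ignores the `common` wrapper that appears in those frames. The class names `Concept` and `Ontology` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A named frame (hypothetical model): property -> facet -> fillers."""
    name: str
    slots: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def add(self, prop: str, facet: str, *fillers: str) -> None:
        # Create the property and facet entries on first use, then append.
        self.slots.setdefault(prop, {}).setdefault(facet, []).extend(fillers)

    def fillers(self, prop: str, facet: str) -> List[str]:
        # Missing properties or facets simply yield an empty list.
        return self.slots.get(prop, {}).get(facet, [])


@dataclass
class Ontology:
    """ONTOLOGY ::= CONCEPT+ (here: a dictionary of concepts by name)."""
    concepts: Dict[str, Concept] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept

    def parents(self, name: str) -> List[str]:
        # Direct parents are the VALUE fillers of the is-a property.
        return self.concepts[name].fillers("is-a", "value")


# The "patent" frame from the class-mapping slide, reduced to its is-a slot.
patent = Concept("patent")
patent.add("is-a", "value", "intangible-asset", "legal-right")
onto = Ontology()
onto.add(patent)
```

Keeping facets as an explicit middle level makes it easy to treat SEM, VALUE, and DEFAULT fillers differently when translating to OWL.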

  9. Text Meaning Representation (TMR) • Word sense disambiguated • A persistent fact stored in the fact repository (FR) • Semantic dependency established

  10. Text Meaning Representation (TMR)
     Sentence: He asked the UN to authorize the war.
     REQUEST-ACTION-69
       AGENT HUMAN-72
       THEME ACCEPT-70
       BENEFICIARY ORGANIZATION-71
       SOURCE-ROOT-WORD ask
       TIME (< (FIND-ANCHOR-TIME))
     ACCEPT-70
       THEME WAR-73
       THEME-OF REQUEST-ACTION-69
       SOURCE-ROOT-WORD authorize
     ORGANIZATION-71
       HAS-NAME United-Nations
       BENEFICIARY-OF REQUEST-ACTION-69
       SOURCE-ROOT-WORD UN
     HUMAN-72
       HAS-NAME Colin Powell
       AGENT-OF REQUEST-ACTION-69
       SOURCE-ROOT-WORD he ; reference resolution has been carried out
     WAR-73
       THEME-OF ACCEPT-70
       SOURCE-ROOT-WORD war
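
To make the TMR structure concrete, here is a minimal sketch that holds the frames from this slide in a plain dictionary and flattens them into subject-property-object triples. The dictionary layout and the helper names `to_triples` and `agent_name` are assumptions made for illustration; the inverse slots (THEME-OF, AGENT-OF, BENEFICIARY-OF) and the TIME slot are left out to keep the example short.

```python
# Frames from the slide, keyed by instance name (inverse slots and TIME omitted).
TMR = {
    "REQUEST-ACTION-69": {
        "AGENT": "HUMAN-72",
        "THEME": "ACCEPT-70",
        "BENEFICIARY": "ORGANIZATION-71",
        "SOURCE-ROOT-WORD": "ask",
    },
    "ACCEPT-70": {"THEME": "WAR-73", "SOURCE-ROOT-WORD": "authorize"},
    "ORGANIZATION-71": {"HAS-NAME": "United-Nations", "SOURCE-ROOT-WORD": "UN"},
    "HUMAN-72": {"HAS-NAME": "Colin Powell", "SOURCE-ROOT-WORD": "he"},
    "WAR-73": {"SOURCE-ROOT-WORD": "war"},
}


def to_triples(tmr):
    """Flatten frames into (instance, property, filler) triples."""
    return [(inst, prop, filler)
            for inst, slots in tmr.items()
            for prop, filler in slots.items()]


def agent_name(tmr, event):
    """Follow AGENT from an event to its instance, then read HAS-NAME."""
    agent = tmr[event]["AGENT"]
    return tmr[agent]["HAS-NAME"]


triples = to_triples(TMR)
```

Because reference resolution has already linked "he" to HUMAN-72, `agent_name(TMR, "REQUEST-ACTION-69")` resolves to the named person rather than the pronoun.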

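  11. Mapping OntoSem to web-based KR
     [Figure: NL text is processed by OntoSem, which draws on the ontology, lexicon, and fact repository, to produce TMRs. OntoSem2OWL translates the ontology into an OWL ontology and the TMRs into TMRs in OWL.]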

  12. Mapping Rules for Classes
     OntoSem LISP version:
     (make-frame patent
       (definition (value (common "the exclusive right to make, use or sell an invention, which is granted to the inventor")))
       (is-a (value (common intangible-asset legal-right))))
     OWL version:
     <owl:Class rdf:about="&ontosem;patent">
       <rdfs:subClassOf>
         <owl:Class rdf:about="&ontosem;intangible-asset"/>
       </rdfs:subClassOf>
       <rdfs:subClassOf>
         <owl:Class rdf:about="&ontosem;legal-right"/>
       </rdfs:subClassOf>
       <rdfs:label>the exclusive right to make, use or sell an invention, which is granted to the inventor</rdfs:label>
     </owl:Class>
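
The class rule can be sketched as a small string-building function. This is an illustration only, not the OntoSem2OWL implementation (which the deck says was built with the Jena API): the function name `class_to_owl` is hypothetical, the frame is assumed to be already parsed into a name, a parent list, and a definition, and the definition is written to rdfs:label as in the property example later in the deck.

```python
from xml.sax.saxutils import escape


def class_to_owl(name, parents, definition):
    """Render one concept frame as an owl:Class element (sketch)."""
    lines = [f'<owl:Class rdf:about="&ontosem;{name}">']
    for parent in parents:
        # Each is-a filler becomes one rdfs:subClassOf statement.
        lines += [
            "  <rdfs:subClassOf>",
            f'    <owl:Class rdf:about="&ontosem;{parent}"/>',
            "  </rdfs:subClassOf>",
        ]
    # Escape the definition so characters like < and & stay valid XML.
    lines.append(f"  <rdfs:label>{escape(definition)}</rdfs:label>")
    lines.append("</owl:Class>")
    return "\n".join(lines)


patent_owl = class_to_owl(
    "patent",
    ["intangible-asset", "legal-right"],
    "the exclusive right to make, use or sell an invention, "
    "which is granted to the inventor",
)
```

The output is a fragment: it still needs an enclosing rdf:RDF element that declares the namespaces and the `&ontosem;` entity.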

  13. Mapping Rules for Properties • Properties can be object properties (owl:ObjectProperty) or datatype properties (owl:DatatypeProperty). • The property hierarchy is defined by rdfs:subPropertyOf. • Domain maps to rdfs:domain. • Range maps to rdfs:range. • Restrictions are handled using owl:Restriction. • Numeric datatypes are handled using XSD.

  14. Mapping Rules for Properties…
     (make-frame controls
       (domain (sem (common physical-event physical-object social-event social-role)))
       (range (sem (common actualize artifact natural-object social-role)))
       (is-a (value (common relation)))
       (inverse (value (common controlled-by)))
       (definition (value (common "A relation which relates concepts to what they can control"))))

  15. Mapping Rules for Properties…
     <owl:ObjectProperty rdf:ID="controls">
       <rdfs:domain>
         <owl:Class>
           <owl:unionOf rdf:parseType="Collection">
             <owl:Class rdf:about="#physical-event"/>
             <owl:Class rdf:about="#physical-object"/>
             <owl:Class rdf:about="#social-event"/>
             <owl:Class rdf:about="#social-role"/>
           </owl:unionOf>
         </owl:Class>
       </rdfs:domain>
       <rdfs:range>
         <owl:Class>
           <owl:unionOf rdf:parseType="Collection">
             <owl:Class rdf:about="#actualize"/>
             <owl:Class rdf:about="#artifact"/>
             <owl:Class rdf:about="#natural-object"/>
             <owl:Class rdf:about="#social-role"/>
           </owl:unionOf>
         </owl:Class>
       </rdfs:range>
       <rdfs:subPropertyOf>
         <owl:ObjectProperty rdf:about="#relation"/>
       </rdfs:subPropertyOf>
       <owl:inverseOf rdf:resource="#controlled-by"/>
       <rdfs:label>"A relation which relates concepts to what they can control"</rdfs:label>
     </owl:ObjectProperty>
     [Callouts: the domain, range, is-a, and inverse slots of the frame map to rdfs:domain, rdfs:range, rdfs:subPropertyOf, and owl:inverseOf respectively.]
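
The property rule can be sketched the same way. The sketch below assumes the `controls` frame has already been parsed into Python lists and strings; `union_block` and `property_to_owl` are hypothetical helper names, and the real translator was built with Jena rather than string templates.

```python
from xml.sax.saxutils import escape


def union_block(tag, classes):
    """Wrap several class names in an anonymous owl:unionOf class under <tag>."""
    members = "\n".join(
        f'        <owl:Class rdf:about="#{c}"/>' for c in classes
    )
    return (
        f"  <{tag}>\n"
        "    <owl:Class>\n"
        '      <owl:unionOf rdf:parseType="Collection">\n'
        f"{members}\n"
        "      </owl:unionOf>\n"
        "    </owl:Class>\n"
        f"  </{tag}>"
    )


def property_to_owl(name, domain, range_, parent, inverse, definition):
    """Render one relation frame as an owl:ObjectProperty element (sketch)."""
    parts = [
        f'<owl:ObjectProperty rdf:ID="{name}">',
        union_block("rdfs:domain", domain),   # domain slot
        union_block("rdfs:range", range_),    # range slot
        "  <rdfs:subPropertyOf>",             # is-a slot
        f'    <owl:ObjectProperty rdf:about="#{parent}"/>',
        "  </rdfs:subPropertyOf>",
        f'  <owl:inverseOf rdf:resource="#{inverse}"/>',  # inverse slot
        f"  <rdfs:label>{escape(definition)}</rdfs:label>",
        "</owl:ObjectProperty>",
    ]
    return "\n".join(parts)


controls_owl = property_to_owl(
    "controls",
    ["physical-event", "physical-object", "social-event", "social-role"],
    ["actualize", "artifact", "natural-object", "social-role"],
    "relation",
    "controlled-by",
    "A relation which relates concepts to what they can control",
)
```

The union matters because several separate rdfs:domain statements would be read as an intersection, which is not what a multi-filler SEM facet means.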

  16. Mapping Rules for Facets
     Facets are a way of restricting the fillers that can be used for a particular slot.
     • SEM and VALUE: mapped using owl:Restriction on a particular property.
     • RELAXABLE-TO: added to the classes present in the owl:Restriction, with this information recorded in an annotation.
     • DEFAULT: not handled; there is no clear way to represent non-monotonic reasoning and closed-world assumptions in the Semantic Web.
     • DEFAULT-MEASURE: similar to the DEFAULT facet; not handled.
     • DEFAULT and DEFAULT-MEASURE are used relatively infrequently.
     • NOT: can be handled using owl:disjointWith.
     • INV: need not be handled, since the inverse slot is already mapped to owl:inverseOf.
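
As a rough illustration of the SEM and RELAXABLE-TO rules, the sketch below builds one owl:Restriction for a property. Several details are assumptions that the slides do not specify: the use of owl:allValuesFrom as the restriction type, the use of rdfs:comment as the annotation, and the example slot (`agent`, `human`, `organization`). The function name `restriction` is hypothetical.

```python
def restriction(prop, sem, relaxable_to=()):
    """Build an owl:Restriction for one slot (sketch under stated assumptions).

    sem          -- fillers of the SEM facet
    relaxable_to -- fillers of the RELAXABLE-TO facet, added to the same
                    class union and noted in an annotation
    """
    classes = list(sem) + [c for c in relaxable_to if c not in sem]
    lines = [
        "<owl:Restriction>",
        f'  <owl:onProperty rdf:resource="#{prop}"/>',
        "  <owl:allValuesFrom>",  # assumption: the source does not name the type
        "    <owl:Class>",
        '      <owl:unionOf rdf:parseType="Collection">',
    ]
    lines += [f'        <owl:Class rdf:about="#{c}"/>' for c in classes]
    lines += [
        "      </owl:unionOf>",
        "    </owl:Class>",
        "  </owl:allValuesFrom>",
    ]
    if relaxable_to:
        # Record which classes came from RELAXABLE-TO (assumed annotation form).
        note = " ".join(relaxable_to)
        lines.append(f"  <rdfs:comment>relaxable-to: {note}</rdfs:comment>")
    lines.append("</owl:Restriction>")
    return "\n".join(lines)


# Hypothetical slot: agent, SEM human, RELAXABLE-TO organization.
agent_restriction = restriction("agent", ["human"], ["organization"])
```

DEFAULT and DEFAULT-MEASURE have no counterpart here, matching the slide's statement that they are not handled.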

  17. Evaluation
     • Built the ontology translation tool using the Jena API
     • Total triples generated: ~102189 (including bnodes)
     • Time to build the model: ~10-40 sec
     • Time to do RDFS inference: ~10 sec
     • Time to do OWL Micro: ~40 sec
     • Time to do OWL Full: ????
     • DL expressivity: ELUIH (EL: conjunction and full existential quantification; U: union; H: role hierarchy; I: role inverse)
     • Tools: Swoop, Pellet, WonderWeb, http://w3c.org/RDF/Validator/
     After translation:
     • Total number of classes: 7747 (defined: 7747, imported: 0)
     • Total number of datatype properties: 0 (defined: 0, imported: 0)
     • Total number of object properties: 604 (defined: 604, imported: 0)
     • Total number of annotation properties: 1 (defined: 1, imported: 0)
     • Total number of individuals: 0 (defined: 0, imported: 0)
     NOTE: This is using no restrictions. OWL Full.

  18. Evaluation • Syntactic correctness: checked using OWL/RDF validators. • Semantic validation: full semantic validation is difficult even for subsets of OWL. • Meaning preservation: some features of the native representation, such as DEFAULTS, modality, and case roles, may be underrepresented or not handled. • Feature minimization: complex features could be difficult for reasoners to handle, so the translation can be performed at each of the levels: OWL Lite, OWL DL, OWL Full. • Translation complexity: OntoSem is an extensive and large ontology (~8000 concepts). The translation itself is done syntactically, but in general translation might require reasoning, which could be an issue.

  19. An Application Testbed: SemNews • Semantically search and browse news. • Aggregators collect the RSS news descriptions from various sources. • The sentences are processed by OntoSem and converted into TMRs. • Provides intelligent agents with the latest news in a machine-readable format. • http://semnews.umbc.edu/
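
The first two steps of this pipeline (collect RSS descriptions, split them into sentences) can be sketched with the Python standard library. This is an illustration under assumptions, not the SemNews code: the sample feed and its URL are made up, the sentence splitter is deliberately naive, and the OntoSem analysis step is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item>
      <title>UN vote</title>
      <link>http://example.org/news/1</link>
      <description>He asked the UN to authorize the war. A vote is expected soon.</description>
    </item>
  </channel>
</rss>"""


def item_descriptions(feed_xml):
    """Return (link, description) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("link", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences_for_analysis(feed_xml):
    """Pair each sentence with its story link, ready for a text analyzer."""
    return [
        (link, sentence)
        for link, description in item_descriptions(feed_xml)
        for sentence in split_sentences(description)
    ]


sentences = sentences_for_analysis(SAMPLE_FEED)
```

Keeping the story link next to each sentence lets later stages attach the resulting facts back to the news item they came from.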

  20. [Figure: SemNews architecture, with numbered data flows 1-15. Groups: Data Aggregators, Language Processing, Fact Repository Interface, Semantic Web Tools, Knowledge Editor Environment. Components: news feeds, RSS aggregator, OntoSem, TMRs, FR, OntoSem2OWL, OntoSem ontology (OWL), inferred triples, semantic RSS, ontology & instance browser, text search, RDQL query, Swoogle index, Dekade editor.]

  21. Agent-understandable news • Provides an RDF version of the news.

  22. Semanticizing RSS • View a structured representation of the RSS news story. • Future versions would enable editing the facts and provide provenance information.

  23. News stories are ontologically linked • Find news stories by browsing through the OntoSem ontology.

  24. Tracking Named Entities • Find stories on a specific named entity.

  25. Browsing Facts • The fact repository explorer for the named entity ‘Mexico’ shows that it has a relation ‘nationality-of’ with CITIZEN-235. • The fact repository explorer for the instance CITIZEN-235 shows that the citizen is an agent of ESCAPE-EVENT.

  26. Querying the semanticized RSS • RDQL queries • Provides structured querying over text represented in RDF.

  27. Semantic Alerts • Alerts can be specified as ontological concepts, keywords, or RDQL queries. • Subscribe to the results of structured queries.

  28. Beyond keyword search • Conceptually searching for content: find all news stories that have something to do with a place and a terrorist activity. • Context-based querying: find all events in which ‘George Bush’ was the ‘speaker’. • Reporting facts: find all politicians who traveled to Asia. • Knowledge sharing: populating instances by mapping FOAF and DC to the OntoSem ontology.
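
The context-based query above can be sketched as pattern matching over triples. Plain Python is used here in place of RDQL, and the facts, instance names, and property names (`speaker`, `has-name`, `agent`) are hypothetical; they only illustrate how a structured query differs from a keyword match on "George Bush".

```python
# Hypothetical facts in (subject, property, object) form.
FACTS = [
    ("SPEECH-ACT-1", "speaker", "HUMAN-1"),
    ("SPEECH-ACT-2", "speaker", "HUMAN-2"),
    ("TRAVEL-EVENT-3", "agent", "HUMAN-1"),
    ("HUMAN-1", "has-name", "George Bush"),
    ("HUMAN-2", "has-name", "Colin Powell"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def events_with_speaker(triples, name):
    """Find events whose speaker is an instance carrying the given name."""
    people = {s for s, _, _ in match(triples, p="has-name", o=name)}
    return sorted(s for s, _, o in match(triples, p="speaker") if o in people)


bush_events = events_with_speaker(FACTS, "George Bush")
```

A keyword search would also return TRAVEL-EVENT-3, where the same person is the agent rather than the speaker; the two-step join excludes it.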

  29. Current work • Enron email corpus • Profiles in terror

  30. Conclusions • Integrating language processing agents into the SW would let them publish SW annotations and documents that capture the text’s meaning. • Migrating from a native, non-web-based representation to an SW representation may be lossy but is still useful for many applications. • The SemNews application testbed demonstrates some scenarios that can benefit from language understanding agents.

  31. For More Information • Semnews application http://semnews.umbc.edu/ • OntoSem NLP system http://ilit.umbc.edu/ • UMBC ebiquity research group http://ebiquity.umbc.edu/ • This presentation http://ebiquity.umbc.edu/paper/html/id/260/

  32. References
     Software used:
     [1] OntoSem, http://ilit.umbc.edu/
     [2] RDF Validation Service, http://w3c.org/RDF/Validator
     [3] Jena Toolkit, http://jena.sourceforge.net/
     [4] Swoop Ontology Viewer, http://www.mindswap.org/2004/SWOOP/
     [5] Pellet OWL DL Reasoner, http://www.mindswap.org/2003/pellet/
     [6] WonderWeb OWL Validator, http://phoebus.cs.man.ac.uk:9999/OWL/Validator
     Papers:
     [1] Sergei Nirenburg and Victor Raskin, Ontological Semantics, Formal Ontology and Ambiguity
     [2] Sergei Nirenburg and Victor Raskin, Ontological Semantics, MIT Press, forthcoming
     [3] Sergei Nirenburg, Ontological Semantics: Overview, presentation, CLSP JHU, Spring 2003
     [4] Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics
     [5] P.J. Beltran-Ferruz, P.A. Gonzalez-Calero, P. Gervas, Converting Mikrokosmos Frames into Description Logics
     [6] Sergei Nirenburg, Ontology Tutorial, ILIT UMBC
     Mailing lists:
     [1] Jena developers, jena-dev@yahoogroups.com
     [2] Pellet users, pellet-users@lists.mindswap.org
     [3] Semantic Web, semanticweb@yahoogroups.com
     [4] W3C RDF Interest, www-rdf-interest@w3.org
     [5] W3C Semantic Web, semantic-web@w3.org

  33. Backup slides

  34. Reasoning Capabilities: Finding Transitive Closures (RDFS reasoning)
     Buildfile: build.xml
     init:
     compile:
     dist:
       [jar] Building jar: /home/aks1/software/eclipse/workspace/ontojena/dist/lib/ontojena.jar
     run:
       [java] MODEL OK
       [java] Resource: http://ontosem.org/#fire-engine
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#fire-engine)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#all)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#physical-object)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#inanimate)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-vehicle)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#engine-propelled-vehicle)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-engine-vehicle)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#artifact)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#object)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#land-vehicle)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#vehicle)
       [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#truck)
       [java] - (http://ontosem.org/#fire-engine rdfs:label ' "a truck with equipment for fighting fires"')
       [java] - (http://ontosem.org/#fire-engine rdf:type owl:Class)
       [java] fire-engine recognized as subclas of vehicle
     BUILD SUCCESSFUL
     Total time: 10 seconds
     real 0m11.144s
     user 0m9.530s
     sys 0m0.190s
     [aks1@trishuli ontojena]$
     [Figure: inferred triples over the class hierarchy vehicle, land-vehicle, engine-propelled-vehicle, wheeled-vehicle, wheeled-engine-vehicle, truck, fire-engine.]
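
The transitive closure that the RDFS reasoner computes here can be sketched in a few lines. The run above used Jena; the version below is plain Python over a small table of direct rdfs:subClassOf edges. The edges in `DIRECT` are assumptions: the slide shows only the inferred superclasses of fire-engine, not which links are asserted directly.

```python
# Assumed direct subClassOf edges (child -> parents); not taken from the ontology.
DIRECT = {
    "fire-engine": ["truck"],
    "truck": ["wheeled-engine-vehicle"],
    "wheeled-engine-vehicle": ["wheeled-vehicle", "engine-propelled-vehicle"],
    "wheeled-vehicle": ["land-vehicle"],
    "engine-propelled-vehicle": ["vehicle"],
    "land-vehicle": ["vehicle"],
}


def superclasses(direct, cls):
    """Return every class reachable from cls by following subClassOf edges."""
    seen = set()
    stack = list(direct.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Keep climbing: parents of parents are superclasses too.
            stack.extend(direct.get(parent, []))
    return sorted(seen)


def is_subclass_of(direct, child, ancestor):
    """True if ancestor is reachable from child (rdfs:subClassOf is transitive)."""
    return ancestor in superclasses(direct, child)


fire_engine_supers = superclasses(DIRECT, "fire-engine")
```

Unlike the reasoner output above, this sketch does not add the reflexive triple (fire-engine rdfs:subClassOf fire-engine).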

  35. Mapping Rules Property Related Constructs

  36. Mapping Rules Facet related constructs
