ParSemKB: Integrating Text Mining Results from Different Modules Using Extended OWL-DL. Tianxiang Lu, Guillaume Jacquet, Aaron Kaplan, Erica Melis. Xerox Research Center Europe, 6 chemin de Maupertuis, 38240 Meylan, France.
Motivation • To merge the outputs of different linguistic analysis modules. • To provide a standard way to combine them with other public information on the web. • To answer conjunctive queries.
Example of Problems • Given: • New York Times, 20 August 2007 • Mr. Bush is now the president of the United States. Tomorrow, he will meet Nicolas Sarkozy.
Basic Information • (s1) Subject(is, Mr. Bush) • (s1) Person(Mr. Bush) • (s1) Attribute(Mr. Bush, president of US) • (s2) Subject(will meet, he) • (s2) Object(will meet, Nicolas Sarkozy) • (s2) Person(Nicolas Sarkozy)
Coreference Engine • (s1) Profession(president of US) • (co) Coref(he, Mr. Bush)
XTM (temporal expression) • (s1) Temp(is, now) • (s1) Refer(now, 20/08/2007) • (s2) Temp(meet, tomorrow) • (s2) Refer(tomorrow, 21/08/2007) • (od) Before(is, meet)
EventSpotter Analyzer • (s1) Factual-Event( "New York Times", -, is, Mr. Bush, president of US, -, now ) • (s2) Factual-Event( "New York Times", -, meet, he, Nicolas Sarkozy, -, tomorrow )
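The Refer and Before facts on this slide can be derived mechanically once the document date is known. The following is a minimal Python sketch of that step, not the XTM implementation: the `OFFSETS` table, the `resolve` helper and the `before` helper are hypothetical names introduced only for illustration.

```python
from datetime import date, timedelta

# Hypothetical lookup table: deictic expression -> offset in days from the document date.
OFFSETS = {"now": 0, "tomorrow": 1}

def resolve(expression, document_date):
    """Map a deictic temporal expression to a calendar date."""
    return document_date + timedelta(days=OFFSETS[expression])

doc_date = date(2007, 8, 20)                 # dateline of the example article
temp = {"is": "now", "meet": "tomorrow"}     # Temp(event, expression) facts

# Refer(expression, date) facts derived from the document date.
refer = {expr: resolve(expr, doc_date) for expr in temp.values()}

def before(event_a, event_b):
    """Before(a, b) holds when the resolved date of a precedes that of b."""
    return refer[temp[event_a]] < refer[temp[event_b]]
```

With these definitions, `before("is", "meet")` corresponds to the Before(is, meet) fact listed on the slide.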
Problems • How can the results be merged? • What happens if we want to use another approach (e.g. a statistical approach, text entailment, etc.)? • How can information from the web be integrated? • Query: What did Nicolas Sarkozy and the president of the US do together in 2007?
Solution: Knowledge-based Approach • ParSemKB as a Black Box • Internal Architecture of the Framework • Evaluation of OWL-DL as a Knowledge Representation Language for ParSemKB • Documentation
ParSemKB as a Black Box • Global Architecture • ParSemKB: Input and Output • ParSemKB: Specifications • Framework: Interfaces • Framework: Workflow and Interaction
Global Architecture • Preprocessing part • FactSpotter analysis tools • EventSpotter analysis tools • XTM • Text entailment system • Web sources • Storage part • Occurrence Base + Knowledge Base • APIs between them • Adapters on the preprocessing side for translation • Input API and Application API on the storage side
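To make the merging problem and the example query concrete, here is a minimal Python sketch that puts the module outputs into one set of triples and answers the conjunctive query by hand. It is an illustration under assumptions, not the ParSemKB design: the triple encoding, the helper names (`canonical`, `holder_of`, `events_together`) and the normalisation of the predicate to "meet" are all choices made for this example.

```python
from datetime import date

# Facts from each module, written as (predicate, arg1, arg2) triples.
basic = {
    ("Subject", "meet", "he"),
    ("Object", "meet", "Nicolas Sarkozy"),
    ("Attribute", "Mr. Bush", "president of US"),
}
coref = {("Coref", "he", "Mr. Bush")}
temporal = {
    ("Temp", "meet", "tomorrow"),
    ("Refer", "tomorrow", date(2007, 8, 21)),
}

# Once all modules share one vocabulary, merging is a set union.
kb = basic | coref | temporal

def canonical(mention):
    """Follow a Coref fact so that 'he' resolves to 'Mr. Bush'."""
    for pred, source, target in kb:
        if pred == "Coref" and source == mention:
            return target
    return mention

def holder_of(role):
    """Entities whose Attribute is the given role."""
    return {s for p, s, o in kb if p == "Attribute" and o == role}

def events_together(person, role, year):
    """Events in `year` whose subject holds `role` and whose object is `person`."""
    holders = holder_of(role)
    hits = set()
    for pred, event, subj in kb:
        if pred != "Subject" or canonical(subj) not in holders:
            continue
        has_object = ("Object", event, person) in kb
        in_year = any(
            ("Temp", event, expr) in kb and when.year == year
            for q, expr, when in kb
            if q == "Refer"
        )
        if has_object and in_year:
            hits.add(event)
    return hits
```

Calling `events_together("Nicolas Sarkozy", "president of US", 2007)` needs facts from three different modules (parser, coreference, temporal), which is why the outputs have to live in one store.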
ParSemKB: Specifications • /F10/ Understand the input format • /F20/ Analyze the input corpus by using the background knowledge • /F30/ Store the knowledge in an appropriate way • /F40/ Provide an efficient way to retrieve the knowledge by querying the knowledge base
ParSemKB: Input and Output • Input • /D10/ Background knowledge • /D20/ Target corpus of data • Output • /D30/ The storage of knowledge • /D40/ The test results
Implementation • 1. Describe a domain using a specific but standard knowledge representation language. • 2. Find or implement a first framework that supports the language and build the system. • 3. Enhance the system by defining and inputting rules. • 4. Test the system by running specific queries at the API level. • 5. Build other systems by swapping the parts from steps 1 to 4. Evaluate their efficiency.
Internal Architecture of Framework • Design Pattern “Factory method” for Knowledge Base Managers • JenaKBManager configurations • Test Templates
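The combination of a factory method with test templates can be sketched as follows. This is a generic Python illustration of the pattern, not the project's Java code: every class name below (`KBManager`, `JenaPelletMemManager`, `JenaKaon2Manager`, `KBTestTemplate`, and the two concrete tests) is hypothetical, and no Jena or reasoner call is involved.

```python
from abc import ABC, abstractmethod

class KBManager(ABC):
    """Common interface of all knowledge base managers (hypothetical)."""

    @abstractmethod
    def describe(self):
        """Return a short label for the backing configuration."""

class JenaPelletMemManager(KBManager):
    def describe(self):
        return "Jena + Pellet (in-memory)"

class JenaKaon2Manager(KBManager):
    def describe(self):
        return "Jena + KAON2"

class KBTestTemplate(ABC):
    """Creator: the test logic is fixed, subclasses choose the manager."""

    @abstractmethod
    def create_manager(self):
        """Factory method, overridden by each concrete test."""

    def run(self):
        # The template never names a concrete manager class.
        manager = self.create_manager()
        return f"tested {manager.describe()}"

class PelletMemTest(KBTestTemplate):
    def create_manager(self):
        return JenaPelletMemManager()

class Kaon2Test(KBTestTemplate):
    def create_manager(self):
        return JenaKaon2Manager()
```

The point of the design is that swapping the reasoner or the storage backend only requires a new subclass that overrides `create_manager`; the shared test logic in `run` stays untouched.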
JenaKBManager • JENA PELLET PELLET MEM • JENA OWL PELLET DB • JENA PELLET PELLET DB • JENA OWL KAON2
Evaluation of OWL-DL as a KR for ParSemKB • Why OWL-DL • Reification • DL-Safe Rules • Negative Information • Uncertainty
Why OWL-DL • W3C Recommendation since 2004 and widely used • Based on XML as the raw data exchange format • Based on RDF for reference data exchange • Three levels of expressive power: Lite, DL, Full • DL: description logic (introduces negation, cardinality and complex restrictions)
OWL – abstract syntax with DL-extension
• axiom ::= 'Class(' classID modality {annotation} {description} ')'
• axiom ::= 'EnumeratedClass(' classID {annotation} {individualID} ')'
• description ::= classID
    | restriction
    | 'unionOf(' {description} ')'
    | 'intersectionOf(' {description} ')'
    | 'complementOf(' description ')'
    | 'oneOf(' {individualID} ')'
• ObjectRestrictionComponent ::= 'allValuesFrom(' description ')'
    | 'someValuesFrom(' description ')'
    | 'value(' individualID ')'
    | cardinality (unrestricted)
• axiom ::= 'EquivalentClasses(' description {description} ')'
    | 'DisjointClasses(' description description {description} ')'
    | 'SubClassOf(' description description ')'
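To give a feel for what the `description` constructors mean, the sketch below evaluates them over a small, fully known set of individuals. This is a deliberate simplification: OWL-DL reasoning is open-world, whereas this Python toy assumes a closed, finite domain, and the `DOMAIN` and `CLASSES` contents are invented for the example.

```python
# Toy closed-world interpretation; individuals and classes are illustrative only.
DOMAIN = {"bush", "sarkozy", "paris"}
CLASSES = {
    "Person": {"bush", "sarkozy"},
    "City": {"paris"},
}

def extension(description):
    """Return the set of individuals denoted by a class description.

    A description is either a classID (str) or a tuple whose first item
    is one of the constructor names from the abstract syntax.
    """
    if isinstance(description, str):
        return set(CLASSES[description])
    op, *args = description
    if op == "unionOf":
        return set().union(*(extension(a) for a in args))
    if op == "intersectionOf":
        result = set(DOMAIN)
        for a in args:
            result &= extension(a)
        return result
    if op == "complementOf":
        return DOMAIN - extension(args[0])
    if op == "oneOf":
        return set(args)
    raise ValueError(f"unsupported constructor: {op}")
```

For instance, `extension(("intersectionOf", "Person", ("oneOf", "bush", "paris")))` keeps only the enumerated individuals that are also persons.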
Reification • Example: In the New York Times, Tom pointed out that G. W. Bush was President in 2005. • Reified statements: • (statement1: "G. W. Bush" has_function President) • (statement2: President has_Time 2005) • (reified statement3: statement1 hasSource Tom) • (reified statement4: reified statement3 hasSource New York Times)
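Reification means that a statement can itself be the subject of another statement. A minimal Python sketch of the four statements on this slide follows; the `Statement` class and the `provenance` helper are hypothetical names, and the individual is abbreviated to "Bush" for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    subject: object    # an entity name or another Statement
    predicate: str
    obj: object

s1 = Statement("Bush", "has_function", "President")
s2 = Statement("President", "has_Time", 2005)
s3 = Statement(s1, "hasSource", "Tom")               # statement about s1
s4 = Statement(s3, "hasSource", "New York Times")    # statement about s3

def provenance(statement, statements):
    """Follow hasSource links outward; return the sources, innermost first."""
    chain = []
    current = statement
    while True:
        outer = next(
            (s for s in statements
             if s.predicate == "hasSource" and s.subject == current),
            None,
        )
        if outer is None:
            return chain
        chain.append(outer.obj)
        current = outer
```

Here `provenance(s1, [s1, s2, s3, s4])` walks from the base fact to Tom and then to the newspaper, which is the nesting the slide describes.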
DL-Safe Rules • 1. Entity(?x) ^ Entity(?y) ^ swrlx:hasProperty(?x, end-offset, ?eo) ^ swrlx:hasProperty(?y, end-offset, ?eo) ^ abox:hasClass(?y, ?C) ^ abox:hasClass(?x, ?C) -> sameAs(?x, ?y)
Negation and Uncertainty • Introduce a notation for negation • Factual • Counter-factual • Possible • Markov Logic Networks / Bayesian Logic for handling uncertain or unknown objects (not implemented)
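Read procedurally, the rule says: two entities that share an end offset and a class denote the same individual. DL-safe rules bind variables only to individuals that are explicitly named, so the rule can be checked by enumerating pairs of known entities. The Python sketch below does exactly that; the entity records and the `same_as_pairs` helper are invented for illustration and do not use a rule engine.

```python
from itertools import combinations

# Named individuals with the two properties the rule inspects (illustrative values).
entities = {
    "e1": {"end_offset": 42, "cls": "Person"},
    "e2": {"end_offset": 42, "cls": "Person"},
    "e3": {"end_offset": 42, "cls": "Profession"},
    "e4": {"end_offset": 97, "cls": "Person"},
}

def same_as_pairs(entities):
    """Return every pair (x, y) for which the rule derives sameAs(x, y)."""
    pairs = set()
    for x, y in combinations(sorted(entities), 2):
        a, b = entities[x], entities[y]
        if a["end_offset"] == b["end_offset"] and a["cls"] == b["cls"]:
            pairs.add((x, y))
    return pairs
```

Only `e1` and `e2` match on both conditions; `e3` shares the offset but not the class, and `e4` shares the class but not the offset.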
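One simple way to picture the three modalities is as a tag attached to each stored event, with "not recorded" kept distinct from "recorded as counter-factual". The Python sketch below assumes that encoding; the `Modality` enum, the helper names and all events except the Bush and Sarkozy meeting are invented for illustration.

```python
from enum import Enum

class Modality(Enum):
    FACTUAL = "factual"
    COUNTER_FACTUAL = "counter-factual"
    POSSIBLE = "possible"

# event -> modality; only the first entry comes from the running example.
facts = {
    ("meet", "Mr. Bush", "Nicolas Sarkozy"): Modality.FACTUAL,
    ("resign", "Mr. Bush"): Modality.COUNTER_FACTUAL,       # invented
    ("visit", "Nicolas Sarkozy", "US"): Modality.POSSIBLE,  # invented
}

def status(event):
    """Return the recorded modality, or None when the KB says nothing."""
    return facts.get(event)

def asserted_true(event):
    """True only for events explicitly recorded as factual."""
    return status(event) is Modality.FACTUAL
```

Keeping `None` separate from `COUNTER_FACTUAL` preserves the difference between an explicitly negated event and one the knowledge base has never seen.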
Implementation – Input documents (see reference documents) • Domain definition: abstract version and personXeroxFOL.owl • Example inputs for individuals: • correferenceExample.rdf • schwarzneggerExample.rdf • documentExampleBush.rdf • Out.owl as large scale data
Implementation – Java API using Jena Framework • Import the domain description using “com.hp.hpl.jena.ontology.OntModel” • Bind different reasoners • Default reasoner for OWL-DL (instance-based reasoning) • External reasoners • Choice of reasoners: Pellet, KAON2 • Implementation of a test set for different frameworks • Implementation of JenaKB and a simple trial run
Implementation – Packages • source (src) • Basics • Interfaces • KnowledgeManagers • Tests • executable binaries (bin) • inputs (inputs) • configuration (conf) • library (lib) • logs (logs) • outputs (outputs)
Documentation • Scientific Report • User Guide • Developer Guide • External Materials • White Paper • Documents from Guillaume, Salah • Java Doc • Where to find what