Optimized Backward Chaining Reasoning System for a Semantic Web
Hui Shi, Kurt Maly, and Steven Zeil
Contact: maly@cs.odu.edu
WIMS 2014, June 2-4, Thessaloniki, Greece
Outline
• Problem
  • Semantic web subject to changes
  • How to scale a reasoner to big data?
• Background
  • Knowledge base using ontologies
  • Inference strategies
  • Benchmarks
  • Query optimization
• Integrated optimized backward chaining
  • Selection function
  • Switching resolution methods
  • Avoidance of non-termination (OLDT)
  • owl:sameAs optimization
• Evaluation
• Conclusions
Problem
Efficiency of reasoning in the face of large scale and frequent changes within a question/answer system over a semantic web
• Issue
  • Forward chaining scales well for fixed knowledge bases
  • Backward chaining can handle changes in the knowledge base but does not scale
Background
• Existing semantic applications: question/answer systems
  • Libra, Cimple, Arnetminer
• Semantic Web
  • Resource Description Framework (RDF)
  • Web Ontology Language (OWL) for specific knowledge domains
  • SPARQL query language for RDF
  • SWRL rule language
• Reasoning systems
  • Jena (proprietary Jena rules)
  • Pellet and KAON2
  • Oracle 11g
  • OWLIM
Background
• Knowledge base (KB)
  • Ontologies
  • Representation formalism: Description Logic (DL)
• Inference methods for First Order Logic
  • Materialization and forward chaining
    • pre-computes inferred truths and starts with the known data
    • suitable for frequent computation of answers over data that are relatively static
    • OWLIM and Oracle
  • Query rewriting and backward chaining
    • expands the queries and starts with goals
    • suitable for efficient computation of answers when data are dynamic and queries are infrequent
    • Virtuoso
Background
• Benchmarks evaluate and compare the performance of different reasoning systems
  • The Lehigh University Benchmark (LUBM)
  • The University Ontology Benchmark (UOBM)
Background
• Query optimization: issues
  • Optimization of queries (conjunctions of individual clauses) over databases is well understood
  • Having a reasoner introduces uncertainty regarding the size of the solution space associated with resolving individual clauses
  • Query optimization must work in the presence of such uncertainty
• Dynamic Optimization with an Interposed Reasoner
  • A greedy ordering of the proofs of the individual clauses according to the estimated sizes anticipated for the proof results
  • Deferring joins of results from individual clauses where such joins are likely to result in excessive combinatorial growth of the intermediate solution
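The greedy ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the clause encoding, the `base_sizes` table, and the shrink factor of 100 per already-bound variable are all hypothetical and are not taken from the slides. The point it tries to show is that the estimates are re-evaluated after each clause is chosen, because earlier proofs bind variables that later clauses share.

```python
def variables(clause):
    """Return the variable terms (those starting with '?') of a triple pattern."""
    return {term for term in clause if term.startswith("?")}

def estimate(clause, bound, base_sizes):
    """Hypothetical size estimate: start from a per-predicate guess and
    shrink it for every variable that an earlier clause has already bound."""
    size = base_sizes[clause[1]]
    for _ in variables(clause) & bound:
        size /= 100  # assumed shrink factor, for illustration only
    return size

def greedy_order(clauses, base_sizes):
    """Repeatedly pick the clause with the smallest current estimate."""
    remaining, ordered, bound = list(clauses), [], set()
    while remaining:
        best = min(remaining, key=lambda c: estimate(c, bound, base_sizes))
        ordered.append(best)
        remaining.remove(best)
        bound |= variables(best)  # later estimates see these bindings
    return ordered

clauses = [
    ("?x", "takesCourse", "?y"),
    ("?y", "type", "Course"),
    ("?x", "memberOf", "?d"),
]
base_sizes = {"takesCourse": 80000, "type": 2000, "memberOf": 9000}
order = greedy_order(clauses, base_sizes)
```

With these made-up sizes, a static sort would place `memberOf` before `takesCourse`; the dynamic version moves `takesCourse` forward once `?y` is bound. Deferring joins is not modeled here.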
Hybrid reasoner: motivating example
• Assume a fully materialized KB
• Harvester adds a new fact: student0 enrolled in course0
• Query "Who is enrolled in course0?" is OK: a simple lookup answers it
• Assume the fact Prof0 teaches course0 is in the KB
• Query "Who is being taught by Prof0?" is not OK as a simple lookup; it needs reasoning with a rule such as:
  isTaughtBy(?Student, ?Faculty) :- enrolledIn(?Student, ?Course), teaches(?Faculty, ?Course)
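A minimal sketch of why the second query needs reasoning, assuming a toy in-memory set of triples. The helper names `match` and `taught_by` are hypothetical and stand in for a real reasoner; only the facts and the rule come from the example above.

```python
# Toy knowledge base mirroring the example; nothing has been materialized
# for the newly harvested enrollment fact.
facts = {
    ("student0", "enrolledIn", "course0"),
    ("Prof0", "teaches", "course0"),
}

def match(s, p, o):
    """Return stored triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def taught_by(faculty):
    """Answer isTaughtBy(?Student, faculty) by expanding the goal into the
    rule body: enrolledIn(?Student, ?Course), teaches(faculty, ?Course)."""
    students = set()
    for _, _, course in match(faculty, "teaches", None):
        for student, _, _ in match(None, "enrolledIn", course):
            students.add(student)
    return students

direct = match(None, "isTaughtBy", "Prof0")  # plain lookup finds nothing
derived = taught_by("Prof0")                 # goal-directed expansion succeeds
```

The plain lookup comes back empty because no `isTaughtBy` triple was ever stored, while expanding the goal through the rule body recovers `student0`.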
Optimized Backward Chaining
• Problem
  • Generate a query response for a given query pattern based on a specific rule set (RDFS, Horst, custom)
• Four Optimizations
  • Ordered Selection Function
  • Switching between Binding Propagation and Free Variable Resolution
  • Avoid Repetition and Non-Termination (OLDT)
  • owl:sameAs Optimization
Dynamic Selection of Propagation Mode
• Suppose that:
  • we have a rule body containing clauses (?x p1 ?y) and (?y p2 ?z)
  • we have already proven that the first clause can be satisfied using value pairs {(x1, y1), (x2, y2), …, (xn, yn)}
Dynamic Selection of Propagation Mode
• Binding propagation mode
  • the bindings from the earlier solutions are substituted into the upcoming clause to yield multiple instances of that clause as goals for subsequent proof
  • (y1 p2 ?z), (y2 p2 ?z), …, (yn p2 ?z)
• Free variable resolution mode
  • a single proof is attempted of the upcoming clause in its original form, with no restriction upon the free variables in that clause
  • (?y p2 ?z)
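The two modes can be illustrated on a tiny example. This is a sketch only: the in-memory `store`, the `prove` stand-in, and the sample triples are assumptions made for illustration, and a real proof would involve rule expansion rather than a list scan.

```python
from collections import defaultdict

# Hypothetical triple store keyed by predicate: predicate -> [(subject, object)].
store = defaultdict(list)
for s, p, o in [("a", "p1", "b"), ("c", "p1", "d"),
                ("b", "p2", "e"), ("d", "p2", "f"), ("x", "p2", "g")]:
    store[p].append((s, o))

def prove(pred, subj=None):
    """Stand-in for a proof: return (subject, object) pairs for pred,
    optionally restricted to one bound subject."""
    return [(s, o) for s, o in store[pred] if subj is None or s == subj]

def binding_propagation(first):
    """Issue one specialized goal (y_i p2 ?z) per earlier binding of ?y."""
    out = []
    for x, y in first:
        for _, z in prove("p2", subj=y):
            out.append((x, y, z))
    return out

def free_variable_resolution(first):
    """Prove (?y p2 ?z) once with ?y free, then join on ?y."""
    by_y = defaultdict(list)
    for y, z in prove("p2"):
        by_y[y].append(z)
    return [(x, y, z) for x, y in first for z in by_y[y]]

first = prove("p1")  # bindings for (?x p1 ?y)
bp = binding_propagation(first)
fv = free_variable_resolution(first)
```

Both modes return the same solutions; they differ in how many proofs are attempted (one per binding versus a single unrestricted proof followed by a join), which is what makes the choice between them a cost question.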
Dynamic Selection of Propagation Mode: Example
• Suppose we have an earlier body clause 1: "?y type Course" and a subsequent body clause 2: "?x takesCourse ?y"
  • 1.749 seconds to prove body clause 1
  • an average of 0.235 seconds to prove body clause 2 for a given value of ?y from the proof of body clause 1
  • 86,361 students satisfying variable ?x
  • 0.235 × 86,361 ≈ 20,295 seconds with binding propagation
  • 2.612 seconds to resolve the second clause in free variable resolution mode
Dynamic Selection of Propagation Mode
• Dynamically switch between modes based upon the size of the partial solutions obtained
  • Let n denote the number of solutions that satisfy an already proven clause
  • Let t denote the threshold used to dynamically select between modes
  • If n ≤ t, then the binding propagation mode will be selected
  • If n > t, then the free variable resolution mode will be selected
• The larger the threshold is, the more likely binding propagation mode will be selected
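The selection rule above amounts to a single comparison. A minimal sketch follows; the function name `select_mode`, the mode labels, and the sample sizes are hypothetical.

```python
def select_mode(n: int, t: int) -> str:
    """Pick the propagation mode from the number of partial solutions n
    and the threshold t, following the n <= t / n > t rule."""
    if n <= t:
        return "binding_propagation"
    return "free_variable_resolution"

# Illustrative partial-solution sizes (made up for this sketch).
sizes = [1, 5, 20, 1000]
low = [select_mode(n, t=4) for n in sizes]
high = [select_mode(n, t=50) for n in sizes]
```

Raising the threshold from 4 to 50 moves the sizes 5 and 20 from free variable resolution to binding propagation, which matches the closing remark that a larger threshold favors binding propagation.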
Calculation of Threshold t
• Let $\mathit{join1}$ denote the time spent on the join operations in binding propagation mode
• Let $\mathit{join2}$ denote the time spent on the join operations in free variable resolution mode
• Let $\mathit{proof1}_i$ denote the time of proving the first clause with $i$ free variables, and $\mathit{proof2}_j$ the average time of proving the new specialized form with $j$ free variables ($i \in [1,3]$, $j \in [0,2]$)
• Let $\mathit{proof3}_k$ denote the time of proving the second clause with $k$ free variables ($k \in [1,3]$)
• Compare the time spent in binding propagation mode and in free variable resolution mode to determine t. Binding propagation is favored when
  $\mathit{proof1}_i + \mathit{proof2}_j \cdot n + \mathit{join1} < \mathit{proof1}_i + \mathit{proof3}_k + \mathit{join2}$
• $t = \lfloor \mathit{proof3}_k / \mathit{proof2}_j \rfloor$
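The step from the inequality to the threshold is not spelled out on the slide. One way to read it, assuming the two join costs are roughly equal (this assumption is ours, not stated in the source), is:

```latex
\begin{align*}
\mathit{proof1}_i + \mathit{proof2}_j \cdot n + \mathit{join1}
  &< \mathit{proof1}_i + \mathit{proof3}_k + \mathit{join2} \\
\mathit{proof2}_j \cdot n
  &< \mathit{proof3}_k + (\mathit{join2} - \mathit{join1}) \\
n &< \frac{\mathit{proof3}_k}{\mathit{proof2}_j}
  \qquad \text{(assuming } \mathit{join1} \approx \mathit{join2}\text{)} \\
t &= \left\lfloor \frac{\mathit{proof3}_k}{\mathit{proof2}_j} \right\rfloor
\end{align*}
```

Plugging in the timings from the earlier example (0.235 seconds per specialized goal, 2.612 seconds for the unrestricted proof) gives $t = \lfloor 2.612 / 0.235 \rfloor = \lfloor 11.11\ldots \rfloor = 11$. With 86,361 bindings, $n > t$ by a wide margin, so free variable resolution would be selected, in line with the 20,295 versus 2.612 second comparison.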
Calculation of Threshold t
• To estimate $\mathit{proof3}_k$ and $\mathit{proof2}_j$
  • we record the time spent on proving goals with different numbers of free variables
  • after we have recorded a sufficient number of proof times, we compute the average time spent on goals with $k$ free variables and $j$ free variables respectively
• Start with a historical default value
• Update the threshold several times when answering a particular query
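The estimation procedure above could be organized along these lines. This is a sketch under assumptions: the class name `ProofTimer`, the `min_samples` cutoff of 3, and the default timings are hypothetical (the defaults simply reuse the 0.235 and 2.612 second figures from the earlier example, and attaching them to 1 and 2 free variables is our choice).

```python
import math
from collections import defaultdict

class ProofTimer:
    """Tracks proof times by number of free variables and derives the
    threshold t = floor(proof3_k / proof2_j) from running averages."""

    def __init__(self, defaults, min_samples=3):
        self.defaults = dict(defaults)    # historical default averages (seconds)
        self.min_samples = min_samples    # assumed meaning of "sufficient"
        self.samples = defaultdict(list)  # free-variable count -> observed times

    def record(self, free_vars, seconds):
        """Store one observed proof time for a goal with free_vars free variables."""
        self.samples[free_vars].append(seconds)

    def average(self, free_vars):
        """Use the historical default until enough observations exist."""
        observed = self.samples[free_vars]
        if len(observed) < self.min_samples:
            return self.defaults[free_vars]
        return sum(observed) / len(observed)

    def threshold(self, j, k):
        """Threshold from the average times for j and k free variables."""
        return math.floor(self.average(k) / self.average(j))

timer = ProofTimer({1: 0.235, 2: 2.612})
t_initial = timer.threshold(j=1, k=2)  # from historical defaults only
for seconds in (0.5, 0.5, 0.5):        # specialized goals turn out slower
    timer.record(1, seconds)
t_updated = timer.threshold(j=1, k=2)  # recomputed mid-query
```

Recomputing the threshold after new timings arrive is what allows it to change several times while a single query is being answered.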
Evaluation
Overall Performance
• LUBM(1) = 100,839
• LUBM(40) = 5,307,754
Conclusions
• We have developed optimizations for a backward chaining algorithm
• The new optimized algorithm outperformed one of the best forward-chaining reasoners in scenarios where the knowledge base is subject to frequent change