URDF Query-Time Reasoning in Uncertain RDF Knowledge Bases

Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald

Information Extraction
• YAGO/DBpedia et al.: >120 M facts for YAGO2 (mostly from Wikipedia infoboxes), e.g., bornOn(Jeff, 09/22/42), gradFrom(Jeff, Columbia), hasAdvisor(Jeff, Arthur), hasAdvisor(Surajit, Jeff), knownFor(Jeff, Theory)
• New fact candidates: hundreds of millions of additional facts from Wikipedia text, e.g., type(Jeff, Author)[0.9], author(Jeff, Drag_Book)[0.8], author(Jeff, Cind_Book)[0.6], worksAt(Jeff, Bell_Labs)[0.7], type(Jeff, CEO)[0.4]

Outline
• Motivation & Problem Setting
  • URDF running example: people graduating from universities
• Efficient MAP Inference
  • MaxSAT solving with soft & hard constraints
• Grounding
  • Deductive grounding of soft rules (SLD resolution)
  • Iterative grounding of hard rules (closure)
• MaxSAT Algorithm
  • MaxSAT algorithm in 3 steps
• Experiments & Future Work

URDF: Uncertain RDF Data Model
• Extensional layer (information extraction & integration)
  • High-confidence facts: existing knowledge base ("ground truth")
  • New fact candidates: extracted facts with confidence values
  • Integration of different knowledge sources: ontology merging or explicit Linked Data (owl:sameAs, owl:equivalentProperty)
  ⇒ Large "uncertain database" of RDF facts
• Intensional layer (query-time inference)
  • Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
  • Hard rules: consistency constraints (more general FOL rules)
  • Propositional & probabilistic consistency reasoning

Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints
• People may live in more than one place:
  livesIn(x,y) ← marriedTo(x,z) ∧ livesIn(z,y) [0.8]
  livesIn(x,y) ← hasChild(x,z) ∧ livesIn(z,y) [0.5]
• People are not born in different places/on different dates:
  bornIn(x,y) ∧ bornIn(x,z) ⇒ y=z
• People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z ⇒ disjoint(t1,t2)

• Soft deduction rules correspond to rule-based (deductive) reasoning: Datalog, RDF/S, OWL 2 RL, etc.
• Hard consistency constraints correspond to FOL constraints (in particular mutex constraints): Datalog with constraints, x-tuples in probabilistic databases, owl:FunctionalProperty, etc.
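The hard-rule side of this distinction can be illustrated with a minimal Python sketch (not URDF's code): it treats a functional predicate such as bornIn as a hard rule, groups uncertain facts by subject, and reports every group with several candidate objects as a mutex set, of which at most one fact may be true. The facts, entity names, and the FUNCTIONAL set are hypothetical; livesIn is deliberately left non-functional because people may live in more than one place.

```python
from collections import defaultdict

# Hypothetical uncertain facts: (predicate, subject, object) -> confidence.
# Entity names are placeholders, not taken from the slides.
facts = {
    ("bornIn", "PersonA", "CityX"): 0.9,
    ("bornIn", "PersonA", "CityY"): 0.4,
    ("bornIn", "PersonB", "CityZ"): 0.8,
    ("livesIn", "PersonA", "CityX"): 0.7,
    ("livesIn", "PersonA", "CityZ"): 0.6,
}

# Hard rule bornIn(x,y) ∧ bornIn(x,z) ⇒ y=z, i.e., bornIn is functional.
FUNCTIONAL = {"bornIn"}


def mutex_sets(facts, functional):
    """Group facts of functional predicates by (predicate, subject).

    A group with more than one candidate object is a mutex set:
    accepting more than one of its facts would violate the hard rule.
    """
    groups = defaultdict(list)
    for pred, subj, obj in facts:
        if pred in functional:
            groups[(pred, subj)].append((pred, subj, obj))
    return {key: sorted(fs) for key, fs in groups.items() if len(fs) > 1}


conflicts = mutex_sets(facts, FUNCTIONAL)
for (pred, subj), fs in conflicts.items():
    print(pred, subj, [f[2] for f in fs])
```

PersonA's two livesIn facts are not reported, since no hard rule forbids multiple residences; they would only be connected through soft rules.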

URDF Running Example
KB: RDF base facts (figure: a graph over Jeff, Surajit, and David, each of type Computer Scientist [1.0], and Stanford and Princeton, each of type University [1.0]; edges hasAdvisor [0.7], hasAdvisor [0.8], worksAt [0.9], graduatedFrom [0.9], [0.6], [0.7], plus several graduatedFrom edges of unknown confidence [?])
First-order rules
• hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z) [0.4]
• graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y=z
Derived facts
• gradFrom(Surajit, Stanford)
• gradFrom(David, Stanford)
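The two derived facts can be reproduced with a single forward (bottom-up) application of the soft rule. The Python sketch below is a naive join written only to illustrate the rule's semantics; it is not how URDF grounds, which proceeds top-down from the query. Confidences are omitted because they do not affect which facts are derived.

```python
# Base facts of the running example (confidences omitted; only structure matters here).
BASE = {
    ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"),
    ("worksAt", "Jeff", "Stanford"),
    ("graduatedFrom", "Surajit", "Princeton"),
    ("graduatedFrom", "Surajit", "Stanford"),
    ("graduatedFrom", "David", "Princeton"),
}


def apply_advisor_rule(facts):
    """One forward pass of hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z)."""
    derived = set()
    for p1, x, y in facts:
        if p1 != "hasAdvisor":
            continue
        for p2, y2, z in facts:
            # Join on the advisor y: the advisor's workplace becomes a candidate alma mater.
            if p2 == "worksAt" and y2 == y:
                derived.add(("graduatedFrom", x, z))
    return derived


derived = apply_advisor_rule(BASE)
print(sorted(derived))
```

Once the hard rule is taken into account, the derived graduatedFrom(David, Stanford) competes with the base fact graduatedFrom(David, Princeton); resolving such conflicts is the job of the MaxSAT step.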

Basic Types of Inference
• Maximum-A-Posteriori (MAP) inference
  • Find the most likely assignment to the query variables y given evidence x.
  • Compute: argmax_y P(y | x) (NP-hard for propositional formulas, e.g., MaxSAT over CNFs)
• Marginal/success probabilities
  • Probability that query y is true in a random world given evidence x.
  • Compute: the sum of P(y | x) over all possible worlds in which the query holds (#P-hard for propositional formulas)
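To contrast the two inference types, the following Python sketch enumerates all worlds for Surajit's two graduatedFrom candidates, with hasAdvisor(Surajit, Jeff) and worksAt(Jeff, Stanford) fixed to true as evidence and the mutex enforced as a hard constraint. Weights come from the running example's CNF. The MAP part only needs total weights; the marginal part additionally assumes a Markov-logic-style log-linear distribution, P(world | x) ∝ exp(total weight), which the slide itself does not specify.

```python
import itertools
import math

# Query atoms for Surajit; evidence hasAdvisor(Surajit, Jeff) and
# worksAt(Jeff, Stanford) is fixed to true.
ATOMS = ["gradFrom(Surajit,Stanford)", "gradFrom(Surajit,Princeton)"]


def soft_weight(world):
    """Total weight of the satisfied soft clauses in one possible world."""
    st, pr = world[ATOMS[0]], world[ATOMS[1]]
    w = 0.0
    w += 0.7 if st else 0.0   # base fact gradFrom(Surajit, Stanford) [0.7]
    w += 0.6 if pr else 0.0   # base fact gradFrom(Surajit, Princeton) [0.6]
    w += 0.4 if st else 0.0   # grounded rule [0.4]; its body is true by evidence
    return w


# Enumerate all worlds, then drop those violating the hard mutex constraint.
worlds = [dict(zip(ATOMS, vals)) for vals in itertools.product([False, True], repeat=2)]
worlds = [w for w in worlds if not (w[ATOMS[0]] and w[ATOMS[1]])]

# MAP inference: the single world with maximal total weight.
map_world = max(worlds, key=soft_weight)

# Marginal under the assumed log-linear model: sum over worlds where the query holds.
z = sum(math.exp(soft_weight(w)) for w in worlds)
p_stanford = sum(math.exp(soft_weight(w)) for w in worlds if w[ATOMS[0]]) / z

print("MAP world:", map_world)
print("P(gradFrom(Surajit,Stanford) | x) =", round(p_stanford, 3))
```

Enumeration is exponential in the number of atoms, which is why both problems are hard in general and why URDF restricts itself to MAP inference via MaxSAT.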

General Route: Grounding & MaxSAT Solving
Query: graduatedFrom(x, y)
1) Grounding
  • Consider only facts (and rules) that are relevant for answering the query.
2) Propositional formula in CNF, consisting of
  • grounded hard & soft rules
  • uncertain base facts
3) Propositional reasoning
  • Find a truth assignment to the facts such that the total weight of the satisfied clauses is maximized.
  ⇒ MAP inference: compute the "most likely" possible world.
CNF (clause weights in brackets):
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton)) [1000]
  ∧ (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton)) [1000]
  ∧ (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford)) [0.4]
  ∧ (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford)) [0.4]
  ∧ worksAt(Jeff, Stanford) [0.9]
  ∧ hasAdvisor(Surajit, Jeff) [0.8]
  ∧ hasAdvisor(David, Jeff) [0.7]
  ∧ graduatedFrom(Surajit, Princeton) [0.6]
  ∧ graduatedFrom(Surajit, Stanford) [0.7]
  ∧ graduatedFrom(David, Princeton) [0.9]
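For a CNF this small, the MAP world can be found by exhaustive search. The Python sketch below is not URDF's solver (which avoids enumeration); it encodes the ten weighted clauses of the CNF with abbreviated variable names, uses the weight 1000 for the hard mutex clauses as on the slide, and returns the truth assignment with maximal satisfied weight.

```python
import itertools

HARD = 1000.0  # weight given to the mutex clauses in the CNF

VARS = ["hA_S", "hA_D", "wA_J", "gS_St", "gS_Pr", "gD_St", "gD_Pr"]
# hA_S = hasAdvisor(Surajit, Jeff), hA_D = hasAdvisor(David, Jeff),
# wA_J = worksAt(Jeff, Stanford), gX_Y = graduatedFrom(X, Y).

# Each clause: (list of (variable, is_positive), weight).
CLAUSES = [
    ([("gS_St", False), ("gS_Pr", False)], HARD),
    ([("gD_St", False), ("gD_Pr", False)], HARD),
    ([("hA_S", False), ("wA_J", False), ("gS_St", True)], 0.4),
    ([("hA_D", False), ("wA_J", False), ("gD_St", True)], 0.4),
    ([("wA_J", True)], 0.9),
    ([("hA_S", True)], 0.8),
    ([("hA_D", True)], 0.7),
    ([("gS_Pr", True)], 0.6),
    ([("gS_St", True)], 0.7),
    ([("gD_Pr", True)], 0.9),
]


def satisfied_weight(assignment):
    """Sum of the weights of all clauses with at least one true literal."""
    return sum(
        w for lits, w in CLAUSES
        if any(assignment[v] == positive for v, positive in lits)
    )


# Exhaustive search over all 2^7 truth assignments (feasible only for tiny CNFs).
best = max(
    (dict(zip(VARS, bits)) for bits in itertools.product([False, True], repeat=len(VARS))),
    key=satisfied_weight,
)
print(sorted(v for v, val in best.items() if val), round(satisfied_weight(best), 2))
```

The optimum keeps graduatedFrom(Surajit, Stanford) and graduatedFrom(David, Princeton), drops the two competing candidates, and leaves the grounded rule for David unsatisfied.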

Why are high weights for hard rules not enough?
• Consider the following CNF (for A, B > 0, A ≫ B):
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton)) [A]
  ∧ graduatedFrom(Surajit, Princeton) [0]
  ∧ graduatedFrom(Surajit, Stanford) [B]
• The optimal solution has weight A + B.
• The next-best solution has weight A + 0.
• Hence the ratio of the optimal over the approximate solution is (A + B) / A.
• In general, any (1+ε)-approximation algorithm, with ε > 0, may set graduatedFrom(Surajit, Princeton) to true, since (A + B) / A → 1 for A → ∞: the larger the hard-rule weight A, the closer the inferior solution looks to the optimum.
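A quick numeric check of this argument: with B fixed, raising the hard-rule weight A pushes (A + B) / A toward 1, so a (1+ε)-approximation eventually accepts the inferior assignment. The values B = 0.7 and ε = 0.01 below are illustrative choices, not values from the slide.

```python
def accepted_by_approx(opt: float, found: float, eps: float) -> bool:
    """True if `found` is within a factor (1 + eps) of the optimum `opt`."""
    return opt / found <= 1.0 + eps


B = 0.7      # illustrative weight of graduatedFrom(Surajit, Stanford)
eps = 0.01   # illustrative approximation tolerance

for A in (1.0, 10.0, 100.0, 1000.0):     # weight given to the hard mutex clause
    optimal, next_best = A + B, A + 0.0
    print(A, round(optimal / next_best, 4), accepted_by_approx(optimal, next_best, eps))
```

Once A exceeds B / ε (here 70), the wrong assignment is within the approximation bound, which motivates treating hard rules as constraints rather than as heavily weighted clauses.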

URDF: MaxSAT Solving with Soft & Hard Rules
Special case: Horn clauses as soft rules & mutex constraints as hard rules
• Find: argmax_y P(y | x)
• Resolves to a variant of MaxSAT for propositional formulas
S: Mutex constraints
  { graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
  { graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford)) [0.4]
  ∧ (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford)) [0.4]
  ∧ worksAt(Jeff, Stanford) [0.9]
  ∧ hasAdvisor(Surajit, Jeff) [0.8]
  ∧ hasAdvisor(David, Jeff) [0.7]
  ∧ graduatedFrom(Surajit, Princeton) [0.6]
  ∧ graduatedFrom(Surajit, Stanford) [0.7]
  ∧ graduatedFrom(David, Princeton) [0.9]
MaxSAT Alg.
  Compute W_0 = Σ_{clauses C} w(C) · P(C is satisfied);
  For each hard constraint S_t {
    For each fact f in S_t {
      Compute W_{f+}^t = Σ_{clauses C} w(C) · P(C is satisfied | f = true);
    }
    Compute W_{S-}^t = Σ_{clauses C} w(C) · P(C is satisfied | all facts in S_t = false);
    Choose the truth assignment to the facts in S_t that maximizes W_{f+}^t or W_{S-}^t;
    Remove satisfied clauses C;
    t++;
  }
• Runtime: O(|S| · |C|)
• Approximation guarantee of 1/2
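The greedy loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not URDF's implementation: facts are treated as independent, P(C is satisfied) = 1 − ∏ over literals of P(literal false), and the initial probabilities (1 for base facts, 0 for the purely derived gradFrom(David, Stanford)) are assumed from the later step slides. For simplicity, the sketch re-scores all clauses instead of removing satisfied ones, and facts outside every mutex set keep their initial value.

```python
from math import prod


def p_sat(p, lits):
    """P(clause satisfied) for independent facts: 1 - prod of P(literal is false)."""
    return 1.0 - prod(1.0 - (p[f] if pos else 1.0 - p[f]) for f, pos in lits)


def potential(p, clauses):
    """W = sum over clauses C of w(C) * P(C is satisfied)."""
    return sum(w * p_sat(p, lits) for lits, w in clauses)


def greedy_maxsat(clauses, mutex_sets, p0):
    """Fix one mutex set S_t per iteration by comparing W_{f+}^t and W_{S-}^t."""
    p = dict(p0)
    for S in mutex_sets:
        options = []
        for chosen in [*S, None]:          # None: all facts in S_t set to false
            trial = dict(p)
            for f in S:
                trial[f] = 1.0 if f == chosen else 0.0
            options.append((potential(trial, clauses), trial))
        _, p = max(options, key=lambda o: o[0])
    return p, potential(p, clauses)


# Abbreviations: hS/hD = hasAdvisor(Surajit|David, Jeff), w = worksAt(Jeff, Stanford),
# sSt/sPr = gradFrom(Surajit, Stanford|Princeton), dSt/dPr = gradFrom(David, ...).
C = [
    ([("hS", False), ("w", False), ("sSt", True)], 0.4),
    ([("hD", False), ("w", False), ("dSt", True)], 0.4),
    ([("w", True)], 0.9), ([("hS", True)], 0.8), ([("hD", True)], 0.7),
    ([("sPr", True)], 0.6), ([("sSt", True)], 0.7), ([("dPr", True)], 0.9),
]
S = [["sSt", "sPr"], ["dSt", "dPr"]]
# Assumed initial probabilities: base facts 1, derived-only dSt 0.
p0 = {"hS": 1.0, "hD": 1.0, "w": 1.0, "sSt": 1.0, "sPr": 1.0, "dPr": 1.0, "dSt": 0.0}

assignment, total = greedy_maxsat(C, S, p0)
print(sorted(f for f, v in assignment.items() if v == 1.0), round(total, 2))
```

Each mutex set costs |S_t| + 1 evaluations of the potential, which mirrors the O(|S| · |C|) shape of the slide's runtime bound; whether the 1/2 guarantee holds depends on how the initial probabilities are chosen, which this sketch does not attempt to reproduce.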

Deductive Grounding Algorithm (SLD Resolution/Datalog)
First-order rules
• hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z) [0.4]
• graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y=z
Query: graduatedFrom(Surajit, y)
• SLD tree: the query resolves to graduatedFrom(Surajit, Princeton) and to graduatedFrom(Surajit, Stanford); the latter is also derived via the soft rule from the conjunction hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford).
Base facts
• graduatedFrom(Surajit, Princeton) [0.6]
• graduatedFrom(Surajit, Stanford) [0.7]
• graduatedFrom(David, Princeton) [0.9]
• hasAdvisor(Surajit, Jeff) [0.8]
• hasAdvisor(David, Jeff) [0.7]
• worksAt(Jeff, Stanford) [0.9]
• type(Princeton, University) [1.0]
• type(Stanford, University) [1.0]
• type(Jeff, Computer_Scientist) [1.0]
• type(Surajit, Computer_Scientist) [1.0]
• type(David, Computer_Scientist) [1.0]
Grounded rules
• hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford) ⇒ gradFrom(Surajit, Stanford)
• gradFrom(Surajit, Stanford) ∧ gradFrom(Surajit, Princeton) ⇒ Stanford = Princeton (i.e., at most one of the two can be true)
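The top-down grounding of the query can be sketched as a small depth-bounded SLD resolver in Python. This is an illustration under simplifying assumptions, not URDF's grounder: atoms are binary (predicate, arg1, arg2) tuples, terms starting with "?" are variables, the leftmost goal is always selected, and recursion is cut off by a fixed depth instead of tabling. It records each grounded rule instance it uses, which corresponds to the lineage collected for the soft rules.

```python
import itertools

# Atoms are (predicate, arg1, arg2); terms starting with "?" are variables.
FACTS = {
    ("gradFrom", "Surajit", "Princeton"), ("gradFrom", "Surajit", "Stanford"),
    ("gradFrom", "David", "Princeton"), ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"), ("worksAt", "Jeff", "Stanford"),
}
# Soft rule: hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ gradFrom(x,z) [0.4]
RULES = [(("gradFrom", "?x", "?z"), [("hasAdvisor", "?x", "?y"), ("worksAt", "?y", "?z")], 0.4)]
_fresh = itertools.count()


def is_var(t):
    return t.startswith("?")


def walk(t, theta):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t


def subst(atom, theta):
    return tuple(walk(t, theta) for t in atom)


def unify(a, b, theta):
    """Return an extended substitution unifying atoms a and b, or None."""
    theta = dict(theta)
    for s, t in zip(a, b):
        s, t = walk(s, theta), walk(t, theta)
        if s == t:
            continue
        if is_var(s):
            theta[s] = t
        elif is_var(t):
            theta[t] = s
        else:
            return None
    return theta


def rename(rule):
    """Give the rule fresh variable names so they cannot clash with the query."""
    n = next(_fresh)

    def r(atom):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)

    head, body, w = rule
    return r(head), [r(b) for b in body], w


def solve(goals, theta, grounded, depth):
    """Leftmost-goal SLD resolution, bounded to `depth` nested rule applications."""
    if not goals:
        yield theta
        return
    goal, rest = subst(goals[0], theta), goals[1:]
    for fact in FACTS:                                  # resolve with a base fact
        t = unify(goal, fact, theta)
        if t is not None:
            yield from solve(rest, t, grounded, depth)
    if depth == 0:
        return
    for rule in RULES:                                  # resolve with a soft rule
        head, body, w = rename(rule)
        t = unify(head, goal, theta)
        if t is None:
            continue
        for t2 in solve(body, t, grounded, depth - 1):
            grounded.add((subst(head, t2), tuple(subst(b, t2) for b in body), w))
            yield from solve(rest, t2, grounded, depth)


query = ("gradFrom", "Surajit", "?y")
grounded = set()
answers = {subst(query, th) for th in solve([query], {}, grounded, depth=2)}
print(sorted(answers))
print(grounded)
```

The query yields both candidate answers and exactly one grounded rule instance, matching the "Grounded rules" list; the hard rule would then be grounded over these answers to form the mutex set.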

Dependency Graph of a Query
• SLD grounding always starts from a query literal and first proceeds over the soft deduction rules.
• Grounding is then iterated over the hard rules in a top-down fashion, using the literals in each hard rule as new subqueries.
• Cycles (due to recursive rules) are detected and resolved via a form of tabling known from Datalog.
• Grounding terminates when a closure is reached, i.e., when no new facts can be grounded from the rules and all subgoals are either resolved or form the root of a cycle.
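The effect of tabling on a recursive rule can be shown with a small Python sketch. It uses the soft rule livesIn(x,y) ← marriedTo(x,z) ∧ livesIn(z,y) over hypothetical facts (Ann, Bob, and Paris are placeholders) whose marriedTo edges form a cycle, so naive top-down recursion on livesIn(Ann, ?) would loop forever. Each subgoal is stored in a table, new subgoals become new subqueries, and the loop stops at closure, when neither subgoals nor answers change. This is a simplified fixpoint version of tabling that tracks answers only, not weights or lineage.

```python
# Hypothetical facts forming a cycle through the recursive soft rule
# livesIn(x,y) ← marriedTo(x,z) ∧ livesIn(z,y).
FACTS = {
    ("marriedTo", "Ann", "Bob"),
    ("marriedTo", "Bob", "Ann"),
    ("livesIn", "Bob", "Paris"),
}


def objects(pred, subj):
    """All objects o with pred(subj, o) among the base facts."""
    return {o for p, s, o in FACTS if p == pred and s == subj}


def tabled_lives_in(person):
    """Answer livesIn(person, ?) with a table of subgoals, iterated to closure."""
    table = {person: set()}              # subgoal livesIn(x, ?) -> answers found so far
    changed = True
    while changed:                       # stop when no new subgoal or answer appears
        changed = False
        for x in list(table):
            answers = objects("livesIn", x)
            for z in objects("marriedTo", x):
                if z not in table:       # new subquery livesIn(z, ?)
                    table[z] = set()
                    changed = True
                answers |= table[z]      # reuse tabled answers instead of recursing
            if not answers <= table[x]:
                table[x] |= answers
                changed = True
    return table


print(tabled_lives_in("Ann"))
```

The cycle Ann → Bob → Ann is entered only once; afterwards both subgoals simply read each other's table entries until nothing changes.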

Weighted MaxSAT Algorithm
General idea: iterate over all hard rules S_t; in each iteration, use a potential function W_t to set the fact f ∈ S_t that maximizes W_t (or none of them) to true, and set all other facts in S_t to false.
• At iteration 0, we have W_0 = Σ_C w(C) · P(C is satisfied).
• At any intermediate iteration t, we compare W_{f+}^t = Σ_C w(C) · P(C is satisfied | f = true) for each f ∈ S_t against W_{S-}^t = Σ_C w(C) · P(C is satisfied | all facts in S_t = false).
• At the final iteration t_max, all facts are assigned either true or false.
• W_{t_max} is equal to the total weight of all clauses that are satisfied.

Step 1
S: Mutex constraints
  { gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) }
  { gradFrom(David, Stanford), gradFrom(David, Princeton) }
C: Weighted Horn clauses (CNF), with weights w(f_i) and probabilities p_i
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)) [0.4]
  ∧ (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford)) [0.4]
  ∧ worksAt(Jeff, Stanford) [0.9]
  ∧ hasAdvisor(Surajit, Jeff) [0.8]
  ∧ hasAdvisor(David, Jeff) [0.7]
  ∧ gradFrom(Surajit, Princeton) [0.6]
  ∧ gradFrom(Surajit, Stanford) [0.7]
  ∧ gradFrom(David, Princeton) [0.9]

Step 2 { gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) } { gradFrom(David, Stanford), gradFrom(David, Princeton) } S: Mutex-const. • Weights w(fi) and probabilities pi (hasAdvisor(Surajit, Jeff) worksAt(Jeff, Stanford) gradFrom(Surajit, Stanford))0.4  (hasAdvisor(David, Jeff) worksAt(Jeff, Stanford) gradFrom(David, Stanford))0.4 worksAt(Jeff, Stanford) 0.9 hasAdvisor(Surajit, Jeff) 0.8 hasAdvisor(David, Jeff) 0.7 gradFrom(Surajit, Princeton) 0.6 gradFrom(Surajit, Stanford)0.7  gradFrom(David, Princeton) 0.9 C: Weighted Horn clauses (CNF) Query-Time Reasoning in Uncertain RDF Knowledge Bases

Step 2 { gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) } { gradFrom(David, Stanford), gradFrom(David, Princeton) } S: Mutex-const. • Weights w(fi) and probabilities pi (hasAdvisor(Surajit, Jeff) worksAt(Jeff, Stanford) gradFrom(Surajit, Stanford))0.4  (hasAdvisor(David, Jeff) worksAt(Jeff, Stanford) gradFrom(David, Stanford))0.4 worksAt(Jeff, Stanford) 0.9 hasAdvisor(Surajit, Jeff) 0.8 hasAdvisor(David, Jeff) 0.7 gradFrom(Surajit, Princeton) 0.6 gradFrom(Surajit, Stanford)0.7  gradFrom(David, Princeton) 0.9 C1: hasAdvisor(Surajit, Jeff) worksAt(Jeff, Stanford) gradFrom(Surajit, Stanford) P(C1) = 1 – (1-(1-1))(1-(1-1))(1-1) = 1 single partition, negated: 1 - pi C: Weighted Horn clauses (CNF) single partition, negated: 1 - pi single partition, positive: pi Query-Time Reasoning in Uncertain RDF Knowledge Bases

Step 2 (continued)
• P(C1 is satisfied) = 1 − (1 − (1 − 1)) · (1 − (1 − 1)) · (1 − 1) = 1
• P(C2 is satisfied) = 1 − (1 − (1 − 1)) · (1 − (1 − 1)) · (1 − 0) = 0, since gradFrom(David, Stanford) is not a base fact and starts with p = 0
• ...
• W0 = 0.4 + 0.9 + 0.8 + 0.7 + 0.6 + 0.7 + 0.9 = 5.0 (C1 plus the six unit clauses; C2 contributes 0)
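The initial potential W0 can be recomputed with a short Python sketch. The initial probabilities are an assumption read off the 1 and 0 terms in the P(C1) and P(C2) expressions: every base fact starts with p = 1, and the purely derived gradFrom(David, Stanford) starts with p = 0. Only the soft clauses are scored; the mutex sets are handled by the greedy loop.

```python
from math import prod

# Assumed initial probabilities, read off the slide's P(C1)/P(C2) terms.
p = {
    "hasAdvisor(Surajit,Jeff)": 1.0, "hasAdvisor(David,Jeff)": 1.0,
    "worksAt(Jeff,Stanford)": 1.0, "gradFrom(Surajit,Princeton)": 1.0,
    "gradFrom(Surajit,Stanford)": 1.0, "gradFrom(David,Princeton)": 1.0,
    "gradFrom(David,Stanford)": 0.0,
}

# Soft clauses only: (list of (fact, is_positive), weight).
clauses = [
    ([("hasAdvisor(Surajit,Jeff)", False), ("worksAt(Jeff,Stanford)", False),
      ("gradFrom(Surajit,Stanford)", True)], 0.4),                        # C1
    ([("hasAdvisor(David,Jeff)", False), ("worksAt(Jeff,Stanford)", False),
      ("gradFrom(David,Stanford)", True)], 0.4),                          # C2
    ([("worksAt(Jeff,Stanford)", True)], 0.9),
    ([("hasAdvisor(Surajit,Jeff)", True)], 0.8),
    ([("hasAdvisor(David,Jeff)", True)], 0.7),
    ([("gradFrom(Surajit,Princeton)", True)], 0.6),
    ([("gradFrom(Surajit,Stanford)", True)], 0.7),
    ([("gradFrom(David,Princeton)", True)], 0.9),
]


def p_literal(fact, positive):
    """Positive literal: p_i; negated literal: 1 - p_i."""
    return p[fact] if positive else 1.0 - p[fact]


def p_satisfied(lits):
    """1 minus the probability that every literal in the clause is false."""
    return 1.0 - prod(1.0 - p_literal(f, pos) for f, pos in lits)


W0 = sum(w * p_satisfied(lits) for lits, w in clauses)
print([p_satisfied(lits) for lits, _ in clauses[:2]], round(W0, 2))
```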

Step 3
• Weights w(f_i), probabilities p_i, truth values; first mutex set S_1 with f1 = gradFrom(Surajit, Stanford) and f2 = gradFrom(Surajit, Princeton)
• P(C1 is satisfied | f1 = true) = 1 − (1 − (1 − 1)) · (1 − (1 − 1)) · (1 − 1) = 1
• P(C1 is satisfied | f2 = true) = 1 − (1 − (1 − 1)) · (1 − (1 − 1)) · (1 − 0) = 0
• ...
• W1 = 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.4 (f1 true, f2 false: the unit clause gradFrom(Surajit, Princeton) [0.6] is lost)
• W2 = 0.9 + 0.8 + 0.7 + 0.6 + 0.9 = 3.9 (f2 true, f1 false: C1 [0.4] and gradFrom(Surajit, Stanford) [0.7] are lost)
⇒ gradFrom(Surajit, Stanford) is set to true and gradFrom(Surajit, Princeton) to false.
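This step can be sketched in Python by conditioning on each option for the first mutex set and re-scoring all soft clauses. As in the previous step, the initial probabilities (base facts 1, gradFrom(David, Stanford) 0) are an assumption, and facts are treated as independent. The sketch also scores the third option, W_{S-}, in which neither fact of the set is true.

```python
from math import prod

# Abbreviations: hS/hD = hasAdvisor(Surajit|David, Jeff), w = worksAt(Jeff, Stanford),
# sSt/sPr = gradFrom(Surajit, Stanford|Princeton), dSt/dPr = gradFrom(David, ...).
P0 = {"hS": 1.0, "hD": 1.0, "w": 1.0, "sSt": 1.0, "sPr": 1.0, "dPr": 1.0, "dSt": 0.0}
CLAUSES = [
    ([("hS", False), ("w", False), ("sSt", True)], 0.4),   # C1
    ([("hD", False), ("w", False), ("dSt", True)], 0.4),   # C2
    ([("w", True)], 0.9), ([("hS", True)], 0.8), ([("hD", True)], 0.7),
    ([("sPr", True)], 0.6), ([("sSt", True)], 0.7), ([("dPr", True)], 0.9),
]


def potential(p):
    """W = sum_C w(C) * P(C is satisfied) under independent facts."""
    def lit(f, pos):
        return p[f] if pos else 1.0 - p[f]
    return sum(w * (1.0 - prod(1.0 - lit(f, pos) for f, pos in lits))
               for lits, w in CLAUSES)


def condition(p, mutex, chosen):
    """Set `chosen` (or no fact, if None) in the mutex set to true, the rest to false."""
    q = dict(p)
    for f in mutex:
        q[f] = 1.0 if f == chosen else 0.0
    return q


S1 = ["sSt", "sPr"]
scores = {c: potential(condition(P0, S1, c)) for c in [*S1, None]}
best = max(scores, key=scores.get)
print({k: round(v, 2) for k, v in scores.items()}, "->", best)
```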

Experiments – Setup
• YAGO knowledge base
  • 2 M entities, 20 M facts
• Soft rules
  • 16 soft rules (hand-crafted deduction rules with weights)
• Hard rules
  • 5 predicates with functional properties (bornIn, diedIn, bornOnDate, diedOnDate, marriedTo)
• Queries
  • 10 conjunctive SPARQL queries
• Markov Logic as competitor (based on MCMC)
  • MAP inference: Alchemy employs a form of MaxWalkSAT
  • MC-SAT: iterative MaxSAT & Gibbs sampling

YAGO Knowledge Base: URDF vs. Markov Logic
• First run: ground each query against the rules (SLD grounding + MaxSAT solving) & report the sum of runtimes
• Asymptotic runtime checks: synthetic soft rule expansions
(Charts: "URDF: SLD grounding & MaxSAT solving" and "URDF vs. Markov Logic (MAP inference & MC-SAT)"; |C| = number of ground literals in soft rules, |S| = number of ground literals in hard rules)

Recursive Rules & LUBM Benchmark
• 42 inductively learned (partly recursive) rules over 20 M facts in YAGO
• URDF grounding with different maximum SLD levels
• URDF (SLD grounding + MaxSAT) vs. Jena (only grounding) over the LUBM benchmark
  • SF-1: 103,397 triples
  • SF-5: 646,128 triples
  • SF-10: 1,316,993 triples

Current & Future Topics...
• Temporal consistency reasoning
  • Soft/hard rules with temporal predicates
  • Soft deduction rules: deduce confidence distribution of derived facts
• Learning soft rules & consistency constraints
  • Explore how Inductive Logic Programming can be applied to large, uncertain & incomplete knowledge bases
• More solving/sampling
  • Linear-time constrained & weighted MaxSAT solver
  • Improved Gibbs sampling with soft & hard rules
• Scale-out
  • Distributed grounding via message passing
• Updates/versioning for (linked) RDF data
• Non-monotonic answers for rules with negation!

Online Demo! urdf.mpi-inf.mpg.de
