Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics
Achille Fokoue, Mudhakar Srivatsa (IBM-US), Rob Young (dstl-UK)
ITA Bootcamp, July 12, 2010
Sources of Uncertainty: Accuracy, Stochasticity and Beyond
• Coalition warfare: ephemeral groups (special forces, local militia, Médecins Sans Frontières, etc.) with heterogeneous trust levels respond to emerging threats
• Secure information flows: Can I share this information with an (un)trusted entity? Can I trust this piece of information?
• Decision making under uncertainty
• Information Flow in Yahoo!
Limitations of traditional approaches
• Coarse-grained and static access control information
  – Rich security metadata [QoISN’08]
  – Semantic knowledgebase for situation awareness (e.g., need-to-share) [SACMAT’09]
• Fail to treat uncertainty as a first-class citizen
  – Scalable algorithms and meaningful query answering semantics (possible worlds model*) to reason over uncertain data [submitted]
• Lack of explanations
  – Provide dominant justifications to decision makers [SACMAT’09]
  – Use justifications for estimating information credibility [submitted]
[QoISN’08: IBM-US & RHUL] [SACMAT’09: IBM-US, CESG & dstl] [submitted (https://www.usukitacs.com/?q=node/5401): IBM-US & CESG] [submitted: IBM-US & dstl]
Our approach in a nutshell
• Goal: more flexible and situation-aware decision support mechanisms for information sharing
• Key technical principles
  – Perform late binding of decisions (flexibility)
  – Shareability/trust in information is expressed as logical statements over rich security metadata and a semantic KB (domain-specific concepts and relationships; the current state of the world)
  – A logical framework that supports explanations, which allow a sender to intelligently downgrade information (e.g., delete the participant list of a meeting) and allow a recipient to judge the credibility of information
Architecture
• A Global Awareness module continually maintains and updates a knowledge base encoding, in a BDL language, the relevant state of the world for our application (e.g., locations of allied and enemy forces)
• A hybrid reasoner is responsible for making decisions on information flows
• The reasoner provides dominant explanation(s) over uncertain data that justify the decision
• This architecture is replicated at every decision center
(Diagram: Rich Metadata, BDL KB, Global Awareness, BDL Reasoner, Rules & Policy, Justifications)
DL: Semantic Knowledgebase [SACMAT’09: IBM-US, CESG, dstl]
• SHIN Description Logics (OWL): a very expressive decidable subset of first-order logic
• Reasoning is intractable in the worst case, but SHER (Scalable Highly Expressive Reasoner) has good scalability characteristics in practice
• A DL KB consists of:
  – TBox (terminology box): description of the concepts and relations in the domain of discourse; an extension of the KANI ontology (extended KANI TBox)
  – ABox (extensional part): description of instance information
Traditional approaches to deriving trust from data
• Drawbacks of a pure DL-based approach [SACMAT’09]
  – Does not account for uncertainty
  – Trust in information and sources is given, not derived from data and the history of interactions
• Limitations of traditional approaches to deriving trust in data
  – Assume a pair-wise numeric (dis)similarity metric between two entities: e.g., eBay recommendations, Netflix ratings
  – Lack of support for conflicts spanning multiple entities: e.g., 3 sources S1, S2, S3 asserting Ax1 = all men are mortal, Ax2 = Socrates is a man, Ax3 = Socrates is not mortal, respectively
  – Lack of support for uncertainty in information
Bayesian Description Logics (BDL)
• Challenge 1: How to scalably reason over an inconsistent and uncertain knowledgebase?
  – BDL experimental evaluation on an open-source DL reasoner shown to scale up to 7.2 million probabilistic axioms
  – Pellet (a state-of-the-art DL reasoner) broke down at 0.2 million axioms
  – Pronto (probabilistic reasoner) uses an alternate, richer formulation, but does not scale beyond a few dozen axioms
• Challenge 2: What is a meaningful query answering semantics for an uncertain knowledgebase?
  – Possible worlds model* (concrete definition in paper)
Bayesian Description Logics (BDL)
• Challenge 3: How to efficiently compute justifications over uncertain data?
  – Sampling
• Challenge 4: How to use justifications?
  – Assess the credibility of information sources (trust-based decision making)
  – Intelligently transform data to make it shareable [TBD]
Notation: Bayesian Network
• V: set of all random variables in a Bayesian network, e.g., V = {V1, V2}
• D(Vi): set of all values that Vi can take, e.g., D(V1) = D(V2) = {0, 1}
• v: assignment of all random variables to a possible value, e.g., v = {V1 = 0, V2 = 1}
• v|X (for some X ⊆ V): projection of v onto the random variables in X, e.g., v|{V2} = {V2 = 1}
• D(X) (for some X ⊆ V): Cartesian product of the domains D(Xi) for all Xi in X
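A minimal Python sketch (illustrative only, not from the slides) of this notation; the dictionary-based representation and the variable names are assumptions:

```python
from itertools import product

# Hypothetical illustration of the notation above (not from the slides).
V = {"V1": [0, 1], "V2": [0, 1]}          # D(V1) = D(V2) = {0, 1}

# D(X): Cartesian product of the domains of the variables in X
def assignments(X):
    names = sorted(X)
    return [dict(zip(names, values)) for values in product(*(V[n] for n in names))]

# v|X: projection of a full assignment v onto the variables in X
def project(v, X):
    return {name: v[name] for name in X}

v = {"V1": 0, "V2": 1}                     # an assignment of all variables
print(project(v, {"V2"}))                  # {'V2': 1}
print(assignments({"V1", "V2"}))           # the 4 possible worlds over V
```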
Notation: BDL
• Probabilistic knowledge base K = (A, T, BN)
• BN = Bayesian network over a set V of random variables
• T = {φ : X = x}, where φ is a classical TBox axiom annotated with X = x, X ⊆ V, and x ∈ D(X)
  – e.g., Road ⊑ SlipperyRoad : Rain = true
• A = {ψ : X = x}, where ψ is a classical ABox axiom
• ψ : p, where p ∈ [0, 1], assigns a probability value directly to a classical axiom; it is encoded as ψ : Xnew = true, where Xnew is a new independent random Boolean variable (with PrBN(Xnew = true) = p)
BDL: Simplified Example
• TBox:
  – SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition
  – Road ⊑ SlipperyRoad : Rain = true
• ABox:
  – Road(route9A)
  – OpenedRoad(route9A) : TrustSource = true
• BN has three variables: Rain, TrustSource, Source
  – PrBN(TrustSource = true | Source = Mary) = 0.8
  – PrBN(TrustSource = true | Source = John) = 0.5
  – PrBN(Rain = true) = 0.7
  – PrBN(Source = John) = 1
• Informally, the probability values computed through the Bayesian network are propagated to the DL side as follows
BDL: Simplified Example (continued)
• Primitive event e: each assignment v of all random variables in BN (e.g., {Rain = true, TrustSource = false, Source = John}) corresponds to a primitive event e (also called a scenario or a possible world)
• Each primitive event e is associated with:
  – a probability value PrBN(V = v) through BN
  – a set Ke of classical DL axioms whose annotations are compatible with e (e.g., SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition, Road ⊑ SlipperyRoad, Road(route9A))
• Intuitively, the probability value associated with a statement φ (e.g., HazardousCondition(route9A)) is obtained by summing the probabilities of all primitive events e such that the classical KB Ke entails φ (see full definition in paper)
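To make the "sum over possible worlds" intuition concrete, here is a toy Python sketch of this example; the entailment check is hard-coded for this tiny KB rather than computed by a DL reasoner, so treat it as an illustration only:

```python
from itertools import product

# Toy re-creation of the slide's example (hypothetical helper code, not the
# actual PSHER implementation). Entailment is hard-coded for this tiny KB.

def pr_bn(rain, trust, source):
    """Joint probability of one primitive event, from the slide's CPTs."""
    p_source = 1.0 if source == "John" else 0.0            # PrBN(Source = John) = 1
    p_trust_given_source = {"Mary": 0.8, "John": 0.5}[source]
    p_trust = p_trust_given_source if trust else 1 - p_trust_given_source
    p_rain = 0.7 if rain else 0.3                           # PrBN(Rain = true) = 0.7
    return p_source * p_trust * p_rain

def entails_hazard(rain, trust):
    """Does K_e entail HazardousCondition(route9A) in this possible world?"""
    slippery = rain          # Road ⊑ SlipperyRoad is active only when Rain = true
    opened = trust           # OpenedRoad(route9A) is active only when TrustSource = true
    return slippery and opened   # SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition

total = sum(
    pr_bn(rain, trust, source)
    for rain, trust, source in product([True, False], [True, False], ["Mary", "John"])
    if entails_hazard(rain, trust)
)
print(total)   # 0.35 = PrBN(Rain = true) * PrBN(TrustSource = true | Source = John)
```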
Handling Inconsistent KBs
• Simple example: KB = (T, A ∪ {⊤ ⊑ ⊥ : X = true}, BN)
  – Namely, there exists a possible world (when the random variable X = true) in which the KB is inconsistent
  – But there are other possible worlds in which the KB is consistent (X = false)
  – What if PrBN(X = true) = 10^-6? The probability of the KB being inconsistent is very small
• Inconsistency-tolerant semantics for BDL: degree of unsatisfiability
  – The degree of unsatisfiability is essentially the sum of the probabilities pe of all primitive events e whose associated classical KB Ke is inconsistent
  – Reason over consistent subspaces of the KB (reasoning / query answering semantics follow)
BDL: Query Answering Semantics
• Given a query Q = φ : e
  – Does the axiom φ hold in the set of possible worlds compatible with e?
• And a KB K = (T, A, BN) that is satisfiable to degree d (> 0)
• An answer (θ, pr) consists of a ground substitution θ for the variables in the query and some pr ∈ [0, 1] such that
  – pr = infimum { Pr(φθ : e) | Pr is a model of K }
  – See detailed proofs in paper
• The paper shows that a prior query answering semantics for probabilistic KBs (Amato et al.) may be counter-intuitive
  – See counter-examples in paper
Scalable Query Answering
• Monte-Carlo sampling for error-bounded approximate query answering
• To compute the probability pr for a ground substitution (θ, pr) that satisfies a query φ : e
  – Observation 1: pr is of the form Σv PrBN(v); essentially, we need to selectively identify possible worlds (based on the query φ) and sum up their probabilities (obtained from BN)
  – Observation 2: decision makers may not need the exact value of pr; say we only need answers such that pr > thr
  – If the true probability that θ is an answer is 0.95 and thr = 0.75, then using only 25 samples we can conclude that θ is an answer with 95% confidence (true pr = 0.85: 60 samples; exact pr: 396 samples)
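A sketch of how such a threshold query could be answered by sampling; the Hoeffding-style stopping rule and the Bernoulli stand-in for world sampling are assumptions, since the slides defer the details to the paper:

```python
import math, random

# Sketch of threshold query answering by Monte-Carlo sampling (assumptions:
# a one-sided Hoeffding bound is used as the stopping rule, and sampling a
# primitive event from the BN is replaced by a Bernoulli(0.95) stand-in).

def sample_world():
    """Stand-in for sampling a primitive event from the BN and checking
    whether its classical KB K_e entails the query."""
    return random.random() < 0.95

def above_threshold(thr, confidence=0.95, max_samples=10_000):
    """Return True once we are `confidence`-sure that pr > thr, False if we
    are equally sure that pr <= thr, None if max_samples is exhausted."""
    hits = 0
    for n in range(1, max_samples + 1):
        hits += sample_world()
        # Hoeffding: |pr_hat - pr| <= eps with probability >= confidence
        eps = math.sqrt(math.log(1 / (1 - confidence)) / (2 * n))
        pr_hat = hits / n
        if pr_hat - eps > thr:
            return True, n
        if pr_hat + eps < thr:
            return False, n
    return None, max_samples

print(above_threshold(thr=0.75))
```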
Experimental Evaluation
• SHER: a highly scalable SOUND and COMPLETE reasoner for large OWL-DL KBs
  – Reasons over highly expressive ontologies
  – Reasons over data in relational databases
  – Highly scalable: scales to more than 60 million triples; semantically indexed 300 million triples from the medical literature
  – Provides explanations
• PSHER: probabilistic extension of SHER using BDL
Scalability via Summarization (ISWC 2006)
The summary mapping function f satisfies the constraints:
• If an individual a is an explicit member of a concept C in the original ABox, then f(a) is an explicit member of C in the summary ABox.
• If a ≠ b is explicitly in the original ABox, then f(a) ≠ f(b) is explicitly in the summary ABox.
• If a relation R(a, b) exists in the original ABox, then R(f(a), f(b)) exists in the summary.
If the summary is consistent, then the original ABox is consistent (the converse is not true).
(Figure: an original ABox and its summary; legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby; TBox: Functional(isTaughtBy), Disjoint(Man, Woman); e.g., C1 and C2 map to the summary node C' with C'{C1, C2}, and individuals are linked by isTaughtBy and likes edges.)
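A minimal Python sketch of building a summary ABox, under the simplifying assumption that individuals with identical sets of asserted concepts map to one summary individual; the real SHER summarization (ISWC 2006) is more involved:

```python
from collections import defaultdict

# Minimal sketch of ABox summarization (assumption: individuals with the same
# set of asserted concepts are merged into one summary individual).

concept_assertions = {            # individual -> set of asserted concepts
    "C1": {"Course"}, "C2": {"Course"},
    "P1": {"Person"}, "P2": {"Person"},
    "M1": {"Man"}, "M2": {"Man"},
    "H1": {"Hobby"}, "H2": {"Hobby"},
}
role_assertions = [               # (role, subject, object)
    ("isTaughtBy", "C1", "P1"), ("isTaughtBy", "C1", "M1"),
    ("isTaughtBy", "C2", "P2"), ("isTaughtBy", "C2", "M2"),
    ("likes", "P1", "H1"), ("likes", "M2", "H2"),
]

# f maps each individual to the summary node for its concept set
groups = defaultdict(list)
for ind, concepts in concept_assertions.items():
    groups[frozenset(concepts)].append(ind)
f = {ind: "+".join(sorted(members)) for members in groups.values() for ind in members}

# Constraints: f(a) keeps a's concepts, and R(a, b) becomes R(f(a), f(b))
summary_concepts = {f[ind]: concepts for ind, concepts in concept_assertions.items()}
summary_roles = {(r, f[a], f[b]) for r, a, b in role_assertions}

print(summary_concepts)
print(summary_roles)
```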
Results: Scalability
• UOBM benchmark data set (university data set)
• PSHER scales sub-linearly with the number of axioms
• Exact query answering (computing the exact pr for ground substitutions) is very expensive
• The state-of-the-art reasoner Pellet broke down on UOBM-1
Results: Response Time
• PSHER performs well on threshold queries: 99.5% of answers were obtained within a few tens of seconds
• Further enhancements: PSHER is parallelizable
Traditional approaches to deriving trust from data
• Assume a pair-wise numeric (dis)similarity metric between two entities: e.g., eBay recommendations, Netflix ratings
• Lack of support for conflicts spanning multiple entities: e.g., 3 sources S1, S2, S3 asserting Ax1 = all men are mortal, Ax2 = Socrates is a man, Ax3 = Socrates is not mortal, respectively
• Lack of support for uncertainty in information
Can I trust this information?
Courtesy: E.J. Wright and K.B. Laskey. Credibility Models for Multi-Source Fusion. In 9th International Conference on Information Fusion, 2006.
• At the command and control center, PSHER detects an inconsistency (justifications point to SIGINT vs. agent X)
• SIGINT is deemed more trusted by the decision maker, so trust in information source X is cautiously reduced
• The decision maker weighs the severity of a possible biological attack and performs "what if" analysis (What if X is compromised? What if the sensing device (SIGINT) had a minor glitch? Which information should be considered and which should be discarded?)
Overview
• Encode information as axioms in a BDL KB
• Detect inconsistencies and weighted justifications using possible-world reasoning
• Use justifications to assess trust in information sources
  – Trust scoring mechanism: a weighted scheme based on prior trust (belief) in information sources and the weight of each justification
Characteristics of the trust model
• Security: robust to shilling, robust to bad-mouthing
• Scalability: scales with the volume of information and the number of information sources
• Security-scalability trade-off: the cost of an exhaustive justification search vs. the cost of a perfectly random uniform sample
Trust Assessment: Degree of Unsatisfiability
• Probabilistic Socrates example: Axiom1 : p1, Axiom2 : p2, Axiom3 : p3
• 8 possible worlds (the power set of the axioms)
• Only one inconsistent world: {Axiom1, Axiom2, Axiom3}
• The probability measure of a possible world is derived from the joint probability distribution of the BN, e.g., Pr({Axiom1, Axiom2}) = p1 * p2 * (1 - p3)
• Degree of unsatisfiability: DU = p1 * p2 * p3
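A small Python sketch of this computation with hypothetical probabilities p1 = 0.9, p2 = 0.8, p3 = 0.3; the inconsistency check is hard-coded for the Socrates axioms:

```python
from itertools import product

# Sketch of the probabilistic Socrates example (hypothetical probability
# values; the only inconsistent world is the one containing all three axioms).
p = {"Ax1": 0.9, "Ax2": 0.8, "Ax3": 0.3}    # Pr(axiom is asserted)

def world_probability(world):
    """Probability of a possible world = product over axioms of p or (1 - p)."""
    prob = 1.0
    for axiom, p_axiom in p.items():
        prob *= p_axiom if axiom in world else 1 - p_axiom
    return prob

def inconsistent(world):
    """Ax1 (all men are mortal), Ax2 (Socrates is a man) and Ax3 (Socrates is
    not mortal) are jointly inconsistent; every proper subset is consistent."""
    return {"Ax1", "Ax2", "Ax3"} <= world

worlds = [frozenset(a for a, keep in zip(p, bits) if keep)
          for bits in product([True, False], repeat=len(p))]

degree_of_unsatisfiability = sum(world_probability(w) for w in worlds if inconsistent(w))
print(degree_of_unsatisfiability)   # = p1 * p2 * p3 = 0.216
```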
Trust Assessment: Justification Weight
• Trust value of a source S: Beta(α, β)
  – α (reward): function of non-conflicting interesting axioms
  – β (penalty): function of conflicting axioms
• Compute justifications of K = (A, T, BN): a justification J ⊆ (A, T) is such that
  – (J, BN) is consistent to a degree d' < 1
  – for all J' s.t. J' ⊂ J, (J', BN) is consistent to degree 1
• How to assign penalties to the sources involved in a justification?
  – Probability measure weight(J) of a justification J: DU((J, BN))
  – Penalty(J) is proportional to weight(J)
  – Penalty(J) is distributed across the sources contributing axioms to J, inversely proportionally to their previous trust values
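A sketch of the Beta-based trust update; using the Beta mean as the trust value and normalizing the penalty shares so they sum to weight(J) are assumptions not fixed by the slides:

```python
# Sketch of the trust update (assumptions: Beta mean as the prior trust value,
# penalty shares normalized to sum to weight(J), hypothetical source names).

trust = {"SIGINT": (8.0, 2.0), "agentX": (3.0, 3.0)}   # source -> (alpha, beta)

def trust_value(source):
    alpha, beta = trust[source]
    return alpha / (alpha + beta)        # mean of Beta(alpha, beta)

def penalize(justification_sources, weight_J):
    """Distribute Penalty(J) = weight(J) across the sources contributing
    axioms to J, inversely proportionally to their previous trust values."""
    inv = {s: 1.0 / trust_value(s) for s in justification_sources}
    total = sum(inv.values())
    for s, inv_trust in inv.items():
        alpha, beta = trust[s]
        trust[s] = (alpha, beta + weight_J * inv_trust / total)

# e.g., a justification involving both sources, with weight(J) = DU((J, BN)) = 0.216
penalize({"SIGINT", "agentX"}, 0.216)
print({s: round(trust_value(s), 3) for s in trust})
```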
Security-Scalability Tradeoff
• Impracticality of computing all justifications: exhaustive exploration of the Reiter search tree
• Alternative approach: unbiased sampling, so that a malicious source cannot systematically hide conflicts
• Retaining only the first K nodes of the Reiter search tree is not a solution
• The probability π(vd) that the node vd reached via the path ⟨v0, v1, …, vd⟩ is selected is π(vd) = ∏i (1 / |vi|)
• Tradeoff: select node vi with probability min(β / π(vi), 1), with β > 0
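One possible reading of this trade-off, sketched in Python on a toy tree; the generic tree and the interpretation of "select" as "expand this node" are assumptions, and the actual construction of the Reiter hitting-set tree is omitted:

```python
import random

# Sketch of the sampled Reiter-tree exploration (assumptions: a generic tree
# stands in for the Reiter hitting-set tree, and "select" is read as "expand
# this node"; the slides leave these details to the paper).

tree = {                       # node -> list of children (toy example)
    "v0": ["a", "b", "c"],
    "a": ["a1", "a2"], "b": [], "c": ["c1"],
    "a1": [], "a2": [], "c1": [],
}

def sampled_exploration(root, beta):
    """Explore the tree, tracking pi(v) = product of 1/|v_i| along the path,
    and expand a node only with probability min(beta / pi(v), 1)."""
    visited, stack = [], [(root, 1.0)]
    while stack:
        node, pi = stack.pop()
        if random.random() >= min(beta / pi, 1.0):
            continue                               # node not selected: prune subtree
        visited.append(node)
        children = tree.get(node, [])
        for child in children:
            stack.append((child, pi / len(children)))
    return visited

print(sampled_exploration("v0", beta=0.5))
```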
Summary
Decision support system for secure information flows:
• Uncertainty: supports an inconsistent KB and reasons over uncertain information
• Trust values derived from data
• Flexibility: e.g., the sensitivity of tactical information decays with space, time and external events
• Situation awareness: e.g., encodes need-to-know based access control policies
• Support for explanations: supports intelligent information downgrade and provenance data for "what if" analysis
Scenario • Coalition: A & B • Geo location G={G1, …,G4} • A’s operations described in the table
Summarization effectiveness I – Instances after summarization RA – Role assertions after summarization
Filtering effectiveness I – Instances after filtering RA – Role assertions after filtering
Refinement (AAAI 2007)
• What if the summary is inconsistent? Either the original ABox has a real inconsistency, or the ABox was consistent but the summarization process introduced a spurious inconsistency in the summary
• Therefore, we follow a process of refinement to check for a real inconsistency
  – Refinement = selectively decompress portions of the summary
  – Use justifications for the inconsistency to select the portion of the summary to refine
  – Justification = minimal set of assertions responsible for the inconsistency
  – Repeat iteratively until the refined summary is consistent or the justification is "precise"
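A high-level Python sketch of this refinement loop; the consistency check, justification extraction, precision test, and decompression step are passed in as stand-in functions rather than implemented, since they belong to the SHER machinery described in the AAAI 2007 paper:

```python
# High-level sketch of the refinement loop (assumptions: check_consistency,
# get_justification, is_precise and refine are stand-ins for the SHER
# machinery; they are not implemented here).

def resolve_inconsistency(summary, original_abox, tbox,
                          check_consistency, get_justification,
                          is_precise, refine):
    """Iteratively decompress the parts of the summary blamed by a
    justification until the summary is consistent or the justification is
    precise (i.e., reflects a real inconsistency in the original ABox)."""
    while not check_consistency(summary, tbox):
        justification = get_justification(summary, tbox)   # minimal conflicting assertions
        if is_precise(justification, original_abox):
            return "original ABox is really inconsistent", justification
        # Selectively decompress only the summary individuals in the justification
        summary = refine(summary, justification, original_abox)
    return "consistent after refinement", summary
```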
Refinement: Resolving inconsistencies in a summary
(Figure: worked example with TBox Functional(isTaughtBy), Disjoint(Man, Woman); legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby. The initial summary, with C'{C1, C2, C3}, is inconsistent. After the 1st refinement, C' is split into Cx'{C1, C2} and Cy'{C3}, but the summary is still inconsistent. After the 2nd refinement, P' is split into Px'{P1, P2} and Py'{P3}, yielding a consistent summary.)
Refinement: Solving Membership Query (AAAI 2007)
(Figure: same worked example, applied to a membership query. Sample query Q: PeopleWithHobby? Not(Q) is asserted on the candidate summary individuals, making the summary inconsistent; refinement proceeds as before, splitting C' into Cx'{C1, C2} and Cy'{C3} and then P' into Px'{P1, P2} and Py'{P3}, until the summary is consistent. Solutions: P1, P2.)