560 likes | 726 Views
Chase Methods based on Knowledge Discovery. Agnieszka Dardzinska & Zbigniew W. Ras agnadar@wp.pl & ras@uncc.edu. Algorithm Chase. GIVEN: Incomplete Information System ( IIS ) Constraints (functional dependencies,..) which IIS satisfies.
E N D
Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras agnadar@wp.pl & ras@uncc.edu
Algorithm Chase GIVEN: Incomplete Information System (IIS) • Constraints (functional dependencies,..) which IIS satisfies [ Dept Chair, Chair Dept Faculty-Name, … Dept(x1) =Dept(x2) ]
Tableau System for IIS– information system with null values replaced by variables
Variables in Tableaux System • distinguished variables, one for each attribute (if b is an attribute of interest, then vb is the corresponding distinguished variable) • nondistinguished variables (there are countably many of them: n1, n2, n3, ….)
Functional Dependencies: [Department → Chair] [Department *Chair → Faculty Name]
Algorithm Chase Input: tableaux system S and set of functional dependencies F Output: tableaux system CHASEF(S) Begin S1:=S; while there are t1, t2 S1 and (B b) F such that t1[B]= t2[B] and t1[b] < t2[b] do change all the occurrences of the value t2[b] in S1 to t1[b] CHASEF(S):=S1 End
Algorithm Chase Input: tableaux system S and set of functional dependencies F Output: tableaux system CHASEF(S) Begin S1:=S; while there are t1, t2 S1 and (B b) F such that t1[B]= t2[B] and t1[b] < t2[b] do change all the occurrences of the value t2[b] in S1 to t1[b] CHASEF(S):=S1 End The algorithm always terminates if applied to a finite tableaux system. If one execution of the algorithm generates a tableaux system satisfying F, then every execution of the algorithm generates the same tableaux system.
Chase supported by rules extracted from IIS (Chase 1) • Chase 1 identifies all incomplete attributes (their values are called concepts) in IS . 2. Main Algorithm - Extraction of rules from IS describing these concepts, - Null values in IS are replaced by values suggested by these rules. 3. These two steps are repeated till fixpoint is reached.
Example (Chase1) X = {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10} A = {b, c, d, e, f, g} Attribute b e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1).
Example (Chase1) Attribute b Two null values in S: b(x6), b(x8) b(x6): e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1).
Example (Chase1) Attribute b Two null values in S: b(x6), b(x8) b(x6): e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1).
Example (Chase1) Attribute b Two null values in S: b(x6), b(x8) b(x8): e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1). b(x6) = b2
Example (Chase1) Two null values in S: c(x7), c(x8). c(x7): b1c1(support 2), e2 c1(support 1), f4 c1(support 1), g1 c1(support 2), b2 d2 c2(support 1), b2e1 c2(support 1), b2f2 c2(support 1), b2g3 c2(support 1), d2e1 c2(support 1), d2g3 c2(support 1). b(x6) = b2
Example (Chase1) Two null values in S: c(x7), c(x8). c(x7): b1c1(support 2), e2 c1(support 1), f4 c1(support 1), g1 c1(support 2), b2 d2 c2(support 1), b2e1 c2(support 1), b2f2 c2(support 1), b2g3 c2(support 1), d2e1 c2(support 1), d2g3 c2(support 1). b(x6) = b2
Example (Chase1) Two null values in S: c(x7), c(x8). c(x8): b1c1(support 2), e2 c1(support 1), f4 c1(support 1), g1 c1(support 2), b2 d2 c2(support 1), b2e1 c2(support 1), b2f2 c2(support 1), b2g3 c2(support 1), d2e1 c2(support 1), d2g3 c2(support 1). b(x6) = b2 , c(x7) = c1
Algorithm Chase1(S, In(A), L(D)) Input:System S=(X, A, V) Set of incomplete attributes In(A)={a1, a2, …, ak} Set of rules L(D) Output:System Chase1(S) begin j:=1; whilej ≤ kdobegin Sj:=S for allvVajdo while there is xX and rule (t v)L(D) such that xNSj(t) and card(aj(x))≠1 begin a(x):=v; end j:=j+1 end S:={Sj:1 ≤ j ≤ k}, Chase1 (S, In(A), L(D)) end
A2 A5 A6 A7 A1 A3 A4 query q3 2 4 2 25 22 22 2 q3 2 3 8 22 28 23 2 q3 22 2 2 25 12 30 3 q3 2 2 32 27 22 2 14 q1 3 4 4 2 8 4 0 q1 4 9 3 8 5 1 3 q1 14 4 3 12 4 2 3 q1 1 3 17 4 14 4 5 q2 4 2 1 4 13 14 13 q2 4 15 8 13 13 1 3 q2 4 12 15 17 13 2 3 q2 14 27 16 3 4 4 13 A1- “the no. of different attributes used in a query" A2- “the percent of null values in a queried IS" A3- “the no. of objects returned by QAS when IS-complete" A4- “the no. of objects returned by QAS when IS-incomplete" (optimistic interpretation) A5- “the no. of objects returned by QAS based on rule-based chase algorithm" (pessimistic interpretation) A6- “the no. of bad objectsretrieved" A7- “the no. of passes of rule-based chase algorithm" ZOO Database
Rules Discovery from partiallyIncomplete Information Systems
Data (Incomplete) Information SystemS = ( X, A, V ) X - finite set of objects, A - finite set of attributes, - set of their values. Assumption 1. For any , 2. For any
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Example
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID for Extracting Rules from partially Incomplete Information System Goal: Describe e in terms of {a,b,c,d}
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID for Extracting Rules from partially Incomplete Information System Goal: Describe e in terms of {a,b,c,d}
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d} For the values of the decision attribute we have:
X a b c d e x1 x2 • Check the relationship “ ” between values of classification attributes {a,b,c,d} and values of decision attribute e x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d}.
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d} Let , . We say that: support iff and confidence of the rule are above some threshold values.
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d} Let , . We say that: support iff and confidence of the rule are above some threshold values. How to define support and confidence of a rule ?
Definition of Support and Confidence (by example) To define support and confidence of the rule a1 e3 we compute: Support of the rule: Support of the term a1: Confidence of the rule:
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Extracting Rules from partially Incomplete Information System (Algorithm ERID(λ1, λ2)) Goal: Describe e in terms of {a,b,c,d} Thresholds (provided by user): Minimal support (λ1 = 1) Minimal confidence (λ2 = ½) - marked negative - marked negative - marked positive
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) but but but but but but but but but
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) but but but but but but but but but They all are not marked
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) and and
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) and and They all are marked positive.
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) and and They all are marked positive. They all are marked negative.
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) The algorithm continues for terms of length 3, 4, … till all of them have either positive or negative marks. Rules are automatically constructed from relations marked positive.
Algorithm Chase 2 (for Partially IIS)
is defined for any , Algorithm Chase 2 • partially incomplete information system of type λ, if S is incomplete and the following three conditions hold:
Algorithm Chase 2 S1, S2 - partially incomplete, both of type λ and both classifying the same sets of objects (from X) using the same sets of attributes (A) Let and satisfies containment relation Ψ (or Ψ(S1)=S2) if: The pair (S1, S2) We also denote that fact by
System S1 System S2 X a b c d e X a b c d e x1 x1 x2 x2 x3 x3 x4 x4 x5 x5 x6 x6 x7 x7 c2 x8 x8
Assumptions: • - information system of type λ • - set of all pairwise independent rules extracted by ERID from S • NS(t) - standard interpretation of term t in S, meaning that: , for any where for any , we have: • In(A) = {a1, … , ak} - incomplete attributes in S
for all do begin if and is a maximal subset of rules from L(D) such that then if then begin end end pj:= pj +nj; end if /containment relation holds between aj(x), [bj(x)/pj]/ then Algorithm Chase 2(S, In(A), L(D)) .................. ..................
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 ERID(λ1, λ2) λ1=1, λ2=0.5 Incomplete Information System S of type λ = 0.3
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 ERID(λ1, λ2) λ1=1, λ2=0.5 Algorithm Chase 2 will try to replace by enew(x1) = {(e1, ), (e2, ), (e3, )}. We will show that Ψ(e(x1)) = enew(x1) (the value e(x1) will be changed by Chase 2). Incomplete Information System S of type λ = 0.3
X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 For x1: we have: Because the confidence assigned to e3 is below the threshold λ, then only two values remain: (e1, 0.48), (e2, 0.97). The value of attribute e assigned to x1 is: {(e1, 0.33), (e2, 0.67)}. Incomplete Information System S of type λ = 0.3
Distributed Chase Algorithm (Chase 3)
Let: - distributed autonomous information systems - information system for any (I - set of sites) - knowledge-base at site • set of (k, i)-rules (constructed at site k and sent to site i)
Strategy for constructing knowledge-base and Algorithm Chase 3 Notation: qS=[a, b, c : d, e] - request by S for definitions of a, b, c with additional information that d, e are complete attributes in S.
Global Ontology S2 S1 r1 r2 qS qS=[a, c, d : b] S KB KBS
Global Ontology S2 S1 r1 r2 qS qS=[a, c, d : b] S r1 r2 KBS
Assumption: Di - granularity level of values of attributes used in rules from Di may differ from the granularity level of values of attribute used in descriptions of objects in Chase 3 algorithm to be applicable to Sihas to be based on rules from Di satisfying the following two conditions: . • attribute value used in the decision part of a rule has the granularity level either equal to or finer than the granularity level of the corresponding attribute in Si . • the granularity level of any attribute used in the classification part of a rule is either equal or softer than the granularity level of the corresponding attribute in Si