Solving Failing Queries*) Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA ras@uncc.edu
Failing Query Problems
Problem 1. Given S(A) with hierarchical attributes and a query q(B) that returns an empty answer, how can one relax the query's constraints so that it returns a non-empty set of tuples?
Assumption: S(A) is an information system based on attributes from A; q(B) is a query based on attributes from B. Query q(B) is not local for system S(A) if [B ⊄ A].
Problem 2. Given S(A), which represents one of the sites of a distributed autonomous information system, and a non-local query q(B) submitted to S(A), where A ≠ B, how can q(B) be modified so that it can be answered?
Failing Query Problem 1
Problem 1. [Cooperative Query Answering] [Papers by: Minker, Chu, Gaasterland, Demolombe, Muslea]
Attribute hierarchies:
age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)
salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)
Example of a query: (age, 18) ∧ (salary, 40k)
Possible relaxations:
(age, young) ∧ (salary, 40k)
(age, 18) ∧ (salary, low)
(age, young) ∧ (salary, low)
Preference for relaxation: [1 - age, 2 - salary]
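The relaxations listed on this slide can be enumerated mechanically. The sketch below is only an illustration: the function names, the representation of salary in thousands, and the ordering rule (fewer generalized attributes first, ties broken by the preference list) are assumptions, not something the slide specifies.

```python
from itertools import combinations

# Hierarchies from the slide; salary bounds are in thousands (assumed encoding).
HIERARCHY = {
    "age": {"young": (18, 29), "middle-aged": (30, 60), "old": (61, 80)},
    "salary": {"low": (10, 40), "medium": (50, 70), "high": (80, 100)},
}

def generalize(attr, value):
    """Replace a leaf value by the name of its parent node."""
    for label, (lo, hi) in HIERARCHY[attr].items():
        if lo <= value <= hi:
            return label
    raise ValueError(f"{value} is not covered by the hierarchy of {attr}")

def relaxations(query, preference):
    """Yield relaxed queries, generalizing preferred attributes first."""
    for size in range(1, len(preference) + 1):
        for attrs in combinations(preference, size):
            yield {a: generalize(a, v) if a in attrs else v
                   for a, v in query.items()}

query = {"age": 18, "salary": 40}
for relaxed in relaxations(query, ["age", "salary"]):
    print(relaxed)
```

With the preference [age, salary], the candidates come out in the same order as on the slide: age relaxed, then salary relaxed, then both.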
Failing Query Problem 1
Attribute a is hierarchical, with the following structure in Lisp-like notation: a(a1(a[1,1], a[1,2]), a2(a[2,1], …))
Query q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q).
Solution: q can be generalized by the QAS (query answering system) to q1 = a1*c1, which matches objects x3 and x5 in S1.
Question: Which of these two objects (x3 or x5) is closer to q?
[Table: Information System S1]
Failing Query Problem 1
Query q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q).
Question: Which of these two objects (x3 or x5) is closer to q?
Let k ≤ m. Then the distance is:
δa(a[i(1), i(2), …, i(k)], a[j(1), j(2), …, j(m)]) = 1/2^n if [i(1) = j(1) ∧ … ∧ i(n) = j(n) ∧ [n = k ≤ m ∨ i(n+1) ≠ j(n+1)]], else 0
δ(xi, xj) = δa + δb + δc + δe
δ(q, x3) = ½ + 1 + 1 + 1 = 3½
δ(q, x5) = ½ + 1 + 1 + 1 = 3½
Result: both are OK
[Table: Information System S1]
Failing Query Problem 1 [Muslea, KDD'04]
On-line, query-guided algorithm for relaxing failing DNF queries.
Example. A = {Price, CPU, Display, Weight}.
Failing query: q(A) = [Price ≤ $2000] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 3lbs].
A small, randomly chosen subset of the target DB is used to discover implicit relationships between the values of the attributes used in the query.
Discovered rules:
r1 = [[Price ≤ $2900] ∧ [Display ≥ 18''] ∧ [Weight ≤ 4lbs] → [CPU ≥ 2.5GHz]]
r2 = [[Price ≥ $3500] → [CPU ≥ 2.5GHz]]
r3 = …….
A nearest-neighbor technique identifies the rule most similar to the failing query. Assume that r1 is such a rule.
Relaxed query: [Price ≤ $2900] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 4lbs].
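The last step of this example, building the relaxed query from the failing query and the selected rule, can be sketched as follows. This is a hedged illustration only: it assumes Price and Weight are upper bounds and CPU and Display are lower bounds, and it assumes the relaxed query keeps, per attribute, whichever bound admits more tuples. The function name and the dictionary encoding are hypothetical, and the nearest-neighbor rule selection is not shown.

```python
def relax_with_rule(query, rule):
    """Merge a failing query with a rule, keeping the looser bound per attribute."""
    relaxed = {}
    for attr, (op, bound) in query.items():
        if attr in rule and rule[attr][0] == op:
            rule_bound = rule[attr][1]
            # "<=" is an upper bound (larger is looser); ">=" a lower bound.
            bound = max(bound, rule_bound) if op == "<=" else min(bound, rule_bound)
        relaxed[attr] = (op, bound)
    return relaxed

# Operators are assumptions about the slide's intent.
query = {"Price": ("<=", 2000), "CPU": (">=", 2.5),
         "Display": (">=", 17), "Weight": ("<=", 3)}
r1 = {"Price": ("<=", 2900), "Display": (">=", 18),
      "Weight": ("<=", 4), "CPU": (">=", 2.5)}

print(relax_with_rule(query, r1))
```

Under these assumptions the result has Price at 2900, CPU at 2.5, Display at 17 and Weight at 4, which agrees with the relaxed query on the slide.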
Failing Query Problem 2
Problem 2. [Collaborative Query Answering] [Papers: Ras, Zemankova, Stolfo, Maitan, Zytkow, Dardzinska]
Example of a non-local query
Database: Flights(airline; departure time; arrival time; departure airport; arrival airport).
select * from Flights
where airline = "Delta"
and departure time = "morning"
and departure airport = "Charlotte"
and aircraft = "Boeing"
The query is non-local because aircraft is not an attribute of Flights.
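Whether a query is local can be checked by comparing attribute sets. The snippet below is a small sketch using the Flights example; the function name is hypothetical.

```python
def foreign_attributes(query_attrs, site_attrs):
    """Attributes used in the query that the site does not store."""
    return set(query_attrs) - set(site_attrs)

flights = {"airline", "departure time", "arrival time",
           "departure airport", "arrival airport"}
query = {"airline", "departure time", "departure airport", "aircraft"}

foreign = foreign_attributes(query, flights)
print(foreign)                      # the query is non-local iff this is non-empty
```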
Query Processing in Collaborative Systems
[Tables: System S1, System S]
Query q = a1*b1*e1 submitted to S fails, because attribute e is not in S (clearly b[1,1] is also b1).
Find the definition of e1 in S1: b1 → e1; c1 → e1; a[1,2] → e1.
Replacing e1 by b1 + c1 + a[1,2]:
a1*b1*(b1 + c1 + a[1,2]) = a1*b1 + a1*b1*c1 + a1*b1*a[1,2] = a1*b1.
Objects y3, y4 satisfy the query q.
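The algebra on this slide (substitute the foreign value e1 by its definition, then simplify) can be sketched with conjunctions as sets of values and a query as a set of conjunctions. The helper names are hypothetical, and only idempotence and absorption are applied.

```python
def substitute(term, symbol, definition):
    """Replace `symbol` in a conjunction by each alternative of its definition."""
    if symbol not in term:
        return {term}
    rest = term - {symbol}
    return {rest | alternative for alternative in definition}

def absorb(dnf):
    """Drop every conjunction that strictly contains another one (x + x*y = x)."""
    return {t for t in dnf if not any(other < t for other in dnf)}

q = frozenset({"a1", "b1", "e1"})
definition_e1 = [frozenset({"b1"}), frozenset({"c1"}), frozenset({"a[1,2]"})]

expanded = substitute(q, "e1", definition_e1)
simplified = absorb(expanded)
print(simplified)
```

Because each conjunction is a set, b1*b1 collapses to b1 automatically, and absorption removes the two longer conjunctions, leaving a1*b1.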
Query Processing in Collaborative Systems
[Tables: System S, System S1]
Query q = a[1,2]*b[1,1] submitted to S1 fails because of the granularity of b.
Find the definition of b[1,1] in S: a1*c2 → b[1,1].
Replacing b[1,1] by a1*c2: a[1,2]*a1*c2 = a[1,2]*c2 (a[1,2] is a child of a1, so a[1,2]*a1 = a[1,2]).
Objects x1, x2 satisfy the query q.
Failing Query Problem 2
Query Processing in Incomplete IS
S = (X, A, V) is a partially incomplete information system of type λ, if the following two conditions hold:
X is a set of objects, A is a set of attributes, Va is a set of values of attribute a, where a ∈ A, and V = ∪{Va : a ∈ A},
• for any x ∈ X, a ∈ A, if aS(x) is defined, then [aS(x) ∈ Va or aS(x) = {(vi, pi): 1 ≤ i ≤ m}],
• if [aS(x) = {(vi, pi): 1 ≤ i ≤ m}], then [Σ i=1…m pi = 1 and (∀i)(pi ≥ λ)].
Also, if [aS(x) = v], then the value v has the same meaning as {(v, 1)}.
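The condition on weighted values can be checked directly. This is a minimal sketch, assuming a value is stored as a list of (value, weight) pairs with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction as F

def is_valid_value(value, lam):
    """Weights must sum to 1 and each weight must be at least lam."""
    weights = [p for _, p in value]
    return sum(weights) == 1 and all(p >= lam for p in weights)

value = [("a1", F(1, 3)), ("a2", F(2, 3))]
print(is_valid_value(value, F(1, 4)))   # both weights reach the threshold 1/4
print(is_valid_value(value, F(1, 2)))   # 1/3 is below the threshold 1/2
```

A definite value v is covered by the same check when written as [(v, 1)].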
Incomplete Information System
[Table: objects x1 … x8 described by attributes a, b, c, d, e]
Queries: q1(a,b) = a1*b1, q2(a,b) = a1 + b1
J(a1) = {(x1, 1/3), (x3, 1), (x5, 2/3)}
J(b1) = {(x1, 2/3), (x2, 1/3), (x4, 1/2), (x5, 1), (x7, 1/4)}
What about J(a1*b1) = J(a1) ⊗ J(b1), J(a1 + b1) = J(a1) ⊕ J(b1)?
Interpretations for ⊗ and ⊕
Assume that J(a1) = {(xi, pi): i ∈ K} and J(b1) = {(xi, qi): i ∈ K}.
Interpretation T0
J(a1) ⊗0 J(b1) = {(xi, S1(pi, qi)): i ∈ K}, where S1(pi, qi) = [if max(pi, qi) = 1, then min(pi, qi), else 0].
J(a1) ⊕0 J(b1) = {(xi, S2(pi, qi)): i ∈ K}, where S2(pi, qi) = [if min(pi, qi) = 0, then max(pi, qi), else 1].
Interpretation T1
J(a1) ⊗1 J(b1) = {(xi, max{0, pi + qi − 1}): i ∈ K}
J(a1) ⊕1 J(b1) = {(xi, min{1, pi + qi}): i ∈ K}
Interpretation T2
J(a1) ⊗2 J(b1) = {(xi, [pi·qi]/[2 − (pi + qi − pi·qi)]): i ∈ K}
J(a1) ⊕2 J(b1) = {(xi, [pi + qi]/[1 + pi·qi]): i ∈ K}
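To see how the interpretations T0, T1 and T2 behave, they can be applied to the sets J(a1) and J(b1) from the previous slide. The sketch below makes two assumptions that the slides do not state: an object missing from a set is given weight 0, and objects whose resulting weight is 0 are dropped from the answer. All names are hypothetical.

```python
from fractions import Fraction as F

# (conjunction, disjunction) weight functions for interpretations T0, T1, T2.
INTERPRETATIONS = {
    "T0": (lambda p, q: min(p, q) if max(p, q) == 1 else F(0),
           lambda p, q: max(p, q) if min(p, q) == 0 else F(1)),
    "T1": (lambda p, q: max(F(0), p + q - 1),
           lambda p, q: min(F(1), p + q)),
    "T2": (lambda p, q: p * q / (2 - (p + q - p * q)),
           lambda p, q: (p + q) / (1 + p * q)),
}

def combine(f, ja, jb):
    """Apply f object by object; missing objects count as weight 0 (assumption)."""
    out = {}
    for x in sorted(set(ja) | set(jb)):
        w = f(ja.get(x, F(0)), jb.get(x, F(0)))
        if w != 0:
            out[x] = w
    return out

J_a1 = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
J_b1 = {"x1": F(2, 3), "x2": F(1, 3), "x4": F(1, 2), "x5": F(1), "x7": F(1, 4)}

for name, (conj, disj) in INTERPRETATIONS.items():
    print(name, "a1*b1:", combine(conj, J_a1, J_b1))
    print(name, "a1+b1:", combine(disj, J_a1, J_b1))
```

Under T0 and T1 only x5 survives the conjunction, with weight 2/3, while T2 also keeps x1 with weight 2/11.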
Interpretations for ⊗ and ⊕
Interpretation T3
J(a1) ⊗3 J(b1) = {(xi, pi·qi): i ∈ K}
J(a1) ⊕3 J(b1) = {(xi, pi + qi − pi·qi): i ∈ K}
Interpretation T4
J(a1) ⊗4 J(b1) = {(xi, [pi·qi]/[pi + qi − pi·qi]): i ∈ K}
J(a1) ⊕4 J(b1) = {(xi, [pi + qi − 2pi·qi]/[1 − pi·qi]): i ∈ K}
Fuzzy Interpretation T5
J(a1) ⊗5 J(b1) = {(xi, min{pi, qi}): i ∈ K}
J(a1) ⊕5 J(b1) = {(xi, max{pi, qi}): i ∈ K}
Another possible interpretation T
J(a1) ⊗3 J(b1) = {(xi, pi·qi): i ∈ K}
J(a1) ⊕5 J(b1) = {(xi, max{pi, qi}): i ∈ K}
Interpretations T0, T5, T satisfy the property: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
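The distributivity property can be spot-checked numerically. The sketch below tests T5 (min, max), the mixed interpretation T (product, max) and, for contrast, T3 (product, probabilistic sum) on a small grid of weights. A grid check is not a proof, and the helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

GRID = [F(k, 4) for k in range(5)]          # weights 0, 1/4, 1/2, 3/4, 1

def distributive(conj, disj):
    """Check conj(a, disj(b, c)) == disj(conj(a, b), conj(a, c)) on the grid."""
    return all(conj(a, disj(b, c)) == disj(conj(a, b), conj(a, c))
               for a, b, c in product(GRID, repeat=3))

def times(p, q):
    return p * q

def prob_sum(p, q):
    return p + q - p * q

print("T5:", distributive(min, max))
print("T :", distributive(times, max))
print("T3:", distributive(times, prob_sum))
```

T3 fails already at a = b = c = 1/2, where the left side is 3/8 and the right side is 7/16.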
Failing Query Problem 2
Incomplete IS [S2 is finer than S1]
Assume:
• S1, S2 are partially incomplete IS of type λ
• The same objects are stored in both systems
• The same attributes are used to describe objects
• aS1(x) = {(a1i, p1i): 1 ≤ i ≤ m1}, aS2(x) = {(a2i, p2i): 1 ≤ i ≤ m2}
Incomplete Information System
S2 is finer than S1 if:
• (∀x ∈ X)(∀a ∈ A)[card(aS1(x)) ≥ card(aS2(x))]
• (∀x ∈ X)(∀a ∈ A)[[card(aS1(x)) = card(aS2(x))] → [Σ i≠j |p2i − p2j| > Σ i≠j |p1i − p1j|]]
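One reading of this definition, sketched below, is that S2 never lists more candidate values than S1 and, whenever both list the same number, the weights in S2 are more spread out. The encoding (a dictionary from (object, attribute) to a list of (value, weight) pairs) and the function names are assumptions, and the sum is taken over ordered pairs i ≠ j.

```python
from fractions import Fraction as F
from itertools import permutations

def spread(value):
    """Sum of |p_i - p_j| over ordered pairs i != j of the weights."""
    weights = [p for _, p in value]
    return sum(abs(p - q) for p, q in permutations(weights, 2))

def value_is_finer(v1, v2):
    """Compare one attribute value of S1 (v1) with the same value in S2 (v2)."""
    if len(v2) > len(v1):
        return False
    if len(v2) == len(v1):
        return spread(v2) > spread(v1)
    return True

def is_finer(s1, s2):
    """True if S2 is finer than S1 on every (object, attribute) pair."""
    return all(value_is_finer(s1[key], s2[key]) for key in s1)

s1 = {("x1", "a"): [("a1", F(1, 2)), ("a2", F(1, 2))]}
s2 = {("x1", "a"): [("a1", F(3, 4)), ("a2", F(1, 4))]}
print(is_finer(s1, s2), is_finer(s2, s1))
```

Note that with the strict inequality as written, two identical values of equal cardinality do not count as finer.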
S2 finer than S1
[Tables: S1 and S2, each listing objects x1 … x8 over attributes a, b, c, d, e]
Failing Queries in Collaborative IS
Assume:
• Query q = q(B) is submitted to S = (X, A, V), where:
  • B is the set of all attributes used in q
  • A ∩ B ≠ ∅
• Attributes in B \ (A ∩ B) are foreign for S
• Two information systems can collaborate if they agree on the ontology of some of their common attributes
• The granularity of values of attributes used in a query q may differ from the granularity of values of the same attributes in S
Failing Queries in Collaborative IS
Query q(B) can be processed at site S by discovering definitions of values of attributes from B \ (A ∩ B) at some of the remote sites for S.
With each certain rule discovered at a remote site, a number of additional rules can also be discovered.
Failing Query Problem 2
Example
age(child(≤ 17), young(18, … , 29), middle-aged(30, … , 60), old(61, … , 80), senile(≥ 81))
salary(low(0, … , 40K), medium(50K, … , 70K), high(80K, … , 100K), very-high(> 100K))
(age, young) ∧ (salary, 40K)
(age, young) ∧ (salary, low)
(age, N) ∧ (salary, 40K)
(age, N) ∧ (salary, low)
Failing Queries in Collaborative IS
S = (X, A, V) is the client site, with A = {a, b, d, …}, c ∉ A
Va = {a1, a2, a3}
Vb = {b[1,1], b[1,2], b[1,3], b[2,1], b[2,2], b[2,3], b[3,1], b[3,2], b[3,3]}
Vd = {d1, d2, d3}
Semantics of the hierarchical attributes {a, b, c, d} used by S and the systems collaborating with S:
• a(a1(a[1,1], a[1,2], a[1,3]), a2(a[2,1], a[2,2], a[2,3]), a3(a[3,1], a[3,2], a[3,3]))
• b(b1(b[1,1], b[1,2], b[1,3]), b2(b[2,1], b[2,2], b[2,3]), b3(b[3,1], b[3,2], b[3,3]))
• c(c1(c[1,1], c[1,2], c[1,3]), c2(c[2,1], c[2,2], c[2,3]), c3(c[3,1], c[3,2], c[3,3]))
• d(d1(d[1,1], d[1,2], d[1,3]), d2(d[2,1], d[2,2], d[2,3]), d3(d[3,1], d[3,2], d[3,3]))
S: a[i], b[i,j], d[i]
Assume: query q = a[i,1]*b[i]*c[i,3]*d[i] is submitted to S.
q = a[i,1]*[b[i,1] + b[i,2] + b[i,3]]*c[i,3]*d[i] = [a[i,1]*b[i,1]*c[i,3]*d[i]] + [a[i,1]*b[i,2]*c[i,3]*d[i]] + [a[i,1]*b[i,3]*c[i,3]*d[i]]
How to solve query q?
1. Generalize a[i,1] to a[i] and c[i,3] to c. The query has the new form: q1 = a[i]*[b[i,1] + b[i,2] + b[i,3]]*d[i]
2.a. Objects matching q1 may satisfy q.
2.b. Generalizations decrease the chance that retrieved objects will match query q. Check what values of attributes a and c are implied by d[i]*b[i,1], d[i]*b[i,2], or d[i]*b[i,3] at remote sites for S, and whether any of these rules have high confidence and support.
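Step 1 can be sketched by encoding each hierarchical value as an index path, for example (1, 3) for the third child of the first node, and fixing i = 1. The granularity table, the three-children assumption taken from the hierarchies shown earlier, and all function names are illustrative assumptions.

```python
from itertools import product

GRANULARITY = {"a": 1, "b": 2, "d": 1}   # depth at which S stores each attribute
CHILDREN = 3                              # every node has three children (assumed)

def generalize(query):
    """Drop attributes foreign to S and cut values that are too fine for S."""
    return {attr: path[:GRANULARITY[attr]]
            for attr, path in query.items() if attr in GRANULARITY}

def refinements(path, depth):
    """All descendants of `path` at the given depth (or `path` itself)."""
    if len(path) >= depth:
        return [path]
    return [r for k in range(1, CHILDREN + 1)
            for r in refinements(path + (k,), depth)]

def expand(term):
    """Rewrite a conjunction as a disjunction over the values S actually stores."""
    attrs = list(term)
    options = [refinements(term[a], GRANULARITY[a]) for a in attrs]
    return [dict(zip(attrs, combo)) for combo in product(*options)]

q = {"a": (1, 1), "b": (1,), "c": (1, 3), "d": (1,)}   # the query with i = 1
q1 = generalize(q)
for disjunct in expand(q1):
    print(disjunct)
```

The three printed disjuncts correspond to the three conjunctions of q1 over b's children.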
S: a[i], b[i,j], d[i]
q = a[i,1]*[b[i,1] + b[i,2] + b[i,3]]*c[i,3]*d[i] = [a[i,1]*b[i,1]*c[i,3]*d[i]] + [a[i,1]*b[i,2]*c[i,3]*d[i]] + [a[i,1]*b[i,3]*c[i,3]*d[i]]
How to solve query q?
1. Generalize a[i,1] to a[i] and c[i,3] to c. The query has the new form: q1 = a[i]*b[i]*d[i] = [a[i]*b[i,1]*d[i]] + [a[i]*b[i,2]*d[i]] + [a[i]*b[i,3]*d[i]]
2. Check what values of attributes a and c are implied by d[i]*b[i,1], d[i]*b[i,2], or d[i]*b[i,3] at remote sites for S, and whether any of these rules have high confidence and support.
Assume that d[i]*b[i,1] → a[i,2] and d[i]*b[i,2] → c[i,3] are certain rules, extracted at a remote site for S. The first rule contradicts a[i,1], so the first disjunct of q is dropped; the second rule makes c[i,3] redundant in the second disjunct.
We get: q reduces to [a[i,1]*b[i,2]*d[i]] (local) + [a[i,1]*b[i,3]*c[i,3]*d[i]] (non-local)
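The effect of the two certain rules on the disjuncts of q can be sketched as follows, with i = 1 and values encoded as index paths. The rules are read here as d[i]*b[i,1] implying a[i,2] and d[i]*b[i,2] implying c[i,3], which is an assumption about the slide's notation; the function names are hypothetical.

```python
def is_prefix(p, q):
    """True if hierarchy path p is an ancestor of (or equal to) path q."""
    return q[:len(p)] == p

def apply_rule(term, antecedent, attr, value):
    """Return the reduced conjunction, or None if the rule contradicts it."""
    fires = all(a in term and is_prefix(v, term[a]) for a, v in antecedent.items())
    if not fires or attr not in term:
        return term
    if is_prefix(term[attr], value):       # rule guarantees the requested value
        return {k: v for k, v in term.items() if k != attr}
    if is_prefix(value, term[attr]):       # rule is coarser than the request
        return term
    return None                            # rule implies a different value

def reduce_query(terms, rules):
    reduced = []
    for term in terms:
        for antecedent, attr, value in rules:
            term = apply_rule(term, antecedent, attr, value)
            if term is None:
                break
        if term is not None:
            reduced.append(term)
    return reduced

terms = [{"a": (1, 1), "b": (1, j), "c": (1, 3), "d": (1,)} for j in (1, 2, 3)]
rules = [({"d": (1,), "b": (1, 1)}, "a", (1, 2)),
         ({"d": (1,), "b": (1, 2)}, "c", (1, 3))]

for term in reduce_query(terms, rules):
    print(term)
```

The first disjunct disappears, the second loses its conjunct on c, and the third keeps c and therefore remains non-local for S.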
Failing Query Problem 2
q = q(a[3,1,3,2], b1, c2)
Possible generalization: q1 = q1(a3, b1, c2)
Rules extracted at remote sites that define any of the values below a[3] will help in solving q.
Rules describing values not belonging to {a[3,1], a[3,1,3], a[3,1,3,2]} are used to reduce the size of the query (to remove some conjuncts).
Questions? Thank You