Exploratory Mining and Pruning Optimization of Constrained Associations Rules Data Engineering Lab 성 유진
Abstract
• From the standpoint of supporting human-centered discovery of knowledge, association mining has three shortcomings:
  - lack of user exploration and control
  - lack of focus
  - rigid notion of relationship
• Proposed remedy: constrained association queries
  - pruning using anti-monotonicity and succinctness
Introduction
• Problem 1: lack of user exploration and control
  - the mining process is a black box: the user cannot preempt it and has to wait for hours
  - remedy: establish clear breakpoints to allow user feedback
• Problem 2: lack of focus
  - the user needs a way to say on which associations to focus the mining, e.g. to find associations between sets of items whose types do not overlap
• Problem 2, continued
  - another example of focus: associations from item sets whose total price is at least $1,000
  - remedy: provide a rich interface for the user to express focus, the constrained association query (CAQ)
• Problem 3: rigid notion of relationship
  - significance metrics: the metric should be the user's choice rather than fixed
  - separate criteria for selecting candidates for the antecedent and the consequent, e.g. associations from items to sets of types: pepsi ⇒ snacks
Architecture
• Phase 1
  - the user initially specifies a CAQ
  - the CAQ includes a set of constraints C
  - C applies to both the antecedent and the consequent
  - output: pairs of candidates (Sa, Sc), where Sa and Sc have support above the thresholds
  - the user can add, delete, or modify the constraints as many times as desired
Phase 2
• the user specifies:
  - a significance metric
  - a threshold for the metric
  - whatever further conditions are to be imposed on the antecedent and consequent
• classical association mining as a special case:
  - confidence (as the significance metric)
  - a confidence threshold
  - the requirement that Sa ∪ Sc be frequent
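As an illustration of Phase 2 in the classical setting, the sketch below turns Phase 1 candidate pairs into rules using confidence as the significance metric. The function names, the toy transactions, and the thresholds are assumptions made for this example only; they are not taken from the slides.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from_pairs(pairs, transactions, min_support, min_confidence):
    """Keep the pairs (Sa, Sc) whose union is frequent and whose
    confidence support(Sa ∪ Sc) / support(Sa) reaches the threshold.

    Assumes min_support >= 1, so support(Sa) is never zero when we divide.
    """
    rules = []
    for sa, sc in pairs:
        both = support(sa | sc, transactions)
        if both < min_support:
            continue  # Sa ∪ Sc is not frequent
        confidence = both / support(sa, transactions)
        if confidence >= min_confidence:
            rules.append((sa, sc, confidence))
    return rules


# Hypothetical toy data
TRANSACTIONS = [
    frozenset({"chips", "cola"}),
    frozenset({"chips", "cola", "lager"}),
    frozenset({"chips", "lager"}),
    frozenset({"cola"}),
]
PAIRS = [
    (frozenset({"chips"}), frozenset({"cola"})),
    (frozenset({"cola"}), frozenset({"lager"})),
]
RULES = rules_from_pairs(PAIRS, TRANSACTIONS, min_support=2, min_confidence=0.6)
```

Swapping in a different significance metric would only change the line that computes `confidence`, which is the flexibility Phase 2 is meant to give the user.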
Constrained Association Queries
• CAQ
  - S ⊆ Item: S is a set variable on the Item domain
  - {(S1, S2) | C}, where C is a set of constraints on S1, S2
  - frequency constraints: freq(Si)
• Example database: trans(TID, Itemset), iteminfo(Item, Type, Price)
  - S.Price ≤ 100: all items in S have a price less than or equal to $100
  - {snacks, sodas} ⊆ S.Type
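The two example constraints can be read as predicates over an item set S evaluated against iteminfo. The following is a minimal sketch under that reading; the toy iteminfo contents and the helper names are assumptions for illustration.

```python
# Hypothetical toy version of iteminfo(Item, Type, Price)
ITEMINFO = {
    "chips": ("snacks", 3),
    "pretzels": ("snacks", 2),
    "cola": ("sodas", 1),
    "lager": ("beers", 8),
}


def types_of(s):
    """S.Type: the set of types of the items in S."""
    return {ITEMINFO[item][0] for item in s}


def prices_of(s):
    """S.Price: the prices of the items in S."""
    return [ITEMINFO[item][1] for item in s]


def price_at_most(s, v):
    """S.Price <= v: every item in S costs at most v."""
    return all(p <= v for p in prices_of(s))


def covers_types(s, required):
    """required ⊆ S.Type: S contains at least one item of each required type."""
    return set(required) <= types_of(s)
```

For instance, `price_at_most({"chips", "cola"}, 3)` holds, while `covers_types({"chips"}, {"snacks", "sodas"})` does not because no soda is present.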
CAQ Examples
• {(S1, S2) | S1 ⊂ Item & S2 ⊂ Item & count(S1) = 1 & count(S2) = 1 & freq(S1) & freq(S2)}
• 2-var constraints: S1.Type ∩ S2.Type = ∅ and max(S1.Price) ≤ avg(S2.Price)
• {(S1, S2) | agg1(S1.Price) ≤ 100 & agg2(S2.Price) ≥ 1000}
• {(S1, S2) | S1.Type ⊆ {Snacks} & S2.Type ⊆ {Beers} & max(S1.Price) ≤ min(S2.Price)}
Sound / Complete
• an algorithm is sound if it finds only frequent sets that satisfy the given constraints
• an algorithm is complete if it finds all frequent sets that satisfy the given constraints
Goal
• push the constraints as deeply as possible inside the computation of frequent sets
• running a classical algorithm and then testing its frequent sets for constraint satisfaction is too inefficient
• pruning must remain sound and complete; two properties make this possible: anti-monotonicity and succinctness
Anti-Monotone Constraints
• Find constraints that are anti-monotone
  - they prune away a significant number of candidates
• Definition
  - a 1-var constraint C is anti-monotone iff for all sets S, S':
    S ⊇ S' & S satisfies C ⇒ S' satisfies C
• Identify which constraints are anti-monotone
  - see Fig. 3
  - min(S) ≥ v (anti-monotone), min(S) ≤ v (not anti-monotone)
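The definition can be checked by brute force on a small domain, which makes the difference between the two min() constraints concrete. This is only a sketch: the price table is hypothetical, and a check over one toy domain can refute anti-monotonicity but does not prove it in general.

```python
from itertools import combinations

# Hypothetical item prices
PRICE = {"a": 1, "b": 5, "c": 9}


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_anti_monotone(constraint, items):
    """True if, on this domain, every non-empty subset of a satisfying
    set also satisfies the constraint."""
    for s in nonempty_subsets(items):
        if constraint(s):
            for sub in nonempty_subsets(s):
                if not constraint(sub):
                    return False
    return True


def min_at_least(v):
    """Constraint min(S.Price) >= v."""
    return lambda s: min(PRICE[i] for i in s) >= v


def min_at_most(v):
    """Constraint min(S.Price) <= v."""
    return lambda s: min(PRICE[i] for i in s) <= v
```

With these prices, {a, c} satisfies min(S) ≤ 5 but its subset {c} does not, which is the counterexample that makes `min_at_most(5)` fail the check.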
Succinct Constraints
• Pruning is done once and for all, before any iteration takes place
  - not a generate-and-test paradigm
• How: succinctness and member generating functions
• Definitions
  - SAT_C(Item): the set of item sets that satisfy C, called the pruned space
  - e.g. C1 ≡ S.Price ≥ 100: the pruned space for C1 contains only item sets in which each item has a price of at least $100
  - selection predicate p, used to select items from Item
Example
• C1 ≡ S.Price ≥ 100; let Item1 = σ_{Price ≥ 100}(Item)
  - C1 is succinct because its pruned space SAT_C1(Item) is simply 2^Item1
• C2 ≡ {snacks, sodas} ⊆ S.Type; let Item2, Item3, Item4 be the sets σ_{Type = 'snacks'}(Item), σ_{Type = 'sodas'}(Item), σ_{Type ≠ 'snacks' ∧ Type ≠ 'sodas'}(Item)
  - C2 is succinct because SAT_C2(Item) can be expressed as 2^Item - 2^Item2 - 2^Item3 - 2^Item4 - 2^(Item2 ∪ Item4) - 2^(Item3 ∪ Item4)
Example: member generating functions (MGF)
• C1 ≡ S.Price ≥ 100: MGF = {X | X ⊆ Item1 & X ≠ ∅}
• C2 ≡ {snacks, sodas} ⊆ S.Type: MGF = {X1 ∪ X2 ∪ X3 | X1 ⊆ Item2 & X1 ≠ ∅ & X2 ⊆ Item3 & X2 ≠ ∅ & X3 ⊆ Item4}
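To see that a member generating function enumerates the pruned space directly, without generating and testing every subset of Item, the sketch below builds the members for the {snacks, sodas} constraint from Item2, Item3 and Item4 and compares them with a brute-force filter. The item-to-type table and all function names are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical item -> type mapping
ITEM_TYPE = {
    "chips": "snacks",
    "pretzels": "snacks",
    "cola": "sodas",
    "lager": "beers",
}


def powerset(items, min_size=0):
    """All subsets of items with at least min_size elements."""
    items = sorted(items)
    return [
        frozenset(combo)
        for k in range(min_size, len(items) + 1)
        for combo in combinations(items, k)
    ]


def select(pred):
    """Selection over Item by a predicate on the item's type."""
    return {item for item, t in ITEM_TYPE.items() if pred(t)}


def generate_c2():
    """MGF for {snacks, sodas} ⊆ S.Type: X1 ∪ X2 ∪ X3 with X1 a non-empty
    set of snacks, X2 a non-empty set of sodas, X3 any set of other items."""
    item2 = select(lambda t: t == "snacks")
    item3 = select(lambda t: t == "sodas")
    item4 = select(lambda t: t not in ("snacks", "sodas"))
    return {
        x1 | x2 | x3
        for x1 in powerset(item2, 1)
        for x2 in powerset(item3, 1)
        for x3 in powerset(item4)
    }


def satisfies_c2(s):
    """Direct test of the constraint, used only to cross-check the MGF."""
    return {"snacks", "sodas"} <= {ITEM_TYPE[i] for i in s}


GENERATED = generate_c2()
BRUTE_FORCE = {s for s in powerset(ITEM_TYPE) if satisfies_c2(s)}
```

On this toy domain both routes yield the same six item sets, but the generator never touches a set that fails the constraint.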
Algorithms
• Algorithm Apriori+
  - computes all frequent sets; those frequent sets that satisfy the constraints form the answer set
• Algorithm Hybrid(m)
  - when (C - Cfreq) is more selective than Cfreq, Apriori+ is inefficient
  - first checks Cfreq for m iterations
  - then, to reduce the remaining I/O cost, switches to checking (C - Cfreq)
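The Apriori+ baseline can be sketched as a plain level-wise Apriori followed by a constraint filter. This is a simplified in-memory illustration, not the implementation evaluated in the original work; the function names, transactions, and prices are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: returns {frequent item set: support count}."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        # Count each candidate and keep the frequent ones
        level = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                level[c] = n
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into k-sets, then prune any k-set
        # that has an infrequent (k-1)-subset
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def apriori_plus(transactions, min_support, constraint):
    """Apriori+: compute every frequent set, then filter by the constraint."""
    found = apriori(transactions, min_support)
    return {s: n for s, n in found.items() if constraint(s)}


# Hypothetical toy data
PRICE = {"chips": 3, "cola": 1, "lager": 8}
TRANSACTIONS = [
    {"chips", "cola"},
    {"chips", "cola", "lager"},
    {"chips", "lager"},
    {"cola"},
]
ALL_FREQUENT = apriori(TRANSACTIONS, min_support=2)
ANSWER = apriori_plus(
    TRANSACTIONS, 2, lambda s: all(PRICE[i] <= 5 for i in s)
)
```

Here five sets are frequent but only three survive the price filter, so the work spent counting the sets that contain `lager` is wasted; that waste is what pushing the constraint into the computation avoids.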
CAP algorithm: 4 cases
• Succinct and anti-monotone
  - replace the level-1 candidate set C_1 in the Apriori algorithm by C_1^c
• Succinct but not anti-monotone
• Anti-monotone but not succinct
  - define C_k as in the Apriori algorithm, and drop a candidate S if S fails C
  - constraint satisfaction is tested before counting is done
• Neither
  - induce any weaker constraint C' from C; depending on whether C' is anti-monotone and/or succinct, use the strategies above
  - once all frequent sets are generated, test them for satisfaction of C
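The anti-monotone case can be sketched by moving the constraint test in front of the counting step of a level-wise loop. The code below is a simplified in-memory illustration of that one strategy, not the full CAP algorithm; the function name, data, and prices are assumptions. It also reports how many candidates were counted, so the saving is visible.

```python
from itertools import combinations


def constrained_frequent_sets(transactions, min_support, constraint):
    """Level-wise mining for an anti-monotone constraint.

    Returns (answers, counted): the frequent sets that satisfy the
    constraint, and the number of candidates whose support was counted.
    """
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    answers, counted, k = {}, 0, 1
    while candidates:
        # Test the constraint BEFORE counting: a failing candidate is
        # dropped, and anti-monotonicity guarantees no superset can pass
        candidates = {c for c in candidates if constraint(c)}
        counted += len(candidates)
        level = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                level[c] = n
        answers.update(level)
        k += 1
        # Candidate generation as in Apriori, over the surviving sets only
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return answers, counted


# Hypothetical toy data
PRICE = {"chips": 3, "cola": 1, "lager": 8}
TRANSACTIONS = [
    {"chips", "cola"},
    {"chips", "cola", "lager"},
    {"chips", "lager"},
    {"cola"},
]
ANSWERS, COUNTED = constrained_frequent_sets(
    TRANSACTIONS, 2, lambda s: all(PRICE[i] <= 5 for i in s)
)
```

On this data only three candidates are counted, because `lager` is rejected at level 1 and never takes part in candidate generation. The subset-based pruning stays valid only because the constraint is anti-monotone: every subset of a satisfying frequent set is itself satisfying and frequent.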