Mining the Most Interesting Rules

Roberto J. Bayardo Jr., Rakesh Agrawal
Presented by: Mohamed G. Elfeky

Introduction
• Algorithms for mining rules:
  • Constraint-based
  • Heuristic (predictive rules)
  • Interestingness-metric based
• Several interestingness metrics:
  • confidence, support, Laplace, gain, conviction

Generic Problem Statement
The rule: $A \rightarrow C$
The input is: $(U, D, \leq, C, N)$
• $U$ is a set of conditions for the rule antecedent.
• $D$ is a dataset.
• $\leq$ is a total order on rules.
• $C$ is a condition for the rule consequent.
• $N$ is a set of constraints on rules.

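Optimized Rule Mining
• Find a set $A_1 \subseteq U$ such that:
  • $A_1$ satisfies $N$, and
  • $\nexists A_2 \subseteq U$ : $A_2$ satisfies $N$ and $A_1 < A_2$ (i.e., $(A_1 \rightarrow C) < (A_2 \rightarrow C)$ under $\leq$).
• Any rule $A \rightarrow C$ whose antecedent $A$ is such a set $A_1$ is optimal.
• In general, this problem is NP-hard.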
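To make the problem statement concrete, here is a minimal brute-force sketch (not the algorithm from the paper). It assumes that each record is a Python set of the conditions it satisfies, that support and confidence take their usual meanings (the fraction of records satisfying $A$ and $C$, and that count divided by the number of records satisfying $A$), that the total order $\leq$ is induced by a numeric metric on (support, confidence), and that $N$ is a predicate on the antecedent. The dataset and the names `sup_conf`, `optimized_rule`, `min_support` and `by_confidence` are hypothetical. The search visits all $2^{|U|}$ antecedents, which is why exhaustive search does not scale.

```python
from itertools import combinations

# Hypothetical toy dataset D: each record is the set of conditions it satisfies.
D = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "c"},
    {"b"},
    {"a", "b"},
    {"c"},
]
U = ["a", "b"]  # candidate antecedent conditions
C = "c"         # consequent condition


def sup_conf(antecedent, consequent, data):
    """Support and confidence of antecedent -> consequent over data."""
    covers_a = [r for r in data if antecedent <= r]
    covers_ac = [r for r in covers_a if consequent in r]
    sup = len(covers_ac) / len(data)
    conf = len(covers_ac) / len(covers_a) if covers_a else 0.0
    return sup, conf


def optimized_rule(U, D, metric, C, N):
    """Exhaustive search over every subset A1 of U (2**len(U) candidates).

    The total order on rules is induced by metric(sup, conf); N is a
    predicate saying whether an antecedent satisfies the constraints.
    """
    best, best_score = None, None
    for k in range(len(U) + 1):
        for combo in combinations(U, k):
            a1 = frozenset(combo)
            if not N(a1):
                continue  # A1 must satisfy N
            score = metric(*sup_conf(a1, C, D))
            if best_score is None or score > best_score:
                best, best_score = a1, score
    return best, best_score


def min_support(a1):
    """Hypothetical constraint set N: support of a1 -> C is at least 0.3."""
    return sup_conf(a1, C, D)[0] >= 0.3


def by_confidence(sup, conf):
    """Hypothetical total order: rank rules by confidence."""
    return conf


best, score = optimized_rule(U, D, by_confidence, C, min_support)
print(sorted(best), score)  # ['a'] 0.75
```

Passing the metric and constraint as plain functions keeps the sketch generic: swapping `by_confidence` for another metric changes the total order without touching the search.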

Partial-Order Optimized Rule Mining
• Partial order vs. total order:
  • Some rules may be incomparable.
  • Optimal rules may fall into several equivalence classes.
• Find a set $O \subseteq \mathcal{P}(U)$ such that:
  • $\forall A \in O$: $A$ is optimal, and
  • for each equivalence class that contains an optimal rule, exactly one member of that class is in $O$.

Monotonicity
• $f(x)$ is said to be monotone in $x$ if: $x_1 < x_2 \Rightarrow f(x_1) \leq f(x_2)$
• $f(x)$ is said to be anti-monotone in $x$ if: $x_1 < x_2 \Rightarrow f(x_1) \geq f(x_2)$

Optimality
• SC-Optimality and PC-Optimality:
  • Definition
  • Theoretical Implications
• Practical Implications

SC-Optimality: Definition
The partial order $\leq_{sc}$
• For rules $r_1$ and $r_2$, $r_1 <_{sc} r_2$ if and only if:
  • $sup(r_1) \leq sup(r_2) \wedge conf(r_1) < conf(r_2)$, or
  • $sup(r_1) < sup(r_2) \wedge conf(r_1) \leq conf(r_2)$.
• Also, $r_1 =_{sc} r_2$ if and only if:
  • $sup(r_1) = sup(r_2) \wedge conf(r_1) = conf(r_2)$.

SC-Optimality: Definition (cont.)
The partial order $\leq_{s\neg c}$
• For rules $r_1$ and $r_2$, $r_1 <_{s\neg c} r_2$ if and only if:
  • $sup(r_1) \leq sup(r_2) \wedge conf(r_1) > conf(r_2)$, or
  • $sup(r_1) < sup(r_2) \wedge conf(r_1) \geq conf(r_2)$.
• Also, $r_1 =_{s\neg c} r_2$ if and only if:
  • $sup(r_1) = sup(r_2) \wedge conf(r_1) = conf(r_2)$.

SC-Optimality: Definition (cont.)
[Figure: rules plotted by support (x-axis) and confidence (y-axis), marking sc-optimal rules and non-optimal rules. No optimal rules fall outside the borders.]
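The two partial orders and the borders in the figure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each rule is summarized only by its support and confidence, the rules are assumed to already satisfy $N$, and the `Rule` type, the sample values and the function names are hypothetical. The quadratic filter keeps the rules that no other rule dominates, plus one representative per equivalence class; it is not the mining algorithm from the paper. The sc-optimal rules trace the upper support/confidence border, and the s¬c-optimal rules trace the lower one.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    name: str
    sup: float
    conf: float


def less_sc(r1: Rule, r2: Rule) -> bool:
    """r1 <_sc r2: r2 is no worse in support and confidence, and better in one."""
    return (r1.sup <= r2.sup and r1.conf < r2.conf) or (
        r1.sup < r2.sup and r1.conf <= r2.conf
    )


def less_snc(r1: Rule, r2: Rule) -> bool:
    """r1 <_{s¬c} r2: same as <_sc, except that lower confidence is preferred."""
    return (r1.sup <= r2.sup and r1.conf > r2.conf) or (
        r1.sup < r2.sup and r1.conf >= r2.conf
    )


def optimal_set(rules: List[Rule], less: Callable[[Rule, Rule], bool]) -> List[Rule]:
    """Maximal rules under `less`, keeping one per equivalence class.

    Under both orders, two rules are equivalent iff support and confidence match.
    """
    result, seen = [], set()
    for r in rules:
        key = (r.sup, r.conf)
        if key in seen or any(less(r, other) for other in rules):
            continue  # duplicate of a kept class, or dominated (not optimal)
        seen.add(key)
        result.append(r)
    return result


rules = [
    Rule("r1", 0.10, 0.90), Rule("r2", 0.30, 0.70), Rule("r3", 0.50, 0.40),
    Rule("r4", 0.20, 0.60), Rule("r5", 0.30, 0.70), Rule("r6", 0.05, 0.20),
]
print([r.name for r in optimal_set(rules, less_sc)])   # ['r1', 'r2', 'r3']
print([r.name for r in optimal_set(rules, less_snc)])  # ['r3', 'r6']
```

Here r4 is dominated by r2 under $\leq_{sc}$, and r5 is dropped only because it falls in the same equivalence class as r2.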

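SC-Optimality: Theoretical Implications
• A total order $\leq_t$ is implied by $\leq_{sc}$ if:
  • $r_1 <_{sc} r_2 \Rightarrow r_1 \leq_t r_2$, and $r_1 =_{sc} r_2 \Rightarrow r_1 =_t r_2$.
• In that case, every sc-optimal rule set contains a rule that is optimal for $\leq_t$.
• A total order $\leq_t$ defined by a metric $f(r)$ is implied by $\leq_{sc}$ if:
  • $f(r)$ is monotone in support (for fixed confidence), and
  • $f(r)$ is monotone in confidence (for fixed support).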

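SC-Optimality: Theoretical Implications (cont.)
• Interestingness metrics:
  • $laplace(r) = \frac{sup(r) + 1}{sup(r)/conf(r) + k}$
  • $gain(r) = sup(r)\,\left(1 - \theta / conf(r)\right)$
  • $conviction(r) = \frac{1 - sup(C)}{1 - conf(r)}$
• Here $k > 1$ and $\theta$ are constants, $sup(r)$ in $laplace(r)$ is read as a record count, and $sup(C)$ is the fraction of records that satisfy $C$.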

PC-Optimality: Definition
The partial order $\leq_{pc}$
• For rules $r_1$ and $r_2$, $r_1 <_{pc} r_2$ if and only if:
  • $pop(r_1) \subseteq pop(r_2) \wedge conf(r_1) < conf(r_2)$, or
  • $pop(r_1) \subset pop(r_2) \wedge conf(r_1) \leq conf(r_2)$.
• Also, $r_1 =_{pc} r_2$ if and only if:
  • $pop(r_1) = pop(r_2) \wedge conf(r_1) = conf(r_2)$.

PC-Optimality: Definition (cont.)
• $pop(A \rightarrow C)$ is the set of records from $D$ that satisfy both $A$ and $C$.
• $|pop(r)| = sup(r) \cdot |D|$
• The partial order $\leq_{p\neg c}$ is defined analogously, mirroring $\leq_{s\neg c}$.

PC-Optimality: Theoretical Implications
• $\leq_{sc}$ is implied by $\leq_{pc}$, and $\leq_{s\neg c}$ is implied by $\leq_{p\neg c}$.
• $\leq_{pc}$ leaves more rule pairs incomparable.
• Hence a pc-optimal rule set is at least as large as an sc-optimal rule set, and generally contains more rules.
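The relationship between the two orders can be illustrated by representing $pop(r)$ as a set of record ids. In this minimal sketch the `Rule` type, the record ids and the confidence values are hypothetical, and support is compared through $|pop(r)|$, which is equivalent because $|D|$ is fixed. It shows that every $<_{pc}$ pair is also a $<_{sc}$ pair, while two rules with disjoint populations can be comparable under $\leq_{sc}$ yet incomparable under $\leq_{pc}$.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    name: str
    pop: FrozenSet[int]  # ids of the records in D that satisfy both A and C
    conf: float


def less_pc(r1: Rule, r2: Rule) -> bool:
    """r1 <_pc r2: population containment replaces the support comparison."""
    return (r1.pop <= r2.pop and r1.conf < r2.conf) or (
        r1.pop < r2.pop and r1.conf <= r2.conf
    )


def less_sc(r1: Rule, r2: Rule) -> bool:
    """r1 <_sc r2, comparing |pop| (same as comparing support for a fixed |D|)."""
    n1, n2 = len(r1.pop), len(r2.pop)
    return (n1 <= n2 and r1.conf < r2.conf) or (n1 < n2 and r1.conf <= r2.conf)


# Hypothetical rules; the record ids are arbitrary.
r1 = Rule("r1", frozenset({1, 2}), 0.6)
r2 = Rule("r2", frozenset({3, 4, 5}), 0.8)
r3 = Rule("r3", frozenset({1, 2, 3}), 0.8)
rules = [r1, r2, r3]

# Every <_pc pair is also a <_sc pair ...
print(all(less_sc(a, b) for a in rules for b in rules if less_pc(a, b)))  # True
# ... but r1 and r2 are comparable under <_sc and incomparable under <_pc.
print(less_sc(r1, r2), less_pc(r1, r2), less_pc(r2, r1))  # True False False
```

Because r1 is not dominated by r2 under $\leq_{pc}$, it can survive in a pc-optimal set even though it would be filtered out of an sc-optimal set if r2 were the only rule above it.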

Optimality: Practical Implications
• Two algorithms are proposed, one for each type of optimality.
• Each algorithm produces a set of optimal rules without requiring the interestingness metric to be specified in advance.
• The produced set is guaranteed to contain the most interesting rule according to any metric whose total order is implied by the corresponding partial order, which covers several common metrics.

Optimality: Practical Implications (cont.)
• These algorithms facilitate interactivity:
  • Examine the optimal rules according to some metric without additional querying or mining.
  • Find the most interesting rule that characterizes any given subset of the population.
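The interactive use can be sketched as post-processing: compute an sc-optimal set once, then answer "which rule is best under metric X?" by scanning only that set. In this minimal sketch the mined rules, the gain threshold `THETA`, the `METRICS` table and the helper names are hypothetical; `sc_optimal` is a brute-force filter standing in for the mining step, not the paper's algorithm. For this toy data, scanning `mined` instead of `optimal` gives the same answers.

```python
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    name: str
    sup: float
    conf: float


def less_sc(r1: Rule, r2: Rule) -> bool:
    return (r1.sup <= r2.sup and r1.conf < r2.conf) or (
        r1.sup < r2.sup and r1.conf <= r2.conf
    )


def sc_optimal(rules: List[Rule]) -> List[Rule]:
    """Rules not dominated under <_sc, one per (support, confidence) class."""
    kept, seen = [], set()
    for r in rules:
        key = (r.sup, r.conf)
        if key not in seen and not any(less_sc(r, o) for o in rules):
            seen.add(key)
            kept.append(r)
    return kept


THETA = 0.5  # hypothetical gain threshold
METRICS: Dict[str, Callable[[float, float], float]] = {
    "support": lambda sup, conf: sup,
    "confidence": lambda sup, conf: conf,
    "gain": lambda sup, conf: sup * (1 - THETA / conf),
}


def best_rule(candidates: List[Rule], metric: Callable[[float, float], float]) -> Rule:
    return max(candidates, key=lambda r: metric(r.sup, r.conf))


# Hypothetical mined rules; the sc-optimal set is computed once ...
mined = [
    Rule("r1", 0.10, 0.95), Rule("r2", 0.25, 0.80), Rule("r3", 0.40, 0.60),
    Rule("r4", 0.20, 0.70), Rule("r5", 0.05, 0.50), Rule("r6", 0.60, 0.30),
]
optimal = sc_optimal(mined)
print([r.name for r in optimal])  # ['r1', 'r2', 'r3', 'r6']

# ... and later metric queries only scan that smaller set.
for name, metric in METRICS.items():
    print(name, best_rule(optimal, metric).name)  # support r6, confidence r1, gain r2
```

Keeping the metrics in a dictionary mirrors the idea that the metric is chosen after mining: adding a new entry requires no new pass over the data.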
