Mining Confident Rules Without Support Requirements. Ke Wang, Yu He, D. W. Cheung, F. Y. L. Chin
Association Rules • Given a table over A1,…,Am, C • Find all rules {Ai=ai} → C=c with minimum confidence and minimum support • Support: sup(Ai=ai) = #records containing Ai=ai • Confidence: sup(Ai=ai, C=c) / sup(Ai=ai)
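A minimal sketch of these two measures in Python; the helper names (sup, confidence), the toy table, and the class column name "C" are illustrative assumptions, not the paper's code:

def sup(table, conds):
    # Count the records that contain every attribute=value pair in conds.
    return sum(all(row.get(a) == v for a, v in conds.items()) for row in table)

def confidence(table, body, c):
    # conf(body -> C=c) = sup(body, C=c) / sup(body).
    s = sup(table, body)
    return sup(table, {**body, "C": c}) / s if s else 0.0

table = [
    {"Age": "young", "Gender": "M", "C": "yes"},
    {"Age": "young", "Gender": "F", "C": "no"},
    {"Age": "old",   "Gender": "M", "C": "yes"},
]
print(sup(table, {"Age": "young"}))                # 2
print(confidence(table, {"Age": "young"}, "yes"))  # 0.5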
Low Support Rules • Interesting rules may have unknown, low support • High support rules may have low confidence • Often, patterns are fragmented into many low support rules • Goal: find all rules above the minimum confidence, with no minimum support requirement
Confidence-based Pruning • Without minimum support, the classic support-based pruning is inapplicable • Confident rules are neither downward closed nor upward closed • Need new strategies for pushing the confidence requirement
Confidence-based Pruning r1: Age=young → Buy=yes r2: Age=young, Gender=M → Buy=yes r3: Age=young, Gender=F → Buy=yes Observation 1: if r1 is confident, so is at least one of r2 and r3 (Gender partitions the records covered by r1, so one branch has confidence at least r1's) Observation 2: if no specialized rule of r1 is confident, r1 can be pruned
Confidence-based Pruning • Level-wise rule generation: generate a candidate rule x → c only if, for every attribute A not in x → c, some A-specialization of x → c is confident
The algorithm
Input: table T over A1,…,Am, C, and miniconf
Output: all confident rules
1. k = m;
2. Rule_k = all confident m-rules;
3. while k > 1 and Rule_k is not empty do
4.   generate Cand_{k-1} from Rule_k;
5.   compute the confidence of Cand_{k-1} in one pass of T;
6.   Rule_{k-1} = all confident candidates in Cand_{k-1};
7.   k--;
8. return all Rule_k;
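A sketch of this level-wise loop in Python, reusing sup() and confidence() from the earlier snippet; the rule representation (a sorted tuple of attribute/value pairs plus a class value) and the domains dictionary are assumptions made for illustration:

def mine(table, attrs, domains, classes, miniconf):
    # Rule_m: all confident rules over all m attributes (lines 1-2).
    level = {(tuple(sorted((a, row[a]) for a in attrs)), c)
             for row in table for c in classes
             if confidence(table, {a: row[a] for a in attrs}, c) >= miniconf}
    result, k = set(level), len(attrs)
    while k > 1 and level:
        # Cand_{k-1} (line 4): keep x -> c only if, for every attribute A
        # not in x, some A-specialization of x -> c is confident at level k.
        cands = set()
        for body, c in level:
            for dropped, _ in body:
                x = tuple(p for p in body if p[0] != dropped)
                outside = [a for a in attrs if a not in dict(x)]
                if all(any((tuple(sorted(x + ((a, v),))), c) in level
                           for v in domains[a])
                       for a in outside):
                    cands.add((x, c))
        # Lines 5-6: one pass of T computes each candidate's confidence.
        level = {(x, c) for x, c in cands
                 if confidence(table, dict(x), c) >= miniconf}
        result |= level
        k -= 1
    return result

rules = mine(table, ["Age", "Gender"],
             {"Age": ["young", "old"], "Gender": ["M", "F"]},
             ["yes", "no"], miniconf=1.0)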
Disk-based Implementation • Assumption: T, Rule_k, Cand_{k-1} are stored on disk • We focus on: generating Cand_{k-1} from Rule_k, and computing the confidence of Cand_{k-1} • Key: clustering T, Rule_k, Cand_{k-1} according to the attributes Ai
Clustering by Hash Partitioning • h_i is the hash function for attribute Ai, i=1,…,m • Table T is partitioned into T-buckets • Rule_k is partitioned into R-buckets • Cand_{k-1} is partitioned into C-buckets • A bucket-id is the sequence of hash values of the attributes involved, [b1,…,bk]
Pruning by Checking Bucket Ids • A tuple in a T-bucket supports a candidate in a C-bucket only if the T-bucket id matches the C-bucket id • E.g., T-bucket [A1.1, A2.1, A3.2] matches C-buckets [A1.1, A3.2] and [A1.1, A2.1] • A C-bucket [b1,…,bk] is nonempty only if, for every other attribute A, some R-bucket [b1,…,bk, bA] is nonempty
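A small Python sketch of the matching test; representing a bucket id as an attribute-to-hash-value dictionary is an assumption for illustration:

def t_bucket_id(row, attrs, h, n_buckets):
    # T-bucket id: one hash value per attribute of the tuple.
    return {a: h(row[a]) % n_buckets for a in attrs}

def matches(t_id, c_id):
    # A T-bucket matches a C-bucket iff they agree on every
    # attribute that the C-bucket id mentions.
    return all(t_id.get(a) == v for a, v in c_id.items())

t_id = {"A1": 1, "A2": 1, "A3": 2}          # T-bucket [A1.1, A2.1, A3.2]
print(matches(t_id, {"A1": 1, "A3": 2}))    # True:  matches [A1.1, A3.2]
print(matches(t_id, {"A1": 1, "A2": 1}))    # True:  matches [A1.1, A2.1]
print(matches(t_id, {"A2": 2}))             # False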
Hypergraph H_{k-1} • A vertex corresponds to a T-bucket • An edge corresponds to a C-bucket; an edge contains a vertex if and only if the C-bucket matches the T-bucket • H_{k-1} is kept in memory
The Optimal Blocking • Assume that several T-buckets can be read at a time; each such group is called a T-block • For each T-block, we need to access the matching C-buckets from disk • We want the blocking into T-blocks that minimizes the accesses of C-buckets • This problem is NP-hard
Heuristics • Heuristic I: the more T-buckets match a C-bucket, the higher priority those T-buckets should have in the next T-block • Heuristic II: the more C-buckets match a T-bucket, the higher priority that T-bucket should have in the next T-block
[Figure: bipartite graph matching T-buckets T1–T5 to C-buckets C1–C4] • (T1T2T3)(T4T5): C1,C2,C4 read twice, C3 read once • Heuristic I: (T1T2T5)(T3T4): C1,C2,C4 read once, C3 read twice • Heuristic II: (T1T3T5)(T2T4): C1,C4 read twice, C2,C3 read once
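A greedy Python sketch of Heuristic II on this example; the bipartite adjacency below is an assumption, reconstructed to reproduce the read counts listed above, since the original figure is not preserved in the text:

def next_t_block(edges, unread, block_size):
    # Greedily pick the unread T-buckets that match the most C-buckets.
    block, pool = [], set(unread)
    while pool and len(block) < block_size:
        t = max(sorted(pool), key=lambda u: len(edges[u]))
        block.append(t)
        pool.remove(t)
    return block

edges = {  # T-bucket -> matching C-buckets (assumed adjacency)
    "T1": {"C2", "C3"}, "T2": {"C1"}, "T3": {"C3", "C4"},
    "T4": {"C4"}, "T5": {"C1", "C2"},
}
first = next_t_block(edges, edges, 3)        # ['T1', 'T3', 'T5']
second = sorted(set(edges) - set(first))     # ['T2', 'T4']
print(first, second)  # reproduces the Heuristic II blocking (T1T3T5)(T2T4)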
Experiments • Synthetic datasets from "An Interval Classifier for Database Mining Applications", VLDB '92 • 9 attributes, 1 class • Default data size = 100K
Conclusion • The experiments show that the proposed confidence-based pruning is effective.