
Mining Confident Rules Without Support Requirements


Presentation Transcript


  1. Mining Confident Rules Without Support Requirements Ke Wang Yu He D. W. Cheung F. Y. L. Chin

  2. Association Rules • Given a table over A1,…,Ak, C • Find all rules {Ai=ai} → C=c of minimum confidence and minimum support • Support: sup({Ai=ai}) = #records containing the Ai=ai • Confidence: sup({Ai=ai} ∧ C=c) / sup({Ai=ai})
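
A minimal sketch of these definitions, assuming the table is stored as a list of dicts (the representation and all names here are illustrative, not from the paper):

```python
# Support and confidence of a rule body {Ai=ai} with head C=c.
def support(table, conditions):
    """sup: number of records containing every Ai=ai in `conditions`."""
    return sum(all(rec.get(a) == v for a, v in conditions.items())
               for rec in table)

def confidence(table, conditions, c, class_attr="C"):
    """sup(body and C=c) / sup(body); None when the body never occurs."""
    body = support(table, conditions)
    if body == 0:
        return None
    return support(table, {**conditions, class_attr: c}) / body

table = [{"Age": "young", "Gender": "M", "C": "yes"},
         {"Age": "young", "Gender": "F", "C": "no"},
         {"Age": "old",   "Gender": "M", "C": "yes"}]
print(confidence(table, {"Age": "young"}, "yes"))  # 1/2 = 0.5
```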

  3. Low Support Rules • Interesting rules may have unknown, low support • High support rules may have low confidence • Often, patterns are fragmented into many low support rules • Goal: find all rules above the minimum confidence

  4. Confidence-based Pruning • Without minimum support, the classic support-based pruning is inapplicable • Confident rules are neither downward closed nor upward closed: adding or removing a condition can either raise or lower confidence • Need new strategies for pushing the confidence requirement.

  5. Confidence-based Pruning r1: Age=young → Buy=yes r2: Age=young, Gender=M → Buy=yes r3: Age=young, Gender=F → Buy=yes Observation 1: if r1 is confident, so is one of r2 and r3 (its specializations by Gender), since the records supporting r1 are split between r2 and r3, making conf(r1) a weighted average of conf(r2) and conf(r3) Observation 2: if no specialized rule of r1 is confident, r1 can be pruned
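
A small numeric check of Observation 1, on hypothetical counts invented for illustration:

```python
# conf(r1) is a weighted average of conf(r2) and conf(r3), so at least
# one of the two specializations must be at least as confident as r1.
def conf(recs, cond, c):
    body = [r for r in recs if all(r[a] == v for a, v in cond.items())]
    return sum(r["Buy"] == c for r in body) / len(body)

recs = ([{"Age": "young", "Gender": "M", "Buy": "yes"}] * 3 +
        [{"Age": "young", "Gender": "M", "Buy": "no"}] * 1 +
        [{"Age": "young", "Gender": "F", "Buy": "yes"}] * 1 +
        [{"Age": "young", "Gender": "F", "Buy": "no"}] * 3)

c1 = conf(recs, {"Age": "young"}, "yes")                 # r1: 4/8 = 0.50
c2 = conf(recs, {"Age": "young", "Gender": "M"}, "yes")  # r2: 3/4 = 0.75
c3 = conf(recs, {"Age": "young", "Gender": "F"}, "yes")  # r3: 1/4 = 0.25
assert max(c2, c3) >= c1  # some specialization is at least as confident
```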

  6. Confidence-based Pruning • Level-wise rule generation: generate a candidate rule x → c only if, for every attribute A not in x → c, some A-specialization of x → c is confident.

  7. The algorithm Input: table T over A1,…,Am, C, and minconf Output: all confident rules
  1. k = m;
  2. Rulek = all confident m-rules;
  3. while k > 1 and Rulek is not empty do
  4.   generate Candk-1 from Rulek;
  5.   compute the confidence of Candk-1 in one pass of T;
  6.   Rulek-1 = all confident candidates in Candk-1;
  7.   k--;
  8. return all Rulek;
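
A runnable sketch of this loop, under assumptions not in the slides: the table is an in-memory list of dicts, and confidence is recomputed by rescanning for clarity, whereas the algorithm evaluates all of Candk-1 in one pass of T:

```python
# A rule is (body, c) with the body a frozenset of (attribute, value)
# pairs; rules_k holds the confident rules with k-condition bodies.
def mine_confident_rules(table, attrs, class_attr, minconf):
    def conf(body, c):
        matching = [r for r in table if all(r[a] == v for a, v in body)]
        return (sum(r[class_attr] == c for r in matching) / len(matching)
                if matching else 0.0)

    classes = {r[class_attr] for r in table}

    # Line 2: Rule_m = all confident m-rules (bodies over every attribute).
    bodies = {frozenset((a, r[a]) for a in attrs) for r in table}
    rules_k = {(b, c) for b in bodies for c in classes if conf(b, c) >= minconf}
    result, k = set(rules_k), len(attrs)

    while k > 1 and rules_k:  # lines 3-7
        # Candidates: drop one condition from each confident k-rule.
        cands = {(body - {cond}, c) for body, c in rules_k for cond in body}

        def every_attr_specialized(body, c):
            # Slide 6's test: for each attribute A outside the body, some
            # A-specialization of body -> c must already be confident.
            covered = {next(iter(rb - body))[0]
                       for rb, rc in rules_k if rc == c and body < rb}
            return covered >= set(attrs) - {a for a, _ in body}

        rules_k = {(b, c) for b, c in cands
                   if every_attr_specialized(b, c) and conf(b, c) >= minconf}
        result |= rules_k
        k -= 1
    return result

rows = [{"Age": "young", "Gender": "M", "Buy": "yes"},
        {"Age": "young", "Gender": "M", "Buy": "yes"},
        {"Age": "young", "Gender": "F", "Buy": "no"},
        {"Age": "old",   "Gender": "F", "Buy": "yes"}]
for body, c in sorted(mine_confident_rules(rows, ["Age", "Gender"], "Buy", 0.8),
                      key=lambda rule: len(rule[0])):
    print(dict(body), "->", c)
```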

  8. Disk-based Implementation • Assumption: T, Rulek, Candk-1 are stored on disk. • We focus on • generating Candk-1 from Rulek and • computing the confidence for Candk-1. • Key: clustering T, Rulek, Candk-1 according to attributes Ai

  9. Clustering by Hash Partitioning • hi — the hash function for attribute Ai, i=1,…,m • Table T is partitioned into T-buckets • Rulek is partitioned into R-buckets • Candk-1 is partitioned into C-buckets • A bucket id is the sequence of hash values of the attributes involved: [b1,…,bk]
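
A sketch of how such bucket ids could be assigned; Python's built-in hash stands in for the unspecified per-attribute functions hi (an illustrative choice):

```python
# Note: string hashes vary across runs unless PYTHONHASHSEED is fixed.
N_PARTITIONS = 4  # number of hash partitions per attribute (assumed)

def h(attr, value, n=N_PARTITIONS):
    # Salting with the attribute name keeps h1,...,hm distinct.
    return hash((attr, value)) % n

def bucket_id(conditions):
    """Bucket id [b1,...,bk]: one hash value per attribute involved,
    taken in a fixed attribute order."""
    return tuple((a, h(a, v)) for a, v in sorted(conditions.items()))

record = {"A1": "x", "A2": "y", "A3": "z"}
print(bucket_id(record))                  # T-bucket id over A1, A2, A3
print(bucket_id({"A1": "x", "A3": "z"}))  # C-bucket id over A1, A3 only
```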

  10. Pruning by Checking Bucket Ids • A tuple in a T-bucket supports a candidate in a C-bucket only if the T-bucket id matches the C-bucket id. • E.g., T-bucket [A1.1, A2.1, A3.2] matches C-buckets [A1.1, A3.2] and [A1.1, A2.1] • A C-bucket [b1,…,bk] is nonempty only if for every other attribute A, some R-bucket [b1,…,bk, bA] is nonempty
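
The matching test itself, in the same illustrative representation as above (an id entry like A1.1 written as the pair ("A1", 1)):

```python
# A T-bucket matches a C-bucket iff the two ids agree on every
# attribute the C-bucket involves.
def matches(t_id, c_id):
    t = dict(t_id)  # attribute -> hash value of the T-bucket
    return all(a in t and t[a] == b for a, b in c_id)

t_bucket = (("A1", 1), ("A2", 1), ("A3", 2))      # [A1.1, A2.1, A3.2]
print(matches(t_bucket, (("A1", 1), ("A3", 2))))  # True:  [A1.1, A3.2]
print(matches(t_bucket, (("A1", 1), ("A2", 1))))  # True:  [A1.1, A2.1]
print(matches(t_bucket, (("A1", 2), ("A2", 1))))  # False: [A1.2, A2.1]
```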

  11. Hypergraph Hk-1 • A vertex corresponds to a T-bucket • An edge corresponds to a C-bucket; the edge contains a vertex if and only if the C-bucket matches that T-bucket • Hk-1 is kept in memory.

  12. The Optimal Blocking • Assume that we can read several T-buckets each time, called a T-block. • For each T-block, we need to access the matching C-buckets from disk. • We want the optimal blocking of T-buckets into T-blocks so that the number of C-bucket accesses is minimized. • This problem is NP-hard.

  13. Heuristics • Heuristic I: the more T-buckets match a C-bucket, the higher priority such T-buckets should get in the next T-block. • Heuristic II: the more C-buckets match a T-bucket, the higher priority this T-bucket should get in the next T-block.

  14. [Figure: bipartite graph linking C-buckets C1–C4 to the T-buckets T1–T5 that match them] • (T1T2T3)(T4T5): C1,C2,C4 read twice, C3 read once • Heuristic I: (T1T2T5)(T3T4): C1,C2,C4 read once, C3 read twice • Heuristic II: (T1T3T5)(T2T4): C1,C4 read twice, C2,C3 read once.
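
A greedy sketch of Heuristic II on this example. The original figure is lost, so the adjacency below is a reconstruction: one assignment of matching T-buckets that reproduces all three read counts quoted on the slide; the lexicographic tie-break is likewise an assumption:

```python
# A C-bucket is read once per T-block containing at least one of its
# matching T-buckets.
matching = {"C1": {"T3", "T4"},  # C-bucket -> matching T-buckets (assumed)
            "C2": {"T1", "T5"},
            "C3": {"T1", "T3"},
            "C4": {"T2", "T5"}}

def c_bucket_reads(blocks):
    return sum(sum(1 for blk in blocks if ts & set(blk))
               for ts in matching.values())

def greedy_blocks(block_size=3):
    # Heuristic II: repeatedly give the highest priority to the unplaced
    # T-bucket matched by the most C-buckets (ties broken by name).
    remaining = {t for ts in matching.values() for t in ts}
    blocks = []
    while remaining:
        blk = sorted(remaining,
                     key=lambda t: (-sum(t in ts for ts in matching.values()), t)
                     )[:block_size]
        blocks.append(blk)
        remaining -= set(blk)
    return blocks

for blocks in ([["T1", "T2", "T3"], ["T4", "T5"]],  # unordered: 7 reads
               [["T1", "T2", "T5"], ["T3", "T4"]],  # Heuristic I: 5 reads
               greedy_blocks()):                    # Heuristic II: 6 reads
    print(blocks, "->", c_bucket_reads(blocks), "C-bucket reads")
```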

  15. Experiments • Synthetic datasets from “An Interval Classifier for Database Mining Applications”, VLDB ’92 • 9 attributes, 1 class • Default data size = 100K records

  16. Conclusion • The experiments show that the proposed confidence-based pruning is effective.
