The Concept of Maximal Frequent Itemsets
NCU CSIE Database Laboratory
Kuo-Yu Huang
2002-04-15
NCU CSIE DBLab
Outline
• Introduction
• Max-Miner
• MAFIA
• GenMax
• Conclusion
Introduction(1/2)
• Interesting datasets with long patterns
  • Questionnaire results
  • Transaction databases
• These datasets contain many frequently occurring items
• They have a long average record length
• Apriori-like algorithms are inadequate
  • They enumerate every single frequent itemset, and a frequent pattern of length l has 2^l − 1 non-empty subsets, all of them frequent
Introduction(2/2)
• Maximal Frequent Itemsets
  • A frequent itemset is maximal if it has no superset that is frequent.
• e.g.
  • Items: a, b, c, d, e
  • Frequent itemset: {a, b, c}
  • {a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not frequent itemsets.
  • Maximal frequent itemset: {a, b, c}
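The definition above can be checked by brute force on a toy database. The following is a minimal sketch, not taken from the slides: the four transactions, the absolute support threshold of 2, and the function names `frequent_itemsets` and `maximal_itemsets` are all assumptions chosen so that {a, b, c} comes out as the only maximal frequent itemset, as in the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate every frequent itemset by brute force (exponential in the item count)."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Support = number of transactions that contain the candidate.
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                frequent.append(candidate)
    return frequent

def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy database over the items a, b, c, d, e.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("ce")]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
print(len(frequent))                 # 7 frequent itemsets
print([sorted(s) for s in maximal])  # [['a', 'b', 'c']]
```

Seven itemsets are frequent here, but a single maximal itemset summarizes all of them, which is the saving that the algorithms on the following slides aim for.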
Max-Miner(1/4)
• Efficiently Mining Long Patterns from Databases
  • R. J. Bayardo
  • ACM SIGMOD’98
• Max-Miner
  • Abandons a bottom-up traversal
  • Attempts to “look ahead”
  • Once a long frequent itemset is identified, all of its subsets can be pruned: they are frequent but not maximal.
Max-Miner(2/4)
• Set-enumeration tree
• Breadth-first search
Max-Miner(3/4)
• Candidate group g
  • Head: h(g)
    • The itemset enumerated by the node.
  • Tail: t(g)
    • An ordered set containing all items not in h(g) that can still appear in a sub-node.
• e.g. node {1}
  • h(g): {1}
  • t(g): {2, 3, 4}
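As an illustration of heads and tails, the sketch below builds a complete set-enumeration tree over the items 1 to 4 and walks it breadth-first. It does no support counting or pruning; the names `CandidateGroup`, `sub_nodes`, and `breadth_first` are hypothetical and not taken from the Max-Miner paper.

```python
from collections import deque
from typing import List, NamedTuple, Tuple

class CandidateGroup(NamedTuple):
    head: Tuple[int, ...]  # h(g): the itemset enumerated by this node
    tail: Tuple[int, ...]  # t(g): ordered items that may still extend the head

def sub_nodes(g: CandidateGroup) -> List[CandidateGroup]:
    """Each tail item extends the head; only later tail items remain available."""
    return [
        CandidateGroup(g.head + (item,), g.tail[pos + 1:])
        for pos, item in enumerate(g.tail)
    ]

def breadth_first(items) -> List[CandidateGroup]:
    """Visit the full (unpruned) set-enumeration tree level by level."""
    queue = deque([CandidateGroup((), tuple(items))])
    visited = []
    while queue:
        g = queue.popleft()
        visited.append(g)
        queue.extend(sub_nodes(g))
    return visited

nodes = breadth_first([1, 2, 3, 4])
print(nodes[1])    # CandidateGroup(head=(1,), tail=(2, 3, 4))
print(len(nodes))  # 16: one node per subset of {1, 2, 3, 4}
```

Because each item may only be followed by later items, every subset appears exactly once in the tree, which is why pruning one node removes a well-defined family of itemsets.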
Max-Miner(4/4)
• Support counting
  • Count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g)
  • If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
  • If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
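The two pruning rules can be sketched as follows. This is a simplified illustration under stated assumptions, not Bayardo's implementation: it recounts support by scanning the transactions, keeps a fixed item order, and removes non-maximal results in a final pass. The name `max_miner_sketch` and the toy data are hypothetical.

```python
from collections import deque

def max_miner_sketch(transactions, min_support):
    """Breadth-first search over candidate groups with look-ahead pruning."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    queue = deque([((), tuple(items))])
    found = []
    while queue:
        head, tail = queue.popleft()
        h = frozenset(head)
        # Rule 2: if h(g) ∪ {i} is infrequent, item i is dropped from the tail.
        tail = tuple(i for i in tail if support(h | {i}) >= min_support)
        if not tail:
            if head:
                found.append(h)  # frequent head with no frequent extension
            continue
        # Rule 1: if h(g) ∪ t(g) is frequent, no sub-node needs to be expanded.
        head_union_tail = h | set(tail)
        if support(head_union_tail) >= min_support:
            found.append(head_union_tail)
            continue
        for pos, item in enumerate(tail):
            queue.append((head + (item,), tail[pos + 1:]))
    # Some recorded itemsets may be subsets of others; keep only maximal ones.
    return [s for s in set(found) if not any(s < other for other in found)]

# Hypothetical toy databases.
db1 = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("ce")]
db2 = [frozenset({1, 2}), frozenset({1, 2}), frozenset({2, 3}),
       frozenset({2, 3}), frozenset({1, 3})]
result1 = max_miner_sketch(db1, min_support=2)
result2 = max_miner_sketch(db2, min_support=2)
print([sorted(s) for s in result1])  # [['a', 'b', 'c']]
```

On the first database the look-ahead succeeds at the root, so {a, b, c} is reported without visiting any of its subsets.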
MAFIA(1/4)
• MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
  • D. Burdick, M. Calimlim, and J. Gehrke
  • ICDE’01
• MAFIA
  • Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms
MAFIA(2/4)
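MAFIA(3/4)
• HUTMFI (Head Union Tail MFI)
  • Check whether head ∪ tail is a subset of an itemset already in the MFI
  • If so, stop searching and return
• PEP (Parent Equivalence Pruning)
  • newNode = C ∪ {i}
  • Check whether newNode.support == C.support
  • If so, move i from C.tail to C.head
• FHUT (Frequent Head Union Tail)
  • newNode = C ∪ {i}
  • Track whether i is the leftmost child in the tail; if the leftmost path shows head ∪ tail to be frequent, the rest of the subtree can be pruned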
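A depth-first search with PEP and HUTMFI might look like the sketch below. It is a simplified illustration, not the authors' implementation: support is recounted by scanning the transactions, the item order is fixed, and FHUT is left out. The name `mafia_sketch` and the toy data are hypothetical.

```python
def mafia_sketch(transactions, min_support):
    """Depth-first MFI search with PEP and HUTMFI pruning (FHUT omitted)."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    mfi = []

    def search(head, tail):
        head_support = support(head)
        new_tail = []
        for i in tail:
            s = support(head | {i})
            if s < min_support:
                continue           # infrequent extension: drop i
            if s == head_support:
                head = head | {i}  # PEP: i occurs wherever head occurs
            else:
                new_tail.append(i)
        # HUTMFI: every itemset in this subtree is a subset of head ∪ tail.
        if any(head | set(new_tail) <= m for m in mfi):
            return
        for pos, i in enumerate(new_tail):
            search(head | {i}, new_tail[pos + 1:])
        if not new_tail and head:
            mfi.append(head)       # leaf that no stored itemset contains

    items = sorted(set().union(*transactions))
    search(frozenset(), items)
    return mfi

# Hypothetical toy databases.
db1 = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("ce")]
db2 = [frozenset({1, 2}), frozenset({1, 2}), frozenset({2, 3}),
       frozenset({2, 3}), frozenset({1, 3})]
mfi1 = mafia_sketch(db1, min_support=2)
mfi2 = mafia_sketch(db2, min_support=2)
print([sorted(s) for s in mfi1])  # [['a', 'b', 'c']]
```

In the first database, b occurs in every transaction that contains a, so PEP moves b into the head at node {a} and the node {a, b} is never visited separately. The branches for {b} and {c} are then cut by HUTMFI.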
MAFIA(4/4)
GenMax(1/2)
• Efficiently Mining Maximal Frequent Itemsets
  • Karam Gouda and Mohammed J. Zaki
  • ICDM’01
• GenMax
  • A backtrack-search-based algorithm for mining maximal frequent itemsets
GenMax(2/2)
• Superset checking techniques
  • Do superset check only for I_{l+1} ∪ P_{l+1}
  • Using check_status flag
  • Local maximal frequent itemsets
  • Reordering the combine set
• Diffsets propagation
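The diffset idea can be illustrated on a toy database. A diffset stores the transaction ids that a prefix loses when it is extended, so the support of a longer itemset follows from a set difference instead of a fresh intersection. The sketch below assumes the usual definitions d(PX) = t(P) − t(PX) and d(PXY) = d(PY) − d(PX); the helper `tidset` and the data are hypothetical, and the code shows one propagation step rather than the GenMax algorithm.

```python
def tidset(itemset, transactions):
    """Ids of the transactions that contain every item of `itemset`."""
    return {tid for tid, t in enumerate(transactions) if set(itemset) <= t}

# Hypothetical toy database; transaction ids are the list positions 0..3.
transactions = [set("abc"), set("abc"), set("abd"), set("ce")]

t_a = tidset("a", transactions)         # t(a)
d_ab = t_a - tidset("ab", transactions) # d(ab) = t(a) - t(ab)
d_ac = t_a - tidset("ac", transactions) # d(ac) = t(a) - t(ac)

# Propagation: the diffset of abc relative to ab needs only the two diffsets.
d_abc = d_ac - d_ab
support_ab = len(t_a) - len(d_ab)
support_abc = support_ab - len(d_abc)

print(sorted(d_abc))  # [2]
print(support_abc)    # 2
```

The result agrees with counting directly: {a, b, c} is contained in transactions 0 and 1 only.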
Conclusion(1/4)
• Type I:
  • Normal MFI distribution with not too long maximal patterns
• Type II:
  • Left-skewed distribution with longer maximal patterns
• Type III:
  • Exponential decay distribution with short maximal patterns
Conclusion(2/4)
Conclusion(3/4)
Conclusion(4/4)