CanTree: a tree structure for efficient incremental mining of frequent patterns

Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque • ICDM’05 • Presenter: 林靜怡 • 2006/11/15

Introduction • Many existing incremental mining algorithms are Apriori-based • They are not easily adaptable to FP-tree based frequent-pattern mining

Related Work • The FELINE Algorithm with the CATS Tree • The AFPIM Algorithm

The FELINE Algorithm with the CATS Tree • CATS tree (Compressed and Arranged Transaction Sequences tree) • Allows frequent-pattern mining without the generation of candidate itemsets • requires one database scan to build the tree

CATS Tree • New transactions are added at the root level • At each level, items of the new transaction are compared with the children (or descendant) nodes • If the same items exist in both: 1. the transaction is merged with the node at the highest frequency level; 2. the remainder of the transaction is then added to the merged nodes • This is repeated recursively until all common items are found

CATS Tree • Any remaining items of the transaction are added as a new branch to the last merged node • The frequency of a node is lower than or equal to the frequencies of its ancestors • If the frequency of a node becomes higher than that of its ancestors, it has to swap with those ancestors

Weaknesses • tree construction could be computationally expensive • checks existing tree paths one-by-one until a mergeable one is found • extra cost is required for the swapping or merging of nodes.

The AFPIM Algorithm • Adjusting FP-tree for Incremental Mining • All the “frequent” items are arranged in descending order of their global frequency • When the ordering changes, items in the tree need to be adjusted • When a previously infrequent item becomes “frequent” in the updated database, the algorithm needs to rescan the database and build a new FP-tree

preMinsup: 35% • minsup: 55% • 4 × 0.35 = 1.4
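The arithmetic on this slide can be sketched as follows. This is only an illustration under two assumptions not stated on the slide: that the 4 in "4 × 0.35" is the number of transactions in the example database, and that an item whose count falls between the two thresholds is the kind AFPIM keeps track of. The function name `classify` and the label "pre-frequent" are hypothetical names used here for illustration.

```python
def classify(count, n_transactions, minsup, pre_minsup):
    """Label an item by comparing its count with the two thresholds."""
    if count >= minsup * n_transactions:
        return "frequent"
    if count >= pre_minsup * n_transactions:
        return "pre-frequent"  # illustrative label for the band in between
    return "infrequent"

N = 4              # assumed number of transactions in the example
MINSUP = 0.55      # threshold count: 4 x 0.55 = 2.2
PRE_MINSUP = 0.35  # threshold count: 4 x 0.35 = 1.4

labels = {count: classify(count, N, MINSUP, PRE_MINSUP) for count in range(1, 5)}
```

With these values, an item needs to occur in at least 2 transactions to clear preMinsup and in at least 3 to clear minsup.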

Weaknesses • the amount of computation spent on swapping, merging, and splitting tree nodes • requirement for an additional mining parameter preMinsup • finding an appropriate value for this parameter is not easy

Weaknesses • when the database is updated, item frequencies may have changed. This results in changes in the ordering. • Both FELINE and AFPIM algorithms need lots of swapping, merging, and splitting of tree nodes

Canonical-Order Tree (CanTree) • requires one database scan • items are arranged according to some canonical order • e.g. lexicographic or alphabetical order • or some specific order depending on the item properties
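A minimal sketch of building such a tree, assuming lexicographic order as the canonical order. The names `Node` and `insert` are hypothetical and not taken from the paper; the sketch only shows that each transaction is sorted once and then follows a single path from the root.

```python
class Node:
    """One tree node: an item, its count, and children keyed by item."""
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def insert(root, transaction):
    """Add one transaction along the path given by the canonical order."""
    node = root
    for item in sorted(set(transaction)):  # assumed canonical order: lexicographic
        child = node.children.get(item)
        if child is None:
            child = Node(item, node)
            node.children[item] = child
        child.count += 1
        node = child

root = Node()
for t in [["a", "d", "b"], ["b", "a"], ["c", "a"]]:
    insert(root, t)

# An incremental update is just more calls to insert(); no node is moved.
insert(root, ["d", "a", "b"])
```

Because the order does not depend on item counts, the one pass over the original transactions and the later update use exactly the same routine.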

Property • Property 1 The ordering of items is unaffected by the changes in frequency caused by incremental updates. • Property 2 The frequency of a node in the CanTree is at least as high as the sum of frequencies of its children.
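Property 1 can be illustrated by comparing a frequency-descending order with a canonical one before and after an update. This is a small sketch with a made-up database; the function names are hypothetical and lexicographic order is assumed as the canonical order.

```python
from collections import Counter

def frequency_order(db):
    """Items by descending count, ties broken alphabetically (FP-tree style)."""
    counts = Counter(item for t in db for item in set(t))
    return sorted(counts, key=lambda i: (-counts[i], i))

def canonical_order(db):
    """Items in lexicographic order, independent of their counts."""
    return sorted({item for t in db for item in t})

original = [["a", "b"], ["a", "c"], ["a", "b"]]
increment = [["c", "d"], ["c"], ["c", "d"]]
updated = original + increment

freq_before = frequency_order(original)
freq_after = frequency_order(updated)
canon_before = canonical_order(original)
canon_after = canonical_order(updated)
```

In this example the update makes `c` the most frequent item, so the frequency-based order changes and a tree built on it would need restructuring, while the canonical order only gains the new item `d` at its place.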

CanTree • Transactions can be easily added to the CanTree without any extensive searches for mergeable paths • Frequent patterns are mined from the tree in a fashion similar to FP-growth (a divide-and-conquer approach)
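The divide-and-conquer idea can be sketched with a simplified pattern-growth routine. This is not FP-growth proper and not the authors' procedure: instead of building conditional trees, it works on a list of (canonically sorted items, count) pairs, which is the information the root-to-node paths of the tree carry. The function name `mine` and the sample data are assumptions for illustration.

```python
from collections import Counter

def mine(db, minsup_count, suffix=()):
    """Return {itemset: support} for all itemsets meeting minsup_count.

    db is a list of (items, count) pairs; items is a tuple of distinct
    items sorted in canonical order.
    """
    counts = Counter()
    for items, n in db:
        for item in items:
            counts[item] += n
    result = {}
    for item, support in counts.items():
        if support < minsup_count:
            continue
        pattern = (item,) + suffix
        result[pattern] = support
        # Projected database for `item`: the items that precede it
        # (its prefix path in the tree), carrying the same count.
        projected = []
        for items, n in db:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    projected.append((prefix, n))
        result.update(mine(projected, minsup_count, pattern))
    return result

db = [(("a", "b", "c"), 2), (("a", "c"), 1), (("b", "c"), 1)]
patterns = mine(db, minsup_count=2)
```

Each item is handled with only the part of the data that contains it, and the recursion extends the current suffix with items that come earlier in the canonical order.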

g: eg, deg, cdeg, bcdeg, abcdeg • e: de, cde, bcde, abcde, ce, bce, abce, de, bde, abde • f: ef, def, bdef, abdef • d: cd, bcd, abcd, bd, abd • c: bc, abc • b: ab

Discussion • CanTrees can be used for incremental constrained mining • Efficiency and Memory Issues • On the surface, it appears that a CanTree may take a large amount of memory • A CanTree may not be as compact as the CATS tree, but it significantly reduces computation and time • Assume we have enough main memory space

Experiment • Database: generated by the program developed at IBM Almaden Research Center • consists of 1M records with an average transaction length of 10 items and a domain of 1000 items • run in a time-sharing environment on a 1 GHz machine

Experiment

Conclusion • CanTree provides the user with a simple but powerful tree structure for efficient FP-tree based incremental mining • CanTree can be easily maintained • It can be used for efficient incremental constrained mining
