170 likes | 449 Views
CFI-Stream: Mining Closed Frequent Itemsets in Data Streams. Nan Jiang,Le Gruenwald SIGKDD ’ 06 報告者:林靜怡 2006/10/04. Introduction. mining Closed frequent itemsets computes and maintains closed itemsets online and incrementally perform the closure checking
E N D
CFI-Stream: Mining Closed Frequent Itemsets in DataStreams Nan Jiang,Le Gruenwald SIGKDD’06 報告者:林靜怡 2006/10/04
Introduction • mining Closed frequent itemsets • computes and maintains closed itemsets online and incrementally • perform the closure checking • output the current closed frequent itemsets in real time based on users’ specified thresholds
Definition • D:data stream • I = { , , …, } :a set of n elements, called items • T: subsets of all the transactions • X: subsets of all the items appearing in a data stream
Definition • C(X):the smallest closed set containing X • Definition 1 An itemset X is said to be closed if and only if C(X)= f(g(X)) = f•g(X) = X
Algorithm • CFI-Stream algorithm • DIrect Update (DIU) tree • perform the closure checking online over a data stream sliding window • Conditions need to check for closed itemsets • check when performing addition and deletion operations on the DIU tree
DIU tree • maintain the current closed itemsets • k levels in the DIU tree, each level i stores the closed i-itemsets
DIU tree • Each node in the DIU tree stores • a closed itemset • its current support information • links to its parent and children nodes
Add a Transaction to the DIU Tree T1:original transaction set t:new arrived transaction • Conditions to Check for Closed Itemsets (1) t is in the T1, if the largest itemset X it contains is not currently in the DIU tree ->check for all X’s subsets Y, which are in T1
(2) when t is not in T1, for each its subset Y, if Y is in T1, we need to check
2 1 C 3 3 CD CD 1 ABC • C,D 2 A,B 3 A,B,C 4 A,B,C 2 1 1 AB 2
Delete a Transaction in DIU Tree • Conditions to Check for Closed Itemsets • When the number of the transactions with same itemset of X is equal to zero, if Y is a subset of X, and Y is a closed itemset in the original transaction set
2 2 • C,D 2 A,B 3 A,B,C 4 A,B,C 3 C 3 1 CD AB 2 ABC
Experiment • Synthetic datasets T10.I6.D100K and T5.I4.D100K