Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz, ICDM 2004. Adviser: Jia-Ling Koh. Speaker: Shu-Ning Shin. Date: 2005.5.6.
Introduction
• Algorithm Moment: mine closed frequent itemsets over the most recent N transactions in a data stream.
• Data structure: the closed enumeration tree (CET), which maintains:
  • the closed frequent itemsets,
  • the boundary between closed frequent itemsets and the rest.
Problem
• Lexicographic order:
• Closed frequent itemset: a frequent itemset none of whose proper supersets has the same support.
• Example: items Σ = {A, B, C, D}, window size N = 4, minimum support s = ½.
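The definition above can be checked by brute force on a tiny window. The following is a minimal sketch, not the Moment algorithm itself: the four transactions are an assumed example window (the slide's own transaction table did not survive), and the helper names `support` and `closed_frequent_itemsets` are hypothetical.

```python
from itertools import combinations

def support(itemset, window):
    """Count the transactions in the window that contain every item of itemset."""
    return sum(1 for t in window if itemset <= t)

def closed_frequent_itemsets(window, items, min_ratio):
    """Brute-force enumeration over all itemsets; only suitable for tiny examples."""
    min_count = min_ratio * len(window)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            candidate = frozenset(combo)
            s = support(candidate, window)
            if s >= min_count:
                frequent[candidate] = s
    # Closed: no proper superset has the same support. A superset with the
    # same support as a frequent itemset is itself frequent, so it is enough
    # to compare against the frequent itemsets.
    return {
        i: s for i, s in frequent.items()
        if not any(i < j and s == sj for j, sj in frequent.items())
    }

# Assumed window of N = 4 transactions over the items {A, B, C, D}, with s = 1/2.
window = [frozenset("CD"), frozenset("AB"), frozenset("ABC"), frozenset("ABC")]
closed = closed_frequent_itemsets(window, "ABCD", 0.5)
for itemset, s in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(itemset)), s)
```

With this assumed window, A is frequent but not closed because its superset AB has the same support, while AB, ABC, and C are closed.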
CET (1)
• Four types of itemset nodes:
  • Infrequent:
    • Infrequent gateway node, dashed circle (e.g., D).
  • Frequent but not closed:
    • Unpromising gateway node, dashed rectangle (e.g., AC).
    • Intermediate node (e.g., A).
  • Closed:
    • Closed node, solid rectangle (e.g., ABC).
CET (2)
• Property 1: if nI is an infrequent gateway node, then any node nJ where J ⊃ I represents an infrequent itemset.
• Property 2: if nI is an unpromising gateway node, then nI is not closed, and none of nI's descendants is closed.
• Property 3: if nI is an intermediate node, then nI is not closed and nI has closed descendants.
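The node types can be illustrated by classifying single itemsets against a window. This is a sketch under stated assumptions: the window is an assumed example, `classify` and `key` are hypothetical helpers, and the unpromising test assumes the closed superset must precede the itemset in lexicographic order. It classifies an itemset in isolation; it does not model which nodes the CET actually materializes, so the "gateway" condition for infrequent nodes (which depends on the parent and siblings) is not checked.

```python
def key(itemset):
    """Render an itemset as a sorted string, used for lexicographic comparison."""
    return "".join(sorted(itemset))

def classify(itemset, window, min_count):
    """Classify one itemset; assumes min_count >= 1."""
    covering = [t for t in window if itemset <= t]
    if len(covering) < min_count:
        return "infrequent"
    # The intersection of all covering transactions is the largest superset
    # with the same support, and it is a closed itemset.
    closed_superset = frozenset.intersection(*covering)
    if closed_superset == itemset:
        return "closed"
    if key(closed_superset) < key(itemset):
        # A closed superset with equal support appears earlier in the order.
        return "unpromising gateway"
    # Otherwise the equal-support superset lies below this node in the tree.
    return "intermediate"

# Assumed window of N = 4 transactions; minimum support count 2 (s = 1/2).
window = [frozenset("CD"), frozenset("AB"), frozenset("ABC"), frozenset("ABC")]
for name in ["A", "AC", "ABC", "D"]:
    print(name, "->", classify(frozenset(name), window, 2))
```

On this assumed window the sketch agrees with the slide's examples: D is infrequent, AC is an unpromising gateway node, A is an intermediate node, and ABC is closed.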
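Moment: Build CET (1)
• Node nI stores:
  • itemset I, node type, support, tid_sum (the sum of the tids of the transactions that contain I)
• Hash table:
  • stores all closed frequent itemsets
  • is used to check whether nI is an unpromising gateway node, i.e., whether there exists a closed nJ where J ⊃ I, J precedes I in lexicographic order, and J has the same support as I
  • is hashed on the (support, tid_sum) of nI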
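The hash-table lookup can be sketched as follows. This is an illustration rather than the authors' implementation: the window (a dict from tid to transaction), the list of closed itemsets, and the helpers `stats`, `build_table`, and `is_unpromising` are all assumptions made for the example.

```python
from collections import defaultdict

def key(itemset):
    """Render an itemset as a sorted string, used for lexicographic comparison."""
    return "".join(sorted(itemset))

def stats(itemset, window):
    """Return (support, tid_sum) of itemset; window maps tid -> transaction."""
    tids = [tid for tid, t in window.items() if itemset <= t]
    return len(tids), sum(tids)

def build_table(closed_itemsets, window):
    """Hash every closed itemset on its (support, tid_sum) pair."""
    table = defaultdict(list)
    for c in closed_itemsets:
        table[stats(c, window)].append(c)
    return table

def is_unpromising(itemset, table, window):
    """Look only in the bucket that shares the itemset's (support, tid_sum)."""
    for j in table.get(stats(itemset, window), []):
        # A proper superset with equal support covers the same transactions.
        if itemset < j and key(j) < key(itemset):
            return True
    return False

# Assumed window with tids 1..4 and its closed frequent itemsets (s = 1/2).
window = {1: frozenset("CD"), 2: frozenset("AB"),
          3: frozenset("ABC"), 4: frozenset("ABC")}
closed = [frozenset("AB"), frozenset("C"), frozenset("ABC")]
table = build_table(closed, window)
print(stats(frozenset("AC"), window), is_unpromising(frozenset("AC"), table, window))
print(stats(frozenset("A"), window), is_unpromising(frozenset("A"), table, window))
```

Hashing on the pair means a candidate is compared only against closed itemsets in one bucket instead of against every closed itemset; the superset test inside the bucket rules out accidental collisions.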
Moment: Build CET (3)
• Items Σ = {A, B, C, D}: call Explore(n{i}) for each i in Σ.
[Figure: CET root ψ with child nodes A, B, C, D, each labeled 0]
Moment: Add CET (2)
• Adding a transaction, tid 5 = {A, C, D}:
  • Call Addition(nψ, t5, D, minsup).
[Figure: CET during the addition, F = {D}; nodes shown include A, C, D, AC, AD, CD]
Moment: Delete CET (2)
• Deleting a transaction, tid 1:
[Figure: CET during the deletion, F = {D}; nodes shown include C and D]
Moment: Update CET (3)
• Deleting a transaction, tid 2:
[Figure: CET during the deletion; nodes shown include A, B, AB]
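The effect of the add and delete steps on the result set can be cross-checked with a naive baseline. The sketch below is deliberately not Moment's incremental CET update: it re-enumerates every itemset after each change, which is exactly the cost the CET avoids. The class `SlidingWindowMiner` and the transactions fed to it are assumptions for illustration.

```python
from collections import deque
from itertools import combinations
from math import ceil

class SlidingWindowMiner:
    """Naive baseline: recompute closed frequent itemsets from scratch."""

    def __init__(self, items, size, min_ratio):
        self.items = sorted(items)
        self.size = size
        self.min_count = ceil(min_ratio * size)
        self.window = deque()

    def add(self, transaction):
        """Append the new transaction, then evict the oldest if the window is full."""
        self.window.append(frozenset(transaction))
        if len(self.window) > self.size:
            self.window.popleft()

    def closed_frequent(self):
        frequent = {}
        for k in range(1, len(self.items) + 1):
            for combo in combinations(self.items, k):
                candidate = frozenset(combo)
                s = sum(1 for t in self.window if candidate <= t)
                if s >= self.min_count:
                    frequent[candidate] = s
        # Keep itemsets with no proper superset of equal support.
        return {
            "".join(sorted(i)): s for i, s in frequent.items()
            if not any(i < j and s == sj for j, sj in frequent.items())
        }

miner = SlidingWindowMiner("ABCD", size=4, min_ratio=0.5)
for t in ["CD", "AB", "ABC", "ABC"]:   # assumed tids 1..4
    miner.add(t)
before = miner.closed_frequent()
miner.add("ACD")                        # tid 5 arrives, tid 1 is evicted
after = miner.closed_frequent()
print(before)
print(after)
```

A baseline like this is mainly useful as a test oracle: an incremental implementation should report the same closed frequent itemsets after every window slide.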
Experiment (1)
• Dataset: T20I4D100K
• Window size N = 100000
Experiment (3)
• Real dataset: BMS-WebView-1
• Items: 497, transactions: 59602
• Window size N = 50000