Explore FP-Stream model for mining frequent patterns, utilize tilted-time window, logarithmic model, and pruning techniques for efficient pattern management.
Mining Frequent Patterns in Data Streams at Multiple Time Granularities Chris Giannella, Jiawei Han, Jian Pei, Xifeng Yan, Philip S. Yu Advisor: Jia-Ling Koh Speaker: Chun-Wei Hsieh 03/12/2004
Problem and Analysis • Infrequent items can become frequent later on and hence cannot be ignored. • An itemset I is frequent if its support is no less than min_support. • I is sub-frequent if its support is less than min_support but no less than the maximum support error. • Otherwise, I is infrequent.
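The three categories above can be sketched as a small function. This is only an illustration: the function name and the choice to express supports as relative frequencies in [0, 1] are assumptions, not part of the slides.

```python
def classify(support, min_support, max_error):
    """Classify an itemset by its relative support.

    Assumes 0 <= max_error <= min_support <= 1 (hypothetical convention).
    """
    if not 0 <= max_error <= min_support <= 1:
        raise ValueError("expected 0 <= max_error <= min_support <= 1")
    if support >= min_support:
        return "frequent"
    if support >= max_error:
        return "sub-frequent"   # kept, because it may become frequent later
    return "infrequent"


labels = [classify(s, min_support=0.05, max_error=0.01)
          for s in (0.08, 0.02, 0.005)]
```

Sub-frequent itemsets are the ones worth tracking even though they are not reported yet.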
FP-Stream • FP-stream includes two major components: (1) tilted-time window (2) pattern-tree
Tilted-Time Window • People are often interested in recent changes at a fine granularity, but long term changes at a coarse granularity.
Tilted-Time Window (cont.) • One can register a window-based count for each frequent pattern in each tilted-time window.
FP-Stream • Usually frequent patterns do not change dramatically over time. • To save space, it uses only one frequent pattern tree.
Logarithmic Tilted-Time Window • In the natural tilted-time window, at most 59 (4+24+31) tilted windows need to be maintained for a period of one month. • According to the logarithmic tilted-time window model, with one year of data and the finest precision at a quarter of an hour, it needs about log2(365 × 24 × 4) + 1 units of time (roughly 17 after rounding up) instead of 365 × 24 × 4 = 35,040 units.
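A quick arithmetic check of the two window models. The sketch assumes that the natural model keeps 4 quarters, 24 hours and 31 days, and that the logarithmic model needs about log2(N) + 1 windows for N units at the finest granularity; the function names are made up for illustration.

```python
import math


def natural_window_units():
    # 4 quarter-hours + 24 hours + 31 days, as on the slide
    return 4 + 24 + 31


def logarithmic_window_units(num_finest_units):
    # Assumed bound: log2(N) + 1 windows, rounded up to a whole number
    return math.ceil(math.log2(num_finest_units) + 1)


quarters_per_year = 365 * 24 * 4            # one year at quarter-hour precision
log_units = logarithmic_window_units(quarters_per_year)
print(natural_window_units(), quarters_per_year, log_units)
```

The point of the comparison is the growth rate: the number of windows grows logarithmically with the length of the stream rather than linearly.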
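Logarithmic Tilted-Time Window Updating • Assume that the stream of transactions is broken up into fixed-size batches B1, B2, ..., Bn, where Bn is the most recent. • The frequencies for each logarithmic tilted-time window are: f(n, n); f(n-1, n-1); f(n-2, n-3); f(n-4, n-7); ..., where f(i, j) is the frequency over batches Bj through Bi.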
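One way to maintain such windows is to give every level a main slot and an intermediate buffer, and to merge two counts into the next (coarser) level whenever both are occupied. The class below is a minimal sketch under that assumption; the class and method names are hypothetical.

```python
class LogTiltedWindow:
    """Per-itemset counts kept at logarithmically growing granularities."""

    def __init__(self):
        # levels[k] = [main, buffer]; buffer is None when unoccupied.
        # Level k slots each cover 2**k batches.
        self.levels = []

    def add(self, count):
        """Register the count of the newest batch."""
        carry = count
        for level in self.levels:
            main, buf = level
            level[0] = carry            # newer count takes the main slot
            if buf is None:
                level[1] = main         # old main waits in the buffer
                return
            carry = main + buf          # merge and push to the coarser level
            level[1] = None
        self.levels.append([carry, None])

    def windows(self):
        """Counts from newest to oldest."""
        out = []
        for main, buf in self.levels:
            out.append(main)
            if buf is not None:
                out.append(buf)
        return out


w = LogTiltedWindow()
for _ in range(8):
    w.add(1)                            # each batch contributes a count of 1
spans = w.windows()
```

With a count of 1 per batch, each entry of `spans` equals the number of batches that window covers, so after eight batches the windows cover 1, 1, 2 and 4 batches.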
Tail Pruning • Let t_0, ..., t_n be the tilted-time windows, where t_n is the oldest. • w_i is the window size of t_i. • Drop tail sequences when the following condition holds,
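The condition itself is not legible in this transcript. The sketch below therefore assumes one plausible reading: a tail of the oldest windows is dropped when every window in it is individually below min_support times its size, and every cumulative count over it stays below the error rate times the cumulative window size. The function name and parameter names are hypothetical.

```python
def tail_prune(counts, sizes, min_support, max_error):
    """Return counts with the prunable tail removed.

    counts[i], sizes[i] describe window t_i; index 0 is the newest window.
    ASSUMED condition (not taken from the slide): see the lead-in text.
    """
    n = len(counts)
    # l: start of the longest suffix in which every window is infrequent
    l = n
    while l > 0 and counts[l - 1] < min_support * sizes[l - 1]:
        l -= 1
    # m: smallest index >= l such that, for every l' >= m, the cumulative
    # count over t_l..t_l' is below max_error times the cumulative size
    m = n
    cum_f, cum_w = sum(counts[l:]), sum(sizes[l:])
    for lp in range(n - 1, l - 1, -1):
        if cum_f >= max_error * cum_w:
            break
        m = lp
        cum_f -= counts[lp]
        cum_w -= sizes[lp]
    return counts[:m]


sizes = [100] * 5
kept_a = tail_prune([50, 30, 1, 0, 0], sizes, min_support=0.1, max_error=0.02)
kept_b = tail_prune([50, 30, 5, 0, 0], sizes, min_support=0.1, max_error=0.02)
```

In the second call only the oldest window is dropped, because the cumulative count over the older windows is still too large to discard safely.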
Type I Pruning • If I is found in B but is not in the FP-stream structure, no superset of I is in the structure. • Hence, if f_I(B) < ε|B|, where f_I(B) is the count of I in batch B and ε is the maximum support error, then none of the supersets need be examined.
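The Type I test can be written as a single predicate. In this sketch the FP-stream structure is stood in for by a plain set of itemsets, and the function name is hypothetical; the threshold ε|B| is assumed to be the error rate times the batch size.

```python
def type_one_prunes(itemset, structure, count_in_batch, batch_size, max_error):
    """True if mining can skip every superset of `itemset` (Type I pruning).

    `structure` is a set of frozensets standing in for the FP-stream tree.
    """
    if itemset in structure:
        return False                     # handled by the in-structure case
    return count_in_batch < max_error * batch_size


structure = {frozenset({"a"})}
skip_rare = type_one_prunes(frozenset({"b"}), structure, 3, 1000, 0.01)
skip_common = type_one_prunes(frozenset({"b"}), structure, 20, 1000, 0.01)
skip_known = type_one_prunes(frozenset({"a"}), structure, 3, 1000, 0.01)
```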
Type II Pruning • If all of I’s tilted-time window table entries are pruned (hence I is dropped), then any superset will also be dropped.
Algorithm • INPUT: (1) An FP-stream structure (2) A min_support threshold (3) An error rate ε (4) An incoming batch, B, of transactions. • OUTPUT: The updated FP-stream structure.
Algorithm (cont.) 1. Initialize the FP-tree to empty. 2. Sort each incoming transaction t according to f_list, and then insert it into the FP-tree without pruning any items. 3. When all the transactions in B are accumulated, update the FP-stream as follows.
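Algorithm (cont.) (a) Mine itemsets out of the FP-tree using the FP-growth algorithm. For each mined itemset, I, check if I is in the FP-stream structure. • If I is in the structure, add f_I(B) to I's tilted-time window table and apply tail pruning; if the table becomes empty, FP-growth stops mining supersets of I (Type II Pruning). • If I is not in the structure and if f_I(B) ≥ ε|B|, then insert I into the structure. Otherwise, FP-growth stops mining supersets of I (Type I Pruning).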
Algorithm (cont.) (b) Scan the FP-stream structure (depth-first search). For each itemset I encountered, check if I was updated when B was mined. If not, then insert 0 into I’s tilted-time window table. Prune I’s table by tail pruning. Once the search reaches a leaf, if the leaf has an empty tilted-time window table, then drop the leaf. If there are any siblings of the leaf, continue the search with them. If there were no siblings, then return to the parent and continue the search with its siblings. If all of the children of the parent were dropped, then the parent becomes a leaf node and might be dropped.
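The scan in step (b) can be sketched as a recursive depth-first traversal. Everything below is illustrative: the Node class, its field names and the stand-in pruning function (which simply drops trailing zero counts) are assumptions, and the 0 is inserted at the front of the table without the logarithmic merging a full implementation would perform.

```python
class Node:
    """Pattern-tree node; the path from the root spells an itemset."""

    def __init__(self, table=None, updated=False):
        self.table = table or []   # tilted-time window counts, newest first
        self.updated = updated     # was this itemset seen when B was mined?
        self.children = {}         # item -> Node


def drop_trailing_zeros(table):
    # Stand-in for tail pruning, used only to keep the demo short
    table = list(table)
    while table and table[-1] == 0:
        table.pop()
    return table


def scan_and_prune(node, prune_table=drop_trailing_zeros):
    """Depth-first scan that drops leaves with empty tables."""
    for item, child in list(node.children.items()):
        if not child.updated:
            child.table.insert(0, 0)         # no occurrences in this batch
        child.updated = False                # reset for the next batch
        child.table = prune_table(child.table)
        scan_and_prune(child, prune_table)   # children first
        # After its children are handled, the child may have become a leaf
        if not child.children and not child.table:
            del node.children[item]


root = Node()
root.children["a"] = Node(table=[5], updated=True)
root.children["a"].children["b"] = Node(table=[0])
root.children["c"] = Node(table=[0])
root.children["c"].children["d"] = Node(table=[0])
scan_and_prune(root)
remaining = sorted(root.children)
```

Because children are processed before the parent is tested, a parent whose children were all dropped is treated as a leaf and can be dropped in the same pass, as with node "c" here.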