Mining General Temporal Association Rules for Items with Different Exhibition. Cheng-Yue Chang, Ming-Syan Chen, Chang-Hung Lee, Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02) • Adviser: Jia-Ling Koh • Speaker: Yu-ting Kung
Introduction • This paper explores a new model for mining general temporal association rules from a large database in which the exhibition periods of the items may differ from one another. (see next page)
Introduction (Cont.) • What goes wrong when a conventional mining algorithm is applied to this database? • For example: • min_supp = 30%, min_conf = 75% • With conventional mining, only {A}, {B}, {C} and {F} are frequent itemsets • No association rule is discovered • But some rules do exist in this database!
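Introduction (Cont.) • What is the problem with the conventional mining algorithm? • It does not take the individual exhibition periods of items into consideration: support is computed over the whole database, so an item exhibited in only a few partitions is measured against transactions in which it could never appear.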
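To see this dilution concretely, here is a minimal sketch that computes conventional support over every transaction of a small hypothetical database. The partitions, the items `x` and `y`, and all counts are made up for illustration and are not taken from Figure 1. `conventional_support` is a name introduced here. Item `y` is assumed to be exhibited only in partition 4. It appears in every transaction there, yet its support over the whole database is only 25%.

```python
# Hypothetical database: partition number -> list of transactions (sets of items).
# Assumption for illustration: item "y" is exhibited only in partition 4.
DB = {
    1: [{"x"}, {"x"}],
    2: [{"x"}, set()],
    3: [{"x"}, set()],
    4: [{"x", "y"}, {"x", "y"}],
}

def conventional_support(db, itemset):
    """Fraction of ALL transactions in the database that contain `itemset`."""
    transactions = [t for part in db.values() for t in part]
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(conventional_support(DB, {"y"}))       # 0.25
print(conventional_support(DB, {"x", "y"}))  # 0.25
```

With min_supp = 30%, neither {y} nor {x, y} qualifies, even though every transaction that could have contained `y` does.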
Introduction (Cont.) • To allow items to have different exhibition periods, three basic definitions are introduced: • Maximal common exhibition period (MCP) • MCP(X) = [p, q], where p is the latest exhibition-start time and q is the earliest exhibition-end time among the items of itemset X • For example: (in Figure1) MCP(BC) = [2,3]
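The MCP definition can be sketched directly. The function name `mcp` and the exhibition periods below are hypothetical (they are not read from Figure 1). Returning `None` when the items were never exhibited together (p > q) is an assumption of this sketch; the slides do not say how an empty MCP is handled.

```python
from typing import Dict, Iterable, Optional, Tuple

Period = Tuple[int, int]  # (first partition, last partition), inclusive

def mcp(itemset: Iterable[str], periods: Dict[str, Period]) -> Optional[Period]:
    """Maximal common exhibition period [p, q] of an itemset.

    p = latest exhibition-start time among the items,
    q = earliest exhibition-end time among the items.
    Assumption: return None when the items never overlap (p > q).
    """
    items = list(itemset)
    p = max(periods[i][0] for i in items)
    q = min(periods[i][1] for i in items)
    return (p, q) if p <= q else None

# Hypothetical exhibition periods, for illustration only.
PERIODS = {"a": (1, 4), "b": (2, 4), "c": (1, 3), "d": (4, 4)}

print(mcp({"b", "c"}, PERIODS))       # (2, 3)
print(mcp({"a", "b", "c"}, PERIODS))  # (2, 3)
print(mcp({"c", "d"}, PERIODS))       # None
```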
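Introduction (Cont.) • Relative support: for MCP(X) = [p, q], $supp(X^{MCP(X)}) = |\{T \in db^{p,q} : X \subseteq T\}| / |db^{p,q}|$, where $db^{p,q}$ is the set of transactions in partitions p through q • For example: (in Figure1) • Confidence: $conf((X \Rightarrow Y)^{MCP(XY)}) = supp((X \cup Y)^{MCP(XY)}) / supp(X^{MCP(XY)})$ • For example: (in Figure1)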
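A minimal sketch of relative support and confidence, assuming the caller already knows the MCP to use (for example, from an `mcp` helper like the one on the MCP slide). The names `relative_support` and `confidence` and the database are hypothetical. Both supports in the confidence are counted over the same period, so the ratio reduces to count(X ∪ Y) / count(X) within that period.

```python
from typing import Dict, List, Set, Tuple

Period = Tuple[int, int]
Database = Dict[int, List[Set[str]]]  # partition number -> transactions

def relative_support(db: Database, itemset: Set[str], period: Period) -> float:
    """Support of `itemset` counted only over transactions in partitions p..q."""
    p, q = period
    transactions = [t for k in range(p, q + 1) for t in db[k]]
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(db: Database, x: Set[str], y: Set[str], period: Period) -> float:
    """conf(X => Y) over `period` (expected to be MCP(XY)): supp(X u Y) / supp(X)."""
    supp_x = relative_support(db, x, period)
    return relative_support(db, x | y, period) / supp_x if supp_x else 0.0

# Hypothetical data: assume "y" is exhibited only in partition 4, so MCP({x, y}) = (4, 4).
DB = {
    1: [{"x"}, {"x"}],
    2: [{"x"}, set()],
    3: [{"x"}, set()],
    4: [{"x", "y"}, {"x", "y"}, {"x"}, {"y"}],
}

print(relative_support(DB, {"x", "y"}, (4, 4)))  # 0.5
print(confidence(DB, {"y"}, {"x"}, (4, 4)))      # 2/3
```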
Introduction (Cont.) • Based on the definitions above, the frequent general temporal association rules in this database are:
Introduction (Cont.) • In this model, the “downward closure” property no longer holds. • For example: (in Figure1) itemset BCD is frequent in [2,2], but its subsets BC, BD and CD are not all frequent in their corresponding MCPs • e.g., BC’s relative support in MCP(BC) = [2,3] is only 25% (< 30%)
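The counterexample can be reproduced on a small hypothetical database. The exhibition periods of B, C and D below are assumed, chosen only so that MCP(BC) = [2,3] and MCP(BCD) = MCP(BD) = MCP(CD) = [2,2] as on the slides. The transactions are made up, so BC drops to 20% here rather than the 25% of Figure 1. The helpers `mcp` and `relative_support` are names introduced for this sketch.

```python
# Hypothetical exhibition periods and transactions, for illustration only.
PERIODS = {"B": (1, 3), "C": (2, 3), "D": (2, 2)}
DB = {
    1: [{"B"}, {"B"}],
    2: [{"B", "C", "D"}, {"B", "C", "D"}],
    3: [{"B"}, {"C"}] * 4,  # 8 transactions, none containing both B and C
}
MIN_SUPP = 0.30

def mcp(itemset, periods):
    """[latest start, earliest end] over the items of the itemset."""
    return (max(periods[i][0] for i in itemset), min(periods[i][1] for i in itemset))

def relative_support(db, itemset, period):
    """Support counted only over transactions in partitions p..q."""
    p, q = period
    transactions = [t for k in range(p, q + 1) for t in db[k]]
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Each itemset is judged in its OWN MCP, so subsets use longer periods than BCD.
for itemset in ({"B", "C", "D"}, {"B", "C"}, {"B", "D"}, {"C", "D"}):
    period = mcp(itemset, PERIODS)
    supp = relative_support(DB, itemset, period)
    print(sorted(itemset), period, supp, supp >= MIN_SUPP)
```

BCD is frequent in [2,2] while BC, measured over [2,3], is not, which is exactly why Apriori-style pruning by subsets cannot be applied unchanged.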
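Problem Description • Maximal temporal itemset: a temporal itemset $X^{p,q}$ is maximal if $[p, q] = MCP(X)$ • For example: • $BCD^{2,2}$ (✓) • $BD^{2,2}$ (✓) • $BC^{2,2}$ (✗, since MCP(BC) = [2,3]) • Temporal sub-itemset of a maximal temporal itemset: $X'^{MCP(X)}$ with $X' \subset X$ • For example: • $BCD^{2,2}$ is a maximal temporal itemset, and $BD^{2,2}$, $BC^{2,2}$ and $CD^{2,2}$ are temporal sub-itemsets of $BCD^{2,2}$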
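The maximality check reduces to comparing a temporal itemset's period with the MCP of its items. In this sketch `is_maximal` is a hypothetical name, and the exhibition periods are assumed values picked only so that the MCPs match the slide's example; they are not read from Figure 1.

```python
from typing import Dict, Set, Tuple

Period = Tuple[int, int]

def mcp(itemset: Set[str], periods: Dict[str, Period]) -> Period:
    """[latest start, earliest end] over the items of the itemset."""
    return (max(periods[i][0] for i in itemset), min(periods[i][1] for i in itemset))

def is_maximal(itemset: Set[str], period: Period, periods: Dict[str, Period]) -> bool:
    """X^{p,q} is a maximal temporal itemset when [p, q] equals MCP(X)."""
    return mcp(itemset, periods) == period

# Assumed periods giving MCP(BC) = (2, 3) and MCP(BD) = MCP(BCD) = (2, 2).
PERIODS = {"B": (1, 3), "C": (2, 3), "D": (2, 2)}

for name in ("BCD", "BD", "BC"):
    print(name, is_maximal(set(name), (2, 2), PERIODS))
# BCD True, BD True, BC False
```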
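Problem Description (Cont.) • Frequent maximal temporal itemset • A maximal TI $X^{MCP(X)}$ is frequent if $supp(X^{MCP(X)}) \geq min\_supp$ • Property: all temporal sub-itemsets of a frequent maximal temporal itemset are frequent, because each $X'^{MCP(X)}$ with $X' \subset X$ is counted over the same transactions, so $supp(X'^{MCP(X)}) \geq supp(X^{MCP(X)})$ • General temporal association rule $(X \Rightarrow Y)^{MCP(XY)}$ • It is frequent iff $supp((X \cup Y)^{MCP(XY)}) \geq min\_supp$ and $conf((X \Rightarrow Y)^{MCP(XY)}) \geq min\_conf$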
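The rule test can be sketched as two threshold checks over MCP(XY). `is_frequent_rule` is a hypothetical name, and the data are the same made-up periods and transactions used to illustrate the downward-closure counterexample, not the Figure 1 database.

```python
# Hypothetical data, for illustration only.
PERIODS = {"B": (1, 3), "C": (2, 3), "D": (2, 2)}
DB = {
    1: [{"B"}, {"B"}],
    2: [{"B", "C", "D"}, {"B", "C", "D"}],
    3: [{"B"}, {"C"}] * 4,
}

def mcp(itemset, periods):
    """[latest start, earliest end] over the items of the itemset."""
    return (max(periods[i][0] for i in itemset), min(periods[i][1] for i in itemset))

def relative_support(db, itemset, period):
    """Support counted only over transactions in partitions p..q."""
    p, q = period
    transactions = [t for k in range(p, q + 1) for t in db[k]]
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def is_frequent_rule(db, x, y, periods, min_supp, min_conf):
    """(X => Y)^{MCP(XY)} is frequent iff support and confidence both meet thresholds."""
    period = mcp(x | y, periods)
    supp_xy = relative_support(db, x | y, period)
    supp_x = relative_support(db, x, period)
    conf = supp_xy / supp_x if supp_x else 0.0
    return supp_xy >= min_supp and conf >= min_conf

print(is_frequent_rule(DB, {"B", "C"}, {"D"}, PERIODS, 0.30, 0.75))  # True
print(is_frequent_rule(DB, {"B"}, {"C"}, PERIODS, 0.30, 0.75))       # False
```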
Mining General Temporal Association Rules ─ SPF Algorithm • SPF consists of “two” major procedures: • Segmentation (ProcSG) • Progressive Filtering (ProcPF) • First, SPF divides the database into partitions according to the imposed time granularity • Second, SPF applies ProcSG to segment the database into sub-databases • Third, SPF applies ProcPF to obtain the candidate 2-itemsets • Then, it generates all candidate k-itemsets from the candidate (k-1)-itemsets, transforms them into TIs, and generates the corresponding SIs • Finally, it scans the database to determine all frequent TIs and SIs
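The "transform to TIs, generate SIs" step can be sketched as follows. Each candidate becomes the temporal itemset $X^{MCP(X)}$. Every non-empty proper subset of it, taken over the same period, becomes a temporal sub-itemset, and all of these are counted in the final scan. The function name `to_tis_and_sis` and the periods are hypothetical. Including 1-item subsets and skipping candidates with an empty MCP are assumptions; the slides do not specify either.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Period = Tuple[int, int]
Temporal = Tuple[FrozenSet[str], Period]  # (itemset, period)

def to_tis_and_sis(candidates: Iterable[Set[str]],
                   periods: Dict[str, Period]) -> Tuple[Set[Temporal], Set[Temporal]]:
    """Turn candidate itemsets into TIs X^{MCP(X)} and their SIs X'^{MCP(X)}."""
    tis: Set[Temporal] = set()
    sis: Set[Temporal] = set()
    for cand in candidates:
        x = frozenset(cand)
        p = max(periods[i][0] for i in x)
        q = min(periods[i][1] for i in x)
        if p > q:  # assumption: skip items never exhibited together
            continue
        tis.add((x, (p, q)))
        # Every non-empty proper subset, evaluated over the SAME period [p, q].
        for r in range(1, len(x)):
            for sub in combinations(sorted(x), r):
                sis.add((frozenset(sub), (p, q)))
    return tis, sis

PERIODS = {"B": (1, 3), "C": (2, 3), "D": (2, 2)}  # hypothetical
tis, sis = to_tis_and_sis([{"B", "C", "D"}], PERIODS)
print(tis)
print(len(sis))  # 6
```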
SPF Algorithm ─ ProcSG • Segment the database into sub-databases such that the items in each sub-database have either a common starting time or a common ending time • For example: $db^{1,6}$ → $db^{1,3}$, $db^{4,4}$ and $db^{5,6}$
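One possible way to obtain segments with this property is a greedy scan, sketched below. This is an assumption for illustration, not necessarily how ProcSG chooses its cut points. The sketch clips each item's exhibition period to the candidate segment and extends the segment while the clipped periods still share a common start or a common end. The names `segment_ok` and `greedy_segments` and the periods are hypothetical.

```python
from typing import Dict, List, Tuple

Period = Tuple[int, int]

def segment_ok(periods: Dict[str, Period], s: int, e: int) -> bool:
    """True if items exhibited within [s, e], clipped to it, share a start or an end."""
    active = [(a, b) for a, b in periods.values() if a <= e and b >= s]
    starts = {max(a, s) for a, _ in active}
    ends = {min(b, e) for _, b in active}
    return len(starts) <= 1 or len(ends) <= 1

def greedy_segments(periods: Dict[str, Period], first: int, last: int) -> List[Period]:
    """Split partitions first..last into maximal segments, scanning left to right."""
    segments = []
    s = first
    while s <= last:
        e = s  # a one-partition segment always satisfies the property
        while e < last and segment_ok(periods, s, e + 1):
            e += 1
        segments.append((s, e))
        s = e + 1
    return segments

# Hypothetical exhibition periods over partitions 1..6.
PERIODS = {"a": (1, 6), "b": (1, 3), "c": (4, 6), "d": (5, 6)}
print(greedy_segments(PERIODS, 1, 6))  # [(1, 3), (4, 6)]
```

In the first segment every active item starts at partition 1. In the second, the clipped periods of a, c and d all end at partition 6.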
SPF Algorithm ─ ProcPF • After ProcSG has segmented the entire database, ProcPF progressively filters candidate 2-itemsets from one partition to the next within each sub-database
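Progressive filtering can be sketched in the style of partition-based filtering. A pair first seen in partition i is recorded with start = i. After each partition, the pair is dropped if its count falls below min_supp times the number of transactions from its start partition through the current one. The exact thresholds ProcPF uses inside each sub-database are an assumption here. `progressive_filter` and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

def progressive_filter(partitions: List[List[Set[str]]],
                       min_supp: float) -> Dict[Pair, List[int]]:
    """Scan partitions in time order, keeping candidate 2-itemsets as [start, count]."""
    sizes = [len(p) for p in partitions]
    cands: Dict[Pair, List[int]] = {}
    for i, part in enumerate(partitions):
        # Count every 2-itemset occurring in this partition.
        local = Counter(pair for t in part for pair in combinations(sorted(t), 2))
        for pair, n in local.items():
            if pair in cands:
                cands[pair][1] += n      # carry the count forward
            else:
                cands[pair] = [i, n]     # new candidate starts here
        # Drop pairs whose cumulative count is too low since their start partition.
        for pair in list(cands):
            start, count = cands[pair]
            if count < min_supp * sum(sizes[start:i + 1]):
                del cands[pair]
    return cands

# Hypothetical partitions of one sub-database.
PARTS = [[{"a", "b"}, {"a", "c"}], [{"a", "b"}, {"b", "c"}]]
print(progressive_filter(PARTS, 0.5))  # {('a', 'b'): [0, 2], ('b', 'c'): [1, 1]}
```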
An Illustrative Example (SPF) • Illustrative Example: Figure1 • min_supp = 30%, min_conf = 75% • Use ProcSG: database → sub-databases • $db^{1,4}$ → $db^{1,2}$ and $db^{3,4}$ (two sub-databases)
An Illustrative Example (SPF) • Use ProcPF: progressively filter the candidate 2-itemsets
An Illustrative Example (SPF) • After the 1st database scan, • C2 = {AB, BC, BD, CD, CF, EF} • Generate C3: C3 = {BCD} • Transform to TIs and generate SIs • After the 2nd database scan, • Frequent TIs = {$AB^{2,4}$, $BD^{2,2}$, $CF^{1,3}$, $EF^{3,3}$, $BCD^{2,2}$}
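The step from C2 to C3 on this slide can be reproduced with an Apriori-style join and prune. This assumes SPF generates C3 that way, which the slides imply but do not spell out. The name `apriori_gen` is used here for illustration. Joining BC with BD gives BCD, whose 2-subsets are all in C2. Joining CD with CF gives CDF, which is pruned because DF is not in C2.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

def apriori_gen(prev: Iterable[Tuple[str, ...]], k: int) -> Set[Tuple[str, ...]]:
    """Join (k-1)-itemsets sharing their first k-2 items, then prune any
    candidate that has a (k-1)-subset not in `prev`."""
    prev_sorted = sorted({tuple(sorted(x)) for x in prev})  # dedupe, canonical order
    prev_set = set(prev_sorted)
    out = set()
    for a, b in combinations(prev_sorted, 2):
        if a[:k - 2] == b[:k - 2]:
            cand = tuple(sorted(set(a) | set(b)))
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                out.add(cand)
    return out

# C2 as listed on the slide after the 1st database scan.
C2 = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D"), ("C", "F"), ("E", "F")]
print(apriori_gen(C2, 3))  # {('B', 'C', 'D')}
```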
Experiment • Data • |D| = the number of transactions • |T| = the average transaction size • |N| = the number of different items • |L| = the number of potential frequent itemsets • Algorithms compared • SPF • AprioriIP