TEMPORAL ASSOCIATION RULE MINING. Prepared by : Ajit Padukone , Komal Kapoor. Outline. Association Rule Mining Applications Temporal Association Rule Mining Existing Techniques and their Limitations Problem Statement Proposed Approach Finding Maximal Valid Time Intervals
TEMPORAL ASSOCIATION RULE MINING Prepared by: Ajit Padukone, Komal Kapoor
Outline • Association Rule Mining • Applications • Temporal Association Rule Mining • Existing Techniques and their Limitations • Problem Statement • Proposed Approach • Finding Maximal Valid Time Intervals • Finding All Temporally Frequent Itemsets • Future Work
Motivation Association Rule Mining {onion, potatoes} => {burgers} {bread, milk} => {butter} Transaction Data Frequent itemsets : {onion,potatoes,burgers}, {bread,milk,butter}
Applications • Retail Data Analysis • Web Usage Mining • Intrusion Detection • Bioinformatics
Spatial Association Rule Mining • Extract spatial predicates • Find all frequent patterns/predicates/sets • Generate strong rules E.g. {contains(Port), crosses(WaterBody)} Source: Vania Bogorny, Enhancing Spatial Association Rule Mining in Geographic Databases, 2006 - lume.ufrgs.br
Temporal Association Rule Mining Chapter 10 of the reference book defines two types of temporal references: • Transaction Time • Valid Time Time attribute for association rules can also be defined in an analogous way.
Existing Technique – Apriori Algorithm • The Apriori algorithm finds the frequent itemsets in a set of transactions, i.e. the itemsets that satisfy the minimum support threshold. • The support of an itemset is the proportion of transactions in the data set that contain the itemset. Algorithm: • Find all k-itemsets whose transaction support is above the minimum support (frequent k-itemsets) • Generate candidate (k+1)-itemsets from the frequent k-itemsets • Prune the candidate (k+1)-itemsets to obtain the frequent (k+1)-itemsets, i.e. those whose transaction support is above the minimum support • If the number of frequent (k+1)-itemsets is greater than 0, repeat with k+1
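The level-wise loop described above can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: the function name `apriori`, the use of `frozenset` for itemsets, and the choice to treat "above minimum support" as "greater than or equal to" (matching the "30 % (3 transactions)" wording of the example slide) are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (hypothetical helper)."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        # Proportion of transactions that contain the itemset.
        return sum(1 for t in tx if itemset <= t) / n

    # Frequent 1-itemsets.
    current = {}
    for item in {x for t in tx for x in t}:
        s = support(frozenset([item]))
        if s >= min_support:  # assumption: threshold is inclusive
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 1
    while current:
        # Join: unite pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k))}
        # Count support and keep only the frequent candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent
```

The prune step relies on the fact that no superset of an infrequent itemset can be frequent, so most candidates are discarded before their support is ever counted.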
Apriori Algorithm (contd.) Universal Set of Items = { A, B, C, D, E, F, G } Minimum support = 30 % (3 transactions) Table 1: Transaction Database Step 1: 1-itemsets. Non-struck-out ones are frequent. Step 2: 2-itemsets. All 2-itemsets with { D } or { E } as one of the subsets are pruned. Non-struck-out ones are frequent. Step 3: 3-itemsets. All 3-itemsets with non-frequent 2-itemsets as subsets have been pruned. Non-struck-out ones are frequent.
Limitation • The Apriori algorithm finds only the itemsets that satisfy the minimum support threshold over the entire transaction database. • What about itemsets that are highly frequent over a limited period of time but not over the entire set of transactions? E.g. Turkey -> Pumpkin Pie (Thanksgiving) • Conversely, the itemsets extracted by the Apriori algorithm might not be valid throughout the entire period over which association rule mining was performed.
Related Work • X. Chen and I. Petrounias, Mining Temporal Features in Association Rules, Proc. Third European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '99). • Yingjiu Li, Peng Ning, X. Sean Wang, Sushil Jajodia, Discovering Calendar-based Temporal Association Rules, Data & Knowledge Engineering, Special issue: Temporal representation and reasoning, Volume 44, Issue 2, February 2003. • Kang et al., Discovering Flow Anomalies: A SWEET Approach, Eighth IEEE International Conference on Data Mining (ICDM), 2008.
Temporal Association Rule Mining The book also defines ‘time instants’, ‘time intervals’, ‘chronon’ and ‘duration’. E.g. timestamped transactions:
12th Dec-2009, 11:20:24 {bread, milk, butter, cheese, chips}
12th Dec-2009, 11:27:04 {onion, capsicum, potatoes, burgers}
12th Dec-2009, 12:05:44 {soap, shampoo, comb, toothbrush}
The same transactions grouped by Time Unit (chronon), here one hour:
12th Dec-2009, 11th hr {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}}
12th Dec-2009, 12th hr {{soap, shampoo, comb, toothbrush}}
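The grouping of timestamped transactions into time units can be sketched as below. The function name `group_by_hour` and the choice of one hour as the chronon are assumptions made for illustration; any other granularity would work the same way.

```python
from collections import defaultdict
from datetime import datetime


def group_by_hour(stamped_transactions):
    """Group (timestamp, items) pairs into hourly time units (hypothetical helper)."""
    groups = defaultdict(list)
    for ts, items in stamped_transactions:
        # Truncate the timestamp to the start of its hour: that is the time unit.
        unit = ts.replace(minute=0, second=0, microsecond=0)
        groups[unit].append(set(items))
    # Return the time units in chronological order.
    return dict(sorted(groups.items()))


stamped = [
    (datetime(2009, 12, 12, 11, 20, 24), {"bread", "milk", "butter", "cheese", "chips"}),
    (datetime(2009, 12, 12, 11, 27, 4), {"onion", "capsicum", "potatoes", "burgers"}),
    (datetime(2009, 12, 12, 12, 5, 44), {"soap", "shampoo", "comb", "toothbrush"}),
]
units = group_by_hour(stamped)
```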
Problem Statement Definitions: • Support of an itemset I over the interval (ti, tj) = (number of transactions containing I during (ti, tj)) / (total number of transactions during (ti, tj)) • Valid Time Interval for itemset I: a time interval over which the support of I is greater than a threshold (lmin_sup) • Maximal Valid Time Interval: a valid time interval for an itemset I which is not contained in any other valid time interval for I. • Temporally Frequent Itemset: an itemset which has at least one valid time interval associated with it. [Figures, lmin_sup = 0.5: Valid Time Intervals; Maximal Valid Time Intervals]
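The first two definitions translate directly into code. The sketch below assumes a representation that is not in the slides: `units` is a list of time units, each a list of transactions stored as Python sets, and intervals are given as inclusive index pairs. The function names are hypothetical.

```python
def interval_support(itemset, units, i, j):
    """Support of `itemset` over time units i..j (inclusive)."""
    itemset = frozenset(itemset)
    # Pool every transaction that falls inside the interval.
    window = [t for unit in units[i:j + 1] for t in unit]
    if not window:
        return 0.0  # assumption: an interval with no transactions has support 0
    hits = sum(1 for t in window if itemset <= t)
    return hits / len(window)


def is_valid_time_interval(itemset, units, i, j, lmin_sup):
    """A valid interval has support strictly greater than lmin_sup."""
    return interval_support(itemset, units, i, j) > lmin_sup
```

Note that the support is computed over the pooled transactions, so time units with more transactions weigh more than sparse ones.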
Problem Statement (contd.) Given: Transaction data D in the format (TU, {T1, T2, …, Tk}), where TU is a Time Unit and each Ti is a Transaction. Find: All temporally frequent itemsets along with their maximal valid time intervals.
Problem Statement (contd.) So now, along with finding the frequent itemsets, we have to find the maximal valid time intervals for each frequent itemset. Complexity of the naive approach for finding the maximal valid time intervals of each frequent itemset: O(n^2), where n = |D|
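One way to picture the naive approach is to test every one of the O(n^2) intervals and keep the maximal ones. The sketch below is an illustration under assumptions, not the presenters' code: `hits[k]` is taken to be the number of transactions in time unit k that contain the itemset, and `totals[k]` the number of transactions in time unit k.

```python
def maximal_valid_intervals_naive(hits, totals, lmin_sup):
    """Return maximal valid intervals as inclusive (start, end) index pairs."""
    n = len(totals)
    best_end = []
    for i in range(n):
        h = t = 0
        best = -1  # furthest valid end for start i, -1 if none
        for j in range(i, n):
            # Running counts give the support of (i, j) in constant time.
            h += hits[j]
            t += totals[j]
            if t > 0 and h / t > lmin_sup:
                best = j
        best_end.append(best)

    # (i, best_end[i]) is maximal iff no earlier start reaches as far.
    result = []
    furthest = -1
    for i, j in enumerate(best_end):
        if j > furthest:
            result.append((i, j))
            furthest = j
    return result
```

Because validity is not monotone in the interval length (a long interval can be valid while a shorter prefix of it is not), the inner loop cannot stop at the first invalid end point; it has to scan to the end of the data.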
Finding Maximal Valid Time Intervals Definitions: • Valid/Supporting Time Unit for I: a time unit during which the support of I is greater than lmin_sup. • Non-valid/Non-supporting Time Unit for I: a time unit during which the support of I is not greater than lmin_sup.
Finding Maximal Valid Time Intervals Lemma 1: Each valid time interval (TUi, TUj) for I must contain at least one valid/supporting time unit for I. Lemma 2: If an interval (TUi, TUj) is not valid for I, then the interval (TUi, TUj+1), where TUj+1 is a non-valid time unit, cannot be valid. Lemma 3: If an interval (TUi, TUj) is valid for I, then the interval (TUi, TUj+1), where TUj+1 is a valid time unit, is also valid. Using Lemma 3, collapse each continuous run of supporting time units into one unit with the average density.
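The collapsing step suggested by Lemma 3 can be sketched as follows. The representation is an assumption: `hits[k]` and `totals[k]` are the per-time-unit counts of transactions containing I and of all transactions. The sketch keeps summed counts for each merged block instead of a single averaged number, so that the block's density (`hits / total`) is the transaction-weighted average and later support computations stay exact.

```python
def collapse_supporting_runs(hits, totals, lmin_sup):
    """Merge each run of consecutive supporting time units into one block.

    Returns a list of (start, end, hits, total, is_supporting) tuples,
    where start and end are inclusive time-unit indices.
    """
    blocks = []
    for k, (h, t) in enumerate(zip(hits, totals)):
        supporting = t > 0 and h / t > lmin_sup
        if supporting and blocks and blocks[-1][4]:
            # Extend the previous supporting block (valid by Lemma 3).
            start, _, bh, bt, _ = blocks[-1]
            blocks[-1] = (start, k, bh + h, bt + t, True)
        else:
            # Non-supporting units, and the first unit of a run, start a block.
            blocks.append((k, k, h, t, supporting))
    return blocks
```

Searching over blocks instead of raw time units shrinks the input whenever supporting units cluster together, which is the typical case for seasonal itemsets.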
Finding Maximal Valid Time Intervals (contd.) Given: Itemset I, Transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup
Part 1: Find_maximal_valid_time_intervals(I, D, lmin_sup)
Find STU = {TUa1, TUa2, …, TUan} such that each TUak is a supporting time unit for I
For i = 1 to n
    For j = n to i+1
        IF is_valid_time_interval(TUai, TUaj, D, lmin_sup)
            break;
        End
    End
End
(uses Lemmas 1 and 3)
Finding Maximal Valid Time Intervals (contd.) Given: Itemset I, Transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup
Part 2:
start = TUa(i-1) + 1, finish = TUa(j+1) - 1
low = start, high = TUaj
While low <= TUai and high <= finish
    IF is_valid_time_interval(low, high)
        high = high + 1
    Else
        low = low + 1
    End
End
(uses Lemma 2)
Finding Maximal Valid Time Intervals (contd.) Further iterations… Complexity: O(n'^2 + n)
Finding All Temporally Frequent Itemsets Given: Transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup, UI (Universal Itemset)
C -> Generate_1-item_candidate_sets(UI, D)
Interval = (1, |D|)
While (|C| > 0)
    For each candidate set c in C
        max_valid_intervals -> find_maximal_valid_time_interval(c, D, lmin_sup)
        If |max_valid_intervals| > 0
            temp_freq_sets.add(<c, max_valid_intervals>)
        End
    End
    If |temp_freq_sets| > 0
        C -> generate_new_candidate_sets(temp_freq_sets, D, lmin_sup)
    Else
        C -> null
    End
End
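A compact Python sketch of this level-wise search is given below. It is an illustration under assumptions rather than the presenters' implementation: `units` is a list of time units, each a list of transactions stored as sets; the interval finder is a brute-force stand-in for the two-part procedure of the earlier slides; and candidate generation is a plain pairwise join. All function names are hypothetical.

```python
from itertools import combinations


def find_maximal_valid_time_intervals(itemset, units, lmin_sup):
    """Brute-force stand-in: maximal valid intervals as inclusive index pairs."""
    hits = [sum(1 for t in u if itemset <= t) for u in units]
    totals = [len(u) for u in units]
    result, furthest = [], -1
    for i in range(len(units)):
        h = t = 0
        best = -1
        for j in range(i, len(units)):
            h += hits[j]
            t += totals[j]
            if t > 0 and h / t > lmin_sup:
                best = j
        # Keep (i, best) only if no earlier start already reaches as far.
        if best > furthest:
            result.append((i, best))
            furthest = best
    return result


def temporally_frequent_itemsets(units, lmin_sup):
    """Return {itemset: maximal valid intervals} for all temporally frequent itemsets."""
    items = sorted({x for u in units for t in u for x in t})
    candidates = [frozenset([x]) for x in items]
    found = {}
    k = 1
    while candidates:
        level = {}  # temporally frequent k-itemsets of this pass only
        for c in candidates:
            intervals = find_maximal_valid_time_intervals(c, units, lmin_sup)
            if intervals:
                level[c] = intervals
        found.update(level)
        # Join temporally frequent k-itemsets into (k+1)-candidates.
        candidates = list({a | b for a, b in combinations(list(level), 2)
                           if len(a | b) == k + 1})
        k += 1
    return found
```

The join is safe because any interval that is valid for an itemset is also valid for each of its subsets, so a temporally frequent (k+1)-itemset can only arise from temporally frequent k-itemsets. The per-level dictionary `level` plays the role of `temp_freq_sets` and is reset on every pass, which guarantees that the loop terminates.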
Future Work • Find cyclic valid time intervals • Identify interesting maximal valid time intervals