An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns. Zhigang Zheng, Yanchang Zhao, Ziye Zuo, and Longbing Cao. PAKDD 2010.
Outline • Motivation • Problem Definition • GA-Based Negative Sequential Pattern Mining Algorithm • Experiments • Conclusion
Motivation • Negative sequential patterns focus on negative relationships between itemsets. • Absent items are taken into consideration. • Drawback • The search space for mining negative patterns is much larger than that for positive ones. • Huge numbers of negative candidates are generated. • Ex. 10 distinct frequent positive 1-item patterns give 10^3 3-item positive candidates, but 20^3 3-item negative candidates.
(Cont.) • With a genetic algorithm (GA), each generation passes good genes on to the next generation through crossover and mutation, without generating candidates. • A dynamic fitness function and a pruning method are used to improve performance.
Problem Definition • A sequence is an ordered list of elements. • An element ei consists of one or more items. • Ex. <a b (c,d) f> consists of 4 elements, and (c,d) is an element that includes two items. • A positive sequence: s = <a b c d> • A negative sequence: s = <a b ¬c d> or <a b ¬(c,d) f> • The sequence <a b f> is the max. positive subsequence of the sequences <a b ¬c f> and <a b ¬(c,d) f>.
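The definitions above can be made concrete with a small sketch. The representation used here (an `Element` holding a set of items plus a `negative` flag, and the helper names `pos`, `neg` and `max_positive_subsequence`) is an assumption made for illustration and is not given on the slides.

```python
from typing import FrozenSet, List, NamedTuple


class Element(NamedTuple):
    """One element of a sequence: a set of items, marked positive or negative."""
    items: FrozenSet[str]
    negative: bool = False


def pos(*items: str) -> Element:
    """Build a positive element, e.g. pos("c", "d") for (c,d)."""
    return Element(frozenset(items), False)


def neg(*items: str) -> Element:
    """Build a negative element, e.g. neg("c", "d") for ¬(c,d)."""
    return Element(frozenset(items), True)


def max_positive_subsequence(seq: List[Element]) -> List[Element]:
    """Drop the negative elements and keep the positive ones in order."""
    return [e for e in seq if not e.negative]


# <a b ¬(c,d) f> from the slide
s = [pos("a"), pos("b"), neg("c", "d"), pos("f")]
assert max_positive_subsequence(s) == [pos("a"), pos("b"), pos("f")]  # <a b f>
```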
(Cont.) • Negative sequential pattern • Support: s_sup ≥ min_sup • Items in the same element must be all positive or all negative. Ex. <a (a,¬b) c> is not allowed. • Two or more consecutive negative elements are not allowed. • For each negative item in a negative pattern, its positive counterpart must be frequent. • Negative Matching
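The three format constraints listed above could be checked roughly as follows. This is a sketch under assumptions: an element is modelled as a list of `(item, is_negative)` pairs, `frequent_items` is a hypothetical precomputed set of frequent positive items, and the support test (s_sup ≥ min_sup) is left out because negative matching is not defined on this slide.

```python
from typing import List, Set, Tuple

SignedItem = Tuple[str, bool]   # (item, is_negative); assumed encoding
Element = List[SignedItem]


def is_valid_negative_pattern(pattern: List[Element],
                              frequent_items: Set[str]) -> bool:
    """Check the format constraints only, not the support threshold."""
    prev_negative = False
    for element in pattern:
        signs = {is_neg for _, is_neg in element}
        # Items in one element must be all positive or all negative.
        if len(signs) != 1:
            return False
        negative = signs.pop()
        # Two negative elements in a row are not allowed.
        if negative and prev_negative:
            return False
        # The positive counterpart of every negative item must be frequent.
        if negative and any(item not in frequent_items for item, _ in element):
            return False
        prev_negative = negative
    return True


P, N = False, True
frequent = {"a", "b", "c", "d"}
# <a b ¬c d> satisfies all three constraints.
print(is_valid_negative_pattern([[("a", P)], [("b", P)], [("c", N)], [("d", P)]], frequent))
# <a (a,¬b) c> mixes signs inside one element.
print(is_valid_negative_pattern([[("a", P)], [("a", P), ("b", N)], [("c", P)]], frequent))
```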
GA-Based Negative Sequential Pattern Mining Algorithm • Population and Selection • Crossover and Mutation • Pruning • Algo. Flow
Population and Selection • Initial population: all 1-item frequent positive and negative patterns. • Selection: the top K individuals with the highest dynamic fitness are selected. • A fitness function is used to evaluate the individuals and decide which are best for the next generation.
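Top-K selection can be sketched in a few lines. The slides do not give the dynamic fitness formula, so the `fitness` mapping below holds placeholder numbers, and the function name `select_top_k` and the encoding of an individual as a tuple of strings are likewise assumptions.

```python
import heapq
from typing import Dict, List, Tuple

Individual = Tuple[str, ...]   # e.g. ("a", "¬b"); assumed encoding


def select_top_k(population: List[Individual],
                 fitness: Dict[Individual, float],
                 k: int) -> List[Individual]:
    """Return the k individuals with the highest fitness, best first."""
    return heapq.nlargest(k, population, key=lambda ind: fitness[ind])


population = [("a",), ("¬b",), ("c",), ("¬d",)]
# Placeholder fitness values, not taken from the paper.
fitness = {("a",): 0.9, ("¬b",): 0.4, ("c",): 0.7, ("¬d",): 0.1}
assert select_top_k(population, fitness, 2) == [("a",), ("c",)]
```

In a full GA loop the fitness values would be recomputed each generation, which is presumably what "dynamic" refers to.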
Crossover and Mutation • Crossover • Parents of different lengths are allowed to cross over with each other. • Crossover may happen at different positions in each parent, producing sequential patterns of varied lengths.
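One way to realise crossover at different positions is to cut each parent independently and swap the tails. The sketch below is an illustration under that assumption, not the authors' exact operator; a pattern is simplified to a list of single-item elements written as strings, and the names `crossover_at` and `crossover` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Pattern = List[str]   # simplified: each element is one signed item, e.g. "¬c"


def crossover_at(p1: Pattern, p2: Pattern, i: int, j: int) -> Tuple[Pattern, Pattern]:
    """Cut p1 after i elements and p2 after j elements, then swap the tails."""
    return p1[:i] + p2[j:], p2[:j] + p1[i:]


def crossover(p1: Pattern, p2: Pattern,
              rng: Optional[random.Random] = None) -> Tuple[Pattern, Pattern]:
    """Pick independent cut points so both children are non-empty.

    Assumes both parents contain at least one element.
    """
    rng = rng or random.Random()
    i = rng.randint(1, len(p1))   # randint is inclusive at both ends
    j = rng.randint(1, len(p2))
    return crossover_at(p1, p2, i, j)


# Parents of length 3 produce children of lengths 2 and 4.
child1, child2 = crossover_at(["b", "¬c", "a"], ["d", "c", "¬e"], 1, 2)
assert child1 == ["b", "¬e"]
assert child2 == ["d", "c", "¬c", "a"]
```

Children produced this way can violate the format constraints (for example two consecutive negative elements), so a validity check would still be needed before they enter the population.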
(Cont.) • Mutation • Mutation helps keep the population from contracting to one particular frequent pattern. • Ex. <b ¬c a> → <b d ¬e>
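A simple per-element mutation can be sketched as follows. The mutation rule (replace each element independently with probability `rate` by a random entry from `item_pool`) is an assumption chosen for illustration; the slides do not specify how elements are mutated, and `mutate`, `rate` and `item_pool` are hypothetical names.

```python
import random
from typing import List

Pattern = List[str]   # simplified: each element is one signed item, e.g. "¬c"


def mutate(pattern: Pattern, item_pool: List[str], rate: float,
           rng: random.Random) -> Pattern:
    """Replace each element with a random pool entry with probability `rate`."""
    return [rng.choice(item_pool) if rng.random() < rate else element
            for element in pattern]


# Assumed pool: the frequent positive items and their negations.
pool = ["a", "b", "d", "e", "¬a", "¬b", "¬d", "¬e"]
rng = random.Random(42)
mutated = mutate(["b", "¬c", "a"], pool, rate=0.5, rng=rng)
assert len(mutated) == 3                       # length is preserved
assert all(e in pool or e in ["b", "¬c", "a"] for e in mutated)
```

As with crossover, a mutated pattern may break the format constraints and would need to be validated or discarded.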
Pruning • Ex. Let c = <e1 e2 e3 … en>, and let c' = <ei ej … ek> be the max. positive subsequence of c, where 0 < i ≤ j ≤ k ≤ n. If c' is not frequent, c must be infrequent and should be pruned.
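The pruning rule can be sketched as a support check on the max. positive subsequence. The data layout (a database as a list of sequences of item sets, a candidate as a list of `(items, is_negative)` pairs) and the names `contains`, `support` and `should_prune` are assumptions for illustration; support is taken here as the fraction of database sequences that contain the positive pattern.

```python
from typing import FrozenSet, List, Tuple

DataSeq = List[FrozenSet[str]]
Candidate = List[Tuple[FrozenSet[str], bool]]   # (items, is_negative)


def contains(data_seq: DataSeq, pattern: List[FrozenSet[str]]) -> bool:
    """True if every pattern element is a subset of a later data element, in order."""
    it = iter(data_seq)   # shared iterator enforces left-to-right matching
    return all(any(p <= d for d in it) for p in pattern)


def support(db: List[DataSeq], pattern: List[FrozenSet[str]]) -> float:
    """Fraction of sequences in db that contain the positive pattern."""
    return sum(contains(s, pattern) for s in db) / len(db)


def should_prune(candidate: Candidate, db: List[DataSeq], min_sup: float) -> bool:
    """Prune when the max. positive subsequence is itself infrequent."""
    positive_part = [items for items, negative in candidate if not negative]
    return support(db, positive_part) < min_sup


def fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


db = [
    [fs("a"), fs("b"), fs("f")],
    [fs("a"), fs("f")],
    [fs("b"), fs("a")],
    [fs("a"), fs("b"), fs("c", "d"), fs("f")],
]
# <a b ¬c f>: its max. positive subsequence <a b f> occurs in 2 of 4 sequences.
c = [(fs("a"), False), (fs("b"), False), (fs("c"), True), (fs("f"), False)]
assert should_prune(c, db, min_sup=0.6)
assert not should_prune(c, db, min_sup=0.5)
```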
Conclusion • Open question: in the crossover process, how are the crossover positions decided?