Mining Sequential Patterns with Constraints in Large Databases. Jian Pei, Jiawei Han, Wei Wang. Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02). Adviser: Jia-Ling Koh. Speaker: Yu-ting Kung.
Introduction • In past studies, two problems remain: • Many practical constraints are not covered • There is no systematic method for pushing various constraints into the mining process • In this paper: • A framework, prefix-growth, is developed based on the prefix-monotone property • Under this framework, constraints can be pushed deep into sequential pattern mining effectively and efficiently
Categories of constraints • Item constraints • For example: • Length constraint • The number of transactions or occurrences of items… • For example:
Categories of constraints (Cont.) • Super-pattern constraint, where P is a given set of patterns • For example: • Aggregate constraint • Aggregate function: sum, avg, max, min, etc. • For example: we would like sequential patterns where the average price of all the items in each pattern is over $100
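The average-price example above can be sketched as a simple predicate. This is a minimal illustration, not the paper's implementation: the `PRICE` table, the item names, and the helper names `avg_price` and `satisfies_avg` are all hypothetical, and a pattern is assumed to be a list of itemsets.

```python
from statistics import mean

# Hypothetical price table; the items and values are illustrative only.
PRICE = {"tv": 400, "cable": 20, "dvd": 150, "battery": 5}

def avg_price(pattern):
    """Average price over all items of a pattern (a list of itemsets)."""
    return mean(PRICE[item] for itemset in pattern for item in itemset)

def satisfies_avg(pattern, threshold=100):
    """Aggregate constraint: the average item price must exceed the threshold."""
    return avg_price(pattern) > threshold

p1 = [{"tv"}, {"dvd"}]                # two expensive items
p2 = [{"tv"}, {"cable", "battery"}]   # one expensive item, two cheap ones
p3 = [{"cable"}, {"battery"}]         # cheap items only

for p in (p1, p2, p3):
    print(p, satisfies_avg(p))
```

Note that adding a cheap item to a pattern can lower its average and adding an expensive one can raise it, which is why an avg constraint is harder to push into mining than a length or item constraint.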
Categories of constraints (Cont.) • Regular expression constraints • Constraints specified as a regular expression • For example: • Duration constraints • Gap constraints • For example: find purchasing patterns such that “the gap between every two consecutive purchases is less than 1 month”
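Duration and gap constraints can be illustrated on the timestamps of one occurrence of a pattern. The sketch below assumes timestamps are given in days and approximates "1 month" as 30 days; the function names are hypothetical and not taken from the paper.

```python
def duration(timestamps):
    """Time between the first and the last element of an occurrence."""
    return timestamps[-1] - timestamps[0]

def max_gap(timestamps):
    """Largest time difference between two consecutive elements."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=0)

def gap_ok(timestamps, limit_days=30):
    """Gap constraint: every consecutive gap is below the limit (assumed 30 days)."""
    return max_gap(timestamps) < limit_days

purchases = [0, 10, 35]   # days on which the pattern's elements occurred
print(duration(purchases), max_gap(purchases), gap_ok(purchases))
```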
Characterization of constraints • Anti-monotonic • A constraint C is anti-monotonic if, whenever a sequence a satisfies C, every non-empty subsequence of a also satisfies C • For example: dur(a) < 3 • Monotonic • A constraint CM is monotonic if, whenever a sequence a satisfies CM, every super-sequence of a also satisfies CM • For example: len(a) >= 10, super-pattern constraints • Succinct constraint • For example: item constraint
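The two definitions above can be spot-checked by brute force on a single sequence. This is only a sanity check on one example, not a proof of either property. Sequences are simplified to tuples of single items, and all function names are hypothetical.

```python
from itertools import combinations

def subsequences(seq):
    """All non-empty subsequences of seq, order preserved (includes seq itself)."""
    for k in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), k):
            yield tuple(seq[i] for i in idx)

def looks_anti_monotonic(constraint, seq):
    """If seq satisfies the constraint, every non-empty subsequence must too."""
    return (not constraint(seq)) or all(constraint(s) for s in subsequences(seq))

def looks_monotonic(constraint, seq):
    """If some subsequence satisfies the constraint, its super-sequence seq must too."""
    return constraint(seq) or not any(constraint(s) for s in subsequences(seq))

def at_most_3(s):
    return len(s) <= 3    # length upper bound: anti-monotonic

def at_least_2(s):
    return len(s) >= 2    # length lower bound: monotonic

print(looks_anti_monotonic(at_most_3, ("a", "b", "c")))
print(looks_monotonic(at_least_2, ("a", "b", "c")))
print(looks_anti_monotonic(at_least_2, ("a", "b", "c")))
```

The last call returns `False` because the subsequence `("a",)` violates the lower bound even though the full sequence satisfies it.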
Prefix-Monotone Property • Prefix anti-monotonic: for each sequence a satisfying the constraint, so does every prefix of a • Prefix monotonic: for each sequence a satisfying the constraint, so does every sequence having a as a prefix • A constraint is called prefix-monotone if it is prefix anti-monotonic or prefix monotonic
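A small sketch can show why the prefix-based property is weaker than plain anti-monotonicity. The constraint `starts_with_a` is a hypothetical stand-in for a regular-expression-style constraint, and sequences are again simplified to tuples of single items.

```python
def prefixes(seq):
    """All non-empty prefixes of seq, shortest first."""
    return [seq[:k] for k in range(1, len(seq) + 1)]

def holds_on_all_prefixes(constraint, seq):
    """Spot check for prefix anti-monotonicity on one sequence."""
    return all(constraint(p) for p in prefixes(seq))

def starts_with_a(seq):
    """Hypothetical constraint: the pattern must begin with item 'a'."""
    return seq[0] == "a"

seq = ("a", "b", "c")
print(starts_with_a(seq))                         # the sequence satisfies it
print(holds_on_all_prefixes(starts_with_a, seq))  # so does every prefix
print(starts_with_a(("b", "c")))                  # but this subsequence does not
```

Here the constraint is not anti-monotonic, since the subsequence `("b", "c")` fails it, yet it still holds on every prefix. Constraints of this kind are what the prefix-monotone property is meant to cover.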
Theorem • All the commonly used constraints discussed above, except for g_sum and average, have the prefix-monotone property
Push Prefix-Monotone Constraints into Sequential Pattern Mining • Regular expression • Min_sup = 2
Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.) • Mining steps: • Find length-1 sequential patterns and remove irrelevant sequences • Patterns <a>, <b>, <c>, <d>, <e> are identified as length-1 patterns; the infrequent item <f> is removed • The sequence with S_id = 10 is removed because it fails the constraint • Divide the set of sequential patterns into non-overlapping subsets • prefix <a>, prefix <b>, prefix <c>, prefix <d>, prefix <e>
Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.) • Construct the <a>-projected database and mine it • SDB|<a> = {<(_b)(bc)dd>, <(_e)(abc)(dd)>, <ddcb>} • Locally frequent items that satisfy the constraint: • prefix <ab>, prefix <ac>, prefix <ad> • Recursive mining • Form the projected databases for prefixes <ab>, <ac>, and <ad> and mine the patterns with each prefix • Final patterns output • {<a(bc)d>, <add>}
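The mining steps above (find frequent items, check the constraint, project, recurse) can be sketched in a simplified form. This is not the paper's algorithm as published: it assumes every element of a sequence is a single item (no itemsets such as (bc)), it handles only a prefix anti-monotonic constraint, and the toy database, the constraint, and all names are hypothetical.

```python
from collections import Counter

def project(db, item):
    """Suffixes that follow the first occurrence of item in each sequence."""
    out = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                out.append(suffix)
    return out

def prefix_growth(db, min_sup, constraint, prefix=()):
    """Yield (pattern, support) for frequent patterns that satisfy the constraint."""
    support = Counter()
    for seq in db:
        support.update(set(seq))  # count each item once per sequence
    for item, sup in sorted(support.items()):
        if sup < min_sup:
            continue  # locally infrequent item
        pattern = prefix + (item,)
        if not constraint(pattern):
            continue  # prune: with a prefix anti-monotonic constraint, no extension can pass
        yield pattern, sup
        yield from prefix_growth(project(db, item), min_sup, constraint, pattern)

# Toy database, unrelated to the one on the slide.
db = [("a", "b", "c", "d"), ("a", "c", "d"), ("b", "a", "d"), ("a", "b", "d")]

def starts_with_a(pattern):
    """Hypothetical prefix anti-monotonic constraint."""
    return pattern[0] == "a"

patterns = dict(prefix_growth(db, 2, starts_with_a))
for pattern, sup in patterns.items():
    print(pattern, sup)
```

Because the constraint is checked before projecting, the branches for prefixes `b`, `c`, and `d` are never expanded, which is the kind of saving that pushing the constraint into the mining process is meant to provide.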
Handling Tough Aggregate Constraints • Constraint: • Min_sup = 2 • An item i is called a small item if its value i.value <= 25; otherwise, it is called a big item
Experimental results • Compare the efficiency of mining sequential patterns without constraint
Experimental results (Cont.) • Compare the efficiency of mining sequential patterns with constraints • Capability of GSP and prefix-growth in pushing an anti-monotone constraint (dur(a) <= t)
Experimental results (Cont.) • Experimental results on mining with regular expression constraint
Experimental results (Cont.) • Scalability of prefix-growth with constraint avg(a) ≤ v • Number of projected databases in prefix-growth with constraint avg(a) ≤ v
Experimental results (Cont.) • Scalability of prefix-growth w.r.t. support threshold
Experimental results (Cont.) • Scalability of prefix-growth w.r.t. database size
Conclusion • The prefix-monotone property covers many commonly used constraints • Experimental results and the performance study show that prefix-growth is efficient and scalable in mining large databases