CONTOUR: an efficient algorithm for discovering discriminating subsequences

Jianyong Wang, Yuzhou Zhang, Lizhu Zhou, George Karypis, Charu C. Aggarwal
DMKD, Vol. 18, No. 1, 2009, pp. 1-29.
Presenter: Wei-Shen Tai, 2009/3/11

Outline
• Introduction
• Problem formulation
• Efficiently mining summarization subsequences
• Summarization subsequence based clustering
• Empirical results
• Conclusions
• Comments

Motivation
• Make frequent sequence mining more efficient.
• One way to obtain a subset of useful frequent subsequences is to apply an existing frequent sequence mining algorithm to find the complete set of frequent subsequences and then select the interesting ones from it.
• However, mining the complete set of frequent subsequences is very time consuming for large sequence databases.

Objective
• Design effective search space pruning methods.
• Find summarization subsequences that represent the original input sequences.

Problem formulation
• Subsequence
  • If sequence Sα is contained in sequence Sβ, Sα is called a subsequence of Sβ (e.g., BAC is a subsequence of CABAC).
• Absolute support of a sequence
  • The number of input sequences in SDB that contain Sα, denoted sup_SDB(Sα).
• Summarization subsequences
  • A set of representative subsequences that serves as a concise summary of the input sequences.
• Internal similarity of a micro-cluster Cλ
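The two basic definitions above can be sketched in a few lines. This is a minimal illustration, assuming each sequence is a string of single-character events; the function names are hypothetical and not taken from the paper.

```python
def is_subsequence(s_alpha: str, s_beta: str) -> bool:
    """True if s_alpha is contained in s_beta: its events appear in s_beta
    in the same order, not necessarily contiguously."""
    it = iter(s_beta)
    # `e in it` consumes the iterator up to the first match, preserving order
    return all(e in it for e in s_alpha)


def absolute_support(sdb: list[str], s_alpha: str) -> int:
    """sup_SDB(s_alpha): number of input sequences in sdb that contain s_alpha."""
    return sum(1 for s in sdb if is_subsequence(s_alpha, s))


sdb = ["CABAC", "ABCB", "BAC", "CBA"]
print(is_subsequence("BAC", "CABAC"))  # True
print(absolute_support(sdb, "BA"))     # 3 (CABAC, BAC, CBA)
```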

Efficiently mining summarization subsequences
• Frequent subsequence enumeration
  • For each prefix, the mining algorithm builds its projected database and computes the set of locally frequent events.
  • Example (min_sup = 2): the locally frequent events in SDB|AA are {C, A}, but they cannot be used to extend the prefix AA (to AAC or AAA).
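The prefix-growth step described above can be sketched as a PrefixSpan-style depth-first enumeration. This is a minimal sketch, not CONTOUR itself: it enumerates every frequent subsequence and omits CONTOUR's pruning. It assumes sequences are strings of single-character events, and all function names are hypothetical.

```python
from collections import Counter


def project(sdb, event):
    """Projected database: for each sequence containing `event`, keep the
    suffix after its first occurrence (possibly an empty string)."""
    out = []
    for s in sdb:
        i = s.find(event)
        if i != -1:
            out.append(s[i + 1:])
    return out


def locally_frequent_events(projected, min_sup):
    """Events occurring in at least min_sup sequences of the projected database."""
    counts = Counter()
    for s in projected:
        counts.update(set(s))  # count each event at most once per sequence
    return sorted(e for e, c in counts.items() if c >= min_sup)


def enumerate_frequent(sdb, min_sup, prefix="", out=None):
    """Depth-first prefix growth; returns {frequent subsequence: absolute support}."""
    if out is None:
        out = {}
    for e in locally_frequent_events(sdb, min_sup):
        new_prefix = prefix + e
        projected = project(sdb, e)
        out[new_prefix] = len(projected)  # one projected entry per supporting sequence
        enumerate_frequent(projected, min_sup, new_prefix, out)
    return out


print(enumerate_frequent(["AB", "AB", "BA"], 2))  # {'A': 3, 'AB': 2, 'B': 3}
```

Empty suffixes are kept in the projected database on purpose: the sequence still supports the prefix, even though it cannot extend it.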

Closed sequence-based optimization
• BackScan search space pruning
  • Semi-maximum period: a piece of an input sequence between the first instance and the last instance of a prefix P (for example, prefix BB).
  • A prefix of length m has a first, a second, up to an m-th semi-maximum period.
  • Example: event A appears in each of the first semi-maximum periods of BB, so every input sequence that contains BB also contains ABB. ABB is the longer sequence and has the same support, so BB does not need to be grown further.
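A possible sketch of the BackScan check is shown below. It assumes CONTOUR uses the same semi-maximum period definitions as the BIDE closed-sequence miner: the first period is the piece before the first last-in-first appearance of the prefix, and the i-th period (i > 1) is the piece between the end of the first instance of e1…e(i-1) and the i-th last-in-first appearance. Sequences are strings of single-character events; all names are hypothetical.

```python
def first_instance_ends(seq, prefix):
    """End index of the first (leftmost, greedy) instance of e1..ek for each
    k; None if seq does not contain `prefix`."""
    ends, pos = [], -1
    for e in prefix:
        pos = seq.find(e, pos + 1)
        if pos == -1:
            return None
        ends.append(pos)
    return ends


def semi_maximum_periods(seq, prefix):
    """The len(prefix) semi-maximum periods of `prefix` in seq (BIDE-style)."""
    ends = first_instance_ends(seq, prefix)
    if ends is None:
        return None
    n = len(prefix)
    # last-in-first appearances: scan backwards from the end of the first instance
    lf = [0] * n
    lf[n - 1] = ends[n - 1]
    for i in range(n - 2, -1, -1):
        lf[i] = seq.rfind(prefix[i], 0, lf[i + 1])
    periods = [seq[:lf[0]]]
    for i in range(1, n):
        periods.append(seq[ends[i - 1] + 1:lf[i]])
    return periods


def backscan_prune(sdb, prefix):
    """Return (i, events) if some event occurs in every i-th semi-maximum
    period of `prefix` (i is 1-based), meaning the prefix need not be grown;
    otherwise None. Sequences not containing `prefix` are ignored."""
    if not prefix:
        return None
    all_periods = [p for p in (semi_maximum_periods(s, prefix) for s in sdb)
                   if p is not None]
    if not all_periods:
        return None
    for i in range(len(prefix)):
        common = set.intersection(*(set(p[i]) for p in all_periods))
        if common:
            return i + 1, common
    return None


print(backscan_prune(["ABCBA", "ABCB", "ACBB", "CAB"], "BB"))  # (1, {'A'})
```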

Unpromising projected sequence pruning
• Current Frequent Covering Subsequence (CFCS)
  • For an input sequence Si, the frequent subsequence supported by Si that has the largest weight among those discovered so far, denoted CFCSi.
• Trivial projected sequence
  • A short projected sequence may not contain enough events to generate any summarization subsequence, so it can be pruned.
• Example: prefix p = C, with support 5
  • SDB|p = {PS1 = ABAC, PS3 = B, PS4 = BAC, PS5 = BBA, PS6 = BC}
  • CFCS1 = ABA:3, CFCS3 = ABCB:2, CFCS4 = BAC:2, CFCS5 = ABA:3, and CFCS6 = ABCB:2 (subsequence:support).
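The CFCS bookkeeping and the trivial-sequence test can be sketched as follows. The slides do not give CONTOUR's weight formula, so this sketch assumes a hypothetical weight of length × support. Under that assumption, any subsequence grown from prefix p inside a projected sequence PSi is at most |p| + |PSi| long and has support at most sup(p), which gives an upper bound to compare against CFCSi. All names are hypothetical.

```python
def weight(length: int, support: int) -> int:
    """HYPOTHETICAL subsequence weight: length x support (not the paper's formula)."""
    return length * support


def update_cfcs(cfcs, covering_ids, subseq, support):
    """Make `subseq` the current frequent covering subsequence of every input
    sequence in covering_ids whose recorded CFCS has a smaller weight."""
    w = weight(len(subseq), support)
    for sid in covering_ids:
        best = cfcs.get(sid)
        if best is None or w > weight(len(best[0]), best[1]):
            cfcs[sid] = (subseq, support)


def is_trivial(projected_seq, prefix, prefix_support, cfcs_entry):
    """True if no subsequence grown from `prefix` inside projected_seq can
    outweigh the sequence's CFCS (upper bound: length grows by at most
    len(projected_seq), support never exceeds prefix_support)."""
    if cfcs_entry is None:
        return False
    bound = weight(len(prefix) + len(projected_seq), prefix_support)
    return bound <= weight(len(cfcs_entry[0]), cfcs_entry[1])


cfcs = {}
update_cfcs(cfcs, [1, 3], "ABA", 3)   # weight 9
update_cfcs(cfcs, [1], "AB", 4)       # weight 8: CFCS of sequence 1 unchanged
print(cfcs[1])                        # ('ABA', 3)
print(is_trivial("B", "C", 2, cfcs[1]))  # True: bound 4 <= 9
```

The bound is only valid because the assumed weight grows monotonically with both length and support; a different weighting scheme would need its own bound.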

Further discussions
• Event weight assignment
  • The weighting scheme is similar in spirit to TF-IDF.
• Multiple summarization subsequence mining
  • An input sequence may support multiple summarization subsequences.
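To illustrate what a TF-IDF-like event weight could look like, the sketch below uses a standard IDF-style stand-in, log(N / df(e)), where df(e) is the number of input sequences containing event e. This formula and the summed subsequence weight are assumptions for illustration; the slides do not give CONTOUR's actual weighting.

```python
import math
from collections import Counter


def idf_event_weights(sdb):
    """HYPOTHETICAL IDF-style event weights: log(N / df(e)).
    Events occurring in fewer input sequences receive larger weights."""
    n = len(sdb)
    df = Counter()
    for s in sdb:
        df.update(set(s))  # document frequency: count each event once per sequence
    return {e: math.log(n / c) for e, c in df.items()}


def subsequence_weight(subseq, event_weights):
    """HYPOTHETICAL: weight of a subsequence as the sum of its event weights."""
    return sum(event_weights[e] for e in subseq)


w = idf_event_weights(["AB", "AC", "A"])
print(w["A"])                       # 0.0: A occurs in every sequence
print(subsequence_weight("AB", w))  # log(3), about 1.0986
```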

Summarization subsequence based clustering
• Micro-cluster generation
  • Input sequences with the same summarization subsequence are grouped together.
• Macro-cluster creation
  • The micro-clusters are merged following the agglomerative hierarchical clustering paradigm to create K macro-clusters.
(Figure: example micro-clusters labeled by the summarization subsequences ABA, ABCB, and CBAC.)
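The two clustering steps can be sketched as below. The grouping by summarization subsequence follows the slide. The slides do not specify the similarity between micro-clusters, so this sketch swaps in a hypothetical one: the longest-common-subsequence length of two summarization subsequences divided by the longer length, combined with average linkage. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lcs_len(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def micro_clusters(summaries):
    """Group input sequence ids by their summarization subsequence."""
    groups = defaultdict(list)
    for sid, summ in summaries.items():
        groups[summ].append(sid)
    return dict(groups)


def sim(a, b):
    """HYPOTHETICAL similarity of two non-empty summarization subsequences."""
    return lcs_len(a, b) / max(len(a), len(b))


def macro_clusters(micro, k):
    """Agglomerative merging of micro-clusters into k macro-clusters
    (average linkage over `sim`); returns lists of input sequence ids."""
    clusters = [[s] for s in micro]  # each cluster is a list of summaries

    def linkage(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return [[sid for s in c for sid in micro[s]] for c in clusters]


summaries = {1: "ABA", 2: "ABAC", 3: "ABA", 4: "CCB"}
micro = micro_clusters(summaries)   # {'ABA': [1, 3], 'ABAC': [2], 'CCB': [4]}
print(macro_clusters(micro, 2))     # [[1, 3, 2], [4]]
```

Working on micro-clusters rather than raw sequences keeps the agglomerative step small: it runs over distinct summarization subsequences, not over every input sequence.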

Empirical results

Conclusions
• CONTOUR mines a set of summarization subsequences that forms a concise representation of the original sequence database.
• The summarization subsequences preserve much of the structural information and can be used to cluster the input sequences efficiently with high clustering quality.

Comments
• Advantages
  • This method provides a more concise representation of the original sequences than feature selection methods.
  • The summarization subsequences can be efficiently adopted by most conventional sequence mining methods.
• Drawback
  • In Equations 1 and 2, the internal similarity is computed with respect to a single summarization subsequence, so these equations may not suit the case where an input sequence supports multiple summarization subsequences.
• Application
  • Sequence pattern mining and clustering.
