Pattern Directed Mining Of Sequence Data By Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava Presenter : Jyothsna R Nayak
Contents • Introduction • Sequential Patterns • Data Structure and Algorithm • Experimental Evaluation • SP Tree Optimization • Conclusions • References
Introduction • Sequence data: each event has an associated time of occurrence • An episode is a collection of events • Frequent episodes: episodes occurring with a frequency above a certain threshold
Steps involved in mining of frequent episodes • Present a language for specifying episodes of interest • Describe a data structure: Sequential Pattern Tree • Mining algorithm to generate frequent episodes • Optimize SP Tree
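Sequential Patterns • Pattern language • A = {A1, A2, …, Am} is a set of event attributes • D1, D2, …, Dm are the domains of A1, A2, …, Am • An event e over A is an (m + 2)-tuple (a1, a2, …, am, tbegin, tend), where each ai is a value from Di and tbegin, tend are the begin and end times of the event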
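A minimal sketch of the event tuple defined above, assuming three stock-market attributes. The class name `Event`, the helper `as_tuple`, and the attribute names in `A` are illustrative assumptions, not taken from the paper; times are plain numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical encoding of an (m + 2)-tuple (a1, ..., am, tbegin, tend)."""
    attrs: tuple    # attribute values a1..am, one per attribute in A
    tbegin: float   # time at which the event begins
    tend: float     # time at which the event ends

    def __post_init__(self):
        # An event cannot end before it begins.
        if self.tend < self.tbegin:
            raise ValueError("tend must not precede tbegin")


def as_tuple(ev):
    """Flatten an Event into the (m + 2)-tuple form used on the slide."""
    return ev.attrs + (ev.tbegin, ev.tend)


# Assumed attribute set A with m = 3, so each event carries m + 2 = 5 values.
A = ("type", "name", "movement")
e = Event(attrs=("computer", "Microsoft", "down"), tbegin=1.0, tend=1.0)

print(as_tuple(e))
```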
Example of Events in the Stock Market Domain
Event ID | Date     | Comp Type | Comp Name | Movement | Volatility | Activeness
e1       | 01/02/91 | Computer  | Microsoft | Down     | High       | Low
e2       | 01/03/91 | Computer  | Microsoft | Up       | High       | Medium
e3       | 01/02/91 | Computer  | Microsoft | NoMovmt  | Low        | High
e4       | 01/03/91 | Computer  | Microsoft | Down     | High       | High
Definitions • Ordering constraints • Serial occurrence: e -> f, which requires e.tend < f.tbegin • Parallel occurrence: e || f • Attribute constraints • Selection constraint: e.type = ‘computer’ • Join constraint: e.name = f.name
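The four constraint kinds above can be sketched as plain Python predicates. Events are dictionaries here for brevity; the function names are illustrative, and treating the parallel operator as "no temporal restriction" is an assumption, since the slide does not spell out its semantics.

```python
def serial(e, f):
    """e -> f: e must end strictly before f begins."""
    return e["tend"] < f["tbegin"]


def parallel(e, f):
    """e || f: assumed here to impose no temporal order on e and f."""
    return True


def selection(e):
    """Selection constraint on a single event: e.type = 'computer'."""
    return e["type"] == "computer"


def join(e, f):
    """Join constraint relating two events: e.name = f.name."""
    return e["name"] == f["name"]


# Two illustrative events on consecutive days.
e = {"type": "computer", "name": "Microsoft", "tbegin": 1, "tend": 1}
f = {"type": "computer", "name": "Microsoft", "tbegin": 2, "tend": 2}

print(serial(e, f), serial(f, e), selection(e), join(e, f))
```

Keeping ordering and attribute constraints as separate predicates mirrors the slide's split: ordering constraints only look at times, attribute constraints only look at attribute values.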
Event specification • Partial specification of an event: e[(e.type = ‘computer’ ∨ e.type = ‘electronic’) ∧ e.movement_direction = ‘down’] • Comparing characteristics of two events: e[e.movement_direction = ‘up’] -> [e.name = f.name] f[f.movement_direction = ‘down’]
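As a rough illustration of the partial specification above, the following Python filters a small list of events with the same Boolean condition. The event records are made up for the example and the function name `spec` is an assumption.

```python
# Made-up events; only the attributes used by the specification are included.
events = [
    {"id": "e1", "type": "computer", "movement_direction": "down"},
    {"id": "e2", "type": "computer", "movement_direction": "up"},
    {"id": "e3", "type": "electronic", "movement_direction": "down"},
    {"id": "e4", "type": "retail", "movement_direction": "down"},
]


def spec(e):
    """e[(e.type = 'computer' OR e.type = 'electronic') AND e.movement_direction = 'down']"""
    return (e["type"] == "computer" or e["type"] == "electronic") and \
        e["movement_direction"] == "down"


matching = [e["id"] for e in events if spec(e)]
print(matching)  # ['e1', 'e3']
```

The specification is partial in the sense that it constrains only two attributes; any other attributes an event carries are left free.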
Data Structure • A leaf node represents an event • An interior node represents an ordering constraint • If θ is an ordering constraint (-> or ||) labeling some interior node, and e and f are the left and right children of that node, then e θ f is a sequential pattern • Associated with each node is a table of matching events • Attached to each node is a Boolean expression tree representing its attribute constraints
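SP Tree • Figure: SP tree for the user-specified pattern e[e.mvmt = ‘up’] -> [e.name = f.name] f[f.mvmt = ‘down’] • Root: serial node e -> f with join constraint e.name = f.name and a table of matching episodes • Leaves: e with constraint e.mvmt = ‘up’ and f with constraint f.mvmt = ‘down’, each with a table of matching events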
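A minimal sketch of how the SP tree for this pattern could be represented in Python. The class `SPNode` and its field names are assumptions made for illustration; in particular, the Boolean expression tree attached to each node is simplified here to a single callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SPNode:
    kind: str                       # "event" (leaf), "serial" (->) or "parallel" (||)
    constraint: Callable            # leaf: f(event); interior node: f(e, f)
    left: Optional["SPNode"] = None
    right: Optional["SPNode"] = None
    matches: list = field(default_factory=list)  # table of matching events or episodes

    def is_leaf(self):
        return self.kind == "event"


# SP tree for e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']
leaf_e = SPNode("event", lambda e: e["mvmt"] == "up")
leaf_f = SPNode("event", lambda f: f["mvmt"] == "down")
root = SPNode("serial", lambda e, f: e["name"] == f["name"], leaf_e, leaf_f)

print(root.kind, root.left.is_leaf(), root.right.is_leaf())
```

Each node owns its own `matches` table, so the tables of the leaves can be filled from the data first and the root's table derived from them afterwards.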
Bottom-up algorithm
  Initialize queue Q to empty
  for (each leaf l in T) do begin
    generate events from S that match constraints of l
    if (the parent p of l is not already in Q) then put p in Q
  end
  while (Q is not empty) do begin
    remove node n from Q
    Generate_Events(n)
    if (for n’s parent p another child was processed) then put p in Q
  end
Generate-events Algorithm
  for (each episode e from left child l of n) do begin
    for (each episode f from right child r of n) do begin
      if (node n is serial) then
        if (e.tend >= f.tbegin) then continue
      if (events in e and f match the join constraint) then
        form new episode g from events from e and f
    end
  end
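The two procedures above can be sketched together in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `Node`, `leaves`, `generate_events` and `evaluate` are made up, an episode is a tuple of event dictionaries, an episode's end/begin time is taken as the latest end/earliest begin of its events, and a parent is queued only once both of its children have been processed. The sample data is invented.

```python
from collections import deque


class Node:
    def __init__(self, kind, constraint, left=None, right=None):
        self.kind = kind              # "event", "serial" or "parallel"
        self.constraint = constraint  # leaf: f(event); interior: f(episode_e, episode_f)
        self.left, self.right = left, right
        self.parent = None
        self.table = None             # matching episodes; None until processed
        for child in (left, right):
            if child is not None:
                child.parent = self


def leaves(n):
    return [n] if n.kind == "event" else leaves(n.left) + leaves(n.right)


def ready(p):
    """A parent can be processed once both child tables exist (assumed rule)."""
    return p is not None and p.left.table is not None and p.right.table is not None


def generate_events(n):
    """Combine episodes from the children of n into episodes for n."""
    out = []
    for e in n.left.table:
        for f in n.right.table:
            # Serial node: skip the pair unless e ends before f begins.
            if n.kind == "serial" and \
                    max(x["tend"] for x in e) >= min(x["tbegin"] for x in f):
                continue
            if n.constraint(e, f):    # join constraint
                out.append(e + f)     # new episode g from events of e and f
    return out


def evaluate(root, S):
    q = deque()
    for l in leaves(root):
        l.table = [(ev,) for ev in S if l.constraint(ev)]
        if ready(l.parent) and l.parent not in q:
            q.append(l.parent)
    while q:
        n = q.popleft()
        n.table = generate_events(n)
        if ready(n.parent) and n.parent not in q:
            q.append(n.parent)
    return root.table


# Invented data for e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']
S = [
    {"id": "a1", "name": "Microsoft", "mvmt": "up", "tbegin": 1, "tend": 1},
    {"id": "a2", "name": "Microsoft", "mvmt": "down", "tbegin": 2, "tend": 2},
    {"id": "a3", "name": "IBM", "mvmt": "down", "tbegin": 2, "tend": 2},
    {"id": "a4", "name": "IBM", "mvmt": "up", "tbegin": 3, "tend": 3},
]
root = Node("serial", lambda e, f: e[0]["name"] == f[0]["name"],
            Node("event", lambda e: e["mvmt"] == "up"),
            Node("event", lambda f: f["mvmt"] == "down"))

ids = [tuple(ev["id"] for ev in ep) for ep in evaluate(root, S)]
print(ids)  # [('a1', 'a2')]
```

Only the pair (a1, a2) survives: a3 fails the join constraint against a1, and a4 occurs too late to precede either "down" event.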
Experimental evaluation • Running time was measured while varying • window size • data set size • number of event specifications • number of attribute constraints
Figure: time (secs) vs. window size (days); minimum frequency = 0.8
Figure: time (secs) vs. number of event specifications; minimum frequency = 0.8, window size = 11
Figure: time (secs) vs. number of constraints; minimum frequency = 0.8, window size = 5
Figure: time (secs) vs. number of events in the data set; minimum frequency = 0.7, window size = 5
SP Tree Optimization • If two event nodes represent the same event, only one of them needs to be kept. • If two ordering nodes have the same join constraints and their left and right children represent the same events, one such node is sufficient.
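One way to read these two rules is as node sharing: identical nodes are created once and reused. The sketch below illustrates that reading in Python. The nested-tuple pattern encoding, the names `Node` and `intern`, and the use of constraint strings to decide whether two nodes are "the same" are all assumptions of this example, so only syntactically identical specifications are merged.

```python
class Node:
    def __init__(self, key, children=()):
        self.key = key            # structural identity of the node
        self.children = children  # () for event nodes, (left, right) otherwise


def intern(pattern, pool):
    """Return the single shared node for `pattern`, creating it on first use.

    pattern is ("event", spec) or (op, join, left_pattern, right_pattern).
    """
    if pattern[0] == "event":
        key, children = pattern, ()
    else:
        op, join, left, right = pattern
        l, r = intern(left, pool), intern(right, pool)
        # Same operator, same join constraint, same children: same node.
        key, children = (op, join, l.key, r.key), (l, r)
    if key not in pool:
        pool[key] = Node(key, children)
    return pool[key]


up = ("event", "mvmt = 'up'")
down = ("event", "mvmt = 'down'")
serial = ("->", "same name", up, down)
# The same serial sub-pattern appears on both sides of a parallel node.
pattern = ("||", "", serial, ("->", "same name", up, down))

pool = {}
root = intern(pattern, pool)
print(len(pool), root.children[0] is root.children[1])  # 4 True
```

Without sharing, this pattern would need seven nodes; with it, four nodes suffice, and the table of matches for the shared serial node only has to be computed once.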
Conclusions • The approach is robust, flexible, and efficient • Supports complex patterns • Good performance
References • Mannila, H., Toivonen, H., and Verkamo, A. I., “Discovering frequent episodes in sequences” • Agrawal, R., and Srikant, R., “Mining sequential patterns” • Mannila, H., and Toivonen, H., “Discovering generalized episodes using minimal occurrences” • Srikant, R., and Agrawal, R., “Mining generalized association rules”