Mining Relationships Among Interval-based Events for Classification. Dhaval Patel, Wynne Hsu, Mong Li Lee. SIGMOD '08.
Outline. • Introduction • Preliminaries • Augmented hierarchical representation • Interval-based event mining • Interval-based event classifier • Experiment • Conclusion
Introduction. • Classification predicts categorical class labels. • It constructs a model from a training set, using the values (class labels) of a classifying attribute, and then uses that model to classify new data. • A two-step process: model construction, then model usage.
Introduction.(cont) • [Figure: an example decision tree. The root tests age? with branches <=30, 31..40, and >40; the <=30 branch tests student? (no → no, yes → yes), the 31..40 branch predicts yes, and the >40 branch tests credit rating? (excellent → no, fair → yes).]
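The two-step process above can be illustrated by reading the example tree as nested rules; a minimal sketch of the model-usage step (attribute and label names follow the figure and are illustrative):

```python
def classify(record):
    # Decision tree from the introduction slide: root splits on age,
    # then on student status or credit rating.
    if record["age"] <= 30:
        return "yes" if record["student"] else "no"
    elif record["age"] <= 40:          # the 31..40 branch
        return "yes"
    else:                              # the >40 branch
        return "yes" if record["credit_rating"] == "fair" else "no"

print(classify({"age": 25, "student": True, "credit_rating": "fair"}))   # yes
print(classify({"age": 45, "student": False, "credit_rating": "excellent"}))  # no
```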
Preliminaries. • An event E = (type, start, end). • An event list EL = {E1, E2, …, En}. • The length of EL, given by |EL|, is the number of events in the list. • A composite event E = (Ei R Ej), where R is a temporal relation. • The start time of E is min{Ei.start, Ej.start}; its end time is max{Ei.end, Ej.end}.
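The event and composite-event definitions above can be sketched directly (the `Event` class and `composite` helper are illustrative names, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    start: int
    end: int

def composite(e1, e2, rel):
    # Composite event (Ei R Ej): start = min of starts, end = max of ends.
    # The relation name is kept only as part of the composite type string.
    return Event(f"({e1.type} {rel} {e2.type})",
                 min(e1.start, e2.start),
                 max(e1.end, e2.end))

e = composite(Event("A", 1, 5), Event("B", 3, 8), "overlap")
print(e.type, e.start, e.end)  # (A overlap B) 1 8
```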
Augmented hierarchical representation. • The temporal relations between two interval-based events: before, meet, overlap, start, finish by, contain, equal.
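Assuming the two events are ordered so that the first starts no later than the second, the seven relations can be distinguished by comparing endpoints; a sketch with the usual Allen-style boundary conventions (the function name is illustrative):

```python
def relation(a, b):
    # a, b are (start, end) intervals, normalized so a[0] <= b[0];
    # on equal starts, we assume a ends no later than b.
    (s1, e1), (s2, e2) = a, b
    if (s1, e1) == (s2, e2):
        return "equal"
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meet"
    if s1 == s2:                 # same start, a ends first
        return "start"
    if e1 == e2:                 # same end, a starts first
        return "finish by"
    if e2 < e1:                  # b lies strictly inside a
        return "contain"
    return "overlap"

print(relation((1, 5), (3, 8)))  # overlap
```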
Augmented hierarchical representation(cont.) • The pattern ((A overlap B) overlap C) is augmented by annotating each relation with a count vector, e.g. (A Overlap[0,0,0,1,0] B) Overlap[0,0,0,1,0] C. • The vector records [C, F, M, O, S]: C = contain count, F = finish-by count, M = meet count, O = overlap count, S = start count.
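How the five counters could be tallied into a [C, F, M, O, S] annotation, as a simplified sketch (in the actual representation one such vector is attached to each relation in the pattern):

```python
def count_vector(relations):
    # Tally temporal relation names into the slide's order:
    # [contain, finish by, meet, overlap, start].
    order = ["contain", "finish by", "meet", "overlap", "start"]
    return [relations.count(r) for r in order]

print(count_vector(["overlap"]))             # [0, 0, 0, 1, 0]
print(count_vector(["overlap", "contain"]))  # [1, 0, 0, 1, 0]
```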
Augmented hierarchical representation(cont.) • The linear ordering of the events' endpoints is {{A+}{B+}{C+}{A−}{B−}{D+}{D−}{C−}}, where X+ marks the start of event X and X− marks its end.
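The linear ordering can be produced by sorting interval endpoints; the times below are invented to reproduce the ordering on the slide, and the end-before-start tie-break is one common convention:

```python
def linear_ordering(events):
    # events: dict name -> (start, end); emit X+ at starts, X- at ends,
    # sorted by time (ends before starts on ties).
    points = []
    for name, (s, e) in events.items():
        points.append((s, 1, name + "+"))
        points.append((e, 0, name + "-"))
    return [label for _, _, label in sorted(points)]

# Hypothetical times chosen to match {{A+}{B+}{C+}{A-}{B-}{D+}{D-}{C-}}
events = {"A": (1, 4), "B": (2, 5), "C": (3, 9), "D": (6, 7)}
print(linear_ordering(events))
```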
Interval-based event mining. • Candidate generation • Theorem. A (k+1)-pattern is a candidate pattern if it is generated from a frequent k-pattern and a 2-pattern, where the 2-pattern occurs in at least k − 1 frequent k-patterns. • Dominant event: an event is the dominant event of a pattern P if it occurs in P and has the latest end time among all the events in P.
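A simplified, event-set-only sketch of the candidate-generation rule (the real IEMiner patterns also carry temporal relations between events, which are omitted here):

```python
def generate_candidates(freq_k, freq_2):
    # freq_k: frequent k-patterns, freq_2: frequent 2-patterns,
    # both reduced to frozensets of event types (simplification).
    k = len(next(iter(freq_k)))
    out = set()
    for p in freq_k:
        for pair in freq_2:
            cand = p | pair
            if len(cand) != k + 1:
                continue
            # Prune: the 2-pattern must occur in >= k-1 frequent k-patterns.
            if sum(pair <= q for q in freq_k) >= k - 1:
                out.add(cand)
    return out

freq_k = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
freq_2 = freq_k  # for k = 2 the 2-patterns coincide with the k-patterns
print(generate_candidates(freq_k, freq_2))  # one candidate: {A, B, C}
```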
Interval-based event mining(cont.) • Support count: the number of event lists in the database that contain the pattern.
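Counting support as the number of event lists that contain the pattern, here reduced to event-type containment for brevity (a full implementation would also check the temporal relations):

```python
def support(pattern, database):
    # pattern: set of event types; database: list of event lists,
    # each event given as a (type, start, end) tuple.
    return sum(pattern <= {t for t, _, _ in el} for el in database)

db = [
    [("A", 1, 4), ("B", 2, 5)],
    [("A", 1, 3), ("C", 4, 6)],
    [("B", 2, 7)],
]
print(support({"A", "B"}, db))  # 1
```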
IEClassifier. • Class labels Ci, 1 ≤ i ≤ c, where c is the number of class labels. • The information gain of a pattern TP is computed from p(TP), the probability that pattern TP occurs in the dataset. • Patterns whose information gain values are below a predefined info_gain threshold are removed.
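One standard way to score a pattern is entropy-based information gain over the match/no-match split; a sketch under that assumption (the paper's exact formula may differ):

```python
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(x) for x in set(labels)))

def info_gain(labels, matched):
    # matched[i] is True when sequence i contains the pattern TP;
    # p = p(TP) is the fraction of matching sequences.
    pos = [l for l, m in zip(labels, matched) if m]
    neg = [l for l, m in zip(labels, matched) if not m]
    p = len(pos) / len(labels)
    return entropy(labels) - p * entropy(pos) - (1 - p) * entropy(neg)

print(info_gain(["c1", "c1", "c2", "c2"], [True, True, False, False]))
```

A pattern that perfectly separates the two classes, as above, attains the maximum gain of one bit; patterns scoring below the info_gain threshold would be discarded.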
IEClassifier.(cont) • Let PatternMatchI be the set of discriminating patterns that are contained in instance I.
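One plausible way to turn PatternMatchI into a prediction is a vote among the matched discriminating patterns; a sketch (the paper may weight or rank patterns differently):

```python
from collections import Counter

def predict(pattern_match):
    # pattern_match: list of (pattern, class_label) pairs for the
    # discriminating patterns contained in instance I;
    # predict the majority class among them.
    votes = Counter(label for _, label in pattern_match)
    return votes.most_common(1)[0][0]

matches = [("P1", "c1"), ("P2", "c1"), ("P3", "c2")]
print(predict(matches))  # c1
```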
Experiment.(cont) • For a given dataset, we sometimes want to divide the data into two groups according to some of its characteristics. Several effective classification methods are already known, e.g., Nearest Neighbor, Neural Networks, and Decision Trees; when used correctly, their accuracies are comparable. The advantage of SVM, however, is that it is easier to use. • We want to find a hyperplane in the feature space that separates the data into two groups.
Conclusion. • The IEMiner algorithm mines frequent interval-based patterns. • IEClassifier uses the discovered discriminating patterns for classification. • In the experiments, mining performance improved and the classifier achieved the best accuracy among the compared methods.