Mining Relationships Among Interval-based Events for Classification. Dhaval Patel, Wynne Hsu, Mong Li Lee. SIGMOD '08.
Outline. • Introduction • Preliminaries • Augmented hierarchical representation • Interval-based event mining • Interval-based event classifier • Experiment • Conclusion
Introduction. • Classification predicts categorical class labels. • It constructs a model from a training set, using the values (class labels) of a classifying attribute, and then uses that model to classify new data. • A two-step process: model construction, then model usage.
Introduction.(cont) • [Figure: an example decision tree. The root tests age? with branches <=30, 31..40, and >40; the <=30 branch tests student? (no → no, yes → yes), the 31..40 branch predicts yes, and the >40 branch tests credit rating? (excellent → no, fair → yes).]
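The two-step process above can be illustrated by reading the example tree as nested rules; a minimal sketch of the model-usage step (attribute and label names follow the figure and are illustrative):

```python
def classify(record):
    # Decision tree from the introduction slide: root splits on age,
    # then on student status or credit rating.
    if record["age"] <= 30:
        return "yes" if record["student"] else "no"
    elif record["age"] <= 40:          # the 31..40 branch
        return "yes"
    else:                              # the >40 branch
        return "yes" if record["credit_rating"] == "fair" else "no"

print(classify({"age": 25, "student": True, "credit_rating": "fair"}))   # yes
print(classify({"age": 45, "student": False, "credit_rating": "excellent"}))  # no
```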
Preliminaries. • An event E = (type, start, end). • An event list EL = {E1, E2, …, En}. • The length of EL, given by |EL|, is the number of events in the list. • A composite event E = (Ei R Ej), where R is a temporal relation. • The start time of E is min{Ei.start, Ej.start}; its end time is max{Ei.end, Ej.end}.
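The event and composite-event definitions above can be sketched directly (the `Event` class and `composite` helper are illustrative names, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    start: int
    end: int

def composite(e1, e2, rel):
    # Composite event (Ei R Ej): start = min of starts, end = max of ends.
    # The relation name is kept only as part of the composite type string.
    return Event(f"({e1.type} {rel} {e2.type})",
                 min(e1.start, e2.start),
                 max(e1.end, e2.end))

e = composite(Event("A", 1, 5), Event("B", 3, 8), "overlap")
print(e.type, e.start, e.end)  # (A overlap B) 1 8
```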
Augmented hierarchical representation. • The temporal relations between two interval-based events: before, meet, overlap, start, finish by, contain, equal.
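Assuming the two events are ordered so that the first starts no later than the second, the seven relations can be distinguished by comparing endpoints; a sketch with the usual Allen-style boundary conventions (the function name is illustrative):

```python
def relation(a, b):
    # a, b are (start, end) intervals, normalized so a[0] <= b[0];
    # on equal starts, we assume a ends no later than b.
    (s1, e1), (s2, e2) = a, b
    if (s1, e1) == (s2, e2):
        return "equal"
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meet"
    if s1 == s2:                 # same start, a ends first
        return "start"
    if e1 == e2:                 # same end, a starts first
        return "finish by"
    if e2 < e1:                  # b lies strictly inside a
        return "contain"
    return "overlap"

print(relation((1, 5), (3, 8)))  # overlap
```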
Augmented hierarchical representation(cont.) • The pattern ((A overlap B) overlap C) is augmented by annotating each relation with a count vector, e.g. (A Overlap[0,0,0,1,0] B) Overlap[0,0,0,1,0] C. • The vector records [C, F, M, O, S]: C = contain count, F = finish-by count, M = meet count, O = overlap count, S = start count.
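How the five counters could be tallied into a [C, F, M, O, S] annotation, as a simplified sketch (in the actual representation one such vector is attached to each relation in the pattern):

```python
def count_vector(relations):
    # Tally temporal relation names into the slide's order:
    # [contain, finish by, meet, overlap, start].
    order = ["contain", "finish by", "meet", "overlap", "start"]
    return [relations.count(r) for r in order]

print(count_vector(["overlap"]))             # [0, 0, 0, 1, 0]
print(count_vector(["overlap", "contain"]))  # [1, 0, 0, 1, 0]
```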
Augmented hierarchical representation(cont.) • The linear ordering of the events' endpoints is {{A+}{B+}{C+}{A−}{B−}{D+}{D−}{C−}}, where X+ marks the start of event X and X− marks its end.
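The linear ordering can be produced by sorting interval endpoints; the times below are invented to reproduce the ordering on the slide, and the end-before-start tie-break is one common convention:

```python
def linear_ordering(events):
    # events: dict name -> (start, end); emit X+ at starts, X- at ends,
    # sorted by time (ends before starts on ties).
    points = []
    for name, (s, e) in events.items():
        points.append((s, 1, name + "+"))
        points.append((e, 0, name + "-"))
    return [label for _, _, label in sorted(points)]

# Hypothetical times chosen to match {{A+}{B+}{C+}{A-}{B-}{D+}{D-}{C-}}
events = {"A": (1, 4), "B": (2, 5), "C": (3, 9), "D": (6, 7)}
print(linear_ordering(events))
```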
Interval-based event mining. • Candidate generation • Theorem. A (k+1)-pattern is a candidate pattern if it is generated from a frequent k-pattern and a 2-pattern, where the 2-pattern occurs in at least k − 1 frequent k-patterns. • Dominant event: an event is the dominant event of a pattern P if it occurs in P and has the latest end time among all the events in P.
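A simplified, event-set-only sketch of the candidate-generation rule (the real IEMiner patterns also carry temporal relations between events, which are omitted here):

```python
def generate_candidates(freq_k, freq_2):
    # freq_k: frequent k-patterns, freq_2: frequent 2-patterns,
    # both reduced to frozensets of event types (simplification).
    k = len(next(iter(freq_k)))
    out = set()
    for p in freq_k:
        for pair in freq_2:
            cand = p | pair
            if len(cand) != k + 1:
                continue
            # Prune: the 2-pattern must occur in >= k-1 frequent k-patterns.
            if sum(pair <= q for q in freq_k) >= k - 1:
                out.add(cand)
    return out

freq_k = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
freq_2 = freq_k  # for k = 2 the 2-patterns coincide with the k-patterns
print(generate_candidates(freq_k, freq_2))  # one candidate: {A, B, C}
```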
Interval-based event mining(cont.) • Support count: the number of event lists in the database that contain the pattern.
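Counting support as the number of event lists that contain the pattern, here reduced to event-type containment for brevity (a full implementation would also check the temporal relations):

```python
def support(pattern, database):
    # pattern: set of event types; database: list of event lists,
    # each event given as a (type, start, end) tuple.
    return sum(pattern <= {t for t, _, _ in el} for el in database)

db = [
    [("A", 1, 4), ("B", 2, 5)],
    [("A", 1, 3), ("C", 4, 6)],
    [("B", 2, 7)],
]
print(support({"A", "B"}, db))  # 1
```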
IEClassifier. • Class labels Ci, 1 ≤ i ≤ c, where c is the number of class labels. • The information gain of a pattern TP is computed from p(TP), the probability that pattern TP occurs in the dataset. • Patterns whose information gain values are below a predefined info_gain threshold are removed.
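One standard way to score a pattern is entropy-based information gain over the match/no-match split; a sketch under that assumption (the paper's exact formula may differ):

```python
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(x) for x in set(labels)))

def info_gain(labels, matched):
    # matched[i] is True when sequence i contains the pattern TP;
    # p = p(TP) is the fraction of matching sequences.
    pos = [l for l, m in zip(labels, matched) if m]
    neg = [l for l, m in zip(labels, matched) if not m]
    p = len(pos) / len(labels)
    return entropy(labels) - p * entropy(pos) - (1 - p) * entropy(neg)

print(info_gain(["c1", "c1", "c2", "c2"], [True, True, False, False]))
```

A pattern that perfectly separates the two classes, as above, attains the maximum gain of one bit; patterns scoring below the info_gain threshold would be discarded.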
IEClassifier.(cont) • Let PatternMatchI be the set of discriminating patterns that are contained in instance I.
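One plausible way to turn PatternMatchI into a prediction is a vote among the matched discriminating patterns; a sketch (the paper may weight or rank patterns differently):

```python
from collections import Counter

def predict(pattern_match):
    # pattern_match: list of (pattern, class_label) pairs for the
    # discriminating patterns contained in instance I;
    # predict the majority class among them.
    votes = Counter(label for _, label in pattern_match)
    return votes.most_common(1)[0][0]

matches = [("P1", "c1"), ("P2", "c1"), ("P3", "c2")]
print(predict(matches))  # c1
```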
Experiment.(cont) • For a given dataset, we sometimes want to divide the data into two groups according to some of its characteristics. Several effective classification methods are already known, e.g., Nearest Neighbor, Neural Networks, and Decision Trees; when used correctly, their accuracies are comparable. The advantage of SVM, however, is that it is easier to use. • We want to find a hyperplane in the feature space that separates the data into two groups.
Conclusion. • The IEMiner algorithm mines frequent interval-based patterns. • IEClassifier uses the discovered discriminating patterns for classification. • In the experiments, mining performance improved and the classifier achieved the best accuracy among the compared methods.