An Inductive Database for Mining Temporal Patterns in Event Sequences
Alexandre Vautier, Marie-Odile Cordier and René Quiniou
Alexandre.vautier@irisa.fr
RENNES - France
The application: cardiac arrhythmias
[Figure: electrocardiograms annotated with P wave, normal QRS complex, abnormal QRS complex, normal rhythm and abnormal rhythm]
"Mr. Rabbit, you suffer from bigeminy, a severe cardiac arrhythmia." "Mr. Dog, you are ok!"
…ok, but how can a specific arrhythmia be automatically characterized?
A problem definition close to supervised machine learning
…ok, which patterns are frequent in the sequence labeled P but not frequent in the sequences labeled N?
[Figure: discretized and labeled electrocardiograms (for example p,Q,p,q,p,Q,p,Q; p,q,p,q,p,q,p,q; p,Q,p,p,Q,p,p,Q), each labeled P or N, from which frequent temporal patterns are extracted in order to find the temporal patterns representative of the sequence labeled P]
Formalization of the problem: the framework of inductive databases (IDB)
…ok, which temporal patterns C satisfy
Quexpert(P,N,T,C) = (∃L ∈ P, freq(C,L) ≥ TL) ∧ (∀L ∈ N, freq(C,L) < TL)?
An IDB contains:
• Sequences: Lbigeminy ∈ P; Llbbb, Lmobitz, Lnormal ∈ N
• Temporal patterns: {C | freq(C, Lbigeminy) ≥ T0}, {C | freq(C, Llbbb) ≥ T1}, {C | freq(C, Lmobitz) ≥ T2}, {C | freq(C, Lnormal) ≥ T3}, {C | Quexpert(P,N,T,C)}, …
Plan • Introduction • Problem features • Sequences • Chronicles • Inductive databases • Order relation • What is frequency ? • Algorithms • Frequent Minimal Chronicles Search (Fmc Search) • Querying the IDB • Experiments and problems to be solved • Conclusion and future work
Features of sequences
• Long sequences of time-stamped events with few event types
• Numerical temporal information is of major importance
[Figure: an example of an event sequence, with events of types A and B placed on a time axis from 0 to 15]
Features of temporal patterns: chronicles
• Event sequence: an ordered list of time-stamped events, for example C,1 B,5 A,8 B,10 C,15 C,16 A,26 A,27 B,34
• Chronicle: a set of temporally constrained events
• May contain several events of the same type
• Specifies numerical temporal constraints between events: the uncertain delay is represented by an interval [dmin,dmax], with dmin, dmax ∈ ℤ
• Is easily readable by an expert of the application domain
[Figure: a chronicle C over the events (C, t0), (A, t1) and (B, t2), with the interval constraints [5;10] and [-2;20], and its instances IC(L) in the event sequence]
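The notion of an instance can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the encoding of a chronicle as a list of event types plus a dictionary of interval constraints is hypothetical, and so is the reading of the figure in which [5;10] bounds t1 - t0 and [-2;20] bounds t2 - t1.

```python
from itertools import product

# Event sequence from the slide: (event type, timestamp) pairs.
sequence = [("C", 1), ("B", 5), ("A", 8), ("B", 10), ("C", 15),
            ("C", 16), ("A", 26), ("A", 27), ("B", 34)]

# Hypothetical encoding of a chronicle: event types, plus constraints
# {(i, j): (dmin, dmax)} meaning dmin <= t_j - t_i <= dmax.
# Which events each interval links is an assumption, not given by the source.
chronicle_events = ["C", "A", "B"]
chronicle_constraints = {(0, 1): (5, 10), (1, 2): (-2, 20)}

def instances(events, constraints, seq):
    """Brute-force enumeration of instances as tuples of positions in seq."""
    candidates = [
        [k for k, (etype, _) in enumerate(seq) if etype == wanted]
        for wanted in events
    ]
    found = []
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one sequence event cannot play two roles in an instance
        if all(dmin <= seq[combo[j]][1] - seq[combo[i]][1] <= dmax
               for (i, j), (dmin, dmax) in constraints.items()):
            found.append(combo)
    return found

found = instances(chronicle_events, chronicle_constraints, sequence)
print(found)
```

Under these assumptions the chronicle has two instances, (C,1 A,8 B,10) and (C,16 A,26 B,34). The enumeration is exponential in the number of chronicle events, so it is only suitable for small examples.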
Inductive database: required definitions
• A query language that makes use of frequency constraints: freq(C,L) ≥ T and freq(C,L) ≤ T
• If a query on frequency satisfies monotonicity or anti-monotonicity properties, then a search based on frequency is easier to compute
• An order relation on chronicles must be defined
An order relation on chronicles
C is more general than C′ (C ⊑ C′) ⇔
• each event of C can be matched to an event of C′
• each temporal constraint of C is more general than the corresponding constraint in C′
[Figure: two chronicles C and C′ over events of types A and B, with the interval constraints [8;21], [5;10] and [9;20], illustrating C ⊑ C′]
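A minimal sketch of this order relation, under stated assumptions: chronicles are encoded as (event types, constraints) pairs, "more general" for an interval is taken to mean "contains", and constraints of C′ are compared as written, without propagating implied constraints. The two example chronicles are hypothetical and only loosely inspired by the intervals of the figure.

```python
from itertools import permutations

def lookup(constraints, i, j):
    """Interval on t_j - t_i, reading a stored (j, i) constraint backwards if needed."""
    if (i, j) in constraints:
        return constraints[(i, j)]
    if (j, i) in constraints:
        dmin, dmax = constraints[(j, i)]
        return (-dmax, -dmin)
    return None

def constraint_holds(general_interval, specific_interval):
    """True if the general interval contains the specific one."""
    if specific_interval is None:
        return False
    return (general_interval[0] <= specific_interval[0]
            and specific_interval[1] <= general_interval[1])

def more_general(c, c_prime):
    """True if some injective, type-preserving matching of c's events into
    c_prime's events makes every constraint of c contain the matched one."""
    events, constraints = c
    events_p, constraints_p = c_prime
    for mapping in permutations(range(len(events_p)), len(events)):
        if any(events[i] != events_p[m] for i, m in enumerate(mapping)):
            continue
        if all(constraint_holds(interval,
                                lookup(constraints_p, mapping[i], mapping[j]))
               for (i, j), interval in constraints.items()):
            return True
    return False

# Hypothetical chronicles: c has two events, c_prime has three.
c = (["A", "B"], {(0, 1): (8, 21)})
c_prime = (["A", "B", "B"], {(0, 1): (9, 20), (1, 2): (5, 10)})
print(more_general(c, c_prime), more_general(c_prime, c))
```

Here c is more general than c_prime because [8;21] contains [9;20], while c_prime cannot be more general than c since it has more events.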
How to compute the frequency of a chronicle in a sequence?
The cardinality of the set of:
• All the instances?
• Minimal occurrences? [Mannila, 97]
• Earliest distinct instances? [Dousson, 99]
• Distinct instances?
[Figure: a chronicle C over the events B, A, B with the interval constraints [1,5], [2,3] and [-1,3], and its instances IC(L) in a sequence L of A and B events on a time axis from 0 to 17]
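Monotonicity and anti-monotonicity properties
• Constraints on frequency should satisfy monotonicity or anti-monotonicity properties
• Example, in a sequence L made of the events A, B, A, B: the chronicle C (a single event A) has 2 instances, whereas the chronicle C′ (an event A and an event B constrained by [-2,2]) has 3 instances. Although C is more general than C′, the expected inequality freq(C,L) ≥ freq(C′,L) fails here
• Minimal occurrences [Mannila, 97] satisfy neither monotonicity nor anti-monotonicity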
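The counts of this example can be reproduced with a short sketch. The timestamps below are assumed, since the original figure is not recoverable; they were chosen so that counting every instance gives the 2 and 3 reported on the slide.

```python
# Assumed timestamps for the sequence A, B, A, B (not given by the source).
sequence = [("A", 1), ("B", 2), ("A", 3), ("B", 5)]

def count_single_a(seq):
    """All instances of the chronicle made of a single event A."""
    return sum(1 for etype, _ in seq if etype == "A")

def count_a_b(seq, dmin, dmax):
    """All instances of the chronicle (A, B) with dmin <= t_B - t_A <= dmax."""
    return sum(
        1
        for type_a, t_a in seq if type_a == "A"
        for type_b, t_b in seq if type_b == "B" and dmin <= t_b - t_a <= dmax
    )

freq_c = count_single_a(sequence)
freq_c_prime = count_a_b(sequence, -2, 2)
print(freq_c, freq_c_prime)
```

With these timestamps the more general chronicle has fewer instances than the more specific one (2 against 3), which is why a plain count of all instances cannot serve as a monotonic frequency.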
Recognition criterion
Let IC(L) be the set of instances of the chronicle C in the sequence L
• A recognition criterion selects a unique set E of instances from IC(L)
• The frequency of the chronicle C in the sequence L according to the recognition criterion Q is freqQ(C,L) = |E|
• A monotonic criterion is a recognition criterion Q such that C ⊑ C′ ⇒ freqQ(C,L) ≥ freqQ(C′,L)
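As an illustration of what "selecting a unique set E" can look like, the sketch below keeps instances that share no event, scanning them in sorted order. It is a hypothetical greedy criterion written for this example, not necessarily one of the criteria used by the authors, and no claim is made here that it is monotonic in general.

```python
def distinct_instances(all_instances):
    """Greedy selection of instances (tuples of sequence positions)
    such that no sequence event is used by two selected instances."""
    used = set()
    selected = []
    for instance in sorted(all_instances):
        if used.isdisjoint(instance):
            selected.append(instance)
            used.update(instance)
    return selected

# Instances of a chronicle (A, B) in a sequence A, B, A, B,
# given as (position of A, position of B); the values are illustrative.
all_instances = [(0, 1), (2, 1), (2, 3)]
selected = distinct_instances(all_instances)
frequency = len(selected)
print(selected, frequency)
```

On this input the instance (2, 1) is discarded because it reuses the B at position 1, so the frequency is 2 rather than 3.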
Fmc Search
• Fmc Search: Frequent Minimal Chronicles Search, under the constraints freqQ(C,L) ≥ T and maxwin(C) ≤ W
• Input:
• L: an event sequence
• T: a minimum frequency threshold
• Q: a recognition criterion (application dependent)
• W: a maximal time window
• Output: FmcQ,W(L,T)
• Every chronicle of FmcQ,W(L,T) satisfies 3 properties:
• it is as specific as possible
• it generalizes at least T instances…
• …that respect the recognition criterion Q
• Algorithm: Step 1, Step 2, Step 3, Step 4
Fmc Search, Step 1: Chronicle instance extraction
• The instances of every frequent chronicle are extracted from the sequence L. Their temporal constraints are set to [-W,W]
• Implemented in the software FACE (Frequency Analyser for Chronicle Extraction)
• The numerical temporal constraints of the chronicles found by FACE are not specific enough
Fmc Search, Step 2: Fuzzy clustering of instances
• A fuzzy clustering of each set IC(L) found at step 1 is performed
• An instance has a membership degree to each cluster
[Figure: the instances of IC(L) for chronicles over A and B events, grouped into fuzzy clusters]
Fmc Search, Step 3: Chronicle construction from clusters
• For each fuzzy cluster of step 2:
• Instances are sorted in decreasing order of their membership degree
• The first T instances that respect the Q criterion are kept to construct a chronicle
• This chronicle is the lgg (least general generalization) of the selected instances
• The specificity of the chronicles depends on the clustering
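The construction step can be sketched as follows, assuming that the least general generalization of a set of instances is obtained by taking, for each pair of events, the tightest interval covering the observed delays. The data layout (timestamp tuples paired with a membership degree) is hypothetical, and the filtering by the criterion Q is left out to keep the example short.

```python
from itertools import combinations

def lgg(instances_times):
    """Tightest interval constraints covering all given instances.
    Each instance is a tuple of timestamps, one per chronicle event."""
    size = len(instances_times[0])
    constraints = {}
    for i, j in combinations(range(size), 2):
        delays = [instance[j] - instance[i] for instance in instances_times]
        constraints[(i, j)] = (min(delays), max(delays))
    return constraints

def build_chronicle(scored_instances, threshold_t):
    """Keep the threshold_t instances with the highest membership degree
    and generalize them (the Q criterion check is omitted in this sketch)."""
    ranked = sorted(scored_instances, key=lambda pair: pair[1], reverse=True)
    kept = [times for times, _ in ranked[:threshold_t]]
    return lgg(kept)

# Illustrative instances of a chronicle (A, B): ((t_A, t_B), membership degree).
cluster = [((8, 10), 0.9), ((26, 34), 0.2), ((3, 4), 0.7), ((15, 18), 0.8)]
constraints = build_chronicle(cluster, 3)
print(constraints)
```

With T = 3 the low-membership instance (26, 34) is dropped, and the three remaining delays 2, 3 and 1 give the constraint [1;3], which is far tighter than the [1;8] obtained from all four instances.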
Fmc Search, Step 4: Chronicle filtering - keep the most specific
• Compute the set of frequent minimal (maximally specific) chronicles FmcQ,W(L,T)
• The most specific chronicles are retained
• Monotonicity property: a chronicle C that satisfies freqQ(C,L) ≥ T is more general than at least one chronicle of FmcQ,W(L,T)
Querying the IDB
Remember my query: Quexpert(P,N,T,C). For the explanation, P = {LP} and N = {LN}:
freq(C,LP) ≥ TP ∧ freq(C,LN) < TN
A chronicle C satisfies this query iff:
• C is more general than at least one chronicle of FmcQ,W(LP,TP) (monotonicity property)
• C is not more general than any chronicle of FmcQ,W(LN,TN) (anti-monotonicity property)
The chronicles that satisfy both conditions form a version space. An adaptation of Mitchell's algorithm computes this version space.
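The two conditions can be checked directly once the two Fmc sets are known. The sketch below is deliberately simplified: every chronicle is assumed to have the same two events, so a chronicle reduces to one interval and "more general" reduces to interval containment. The intervals are illustrative values, not results from the experiments.

```python
def more_general(general, specific):
    """Interval containment, standing in for the order relation on chronicles."""
    return general[0] <= specific[0] and specific[1] <= general[1]

def satisfies_query(candidate, fmc_positive, fmc_negative):
    """freq(C, LP) >= TP and freq(C, LN) < TN, decided from the Fmc sets only."""
    frequent_in_positive = any(more_general(candidate, s) for s in fmc_positive)
    frequent_in_negative = any(more_general(candidate, s) for s in fmc_negative)
    return frequent_in_positive and not frequent_in_negative

# Illustrative Fmc sets for the positive and the negative sequence.
fmc_positive = [(2, 4)]
fmc_negative = [(6, 9)]

results = {
    (1, 5): satisfies_query((1, 5), fmc_positive, fmc_negative),
    (0, 10): satisfies_query((0, 10), fmc_positive, fmc_negative),
    (3, 4): satisfies_query((3, 4), fmc_positive, fmc_negative),
}
print(results)
```

Only (1, 5) is accepted: (0, 10) is general enough to be frequent in the negative sequence as well, and (3, 4) is too specific to be frequent in the positive one.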
Experiments: characterization of cardiac arrhythmias
• Data: 4 sequences of cardiac events
• elaborated from electrocardiograms
• labeled by an expert
• containing ~4000 events of 3 types (P waves, normal QRS complexes, abnormal QRS complexes)
• A typical query: freqQd(C,Lbigeminy) ≥ 5% ∧ freqQd(C,Lnormal) ≤ 10% ∧ freqQd(C,Lmobitz) ≤ 10% ∧ freqQd(C,Llbbb) ≤ 10% ∧ W = 3 s
Experiments: an example of a cardiac chronicle
[Figure: a chronicle over the events p, q and Q, where p = "P waves", q = "normal QRS" and Q = "abnormal QRS"]
• Characterizes the bigeminy arrhythmia
• Also found by a supervised learning method (ILP) from ECGs [Carrault, 03]
Problems to be solved
• The clustering step of Fmc Search has to cluster up to 180,000 instances per chronicle
• For a minimum threshold of 5%, up to 1000 chronicles can be extracted from one sequence. This slows down Mitchell's algorithm dramatically
• Finding Fmc is an NP-complete problem. The computed set FmcQ,W(LN,TN) is correct but not complete, so the results have to be filtered in order to give the correct solution
[Figure: the practical FmcQ,W(LN,TN) compared with the optimal FmcQ,W(LN,TN)]
Conclusion
• An original method to extract temporal patterns in the form of chronicles
• Chronicles express constraints on time by numerical intervals
• A formalization of the problem in the framework of inductive databases, which provides the definition of:
• An order relation on temporal patterns
• A monotonic recognition criterion and the related frequency
• A management of numerical temporal constraints (this task is very hard)
• An algorithm that finds Fmc in sequences
• A method to reuse and adapt Mitchell's algorithm
Future work
• Control the clustering step of Fmc Search in order to compute only the Fmcs that are needed by Mitchell's algorithm
• Adapt Mitchell's algorithm in order to provide an approximate solution whose quality is user-defined
• Extend the method to other measures of interest
• Explore new applications, such as intrusion detection
Maximum size of IC(L) as a function of the number of events in a chronicle