An Inductive Database for Mining Temporal Patterns in Event Sequences

Alexandre Vautier, Marie-Odile Cordier and René Quiniou. Alexandre.vautier@irisa.fr. RENNES - France.

Presentation Transcript


  1. An Inductive Database for Mining Temporal Patterns in Event Sequences. Alexandre Vautier, Marie-Odile Cordier and René Quiniou. Alexandre.vautier@irisa.fr. RENNES - France

  2. The application: cardiac arrhythmias
     "Mr. Rabbit, you suffer from bigeminy, a severe cardiac arrhythmia." "Mr. Dog, you are ok!"
     "…ok, but how can a specific arrhythmia be automatically characterized?"
     Figure: electrocardiograms, with the labels P wave, normal QRS complex, abnormal QRS complex, normal rhythm and abnormal rhythm.

  3. A problem definition close to supervised machine learning
     "…ok, which patterns are frequent in the sequence labeled P but not frequent in the sequences labeled N?"
     Frequent temporal patterns: temporal patterns representative of the sequence labeled P
     Figure: discretized and labeled electrocardiograms (each sequence is labeled P or N): p,Q,p,q,p,Q,p,Q; p,q,p,q,p,q,p,q; p,Q,p,p,Q,p,p,Q

  4. Formalization of the problem: the framework of inductive databases (IDB)
     "…ok, which temporal patterns C satisfy Quexpert(P,N,T,C) = (∃L∈P, freq(C,L) ≥ TL) ∧ (∀L∈N, freq(C,L) < TL)?"
     An IDB contains:
     • Sequences: {Lbigeminy} ∈ P, {Llbbb, Lmobitz, Lnormal} ∈ N
     • Temporal patterns: {C | freq(C, Lbigeminy) ≥ T0}, {C | freq(C, Llbbb) ≥ T1}, {C | freq(C, Lmobitz) ≥ T2}, {C | freq(C, Lnormal) ≥ T3}, {C | Quexpert(P,N,T,C)}, …
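
The query Quexpert on this slide can be read as a plain predicate over a frequency function. The sketch below is only an illustration of that reading: the function name `qu_expert`, the chronicle names `c1` and `c2`, and every number in `FREQ_TABLE` and `T` are hypothetical and do not come from the presentation.

```python
def qu_expert(freq, positives, negatives, thresholds, chronicle):
    """(exists L in P: freq(C,L) >= T_L) and (for all L in N: freq(C,L) < T_L)."""
    frequent_in_a_positive = any(
        freq(chronicle, seq) >= thresholds[seq] for seq in positives)
    infrequent_in_all_negatives = all(
        freq(chronicle, seq) < thresholds[seq] for seq in negatives)
    return frequent_in_a_positive and infrequent_in_all_negatives


# Made-up frequencies of two candidate chronicles in four labeled sequences.
FREQ_TABLE = {
    ("c1", "bigeminy"): 12, ("c1", "lbbb"): 0, ("c1", "mobitz"): 1, ("c1", "normal"): 0,
    ("c2", "bigeminy"): 15, ("c2", "lbbb"): 9, ("c2", "mobitz"): 2, ("c2", "normal"): 7,
}


def table_freq(chronicle, seq):
    """Stand-in for freq(C, L): looks the value up in the toy table."""
    return FREQ_TABLE[(chronicle, seq)]


P = ["bigeminy"]                      # positive sequences
N = ["lbbb", "mobitz", "normal"]      # negative sequences
T = {"bigeminy": 10, "lbbb": 5, "mobitz": 5, "normal": 5}  # assumed thresholds

selected = [c for c in ("c1", "c2") if qu_expert(table_freq, P, N, T, c)]
print(selected)
```

With these toy numbers only `c1` passes, because `c2` is also frequent in one of the negative sequences.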

  5. Plan
     • Introduction
     • Problem features
     • Sequences
     • Chronicles
     • Inductive databases
     • Order relation
     • What is frequency?
     • Algorithms
     • Frequent Minimal Chronicles Search (Fmc Search)
     • Querying the IDB
     • Experiments and problems to be solved
     • Conclusion and future work

  6. Features of sequences
     • Long sequences of time-stamped events with few types
     • Numerical temporal information of major importance
     Figure: an example of an event sequence, with events of types A and B placed on a time axis from 0 to 15.

  7. Features of temporal patterns: chronicles
     • Chronicle: a set of temporally constrained events
     • May contain several events of the same type
     • Specifies numerical temporal constraints between events: the uncertain delay is represented by an interval [dmin,dmax], with dmin,dmax ∈ Z
     • Is easily readable by an expert of the application domain
     Event sequence: an ordered list of time-stamped events, for example (C,1) (B,5) (A,8) (B,10) (C,15) (C,16) (A,26) (A,27) (B,34)
     Figure: a chronicle C over the events (C,t0), (A,t1), (B,t2) with the temporal constraints [5;10] and [-2;20], and its instances IC(L) in the sequence.
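
To make the notion of an instance concrete, here is a minimal sketch that enumerates the instances of a chronicle in the event sequence of this slide. The transcript does not say which pair of events each interval constrains, so the placement used here (delay C to A in [5,10], delay C to B in [-2,20]) is an assumption, and the function name `instances` is hypothetical.

```python
from itertools import product

# Event sequence from the slide: (event type, timestamp).
L = [("C", 1), ("B", 5), ("A", 8), ("B", 10), ("C", 15),
     ("C", 16), ("A", 26), ("A", 27), ("B", 34)]

# Chronicle events and ASSUMED constraints: (i, j) -> [lo, hi] bounds t_j - t_i.
EVENTS = ["C", "A", "B"]
CONSTRAINTS = {(0, 1): (5, 10), (0, 2): (-2, 20)}


def instances(sequence, events, constraints):
    """Return every tuple of timestamps (one per chronicle event) that
    satisfies all interval constraints."""
    candidates = [[t for (e, t) in sequence if e == ev] for ev in events]
    found = []
    for times in product(*candidates):
        # A sequence event may play only one role in an instance.
        if len(set(zip(events, times))) < len(times):
            continue
        if all(lo <= times[j] - times[i] <= hi
               for (i, j), (lo, hi) in constraints.items()):
            found.append(times)
    return found


found = instances(L, EVENTS, CONSTRAINTS)
print(found)  # [(1, 8, 5), (1, 8, 10), (16, 26, 34)]
```

The exhaustive product is exponential in the number of chronicle events; it is meant to show the definition, not an efficient recognition procedure.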

  8. Inductive database: required definitions
     A query language that makes use of the frequency constraints freq(C,L) ≥ T and freq(C,L) ≤ T
     If a query on frequency satisfies monotonicity or anti-monotonicity properties, then a search based on frequency is easier to compute
     An order relation on chronicles must be defined

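  9. An order relation on chronicles
     C is more general than C' (C ⊑ C') ⇔
     • each event of C can be matched to an event of C'
     • each temporal constraint of C is more general than the corresponding constraint in C'
     Figure: two chronicles C and C' over the events (A,t0), (B,t1) and (A,t2), (B,t3), (B,t4), with the interval constraints [5;10], [9;20] and [8;21]; the figure relates [9;20] and [8;21] with ⊑.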
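
The two conditions of the order relation can be checked mechanically. The sketch below assumes that "more general" for a temporal constraint means that the interval of C contains the corresponding interval of C', and that a missing constraint in C' counts as unbounded; neither convention is stated on the slide. The chronicle encoding, the function names and the way the three intervals are attached to the events are also assumptions.

```python
import math
from itertools import permutations


def lookup(constraints, i, j):
    """Interval bounding t_j - t_i; unbounded when no constraint is given."""
    if (i, j) in constraints:
        return constraints[(i, j)]
    if (j, i) in constraints:
        lo, hi = constraints[(j, i)]
        return (-hi, -lo)
    return (-math.inf, math.inf)


def contains(outer, inner):
    """True when interval `outer` contains interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def more_general(c, c_prime):
    """C is more general than C' if some injective, type-preserving matching
    of C's events into C' makes every interval of C contain C's counterpart."""
    events, cons = c
    events_p, cons_p = c_prime
    for m in permutations(range(len(events_p)), len(events)):
        if any(events[i] != events_p[m[i]] for i in range(len(events))):
            continue
        if all(contains(interval, lookup(cons_p, m[i], m[j]))
               for (i, j), interval in cons.items()):
            return True
    return False


# Assumed reading of the figure: C = A -[8,21]-> B,
# C' = A -[9,20]-> B -[5,10]-> B.
C = (["A", "B"], {(0, 1): (8, 21)})
C_PRIME = (["A", "B", "B"], {(0, 1): (9, 20), (1, 2): (5, 10)})

print(more_general(C, C_PRIME), more_general(C_PRIME, C))
```

Under these assumptions C is more general than C', and not the other way round, since C' has more events than C.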

  10. How to compute the frequency of a chronicle in a sequence?
      The cardinality of the set of:
      • All the instances?
      • Minimal occurrences? [Mannila, 97]
      • Earliest distinct instances? [Dousson, 99]
      • Distinct instances?
      Figure: a chronicle C over the events B, A, B with the constraints [1,5], [2,3] and [-1,3], and its instances IC(L) in a sequence L (B B A B A B B B B A A B) drawn on a time axis from 0 to 17.

  11. Monotonicity and anti-monotonicity properties
      • Constraints on frequency should satisfy monotonicity or anti-monotonicity properties
      Figure: in the sequence L: A B A B (time axis from 0 to 5), the chronicle C: A has 2 instances and the chronicle C': A [-2,2] B has 3 instances, with the labels freq(C,L) ≥ and freq(C',L) ≤.
      Minimal occurrences [Mannila, 97] don't have monotonicity and anti-monotonicity properties
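
The instance counts quoted on this slide (2 for C, 3 for C') can be reproduced by counting all instances, which shows that a more general chronicle may have fewer instances than a more specific one. The timestamps 1 to 4 are an assumption, since the transcript only preserves the order A B A B, and `count_all_instances` is a hypothetical helper.

```python
from itertools import product

# Assumed timestamps for the sequence L: A B A B.
L = [("A", 1), ("B", 2), ("A", 3), ("B", 4)]


def count_all_instances(sequence, events, constraints):
    """Count every combination of sequence events that matches the chronicle.
    Simplification: assumes the chronicle has no two events of the same type."""
    times_by_event = [[t for (e, t) in sequence if e == ev] for ev in events]
    count = 0
    for times in product(*times_by_event):
        if all(lo <= times[j] - times[i] <= hi
               for (i, j), (lo, hi) in constraints.items()):
            count += 1
    return count


freq_c = count_all_instances(L, ["A"], {})                            # C: A
freq_c_prime = count_all_instances(L, ["A", "B"], {(0, 1): (-2, 2)})  # C': A [-2,2] B
print(freq_c, freq_c_prime)  # 2 3
```

Here C is more general than C', yet it has fewer instances, so a frequency defined as the number of all instances does not decrease when a chronicle is specialized.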

  12. Recognition criterion
      Let IC(L) be the set of instances of the chronicle C in the sequence L
      • A recognition criterion selects a unique set E of instances from IC(L)
      • The frequency of the chronicle C in the sequence L according to the recognition criterion Q is freqQ(C,L) = |E|
      • A monotonic criterion is a recognition criterion Q such that C ⊑ C' ⇒ freqQ(C,L) ≥ freqQ(C',L)
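
As an illustration of a recognition criterion, the sketch below selects instances that share no event, scanning them in the order given. This greedy rule is only one possible way to obtain a unique set E; it is not claimed to be the criterion used by the authors, and the instance list (with its timestamps) is a made-up example for a chronicle A [-2,2] B.

```python
def distinct_instances(instances):
    """Greedy criterion: keep an instance only if none of its events is
    already used by a previously kept instance."""
    selected, used = [], set()
    for inst in instances:
        if used.isdisjoint(inst):
            selected.append(inst)
            used.update(inst)
    return selected


def freq_q(instances, criterion):
    """freq_Q(C, L) = |E|, where E is the set selected by the criterion Q."""
    return len(criterion(instances))


# Hypothetical I_C(L): each instance is a tuple of (event type, timestamp).
ALL_INSTANCES = [
    (("A", 1), ("B", 2)),
    (("A", 3), ("B", 2)),
    (("A", 3), ("B", 4)),
]

print(freq_q(ALL_INSTANCES, distinct_instances))  # 2
```

The result depends on the scan order, which is why a real criterion has to fix that order to make the selected set unique.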

  13. Fmc Search
      • Fmc Search: Frequent Minimal Chronicles Search, with freqQ(C,L) ≥ T and maxwin(C) ≤ W
      • Input:
        • L: an event sequence
        • T: a minimum frequency threshold
        • Q: a recognition criterion (application dependent)
        • W: a maximal time window
      • Output: FmcQ,W(L,T)
      • Every chronicle from FmcQ,W(L,T) satisfies 3 properties:
        • is as specific as possible
        • generalizes at least T instances…
        • …that respect the recognition criterion Q
      • Algorithm: Step 1, Step 2, Step 3, Step 4

  14. Fmc Search, Step 1: Chronicle instance extraction
      • The instances of every frequent chronicle are extracted from the sequence L. Their temporal constraints are set to [-W,W]
      • Implemented in the software FACE (Frequency Analyser for Chronicle Extraction)
      The numerical temporal constraints of chronicles found by FACE are not specific enough

  15. Fmc Search, Step 2: Fuzzy clustering of instances
      • A fuzzy clustering of each set IC(L) found at step 1 is performed
      An instance has a membership degree to each cluster
      Figure: the instances of a set IC(L), over events A and B, grouped into fuzzy clusters.

  16. Fmc Search, Step 3: Chronicle construction from clusters
      • For each fuzzy cluster of step 2:
        • Instances are sorted in decreasing order of their membership degree
        • The first T instances that respect the Q criterion are kept to construct a chronicle
        • This chronicle is the lgg (least general generalization) of the selected instances
      The specificity of chronicles depends on the clustering
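
Step 3 can be sketched as follows: sort a cluster by membership degree, keep the first T instances accepted by the criterion, then take for every pair of events the tightest interval that covers the observed delays. The acceptance rule used here (no shared event with an already kept instance) stands in for the criterion Q and is an assumption, as are the function names, the instances and the membership degrees.

```python
def shares_event(events, a, b):
    """True if instances a and b use the same sequence event (type and time)."""
    n = len(events)
    return any(events[i] == events[j] and a[i] == b[j]
               for i in range(n) for j in range(n))


def select_instances(events, scored_instances, t):
    """Keep the first t instances, by decreasing membership degree,
    that share no event with an instance kept earlier."""
    chosen = []
    ranked = sorted(scored_instances, key=lambda pair: pair[1], reverse=True)
    for inst, _degree in ranked:
        if not any(shares_event(events, inst, other) for other in chosen):
            chosen.append(inst)
        if len(chosen) == t:
            break
    return chosen


def lgg(events, instances):
    """Least general generalization: for each pair of events, the smallest
    interval containing every observed delay t_j - t_i."""
    constraints = {}
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            delays = [inst[j] - inst[i] for inst in instances]
            constraints[(i, j)] = (min(delays), max(delays))
    return constraints


EVENTS = ["C", "A", "B"]
# Hypothetical cluster: (timestamps per chronicle event, membership degree).
CLUSTER = [((1, 8, 10), 0.8), ((1, 8, 5), 0.9), ((16, 26, 34), 0.7)]

kept = select_instances(EVENTS, CLUSTER, 2)
chronicle = lgg(EVENTS, kept)
print(kept)       # [(1, 8, 5), (16, 26, 34)]
print(chronicle)  # {(0, 1): (7, 10), (0, 2): (4, 18), (1, 2): (-3, 8)}
```

The instance (1, 8, 10) is skipped because it reuses the events C at time 1 and A at time 8 of a better-ranked instance.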

  17. Fmc Search, Step 4: Chronicle filtering (keep the most specific)
      • Compute the set of frequent minimal (maximally specific) chronicles FmcQ,W(L,T)
      • The most specific chronicles are retained
      • Monotonicity property: a chronicle C that satisfies freqQ(C,L) ≥ T is more general than at least one chronicle of FmcQ,W(L,T)
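
The filtering step amounts to discarding every chronicle that is strictly more general than another candidate. In the sketch below, chronicles are simplified to dictionaries of intervals over the same events and the same constrained pairs, and "more general" is taken to mean interval containment; both simplifications, the function names and the three candidates are assumptions made for illustration.

```python
def more_general(c, d):
    """Simplified order: c is more general than d when every interval of c
    contains the matching interval of d (same events and pairs assumed)."""
    return all(lo <= d[pair][0] and d[pair][1] <= hi
               for pair, (lo, hi) in c.items())


def keep_most_specific(chronicles):
    """Drop each chronicle that is strictly more general than another one."""
    return [c for c in chronicles
            if not any(more_general(c, d) and not more_general(d, c)
                       for d in chronicles)]


# Hypothetical candidates, each with a single constraint between two events.
CANDIDATES = [
    {(0, 1): (8, 21)},
    {(0, 1): (9, 20)},
    {(0, 1): (0, 3)},
]

fmc = keep_most_specific(CANDIDATES)
print(fmc)  # [{(0, 1): (9, 20)}, {(0, 1): (0, 3)}]
```

The first candidate is removed because its interval [8,21] contains [9,20], so it is strictly more general than the second one.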

  18. Querying the IDB
      Remember my query: Quexpert(P,N,T,C). For the explanation, P = {LP} and N = {LN}: freq(C,LP) ≥ TP ∧ freq(C,LN) < TN
      A chronicle C satisfies this query iff:
      • C is more general than at least one chronicle of FmcQ,W(LP,TP) (monotonicity property)
      • C is not more general than any chronicle of FmcQ,W(LN,TN) (anti-monotonicity property)
      An adaptation of Mitchell's algorithm computes this version space
      Figure: the version space.
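
The membership test for the query can be written directly from the two Fmc sets: a chronicle is frequent in a sequence exactly when it is more general than some chronicle of the corresponding Fmc set. The sketch below reuses a simplified interval-containment order over chronicles with identical events and constrained pairs; that simplification, the function names and the contents of the two Fmc sets are hypothetical.

```python
def more_general(c, d):
    """Simplified order: every interval of c contains the matching one of d."""
    return all(lo <= d[pair][0] and d[pair][1] <= hi
               for pair, (lo, hi) in c.items())


def satisfies_query(c, fmc_positive, fmc_negative):
    """freq(C, LP) >= TP and freq(C, LN) < TN, decided from the Fmc sets."""
    frequent_in_lp = any(more_general(c, m) for m in fmc_positive)
    frequent_in_ln = any(more_general(c, m) for m in fmc_negative)
    return frequent_in_lp and not frequent_in_ln


# Hypothetical Fmc sets for the positive and the negative sequence.
FMC_LP = [{(0, 1): (9, 20)}]
FMC_LN = [{(0, 1): (0, 3)}]

answers = [satisfies_query({(0, 1): (8, 21)}, FMC_LP, FMC_LN),
           satisfies_query({(0, 1): (0, 25)}, FMC_LP, FMC_LN)]
print(answers)  # [True, False]
```

The second chronicle is rejected because its interval is wide enough to generalize a chronicle of the negative set as well. This only tests one candidate at a time; it does not compute the version space itself.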

  19. Experiments: Characterization of cardiac arrhythmias
      • Data: 4 sequences of cardiac events
        • elaborated from electrocardiograms
        • labeled by an expert
        • containing ~4000 events of 3 types (P waves, normal QRS complexes, abnormal QRS complexes)
      • A typical query: freqQd(C,Lbigeminy) ≥ 5% ∧ freqQd(C,Lnormal) ≤ 10% ∧ freqQd(C,Lmobitz) ≤ 10% ∧ freqQd(C,Llbbb) ≤ 10% ∧ W = 3 s

  20. Experiments: An example of cardiac chronicle
      • Characterizes bigeminy arrhythmia
      • Also found by a supervised learning method (ILP) from ECGs [Carrault, 03]
      Legend: p = "P waves", q = "normal QRS", Q = "abnormal QRS"

  21. Problems to be solved
      • The clustering step (step 2) of Fmc Search has to cluster up to 180,000 instances per chronicle
      • For a minimum threshold of 5%, up to 1000 chronicles can be extracted from one sequence. This slows down Mitchell's algorithm dramatically
      • Finding Fmc is an NP-complete problem. The set FmcQ,W(LN,TN) is correct but not complete. Results have to be filtered in order to give the correct solution
      Figure: the practical FmcQ,W(LN,TN) compared with the optimal FmcQ,W(LN,TN).

  22. Conclusion
      • An original method to extract temporal patterns in the form of chronicles
      • Chronicles express constraints on time by numerical intervals
      • A formalization of the problem in the framework of inductive databases, which provides the definition of:
      • An order relation on temporal patterns
      • A monotonic recognition criterion and the related frequency
      • A management of numerical temporal constraints (this task is very hard)
      • An algorithm that finds Fmc in sequences
      • A method to reuse and adapt Mitchell's algorithm

  23. Future work
      • Control the clustering step of Fmc Search in order to compute only the Fmcs that are needed by Mitchell's algorithm
      • Adapt Mitchell's algorithm in order to provide an approximate solution whose quality is user-defined
      • Extend the method to other measures of interest
      • Explore new applications
        • intrusion detection

  24. An IDB for Mining Temporal Patterns in Event Sequences. Alexandre Vautier, Marie-Odile Cordier, and René Quiniou. Alexandre.vautier@irisa.fr. RENNES - France

  25. Maximum size of IC(L) as a function of the number of events in a chronicle
