Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data. Allan Tucker - Birkbeck College; Stephen Swift - Brunel University; Nigel Martin - Birkbeck College; Xiaohui Liu - Brunel University.
Introduction • We present a methodology for grouping the variables of a Multivariate Time Series (MTS) • An MTS is a series of observations of several variables recorded over time • The methodology is tested on two real-world applications • Grouping: partitioning a set of objects into a number of mutually exclusive subsets • Many grouping problems, if not all, are NP-Hard
Grouping MTS - Introduction • It is desirable to model a high-dimensional MTS as a group of several lower-dimensional MTS • The MTS is decomposed into these lower-dimensional MTS based on dependencies in the data • The number of possible dependencies is large because one variable may affect another after a certain time lag
Grouping MTS - Methodology • Input: one high-dimensional MTS (X) • 1. Correlation Search (EP) produces Q, a list of Qlen triples: (xa, xb, lag), (xc, xd, lag), ..., (xe, xf, lag) • 2. Grouping Algorithm (GGA) takes Q and produces G, several lower-dimensional MTS, e.g. {{0,3} {1,4,5} {2}}
Correlation Search • Spearman’s Rank Correlation is used to score each triple • The entire search space is too large to search exhaustively • Invalid triples are excluded: autocorrelations, and duplicates irrespective of direction where lag = 0, e.g. (xi, xj, 0) and (xj, xi, 0) • An Evolutionary Programming approach was found to be the most efficient
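The scoring and validity rules on this slide can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the convention that a triple (xi, xj, lag) pairs xi at time t with xj at time t + lag is an assumption, and the Evolutionary Programming search itself is not shown.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5


def lagged_spearman(series, i, j, lag):
    """Score the triple (x_i, x_j, lag): x_i at time t against x_j at t + lag.

    The direction of the lag is an assumption made for this sketch.
    """
    x = series[i][: len(series[i]) - lag]
    y = series[j][lag:]
    return spearman(x, y)


def is_valid_triple(i, j, lag):
    """Reject autocorrelations and, at lag 0, the mirrored duplicate (j, i, 0)."""
    if i == j:
        return False
    if lag == 0 and i > j:
        return False
    return True


def count_valid_triples(n_vars, max_lag):
    """Count valid triples by brute force for lags 0..max_lag."""
    return sum(
        1
        for i in range(n_vars)
        for j in range(n_vars)
        for lag in range(max_lag + 1)
        if is_valid_triple(i, j, lag)
    )


# Toy data: the second series is the first one delayed by one time step.
series = [[3, 1, 4, 1, 5, 9, 2, 6], [0, 3, 1, 4, 1, 5, 9, 2]]
print(lagged_spearman(series, 0, 1, 1))
print(count_valid_triples(3, 2))
```

Under these validity rules the number of candidate triples grows as n(n-1)/2 + n(n-1)L for n variables and a maximum lag of L. As an illustration only (the slides do not state the search bound), 50 variables with lags from 0 to 120 would give 295,225 candidates, which suggests why a heuristic search is preferred over exhaustive enumeration.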
Grouping Genetic Algorithm - Representation and Operators • Previously compared and contrasted different GA representations and operators • Falkenauer’s Crossover & Mutation ensure Schema Theory holds for grouping problems • Example chromosome: 0 1 1 0 0 2 1 2 : 0 1 2 • It encodes Group 0 = {0, 3, 4}, Group 1 = {1, 2, 6}, Group 2 = {5, 7}
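The example chromosome can be decoded with a few lines of Python. The reading used here is inferred from the slide's example rather than stated on it: the genes before the colon give the group label of variables 0, 1, 2, ... in order, and the part after the colon lists the group labels in use. The function name `decode` is hypothetical.

```python
def decode(chromosome):
    """Turn a 'variable genes : group genes' string into {group: [variables]}."""
    variable_part, _, group_part = chromosome.partition(":")
    genes = [int(g) for g in variable_part.split()]
    labels = [int(g) for g in group_part.split()]

    # Start from the declared groups so that the group part fixes their order.
    groups = {label: [] for label in labels}
    for variable, label in enumerate(genes):
        # setdefault tolerates a label missing from the group part.
        groups.setdefault(label, []).append(variable)
    return groups


print(decode("0 1 1 0 0 2 1 2 : 0 1 2"))
```

Keeping an explicit group part lets crossover and mutation operate on whole groups rather than on individual variable genes, which is the idea behind the grouping-specific operators the slide refers to.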
Grouping - The Grouping Metric Properties • If Q is empty, then fitness is maximised when each variable is in a separate group • If Q contains all pairings of variables (the entire search space), then fitness is maximised when all variables are in the same group • If the data come from a mixed set of MTS, fitness is maximised when variables in the same group have as many correlations as possible in Q and variables in different groups have as few correlations as possible in Q
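The slides list the properties of the metric but not its formula. One simple stand-in that satisfies all three properties is sketched below: every pair of variables placed in the same group scores +1 if the pair appears in Q and -1 otherwise. This scoring rule, the reduction of Q to unordered pairs (lags ignored), and all function names are assumptions made for illustration; the authors' actual metric may differ. The exhaustive search over partitions is only feasible for a handful of variables and stands in for the GGA.

```python
from itertools import combinations


def fitness(groups, q_pairs):
    """Hypothetical metric: +1 per within-group pair in Q, -1 per pair not in Q."""
    score = 0
    for group in groups:
        for a, b in combinations(group, 2):
            score += 1 if frozenset((a, b)) in q_pairs else -1
    return score


def partitions(items):
    """Yield every partition of a list into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Add `first` to each existing group in turn...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ...or give it a group of its own.
        yield [[first]] + part


def best_grouping(n_vars, q_pairs):
    """Brute-force the fittest partition (small n only)."""
    best = max(partitions(list(range(n_vars))), key=lambda p: fitness(p, q_pairs))
    return sorted(sorted(group) for group in best)


all_pairs = {frozenset(pair) for pair in combinations(range(4), 2)}
print(best_grouping(4, set()))       # Q empty
print(best_grouping(4, all_pairs))   # Q contains every pairing
print(best_grouping(4, {frozenset((0, 1)), frozenset((2, 3))}))  # mixed case
```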
Oil Refinery Data • Oil Refinery Process in Scotland • Data recorded every minute • Hundreds of variables • Years of data available on repository • Selected 50 interrelated variables over 10000 time points • Large Time Lags (up to 120 minutes between some variables)
Visual Field Data • The interval between tests is about 6 months • Typically, 76 points are measured • Values range between 60 = very good and 0 = blind • The number of tests can range between 10 and 44 • [Figure: grid of numbered visual field points, with legend entries "Nerve Fibre Bundle (Right Eye)" and "Usual Position of Blind Spot (Right Eye)", marked B]
Oil Refinery Data - Results (1) • Very rapid generation of groups (seconds) • 3 major groups discovered, 2 of them relating to the upper and lower trays of the column • Most of the single variables appear noisy • Can be used as a method for pre-processing data before model building where time is short
Visual Field Data - Results (1) - Patient Group Comparison • Patients are ordered on Average Sensitivity: Patient 1 has the lowest and Patient 82 the highest • Graph goes from light (BRHC) to dark (TLHC)
Visual Field Data - Results (2) • High Sensitivity implies similar groups: small groups in general; points in the eye will be associated with similar nerve fibre bundles • Low Sensitivity implies dissimilar groups: large groups in general; different areas of the visual field may be deteriorating
Conclusions • Decomposing large, high-dimensional MTS is a challenging problem • The proposed methodology gives very encouraging results • Oil Refinery Data: 3 relatively independent sub-systems rapidly identified • Visual Field Data: discovered groups offer an ideal starting point for modelling as a VAR process
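For readers unfamiliar with the term, a vector autoregressive (VAR) process of order p over a group of k variables is usually written as below. The symbols are the standard textbook ones and are not taken from the slides.

```latex
\mathbf{x}_t = \mathbf{c} + \sum_{i=1}^{p} A_i \, \mathbf{x}_{t-i} + \boldsymbol{\varepsilon}_t ,
\qquad \mathbf{x}_t \in \mathbb{R}^{k}, \quad A_i \in \mathbb{R}^{k \times k}
```

Here c is a constant vector and the final term is a white-noise error vector. Each coefficient matrix has k × k entries, so the number of parameters grows with the square of the group size, which is one reason smaller groups are easier to model.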
Future Work • Experimenting with new datasets: Gene Expression Data, EEG Data • Determining the ideal parameters, e.g. Qlen is very influential on final groupings • Combining the two stages (correlation search and grouping) into one incremental process
Acknowledgements • Engineering and Physical Sciences Research Council, UK • Moorfields Eye Hospital, UK • Honeywell Technology Centre, USA • Honeywell Hi-Spec Solutions, UK • BP-Amoco, UK