This paper explores robust signal extraction from time series data using various filtering techniques. It focuses on the analysis of clinical data for applications such as intelligent alarm systems, clinical decision support, outlier detection, and dimension reduction.
Signal Extraction from Time Series
Ursula Gather, Universität Dortmund, Germany
Roland Fried, Universidad Carlos III de Madrid, Spain
Robustness for High-dimensional Data (RobHD 2004), Vorau, May 5th-8th 2004
Hemodynamic variables of a critically ill patient: arterial pressures, pulmonary artery pressures, central venous pressure, heart rate, pulse, temperature
[Figure: the variables plotted against time in minutes (0 to 3500); values range from 0 to 300]
Goals of Clinical Data Analysis • Intelligent Alarm Systems • Clinical Decision Support • Signal extraction • Outlier detection / classification • Level shift / trend detection • Dimension reduction • Fast, automatic algorithms • Good clinical interpretability
• Signal extraction from univariate (yt)
Signal + noise model:
• Signal: smooth with a few sudden shifts
• Observational noise: symmetric, mean zero
• Spiky noise: measurement artifacts, …
Moving window (yt-m, …, yt, …, yt+m) of width n=2m+1 for approximation of the signal at time t
Choice of m: bias, admissible time delay, variance, robustness, computation time
Location based filtering
[Figure: heart rate (60 to 85) over time (0 to 250 min), with running mean and running median, both of length 31]
• Problems:
• Running mean not robust
• Running median not smooth
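The contrast between the two location based filters can be illustrated with a minimal sketch. The series below is made up for illustration (it is not the patient data), and the helper name `running_filter` is hypothetical.

```python
from statistics import mean, median

def running_filter(y, m, stat):
    """Apply `stat` to each full window (y[t-m], ..., y[t+m]).

    Positions without a full window are left as None.
    """
    n = len(y)
    out = [None] * n
    for t in range(m, n - m):
        out[t] = stat(y[t - m : t + m + 1])
    return out

# Hypothetical heart-rate-like series with one spiky measurement artifact.
y = [70, 71, 70, 72, 150, 71, 70, 69, 70, 71]

mean_fit = running_filter(y, 2, mean)      # pulled towards the spike
median_fit = running_filter(y, 2, median)  # ignores the single spike
```

With window half-width m=2, the running mean at the spike is far above the surrounding level, while the running median stays at a typical value.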
• Robust Linear Regression
Extract local linear trend from the moving window (yt-m, …, yt+m)
• Least median of squares (LMS)
• Repeated median (RM)
• L1 Regression
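The formulas on this slide did not survive extraction. As a rough sketch, the repeated median in its standard form (the slope is the median over i of the median over j≠i of the pairwise slopes, the level is the median of the slope-corrected observations) could be written as follows. The function name `repeated_median` and the convention of centring the window at offset 0 are assumptions made for illustration, and this is the plain O(n^2) computation, not the linear-time update.

```python
from statistics import median

def repeated_median(window):
    """Repeated median fit to a window of odd length n = 2m + 1.

    Time offsets run from -m to m, so the returned level is the
    fitted value at the window centre. Returns (level, slope).
    """
    m = len(window) // 2
    pts = list(zip(range(-m, m + 1), window))
    # Inner median: pairwise slopes through point i; outer median over all i.
    slope = median(
        median((yj - yi) / (j - i) for j, yj in pts if j != i)
        for i, yi in pts
    )
    # Level: median of the observations after removing the fitted slope.
    level = median(yi - slope * i for i, yi in pts)
    return level, slope

# Hypothetical window: exact line 1 + 0.5*i with one spike at offset 2.
window = [1 + 0.5 * i for i in range(-3, 4)]
window[5] = 100.0
level, slope = repeated_median(window)
```

In this example the single spike has no effect: the fit recovers the level and slope of the underlying line.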
Simulations: MSE for Level Approximation (Davies, Fried, Gather 2004)
• different numbers of outliers
• different sizes of outliers, level shifts and trends
[Figure: regions in which L1, RM or LMS is best]
Application: heart rate
[Figure: heart rate (60 to 85) over time (0 to 250) with the LMS, repeated median and L1 fits]
LMS not stable, Repeated Median and L1 similar
• Modified Double Window (DW) Filters
Double Window Modified Trimmed Mean (MTM) Filter (Lee, Kassam, 1985)
Regression based analogues:
• Apply RM to the inner window (yt-k, …, yt+k)
• Estimate σt from the regression residuals
• Trim all yt+i with large regression residuals in the outer window (yt-m, …, yt+m)
• Apply least squares (TRM Filter) or repeated median (MRM Filter) to the remaining observations
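The four steps of the regression based double window filter can be sketched as below for the least squares variant (TRM). Several details are assumptions, since the slide does not state them: the scale is estimated by the MAD of the inner-window residuals with the usual Gaussian factor 1.4826, residuals count as "large" when they exceed q=2 scale units, and the names `rm_fit`, `ls_level` and `trm_level` are hypothetical.

```python
from statistics import median

def rm_fit(offsets, values):
    """Repeated median fit; returns (level at offset 0, slope)."""
    pts = list(zip(offsets, values))
    slope = median(
        median((yj - yi) / (j - i) for j, yj in pts if j != i)
        for i, yi in pts
    )
    return median(yi - slope * i for i, yi in pts), slope

def ls_level(offsets, values):
    """Least squares line through the points; returns its value at offset 0."""
    n = len(offsets)
    xbar, ybar = sum(offsets) / n, sum(values) / n
    sxx = sum((x - xbar) ** 2 for x in offsets)
    if sxx == 0:  # a single remaining time point: no slope can be fitted
        return ybar
    slope = sum((x - xbar) * (v - ybar) for x, v in zip(offsets, values)) / sxx
    return ybar - slope * xbar

def trm_level(window, k, q=2.0):
    """Sketch of a TRM-type filter for one outer window of width 2m + 1."""
    m = len(window) // 2
    pts = list(zip(range(-m, m + 1), window))
    inner = [(i, v) for i, v in pts if abs(i) <= k]
    # Step 1: repeated median on the inner window.
    level, slope = rm_fit([i for i, _ in inner], [v for _, v in inner])
    # Step 2: scale from the inner residuals (MAD is an assumed choice).
    scale = 1.4826 * median(abs(v - level - slope * i) for i, v in inner)
    # Step 3: trim outer-window observations with large residuals.
    kept = [(i, v) for i, v in pts if abs(v - level - slope * i) <= q * scale]
    # Step 4: least squares on what is left.
    return ls_level([i for i, _ in kept], [v for _, v in kept])

# Hypothetical window (m=5): exact line 2 + 0.5*i with a spike at offset 4.
window = [2 + 0.5 * i for i in range(-5, 6)]
window[9] = 50.0
fitted = trm_level(window, k=2)
```

Replacing `ls_level` in step 4 by the repeated median level would give the MRM variant under the same assumptions.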
Removal of spikes
Number of spikes which can be removed completely:
                Med   RM    MTM   TRM   MRM
Steady state    m     m-1   k     k-1   k-1
Trend period    0     m-1   0     k-1   k-1
Number of spikes which can be dampened:
                Med   RM    MTM   TRM   MRM
                m     m-1   k     k-1   k-1
Length of outlier patches important for choice of inner window
Efficiency for Gaussian Noise, m=10
[Figure: efficiency in percent (0 to 100) against slope (0.0 to 0.5). Location based: Median; MTM, k=10; MTM, k=5. Regression: RM; MRM, k=7; TRM, k=7]
Efficiency of modified filters high
Shift Preservation
[Figure: max. for outliers at the end of the window (max. for outliers of sizes 1, …, 10), values 0 to 5, against the number of outliers (2 to 10); panels for slope b=0.0 and b=-0.5; curves for Med, MTM, RM and the DW filters MTM, MRM, TRM]
Location based filters blur shifts during trends, double window filters reduce blurring of shifts
Series with outliers, shifts and trends
[Figure: series (-5 to 15) over time (0 to 300) with filter outputs]
RM smooth, MTM good at shifts, TRM compromise
• Hybrid Filter
FIR-Median Hybrid (FMH) Filter (Heinonen and Neuvo, 1987, 1988)
• Predictive FMH Filter, M=3
• Combined FMH Filter, M=5
Predictive / Combined Repeated Median Hybrid (RMH) Filter: replace
• half-window averages by half-window medians
• one-sided linear predictors by one-sided repeated medians
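A predictive repeated median hybrid filter can be sketched as follows, assuming the usual M=3 construction: the output is the median of a forward prediction from the left half-window, the central observation, and a backward prediction from the right half-window, with both predictions taken from one-sided repeated median fits. The slide does not spell this out, so the construction and the names `rm_fit` and `prmh_level` are assumptions.

```python
from statistics import median

def rm_fit(offsets, values):
    """Repeated median fit; returns (level at offset 0, slope)."""
    pts = list(zip(offsets, values))
    slope = median(
        median((yj - yi) / (j - i) for j, yj in pts if j != i)
        for i, yi in pts
    )
    return median(yi - slope * i for i, yi in pts), slope

def prmh_level(window):
    """Sketch of a predictive RMH filter for one window of width 2m + 1."""
    m = len(window) // 2
    # One-sided fits, each extrapolated to the window centre (offset 0).
    forward, _ = rm_fit(range(-m, 0), window[:m])
    backward, _ = rm_fit(range(1, m + 1), window[m + 1:])
    return median([forward, window[m], backward])

# Hypothetical windows (m=4), both with slope 0.5.
# A level shift of size 10 arriving exactly at the centre:
window_shift = [0.5 * i for i in range(-4, 0)] + [10 + 0.5 * i for i in range(0, 5)]
# A single spike at the centre of an otherwise exact line 1 + 0.5*i:
window_spike = [1 + 0.5 * i for i in range(-4, 5)]
window_spike[4] = 100.0

shift_out = prmh_level(window_shift)
spike_out = prmh_level(window_spike)
```

In these two examples the shift is reproduced at the time it occurs, while the isolated spike is removed, both during a trend.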
Removal of spikes
Number of spikes which can be removed completely:
                SM    RM    PFMH  CFMH  PRMH  CRMH
Steady state    m     m-1   1     1     …     …
Trend period    0     m-1   1     0     …     …
Number of spikes which can be dampened:
                SM    RM    PFMH  CFMH  PRMH  CRMH
                m     m-1   1     1     …     …
Hybrid filters more limited than DW filters
RMH filters more robust than FMH filters
Efficiency for Gaussian Noise, m=10
[Figure: efficiency in percent (0 to 100) against slope (0.0 to 0.5). Location based: Median. Regression: RM. FMH: PFMH, CFMH. RMH: PRMH, CRMH]
Hybrid filters less efficient than DW filters
RMH filter almost as efficient as FMH filter
Shift Preservation
[Figure: max. for outliers at the end of the window (max. for outliers of sizes 1, …, 10), values 0 to 5, against the number of outliers (2 to 10); panels for slope b=0.0 and b=-0.5; curves for Med, RM, FMH (PFMH, CFMH) and RMH (PRMH, CRMH)]
Combined hybrid filters blur shifts in trends slightly, but even less than DW filters
Series with outliers, trends, shifts and extremes
[Figure: series (-5 to 20) over time (0 to 300) with filter outputs]
RM smooth, PFMH preserves extremes, PRMH more robust
• Signal extraction from d-variate (Yt)
Factor Model (e.g. Peña & Box, 1987): process of r latent variables, matrix of loadings
Results for 10 vital parameters:
[Figure: series of extracted factors and series of "model errors" over time]
Conclusion
• Extraction of time-varying mean from contaminated time series by robust regression
• LMS very robust, but slow and unstable
• Repeated median robust, fast and stable
• Double window and hybrid filters improve preservation of shifts and extremes
• Adaptive window width selection
• Full online version
• Multivariate signal extraction
References
Bernholt, T., Fried, R. (2003). Computing the update of the repeated median regression line in linear time. Information Processing Letters, 88, 111-117.
Davies, P.L., Fried, R., Gather, U. (2004). Robust signal extraction for on-line monitoring data. J. Statistical Planning and Inference, 122, 65-78.
Fried, R. (2004). Robust filtering of time series with trends. J. Nonparametric Statistics, to appear.
Fried, R., Bernholt, T., Gather, U. (2004). Repeated median and hybrid filters. Technical Report 10/2004, SFB 475, University of Dortmund, Germany.
Gather, U., Fried, R. (2004a). Robust scale estimation for local linear temporal trends. Tatra Mountains Mathematical Publications, 26, 87-101.
Gather, U., Fried, R. (2004b). Methods and algorithms for robust filtering. Proceedings of COMPSTAT 2004, to appear.
Statistical demands
Analysis must
• work automatically
• work online
• resist measurement artifacts (many data situations possible)
• attenuate observational noise
Procedure needs
• unique solution
• low computation time
• high breakdown point
• low bias curves
• satisfactory efficiency
No claim of optimality, compromise needed
Desirable properties • Noise attenuation: efficiency • Stability: continuity • Removal of spikes: exact fit, robustness • Preservation of shifts and extremes: exact fit, robustness • Trend tracking: invariance • Online analysis: fast algorithms
Computation time (millisec.) for window width n=2m+1
                                             m=10    m=15
• Least median of squares: O(n^2)            5.40    11.15
• Repeated median: O(n^2)                    2.60    4.50
  online: O(n) (Bernholt, Fried 2003)        0.62    0.83
• L1 Regression: O(n log n)                  2.40    4.70
Finite Sample Replacement Breakdown Point
Define k*: smallest number of contaminated observations which can cause a spike of any size in the extracted signal
k*      T_L2   T_L1   T_RM   T_LMS
n=21    1      7      10     10
n=31    1      10     15     15
Robustness
Smallest number k* of contaminated observations which can cause a spike of any size in the extracted signal
k*      L2     L1     RM     LMS
n=21    1      7      10     10
n=31    1      10     15     15
Finite-sample efficiency relative to L2 (MSE)
[Figure: efficiency in percent (10 to 80) of Med, L1, RM and LMS for slopes 0.00, 0.05, 0.10 and widths n=11, n=21, n=31, n=61]
Rep. median and L1 regression never much worse than median
Finite-sample Efficiency w.r.t. L2 (MSE)
[Figure: efficiency in percent (10 to 80) of Med, L1, RM and LMS for slopes 0.0, 0.1 and widths n=21, n=31, n=61]
Rep. median and L1 regression never much worse than median
Performance when outliers present
[Figure: MSE for level, by size of shift and number of outliers]
• LMS << Repeated median < L1 regression
• LMS often better for large outliers
• Median good only for negligible slope
Application: heart rate
[Figure: time series and level approximates]
LMS not stable, Repeated Median and L1 similar
• Outlier Replacement
General strategy to improve repeated median: replace observations with a large prediction residual, controlled by two constants d0 and d1
'Optimal choice' of d0 and d1?
Special Cases:
• d0 > d1 = 0: Trimming
• d0 = d1 > 0: Winsorization
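The replacement rule itself was lost in extraction. One form that is consistent with the two named special cases is sketched below: an observation whose prediction residual exceeds d0 scale units is moved to d1 scale units from the prediction, on the same side. This form, the function name `replace_outlier` and the example numbers are assumptions, not the rule stated on the original slide.

```python
import math

def replace_outlier(y, prediction, scale, d0, d1):
    """Return y, or its replacement if the prediction residual is large.

    Assumed rule: if |y - prediction| > d0 * scale, replace y by
    prediction + sign(residual) * d1 * scale; otherwise keep y.
    """
    residual = y - prediction
    if abs(residual) > d0 * scale:
        return prediction + math.copysign(d1 * scale, residual)
    return y

# Hypothetical values: prediction 2, scale 1, observation 10.
trimmed = replace_outlier(10.0, 2.0, 1.0, d0=3.0, d1=0.0)     # d0 > d1 = 0
winsorized = replace_outlier(10.0, 2.0, 1.0, d0=3.0, d1=3.0)  # d0 = d1 > 0
untouched = replace_outlier(3.0, 2.0, 1.0, d0=3.0, d1=3.0)    # small residual
```

With d1 = 0 the outlier is replaced by the prediction itself, and with d0 = d1 it is pulled back to the boundary of the acceptance region.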
• Online Outlier Replacement for RM
Replace an observation if it falls outside an outlier region centered at the predicted value
Robust scale estimation from the residuals, e.g. Rousseeuw and Croux' Qa: very good for shifts and inliers
(Gather, Fried, 2004a, Fried, 2004)
Scale approximation from the residuals:
• MAD
• Length of the shortest half
• Rousseeuw and Croux' Qa
• Nested scale statistic
(Gather, Fried, 2003, Fried, 2003)
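Two of the listed scale statistics are simple enough to sketch directly. The versions below are the textbook definitions without consistency or finite-sample correction factors, which is an assumption; the slides may use adjusted variants. The function names and the residuals are made up for illustration.

```python
from statistics import median

def mad(residuals):
    """Median absolute deviation from the median (no consistency factor)."""
    centre = median(residuals)
    return median(abs(r - centre) for r in residuals)

def shortest_half_length(residuals):
    """Length of the shortest interval containing h = n//2 + 1 residuals."""
    s = sorted(residuals)
    h = len(s) // 2 + 1
    return min(s[i + h - 1] - s[i] for i in range(len(s) - h + 1))

# Hypothetical residuals: five regular values and two large outliers.
residuals = [-1.0, 0.0, 1.0, 0.5, -0.5, 50.0, 60.0]
mad_value = mad(residuals)
lsh_value = shortest_half_length(residuals)
```

Both statistics stay at the scale of the five regular residuals despite the two outliers.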
• Outlier replacement, e.g. with robust scale estimation from the residuals
• MAD works fine
• Length of the shortest half: better worst case
• Rousseeuw and Croux' Qa: very good for shifts and inliers
(Gather, Fried, 2004a, Fried, 2004)
Application: Heart Rate
[Figure: heart rate (60 to 85) over time (0 to 250), with LMS and RM with outlier replacement]
Sinusoid, 10% patchy additive N(0,9s) outliers
[Figure: trimming with Qa; LMS]
• Level Shift Detection
EWMA, CUSUM, Runs etc. not robust against outliers
Robust majority rule: compute …; detect positive LS at t+j0 if …
dLS: clinically relevant threshold, e.g. dLS=2
Simulated Time Series with Shifts
[Figure: series (-2 to 18) over time (0 to 500); LMS and RM with outlier replacement and shift detection]
Trend invariance
Replacing yt-m, …, yt, …, yt+m by yt-m - mb, …, yt, …, yt+m + mb does not change the extracted signal
• Invariant: RM, TRM, MRM, PFMH, PRMH
• Not invariant: SM, MTM, CFMH, CRMH
Lipschitz continuity
         SM    RM     PFMH      CFMH      PRMH    CRMH
Const.   1     2k+1   4/(k-1)   4/(k-1)   2k+1    2k+1
Not continuous: MTM, TRM, MRM
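Trend invariance can be checked numerically on a small example: add i·b to the observation at offset i and compare the filter output at the window centre before and after. The sketch below does this for the repeated median level and for the standard median (SM); the helper names and the window values are made up for illustration.

```python
from statistics import median

def rm_level(window):
    """Repeated median level at the centre of a window of width 2m + 1."""
    m = len(window) // 2
    pts = list(zip(range(-m, m + 1), window))
    slope = median(
        median((yj - yi) / (j - i) for j, yj in pts if j != i)
        for i, yi in pts
    )
    return median(yi - slope * i for i, yi in pts)

def add_trend(window, b):
    """Replace the observation at offset i by itself plus i * b."""
    m = len(window) // 2
    return [v + i * b for i, v in zip(range(-m, m + 1), window)]

# Hypothetical window of width 5 and an added slope b = 1.
window = [3.0, 1.0, 2.0, 5.0, 4.0]
tilted = add_trend(window, 1.0)

rm_change = rm_level(tilted) - rm_level(window)  # zero up to rounding
sm_change = median(tilted) - median(window)      # nonzero here
```

The repeated median level is unchanged up to rounding, while the median moves, in line with the invariant and not invariant lists above.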
Efficiency for Gaussian noise, f=0.6
[Figure: four panels of efficiency (0 to 100) against slope (0.0 to 0.5)]
Efficiency for Gaussian Noise
[Figure: efficiency (0 to 100) against slope (0.0 to 0.5) for autocorrelation f=0.0 and f=0.6; one set of panels with Med, RM, MTM and the DW filters MTM, MRM, TRM, the other with Med, RM, FMH (PFMH, CFMH) and RMH (PRMH, CRMH)]
Efficiency of repeated median high, of hybrid filter low
Efficiency for Gaussian Noise
[Figure: efficiency (0 to 100) against slope (0.0 to 0.5); panel "Modified filter" with Med, MTM, RM and the DW filters MTM, MRM, TRM; panel "Hybrid filter" with Med, RM, FMH (PFMH, CFMH) and RMH (PRMH, CRMH)]
Efficiency of modified filter high, of hybrid filter low
Shift Preservation
[Figure: max. RMSE (0 to 5) for increasing number of outliers at the end (2 to 10), for slope b=0.0 and b=-0.5; one set of panels with Med, MTM, RM and the DW filters MTM, MRM, TRM, the other with Med, RM, FMH (PFMH, CFMH) and RMH (PRMH, CRMH)]
Double window and hybrid filter reduce blurring of shifts
Removal of spikes
Number of spikes which can be removed completely:
          SM   RM    MTM  TRM   MRM   PFMH  CFMH  PRMH  CRMH
Steady    m    m-1   k    k-1   k-1   1     1     …     …
Trend     0    m-1   0    k-1   k-1   1     0     …     …
Number of spikes which can be dampened:
          SM   RM    MTM  TRM   MRM   PFMH  CFMH  PRMH  CRMH
          m    m-1   k    k-1   k-1   1     1     …     …
Outlier patch in the center
[Figure: max. RMSE (0 to 10) for increasing number of outliers (2 to 10), for slope b=0.0 and b=-0.5; one set of panels with Med, RM, MTM and the DW filters MTM, MRM, TRM, the other with Med, RM, FMH (PFMH, CFMH) and RMH (PRMH, CRMH)]
Double window and hybrid: info about length of patches needed
[Figure legend: Green: SM; Blue: RM; Red: MRMS; Purple: MRM; Orange: MTM; Yellow: DWMTM]
Online estimates
[Figure legend: Blue: RM; Yellow: MRM; Green: TRM]
[Figure legend: Green: SM; Blue: RM; Red: TRMS; Purple: MRM; Orange: MTM; Yellow: DWMTM]