Flexible and efficient retrieval of haemodialysis time series
S. Montani, G. Leonardi, A. Bottrighi, L. Portinale, P. Terenziani
DISIT, Sezione di Informatica, Università del Piemonte Orientale, Alessandria, Italy
• Introduction
• Data structures
• Retrieval process
• Experiments
• Future work
Introduction: Time Series
• TIME SERIES: evolution of a phenomenon over time, recorded to understand its behavior for future problem solving
• Medical domain: continuous monitoring and control instruments (e.g., ICU, hemodialysis)
• State variables (e.g., diastolic pressure value) vs. trend variables (e.g., increasing, decreasing)
• PROBLEM: difficult interpretation and retrieval, e.g.:
  - find similar cases
  - find “abstract” cases
  - understand results to interactively refine/relax the search
• Hence the need for automatic support for these tasks
Introduction: Time Series Retrieval (literature)
• DIMENSIONALITY REDUCTION
  - Mathematical transforms able to preserve the distance between two time series (or to underestimate it), e.g., the Discrete Fourier Transform (DFT)
  - Complexity (preprocessing, post-processing)
  - INPUT: a specific time series (case)
  - Black-box behavior: difficult interpretation, no flexibility, no interactivity
• Symbolic approaches to dimensionality reduction (e.g., [Xia, 96]; survey: [Daw et al., 2001])
Our approach: Time Series Retrieval + Temporal Abstraction (TA)
• Original contribution: TA used for dimensionality reduction and flexible retrieval
• TA: deriving high-level concepts from time-stamped data (from a point-based to an interval-based representation)
• In our proposal, a two-level TA:
  - SYMBOL (e.g., increase vs. low_increase)
  - TIME GRANULARITY (e.g., 1 h vs. 20 min)
• DOMAIN-INDEPENDENT methodology:
  - General DATA STRUCTURES
  - CONSTRAINTS on the data structures
DATA STRUCTURES: SYMBOL TAXONOMY
• SYMBOL ORDERING naturally emerges from the domain-dependent interpretation (e.g., Ds may abstract slopes from −90 to −45 degrees, thus preceding Dw, which abstracts slopes from −44 to −10 degrees)
• Domain-independent general constraint: the symbol taxonomy must respect the ordering
  ∀x, y, x′, y′: isa(x, x′) ∧ isa(y, y′) ∧ x′ ≠ y′ ∧ x < y → x′ < y′
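The ordering constraint can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the taxonomy contents, the symbol `Is`, the rank tables and the function name are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical two-level taxonomy: child -> parent (the isa relation).
# "Is" (strong increase) is an assumed symbol, not taken from the slides.
ISA = {"Ds": "D", "Dw": "D", "Iw": "I", "Is": "I"}

# Hypothetical orderings, one rank table per taxonomy level.
CHILD_RANK = {"Ds": 0, "Dw": 1, "Iw": 2, "Is": 3}
PARENT_RANK = {"D": 0, "I": 1}


def taxonomy_respects_ordering(isa, child_rank, parent_rank):
    """Check: isa(x, x') and isa(y, y') and x' != y' and x < y  implies  x' < y'."""
    for x, y in permutations(isa, 2):
        xp, yp = isa[x], isa[y]
        if xp != yp and child_rank[x] < child_rank[y]:
            if not parent_rank[xp] < parent_rank[yp]:
                return False
    return True


# A taxonomy that violates the constraint: Ds precedes Dw,
# but its parent I does not precede Dw's parent D.
BAD_ISA = {"Ds": "I", "Dw": "D", "Iw": "I", "Is": "I"}

print(taxonomy_respects_ordering(ISA, CHILD_RANK, PARENT_RANK))      # True
print(taxonomy_respects_ordering(BAD_ISA, CHILD_RANK, PARENT_RANK))  # False
```

Representing each level's ordering as a rank table keeps the check independent of what the symbols mean in the domain.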
DATA STRUCTURES: SYMBOL DISTANCE
• ANY DISTANCE function is admitted (domain-independent)
• However, the DISTANCE function must be CONSISTENT with the SYMBOL ORDERING (if any)
  ∀x, y, z: x < y < z → distance(x, y) < distance(x, z)
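As an illustration of the consistency requirement, the sketch below tests two candidate distance functions against an assumed ordering Ds < Dw < S < Iw. The rank table and both distance functions are hypothetical examples, not part of the original work.

```python
from itertools import combinations

# Assumed symbol ordering, encoded as ranks: Ds < Dw < S < Iw.
RANK = {"Ds": 0, "Dw": 1, "S": 2, "Iw": 3}


def rank_distance(x, y):
    """Candidate distance: absolute difference of the ranks."""
    return abs(RANK[x] - RANK[y])


def discrete_distance(x, y):
    """Candidate distance: 0 for equal symbols, 1 otherwise."""
    return 0 if x == y else 1


def is_consistent(rank, distance):
    """Check: x < y < z  implies  distance(x, y) < distance(x, z)."""
    ordered = sorted(rank, key=rank.get)
    # combinations() of a sorted list yields triples with x < y < z.
    return all(distance(x, y) < distance(x, z)
               for x, y, z in combinations(ordered, 3))


print(is_consistent(RANK, rank_distance))      # True
print(is_consistent(RANK, discrete_distance))  # False
```

The discrete distance fails because it assigns the same value to every pair of distinct symbols, so it cannot reflect how far apart two symbols are in the ordering.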
DATA STRUCTURES: TIME GRANULARITY TAXONOMY
• ANY taxonomy of time granularities is admitted (to describe the episodes at increasingly abstract levels of temporal aggregation), e.g., 10 min, 30 min, 1 h, 2 h, 4 h
• HOMOGENEITY: aggregation must be “homogeneous” at every given level, in the sense that each granule at a given level must be an aggregation of exactly the same number of consecutive granules at the lower level
  - This gives IMPLICIT information about the DURATION of (sub)episodes
• “up” function, to aggregate from each level to the upper one, e.g., up(<I,I,S>, 10 min, 30 min) = <I, 30 min>
DATA STRUCTURES: TIME GRANULARITY TAXONOMY: UP FUNCTION
• ANY “up” function is admitted (domain-dependent), BUT
• CONSTRAINT about PERSISTENCE
  ∀x: up(x, x) = x
• CONSTRAINTS about ORDERING PRESERVATION
  ∀x, y: x < y → x ≤ up(x, y) ≤ y
  ∀x, y, z: x < y < z → up(x, y) ≤ up(x, z)
  ∀x, y, z: x < y < z → up(x, z) ≤ up(y, z)
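The persistence and ordering-preservation constraints can be verified exhaustively for a small symbol set. The sketch below is only an illustration: the ordering D < S < I and the toy rule "keep the higher-ranked symbol" are assumptions, since the actual “up” function is domain-dependent.

```python
from functools import reduce
from itertools import combinations

# Assumed symbol ordering: D < S < I.
RANK = {"D": 0, "S": 1, "I": 2}


def up_pair(x, y):
    """Toy "up" rule (an assumption): keep the higher-ranked symbol."""
    return x if RANK[x] >= RANK[y] else y


def up_granule(symbols):
    """Aggregate the symbols of consecutive lower-level granules into one."""
    return reduce(up_pair, symbols)


def satisfies_constraints(up, rank):
    """Check persistence and the three ordering-preservation constraints."""
    ordered = sorted(rank, key=rank.get)

    def le(a, b):
        return rank[a] <= rank[b]

    # Persistence: up(x, x) = x
    if any(up(x, x) != x for x in ordered):
        return False
    # x < y  implies  x <= up(x, y) <= y
    for x, y in combinations(ordered, 2):
        if not (le(x, up(x, y)) and le(up(x, y), y)):
            return False
    # x < y < z  implies  up(x, y) <= up(x, z)  and  up(x, z) <= up(y, z)
    for x, y, z in combinations(ordered, 3):
        if not (le(up(x, y), up(x, z)) and le(up(x, z), up(y, z))):
            return False
    return True


print(satisfies_constraints(up_pair, RANK))           # True
print(satisfies_constraints(lambda x, y: "S", RANK))  # False (breaks persistence)
print(up_granule(["I", "I", "S"]))                    # I
```

With this toy rule, aggregating the three 10-minute granules I, I, S yields I, which agrees with the slide example up(<I,I,S>, 10 min, 30 min); other rules (e.g., majority-based ones) may also satisfy the constraints.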
DATA STRUCTURES: INDEX OF Time Series (cases)
• FOREST of TREEs
• First, the TIME GRANULARITY dimension is (partially) expanded
• Then, the SYMBOL dimension is (partially) expanded
• Each node in the tree addresses all the time series (cases) that are abstracted (“up” function + ISA symbol taxonomy) by the pattern of the node
DATA RETRIEVAL
• Exploits Temporal Abstraction (“up” function on temporal granularity and ISA on symbol taxonomy) and the INDEX
• Supports both
  - “basic” queries (retrieve time series similar to a given one)
  - “abstract” queries (retrieve time series similar to (<S,Iw,Iw,Iw>, 1h))
• QUERY PROCESSING HIGHLIGHTS
  - Abstract on the symbol taxonomy (ISA)
  - Abstract on the time granularity taxonomy (“up”)
  - Find the proper (root of the) index tree in the forest
  - Descend the index tree backward to the lowest possible node
  - Return the time series (cases) addressed by such a node
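The idea that a node addresses every case its pattern abstracts can be sketched as follows. This is a simplification under stated assumptions: the forest of trees is flattened into a dictionary keyed by pattern, granularity labels are omitted, each level aggregates two granules, and the taxonomy, the symbol `Is`, the toy “up” rule and the case data are all hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical taxonomy and ordering (assumptions, not from the slides).
ISA = {"Iw": "I", "Is": "I", "Dw": "D", "Ds": "D"}
RANK = {"D": 0, "S": 1, "I": 2}


def up_pair(x, y):
    """Toy "up" rule (an assumption): keep the higher-ranked symbol."""
    return x if RANK[x] >= RANK[y] else y


def abstractions(pattern, factor=2):
    """Yield the pattern, its ISA abstraction, then coarser time levels."""
    yield tuple(pattern)
    current = tuple(ISA.get(s, s) for s in pattern)
    yield current
    while len(current) > 1 and len(current) % factor == 0:
        current = tuple(reduce(up_pair, current[i:i + factor])
                        for i in range(0, len(current), factor))
        yield current


def build_index(cases):
    """Map every abstraction of every case to the ids of the cases it covers."""
    index = defaultdict(set)
    for case_id, pattern in cases.items():
        for key in abstractions(pattern):
            index[key].add(case_id)
    return index


def retrieve(index, query):
    """Return the ids of all cases abstracted by the query pattern."""
    return index.get(tuple(query), set())


# Made-up cases, each a sequence of four symbols.
CASES = {
    "c1": ["S", "Iw", "Iw", "Iw"],
    "c2": ["S", "Is", "Iw", "Iw"],
    "c3": ["Dw", "Dw", "S", "S"],
}
INDEX = build_index(CASES)

print(sorted(retrieve(INDEX, ["S", "Iw", "Iw", "Iw"])))  # ['c1']
print(sorted(retrieve(INDEX, ["S", "I", "I", "I"])))     # ['c1', 'c2']
print(sorted(retrieve(INDEX, ["I"])))                    # ['c1', 'c2']
```

A more abstract query returns a superset of the cases returned by a more specific one. A real tree index would reach the same nodes by descending from the root rather than by direct lookup.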
DATA RETRIEVAL: an example
• “Abstract” query: S Iw Iw Iw (1h time granularity level)
• Abstraction, symbol taxonomy: S Iw Iw Iw → S I I I
• Abstraction, time granularity (“up” function): S I I I → I I → I
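The two abstraction steps of this example can be reproduced in a few lines. This is a sketch under assumptions: the ISA entry, the ordering S < I, the toy “up” rule and the aggregation factor of 2 (1 h to 2 h to 4 h) are chosen so that the output matches the slide, and are not taken from the authors' implementation.

```python
from functools import reduce

ISA = {"Iw": "I"}         # assumed taxonomy fragment: Iw isa I
RANK = {"S": 0, "I": 1}   # assumed ordering: S < I


def up_pair(x, y):
    """Toy "up" rule (an assumption): keep the higher-ranked symbol."""
    return x if RANK[x] >= RANK[y] else y


def abstract_symbols(pattern):
    """Replace each symbol by its parent in the taxonomy, if it has one."""
    return [ISA.get(s, s) for s in pattern]


def up_level(pattern, factor):
    """Aggregate every `factor` consecutive granules into one upper-level granule."""
    if len(pattern) % factor != 0:
        raise ValueError("pattern length must be a multiple of the factor")
    return [reduce(up_pair, pattern[i:i + factor])
            for i in range(0, len(pattern), factor)]


query = ["S", "Iw", "Iw", "Iw"]       # 1 h granularity
level_1h = abstract_symbols(query)    # symbol abstraction
level_2h = up_level(level_1h, 2)      # 1 h -> 2 h
level_4h = up_level(level_2h, 2)      # 2 h -> 4 h

print(level_1h)  # ['S', 'I', 'I', 'I']
print(level_2h)  # ['I', 'I']
print(level_4h)  # ['I']
```

The final single-symbol pattern identifies the root of the index tree from which the descent toward the query pattern starts.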
DATA RETRIEVAL: an example
• Descend the index from the root “I” to search for “S Iw Iw Iw”
• ALL the corresponding time series are returned
DATA RETRIEVAL: advantages
• FLEXIBLE and UNDERSTANDABLE
  - “Abstract” query: S Iw Iw Iw (1h time granularity level)
  - Understandable query, process and output: all time series that can be abstracted as dictated by the “abstract” query are returned
• Support for INTERACTIVITY: depending on the output of the query, the user may
  - Relax the query, e.g., by asking “S I I I”
  - Refine the query, e.g., by asking “S S Iw Iw Iw Iw Iw S”
DATA RETRIEVAL: Experimental Results
• Dataset of 10388 hemodialysis sessions (i.e., cases), collected at the Vigevano hospital, Italy
• Comparison with RHENE, an approach based on DFT for dimensionality reduction and on spatial indexing (through TV-trees) for further improving retrieval performance
• ADVANTAGES
  - Efficiency
  - Flexibility (“abstract” queries vs. specific time series)
  - Interactivity
  - Trends vs. state abstractions
FUTURE WORK
• Queries about SUBpatterns
• Higher-level queries (e.g., regular expressions)