360 likes | 454 Views
ICDE 2008. Trajectory Outlier Detection: A Partition-and-Detect Framework. April 8, 2007. Jae-Gil Lee, Jiawei Han, and Xiaolei Li Department of Computer Science University of Illinois at Urbana-Champaign. Table of Contents. Motivation Partition-and- Detect Framework
E N D
ICDE 2008 Trajectory Outlier Detection: A Partition-and-Detect Framework April 8, 2007 Jae-Gil Lee, Jiawei Han, and Xiaolei Li Department of Computer Science University of Illinois at Urbana-Champaign
Table of Contents • Motivation • Partition-and-Detect Framework • Outlier Detection Algorithm: TRAOD • Partitioning Phase (Simple) • Detection Phase • Partitioning Phase (Enhanced) • Performance Evaluation • Related Work • Conclusions
Outlier Detection • Definition: the process of detecting a data object that is grossly different from or inconsistent with theremaining set of data • Applications: the detection ofcredit card fraud, the monitoring of criminal activities in electronic commerce, etc. • Algorithms:distribution-based, distance-based, density-based, and deviation-based • Target data: previous research has mainly dealt with outlier detection of point data
Analysis on Trajectory Data • Tremendous amounts of trajectory data of moving objects are being collected • Example: vehicle positioning data, hurricane tracking data, animal movement data, etc. • Trajectory outlier detection has many important, real-world applications • Detection of suspicious persons in video surveillance • Analysis of unusual air-mass trajectories in meteorology • … A powerfuloutlier detection algorithm for trajectories is needed urgently
Limitations of Existing Algorithms • Knorr et al. [5] have presented one of very few attempts • Define the distance between two whole trajectories using the summary information (e.g., the coordinates of the starting and ending points) • Apply a distance-based approach to detection of trajectory outliers • Existing algorithmsmight not be able to detect outlying portions of trajectories • Example: TR3 is not detected as an outlier since its overall behavior is similar to those of neighboring trajectories TR5 An outlying sub-trajectory TR4 TR3 TR2 TR1
Discovery of OutlyingSub-Trajectories • Discovery ofoutlyingsub-trajectoriesis very useful in the real world • Example: Sudden changes in hurricane’s path [10] We propose the partition-and-detect framework
The Partition-and-Detect Framework • Consists of two phases: partitioning and detection TR5 TR4 A set of trajectories TR3 TR2 TR1 (1) Partition A set of trajectory partitions (2) Detect TR3 An outlier Outlying trajectory partitions Note: A set of outlying trajectory partitions indicates an outlying sub-trajectory
The Problem Statement • Given a set of trajectories I = {TR1, …, TRn}, our algorithm generates a set of outliers O = {O1, …, Om} with outlying trajectory partitions for each Oi • Necessary definitions: • A trajectory is a sequence of multi-dimensional points, which is denoted as TRi = p1p2p3 … pj … pleni; a trajectory partition (t-partition for short) is a line segment pipj (i < j), where pi and pj are the points chosen from the same trajectory • A t-partition is outlying if it does not have a sufficient number of similar neighbors • A trajectory is an outlier if it contains a non-negligible amount of outlying t-partitions
The Outlier Detection Algorithm: TRAOD • Based on the partition-and-detect framework Algorithm TRAOD (TRAjectory Outlier Detection) Input: A set of trajectories I = {TR1, …, TRn} Output: A set of outliers O = {O1, …, Om} with outlying t-partitions for each Oi Algorithm: /* Partitioning Phase */ 01: for eachTR Ido 02: Partition TR into a set L of line segments; 03: Accumulate L into a set D; /* Detection Phase */ 04: for eachPDdo 05: Mark P if it is an outlying t-partition; 06: for eachTR I do 07: Output TR if it is an outlier;
Where We Are Now /* Partitioning Phase */ 01: for eachTR Ido 02: Partition TR into a set L of line segments 03: Accumulate L into a set D; /* Detection Phase */ 04: for eachPDdo 05: Mark P if it is an outlying t-partition; 06: for eachTR I do 07: Output TR if it is an outlier; by a simple strategy; by a two-level partitioning strategy;
A Simple Partitioning Strategy (1/2) • Careless partitioning (especially, in a long length) could miss possible outliers • Example: Even though TRout behaves differently from its neighboring trajectories, these differences are averaged out due to careless partitioning A trajectory TRout A t-partition Neighboring Trajectories
A Simple Partitioning Strategy (2/2) • A trajectory is partitioned at a base unit: the smallest meaningful unit of a trajectory in a given application • Example: The base unit can be every single point Pros: high detection quality in general Cons: poor performance due to a large number of t-partitions remedied by a two-level partitioning strategy A trajectory TRout An outlying t-partition A t-partition Neighboring Trajectories
Where We Are Now /* Partitioning Phase */ 01: for eachTR Ido 02: Partition TR into a set L of line segments 03: Accumulate L into a set D; /* Detection Phase */ 04: for eachPDdo 05: Mark P if it is an outlying t-partition; 06: for eachTR I do 07: Output TR if it is an outlier; by a simple strategy; by a two-level partitioning strategy;
Distance between T-Partitions • The weighted sum of three components: the perpendicular distance( ), parallel distance( ), and angle distance( ) • Adapted from similarity measures used in the domain of pattern recognition [13]
Trajectory Outliers Based on Distance (1/2) • Def. (a closetrajectory): • Def. (an outlying t-partition): TRj is close to LiTRj is not close to Li Not close > 1‒p Close ≤ 1‒p Li is an outlying t-partitionLi is not an outlying t-partition
Trajectory Outliers Based on Distance (2/2) the sum of the lengths of outlying t-partitions in TRi ≥ F the sum of the lengths of all t-partitions in TRi • Def. (an outlier): • A trajectory TRi is an outlier if TRi TRi is an outlier TRj TRj is not an outlier
Incorporation of Density (1/2) • The previous definition, as it is, has the local density problem • A t-partition in a dense region tends to have relatively a larger number of close trajectories than that in a sparse region T-Partitions in dense regions are favored!
Incorporation of Density (2/2) the average density of all t-partitions adj(Li) = the density of the t-partition Li • Def. (the density of a t-partition): • The density of a t-partition Li is the number of t-partitions within the distance σ from Li, where σ is the standard deviation of pairwise distances between t-partitions • Def. (the adjustingcoefficient of a t-partition): • Adjustment by the density • The number of close trajectories is multiplied by the adjusting coefficient adj(Li) adj(Li) < 1.0 in a dense region adj(Li) > 1.0 in a sparse region
Guidelines for Parameter Values • Three parameters: • D corresponds to similar, p to sufficient, and F to non-negligible • Remark: There is no universally correct parameter value even for the same data set and application • Our guideline: Resorts on user feedback Want Many Outliers? D p F Smaller Larger Have Many Trajectories? 0.90 0.99 Are Trajectories Short? 0.10 0.20
Where We Are Now /* Partitioning Phase */ 01: for eachTR Ido 02: Partition TR into a set L of line segments 03: Accumulate L into a set D; /* Detection Phase */ 04: for eachPDdo 05: Mark P if it is an outlying t-partition; 06: for eachTR I do 07: Output TR if it is an outlier; by a simple strategy; by a two-level partitioning strategy;
Two-Level Trajectory Partitioning • Objective • Achieves much higher performance than the simple strategy • Obtains the same result as that of the simple strategy; i.e., does not lose the quality of the result • Basic idea 1. Partition a trajectory in coarse granularity first 2. Partition a coarse t-partition in fine granularityonly when necessary • Main benefit • Narrows the search space that needs to be inspected in fine granularity Many portions of trajectories can be pruned early on
Intuition to Two-Level Trajectory Partitioning • If the distance between coarse t-partitions is very large (or small), the distances between their fine t-partitions is also very large (or small) TRi Coarse-Granularity Partitioning Fine-Granularity Partitioning TRj Given two coarse t-partitions, can we know if the distance between any two fine t-partitions is greater than (or less than) D?
Coarse-Granularity Partitioning* • Try to maximize two rivalry measures • Preciseness: the difference between a trajectory and a set of its coarse t-partitions should be as small as possible • Required for making the bounds tight • Conciseness: the number of coarse t-partitions should be as small as possible • Required for reducing the number of comparisons • Formulate this problem using the minimum length description (MDL) principle • A good tradeoff between the two measures is found based on the information theory * Coarse-granularity partitioning is identical to that in our earlier work on trajectory clustering [15]
Fine-Granularity Partitioning Li TRi Lj TRj • Identify outlying coarse t-partitions by deriving the distance bounds between two coarse t-partitions Li and Lj • Suppose li is a fine t-partition in Li and lj is that in Lj • Derive the above bounds separately for (Lemmas1~3) and combine them (Lemma4) lb(Li, Lj, f) The lower bound of f(li, lj), ub(Li, Lj, f) The upper bound of f(li, lj),
Derivation of the Distance Bounds Lemma 1. Bounds for Lemma 2. Bounds for Lemma 3. Bounds for Combine Lemma 4. Bounds for dist(Li, Lj)
Pruning Rules for Fine-Granularity Partitioning lb(Li, Lj, dist) > D > D ≤ D ub(Li, Lj, dist) ≤ D • Rule 1: If lb(Li, Lj, dist) > D, fine-granularity partitioning is not required when comparing Li and Lj • Rule 2: If ub(Li, Lj, dist) ≤ D, fine-granularity partitioning is required, but the distance between the fine t-partitions in Li and Lj needs not be computed Li Lj Li Lj
Performance Evaluation • Use two real trajectory data sets • Hurricane track data set • Records the Atlantic hurricanes for the years 1950 through 2006 • The entire set: 608 trajectories and 18,951 points; A small set (1990~2006): 221 trajectories and 7,270 points • Animal movement data set • Records the locations of elk, deer, and cattle for the years 1993 through 1996 (the Starkey Project) • Elk1993: 33 trajectories and 15,422 points; Deer1995: 32 trajectories and 20,065 points; Cattle1993: 41 trajectories and 19,556 points • Validate the quality of outlier detection • Evaluate the effectiveness of the two-level partitioning strategy
Trajectory Outliers for Hurricane Data (Small) D = 85, p = 0.95,F = 0.2 → # of outliers = 13
Trajectory Outliers for Elk1993 D = 55, p = 0.95,F = 0.1 → # of outliers = 3
Trajectory Outliers for Deer1995 D = 80, p = 0.95,F = 0.1 → # of outliers = 3
Effects of Parameter Values (a) D = 83, p = 0.95, F = 0.2 19 outliers 10 outliers (b) D = 87, p = 0.95, F = 0.2
Pruning Power of Two-Level Partitioning 2L-Total: the ratio of the number of pairs pruned by Rule 1 to the total number of pairs of coarse t-partitions 2L-False: the proportion of pairs pruned incorrectly Optimal: the maximum ratio of pairs that can be pruned Achieves high pruning power (64~88%)
Speedup Ratio of Two-Level Partitioning the elapsed time of the algorithm using the simple partitioning strategy Speedup Ratio = the elapsed time of the algorithm using the two-level partitioning strategy Shows significant performance improvement
Related Work • Outlier detection algorithms for points • Distribution-based [2], distance-based [3, 4, 5, 6], density-based [7, 8], deviation-based [9] • Trajectory outlier detection technique using a distance-based approach [5] • Not clear whether this technique can detect outlying sub-trajectories from very complicated trajectories • Trajectory outlier detection algorithms based on classification [12] • Require a good training set and depend on training
Conclusions • Proposed a novel framework, the partition-and-detect framework, for detecting trajectory outliers • For the 1st phase, proposed a two-level trajectory partitioning strategy • Ensures both high quality and high efficiency • For the 2nd phase, proposed a hybrid of the distance-based and density-based approaches • Very intuitive, but does not have the local density problem • Demonstrated the effectiveness of TRAOD using various real trajectory data