Learn how to identify dense clusters of moving objects over time, and explore efficient exact and approximate algorithms.
On Discovering Moving Clusters in Spatio-temporal Data • Panos Kalnis (National University of Singapore) • Nikos Mamoulis (University of Hong Kong) • Spiridon Bakiras (Hong Kong University of Science and Technology)
What is a Moving Cluster? • Dense clusters of objects that move similarly for a long time period • Not necessarily the same objects during the lifetime of the cluster • Examples • Migrating animals • Convoy of cars • Military applications • Solutions: • Efficient exact and approximate algorithms
Problem Formulation • Example: • Moving cluster
Related Work (Static) • Partition-based clustering (k-medoids) • Hierarchical clustering (BIRCH, CURE) • Density-based clustering (DBSCAN) • [Figure labels: MinPts=3, ε]
Related Work (Moving Objects) • Grouping trajectories [Vlachos et al., ICDE 02] • Trajectory cluster: Constant set of objects through its lifetime • Only similar movement; no space proximity • Dense areas over time [Hadjieleftheriou et al., SSTD 03] • Static dense regions • No common objects between regions in sequence • Incremental DBSCAN/OPTICS [Ester et al., VLDB 98] • Only a small percentage of objects moves • Maintaining Data Bubbles [Nassar et al., SIGMOD 04] • Redistributes updated objects in existing bubbles
MC1: The Straightforward Approach • G: set of moving clusters • Apply clustering to next timeslice Si • Expand moving clusters in G • Add new moving clusters to G • Report ending clusters
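The MC1 loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-timeslice clusters are taken as already computed (for example by DBSCAN), two consecutive clusters are assumed to belong to the same moving cluster when their Jaccard overlap is at least θ (the slides use θ but do not spell out the test), and the names `jaccard` and `mc1` are hypothetical.

```python
def jaccard(a, b):
    """Share of common objects between two clusters (sets of object ids)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mc1(timeslices, theta):
    """timeslices: list of lists of sets of object ids, one list per timeslice.

    Returns all moving clusters, each a list of (timeslice_index, cluster) pairs.
    The Jaccard >= theta test is an assumption, not taken from the slides.
    """
    active = []    # G: moving clusters that may still be extended
    finished = []  # moving clusters that ended in an earlier timeslice
    for i, clusters in enumerate(timeslices):
        next_active = []
        extended = set()  # indices of clusters in S_i that continued some g
        for g in active:
            last = g[-1][1]
            grew = False
            # MC1 compares g with every cluster of the new timeslice.
            for j, c in enumerate(clusters):
                if jaccard(last, c) >= theta:
                    next_active.append(g + [(i, c)])
                    extended.add(j)
                    grew = True
            if not grew:
                finished.append(g)  # report ending cluster
        # Clusters that continued nothing start new moving clusters.
        for j, c in enumerate(clusters):
            if j not in extended:
                next_active.append([(i, c)])
        active = next_active
    return finished + active
```

Calling `mc1(slices, 0.5)` on three timeslices whose clusters drift by one object at a time returns one moving cluster spanning all three timeslices, plus single-timeslice entries for clusters that found no continuation.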
Hash-based DBSCAN • Memory: • 10M objects with 1GB RAM
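The slide does not describe the hashing scheme, so the following is a minimal sketch of one common way to combine DBSCAN with a hash table, not the authors' implementation. It assumes 2-D points hashed into square grid cells of side ε, so that every ε-neighbour of a point lies in the 3x3 block of cells around it. The function name `grid_dbscan` and its parameters are hypothetical.

```python
from collections import defaultdict, deque
from math import floor


def grid_dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (>= 0) or -1 for noise."""
    # Hash every point index into a square cell of side eps (assumed scheme).
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(idx)

    def neighbours(idx):
        # Only the 3x3 block of cells around the point can hold points within eps.
        x, y = points[idx]
        cx, cy = floor(x / eps), floor(y / eps)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    px, py = points[j]
                    if (px - x) ** 2 + (py - y) ** 2 <= eps * eps:
                        found.append(j)
        return found  # includes the point itself

    labels = [None] * len(points)  # None = unvisited
    cluster_id = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        seeds = neighbours(start)
        if len(seeds) < min_pts:
            labels[start] = -1  # noise for now; may become a border point later
            continue
        labels[start] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point: keep expanding
        cluster_id += 1
    return labels
```

With this layout a neighbourhood query inspects at most nine cells, and the only extra memory is the cell table.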
MC1 is inefficient! • Checks all possible combinations of clusters in consecutive timeslices • Performs clustering for every timeslice
MC2: Minimizing Redundant Checks • Clustering in every timeslice • Select a random object in c1 • Look up the object in S2 • Repeat for the remaining objects • Max: (1-θ)|ci| objects • [Figure label: c1c2 is a moving cluster]
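The probing idea can be illustrated as follows. This is a sketch under assumptions, not the authors' code: continuation is again assumed to mean Jaccard overlap of at least θ, the lookup in the next timeslice is done with a plain dictionary from object id to cluster index, and `mc2_match` is a hypothetical name. If c continues as c', at least θ|c| of its objects must be in c', so the sketch probes one object more than the number that could be missing, a conservative reading of the slide's (1-θ)|ci| bound.

```python
import math
import random


def mc2_match(c, next_clusters, theta, rng=random):
    """Return the index of a next-timeslice cluster that continues c, or None."""
    # Hash map: object id -> index of its cluster in the next timeslice.
    owner = {o: k for k, cl in enumerate(next_clusters) for o in cl}
    # At least ceil(theta * |c|) objects of c must reappear in a matching cluster
    # (small tolerance guards against floating-point round-up).
    min_shared = math.ceil(theta * len(c) - 1e-9)
    budget = min(len(c), len(c) - min_shared + 1)
    tested = set()  # candidate clusters already compared with c
    for o in rng.sample(sorted(c), budget):
        k = owner.get(o)
        if k is None or k in tested:
            continue  # object is not clustered any more, or candidate already rejected
        tested.add(k)
        cand = next_clusters[k]
        if len(c & cand) / len(c | cand) >= theta:
            return k
    return None
```

Only the clusters that actually contain a probed object are compared with c, instead of every cluster in the next timeslice as in MC1.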
Ambiguity Cases: θ<0.5 • {c0c1, c2} • {c0c2, c1}
MC3: Approximate Moving Clusters • Intuition: Many clusters will remain the same even if objects move • Avoid performing clustering in every timeslice • For an object o • If o belongs to cluster c in timeslice Si • Assume that o also belongs to c in the next timeslice (notice: objects may have moved)
Refine clusters • Hash new clusters in a grid • Legal cluster: • Does not meet/intersect with other clusters • It is connected (cells meet) • Objects in legal clusters are not considered further • For the rest of the objects, perform clustering • Possible inaccuracies!!!
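The legality test lends itself to a short sketch. The version below is an illustration under assumptions: objects keep the cluster label carried over from the previous timeslice, positions are 2-D, two cells "meet" when they are identical or 8-adjacent, and the cell size is a free parameter. The names `legal_clusters`, `positions`, `labels` and `cell` are hypothetical.

```python
from collections import defaultdict
from math import floor


def legal_clusters(positions, labels, cell):
    """positions: {obj: (x, y)} in the new timeslice; labels: {obj: carried-over id}.

    Returns the ids of clusters that are connected and touch no other cluster.
    """
    cells = defaultdict(set)  # cluster id -> set of occupied grid cells
    for obj, (x, y) in positions.items():
        cells[labels[obj]].add((floor(x / cell), floor(y / cell)))

    def around(c):
        # The cell itself plus its 8 neighbours.
        return {(c[0] + dx, c[1] + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

    def connected(cs):
        # Flood fill over cells that meet each other.
        start = next(iter(cs))
        seen, stack = {start}, [start]
        while stack:
            for n in around(stack.pop()) & cs:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen == cs

    legal = set()
    for cid, cs in cells.items():
        halo = set().union(*(around(c) for c in cs))
        touches_other = any(halo & other for oid, other in cells.items() if oid != cid)
        if connected(cs) and not touches_other:
            legal.add(cid)
    return legal
```

Objects whose cluster id is not in the returned set would then go through ordinary clustering, as the slide describes.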
Minimize Error • Perform exact clustering to absorb (may not eliminate) the accumulated error • Period for exact clustering: Grows linearly, drops exponentially • Exact clustering: If more than α|G| clusters have been added/removed
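One possible reading of "grows linearly, drops exponentially" is an additive-increase, multiplicative-decrease schedule. The sketch below makes that reading concrete, but every detail is an assumption: the increment of one timeslice, the halving, and the use of the α|G| test as the trigger for shrinking the period are not stated on the slide, and `next_period` is a hypothetical name.

```python
def next_period(period, changed, active, alpha, min_period=1):
    """Return the next number of timeslices between two exact clusterings.

    period  -- current period (assumed unit: timeslices)
    changed -- clusters added or removed since the last exact clustering
    active  -- |G|, the current number of moving clusters
    """
    if changed > alpha * active:
        # Too many changes: shrink fast (repeated halving = exponential drop).
        return max(min_period, period // 2)
    # Approximation held up: stretch the period by one timeslice (linear growth).
    return period + 1
```

For example, with α=0.1 and 100 active clusters, one changed cluster moves a period of 4 to 5, while 20 changed clusters cut a period of 8 to 4.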
Experimental Evaluation • 10K-50K objects per timeslice • 50-100 timeslices, up to 5M objects • Linux, C++, 1.3GHz CPU, 1.2GB RAM • Generator: Clusters move/rotate, objects appear/disappear
Varying data size (10K-50K per timeslice) • θ=0.9, α=0.1 • Larger dataset: larger clusters, more interactions • [Chart annotation: Avg: 87%]
Varying number of clusters (100-800 per timeslice) • 5M objects, θ=0.9, α=0.1 • Many clusters: Reaches error threshold fast • [Chart annotations: 96%, 87%, 73%]
Varying α • 5M objects, θ=0.9, 800 clusters • α small: may not recover!!!
Varying α for different agilities • Low agility: Fewer errors faster
MC3 for varying θ • 5M objects, α=0.1, 800 clusters • θ large: incorrect clusters are pruned for not satisfying the θ criterion
Conclusions • Moving clusters • Objects may move/change • Exact and approximate solutions • Future work • Automatic setting of parameter α • Better error estimation • Constraints (e.g., moving cluster must span at least k timeslices)