Density-Based Clustering of Spatial Data when facing Physical Constraints
Authors: Dr. Osmar R. Zaiane and Chi-hoon Lee
Database Laboratory, Department of Computing Science, University of Alberta

DBCluC (Density-Based Clustering with Constraints)
• Introduction
• Related works
• Background Concepts
• Modeling Constraints
• DBCluC Algorithm
• Performance Evaluation
• Conclusion

Introduction
• Cluster Analysis
  • Clustering (unsupervised classification) is the process of partitioning data objects into a set of meaningful sub-classes, called clusters, by maximizing intra-cluster closeness and minimizing inter-cluster closeness.
• Taxonomy of Clustering Methods
  • Non-Constraint-Based
    • Hierarchical: AGNES/DIANA, BIRCH, CURE, CHAMELEON
    • Partitioning: K-means, K-medoids, CLARANS
    • Graph-Partitioning: AUTOCLUST
    • Density-Based: DBSCAN, DENCLUE
    • Grid-Based: STING, WaveCluster
  • Constraint-Based
    • Partitioning: COD-CLARANS
    • Graph-Partitioning: AUTOCLUST+
    • Density-Based: DBCluC

Introduction (Cont.)
• Key factors for a spatial clustering algorithm
  • Scalability
  • Discovery of arbitrarily shaped clusters
  • Discrimination of noise and outliers
  • Minimal domain knowledge
  • Insensitivity to data input order
• Constraints
  • Operational constraints
    • e.g., SQL aggregate and existence constraints [4]
  • Physical constraints
    • e.g., obstacles [1, 2] and crossings

Related Works
• COD-CLARANS (A.K.H. Tung et al., 2001)
  • Defines the relationship between obstacles and data objects with visibility graphs, which are used to compute obstructed distances between data objects
  • Requires expensive preprocessing steps
  • Inherits the disadvantages of CLARANS
    • Number of clusters (k) must be given
    • Main memory management
  • Micro-clustering method; detects only spherical clusters
• AUTOCLUST+ (Vladimir Estivill-Castro et al., 2000)
  • Delaunay structure for data points
  • Models obstacles as a set of line segments
  • Scalable and efficient in 2-dimensional space

Background Concepts
• DBSCAN
  • Proposed by Ester, Kriegel, Sander, and Xu (KDD '96).
  • Density-based spatial clustering algorithm that discriminates noise.
  • Detects arbitrarily shaped clusters in the presence of noise.
  • R*-tree indexing structure (O(log n)).
  • Density is evaluated with two parameters: Eps and MinPts.
    • Eps: maximum radius of the neighbourhood.
    • MinPts: minimum number of points in the Eps-neighbourhood of a given query point.
    • Eps-neighbourhood: $N_{Eps}(p) = \{ q \in D \mid dist(p, q) \le Eps \}$.
    • Density condition: $|N_{Eps}(p)| \ge MinPts$.

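The two density parameters can be illustrated with a minimal Python sketch. It assumes 2-D points stored as tuples and a brute-force scan in place of the R*-tree; the function names `eps_neighbourhood` and `is_core_point` are hypothetical and chosen only for this illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def eps_neighbourhood(points, p, eps):
    """Return N_Eps(p): every point q in the data set with dist(p, q) <= eps.

    Brute-force scan (an assumption of this sketch); an index such as an
    R*-tree would answer the same query without visiting every point.
    """
    return [q for q in points if dist(p, q) <= eps]


def is_core_point(points, p, eps, min_pts):
    """True if the Eps-neighbourhood of p holds at least min_pts points."""
    return len(eps_neighbourhood(points, p, eps)) >= min_pts


# Hypothetical data: four points in a unit square plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
dense = is_core_point(points, (0.0, 0.0), eps=1.5, min_pts=4)
sparse = is_core_point(points, (5.0, 5.0), eps=1.5, min_pts=4)
```

Note that the neighbourhood includes the query point itself, which matches the set definition above.
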
Background Concepts: DBSCAN
(Figure parameters: MinPts = 4, Eps = 2 cm)
• Directly density-reachable
  • A point p is directly density-reachable from a point q wrt. Eps and MinPts if $p \in N_{Eps}(q)$ and $|N_{Eps}(q)| \ge MinPts$ (q is a core point).
• Density-reachable
  • A point p is density-reachable from a point q wrt. Eps and MinPts if there is a chain of points $p_1, \ldots, p_n$ with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$.
• Density-connected
  • A point p is density-connected to a point q wrt. Eps and MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.

Background Concepts: DBSCAN
• Cluster
  • A non-empty subset C of data points satisfying the following conditions:
    1) Maximality: $\forall p, q$: if $p \in C$ and q is density-reachable from p with respect to Eps and MinPts, then $q \in C$.
    2) Connectivity: $\forall p, q \in C$: p is density-connected to q with respect to Eps and MinPts.
• Noise
  • A data point that does not belong to any cluster.
• Motivating Concepts: Obstacle

Background Concepts (cont.)
• Obstacle Constraints
  • An obstacle is an entity with a disconnectivity functionality.
  • Grouping the nearest data objects across an obstacle is not feasible.
  • An obstacle is a polygon denoted by P(V, E), where V is the set of points (vertices) of the polygon and E is the set of line segments (edges).
  • Types: convex and concave.

Background Concepts: Obstacle-free density notions
(Figure parameters: MinPts = 4, Eps = 2 cm)
• Directly obstacle-free density-reachable
  • A point p is directly obstacle-free density-reachable from a point q wrt. Eps and MinPts if $p \in N_{Eps}(q)$ and the edge joining p and q is obstacle-free.
• Obstacle-free density-reachable
  • A point p is obstacle-free density-reachable from a point q wrt. Eps and MinPts if there is a chain of points $p_1, \ldots, p_n$ with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly obstacle-free density-reachable from $p_i$.
• Obstacle-free density-connected
  • A point p is obstacle-free density-connected to a point q wrt. Eps and MinPts if there is a point o such that both p and q are obstacle-free density-reachable from o.

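The "obstacle-free edge" condition can be sketched as a segment intersection test. The sketch below assumes obstacles are already given as a list of line segments (pairs of 2-D points) and uses the standard orientation-based intersection test; it checks only the two conditions stated on the slide and leaves out the MinPts density condition. All function names are hypothetical.

```python
from math import dist


def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def on_segment(a, b, c):
    """True if c, already known to be collinear with a and b, lies within segment ab."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p, q, a, b):
    """True if segment pq and segment ab share at least one point."""
    o1, o2 = orientation(p, q, a), orientation(p, q, b)
    o3, o4 = orientation(a, b, p), orientation(a, b, q)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p, q, a)) or (o2 == 0 and on_segment(p, q, b))
            or (o3 == 0 and on_segment(a, b, p)) or (o4 == 0 and on_segment(a, b, q)))


def directly_obstacle_free(p, q, eps, obstacle_segments):
    """p lies in N_Eps(q) and the edge joining p and q crosses no obstacle segment."""
    if dist(p, q) > eps:
        return False
    return not any(segments_intersect(p, q, a, b) for a, b in obstacle_segments)


# Hypothetical example: a vertical wall at x = 1 separates (0, 0) from (2, 0).
wall = [((1.0, -1.0), (1.0, 1.0))]
across = directly_obstacle_free((0.0, 0.0), (2.0, 0.0), 2.5, wall)
beside = directly_obstacle_free((0.0, 0.0), (0.0, 2.0), 2.5, wall)
```

In this example the pair separated by the wall is rejected even though it is within Eps, while the pair on the same side is accepted.
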
Background Concepts: DBCluC
• Cluster
  • A non-empty subset C of data points satisfying the following conditions:
    1) Maximality: $\forall p, q$: if $p \in C$ and q is obstacle-free density-reachable from p with respect to Eps and MinPts, then $q \in C$.
    2) Connectivity: $\forall p, q \in C$: p is obstacle-free density-connected to q with respect to Eps and MinPts.
• Noise
  • A data point that does not belong to any cluster.

Modeling Constraints: Obstacles and Crossings
• Modeling Obstacles
  • Objectives
    • Assign disconnectivity functionality.
    • Improve performance when processing a large number of obstacles by reducing the search space.
  • Method: Polygon Reduction Algorithm
    • Observation
      • An obstacle can be modeled by a polygon.
      • A given polygon creates a set of visible spaces with respect to the data objects to be clustered.
    • Goal
      • Maintain the set of visible spaces created by an obstacle associated with data objects.
    • Approach
      • Represent an obstacle as a set of obstruction lines.

Modeling Constraints
• Polygon Reduction Algorithm
  • Two steps
    • Convexity test
    • Construction of obstruction lines
• Convexity Test
  • A preliminary stage that determines whether a polygon is convex or concave by checking the type of every point in the polygon.
  • Approaches
    • Turning Directional Approach
      • Assumes the points of the polygon are enumerated in order, clockwise or counterclockwise
      • $O(n)$
    • Externality Approach
      • Checks the relation between the polygon and an assessment edge that is “very” close to a query point
      • $O(n^2)$

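A turning-direction test of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the polygon is a simple polygon given as an ordered list of 2-D vertices, the enumeration order is detected from the sign of the shoelace area, and collinear vertices are treated as convex. The function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def signed_area(polygon):
    """Shoelace formula; positive when the vertices are listed counterclockwise."""
    n = len(polygon)
    return 0.5 * sum(
        polygon[i][0] * polygon[(i + 1) % n][1] - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )


def classify_vertices(polygon):
    """Label each vertex 'convex' or 'concave' in one O(n) pass.

    A vertex is convex when the turn made at it agrees with the overall
    enumeration direction of the polygon (collinear counts as convex here,
    which is an assumption of this sketch).
    """
    n = len(polygon)
    counterclockwise = signed_area(polygon) > 0
    labels = []
    for i in range(n):
        turn = cross(polygon[i - 1], polygon[i], polygon[(i + 1) % n])
        if not counterclockwise:
            turn = -turn
        labels.append("convex" if turn >= 0 else "concave")
    return labels


def is_convex_polygon(polygon):
    """A polygon is convex if all of its vertices are convex points."""
    return all(label == "convex" for label in classify_vertices(polygon))


# Hypothetical inputs: a square and an arrow-like shape with one notch at (2, 1).
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
arrow = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
arrow_labels = classify_vertices(arrow)
```

Normalizing by the sign of the area means the same labels are produced whether the vertices are listed clockwise or counterclockwise.
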
Examples of Convexity Test: Turning Directional Approach
(Figure labels: v1, v2, v3)

Examples of Convexity Test: Externality Approach
(Figure labels: query point, assessment edge, convex point, concave point, a point inside the triangle area formed by the query point and the two endpoints of an assessment edge)

Modeling Constraints: Polygon Reduction Algorithm
• Define the type of a polygon via the Convexity Test
  • A polygon is concave if it contains a concave point.
  • A polygon is convex if all of its points are convex points.
• Convex - obstruction lines*.
• Concave: the number of obstruction lines depends on the shape of the given polygon.

Modeling Obstacles: An example
(Figure labels: vs1, vs2, vs3, vs4, vs5, vs6, 8, 4)

Modeling Constraints: A crossing
• Crossing Modeling
  • Objective
    • Efficiently assign connectivity functionality.
  • Method: a polygon with Entry Points and Entry Edges.
    • Defined by users' or applications' demands
    • Entry points modeled from a crossing connect reachable objects
(Figure labels: Entry Points, Entry Edges, Eps)

DBCluC
• DBCluC
  • Extension of DBSCAN
  • Starts clustering from an arbitrary data point.
  • Indexes data points with an SR-tree
    • k-NN queries and range queries are available.
  • Considers crossing constraints while (after) clustering.
  • Considers obstacles after retrieving the neighbours of a given query point.
    • Visibility between a query point and its neighbours is checked against all obstacles.
• Complexity
  • $O(N \cdot \log N \cdot L)$, where N is the number of data points and L is the number of obstruction lines.

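The overall flow (retrieve neighbours, then filter them by visibility, then expand the cluster as DBSCAN does) can be sketched as below. This is a simplified illustration and not the authors' implementation: it assumes obstruction lines are already available as segments, replaces the SR-tree range query with a brute-force scan (so it does not reach the stated complexity), omits crossings, and applies MinPts to the visible neighbours only. All names are hypothetical.

```python
from math import dist

NOISE = -1


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _blocked(p, q, line):
    """True if segment pq crosses the obstruction line (collinear overlap not handled)."""
    a, b = line
    return _orient(p, q, a) != _orient(p, q, b) and _orient(a, b, p) != _orient(a, b, q)


def visible_neighbours(points, i, eps, lines):
    """Indices of points within eps of points[i] whose connecting edge crosses no line."""
    p = points[i]
    return [j for j, q in enumerate(points)
            if dist(p, q) <= eps and not any(_blocked(p, q, ln) for ln in lines)]


def cluster_with_obstacles(points, eps, min_pts, lines):
    """DBSCAN-style expansion over visible neighbours; returns one label per point."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = visible_neighbours(points, i, eps, lines)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = visible_neighbours(points, j, eps, lines)
            if len(nbrs) >= min_pts:
                queue.extend(k for k in nbrs if labels[k] is None or labels[k] == NOISE)
        cluster += 1
    return labels


# Hypothetical data: two columns of points, optionally separated by a wall at x = 0.5.
pts = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
wall = [((0.5, -1.0), (0.5, 3.0))]
without_wall = cluster_with_obstacles(pts, 1.1, 2, [])
with_wall = cluster_with_obstacles(pts, 1.1, 2, wall)
```

Without the wall all six points fall into one cluster; with the wall the two columns become separate clusters, since every edge between them is blocked.
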
Performance
• Performance evaluation, based on synthetic data sets
  • Detecting arbitrarily shaped clusters
  • Insensitivity to data input order
  • Discriminating noise and outliers
  • Pruning the search space

Performance (DS3)
(Figure panels: (a) Before clustering; (b) Clustering ignoring constraints; (c) Clustering with bridges; (d) Clustering with obstacles; (e) Clustering with obstacles and bridges)

Performance (DS5)
(Figure panels: (a) Before clustering; (b) Clustering ignoring constraints; (c) Clustering with bridges; (d) Clustering with obstacles; (e) Clustering with obstacles and bridges)

Performance
(Figure panels: (a) Run time when varying the size of data objects, time in seconds; (b) Run time when varying the size of obstacles)

Conclusion
• Proposed a spatial clustering algorithm that works in the presence of constraints: obstacles and crossings.
• Modeling constraints
  • Obstacles
    • Polygon Reduction Algorithm.
    • Reduces the search space, allowing DBCluC to handle a large number of obstacles
  • Crossings
    • Entry points and entry edges.
    • Control connectivity flow
• Experiments
  • Scalable, efficient, and effective.

Future Work
• Indexing obstacles
  • Prune the search space for a large number of obstacles
  • Reduce the complexity of DBCluC to $O(N \cdot \log N)$
• Extension to higher dimensions with obstruction hyperplanes
  • Consider the object altitude
• Consider more constraints: time, length of a crossing, direction of a crossing (one direction or bi-directional)
• Extension to operational constraints

References
[1] A. K. H. Tung, J. Hou, and J. Han. Spatial Clustering in the Presence of Obstacles. Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
[2] Vladimir Estivill-Castro and IckJai Lee. AUTOCLUST+: Automatic clustering of point-data sets in the presence of obstacles. In International Workshop on Temporal and Spatial and Spatio-Temporal Data Mining (TSDM2000), pages 133-146, 2000.
[3] M. G. Stone. A mnemonic for areas of polygons. Amer. Math. Monthly, 93:479-480, 1986.
[4] Anthony K. H. Tung, Raymond T. Ng, Laks V. S. Lakshmanan, and Jiawei Han. Constraint-based clustering in large databases. In ICDT, pages 405-419, 2001.
[5] Osmar R. Zaïane and Chi-Hoon Lee. Clustering Spatial Data in the Presence of Obstacles: a Density-Based Approach. Sixth International Database Engineering and Applications Symposium (IDEAS 2002), Edmonton, Alberta, Canada, July 17-19, 2002.
[6] Osmar R. Zaïane, Andrew Foss, Chi-Hoon Lee, and Weinan Wang. On Data Clustering Analysis: Scalability, Constraints and Validation. In Proc. of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'02), pp 28-39, Taipei, Taiwan, May 2002.
[7] Osmar R. Zaïane and Chi-Hoon Lee. Clustering Spatial Data When Facing Physical Constraints. In Proc. of the IEEE 2002 International Conference on Data Mining (ICDM'2002), pp ??-??, Maebashi City, Japan, December 9-12, 2002.

Visibility Graph from [1]
(Figure labels: O1, O2, v1, v2, v3, v4, v5, p, q)

Delaunay diagram
• A collection of edges satisfying an "empty circle" property: for each edge there is a circle that passes through the edge's endpoints and contains no other points.
• Dual of the Voronoi diagram

Visible Space
(Figure labels: visible spaces S1, S2, S3, S4, S5; edges e1, e2, e3, e4, e5)
• Given a set D of n data objects and a polygon P(V, E), a visible space S is a space that has a set P of data objects satisfying the following:
  • Space S is defined by three edges: the first edge (or edges) $e \in E$ connects two minimal convex points $v_i, v_j \in V$; the second edge f is the extension of the line connecting $v_i$ and its other adjacent point $v_k \in V$; and the third edge g is the extension of the line connecting $v_j$ and its other adjacent point $v_l \in V$.
  • $\forall p, q \in P$, p and q are visible to each other in S. Thus, $P \subseteq D$.
  • S is not visible to any other visible space S'. Thus, $S' \cap S = \emptyset$.