Mr. Scan: Efficient Clustering with MRNet and GPUs Evan Samanas and Ben Welton
Density-based clustering
• Discovers the number of clusters
• Finds oddly-shaped clusters
Clustering Example (DBSCAN [1])
Goal: Find regions that meet minimum density and spatial distance characteristics.
Two parameters determine whether a point is in a cluster: Epsilon (Eps) and MinPts.
If the number of points within Eps of a point is > MinPts, the point is a core point.
The same calculation is performed for every discovered point until the cluster is fully expanded.
[Figure: a point's Eps neighborhood, with MinPts: 3]
[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
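The core-point test above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the authors' code: the function names `region_query` and `is_core_point` and the sample points are hypothetical, the neighborhood is assumed to include the point itself, and the strict `>` comparison follows the slide (the DBSCAN paper [1] states the condition as `>=`).

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def is_core_point(points, i, eps, min_pts):
    # Strict ">" as stated on the slide; this is an assumption about the intended test.
    return len(region_query(points, i, eps)) > min_pts

# Hypothetical sample data: four tightly packed points and one far-away point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
core_flags = [is_core_point(points, i, eps=0.2, min_pts=3)
              for i in range(len(points))]
```

With these inputs the four packed points each see four neighbors and pass the test, while the isolated point sees only itself.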
Scaling DBSCAN
• PDBSCAN (1999) [2]
  • Quality equivalent to single-node DBSCAN
  • Linear speedup up to 8 nodes
• DBDC (2004) [3]
  • Sacrifices quality
  • ~30x speedup on 15 nodes
• PDSDBSCAN (2012) [4]
  • Quality equivalent to single-node DBSCAN
  • 5675x speedup on 8192 nodes (72 million points)
• 2 Map/Reduce attempts (2011, 2012)
  • Quality equivalent to single-node DBSCAN
  • 6x speedup on 12 nodes
[2] X. Xu et al., A fast Parallel Clustering Algorithm for Large Spatial Databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] M. Patwary et al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)
Challenges of scaling DBSCAN
• Data distribution
  • How do we effectively take an input file and create partitions that can be clustered by DBSCAN?
  • Distributed 2-D partitioner reading from a distributed file system
• Load balancing
  • How to keep variance in clustering times across nodes to a minimum?
  • Dense Box
• Merge
  • How do we reduce the amount of data needed for the merge while keeping accuracy high?
  • Representative points
MRNet – Multicast / Reduction Network
• General-purpose TBON API
  • Network: user-defined topology
  • Stream: logical data channel to a set of back-ends
    • multicast, gather, and custom reduction
  • Packet: collection of data
  • Filter: stream data operator
    • synchronization
    • transformation
• Widely adopted by HPC tools
  • CEPBA toolkit
  • Cray ATP & CCDB
  • Open|SpeedShop & CBTF
  • STAT
  • TAU
[Figure: TBON tree with the FE at the root, CP nodes applying a filter F(x1,…,xn), and BE leaves attached to app processes]
TBON Computation
• Ideal characteristics:
  • Filter output size constant or decreasing
  • Computation rate similar across levels
  • Adjustable for load balance
[Figure: TBON timing example with data size 10 MB per BE and packet size ≤10 MB at the FE and CP levels. Labels: ~10 sec at each level for a total time of ~30 sec; a 4x case with one ~40 sec level for a total time of ~60 sec]
Intro to Mr. Scan
Mr. Scan Phases
• Partition: Distributed
• DBSCAN: GPU (@ BE)
• Merge: CPU (x #levels)
• Sweep: CPU (x #levels)
[Figure: MRNet tree (FE, CPs, BEs) with the file system (FS), labeled with the DBSCAN, Merge, and Sweep phases]
Mr. Scan Architecture
Clustering 6.5 billion points, timeline from Time: 0 to Time: 18.2 min
Components: Partitioner, MRNet Startup, DBSCAN, Merge & Sweep
• FS Read: 224 secs
• FS Write: 489 secs
• FS Read: 24 secs
• MRNet Startup: 130 secs
• DBSCAN: 168 secs
• Write Output: 19 secs
• Merge Time: 6 secs
• Sweep Time: 4 secs
Partition Phase
• Goal: Partitions computationally equivalent to DBSCAN
• Algorithm:
  • Form initial partitions
  • Add shadow regions
  • Rebalance
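The first two algorithm steps can be illustrated with a small Python sketch. It rests on several assumptions that the slide does not state: partitions are cells of a uniform grid, the shadow region is Eps wide, and a point is copied into every neighboring cell that its Eps box overlaps (a conservative superset of the Euclidean neighborhood). The function name `partition_with_shadow` is hypothetical, and the rebalance step is omitted.

```python
from collections import defaultdict

def partition_with_shadow(points, cell, eps):
    """Assign each point to its grid cell, and copy it into the shadow region
    of every neighboring cell that lies within eps of the point."""
    owned = defaultdict(list)   # cell -> points the partition is responsible for
    shadow = defaultdict(list)  # cell -> read-only copies from neighboring cells
    for p in points:
        x, y = p
        cx, cy = int(x // cell), int(y // cell)
        owned[(cx, cy)].append(p)
        # Visit every cell overlapped by the eps box around p.
        for nx in range(int((x - eps) // cell), int((x + eps) // cell) + 1):
            for ny in range(int((y - eps) // cell), int((y + eps) // cell) + 1):
                if (nx, ny) != (cx, cy):
                    shadow[(nx, ny)].append(p)
    return owned, shadow

# Hypothetical sample: the middle point sits close to the boundary between two cells.
owned, shadow = partition_with_shadow(
    [(0.5, 0.5), (0.95, 0.5), (1.5, 0.5)], cell=1.0, eps=0.1)
```

Shadow copies let a partition evaluate neighborhoods of points near its edge without communicating with its neighbors during clustering.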
Distributed Partitioner
GPU DBSCAN Filter
DBSCAN is performed in two distinct steps:
• Step 1: Detect core points
• Step 2: Expand core points and color
[Figure: for each step, GPU blocks 1 through 900, each with threads T1 through T512]
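The two-step structure can be shown with a sequential Python sketch. It is a CPU illustration only and does not model the GPU blocks and threads from the figure. The function names, the brute-force neighbor search, the strict `>` core test, and the use of -1 for unclustered points are all assumptions made for this example.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def two_step_dbscan(points, eps, min_pts):
    n = len(points)
    # Step 1: detect core points. Each point is tested independently,
    # which is what makes this step easy to spread across many threads.
    nbrs = [neighbors(points, i, eps) for i in range(n)]
    core = [len(nbrs[i]) > min_pts for i in range(n)]

    # Step 2: expand from each uncolored core point and color what it reaches.
    labels = [-1] * n  # -1 means noise or not yet colored
    color = 0
    for seed in range(n):
        if not core[seed] or labels[seed] != -1:
            continue
        labels[seed] = color
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = color
                    if core[q]:  # only core points keep expanding the cluster
                        queue.append(q)
        color += 1
    return labels, core
```

Separating detection from expansion means the expansion loop never recomputes the density test. In this sketch a border point takes the color of whichever cluster reaches it first.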
Dense Box
• One significant scalability issue is dealing with dense regions of data
• Density increases the computation cost of DBSCAN: a denser region requires more comparison operations
• We reduce the computation cost of high-density regions by pre-clustering these regions
• Look at each KD-tree leaf bounding box, searching for boxes with point count > MinPts and size < 0.35 * Eps
• DBSCAN no longer needs to expand these regions
[Figure: KD-tree regions R1 and R2]
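The dense-box test can be written as a short Python check. This sketch assumes that "size" means the longer side of the leaf's bounding box, and the function name `is_dense_box` is hypothetical. Under that assumption the box diagonal is below 0.35 * sqrt(2) * Eps, roughly 0.495 * Eps, so every pair of points in the box lies within Eps of each other.

```python
def is_dense_box(leaf_points, eps, min_pts, size_factor=0.35):
    """Return True if a leaf holds more than min_pts points inside a
    bounding box whose longer side is below size_factor * eps."""
    if len(leaf_points) <= min_pts:
        return False
    xs = [p[0] for p in leaf_points]
    ys = [p[1] for p in leaf_points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Assumption: "size" on the slide refers to the longer side of the box.
    return max(width, height) < size_factor * eps
```

When the test passes, every point in the leaf has all the others in its Eps neighborhood, so the whole leaf can be treated as one pre-clustered group of core points.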
Merge Algorithm
• Merge overlapping clusters found on different nodes.
• Two steps in the merge operation:
  • Select representative points (BE)
  • Merge operation
Representative Points
• These are points that represent the core points in the dataset.
• They create a boundary (the shaded region in the figure) within which at least one core point shared between overlapping clusters must fall for the clusters to be merged.
• Representative points are the points closest to the corners and to the midpoints of the sides of the Eps box.
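The selection rule can be sketched as follows. The slide does not define how the Eps box is positioned, so this example simply takes the box as an input rectangle. The function name `representative_points` and the tie-breaking behavior (first point found wins) are assumptions.

```python
import math

def representative_points(core_points, box):
    """box = (xmin, ymin, xmax, ymax). Return up to 8 distinct core points:
    the one nearest each corner and each side midpoint of the box."""
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    anchors = [
        (xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax),  # corners
        (xmid, ymin), (xmid, ymax), (xmin, ymid), (xmax, ymid),  # side midpoints
    ]
    reps = []
    for ax, ay in anchors:
        nearest = min(core_points,
                      key=lambda p: math.hypot(p[0] - ax, p[1] - ay))
        if nearest not in reps:  # one point may be nearest to several anchors
            reps.append(nearest)
    return reps
```

At most eight points per box are forwarded, so the amount of data sent up the tree for the merge stays bounded no matter how many core points the box contains.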
Merge Algorithm
• The merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes.
• It needs to handle the merge with low overhead and without the full dataset.
1. Core/Core overlap
2. Non-core/Core overlap: a core point is seen as non-core by one node.
Detection costs shown in the figure: MinPts * 2 operations required to detect a core point in common; 64 operations to detect.
[Figure: clusters on Node 1 and Node 2 with core and non-core points marked, one panel per overlap case]
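A minimal sketch of the core/core case, using a disjoint-set (union-find) structure to link clusters from different nodes that report the same core point. The disjoint-set approach is a choice made for this illustration, not something the slide specifies. The input layout, a dict from `(node, local_id)` to a set of core points, and the function name `merge_clusters` are hypothetical. The non-core/core case is not handled here.

```python
def merge_clusters(local_clusters):
    """local_clusters maps (node, local_id) -> set of core points reported
    for that cluster. Clusters sharing a core point get the same root."""
    parent = {key: key for key in local_clusters}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving keeps trees shallow
            k = parent[k]
        return k

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller key becomes the root

    first_owner = {}  # core point -> first cluster that reported it
    for key, core_points in local_clusters.items():
        for p in core_points:
            if p in first_owner:
                union(first_owner[p], key)
            else:
                first_owner[p] = key
    return {key: find(key) for key in local_clusters}
```

Each cluster ends up mapped to a root key, and clusters with the same root form one global cluster.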
Sweep Step
• Get cluster identifiers and file offsets down to the BEs to write the final clusters.
• The FE gives each cluster a unique ID and a file offset.
• This data is passed back down to the BE that holds the data in the cluster.
• Data is written out to disk by the BE.
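The FE's assignment of IDs and offsets can be sketched as a running sum over cluster sizes. This assumes fixed-size output records and clusters laid out back to back in one file, neither of which the slide states. The function name `assign_ids_and_offsets` and the `record_bytes` parameter are hypothetical.

```python
def assign_ids_and_offsets(cluster_sizes, record_bytes):
    """cluster_sizes: point counts, one per merged cluster.
    Return a list of (cluster_id, byte_offset) pairs."""
    assignments = []
    offset = 0
    for cluster_id, size in enumerate(cluster_sizes):
        assignments.append((cluster_id, offset))
        offset += size * record_bytes  # next cluster starts after this one
    return assignments

# Hypothetical example: three clusters with 16-byte records.
plan = assign_ids_and_offsets([3, 5, 2], record_bytes=16)
```

Because each cluster receives a disjoint byte range, the BEs can write their parts of the output file without coordinating with each other.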
Experiment Setup
• Dataset: Generated data with distribution from real Twitter data
• Measuring:
  • Weak scaling up to 8192 GPUs
  • Strong scaling
  • Quality compared to single-threaded DBSCAN
Results
Weak Scaling:
• 4096x data/compute increase
• 18.48x-31.68x time increase
Results Breakdown – Partition Phase
@ 6.5 billion points:
• 65.9% of Mr. Scan’s time
• 94.6% I/O time
Results Breakdown – GPU Cluster Time
Strong Scaling
Quality
Future Work
• Remove partitioner’s I/O bottleneck
• Multiple dimensions
Conclusion
• Clustered 6.5 billion points with DBSCAN in 18.2 minutes
• Controlled computational variance of DBSCAN
• Partitioner I/O = scaling enemy
Questions? A Brief Discussion of Ways and Means
Summary of previous Mr. Scan implementation
Algorithm Steps
• SpatialDecomp: CPU (@ FE)
• DBSCAN: CPU or GPU (@ BE)
• DrawBoundBox: CPU or GPU
• MergeCluster: CPU (x #levels)
[Figure: MRNet tree with MergeCluster at the FE and CPs and DBSCAN at the BEs]