Mr. Scan: Efficient Clustering with MRNet and GPUs

Evan Samanas and Ben Welton

Density-based clustering
• Discovers the number of clusters
• Finds oddly-shaped clusters

Clustering Example (DBSCAN [1])
Goal: Find regions that meet minimum density and spatial distance characteristics.
• Two parameters determine whether a point is in a cluster: Epsilon (Eps) and MinPts.
• If the number of points within distance Eps of a point is greater than MinPts, the point is a core point.
• The same calculation is performed for every newly discovered point until the cluster is fully expanded.
[Figure: a point's Eps neighborhood, with MinPts: 3]
[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
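The core-point test described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Mr. Scan implementation: the function names are hypothetical, the neighborhood is assumed to include the point itself, and the comparison follows the slide's ">MinPts" wording (the original DBSCAN paper [1] uses "at least MinPts", which is a one-character change).

```python
from math import dist  # Euclidean distance, Python 3.8+


def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def is_core_point(points, i, eps, min_pts):
    """Assumption: a point is core when its Eps neighborhood holds more than min_pts points."""
    return len(eps_neighborhood(points, i, eps)) > min_pts
```

A brute-force scan like this costs one distance computation per pair of points, which is why dense regions dominate the running time.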

Scaling DBSCAN
• PDBSCAN (1999) [2]
  • Quality equivalent to single-node DBSCAN
  • Linear speedup up to 8 nodes
• DBDC (2004) [3]
  • Sacrifices quality
  • ~30x speedup on 15 nodes
• PDSDBSCAN (2012) [4]
  • Quality equivalent to single-node DBSCAN
  • 5675x speedup on 8192 nodes (72 million points)
• 2 Map/Reduce attempts (2011, 2012)
  • Quality equivalent to single-node DBSCAN
  • 6x speedup on 12 nodes
[2] X. Xu et al., A fast parallel clustering algorithm for large spatial databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] M. Patwary et al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)

Challenges of scaling DBSCAN
• Data distribution
  • How do we effectively take an input file and create partitions that can be clustered by DBSCAN?
  • Distributed 2-D partitioner reading from a distributed file system
• Load balancing
  • How do we keep the variance in clustering times across nodes to a minimum?
  • Dense Box
• Merge
  • How do we reduce the amount of data needed for the merge while keeping accuracy high?
  • Representative points

MRNet – Multicast / Reduction Network
• General-purpose TBON API
• Network: user-defined topology
• Stream: logical data channel
  • to a set of back-ends
  • multicast, gather, and custom reduction
• Packet: collection of data
• Filter: stream data operator
  • synchronization
  • transformation
• Widely adopted by HPC tools
  • CEPBA toolkit
  • Cray ATP & CCDB
  • Open|SpeedShop & CBTF
  • STAT
  • TAU
[Figure: tree with the front-end (FE) at the root, communication processes (CP) in the middle applying a filter F(x1,…,xn), and back-ends (BE) at the leaves, each attached to application processes (app)]

TBON Computation
• Ideal characteristics:
  • Filter output size: constant or decreasing
  • Computation rate: similar across levels
  • Adjustable for load balance
[Figure: example trees (FE, CP, and BE levels) with total times of ~30 sec and ~60 sec. Labels: data size 10 MB per BE; packet size ≤10 MB; per-level times of ~10 sec, with one ~40 sec (4x) entry]
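The "constant or decreasing filter output size" property can be illustrated with a small tree reduction. The sketch below is plain Python and does not use the MRNet API; `tbon_reduce` and `sum_filter` are hypothetical names, and the filter is assumed to be an element-wise sum, so a packet leaving any level is no larger than a single packet entering it.

```python
def sum_filter(packets):
    """Combine child packets element-wise; the output is as long as one input packet."""
    return [sum(values) for values in zip(*packets)]


def tbon_reduce(leaf_packets, fanout, filt):
    """Apply `filt` level by level until a single packet reaches the root (the FE)."""
    level = list(leaf_packets)
    while len(level) > 1:
        # Each parent receives the packets of up to `fanout` children.
        level = [filt(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]
```

With 16 back-ends and a fanout of 4, the reduction takes two levels, and every level does a similar amount of work per node, which is the balance the slide asks for.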

Intro to Mr. Scan
Mr. Scan Phases
• Partition: Distributed
• DBSCAN: GPU (@ BE)
• Merge: CPU (x #levels)
• Sweep: CPU (x #levels)
[Figure: MRNet tree of FE, CP, and BE nodes with the file system (FS); DBSCAN at the BEs, Merge and Sweep at the CP and FE levels]

Mr. Scan Architecture
Clustering 6.5 billion points, total time: 18.2 min
[Figure: timeline starting at time 0, with stages labeled Partitioner, DBSCAN, and Merge & Sweep]
• FS Read: 224 secs
• FS Write: 489 secs
• FS Read: 24 secs
• MRNet Startup: 130 secs
• DBSCAN: 168 secs
• Write Output: 19 secs
• Merge Time: 6 secs
• Sweep Time: 4 secs

Partition Phase
• Goal: Partitions that are computationally equivalent for DBSCAN
• Algorithm:
  • Form initial partitions
  • Add shadow regions
  • Rebalance
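The first two steps of the algorithm can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not the distributed partitioner itself: partitions are vertical strips with equal point counts, the shadow region of a strip is every foreign point within Eps of the strip's x-range, and the rebalance step is omitted. The function name is hypothetical.

```python
def partition_with_shadow(points, k, eps):
    """Split 2-D points into k strips along x and attach a shadow region to each.

    Returns a list of (owned, shadow) pairs. Shadow points are copies of points
    owned by other strips that lie within eps of this strip's x-range, so that
    neighborhoods near a strip boundary can be evaluated locally.
    """
    pts = sorted(points)  # sort by x, then y
    n = len(pts)
    parts = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        owned = pts[lo:hi]
        if not owned:
            parts.append(([], []))
            continue
        x_min, x_max = owned[0][0], owned[-1][0]
        shadow = [p for p in pts[:lo] + pts[hi:]
                  if x_min - eps <= p[0] <= x_max + eps]
        parts.append((owned, shadow))
    return parts
```

Equal point counts do not guarantee equal DBSCAN cost, since dense strips need more comparisons; that gap is what a rebalance step would have to close.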

Distributed Partitioner

GPU DBSCAN Filter
DBSCAN is performed in two distinct steps:
• Step 1: Detect core points
• Step 2: Expand core points and color
[Figure: for each step, GPU blocks 1 through 900, each with threads T1 through T512]
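The two-step structure can be shown with a sequential CPU sketch. It is not the GPU kernel: on the GPU each step would be spread across blocks and threads, whereas here both steps are ordinary loops. The function name is hypothetical, the neighborhood is assumed to include the point itself, and the core test follows the ">MinPts" wording used earlier in the deck.

```python
from math import dist  # Python 3.8+


def dbscan_two_step(points, eps, min_pts):
    """Return (is_core, color); color is -1 for noise, else a cluster number."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]

    # Step 1: detect core points (independent per point, hence easy to parallelize).
    is_core = [len(nb) > min_pts for nb in neighbors]

    # Step 2: expand from each uncolored core point and color everything reached.
    color = [-1] * n
    next_color = 0
    for seed in range(n):
        if not is_core[seed] or color[seed] != -1:
            continue
        color[seed] = next_color
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if color[j] == -1:
                    color[j] = next_color
                    if is_core[j]:  # only core points keep expanding
                        stack.append(j)
        next_color += 1
    return is_core, color
```

Separating detection from expansion means step 1 needs no coordination between points, while step 2 only follows edges between points already known to be core.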

Dense Box
• One significant scalability issue is dealing with dense regions of data.
• Density increases the computation cost of DBSCAN: dense regions require more comparison operations.
• We reduce the computation cost of high-density regions by pre-clustering these regions.
  • Look at each leaf bounding box of the KD-tree for boxes with point count > MinPts and size < 0.35 * Eps.
  • DBSCAN no longer needs to expand these regions.
[Figure: KD-tree over regions R1 and R2]
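The leaf test can be sketched as below. The slide does not say how "size" is measured, so this sketch assumes it is the longest side of the leaf's 2-D bounding box; the function name and the `factor` parameter are hypothetical. Under that assumption the box diagonal is below 0.35 * √2 * Eps ≈ 0.495 * Eps, so every pair of points in the box is within Eps of each other, and with more than MinPts points in the box each of them passes the core-point test.

```python
def is_dense_box(leaf_points, eps, min_pts, factor=0.35):
    """True when a KD-tree leaf can be pre-clustered as a single dense box.

    Assumption: 'size' is the longest side of the axis-aligned bounding box.
    """
    if len(leaf_points) <= min_pts:
        return False
    xs = [p[0] for p in leaf_points]
    ys = [p[1] for p in leaf_points]
    longest_side = max(max(xs) - min(xs), max(ys) - min(ys))
    return longest_side < factor * eps
```

Leaves that pass the test can be handed to DBSCAN as already-clustered units, which removes the pairwise comparisons inside them.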

Merge Algorithm
• Merge overlapping clusters found on different nodes.
• Two steps in the merge operation:
  • Select representative points (BE)
  • Merge operation

Representative Points
• These are points that represent the core points in the dataset.
• They create a boundary (shaded region) that must contain at least one core point shared between overlapping clusters; a point must fall in this region for the overlapping clusters to be merged.
• Representative points are the points closest to the corners and to the middle of each side of the Eps box.
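One way to read the selection rule is sketched below. The slide does not define the Eps box precisely, so this sketch assumes an axis-aligned box supplied by the caller as (x0, y0, x1, y1), and picks the core point nearest to each of the four corners and four side midpoints. The function name is hypothetical, and the result holds at most eight points per box.

```python
from math import dist  # Python 3.8+


def representative_points(core_points, box):
    """Pick the core points closest to the corners and side midpoints of `box`."""
    if not core_points:
        return []
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    targets = [(x0, y0), (x1, y0), (x0, y1), (x1, y1),   # corners
               (xm, y0), (xm, y1), (x0, ym), (x1, ym)]   # side midpoints
    reps = []
    for target in targets:
        best = min(core_points, key=lambda p: dist(p, target))
        if best not in reps:  # one point may be nearest to several targets
            reps.append(best)
    return reps
```

Because the count is bounded by eight regardless of how many core points the box holds, the data sent up the tree for the merge stays small.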

Merge Algorithm
• The merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes.
• It needs to handle the merge with low overhead and without the full dataset.
• Two overlap cases:
  1. Core/Core overlap
  2. Non-core/Core overlap: a core point is seen as non-core by one node.
[Figure: the two cases shown on Node 1 and Node 2, with core and non-core points marked. Annotations: "MinPts * 2 operations required to detect core point in common" and "64 operations to detect"]
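The core/core case can be sketched with a disjoint-set (union-find) structure. This is an illustration under assumptions, not the Mr. Scan merge filter: each node is assumed to report, per local cluster, the IDs of the points it sends up the tree, and two clusters are merged when they share an ID. The function name and the report layout are hypothetical, and the non-core/core case is not handled.

```python
def merge_clusters(reports):
    """Group clusters that share at least one point ID.

    reports: dict mapping (node, local_cluster_id) -> set of point IDs.
    Returns a sorted list of groups, each a sorted list of cluster keys.
    """
    parent = {key: key for key in reports}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    first_owner = {}  # point ID -> first cluster seen containing it
    for key, point_ids in reports.items():
        for pid in point_ids:
            if pid in first_owner:
                a, b = find(first_owner[pid]), find(key)
                if a != b:
                    parent[b] = a
            else:
                first_owner[pid] = key

    groups = {}
    for key in reports:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(group) for group in groups.values())
```

Merging is transitive here: if cluster A shares a point with B and B shares one with C, all three end up in one group even though A and C have nothing in common directly.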

Sweep Step
• Goal: get cluster identifiers and file offsets down to the BEs so they can write the final clusters.
• The FE gives each cluster a unique ID and a file offset.
• This data is passed back down to the BE that holds the data in the cluster.
• Data is written out to disk by the BE.
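The ID and offset assignment at the FE amounts to a running sum over cluster sizes. The sketch below assumes fixed-size output records and that the FE knows each cluster's point count; the function name and the 16-byte record size are hypothetical.

```python
def assign_ids_and_offsets(cluster_sizes, record_bytes=16):
    """Give each cluster a unique ID and the byte offset where its output starts.

    cluster_sizes: number of points in each merged cluster.
    Returns a list of (cluster_id, byte_offset) pairs.
    """
    assignments = []
    offset = 0
    for cluster_id, size in enumerate(cluster_sizes):
        assignments.append((cluster_id, offset))
        offset += size * record_bytes  # next cluster starts after this one
    return assignments
```

Since the offsets never overlap, each BE can write its clusters independently once it has received its (ID, offset) pairs.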

Experiment Setup
• Dataset: Generated data with distribution from real Twitter data
• Measuring:
  • Weak scaling up to 8192 GPUs
  • Strong scaling
  • Quality compared to single-threaded DBSCAN

Results
Weak scaling: 4096x data/compute increase, 18.48x-31.68x time increase
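To put the two ratios in context, the short calculation below divides the increase in data and compute by the increase in time. The function name is hypothetical and only the figures quoted on the slide are used. It works out to roughly 129x to 222x more work completed per unit time at the largest scale.

```python
def throughput_gain(work_increase, time_increase):
    """How much more work is completed per unit time after scaling up."""
    return work_increase / time_increase


# Figures from the slide: 4096x more data/compute, 18.48x to 31.68x more time.
best_case = throughput_gain(4096, 18.48)
worst_case = throughput_gain(4096, 31.68)
```

Under ideal weak scaling the time increase would be 1x, so the gap between these gains and 4096x is the cost that the results breakdown goes on to attribute.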

Results Breakdown – Partition Phase
@ 6.5 billion points:
• 65.9% of Mr. Scan’s time
• 94.6% I/O time

Results Breakdown – GPU Cluster Time

Strong Scaling

Quality

Future Work
• Remove partitioner’s I/O bottleneck
• Multiple dimensions

Conclusion
• Clustered 6.5 billion points with DBSCAN in 18.2 minutes
• Controlled computational variance of DBSCAN
• Partitioner I/O = scaling enemy

Questions?

Summary of previous Mr. Scan implementation
Algorithm Steps
• SpatialDecomp: CPU (@ FE)
• DBSCAN: CPU or GPU (@ BE)
• DrawBoundBox: CPU or GPU
• MergeCluster: CPU (x #levels)
[Figure: MRNet tree with MergeCluster at the FE and CP levels and DBSCAN at the BEs]
