Exploiting Computing Power of GPU for Data Mining Application Wenjing Ma, Leonid Glimcher, Gagan Agrawal
Outline of contents • Background of GPU computing • Parallel data mining • Challenges of data mining on GPU • GPU implementation • k-means • EM • kNN • Apriori • Experimental results • Results of k-means and EM • Features of applications that are suitable for GPU computing • Related and future work
Background of GPU computing • Multi-core architectures are becoming more popular in high-performance computing • GPUs are inexpensive and fast • CUDA is a high-level language for programming the GPU
CUDA functions • Host function • Called by host and executed on host • Global function • Called by host and executed on device • Device function • Called by device and executed on device
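These three function types correspond to CUDA's __host__, __global__, and __device__ qualifiers. A minimal sketch, with illustrative function names and sizes that are not from the slides:

// Device function: called from device code and executed on the device
__device__ float square(float x) {
    return x * x;
}

// Global function (kernel): called by the host, executed on the device
__global__ void squareAll(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(data[i]);
}

// Host function: called by the host and executed on the host
int main() {
    const int n = 1024;
    float *d_data;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    squareAll<<<4, 256>>>(d_data, n);   // host launches the global function on the device
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}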
Parallel data mining • Common structure of data mining applications (adapted from Freeride; a concrete instance is sketched below):
    { * Outer Sequential Loop * }
    While () {
        { * Reduction Loop * }
        Foreach (element e) {
            (i, val) = process(e);
            Reduc(i) = Reduc(i) op val;
        }
    }
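As one concrete (CPU-side) instance of this pattern, consider a histogram, where process(e) maps an element to a bucket index with value 1 and the reduction operator is addition. This example is illustrative, not taken from the slides; the outer sequential loop is omitted because a histogram needs only one pass over the data.

#include <vector>

// Reduction loop of the generalized structure, instantiated as a histogram.
void histogramReduction(const std::vector<float> &elements,
                        std::vector<int> &reduc, float bucketWidth) {
    for (float e : elements) {                        // Foreach (element e)
        int i = static_cast<int>(e / bucketWidth);    // (i, val) = process(e)
        if (i >= 0 && i < static_cast<int>(reduc.size()))
            reduc[i] += 1;                            // Reduc(i) = Reduc(i) op val
    }
}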
Challenges of data mining on GPU • SIMD shared memory programming • 3 steps involved in the main loop • Data read • Computing update • Writing update
Computing update
    copy common variables from device memory to shared memory
    nBlocks = blockSize / thread_number
    For i = 1 to nBlocks {
        each thread processes 1 data element
    }
    Global reduction
(a CUDA sketch of this step follows below)
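A minimal CUDA sketch of this computing-update step, assuming a reduction object small enough to keep one replica per thread in shared memory. K, THREADS, the per-element processing (here simply the element index modulo K), and the buffer layout are illustrative assumptions; the copy of read-only common variables to shared memory is shown in the k-means sketch later.

#define K 8          // size of the reduction object (assumed)
#define THREADS 128  // threads per block (assumed)

// data:        all n input elements
// blockUpdate: one combined reduction object per thread block (gridDim.x * K floats)
__global__ void computeUpdate(const float *data, int n, float *blockUpdate) {
    __shared__ float update[THREADS][K];              // one replica of the reduction object per thread
    for (int j = 0; j < K; j++) update[threadIdx.x][j] = 0.0f;
    __syncthreads();

    // each thread processes one data element per pass (grid-stride loop)
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < n;
         idx += gridDim.x * blockDim.x) {
        int i = idx % K;                              // placeholder for (i, val) = process(e)
        update[threadIdx.x][i] += data[idx];          // local update, no locking needed
    }
    __syncthreads();

    // global reduction: thread 0 combines all per-thread copies of this block
    if (threadIdx.x == 0) {
        for (int j = 0; j < K; j++) {
            float sum = 0.0f;
            for (int t = 0; t < THREADS; t++) sum += update[t][j];
            blockUpdate[blockIdx.x * K + j] = sum;
        }
    }
}

The kernel assumes a launch with THREADS threads per block, e.g. computeUpdate<<<numBlocks, THREADS>>>(d_data, n, d_blockUpdate); the host then combines the per-block copies into the final reduction object, which corresponds to the "writing update" step above.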
GPU Implementation • k-means • Data are points (say, 3-dimensional) • Start with k clusters • Find the nearest cluster for each point • Recompute the k centroids from the points assigned to each cluster • Repeat until the assignments of points don't change
GPU version of k-means
Device function:
    Shared_memory center
    nBlocks = blockSize / thread_number
    tid = thread_ID
    For i = 1 to nBlocks
        idx = (i - 1) * thread_number + tid   // element handled by this thread in pass i
        min = MAX_DISTANCE
        For j = 1 to k
            dis = distance(data[idx], center[j])
            if (dis < min)
                min = dis
                min_index = j
        add (data[idx], dis) to update[tid][min_index]
    Thread 0 combines all copies of update
(a CUDA rendering of this device code is sketched below)
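One possible CUDA rendering of this device code for 3-dimensional points: the cluster centers are staged in shared memory, each thread keeps its own update replica (coordinate sums plus a count per cluster), and thread 0 combines the replicas. K, DIM, THREADS, and the array layouts are assumptions for illustration.

#define K 8        // number of clusters (assumed)
#define DIM 3      // point dimensionality (assumed)
#define THREADS 64 // threads per block (assumed)

// points:      nPoints * DIM coordinates
// centers:     K * DIM current cluster centers
// blockUpdate: per-block output, K * (DIM + 1) floats (coordinate sums + count)
__global__ void kmeansAssign(const float *points, int nPoints,
                             const float *centers, float *blockUpdate) {
    __shared__ float sCenter[K * DIM];                 // shared copy of the centers
    __shared__ float update[THREADS][K][DIM + 1];      // per-thread sums and counts

    int tid = threadIdx.x;
    for (int i = tid; i < K * DIM; i += blockDim.x)    // copy centers to shared memory
        sCenter[i] = centers[i];
    for (int c = 0; c < K; c++)
        for (int d = 0; d <= DIM; d++) update[tid][c][d] = 0.0f;
    __syncthreads();

    for (int idx = blockIdx.x * blockDim.x + tid; idx < nPoints;
         idx += gridDim.x * blockDim.x) {
        // find the nearest center for this point
        float minDis = 3.4e38f;                        // effectively +infinity
        int minIndex = 0;
        for (int c = 0; c < K; c++) {
            float dis = 0.0f;
            for (int d = 0; d < DIM; d++) {
                float diff = points[idx * DIM + d] - sCenter[c * DIM + d];
                dis += diff * diff;
            }
            if (dis < minDis) { minDis = dis; minIndex = c; }
        }
        // accumulate the point into this thread's copy of the update
        for (int d = 0; d < DIM; d++)
            update[tid][minIndex][d] += points[idx * DIM + d];
        update[tid][minIndex][DIM] += 1.0f;            // count of assigned points
    }
    __syncthreads();

    // thread 0 combines all per-thread copies for this block
    if (tid == 0) {
        for (int c = 0; c < K; c++)
            for (int d = 0; d <= DIM; d++) {
                float sum = 0.0f;
                for (int t = 0; t < THREADS; t++) sum += update[t][c][d];
                blockUpdate[(blockIdx.x * K + c) * (DIM + 1) + d] = sum;
            }
    }
}

The host then sums the per-block updates, recomputes each centroid as coordinate sum divided by count, and relaunches the kernel until the assignments of points no longer change.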
Other applications • EM • E step and M step, with different amounts of computation • Apriori • Tree-structured reduction objects • Large number of updates • kNN
Experimental results • k-means and EM have the best performance when using 512 threads per block and 16 or 32 thread blocks • kNN and Apriori hardly get any speedup on the GPU
Features of applications that are suitable for GPU computing • the time spent on processing the data must dominate the I/O cost • the size of the reduction object needs to be small enough to have a replica for each thread in device memory • using the shared memory to store frequently accessed data
the time spent on processing the data must dominate the I/O cost
[Figure: comparison of I/O time and computing time]
the size of the reduction object needs to be small enough to have a replica for each thread in device memory • No locking mechanism on GPU • The accesses to the reduction objects are unpredictable
using the shared memory to store frequently accessed data • Accessing device memory is very time consuming • Shared memory serves as a high-speed cache • For non-read-only data elements in shared memory, we also need a replica for each thread (see the declaration sketch below)
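A minimal declaration sketch of this convention, restating the pattern used in the earlier sketches; K and THREADS are hypothetical sizes.

#define K 8          // reduction object size (assumed)
#define THREADS 128  // threads per block (assumed)

__global__ void reductionKernel() {
    __shared__ float sReadOnly[K];          // read-only data: one shared copy per block is enough
    __shared__ float sUpdate[THREADS][K];   // updated data: one replica per thread, so no locking is needed
    // ... kernel body as in the earlier sketches ...
}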
Related work • Freeride • Other GPU computing languages • The use of GPU computation in scientific computing
Future work • Middleware for data mining on GPU • Provide a compilation mechanism for data mining applications written in MATLAB • Enable tuning of parameters that can optimize GPU computing