Speeding up k-Means by GPUs
You Li
Supervisor: Dr. Chu Xiaowen
Co-supervisor: Prof. Liu Jiming
Thursday, March 11, 2010
Outline • Introduction • Efficiency of data mining -> GPGPU -> k-Means on GPU • Related work • Method • Research Plan
Efficiency of Data Mining • Data mining faces a growing efficiency challenge as data volumes keep increasing -> parallel data mining [Fig. 1, Fig. 2]
GPGPU • General-purpose, high-performance parallel hardware • Provides another platform for parallelizing data mining algorithms [Fig. 3: CPU vs. GPU architecture; the CPU devotes its die to control logic and cache, the GPU to many ALUs]
k-Means on GPU • Programming on GPU • CUDA: an integrated CPU+GPU programming model with C-based programs • k-Means • Widely used in statistical data analysis, pattern recognition, etc. • Easy to implement on CPU and well suited to the GPU
Outline • Introduction • Related work • UV_k-Means, GPUMiner and HP_k-Means; • Method • Research Plan
Related work • [Table: speed of k-Means on low-dimensional data, in seconds; hardware: NVIDIA GTX 280 GPU, Intel(R) Core(TM) i5 CPU]
Outline • Introduction • Related work • Method and Results • k-Means (three steps) -> Step 1 -> Step 2 -> Step 3 • Experiments • Research Plan
k-Means algorithm • n data points, k centroids, d dimensions • Step 1, O(nkd): compute the distance between each data point and each centroid • Step 2, O(nk): find the closest centroid for each point • Step 3, O(nd): compute the new centroids • If any centroid changed, repeat from Step 1; otherwise end • Each step is shaped by the GPU memory mechanism (next slide); a reference sketch follows below
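A minimal CPU reference sketch of one iteration of the three steps, written in plain C (the name kmeans_iteration and the row-major layout are assumptions for illustration, not the implementation from this talk):

#include <float.h>
#include <stdlib.h>
#include <string.h>

/* One k-means iteration: n points, k centroids, d dimensions. */
void kmeans_iteration(const float *data,  /* n x d, row-major  */
                      float *centroids,   /* k x d, row-major  */
                      int *labels,        /* n assignments     */
                      int n, int k, int d)
{
    /* Steps 1 and 2, O(nkd) and O(nk): distance to every centroid,
       keep the closest one. */
    for (int i = 0; i < n; ++i) {
        float best = FLT_MAX;
        for (int c = 0; c < k; ++c) {
            float dist = 0.0f;
            for (int j = 0; j < d; ++j) {
                float diff = data[i * d + j] - centroids[c * d + j];
                dist += diff * diff;
            }
            if (dist < best) { best = dist; labels[i] = c; }
        }
    }
    /* Step 3, O(nd): recompute each centroid as the mean of its points. */
    int *count = calloc(k, sizeof(int));
    memset(centroids, 0, (size_t)k * d * sizeof(float));
    for (int i = 0; i < n; ++i) {
        count[labels[i]]++;
        for (int j = 0; j < d; ++j)
            centroids[labels[i] * d + j] += data[i * d + j];
    }
    for (int c = 0; c < k; ++c)
        if (count[c] > 0)
            for (int j = 0; j < d; ++j)
                centroids[c * d + j] /= (float)count[c];
    free(count);
}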
Memory Mechanism of GPU • Global memory • Large size • Long latency • Registers • Small size • Short latency • Not user-controlled (allocated by the compiler) • Shared memory • Medium size • Short latency • User-controlled
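A small CUDA sketch (illustrative only, not from the talk) showing where the three memory spaces appear inside a kernel:

// Blocks of up to 256 threads double each input element.
__global__ void memory_spaces(const float *in, float *out, int n)
{
    __shared__ float tile[256];        // shared memory: medium size,
                                       // short latency, user-controlled
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v;                           // local scalars such as i and v
                                       // live in registers: fastest, but
                                       // allocated by the compiler
    if (i < n) {
        tile[threadIdx.x] = in[i];     // global memory read: large
        v = tile[threadIdx.x] * 2.0f;  //   capacity, long latency
        out[i] = v;                    // global memory write
    }
}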
k-Means on GPU • Key idea • Increase the number of computing operations per global memory access • Adopt the tiling method from matrix multiplication and the parallel reduction pattern • Dimension is a key parameter • For low dimension: use registers • For high dimension: use shared memory
k-Means on GPU • For low dimension • Read each data element from global memory only once, keeping the whole point in registers (see the sketch below)
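One way to realize this is to fix the dimension at compile time and keep the point in registers while scanning the centroids; a hedged sketch (D, assign_low_dim, and the layout are assumptions, not the talk's exact kernel):

#include <float.h>
#define D 4   /* low dimension, fixed at compile time */

// Each thread handles one point: it loads the point's D coordinates
// into registers once, then computes the distance to every centroid,
// so every data element is read from global memory exactly once.
__global__ void assign_low_dim(const float *data,       /* n x D */
                               const float *centroids,  /* k x D */
                               int *labels, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float p[D];                       /* the point stays in registers */
    for (int j = 0; j < D; ++j)
        p[j] = data[i * D + j];

    float best = FLT_MAX;
    int best_c = 0;
    for (int c = 0; c < k; ++c) {
        float dist = 0.0f;            /* Step 1: squared distance */
        for (int j = 0; j < D; ++j) {
            float diff = p[j] - centroids[c * D + j];
            dist += diff * diff;
        }
        if (dist < best) { best = dist; best_c = c; }
    }
    labels[i] = best_c;               /* Step 2: closest centroid */
}

In practice the centroid array could also be staged in constant or shared memory; the point of the sketch is only that each data point is fetched from global memory once.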
k-Means on GPU • For high dimension • Read each data element from global memory only once, staging tiles of points and centroids in shared memory (see the sketch below)
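For high dimensions the point no longer fits in registers, so the distance computation can be tiled through shared memory in the style of matrix multiplication; a minimal sketch under the same assumptions (TILE, distances_high_dim, and the n x k distance output are illustrative):

#define TILE 16

// Each block computes a TILE x TILE tile of the n x k distance matrix,
// streaming the d dimensions through shared memory as in tiled matrix
// multiplication: each element is loaded from global memory once per
// block rather than once per thread.
__global__ void distances_high_dim(const float *data,      /* n x d */
                                   const float *centroids, /* k x d */
                                   float *dist,            /* n x k */
                                   int n, int k, int d)
{
    __shared__ float ds[TILE][TILE];   // tile of data points
    __shared__ float cs[TILE][TILE];   // tile of centroids

    int row = blockIdx.y * TILE + threadIdx.y;   // point index
    int col = blockIdx.x * TILE + threadIdx.x;   // centroid index
    float acc = 0.0f;

    for (int t = 0; t < d; t += TILE) {
        ds[threadIdx.y][threadIdx.x] = (row < n && t + threadIdx.x < d)
            ? data[row * d + t + threadIdx.x] : 0.0f;
        cs[threadIdx.y][threadIdx.x] = (col < k && t + threadIdx.y < d)
            ? centroids[col * d + t + threadIdx.y] : 0.0f;
        __syncthreads();

        for (int j = 0; j < TILE; ++j) {
            float diff = ds[threadIdx.y][j] - cs[j][threadIdx.x];
            acc += diff * diff;
        }
        __syncthreads();
    }
    if (row < n && col < k)
        dist[row * k + col] = acc;     // Step 2 then reduces each row
}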
Experiments • The experiments were conducted on a PC with an NVIDIA GTX 280 GPU and an Intel(R) Core(TM) i5 CPU. • The GTX 280 has 30 SIMD multiprocessors, each containing eight processors and running at 1.29 GHz. The GPU has 1 GB of memory with a peak bandwidth of 141.7 GB/s. • The CPU has four cores running at 2.67 GHz. The main memory is 8 GB with a peak bandwidth of 5.6 GB/s. We use Visual Studio 2008 to write and compile all the source code, with CUDA version 2.3. • We measure the running time after file I/O, to show the speedup effect more clearly.
Experiments • On low-dimensional data • Compared with HP_k-Means, UV_k-Means, and GPUMiner on randomly generated data • Four to ten times faster than HP_k-Means
Experiments • On high-dimensional data • Compared with UV_k-Means and GPUMiner on the KDD Cup 1999 data • Four to eight times faster than UV_k-Means
Experiments • Comparison with the CPU • Forty to two hundred times faster than the CPU version • The results illustrate that our algorithm compares very favorably with the existing algorithms
Outline • Introduction • Related work • Method • Research Plan
Research Plan • Detailed analysis of k-Means on GPU • GFLOPS • Handle even larger data sets • Other data mining algorithms on GPU • k-NN • SDP (widely used in protein identification)
Q & A • Thanks very much