
By Ravi, Ma, Chiu, & Agrawal Presented by Julian Bui

This study explores compiler and runtime support for heterogeneous parallel computations, with a focus on K-Means Clustering and Principal Components Analysis, aiming to improve programmability, performance, and work distribution. The key ideas, results, and the system architecture for the work distribution methods are discussed. Experiments on a machine with an eight-core AMD Opteron 8350 and a GeForce 9800 GTX analyze the impact of GPU chunk size and thread count on performance.





Presentation Transcript


  1. “Compiler and Runtime Support for Enabling Generalized Reduction Computations on Heterogeneous Parallel Configurations” By Ravi, Ma, Chiu, & Agrawal Presented by Julian Bui

  2. Outline • Goals • The focus problems • Key ideas • Results

  3. Goals • Programmability • Performance • Effective work distribution

  4. The Focus Problems • K-Means Clustering • Principal Components Analysis

  5. K-means clustering - 1

  6. K-means clustering - 2

  7. K-means clustering - 3

  8. K-means clustering - 4

  9. K-Means Clustering 1. Randomly guess k cluster center locations 2. Each data point determines which cluster center it is closest to 3. Each center now "owns" a set of points 4. Each cluster computes the actual centroid of the points it owns 5. That centroid becomes the cluster's new center 6. Repeat steps 2-5 until the cluster assignments no longer change between iterations
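
As a rough illustration of the steps above (a minimal sketch, not the paper's generated GPU code), a K-means loop in NumPy might look like this; the function name, data shapes, and iteration cap are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means: assign points to the nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly guess k cluster centers by picking k data points
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    assignments = np.full(len(points), -1)
    for _ in range(max_iters):
        # Step 2: each data point finds the center it is closest to
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Step 6: stop when the assignments no longer change between iterations
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Steps 3-5: each center "owns" a set of points; recompute each centroid
        for j in range(k):
            owned = points[assignments == j]
            if len(owned) > 0:
                centers[j] = owned.mean(axis=0)
    return centers, assignments
```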

  10. K-Means Clustering • Objective: the sum, over all clusters, of the squared distances from each point to the centroid of its cluster
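
For reference, the objective that the slide's formula annotates ("centroid of the cluster", "for all clusters") is the standard K-means sum of squared distances:

```latex
% K-means objective: total squared distance from each point x to the centroid
% \mu_j of the cluster C_j that owns it, summed over all clusters
\[
  \min_{\mu_1,\dots,\mu_k} \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2 ,
  \qquad \mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
\]
```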

  11. Principal Component Analysis (PCA) • Goals • Dimensionality reduction • INPUT: set of M-dimensional points • OUTPUT: set of D-dimensional points where D << M • Extract patterns in the data (e.g., for machine learning) • Transforms possibly correlated variables into a smaller number of uncorrelated variables • The principal components account for the greatest possible statistical variability

  12. PCA • Principal components are found by extracting eigenvectors from the covariance matrix of the data
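
A minimal NumPy sketch of that procedure (center the data, form the covariance matrix, take the top-D eigenvectors, project); the function name and matrix sizes are illustrative, not those from the paper's experiments.

```python
import numpy as np

def pca(data, d):
    """Project M-dimensional rows of `data` onto its top-d principal components."""
    # Center the data so the covariance matrix captures variance about the mean
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features (M x M)
    cov = np.cov(centered, rowvar=False)
    # Principal components are eigenvectors of the covariance matrix;
    # eigh handles the symmetric case and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]
    # Reduced representation: N x d, with d << M
    return centered @ top
```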

  13. PCA - Simple Example

  14. PCA – Facial Recognition

  15. So how does this work apply to those problems? • Code generator input: • Reduction function • Variable list (the data the reduction operates on) • Code generator output: • Host functions • Kernel code
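
To make the input/output split concrete, here is a hypothetical Python mock-up of the generalized-reduction pattern the code generator targets; this is not the paper's actual API or generated code, only a sketch of how a user-supplied reduction function and variable list could be driven by runtime-managed chunking and combination.

```python
import numpy as np

def run_generalized_reduction(reduction_fn, combine_fn, variables, data, num_chunks=4):
    """Hypothetical stand-in for the generated host code: split the input,
    apply the user's reduction function to each chunk, then merge the results."""
    partials = []
    for chunk in np.array_split(data, num_chunks):
        # The generated kernel code would process chunks on CPU cores or the GPU
        partials.append(reduction_fn(chunk, variables))
    result = partials[0]
    for p in partials[1:]:
        # Global combination of the per-chunk reduction objects
        result = combine_fn(result, p)
    return result

# Example user-supplied pieces for a trivial sum-style reduction
variables = {"scale": 2.0}
reduce_chunk = lambda chunk, v: (chunk * v["scale"]).sum()
combine = lambda a, b: a + b
print(run_generalized_reduction(reduce_chunk, combine, variables, np.arange(10.0)))
```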

  16. System Architecture

  17. Work Distribution • Work Sharing vs. Work Stealing • Uniform vs. Non-uniform chunk sizes
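
A small sketch of the work-sharing idea with non-uniform chunk sizes, assuming a shared queue from which CPU and GPU workers pull; the chunk-size schedule, worker names, and threading setup are illustrative assumptions, not the paper's scheduler.

```python
import queue
import threading

def make_chunks(total_items, small=1_000, large=10_000):
    """Non-uniform chunking: hand out large chunks first and keep small
    chunks for the tail so no worker sits idle at the very end."""
    chunks, start = [], 0
    while start < total_items:
        size = large if total_items - start > 10 * small else small
        chunks.append((start, min(start + size, total_items)))
        start += size
    return chunks

def worker(name, work_queue, results):
    # Work sharing: every worker pulls its next chunk from the same shared queue
    while True:
        try:
            lo, hi = work_queue.get_nowait()
        except queue.Empty:
            return
        results.append((name, hi - lo))  # stand-in for processing items lo..hi

work_queue = queue.Queue()
for c in make_chunks(100_000):
    work_queue.put(c)
results = []
threads = [threading.Thread(target=worker, args=(n, work_queue, results))
           for n in ("cpu-0", "cpu-1", "gpu")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The intuition behind the non-uniform schedule is that large chunks amortize per-chunk overhead (important for the GPU), while small chunks near the end of the input reduce the idle time examined in the later slides.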

  18. Experiments Machine: AMD Opteron 8350 with 8 cores and 16 GB of main memory; GeForce 9800 GTX with 512 MB of memory. K-Means: K = 125, 6.4 GB file, 100M 3-D points. PCA: 8.5 GB data set, covariance matrix width of 64.

  19. GPU Chunk Size & Num. Threads vs. Performance

  20. K-Means, Heterogeneous, Uniform vs. Non-uniform Chunk Sizes

  21. PCA, Heterogeneous, Uniform vs. Non-uniform Chunk Sizes

  22. Idle Time

  23. Work Distribution – K-Means

  24. Work Distribution – PCA
