Ubiquitous Parallelism Are You Equipped To Code For Multi- and Many- Core Platforms?
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
Talk Goal • Encourage undergraduates to answer the call of the era of parallelism • Education • Software Engineering
Why Parallelism? Why now? • You’ve already been exposed to parallelism • Bit Level Parallelism • Instruction Level Parallelism • Thread Level Parallelism
Why Parallelism? Why now? • Single-threaded performance has plateaued • Silicon Trends • Power Consumption • Heat Dissipation
Why Parallelism? Why now? • Issue: Power & Heat • Good: Cheaper to have more cores, but each is slower • Bad: Breaks the hardware/software contract
Why Parallelism? Why now? • Hardware/Software Contract • Maintain backward compatibility with existing code
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
Personal Mobile Device Space • iPhone 5: 2 CPU cores / 3 GPU cores • Galaxy S3: 4 CPU cores / 4 GPU cores
Desktop Space • AMD Opteron 6272: 16 CPU cores • Rare To Have a "Single Core" CPU • Clock Speeds < 3.0 GHz • Power Wall • Heat Dissipation
Desktop Space • AMD Radeon 7970: 2048 GPU cores • General Purpose • Power Efficient • High Performance • Not All Problems Can Be Done on the GPU
Warehouse Space (HokieSpeed) • Each node: • 2x Intel Xeon 5645 (6 cores each) • 2x NVIDIA C2050 (448 GPU cores each) • 209 nodes • 2,508 CPU cores • 187,264 GPU cores
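Those totals are just the per-node counts scaled up (quick arithmetic, not shown on the slide):
209 nodes × 2 CPUs × 6 cores = 2,508 CPU cores
209 nodes × 2 GPUs × 448 cores = 187,264 GPU cores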
Convergence in Computing • Three Classes: • Warehouse • Desktop • Personal Mobile Device • Main Criteria • Power, Performance, Programmability
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
What is a CPU? • Analogy: the CPU is an SR-71 jet • Capacity: 2 passengers • Top Speed: 2200 mph
What is a GPU? • Analogy: the GPU is a Boeing 747 • Capacity: 605 passengers • Top Speed: 570 mph
CPU Architecture • Latency Oriented (Speculation)
APU = CPU + GPU • Accelerated Processing Unit • Both CPU + GPU on the same die
CPUs, GPUs, APUs • How to handle parallelism? • How to extract performance? • Can I just throw processors at a problem?
CPUs, GPUs, APUs • CPUs: Multi-threading (2-16 threads) • GPUs: Massive multi-threading (100,000+ threads) • Depends on Your Problem
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
How Can I Start? • CUDA Programming • You most likely have a CUDA-enabled GPU if you have a recent NVIDIA card
How Can I Start? • CPU or GPU Programming • Use OpenCL (your laptop can most likely run it)
How Can I Start? • Undergraduate research • Senior/Grad Courses: • CS 4234 – Parallel Computation • CS 5510 – Multiprocessor Programming • ECE 4504/5504 – Computer Architecture • CS 5984 – Advanced Computer Graphics
In Summary … • Parallelism is here to stay • How does this affect you? • How fast is fast enough? • Are we content with current computer performance?
Thank you! • Carlo del Mundo • Senior, Computer Engineering • Website: http://filebox.vt.edu/users/cdel/ • E-mail: cdel@vt.edu
Programming Models • pthreads • MPI • CUDA • OpenCL
pthreads • A POSIX/UNIX C API to create and destroy threads
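A minimal create/join sketch (my illustration, not from the slides), assuming a POSIX system; the worker function and the thread count of 4 are arbitrary:

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4                 /* arbitrary count, for illustration */

/* Each created thread runs this function; the argument is its index. */
static void *worker(void *arg) {
    long id = (long)arg;
    printf("hello from thread %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* Create the threads... */
    for (long i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, worker, (void *)i);

    /* ...and wait for (join) them, which releases their resources. */
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);

    return 0;
}

Compile with cc -pthread.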
MPI (Message Passing Interface) • A communications standard • "Send and Receive" messages between nodes
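A minimal send-and-receive sketch (my illustration, not from the slides), assuming an MPI installation; run it with at least two ranks, e.g. mpirun -np 2 ./a.out:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

    if (rank == 0) {
        value = 42;                         /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}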
CUDA • Massive multi-threading (100,000+) • Thread-level parallelism
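As a concrete sketch (my illustration, not from the slides): the classic vector add, where every element gets its own thread, so a single launch spawns roughly a million threads; the array size, the block size of 256, and the names are arbitrary, and error checking is omitted:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

/* One thread per element: c[i] = a[i] + b[i]. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread ID */
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  /* ~1 million elements => ~1 million threads */
    size_t bytes = n * sizeof(float);

    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    /* Launch enough 256-thread blocks to cover all n elements. */
    int blocks = (n + 255) / 256;
    vecAdd<<<blocks, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);           /* expect 3.0 */
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}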
OpenCL • A heterogeneous programming model that targets many kinds of devices (CPUs, GPUs, APUs)
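A compact sketch in C (my illustration, not from the slides): the kernel source is compiled at run time for whichever device is found, so the same program can target a GPU, a CPU, or an APU; the SAXPY kernel, the sizes, and the GPU-then-default-device fallback are my choices, and error checking is omitted:

#include <stdio.h>
#include <CL/cl.h>

/* The kernel is plain text, compiled at run time for the chosen device. */
static const char *src =
    "__kernel void saxpy(float a, __global const float *x, __global float *y) {\n"
    "    int i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float x[N], y[N], a = 3.0f;
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    /* Ask for a GPU first; fall back to the default device (often the CPU). */
    if (clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS)
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

    cl_mem dx = clCreateBuffer(ctx, CL_MEM_READ_ONLY  | CL_MEM_COPY_HOST_PTR, sizeof x, x, NULL);
    cl_mem dy = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof y, y, NULL);

    clSetKernelArg(k, 0, sizeof a,  &a);
    clSetKernelArg(k, 1, sizeof dx, &dx);
    clSetKernelArg(k, 2, sizeof dy, &dy);

    size_t global = N;                      /* one work-item per element */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dy, CL_TRUE, 0, sizeof y, y, 0, NULL, NULL);

    printf("y[0] = %f\n", y[0]);            /* expect 3*1 + 2 = 5.0 */
    return 0;
}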
Comparisons • Comparison of pthreads, MPI, CUDA, and OpenCL • † Productivity is subjective and draws from my experiences
Parallel Applications • Vector Add • Matrix Multiplication
Vector Add • Serial: loop N times → N cycles† • Parallel: assume you have N cores → 1 cycle† • † Assume 1 add = 1 cycle
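Spelled out as code (a sketch under the slide's assumption that one add takes one cycle): the serial version is a single loop over all N elements, one add after another; the parallel version gives each of the N cores (or threads) exactly one element, as in the vecAdd CUDA kernel sketched earlier, so all N adds can happen at once:

/* Serial vector add: N iterations executed one after another (~N adds in sequence). */
void vec_add_serial(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Parallel idea: remove the loop and assign one i per core/thread
   (e.g. the vecAdd CUDA kernel above), so the N adds proceed concurrently. */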