Maximizing Compute Efficiency through Parallel and Distributed Computing

Learn how parallel and distributed computing accelerates problem-solving, covering key concepts, historical milestones, modern parallel computers, everyday applications, and programming strategies. Explore data dependence graphs, data parallelism, functional parallelism, and pipelining as ways to find concurrency in computational tasks. See why shared-memory programming offers an easier conceptual environment for concurrent processing and how it supports high-performance parallel programming. Find out how exploiting multi-core and GPU chips can lead to faster problem-solving, reduced design time, improved precision, and a competitive edge.

Presentation Transcript


  1. Why Parallel/Distributed Computing • Sushil K. Prasad • sprasad@gsu.edu

  2. What is Parallel and Distributed Computing? • Solving a single problem faster using multiple CPUs • E.g. Matrix Multiplication C = A × B • Parallel = Shared memory among all CPUs • Distributed = Local memory per CPU • Common Issues: Partitioning, Synchronization, Dependencies, Load Balancing
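The matrix multiplication example on this slide can be sketched as a row-partitioned computation. This is a minimal illustration, not the lecturer's code: the function names `multiply_rows` and `parallel_matmul`, the cyclic row partition, and the choice of Python threads are all assumptions. In standard CPython the global interpreter lock limits the real speedup of CPU-bound threads, so the sketch shows the structure (partition, compute, combine) rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(A, B, row_indices):
    """Compute the rows of C = A x B whose indices are in row_indices."""
    inner = len(B)
    cols = len(B[0])
    return [
        (i, [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)])
        for i in row_indices
    ]


def parallel_matmul(A, B, workers=2):
    """Partition the rows of A among workers; each computes its share of C."""
    n = len(A)
    # Partition: worker w takes rows w, w + workers, w + 2 * workers, ...
    partitions = [range(w, n, workers) for w in range(workers)]
    C = [None] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Rows of C are independent of each other, so no locking is needed;
        # the only synchronization is waiting for every worker to finish.
        for block in pool.map(lambda rows: multiply_rows(A, B, rows), partitions):
            for i, row in block:
                C[i] = row
    return C


A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
C = parallel_matmul(A, B)
print(C)  # [[25, 28], [57, 64], [89, 100]]
```

The cyclic partition spreads rows evenly across workers, which is one simple answer to the load-balancing issue listed on the slide when every row costs the same.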

  3. ENIAC (350 op/s), 1946 (U.S. Army photo)

  4. ASCI White (10 teraops/sec 2006) • Mega flops = 10^6 flops ≈ 2^20 • Giga = 10^9 = billion ≈ 2^30 • Tera = 10^12 = trillion ≈ 2^40 • Peta = 10^15 = quadrillion ≈ 2^50 • Exa = 10^18 = quintillion ≈ 2^60

  5. Today - 2011 • K computer: 8 petaflops = 8 × 10^15 flops • ENIAC: 350 flops, 1946 • 65 Years of Speed Increases • More Than a Trillion Times Faster!
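A quick check of the ratio between the two figures quoted on the slides (350 operations per second for ENIAC, 8 petaflops for the K computer). The variable names are illustrative, and the annual growth factor assumes smooth exponential growth over the 65 years, which is a simplification.

```python
eniac_ops_per_sec = 350    # ENIAC, 1946 (figure from the slides)
k_computer_flops = 8e15    # K computer, 2011: 8 petaflops (figure from the slides)
years = 2011 - 1946        # 65 years

ratio = k_computer_flops / eniac_ops_per_sec
# Assumption: constant exponential growth, so the yearly factor is the 65th root.
annual_factor = ratio ** (1 / years)

print(f"speedup over {years} years: {ratio:.2e}")  # speedup over 65 years: 2.29e+13
print(f"average growth factor per year: {annual_factor:.2f}")
```

By these figures the gap is on the order of tens of trillions.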

  6. Why Parallel and Distributed Computing? • Grand Challenge Problems • Weather Forecasting; Global Warming • Materials Design – Superconducting material at room temperature; nano-devices; spaceships. • Organ Modeling; Drug Discovery

  7. Why Parallel and Distributed Computing? • Physical Limitations of Circuits • Heat and speed-of-light effects • Superconducting material to counter heat effect • Speed of light effect – no solution!

  8. Microprocessor Revolution • [Figure: speed (log scale) versus time for supercomputers, mainframes, minis, and micros] • Moore's Law

  9. Why Parallel and Distributed Computing? • VLSI – Effect of Integration • 1 M transistors enough for full functionality - DEC’s Alpha (90’s) • Rest must go into multiple CPUs/chip • Cost – Multitudes of average CPUs give better FLOPS/$ compared to traditional supercomputers

  10. Modern Parallel Computers • Caltech’s Cosmic Cube (Seitz and Fox) • Commercial copy-cats • nCUBE Corporation (512 CPUs) • Intel’s Supercomputer Systems • iPSC1, iPSC2, Intel Paragon (512 CPUs) • Thinking Machines Corporation • CM2 (65K 1-bit CPUs) – 12-dimensional hypercube - SIMD • CM5 – fat-tree interconnect - MIMD • Tianhe-1A: 4.7 petaflops, 14K Xeon X5670 and 7,168 Nvidia Tesla M2050 • K computer: 8 petaflops (8 × 10^15 FLOPS), 2011, 68K 2.0 GHz 8-core CPUs, 548,352 cores

  11. Why Parallel and Distributed Computing? • Everyday Reasons • Available local networked workstations and Grid resources should be utilized • Solve compute-intensive problems faster • Make infeasible problems feasible • Reduce design time • Leverage large combined memory • Solve larger problems in same amount of time • Improve answer’s precision • Gain competitive advantage • Exploit commodity multi-core and GPU chips • Find Jobs!

  12. Why Shared Memory Programming? • Easier conceptual environment • Programmers are typically familiar with concurrent threads and processes sharing an address space • CPUs within multi-core chips share memory • OpenMP is an application programming interface (API) for shared-memory systems • Supports high-performance parallel programming of symmetric multiprocessors • Java threads • MPI for Distributed Memory Programming
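The shared-address-space model on this slide can be illustrated with a small sketch in which several threads update one shared variable. The slides name OpenMP and Java threads; Python's `threading` module is used here only as a stand-in, and `parallel_sum` is a hypothetical helper, not something from the presentation.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum a list using threads that all update one shared total."""
    total = 0                    # shared variable, visible to every thread
    lock = threading.Lock()      # protects updates to the shared total

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)     # private work, no synchronization needed
        with lock:               # critical section: one thread at a time
            total += partial

    # Split the input into roughly equal contiguous chunks, one per thread.
    chunk_size = max(1, (len(values) + num_threads - 1) // num_threads)
    threads = [
        threading.Thread(target=worker, args=(values[i:i + chunk_size],))
        for i in range(0, len(values), chunk_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait for every thread before reading total
    return total


print(parallel_sum(list(range(1, 101))))  # 5050
```

Each thread does most of its work on private data and touches the shared total only once, inside the lock. Keeping the critical section that small is the usual way to limit synchronization cost in shared-memory programs.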

  13. Seeking Concurrency • Data dependence graphs • Data parallelism • Functional parallelism • Pipelining

  14. Data Dependence Graph • Directed graph • Vertices = tasks • Edges = dependencies
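A data dependence graph can be represented as a mapping from each task to the tasks it depends on. The sketch below groups tasks into levels, where every task in a level has all of its dependencies satisfied and can run concurrently with the others in that level. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later); the task names and the helper `concurrency_levels` are made up for illustration.

```python
from graphlib import TopologicalSorter


def concurrency_levels(dependencies):
    """Group tasks into levels; tasks within one level are mutually independent.

    dependencies maps each task (vertex) to the tasks it depends on (edges).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()             # raises graphlib.CycleError on a cyclic graph
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose inputs are done
        levels.append(ready)
        sorter.done(*ready)                 # mark the whole level as finished
    return levels


# Hypothetical tasks: two independent steps sit between a load and a report.
deps = {
    "load": [],
    "filter": ["load"],
    "stats": ["load"],
    "report": ["filter", "stats"],
}
print(concurrency_levels(deps))
# [['load'], ['filter', 'stats'], ['report']]
```

The widest level gives the most tasks that could run at once, and the number of levels gives the length of the longest dependency chain.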

  15. Data Parallelism • Independent tasks apply same operation to different elements of a data set • Okay to perform operations concurrently • Speedup: potentially p-fold, where p = number of processors • for i ← 0 to 99 do a[i] ← b[i] + c[i] endfor
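The loop on this slide can be sketched as a data-parallel computation by splitting the index range 0 to 99 into p contiguous chunks and applying the same addition to each chunk. The helper names `add_chunk` and `parallel_add` and the use of a thread pool are assumptions for illustration; in standard CPython the global interpreter lock means this shows the decomposition rather than a true p-fold speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(b, c, start, stop):
    """Same operation applied to one slice of the data: b[i] + c[i]."""
    return [b[i] + c[i] for i in range(start, stop)]


def parallel_add(b, c, p=4):
    """Compute a[i] = b[i] + c[i] with the index range split among p workers."""
    n = len(b)
    chunk = max(1, (n + p - 1) // p)          # ceil(n / p) elements per worker
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    a = []
    with ThreadPoolExecutor(max_workers=p) as pool:
        # map preserves input order, so chunks are concatenated in index order.
        for part in pool.map(lambda se: add_chunk(b, c, *se), bounds):
            a.extend(part)
    return a


b = list(range(100))
c = [2 * i for i in range(100)]
a = parallel_add(b, c)
print(a[:5], a[99])  # [0, 3, 6, 9, 12] 297
```

No iteration reads a value written by another iteration, which is why the chunks can be processed in any order or all at once.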

  16. Functional Parallelism • Independent tasks apply different operations to different data elements • First and second statements • Third and fourth statements • Speedup: Limited by number of concurrent sub-tasks • a ← 2; b ← 3; m ← (a + b) / 2; s ← (a^2 + b^2) / 2; v ← s - m^2
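In the five statements on this slide, the third and fourth (computing m and s) depend only on a and b, so they can run concurrently; the last statement must wait for both. A minimal sketch, with function names chosen for illustration and a two-worker thread pool standing in for two processors:

```python
from concurrent.futures import ThreadPoolExecutor


def mean(a, b):
    """m = (a + b) / 2"""
    return (a + b) / 2


def mean_square(a, b):
    """s = (a^2 + b^2) / 2"""
    return (a ** 2 + b ** 2) / 2


def variance_of_two(a, b):
    """Compute v = s - m^2, evaluating m and s as two concurrent tasks."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        m_future = pool.submit(mean, a, b)          # different operation ...
        s_future = pool.submit(mean_square, a, b)   # ... on the same inputs
        m = m_future.result()    # v depends on both, so wait for each result
        s = s_future.result()
    return s - m ** 2


print(variance_of_two(2, 3))  # 0.25
```

Only two tasks are ever independent here, so two workers is the most this computation can use, which matches the slide's point that the speedup is bounded by the number of concurrent sub-tasks.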

  17. Pipelining • Divide a process into stages • Produce several items simultaneously • Speedup: Limited by number of concurrent sub-tasks = number of stages in the pipeline
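The bound on pipeline speedup can be checked with a small timing model. The model assumes every stage takes the same time and that passing items between stages costs nothing; neither assumption comes from the slides, and the function names are illustrative. Under this model, n items through k stages take n × k time units sequentially and k + n - 1 units pipelined.

```python
def sequential_time(num_items, num_stages, stage_time=1.0):
    """Each item passes through every stage before the next item starts."""
    return num_items * num_stages * stage_time


def pipeline_time(num_items, num_stages, stage_time=1.0):
    """The first item needs num_stages steps; each later item finishes one step after."""
    return (num_stages + num_items - 1) * stage_time


def pipeline_speedup(num_items, num_stages):
    """Speedup of the pipelined schedule over the sequential one."""
    return sequential_time(num_items, num_stages) / pipeline_time(num_items, num_stages)


for n in (1, 4, 100, 10_000):
    print(n, round(pipeline_speedup(n, num_stages=4), 3))
```

With a single item there is no gain (speedup 1.0). As the number of items grows, the speedup approaches the number of stages but never exceeds it.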
