
Intro to Parallel and Distributed Processing



Presentation Transcript


  1. Intro to Parallel and Distributed Processing Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License) Slides from: Jimmy Lin The iSchool University of Maryland

  2. Web-Scale Problems? • How to approach problems in: • Biocomputing • Nanocomputing • Quantum computing • … • It all boils down to… • Divide-and-conquer • Throwing more hardware at the problem • Clusters, Grids, Clouds Simple to understand… a lifetime to master…

  3. Cluster computing • Improve performance and availability over single computers • Compute nodes located locally, connected by fast LAN • From a few nodes to a large number • Group of linked computers working closely together, in parallel • Typically machines were homogeneous • Sometimes viewed as forming a single computer

  4. Cluster computing Can be: • Tightly coupled • Share resources (memory, I/O) • Loosely coupled • Connected on network but don’t share resources

  5. Memory Typology: Shared [diagram: several processors connected to one shared memory]

  6. Memory Typology: Distributed [diagram: each processor has its own memory; the processors are connected by a network]

  7. Memory Typology: Hybrid [diagram: groups of processors each share a local memory; the groups are connected by a network]

  8. Clusters • Nodes have computational devices and local storage; networked disk storage allows for sharing • Static system, centralized control • Multiple clusters at a datacenter • A Google cluster has 1000s of servers • Racks of 40-80 servers

  9. Grids • Grids: • Large number of users • Large volume of data • Large computational task involved • Connecting resources through a network

  10. Grid • Term Grid borrowed from the electrical grid • Users obtain computing power through the Internet by using the Grid, just like drawing electrical power from any wall socket

  11. Grid Computing • Heterogeneous, geographically dispersed • Loosely coupled • Nodes do not share resources – connected by a network (WAN) • Thousands of users worldwide • Can be formed from computing resources belonging to multiple organizations • Although it can be dedicated to a single application, typically used for a variety of purposes

  12. Grid • By connecting to a Grid, can get: • needed computing power • storage spaces and data • Specialized equipment • Middleware needed to use grid • Each user - a single login account to access all resources • Can connect a data center (cluster) to a grid • Compute grids, data grids

  13. Grids • Resources - owned by diverse organizations – Virtual Organization • Sharing of data and resources in dynamic, multi-institutional virtual organizations • Academia • Access to specialized resources not available locally

  14. An Illustrative Example • NASA research scientist: • collected microbiological samples in the tidewaters around Wallops Island, Virginia. • Needs: • high-performance microscope at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego. • Samples were sent to San Diego, and she used NPACI’s Telescience Grid and NASA’s Information Power Grid (IPG) to view and control the output of the microscope from her desk on Wallops Island. • Viewed the samples and moved the platform holding them, making adjustments to the microscope.

  15. Example (continued) • The microscope produced a huge dataset of images • This dataset was stored using a storage resource broker on NASA’s IPG • Scientist was able to run algorithms on this very dataset while watching the results in real time

  16. EU Grid (European Grid Infrastructure) • SE – storage element, CE – computing element

  17. Grids • Domains as diverse as: • Global climate change • High energy physics • Computational genomics • Biomedical applications • Volunteer Grid Computing • SETI@home • World Community Grid • Folding@home • Einstein@Home

  18. Grids • Grid is: • heterogeneous, geographically distributed, independent sites • Gridware manages the resources for Grids • Grid differs from: • Parallel computing – grid is more than a single task on multiple machines • Distributed system – grid is more than distributing the load of a program across two or more processes • Cluster computing – grid is more than homogeneous sites connected by a LAN (a grid can be multiple clusters)

  19. Grid • Precursor to Cloud computing but no virtualization • Some say there wouldn’t be cloud computing if there hadn’t been grid computing “Virtualization distinguishes cloud from grid” Zissis

  20. Parallel/Distributed Background

  21. Flynn’s Taxonomy [2×2 classification: the instruction stream is Single (SI) or Multiple (MI) and the data stream is Single (SD) or Multiple (MD), giving SISD, SIMD, MISD, and MIMD]

  22. SISD [diagram: one processor executes a single instruction stream on a single stream of data items]

  23. SIMD [diagram: a single instruction stream is applied to many data streams D0 … Dn in lockstep]
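
As a concrete illustration (added here, not part of the original slides), a minimal Python/NumPy sketch of the SIMD idea: a single operation is applied across a whole array of data elements at once. NumPy and the sample values are assumptions made for the example.

    # SIMD-style data parallelism: one "instruction" applied to many data elements.
    import numpy as np

    data = np.arange(8)        # data elements D0 .. D7
    result = data * 2 + 1      # the same operation touches every element at once

    print(result)              # [ 1  3  5  7  9 11 13 15]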

  24. MISD [diagram: multiple instruction streams operate on a single data stream]

  25. MIMD [diagram: multiple processors, each running its own instruction stream on its own data; multi-threaded]
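
A minimal Python sketch (an added illustration, not from the slides) of the MIMD idea: two threads run different instruction streams on different data. Note that CPython threads interleave CPU-bound work rather than truly running it in parallel, so this only illustrates the model; the worker functions are hypothetical.

    # MIMD: different instruction streams on different data, at the same time.
    import threading

    def summer(nums, out):
        out["sum"] = sum(nums)            # one instruction stream

    def joiner(words, out):
        out["text"] = " ".join(words)     # a different instruction stream

    results = {}
    t1 = threading.Thread(target=summer, args=([1, 2, 3], results))
    t2 = threading.Thread(target=joiner, args=(["hello", "world"], results))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(results)                        # e.g. {'sum': 6, 'text': 'hello world'}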

  26. Stopped here for Exam 1

  27. History: CPUs • Single CPU – inserted into a single CPU socket on a motherboard • Wanted to increase capabilities: • Tried to add other CPU sockets, so multiple CPUs • Needed additional hardware to connect them to RAM and other resources • Lots of overhead

  28. Hyper-threading (HT) • Brought parallel computation to PCs – early 2000s • Pentium 4 – single CPU core but HT • Appears as 2 logical CPUs • Can execute at the same time but share execution resources • Multiple instructions operate – in separate modules – on separate data in parallel • MIMD • Needs OS support to take advantage of it • Number of logical cores = number of physical cores × number of threads that can run on each core
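
A small sketch of inspecting logical vs. physical core counts from Python. os.cpu_count() reports logical CPUs; psutil is a third-party package, assumed to be installed, that can also report physical cores.

    import os
    import psutil   # third-party; assumed available

    logical = os.cpu_count()                      # physical cores x hardware threads
    physical = psutil.cpu_count(logical=False)    # physical cores only

    print("logical CPUs:  ", logical)
    print("physical cores:", physical)
    if logical and physical:
        print("hardware threads per core:", logical // physical)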

  29. Multi-core • Additional cores added • Single CPU socket, all cores on the same chip so less latency • If dual core, the CPU has 2 central processing units • Can run 2 processes at the same time • Since all cores are on the same chip: • No extra hardware needed to connect to RAM, etc. • Faster communication

  30. vCPU • Virtual processor • A vCPU is a share of a physical CPU assigned to a VM • A vCPU is a series of time slots on logical processors • Adding more vCPUs can increase wait time

  31. How to increase vCPUs • If a vCPU is presented to the OS as a single-core CPU in a single socket, that limits the number of vCPUs • OSes typically restrict the number of physical CPUs, not logical CPUs • Multicore vCPUs – VMware introduced virtual sockets and cores per socket, e.g. 2 virtual sockets and 4 virtual cores per socket allows 8 vCPUs • vCPUs are part of the simultaneous multithreading (SMT) compute model • Allows threads to be split across multiple physical or logical cores

  32. Multiple threads on a VM • With 2 hyper-threads on each of 4 cores, the VM thinks it has 8 cores • What if it wants all 8 threads at once?

  33. Each thread runs at 50% utilization • Each physical CPU at 100% • In the cloud: • Avoid creating a single VM with more vCPUs than physical cores • If more are needed, split up the VM

  34. The Cloud - Parallelization

  35. Divide and Conquer – SIMD and MIMD [diagram: “Work” is partitioned into w1, w2, w3; each “worker” produces a partial result r1, r2, r3; the partial results are combined into the “Result”]
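
A minimal partition / worker / combine sketch using Python's multiprocessing module; the task (summing a list of numbers), the chunking scheme, and the worker function are illustrative choices, not taken from the slides.

    from multiprocessing import Pool

    def worker(chunk):
        return sum(chunk)                        # partial result r_i for one chunk w_i

    if __name__ == "__main__":
        work = list(range(100))                  # the "Work"
        chunks = [work[i::4] for i in range(4)]  # Partition into w1 .. w4

        with Pool(processes=4) as pool:
            partials = pool.map(worker, chunks)  # workers run in parallel

        result = sum(partials)                   # Combine r1 .. r4 into the "Result"
        print(result)                            # 4950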

  36. Different Workers • Different threads in the same core • Different cores in the same CPU • Different CPUs in a multi-processor system • Different machines in a distributed system

  37. Choices, Choices, Choices • Commodity vs. “exotic” hardware • Number of machines vs. processors vs. cores • Bandwidth of memory vs. disk vs. network • Different programming models

  38. Parallelization Problems • How do we assign work units to workers? • What if we have more work units than workers? • How do we know all the workers have finished? • What if workers need to share partial results? • How do we aggregate partial results? • What if workers die? • What is the common theme of all of these problems? [diagram: the same partition / worker / combine picture as before]

  39. General Theme? • Parallelization problems arise from: • Communication between workers • Access to shared resources (e.g., data) • Thus, we need a synchronization system! • This is tricky: • Finding bugs is hard • Solving bugs is even harder

  40. Managing Multiple Workers • Difficult because • (Often) don’t know the order in which workers run • (Often) don’t know where the workers are running • (Often) don’t know when workers interrupt each other • Thus, we need structures from your OS class: • Semaphores (lock, unlock) • Condition variables (wait, notify, broadcast) • Barriers • Still, lots of problems: • Deadlock, livelock, race conditions, ... • Moral of the story: be careful! • Even trickier if the workers are on different machines
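
A short Python sketch of two of the primitives named above, a lock and a barrier, coordinating four worker threads; the shared counter is just an illustrative piece of shared state.

    import threading

    counter = 0
    lock = threading.Lock()
    barrier = threading.Barrier(4)

    def worker(worker_id):
        global counter
        with lock:                 # lock: only one worker updates the counter at a time
            counter += 1
        barrier.wait()             # barrier: nobody proceeds until all four have incremented
        print(f"worker {worker_id} sees counter = {counter}")   # always 4 here

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()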

  41. Patterns for Parallelism • Parallel computing has been around for decades • Here are some “design patterns” …

  42. Master/Slaves [diagram: a master coordinates many slave workers; GFS provides shared storage]

  43. Producer/Consumer [diagram: producers (P) and consumers (C) are connected by a bounded buffer; some nodes act as both (P/C); GFS provides shared storage]
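
A minimal producer/consumer sketch using Python's queue.Queue as the bounded buffer; the item values and the None sentinel used to stop the consumer are illustrative conventions.

    import queue
    import threading

    buffer = queue.Queue(maxsize=3)   # bounded buffer

    def producer():
        for item in range(5):
            buffer.put(item)          # blocks when the buffer is full
        buffer.put(None)              # sentinel: no more items

    def consumer():
        while True:
            item = buffer.get()       # blocks when the buffer is empty
            if item is None:
                break
            print("consumed", item)

    p = threading.Thread(target=producer)
    c = threading.Thread(target=consumer)
    p.start(); c.start()
    p.join(); c.join()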

  44. Work Queues [diagram: multiple producers (P) and consumers (C) share a queue serviced by workers (W)] • Multiple producers/consumers
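
A sketch of a shared work queue serviced by several workers; the task payloads and the one-sentinel-per-worker shutdown are illustrative choices.

    import queue
    import threading

    tasks = queue.Queue()             # the shared queue

    def worker(worker_id):
        while True:
            task = tasks.get()
            if task is None:          # sentinel tells this worker to stop
                break
            print(f"worker {worker_id} handled task {task}")

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for w in workers:
        w.start()

    for t in range(10):               # producers enqueue work
        tasks.put(t)
    for _ in workers:                 # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()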

  45. Basics • Divide and conquer • Partition a large problem into smaller subproblems • Workers work on subproblems in parallel • Threads in a core, cores in a multi-core processor, multiple processors in a machine, machines in a cluster

  46. Rubber Meets Road • From patterns to implementation: • pthreads, OpenMP for multi-threaded programming • MPI (message passing interface) for cluster computing • … • The reality: • Lots of one-off solutions, custom code • Write your own dedicated library, then program with it • Burden on the programmer to explicitly manage everything • MapReduce to the rescue!
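
To make the “MapReduce to the rescue” punchline concrete, a toy single-machine word count written in the map / shuffle / reduce style; the sample documents are made up for the example.

    from collections import defaultdict

    documents = ["the cat sat", "the cat ran", "the dog sat"]

    # map: emit (word, 1) pairs
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # shuffle: group the emitted values by key
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # reduce: sum the counts for each word
    counts = {word: sum(values) for word, values in groups.items()}
    print(counts)   # {'the': 3, 'cat': 2, 'sat': 2, 'ran': 1, 'dog': 1}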
