
Intro to Parallel and Distributed Processing



Presentation Transcript


  1. Intro to Parallel and Distributed Processing Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License) Slides from: Jimmy Lin The iSchool University of Maryland

  2. Web-Scale Problems? • How to approach problems in: • Biocomputing • Nanocomputing • Quantum computing • … • It all boils down to… • Divide-and-conquer • Throwing more hardware at the problem • Clusters, Grids, Clouds Simple to understand… a lifetime to master…

  3. Cluster computing • Improve performance and availability over single computers • Compute nodes located locally, connected by fast LAN • From a few nodes to a large number • Group of linked computers working closely together, in parallel • Typically machines were homogeneous • Sometimes viewed as forming a single computer

  4. Cluster computing Can be: • Tightly coupled • Share resources (memory, I/O) • Loosely coupled • Connected on network but don’t share resources

  5. Memory Typology: Shared [diagram: several processors connected to one shared memory]

  6. Memory Typology: Distributed [diagram: each processor has its own memory; the processors are connected by a network]

  7. Memory Typology: Hybrid [diagram: groups of processors each share a local memory; the groups are connected by a network]

  8. Clusters • Nodes have computational devices and local storage; networked disk storage allows for sharing • Static system, centralized control • Multiple clusters at a datacenter • A Google cluster has 1000s of servers • Racks of 40-80 servers

  9. Grids • Grids: • Large number of users • Large volume of data • Large computational task involved • Connecting resources through a network

  10. Grid • Term Grid borrowed from the electrical grid • Users obtain computing power through the Internet by using the Grid, just like drawing electrical power from any wall socket

  11. Grid Computing • Heterogeneous, geographically dispersed • Loosely coupled • Nodes do not share resources – connected by a network (WAN) • Thousands of users worldwide • Can be formed from computing resources belonging to multiple organizations • Although it can be dedicated to a single application, typically used for a variety of purposes

  12. Grid • By connecting to a Grid, can get: • needed computing power • storage spaces and data • Specialized equipment • Middleware needed to use grid • Each user - a single login account to access all resources • Can connect a data center (cluster) to a grid • Compute grids, data grids

  13. Grids • Resources - owned by diverse organizations – Virtual Organization • Sharing of data and resources in dynamic, multi-institutional virtual organizations • Academia • Access to specialized resources not available locally

  14. An Illustrative Example • NASA research scientist: • collected microbiological samples in the tidewaters around Wallops Island, Virginia. • Needs: • high-performance microscope at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego. • Samples were sent to San Diego, and she used NPACI’s Telescience Grid and NASA’s Information Power Grid (IPG) to view and control the output of the microscope from her desk on Wallops Island. • Viewed the samples and moved the platform holding them, making adjustments to the microscope.

  15. Example (continued) • The microscope produced a huge dataset of images • This dataset was stored using a storage resource broker on NASA’s IPG • Scientist was able to run algorithms on this very dataset while watching the results in real time

  16. EU Grid (European Grid Infrastructure) • SE – storage element, CE – computing element

  17. Grids • Domains as diverse as: • Global climate change • High energy physics • Computational genomics • Biomedical applications • Volunteer Grid Computing • SETI@home • World Community Grid • Folding@home • Einstein@Home

  18. Grids • Grid is: • heterogeneous, geographically distributed, independent sites • Gridware manages the resources for Grids • Grid differs from: • Parallel computing – grid is more than a single task on multiple machines • Distributed system – grid is more than distributing the load of a program across two or more processes • Cluster computing – grid is more than homogeneous sites connected by a LAN (a grid can be multiple clusters)

  19. Grid • Precursor to Cloud computing but no virtualization • Some say there wouldn’t be cloud computing if there hadn’t been grid computing “Virtualization distinguishes cloud from grid” Zissis

  20. Parallel/Distributed Background

  21. Flynn’s Taxonomy [2×2 classification: the instruction stream is Single (SI) or Multiple (MI) and the data stream is Single (SD) or Multiple (MD), giving SISD, SIMD, MISD, and MIMD]

  22. SISD [diagram: one processor executes a single instruction stream on a single stream of data items]

  23. SIMD [diagram: a single instruction stream is applied to many data streams D0 … Dn in lockstep]
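
As a concrete illustration (added here, not part of the original slides), a minimal Python/NumPy sketch of the SIMD idea: a single operation is applied across a whole array of data elements at once. NumPy and the sample values are assumptions made for the example.

    # SIMD-style data parallelism: one "instruction" applied to many data elements.
    import numpy as np

    data = np.arange(8)        # data elements D0 .. D7
    result = data * 2 + 1      # the same operation touches every element at once

    print(result)              # [ 1  3  5  7  9 11 13 15]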

  24. MISD [diagram: multiple instruction streams operate on a single data stream]

  25. MIMD [diagram: multiple processors, each running its own instruction stream on its own data; multi-threaded]
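
A minimal Python sketch (an added illustration, not from the slides) of the MIMD idea: two threads run different instruction streams on different data. Note that CPython threads interleave CPU-bound work rather than truly running it in parallel, so this only illustrates the model; the worker functions are hypothetical.

    # MIMD: different instruction streams on different data, at the same time.
    import threading

    def summer(nums, out):
        out["sum"] = sum(nums)            # one instruction stream

    def joiner(words, out):
        out["text"] = " ".join(words)     # a different instruction stream

    results = {}
    t1 = threading.Thread(target=summer, args=([1, 2, 3], results))
    t2 = threading.Thread(target=joiner, args=(["hello", "world"], results))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(results)                        # e.g. {'sum': 6, 'text': 'hello world'}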

  26. Stopped here for Exam 1

  27. History: CPUs • Single CPU – inserted into a single CPU socket on a motherboard • Wanted to increase capabilities: • Tried to add other CPU sockets, so multiple CPUs • Needed additional hardware to connect them to RAM and other resources • Lots of overhead

  28. Hyper-threading (HT) • Brought parallel computation to PCs – early 2000s • Pentium 4 – single CPU core but HT • Appears as 2 logical CPUs • Can execute at the same time but share execution resources • Multiple instructions operate – in separate modules – on separate data in parallel • MIMD • Needs OS support to take advantage of it • Number of logical cores = number of physical cores × number of threads that can run on each core
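
A small sketch of inspecting logical vs. physical core counts from Python. os.cpu_count() reports logical CPUs; psutil is a third-party package, assumed to be installed, that can also report physical cores.

    import os
    import psutil   # third-party; assumed available

    logical = os.cpu_count()                      # physical cores x hardware threads
    physical = psutil.cpu_count(logical=False)    # physical cores only

    print("logical CPUs:  ", logical)
    print("physical cores:", physical)
    if logical and physical:
        print("hardware threads per core:", logical // physical)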

  29. Multi-core • Additional cores added • Single CPU socket, all cores on the same chip so less latency • If dual core, the CPU has 2 central processing units • Can run 2 processes at the same time • Since all cores are on the same chip: • No extra hardware needed to connect to RAM, etc. • Faster communication

  30. vCPU • Virtual processor • A vCPU is a share of a physical CPU assigned to a VM • A vCPU is a series of time slots on logical processors • Adding more vCPUs can increase wait time

  31. How to increase vCPUs • If a vCPU is presented to the OS as a single-core CPU in a single socket, that limits the number of vCPUs • OSes typically restrict the number of physical CPUs, not logical CPUs • Multicore vCPUs – VMware introduced virtual sockets and cores per socket, e.g. 2 virtual sockets and 4 virtual cores per socket allows 8 vCPUs • vCPUs are part of the simultaneous multithreading (SMT) compute model • Allows threads to be split across multiple physical or logical cores

  32. Multiple threads on a VM • With 2 hyper-threads on each of 4 cores, the VM thinks it has 8 cores • What if it wants all 8 threads at once?

  33. Each thread runs at 50% utilization • Each physical CPU at 100% • In the cloud: • Avoid creating a single VM with more vCPUs than physical cores • If more are needed, split up the VM

  34. The Cloud - Parallelization

  35. Divide and Conquer – SIMD and MIMD [diagram: “Work” is partitioned into w1, w2, w3; each “worker” produces a partial result r1, r2, r3; the partial results are combined into the “Result”]
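
A minimal partition / worker / combine sketch using Python's multiprocessing module; the task (summing a list of numbers), the chunking scheme, and the worker function are illustrative choices, not taken from the slides.

    from multiprocessing import Pool

    def worker(chunk):
        return sum(chunk)                        # partial result r_i for one chunk w_i

    if __name__ == "__main__":
        work = list(range(100))                  # the "Work"
        chunks = [work[i::4] for i in range(4)]  # Partition into w1 .. w4

        with Pool(processes=4) as pool:
            partials = pool.map(worker, chunks)  # workers run in parallel

        result = sum(partials)                   # Combine r1 .. r4 into the "Result"
        print(result)                            # 4950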

  36. Different Workers • Different threads in the same core • Different cores in the same CPU • Different CPUs in a multi-processor system • Different machines in a distributed system

  37. Choices, Choices, Choices • Commodity vs. “exotic” hardware • Number of machines vs. processors vs. cores • Bandwidth of memory vs. disk vs. network • Different programming models

  38. Parallelization Problems • How do we assign work units to workers? • What if we have more work units than workers? • How do we know all the workers have finished? • What if workers need to share partial results? • How do we aggregate partial results? • What if workers die? • What is the common theme of all of these problems? [diagram: the same partition / worker / combine picture as before]

  39. General Theme? • Parallelization problems arise from: • Communication between workers • Access to shared resources (e.g., data) • Thus, we need a synchronization system! • This is tricky: • Finding bugs is hard • Solving bugs is even harder

  40. Managing Multiple Workers • Difficult because • (Often) don’t know the order in which workers run • (Often) don’t know where the workers are running • (Often) don’t know when workers interrupt each other • Thus, we need structures from your OS class: • Semaphores (lock, unlock) • Condition variables (wait, notify, broadcast) • Barriers • Still, lots of problems: • Deadlock, livelock, race conditions, ... • Moral of the story: be careful! • Even trickier if the workers are on different machines
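
A short Python sketch of two of the primitives named above, a lock and a barrier, coordinating four worker threads; the shared counter is just an illustrative piece of shared state.

    import threading

    counter = 0
    lock = threading.Lock()
    barrier = threading.Barrier(4)

    def worker(worker_id):
        global counter
        with lock:                 # lock: only one worker updates the counter at a time
            counter += 1
        barrier.wait()             # barrier: nobody proceeds until all four have incremented
        print(f"worker {worker_id} sees counter = {counter}")   # always 4 here

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()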

  41. Patterns for Parallelism • Parallel computing has been around for decades • Here are some “design patterns” …

  42. Master/Slaves [diagram: a master coordinates many slave workers; GFS provides shared storage]

  43. Producer/Consumer [diagram: producers (P) and consumers (C) are connected by a bounded buffer; some nodes act as both (P/C); GFS provides shared storage]
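
A minimal producer/consumer sketch using Python's queue.Queue as the bounded buffer; the item values and the None sentinel used to stop the consumer are illustrative conventions.

    import queue
    import threading

    buffer = queue.Queue(maxsize=3)   # bounded buffer

    def producer():
        for item in range(5):
            buffer.put(item)          # blocks when the buffer is full
        buffer.put(None)              # sentinel: no more items

    def consumer():
        while True:
            item = buffer.get()       # blocks when the buffer is empty
            if item is None:
                break
            print("consumed", item)

    p = threading.Thread(target=producer)
    c = threading.Thread(target=consumer)
    p.start(); c.start()
    p.join(); c.join()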

  44. Work Queues [diagram: multiple producers (P) and consumers (C) share a queue serviced by workers (W)] • Multiple producers/consumers
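
A sketch of a shared work queue serviced by several workers; the task payloads and the one-sentinel-per-worker shutdown are illustrative choices.

    import queue
    import threading

    tasks = queue.Queue()             # the shared queue

    def worker(worker_id):
        while True:
            task = tasks.get()
            if task is None:          # sentinel tells this worker to stop
                break
            print(f"worker {worker_id} handled task {task}")

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for w in workers:
        w.start()

    for t in range(10):               # producers enqueue work
        tasks.put(t)
    for _ in workers:                 # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()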

  45. Basics • Divide and conquer • Partition a large problem into smaller subproblems • Workers work on subproblems in parallel • Threads in a core, cores in a multi-core processor, multiple processors in a machine, machines in a cluster

  46. Rubber Meets Road • From patterns to implementation: • pthreads, OpenMP for multi-threaded programming • MPI (message passing interface) for cluster computing • … • The reality: • Lots of one-off solutions, custom code • Write your own dedicated library, then program with it • Burden on the programmer to explicitly manage everything • MapReduce to the rescue!
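
To make the “MapReduce to the rescue” punchline concrete, a toy single-machine word count written in the map / shuffle / reduce style; the sample documents are made up for the example.

    from collections import defaultdict

    documents = ["the cat sat", "the cat ran", "the dog sat"]

    # map: emit (word, 1) pairs
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # shuffle: group the emitted values by key
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # reduce: sum the counts for each word
    counts = {word: sum(values) for word, values in groups.items()}
    print(counts)   # {'the': 3, 'cat': 2, 'sat': 2, 'ran': 1, 'dog': 1}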
