
STRATEGIC NAMING: MULTI-THREADED ALGORITHM (Ch 27, Cormen et al.)


Presentation Transcript


  1. STRATEGIC NAMING: MULTI-THREADED ALGORITHM (Ch 27, Cormen et al.)
  Parallelization
  • Four types of computing (Flynn's taxonomy), classified along two axes:
    • Instruction streams (single, multiple) per clock cycle
    • Data streams (single, multiple) per clock cycle
  • Single Instruction Single Data (SISD): serial computing
  • Single Instruction Multiple Data (SIMD): multiple processors, GPU
  • Multiple Instruction Single Data (MISD): rare in practice (e.g., pipelined/systolic architectures)
  • Multiple Instruction Multiple Data (MIMD): cluster computing, multi-core CPU, multi-threaded, message passing (IBM SP-x on hypercube; Intel single-chip Xeon Phi: http://spectrum.ieee.org/semiconductors/processors/what-intels-xeon-phi-coprocessor-means-for-the-future-of-supercomputing)

  2. Grid Computing & Cloud
  • Not necessarily parallel
  • Primary focus is utilizing CPU cycles across a network: just networked CPUs, but a middleware layer makes node utilization transparent
  • A major focus: avoid data transfer – run code where the data are
  • Another focus: load balancing
  • Message-passing parallelization is possible: MPI, PVM, etc.
  • Community-specific Grids: CERN, Bio-grid, Cardio-vascular grid, etc.
  • Cloud: data-archiving focus, but really a commercial version of the Grid; CPU utilization is under-sold but coming up: expect the service-oriented software business model to pick up

  3. RAM Memory Utilization
  • Two types feasible:
  • Shared memory:
    • Fast, possibly on-chip; no message-passing time; no dependency on a 'pipe' and its possible failure
    • But consistency must be explicitly controlled (illustrated below), which may cause deadlock, requiring a deadlock detection/breaking mechanism that adds overhead
  • Distributed local memory:
    • Communication overhead
    • 'Pipe' failure is a practical problem
    • Good model where threads are independent of each other
    • Most general model for parallelization
    • Easy to code, with a well-established library (MPI)
    • Scaling up is easy – from on-chip to over-the-globe
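
  The consistency issue above can be made concrete with explicit locking. Below is a minimal shared-memory sketch in Java (illustrative only; the class SharedCounter and the loop counts are assumptions, not from the slides): two threads update one counter, and the synchronized block is the explicit consistency control the slide refers to.

  // Minimal shared-memory sketch (illustrative, not from the slides):
  // two threads update one counter; the lock enforces consistency.
  public class SharedCounter {
      private long count = 0;
      private final Object lock = new Object();

      void increment() {
          synchronized (lock) {   // explicit consistency control
              count++;            // without the lock, updates could be lost
          }
      }

      public static void main(String[] args) throws InterruptedException {
          SharedCounter c = new SharedCounter();
          Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
          Thread t1 = new Thread(work), t2 = new Thread(work);
          t1.start(); t2.start();
          t1.join(); t2.join();
          System.out.println(c.count);  // always 200000 with the lock held
      }
  }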

  4. Threading Types
  • Two types feasible:
  • Static threading: the OS controls the threads, typically on single-core CPUs (why would one do it? – for the OS itself), but multi-core CPUs use it if the compiler guarantees safe execution
  • Dynamic threading: the program controls threads explicitly; threads are created/destroyed as needed; this is the parallel-computing model (see the sketch below)
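
  Dynamic threading is the model provided by thread-pool libraries. A minimal sketch using Java's ExecutorService (illustrative; the pool size and task count are arbitrary assumptions): the program submits tasks as needed, and the runtime maps them onto worker threads managed on its behalf.

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  // Minimal dynamic-threading sketch (illustrative): the program
  // creates work on demand; the pool manages the actual threads.
  public class DynamicThreads {
      public static void main(String[] args) {
          ExecutorService pool = Executors.newFixedThreadPool(4);  // assumed pool size
          for (int i = 0; i < 8; i++) {
              final int id = i;
              pool.submit(() ->
                  System.out.println("task " + id + " on " + Thread.currentThread().getName()));
          }
          pool.shutdown();  // accept no new tasks; workers exit when done
      }
  }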

  5. Multi-threaded Fibonacci
  Recursive Fib(n)
  • If n <= 1 then return n; else
  • x = Fib(n-1);
  • y = Fib(n-2);
  • return (x + y).
  Complexity: O(φ^n), where φ is the golden ratio ≈ 1.618
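
  As a serial baseline, here is the pseudocode above as runnable Java (a direct transcription; the test value 30 is arbitrary). The exponential running time comes from recomputing the same subproblems.

  // Serial recursive Fibonacci, directly from the pseudocode above.
  // Runs in O(phi^n) time because subproblems are recomputed.
  public class FibSerial {
      static long fib(int n) {
          if (n <= 1) return n;
          long x = fib(n - 1);
          long y = fib(n - 2);
          return x + y;
      }

      public static void main(String[] args) {
          System.out.println(fib(30));  // 832040
      }
  }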

  6. Fibonacci
  Recursive Fib(n)
  • If n <= 1 then return n; else
  • x = Spawn Fib(n-1);
  • y = Fib(n-2);
  • Sync;
  • return (x + y).
  Parallel execution of spawned threads is optional: the scheduler decides (programmer, script translator, compiler, OS)
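
  The Spawn/Sync pattern above maps directly onto fork/join frameworks. Below is a minimal sketch using Java's ForkJoinPool (illustrative; the slides do not prescribe an implementation): fork() plays the role of Spawn and join() the role of Sync.

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveTask;

  // Spawn/Sync Fibonacci sketched with Java's fork/join framework.
  // fork() ~ Spawn (child may run in parallel); join() ~ Sync.
  public class FibParallel extends RecursiveTask<Long> {
      private final int n;
      FibParallel(int n) { this.n = n; }

      @Override
      protected Long compute() {
          if (n <= 1) return (long) n;
          FibParallel x = new FibParallel(n - 1);
          x.fork();                                   // Spawn Fib(n-1)
          long y = new FibParallel(n - 2).compute();  // current thread runs Fib(n-2)
          return x.join() + y;                        // Sync, then combine
      }

      public static void main(String[] args) {
          System.out.println(new ForkJoinPool().invoke(new FibParallel(30)));
      }
  }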

  7. Each spawn or data-collection (sync) node is counted as one time unit
  • This is message passing
  • Note, GPU/SIMD uses a different model:
  • Each thread does the same work (the kernel), and data goes to shared memory
  • GPU-type parallelization's ideal time ~ critical-path length
  • The more balanced the tree is, the shorter the critical path
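
  To make the critical-path point concrete with the Fibonacci example above (this is the standard Ch. 27 analysis, not stated on the slide): the total work is T1(n) = Θ(φ^n), but the critical path only follows the longest chain of spawns, giving Tinf(n) = Tinf(n-1) + Θ(1) = Θ(n). The ideal parallel time is therefore linear in n even though the total work is exponential.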

  8. Terminologies/Concepts
  • For P available processors: Tinf, TP, T1 : running time with unlimited processors, with P processors, and with one (serial) processor, respectively
  • Ideal parallelization: TP = T1 / P
  • Real situation: TP >= T1 / P
  • Tinf is the theoretical minimum feasible, so TP >= Tinf
  • Speedup factor = T1 / TP
  • T1 / TP <= P
  • Linear speedup: T1 / TP = O(P) [e.g., 3P + c]
  • Perfect linear speedup: T1 / TP = P
  • My preferred factor would be TP / T1 (inverse speedup: slowdown factor?)
  • linear O(P); quadratic O(P^2), …, exponential O(k^P), k > 1
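
  A worked example with assumed numbers (illustrative only): let T1 = 64 s and Tinf = 8 s. With P = 4 processors, the bounds above give TP >= max(T1 / P, Tinf) = max(16, 8) = 16 s, so the best achievable speedup is T1 / TP = 64 / 16 = 4 = P, i.e., perfect linear speedup is still possible at this P.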

  9. Terminologies/Concepts
  • For P available processors: Tinf, TP, T1 : running time with unlimited processors, with P processors, and with one (serial) processor
  • Parallelism factor: T1 / Tinf
    • serial time divided by ideal parallelized time
    • note, this is about your algorithm,
    • unoptimized over the actual configuration available to you
  • T1 / Tinf < P implies NOT linear speedup
  • T1 / Tinf << P implies processors are underutilized
  • We want it to be close to P: T1 / Tinf → P, in the limit
  • Slackness factor: (T1 / Tinf) / P, i.e., T1 / (Tinf · P)
  • We want slackness → 1, the minimum feasible
  • i.e., we want no slack
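
  Continuing the assumed numbers from slide 8 (illustrative only): with T1 = 64 s and Tinf = 8 s, the parallelism factor is T1 / Tinf = 8. For P = 4, slackness = 8 / 4 = 2 >= 1, so there is enough parallelism to keep all processors busy. For P = 32, slackness = 8 / 32 = 0.25, i.e., T1 / Tinf << P, and the processors are underutilized.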
