Parallel Processing Comparative Study
Context How to finish a piece of work in a short time? Solution: use a quicker worker. Drawback: a worker's speed has a limit; inadequate for long work.
Context How to finish a calculation in a short time? Solution: use a quicker calculator (processor) [1960-2000]. Drawback: processor speed has reached a limit; inadequate for long calculations.
Context How to finish a piece of work in a short time? Solution • Use a quicker worker (inadequate for long work) • Use more than one worker concurrently
Context How to finish a calculation in a short time? Solution • Use a quicker processor (inadequate for long calculations) • Use more than one processor concurrently: Parallelism
Context Definition Parallelism is the concurrent use of more than one processing unit (CPUs, processor cores, GPUs, or combinations of them) in order to carry out calculations more quickly
The Goal Parallelism needs • A parallel computer (more than one processor) • Accommodating the calculation to the parallel computer
The Goal Parallel Computer • Several parallel computers on the hardware market • They differ in their architecture • Several classifications • Based on the instruction and data streams (Flynn's classification) • Based on the degree of memory sharing • …
The Goal Flynn Classification A. Single Instruction, Single Data stream (SISD)
The Goal Flynn Classification B. Single Instruction, Multiple Data streams (SIMD)
The Goal Flynn Classification C. Multiple Instruction, Single Data stream (MISD)
The Goal Flynn Classification D. Multiple Instruction, Multiple Data streams (MIMD)
The Goal Memory Sharing Degree Classification A. Shared Memory B. Distributed Memory
The Goal Memory Sharing Degree Classification C. Hybrid Distributed-Shared Memory
The Goal Parallelism needs • A parallel computer (more than one processor) • Accommodating the calculation to the parallel computer • Dividing the calculation and the data between the processors • Defining the execution scenario (how the processors cooperate)
The Goal The accommodation of the calculation to the parallel computer • Is called parallel processing • Depends closely on the architecture
The Goal Goal: a comparative study between • The shared-memory parallel processing approach • The distributed-memory parallel processing approach
Plan • Distributed Memory Parallel Processing approach • Shared Memory Parallel Processing approach • Case study problems • Comparison results and discussion • Conclusion
Distributed Memory Parallel Processing approach Distributed-Memory Computers (DMC) = Distributed Memory System (DMS) = Massively Parallel Processor (MPP)
Distributed Memory Parallel Processing approach • Distributed-memory computers architecture
Distributed Memory Parallel Processing approach • Architecture of nodes Nodes can be: identical processors (pure DMC); different types of processors (hybrid DMC); different types of nodes with different architectures (heterogeneous DMC)
Distributed Memory Parallel Processing approach • Architecture of the Interconnection Network • No shared memory space between nodes • The network is the only way for nodes to communicate • Network performance directly influences the performance of a parallel program on a DMC • Network performance depends on: • Topology • Physical connectors (wires, …) • Routing technique • The evolution of DMCs depends closely on the evolution of networking
Distributed Memory Parallel Processing approach The DMC Used in Our Comparative Study • Heterogeneous DMC • Modest cluster of workstations • Three nodes: • Sony laptop: i3 processor • HP laptop: i3 processor • HP laptop: Core 2 Duo processor • Communication network: 100 Mbit/s Ethernet
Distributed Memory Parallel Processing approach Parallel Software Development for DMC The designer's main tasks: • Global calculation decomposition and task assignment • Data decomposition • Communication scheme definition • Synchronization study
Distributed Memory Parallel Processing approach Parallel Software Development for DMC Important considerations for efficiency: • Minimize Communication • Avoid barrier synchronization
Distributed Memory Parallel Processing approach Implementation environments Several implementation environments • PVM (Parallel Virtual Machine) • MPI (Message Passing Interface)
Distributed Memory Parallel Processing approach MPI Application Anatomy All the nodes execute the same code, yet the nodes do not all do the same work. This apparent contradiction is resolved by the SPMD (Single Program, Multiple Data) application form: the processes are organized into one controller and several workers, and each process branches on its rank.
Shared Memory Parallel Processing approach Several SMPCs on the market Multi-core PCs: Intel i3/i5/i7, AMD Which SMPC do we use? • The GPU, originally designed for image processing • The GPU now: a domestic supercomputer Characteristics: • Cheapest and fastest shared-memory parallel computer • Hard parallel design
Shared Memory Parallel Processing approach • The GPU Architecture • The implementation environment
Shared memory parallel processing approach GPU Architecture Like a classical processing unit, the Graphics Processing Unit is composed of two main components: A. Calculation units B. Storage units
Shared memory parallel processing approach
Shared Memory Parallel Processing • The GPU architecture • The implementation environment • CUDA: for GPUs manufactured by NVIDIA • OpenCL: independent of the GPU architecture
Shared Memory Parallel Processing CUDA Program Anatomy
Shared Memory Parallel Processing Q: How are the code fragments to be parallelized executed on the GPU? A: By calling a kernel. Q: What is a kernel? A: A kernel is a function callable from the host and executed on the device simultaneously by many threads in parallel
Shared Memory Parallel Processing Kernel launch
Shared Memory Parallel Processing Design recommendations • Use the shared memory to reduce the time spent accessing global memory • Reduce the number of idle threads (control divergence) to fully utilize the GPU resources
Case study problem Square Matrix Multiplication ALGORITHM MatrixMultiplication(A[0..n-1, 0..n-1], B[0..n-1, 0..n-1]) // Input: two n×n matrices A and B // Output: matrix C = A·B for i ← 0 to n-1 do for j ← 0 to n-1 do C[i, j] ← 0 for k ← 0 to n-1 do C[i, j] ← C[i, j] + A[i, k]·B[k, j] return C • Complexity: in big-O notation, O(n³)
Case study problem Pi approximation ALGORITHM PiApprox(n) // Input: n, the number of bins // Output: an approximation of π h ← 1/n; sum ← 0 for i ← 0 to n-1 do x ← h·(i + 0.5) sum ← sum + 4/(1 + x²) return h·sum • Complexity: in big-O notation, O(n)