Parallel Processing Comparative Study
Context How to finish a piece of work in a short time? Solution: use a quicker worker. Drawback: a worker's speed has a limit; inadequate for long work.
Context How to finish a calculation in a short time? Solution: use a quicker calculator (processor) [1960-2000]. Drawback: processor speed has reached a limit; inadequate for long calculations.
Context How to finish a piece of work in a short time? Solution • Use a quicker worker (inadequate for long work) • Use more than one worker concurrently
Context How to finish a calculation in a short time? Solution • Use a quicker processor (inadequate for long calculations) • Use more than one processor concurrently: Parallelism
Context Definition Parallelism is the concurrent use of more than one processing unit (CPUs, processor cores, GPUs, or combinations of them) in order to carry out calculations more quickly
The Goal Parallelism needs • A parallel computer (more than one processor) • Accommodating the calculation to the parallel computer
The Goal Parallel Computer • Several parallel computers on the hardware market • They differ in their architecture • Several classifications • Based on the instruction and data streams (Flynn's classification) • Based on the degree of memory sharing • …
The Goal Flynn Classification A. Single Instruction, Single Data stream (SISD)
The Goal Flynn Classification B. Single Instruction, Multiple Data streams (SIMD)
The Goal Flynn Classification C. Multiple Instruction, Single Data stream (MISD)
The Goal Flynn Classification D. Multiple Instruction, Multiple Data streams (MIMD)
The Goal Memory Sharing Degree Classification A. Shared Memory B. Distributed Memory
The Goal Memory Sharing Degree Classification C. Hybrid Distributed-Shared Memory
The Goal Parallelism needs • A parallel computer (more than one processor) • Accommodating the calculation to the parallel computer • Dividing the calculation and the data between the processors • Defining the execution scenario (how the processors cooperate)
The Goal The accommodation of the calculation to the parallel computer • Is called parallel processing • Depends closely on the architecture
The Goal Goal: a comparative study between • The shared-memory parallel processing approach • The distributed-memory parallel processing approach
Plan • Distributed Memory Parallel Processing approach • Shared Memory Parallel Processing approach • Case study problems • Comparison results and discussion • Conclusion
Distributed Memory Parallel Processing approach Distributed-Memory Computers (DMC) = Distributed Memory System (DMS) = Massively Parallel Processor (MPP)
Distributed Memory Parallel Processing approach • Distributed-memory computers architecture
Distributed Memory Parallel Processing approach • Architecture of nodes Nodes can be: identical processors (pure DMC); different types of processors (hybrid DMC); different types of nodes with different architectures (heterogeneous DMC)
Distributed Memory Parallel Processing approach • Architecture of the Interconnection Network • No shared memory space between nodes • The network is the only way for nodes to communicate • Network performance directly influences the performance of a parallel program on a DMC • Network performance depends on: • Topology • Physical connectors (wires, …) • Routing technique • The evolution of DMCs depends closely on the evolution of networking
Distributed Memory Parallel Processing approach The DMC Used in Our Comparative Study • Heterogeneous DMC • Modest cluster of workstations • Three nodes: • Sony laptop: i3 processor • HP laptop: i3 processor • HP laptop: Core 2 Duo processor • Communication network: 100 Mbit/s Ethernet
Distributed Memory Parallel Processing approach Parallel Software Development for DMC The designer's main tasks: • Global calculation decomposition and task assignment • Data decomposition • Communication scheme definition • Synchronization study
Distributed Memory Parallel Processing approach Parallel Software Development for DMC Important considerations for efficiency: • Minimize Communication • Avoid barrier synchronization
Distributed Memory Parallel Processing approach Implementation environments Several implementation environments • PVM (Parallel Virtual Machine) • MPI (Message Passing Interface)
Distributed Memory Parallel Processing approach MPI Application Anatomy All the nodes execute the same code, yet the nodes do not all do the same work. This apparent contradiction is resolved by the SPMD (Single Program, Multiple Data) application form: the processes are organized into one controller and several workers, and each process branches on its rank.
Shared Memory Parallel Processing approach Several SMPCs on the market Multi-core PCs: Intel i3/i5/i7, AMD Which SMPC do we use? • The GPU, originally designed for image processing • The GPU now: a domestic supercomputer Characteristics: • Cheapest and fastest shared-memory parallel computer • Hard parallel design
Shared Memory Parallel Processing approach • The GPU Architecture • The implementation environment
Shared memory parallel processing approach GPU Architecture Like a classical processing unit, the Graphics Processing Unit is composed of two main components: A. Calculation units B. Storage units
Shared memory parallel processing approach
Shared Memory Parallel Processing • The GPU architecture • The implementation environment • CUDA: for GPUs manufactured by NVIDIA • OpenCL: independent of the GPU architecture
Shared Memory Parallel Processing CUDA Program Anatomy
Shared Memory Parallel Processing Q: How are the code fragments to be parallelized executed on the GPU? A: By calling a kernel. Q: What is a kernel? A: A kernel is a function callable from the host and executed on the device simultaneously by many threads in parallel
Shared Memory Parallel Processing Kernel launch
Shared Memory Parallel Processing Design recommendations • Use the shared memory to reduce the time spent accessing global memory • Reduce the number of idle threads (control divergence) to fully utilize the GPU resources
Case study problem Square Matrix Multiplication ALGORITHM MatrixMultiplication(A[0..n-1, 0..n-1], B[0..n-1, 0..n-1]) // Input: two n×n matrices A and B // Output: matrix C = A·B for i ← 0 to n-1 do for j ← 0 to n-1 do C[i, j] ← 0 for k ← 0 to n-1 do C[i, j] ← C[i, j] + A[i, k]·B[k, j] return C • Complexity: in big-O notation, O(n³)
Case study problem Pi approximation ALGORITHM PiApprox(n) // Input: n, the number of bins // Output: an approximation of π h ← 1/n; sum ← 0 for i ← 0 to n-1 do x ← h·(i + 0.5) sum ← sum + 4/(1 + x²) return h·sum • Complexity: in big-O notation, O(n)