Parallel Algorithms: Status and Prospects. Guo-Liang CHEN, Dept. of Computer Science and Technology, National High Performance Computing Center at Hefei, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China. glchen@ustc.edu.cn http://www.nhpcc.ustc.edu.cn
Abstract Parallel algorithms are a central problem in parallel processing. In this talk, we first give a brief introduction to parallel algorithms. We then focus on the issues and directions of parallel algorithm research. Finally, we present the existing problems and the new challenges that parallel algorithm research faces. We argue that parallel algorithm research should establish a systematic approach of "Theory-Design-Implementation-Application" and form an integrated methodology of "Architecture-Algorithm-Programming". Only in this way can parallel algorithm research develop continuously and stay relevant to real applications.
Outline • Introduction • Research Issues • Parallel computation models • Design techniques • Parallel complexity theory • Research directions • Parallel computing • Solving problems from applied domains • Non-traditional computation modes • Existing problems and new challenges
Introduction (1) • What is a parallel algorithm? • Algorithm ------ a method and procedure to solve a given problem. • Parallel algorithm ------ an algorithm in which multiple operations are performed simultaneously. • Why has parallelism been an interesting topic? • The real world is inherently parallel: it is natural and straightforward to express something about the real world in a parallel way. • There are limits to sequential computing performance: physical limits such as the speed of light. • Parallel computation is still likely to be more cost-effective for many applications than very expensive uniprocessors.
Introduction (2) • Why has parallelism not led to widespread use? • Conscious human thinking appears to us to be sequential. • The theory required for parallel algorithms is immature and was developed after the technology. • The hardware platforms required for parallel algorithms are very expensive. • Portability is a much more serious issue in parallel programming than in sequential programming. • Why do we need parallel algorithms? • To increase computational speed. • To increase computational precision (e.g., to compute on a finer mesh). • To meet the requirements of real-time computation (e.g., weather forecasting).
Introduction (3) • Classification of parallel algorithms • Numerical parallel algorithms (algebraic operations: matrix operations, solving systems of linear equations, etc.). • Non-numerical parallel algorithms (symbolic operations: sorting, searching, graph algorithms, etc.). • Research hierarchy of parallel algorithms • Parallel complexity theory (parallelizable problems, NC-class problems, P-complete problems, lower bounds, etc.). • Design and analysis of parallel algorithms (efficient parallel algorithms). • Implementation of parallel algorithms (hardware platforms, supporting software).
Introduction (4) • The history of parallel algorithm research • During the two decades of the 1970s and 1980s, parallel algorithm research was very active, and many excellent papers, textbooks, and monographs on parallel algorithms were published. • From the mid-1990s, the focus shifted from parallel algorithms to parallel computing. • New opportunities for parallel algorithm research • The dramatic decrease in computer prices and the rapid development of communication technology make it possible to build PC clusters ourselves. • It is easy to get free software from the Internet to support clusters.
Research Issues • Parallel computation models • PRAM • APRAM • BSP • LogP • MH and UMH • Memory-LogP • Design techniques • Partitioning Principle • Divide-and-Conquer Strategy • Balanced Trees Method • Doubling Techniques • Pipelining Techniques • Parallel complexity theory • NC class • P-complete
Research Issues – Parallel computation models(1) • PRAM (Parallel Random Access Machine) • SIMD-SM, used for fine-grain parallel computing, centralized shared memory, globally synchronized. • Advantages • Suitable for representing and analyzing the complexity of parallel algorithms; simple to use, hiding most of the low-level details of parallel computers (communication, synchronization, etc.). • Disadvantages • Unsuitable for MIMD computers; unrealistic to neglect the issues of memory contention, communication delay, etc.
Research Issues – Parallel computation models(2) • Asynchronous PRAM • APRAM or MIMD-SM, used for medium-grain parallel computation, centralized shared memory, asynchronous operation, communication through reads/writes of shared variables, explicit synchronization (barriers, etc.). • Computation in APRAM • A computation consists of global phases separated by barriers: within a phase all processors execute operations asynchronously, and the last instruction of a phase must be a synchronization instruction (see the sketch below). • Advantages: preserves much of the simplicity of PRAM, better programmability, easier to ensure program correctness, easy to analyze complexity. • Disadvantages: unsuitable for MIMD computers with distributed memory.
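As a rough illustration (not part of the APRAM definition itself), the phase-and-barrier structure maps naturally onto OpenMP threads with explicit barriers; the array x and the two phases below are invented for the example:

    #include <omp.h>
    #include <stdio.h>

    #define N 8

    int main(void) {
        int x[N];
        #pragma omp parallel num_threads(N)
        {
            int i = omp_get_thread_num();
            x[i] = i + 1;               /* phase 1: each processor computes asynchronously */
            #pragma omp barrier         /* each phase ends with a synchronization instruction */
            int left = x[(i + N - 1) % N];   /* phase 2: safely read a neighbor's shared variable */
            if (i == 0) printf("P0 reads its neighbor's value: %d\n", left);
        }
        return 0;
    }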
Research Issues – Parallel computation models(3) • Bulk Synchronous Parallel (BSP) model • MIMD-DM; consists of a set of processors, send/receive message communication, and a mechanism for barrier synchronization. • Bulk synchronous: messages are combined into bulk transfers, delaying communication until the end of a superstep. • BSP parameters • p: number of processors • l: barrier synchronization time • g: unary packet transmission time (time steps/packet) = 1/bandwidth • BSP bulk synchronization reduces the difficulty of design and analysis and makes it easier to ensure the correctness of an algorithm (see the cost formula below).
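The standard BSP superstep cost (a well-known formula from the literature, not stated explicitly on the slide) shows how the parameters combine; here w_i is the local work on processor i and h is the maximum number of messages sent or received by any processor (an h-relation):

    T_{\text{superstep}} = \max_i w_i \;+\; g \cdot h \;+\; l,
    \qquad
    T_{\text{total}} = \sum_{\text{supersteps}} T_{\text{superstep}}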
Research Issues – Parallel computation models(4) • LogP model • MIMD-DM, point-to-point communication, implicit synchronization. • Parameters (LogP) • L (network latency), o (communication overhead), g (gap = 1/bandwidth), P (# of processors). • Advantages • Captures the communication bottleneck of parallel computers. • Hides details of topology, routing algorithms, and network protocols. • Applicable to shared-variable, message-passing, and data-parallel algorithms. • Disadvantages • Restricts the network capacity and neglects communication congestion. • Makes algorithms harder to describe and design (see the cost formulas below).
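Under LogP (again standard formulas rather than ones from the slide), a single short message costs the sender an overhead o, travels with latency L, and costs the receiver another o; consecutive sends from one processor are spaced at least g apart, so k pipelined messages cost:

    T_{\text{1 msg}} = o + L + o = L + 2o,
    \qquad
    T_{k\ \text{msgs}} = (k-1)\max(g, o) + L + 2o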
Research Issues – Parallel computation models(5) • BSP (bulk synchronization) -> BSP (subset synchronization) -> BSP (pairwise synchronization) = LogP • BSP can simulate LogP with a constant factor, and LogP can simulate BSP with at most a logarithmic factor • BSP = LogP + Barriers - Overhead • BSP offers a more convenient abstraction for algorithm and program design, while LogP provides better control of machine resources • BSP seems preferable for its greater simplicity, portability, and more structured programming style
Research Issues – Parallel computation models(6) • MH (Memory Hierarchy) model: • A sequential computer's memory is modeled as a sequence of memory modules <M0, M1, M2, M3, …>, with buses connecting adjacent modules; all buses may be active simultaneously. M0 (central processor), M1 (cache), M2 (main memory), M3 (storage). • MH is an address-oriented access model: the memory access cost function f(a) is a monotonically increasing function of the memory address a. • The MH model is suitable for sequential memory (magnetic tape, etc.)
Research Issues – Parallel computation models(7) • UMH (Uniform Memory Hierarchy) model: • The UMH model captures the performance-relevant aspects of the hierarchical nature of computer memory; it is a tool for quantifying the efficiency of data movement. • Memory access cost function f(k), where k is the level of the memory hierarchy. • Whenever possible, an algorithm should avoid repeated accesses to farther memory modules in favor of nearer ones. • Prefetching operands and overlapping computation with memory access operations are encouraged.
Research Issues – Parallel computation models(8) • Memory LogP model: • This model describes data movement across a memory hierarchy from a source LM to a target LM (Local Memory) using point-to-point memory communication; it is inspired by LogP and predicts and analyzes the latency of memory copy, pack, and unpack operations. • The communication cost is the sum of the memory communication and network communication times. Memory communication moves data from the user's local memory to the network buffer; network communication moves data from network buffer to network buffer. • Estimating the cost of point-to-point communication is similar to the original LogP; only the parameters have different meanings.
Research Issues – Parallel computation models(9) • Model parameters: • l: effective latency, l = f(d, s), where s is the data size and d is the access pattern; this is the cost of data transfer at the application, middleware, and hardware levels. • o: ideal overhead, the cost of data transfer at the middleware and hardware levels. • g: the reciprocal of g corresponds to per-process bandwidth; usually o = g. • P: # of processors; P = 1 (since only point-to-point communication is considered). • Cost function (cost per byte): • (o_m + l) + (L_n / w_n) + (o_m + l), which is similar to the o + L + o of LogP. • o_m + l ── average cost of packing/unpacking; L_n ── word size of network communication; w_n ── word size of the instruction set.
Research Issues – Design Techniques(1) • Partitioning • Breaking up the given problem into several nonoverlapping subproblems of almost equal size. • Solving these subproblems concurrently. • Divide and conquer • Dividing the problem into several subproblems. • Solving the subproblems recursively. • Merging the solutions of the subproblems into a solution for the original problem. • Balanced tree • Building a balanced binary tree over the input elements. • Traversing the tree forward/backward to/from the root (see the reduction sketch below).
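A minimal sketch of the balanced-tree method (the array size and contents are invented for illustration): the OpenMP loop below sums N elements in log2(N) parallel rounds, halving the number of active positions each round:

    #include <stdio.h>

    #define N 16   /* assume a power of two for simplicity */

    int main(void) {
        double a[N];
        for (int i = 0; i < N; i++) a[i] = i + 1.0;   /* 1 + 2 + ... + 16 = 136 */

        /* Balanced-tree reduction: log2(N) rounds, each combining pairs. */
        for (int stride = 1; stride < N; stride *= 2) {
            #pragma omp parallel for
            for (int i = 0; i < N; i += 2 * stride)
                a[i] += a[i + stride];   /* each pair merges one level up the tree */
        }
        printf("sum = %g\n", a[0]);      /* the root of the tree holds the total */
        return 0;
    }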
Research Issues – Design Techniques(2) • Pipelining • Breaking an algorithm into a sequence of segments in which the output of each segment is the input of its successor. • All segments must produce results at the same rate. • Doubling • Also called pointer jumping or path doubling. • The computation proceeds by repeated application of the same calculation, with the distance spanned doubling in successive steps: after k steps the computation has been performed over all elements within a distance of 2^k (see the list-ranking sketch below).
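A minimal sketch of doubling applied to list ranking (the arrays next and rank are invented names for the example): in each round every node adds its successor's rank and then jumps its pointer past that successor, so the distance covered doubles per round:

    #include <stdio.h>

    #define N 8

    int main(void) {
        /* A linked list stored in arrays: node i's successor is next[i];
           the tail points to itself. Here the list is simply 0->1->...->7. */
        int next[N], nnext[N], rank[N], nrank[N];
        for (int i = 0; i < N; i++) {
            next[i] = (i < N - 1) ? i + 1 : i;
            rank[i] = (i < N - 1) ? 1 : 0;   /* distance to the tail found so far */
        }

        /* Pointer jumping: after k rounds rank[i] covers all nodes within
           distance 2^k, so log2(N) rounds suffice. Double buffering keeps the
           per-round updates independent, hence safely parallelizable. */
        for (int step = 1; step < N; step *= 2) {
            #pragma omp parallel for
            for (int i = 0; i < N; i++) {    /* "for all i in parallel" */
                nrank[i] = rank[i] + rank[next[i]];
                nnext[i] = next[next[i]];
            }
            #pragma omp parallel for
            for (int i = 0; i < N; i++) { rank[i] = nrank[i]; next[i] = nnext[i]; }
        }
        for (int i = 0; i < N; i++)
            printf("node %d: distance to tail = %d\n", i, rank[i]);
        return 0;
    }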
Research Issues - Parallel Complexity Theory(1) • Nick's Class (NC) problems • Definition: a problem is in NC if it can be solved in time polylogarithmic in the size of the problem using at most a polynomial number of processors (a formal statement follows below). • Role: the class NC plays a role in parallel complexity theory analogous to that of P in sequential complexity. • P-complete problems • Definition: a problem L ∈ P is said to be P-complete if every other problem in P can be transformed to L in polylogarithmic parallel time using a polynomial number of processors. • Role: P-completeness plays a role analogous to that of NP-completeness in sequential complexity.
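Formally (a standard statement, consistent with the slide's wording):

    \mathrm{NC} = \bigcup_{k \ge 1} \mathrm{NC}^k,
    \qquad
    \mathrm{NC}^k = \left\{ L \;\middle|\; L \text{ is solvable in } O(\log^k n) \text{ time using } n^{O(1)} \text{ processors} \right\}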
Research Issues - Parallel Complexity Theory(2) • Parallelizable problems • NC is the class of problems solvable in polylogarithmic parallel time using a polynomial number of processors. • Obviously, any problem in NC is also in P (NC ⊆ P), but few believe the reverse inclusion (P ⊆ NC) holds. • Even if a problem is P-complete, there may be an efficient (but not necessarily polylogarithmic-time) parallel algorithm for solving it. (Example: the maximum flow problem is P-complete, but several efficient parallel algorithms are known for it.)
Research Directions • Parallel computing • Architecture • Algorithm • Programming • Solving problems from applied domains • Non-traditional computation modes • Neuro-computing • Nature-inspired computing • Molecular computing • Quantum computing
Research Directions – Parallel Computing(1): Architecture • SMP (Symmetric MultiProcessor) • MIMD, UMA, medium grain, higher DOP (Degree of Parallelism). • Commodity microprocessors with on/off-chip caches. • A high-speed snoopy bus or crossbar switch. • Central shared memory. • Symmetric: each processor has equal access to the shared memory (SM), I/O, and OS services. • Not scalable, due to the shared memory and bus.
Research Directions – Parallel Computing(1): Architecture • MPP (Massively Parallel Processors) • MIMD, NORMA, medium/large grain. • A large number of commodity processors. • A customized high-bandwidth, low-latency communication network. • Physically distributed memory. • May or may not have local disks. • Synchronized through blocking message-passing operations.
Research Directions – Parallel Computing(1): Architecture • Cluster • [Diagram: two cluster nodes, each with processor/cache (P/C), memory and I/O (M/IO), local disk (D), and a NIC, connected by a LAN.] • MIMD, NUMA, coarse grain, distributed memory. • Each node of a cluster is a complete computer (an SMP or a PC), sometimes called a headless workstation. • A low-cost commodity network. • There is always a local disk. • A complete OS resides on each node, whereas on an MPP node only a microkernel exists.
Research Directions – Parallel Computing(1): Architecture • Constellation • Constellation: clusters of custom vector processors; very expensive. • Small/medium collection of fast vector nodes; vector operations on vector registers. • Large memory; moderate overall scalability, but very limited scalability in processor count. • High-bandwidth pipelined memory access. • Global shared memory (PVP); easy programming model.
Research Directions – Parallel Computing(2): Algorithms Policy: Parallelizing a Sequential Algorithm • Method description • Detect and exploit any inherent parallelism in an existing sequential algorithm. • Produce a parallel implementation of each parallelizable code segment. • Remarks • Parallelizing is usually the most useful and effective approach. • Not all sequential algorithms can be parallelized. • A good sequential algorithm does not necessarily parallelize into a good parallel algorithm. • Many sequential numerical algorithms can be parallelized directly into effective parallel numerical algorithms.
Research Directions – Parallel Computing(2): Algorithms Policy: Designing a New Parallel Algorithm • Method description • Starting from the description of the given problem, we redesign or invent a new parallel algorithm without regard to any related sequential algorithm. • Remarks • Investigate the inherent features of the problem. • Inventing a new parallel algorithm is challenging and creative work.
Research Directions – Parallel Computing(2): Algorithms Policy: Borrowing Another Well-known Algorithm • Method description • Find a relationship between the problem to be solved and a well-known problem. • Design a similar algorithm that solves the given problem by adapting a well-known algorithm. • Remarks • This is highly skilled work that requires rich practical experience in algorithm design.
Research Directions – Parallel Computing(2): Algorithms Methods • Decomposition • Divide-and-Conquer Strategy • Randomization • Parallel Iterative Methods • Pipelining Techniques • Multigrid • Conjugate Gradient • ……
Research Directions – Parallel Computing(2): Algorithms Procedure (Steps) • PCAM algorithm design • Four stages in designing a parallel algorithm: • P: Partitioning • C: Communication • A: Agglomeration • M: Mapping • P & C focus on concurrency and scalability. • A & M focus on locality and performance.
Research Directions – Parallel Computing(3): Programming Parallel Programming Models • Implicit parallelism • A sequential programming language; the compiler is responsible for automatically converting the program into parallel code. • Data parallel • Emphasizes local computations and data-routing operations. It can be implemented on either SIMD or SPMD machines. • Shared variable • The native model for PVP, SMP, and DSM. The portability of programs is problematic. • Message passing • The native model for MPP and clusters. The portability of programs is greatly enhanced by the PVM and MPI libraries (see the MPI sketch below).
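A minimal message-passing sketch in C (the global-sum computation is invented for illustration; the MPI calls themselves are standard library routines):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

        int local = rank + 1;   /* each process contributes one value */
        int total = 0;

        /* Combine all local values into a single sum on process 0. */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d processes = %d\n", size, total);

        MPI_Finalize();
        return 0;
    }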
Research Directions – Parallel Computing(3): Programming Unified Parallel Programming Model • High abstraction level • Suitable for various distributed- and shared-memory parallel architectures • Hides the underlying implementation details of message passing or synchronization • Supports high-level parallel algorithm design and description • High productivity • Supports fast and intuitive mapping from parallel algorithms to parallel programs • Supports high-performance implementation of parallel programs • Highly readable parallel programs • High extensibility • Can be customized or extended conveniently • Can accommodate the needs of various application areas
Research Directions – Parallel Computing(3): Programming Unified Parallel Programming Model: Main Layers and Components • Core support layer • GOOMPI: Generic Object-Oriented MPI • PMT: Parallel Multi-Thread • Core application layer • Centered on smart parallel and distributed abstract data structures • Implementation of a highly reusable basic parallel algorithm library • High-level framework layer • Provides extensible parallel algorithmic skeletons • Supports the research and design of new parallel algorithms
Research Directions – Parallel Computing(3): Programming Unified Parallel Programming Model: System Architecture • [Diagram: system architecture of the unified parallel programming model.]
Research Directions – Parallel Computing(3):programming Parallel Programming Languages • ANSI X3H5 • POSIX Threads • OpenMP • PVM:Parallel Virtual Machine • MPI:Message Passing Interface • HPF:High-Performance Fortran • ……
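For contrast with the message-passing sketch above, a minimal shared-variable example using the standard OpenMP reduction clause (the loop and array are invented for illustration):

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        /* Shared-variable model: threads share a[] and combine partial sums
           via a reduction, which avoids a data race on the accumulator. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 1.0 / (double)(i + 1);
            sum += a[i];
        }
        printf("harmonic partial sum H(%d) = %f\n", N, sum);
        return 0;
    }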
Research Directions – Parallel Computing(3): Programming Parallel Programming Environments and Tools • Parallelizing compilers • SIMDizing: vectorizing • MIMDizing: parallelizing • Performance analysis • Data collection • Data transformation and visualization • ……
Research Directions – Solving problems from applied domains • Computational Science & Engineering (CSE) • Computational physics • Computational chemistry • Computational biology • …… • Science and engineering computing requirements • Global change • Human genome • Fluid turbulence • Vehicle dynamics • Ocean circulation • Superconductor modeling • Weather forecasting • ……
Research Directions – Non-traditional computation modes(1) • Neuro-computing: using on the order of 10^12 neurons to perform parallel and distributed processing. • Nature-inspired computing: using methods inspired by natural systems, which often have the unique characteristics of self-adaptation, self-organization, and self-learning. • Molecular parallel computing: using on the order of 10^20 molecules to perform spatially parallel computation instead of temporally parallel computation. • Quantum computing: using the quantum superposition principle to make quantum computation very powerful.
Research Directions – Non-traditional computation modes(2) Neuro-computing • Principles of neural network computing • Collective decision • Cooperation and competition • Learning and self-organization • Massively parallel processing, distributed memory, analog computation • Dynamics evolution • Complexity theory of NN computing • For any NP-hard problem, even finding an approximate solution with a polynomial-size network is impossible unless NP = co-NP. • In the average case, NNs are likely to be more efficient than conventional computers; a great many experiments have at least suggested this. • For some particular problems, it is possible to find an efficient solution with some NN, but the learning of an NN is itself a hard problem.
Research Directions – Non-traditional computation modes(3) Nature-inspired computing • Nature-inspired computation is an emerging interdisciplinary area between Computer Science and the Natural Sciences (especially the Life Sciences). • Artificial neural networks • Inspired by the function of neurons in the brain • Genetic algorithms • Inspired by the biological process of evolution • Artificial immune systems • Inspired by the principles of the biological immune system • Ant colony systems / swarm intelligence • Inspired by the behaviour of social insects • Ecological computation • Inspired by the principles of ecosystems
Research Directions – Non-traditional computation modes(4) Molecular Computing or DNA Computing • In 1994, L. Adleman published a breakthrough toward building a general-purpose computer with biological molecules (DNA). • The Molecular Computation Project (MCP) is an attempt to harness the computational power of molecules for information processing; in other words, it is an attempt to develop a general-purpose computer from molecules. • Ability to compute quickly: Adleman's experiment performed at a rate of 100 teraflops, or 100 trillion floating-point operations per second; by comparison, NEC Corporation's Earth Simulator, the world's fastest supercomputer at the time, operates at approximately 36 teraflops.
Research Directions – Non-traditional computation modes(5) Quantum computing • Reversible computation (Bennett and Fredkin) • Quantum complexity • Shor's factorization algorithm (1994) • Grover's quantum search algorithm (1997) • Some scientists believe that the power of quantum computation derives from quantum superposition and parallelism rather than from entanglement. • Shor's and Grover's quantum algorithms have remained mainly of theoretical interest, as it has proved extremely difficult to build a quantum computer.
Existing problems and new challenges(1) • Existing problems • Purely theoretical parallel algorithm research has slowed somewhat. • Some theoretical results on parallel algorithms are unrealistic. • Parallel software lags behind parallel hardware. • Parallel applications are not widespread and remain very weak. • New challenges • How to use thousands upon thousands of processors efficiently to solve practical problems. • How to write, map, schedule, run, and monitor very large numbers of parallel processes. • What is the parallel computational model for grid computing?
Existing problems and new challenges(2) • What should we do? • Establish a systematic approach of "Theory-Design-Implementation-Application" for parallel algorithm research. • Form an integrated methodology of "Architecture-Algorithm-Programming" for parallel algorithm design. • Our contributions • Cultivating many students in the parallel algorithms area for our country. • Publishing a series of parallel computing textbooks, including: • Parallel Computing: Architecture · Algorithm · Programming • Design and Analysis of Parallel Algorithms • Parallel Computer Architectures • Parallel Algorithm Practice
Thank you for listening!