
Introduction

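Introduction. Why parallel? Because some problems need a large amount of computation. Example: 3D stereoscopic imaging used in surgery needs about 10^15 operations per second, while an ordinary computer performs about 10^7 operations per second: 10^15 / 10^7 ≈ 10^8 seconds, and 10^8 / 10^5 ≈ 10^3 days ≈ 3 years. Other fields that need large amounts of computation: aircraft testing, new drug development, oil exploration, modeling fusion reactors, economic planning.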

webb

Presentation Transcript


  1. Introduction. Why parallel? Because some problems need a large amount of computation.

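  2. Example: 3D stereoscopic imaging. When used in surgery, it needs about 10^15 operations per second. An ordinary computer performs about 10^7 operations per second. 10^15 / 10^7 ≈ 10^8 seconds; 10^8 / 10^5 ≈ 10^3 days ≈ 3 years (a day has about 10^5 seconds).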
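
The estimate above can be rechecked with exact figures. This is a minimal sketch: the two rates come from the slide, while the constant and function names are illustrative only.

```python
# Rates taken from the slide; the names below are illustrative.
OPS_NEEDED = 10**15              # operations the imaging task needs
OPS_PER_SECOND = 10**7           # operations an ordinary computer performs per second
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400, which the slide rounds to 10^5


def sequential_time(ops_needed=OPS_NEEDED, ops_per_second=OPS_PER_SECOND):
    """Return (seconds, days, years) needed on a single machine."""
    seconds = ops_needed / ops_per_second
    days = seconds / SECONDS_PER_DAY
    years = days / 365
    return seconds, days, years


seconds, days, years = sequential_time()
print(f"{seconds:.0e} s, about {days:.0f} days, about {years:.1f} years")
```

With the exact number of seconds per day the result stays close to the slide's rounded figure of about 3 years.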

  3. Other fields that need large amounts of computation: Aircraft Testing, New Drug Development, Oil Exploration, Modeling Fusion Reactors, Economic Planning, Cryptanalysis, Managing Large Databases, Astronomy, Biomedical Analysis, …

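  4. Computer speed: doubles every 3 to 4 years. Unfortunately, it is evident that this trend will soon come to an end because of a simple law of physics: the speed of light is 3×10^8 m/s, and components placed too close together interfere with each other.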

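  5. Parallelism: a "human wave" strategy applied to CPUs, that is, put many processors to work on the same problem. Many computers today already contain more than one CPU.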

  6. Models of Computation. Flynn's classification: SISD (Single Instruction, Single Data); MISD (Multiple Instruction, Single Data); SIMD (Single Instruction, Multiple Data); MIMD (Multiple Instruction, Multiple Data).

  7. SISD Computers: sequential (serial) computation.

  8. Example: summation of n numbers. Sum = 0; For I = 1 to n; Sum = Sum + I; Next. Adding n numbers needs n-1 additions, so the running time is O(n).
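
The sequential summation can be sketched in Python as follows. This is an illustrative version, not the slide's code: the function name `sequential_sum` is an assumption, and it adds the elements of a non-empty list rather than the loop index so that the count of n-1 additions is visible.

```python
def sequential_sum(values):
    """Add the numbers one at a time, as a single (SISD) processor would.

    Assumes `values` is non-empty. Returns (total, number_of_additions).
    """
    total = values[0]
    additions = 0
    for x in values[1:]:
        total += x
        additions += 1
    return total, additions


total, additions = sequential_sum([3, 1, 4, 1, 5, 9, 2, 6])
print(total, additions)
```

For 8 inputs the loop performs 7 additions, in line with the n-1 count on the slide.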

  9. MISD Computers

  10. Example: test whether a positive integer n is prime. n is prime if it has no divisors except 1 and n; otherwise n is composite. Each processor tests one candidate divisor strictly between 1 and n. Each processor needs O(1) operations, so the whole test takes O(1) time.
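
A sequential simulation of this idea is sketched below. The function name `is_prime_misd` is hypothetical, and because the simulated processors run one after another, the simulation itself takes O(n) time rather than the O(1) of a real MISD machine.

```python
def is_prime_misd(n):
    """Simulate one MISD step: 'processor' d checks whether d divides n."""
    if n < 2:
        return False
    # One simulated processor per candidate divisor d with 1 < d < n.
    # On an MISD machine all these checks would happen at the same time;
    # here they are evaluated one by one.
    verdicts = [n % d == 0 for d in range(2, n)]
    # n is prime exactly when no processor found a divisor.
    return not any(verdicts)


print(is_prime_misd(97), is_prime_misd(91))
```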

  11. SIMD Computers

  12. Shared-Memory (SM) SIMD computers: PRAM (Parallel Random Access Machine)

  13. How does processor i pass a number to another processor j? Processor i writes the number to a shared memory location and processor j reads it from there: O(1) time.

  14. Memory access variants: (1) EREW (Exclusive Read, Exclusive Write); (2) CREW (Concurrent Read, Exclusive Write); (3) ERCW (Exclusive Read, Concurrent Write); (4) CRCW (Concurrent Read, Concurrent Write).

  15. How to resolve write conflicts in CRCW? (a) The smallest-numbered processor is allowed to write, and access is denied to all other processors. (b) All processors are allowed to write if the values to be written are equal; otherwise access is denied to all processors. (c) The sum of the values to be written is stored.
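
The three policies can be sketched as one small function. The name `resolve_crcw`, the policy labels, and the dictionary representation of write requests are all assumptions made for illustration.

```python
def resolve_crcw(requests, policy):
    """Resolve concurrent writes to a single memory location.

    requests: dict mapping processor number -> value it wants to write.
    Returns the value stored, or None when every write is denied.
    """
    if not requests:
        return None
    if policy == "priority":   # (a) smallest-numbered processor wins
        return requests[min(requests)]
    if policy == "common":     # (b) write only if all values agree
        values = set(requests.values())
        return values.pop() if len(values) == 1 else None
    if policy == "sum":        # (c) store the sum of all values
        return sum(requests.values())
    raise ValueError(f"unknown policy: {policy}")


writes = {3: 7, 1: 4, 2: 4}
for policy in ("priority", "common", "sum"):
    print(policy, resolve_crcw(writes, policy))
```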

  16. Simulating multiple accesses on an EREW computer: (i) N multiple accesses (read), handled by a broadcasting procedure: 1. P1 → P2; 2. P1, P2 → P3, P4; 3. P1, P2, P3, P4 → P5, P6, P7, P8; and so on, for O(log N) steps in total.
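
The doubling pattern of the broadcasting procedure can be simulated as below. This is a sketch under assumptions: the function name `erew_broadcast` is hypothetical, at least one processor is assumed, and index 0 of the list plays the role of P1.

```python
def erew_broadcast(value, n_processors):
    """Return (copies, steps), with one copy of `value` per processor.

    Assumes n_processors >= 1. Index 0 stands for P1.
    """
    copies = [None] * n_processors
    copies[0] = value   # P1 holds the value before the first step
    have = 1            # number of processors that hold a copy
    steps = 0
    while have < n_processors:
        # Each holder passes its copy to one distinct processor without one,
        # so no location is read or written by two processors at once.
        new = min(have, n_processors - have)
        for i in range(new):
            copies[have + i] = copies[i]
        have += new
        steps += 1
    return copies, steps


copies, steps = erew_broadcast(5, 8)
print(copies, steps)
```

The number of holders doubles in every step, which is where the O(log N) bound comes from.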

  17. [Figure: broadcasting example with the value 5, location D, array A, and processors P1 through P8.]

  18. (ii) m out of N multiple accesses (read), m ≤ N: each memory location is associated with a binary tree.

  19. [Figure: binary tree associated with memory location d, with requests from processors P1, P2, P4, and P7 marked [1], [2], [4], and [7].]

  20. [Figure: the same binary tree with the value d propagated to processors P1, P2, P4, and P7.]

  21. Dividing a shared memory into blocks

  22. Interconnection-network SIMD computers: the memory is distributed among the N processors. Two processors can communicate with each other directly if they are connected by a line (link). Instantaneous communication between several pairs of processors is allowed.

  23. (1) fully connected networks: there are N(N-1)/2 links.

  24. (2) linear array (1-dimensional array, 1-D array): Pi is linked to Pi-1 and Pi+1, for 2 ≤ i ≤ N-1.

  25. (3) two-dimensional array (2-D mesh-connected)

  26. P(j,k) has links connecting to P(j+1,k), P(j-1,k), P(j,k+1) and P(j,k-1).

  27. (4) tree connection (tree machines)

  28. The children of Pi are P2i and P2i+1. The parent of Pi is P⌊i/2⌋.

  29. (5) shuffle-exchange connection. Shuffle links: Pi → Pj, where j = 2i if 0 ≤ i ≤ N/2-1, and j = 2i-(N-1) if N/2 ≤ i ≤ N-1. Exchange links: Pi ↔ Pi+1 if i is even.
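
The two kinds of links can be computed directly from the processor index. This sketch assumes that N is a power of two and that processors are numbered 0 to N-1; the function names `shuffle` and `exchange` are illustrative.

```python
def shuffle(i, n):
    """Shuffle link P_i -> P_j for n processors (n assumed a power of two)."""
    if i <= n // 2 - 1:
        return 2 * i
    return 2 * i - (n - 1)


def exchange(i):
    """Exchange partner: an even i pairs with i + 1, an odd i with i - 1."""
    return i + 1 if i % 2 == 0 else i - 1


n = 8
print([shuffle(i, n) for i in range(n)])
print([exchange(i) for i in range(n)])
```

In binary, the shuffle corresponds to a cyclic left shift of the index and the exchange to flipping its lowest bit.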

  30. (6) cube connection (n-cube,hypercube) 3-cube:

  31. 4-cube:

  32. An n-cube connected network consists of 2^n nodes (N = 2^n). Node i, with binary representation i(n-1) i(n-2) ... i(j) ... i(1) i(0), is connected to every node whose binary representation differs from it in exactly one bit i(j), 0 ≤ j ≤ n-1. Each processor has n links.
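
Since neighbors differ in exactly one bit, they can be listed with an exclusive-or. A minimal sketch, with the function name `cube_neighbors` chosen for illustration:

```python
def cube_neighbors(i, n):
    """Neighbors of node i in an n-cube: flip each of the n bits in turn."""
    return [i ^ (1 << j) for j in range(n)]


# Node 5 is 101 in binary; in a 3-cube it has exactly 3 neighbors.
print(cube_neighbors(5, 3))
```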

  33. adding 8 numbers:

  34. This concept can be implemented on a tree machine. Time: O(log n), where n is the number of inputs. For m sets, each of n numbers, pipelining takes log n + (m-1) steps. This concept can also be implemented on other models.
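
The tree-style addition and the pipelining count can be sketched as follows. The function names are hypothetical, the simulation runs sequentially, and `pipelined_steps` assumes n is a power of two so that log n is an integer.

```python
def tree_sum(values):
    """Add numbers pairwise, level by level; return (total, parallel_steps).

    Assumes `values` is non-empty.
    """
    level = list(values)
    steps = 0
    while len(level) > 1:
        # All additions on one level are independent, so a tree machine
        # could perform them simultaneously: this counts as one step.
        nxt = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])   # an unpaired element moves up unchanged
        level = nxt
        steps += 1
    return level[0], steps


def pipelined_steps(n, m):
    """Steps to sum m sets of n numbers each: log n + (m - 1)."""
    log_n = n.bit_length() - 1      # exact log2 when n is a power of two
    return log_n + (m - 1)


print(tree_sum([3, 1, 4, 1, 5, 9, 2, 6]))
print(pipelined_steps(8, 4))
```

Each level of the tree halves the number of partial sums, so 8 inputs need 3 parallel steps instead of 7 sequential additions.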

  35. MIMD Computers

  36. MIMD computers sharing a common memory: multiprocessors, tightly coupled machines. MIMD computers with an interconnection network: multicomputers, loosely coupled machines, distributed systems.

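  37. Analyzing Algorithms: ● running time ● speedup ● cost ● efficiency. Taking a construction project as an analogy: ● the time taken until the project is completed ● that time compared with the time one person would take ● person-years (or person-months) ● person-years for one person / person-years for n people.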

  38. Running Time: the time elapsed from the moment the first processor begins computing to the moment the last processor ends computing.

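  39. Running Time, counted per statement: Sum = 0 is executed 1 time; For I = 1 to n is executed n+1 times; Sum = Sum + I is executed n times; Next is executed n+1 times. Total: 3n+3 executions = O(n).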

  40. f(n) = n, g(n) = 3n + 3. g(n) = O(f(n)), that is, g(n) is at most f(n) up to a constant factor: 3n+3 ≤ 5n when n ≥ 2 (c = 5, n0 = 2).

  41. Upper Bound: g(n) = O(f(n)), "at most f(n)", means that there exist positive constants c and n0 such that g(n) ≤ c·f(n) for all n ≥ n0.

  42. [Figure: plot of g(n) staying below c·f(n) for n ≥ n0.]

  43. f(n) = n, g(n) = 3n + 3. g(n) = Ω(f(n)), that is, g(n) is at least f(n) up to a constant factor: 3n+3 ≥ 2n when n ≥ 1 (c = 2, n0 = 1).
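
The two inequalities for g(n) = 3n + 3 and f(n) = n can be spot-checked numerically. This is only a sanity check over a finite range, not a proof; the helper names and the `limit` parameter are assumptions.

```python
def g(n):
    return 3 * n + 3


def f(n):
    return n


def holds_upper(c, n0, limit=10_000):
    """Check g(n) <= c * f(n) for every n with n0 <= n < limit."""
    return all(g(n) <= c * f(n) for n in range(n0, limit))


def holds_lower(c, n0, limit=10_000):
    """Check g(n) >= c * f(n) for every n with n0 <= n < limit."""
    return all(g(n) >= c * f(n) for n in range(n0, limit))


print(holds_upper(5, 2))   # upper bound with c = 5, n0 = 2
print(holds_upper(5, 1))   # fails at n = 1, since 6 > 5
print(holds_lower(2, 1))   # lower bound with c = 2, n0 = 1
```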

  44. Lower Bound: g(n) = Ω(f(n)), "at least f(n)", means that there exist positive constants c and n0 such that g(n) ≥ c·f(n) for all n ≥ n0.

  45. [Figure: plot of g(n) staying above c·f(n) for n ≥ n0.]

  46. adding 8 numbers: time: O(log n)

  47. The lower bound of (comparison-based) sorting is Ω(n log n). (Why?) The time complexity of heapsort is O(n log n), so the upper bound of sorting is O(n log n). Since this upper bound matches the lower bound, heapsort is optimal.

  48. For matrix multiplication, the lower bound is Ω(n^2) and the upper bound is O(n^2.5). (There exists such an algorithm.)

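  49. speedup = (sequential running time) / (parallel running time), that is: the time taken by one CPU divided by the time taken in parallel.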
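
As an illustration, the speedup for adding n numbers can be computed from the operation counts given earlier in the slides: n-1 additions on one processor versus about log n parallel steps on a tree. The function names are hypothetical, and counting steps instead of measuring time is a simplifying assumption.

```python
import math


def speedup(sequential_time, parallel_time):
    """Time taken by one CPU divided by time taken in parallel."""
    return sequential_time / parallel_time


def sum_speedup(n):
    """Speedup of tree addition over sequential addition of n >= 2 numbers."""
    sequential = n - 1                    # additions on a single processor
    parallel = math.ceil(math.log2(n))    # levels of the addition tree
    return speedup(sequential, parallel)


print(sum_speedup(8))
print(sum_speedup(1024))
```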
