Advanced Computing Techniques & Applications


Presentation Transcript


  1. Advanced Computing Techniques & Applications Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn

  2. Course Profile • Lecturer: Dr. Bo Yuan • Contact • Phone: 2603 6067 • E-mail: yuanb@sz.tsinghua.edu.cn • Room: F-301B • Time: 10:25 am – 12:00pm, Friday • Venue: CI-208 • Teaching Assistant • Mr. Shiquan Yang

  3. We will study ... • MPI • Message Passing Interface • API for distributed memory parallel computing (multiple processes) • The dominant model used in cluster computing • OpenMP • Open Multi-Processing • API for shared memory parallel computing (multiple threads) • GPU Computing with CUDA • Graphics Processing Unit • Compute Unified Device Architecture • API for shared memory parallel computing in C (multiple threads) • Parallel Matlab • A popular high-level technical computing language and interactive environment

  4. Aims & Objectives • Learning Objectives • Understand the main issues and core techniques in parallel computing. • Obtain first-hand experience in Cloud Computing. • Be able to develop MPI-based parallel programs. • Be able to develop OpenMP-based parallel programs. • Be able to develop GPU-based parallel programs. • Be able to develop Matlab-based parallel programs. • Graduate Attributes • In-depth Knowledge of the Field of Study • Effective Communication • Independence and Teamwork • Critical Judgment

  5. Learning Activities • Lecture (10) • Introduction (3) • MPI and OpenMP (3) • GPU Computing (3) • Invited Talk (1) • Practice (3) • GPU Programming (1) • Cloud Computing (1) • Parallel Matlab (1) • Others (2) • Industry Tour (1) • Final Exam (1)

  6. Assessment • Final Exam (50%) • Assignment 1 • Weight: 20% • Task: Parallel Programming using MPI • Type: Individual • Assignment 2 • Weight: 10% • Task: Parallel Programming using OpenMP • Type: Individual • Assignment 3 • Weight: 20% • Task: Parallel Programming using CUDA • Type: Individual

  7. Learning Resources

  8. Learning Resources • Books • http://www.mcs.anl.gov/~itf/dbpp/ • https://computing.llnl.gov/tutorials/parallel_comp/ • http://www-users.cs.umn.edu/~karypis/parbook/ • Journals • http://www.computer.org/tpds • http://www.journals.elsevier.com/parallel-computing/ • http://www.journals.elsevier.com/journal-of-parallel-and-distributed-computing/ • Amazon Cloud Computing Services • http://aws.amazon.com • CUDA • http://developer.nvidia.com

  9. Rules & Policies • Plagiarism • Plagiarism is the act of misrepresenting as one's own original work the ideas, interpretations, words or creative works of another. • Direct copying of paragraphs, sentences, a single sentence or significant parts of a sentence. • Presenting work done in collaboration with others as independent work. • Copying ideas, concepts, research results, computer code, statistical tables, designs, images, sounds or text, or any combination of these. • Paraphrasing, summarizing or simply rearranging another person's words or ideas without changing the basic structure and/or meaning of the text. • Copying or adapting another student's original work into a submitted assessment item.

  10. Rules & Policies • Late Submission • Late submissions will incur a penalty of 10% of the total marks for each day that the submission is late (including weekends). Submissions more than 5 days late will not be accepted. • Assumed Background • Acquaintance with the C language is essential. • Knowledge of computer architecture is beneficial. • We have CUDA-capable GPU cards available!

  11. Half Adder • A: Augend • B: Addend • S: Sum • C: Carry

  12. Full Adder
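
A brief C sketch of both circuits using bitwise operators may help here; it is only an illustration of the logic on slides 11–12, not code from the course, and the function names are made up for this example.

    #include <stdio.h>

    /* Half adder (slide 11): S = A XOR B, C = A AND B. */
    void half_adder(int a, int b, int *sum, int *carry) {
        *sum   = a ^ b;
        *carry = a & b;
    }

    /* Full adder (slide 12): adds A, B and a carry-in; the carry-out is 1
       whenever at least two of the three input bits are 1. */
    void full_adder(int a, int b, int cin, int *sum, int *cout) {
        *sum  = a ^ b ^ cin;
        *cout = (a & b) | (cin & (a ^ b));
    }

    int main(void) {
        int s, c;
        half_adder(1, 1, &s, &c);     /* 1 + 1 = binary 10 -> sum 0, carry 1 */
        printf("half adder: sum=%d carry=%d\n", s, c);
        full_adder(1, 1, 1, &s, &c);  /* 1 + 1 + 1 = binary 11 -> sum 1, carry 1 */
        printf("full adder: sum=%d carry=%d\n", s, c);
        return 0;
    }

Chaining full adders so that each carry-out feeds the next stage's carry-in gives a ripple-carry adder for multi-bit numbers.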

  13. SR Latch

  14. Address Decoder

  15. Address Decoder

  16. Electronic Numerical Integrator And Computer (ENIAC) • Programming • Programmable via switches and cables • Reprogramming usually took days. • I/O: Punched Cards • Speed (10-digit decimal numbers) • Machine Cycle: 5,000 cycles per second • Multiplication: 357 per second • Division/Square Root: 35 per second

  17. Stored-Program Computer

  18. Personal Computers in the 1980s • BASIC • IBM PC/AT

  19. Top 500 Supercomputers • Performance measured in GFLOPS

  20. Cost of Computing

  21. Complexity of Computing • A: 10×100, B: 100×5, C: 5×50 • (AB)C vs. A(BC) • A: N×N, B: N×N, C = AB • Time Complexity: O(N³) • Space Complexity: O(1)
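
To make the (AB)C vs. A(BC) comparison concrete: multiplying an m×n matrix by an n×p matrix takes m·n·p scalar multiplications. With the sizes on the slide, (AB)C costs 10·100·5 + 10·5·50 = 5,000 + 2,500 = 7,500 multiplications, whereas A(BC) costs 100·5·50 + 10·100·50 = 25,000 + 50,000 = 75,000. The result is the same, but the second ordering does ten times the work, so the order of evaluation matters even though the mathematics does not change.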

  22. Why Parallel Computing? • Why we need ever-increasing performance: • Big Data Analysis • Climate Modeling • Gaming • Why we need to build parallel systems: • Increasing the speed of integrated circuits → Overheating • Increasing the number of transistors → Multi-Core • Why we need to learn parallel programming: • Running multiple instances of the same program is unlikely to help. • Serial programs need to be rewritten to make them parallel.

  23. Sum Example • Values 8, 19, 7, 15, 7, 13, 12, 14 are distributed across cores 0–7. • Core 0 collects all the values and adds them up: total = 95.

  24. Sum Example (tree reduction) • Values 8, 19, 7, 15, 7, 13, 12, 14 on cores 0–7 • Step 1: pairwise partial sums 27, 22, 20, 26 on cores 0, 2, 4, 6 • Step 2: partial sums 49, 46 on cores 0, 4 • Step 3: final sum 95 on core 0
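
A minimal C sketch of the two strategies on slides 23–24 (array indices stand in for cores; the function names are illustrative, not from the slides):

    #include <stdio.h>

    /* Slide 23: one core walks over all the values and adds them up. */
    int serial_sum(int *v, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += v[i];
        return total;
    }

    /* Slide 24: tree reduction. In each step, element i absorbs the partial
       sum held at element i + stride, halving the number of active "cores". */
    int tree_sum(int *v, int n) {              /* assumes n is a power of 2 */
        for (int stride = 1; stride < n; stride *= 2)
            for (int i = 0; i + stride < n; i += 2 * stride)
                v[i] += v[i + stride];
        return v[0];
    }

    int main(void) {
        int v[8] = {8, 19, 7, 15, 7, 13, 12, 14};
        int s = serial_sum(v, 8);              /* 95 */
        int t = tree_sum(v, 8);                /* 95; overwrites v with partial sums */
        printf("serial=%d tree=%d\n", s, t);
        return 0;
    }

With 8 values the serial version needs 7 additions on one core, while the tree version finishes in 3 steps (4, 2 and 1 simultaneous additions), which is the point of the second slide.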

  25. Levels of Parallelism • Embarrassingly Parallel • No dependency or communication between parallel tasks • Coarse-Grained Parallelism • Infrequent communication, large amounts of computation • Fine-Grained Parallelism • Frequent communication, small amounts of computation • Greater potential for parallelism • More overhead • Not Parallel • Having a baby takes 9 months. • Can this be done in 1 month by having 9 women?

  26. Data Decomposition 2 Cores

  27. Granularity 8 Cores

  28. Coordination • Communication • Sending partial results to other cores • Load Balancing • Wooden Barrel Principle: the shortest stave sets the barrel's capacity, just as the most heavily loaded core sets the overall speed. • Synchronization • Race Condition (illustrated in the sketch below)
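
A minimal OpenMP sketch of a race condition and one way to fix it (OpenMP is covered later in the course; the variable names here are illustrative):

    #include <stdio.h>

    int main(void) {
        int unsafe = 0, safe = 0;

        /* Race condition: all threads increment the shared counter without
           synchronization, so updates can be lost and the total usually
           ends up smaller than 1000000. */
        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++)
            unsafe++;

        /* One fix: a reduction gives each thread a private partial sum and
           combines them safely at the end. */
        #pragma omp parallel for reduction(+:safe)
        for (int i = 0; i < 1000000; i++)
            safe++;

        printf("unsafe=%d safe=%d\n", unsafe, safe);
        return 0;
    }

Compile with an OpenMP-enabled compiler (e.g. gcc -fopenmp) to see the difference; without OpenMP the pragmas are ignored and both loops run serially.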

  29. Data Dependency • Bernstein's Conditions: two program segments can run in parallel only if neither one reads a variable the other writes and they do not write to any variable in common. • Examples: flow dependency and output dependency

    function Dep(a, b)
        c = a·b
        d = 3·c
    end function

    function NoDep(a, b)
        c = a·b
        d = 3·b
        e = a+b
    end function

  30. What is not parallel?

    Loop-Carried Dependence
    for (k=5; k<N; k++) {
        b[k] = DoSomething(k);
        a[k] = b[k-5] + MoreStuff(k);
    }

    Recurrences
    for (i=1; i<N; i++)
        a[i] = a[i-1] + b[i];

    Atypical Loop-Carried Dependence
    wrap = a[0]*b[0];
    for (i=1; i<N; i++) {
        c[i] = wrap;
        wrap = a[i]*b[i];
        d[i] = 2*wrap;
    }

    Solution (recompute wrap from the previous iteration's inputs, so each iteration is independent)
    for (i=1; i<N; i++) {
        wrap = a[i-1]*b[i-1];
        c[i] = wrap;
        wrap = a[i]*b[i];
        d[i] = 2*wrap;
    }

  31. What is not parallel?

    Induction Variables
    i1 = 4; i2 = 0;
    for (k=1; k<N; k++) {
        B[i1++] = function1(k,q,r);
        i2 += k;
        A[i2] = function2(k,r,q);
    }

    Solution (replace the induction variables with closed-form expressions in k, so iterations no longer depend on each other)
    i1 = 4; i2 = 0;
    for (k=1; k<N; k++) {
        B[k+3] = function1(k,q,r);
        i2 = (k*k+k)/2;
        A[i2] = function2(k,r,q);
    }

  32. Types of Parallelism • Instruction Level Parallelism • Task Parallelism • Different tasks on the same/different sets of data • Data Parallelism • Similar tasks on different sets of the data • Example • 5 TAs, 100 exam papers, 5 questions • How to make it task parallelism? • How to make it data parallelism?

  33. Assembly Line • How long does it take to produce a single car? • How many cars can be worked on at the same time? • How long is the gap between finishing the first car and the second car? • The longest stage on the assembly line determines the throughput. • Stage times in the figure: 15, 20, 5
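
A worked reading of the figure, assuming its three stages take 15, 20 and 5 time units: a single car needs 15 + 20 + 5 = 40 units to pass through the line; up to three cars can be in progress at once (one per stage); and once the line is full, a finished car comes out every 20 units, because the slowest stage (20) sets the gap between consecutive cars, i.e. the throughput.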

  34. Instruction Pipeline • Example instructions: 1: Add 1 to R5. 2: Copy R5 to R6. • IF: Instruction fetch • ID: Instruction decode and register fetch • EX: Execute • MEM: Memory access • WB: Register write back
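
A sketch of how these two instructions would move through the five stages, assuming one stage per clock cycle and no forwarding hardware (the timing is illustrative, not from the slide):

    Cycle:            1    2    3    4    5    6
    Add 1 to R5       IF   ID   EX   MEM  WB
    Copy R5 to R6          IF   ID   EX   MEM  WB

Instruction 2 wants to read R5 in its ID stage (cycle 3), but instruction 1 only writes R5 back in cycle 5; this read-after-write hazard forces the hardware either to stall instruction 2 or to forward the new value of R5 from the earlier pipeline stages.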

  35. Superscalar

  36. Computing Models • Concurrent Computing • Multiple tasks can be in progress at any instant. • Parallel Computing • Multiple tasks can be run simultaneously. • Distributed Computing • Multiple programs on networked computers work collaboratively. • Cluster Computing • Homogeneous, Dedicated, Centralized • Grid Computing • Heterogeneous, Loosely Coupled, Autonomous, Geographically Distributed

  37. Concurrent vs. Parallel • Concurrent: Job 1 and Job 2 take turns on a single core. • Parallel: jobs such as Job 1–Job 4 run at the same time on Core 1 and Core 2.

  38. Process & Thread • Process • An instance of a computer program being executed. • Threads • The smallest units of processing scheduled by the OS • Exist as subsets of a process. • Share the resources of their process. • Switching between threads is much faster than switching between processes. • Multithreading • Better use of computing resources • Concurrent execution • Makes the application more responsive
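
A minimal C/OpenMP sketch of the relationship described above: one process, several threads, all sharing the process's memory (the array name is made up for this example):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int shared_counts[8] = {0};            /* lives in the one process's memory   */

        #pragma omp parallel num_threads(8)    /* threads exist inside this process   */
        {
            int id = omp_get_thread_num();     /* each thread has its own id ...      */
            shared_counts[id] = id;            /* ... but they all see the same array */
        }

        for (int i = 0; i < 8; i++)
            printf("%d ", shared_counts[i]);
        printf("\n");
        return 0;
    }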

  39. Parallel Processes • One program is launched as Process 1 on Node 1, Process 2 on Node 2 and Process 3 on Node 3. • Single Program, Multiple Data (SPMD)
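
A minimal MPI sketch of the SPMD model on this slide: every node launches the same executable, and each process uses its rank to decide what to do (standard MPI calls; the printed message is just for illustration):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I?       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes total? */

        printf("Process %d of %d running the same program\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Launched with, for example, mpirun -np 3 ./program, this starts three processes of the one program, matching the three nodes in the figure.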

  40. Parallel Threads

  41. Graphics Processing Unit

  42. CPU vs. GPU

  43. CUDA

  44. CUDA

  45. GPU Computing Showcase

  46. MapReduce vs. GPU • Pros: • Runs on clusters of hundreds or thousands of commodity computers. • Can handle massive amounts of data with fault tolerance. • Minimal effort required from programmers: just write Map & Reduce. • Cons: • Intermediate results are stored on disk and transferred over network links. • Suitable only for processing independent or loosely coupled jobs. • High upfront hardware cost and operational cost. • Low efficiency: GFLOPS per Watt, GFLOPS per Dollar.
