Parallel Programming • Pingpeng Yuan
Parallel Programming • What • Why • How • Goal • Exam
What is Parallel Programming? • Coordinating multiple processing elements to solve a problem
Parallelism - A simplistic understanding • Multiple tasks at once. • Distribute work into multiple execution units. • Two approaches: • Data parallelism: partition the data into blocks, then have each compute unit process its own block. • Functional (or control) parallelism: partition the problem into distinct tasks, then have each processing unit handle its own task.
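The two approaches above can be contrasted in a minimal sketch. It assumes Python's standard `concurrent.futures` thread pool; the helper names (`chunk`, `data_parallel_sum_of_squares`, `functional_parallel_stats`) are hypothetical and chosen only for illustration. In CPython, threads generally do not speed up CPU-bound work, so the point here is the structure of the decomposition, not measured performance.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n_chunks):
    """Split data into n_chunks nearly equal contiguous blocks."""
    size, extra = divmod(len(data), n_chunks)
    blocks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def data_parallel_sum_of_squares(data, n_workers=4):
    """Data parallelism: every worker runs the SAME function on its OWN block."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = pool.map(
            lambda block: sum(x * x for x in block),
            chunk(data, n_workers),
        )
        return sum(partial_sums)  # combine the per-block results


def functional_parallel_stats(data):
    """Functional parallelism: each worker runs a DIFFERENT task on the same data."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        minimum = pool.submit(min, data)
        maximum = pool.submit(max, data)
        total = pool.submit(sum, data)
        return minimum.result(), maximum.result(), total.result()


if __name__ == "__main__":
    numbers = list(range(10))
    print(data_parallel_sum_of_squares(numbers))
    print(functional_parallel_stats(numbers))
```

In the data-parallel version the work is divided by splitting the input; in the functional version it is divided by splitting the set of operations.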
Why • Technology Trend • Application Needs
Human Architecture! [Figure: growth performance versus age (axis from 5 to 45), with vertical growth and horizontal growth labelled]
Computational Power Improvement [Figure: computational power improvement (C.P.I.) versus number of processors, comparing multiprocessor and uniprocessor]
General Technology Trends • Microprocessor performance increases 50% - 100% per year • Clock frequency doubles every 3 years • Transistor count quadruples every 3 years
Clock Frequency Growth Rate (Intel family) • 30% per year
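Statements such as "doubles every 3 years" and "30% per year" describe the same kind of compound growth in different units. The sketch below converts between them; the function names are hypothetical, and the only inputs taken from the slides are the doubling and quadrupling periods and the 30% figure.

```python
def annual_rate_from_multiple(multiple, years):
    """Annual growth rate r such that (1 + r) ** years == multiple."""
    return multiple ** (1.0 / years) - 1.0


def growth_factor(annual_rate, years):
    """Total growth after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


# "Clock frequency doubles every 3 years"
clock_rate = annual_rate_from_multiple(2, 3)
# "Transistor count quadruples every 3 years"
transistor_rate = annual_rate_from_multiple(4, 3)

if __name__ == "__main__":
    print(f"clock: {clock_rate:.1%} per year")
    print(f"transistors: {transistor_rate:.1%} per year")
    print(f"30% per year over 20 years: {growth_factor(0.30, 20):.0f}x")
```

Doubling every 3 years works out to roughly 26% per year, which is in the same range as the 30% per year quoted for the Intel family.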
Intel Many Integrated Core (MIC) • 32-core version of MIC
Tilera’s 100 cores (June 2011) • Tilera has introduced a range of processors (64-bit Gx family: 36 cores, 64 cores and 100 cores), aiming to take on Intel in servers that handle high-throughput web applications • 64-bit cores running up to 1.5GHz • Manufactured in 40nm technology
Top500 Paradigm Change in HPC ….
GPU Architecture NVIDIA Fermi, 512 Processing Elements (PEs)
The Gap Between CPU and GPU ref: Tesla GPU Computing Brochure
Transistor Count Growth Rate (Intel family) • Transistor count grows much faster than clock rate • 40% per year; an order of magnitude more contribution over 2 decades
How to Use More Transistors • Improve single-threaded performance via architecture: • Not keeping up with the potential given by technology • Use transistors for memory structures to improve data locality • Use parallelism • Instruction-level • Thread-level
Trends in DRAM Capabilities • DRAM densities to double every 3 years • Projections for DRAM densities revised downwards over time • Current densities at 4Gb/die • DRAM data rates to double every 4-5 years • Projections for DRAM data rates revised upwards over time • Current data-rates at 2.2 Gb/s
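Similar Story for Storage • The gap between memory capacity and memory access speed is even more pronounced • From 1980 to 1995, memory capacity grew 1000x, about 50% per year • Latency fell by only 3% per year (only 2x from 1980-95) • Memory bandwidth increased 2x • Processors get faster and memory gets larger, so memory becomes relatively slower • More data must be transferred in parallel • More levels of cache are needed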
Memory hierarchy • Each level can be viewed as a cache for the level below it • CPU registers: 100 bytes, < 1 ns • L1 cache: 32 KB, 1 ns • L2 cache: 256 KB, 4 ns • Primary memory: 1 GB, 60 ns • Secondary storage: 1 TB, 10 ms • Tertiary storage: 1 PB, 1 s to 1 hr
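To see why treating each level as a cache for the next one pays off, the sketch below estimates an average access time from the L1, L2, and primary-memory latencies listed above. The hit rates (95% and 90%) are hypothetical values chosen for illustration and do not come from the slides. The model is a simplification that assumes an access probes each level in turn and pays that level's latency.

```python
def average_access_time_ns(levels):
    """Expected access time for a hierarchy probed from fastest to slowest.

    levels: list of (latency_ns, hit_rate) pairs, fastest level first.
    The last level should have hit_rate 1.0 so every access is eventually served.
    """
    expected = 0.0
    p_reach = 1.0  # probability that an access gets as far as this level
    for latency_ns, hit_rate in levels:
        expected += p_reach * latency_ns
        p_reach *= 1.0 - hit_rate  # only misses continue to the next level
    return expected


# Latencies from the slide; hit rates are ASSUMED for illustration only.
hierarchy = [
    (1.0, 0.95),   # L1 cache: 1 ns, assumed 95% hit rate
    (4.0, 0.90),   # L2 cache: 4 ns, assumed 90% hit rate on L1 misses
    (60.0, 1.00),  # primary memory: 60 ns, always hits in this model
]

if __name__ == "__main__":
    print(f"{average_access_time_ns(hierarchy):.2f} ns")
```

Under these assumed hit rates the average stays close to the L1 latency even though primary memory is 60 times slower, which is the effect the hierarchy is designed to achieve.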
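Similar Story for Storage • Parallelism increases the efficiency of each level without increasing access time • Parallelism and locality apply inside the memory system as well • A memory chip fetches many bits at once, then streams them out over a narrow channel in pipelined fashion • Buffers hold the most recently accessed data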
Disk trends • Disks too: Parallel disks plus caching • Disk capacity, 1975-1989 • doubled every 3+ years • 25% improvement each year • factor of 10 every decade • Still exponential, but far less rapid than processor performance • Disk capacity, 1990-recently • doubling every 12 months • 100% improvement each year • factor of 1000 every decade • Capacity growth 10x as fast as processor performance!
Disk trends • Only a few years ago, we purchased disks by the megabyte • Today, 1 GB (a billion bytes) costs $0.05 from Dell (earlier figures: $1, then $0.50) • => 1 TB costs $50 (earlier: $1K, then $500), 1 PB costs $50K (earlier: $1M, then $500K) • Technology is amazing • Flying a 747 6" above the ground • Reading/writing a strip of postage stamps
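In summary: rapid growth • Processor speed • Storage capacity • The gap between bandwidth on one side and latency and clock frequency on the other • Parallelism is the inevitable trend in the development of computer architecture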
Commodity computer systems • 1946 to 2003: General-purpose computing: serial. 5 kHz to 4 GHz. • 2004: General-purpose computing goes parallel. Clock frequency growth flat. • #Transistors/chip, 1980 to 2011: 29K to 30B! • #"cores": ~$d^{y-2003}$
If you want your program to run significantly faster … you’re going to have to parallelize it
Drivers of Parallel Computing – Application needs ref: http://www.nvidia.com/object/tesla_computing_solutions.html
Why Do We Need Parallel Processing? • Reasonable running time = fraction of an hour to several hours ($10^3$-$10^4$ s) • In this time, a TIPS/TFLOPS machine can perform $10^{15}$-$10^{16}$ operations • Example 1: Southern oceans heat modeling (10-minute iterations): 4096 E-W regions, 1024 N-S regions, 12 layers in depth; 300 GFLOP per iteration × 300 000 iterations per 6 yrs = $10^{16}$ FLOP • Example 2: Fluid dynamics calculations (1000 × 1000 × 1000 lattice): $10^9$ lattice points × 1000 FLOP/point × 10 000 time steps = $10^{16}$ FLOP • Example 3: Monte Carlo simulation of nuclear reactor: $10^{11}$ particles to track (for 1000 escapes) × $10^4$ FLOP/particle = $10^{15}$ FLOP • Decentralized supercomputing (from Mathworld News, 2006/4/7): a grid of tens of thousands of networked computers discovers $2^{30402457} - 1$, the 43rd Mersenne prime, as the largest known prime (9 152 052 digits)
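The operation counts in Examples 2 and 3 above, and the running time they imply on a 1 TFLOPS machine, can be checked with a few lines of arithmetic. This is a sketch only: the variable and function names are hypothetical, and the machine is assumed to sustain exactly $10^{12}$ operations per second, which real systems rarely do.

```python
TFLOPS = 1e12  # assumed sustained rate: 10^12 floating-point operations per second

# Example 2: fluid dynamics on a 1000 x 1000 x 1000 lattice
lattice_points = 1000 ** 3
fluid_flop = lattice_points * 1000 * 10_000  # points * FLOP/point * time steps

# Example 3: Monte Carlo simulation of a nuclear reactor
monte_carlo_flop = 10 ** 11 * 10 ** 4  # particles * FLOP/particle


def runtime_seconds(flop, rate=TFLOPS):
    """Idealised running time: total operations divided by the sustained rate."""
    return flop / rate


if __name__ == "__main__":
    for name, flop in [("fluid dynamics", fluid_flop),
                       ("Monte Carlo", monte_carlo_flop)]:
        seconds = runtime_seconds(flop)
        print(f"{name}: {flop:.0e} FLOP, {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

Both results fall inside the $10^3$-$10^4$ s window that the slide calls a reasonable running time, which is why a machine of this class is the stated target.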
The Big Data Era • According to an IDC report, the total volume of data worldwide was 2.7 ZB in 2012 and is projected to reach 35 ZB by 2020. • Categories of big data: • Internet data • Scientific data • Multimedia data • Industry application data, such as financial data
What Makes it Big Data? • Sources: social, blog, smart meter • Volume • Velocity • Variety • Value
Numbers • How much data is there in the world? • 800 Terabytes, 2000 • 160 Exabytes, 2006 • 500 Exabytes (Internet), 2009 • 2.7 Zettabytes, 2012 • 35 Zettabytes by 2020 • How much data is generated in ONE day? • 7 TB, Twitter • 10 TB, Facebook • Source: Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011
How • Practice is the sole criterion for testing truth
Parallel Programming • Course structure • Parallel Architectures • Parallel Algorithms • Parallel Programming
Goal • Most people in the research community agree that at least two kinds of parallel programmers will be important to the future of computing: • Programmers who understand how to write software but are naïve about parallelization and mapping to architecture • Programmers who are knowledgeable about parallelization and mapping to architecture, and so can achieve high performance
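Teaching plan • 32 class hours in total • 4 hours: course introduction + architecture of parallel computing systems • 4 hours: fundamentals of parallel algorithms • 24 hours: parallel programming
Assessment • Grading: coursework (attendance + 1 doc) + exam (weighting 20:80) • 1 doc • Pick a technical problem in parallel computing, review the existing techniques that address it, and propose improvements • The review should focus on the innovations and the remaining problems, as well as possible directions for future research.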