Introduction to Parallel Processing with Multi-core, Part I Jie Liu, Ph.D. Professor Department of Computer Science Western Oregon University USA liuj@wou.edu
Now the question – Why parallel? • Three things are for sure: • Taxes, death, and parallelism • How long would it take a single person to build I-5? • Answer • What we want to do is solve very computationally intensive problems, such as modeling a protein interacting with the water surrounding it. Such a problem can take a very, very long time. • The protein simulation problem would take a Cray X/MP 31,688 years to simulate 1 second of interaction (as estimated in 1990). Even if today's supercomputers are 100 times faster than the Cray X/MP, we would still need more than 300 years! • The only solution: parallel processing
Why parallel (2) • Moore's Law • The logic density of silicon-based ICs (Integrated Circuits) has closely followed the curve of doubling every year (until 1970, then every 18 months) • Why is density related to a processor's speed? Because, during the process of "computing," electrons need to carry signals from one end of a circuit to the other. • For a 2 GHz computer, a signal can travel at most about 15 cm per clock cycle (0.5 nanoseconds at the speed of light) • That is, the speed of light places a physical limit on how fast a single-processor computer can run
Why parallel (3) • There are problems that require much more computational power than today's fastest single-CPU computers can provide. • The speed of light limits how fast a single-CPU computer can run • If we want to solve such computationally intensive problems in a reasonable amount of time, we have to resort to parallel computers!
Some Definitions • Parallel processing • Information processing that emphasizes the concurrent manipulation of data belonging to many processes solving a single problem • Example: having 100 processors sort an array of 1,400,000,000 elements – is parallel processing • Example: printing homework while reading emails – is not parallel processing, because the processes are not solving a single problem. • A parallel computer is a multi-processor computer capable of parallel processing • Computers with just co-processors for math and image processing are not considered parallel computers (some people disagree with this notion)
Two forms of parallelism • Control Parallelism • Concurrency is achieved by applying different operations to different data elements of a single problem • A pipeline is a special form of control parallelism • An assembly line is an example of a pipeline • Data Parallelism • Concurrency is achieved by applying the same operation to different data elements of a single problem • Taking a class is an example of data parallelism (assuming you are all learning at the same speed) • An army brigade marching can be considered data parallelism
Control vs. Data Parallelism • Consider the following statement • if a[i] > b[i] • a[i] = a[i]*b[i] • else • b[i] = a[i]-b[i] • In a control-parallel fashion, some processors execute the statement a[i] = a[i]*b[i] while others execute b[i] = a[i]-b[i] during the same clock cycle • In a data-parallel fashion, especially on a SIMD machine, this if statement is executed in two clock cycles: • During the first clock cycle, all processors satisfying the condition a[i] > b[i] execute the statement a[i] = a[i]*b[i]. • During the second clock cycle, the processors not satisfying the condition execute the statement b[i] = a[i]-b[i]
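The two-cycle SIMD execution above can be sketched in plain Python (a simulation of ours, not real SIMD hardware; the function name `simd_if` is hypothetical). The key point is that the condition is evaluated once into a mask, then each branch runs over its own subset of lanes in a separate "cycle":

```python
def simd_if(a, b):
    # Evaluate the condition once for every lane; this is the mask.
    mask = [x > y for x, y in zip(a, b)]
    # Cycle 1: lanes where the condition holds execute the "then" branch.
    for i, m in enumerate(mask):
        if m:
            a[i] = a[i] * b[i]
    # Cycle 2: the remaining lanes execute the "else" branch.
    for i, m in enumerate(mask):
        if not m:
            b[i] = a[i] - b[i]
    return a, b
```

Note that the mask must be computed before cycle 1 runs, because cycle 1 overwrites some a[i] values that the condition depends on.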
Speedup – Take I • Speedup is a measure of how effective a parallel algorithm is • It is defined as the ratio between the time needed by the most efficient sequential algorithm to perform a computation and the time needed to perform the same computation on a parallel computer with a parallel algorithm. That is, Speedup = T_sequential / T_parallel • Example: suppose we developed a parallel bubble sort that sorts n elements in O(log n) time using n processors. The speedup is O(n log n) / O(log n) = O(n), because the most efficient sequential sorting algorithms have a complexity of O(n log n)
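A quick numeric sanity check of the definition (a sketch of ours; the function name `speedup` is not from the slides). Plugging the operation counts of the example in for the times, the n log n / log n ratio comes out to n:

```python
import math

def speedup(t_sequential, t_parallel):
    # Speedup = time of best sequential algorithm / time of parallel algorithm.
    return t_sequential / t_parallel

n = 1_000_000
# Best sequential sort: ~n log n steps; hypothetical parallel sort: ~log n steps.
s = speedup(n * math.log2(n), math.log2(n))
```

With n = 1,000,000 elements, s comes out to n itself, matching the O(n) speedup claimed above.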
Brain Exercise • Six equally skilled students need to make 210 special cookies; each cookie requires the following tasks (time units in parentheses) • Break the dough into small pieces of equal size (1) • Hand-roll the small dough pieces into balls (1) • Press the balls into flat "cookies" (1) • Roll the "cookies" into wrappers (1) • Place a suitable amount of filling onto each wrapper (1) • Fold the wrappers to enclose the filling completely (1) • How can this be done in a pipeline fashion? • How can this be done in a control-parallel fashion, other than a pipeline? • How can this be done in a data-parallel fashion?
Approach #1 (diagram: dough pieces D1 ~ D6, then D7 ~ D12)
Approach #2 (diagram: dough pieces D1, D2, D3, D4, D5, D6, D7)
Analysis • Sequential cost: (1+1+1+1+1+1) × 210 = 1260 time units • Maximum speedup for Approach #1 • Maximum speedup for Approach #2 • Other questions to consider • If I have 1260 students, can the task be done in 1 time unit? • What if step 3 takes 3 time units and step 6 takes 2 time units? • What is the effect of adding more "skilled" students to each approach?
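Assuming Approach #1 is the six-stage pipeline with one student per stage (an assumption on our part; the diagrams do not survive in this text), its completion time follows the standard pipeline formula: the first cookie takes one unit per stage, and every later cookie finishes one unit after the previous one. A small sketch:

```python
def pipeline_time(stages, items, unit=1):
    # First item takes `stages` units to drain through the pipeline;
    # each subsequent item completes one `unit` later.
    return (stages + items - 1) * unit

sequential = 6 * 210 * 1          # 1260 time units for one worker
pipelined = pipeline_time(6, 210)  # 215 time units with 6 workers
```

This gives a speedup of 1260 / 215 ≈ 5.86, just under the ideal factor of 6, because the pipeline spends 5 units filling up before all six students are busy.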
Grand challenges • A list of problems that are very computationally intensive but can benefit humanity greatly; heavily funded by the US government • The following lists just the categories of problems
One of the Fastest Computers • Per http://abcnews.go.com/Technology/WireStory?id=5028546&page=2 • By: IBM and Los Alamos National Laboratory • Name: Roadrunner (named after New Mexico's state bird) • Twice as fast as IBM's Blue Gene, which is three times faster than the next fastest computer in the world • Cost: $100,000,000 – very cheap • Speed: 1,000,000,000,000,000 floating-point operations per second (1 petaflop) • Usage: primarily nuclear weapons work, including simulating nuclear explosions • Related to gaming: in some ways, it's "a very souped-up Sony PlayStation 3." • Some facts: • The interconnecting system occupies 6,000 square feet with 57 miles of fiber optics and weighs 500,000 pounds. Although made from commercial parts, the computer consists of 6,948 dual-core chips and 12,960 Cell engines, and it has 80 terabytes of memory housed in 288 connected refrigerator-sized racks. • Two years earlier, the fastest computer in the world could perform 100,000,000,000,000 floating-point operations per second (100 teraflops)
Parallel Computers and Programming – the trend • Hardware • Supercomputers – multiprocessors/multicomputers – the fastest computers of their time • Beowulf – a cluster of off-the-shelf computers linked by a switch • Other distributed systems, such as NOW (Network of Workstations) • Multi-core – many cores (each a CPU in itself) within one CPU package; soon over 60+ cores per CPU • Programming • MPI for message-passing architectures • Vendor-specific add-ons to well-known programming languages • New languages such as Microsoft's F# • Multi-core programming (add-ons to well-known programming languages) • Intel's Threading Building Blocks (TBB) • Microsoft's Task Parallel Library – supports Parallel.For, PLINQ, etc.; worth keeping an eye on • Third parties such as Jibu – may merge with MS
Multi-Core Programming • Sequential • Parallel
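The slide originally showed sequential and parallel versions of the same computation side by side. As a stand-in, here is a minimal sketch of ours in Python using the standard-library `concurrent.futures` module (rather than TBB or the Task Parallel Library mentioned on the previous slide): the same sum of squares is computed once in a single loop, and once split into chunks handed to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_squares(lo, hi):
    # The work kernel, shared by both versions.
    return sum(i * i for i in range(lo, hi))

def sequential(n):
    # Sequential version: one call does all the work.
    return sum_squares(0, n)

def parallel(n, workers=4):
    # Parallel version: partition [0, n) into one chunk per worker.
    step = n // workers
    bounds = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(lambda b: sum_squares(*b), bounds))
```

Both versions return the same answer; the parallel one illustrates the decompose/compute/combine structure, though in CPython true CPU-bound speedup would require processes rather than threads.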
Why Study Parallel Processing/Programming • Making your code run more efficiently • Utilizing existing resources (the other cores) • … … • A good coding class for CS students • To learn something new • To improve your skill set • To improve your problem-solving skills • To exercise your brain • To review many Computer Science subject areas • To relax a constraint our professors embedded in our thinking early in our studies (what is the PC in a CPU?)
PRAM (Parallel Random Access Machine) • A theoretical parallel computer • Consists of a control unit, global memory, and an unbounded set of processors, each with its own private memory. • In addition, • Each processor has a unique id • At each step, an active processor can read/write memory (global or private), perform the same instruction as all other active processors, idle, or activate another processor • How many steps does it take to activate n processors?
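Since each active processor can activate at most one other processor per step, the number of active processors can at most double each step, so activating n processors takes ⌈log₂ n⌉ steps. A small simulation of the doubling argument (function name is ours):

```python
def activation_steps(n):
    # Start with 1 active processor; in each step, every active
    # processor activates one more, doubling the active count.
    active, steps = 1, 0
    while active < n:
        active *= 2
        steps += 1
    return steps  # equals ceil(log2(n))
```

For example, activating 1,000 processors takes 10 steps, because 2^9 = 512 < 1000 ≤ 1024 = 2^10.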
Important Terms • Massively parallel computer • Roadrunner • petaflop • Supercomputers • Beowulf • NOW • MPI • Multi-core • PRAM • computationally intensive problem • Moore's Law • Parallel processing • parallel computer • Control Parallelism • Data Parallelism • Speedup • Grand challenges