CS8625-June-22-2006 Homework & Midterm Review
CS8625 High Performance and Parallel Computing
Dr. Ken Hoganson
• Class Will Start Momentarily…
Balance Point • The basis for the argument against “putting all your (speedup) eggs in one basket”: Amdahl’s Law • Note the balance point in the denominator, where the serial term (1 − α) and the parallel term α/N are equal. • Increasing N (number of processors) beyond this point can at best halve the denominator, and so at best double the speedup.
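For reference, a minimal restatement of the balance point, assuming the standard single-parameter form of Amdahl's Law used elsewhere in these slides (parallel fraction α, N processors):

Speedup S(N) = 1 / [ (1 − α) + α/N ]
Balance point: (1 − α) = α/N  ⇒  N = α / (1 − α), and the speedup there is 1 / [ 2(1 − α) ] = (N + 1) / 2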
Balance Point Heuristic • Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
Solved for N: N = α / (1 − α)
Solved for α: α = N / (N + 1)
Balance Point • Example • Parallel fraction = 90% (10% in serial)
Solved for N: N = α / (1 − α) = 0.90 / 0.10 = 9, giving a speedup of 5.
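A small Python sketch of these formulas may help when checking the arithmetic; the function names are illustrative (not from the course materials), and it reproduces the example above:

def amdahl_speedup(alpha, n):
    """Amdahl's Law speedup for parallel fraction alpha on n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)

def balance_point_n(alpha):
    """Processor count at which the serial and parallel terms are equal."""
    return alpha / (1.0 - alpha)

def min_alpha(n):
    """Parallel fraction that balances the denominator for n processors."""
    return n / (n + 1.0)

n = balance_point_n(0.90)          # 0.90 / 0.10 = 9
print(n, amdahl_speedup(0.90, n))  # prints 9.0 and a speedup of 5.0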
Example • Example: A workload has an average parallel fraction (alpha) of 94%. How many processors can reasonably be applied to speed up this workload?
Solved for N: N = α / (1 − α)
Example • Example: An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?
Solved for α: α = N / (N + 1)
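Using the same sketch as above, the two review examples work out to the following (straightforward arithmetic from the formulas; where to round is a judgment call):

balance_point_n(0.94)   # 0.94 / 0.06 ≈ 15.7, so roughly 15 or 16 processors
min_alpha(32)           # 32 / 33 ≈ 0.970, i.e. about a 97% parallel fraction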
Multi-Bus Multiprocessors • Shared-memory multiprocessors are very fast • Low latency to memory over the bus • Low communication overhead through shared memory • Scalability problems • The length of the bus slows signals (which travel at roughly 0.75 times the speed of light) • Contention for the bus reduces performance • Cache is required to reduce contention
(Slide diagram: several CPUs and a memory module sharing a single bus.)
Bus Contention • Multiple devices (processors, etc.) compete for access to a bus. • Only one device can use a bus at a time, limiting performance and scalability. • P(at least one blocked request) = 1 − P(zero requests) − P(exactly one request) = P(two or more requests).
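One way to evaluate that expression numerically, assuming each of the n processors independently requests the bus in a given cycle with probability p (an assumption made here for illustration; the lecture's model may differ in detail), is a short Python sketch using the binomial terms:

def prob_blocked(n, p):
    """Probability that two or more of n processors request the shared bus in the
    same cycle, assuming independent requests with probability p each. Since only
    one request can be granted per cycle, this is the probability that at least
    one request is blocked."""
    p_zero = (1.0 - p) ** n
    p_one = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - p_zero - p_one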
Performance degrades as requests are blocked. • Resubmission of blocked requests degrades performance even further than the blocking probability alone suggests.
Clearly, the probability that a processor’s access to a shared bus will be denied will increase with both: • The number of processors sharing a bus • The probability a processor will need access to the bus. • What can be done? What is the “universal band-aid” for performance problems?
If cache greatly reduces accesses to memory, then the blocking rate on the bus is much lower.
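To make that concrete with the prob_blocked sketch above (the processor count and demand figures here are illustrative, not taken from the slides): with a 90% cache hit rate, only about 10% of a processor's accesses reach the bus, and the blocking probability falls sharply.

print(prob_blocked(8, 0.9))   # heavy bus demand (p = 0.9): blocking is nearly certain
print(prob_blocked(8, 0.1))   # 90% hit rate (p = 0.1): blocking probability drops to roughly 0.19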
Two approaches to improving the performance of a shared-memory, bus-based machine: • Invest in large amounts of cache, at multiple levels, plus a connection network that allows the caches to synchronize their contents. • Invest in multiple buses and independently accessible blocks of memory. • Combining both may be the best strategy.
Homework • Your project is to explore the effect of interconnection-network contention on the performance of a shared-memory, bus-based multiprocessor. • You will do some calculations, use the HPPAS simulator, and write a report of a couple of pages to turn in.
Task 1 • For a machine whose processors include on-chip cache yielding a 90% cache hit rate, determine the maximum number of processors that can share a single bus while still maintaining at least 98% acceptance of requests. • Use the calculations shown in the lecture to zero in on the correct answer, recording your calculations in a table for your report. Show each step of the calculation as was done in the lecture/ppt. • Your results should “bracket” the maximum.
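One possible way to organize the Task 1 calculations, reusing the prob_blocked function from the bus-contention sketch earlier, and under two stated interpretations (both assumptions, not statements of the assignment): the per-processor bus request probability equals the 10% cache miss rate, and “98% acceptance” means the per-cycle probability of no blocked request is at least 0.98.

p_request = 1.0 - 0.90              # 90% cache hit rate: assume 10% of accesses reach the bus
for n in range(2, 17):
    acceptance = 1.0 - prob_blocked(n, p_request)
    print(n, round(acceptance, 4))  # tabulate acceptance vs. processor count to bracket 98%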
Task 1 • Task 1: Use the formula in the table to find the maximum number of processors that still meets the 98% acceptance target.
Task 2 • Use the maximum number of processors from Task 1 and Amdahl’s Law at the balance point to determine the workload parallel fraction that balances the denominator. • Determine the theoretical speedup that will be obtained.
Solved for α: α = N / (N + 1)
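A sketch of the Task 2 arithmetic, written as a function so that the (deliberately unspecified) Task 1 result can be plugged in:

def task2(n_max):
    """Given the maximum processor count from Task 1, return the parallel fraction
    that balances the denominator and the theoretical speedup at that point."""
    alpha = n_max / (n_max + 1.0)
    speedup = 1.0 / ((1.0 - alpha) + alpha / n_max)   # equals (n_max + 1) / 2
    return alpha, speedup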
Task 3 • Use the data values developed so far to run the HPPAS simulation system. Record the speedup obtained from the simulation. • If it differs markedly from the theoretical value, check all the settings, rerun the simulation, and explain any remaining variation from the theoretical expected value. • Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.
Dates • The current plan: • Make the midterm available on Friday, June 23. • Due date will be July 10 (after the conference and after the July 4th weekend). • Conference week: • Complete homework: due on July 3 by email. • Work on the midterm exam. • No class lecture on June 27 and 29. • No class on July 4. • Next live class is Thursday, July 6.
Topic Overview Overview of topics for the exam: • Five parallel levels • Problems to be solved for parallelism • Limitations to parallel speedup • Amdahl’s Law: theory, implications • Limiting factors in realizing parallel performance • Pipelines and their performance issues • Flynn’s classification • SIMD architectures • SIMD algorithms • Elementary analysis of algorithms • MIMD: Multiprocessors and Multicomputers • Balance point and heuristic (from Amdahl’s Law) • Bus contention and analysis of single shared bus. • Use of the online HPPAS tool. • Specific multiprocessor clustered architectures: • Compaq • DASH • Dell Blade Cluster
End of Today’s Lecture.