Chapter 6 Multiprocessor System
Introduction • Each processor in a multiprocessor system can be executing a different instruction at any time. • The major advantages of MIMD systems • Reliability • High performance • The overheads involved with MIMD • Communication between processors • Synchronization of the work • Wasted processor time if any processor runs out of work to do • Processor scheduling
Introduction (continued) • Task • An entity to which a processor is assigned • A program, a function, or a procedure in execution • Process • Another word for a task • Processor (or processing element) • The hardware resource on which tasks are executed
Introduction (continued) • Thread • The sequence of tasks performed in succession by a given processor • The path of execution of a processor through a number of tasks. • Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application. • Refer to Example 6.1 (degree of parallelism =3)
R-to-C ratio • A measure of how much overhead is produced per unit of computation. • R: the length of the run time of the task (computation time) • C: the communication overhead • This ratio signifies task granularity • A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.
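The R-to-C classification above can be sketched in a few lines; the function names and the cut-off threshold here are illustrative assumptions, not from the text:

```python
# Toy sketch (assumed names/threshold): classifying task granularity
# by the R-to-C ratio described above.

def r_to_c_ratio(run_time, comm_overhead):
    """R-to-C ratio: computation time R per unit of communication C."""
    return run_time / comm_overhead

def granularity(ratio, threshold=10.0):
    """High ratio -> communication overhead is insignificant (coarse grain);
    low ratio -> fine grain."""
    return "coarse" if ratio >= threshold else "fine"

# A task that computes for 50 time units per 2 units of communication:
print(granularity(r_to_c_ratio(50, 2)))   # coarse
```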
Task granularity • Coarse grain parallelism • High R-to-C ratio • Fine grain parallelism • Low R-to-C ratio • The general tendency toward maximum performance is to resort to the finest possible granularity, providing the highest degree of parallelism. • Maximum parallelism, however, also brings maximum overhead, so a trade-off is required to reach an optimum level.
6.1 MIMD Organization (Figure 6.2) • Two popular MIMD organizations • Shared memory (or tightly coupled) architecture • Message passing (or loosely coupled) architecture • Shared memory architecture • UMA (uniform memory access) • Rapid memory access • Memory contention
6.1 MIMD Organization (continued) • Message-passing architecture • Distributed memory MIMD system • NUMA (nonuniform memory access) • Heavy communication overhead for remote memory access • No memory contention problem • Other models • Hybrids of the two
6.2 Memory Organization • Two parameters of interest in MIMD memory system design • Bandwidth • Latency • Memory latency is reduced by increasing the memory bandwidth • By building the memory system with multiple independent memory modules (banked and interleaved memory architectures) • By reducing the memory access and cycle times
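The interleaved-module idea can be made concrete with low-order interleaving, a common address-mapping scheme (an assumption for illustration; the slide only names the technique). Consecutive addresses land in different modules, so sequential accesses can proceed in parallel:

```python
# Sketch of low-order interleaving across M independent memory modules.
# module = address mod M, offset = address div M.

def interleave(address, num_modules):
    module = address % num_modules      # which module holds the word
    offset = address // num_modules     # location within that module
    return module, offset

# With 4 modules, addresses 0..7 cycle through modules 0..3 twice:
print([interleave(a, 4)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```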
Multi-port memories • Figure 6.3 (b) • Each memory module is a three-port memory device. • All three ports can be active simultaneously. • The only restriction is that only one port can write to a given memory location at a time.
Cache incoherence • The problem wherein the value of a data item is not consistent throughout the memory system. • Write-through • A processor updates the cache and also the corresponding entry in the main memory. • Updating protocol • Invalidating protocol • Write-back • An updated cache-block is written back to the main memory just before that block is replaced in the cache.
6.2 Memory Organization (continued) • Cache coherence schemes • Not to use private caches (Figure 6.4) • With private cache architecture, but to cache only non-sharable data items. • Cache flushing • Shared data are allowed to be cached only when it is known that only one processor will be accessing the data
6.2 Memory Organization (continued) • Cache coherence schemes (continued) • Bus watching (or bus snooping) (Figure 6.5) • Bus watching schemes incorporate hardware in each processor's cache controller that monitors the shared bus for LOAD and STORE operations. • Write-once • The first STORE causes a write-through to the main memory. • Ownership protocol
6.3 Interconnection Network • Bus (Figure 6.6) • Bus window (Figure 6.7(a)) • Fat tree (Figure 6.7 (b)) • Loop or ring • token ring standard • Mesh
6.3 Interconnection Network(continued) • Hypercube • Routing is straightforward. • The number of nodes must be increased by powers of two. • Crossbar • It offers multiple simultaneous communications but at a high hardware complexity. • Multistage switching networks
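The claim that hypercube routing is straightforward can be shown with the standard bit-correction (e-cube) scheme, sketched here as an assumption about what "straightforward" routing means: XOR the source and destination addresses and correct one differing bit per hop, so the path length equals the Hamming distance.

```python
# Hypercube (e-cube) routing sketch: correct one differing address bit
# per hop, lowest dimension first.

def hypercube_route(src, dst, dimensions):
    path = [src]
    node = src
    for bit in range(dimensions):
        if (node ^ dst) & (1 << bit):   # addresses differ in this bit
            node ^= (1 << bit)          # hop along that dimension
            path.append(node)
    return path

# 3-cube: route from node 000 to node 101 takes two hops.
print(hypercube_route(0b000, 0b101, 3))  # [0, 1, 5]
```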
6.4 Operating System Considerations • The major functions of the multiprocessor operating system • Keeping track of the status of all the resources at all times • Assigning tasks to processors in a justifiable manner • Spawning and creating new processes such that they can be executed in parallel or independently of each other • Collecting their individual results when all the spawned processes are completed and passing them on to other processes as required
6.4 Operating System Considerations (continued) • Synchronization mechanisms • Processes in an MIMD system operate in a cooperative manner, and a sequence control mechanism is needed to ensure the ordering of operations. • Processes compete with each other to gain access to shared data items. • An access control mechanism is needed to maintain orderly access.
6.4 Operating System Considerations (continued) • Synchronization mechanisms • The most primitive synchronization techniques • Test & set • Semaphores • Barrier synchronization • Fetch & add • Heavy-weight process and Light-weight process • Scheduling • Static • Dynamic : load balancing
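Barrier synchronization, one of the primitives listed above, can be sketched with Python threads standing in for processors (an illustrative assumption; the text does not prescribe a language): no worker enters phase 2 until all workers have finished phase 1.

```python
# Barrier synchronization sketch: threads stand in for processors.

import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
log = []
log_lock = threading.Lock()

def worker(wid):
    with log_lock:
        log.append(("phase1", wid))
    barrier.wait()                      # block until all workers arrive
    with log_lock:
        log.append(("phase2", wid))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

# Every phase-1 entry precedes every phase-2 entry:
print(all(p == "phase1" for p, _ in log[:NUM_WORKERS]))  # True
```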
6.5 Programming (continued) • Four main structures of parallel programming • Parbegin/parend • Fork/join • Doall • Parallel declarations: processes, tasks, procedures, and so on can be declared for parallel execution.
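The fork/join structure named above can be sketched as follows, with Python threads as an illustrative stand-in for the forked processes (function and variable names are assumptions):

```python
# Fork/join sketch: the parent forks child threads to compute partial
# sums in parallel, then joins them before combining the results.

import threading

def fork_join_sum(data, num_tasks=4):
    chunk = (len(data) + num_tasks - 1) // num_tasks
    partial = [0] * num_tasks

    def child(i):
        partial[i] = sum(data[i * chunk:(i + 1) * chunk])

    tasks = [threading.Thread(target=child, args=(i,)) for i in range(num_tasks)]
    for t in tasks: t.start()       # fork
    for t in tasks: t.join()        # join
    return sum(partial)

print(fork_join_sum(list(range(100))))  # 4950
```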
6.6 Performance Evaluation and Scalability • Performance evaluation • Speed-up: S = Ts/Tp, where the total overhead To = P·Tp − Ts, so Tp = (To + Ts)/P and S = Ts·P/(To + Ts) • Efficiency: E = S/P = Ts/(Ts + To) = 1/(1 + To/Ts)
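A quick numeric check of the speedup and efficiency formulas above, with illustrative values for Ts, To, and P:

```python
# Numeric check of S = Ts*P/(To + Ts) and E = Ts/(Ts + To).

def speedup(ts, to, p):
    tp = (to + ts) / p              # parallel time Tp = (To + Ts)/P
    return ts / tp                  # equals Ts*P / (To + Ts)

def efficiency(ts, to, p):
    return speedup(ts, to, p) / p   # equals Ts / (Ts + To)

ts, to, p = 100.0, 25.0, 8
print(speedup(ts, to, p))           # 6.4
print(efficiency(ts, to, p))        # 0.8
```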
Scalability • Scalability: the ability to increase speedup as the number of processors increases. • A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the problem size as the number of processors increases. • Time-constrained scaling • Memory-constrained scaling
Isoefficiency function • E = 1/(1 + To/Ts), so To/Ts = (1 − E)/E and hence Ts = [E/(1 − E)]·To. For a given value of E, E/(1 − E) is a constant K; then Ts = K·To (the isoefficiency function). • A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when P is increased.
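The derivation above reduces to one constant; with illustrative numbers (function names are assumptions):

```python
# Isoefficiency sketch: to hold efficiency at E, the serial work must
# grow as Ts = K*To with K = E/(1-E), matching the derivation above.

def iso_constant(e):
    return e / (1.0 - e)            # K = E/(1-E)

def required_ts(e, to):
    return iso_constant(e) * to     # Ts = K*To keeps efficiency at E

# To hold 80% efficiency against overhead To = 50 time units:
print(required_ts(0.8, 50))         # 200.0  (check: E = 200/250 = 0.8)
```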
6.6 Performance Evaluation and Scalability (continued) • Performance models • The basic model • Each task is equal and takes R time units to be executed on a processor. • If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units. • Model with linear communication overhead • Model with overlapped communication • Stochastic model
Examples • Alliant FX series • Figure 6.17 • Parallelism • Instruction level • Loop level • Task level