Overview • Why multiprocessors? • The structure of multiprocessors. • Elements of multiprocessors: • Processing elements. • Memory. • Interconnect. Overheads for Computers as Components 2e
Why multiprocessing? • True parallelism: • Task level. • Data level. • May be necessary to meet real-time requirements. Overheads for Computers as Components 2e
Multiprocessing and real time • Faster rate processes are isolated on processors. • Specialized memory system as well. • Slower rate processes are shared on a processor (or processor pool). [Diagram: CPU/memory pairs running file read, rendering, etc., feeding the print engine.] Overheads for Computers as Components 2e
Heterogeneous multiprocessors • Will often have a heterogeneous structure. • Different types of PEs. • Specialized memory structure. • Specialized interconnect. Overheads for Computers as Components 2e
Multiprocessor system-on-chip • Multiple processors. • CPUs, DSPs, etc. • Hardwired blocks. • Mixed-signal. • Custom memory system. • Lots of software. Overheads for Computers as Components 2e
System-on-chip applications • Sophisticated markets: • High volume. • Demanding performance, power requirements. • Strict price restrictions. • Often standards-driven. • Examples: • Communications. • Multimedia. • Networking. Overheads for Computers as Components 2e
Terminology • PE: processing element. • Interconnection network: may require more than one clock cycle to transfer data. • Message: address+data packet. Overheads for Computers as Components 2e
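To make the terminology concrete, a message can be modeled as an address-plus-data packet. A minimal C sketch; the 32-bit field widths are an assumption, not something the slide specifies:

```c
#include <stdint.h>

/* A "message" in the slide's sense: an address plus data, carried as a
   packet over the interconnection network. Field widths are illustrative. */
struct message {
    uint32_t addr;  /* destination address (memory block or PE) */
    uint32_t data;  /* payload word traveling with the address */
};
```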
Generic multiprocessor [Diagrams: shared memory: PEs connected through an interconnect network to shared memory blocks. Message passing: PE/memory pairs communicating through an interconnect network.] Overheads for Computers as Components 2e
Shared memory vs. message passing • Shared memory and message passing are functionally equivalent. • Different programming models: • Shared memory more like uniprocessor. • Message passing good for streaming. • May have different implementation costs: • Interconnection network. Overheads for Computers as Components 2e
Shared memory implementation • Memory blocks are in address space. • Memory interface sends messages through network to addressed memory block. Overheads for Computers as Components 2e
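A minimal sketch of what this looks like to software, assuming a hypothetical base address SHARED_BASE taken from the platform's memory map: the PE simply issues loads and stores, and the memory interface turns them into messages through the network.

```c
#include <stdint.h>

/* Hypothetical base address of a shared memory block in the PE's
   address space; the real value comes from the platform memory map. */
#define SHARED_BASE 0x80000000u
#define SHARED ((volatile uint32_t *)SHARED_BASE)

/* An ordinary store: the memory interface turns it into a message
   routed through the network to the addressed memory block. */
void post_value(uint32_t v) {
    SHARED[0] = v;
}

/* An ordinary load: the reply message carries the data back. */
uint32_t get_value(void) {
    return SHARED[0];
}
```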
Message passing implementation • Program provides processor address, data/parameters. • Usually through API. • Packet interface appears as an I/O device. • Packets are routed through the network to the interface. • Recipient must decode parameters to determine how to handle the message. Overheads for Computers as Components 2e
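A sketch of the API layer over such a packet interface, assuming a hypothetical memory-mapped register layout (names and offsets are illustrative only):

```c
#include <stdint.h>

/* Hypothetical memory-mapped registers of the packet interface. */
#define PKT_BASE   0x40001000u
#define PKT_DEST   (*(volatile uint32_t *)(PKT_BASE + 0x0))
#define PKT_DATA   (*(volatile uint32_t *)(PKT_BASE + 0x4))
#define PKT_GO     (*(volatile uint32_t *)(PKT_BASE + 0x8))
#define PKT_BUSY   (*(volatile uint32_t *)(PKT_BASE + 0xC))

/* API call: the program supplies the destination processor and data;
   the device packetizes them and routes the packet through the
   network. The recipient decodes the parameters on arrival. */
void msg_send(uint32_t dest_pe, uint32_t payload) {
    while (PKT_BUSY)
        ;                 /* wait for the previous packet to leave */
    PKT_DEST = dest_pe;   /* processor address */
    PKT_DATA = payload;   /* data/parameters */
    PKT_GO   = 1;         /* launch the packet */
}
```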
Processing element selection • What tasks run on what PEs? • Some tasks may be duplicated (e.g., HDTV motion estimation). • Some processors may run different tasks. • How does the load change? • Static vs. dynamic task allocation. Overheads for Computers as Components 2e
Matching PEs to tasks • Factors: • Word size. • Operand types. • Performance. • Energy/power consumption. • Hardwired function units: • Performance. • Interface. Overheads for Computers as Components 2e
Task allocation • Tasks may be created at: • Design time (video encoder). • Run time (user interface). • Tasks may be assigned to processing elements at: • Design time (predictable load). • Run time (varying load). Overheads for Computers as Components 2e
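A sketch of the design-time case, with illustrative task and PE names: when the load is predictable, the task-to-PE mapping can be a fixed table compiled into the system.

```c
/* Design-time (static) task allocation: the mapping from tasks to
   processing elements is fixed in a table. Names are illustrative. */
enum pe { PE_DSP, PE_RISC };

struct allocation {
    const char *task;
    enum pe     where;
};

static const struct allocation alloc[] = {
    { "video_encoder",  PE_DSP  },  /* predictable load: placed at design time */
    { "user_interface", PE_RISC },  /* created at run time, statically placed  */
};
```

A run-time allocator would instead consult current PE loads before placing each new task, trading predictability for flexibility under varying load.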
Memory system design • Uniform vs. heterogeneous memory system. • Power consumption. • Cost. • Programming difficulty. • Caches: • Memory consistency. Overheads for Computers as Components 2e
Parallel memory systems • True concurrency: several memory blocks can operate simultaneously. [Diagram: PEs connected through an interconnect network to multiple memory blocks.] Overheads for Computers as Components 2e
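One common way to obtain that concurrency is to interleave addresses across the blocks, so consecutive words live in different banks and independent accesses can proceed in parallel. A small sketch; the bank count is an assumption:

```c
#include <stdint.h>

#define NBANKS 4u   /* illustrative number of memory blocks */

/* Word interleaving: consecutive word addresses map to different
   banks, letting several blocks service accesses simultaneously. */
static inline uint32_t bank(uint32_t word_addr)   { return word_addr % NBANKS; }
static inline uint32_t offset(uint32_t word_addr) { return word_addr / NBANKS; }
```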
Cache consistency • Problem: caches hide memory updates. • Solution: have caches snoop changes. [Diagram: two PEs with private caches snooping the network between them and memory.] Overheads for Computers as Components 2e
Cache consistency and tasks • Traditional scientific computing maps a single task onto multiple PEs. • Embedded computing maps different tasks onto multiple PEs. • May be producer/consumer. • Not all of the memory may need to be consistent. Overheads for Computers as Components 2e
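When only part of memory must be consistent, a producer/consumer pair can manage its shared buffer explicitly instead of relying on hardware snooping. A sketch; cache_flush() and cache_invalidate() stand in for whatever cache-maintenance primitives the platform provides (hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform cache-maintenance primitives. */
extern void cache_flush(const void *p, size_t len);      /* write back to memory */
extern void cache_invalidate(const void *p, size_t len); /* discard cached copy  */

/* Producer PE: fill the buffer, then push it out of the cache so the
   consumer's reads see the updates. */
void produce(uint32_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
    cache_flush(buf, n * sizeof *buf);
}

/* Consumer PE: drop any stale cached copy before reading. */
uint32_t consume(const uint32_t *buf, size_t n) {
    cache_invalidate(buf, n * sizeof *buf);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```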
Network topologies • Major choices: • Bus. • Crossbar. • Buffered crossbar. • Mesh. • Application-specific. Overheads for Computers as Components 2e
Bus network • Advantages: • Well-understood. • Easy to program. • Many standards. • Disadvantages: • Contention. • Significant capacitive load. Overheads for Computers as Components 2e
Crossbar • Advantages: • No contention. • Simple design. • Disadvantages: • Not feasible for large numbers of ports. Overheads for Computers as Components 2e
Buffered crossbar • Advantages: • Smaller than crossbar. • Can achieve high utilization. • Disadvantages: • Requires scheduling. Overheads for Computers as Components 2e
Mesh • Advantages: • Well-understood. • Regular architecture. • Disadvantages: • Poor utilization. Overheads for Computers as Components 2e
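The regularity is what makes meshes easy to route. Dimension-ordered (XY) routing is the classic policy; the sketch below is a standard example, not something the slide specifies:

```c
/* Dimension-ordered (XY) routing on a mesh: travel along X until the
   column matches the destination, then along Y. Simple and deadlock-free,
   but it cannot route around congestion, one reason utilization is poor. */
enum dir { EAST, WEST, NORTH, SOUTH, LOCAL };

enum dir route_xy(int x, int y, int dst_x, int dst_y) {
    if (x < dst_x) return EAST;
    if (x > dst_x) return WEST;
    if (y < dst_y) return NORTH;
    if (y > dst_y) return SOUTH;
    return LOCAL;   /* arrived: deliver to the attached PE */
}
```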
Application-specific • Advantages: • Higher utilization. • Lower power. • Disadvantages: • Must be designed. • Must carefully allocate data. Overheads for Computers as Components 2e
TI OMAP • OMAP 5910: • Targets communications, multimedia. • Multiprocessor with DSP, RISC. [Block diagram: C55x DSP and ARM9 CPU connected by an MPU interface and bridge, with MMU, I/O, system DMA controller, and memory controller.] Overheads for Computers as Components 2e
RTOS for multiprocessors • Issues: • Multiprocessor communication primitives. • Scheduling policies. • Task scheduling is considerably harder with true concurrency. Overheads for Computers as Components 2e
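A sketch of one such communication primitive, assuming hypothetical spin_lock()/spin_unlock() calls supplied by the RTOS: a lock-protected ring buffer in shared memory that one PE posts to and another drains.

```c
#include <stdbool.h>
#include <stdint.h>

#define QLEN 16u

/* Inter-processor message queue in shared memory. */
struct ipc_queue {
    volatile uint32_t buf[QLEN];
    volatile uint32_t head;   /* next slot to read  */
    volatile uint32_t tail;   /* next slot to write */
    volatile uint32_t lock;
};

/* Hypothetical RTOS-provided mutual exclusion across PEs. */
extern void spin_lock(volatile uint32_t *l);
extern void spin_unlock(volatile uint32_t *l);

bool ipc_send(struct ipc_queue *q, uint32_t msg) {
    bool ok = false;
    spin_lock(&q->lock);
    if ((q->tail + 1u) % QLEN != q->head) {  /* queue not full */
        q->buf[q->tail] = msg;
        q->tail = (q->tail + 1u) % QLEN;
        ok = true;
    }
    spin_unlock(&q->lock);
    return ok;
}

bool ipc_recv(struct ipc_queue *q, uint32_t *msg) {
    bool ok = false;
    spin_lock(&q->lock);
    if (q->head != q->tail) {                /* queue not empty */
        *msg = q->buf[q->head];
        q->head = (q->head + 1u) % QLEN;
        ok = true;
    }
    spin_unlock(&q->lock);
    return ok;
}
```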
Distributed system performance • Longest-path algorithms don’t work under preemption. • Several algorithms unroll the schedule to the length of the least common multiple of the periods: • produces a very long schedule; • doesn’t work for non-fixed periods. • Schedules based on upper bounds may give inaccurate results. Overheads for Computers as Components 2e
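To see why unrolling produces such a long schedule, compute the hyperperiod explicitly; a small sketch:

```c
#include <stdint.h>

static uint64_t gcd(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Length of the unrolled schedule: the least common multiple of all
   task periods (the hyperperiod). */
uint64_t hyperperiod(const uint64_t *period, int n) {
    uint64_t h = 1;
    for (int i = 0; i < n; i++)
        h = h / gcd(h, period[i]) * period[i];
    return h;
}
```

For the task periods in the period shifting example below (150, 70, 110), the hyperperiod is already 11,550 time units, far longer than any individual period.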
Data dependencies help • P3 cannot preempt both P1 and P2. • P1 cannot preempt P2. [Task graph showing the data dependencies among P1, P2, and P3.] Overheads for Computers as Components 2e
Preemptive execution hurts • Worst combination of events for P5's response time: • P2 of higher priority, • P2 initiated before P4, • causes P5 to wait for P2 and P3. • Independent tasks can interfere; can't use longest-path algorithms. [Task graph of P1–P5 mapped onto processors M1–M3.] Overheads for Computers as Components 2e
Period shifting example • Process CPU times: P1 = 30, P2 = 10, P3 = 30, P4 = 20. • Task periods: t1 = 150, t2 = 70, t3 = 110. • P2 delayed on CPU 1; data dependency delays P3; priority delays P4. Worst-case t3 delay is 80, not 50 (50 = P3 + P4 = 30 + 20; the extra 30 matches P1's CPU time on CPU 1). [Gantt chart of the worst-case schedule on CPU 1 and CPU 2 across periods t1, t2, t3.] Overheads for Computers as Components 2e