COMP25212 CPU Multi Threading • Learning Outcomes: to be able to: • Describe the motivation for multithread support in CPU hardware • Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading • Explain when multithreading is inappropriate • Describe a multithreading implementation • Estimate the performance of these implementations • State the important assumptions of this performance model
Revision: Increasing CPU Performance • [Figure: pipeline diagram showing Inst Cache, Data Cache, Clock, and the Fetch, Decode, Exec, Mem and Write Logic stages, with points labelled a to f] • How can throughput be increased?
Increasing CPU Performance • By increasing clock frequency • By increasing Instructions per Clock • Minimising memory access impact – data cache • Maximising instruction issue rate – branch prediction • Maximising instruction issue rate – superscalar • Maximising pipeline utilisation – avoid instruction dependencies – out-of-order execution • (What does lengthening the pipeline do?)
Increasing Program Parallelism • Keep issuing instructions after branch? • Keep processing instructions after cache miss? • Process instructions in parallel? • Write register while previous write pending? • Where can we find additional independent instructions? • In a different program!
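Revision – Process States • [Figure: process state diagram] • New → Ready (waiting for a CPU) • Ready → Running on a CPU: dispatch (scheduler) • Running → Ready: pre-empted (e.g. timer) • Running → Blocked (waiting for event): needs to wait (e.g. I/O) • Blocked → Ready: I/O occurs • Running → Terminated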
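The state diagram can be sketched as a small transition table. This is only an illustration: the event names ("admit", "dispatch", "preempt", "wait", "event", "exit") are hypothetical labels chosen here, and the New → Ready and Running → Terminated arrows are assumed from the usual form of this diagram.

```python
from enum import Enum, auto


class State(Enum):
    NEW = auto()
    READY = auto()       # waiting for a CPU
    RUNNING = auto()     # running on a CPU
    BLOCKED = auto()     # waiting for an event
    TERMINATED = auto()


# Allowed transitions, keyed by (current state, event).
# Event names are illustrative labels, not real OS API names.
TRANSITIONS = {
    (State.NEW, "admit"): State.READY,
    (State.READY, "dispatch"): State.RUNNING,    # scheduler picks the process
    (State.RUNNING, "preempt"): State.READY,     # e.g. timer interrupt
    (State.RUNNING, "wait"): State.BLOCKED,      # e.g. starts I/O
    (State.BLOCKED, "event"): State.READY,       # e.g. I/O completes
    (State.RUNNING, "exit"): State.TERMINATED,
}


def step(state, event):
    """Return the next state, or raise ValueError if the move is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.name}") from None


def run(events, start=State.NEW):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Note that a blocked process cannot be dispatched directly: it must first return to Ready when its event occurs.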
Revision – Process Control Block • Process ID • Process State • PC • Stack Pointer • General Registers • Memory Management Info • Open File List, with positions • Network Connections • CPU time used • Parent Process ID
Revision: CPU Switch • [Figure: timeline across Process P0, Operating System and Process P1] • Save state into PCB0 • Load state from PCB1 • Save state into PCB1 • Load state from PCB0
What does CPU load on dispatch? • Process ID • Process State • PC • Stack Pointer • General Registers • Memory Management Info • Open File List, with positions • Network Connections • CPU time used • Parent Process ID
What does CPU need to store on deschedule? • Process ID • Process State • PC • Stack Pointer • General Registers • Memory Management Info • Open File List, with positions • Network Connections • CPU time used • Parent Process ID
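One way to reason about the two questions above is to model the switch in code. The sketch below is a toy model under stated assumptions, not a description of any real OS: it assumes the CPU-resident state is the PC, stack pointer, general registers and a page-table base standing in for "Memory Management Info", and that the remaining PCB fields are bookkeeping that stays in memory. All class, field and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    pid: int
    state: str = "ready"
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=lambda: [0] * 8)
    page_table_base: int = 0   # assumed stand-in for memory management info
    cpu_time_used: int = 0     # bookkeeping only, never loaded into the CPU


@dataclass
class CPU:
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=lambda: [0] * 8)
    page_table_base: int = 0


def save_state(cpu, pcb, new_state):
    """Deschedule: copy the state the running code may have changed."""
    pcb.pc, pcb.sp = cpu.pc, cpu.sp
    pcb.gprs = list(cpu.gprs)
    # Assumption: user code cannot change the page-table base,
    # so it does not need to be written back.
    pcb.state = new_state


def load_state(cpu, pcb):
    """Dispatch: copy everything the CPU needs to resume this process."""
    cpu.pc, cpu.sp = pcb.pc, pcb.sp
    cpu.gprs = list(pcb.gprs)
    cpu.page_table_base = pcb.page_table_base
    pcb.state = "running"


def context_switch(cpu, old, new, old_state="ready"):
    save_state(cpu, old, old_state)
    load_state(cpu, new)
```

Every field copied here costs time on each switch, which is the overhead that keeping a second set of registers in hardware is meant to avoid.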
CPU Support for Multithreading • [Figure: the pipeline (Fetch, Decode, Exec, Mem and Write Logic, with Inst Cache, Data Cache and Address Translation) extended with two sets of per-thread state: PC A, GPRs A and VA Mapping A; PC B, GPRs B and VA Mapping B]
How Should the OS View the Extra Hardware Thread? • A variety of solutions • Simplest is probably to declare it as an extra CPU • This needs a multiprocessor-aware OS
CPU Support for Multithreading • Design Issue: when to switch threads • [Figure: the pipeline (Fetch, Decode, Exec, Mem and Write Logic, with Inst Cache, Data Cache and Address Translation) with two sets of per-thread state: PC A, GPRs A and VA Mapping A; PC B, GPRs B and VA Mapping B]
Coarse-Grain Multithreading • Switch Thread on “expensive” operation: • E.g. I-cache miss • E.g. D-cache miss • Some are easier than others!
Performance of Coarse Grain • Assume (conservatively): • 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks) • 1 I-cache miss per 100 instructions • 1 instruction per clock otherwise • Then, time to execute 100 instructions without multithreading: • 100 + 20 clock cycles • Inst per Clock = 100 / 120 = 0.83 • With multithreading, time to execute 100 instructions: • 100 [+ 1] clock cycles • Inst per Clock = 100 / 101 = 0.99
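The arithmetic on this slide can be checked with a few lines of Python. This is a sketch of the slide's simple model only. It reads the "[+ 1]" as a one-cycle thread-switch cost and assumes another thread is always ready to run, so the miss latency is completely hidden. Both are assumptions made here, and the function and parameter names are hypothetical.

```python
def ipc_single_thread(instructions, misses, miss_penalty, base_cpi=1):
    """IPC when the pipeline stalls for the full miss penalty on every miss."""
    cycles = instructions * base_cpi + misses * miss_penalty
    return instructions / cycles


def ipc_coarse_grain(instructions, misses, switch_cost, base_cpi=1):
    """IPC when each miss costs only a thread switch.

    Assumes another thread is always ready, so memory latency is fully hidden.
    """
    cycles = instructions * base_cpi + misses * switch_cost
    return instructions / cycles


# Slide figures: 100 instructions, 1 I-cache miss, 20-clock memory access.
print(f"{ipc_single_thread(100, 1, 20):.2f}")   # 0.83
print(f"{ipc_coarse_grain(100, 1, 1):.2f}")     # 0.99
```

Changing `miss_penalty` or `switch_cost` shows how sensitive the result is to the assumptions behind the model.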
Switch Threads on D-cache Miss • [Figure: pipeline with in-flight instructions marked “Abort these”] • Performance: similar calculation (STATE ASSUMPTIONS!) • Where to restart after memory cycle? • I suggest instruction “a” – why?