Explore the concepts of multithreading, parallelism, and cache coherence to maximize processor usage and improve resource efficiency in modern complex systems. Learn about different types of threading, their benefits and challenges, and the impact on performance and memory access.
Process vs Thread • Thread : Instruction sequence • Own registers/stack • Share memory with other threads in a process (program)
Threaded Code • Demo…
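The demo itself isn't preserved in these notes; below is a minimal sketch of what a threaded-code demo might look like, using Python's `threading` module. The worker function and values are illustrative, not the original demo. It shows the point from the previous slide: each thread has its own stack (local variables), but all threads share the process's memory (the global `results` list).

```python
import threading

# Shared process memory: visible to every thread.
results = []
lock = threading.Lock()

def worker(tid):
    local_total = tid * 10   # lives on this thread's own stack
    with lock:               # shared memory needs coordination
        results.append(local_total)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 10, 20, 30]
```

Each thread computed its own `local_total` privately, yet all of them wrote into the same shared list.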
Multithreading • Multithreading • Alternate or combine threads to maximize use of processor • Hardware required • Multiple register sets • Track "owner" of pipeline instructions
Resource Usage • Code running in a superscalar pipeline • Can't always fill all 4 issue slots • Have bubbles from memory access, page faults, etc.
Threading Examples • Assumptions: • Three threads of work • In order execution • Must obey stalls (i.e. A3 is 3+ cycles after A2) • Two wide pipeline (two instructions per cycle)
Multithreading • Coarse Grained Multithreading • Threads run until stall • Cache miss, page fault • Other long event • On stall, drain pipeline, and start next thread
Coarse Example • Coarse Multithreading • Avoids waiting for long periods • Wastes time on context switches • 16/30 possible units of work
Multithreading • Coarse Grained • Assumption: 1 cycle to retire after stall Threads to run Single Pipeline Time
Multithreading • Coarse Grained • Assumption: 1 cycle to retire after stall Threads to run Dual Pipeline Time
Multithreading • Coarse Grained Multithreading • Avoids wasting time on large stalls • Context switches waste time • Ex: Does work in 16/30 possible slots
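The coarse-grained behavior above can be sketched as a toy scheduler: run one thread until its next instruction triggers a long stall, then pay a context-switch penalty (pipeline-drain bubbles, shown as `-`) before starting the next thread. The thread traces and one-cycle switch cost below are made-up assumptions for illustration, not the slide's exact 16/30 schedule.

```python
# Toy coarse-grained model. Each thread is a list of (name, stall)
# pairs; a nonzero stall means a long event (cache miss, page fault)
# that forces a drain-and-switch.
def coarse_grained(threads, cycles, switch_cost=1):
    pc = [0] * len(threads)          # next instruction index per thread
    cur, trace = 0, []
    while len(trace) < cycles:
        if pc[cur] < len(threads[cur]):
            name, stall = threads[cur][pc[cur]]
            trace.append(name)
            pc[cur] += 1
            if stall:                              # long stall: drain
                trace.extend(["-"] * switch_cost)  # pipeline, then switch
                cur = (cur + 1) % len(threads)
        else:
            cur = (cur + 1) % len(threads)         # thread finished
            if all(p >= len(t) for p, t in zip(pc, threads)):
                trace.append("-")                  # everything done
    return trace[:cycles]

# Three hypothetical threads; A2 stalls for 2 extra cycles.
A = [("A1", 0), ("A2", 2), ("A3", 0)]
B = [("B1", 0), ("B2", 0), ("B3", 0)]
C = [("C1", 0), ("C2", 0), ("C3", 0)]
print(coarse_grained([A, B, C], 12))
# ['A1', 'A2', '-', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3', 'A3', '-', '-']
```

The bubble after `A2` is the context-switch cost the slide refers to: useful work stops while the pipeline drains.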
Multithreading • Fine Grained Multithreading • Every cycle, switch threads
Multithreading • Fine Grained • Switch each cycle to next ready thread Threads to run Single Pipeline Time
Multithreading • Fine Grained • Switch each cycle to next ready thread Threads to run Dual Pipeline Time A6 can't run until 4 cycles after A5; gets skipped at time 10
Multithreading • Fine Grained Multithreading • More responsive for each thread • Significant hardware required • Multiple register sets • Track "owner" of pipeline instructions • Ex: Finishes in 15 steps; 24 out of 30 possible units of work
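Fine-grained switching can be sketched under the same assumptions (in-order issue, stalls must elapse before the thread's next instruction): each cycle, issue from the next ready thread in round-robin order, skipping threads whose last instruction is still stalling. The thread traces are made up for illustration.

```python
# Toy fine-grained model: single pipeline, round-robin every cycle.
# Each thread is a list of (name, stall) pairs; `stall` is extra
# cycles that must pass after issue before that thread is ready again.
def fine_grained(threads, cycles):
    pc = [0] * len(threads)          # next instruction index per thread
    ready_at = [0] * len(threads)    # cycle when each thread may issue
    trace, start = [], 0
    for cycle in range(cycles):
        issued = "-"                 # bubble if no thread is ready
        for k in range(len(threads)):
            t = (start + k) % len(threads)   # round-robin scan
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                name, stall = threads[t][pc[t]]
                issued = name
                pc[t] += 1
                ready_at[t] = cycle + 1 + stall
                start = (t + 1) % len(threads)
                break
        trace.append(issued)
    return trace

# Same hypothetical threads; A2 stalls for 2 extra cycles.
A = [("A1", 0), ("A2", 2), ("A3", 0)]
B = [("B1", 0), ("B2", 0), ("B3", 0)]
C = [("C1", 0), ("C2", 0), ("C3", 0)]
print(fine_grained([A, B, C], 9))
# ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'B3', 'C3']
```

Note how A2's stall is hidden entirely: by the time the round-robin returns to thread A, the stall has elapsed, so no bubble appears. That is the throughput win of fine-grained multithreading.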
Latency vs Throughput • Multithreading favors throughput over latency • Longer to do any one task • Shorter overall to do all
Multithreading • SMT : Simultaneous Multithreading • AKA Hyperthreading • Can issue instructions from multiple threads in one cycle
SMT • SMT : Simultaneous Multithreading • AKA Hyperthreading • Execution units can each work on different threads
Multithreading SMT • Switch like fine grained • Do work from multiple threads if needed to fill pipelines Threads to run B4 not ready, but C3 is Time
Multithreading SMT • Switch like fine grained • Do work from multiple threads if needed to fill pipelines Threads to run C5 not ready but A5 is Time
Multithreading SMT • Switch like fine grained • Do work from multiple threads if needed to fill pipelines Threads to run B4, C5, A6 all waiting Time
Multithreading • Simultaneous Multithreading • Better potential to use all hardware execution units • Depends on complementary workloads • More bookkeeping required
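The SMT idea above can be sketched as a toy dual-issue model: each cycle, fill up to two issue slots from whichever threads are ready, so one thread's stall cycles are filled with another thread's work. This is a simplification (at most one instruction per thread per cycle, fixed priority order); real issue logic is considerably fancier, and the thread traces are made up.

```python
# Toy SMT model: up to `width` instructions issue per cycle, drawn
# from any ready threads. Each thread is a list of (name, stall) pairs.
def smt(threads, cycles, width=2):
    pc = [0] * len(threads)
    ready_at = [0] * len(threads)
    trace = []
    for cycle in range(cycles):
        slots = []
        for t in range(len(threads)):        # simple fixed priority
            if len(slots) == width:
                break
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                name, stall = threads[t][pc[t]]
                slots.append(name)
                pc[t] += 1
                ready_at[t] = cycle + 1 + stall
        trace.append(slots or ["-"])         # bubble if nothing ready
    return trace

# Same hypothetical threads; A2 stalls for 2 extra cycles.
A = [("A1", 0), ("A2", 2), ("A3", 0)]
B = [("B1", 0), ("B2", 0), ("B3", 0)]
C = [("C1", 0), ("C2", 0), ("C3", 0)]
print(smt([A, B, C], 5))
# [['A1', 'B1'], ['A2', 'B2'], ['B3', 'C1'], ['C2'], ['A3', 'C3']]
```

While thread A is stalled (cycles 2–3), its issue slots go to B and C instead of becoming bubbles, which is exactly the "fill pipelines from multiple threads" behavior the slides describe.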
SMT Challenges • Resources must be duplicated or split • Split too thin hurts performance… • Duplicate everything and you aren't maximizing use of hardware…
Intel vs AMD • Variations on SMT
Intel vs AMD • AMD Zen architecture
Development • Single Core
Development • Single Core with Multithreading • 2002 Pentium 4 / Xeon
Development • Multi Processor • Multiple processors coexisting in system • PC space in ~1995
Development • Multi Core • Multiple CPUs on one chip • PC space in ~2005
Development • Modern Complexity • Many cores • Private / Shared cache levels
Development • Massively Parallel Systems
UMA • Uniform Memory Access • Every processor sees all memory through the same addresses • Same access time for any CPU to any memory word
NUMA • Non Uniform Memory Access • Single memory address space visible to all CPUs • Some memory local • Fast • Some memory remote • Accessed in same way, but slower
Sunway Architecture • One chip : 256 cores @ ~1.5 GHz • Computer : 40,000+ chips
Multiprocessing & Memory • Memory demo…
Memory Access • Race conditions : unpredictable effects of sharing memory • May add 10, 1 or 11 to x
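Why can the slide's race add 10, 1, or 11 to `x`? Because a non-atomic `x += v` is really three steps: read `x`, add `v`, write back. The sketch below models those steps explicitly and replays three interleavings deterministically, rather than relying on actual thread timing (the step encoding is an illustrative assumption, not real hardware behavior).

```python
# Model each non-atomic `x += v` as separate read and write-back steps,
# then enumerate the interleavings the slide alludes to.
def run(schedule, x=0):
    # schedule: sequence of (value_to_add, step) pairs;
    # each "thread" does: tmp = x; then later x = tmp + value
    tmp = {}
    for v, step in schedule:
        if step == "read":
            tmp[v] = x           # snapshot of x at read time
        else:
            x = tmp[v] + v       # write back from the stale snapshot
    return x

serial      = [(10, "read"), (10, "write"), (1, "read"), (1, "write")]
lost_add_1  = [(10, "read"), (1, "read"), (1, "write"), (10, "write")]
lost_add_10 = [(1, "read"), (10, "read"), (10, "write"), (1, "write")]

print(run(serial))       # 11 -- both updates survive
print(run(lost_add_1))   # 10 -- the +1 is lost
print(run(lost_add_10))  # 1  -- the +10 is lost
```

Whichever thread writes last overwrites the other's update with a value computed from a stale read; only the fully serial order yields 11.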
Memory Access • Synchronization – using locks to prevent other threads from accessing shared memory
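A minimal sketch of lock-based synchronization in Python (the values and iteration counts are illustrative): holding the lock makes each read-modify-write atomic with respect to the other thread, so no update is lost.

```python
import threading

x = 0
lock = threading.Lock()

def add(v, times):
    global x
    for _ in range(times):
        with lock:      # read-modify-write is now atomic
            x += v

t1 = threading.Thread(target=add, args=(10, 1000))
t2 = threading.Thread(target=add, args=(1, 1000))
t1.start(); t2.start()
t1.join(); t2.join()

print(x)  # 11000 -- no lost updates
```

Without the `with lock:` line, the final value could fall anywhere below 11000, since concurrent increments can overwrite each other.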
Memory Access • Synchronization issues: • No longer parallel • Deadlock
Cache Coherence • Cache Coherence : Trying to make sure cached memory stays synchronized
Cache Coherence • Cache Coherence : • Need ability to snoop on activity and/or broadcast changes
Cache Coherence • Cache Coherence : • Need ability to snoop on activity and/or broadcast changes A broadcasts write on X, B knows it no longer has valid value
Cache Coherence • Cache Coherence : • Need ability to snoop on activity and/or broadcast changes A snoops on B asking for X, provides new value and updates memory
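The broadcast-and-snoop idea in these slides can be sketched as a toy write-invalidate protocol: every cache watches a shared bus, and a write to an address broadcasts an invalidation so stale copies are dropped and re-fetched. This is a simplified write-through model, not a full MESI-style protocol.

```python
# Toy snooping bus with write-invalidate caches.
class Bus:
    def __init__(self):
        self.caches = []
        self.memory = {}

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value          # write-through for simplicity
        for c in self.caches:
            if c is not writer:            # others snoop the bus and
                c.lines.pop(addr, None)    # invalidate their stale copy

class Cache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:         # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

bus = Bus()
a, b = Cache(bus), Cache(bus)
b.read("X")          # B caches X = 0
a.write("X", 42)     # A broadcasts; B's copy is invalidated
print(b.read("X"))   # 42 -- B misses and re-fetches the new value
```

This mirrors the first slide scenario: A broadcasts its write on X, so B knows its cached value is no longer valid and fetches the fresh one on its next access.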