
SIMULTANEOUS MULTITHREADING



Presentation Transcript


  1. SIMULTANEOUS MULTITHREADING
     Ting Liu, Liu Ren, Hua Zhong

  2. Contemporary forms of parallelism
     • Instruction-level parallelism (ILP)
       • Wide-issue superscalar processors (SS)
         • Issue 4 or more instructions per cycle
         • Execute a single program or thread
         • Attempt to find multiple instructions to issue each cycle
     • Thread-level parallelism (TLP)
       • Fine-grained multithreaded superscalars (FGMS)
         • Contain hardware state for several threads
         • Execute multiple threads
         • On any given cycle, the processor executes instructions from only one of the threads
       • Multiprocessor (MP)
         • Performance is improved by adding more CPUs

  3. Simultaneous Multithreading
     • Key idea: issue multiple instructions from multiple threads each cycle
     • Features
       • Fully exploits thread-level parallelism and instruction-level parallelism
     • Better performance for
       • A mix of independent programs
       • Programs that are parallelizable
       • A single-threaded program

  4. [Figure: issue slots per cycle for Superscalar (SS), Multithreading (FGMT), and SMT]
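The issue-slot comparison among the three designs can be sketched with a toy model. This is only an illustration under stated assumptions, not the simulator behind the slides: the per-cycle counts of ready instructions in `ready`, the issue width `WIDTH`, and the function names are all hypothetical, and instructions that are not issued are simply dropped rather than carried over to the next cycle.

```python
# Toy issue-slot model (hypothetical data and names, for illustration only).
# ready[t][c] = number of instructions thread t could issue in cycle c.

def superscalar(ready, width):
    """Single-threaded superscalar: only thread 0 runs."""
    return [min(width, n) for n in ready[0]]

def fine_grained(ready, width):
    """Fine-grained multithreading: one thread per cycle, round-robin."""
    threads = len(ready)
    cycles = len(ready[0])
    return [min(width, ready[c % threads][c]) for c in range(cycles)]

def smt(ready, width):
    """SMT: slots in a cycle may be filled from any mix of threads."""
    cycles = len(ready[0])
    return [min(width, sum(thread[c] for thread in ready)) for c in range(cycles)]

WIDTH = 4  # assumed issue width
ready = [
    [2, 0, 1, 3],  # thread 0
    [1, 2, 0, 1],  # thread 1
    [0, 1, 2, 2],  # thread 2
]

ss_slots = superscalar(ready, WIDTH)
fg_slots = fine_grained(ready, WIDTH)
smt_slots = smt(ready, WIDTH)

for name, slots in [("SS", ss_slots), ("FGMT", fg_slots), ("SMT", smt_slots)]:
    print(f"{name:5s} issued per cycle {slots}, total {sum(slots)} of {WIDTH * len(slots)}")
```

In this made-up workload the superscalar leaves slots empty whenever its single thread stalls, fine-grained multithreading avoids fully idle cycles but still wastes slots within a cycle, and SMT fills slots from several threads in the same cycle.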

  5. Multiprocessor vs. SMT
     [Figure: Multiprocessor (MP2) compared with SMT]

  6. SMT Architecture (1)
     • Base processor: similar to an out-of-order superscalar processor [MIPS R10000].
     • Changes: with N simultaneously running threads, the processor needs N PCs, N subroutine return stacks, and more than N*32 physical registers in total for register renaming.
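The per-thread state listed above scales linearly with N. A minimal sketch of the arithmetic follows; the function name `smt_state` and the `rename_regs` parameter are hypothetical, and the value 100 used in the example is an assumed number of extra renaming registers, not a figure taken from the slides.

```python
# Hardware state needed for N hardware threads (illustrative arithmetic only).
ARCH_REGS_PER_THREAD = 32  # architectural registers per thread, per the slide's N*32

def smt_state(num_threads, rename_regs):
    """Return the replicated and shared state for num_threads contexts.

    rename_regs is an assumed count of extra physical registers for renaming;
    the slides only say the total must exceed N*32.
    """
    return {
        "program_counters": num_threads,
        "return_stacks": num_threads,
        "physical_registers": num_threads * ARCH_REGS_PER_THREAD + rename_regs,
    }

state = smt_state(num_threads=8, rename_regs=100)
print(state)
```

With 8 threads the architectural state alone takes 8*32 = 256 registers, so any renaming registers come on top of that.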

  7. SMT Architecture (2)
     • A larger register file is needed, which lengthens register access time, so pipeline stages are added [register reads and writes each take 2 stages].
     • Threads share the cache hierarchy and the branch prediction hardware.
     • Each cycle: select up to 2 threads and fetch up to 4 instructions from each (the 2.4 scheme).
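The 2.4 fetch scheme can be sketched as follows. The slides do not say how the 2 threads are chosen each cycle, so this sketch assumes a simple round-robin scan that skips threads with nothing to fetch; the function `fetch_cycle`, the queue contents, and the instruction labels are all hypothetical.

```python
from collections import deque

def fetch_cycle(queues, start, max_threads=2, per_thread=4):
    """Fetch from up to max_threads threads, up to per_thread instructions each.

    queues: one deque of pending instructions per thread.
    start:  thread id where the round-robin scan begins (assumed policy).
    Returns (fetched, next_start), where fetched is a list of (thread_id, instr).
    """
    fetched = []
    chosen = 0
    n = len(queues)
    for offset in range(n):
        if chosen == max_threads:
            break
        tid = (start + offset) % n
        queue = queues[tid]
        if not queue:
            continue  # thread has nothing to fetch; try the next one
        for _ in range(min(per_thread, len(queue))):
            fetched.append((tid, queue.popleft()))
        chosen += 1
    return fetched, (start + 1) % n

# Hypothetical workload: thread 0 has 6 instructions, thread 1 none, thread 2 has 3.
queues = [
    deque(f"t0i{k}" for k in range(6)),
    deque(),
    deque(f"t2i{k}" for k in range(3)),
]

first, nxt = fetch_cycle(queues, start=0)
second, _ = fetch_cycle(queues, start=nxt)
print(first)
print(second)
```

The cap of 2 threads times 4 instructions bounds the fetch bandwidth at 8 instructions per cycle, while letting a second thread fill the fetch block when the first cannot supply enough instructions.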

  8. Effectively Using Parallelism on an SMT Processor
     [Figure: instruction throughput when executing a parallel workload]

  9. Effects of Thread Interference in Shared Structures
     • Interthread cache interference
     • Increased memory requirements
     • Interference in branch prediction hardware

  10. Interthread Cache Interference
     • Because the threads share the cache, more threads mean a lower hit rate.
     • Two reasons why this is not a significant problem:
       • L1 cache misses are almost entirely covered by the 4-way set-associative L2 cache.
       • Out-of-order execution, write buffering, and the use of multiple threads allow SMT to hide the small increase in memory latency.
     • Only a 0.1% speedup without interthread cache misses.
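How sharing a cache can lower the hit rate is easy to see in a deliberately extreme toy case. The sketch below assumes a tiny direct-mapped cache and two hypothetical address streams that map to the same sets; it is a worst-case illustration of interthread conflict misses, not a model of the cache hierarchy or workloads measured in the slides.

```python
def hit_rate(accesses, num_sets):
    """Hit rate of a direct-mapped cache with one-address lines (toy model)."""
    lines = [None] * num_sets
    hits = 0
    for addr in accesses:
        index = addr % num_sets
        if lines[index] == addr:
            hits += 1
        else:
            lines[index] = addr  # miss: the new address evicts the old one
    return hits / len(accesses)

NUM_SETS = 4             # assumed cache size in sets
thread_a = [0, 1] * 4    # hypothetical thread A working set
thread_b = [4, 5] * 4    # hypothetical thread B working set, same sets as A

# Interleave the two streams access by access, as if both threads ran at once.
interleaved = [addr for pair in zip(thread_a, thread_b) for addr in pair]

alone = hit_rate(thread_a, NUM_SETS)
shared = hit_rate(interleaved, NUM_SETS)
print(f"thread A alone: {alone:.2f}, two threads sharing: {shared:.2f}")
```

Real workloads conflict far less than this pathological pattern, which is consistent with the slide's point that the measured cost of interthread misses is small.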

  11. Increased Memory Requirements
     • With more threads, there are more memory references per cycle.
     • Bank conflicts in the L1 cache account for the majority of the memory accesses.
     • The effect is negligible:
       • With longer cache lines, the gains from better spatial locality outweigh the costs of L1 bank contention.
       • Only a 3.4% speedup with no interthread contention.

  12. Interference in Branch Prediction Hardware
     • Since all threads share the prediction hardware, it experiences interthread interference.
     • The effect is negligible because:
       • The speedup outweighs the additional latencies.
       • From 1 to 8 threads, misprediction rates range from 2.0% to 2.8% for branches and from 0.0% to 0.1% for jumps.

  13. Discussion • ???
