

eXtreme Virtual Pipelining (XVP): Moving Towards Scalable Multithreaded Processors. Korey Sewell*, Trevor Mudge*, Steven K. Reinhardt*† (*Advanced Computer Architecture Laboratory (ACAL), University of Michigan, Ann Arbor; †Advanced Micro Devices (AMD)). ASPLOS – WACI '09.



Presentation Transcript


  1. eXtreme Virtual Pipelining (XVP): Moving Towards Scalable Multithreaded Processors. Korey Sewell*, Trevor Mudge*, Steven K. Reinhardt*† (*Advanced Computer Architecture Laboratory (ACAL), University of Michigan, Ann Arbor; †Advanced Micro Devices (AMD)). ASPLOS – WACI '09

  2. The Comp. Arch. Research Train (P = Processor(s), T = Thread(s))
     • Uniprocessor-Place (1P, 1T)
     • Multithreading-Ville (1P, ~2-4T)
     • Multicore-Estates (2-4P, ~2-4T)
     • Many-Core Mansion (~32-64P, ~2-4T)
     • Did we miss a stop on the way? What about “Many”-Threading?!

  3. Why “Many-Threading”?
     • CHANGES the way we think about architecture…
     • Moving from 2-4 threads per core to 16, 32, or even 64 threads per core
     • Threads aren’t just parallel… they’re adjacent!
     • What would you create if you had “threads to throw away”?
     • Hmmmmmmm…

  4. WACI “Many”-Threading Possibilities
     • “Coherence-Free” Synchronization & Communication
     • Why suffer from non-deterministic memory latency when so many threads are adjacent (on the same core)?
     [Figure: two CPUs, each holding threads T0, T1, T2, … TN, attached to a memory system]

  5. WACI “Many”-Threading Possibilities
     • Extremely Speculative Multithreading
     • Use extra threads during speculative events (e.g. branch misprediction, cache miss)
     • Fast-forward execution by traversing the speculation tree and then switching threads.
     [Figure: speculation tree of T/F branch outcomes around a branch misprediction]
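
The slide does not say how the speculation tree would be mapped onto threads. As one possible reading, the sketch below hands every T/F outcome path of a short branch sequence to its own spare thread and, once the branches resolve, picks the thread that followed the actual path. The function names `assign_paths` and `resolve` are hypothetical and do not come from the presentation.

```python
from itertools import product

def assign_paths(depth):
    """Map each T/F outcome sequence of `depth` branches to a spare thread id.

    Assumption: one thread per path, so 2**depth threads are needed.
    """
    return {path: tid for tid, path in enumerate(product("TF", repeat=depth))}

def resolve(assignments, actual_outcomes):
    """Return the thread id that speculated along the path that actually happened."""
    return assignments[tuple(actual_outcomes)]

# Two unresolved branches need four speculative threads.
assignments = assign_paths(2)
winner = resolve(assignments, "TF")  # switch to this thread, discard the others
```

The exponential growth in paths is why this idea only becomes plausible with many threads per core, which is the point the slide is making.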

  6. WACI “Many”-Threading Possibilities
     • Super Virtual Machines
       • Security: every application given its own VM?
     • Many-Many Systems!
       • Many threads, many cores
       • ~1000-thread system = 64 cores × 16 threads per core (1,024 threads)
     • Redundant Multithreading
     • This list keeps going… and going… and going!

  7. How do we get to Many-Threading?
     • A design that avoids non-scalable, conventional multithreading pitfalls such as…
       • Replication of per-thread resources
       • Extensive size increases of shared resources
       • Complex resource distribution methods amongst threads

  8. WACI Solution: eXtreme Virtual Pipelining (XVP)
     • Provide each thread the illusion that it has all the processor resources to itself
     • Traditionally, simultaneously executing threads have a shared pipeline view
     [Figure: pipeline (F, D, R, EXE) with IQ, RF, ROB, and LSQ, drawn once as a view shared by T0 - TN and again as separate per-thread views for T0, T1, … TN]

  9. WACI Solution: eXtreme Virtual Pipelining (XVP)
     • Pipeline Virtualization: resource entries are mapped into each thread’s address space
     [Figure: entries 0 … 7 of Resource “X” in the CPU map to memory regions for T0, T1, … TN, each addressed as that thread’s base (BaseT0, BaseT1, … BaseTN) plus the entry index 0 … 7]
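
The base-plus-index mapping on this slide can be sketched as simple address arithmetic. This is a minimal illustration, not the authors' design: the entry size, the region base address, the contiguous per-thread layout, and the names `ThreadContext`, `make_contexts`, and `entry_address` are all assumptions made for the example. Only the eight entries per resource (indices 0 … 7) come from the slide.

```python
from dataclasses import dataclass

ENTRY_BYTES = 8            # assumed size of one resource entry
ENTRIES_PER_RESOURCE = 8   # indices 0 … 7, as drawn on the slide

@dataclass(frozen=True)
class ThreadContext:
    thread_id: int
    base: int  # hypothetical base address of this thread's copy of resource "X"

def make_contexts(n_threads, region_base=0x1000):
    """Give each thread its own non-overlapping region (assumed contiguous layout)."""
    stride = ENTRIES_PER_RESOURCE * ENTRY_BYTES
    return [ThreadContext(t, region_base + t * stride) for t in range(n_threads)]

def entry_address(ctx, index):
    """Memory address backing entry `index` of resource "X" for this thread."""
    if not 0 <= index < ENTRIES_PER_RESOURCE:
        raise IndexError(f"entry index {index} out of range")
    return ctx.base + index * ENTRY_BYTES

contexts = make_contexts(3)
addr = entry_address(contexts[1], 0)  # first entry of T1's copy
```

Because every thread sees the same indices 0 … 7 relative to its own base, each thread can act as if the whole resource were private.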

  10. WACI Solution: eXtreme Virtual Pipelining (XVP)
      • XVP extends the notion of a hardware context to include pipeline resources
      • Add a context cache (C-Cache) to avoid D-Cache thrashing and potentially reduce the memory footprint of workloads
      • Each stallable resource is matched with its own “on-demand” Fill-Spill Unit (FSU)
        • Ex: spill the IQ on a dependent load miss; fill when the miss resolves
      • The FSU allows resources to dynamically partition themselves for arbitrary workloads
      • Virtualize all stalling processor resources to memory
        • Fetch Buffer, Instruction Queue, Load/Store Queue, Register File, Reorder Buffer
      [Figure: pipeline (F, D, R, EXE) with IQ, RF, ROB, and LSQ, with FSUs attached to resources and connected to the C-Cache]
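
The spill-on-stall, fill-on-resolve behavior described for the FSU can be illustrated with a toy software model. This is a sketch under stated assumptions, not the hardware design: the class name `FillSpillUnit`, its methods, and the use of a Python dict as the backing store (standing in for the C-Cache or memory) are all hypothetical.

```python
class FillSpillUnit:
    """Toy model of one shared resource (e.g. an IQ) with on-demand spill/fill."""

    def __init__(self, capacity):
        self.capacity = capacity  # physical entries shared by all threads
        self.resident = {}        # thread id -> entries currently in hardware
        self.spilled = {}         # thread id -> entries saved to the backing store

    def used(self):
        return sum(len(entries) for entries in self.resident.values())

    def insert(self, tid, entry):
        """Allocate an entry for a thread; fails when the resource is full."""
        if self.used() >= self.capacity:
            return False
        self.resident.setdefault(tid, []).append(entry)
        return True

    def spill(self, tid):
        """On a stall (e.g. a dependent load miss), move the thread's entries out."""
        entries = self.resident.pop(tid, [])
        self.spilled.setdefault(tid, []).extend(entries)
        return len(entries)

    def fill(self, tid):
        """When the stall resolves, bring back as many entries as currently fit."""
        pending = self.spilled.get(tid, [])
        count = min(self.capacity - self.used(), len(pending))
        self.resident.setdefault(tid, []).extend(pending[:count])
        self.spilled[tid] = pending[count:]
        return count

fsu = FillSpillUnit(capacity=4)
for op in ("a", "b", "c"):
    fsu.insert(0, op)   # thread 0 occupies three entries
fsu.spill(0)            # thread 0 stalls; its entries leave the hardware
fsu.insert(1, "x")      # thread 1 can now use the freed space
```

In this model a stalled thread stops occupying physical entries, so the running threads effectively repartition the resource among themselves, which is the dynamic partitioning the slide attributes to the FSU.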

  11. WACI Conclusion: eXtreme Virtual Pipelining (XVP)
      • A high number of threads per core opens up interesting multithreading research angles
      • XVP’s pipeline virtualization moves toward scalable many-threads-per-core designs
        • Each thread has the illusion that it has its own pipeline
      • XVP can also benefit single-thread processors…
        • Because XVP’s virtualization provides more resources than are traditionally available.

  12. Thanks for Listening!
