UltraSparc IV Tolga TOLGAY
OUTLINE • Introduction • History • What is new? • Chip Multithreading • Pipeline • Cache • Branch Prediction • Conclusion
INTRODUCTION • Sparc = Scalable Processor Architecture • Open processor architecture • Sun UltraSparc implements Sparc V9: • RISC architecture • 64-bit address and data • Superscalar
HISTORY • Sparc development begins – 1984 • First Sparc processor – 1986 • SuperSparc – 1992 • UltraSparc I – 1995 • UltraSparc II – 1997 • UltraSparc III – 2001 • UltraSparc IV – 2004 • UltraSparc IV+ – 2005 • UltraSparc T1 – 2005
WHAT IS NEW? • New in UltraSparc IV: • CMT (Chip Multithreading) • New registers added for the CMT enhancement • MCU registers and Sun Fireplane Interconnect registers are shared. • Enhancements to the Floating Point Unit • 16 MB L2 cache with 128-byte line size, shared by two processors. • L2 cache uses an LRU replacement strategy • New write-cache index hashing feature
Chip Multithreading (CMT) • Two UltraSparc III cores on one die. • The two mirrored cores share: • System bus • DRAM controller • Off-die L2 cache • Fireplane registers. • Also called Chip Multiprocessing
Chip Multithreading • Aim is to increase performance without increasing clock speed. • Mirroring the cores causes a hot spot at the floating point units. • How to avoid the hot spot: • Heat towers in the copper interconnect
Core • More core improvements: • Improved instruction fetch and store bandwidth. • Improved data prefetching. • FPU handles more unexpected and underflow cases, which reduces exceptions. • On-die cache enhanced with a hashed index to better handle multiple writes.
Pipeline • Because UltraSparc IV contains two UltraSparc III cores, it uses the same pipeline. • 4-way superscalar architecture. • 14-stage pipeline.
Pipeline Stages • Stage A : Address Generation • Generates and selects the fetch address • Address can be selected from several sources • Stage P : Preliminary Fetch • Starts fetching from the I-Cache • Accesses the branch predictor • Stage F : Fetch • Second half of the I-Cache access • At the end of the stage, up to 4 instructions may be latched • Stage B : Branch Target Computation • Analyzes the instructions • Calculates the branch target address
Pipeline Stages • Stage I : Instruction Group Formation • Instructions are grouped into the instruction queue. • Stage J : Instruction Group Staging • A group of instructions is dequeued and sent to the R-stage • Stage R : Dispatch and Register Access • Dependency calculation • Dependency resolution
Pipeline Stages • Stage E : Integer Instruction Execution • First stage of execution pipelines • Integer instructions -> A0 and A1 pipelines • Branch instructions -> Branch pipeline • Other instructions -> MS pipeline • Stage C : Cache • Integer pipelines write results back • SIU results are produced • First stage for Floating Point Instructions
Pipeline Stages • Stage M : Miss • Data cache misses are determined • Second step for FP instructions • Stage W : Write • MS pipeline results are written • Third step for FP instructions • D-cache miss requests are sent to the L2 cache • Stage X : Extend • Final step for Floating Point instructions • Results from FP instructions are ready for bypass
Pipeline Stages • Stage T : Trap • Traps are signalled • After a trap, instructions invalidate their results • Stage D : Done • Integer results are written into the architectural register file • Floating point results are written to the floating point register file. • Results become visible to any traps generated from younger instructions.
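The stage list above can be written down as a small table to check that it really adds up to the 14 stages quoted earlier. This is only a sketch: the names `Stage`, `PIPELINE`, `FP_STAGES`, and `timeline` are invented for illustration, and `timeline` assumes an idealized flow of one stage per cycle with no stalls, which the slides do not claim.

```python
from enum import Enum


class Stage(Enum):
    """The pipeline stages, in the order the slides list them."""
    A = "Address Generation"
    P = "Preliminary Fetch"
    F = "Fetch"
    B = "Branch Target Computation"
    I = "Instruction Group Formation"
    J = "Instruction Group Staging"
    R = "Dispatch and Register Access"
    E = "Integer Instruction Execution"
    C = "Cache"
    M = "Miss"
    W = "Write"
    X = "Extend"
    T = "Trap"
    D = "Done"


# Enum members iterate in definition order, so this is A, P, F, ... D.
PIPELINE = list(Stage)

# Per the slides, floating-point instructions execute in C, M, W and X.
FP_STAGES = [Stage.C, Stage.M, Stage.W, Stage.X]


def timeline(start_cycle: int) -> dict:
    """Map each stage to a cycle number, assuming one stage per cycle
    and no stalls (an idealization made for this sketch)."""
    return {stage: start_cycle + i for i, stage in enumerate(PIPELINE)}


if __name__ == "__main__":
    for stage, cycle in timeline(0).items():
        print(f"cycle {cycle:2d}: {stage.name} ({stage.value})")
```

Under that idealization, an instruction whose fetch address is generated in cycle 0 reaches the D stage in cycle 13.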
Pipeline Rules • Grouping rules : • Group : a collection of instructions that do not prevent each other from executing in parallel • Groups are formed before the R-stage • Needed because : • The execution order must be maintained • Each pipeline runs a subset of instructions • Instructions may require helpers • Execution order : in-order execution
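A toy model can make the grouping idea concrete. The sketch below is a deliberate simplification, not the real UltraSparc grouping logic: it models only the four pipelines named on the earlier slide (A0 and A1 for integer, one branch, one MS), treats a register dependency inside a group as the only hazard, and ignores helpers. `Instr`, `SLOTS`, and `form_groups` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    kind: str                    # "int", "branch", or "other" (MS pipeline)
    dest: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


# Assumed slots per group: two integer pipelines (A0, A1),
# one branch pipeline, one MS pipeline.
SLOTS = {"int": 2, "branch": 1, "other": 1}
MAX_GROUP = 4  # 4-way superscalar


def form_groups(program):
    """Split an in-order instruction stream into issue groups.

    A group is closed as soon as the next instruction cannot join it:
    no free pipeline of its kind, a dependency on a register written
    earlier in the same group, or the group is already full. Later
    instructions are never hoisted over it, so order is preserved.
    """
    groups, current, used, written = [], [], {}, set()
    for ins in program:
        no_slot = used.get(ins.kind, 0) >= SLOTS[ins.kind]
        depends = any(s in written for s in ins.srcs) or ins.dest in written
        if current and (len(current) >= MAX_GROUP or no_slot or depends):
            groups.append(current)
            current, used, written = [], {}, set()
        current.append(ins)
        used[ins.kind] = used.get(ins.kind, 0) + 1
        if ins.dest is not None:
            written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        Instr("add", "int", "r1", ("r2", "r3")),
        Instr("sub", "int", "r4", ("r5", "r6")),
        Instr("ld", "other", "r7", ("r8",)),
        Instr("add2", "int", "r9", ("r1", "r4")),  # needs r1 and r4
        Instr("br", "branch"),
    ]
    for g in form_groups(program):
        print([i.name for i in g])
```

In this example `add2` starts a second group because it reads registers written in the first one, and `br` joins it because the branch pipeline is free.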
Cache Organization • Total L1 cache size is doubled because there are two cores. • Data Cache : 64 KB x 2 • Instruction Cache : 32 KB x 2 • L2 Cache : 16 MB, off-chip, shared • No L3 Cache
Cache Organization • Data Cache • 64 KB Level 1 cache per core • Instruction Cache • 32 KB Level 1 cache per core • 4-way set associative
Cache Organization • Prefetch Cache • One of the L1 caches • 2 Kbyte SRAM : 32 lines x 64 bytes • Uses an LRU replacement algorithm • Aim is to fetch data before it is needed • Reduces main memory access latency • 2 read ports of 8 bytes each and 1 write port of 16 bytes per cycle. • Hardware prefetch
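The prefetch cache numbers (32 lines of 64 bytes, LRU replacement) can be illustrated with a small model. This is a sketch under a loud assumption: the cache is modeled as fully associative so that LRU applies across all 32 lines, whereas the slides do not state the associativity. `PrefetchCacheModel` and its methods are hypothetical names.

```python
from collections import OrderedDict

LINE_BYTES = 64
NUM_LINES = 32
assert LINE_BYTES * NUM_LINES == 2 * 1024  # 2 Kbyte, as on the slide


class PrefetchCacheModel:
    """Toy LRU model of a small prefetch cache.

    Assumption: fully associative. The real organization is not
    given in the slides.
    """

    def __init__(self, num_lines=NUM_LINES, line_bytes=LINE_BYTES):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        # Keys are line numbers; least recently used comes first.
        self.lines = OrderedDict()

    def _line(self, addr):
        return addr // self.line_bytes

    def prefetch(self, addr):
        """Bring the line containing addr in, evicting the LRU line if full."""
        line = self._line(addr)
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # drop least recently used
        self.lines[line] = True

    def load(self, addr):
        """Return True on a hit; a hit marks the line most recently used."""
        line = self._line(addr)
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        return False


if __name__ == "__main__":
    cache = PrefetchCacheModel()
    for i in range(NUM_LINES):
        cache.prefetch(i * LINE_BYTES)
    cache.load(0)                           # touch line 0
    cache.prefetch(NUM_LINES * LINE_BYTES)  # forces one eviction
    print(cache.load(0), cache.load(LINE_BYTES))
```

Touching line 0 before the extra prefetch means line 1 becomes the eviction victim, which is the behaviour LRU is meant to give.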
Cache Organization • Write Cache • Reduces the bandwidth consumed by store traffic • 2 Kbyte cache • Handles multiprocessor and on-chip cache consistency • Improves error recovery • Optionally uses a hashed index
Cache Organization • L2 Cache • 16 MB SRAM shared by two processors • Separate L2 cache tags • Two-way set associative • LRU replacement policy • 128-byte line size • UltraSparc IV+ has an on-die Level 2 cache with an off-die Level 3 cache
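The L2 parameters above (16 MB, two-way set associative, 128-byte lines) fix the number of sets and how an address would split into tag, index, and offset. The sketch below is plain arithmetic; it assumes 16 MB means 16 x 2^20 bytes and that the index is taken directly from the address bits above the offset, which the slides do not confirm. `cache_geometry` and `split_address` are hypothetical helper names.

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Derive line count, set count, and bit widths for a set-associative cache.

    Assumes size, line size, and set count are all powers of two.
    """
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


def split_address(addr, geometry):
    """Split an address into (tag, set index, byte offset)."""
    offset_bits = geometry["offset_bits"]
    index_bits = geometry["index_bits"]
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    l2 = cache_geometry(16 * 2**20, 128, 2)
    print(l2)
    print(split_address(0x12345678, l2))
```

With these inputs the cache has 131072 lines in 65536 sets, so an address uses 7 offset bits and 16 index bits.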
Branch Prediction • Branch Predictor : • Small SRAM accessed in a single cycle • Output is connected to the P-stage • Branch determination is made in the B-stage • On a misprediction, fetch returns to the A-stage.
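The slides do not say which prediction scheme the SRAM implements, so the sketch below uses a generic textbook technique, a table of 2-bit saturating counters indexed by the branch address, purely to show the predict-then-resolve flow. The table size of 1024 entries and the names `TwoBitPredictor`, `predict`, and `resolve` are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Generic 2-bit saturating-counter branch predictor.

    This is a textbook scheme, not the documented UltraSparc IV design.
    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=1024):  # table size is an assumption
        self.entries = entries
        self.counters = [1] * entries  # start weakly not taken

    def _index(self, pc):
        # Sparc instructions are 4 bytes, so drop the low two bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Table lookup, standing in for the access made from the P-stage."""
        return self.counters[self._index(pc)] >= 2

    def resolve(self, pc, taken):
        """Update with the real outcome, standing in for the later resolution.

        Returns True if the prediction was correct. A False result
        corresponds to the front end restarting at the A-stage.
        """
        i = self._index(pc)
        predicted = self.counters[i] >= 2
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        return predicted == taken


if __name__ == "__main__":
    bp = TwoBitPredictor()
    pc = 0x1000
    for outcome in [True, True, False, True]:
        guess = bp.predict(pc)
        ok = bp.resolve(pc, outcome)
        print(f"predicted={guess} actual={outcome} correct={ok}")
```

The two-bit counter means a single not-taken outcome in a mostly taken loop branch does not flip the prediction, which keeps restarts at the A-stage rare for such branches.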
Conclusion • UltraSparc IV is a milestone, as it is the first dual-core chip of the UltraSparc family • Sun continues to develop UltraSparc : • UltraSparc IV+ • UltraSparc T1