
Lect 16: CASE STUDY--UltraSPARC


qiana

Presentation Transcript


  1. Lect 16: CASE STUDY--UltraSPARC

  2. UltraSPARC II™ Microprocessor
     • Features
       • Full 64-bit implementation of SPARC V9 architecture
       • 4-way superscalar, in-order dispatch, out-of-order completion
       • 100% binary compatibility with previous versions of SPARC systems
       • Built-in MP support (glueless 4-way and up to 64-way)
       • Performance scalability (frequency range: 250-480 MHz; L2 cache support: 256 KB-16 MB)
       • VIS multimedia accelerating instructions
       • Error Checking & Correction (ECC) and parity
       • State-of-the-art 0.25 micron technology and packaging
       • Superscalar/superpipelined high-performance micro-architecture

  3. Specification
     • Transistors: 5.4M
     • Process technology: 0.25 micron, 5 metal layers
     • Die size: 126 mm²
     • Frequency range: 250-480 MHz
     • Core voltage: 1.9 V
     • Power dissipation: 21 W @ 400 MHz
     • On-chip instruction cache: 16 KB
     • On-chip data cache: 16 KB
     • L2 cache size: 256 KB-16 MB
     • Max. bandwidth to L2 cache: 5333 MB/sec
     • Max. bandwidth to memory: 1.92 GB/sec @ 120 MHz UPA
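The memory bandwidth figure can be cross-checked with simple arithmetic. The sketch below divides the quoted peak bandwidth by the UPA clock to get the bytes moved per bus cycle. The resulting 128-bit data path is an inference from the two numbers on the slide, not something the slide states, and 1 GB is assumed to mean 10^9 bytes.

```python
# Cross-check of the peak memory bandwidth quoted on the Specification slide.
UPA_CLOCK_HZ = 120_000_000     # 120 MHz UPA clock (from the slide)
PEAK_BYTES_PER_SEC = 1.92e9    # 1.92 GB/sec (from the slide; 1 GB = 1e9 bytes assumed)


def bytes_per_cycle(bandwidth_bytes_per_sec: float, clock_hz: float) -> float:
    """Bytes transferred per bus clock at peak bandwidth."""
    return bandwidth_bytes_per_sec / clock_hz


width_bytes = bytes_per_cycle(PEAK_BYTES_PER_SEC, UPA_CLOCK_HZ)
print(f"{width_bytes:.0f} bytes per UPA cycle, i.e. a {width_bytes * 8:.0f}-bit data path")
```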

  4. • Outstanding memory requests: 3 loads, 2 stores
     • Software data prefetch: yes
     • Integer execution units: 4
     • Floating-point execution units: 3
     • Graphics execution unit: 1
     • System performance (est.): 19.6 (SPECint95) and 27.1 (SPECfp95) @ 450 MHz
     • System I/O voltage: 3.3 V

  5. Architecture

  6. Architecture
     • Prefetch, branch prediction and dispatch unit (PDU)
       • Up to 12 instructions
     • 16-Kilobyte instruction cache (I-Cache)
       • 2-way set associative, 32-byte blocks
     • Memory management unit (MMU) containing two 64-entry buffers
       • a 64-entry instruction translation lookaside buffer (iTLB)
       • a 64-entry data translation lookaside buffer (dTLB)
       • 44-bit virtual address and 41-bit physical address
       • Page sizes: 8 KB, 64 KB, 512 KB, and 4 MB
     • Integer execution unit (IEU) with two arithmetic logic units (ALUs)
       • Eight register windows
       • Four sets of global registers (normal, alternate, MMU, and INT)
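To make the MMU numbers concrete, here is a minimal sketch of how a 44-bit virtual address and a 41-bit physical address split into page number and page offset for each of the four page sizes. The function names (`split_bits`, `translate`) are hypothetical, and the translation step is a simplified model of what a TLB hit provides, not the actual hardware lookup.

```python
# Page-number / offset split for the address widths and page sizes on the slide.
VA_BITS = 44   # virtual address width (from the slide)
PA_BITS = 41   # physical address width (from the slide)
PAGE_SIZES = {
    "8 KB": 8 * 1024,
    "64 KB": 64 * 1024,
    "512 KB": 512 * 1024,
    "4 MB": 4 * 1024 * 1024,
}


def split_bits(page_bytes: int, addr_bits: int) -> tuple[int, int]:
    """Return (page_number_bits, offset_bits) for a power-of-two page size."""
    offset_bits = page_bytes.bit_length() - 1   # log2 of the page size
    return addr_bits - offset_bits, offset_bits


def translate(va: int, ppn: int, page_bytes: int) -> int:
    """Simplified translation: keep the page offset, swap in the physical page number."""
    offset = va & (page_bytes - 1)
    return ppn * page_bytes + offset


for name, size in PAGE_SIZES.items():
    vpn_bits, off_bits = split_bits(size, VA_BITS)
    ppn_bits, _ = split_bits(size, PA_BITS)
    print(f"{name}: offset {off_bits} bits, VPN {vpn_bits} bits, PPN {ppn_bits} bits")
```

Larger pages leave fewer bits to translate, so each of the 64 TLB entries covers more memory.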

  7. • Load and store unit with a separate address generation adder
       • Load buffer and store buffer decoupling data accesses from the pipeline
     • 16-Kilobyte data cache (D-Cache)
       • Direct-mapped, two 16-byte sub-blocks per line
     • Floating-point unit (FPU) with independent add, multiply and divide/square root sub-units
     • Graphics unit (GRU) composed of two independent execution pipelines
     • External cache (E-Cache) control unit
     • Memory interface unit
       • Responsible for main memory and I/O accesses
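The D-cache parameters determine how an address is split into tag, line index, and byte offset. The sketch below assumes that "two 16-byte sub-blocks per line" means a 32-byte line; `cache_geometry` and `dcache_fields` are illustrative names, and whether the index comes from the virtual or physical address is deliberately left out.

```python
# Address decomposition for a 16 KB direct-mapped cache with 32-byte lines
# (two 16-byte sub-blocks per line, as listed on the slide).
def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> tuple[int, int, int]:
    """Return (number_of_sets, index_bits, offset_bits)."""
    sets = size_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, index_bits, offset_bits


def dcache_fields(addr: int) -> tuple[int, int, int, int]:
    """Split an address into (tag, line index, sub-block number, byte offset in line)."""
    sets, index_bits, offset_bits = cache_geometry(16 * 1024, 32, ways=1)
    offset = addr & (32 - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    sub_block = offset >> 4            # 0 or 1: which 16-byte half of the line
    return tag, index, sub_block, offset


print(cache_geometry(16 * 1024, 32, ways=1))   # D-cache geometry
print(dcache_fields(0x4030))
```

Calling the same helper with `ways=2` gives the geometry of the 2-way, 32-byte-block I-cache from the previous slide: half as many sets, one index bit fewer.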

  8. Processor Pipeline
     • Integer pipeline
       • F (Fetch): fetches instructions from the I-cache
       • D (Decode): decodes instructions and sends them to the instruction buffer
       • G (Group): groups and dispatches up to 4 instructions; accesses the register file
       • E (Execute): executes integer instructions; calculates the virtual address
       • C (Cache access): accesses the D-cache and TLB; resolves branches
       • N1: determines D-cache hit or miss; deferred loads enter the load buffer
       • N2: integer pipeline waits for the FP/graphics pipeline
       • N3: resolves traps
       • W (Write): writes all results to the register files
     • Floating-point/graphics pipeline
       • R (Register): further decodes FP/graphics instructions; accesses the register file
       • X1: execution starts
       • X2: execution continues
       • X3: execution finishes
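The stage names on this slide can be laid out cycle by cycle to see why the integer pipeline has the N1 and N2 stages. The sketch below assumes that the floating-point/graphics stages R, X1, X2, X3 run in parallel with the integer stages E, C, N1, N2 and that both pipelines share F, D, G, N3, and W. That alignment is an assumption drawn from the slide's remark that the integer pipeline waits for the FP/graphics pipeline in N2, not something the slide spells out.

```python
# Side-by-side view of the two pipelines, one column per cycle.
# Assumed alignment: R/X1/X2/X3 overlap E/C/N1/N2; both share F, D, G, N3, W.
INTEGER_PIPE = ["F", "D", "G", "E", "C", "N1", "N2", "N3", "W"]
FP_GRAPHICS_PIPE = ["F", "D", "G", "R", "X1", "X2", "X3", "N3", "W"]


def timeline(stages: list[str], issue_cycle: int = 0) -> dict[int, str]:
    """Map cycle number to the stage an instruction occupies, assuming no stalls."""
    return {issue_cycle + i: stage for i, stage in enumerate(stages)}


def render(stages: list[str]) -> str:
    """Format one pipeline as fixed-width columns."""
    return " ".join(f"{s:<3}" for s in stages)


print("cycle   ", render([str(c) for c in range(len(INTEGER_PIPE))]))
print("integer ", render(INTEGER_PIPE))
print("FP/graph", render(FP_GRAPHICS_PIPE))
```

Under this assumed alignment both pipelines are nine stages deep, so integer and FP/graphics results reach the W stage in the same cycle.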

  9. VIS Instruction Set
     • What is the VIS instruction set?
       • Accelerates multimedia, image processing, and networking applications by 2x-7x, performing up to 10 operations per cycle
       • Encoded as IMPDEP1 instructions (op3 = 110110)
       • Only 3 percent of the chip real estate is devoted to the graphics instructions
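The IMPDEP1 encoding can be illustrated with a small decoder. The sketch below assumes the SPARC V9 three-register instruction layout (op in bits 31:30, rd in 29:25, op3 in 24:19, rs1 in 18:14, opf in 13:5, rs2 in 4:0). The helper names and the opf value used in the example are placeholders, not real VIS opcodes.

```python
# Recognising an IMPDEP1 instruction word from its op and op3 fields.
IMPDEP1_OP3 = 0b110110   # op3 value from the slide


def fields(word: int) -> dict[str, int]:
    """Extract the fields of a 32-bit instruction word (assumed V9 format 3 layout)."""
    return {
        "op": (word >> 30) & 0x3,
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "opf": (word >> 5) & 0x1FF,
        "rs2": word & 0x1F,
    }


def encode(op: int, rd: int, op3: int, rs1: int, opf: int, rs2: int) -> int:
    """Inverse of fields(): assemble a 32-bit instruction word."""
    return (op << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | (opf << 5) | rs2


def is_impdep1(word: int) -> bool:
    f = fields(word)
    return f["op"] == 0b10 and f["op3"] == IMPDEP1_OP3


# opf=0x001 is an arbitrary placeholder for illustration only.
example = encode(op=0b10, rd=4, op3=IMPDEP1_OP3, rs1=0, opf=0x001, rs2=2)
print(hex(example), is_impdep1(example))
```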

  10. Floating-point/graphics unit(FGU)

  11. VIS
      • Data formats tailored for graphics
        • Image components: 8 or 16 bits
        • 8-bit format: four unsigned 8-bit integers in a 32-bit word
        • 16-bit format: four 16-bit signed fixed-point values in a 64-bit word
        • Intermediate results: 16 or 32 bits
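The two packed formats can be modelled with plain integers. The sketch below packs four 8-bit pixels into a 32-bit word and four signed 16-bit values into a 64-bit word. Placing the first element in the most significant position is an assumption made for readability, and the function names are illustrative.

```python
# Packing and unpacking the two VIS data formats using Python integers.
def pack_pixels8(pixels: list[int]) -> int:
    """Four unsigned 8-bit components -> one 32-bit word (first element most significant)."""
    word = 0
    for p in pixels:
        word = (word << 8) | (p & 0xFF)
    return word


def unpack_pixels8(word: int) -> list[int]:
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def pack_fixed16(values: list[int]) -> int:
    """Four signed 16-bit values -> one 64-bit word (two's complement lanes)."""
    word = 0
    for v in values:
        word = (word << 16) | (v & 0xFFFF)
    return word


def unpack_fixed16(word: int) -> list[int]:
    lanes = []
    for shift in (48, 32, 16, 0):
        lane = (word >> shift) & 0xFFFF
        lanes.append(lane - 0x10000 if lane >= 0x8000 else lane)   # restore the sign
    return lanes


print(hex(pack_pixels8([0x11, 0x22, 0x33, 0x44])))
print(unpack_fixed16(pack_fixed16([1, -1, 2, -2])))
```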

  12. UltraSPARC VIS Set
      • Pixel expand
      • Pixel packing
      • Partitioned add
      • Partitioned multiply
      • Partitioned compare
      • Align
      • Edge handling
      • Array addressing
      • Merge
      • Pixel distance
      • Logical
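Two entries in this list, partitioned add and pixel distance, are easy to model in software. The sketch below emulates a four-lane 16-bit partitioned add with wraparound in each lane, and a pixel distance that sums the absolute differences of eight 8-bit components. It is a behavioural illustration under those assumptions, with hypothetical function names, not a cycle-accurate model of the hardware.

```python
# Software models of a partitioned add and a pixel-distance operation.
MASK16 = 0xFFFF


def partitioned_add16(a: int, b: int) -> int:
    """Add four independent 16-bit lanes of two 64-bit words; carries do not cross lanes."""
    result = 0
    for shift in (48, 32, 16, 0):
        lane = ((a >> shift) + (b >> shift)) & MASK16   # keep only this lane's 16 bits
        result |= lane << shift
    return result


def pixel_distance(a: int, b: int, accumulator: int = 0) -> int:
    """Sum of absolute differences of eight 8-bit components, added to an accumulator."""
    total = accumulator
    for shift in range(0, 64, 8):
        total += abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF))
    return total


print(hex(partitioned_add16(0x0001_FFFF_0002_8000, 0x0001_0001_0003_8000)))
print(pixel_distance(0x0102030405060708, 0x0807060504030201))
```

One partitioned add replaces four scalar additions plus the masking needed to keep them separate, which is where much of the quoted speedup on pixel data comes from.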

  13. Pixel Expand • Pixel Pack
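Pixel expand and pixel pack convert between the 8-bit pixel format and the 16-bit fixed-point format. The sketch below is a simplified model: expand widens each 8-bit component to 16 bits with a left shift of 4, and pack shifts back and clamps to the range 0..255. The `shift` parameter is a hypothetical stand-in for the scale factor the real instruction takes from processor state, and rounding behaviour is not modelled.

```python
# Simplified models of pixel expand (8-bit -> 16-bit fixed point) and pixel pack (the reverse).
def pixel_expand(word32: int) -> int:
    """Widen four unsigned 8-bit components into four 16-bit fixed-point lanes."""
    out = 0
    for s in (24, 16, 8, 0):
        component = (word32 >> s) & 0xFF
        out = (out << 16) | (component << 4)   # leave headroom above and fraction bits below
    return out


def pixel_pack(word64: int, shift: int = 4) -> int:
    """Narrow four signed 16-bit lanes to 8-bit components, clamping to 0..255.

    `shift` is an illustrative parameter, not the hardware's scale-factor encoding.
    """
    out = 0
    for s in (48, 32, 16, 0):
        lane = (word64 >> s) & 0xFFFF
        if lane >= 0x8000:                      # interpret the lane as signed
            lane -= 0x10000
        component = max(0, min(255, lane >> shift))
        out = (out << 8) | component
    return out


expanded = pixel_expand(0x10FF0080)
print(hex(expanded), hex(pixel_pack(expanded)))
```

The clamp in `pixel_pack` is what lets intermediate 16-bit arithmetic overshoot or go negative without wrapping around in the final 8-bit pixel.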

  14. • Align instruction
        • Allows the processor to access pixels that are in the middle of a 64-bit word
      • Edge instruction
        • Masks off all the unused pixels
      • Array operation
        • Converts 3-D fixed-point addresses into a blocked-byte address
        • Typically takes 24 instructions; a single instruction on UltraSPARC
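The align and edge operations can be sketched as follows. `align_data` extracts eight bytes starting at a byte offset from two adjacent 64-bit words, and `edge_mask8` builds a byte mask for the first word of a pixel run so that bytes outside the run are ignored. Both are simplified models with hypothetical names and argument conventions (mask bit 7 stands for byte 0 of the word). The array addressing operation is not modelled here.

```python
# Simplified models of data alignment and edge masking on 8-byte words.
MASK64 = (1 << 64) - 1


def align_data(first: int, second: int, offset: int) -> int:
    """Return 8 bytes starting `offset` bytes (0..7) into the 16-byte pair first:second."""
    combined = (first << 64) | second
    shift = (8 - offset) * 8
    return (combined >> shift) & MASK64


def edge_mask8(start_addr: int, end_addr: int) -> int:
    """Byte mask for the 8-byte word containing start_addr (bit 7 = byte 0 of the word).

    Bytes before start_addr are masked off; if end_addr falls in the same word,
    bytes after end_addr are masked off as well.
    """
    mask = 0xFF >> (start_addr & 7)
    if (start_addr >> 3) == (end_addr >> 3):
        mask &= (0xFF << (7 - (end_addr & 7))) & 0xFF
    return mask


print(hex(align_data(0x0001020304050607, 0x08090A0B0C0D0E0F, 3)))
print(bin(edge_mask8(0x1003, 0x1005)))
```

Together these allow a loop to process a misaligned pixel row in full 8-byte steps, with the mask restricting which bytes of the first and last word are written.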

  15. UltraSPARC III and the Future
      • UltraSPARC-III
        • 600 MHz
        • MP systems with over 1,000 processors
        • 0.18 micron process technology
      • Future directions
        • UltraSPARC IV and V
        • 0.07 micron technology
        • 1 GHz and 1.5 GHz
        • Odd-numbered generations: new architecture pipelines
        • Even-numbered generations: upgraded process technology

  16. Road Map
