Lect 18: CASE STUDY: UltraSPARC
UltraSPARC II™ Microprocessor
Features
• Full 64-bit implementation of the SPARC V9 architecture
• 4-way superscalar: in-order dispatch, out-of-order completion
• 100% binary compatibility with previous SPARC systems
• Built-in multiprocessor (MP) support: glueless 4-way, scalable up to 64-way
• Performance scalability: 250-480 MHz clock range; 256 KB-16 MB L2 cache
• VIS multimedia-acceleration instructions
• Error checking and correction (ECC) and parity
• State-of-the-art 0.25-micron process technology and packaging
• Superscalar, superpipelined high-performance microarchitecture
Specification
• Transistors: 5.4 million
• Process technology: 0.25 micron, 5 metal layers
• Die size: 126 mm²
• Frequency range: 250-480 MHz
• Core voltage: 1.9 V
• Power dissipation: 21 W @ 400 MHz
• On-chip instruction cache: 16 KB
• On-chip data cache: 16 KB
• L2 cache size: 256 KB-16 MB
• Max. bandwidth to L2 cache: 5333 MB/s
• Max. bandwidth to memory: 1.92 GB/s @ 120 MHz UPA
• Outstanding memory requests: 3 loads, 2 stores
• Software data prefetch: yes
• Integer execution units: 4
• Floating-point execution units: 3
• Graphics execution units: 1
• System performance (est.): 19.6 SPECint95 and 27.1 SPECfp95 @ 450 MHz
• System I/O voltage: 3.3 V
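The two bandwidth figures are consistent with simple width-times-clock arithmetic. The sketch below assumes that both the UPA data path and the L2 (E-cache) data path are 16 bytes (128 bits) wide and move one full-width transfer per clock; that width, and the names `peak_bandwidth_mb_s` and `DATA_PATH_BYTES`, are illustrative assumptions rather than values taken from the slide.

```python
# Peak bandwidth of a bus that moves one full-width transfer per clock:
#   bandwidth (MB/s) = width (bytes) x clock (MHz)
# Assumption: the UPA and E-cache data paths are each 16 bytes (128 bits) wide.

def peak_bandwidth_mb_s(width_bytes: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s, assuming one full-width transfer per clock."""
    return width_bytes * clock_mhz

DATA_PATH_BYTES = 16  # assumed 128-bit data path (hypothetical constant)

# Memory side: 16 bytes per transfer at the 120 MHz UPA clock.
upa_mb_s = peak_bandwidth_mb_s(DATA_PATH_BYTES, 120)
print(f"UPA @ 120 MHz: {upa_mb_s:.0f} MB/s ({upa_mb_s / 1000:.2f} GB/s)")

# L2 side: work backwards to the clock a 16-byte path needs for the quoted 5333 MB/s.
implied_ecache_mhz = 5333 / DATA_PATH_BYTES
print(f"Clock implied by 5333 MB/s over 16 bytes: {implied_ecache_mhz:.1f} MHz")
```

Under the 16-byte assumption, 1.92 GB/s at 120 MHz falls out exactly, and 5333 MB/s corresponds to an L2 transfer rate of roughly 333 MHz.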
Architecture
• Prefetch, branch prediction and dispatch unit (PDU)
  • Instruction buffer holding up to 12 instructions
• 16 KB instruction cache (I-Cache)
  • 2-way set associative, 32-byte blocks
• Memory management unit (MMU) containing two 64-entry TLBs
  • A 64-entry instruction translation lookaside buffer (iTLB)
  • A 64-entry data translation lookaside buffer (dTLB)
  • 44-bit virtual address and 41-bit physical address
  • Page sizes: 8 KB, 64 KB, 512 KB and 4 MB
• Integer execution unit (IEU) with two arithmetic logic units (ALUs)
  • Eight register windows
  • Four sets of global registers (normal, alternate, MMU and interrupt)
• Load and store unit (LSU) with a separate address-generation adder
  • Load buffer and store buffer decoupling data accesses from the pipeline
• 16 KB data cache (D-Cache)
  • Direct-mapped; each line holds two 16-byte sub-blocks
• Floating-point unit (FPU) with independent add, multiply and divide/square-root sub-units
• Graphics unit (GRU) composed of two independent execution pipelines
• External cache (E-Cache) control unit
• Memory interface unit (MIU)
  • Responsible for main memory and I/O accesses
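The cache and MMU parameters above determine how an address is split into fields. The sketch below derives the set count, block-offset bits and index bits for both on-chip caches, and the page-offset and page-number widths for each supported page size. It takes the D-cache line as 32 bytes (two 16-byte sub-blocks); the helper names `log2_exact` and `cache_fields` are hypothetical.

```python
KB = 1024

def log2_exact(n: int) -> int:
    """log2 of a positive power of two, returned as an int."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def cache_fields(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return sets, log2_exact(line_bytes), log2_exact(sets)

# I-cache: 16 KB, 2-way set associative, 32-byte blocks.
print("I-cache (sets, offset bits, index bits):", cache_fields(16 * KB, 2, 32))
# D-cache: 16 KB, direct-mapped (1 way), 32-byte lines made of two 16-byte sub-blocks.
print("D-cache (sets, offset bits, index bits):", cache_fields(16 * KB, 1, 2 * 16))

VA_BITS, PA_BITS = 44, 41
for page in (8 * KB, 64 * KB, 512 * KB, 4 * KB * KB):
    off = log2_exact(page)
    print(f"{page // KB:>5} KB page: {off} offset bits, "
          f"{VA_BITS - off}-bit virtual page number, {PA_BITS - off}-bit physical page number")
```

One consequence of the arithmetic: an I-cache way is 8 KB, so its index plus offset (13 bits) fits inside the smallest page offset, while the 16 KB direct-mapped D-cache needs 14 bits, one more than an 8 KB page offset supplies.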
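Processor Pipeline
• Integer pipeline: F → D → G → E → C → N1 → N2 → N3 → W
• Floating-point/graphics pipeline: F → D → G → R → X1 → X2 → X3 → N3 → W
• Stage functions
  • F (Fetch): fetches instructions from the I-cache
  • D (Decode): decodes instructions and sends them to the instruction buffer
  • G (Group): groups and dispatches up to 4 instructions; accesses the register file
  • E (Execute): executes integer instructions; calculates the virtual address (VA)
  • R (Register): further decodes FP/graphics instructions; accesses the register file
  • C (Cache): accesses the D-cache and TLB; resolves branches
  • X1: FP/graphics execution starts
  • N1: determines D-cache hit or miss; a deferred load enters the load buffer
  • X2: FP/graphics execution continues
  • N2: the integer pipeline waits for the FP/graphics pipeline
  • X3: FP/graphics execution finishes
  • N3: resolves traps
  • W (Write): writes all results to the register files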
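To see how groups flow through the nine stages, here is a toy model. It assumes an ideal pipeline: instructions are split into groups of up to four purely by position, one group enters Fetch per cycle, and there are no stalls. The real grouping logic also checks dependencies and unit availability, so `group_instructions` and `stage_of` are hypothetical simplifications.

```python
INT_STAGES = ["F", "D", "G", "E", "C", "N1", "N2", "N3", "W"]
FP_STAGES = ["F", "D", "G", "R", "X1", "X2", "X3", "N3", "W"]

def group_instructions(instrs, width=4):
    """Naively split an instruction stream into dispatch groups of up to `width`.
    Hypothetical: ignores dependencies and execution-unit limits."""
    return [instrs[i:i + width] for i in range(0, len(instrs), width)]

def stage_of(group_index, cycle, stages=INT_STAGES):
    """Stage occupied at `cycle` by the group that entered F at cycle `group_index`.
    Returns None if the group is not in flight (ideal pipeline, no stalls)."""
    step = cycle - group_index
    return stages[step] if 0 <= step < len(stages) else None

groups = group_instructions([f"i{n}" for n in range(10)])  # 10 instructions -> 3 groups
for cycle in range(len(INT_STAGES) + len(groups) - 1):
    row = [stage_of(g, cycle) or "--" for g in range(len(groups))]
    print(f"cycle {cycle:2}: " + "  ".join(f"{s:>3}" for s in row))
```

In this idealized model each group spends one cycle per stage, so the first results are written in cycle 8 and a new group completes every cycle after that.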
VIS Instruction Set
• What is the VIS (Visual Instruction Set)?
  • A set of instructions designed to accelerate multimedia, image-processing and networking applications by 2x-7x, performing up to 10 operations per cycle
  • Encoded as IMPDEP1 instructions (op3 = 110110)
  • Only 3 percent of the actual chip real estate is devoted to the graphics instructions
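A short sketch of what "IMPDEP1 instructions (op3 = 110110)" means at the bit level. It assumes a format-3 style word with op = 10 in bits 31:30 and op3 in bits 24:19, and assumes the remaining bits are laid out as rd, rs1, a 9-bit opf sub-opcode and rs2, the way FP operate instructions are. The opf value in the example is chosen purely for illustration, and the function names are hypothetical.

```python
OP_FORMAT3 = 0b10        # op field, bits 31:30
OP3_IMPDEP1 = 0b110110   # op3 field, bits 24:19 (value cited for IMPDEP1)

def encode_impdep1(rd: int, rs1: int, opf: int, rs2: int) -> int:
    """Pack register numbers and a 9-bit opf sub-opcode into a 32-bit word.
    Assumed layout: op | rd | op3 | rs1 | opf | rs2."""
    for name, value, limit in (("rd", rd, 32), ("rs1", rs1, 32),
                               ("rs2", rs2, 32), ("opf", opf, 512)):
        if not 0 <= value < limit:
            raise ValueError(f"{name}={value} out of range")
    return ((OP_FORMAT3 << 30) | (rd << 25) | (OP3_IMPDEP1 << 19)
            | (rs1 << 14) | (opf << 5) | rs2)

def is_impdep1(word: int) -> bool:
    """True if the op and op3 fields place the word in the IMPDEP1 opcode space."""
    return ((word >> 30) & 0b11) == OP_FORMAT3 and ((word >> 19) & 0b111111) == OP3_IMPDEP1

def opf_of(word: int) -> int:
    """Extract the 9-bit opf sub-opcode (bits 13:5) under the assumed layout."""
    return (word >> 5) & 0x1FF

w = encode_impdep1(rd=4, rs1=0, opf=0x050, rs2=2)  # opf value for illustration only
print(f"{w:#010x}", is_impdep1(w), hex(opf_of(w)))
```

Because all VIS operations share one op3 value, a decoder can recognize the whole extension with a single field compare and then dispatch on `opf`.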
VIS
• Data formats tailored for graphics
  • Image components: 8 or 16 bits
  • 8-bit format: four unsigned 8-bit integers packed in a 32-bit word
  • 16-bit format: four 16-bit signed fixed-point values packed in a 64-bit word
  • Intermediate results: 16 or 32 bits
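The partitioned formats can be modeled with plain integers. This sketch packs four unsigned 8-bit pixel components into a 32-bit word and four signed 16-bit values into a 64-bit word, and unpacks the 16-bit case. Lane 0 is placed in the most significant bits (matching SPARC's big-endian byte order); that ordering and the helper names are assumptions for illustration.

```python
def pack8(pixels):
    """Pack four unsigned 8-bit components into a 32-bit word (lane 0 = most significant)."""
    if len(pixels) != 4:
        raise ValueError("expected four components")
    word = 0
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"{p} is not an unsigned 8-bit value")
        word = (word << 8) | p
    return word

def pack16(values):
    """Pack four signed 16-bit values into a 64-bit word (two's complement per lane)."""
    if len(values) != 4:
        raise ValueError("expected four values")
    word = 0
    for v in values:
        if not -32768 <= v <= 32767:
            raise ValueError(f"{v} is not a signed 16-bit value")
        word = (word << 16) | (v & 0xFFFF)
    return word

def unpack16(word):
    """Split a 64-bit word back into four signed 16-bit values."""
    lanes = []
    for shift in (48, 32, 16, 0):
        v = (word >> shift) & 0xFFFF
        lanes.append(v - 0x10000 if v & 0x8000 else v)  # sign-extend each lane
    return lanes

print(hex(pack8([1, 2, 3, 4])))
print(unpack16(pack16([1, -1, 32767, -32768])))
```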
UltraSPARC VIS Set
• Pixel expand
• Pixel packing
• Partitioned add
• Partitioned multiply
• Partitioned compare
• Align
• Edge handling
• Array addressing
• Merge
• Pixel distance
• Logical
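Two of these categories are easy to model in software. The sketch below shows a partitioned add over four 16-bit lanes, where carries never cross lane boundaries and each lane wraps modulo 2^16 (modeled here without saturation), and a pixel-distance operation that sums the absolute differences of eight unsigned 8-bit lanes into an accumulator. The function names borrow the VIS mnemonics `fpadd16` and `pdist`, but these are Python models, not the hardware definitions.

```python
MASK16 = 0xFFFF
MASK8 = 0xFF

def fpadd16(a: int, b: int) -> int:
    """Partitioned add: add four 16-bit lanes of two 64-bit words independently.
    Each lane wraps modulo 2**16; no carry propagates into the neighbouring lane."""
    result = 0
    for shift in (48, 32, 16, 0):
        lane = (((a >> shift) & MASK16) + ((b >> shift) & MASK16)) & MASK16
        result |= lane << shift
    return result

def pdist(a: int, b: int, acc: int = 0) -> int:
    """Pixel distance: add sum(|a_i - b_i|) over eight unsigned 8-bit lanes to `acc`."""
    total = acc
    for shift in range(0, 64, 8):
        total += abs(((a >> shift) & MASK8) - ((b >> shift) & MASK8))
    return total

print(hex(fpadd16(0x0001_7FFF_FFFF_0010, 0x0001_0001_0001_0010)))
print(pdist(0, 0x0102030405060708))
```

The accumulator argument mirrors how a sum-of-absolute-differences loop, such as block matching in video motion estimation, chains one 8-pixel comparison after another.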
[Figures: Pixel Expand; Pixel Pack]
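Pixel expand and pixel pack convert between the 8-bit and 16-bit formats. The sketch models expand as zero-extending each 8-bit pixel and shifting it left by 4 bits, and models pack as shifting each signed 16-bit lane left by a scale factor, discarding the low 7 bits, and clipping to 0..255. These semantics follow my reading of the VIS `fexpand`/`fpack16` operations; treat them as assumptions. The `scale` parameter stands in for the scale-factor field the hardware keeps in a status register.

```python
SCALE_MAX = 15  # assumed range of the scale factor

def fexpand(word32: int) -> int:
    """Four unsigned 8-bit pixels (32-bit word) -> four 16-bit fixed-point lanes (64-bit word).
    Each pixel is zero-extended and shifted left by 4 bits."""
    result = 0
    for shift in (24, 16, 8, 0):
        pixel = (word32 >> shift) & 0xFF
        result = (result << 16) | (pixel << 4)
    return result

def fpack16(word64: int, scale: int) -> int:
    """Four signed 16-bit lanes -> four unsigned 8-bit pixels.
    Each lane is shifted left by `scale`, the low 7 bits are dropped, and the
    result is clipped to 0..255 (negative values become 0)."""
    if not 0 <= scale <= SCALE_MAX:
        raise ValueError("scale must be 0..15")
    result = 0
    for shift in (48, 32, 16, 0):
        lane = (word64 >> shift) & 0xFFFF
        if lane & 0x8000:
            lane -= 0x10000  # sign-extend the 16-bit lane
        pixel = min(max((lane << scale) >> 7, 0), 255)
        result = (result << 8) | pixel
    return result

expanded = fexpand(0x01FF0080)
print(hex(expanded), hex(fpack16(expanded, 3)))
```

With a scale of 3, packing undoes expanding: a pixel p becomes p << 4, then ((p << 4) << 3) >> 7 gives p back. Smaller or larger scale factors let intermediate fixed-point results be rescaled as they are packed.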
• Align instruction
  • Allows the processor to access pixels that lie in the middle of a 64-bit word
• Edge instruction
  • Masks off all the unused pixels
• Array operation
  • Converts 3-D fixed-point addresses into a blocked-byte address
  • The same address calculation typically takes 24 instructions; UltraSPARC does it in one instruction
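The align operation can be pictured as concatenating two aligned 64-bit words and extracting the 8 bytes that start at a byte offset. The sketch below models that extraction and uses it to read 8 bytes from an unaligned address by splitting the address into an aligned base and an offset. The function names `faligndata` (borrowed from the VIS mnemonic) and `load_unaligned` are illustrative, and the offset is passed as an argument rather than held in a status register as in hardware.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def faligndata(hi: int, lo: int, offset: int) -> int:
    """Concatenate two 64-bit words (hi:lo, 16 bytes, big-endian) and return the
    8 bytes starting at byte `offset` (0..7)."""
    if not 0 <= offset <= 7:
        raise ValueError("offset must be 0..7")
    combined = (hi << 64) | lo
    return (combined >> (64 - 8 * offset)) & MASK64

def load_unaligned(mem: bytes, addr: int) -> int:
    """Read 8 bytes at an arbitrary address using only two aligned 8-byte loads."""
    base, offset = addr & ~7, addr & 7          # aligned address and byte offset
    hi = int.from_bytes(mem[base:base + 8], "big")
    lo = int.from_bytes(mem[base + 8:base + 16], "big")
    return faligndata(hi, lo, offset)

mem = bytes(range(32))
print(hex(load_unaligned(mem, 5)))
```

In a loop over a scan line, each new aligned load can be paired with the previous one, so every aligned word is fetched only once.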
UltraSPARC III and the Future
• UltraSPARC III
  • 600 MHz
  • MP systems with over 1,000 processors
  • 0.18-micron process technology
• Future directions
  • UltraSPARC IV and V
  • 0.07-micron technology
  • 1 GHz and 1.5 GHz
  • Odd-numbered generations: new pipeline architecture
  • Even-numbered generations: upgraded process technology