CS 433 – Computer System Organization • Manish Agrawal, Brett Daniel, Josh Smith • UltraSPARC T2, Sun Microsystems
Overview of the UltraSPARC T2 • Multi-threaded (8 threads per core), multi-core (8 cores) CPU • Frequency ranges from 900 MHz to 1.4 GHz • Consumes less than 95 W (nominal), under 2 W per thread • Integrated on chip: • 10 Gb Ethernet networking • PCI Express I/O expansion • One FPU and one cryptographic processing unit per core
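Dividing the nominal power budget across all 64 hardware threads shows that the two power figures above are consistent:

```latex
\frac{95\ \text{W}}{8\ \text{cores} \times 8\ \text{threads/core}}
  = \frac{95\ \text{W}}{64\ \text{threads}}
  \approx 1.48\ \text{W/thread} < 2\ \text{W/thread}
```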
History • Codename Niagara2 • Member of the SPARC family • Two earlier multi-core processors: UltraSPARC IV and UltraSPARC IV+ • UltraSPARC T1 (first to be both multi-core and multi-threaded) • Released 14 November 2005 • 4, 6, or 8 cores with 4 threads each • UltraSPARC T2 released 7 August 2007 • Now 8 threads per core (instead of 4)
Motivation • Rather than optimizing each core for single-thread speed, the overall goal was to run as many concurrent threads as possible, keeping each core's pipeline busy • Each core is simpler than those of contemporary high-end processors, which allows 8 cores to fit on the same die • No out-of-order execution and only a modest amount of cache • Each core is a barrel processor
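A barrel processor interleaves its hardware threads cycle by cycle. The sketch below assumes the strict textbook form, a fixed rotation in which each thread issues exactly once every N cycles; T2's actual Pick logic is more flexible and skips stalled threads. `barrel_schedule` and `issue_gap` are hypothetical helpers for illustration only.

```python
# Minimal sketch of a strict barrel-processor schedule (textbook form, assumed):
# every cycle the core issues from the next hardware thread in fixed rotation,
# so each thread issues exactly once every num_threads cycles.

NUM_THREADS = 8  # threads per T2 core


def barrel_schedule(num_threads: int, cycles: int) -> list[int]:
    """Return the thread ID that issues in each cycle under fixed rotation."""
    return [cycle % num_threads for cycle in range(cycles)]


def issue_gap(schedule: list[int], thread: int) -> int:
    """Cycles between two consecutive issues by the same thread."""
    slots = [cycle for cycle, tid in enumerate(schedule) if tid == thread]
    return slots[1] - slots[0]


if __name__ == "__main__":
    sched = barrel_schedule(NUM_THREADS, 16)
    print(sched)  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7]
    # Thread 3 issues again 8 cycles later; the 7 cycles in between are filled
    # by other threads, hiding short latencies without stalling the pipeline.
    print(issue_gap(sched, thread=3))  # 8
```

The trade-off is visible in the numbers: per-thread latency grows (one instruction every 8 cycles in this strict form) while the pipeline itself never sits idle, which matches the throughput-first goal above.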
Components • 8 fully pipelined FPUs (one per core) • 8 SPUs (one per core, for cryptographic processing) • 2 integer ALUs per core, each shared by a group of four threads • 4 MB L2 cache (8 banks, 16-way set associative) • Per core: 8 KB data cache and 16 KB instruction cache • Two 10 Gb Ethernet ports and one PCIe port • Source: http://www.sun.com/processors/UltraSPARC-T2/datasheet.pdf
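The L2 figures above pin down most of the cache geometry. The sketch below derives the number of sets per bank and splits a physical address into tag, set, bank, and offset. The 64-byte line size and the use of the address bits just above the line offset for bank selection are assumptions for illustration; `decompose` is a hypothetical helper, not Sun's design.

```python
# Sketch: T2 L2 geometry from the datasheet figures (4 MB, 8 banks, 16-way).
# ASSUMPTIONS: 64-byte lines; bank selected by the bits just above the offset.

L2_SIZE = 4 * 1024 * 1024  # 4 MB total
NUM_BANKS = 8
WAYS = 16
LINE_BYTES = 64            # assumed line size

OFFSET_BITS = LINE_BYTES.bit_length() - 1                   # 6
BANK_BITS = NUM_BANKS.bit_length() - 1                      # 3
SETS_PER_BANK = L2_SIZE // (NUM_BANKS * WAYS * LINE_BYTES)  # 512
SET_BITS = SETS_PER_BANK.bit_length() - 1                   # 9


def decompose(addr: int) -> tuple[int, int, int, int]:
    """Split an address into (tag, set_index, bank, offset) under the assumptions."""
    offset = addr & (LINE_BYTES - 1)
    bank = (addr >> OFFSET_BITS) & (NUM_BANKS - 1)
    set_index = (addr >> (OFFSET_BITS + BANK_BITS)) & (SETS_PER_BANK - 1)
    tag = addr >> (OFFSET_BITS + BANK_BITS + SET_BITS)
    return tag, set_index, bank, offset


if __name__ == "__main__":
    print(SETS_PER_BANK, SET_BITS)  # 512 9
    print(decompose(0x12345678))    # (1165, 43, 1, 56)
    # Consecutive lines map to consecutive banks:
    print([decompose(i * LINE_BYTES)[2] for i in range(10)])
    # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

Placing the bank bits directly above the line offset (again, an assumption here) spreads consecutive lines across all eight banks, so a sequential stream keeps every bank busy instead of queuing on one.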
Chip Source: http://www.opensparc.net/images/stories/t2/ultrasparc-t2-layout.png
For a single thread: • Memory is THE bottleneck to improving performance • Commercial server workloads exhibit poor memory locality • Reducing compute time yields only a modest throughput speedup • Conventional single-thread processors optimized for ILP have low utilization • With many threads: • It’s possible to find something to execute every cycle • Significant throughput speedups are possible • Processor utilization is much higher • Source: Golla R., “Niagara2: A Highly Threaded Server-on-a-Chip,” Oct. 2006, http://www.opensparc.net/pubs/preszo/06/04-Sun-Golla.pdf
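A simple analytical model, not taken from the slides, makes this argument concrete. Assume each thread alternates C cycles of useful computation with M cycles stalled on memory, and that the pipeline can issue for one thread per cycle; C, M, N, and U (utilization) are notation introduced here, and the numbers are illustrative, not measured:

```latex
\begin{aligned}
U_1 &= \frac{C}{C + M}, \qquad U_N = \min\!\left(1,\ \frac{N C}{C + M}\right) \\
C = 25,\ M = 75:\quad U_1 &= \frac{25}{100} = 0.25, \qquad
  U_4 = \min\!\left(1,\ \frac{4 \cdot 25}{100}\right) = 1 \\
\text{halving } C:\quad \text{speedup} &= \frac{25 + 75}{12.5 + 75} = \frac{100}{87.5} \approx 1.14
\end{aligned}
```

Under these assumptions, four threads quadruple throughput, while halving compute time in a single thread gains only about 14%, because the memory stall still dominates each task.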
Engineering Solutions • Goals of the T2 project: • Double UltraSPARC T1's throughput and throughput/watt • Improve UltraSPARC T1's floating-point single-thread and throughput performance (T1 had a single FPU shared by all cores and could not handle workloads with more than 1-3% FP instructions) • Minimize the die area required for these improvements • Alternative considered: doubling the number of UltraSPARC T1 cores • 16 cores of 4 threads each • Takes too much die area • Leaves no area for improving FP performance
Core Architecture Source: http://realworldtech.com/page.cfm?ArticleID=RWT090406012516&p=2
Core Architecture Source: http://blogs.sun.com/sprack/resource/N2_Announce_Breakout_final.pdf
Efficient in-order, single-issue pipeline • Integer pipeline (8 stages): Fetch, Cache, Pick, Decode, Execute, Mem, Bypass, W (writeback) • Floating-point pipeline (12 stages): Fetch, Cache, Pick, Decode, Execute, Fx1 to Fx5, FB, FW • Pick (a stage added in T2) selects up to 2 threads per cycle for execution, one from each group of four threads • In the Bypass stage, the load/store unit (LSU) forwards data to the integer register files (IRFs) with sufficient write-timing margin; all integer operations pass through this stage • 6-cycle latency for dependent FP operations • Integer multiplies are pipelined across different threads but block within the same thread • Integer divide is a long-latency operation and is not pipelined, even across different threads
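The Pick stage can be sketched as follows. Splitting 8 threads into two groups of four matches the slides; the least-recently-picked priority within a group is an assumption based on public descriptions of T2, and `PickStage` with its `ready` argument is hypothetical. In hardware, readiness would come from cache misses, long-latency operations, and similar stalls.

```python
# Sketch of a T2-style Pick stage (assumptions noted above): 8 threads form two
# groups of 4; each cycle at most one ready thread is picked per group,
# favoring the least recently picked one.
from collections import deque

THREADS_PER_GROUP = 4
NUM_GROUPS = 2


class PickStage:
    def __init__(self) -> None:
        # One LRU queue per group: front = least recently picked thread ID.
        self.lru = [
            deque(range(g * THREADS_PER_GROUP, (g + 1) * THREADS_PER_GROUP))
            for g in range(NUM_GROUPS)
        ]

    def pick(self, ready: set[int]) -> list[int]:
        """Return the threads picked this cycle (at most one per group)."""
        picked = []
        for queue in self.lru:
            # First ready thread in LRU order, or None if the group is stalled.
            choice = next((tid for tid in queue if tid in ready), None)
            if choice is not None:
                queue.remove(choice)
                queue.append(choice)  # now the most recently picked
                picked.append(choice)
        return picked


if __name__ == "__main__":
    stage = PickStage()
    print(stage.pick(set(range(8))))    # [0, 4]
    print(stage.pick(set(range(8))))    # [1, 5]
    print(stage.pick({0, 5}))           # [0, 5]
    print(stage.pick({0, 1, 2, 3}))     # [2]  (no thread ready in group 1)
```

Because each group picks independently, a stall in one group never takes the other group's slot, which is how two threads can issue per cycle to the core's two integer ALUs.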
“Server on a chip” • Two 10/1 Gigabit Ethernet ports • Integrated PCI Express • Embedded cryptography • Source: http://www.podtech.net/home/1293/niagara-2-server-on-a-chip/
Comparison Against AMD Opteron • 4 cores max • Allows multiprocessors • HyperTransport links between processors • Shared execution units
Comparison Against Intel Core • 4 cores (6 in development, 8+ in “Nehalem”) • Allows multiprocessors • Shared front-side bus (FSB)
OpenSPARC • Open source release under GNU GPL • Verilog, verification/tests, simulation/modeling tools • ISA specification • http://www.opensparc.net/ "We truly believe OpenSparc will blossom in the future because it is open." Naxin Zhang, Polaris Micro
Future • Next Niagara generation: “Victoria Falls” (released as the UltraSPARC T2 Plus) • "Pushing up threads and cores" • Retain simplicity: in-order processing • Target multiprocessor servers
Video: http://www.sun.com/processors/UltraSPARC-T2/gallery/index.xml?p=1&s=1
Sources • http://www.sun.com/processors/UltraSPARC-T2/ • http://www.opensparc.net/ • http://www.opensparc.net/pubs/preszo/06/04-Sun-Golla.pdf • http://realworldtech.com/page.cfm?ArticleID=RWT090406012516 • http://www.news.com/2100-1006_3-6127137.html • http://www.news.com/2100-7344_3-6183562.html