Niagara: a 32-Way Multithreaded SPARC Processor

Niagara: a 32-Way Multithreaded SPARC Processor. P. Kongetira, K. Aingaran, K. Olukotun, Sun Microsystems. Presented by Bogdan Romanescu. Goal. Commercial server applications: high thread-level parallelism (TLP), large numbers of parallel client requests.


Presentation Transcript


  1. Niagara: a 32-Way Multithreaded SPARC Processor
     P. Kongetira, K. Aingaran, K. Olukotun, Sun Microsystems
     Presented by Bogdan Romanescu

  2. Goal
     • Commercial server applications:
       • High thread-level parallelism (TLP)
         • Large numbers of parallel client requests
       • Low instruction-level parallelism (ILP)
         • High cache miss rates
         • Many unpredictable branches
         • Frequent load-load dependencies
     • Power, cooling, and space are major concerns for data centers

  3. Sun’s Solution
     • UltraSPARC T1 processor
       • “the highest-throughput and most eco-responsible processor ever created”
     • Multicore
     • Fine-grain multithreading within each core
     • Simple pipelines
     • Small L1 caches
     • Shared L2
     • Metric: performance/watt

  4. Architecture

  5. SPARC pipe
     • UltraSPARC II style
     • Single issue, 6 stages: F, S, D, E, M, W (fetch, thread select, decode, execute, memory, write back)
     • Shared units:
       • L1 caches
       • TLB
       • Execution units
       • Pipeline registers
     • Hazards:
       • Data
       • Structural
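     Why sharing one single-issue pipe among several threads can pay off is easiest to see with a back-of-the-envelope model. The sketch below is an illustration only, not something taken from the slides: it assumes every thread alternates between a fixed run of issuing cycles and a fixed stall, and that switching threads is free. The function name and the workload numbers are hypothetical.

```python
def pipeline_utilization(threads: int, busy_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles in which a single-issue pipe issues an instruction.

    Simplified model (an assumption, not from the slides): each thread
    issues for `busy_cycles`, then waits for `stall_cycles`, and switching
    between threads costs nothing.
    """
    if threads < 1 or busy_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, busy_cycles > 0, stall_cycles >= 0")
    per_thread = busy_cycles / (busy_cycles + stall_cycles)
    # The pipe issues at most one instruction per cycle, so cap at 1.0.
    return min(1.0, threads * per_thread)


# Hypothetical workload: 10 cycles of useful work, then a 30-cycle miss.
one_thread = pipeline_utilization(1, 10, 30)
four_threads = pipeline_utilization(4, 10, 30)
print(f"1 thread: {one_thread:.2f}, 4 threads: {four_threads:.2f}")
```

     Under these assumptions a single thread keeps the pipe busy a quarter of the time, while four such threads can fill it. Real workloads have variable stall lengths and contend for the shared units listed above, so this is an upper bound rather than a prediction.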

  6. Integer register file
     • One register file per thread
     • SPARC window: in, out, and local registers
     • Highly integrated cell structure to support 4 threads:
       • 8 windows of 32 locations per thread
       • 3 read ports + 2 write ports
       • Read/write: single-cycle latency
       • 1 active window cell (copy of the architectural set window)
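     The in/local/out overlap of SPARC register windows can be sketched as a simple index mapping. This is a hypothetical layout for illustration, not the cell structure used in the chip: it assumes each window owns 16 slots (8 locals, then 8 outs), that a window's ins alias the previous window's outs, and that a call moves to the next-higher window number. Global registers are left out.

```python
N_WINDOWS = 8        # windows per thread, as stated on the slide
GROUP_SIZE = 8       # SPARC ins, locals, and outs are groups of 8 registers


def physical_index(window: int, group: str, reg: int) -> int:
    """Map (window, group, register) to a slot in one thread's windowed file.

    Hypothetical layout: window w owns slots 16*w .. 16*w+15, locals first,
    then outs. The ins of window w are the outs of window w-1 (mod N_WINDOWS),
    which is how a caller's outs become the callee's ins.
    """
    if not 0 <= window < N_WINDOWS or not 0 <= reg < GROUP_SIZE:
        raise ValueError("window or register out of range")
    if group == "local":
        return 16 * window + reg
    if group == "out":
        return 16 * window + 8 + reg
    if group == "in":
        previous = (window - 1) % N_WINDOWS
        return 16 * previous + 8 + reg
    raise ValueError(f"unknown group: {group}")


# Every slot reachable through any window, counted once.
unique_slots = {
    physical_index(w, g, r)
    for w in range(N_WINDOWS)
    for g in ("in", "local", "out")
    for r in range(GROUP_SIZE)
}
print(len(unique_slots))
```

     Because adjacent windows share eight registers, each window exposes 24 windowed registers but adds only 16 new slots, so 8 windows need 128 slots per thread in this layout.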

  7. Thread scheduling
     • Thread selection based on:
       • Previous long-latency instruction in the pipe
       • Instruction type
       • LRU status
     • Select & Fetch coupled
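     A least-recently-used pick among ready threads can be sketched in a few lines. This is a simplified illustration, not the actual select logic: it models only two of the listed inputs (a thread is skipped while it waits on a long-latency instruction, and ties are broken by LRU order) and ignores instruction type. The `Thread` class and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    tid: int
    ready: bool = True       # False while waiting on a long-latency instruction
    last_selected: int = -1  # cycle in which this thread was last picked


def select_thread(threads: List[Thread], cycle: int) -> Optional[Thread]:
    """Pick the least recently used ready thread, or None if all are waiting."""
    candidates = [t for t in threads if t.ready]
    if not candidates:
        return None
    # Oldest selection time wins; thread id breaks ties deterministically.
    chosen = min(candidates, key=lambda t: (t.last_selected, t.tid))
    chosen.last_selected = cycle
    return chosen


threads = [Thread(tid) for tid in range(4)]
threads[1].ready = False  # hypothetical: thread 1 is waiting on a cache miss
order = [select_thread(threads, cycle).tid for cycle in range(6)]
print(order)
```

     With thread 1 stalled, the remaining three threads simply rotate, so the pipe keeps issuing every cycle as long as at least one thread is ready.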

  8. Memory
     • 16 KB 4-way set-associative I$ per core
     • 8 KB 4-way set-associative D$ per core
     • 3 MB 12-way set-associative shared L2 $
       • 4 x 768 KB independent banks
       • 2-cycle throughput, 8-cycle latency
       • Direct link to DRAM & JBus
       • Manages cache coherence for the 8 cores
       • CAM-based directory
     • L1 caches are write-through
       • Allocate on load (LD)
       • No-allocate on store (ST)
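     The cache geometry follows from the usual relation sets = size / (ways x line size). The sketch below works it through for the three caches. Sizes and associativities come from the slide; the line sizes (32 B for the I$, 16 B for the D$, 64 B for the L2) are not on the slide and are treated here as assumptions.

```python
KB = 1024
MB = 1024 * KB


def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


# Sizes and ways are from the slide; line sizes are assumptions.
icache_sets = num_sets(16 * KB, ways=4, line_bytes=32)
dcache_sets = num_sets(8 * KB, ways=4, line_bytes=16)
l2_sets = num_sets(3 * MB, ways=12, line_bytes=64)

# An even split of the L2 over four banks.
l2_bank_kb = 3 * MB // 4 // KB
l2_sets_per_bank = l2_sets // 4

print(icache_sets, dcache_sets, l2_sets, l2_bank_kb, l2_sets_per_bank)
```

     Splitting 3 MB evenly over four banks gives 768 KB per bank. With the assumed line sizes, both L1 caches end up with the same number of sets even though the I$ is twice as large.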

  9. Performance

  10. “Home run”?
     • Relatively slow single-thread performance
     • Poor floating-point performance
     • Lack of software support (Sun Fire T2000 does not support Linux or Windows)
     • Price
     • Concurrency counterattack
       • No place as a general-purpose computer running databases
       • Small low-end market segment?
     • Niagara II & “Rock”: multiprocessor and enhanced single-thread support

  11. References
     • [1] P. Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro, vol. 25, pp. 21-29, Mar. 2005.
     • [2] A. S. Leon et al., “A Power-Efficient High-Throughput 32-Thread SPARC Processor,” ISSCC 2006, Session 5, Processors.
     • [3] S. Chaudhry, S. Yip, P. Caprioli, and M. Tremblay, “High Performance Throughput Computing,” IEEE Micro, vol. 25, no. 3, 2005.
     • [4] http://opensparc.sunsource.net/nonav/opensparct1.html
     • [5] http://www.sun.com/processors/UltraSPARC-T1/features.xml
     • [6] http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp
     • [7] http://news.com.com/Sun+begins+Sparc+phase+of+server+overhaul/2163-1010_3-5983365.html
     • [8] http://h71028.www7.hp.com/ERC/cache/280124-0-0-0-121.html
