XX Jornadas de Paralelismo, A Coruña (Spain) – September 17, 2009
Performance Analysis of NUCA Policies for CMPs Using Parsec v2.0 Benchmark Suite
Javier Lira ψ, Carlos Molina ф, Antonio González λ
ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu
ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net
λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com
Outline • Introduction • Methodology • Analysis of NUCA policies • Bank Placement Policy • Bank Access Policy • Bank Migration Policy • Bank Replacement Policy • Conclusions
Introduction • CMPs have emerged as a dominant paradigm in system design. • They sustain performance improvement while reducing power consumption. • They take advantage of thread-level parallelism. • Commercial CMPs are currently available. • CMPs incorporate larger, shared last-level caches. • Wire delay is a key constraint.
NUCA • Non-Uniform Cache Architecture (NUCA) was first proposed at ASPLOS 2002 by Kim et al. [1]. • NUCA divides a large cache into smaller, faster banks. • Banks close to the cache controller have lower latencies than banks farther away.
[1] C. Kim, D. Burger and S. W. Keckler. An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches. ASPLOS '02
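The latency gradient described above can be illustrated with a minimal sketch. The cycle counts and the linear hop model are assumptions made only for this example; they are not figures from the talk or from CACTI.

```python
# Hypothetical parameters, chosen only for illustration (not from the slides).
BANK_ACCESS_CYCLES = 3   # assumed time to read one small bank
HOP_CYCLES = 1           # assumed wire/router delay per hop, one way


def bank_latency(hops_from_controller: int) -> int:
    """Round-trip latency to a bank `hops_from_controller` hops away."""
    return BANK_ACCESS_CYCLES + 2 * HOP_CYCLES * hops_from_controller


# Latency grows with distance from the cache controller.
latencies = [bank_latency(h) for h in range(4)]
```

Under these assumed numbers the nearest bank answers in 3 cycles and each extra hop adds 2, which is the non-uniformity that the four policies below try to exploit.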
NUCA Policies • Bank Placement Policy • Bank Access Policy • Bank Migration Policy • Bank Replacement Policy
Methodology • Simulation tools: • Simics + GEMS • CACTI v6.0 • PARSEC v2.0 Benchmark Suite
Baseline NUCA cache architecture • 8 cores • 256 banks
[2] B. M. Beckmann and D. A. Wood. Managing Wire Delay in Large Chip-Multiprocessor Caches. MICRO '04
Bank Placement Policy • 1B + Static • 16B + Static • 16B + Local
Bank Placement Policy • 1B + Static placement provides a fair distribution. • 16B configurations concentrate data in a few banks. • Placement and migration policies are strictly correlated.
Bank Access Policy • Serial • 9P + 7P • Parallel
Bank Access Policy • Power efficiency vs. performance. • 9P + 7P is a trade-off, but it is still far from the performance potential. • These results suggest there is broad room for improvement in this policy.
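The power/performance tension between serial and parallel access can be sketched as follows. This is a simplified model under stated assumptions: the per-bank latencies are invented, and the number of banks probed is used as a rough proxy for energy. The 9P + 7P scheme is left out because the slides do not define it.

```python
def serial_lookup(bank_latencies, hit_index):
    """Probe banks one at a time, nearest first, stopping at the first hit.

    Returns (cycles, banks_probed). `hit_index` is None on a miss.
    """
    last = len(bank_latencies) - 1 if hit_index is None else hit_index
    probed = bank_latencies[: last + 1]
    return sum(probed), len(probed)


def parallel_lookup(bank_latencies, hit_index):
    """Probe every bank at once.

    The result is known when the hit bank replies, or when the slowest
    bank replies on a miss. Every bank is accessed either way.
    """
    if hit_index is None:
        cycles = max(bank_latencies)
    else:
        cycles = bank_latencies[hit_index]
    return cycles, len(bank_latencies)


latencies = [3, 5, 7, 9]  # assumed round-trip latencies, nearest bank first
serial = serial_lookup(latencies, hit_index=2)      # (15, 3)
parallel = parallel_lookup(latencies, hit_index=2)  # (7, 4)
```

In this toy case the parallel lookup is about twice as fast but touches every bank, while the serial lookup touches only the banks up to the hit.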
Bank Migration Policy • Static • Gradual + Swapping • Gradual + Replication
Bank Migration Policy • Replication reduces the effective size of the cache. • Migration approaches concentrate data blocks in a few banks. • The static approach distributes data blocks fairly across the whole cache. • Placement and migration policies are strictly correlated.
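Gradual migration with swapping can be sketched as a one-step promotion on each hit. This is an illustrative model, not the simulated implementation: each bank is assumed to hold a single block, and the function name is hypothetical.

```python
def access_with_gradual_migration(banks, block):
    """On a hit, move `block` one bank closer to the core by swapping it
    with the block in the next-closer bank. banks[0] is the closest bank.

    Returns the bank index where the hit occurred, or None on a miss.
    """
    if block not in banks:
        return None
    i = banks.index(block)
    if i > 0:
        banks[i - 1], banks[i] = banks[i], banks[i - 1]
    return i


banks = ["A", "B", "C", "D"]  # one block per bank, for simplicity
access_with_gradual_migration(banks, "D")  # D moves from bank 3 to bank 2
access_with_gradual_migration(banks, "D")  # D moves from bank 2 to bank 1
```

Frequently accessed blocks drift toward the closest banks, which is consistent with the observation that migration concentrates data in a few banks.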
Bank Replacement Policy • Zero-copy • One-copy • Last Bank
Bank Replacement Policy • Giving a second chance to evicted data blocks provides a significant performance gain. • Last Bank is a promising mechanism, but it is restricted by its small size. • Further exploration of this policy is required.
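The "second chance" idea can be contrasted with plain eviction in a small sketch. It assumes one block per bank and follows the usual reading of zero-copy (the victim leaves the cache) and one-copy (the victim is demoted to a farther bank); the function name and layout are hypothetical, and the Last Bank mechanism is not modeled.

```python
def insert_block(banks, target, new_block, policy):
    """Place `new_block` in banks[target] and handle the displaced victim.

    banks[0] is the closest bank; each bank holds one block or None.
    Returns the block that leaves the cache, or None if nothing does.
    """
    if policy not in ("zero-copy", "one-copy"):
        raise ValueError(f"unknown policy: {policy}")
    victim = banks[target]
    banks[target] = new_block
    if victim is None:
        return None
    if policy == "zero-copy" or target + 1 >= len(banks):
        return victim  # the victim is dropped from the cache
    # One-copy: demote the victim one bank farther away (its second
    # chance) and evict whatever was stored there instead.
    evicted = banks[target + 1]
    banks[target + 1] = victim
    return evicted


zero = ["A", "B", "C"]
out_zero = insert_block(zero, 0, "X", "zero-copy")  # A leaves the cache

one = ["A", "B", "C"]
out_one = insert_block(one, 0, "X", "one-copy")     # A is demoted, B leaves
```

With one-copy, block A stays in the cache after being displaced, so a later access to it can still hit, at the cost of evicting a block from the farther bank.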
Conclusions • NUCA is characterized by four policies. • NUCA policies are related. • Static placement with no migration: good trade-off. • Bank placement and bank migration are strictly correlated. • Bank access: power efficiency vs. performance. • Bank replacement: ↑ performance (unbounded last bank). • There is still room for improvement in all policies.
Performance Analysis of NUCA Policies for CMPs Using Parsec v2.0 Benchmark Suite Questions?