Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs

International Symposium on Quality Electronic Design, March 27-29, 2006, San Jose
Sri Hari Krishna Narayanan, Mahmut Kandemir, Ozcan Ozturk
Embedded Mobile Computing Center (EMC2), The Pennsylvania State University

Introduction to the Problem
• Increasing transistor counts and rising clock frequencies lead to increased power dissipation.
• Increased scaling coupled with increased power dissipation has led to increased power density.
• Increased power density leads to rising thermal problems, which require solutions.

Solutions to Thermal Issues in multiprocessor environments
• Dynamic Thermal Management
  • Heo et al., ISLPED 2003
    • Activity migration between two processors.
  • Shang et al., MICRO 2003
    • Communication is routed away from a potential hotspot.
    • Upon a thermal emergency, communication is throttled.

Problems with the current solutions
• Repeated suspension of execution or communication leads to performance loss.
• So it is beneficial to reduce the number of suspensions.
• How?
  • Reduce the number of thermal emergencies by reducing the power density.
  • Reduce the power density by changing which processors are active and how much computation they perform, within certain bounds.

Default Mapping
• Performance oriented
  • Active processors are close to each other.
  • Less communication cost.
• Higher power density
  • More thermal emergencies.
• We propose to change this mapping into a temperature-aware one.
[Figure: Code → Default Code Mapping Module → Default (performance-oriented) mapping]
Code shown on the slide:

    #include <stdio.h>

    #define N 5000
    #define ITER 1

    int du1[N], du2[N], du3[N];
    int au1[N][N][2], au2[N][N][2], au3[N][N][2];
    int a11 = 1, a12 = -1, a13 = -1;
    int a21 = 2, a22 = 3, a23 = -3;
    int a31 = 5, a32 = -5, a33 = -2;
    int l;
    /* Initialization loop */
    int sig = 1;

    int main()
    {
        int kx;
        int ky;
        int kz;

        /* mp_numthreads() is taken from the slide; it is provided by the
           parallel runtime the authors used, and its header is not shown. */
        printf("Thread:%d\n", mp_numthreads());

        /* Set every element of the three arrays to 1. */
        for (kx = 0; kx < N; kx = kx + 1) {
            for (ky = 0; ky < N; ky = ky + 1) {
                for (kz = 0; kz <= 1; kz = kz + 1) {
                    au1[kx][ky][kz] = 1;
                    au2[kx][ky][kz] = 1;
                    au3[kx][ky][kz] = 1;
                }
            }
        }
        return 0;
    } /* main */

Integer Linear Programming Model
• Phase 1
  • Increases the bounding box of the active processors, given a communication cost limit, and hence reduces the overall power density.
[Figure: initial mapping and mapping after Phase 1]

Integer Linear Programming Model
• Constraints *
  • The number of active processors remains constant.
  • The communication cost between active processors in the new mapping must not exceed the old communication cost plus the allowed relaxation.
  • The area of the bounding box must be maximized.
* Exact mathematical expressions are given in the paper.
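The Phase 1 constraints above can be illustrated with a small feasibility check. This is only a sketch under stated assumptions, not the ILP formulation from the paper: the `Tile` type, the function names, the traffic matrix layout, and the use of hop count times traffic volume as the communication cost are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tile coordinate on a 2-D mesh (not from the slides). */
typedef struct { int x, y; } Tile;

/* Hop count between two tiles, assuming x-y routing (Manhattan distance). */
int hops(Tile a, Tile b)
{
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Area, in tiles, of the smallest rectangle enclosing all active tiles. */
int bounding_box_area(const Tile *t, int n)
{
    assert(n > 0);
    int minx = t[0].x, maxx = t[0].x, miny = t[0].y, maxy = t[0].y;
    for (int i = 1; i < n; i++) {
        if (t[i].x < minx) minx = t[i].x;
        if (t[i].x > maxx) maxx = t[i].x;
        if (t[i].y < miny) miny = t[i].y;
        if (t[i].y > maxy) maxy = t[i].y;
    }
    return (maxx - minx + 1) * (maxy - miny + 1);
}

/* Assumed cost model: traffic volume times hop count, summed over all pairs.
   traffic is an n*n row-major matrix; traffic[i*n + j] is volume from i to j. */
long comm_cost(const Tile *t, int n, const int *traffic)
{
    long cost = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cost += (long)traffic[i * n + j] * hops(t[i], t[j]);
    return cost;
}

/* Returns 1 if the candidate mapping keeps the same n active processors and
   its communication cost stays within the old cost plus the relaxation. */
int phase1_feasible(const Tile *old_map, const Tile *new_map, int n,
                    const int *traffic, long relaxation)
{
    return comm_cost(new_map, n, traffic)
           <= comm_cost(old_map, n, traffic) + relaxation;
}
```

Among the feasible candidates, the one with the largest `bounding_box_area` would be preferred, which mirrors the stated goal of spreading the active processors to lower the overall power density.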

Phase 1 mapping
• Overall density is reduced
• Communication cost increased
[Figure: Code → Default Code Mapping Module → Default (performance-oriented) mapping → ILP Module (Phase 1) → Overall power density reduced mapping]

Integer Linear Programming Model
• Phase 2
  • Given the reduced overall power density mapping from Phase 1, a new mapping with reduced local power density is generated.
[Figure: mapping after Phase 1 and mapping after Phase 2]

Integer Linear Programming Model
• Constraints *
  • Each old active processor that has high power density is split.
  • Each split processor performs the same communication as the old processor.
  • The area of the bounding box remains constant.
  • The total power spent within the bounding box is minimized by minimizing the communication path.
* Exact mathematical expressions are given in the paper.

[Figure: Code → Default Code Mapping Module → Default (performance-oriented) mapping → ILP Module (Phase 1) → Overall power density reduced mapping → ILP Module (Phase 2) → Thermal aware mapping]

HotSpot + Shutdown Implementation
• HotSpot
  • Temperature estimation tool
  • Developed by Skadron at UVa
  • T(i+ ) = HS(T(i), floorplan, power, cycles, )
• Shutdown
  • Any processor or router that is too hot must be turned off to allow cooldown
[Figure: Code → Profiling → HotSpot + Shutdown. Labels on the figure: cycle times, chunk sizes, processor energy, communication, router energy]

Algorithm
1. Initially mark processors as being active
2. While (not all execution is completed) {
   2.a Time_Taken = Time_Taken + 1
   2.b If a processor was active
       2.b.i. Reduce the chunks that it has to execute by 1
   2.c Calculate the new current temperature for all processors.
       T(i+ ) = HS(T(i), floorplan, power, cycles, )
   2.d If a processor is too hot
       2.d.i. Mark it as inactive
   2.e If a router is too hot
       2.e.i. Mark all processors communicating through it as inactive.
   2.f Determine all the active processors and routers for the next scheduling step.
   }
3. Return Time_Taken
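The loop above can be sketched in C as follows. Several things here are assumptions rather than the authors' implementation: `toy_next_temp` is a made-up linear heating and cooling rule standing in for HotSpot, the `Proc` type and the `t_resume` threshold (used for step 2.f to decide when a cooled processor becomes active again) are hypothetical, and routers (step 2.e) are left out to keep the sketch short.

```c
#include <assert.h>

#define AMBIENT 45.0   /* assumed floor temperature for the toy model */

/* Hypothetical per-processor state (not from the slides). */
typedef struct {
    int chunks_left;   /* work units still to execute */
    double temp;       /* current temperature estimate */
    int active;        /* 1 if scheduled in the current step */
} Proc;

/* Toy stand-in for HotSpot: heat up by 5 when active, cool by 3 when idle. */
double toy_next_temp(double temp, int was_active)
{
    double t = was_active ? temp + 5.0 : temp - 3.0;
    return t < AMBIENT ? AMBIENT : t;
}

/* Returns the number of scheduling steps needed to finish all chunks. */
long simulate(Proc *p, int n, double t_max, double t_resume)
{
    /* t_resume must be reachable by cooling, or a shut-down processor
       would never restart and the loop would not terminate. */
    assert(n > 0 && t_resume >= AMBIENT && t_resume < t_max);
    long time_taken = 0;

    for (int i = 0; i < n; i++)                 /* step 1 */
        p[i].active = p[i].chunks_left > 0;

    for (;;) {
        long remaining = 0;
        for (int i = 0; i < n; i++)
            remaining += p[i].chunks_left;
        if (remaining == 0)
            break;                              /* all execution completed */

        time_taken++;                           /* step 2.a */
        for (int i = 0; i < n; i++) {
            int was_active = p[i].active;
            if (was_active)
                p[i].chunks_left--;             /* step 2.b */
            p[i].temp = toy_next_temp(p[i].temp, was_active);  /* step 2.c */
            if (p[i].temp > t_max)
                p[i].active = 0;                /* step 2.d: too hot */
            else if (p[i].temp <= t_resume)
                p[i].active = 1;                /* step 2.f (assumed rule) */
            if (p[i].chunks_left == 0)
                p[i].active = 0;                /* nothing left to run */
        }
    }
    return time_taken;                          /* step 3 */
}
```

With this toy model, a mapping that triggers fewer shutdowns finishes in fewer steps, which is the effect the thermal-aware mapping aims for.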

NoC Multi-core Model
• Routers are roughly 1/5th the area of the processors
• Processors communicate using x-y routing
• Used to estimate the cost of communication
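As a brief illustration of x-y routing on a 2-D mesh, the sketch below lists the tiles a message visits: it travels along the x dimension until the destination column is reached, then along y. The `Tile` type and the `xy_route` function are hypothetical names introduced here; the slides do not give an implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tile coordinate on a 2-D mesh (not from the slides). */
typedef struct { int x, y; } Tile;

/* Writes the tiles visited from src to dst (both included) into path and
   returns how many were written. The hop count is the return value minus 1. */
int xy_route(Tile src, Tile dst, Tile *path, int capacity)
{
    int needed = abs(dst.x - src.x) + abs(dst.y - src.y) + 1;
    assert(capacity >= needed);

    int len = 0;
    Tile cur = src;
    path[len++] = cur;

    int dx = (dst.x > src.x) ? 1 : -1;
    while (cur.x != dst.x) {        /* first resolve the x offset */
        cur.x += dx;
        path[len++] = cur;
    }

    int dy = (dst.y > src.y) ? 1 : -1;
    while (cur.y != dst.y) {        /* then resolve the y offset */
        cur.y += dy;
        path[len++] = cur;
    }
    return len;
}
```

Because the route is fully determined by the two endpoints, the communication cost of a mapping can be estimated from hop counts alone, without simulating the network.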

Parameters used

Benchmarks Used

Results – Thermal Emergencies

Results - Performance

Conclusions
• Dynamic thermal management leads to suspension of execution.
• We propose a novel compiler-directed mechanism to reduce occurrences of thermal emergencies.
• By reducing the number of thermal emergencies, performance is improved.

Thank you!
