Good suggestions
• Use more efficient code
• Train the users
• Reuse the waste heat
Interesting, but…
• Use alternative, renewable energy sources
• Make predictions
• Do monitoring
• Repair instead of replace
Homework 5: the 11th rule
Thermal-aware Task Placement (Spatial Scheduling) in Data Centers: Overview
Thermal issues in dense computer rooms (e.g. data centers, computer clusters, data warehouses)
• Heat recirculation
  • Hot air from the equipment air outlets is fed back to the equipment air inlets
• Hot spots
  • An effect of heat recirculation
  • Areas in the data center with alarmingly high temperatures
• Consequence
  • Cooling has to be set very low to keep all inlet temperatures in the safe operating range
Courtesy: Intel Labs
Conceptual overview of thermal-aware task placement
• Task placement determines the temperature distribution
• The temperature distribution determines the equipment's peak air inlet temperature
• The peak air inlet temperature determines the upper bound on the CRAC temperature setting
• The CRAC temperature setting determines its efficiency (Coefficient of Performance)
• The lower the peak inlet temperature, the higher the CRAC efficiency
Figure: Coefficient of Performance (source: HP)
Bottom line: there is a task placement that maximizes cooling efficiency. Find it and use it!
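The chain above can be made concrete with a small calculation. The sketch below assumes the quadratic CoP fit commonly cited from HP Labs' measurements of a chilled-water CRAC unit, $CoP(T) = 0.0068T^2 + 0.0008T + 0.458$ with $T$ the supply temperature in °C, and a hypothetical redline inlet temperature of 25 °C; neither value appears on the slide, and the function names are illustrative only.

```python
def cop(t_sup: float) -> float:
    """Coefficient of Performance at CRAC supply temperature t_sup (deg C).

    Assumed quadratic fit (commonly cited HP Labs chilled-water CRAC curve).
    """
    return 0.0068 * t_sup**2 + 0.0008 * t_sup + 0.458


def max_safe_supply(peak_rise: float, redline: float = 25.0) -> float:
    """Highest supply temperature that keeps the hottest inlet at the redline.

    peak_rise: largest recirculation-induced rise of an inlet temperature
    above the supply temperature, i.e. max_i (T_in,i - T_sup).
    redline: hypothetical maximum safe inlet temperature (deg C).
    """
    return redline - peak_rise


def cooling_power(it_power_kw: float, peak_rise: float, redline: float = 25.0) -> float:
    """Cooling power (kW) needed to remove it_power_kw of heat."""
    return it_power_kw / cop(max_safe_supply(peak_rise, redline))


if __name__ == "__main__":
    # Same 100 kW IT load, three placements with different peak inlet rises.
    for rise in (2.0, 5.0, 8.0):
        t = max_safe_supply(rise)
        print(f"peak rise {rise:.0f} C -> T_sup {t:.0f} C, "
              f"CoP {cop(t):.2f}, cooling {cooling_power(100.0, rise):.1f} kW")
```

A placement that keeps the peak rise small lets the supply temperature go up, which raises CoP and lowers the cooling power for the same IT load.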
Prerequisites for thermal management
• Task profiling
  • CPU utilization, I/O activity, etc.
• Equipment power profiling
  • CPU consumption, disk consumption, etc.
• Heat recirculation modeling
• Task management technologies
• Need for a comprehensive research framework
Thermal management research framework
• Characterization
  • Characterize the power consumption of a given workload (CPU, memory, disk, etc.) on given equipment
• Thermal models
  • To enable online, real-time thermal-aware job scheduling
  • Fast (analytical, not CFD-based)
  • Non-invasive (machine learning)
  • Model the thermal impact of multicore systems
• Thermal-aware job scheduling
  • Online job scheduling algorithm that minimizes the peak air inlet temperature, thus minimizing the cost of cooling
http://impact.asu.edu/
Sandeep Gupta, Qinghui Tang, Tridib Mukherjee, Michael Jonas, Georgios Varsamopoulos
Power model and profiling
• Power consumption is mainly affected by CPU utilization
• Power consumption is linear in CPU utilization: $P = aU + b$, where $U$ is the CPU utilization and $a$, $b$ are the equipment's power profile parameters
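A minimal sketch of obtaining $a$ and $b$ by profiling, assuming (utilization, power) samples have already been measured on one server. An ordinary least-squares line fit is one straightforward choice; the function name and the sample readings are hypothetical.

```python
from typing import Iterable, Tuple


def fit_linear_power(samples: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of P = a*U + b from (utilization, power) samples."""
    pts = list(samples)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_u = sum(u for u, _ in pts) / n
    mean_p = sum(p for _, p in pts) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    if s_uu == 0:
        raise ValueError("utilization values must not all be equal")
    s_up = sum((u - mean_u) * (p - mean_p) for u, p in pts)
    a = s_up / s_uu          # slope: watts per unit of utilization
    b = mean_p - a * mean_u  # intercept: power drawn at zero utilization
    return a, b


if __name__ == "__main__":
    # Hypothetical profiling readings: (CPU utilization in [0, 1], watts).
    a, b = fit_linear_power([(0.0, 200.0), (0.5, 250.0), (1.0, 300.0)])
    print(f"P = {a:.1f} * U + {b:.1f}")
```

In practice the samples would come from running the profiled workload at several utilization levels; with noisy readings the fit averages out measurement error.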
A simple thermal model
• Basic idea:
  • We don't need an extensive CFD model
  • We only need to know the effect of recirculation at specific points
  • Express recirculation as "coefficients"
Figure: nodes N1 to N5 (courtesy: Intel Labs)
Recirculation coefficients: a fast thermal model
• Reduce/simplify the “thermal map” concept to the points of interest: the equipment air inlets
• Can be computed from CFD models/simulations
• Matrix $A = [a_{ij}]$ of recirculation coefficients: $a_{ij}$ is the portion of heat exhausted from node $i$ that goes directly to node $j$
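Connecting $A$ to the heat distribution matrix $D$ of the linear model takes one heat-balance step. A sketch of that derivation, under stated assumptions that are not on the slide: steady air flow; $a_{ij}$ is the fraction of node $i$'s exhaust air reaching node $j$'s inlet; each inlet draws the rest of its air from the CRAC supply at $T_{sup}$ (written as a vector with every entry equal to the supply temperature); and $k_i = \rho f_i c_p$ (air density × flow rate × specific heat) is node $i$'s thermal constant, with $K = \mathrm{diag}(k_i)$.

```latex
\begin{aligned}
T_{out,i} &= T_{in,i} + \frac{P_i}{k_i}
  && \text{(node $i$ heats its own air flow)} \\
k_j T_{in,j} &= \sum_i a_{ij} k_i T_{out,i} + \Big(k_j - \sum_i a_{ij} k_i\Big) T_{sup}
  && \text{(mixing at inlet $j$)} \\
(K - A^{\mathsf T} K)\, T_{in} &= (K - A^{\mathsf T} K)\, T_{sup} + A^{\mathsf T} P
  && \text{(substitute $T_{out} = T_{in} + K^{-1} P$)} \\
T_{in} &= T_{sup} + \underbrace{\big[(K - A^{\mathsf T} K)^{-1} - K^{-1}\big]}_{D}\, P
  && \text{(solve for $T_{in}$)}
\end{aligned}
```

The last step uses $(K - A^{\mathsf T}K)^{-1}A^{\mathsf T} = (K - A^{\mathsf T}K)^{-1}\big(I - (K - A^{\mathsf T}K)K^{-1}\big) = (K - A^{\mathsf T}K)^{-1} - K^{-1}$.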
Linear thermal model
• Heat recirculation coefficients
  • Analytical
  • Matrix-based
• Properties of the model
  • Granularity at the air inlets (discrete/simplified)
  • Assumes steady air flow
$T_{in} = T_{sup} + D \times P$, where $T_{in}$ is the vector of inlet temperatures, $T_{sup}$ the supplied air temperature, $D$ the heat distribution matrix, and $P$ the power vector
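Evaluating the model is plain linear algebra, which is why it takes seconds rather than days. The sketch below assumes the heat-balance form $D = (K - A^{\mathsf T}K)^{-1} - K^{-1}$, where $K$ is a diagonal matrix of per-node thermal constants $k_i$ (W/K); that form is an assumption, not stated on the slide. Instead of inverting, it solves $(K - A^{\mathsf T}K)\,x = A^{\mathsf T}P$, whose solution equals $D \times P$. Pure Python keeps it dependency-free; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(m: Matrix, rhs: Vector) -> Vector:
    """Solve m @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [rhs[r]] for r, row in enumerate(m)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def inlet_temperatures(a: Matrix, k: Vector, p: Vector, t_sup: float) -> Vector:
    """T_in = T_sup + D P with D = (K - A^T K)^-1 - K^-1 (assumed form).

    a[i][j]: fraction of node i's exhaust reaching node j's inlet
    k[i]:    thermal constant of node i (W/K), hypothetical values
    p[i]:    power drawn by node i (W)
    """
    n = len(k)
    # Entry (j, i) of K - A^T K is k_j * [i == j] - a_ij * k_i.
    m = [[(k[j] if i == j else 0.0) - a[i][j] * k[i] for i in range(n)]
         for j in range(n)]
    # Entry j of A^T P is sum_i a_ij * P_i.
    at_p = [sum(a[i][j] * p[i] for i in range(n)) for j in range(n)]
    rise = solve(m, at_p)  # equals D P
    return [t_sup + r for r in rise]
```

For example, with two nodes where 20% of node 0's exhaust reaches node 1, `inlet_temperatures([[0.0, 0.2], [0.0, 0.0]], [1.0, 1.0], [10.0, 0.0], 18.0)` raises only node 1's inlet, by $0.2 \times 10 / 1 = 2$ degrees. The peak inlet temperature is then simply `max(...)` of the result.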
Benefit: fast thermal evaluation
• CFD: give workload (job) → run CFD simulation (days) → extract temperatures (courtesy: Flometrics)
• Linear model: give workload (job) → compute the vector $T_{in} = T_{sup} + D \times P$ (seconds) → yields temperatures
Thermal-aware task placement problem
• Given an incoming task consisting of homogeneous processes, find a placement of the processes that minimizes the (increase of the) peak inlet temperature
Formulation
• Given a task that requires $C_{tot}$ servers, a matrix $D$ that describes recirculation, and the power profile parameters $a$, $b$ with $P = aU + b$:
$T_{in} = T_{sup} + D \times (aU + \mathbf{b})$, where $U$ is the utilization vector determined by the placement and $\mathbf{b} = (b, b, \ldots, b)$
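One simple way to attack this problem is a greedy heuristic: start with all servers idle (each still drawing $b$) and repeatedly activate the idle server that keeps the peak of $T_{in}$ lowest. This is a hypothetical illustration, not the XInt or LRH algorithms named in these slides, and greedy choices are not guaranteed to be optimal. It assumes each process fully loads one server ($U_i \in \{0, 1\}$) and that $D$, $a$, $b$, and $T_{sup}$ are known.

```python
from typing import List

Matrix = List[List[float]]


def peak_inlet(d: Matrix, util: List[float], a: float, b: float, t_sup: float) -> float:
    """Peak of T_in = T_sup + D (a U + b) over all nodes."""
    n = len(util)
    power = [a * u + b for u in util]  # P_i = a U_i + b
    return max(t_sup + sum(d[j][i] * power[i] for i in range(n)) for j in range(n))


def greedy_placement(d: Matrix, c_tot: int, a: float, b: float, t_sup: float) -> List[float]:
    """Hypothetical greedy heuristic: activate c_tot servers one at a time,
    each time choosing the idle server that yields the lowest peak inlet temperature."""
    n = len(d)
    if not 0 <= c_tot <= n:
        raise ValueError("c_tot must be between 0 and the number of servers")
    util = [0.0] * n  # all servers idle; idle servers still draw b
    for _ in range(c_tot):
        best_peak, best_i = float("inf"), -1
        for i in range(n):
            if util[i] == 0.0:
                util[i] = 1.0  # try placing one process on server i
                peak = peak_inlet(d, util, a, b, t_sup)
                util[i] = 0.0  # undo the trial
                if peak < best_peak:
                    best_peak, best_i = peak, i
        util[best_i] = 1.0
    return util
```

Each step costs $O(n^3)$ with this naive evaluation, which is fine for a sketch; an incremental version would update $T_{in}$ by adding $a$ times column $i$ of $D$ instead of recomputing it.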
Thermal-aware task placement: simulation results for data centers
• Simulation environment
  • Small-scale data center
  • One row is equipped with Dell PowerEdge 1955 servers
  • The other row is equipped with Dell PowerEdge 1855 servers
• Due to the heterogeneity of the equipment:
  • Minimizing just the cooling cost differs from minimizing the total cost
  • The difference is small here but can be larger depending on the data center
  • We therefore have to minimize the total cost
• For small loads, the task is assigned to the 1955s, so the optimization has to sacrifice cooling cost to improve the overall power cost
Spatio-Temporal Thermal-Aware Job Scheduling Algorithms for (Heterogeneous) Data Centers
Motivation
• Past work on spatial-only thermal-aware job scheduling has shown considerable energy savings
• Savings come from:
  • Knowing/modeling heat recirculation and controlling the server assignment (i.e. spatial scheduling) to minimize it
  • Adjusting the CRAC thermostat to the highest safe setting to save energy
Onto spatio-temporal job scheduling
• Data center utilization changes over time
• Job scheduling, though, is mainly a temporal process
• Problem:
  • How can thermal awareness be incorporated into the temporal dimension?
Spatio-temporal approaches
• Based on the XInt approach
  • SCINT:
    • Discretize time as well as space
    • Formulate a discrete spatio-temporal reservation problem that minimizes an objective function
    • Solve it using a genetic algorithm
• Based on extending FCFS with backfilling
  • FCFS-XInt: FCFS temporal, XInt spatial
  • FCFS-LRH: FCFS temporal, least-recirculated-heat (LRH) spatial
• Based on approximating SCINT behavior
  • Running SCINT is very time-consuming
  • SCINT achieves its savings by temporally spreading the workload, which allows a more energy-efficient spatial placement
  • Approximate it using earliest-deadline-first (EDF, temporal) with LRH (spatial)
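The LRH spatial policy prefers servers whose exhaust heat recirculates least. A minimal sketch, assuming the recirculation of server $i$ is scored as $\sum_j a_{ij}$ from the recirculation-coefficient matrix $A$; the slides do not spell out the exact LRH ranking, so this scoring and the function names are assumptions.

```python
from typing import List, Optional, Set

Matrix = List[List[float]]


def lrh_order(a: Matrix) -> List[int]:
    """Server indices sorted by the fraction of their exhaust that recirculates
    (sum_j a_ij), least first; ties keep index order."""
    return sorted(range(len(a)), key=lambda i: sum(a[i]))


def lrh_assign(a: Matrix, idle: Set[int], count: int) -> Optional[List[int]]:
    """Pick `count` idle servers with the least recirculated heat, or None if
    too few are idle (the job would then wait in the temporal queue)."""
    ranked = [i for i in lrh_order(a) if i in idle]
    return ranked[:count] if len(ranked) >= count else None
```

Because the ranking depends only on $A$, it can be computed once and reused for every job, which is what makes LRH much cheaper than re-optimizing the placement per job.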
Some challenges…
• The algorithms require a good model of the heat recirculation
  • Using the abstract linear heat interference (ALHI) model requires profiling the heat recirculation, either through measurements or through simulation
• The algorithms require a good estimate of the actual execution time
  • The reservation time (i.e. slack) is not a good estimate; it is almost always a generous overestimate
• Deadlines are not specified in the submissions
  • Use (submission time + slack) as the deadline
Figures: interference coefficients matrix; slack vs. execution time
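Since the submissions carry no deadlines, the EDF-based approximation can derive them as submission time plus slack. A minimal sketch of ordering a snapshot of the waiting queue this way, assuming each job is described by a hypothetical (name, submission time, slack) tuple; a real scheduler would re-evaluate the queue as jobs arrive and pair this order with a spatial policy such as LRH.

```python
import heapq
from typing import List, Tuple


def edf_order(jobs: List[Tuple[str, float, float]]) -> List[str]:
    """Order queued jobs earliest-deadline-first.

    jobs: (name, submission_time, slack) tuples; deadline = submission + slack.
    Ties on deadline fall back to earlier submission, then name.
    """
    heap = [(sub + slack, sub, name) for name, sub, slack in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For instance, a job submitted at time 10 with slack 20 (deadline 30) runs before one submitted at time 0 with slack 100 (deadline 100), even though the latter arrived first, which is exactly where EDF departs from FCFS.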
Figures: submission and execution time; energy consumption of schedules; power profile of schedules
Conclusions from this work
• There exist synergies in spatio-temporal scheduling:
  • Synergy between temporal smoothing and thermal-aware placement (spatial scheduling)
  • Synergy between the proactivity of spatio-temporal scheduling and “power scheduling”
• Near-optimal heuristics are very slow
  • Fast approximations are preferable
Energy consumption of spatio-temporal job scheduling in a linear cooling environment
Cooling models
• Constant-value cooling (FloVENT)
  • $T_{out} = b$
• Linear cooling
  • $T_{out} = a T_{in} + b$
• Segmented constant-linear cooling (FloVENT)
• Stepwise linear (observed)
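The cooling models can be written as small functions, reading $T_{in}$ as the temperature of the air entering the CRAC and $T_{out}$ as the temperature of the air it supplies. Here $a$ and $b$ are cooling-model parameters, unrelated to the power model's $a$ and $b$. The segment representation for the segmented/stepwise variants and all parameter values are hypothetical illustrations.

```python
from typing import Sequence, Tuple


def constant_cooling(t_in: float, b: float) -> float:
    """Constant-value model: the CRAC outlet temperature is fixed at b."""
    return b


def linear_cooling(t_in: float, a: float, b: float) -> float:
    """Linear model: T_out = a * T_in + b."""
    return a * t_in + b


def piecewise_linear_cooling(t_in: float,
                             segments: Sequence[Tuple[float, float, float]]) -> float:
    """Piecewise model built from (upper_bound, a, b) segments sorted by bound.

    The first segment whose upper bound is >= t_in applies; above the last
    bound, the last segment is extrapolated. A segment with a == 0 is constant.
    """
    if not segments:
        raise ValueError("need at least one segment")
    for upper, a, b in segments:
        if t_in <= upper:
            return a * t_in + b
    _, a, b = segments[-1]
    return a * t_in + b
```

A segmented constant-linear model then corresponds to, e.g., `[(20.0, 0.0, 15.0), (float("inf"), 0.5, 5.0)]`: a constant 15 up to an input of 20, linear above, and continuous at the breakpoint.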
Cooling distribution model
• Assume a 3-mode heat-extractor cooling system:
  • $P_{out}$ = 5 kW of cooling until $T_{in}$ = 16
  • $P_{out}$ = 75 kW of cooling until $T_{in}$ = 20
  • $P_{out}$ = 250 kW when $T_{in}$ > 26
  • A time delay of 10 minutes to fully switch modes
• Return heat
  • Total heat minus recirculated heat: $P_{in} = \sum_i (1 - \sum_j a_{ij}) P_i$
• Supplied heat
  • Input heat minus extracted heat: $P_{sup} = P_{in} - P_{out}$
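The return-heat and supplied-heat formulas translate directly into code. A minimal sketch, assuming node powers are given in kW and $A$ is the recirculation-coefficient matrix; the extracted heat $P_{out}$ (for example, the 5, 75, or 250 kW of the current cooling mode) is passed in rather than derived, since the mode-switching rule and its 10-minute delay are not modeled here. Function names are hypothetical.

```python
from typing import List


def return_heat(a: List[List[float]], p: List[float]) -> float:
    """P_in = sum_i (1 - sum_j a_ij) * P_i: heat that is not recirculated
    and therefore returns to the cooling system."""
    return sum((1.0 - sum(row)) * p_i for row, p_i in zip(a, p))


def supplied_heat(a: List[List[float]], p: List[float], p_out: float) -> float:
    """P_sup = P_in - P_out: returned heat left after extraction."""
    return return_heat(a, p) - p_out
```

For two nodes drawing 10 and 20 kW, where 20% and 10% of their exhaust recirculates, the return heat is $0.8 \times 10 + 0.9 \times 20 = 26$ kW; in the 5 kW mode, 21 kW would remain in the supplied air.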
Results: FCFS-XInt cooling power and energy
• Assume a 3-mode heat-extractor cooling system:
  • 5 kW cooling until $T_{in}$ = 16, 75 kW cooling until $T_{in}$ = 20, 250 kW when $T_{in}$ > 26
Results: EDF-LRH cooling power and energy
• Assume a 3-mode heat-extractor cooling system:
  • 5 kW cooling until $T_{in}$ = 16, 75 kW cooling until $T_{in}$ = 20, 250 kW when $T_{in}$ > 26
Results: SCINT cooling power and energy
• Assume a 3-mode heat-extractor cooling system:
  • 5 kW cooling until $T_{in}$ = 16, 75 kW cooling until $T_{in}$ = 20, 250 kW when $T_{in}$ > 26
Conclusions from this work
• Data center energy consumption is increasing
• Benefits emerge if data centers are viewed as cyber-physical systems
  • Thermal-aware scheduling
• Need to bridge the gap between simulation results and practice
  • Non-invasive ways to apply modeling methods in real data centers
  • Use realistic cooling models