
C-Oracle: Predictive Thermal Management for Data Centers

Luiz Ramos, Ricardo Bianchini. HPCA 2008.


Presentation Transcript


  1. C-Oracle: Predictive Thermal Management for Data Centers. Luiz Ramos, Ricardo Bianchini. HPCA 2008.

  2. Motivation (1/4)
     • Server clusters in data centers
       • Higher power densities → higher temperatures
       • Expensive cooling
     • Thermal emergencies
       • Failed fans or air conditioners
       • Poor cooling or air distribution
       • Hot spots
       • Brownouts
     • Component reliability decreases
     • Unpredictable behaviors or failures
     • Can impact system performance and availability

  3. Motivation (2/4)
     • Hardware-level thermal management (TM)
       • Disregards high-level information
       • E.g. CPU shutdown mechanism
       • Unnecessary performance loss
     • Software TM policies
       • More sophisticated reactions to emergencies
       • E.g. reduce load on “hot server” in a data center
       • Example: Freon for Internet services [ASPLOS’06]

  4. Motivation (3/4)
     • Freon
       • Move load away from “hot server”
       • Feedback control and admission control
     [Figure: Freon architecture. A front-end node receives Web requests and runs the load-balancing software and admd; Server 1 and Server 2 each run server software and tempd (labels W1, W2). Tcpu and Tdisk thresholds mark where a reaction is enabled (reduce load) and disabled (restore load); one Tdisk reading is marked "HOT!".]
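The enable/disable thresholds in the Freon figure describe a hysteresis loop: a reaction starts when a temperature crosses a higher threshold and stops only once it falls below a lower one. The following is a minimal sketch of that idea, not Freon's actual code. The class name, the threshold names `t_high` and `t_low`, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HysteresisController:
    """Two-threshold on/off controller (hypothetical names and values)."""
    t_high: float           # assumed: enable the reaction at or above this (deg C)
    t_low: float            # assumed: disable the reaction at or below this (deg C)
    reacting: bool = False  # whether load is currently being reduced

    def update(self, temp: float) -> bool:
        """Feed one temperature sample; return True while the reaction is on."""
        if not self.reacting and temp >= self.t_high:
            self.reacting = True    # enable reaction (reduce load)
        elif self.reacting and temp <= self.t_low:
            self.reacting = False   # disable reaction (restore load)
        return self.reacting


ctrl = HysteresisController(t_high=67.0, t_low=64.0)
states = [ctrl.update(t) for t in [60, 68, 66, 65, 63, 66]]
```

The gap between the two thresholds keeps the controller from toggling on every small temperature fluctuation around a single set point.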

  5. Motivation (4/4)
     • Need for prediction
       • Single pre-defined reaction and set of parameters
       • Severe: performance loss, new emergencies
       • Mild: may take too long or not be effective
     • Our approach
       • Predict behavior of potential TM reactions online
       • Select the best reaction
       • C-Oracle (Celsius-Oracle)

  6. Outline
     • Motivation
     • C-Oracle
     • Overview
     • Design
     • Predictive Policies
     • Experimental Results
     • Related Work
     • Conclusions

  7. C-Oracle Overview
     [Figure: Server 1 (HOT!) runs the server software and a TM system. The TM system asks C-Oracle: "In the next 5 minutes, a) decrease load by 25%? b) decrease CPU frequency by 50%?" C-Oracle returns a summary of each prediction (25% load reduction, 50% CPU frequency reduction). Annotations: 1) prediction kept until expiration, 2) regularly checked; 1) decision algorithm, 2) check prediction status.]

  8. C-Oracle Architecture
     [Figure: C-Oracle architecture. Components shown: Oracle, Proxy (models of TM policies), Solver (machine thermal models), Oracle driver, Request handler. Servers 1 to N each run a Monitor, Reactions + Decision Alg., and the server software. Data-flow labels: prediction requests; predicted utilizations; predicted temperatures; predicted temperatures and performance; real utilizations and temperatures.]

  9. C-Oracle Details
     • Oracle’s thermal model (in Solver)
       • Conservation of energy
       • Newton’s law of cooling
       • Power(utilization)
       • Energy equivalent
       • Heat capacity
     • Model of TM policies (in Proxy)
       • Model based on actual policy code
       • Policy designer uses primitives from Proxy’s library
       • E.g. read temperature, change load distribution
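The thermal-model bullets above (conservation of energy, Newton's law of cooling, power as a function of utilization, heat capacity) can be combined into a simple lumped model: heat capacity times the rate of temperature change equals power drawn minus heat lost to the air. The sketch below is an illustration under stated assumptions, not the Solver's actual model. It assumes a single component, a linear power curve, explicit Euler integration, and made-up constants (`p_idle`, `p_max`, `heat_capacity`, `k`).

```python
def power(util, p_idle=10.0, p_max=60.0):
    """Assumed linear power model in watts for a utilization in [0, 1]."""
    return p_idle + util * (p_max - p_idle)


def step_temperature(temp, util, t_air, dt, heat_capacity=100.0, k=2.0):
    """Advance the component temperature by one Euler step of dt seconds.

    Conservation of energy: heat stored = heat generated - heat removed.
    Newton's law of cooling: heat removed is proportional to (temp - t_air).
    heat_capacity (J/K) and k (W/K) are hypothetical constants.
    """
    heat_in = power(util) * dt
    heat_out = k * (temp - t_air) * dt
    return temp + (heat_in - heat_out) / heat_capacity


def predict(temp, utils, t_air=25.0, dt=1.0):
    """Predict the temperature trace for a sequence of future utilizations."""
    trace = []
    for util in utils:
        temp = step_temperature(temp, util, t_air, dt)
        trace.append(temp)
    return trace


full_load = predict(25.0, [1.0] * 2000)
half_load = predict(25.0, [0.5] * 2000)
```

With these constants the model settles at `t_air + power(util) / k`, so comparing `full_load[-1]` with `half_load[-1]` shows how a load-reducing reaction lowers the predicted steady-state temperature.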

  10. Reactions and Decision Algorithm
      • Base TM policies for Internet services
        • Freon
        • LiquidN2 (new)
      • LiquidN2
        • Problem with Freon: assumes control over distribution
          • Services with session state (e.g., shopping cart)
        • Slow down hot server (DVFS)
        • Least-connections
        • Feedback control and admission control

  11. Reactions and Decision Algorithm
      • Reactions = policies + parameters
      • CFreon: Freon using C-Oracle
        • Weak: moves load away from hot servers
        • Strong: 6x weak reaction
      • CLiquidN2: LiquidN2 using C-Oracle
        • Weak: reduce CPU frequency of hot servers
        • Strong: 4x weak reaction
      • Decision algorithm
        • No server shutdown
        • Lowest temperature after 5 min
        • No request drop
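The decision algorithm lists three criteria but the slide does not say how they are combined. The sketch below shows one possible reading, in which the criteria are applied lexicographically in the order they appear on the slide. The `Prediction` record, its field names, and the example numbers are hypothetical and are not taken from C-Oracle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Summary of one predicted reaction (hypothetical fields)."""
    reaction: str
    causes_shutdown: bool    # would a server be shut down?
    temp_after_5min: float   # predicted temperature after 5 minutes (deg C)
    drops_requests: bool     # would any request be dropped?


def choose_reaction(predictions):
    """Pick a reaction by ranking on the slide's criteria in slide order.

    Assumption: avoid shutdown first, then prefer the lowest predicted
    temperature, then prefer no dropped requests as a tie-breaker.
    """
    if not predictions:
        raise ValueError("no predictions to choose from")
    best = min(
        predictions,
        key=lambda p: (p.causes_shutdown, p.temp_after_5min, p.drops_requests),
    )
    return best.reaction


candidates = [
    Prediction("weak", causes_shutdown=False, temp_after_5min=66.0, drops_requests=False),
    Prediction("strong", causes_shutdown=False, temp_after_5min=63.5, drops_requests=True),
    Prediction("none", causes_shutdown=True, temp_after_5min=60.0, drops_requests=False),
]
chosen = choose_reaction(candidates)
```

Reordering the fields in the key tuple gives the other plausible reading, in which avoiding request drops outranks reaching the lowest temperature.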

  12. Outline
      • Motivation
      • C-Oracle
      • Overview
      • Design
      • Predictive Policies
      • Experimental Results
      • Related Work
      • Conclusions

  13. CFreon Results (1/3)
      • Experimental setup
        • Single-tier Web service (no session state)
        • Workload is a sequence of peaks and valleys
        • 1 front-end (LVS) and 4 servers (Apache)
        • Mercury emulates temperatures [ASPLOS’06]

  14. CFreon Results (2/3)
      [Figure labels: strong reaction, weak reaction, 1st emergency, 2nd emergency, Thigh]
      • More effective TM and good accuracy

  15. CFreon Results (3/3)
      • Accurate predictions of average utilization, avoids request drops

  16. CLiquidN2 Results (1/3)
      • Experimental setup
        • Three-tier auction service
        • Session state (auctions of interest) stored in the 2nd tier
        • Apache (2), Tomcat (2), MySQL (1) tiers
        • 1 LVS node: distributes load between tiers
        • CPU has 8 DVFS steps, from 2.8 GHz down to 350 MHz
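The DVFS range in the setup is consistent with eight evenly spaced levels, since 2.8 GHz divided by 8 is 350 MHz. The slide does not state the spacing, so the sketch below treats even spacing as an assumption. The function names are hypothetical.

```python
def dvfs_levels(max_mhz=2800, steps=8):
    """Assumed evenly spaced frequency levels in MHz, lowest first."""
    return [max_mhz * i // steps for i in range(1, steps + 1)]


def level_for_fraction(fraction, max_mhz=2800, steps=8):
    """Highest level not exceeding fraction * max_mhz (lowest level as a floor)."""
    levels = dvfs_levels(max_mhz, steps)
    target = fraction * max_mhz
    allowed = [freq for freq in levels if freq <= target]
    return max(allowed) if allowed else levels[0]


levels = dvfs_levels()
three_quarters = level_for_fraction(0.75)
```

Under this assumption, a CPU frequency set to 75% corresponds to the 2100 MHz level, the sixth of the eight steps.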

  17. CLiquidN2 Results (2/3)
      [Figure labels: strong reaction, weak reaction, Thigh]
      • CPU frequency set to 75%

  18. CLiquidN2 Results (3/3)

  19. Outline
      • Motivation
      • C-Oracle
      • Overview
      • Design
      • Predictive Policies
      • Experimental Results
      • Related Work
      • Conclusions

  20. Related Work
      • TM policies for data centers
        • For thermal emergencies
        • For normal operation
      • TM predictions
        • Offline evaluation
        • For batch systems, reducing cooling costs
      • Our work
        • For thermal emergencies
        • Online predictions
        • LiquidN2: DVFS + request distribution + admission control

  21. Conclusions
      • LiquidN2 is useful for services with session state
      • C-Oracle allows the prediction of TM reactions
      • Predictive policies make the best available decisions

  22. Thank you!
