
Towards Optimal Sensor Placement for Hot Server Detection in Data Centers

Towards Optimal Sensor Placement for Hot Server Detection in Data Centers. Xiaodong Wang, Xiaorui Wang, Guoliang Xing, Jinzhu Chen, Cheng-Xian Lin, and Yixin Chen. Outline: Introduction, Related work, Hot server detection problem, CFD-guided sensor placement, Evaluation, Summary.


Presentation Transcript


  1. Towards Optimal Sensor Placement for Hot Server Detection in Data Centers • Xiaodong Wang, Xiaorui Wang, Guoliang Xing, Jinzhu Chen, Cheng-Xian Lin, and Yixin Chen.

  2. Outline • Introduction • Related work • Hot server detection problem • CFD-guided sensor placement • Evaluation • Summary

  3. Introduction • Thermal monitoring is important in data center operation: • Overheating is harmful to data centers. • Hardware components malfunction. • Servers shut down. • Excessive cooling energy is consumed. • Cooling systems do not operate efficiently enough. • Overcooling consumes excessive energy. • To have precise hot server detection: • Precise hot server detection can guide the air conditioning system. • Thermal dynamics in data centers need to be better studied. • Place more sensors to increase thermal visibility.

  4. Related Work • Studies of thermal profiles: • [Choi et al. HPCA ’07] studied the thermal profile of a rack. • [Patel et al. IPACK ’01] studied the air temperature specification of a data center under normal conditions. • Improving thermal monitoring with sensor networks: • [Liang et al. SenSys ’09] deployed sensor networks in a data center to achieve high-fidelity thermal monitoring. • [Moore et al. USENIX ’05] and [Bash et al. USENIX ’07] proposed allocating server jobs and workloads based on thermal readings from sensor networks. • Not used to guide sensor placement. How can sensors be placed effectively?

  5. Hot Server Detection Problem • Problem to solve: • Place sensors intelligently to maximize the hot server detection probability. • Problem formulation: • Given M locations to monitor and N (N < M) sensors to use, maximize the average detection probability, subject to a constraint on the false alarm rate. • Detection probability: the probability of detecting overheating at a monitored location. • False alarm rate: the rate of false overheating alarms at a monitored location.
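The formulas on this slide did not survive in the transcript. One plausible reading, consistent with the stated goal of maximizing the average detection probability over the M monitored locations, is sketched below. The symbols $P_{D_j}$ (detection probability at location $j$), $P_{F_j}$ (false alarm rate at location $j$), and the bound $\alpha$ are assumptions introduced here for illustration and are not taken from the slide.

```latex
\max \; \frac{1}{M} \sum_{j=1}^{M} P_{D_j}
\qquad \text{subject to} \qquad
P_{F_j} \le \alpha, \quad j = 1, \dots, M
```

The decision variables would be the positions of the N sensors, on which each $P_{D_j}$ and $P_{F_j}$ depends.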

  6. Problem Solving Architecture • Overheating data center analysis. • Analyze the data center under overheating conditions. • Obtain the temperature distribution for overheating cases. • Find the sensor placement solution. • Sensor readings are usually corrupted by noise. • Sensors need to make hot server detection decisions collaboratively (data fusion). • [Figure: architecture diagram; labels: Overheating Analysis, Sensor Placement, CFD Modeling, Temperature Interpolation, Data fusion & placement algorithm]

  7. Overheating Data Center Analysis • Computational Fluid Dynamics (CFD) model for an overheating data center • A finite volume method. • Example: • [Figure: CFD modeling example; labels: data center physical model, power consumption, CRAC settings, CFD Modeling, Temperature Interpolation, temperature distribution; the room layout marks A/C inlets and outlets]

  8. Overheating Data Center Analysis (cont’d) • Spatial temperature interpolation • CFD results are available only at discrete locations. • The granularity of CFD modeling is a tradeoff between accuracy and computational complexity. • Inverse Distance Weighting (IDW) interpolation: • Weighted average of the available temperature data. • Optimize sensor placement based on the overheating analysis • To maximize the average overheating server detection probability.
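The IDW step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `idw_interpolate`, the `power` parameter, and its default of 2 are assumptions, since the slide only states that IDW is a weighted average of the available temperature data.

```python
import math


def idw_interpolate(samples, query, power=2.0):
    """Estimate the temperature at `query` from discrete samples.

    samples: list of (point, temperature) pairs, e.g. CFD grid outputs.
    query:   coordinates of the location to estimate.
    power:   distance exponent (assumed value; not given in the slides).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, temperature in samples:
        distance = math.dist(point, query)
        if distance == 0.0:
            # The query coincides with a sample: return it directly.
            return temperature
        weight = 1.0 / distance ** power
        weighted_sum += weight * temperature
        weight_total += weight
    return weighted_sum / weight_total
```

Closer samples receive larger weights, so a query point midway between two samples gets their plain average, and a query point on top of a sample returns that sample's value.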

  9. CFD-Guided Sensor Placement • Sensor placement with an existing solver: • Decide the x, y, and z variables of each sensor location. • Constrained Simulated Annealing (CSA) • An existing solver with 3N variables. • Computational time increases exponentially. • Lightweight Sensor Placement (LSP): • Only searches for placement solutions in areas with clustered racks. • A greedy algorithm that adds sensors one by one. • Search space and computational time are significantly reduced.
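The "adds sensors one by one" idea can be sketched as a generic greedy loop. This is only an illustration of the greedy pattern, not the LSP algorithm itself: `greedy_placement`, `coverage_score`, and the coverage-count objective are hypothetical stand-ins, whereas the talk's objective is the detection probability derived from the CFD analysis.

```python
import math


def greedy_placement(candidates, n_sensors, score):
    """Pick sensor positions one at a time, each time adding the
    candidate that yields the highest score for the placement so far."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(n_sensors, len(remaining))):
        best = max(remaining, key=lambda c: score(chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def coverage_score(monitored, radius):
    """Stand-in objective (an assumption): the number of monitored
    locations that have at least one sensor within `radius`."""
    def score(placement):
        return sum(
            any(math.dist(m, s) <= radius for s in placement)
            for m in monitored
        )
    return score
```

Each round evaluates only the remaining candidates instead of optimizing all 3N coordinates jointly, which is where the reduction in search space comes from.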

  10. Simulation Setup • Experiment environment setup • CFD software packages: Gambit and Fluent • Server room size: 32 m x 7 m x 3 m • 13 racks in the server room. • 4 monitored locations per rack (52 locations in total) • 14,400 watts power consumption for each overheating rack. • CRAC settings are collected by external sensor.

  11. Simulation Results • Different numbers of sensors • Baselines: • Uniformly random (current practice). • CFD+proportional. • Using more sensors increases the detection probability. • CFD+LSP (our solution) is closest to the optimal solution.

  12. Simulation Results (cont’d) • Different temperature thresholds: • Detection probability decreases as the temperature threshold increases. • Different fusion ranges: • A proper fusion range can increase the detection probability.

  13. Hardware Experiment in a Server Room • Setup: • A small cluster of two racks is used. • Overheating is created by a heater • Results:

  14. Summary • We place sensors intelligently in data centers • To reach a maximum hot server detection probability • Various overheating conditions are studied to guide sensor placement • CFD is used to analyze data centers under overheating conditions. • Future consideration: • Integrate with thermal control approaches. • More detailed CFD modeling.

  15. Q&A Thank You! • Acknowledgement • NSF CAREER Award CNS-0845390 • NSF under Grants CNS-0720663, CNS-0915959, CCF-1017336, and CNS-0954039 • Microsoft Research under a Power-Aware Computing Award

  16. Appendix A • Sensor readings are usually corrupted by noise. • An overheating scenario is detected when the measured temperature is larger than the threshold. • A false alarm happens when the overheating detection is triggered by noise only.
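The threshold rule above can be made concrete under one extra assumption that the slide does not state: the noise is additive, zero-mean Gaussian with standard deviation `sigma`. Under that assumption, the probability that a single noisy reading exceeds the threshold has a closed form. The function name `exceed_probability` is hypothetical, and the sketch covers a single sensor only, without data fusion.

```python
import math


def exceed_probability(true_temp, threshold, sigma):
    """P(true_temp + noise > threshold) for zero-mean Gaussian noise
    with standard deviation `sigma` (the noise model is an assumption).

    With an overheating `true_temp` this is a detection probability;
    with a normal `true_temp` it is a false alarm rate.
    """
    z = (threshold - true_temp) / sigma
    # Gaussian tail probability expressed with the complementary
    # error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

When the true temperature equals the threshold, the reading lands above it half of the time; raising the threshold lowers both the false alarm rate and the detection probability.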

  17. Appendix B • Rack clustering • The minimum distance between two monitored locations in two different clusters is larger than 2R. • Inverse Distance Weighting (IDW) interpolation:
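One way to obtain clusters that satisfy the 2R separation property is to link any two monitored locations that lie within 2R of each other and take the connected groups. The sketch below does this with a simple graph traversal. It is an assumption about how such clusters could be built, not the procedure from the talk, and the name `cluster_locations` is hypothetical. R is taken as given, since the transcript does not define it.

```python
import math


def cluster_locations(locations, r):
    """Group locations so that any two locations in different clusters
    are more than 2 * r apart (connected components of the graph that
    links locations at distance <= 2 * r)."""
    labels = [-1] * len(locations)
    clusters = []
    for start in range(len(locations)):
        if labels[start] != -1:
            continue  # already assigned to an earlier cluster
        label = len(clusters)
        labels[start] = label
        members = []
        stack = [start]
        while stack:
            i = stack.pop()
            members.append(locations[i])
            for j in range(len(locations)):
                near = math.dist(locations[i], locations[j]) <= 2 * r
                if labels[j] == -1 and near:
                    labels[j] = label
                    stack.append(j)
        clusters.append(members)
    return clusters
```

Because every pair within 2R ends up in the same group, the closest pair across two different groups is necessarily farther apart than 2R.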
