Towards Optimal Sensor Placement for Hot Server Detection in Data Centers
Xiaodong Wang, Xiaorui Wang, Guoliang Xing, Jinzhu Chen, Cheng-Xian Lin, and Yixin Chen
Outline
• Introduction
• Related work
• Hot server detection problem
• CFD-guided sensor placement
• Evaluation
• Summary
Introduction
• Thermal monitoring is important in data center operation:
  • Overheating is harmful to a data center.
    • Hardware components malfunction.
    • Servers shut down.
  • Excessive cooling energy is consumed.
    • Cooling systems do not operate efficiently enough.
    • Overcooling requires excessive energy.
• Precise hot server detection is needed:
  • Precise hot server detection can guide the air conditioning system.
  • Thermal dynamics in data centers need to be better studied.
  • Placing more sensors increases thermal visibility.
Related Work
• Studies of thermal profiles:
  • [Choi et al. HPCA ’07] studied the thermal profile of a rack.
  • [Patel et al. IPACK ’01] studied the air temperature specification of a data center under normal conditions.
• Improving thermal monitoring with sensor networks:
  • [Liang et al. SenSys ’09] deployed sensor networks in a data center to achieve high-fidelity thermal monitoring.
  • [Moore et al. USENIX ’05] and [Bash et al. USENIX ’07] proposed allocating server jobs and workload based on thermal readings from sensor networks.
• None of this work is used to guide sensor placement. How can sensors be placed effectively?
Hot Server Detection Problem
• Problem to solve:
  • Place sensors intelligently to maximize the hot server detection probability.
• Problem formulation:
  • Given M locations to monitor and N (N < M) sensors to use, maximize the average detection probability over the monitored locations, subject to a constraint on the false alarm rate at each monitored location.
  • Detection probability: the probability of detecting overheating at a monitored location.
  • False alarm rate: the rate of false overheating alarms at a monitored location.
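The formulation on this slide can be written compactly as follows. This is only a sketch: the symbols $P_{D_j}$ (detection probability at monitored location $j$), $P_{F_j}$ (false alarm rate at location $j$), and the false alarm bound $\alpha$ are assumed notation, since the slide's own equations are not available here.

```latex
\max_{\text{sensor placement}} \; \frac{1}{M} \sum_{j=1}^{M} P_{D_j}
\quad \text{subject to} \quad
P_{F_j} \le \alpha, \qquad j = 1, \dots, M
```

The decision variables are the positions of the N sensors; both $P_{D_j}$ and $P_{F_j}$ depend on which sensors are close enough to location $j$ to contribute to its detection decision.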
Problem Solving Architecture
• Overheating data center analysis:
  • Analyze the data center under overheating conditions.
  • Obtain the temperature distribution for overheating cases.
• Find the sensor placement solution:
  • Sensor readings are usually corrupted by noise.
  • Sensors need to make the hot server detection decision collaboratively (data fusion).
[Figure: architecture diagram. Blocks: Overheating Analysis (CFD Modeling, Temperature Interpolation); Sensor Placement (Data fusion & placement algorithm).]
Overheating Data Center Analysis
• Computational Fluid Dynamics (CFD) model for an overheating data center:
  • A finite volume method.
[Figure: example data center physical model and the resulting temperature distribution. Labels: CFD Modeling, Temperature Interpolation; inputs: power consumption, ……, CRAC settings; A/C in, A/C in/out, and A/C out positions.]
Overheating Data Center Analysis (cont’d)
• Spatial temperature interpolation:
  • CFD results are available only at discrete locations.
  • The granularity of CFD modeling is a tradeoff between accuracy and computational complexity.
  • Inverse Distance Weighting (IDW) interpolation: the temperature at a location is estimated as a weighted average of the available temperature data.
• Optimize sensor placement based on the overheating analysis:
  • The goal is to maximize the average overheating server detection probability.
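The IDW step can be illustrated with a minimal Python sketch. The function name `idw_interpolate`, the `power` exponent (2 is a common default, not a value taken from the slides), and the sample data are all assumptions for illustration.

```python
import math

def idw_interpolate(samples, query, power=2.0):
    """Estimate the temperature at `query` from known (point, temperature) samples.

    Each sample is weighted by 1 / distance**power, so nearby CFD grid
    points influence the estimate more than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, temperature in samples:
        distance = math.dist(point, query)
        if distance == 0.0:
            # The query coincides with a known point: return its value directly.
            return temperature
        weight = 1.0 / distance ** power
        weighted_sum += weight * temperature
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total

# Hypothetical CFD outputs: (x, y, z) in metres, temperature in degrees Celsius.
cfd_samples = [((0.0, 0.0, 0.0), 20.0), ((2.0, 0.0, 0.0), 40.0)]
midpoint_estimate = idw_interpolate(cfd_samples, (1.0, 0.0, 0.0))
```

At the midpoint both samples carry equal weight, so the estimate is simply their mean; closer to either sample, the estimate moves toward that sample's temperature.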
CFD-Guided Sensor Placement
• Sensor placement with an existing solver:
  • Decide the x, y, and z coordinates of each sensor location.
  • Constrained Simulated Annealing (CSA):
    • An existing solver; the problem has 3N variables (x, y, and z for each of the N sensors).
    • Computational time increases exponentially.
• Lightweight Sensor Placement (LSP):
  • Searches for placement solutions only in areas with clustered racks.
  • A greedy algorithm that adds sensors one by one.
  • Search space and computational time are significantly reduced.
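The one-by-one greedy idea behind LSP can be sketched in Python as below. This is not the authors' algorithm: `greedy_placement` is a generic greedy loop, and `coverage_score` is a hypothetical stand-in objective (the fraction of monitored locations within a fusion range of some sensor) used in place of the average detection probability. All names and data are assumptions.

```python
import math

def coverage_score(sensors, monitored, fusion_range):
    """Stand-in objective: fraction of monitored locations near at least one sensor."""
    if not monitored:
        return 0.0
    covered = sum(
        1 for location in monitored
        if any(math.dist(location, sensor) <= fusion_range for sensor in sensors)
    )
    return covered / len(monitored)

def greedy_placement(candidates, num_sensors, score):
    """Add sensors one at a time, each time keeping the candidate that scores best."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(num_sensors, len(remaining))):
        best = max(remaining, key=lambda candidate: score(chosen + [candidate]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical monitored locations (two groups) and candidate sensor positions.
monitored = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
candidates = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
placement = greedy_placement(
    candidates, 2, lambda sensors: coverage_score(sensors, monitored, 1.0)
)
```

Each round evaluates every remaining candidate once, so the number of objective evaluations grows with the product of candidates and sensors instead of with the size of the joint search space over all 3N coordinates.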
Simulation Setup
• Experiment environment setup:
  • CFD software packages: Gambit and Fluent.
  • Server room size: 32 m x 7 m x 3 m.
  • 13 racks in the server room.
  • 4 monitored locations per rack (52 locations in total).
  • 14,400 watts of power consumption for each overheating rack.
  • CRAC settings are collected by an external sensor.
Simulation Results
• Different numbers of sensors
• Baselines:
  • Uniformly random placement (current practice).
  • CFD+proportional.
• Using more sensors increases the detection probability.
• CFD+LSP (our solution) is closest to the optimal solution.
Simulation Results (cont’d)
• Different temperature thresholds:
  • Detection probability decreases as the temperature threshold increases.
• Different fusion ranges:
  • A proper fusion range can increase the detection probability.
Hardware Experiment in a Server Room
• Setup:
  • A small cluster of two racks is used.
  • Overheating is created by a heater.
• Results:
Summary
• We place sensors intelligently in data centers:
  • The goal is to maximize the hot server detection probability.
• Various overheating conditions are studied to guide sensor placement:
  • CFD is used to analyze data centers under overheating conditions.
• Future considerations:
  • Integrate with thermal control approaches.
  • More detailed CFD modeling.
Q&A
Thank You!
• Acknowledgement:
  • NSF CAREER Award CNS-0845390
  • NSF under Grants CNS-0720663, CNS-0915959, CCF-1017336, and CNS-0954039
  • Microsoft Research under a Power-Aware Computing Award
Appendix A
• Sensor readings are usually corrupted by noise.
• An overheating scenario is detected when the measured temperature is larger than the threshold.
• A false alarm happens when the overheating detection is triggered by noise only.
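The threshold test in this appendix can be illustrated with a small Python sketch. It assumes zero-mean Gaussian measurement noise, which the slides do not state; the function name `exceedance_probability` and all temperatures are hypothetical.

```python
import math

def exceedance_probability(true_temperature, threshold, noise_std):
    """Probability that (true_temperature + Gaussian noise) exceeds the threshold.

    Assumes zero-mean Gaussian noise with standard deviation `noise_std`;
    this noise model is an assumption made for illustration.
    """
    z = (threshold - true_temperature) / noise_std
    # Gaussian tail probability Q(z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

threshold = 40.0   # hypothetical alarm threshold, degrees Celsius
noise_std = 1.0    # hypothetical sensor noise, degrees Celsius

# Detection probability: the location really is hot (45 C).
detection_probability = exceedance_probability(45.0, threshold, noise_std)

# False alarm rate: the location is normal (25 C), so only noise can trigger the alarm.
false_alarm_rate = exceedance_probability(25.0, threshold, noise_std)
```

Under this model, raising the threshold lowers both quantities, which is consistent with the reported drop in detection probability at higher thresholds.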
Appendix B
• Rack clustering:
  • The closest distance between two monitored locations in two different clusters is larger than 2R.
• Inverse Distance Weighting (IDW) interpolation:
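One way to obtain clusters that satisfy the 2R separation rule is to link any two monitored locations that lie within 2R of each other and take the connected groups. The Python sketch below does this with a small union-find; the function name `cluster_locations`, the reading of R as the fusion range, and the coordinates are assumptions.

```python
import math

def cluster_locations(points, fusion_range):
    """Group points so that points in different clusters are more than 2R apart.

    Any two points within 2 * fusion_range are linked; clusters are the
    connected groups of linked points (tracked with union-find).
    """
    parent = list(range(len(points)))

    def find(index):
        # Follow parent links to the root, shortening the path along the way.
        while parent[index] != index:
            parent[index] = parent[parent[index]]
            index = parent[index]
        return index

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2.0 * fusion_range:
                parent[find(i)] = find(j)

    clusters = {}
    for index, point in enumerate(points):
        clusters.setdefault(find(index), []).append(point)
    return list(clusters.values())

# Hypothetical monitored locations in metres; R = 1 m.
locations = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
clusters = cluster_locations(locations, 1.0)
```

Because every pair within 2R ends up in the same cluster, any two locations in different clusters are necessarily more than 2R apart, so a sensor search can treat each cluster independently.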