Thermal-Aware Scheduling in Environmentally Coupled Cyber-Physical Distributed Systems
Qinghui Tang
Committee: Dr. Sandeep Gupta, Dr. Martin Reisslein, Dr. Loren Schwiebert, Dr. Cihan Tepedelenlioglu, Dr. Junshan Zhang
Presentation Outline
• Background and Motivation
• Unified Thermal-Aware Approach
• Applications of Thermal-Aware Scheduling
• Summary of Research Results
• Conclusions
Background and Motivation
• What are cyber-physical systems (CPS)?
  • Computing systems tightly coupled with the physical world
• Environmentally coupled CPS (ECCPDS)
  • Apply interference to the system itself and to the surrounding environment
• Increasing deployment of distributed systems
  • Sensor networks
  • Pervasive computing
  • Grid/cluster computing
• Existing approaches and methodologies do not take into account the interference and interactions among systems and the environment
• Emerging systems require new methodologies and approaches
  • Cross-disciplinary, more complicated applications
Environmentally Coupled Distributed CPS: Terminology
• Interference: the negative impact on the environment
  • Self-interference
  • Environmental interference
  • Cross-interference
• Interference models
  • Quantitative model
  • Temporal model
  • Spatial model
  • Comprehensive model
• Individual design approach
• Network/system operation approach
  • Task scheduling
Thermal-Aware ECCPDS
We focus on thermal-related applications because of:
• The correlation between heat dissipation and power consumption (energy efficiency)
• The correlation between temperature change and reliability
• The importance of energy efficiency and system lifetime
• The direct impact on the embedded environment
• Green technology as the new trend
  • Energy efficient and environmentally friendly
Examples of Task Scheduling of Cyber-Physical Systems
• Server farms inside data centers
  • Heat dissipated by one server may heat up other servers
  • Task scheduling in the spatial domain
• Implanted biomedical sensor networks used for prosthesis or monitoring
  • Sensor nodes work in shifts to accomplish the assigned task
  • Task scheduling in the temporal domain
Unified Thermal-Aware Scheduling for ECCPDS (1)
• A cyber-physical system with N nodes that interact with each other
• A scheduler divides the total task Ctotal into a task vector <C1, C2, …, CN>, resulting in a power consumption vector <P1, P2, …, PN>
• Each node
  • performs a subset of the total task Ctotal
  • consumes power at a certain rate
  • experiences a temperature change Ti that depends on the other nodes’ power consumption
• The system objective function W depends on the node temperatures (and the task assignments)
Unified Thermal-Aware Scheduling for ECCPDS (2)
• Problem statement
  • How to divide the total task Ctotal into <C1, C2, …, CN> so as to minimize/maximize the objective function W
• Generalized approach
  • Step 1: Profile the correlation between power consumption, task, and temperature rise: function Gi()
    • By estimation, measurement, or profiling
  • Step 2: Characterize the thermal interference, function Fi(), and build a fast thermal evaluation method
  • Step 3: Formalize the objective function: function H()
  • Step 4: Explore the design space to find the best schedule
[Figure: power profiling G(*), fast thermal evaluation F(*), problem formalization H(*)]
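The four steps above can be sketched as a small search loop. This is a minimal illustration, not the dissertation's implementation: the helper names `splits` and `best_schedule`, the exhaustive search, and every number in the toy instance are assumptions made for the example. G, F, and H are passed in as plain functions so that any profiling model, interference model, or objective can be plugged in.

```python
from itertools import product

def splits(total, n, step=1):
    """Yield every way to divide `total` among n nodes in multiples of `step`.

    Assumes `total` is itself a multiple of `step`.
    """
    levels = range(0, total + 1, step)
    for head in product(levels, repeat=n - 1):
        rest = total - sum(head)
        if rest >= 0:
            yield (*head, rest)

def best_schedule(total, n, G, F, H, step=1):
    """Step 4: exhaustively search the design space for the split minimizing W."""
    def W(tasks):
        power = G(tasks)        # Step 1: task vector -> power vector
        temps = F(power)        # Step 2: power vector -> temperature vector
        return H(temps, tasks)  # Step 3: objective value W
    return min(splits(total, n, step), key=W)

# Toy two-node instance; every number below is hypothetical.
D = [[0.1, 0.0],   # D[i][j]: temperature rise at node i per unit power at node j
     [0.3, 0.1]]

def G(tasks):
    # Hypothetical linear power profile: idle power plus a per-task increment.
    return [10 + 2 * c for c in tasks]

def F(power):
    # Hypothetical linear interference model.
    return [sum(d * p for d, p in zip(row, power)) for row in D]

def H(temps, tasks):
    # Objective: the hottest node.
    return max(temps)

print(best_schedule(4, 2, G, F, H))
```

In this toy instance the first node heats the second one strongly, so the search places the whole task on the second node. Exhaustive enumeration only scales to very small systems, which is why a fast evaluation of F matters once a heuristic search replaces it.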
Related Work
• Previous research on minimizing thermal interference
  • Focused on the individual design approach instead of the system operation approach
  • Used numerical methods for thermal evaluation, which are not appropriate for online and real-time scheduling
  • Failed to consider the cross-interference applied by neighboring nodes
Dissertation Contributions
• Proposed a unified methodology and analytical technique for analyzing and designing interference-minimized distributed systems
  • Verified in two thermal applications
  • Can be applied to other forms of interference (e.g., sonic)
• Verified the methodology by applying the approach to two vastly different applications
• Built an abstract heat model for fast thermal evaluation and power consumption prediction
• Thermal-aware task scheduling for biomedical sensor networks
  • IEEE Trans. Biomedical Eng. 07
  • DCOSS’05
• Minimizing data center cooling energy cost through thermal-aware task placement
  • IEEE TPDS special issue on Power-aware Parallel and Distributed Computing
  • DASC’06, ICISIP’06, Cluster’07, COMSWARE’07
Dissertation Contributions (cont.)
• Thermal-aware task scheduling for biomedical sensor networks
  • Modeling thermal interference of implanted biosensors
  • Identifying factors that minimize thermal effects
  • Time-space function for fast thermal evaluation
• Minimizing data center cooling energy cost through thermal-aware task placement
  • Homogeneous data center with a single task
    • Thermal-aware algorithm based on interference characterization
  • Heterogeneous data center with heterogeneous tasks
  • Multiple tasks with different timing information
Application Example: Task Scheduling of Biosensor Networks
Biosensor Scheduling: Overview
• Implanted biomedical sensor networks are used for prosthesis or monitoring
• Sensor nodes work in shifts to accomplish the assigned task
• Environmental interference should be minimized
• This is task scheduling in the temporal domain
  • Task assignments for multiple time slots
  • Ctotal = 1
  • In each slot, only one node performs the task
Biosensor Scheduling Step 1: Profiling the Correlation
• Profiling the correlation between power consumption and temperature rise, Gi(), with Pennes’ bioheat equation
• Terms of the equation: heat accumulated, heat transfer by conduction, heat transfer by convection, heat by radiation, heat by power dissipation, heat by metabolism
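For reference, a commonly cited form of Pennes' bioheat equation is written below so that its six terms line up with the labels on this slide. The symbols are assumptions and do not come from the slide: ρ is tissue density, C_p specific heat, K thermal conductivity, b the blood perfusion constant, T_b the blood temperature, SAR the specific absorption rate of radiated energy, P_c the power dissipated by the node's circuitry, and Q_m the metabolic heat.

```latex
\underbrace{\rho\, C_p \frac{\partial T}{\partial t}}_{\text{heat accumulated}}
  = \underbrace{K \nabla^2 T}_{\text{conduction}}
  \;-\; \underbrace{b\,(T - T_b)}_{\text{convection (blood perfusion)}}
  \;+\; \underbrace{\rho\,\mathrm{SAR}}_{\text{radiation}}
  \;+\; \underbrace{P_c}_{\text{power dissipation}}
  \;+\; \underbrace{Q_m}_{\text{metabolism}}
```

Under this reading, a node's task assignment enters through the radiation and power dissipation terms, which is what links scheduling to temperature rise.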
Biosensor Scheduling Step 2: Characterizing Thermal Interference F()
• Characterizing cross-interference between node i and node j as a function of spatial distance and temporal distance
[Figure: four nodes (1 to 4) illustrating spatial distance and temporal distance]
Biosensor Scheduling Steps 3 and 4: Exploring the Design Space
• The objective function H():
• Searching for the best scheduling sequence by using a genetic algorithm
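A genetic search over scheduling sequences can be sketched as follows. This is only an illustration under stated assumptions: `interference_cost` is a hypothetical stand-in for H() that penalizes nodes that are close in space and also close in time, and the operators (order crossover, swap mutation, keeping the better half) are common textbook choices rather than the ones used in the dissertation. The node positions are invented.

```python
import math
import random

def interference_cost(seq, pos):
    """Hypothetical H(): pairs that are near in space AND near in time cost more."""
    cost = 0.0
    for s in range(len(seq)):
        for t in range(s + 1, len(seq)):
            spatial = math.dist(pos[seq[s]], pos[seq[t]])
            temporal = t - s
            cost += 1.0 / ((1.0 + spatial) * temporal)
    return cost

def order_crossover(a, b, rng):
    """Copy a slice of parent a, fill the remaining slots in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]

def evolve(pos, generations=200, pop_size=30, mutation_rate=0.2, seed=0):
    """Return the best scheduling sequence (a permutation of node indices) found."""
    rng = random.Random(seed)
    n = len(pos)
    cost = lambda seq: interference_cost(seq, pos)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            child = order_crossover(*rng.sample(parents, 2), rng)
            if rng.random() < mutation_rate:    # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)

# Hypothetical positions of four implanted nodes (arbitrary units).
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
best = evolve(positions)
print(best, round(interference_cost(best, positions), 3))
```

Because the better half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next. Each cost evaluation is cheap here, and that is the property the fast thermal evaluation of Step 2 is meant to provide.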
Application Example: Thermal-Aware Task Scheduling of Data Centers
Problem Statement of Task Scheduling in Data Centers
• Given a total task C, how should it be divided among N server nodes so that the computing task finishes with minimal cooling energy cost?
• Self-interference and cross-interference raise the temperature of the inlet air and should be minimized
• Environmental interference (room temperature) is not critical
• Task scheduling in the spatial domain
[Figure: data center with 4 servers and task {30}; example assignments 5, 10, 5, 10 and 20, 5, 0, 5]
Conceptual Overview of Thermal-Aware Task Placement
• Different task assignments lead to different power consumption distributions
• Different power consumption distributions lead to different temperature distributions
• Different temperature distributions lead to different total energy costs
[Figure: server task distribution → power consumption distribution → temperature distribution → energy cost]
Data Center Preliminary: Layout
• Outlet temperature Tout
• Inlet temperature Tin: must be less than 25 °C
• Cold supply temperature Ts
Data Center Preliminary: Scheduling vs. Cooling Cost
• Different schedules place different demands on cooling capacity
• Minimizing the peak inlet temperature is equivalent to minimizing the cooling cost
[Figure: inlet temperature distributions without and with cooling for Scheduling 1 and Scheduling 2, each against the 25 °C limit]
Data Center Step 1: Profiling the Correlation Gi()
• Server power consumption Pi depends on the amount of computing task
• Inlet airflow is a mixture of supplied cold air and recirculated hot air
• Outlet airflow
Data Center Step 2: Characterizing Cross Interference F()
• Heat recirculation coefficients
  • Analytical
  • Matrix-based
• Characterizing process
  • Running CFD with various power consumption scenarios
  • Calculating recirculation coefficients based on the laws of conservation of matter and energy
  • Using the coefficients to predict temperatures without running CFD
• Tin = Tsup + D × P, where Tin is the vector of inlet temperatures, Tsup the supplied air temperatures, D the heat distribution matrix, and P the power vector
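Once D is known, predicting inlet temperatures is a single matrix-vector product. The sketch below evaluates Tin = Tsup + D × P with plain Python lists; the function name `inlet_temperatures`, the 3-server matrix, the supply temperature, and the power vectors are all hypothetical values chosen for illustration.

```python
def inlet_temperatures(D, power, t_sup):
    """Evaluate Tin = Tsup + D x P for one power vector.

    D[i][j] is the inlet temperature rise at server i per watt drawn by server j.
    """
    return [t_sup + sum(d * p for d, p in zip(row, power)) for row in D]

# Hypothetical heat distribution matrix for three servers (degrees C per watt).
D = [[0.010, 0.002, 0.000],
     [0.004, 0.010, 0.002],
     [0.001, 0.004, 0.010]]
T_SUP = 15.0  # hypothetical cold supply temperature, degrees C

# Two hypothetical placements of the same total load.
placements = {
    "placement A": [300.0, 100.0, 100.0],
    "placement B": [100.0, 100.0, 300.0],
}
for name, power in placements.items():
    temps = inlet_temperatures(D, power, T_SUP)
    print(name, [round(t, 2) for t in temps], "peak:", round(max(temps), 2))
```

Comparing the peaks of candidate placements this way needs only a handful of multiplications per candidate, so a scheduler can evaluate many candidates online.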
Benefit: Fast Thermal Evaluation
• CFD simulation: give workload P → run CFD simulation (days) → extract temperatures (image courtesy: Flomerics)
• Fast evaluation: give workload P → compute the vector Tin = Tsup + D × P (seconds) → yields temperatures
Data Center Step 4: Explore Solutions
• Homogeneous data center with a single task
  • Naïve algorithms that do not consider cross-interference
  • Thermal-aware algorithm based on interference characterization
• Heterogeneous data center with heterogeneous tasks
• Multiple tasks with different timing information
Recirculation Coefficients
• Consistent with data center observations
• Large values are observed along the diagonal
• Strong recirculation among neighboring servers, or between bottom servers and top servers
[Figure: recirculation coefficient matrix; axes: sources and victims (server index)]
Fast Thermal Evaluation Results
• Thermal evaluation
  • Fast thermal evaluation
  • Acceptable prediction error, smaller than normal temperature fluctuation
• Energy efficiency
  • Consistently provides optimal or near-optimal energy efficiency
  • Energy savings of 5% to 30%, depending on utilization rate
Heterogeneous Data Center with Heterogeneous Tasks
• Change of solution: vector to matrix
• Change of constraints
[Figure: data center with 4 servers and tasks {35, 30}, each task split across the servers]
Multiple Tasks with Different Timing Parameters
• Change of objective function
• Change of constraints
[Figure: data center with 4 servers and tasks {35, 30}]
Conclusion and Future Work
• Increasingly tightly coupled cyber-physical systems require new methodologies for new applications
• Proposed approach
  • Characterizing complicated interference between systems and the embedded environment
  • Minimizing thermal effects
  • Real-time online decisions
• Future work in biosensor networks
  • Thermal-aware scheduling for multiple clusters
  • Cross-cluster interference
  • Applying interference minimization to coverage and topology applications
Conclusion and Future Work (cont.)
• Future work in data center management
  • Overall data center operation cost
    • Trade-off between cooling cost and computing cost
    • Hardware reliability model; trade-off between energy cost and hardware cost
  • Multiple tasks with different priorities and deadlines
    • Estimation of execution time
• Other challenges in environmentally coupled cyber-physical systems
  • Online characterization
    • Without interrupting normal operation
    • For cases where it is impossible to conduct tests and verification
    • Unknown environment
  • Investigating whether the methodology applies to non-thermal interference
    • Chemical sensors that monitor enzyme reactions
    • Minimizing the chance of being detected in a hostile environment
  • Different approaches to modeling interference
    • For cases where interference cannot be measured directly
Experience Obtained
• Cross-disciplinary research problems
  • Challenging and promising
• Incremental research approach
  • Extensive survey to identify existing problems and gaps between existing solutions
  • Start with a simplified system model, then gradually relax the assumptions to obtain a more realistic one
[Figure: research workflow stages: problem investigation, identify interference source and impact, modeling interference, characterizing interference, formalizing problem, explore solutions, verify solution, performance comparison, relax assumption]
System Model
• Interference causes undesired temperature rise
• Heat exchange
• System performance depends on the thermal distribution
Characterizing the Interference Function F()
• Characterizing the interference applied to neighboring nodes and the environment
• Building a heat model to characterize
  • Power consumption of each node
  • Heat dissipation of each node
  • Thermal interference to other nodes
• Conducting fast thermal evaluation
  • Replacing traditional numerical methods to predict thermal performance in real time
[Figure: a task scheduling result feeds either numerical simulation or fast thermal evaluation to produce a temperature prediction]
Application Background
• What are data centers?
  • Server farms, IT centers, computer rooms
• Why are they important?
  • Centralized management, powerful computation capabilities
  • Backbones of Internet infrastructure
• Why is thermal management important?
  • Improves reliability
  • Reduces system downtime
  • Saves energy cost
    • $400,000 annually to power a data center of 1,000 volume server units; then how much for this one?
    • More than 40% is cooling cost
Data Center Step 2: Characterizing Cross Interference F()
• The heat in outlet air: some recirculates to other inlets, some returns to the AC
• The heat in inlet air consists of cold supply air and recirculated heat
• Recirculation coefficients
  • Quantified description of recirculation
• Characterizing process
  • Running CFD with various power consumption scenarios
  • Calculating recirculation coefficients based on the laws of conservation of matter and energy
  • Using the coefficients to predict temperatures without running CFD
Data Center Step 2: Fast Thermal Evaluation
• Based on the law of conservation of energy, after some mathematical derivation, the inlet temperatures are expressed in terms of:
  • Supplied cold air temperature
  • Power consumption
  • Recirculation coefficients
  • Constants that depend on hardware specifications and constant properties of air
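One way to carry out such a derivation is sketched below. It is a reconstruction under assumed notation, not a copy of the slide: K = diag(K_1, …, K_N) with K_i = ρ f_i c_p (air density, airflow rate of node i, specific heat of air), A = [α_ij] where α_ij is the fraction of node i's outlet heat that recirculates to node j's inlet, and T_sup is the vector whose entries all equal the supply temperature. The second line also uses conservation of air mass: whatever inlet airflow is not recirculated comes from the cold supply.

```latex
\begin{align}
K\,T_{out} &= K\,T_{in} + P
  && \text{(each node adds its power to the air passing through it)} \\
K\,T_{in}  &= A^{\top} K\,T_{out} + (K - A^{\top} K)\,T_{sup}
  && \text{(inlet air = recirculated air + supplied air)} \\
(K - A^{\top} K)\,T_{out} &= (K - A^{\top} K)\,T_{sup} + P
  && \text{(substitute the second line into the first)} \\
T_{out} &= T_{sup} + (K - A^{\top} K)^{-1} P \\
T_{in}  &= T_{out} - K^{-1} P
         = T_{sup} + \underbrace{\bigl[(K - A^{\top} K)^{-1} - K^{-1}\bigr]}_{D}\, P
\end{align}
```

Read this way, D collects the recirculation coefficients and the air and hardware constants, so the inlet temperatures depend linearly on the power vector.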
Data Center Step 3: Formalizing the Minimization Problem H()
• Minimizing the maximal inlet temperature
• Can be converted into linear or non-linear optimization problems
[Figure: power profiling G(*), fast thermal evaluation F(*), problem formalization H(*)]
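As an illustration of the conversion, the min-max objective can be written in epigraph form with one auxiliary variable t that bounds every inlet temperature. The formulation below is a sketch under assumptions: d_ij denotes the entries of the heat distribution matrix D, and the power profile is taken to be affine, P_j = a_j + b_j C_j, with hypothetical constants a_j (idle power) and b_j (power per unit of task).

```latex
\begin{align}
\min_{C_1,\dots,C_N,\; t} \quad & t \\
\text{subject to} \quad
  & T_{sup} + \sum_{j=1}^{N} d_{ij}\,\bigl(a_j + b_j C_j\bigr) \le t,
    && i = 1,\dots,N \\
  & \sum_{j=1}^{N} C_j = C_{total}, \\
  & C_j \ge 0, && j = 1,\dots,N
\end{align}
```

With an affine power profile every constraint is linear in (C, t), giving a linear program. A non-linear profile, or a requirement that the C_j be whole tasks, leads to the non-linear or integer variants.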
Airflow Inside Data Centers
• Observation
  • Airflow patterns are stable (confirmed through CFD simulations)
• Hypothesis
  • The amount of recirculated heat is stable and can be quantified as recirculation coefficients
  • Define αij as the percentage of recirculated heat from node i to node j
(Image courtesy: Flomerics)