Xi He Golisano College of Computing and Information Sciences Rochester Institute of Technology

THERMAL-AWARE RESOURCE MANAGEMENT FRAMEWORK Rochester, NY 14623 xi.he@mail.rit.edu

Outline • Introduction • Motivation • Thermal-aware Resource Management Framework • Motivational Examples • System Model and Problem Definition • Thermal-aware Task Scheduling Algorithm • Conclusion

Introduction Distributed Collaborative Experiment

U.S. data centers consumed 61 billion kilowatt-hours of power in 2006, 1.5 percent of all US electricity use, costing around $4.5 billion. Energy usage doubled between 2000 and 2006 and is projected to double again by 2011 [1]. [1] http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf Introduction

Introduction • Middleware level: job scheduling • Virtualization software level: virtual machine scheduling • Hardware level: dynamic voltage scaling, dynamic frequency scaling • Data center level: cooling system

Motivation • Why a thermal-aware resource management framework? • To let end users collaborate easily with each other and access remote resources. • To implement green computing. • To monitor temperatures in the data center.

Architecture Overview

Different types of task-temperature profiles Motivational Examples

Task-temperature profile (Buffalo Data Center) Motivational Examples

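Motivational Examples
job1 = (0, 2, 20, f(job1)), job2 = (0, 1, 40, f(job2))
Initial temperatures: node1 = 40C, node2 = 32C, node3 = 34C, node4 = 32C
Schedule 1: job1 on node2, job1 on node4, job2 on node3. Result: node1 = 40C, node2 = 40C, node3 = 40C, node4 = 40C; max = 40C, σ = 0
Schedule 2: job1 on node1, job1 on node2, job2 on node3. Result: node1 = 48C, node2 = 40C, node3 = 40C, node4 = 32C; max = 48C, σ = 5.6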
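The arithmetic behind this example can be checked with a short script. This is only a sketch: the per-job temperature rises (8C for job1 on each of its nodes, 6C for job2) are not stated on the slide and are inferred here from the start and end temperatures, and the names `apply_schedule` and `summarize` are hypothetical helpers.

```python
from statistics import pstdev

# Initial node temperatures in degrees Celsius, as listed on the slide.
initial = {"node1": 40, "node2": 32, "node3": 34, "node4": 32}

# ASSUMPTION: per-node temperature rise of each job, inferred from the
# slide's before/after values rather than stated in the source.
rise = {"job1": 8, "job2": 6}


def apply_schedule(temps, placements):
    """Return node temperatures after applying (job, node) placements."""
    result = dict(temps)
    for job, node in placements:
        result[node] += rise[job]
    return result


def summarize(temps):
    """Return (maximum temperature, population standard deviation)."""
    values = list(temps.values())
    return max(values), pstdev(values)


# The placement that ends at 40C on every node.
schedule_balanced = [("job1", "node2"), ("job1", "node4"), ("job2", "node3")]
# The placement that peaks at 48C on node1.
schedule_unbalanced = [("job1", "node1"), ("job1", "node2"), ("job2", "node3")]

max_a, sigma_a = summarize(apply_schedule(initial, schedule_balanced))
max_b, sigma_b = summarize(apply_schedule(initial, schedule_unbalanced))
print(max_a, round(sigma_a, 2))
print(max_b, round(sigma_b, 2))
```

Under these assumptions the second placement gives a standard deviation of the square root of 32, about 5.66, which agrees with the 5.6 shown on the slide.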

Where nodei indicates the ith node in the data center. Each node has a temperature-time profile that gives the node’s temperature over time. System Model

job = (tstart, nodenum, texe, ftemp(t)), where tstart indicates the starting time of the job; the job needs nodenum processors and lasts texe; ftemp(t) is the temperature rise caused by executing the job, as a function of its execution time. System Model
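One possible way to represent this job model in code is sketched below. The field names mirror the symbols on the slide; the linear shape of the temperature function is purely an illustrative assumption, since the slides derive the real profiles from measurements, and `linear_rise` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    t_start: float                    # starting time of the job
    node_num: int                     # number of processors the job needs
    t_exe: float                      # execution time of the job
    f_temp: Callable[[float], float]  # temperature rise after t time units


def linear_rise(total_rise: float, t_exe: float) -> Callable[[float], float]:
    """ASSUMPTION: temperature rises linearly and reaches total_rise at t_exe."""
    def f(t: float) -> float:
        # Clamp t to the job's execution window.
        t = min(max(t, 0.0), t_exe)
        return total_rise * t / t_exe
    return f


# job1 = (0, 2, 20, f(job1)); the 8-degree total rise is an assumed value.
job1 = Job(t_start=0, node_num=2, t_exe=20, f_temp=linear_rise(8.0, 20))
print(job1.f_temp(10))
```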

Given a set of jobs, find an optimal schedule that assigns each job to nodes so as to minimize the temperature deviation across computing nodes. Where ΔTemp is the temperature increase that jobk causes. Problem Definition

We use standard deviation as the metric for measuring the temperature distribution. Problem Definition
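For reference, the metric can be written out as follows. The symbols T_i (temperature of node i) and n (number of nodes) are introduced here for illustration, and the population form (dividing by n) is an assumption, chosen because it is consistent with the value σ = 5.6 in the motivational example.

```latex
\bar{T} = \frac{1}{n} \sum_{i=1}^{n} T_i,
\qquad
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( T_i - \bar{T} \right)^2}
```

For node temperatures 48, 40, 40 and 32, the mean is 40 and σ = sqrt((64 + 0 + 0 + 64) / 4) = sqrt(32), about 5.66.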

Algorithm

Algorithm • Select the node which has the lowest “current” temperature. • Sort jobs in descending order of the temperature rise they cause. • For each job • Assign the job to the selected node. • Update the node’s temperature-time profile. • Select the node which has the lowest “current” temperature. • End For • If a node’s temperature exceeds the threshold, don’t choose it in the next round and let it cool down.
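The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the author's implementation: each job is assumed to occupy a single node and to add a fixed temperature rise, cooling over time is not modeled, and the function name `schedule` and the threshold value are hypothetical.

```python
def schedule(jobs, node_temps, threshold):
    """Greedy sketch: hottest job first, onto the coolest eligible node.

    jobs:       dict mapping job name -> temperature rise (assumed scalar)
    node_temps: dict mapping node name -> current temperature
    threshold:  nodes hotter than this are skipped so they can cool down
    Returns (list of (job, node) assignments, final node temperatures).
    """
    temps = dict(node_temps)
    assignments = []
    # Sort jobs in descending order of the temperature rise they cause.
    ordered = sorted(jobs.items(), key=lambda item: item[1], reverse=True)
    for name, rise in ordered:
        # Nodes above the threshold are not chosen in this round.
        eligible = [n for n in temps if temps[n] <= threshold]
        if not eligible:
            break  # every node needs to cool down; remaining jobs wait
        # Select the node with the lowest current temperature.
        coolest = min(eligible, key=lambda n: temps[n])
        assignments.append((name, coolest))
        # Update the node's temperature (stand-in for its full profile).
        temps[coolest] += rise
    return assignments, temps


# Values from the motivational example; job1's two processors are modeled
# as two single-node tasks, and the rises (8 and 6) are inferred values.
jobs = {"job1a": 8, "job1b": 8, "job2": 6}
nodes = {"node1": 40, "node2": 32, "node3": 34, "node4": 32}
assignments, final = schedule(jobs, nodes, threshold=45)
print(assignments)
print(final)
```

With these inputs the sketch places the two job1 tasks on node2 and node4 and job2 on node3, leaving every node at 40C, which matches the balanced placement in the motivational example.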

Experiment Task temperature profile (axes: temperature vs. execution time in seconds)

Experiment iCore7 cooling profile (axes: temperature vs. time in seconds)

Result N indicates the number of job groups; M indicates the number of jobs in each group.

Related Work • In [1], [2], power reduction is achieved by power-aware task scheduling on DVS-enabled commodity systems, which can adjust the supply voltage and support multiple operating points. • [1] K. H. Kim, R. Buyya, and J. Kim, “Power aware scheduling of bag-of-tasks applications with deadline constraints on DVS-enabled clusters,” in CCGRID, 2007, pp. 541–548. • [2] R. Ge, X. Feng, and K. W. Cameron, “Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters,” in SC, 2005, p. 34.

Related Work • In [3], [4], a thermodynamic formulation of steady-state hot spots and cold spots in data centers is examined; based on this formulation, several task scheduling algorithms are presented to reduce cooling energy consumption. • [3] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Thermal-aware task scheduling for data centers through minimizing heat recirculation,” in CLUSTER, 2007, pp. 129–138. • [4] J. D. Moore, J. S. Chase, P. Ranganathan, and R. K. Sharma, “Making scheduling ‘cool’: Temperature-aware workload placement in data centers,” in USENIX Annual Technical Conference, General Track, 2005, pp. 61–75.

CONCLUSION My accomplishments in this research: • Literature review of grid computing and cloud computing • An analytical study of Buffalo data center operation • Literature review of scheduling algorithms

Conclusion • A novel framework to solve the resource management problem. • A thermal-aware task scheduling algorithm for data centers, which can reduce cooling energy cost. • Future work • Investigate other thermal characteristics of data centers. • Continue the development of the thermal-aware resource management framework.

PUBLICATION • G. von Laszewski, F. Wang, A. Younge, X. He, Z. Guo, and M. Pierce, “Cyberaide javascript: A javascript commodity grid kit,” in GCE08 at SC’08. Austin, TX: IEEE, Nov. 16, 2008. [Online]. Available: http://cyberaide.googlecode.com/svn/trunk/papers/08-javascript/vonLaszewski-08-javascript.pdf • G. von Laszewski, A. Younge, X. He, K. Mahinthakumar, and L. Wang, “Experiment and workflow management using cyberaide shell,” in 4th International Workshop on Workflow Systems in e-Science (WSES 09) in conjunction with 9th IEEE International Symposium on Cluster Computing and the Grid. IEEE, 2009.

Appendix
