The past, present, and future of Green Computing • Kirk W. Cameron, SCAPE Laboratory, Virginia Tech
Enough About Me • Associate Professor, Virginia Tech • Co-founder, Green500 • Co-founder, MiserWare • Founding Member, SpecPower • Consultant for EPA Energy Star for Servers • IEEE Computer “Green IT” Columnist • Over $4M in federally funded “Green” research • SystemG Supercomputer
What is SCAPE? • Scalable Performance Laboratory • Founded 2001 by Cameron • Vision • Improve efficiency of high-end systems • Approach • Exploit/create technologies for high-end systems • Conduct quality research to solve important problems • When appropriate, commercialize technologies • Educate and train the next generation of HPC CS researchers
The Big Picture (Today) • Past: Challenges • Need to measure and correlate power data • Save energy while maintaining performance • Present • Software/hardware infrastructure for power measurement • Intelligent Power Management (CPU Miser, Memory Miser) • Integration with other toolkits (PAPI, Prophesy) • Future: Research + Commercialization • Management Infrastructure for Energy Reduction • MiserWare, Inc. • Holistic Power Management
Prehistory 1882 - 2001 • Embedded systems • General Purpose Microarchitecture • Circa 1999, power becomes a disruptive technology • Moore’s Law + Clock Frequency Arms Race • Simulators emerge (e.g. Princeton’s Wattch) • Related work continues today (CMPs, SMT, etc.)
Server Power 2002 • IBM Austin • Energy-aware commercial servers [Keller et al] • LANL • Green Destiny [Feng et al] • Observations • IBM targets commercial apps • Feng et al achieve power savings in exchange for performance loss
HPC Power 2002 • System power and annual electricity cost (at roughly $800,000 per megawatt-year): TM CM-5 0.005 MW ($4,000/yr); residential A/C 0.015 MW ($12,000/yr); Intel ASCI Red 0.850 MW ($680,000/yr); high-speed train 10 MW ($8 million/yr); Earth Simulator 12 MW ($9.6 million/yr); for scale, a conventional power plant generates about 300 MW • My observations • Power will become disruptive to HPC • Laptops outselling PCs • Commercial power-aware approaches not appropriate for HPC
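The per-system dollar figures above all follow from the single rate quoted on the slide, about $800,000 per megawatt-year:

\[ \text{annual cost} \approx P_{\text{MW}} \times \$800{,}000/\text{MW-yr}, \qquad \text{e.g. } 12\ \text{MW} \times \$800{,}000 \approx \$9.6\text{M/yr (Earth Simulator)}. \]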
HPPAC Emerges 2002 • SCAPE Project • High-performance, power-aware computing • Two initial goals • Measurement tools • Power/energy savings • Big Goals…no funding (risk all startup funds)
Cluster Power 2003 - 2004 • IBM Austin • On evaluating request-distribution schemes for saving energy in server clusters, ISPASS ’03 [Lefurgy et al] • Improving Server Performance on Transaction Processing Workloads by Enhanced Data Placement, SBAC-PAD ’04 [Rubio et al] • Rutgers • Energy conservation techniques for disk array-based servers, ICS ’04 [Bianchini et al] • SCAPE • High-performance, power-aware computing, SC04 • Power measurement + power/energy savings
PowerPack Measurement 2003 - 2004 • Scalable, synchronized, and accurate • Hardware power/energy profiling: a Baytech management unit and Baytech power strips meter AC power from the outlet, while multi-meters log DC power from each node’s power supply to a single-node data log • Software data collection: per-meter measurement (MM) threads and a multi-meter control thread feed power/energy profiling data into a data repository for analysis • Software power/energy control: PowerPack libraries (profile/control) with per-node DVS threads and a DVS control thread drive applications and microbenchmarks on the high-performance, power-aware cluster (a rough data-collection sketch follows)
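A minimal sketch of the per-meter data-collection loop implied by the diagram above, offered for illustration only; read_meter_watts() is a hypothetical stand-in for the real multi-meter interface, and the actual PowerPack code differs.

/* Illustrative per-meter polling thread (not the actual PowerPack code). */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

extern double read_meter_watts(int meter_id);   /* hypothetical meter read */

struct meter_args {
    int meter_id;                 /* which multi-meter this thread samples */
    FILE *log;                    /* shared data log */
    pthread_mutex_t *log_lock;    /* serializes writes to the log */
    volatile int *stop;           /* set by the control thread to stop sampling */
};

static void *meter_poll(void *p)
{
    struct meter_args *a = p;
    while (!*a->stop) {
        double watts = read_meter_watts(a->meter_id);
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);     /* timestamp for correlation */
        pthread_mutex_lock(a->log_lock);
        fprintf(a->log, "%ld.%09ld meter=%d watts=%.2f\n",
                (long)ts.tv_sec, ts.tv_nsec, a->meter_id, watts);
        pthread_mutex_unlock(a->log_lock);
        usleep(10 * 1000);                      /* ~100 samples per second */
    }
    return NULL;
}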
PowerPack Framework (DC Power Profiling) • Instrumentation pattern (a C-style usage sketch follows):

if (node .eq. root) then
    call pmeter_init(xmhost, xmport)
    call pmeter_log(pmlog, NEW_LOG)
endif

<CODE SEGMENT>

if (node .eq. root) then
    call pmeter_start_session(pm_label)
endif

<CODE SEGMENT>

if (node .eq. root) then
    call pmeter_pause()
    call pmeter_log(pmlog, CLOSE_LOG)
    call pmeter_finalize()
endif

Multi-meters + 32-node Beowulf
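The same pattern from a C/MPI code might look roughly like the sketch below; the C prototypes, constants, host name, port, and file names are assumptions made for illustration, mirroring the Fortran calls above rather than documenting an actual C API.

/* Hypothetical C bindings mirroring the Fortran pmeter_* calls above;
 * prototypes, constants, and arguments are illustrative assumptions. */
#include <mpi.h>

void pmeter_init(const char *meter_host, int meter_port);
void pmeter_log(const char *logfile, int mode);
void pmeter_start_session(const char *label);
void pmeter_pause(void);
void pmeter_finalize(void);

enum { NEW_LOG = 0, CLOSE_LOG = 1 };          /* placeholder log modes */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                          /* only the root node drives the meters */
        pmeter_init("meter-host", 5000);      /* hypothetical host and port */
        pmeter_log("power.log", NEW_LOG);
        pmeter_start_session("code_segment");
    }

    /* <CODE SEGMENT> to be profiled, e.g. one FFT iteration */

    if (rank == 0) {
        pmeter_pause();
        pmeter_log("power.log", CLOSE_LOG);
        pmeter_finalize();
    }

    MPI_Finalize();
    return 0;
}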
Power Profiles – Single Node • The CPU is typically the largest consumer of power (under load)
Power Profiles – Single Node • Power consumption for various workloads: CPU-bound, memory-bound, network-bound, disk-bound
NAS PB FT – Performance Profiling • Each iteration alternates compute phases with reduce (comm) and all-to-all (comm) phases • About 50% of time is spent in communication
One FFT Iteration
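A schematic of the compute/communicate structure of one FT-style iteration as profiled above; this is not the NAS FT source, and local_fft() is a hypothetical placeholder for the local compute kernel.

/* Schematic of one FT-style iteration (not the actual NAS FT code):
 * local compute separated by an all-to-all transpose, plus a reduction. */
#include <mpi.h>
#include <complex.h>

void local_fft(double complex *data, int n);    /* hypothetical compute kernel */

void ft_iteration(double complex *data, double complex *tmp, int n, MPI_Comm comm)
{
    local_fft(data, n);                                    /* compute */
    MPI_Alltoall(data, n, MPI_C_DOUBLE_COMPLEX,            /* n elements per rank */
                 tmp,  n, MPI_C_DOUBLE_COMPLEX, comm);     /* all-to-all (comm) */
    local_fft(tmp, n);                                     /* compute */

    double local_sum = creal(tmp[0]), checksum;
    MPI_Reduce(&local_sum, &checksum, 1, MPI_DOUBLE,
               MPI_SUM, 0, comm);                          /* reduce (comm) */
}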
Intuition confirmed 2005 - Present
HPPAC Tool Progress 2005 - Present • PowerPack • Modularized PowerPack and SysteMISER • Extended analytics for applicability • Extended to support thermals • SysteMISER • Improved analytics to weigh tradeoffs at runtime • Automated cluster-wide, DVS scheduling • Support for automated power-aware memory
Predicting CPU Power 2005 - Present
Predicting Memory Power 2005 - Present
Correlating Thermals BT 2005 - Present
Correlating Thermals MG 2005 - Present
Tempest Results FT 2005 - Present
SysteMISER 2005 - Present • Our software approach to reducing energy • Management Infrastructure for Energy Reduction • Power/performance measurement, prediction, and control • (Image: The Heat Miser)
Power-aware DVS scheduling strategies 2005 - Present • NEMO & PowerPack framework for saving energy • CPUSPEED daemon:
    [example]$ start_cpuspeed
    [example]$ mpirun -np 16 ft.B.16
• Internal scheduling (a sysfs-based setspeed() sketch follows):
    MPI_Init();
    <CODE SEGMENT>
    setspeed(600);
    <CODE SEGMENT>
    setspeed(1400);
    <CODE SEGMENT>
    MPI_Finalize();
• External scheduling:
    [example]$ psetcpuspeed 600
    [example]$ mpirun -np 16 ft.B.16
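As an illustration of what an internal setspeed() call might do on Linux, the sketch below writes a target frequency through the cpufreq sysfs interface; it assumes the "userspace" governor and root privileges, and it is not the actual NEMO/PowerPack implementation.

/* Illustrative setspeed() via the Linux cpufreq sysfs interface; assumes
 * the "userspace" governor is active and the caller has root privileges. */
#include <stdio.h>

/* Request a frequency in kHz for one core, e.g. set_speed_khz(0, 600000)
 * for 600 MHz. Returns 0 on success, -1 on failure. */
static int set_speed_khz(int cpu, long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;               /* no permission, or governor is not "userspace" */
    fprintf(f, "%ld\n", khz);
    return fclose(f) == 0 ? 0 : -1;
}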
CPU MISER Scheduling (FT) 2005 - Present • Chart: normalized energy and delay with CPU MISER for FT.C.8, comparing fixed frequencies (600, 800, 1000, 1200, 1400), auto, and CPU MISER • 36% energy savings, less than 1% performance loss • See SC2004, SC2005 publications
Where else can we save energy? 2005 - Present • Processor – DVS • Where everyone starts. • NIC • Very small portion of system power • Disk • A good choice (our future work) • Power supply • A very good choice (for an EE or ME) • Memory • Only 20-30% of system power, but…
The Power of Memory 2005 - Present
Memory Management Policies 2005 - Present • Policies compared: Default, Static, Dynamic • Memory MISER = Page Allocation Shaping + Allocation Prediction + Dynamic Control
Memory MISER Evaluation of Prediction and Control 2005 - Present Prediction/control looks good, but are we guaranteeing performance?
Memory MISER Evaluation of Prediction and Control 2005 - Present Stable, accurate prediction using a PID controller (a generic PID sketch follows). But what about big (capacity) spikes?
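A generic PID update of the kind referenced above, steering provisioned memory capacity toward measured demand; the structure, gains, and units are illustrative only and are not Memory MISER's actual controller.

/* Generic PID controller sketch: steer provisioned memory capacity
 * toward measured demand. Gains and units are illustrative only. */
typedef struct {
    double kp, ki, kd;       /* proportional, integral, derivative gains */
    double integral;         /* accumulated error */
    double prev_error;       /* error at the previous sample */
} mem_pid;

/* One control step: returns the new capacity target (e.g. in pages). */
static double pid_step(mem_pid *c, double demand, double capacity, double dt)
{
    double error = demand - capacity;               /* shortfall (+) or surplus (-) */
    c->integral += error * dt;
    double derivative = (error - c->prev_error) / dt;
    c->prev_error = error;
    return capacity + c->kp * error + c->ki * c->integral + c->kd * derivative;
}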
Memory MISER Evaluation of Prediction and Control 2005 - Present Memory MISER guarantees performance in “worst” conditions.
Memory MISER Evaluation Energy Reduction 2005 - Present 30% total system energy savings, less than 1% performance loss
SystemG Stats • 325 Mac Pro computer nodes, each with two 4-core 2.8 GHz Intel Xeon processors • Each node has eight gigabytes (GB) of random access memory (RAM); each core has 6 MB cache • Mellanox 40 Gb/s end-to-end InfiniBand adapters and switches • LINPACK result: 22.8 TFLOPS (trillion floating-point operations per second) • Over 10,000 power and thermal sensors • Variable power modes: DVFS control (2.4 and 2.8 GHz), fan-speed control, concurrency throttling, etc. (check /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies; see the sketch below) • Intelligent Power Distribution Unit: Dominion PX (remotely controls the servers and network devices; also monitors current, voltage, power, and temperature through Raritan’s KVM switches and secure console servers)
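A small sketch of checking the available DVFS frequencies through the sysfs file noted above; the exact path and presence of this file can vary with the kernel and cpufreq driver.

/* List the DVFS frequencies advertised for core 0 via sysfs; the path
 * follows the note above but may vary by kernel and cpufreq driver. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/"
                    "scaling_available_frequencies", "r");
    if (!f) {
        perror("scaling_available_frequencies");
        return 1;
    }
    long khz;
    while (fscanf(f, "%ld", &khz) == 1)     /* values are reported in kHz */
        printf("%.1f GHz\n", khz / 1e6);
    fclose(f);
    return 0;
}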
Deployment Details • 13 racks total, 24 nodes per rack and 8 nodes per layer • 5 PDUs per rack (Raritan PDU model DPCS12-20); each PDU in SystemG has a unique IP address, and users can use IPMI to access and retrieve information from the PDUs and also control them, e.g., remotely shutting down and restarting machines, recording system AC power, etc. • Two types of switch: 1) Ethernet switch: 1 Gb/sec; 36 nodes share one Ethernet switch. 2) InfiniBand switch: 40 Gb/sec; 24 nodes (one rack) share one IB switch.
Data Collection System and LabVIEW • Sample diagram and corresponding front panel from LabVIEW
Published Papers and Useful Links • Papers: 1. Rong Ge, Xizhou Feng, Shuaiwen Song, Hung-Ching Chang, Dong Li, Kirk W. Cameron, “PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications,” IEEE Transactions on Parallel and Distributed Systems, Apr. 2009. 2. Shuaiwen Song, Rong Ge, Xizhou Feng, Kirk W. Cameron, “Energy Profiling and Analysis of the HPC Challenge Benchmarks,” The International Journal of High Performance Computing Applications, Vol. 23, No. 3, pp. 265-276, 2009. • NI system setup details: http://sine.ni.com/nips/cds/view/p/lang/en/nid/202545 and http://sine.ni.com/nips/cds/view/p/lang/en/nid/202571
The future… Present - 2012 • PowerPack • Streaming sensor data from any source • PAPI Integration • Correlated to various systems and applications • Prophesy Integration • Analytics to provide unified interface • SysteMISER • Study effects of power-aware disks and NICs • Study effects of emergent architectures (CMT, SMT, etc) • Coschedule power modes for energy savings
Outreach • See http://green500.org • See http://thegreengrid.org • See http://www.spec.org/specpower/ • See http://hppac.cs.vt.edu
Acknowledgements • My SCAPE Team • Dr. Xizhou Feng (PhD 2006) • Dr. Rong Ge (PhD 2008) • Dr. Matt Tolentino (PhD 2009) • Mr. Dong Li (PhD Student, exp 2010) • Mr. Song Shuaiwen (PhD Student, exp 2010) • Mr. Chun-Yi Su, Mr. Hung-Ching Chang • Funding Sources • National Science Foundation (CISE: CCF, CNS) • Department of Energy (SC) • Intel
Thank you very much. http://scape.cs.vt.edu cameron@cs.vt.edu Thanks to our sponsors: NSF (Career, CCF, CNS), DOE (SC), Intel