A Disk and Thermal Emulation Model for RAMP Zhangxi Tan and David Patterson
Outline • Introduction and retrospective overview • Improvement since June 06 • Disk and temperature emulation • Future work
June 06 status • Internet in a box, Version 0 • 3 Xilinx XUP boards ($299 × 3) with 12 processors • uClinux and a research application (i3) • Limitations • Software base is poor • No MMU, no fork, no full version of Linux • All software needs porting • Processor is too slow (100 MHz vs 3 GHz) • No local storage per node
Agenda • Introduction and retrospective overview • Improvement since June 06 • Disk and temperature emulation • Future work
Disk and Thermal Emulation • Local disk is an essential part of a datacenter • Local physical storage • Variable disk specifications (VMs only provide a functional model) • In the context of a real workload • Temperature is a critical issue in the datacenter (DC) • Cooling, reliability • How the workload affects temperature in the datacenter is an interesting topic
Methodology • HW emulator (FPGA): 32-bit Leon3 at 50 MHz, with 90 MHz DDR memory and 8K L1 cache (4K instruction and 4K data) • Target system: Linux 2.6 kernel, 50 MHz to 2 GHz • PC: storage, trace logger and model solver (offline or online) • Emulating an IDE disk with Ethernet-based network storage (ATA over Ethernet) + DiskSim • AoE: encapsulates IDE commands in Ethernet packets • DiskSim: widely used disk simulator (provides access timing based on the disk specification) • Thermal emulation is done by the Mercury suite (ASPLOS ’06) • Sample CPU/disk activity periodically and send it to a central emulator • Emulator takes the system configuration and predicts temperature based on Newton’s law of cooling • Disk state helps power estimation • Time dilation makes the “target” look faster • Reprogram the HW timer to make ‘jiffies’ longer in terms of wall clock • Slow down memory accordingly when speeding up the processor
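The time dilation step above can be illustrated with a small calculation. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the tick rate of 100 jiffies per second is an assumed kernel setting that the slides do not state. It shows how a time dilation factor (TDF) follows from the host and target clock rates, and how long one jiffy then lasts in wall-clock time.

```python
def time_dilation_factor(host_hz: float, target_hz: float) -> float:
    """TDF: how many times faster the target appears than the host FPGA core."""
    return target_hz / host_hz


def jiffy_wall_clock_seconds(tdf: float, ticks_per_second: int = 100) -> float:
    """Wall-clock length of one jiffy after dilation.

    ticks_per_second is an assumed kernel tick rate (hypothetical value).
    One jiffy is 1/ticks_per_second of target time, stretched by the TDF.
    """
    return tdf / ticks_per_second


def timer_reload_cycles(host_hz: float, tdf: float, ticks_per_second: int = 100) -> int:
    """Host clock cycles to count between timer interrupts."""
    return round(host_hz * tdf / ticks_per_second)


# 50 MHz host emulating a 2 GHz target.
tdf = time_dilation_factor(50e6, 2e9)
print(tdf, jiffy_wall_clock_seconds(tdf), timer_reload_cycles(50e6, tdf))
```

With these assumptions the timer counts as many host cycles per jiffy as the target processor would execute in one undilated jiffy, which is why the software on the target perceives the faster clock.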
Experiments • Thermal emulation model (validated in Mercury) • Physical layout from Dell PowerEdge 2850 • 3 GHz Xeon, 10K RPM SCSI • Emulated disk model (validated disk model in DiskSim) • Seagate Cheetah 9LP • 10K RPM, 5 ms avg seek time • Several programs run on the target system with various time dilation factors • Dhrystone: CPU-intensive benchmark • Postmark: a file system benchmark (disk intensive) • Unix command with a pipe (both disk and CPU intensive) • cat alargefile | grep 'a search pattern' > searchresultfile • 100 MB file size • Emulation output • Performance statistics • System temperature
Dhrystone result (w/o memory TD) • How close to a 3 GHz x86 (~8000 Dhrystone MIPS)? • Memory, cache, CPI
Dhrystone w. memory TD • Keep the memory access latency constant • 90 MHz DDR DRAM w. 200 ns latency for all targets (50 MHz to 2 GHz) • Latency is pessimistic, but reflects the trend
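Holding the memory latency constant in target time means the same 200 ns costs more processor cycles as the target clock rises. The sketch below only works through that arithmetic for the target frequencies shown in the slides; the function name is hypothetical.

```python
def memory_latency_cycles(latency_ns: int, target_hz: int) -> int:
    """Target processor cycles spent waiting on one memory access."""
    # Integer arithmetic avoids floating-point rounding: ns * Hz / 1e9.
    return latency_ns * target_hz // 1_000_000_000


# Target clocks used in the experiments, in Hz.
targets = [50_000_000, 250_000_000, 500_000_000, 1_000_000_000, 2_000_000_000]
for hz in targets:
    print(hz // 1_000_000, "MHz:", memory_latency_cycles(200, hz), "cycles")
```

A 40x faster target therefore sees a 40x larger miss penalty in cycles, which is why Dhrystone scores scale less than linearly once memory is dilated.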
Postmark file system benchmark • Speed-up factor is larger than the TDF (overhead) • How close to a modern SATA disk? About twice the throughput when running the same benchmark.
Disk emulation performance • Overhead analysis • <1.4 ms to send a packet (no zero-copy, VM) • Bursts of requests (service time < 10 ms, including DiskSim), AoE protocol segmentation • A larger TDF offsets the overhead • Overall emulated disk time is still slightly longer than the simulated timing in DiskSim (~2.8 ms)
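The claim that a larger TDF offsets the overhead can be made concrete with a small calculation. This sketch assumes the overhead figures above are measured in wall-clock time, which the slides do not state explicitly; the function names and the 5 ms modeled access time are illustrative only.

```python
def wall_to_target_ms(wall_clock_ms: float, tdf: float) -> float:
    """Wall-clock delay as perceived by the time-dilated target."""
    return wall_clock_ms / tdf


def min_tdf_to_hide(service_wall_ms: float, modeled_disk_ms: float) -> float:
    """Smallest TDF at which the emulator's service time fits inside
    the access time that the disk model wants the target to observe."""
    return service_wall_ms / modeled_disk_ms


# A 1.4 ms packet-send overhead seen by a target dilated 40x.
print(wall_to_target_ms(1.4, 40))
# A 10 ms worst-case service time against a hypothetical 5 ms modeled access.
print(min_tdf_to_hide(10.0, 5.0))
```

Below that minimum TDF the software emulator cannot answer in time and the target sees a slower disk than the model specifies.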
Emulated disk R/W time in target • Results are fairly deterministic across different TDFs
CPU Temperature Emulation • Needs calibration to get the correct absolute value • Trend is accurate • [Figure: CPU temperature panels for 50 MHz, 250 MHz, 500 MHz, 1 GHz and 2 GHz targets]
Disk Temperature Emulation • [Figure: disk temperature panels for 50 MHz, 250 MHz, 500 MHz, 1 GHz and 2 GHz targets]
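The methodology predicts temperature from sampled activity using Newton's law of cooling. The following is a minimal single-component sketch of that idea, not the Mercury solver itself: the linear utilization-to-power model, the function names, and every constant (idle and busy power, heat transfer coefficient, heat capacity) are hypothetical placeholders.

```python
def step_temperature(temp_c, power_w, ambient_c, k_w_per_c, heat_cap_j_per_c, dt_s):
    """One explicit Euler step of dT/dt = (P - k * (T - T_ambient)) / C."""
    heat_out_w = k_w_per_c * (temp_c - ambient_c)  # Newton's law of cooling
    return temp_c + dt_s * (power_w - heat_out_w) / heat_cap_j_per_c


def emulate(utilization_samples, idle_w=10.0, busy_w=50.0, ambient_c=25.0,
            k_w_per_c=2.0, heat_cap_j_per_c=100.0, dt_s=1.0):
    """Turn periodic utilization samples (0.0 to 1.0) into a temperature trace."""
    temp_c = ambient_c
    trace = []
    for u in utilization_samples:
        # Assumed linear power model between idle and fully busy.
        power_w = idle_w + u * (busy_w - idle_w)
        temp_c = step_temperature(temp_c, power_w, ambient_c,
                                  k_w_per_c, heat_cap_j_per_c, dt_s)
        trace.append(temp_c)
    return trace


busy = emulate([1.0] * 2000)
idle = emulate([0.0] * 2000)
print(round(busy[-1], 3), round(idle[-1], 3))
```

The steady state is ambient plus power divided by the heat transfer coefficient, so uncalibrated constants shift the absolute value while leaving the trend intact, consistent with the calibration caveat on the CPU temperature slide.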
Limitations and Conclusion • Limitations • AoE limits each read/write request to at most 2 sectors (Ethernet packet size limitation) • Naïve memory dilation (constant delay) • Conclusion • Doing disk emulation in SW is fairly “lightweight”, if • Time dilation makes the SW disk fast enough • A separate network channel is available for disk emulation • Future work • Better statistical time dilation model (CPI, distribution), still with simple HW • Emulate a real-life disk controller (e.g. Intel ICH) for less overhead
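The two-sector limit listed under Limitations follows from fitting whole 512-byte sectors into a standard Ethernet payload. The sketch below works through that arithmetic; the 22-byte AoE header size (excluding the Ethernet header) is an assumption stated here rather than taken from the slides, and the function names are hypothetical.

```python
SECTOR_BYTES = 512
AOE_HEADER_BYTES = 22  # assumed AoE + ATA header size inside the Ethernet payload


def max_sectors_per_frame(mtu_bytes: int = 1500) -> int:
    """Whole sectors that fit in one Ethernet payload after the AoE header."""
    return (mtu_bytes - AOE_HEADER_BYTES) // SECTOR_BYTES


def frames_for_request(sectors: int, mtu_bytes: int = 1500) -> int:
    """Frames needed to carry a request, rounding up (protocol segmentation)."""
    per_frame = max_sectors_per_frame(mtu_bytes)
    return -(-sectors // per_frame)  # ceiling division


print(max_sectors_per_frame())   # standard 1500-byte MTU
print(frames_for_request(8))     # a 4 KB block
```

Under these assumptions a 4 KB file system block is split into four frames, which is the segmentation cost noted in the overhead analysis; a larger MTU would raise the per-frame sector count.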