Caching for Bursts (C-Burst): Let Hard Disks Sleep Well and Work Energetically Feng Chen and Xiaodong Zhang Dept. of Computer Science and Engineering The Ohio State University
Power Management in Hard Disk • Power management is a requirement in computer system design • Maintenance cost • Reliability & Durability • Environmental effects • Battery life • Hard disk drives are major energy consumers for I/O-intensive jobs • e.g., hard disk drives account for 86% of total energy consumption in EMC Symmetrix 3000 storage systems [EMC99] • As multi-core CPUs become more energy efficient, disks lag further behind
Standard Power Management • Dynamic Power Management (DPM) • When the disk is idle, spin it down to save energy • When a request arrives, spin it up to service the request • Frequently spinning a disk up and down incurs substantial penalties in both latency and energy • Disk energy consumption is therefore highly dependent on the pattern of disk accesses (periodically sleep and work)
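A common way to reason about DPM spin-down decisions is the break-even time: spinning down only pays off if the idle period is long enough to amortize the spin-down/spin-up energy. The sketch below is a minimal illustration of that arithmetic; the power and energy figures are placeholder assumptions, not the parameters of any specific drive in the talk.

```c
#include <stdio.h>

/* Hypothetical disk power parameters (placeholders, not real drive specs). */
struct disk_power_model {
    double idle_watts;        /* power while spinning idle              */
    double standby_watts;     /* power while spun down                  */
    double transition_joules; /* energy for one spin-down plus spin-up  */
    double spinup_seconds;    /* latency penalty on the next request    */
};

/* An idle period shorter than this makes spinning down a net energy loss. */
static double break_even_seconds(const struct disk_power_model *d)
{
    return d->transition_joules / (d->idle_watts - d->standby_watts);
}

int main(void)
{
    struct disk_power_model laptop = { 0.6, 0.15, 5.0, 1.6 };
    printf("break-even idle time: %.1f s\n", break_even_seconds(&laptop));
    return 0;
}
```

With these placeholder numbers the break-even time is roughly 11 seconds, which is why only long, consolidated idle intervals let the disk sleep profitably.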
Ideal Access Patterns for Power Saving • Ideal disk power saving condition • Requests to the disk form a periodic and bursty pattern • The hard disk can sleep well and work energetically • Increasing burstiness of disk accesses is the key to saving disk energy [Figure: disk accesses clustered into bursts over time, separated by long disk sleep intervals]
Buffer Caches Affect Disk Access Patterns • Buffer caches in DRAM are part of the hard disk service path • Disk data are cached in main memory (the buffer cache) • Hits in the buffer cache are fast and avoid disk accesses • The buffer cache is able to filter and change disk access streams [Figure: application requests enter the buffer cache (prefetching and caching), which in turn issues disk accesses to the hard disk]
Existing Solution for Bursts in Disks • Forming bursty disk accesses with prefetching • Predict the data that are likely to be accessed in the future • Preload the to-be-used data into memory • Directly condense disk accesses into a sequence of I/O bursts • Both energy efficiency and performance can be improved • Limitations • Prefetching and caching share the same buffer space • Energy-unaware replacement can easily disrupt the bursty patterns created by prefetching • Aggressive prefetching shrinks the available caching space and demands highly effective caching • An energy-aware caching policy can effectively complement prefetching • No such work has been done before • Reference – Papathanasiou and Scott, USENIX'04
Caching Can Easily Affect Access Patterns • An example: prefetching organizes the original disk accesses into bursts, but an energy-unaware caching policy evicts three of the prefetched blocks, and those blocks are later re-accessed in a non-bursty way, so the disk still cannot sleep well • Solely relying on prefetching is sub-optimal • An energy-aware caching policy is needed to create bursty disk accesses [Figure: original disk accesses vs. bursty disk accesses organized by prefetching; blocks a, b, c evicted by energy-unaware caching are re-accessed non-burstily]
Caching Policy is Designed for Locality • Standard caching policies • Identify the data that are unlikely to be accessed in the future (LRU) • Evict the not-to-be-used data from memory • They are performance-oriented and energy-unaware • Most are locality-based algorithms, e.g. LRU (Clock), LIRS (Clock-Pro) • Designed for reducing the number of disk accesses • No consideration of creating bursty disk access patterns • C-Burst (Caching for Bursts) • Our objective: effective buffer caching • To create bursty disk accesses for disk energy saving • To retain performance (high hit ratios in the buffer cache)
Outline • Motivation • Scheme Design • History-based C-Burst (HC-Burst) • Prediction-based C-Burst (PC-Burst) • Memory Regions • Performance Loss Control • Performance Evaluation • Programming • Multimedia • Multi-role Server • Conclusion
Restructuring Buffer Caches • The buffer cache is segmented into two regions • Priority Region (PR) • Hot blocks are managed using an LRU-based scheme • Blocks w/ strong locality are protected in the PR • Overwhelming memory misses can be avoided • Retains performance • Energy Aware Region (EAR) – our focus in this talk • Cached blocks are managed using C-Burst schemes • Non-burstily accessed blocks are kept here • A re-accessed block (strong locality) is promoted into the PR • Region sizes are dynamically adjusted • Both performance and energy saving are considered [Figure: the buffer cache, formerly a single LRU list, is split into a Priority Region (LRU) and an Energy Aware Region (C-Burst)]
History-based C-Burst (HC-Burst) • Distinguish different streams of disk accesses • Multiple tasks run simultaneously in practice • History records can help us distinguish them • Different tasks feature very different access patterns • Bursty – e.g. grep, CVS, etc. • Non-bursty – e.g. make, mplayer, etc. • The accesses reaching the hard disk are a mixture of both bursty and non-bursty accesses • In aggregate, the disk access pattern is determined by the most non-bursty task
Basic Idea of History-based C-Burst (HC-Burst) • Cache the blocks that are accessed in a non-bursty pattern, in order to reshape disk accesses into a bursty pattern [Figure: grep's accesses are bursty, leaving long disk idle intervals; make's accesses are non-bursty, leaving only short idle periods; the aggregate grep + make stream reaching the disk is non-bursty]
History-based C-Burst (HC-Burst) • Epoch • An application's access pattern may change over time • Execution is broken into epochs, say T seconds each • Epochs that are too short or too long are both undesirable • Our choice – T = (disk time-out threshold) / 2 • No disk spin-down happens during an epoch that contains disk accesses • The distribution of disk accesses within one epoch can be ignored [Figure: grep's and make's accesses laid out on the timeline, divided into epochs]
History-based C-Burst (HC-Burst) • Block Group • Blocks accessed during one epoch by the same IOC are grouped into a block group • Each block group is identified by a process ID and an epoch time • The size of a block group indicates the burstiness of one application's data access pattern • The larger a block group is, the more bursty the disk accesses are [Figure: blocks accessed by grep and make are grouped by epoch into block groups]
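To make the grouping concrete, here is a minimal sketch of how an access could be tagged with its epoch and gathered into a per-task block group. The epoch length follows the slide's choice of half the disk time-out threshold; the 16-second time-out, the field names, and the hash-free structure are illustrative assumptions, not the kernel implementation.

```c
#include <stdint.h>

#define DISK_TIMEOUT_SEC 16                /* assumed spin-down time-out */
#define EPOCH_SEC (DISK_TIMEOUT_SEC / 2)   /* T = time-out / 2           */

/* Epoch index of an access issued at time 'now_sec' (seconds). */
static inline uint64_t epoch_of(uint64_t now_sec)
{
    return now_sec / EPOCH_SEC;
}

/* A block group: all blocks touched by one task during one epoch. */
/* Its size reflects how bursty that task's accesses were.         */
struct block_group {
    int      pid;        /* owning task (I/O context)               */
    uint64_t epoch;      /* epoch in which the blocks were accessed */
    unsigned nr_blocks;  /* group size: larger means burstier       */
    struct block_group *next;  /* link within one queue level       */
};
```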
History-based C-Burst (HC-Burst) • HC-Burst Replacement Policy • Two types of blocks should be evicted • Data blocks that are unlikely to be re-accessed • Blocks with weak locality (e.g. LRU blocks) • Data blocks that can be re-accessed with little energy • Blocks accessed in a bursty pattern • Victim block group – the largest block group • Frequently accessed blocks are promoted into the PR, so a large block group usually holds infrequently accessed blocks • Blocks accessed in a bursty pattern stay in a large BG, so a large block group holds blocks that can be re-fetched in bursts
History-based C-Burst (HC-Burst) • Multi-level Queues of Block Groups • 32-level queues of block groups • A block group of N blocks stays in the level-⌊log2 N⌋ queue • Block groups in one queue are linked in the order of their epoch times • Block groups may move upwards/downwards if their number of blocks changes • The victim block group is always the LRU block group on the topmost queue with valid block groups [Figure: queue levels from 0 (least bursty, LRU-like) to the top (most bursty); within a level, block groups are ordered by epoch time from LRU to MRU; a grep group with 1024 blocks sits at level 10 while a make group with 1023 blocks sits at level 9; promoting a block to the PR or demoting one to the EAR changes a group's size and can move the group between levels]
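A sketch of the victim selection implied by the multi-level queue: groups sit at level ⌊log2(size)⌋, groups within a level are kept in epoch (LRU) order, and the victim is the oldest group on the highest non-empty level. This is a simplified rendering under those assumptions, reusing the block_group type from the previous sketch as an opaque type; it is not the kernel code.

```c
#include <stddef.h>

struct block_group;            /* defined in the previous sketch */

#define NR_LEVELS 32

/* One FIFO of block groups per burstiness level, oldest epoch first. */
struct bg_queue {
    struct block_group *head;  /* least recently formed group */
    struct block_group *tail;
};

static struct bg_queue levels[NR_LEVELS];

/* Level of a group with n blocks: floor(log2(n)), capped at 31. */
static int bg_level(unsigned n)
{
    int lvl = 0;
    while (n > 1 && lvl < NR_LEVELS - 1) {
        n >>= 1;
        lvl++;
    }
    return lvl;
}

/* HC-Burst victim: the oldest group on the highest non-empty level, */
/* i.e. the burstiest data, which is cheapest (energy-wise) to refetch. */
static struct block_group *pick_victim_group(void)
{
    for (int lvl = NR_LEVELS - 1; lvl >= 0; lvl--)
        if (levels[lvl].head)
            return levels[lvl].head;
    return NULL;               /* energy-aware region is empty */
}
```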
Prediction-based C-Burst (PC-Burst) • Main idea • Certain disk access events are deterministic and can be predicted • Evicting a block that will be re-accessed within a short interval, close to a deterministic disk access, costs little energy • With deterministic disk accesses and block re-access times known, selectively evict blocks that will be re-accessed in short idle intervals and hold blocks that would be re-accessed in long idle intervals [Figure: Block A's predicted access falls in a short idle interval next to a deterministic access; Block B's falls in a long idle interval, so holding Block B avoids breaking that long interval]
Prediction-based C-Burst (PC-Burst) • Multi-level Queues of Block Groups in PC-Burst • 32-level queues of block groups – each level has two queues • Prediction Block Group (PBG) – blocks predicted to be accessed in the same future epoch • History Block Group (HBG) – blocks accessed in the same past epoch • Reference Points (RP) – represent deterministic disk accesses • Victim block group • The PBG on the top level whose blocks fall in the shortest idle interval, closest to an RP, and are to be accessed in the furthest future • If no PBG is found, search the HBG queue at the same level [Figure: levels 0 to 10; among candidate groups, the one in the shortest interval is preferred over those in long intervals]
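The PC-Burst victim choice can be summarized as an ordering over prediction block groups: prefer the burstiest level, then the shortest idle interval around the predicted re-access, then proximity to a reference point, then the furthest re-access time. The comparator below is an illustrative sketch under those assumptions; the field names are invented for illustration and are not the actual kernel data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Prediction block group: blocks predicted to be re-accessed in the
 * same future epoch.  All fields are illustrative assumptions.      */
struct pbg {
    int      level;           /* burstiness level, 0..31              */
    uint64_t idle_interval;   /* length of the idle interval the      */
                              /* predicted re-access falls into       */
    uint64_t dist_to_ref;     /* distance to the nearest reference    */
                              /* point (a deterministic disk access)  */
    uint64_t reaccess_epoch;  /* predicted re-access epoch            */
};

/* Returns true if 'a' is a better eviction victim than 'b':
 *   1. higher burstiness level,
 *   2. shorter idle interval around its re-access,
 *   3. closer to a deterministic access (reference point),
 *   4. re-accessed further in the future.                            */
static bool better_victim(const struct pbg *a, const struct pbg *b)
{
    if (a->level != b->level)
        return a->level > b->level;
    if (a->idle_interval != b->idle_interval)
        return a->idle_interval < b->idle_interval;
    if (a->dist_to_ref != b->dist_to_ref)
        return a->dist_to_ref < b->dist_to_ref;
    return a->reaccess_epoch > b->reaccess_epoch;
}
```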
Performance Loss Control • Why is there performance loss? • Memory misses increase due to the energy-oriented caching policy • How to control performance loss • Basic rule: control the size of the Energy Aware Region • Estimating performance loss • A ghost buffer (GB) managed with an LRU replacement policy • The increase in memory misses (M): blocks not found in the EAR but found in the GB • The average memory miss penalty (P): observed average I/O latency • Performance loss: L = M x P • Automatically tune the Energy Aware Region size • L < tolerable performance loss: enlarge the EAR • L > tolerable performance loss: shrink the EAR [Figure: on a memory miss, the ghost buffer is checked; a ghost-buffer hit increments the estimated performance loss, which drives enlarging or shrinking the EAR relative to the Priority Region]
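A minimal sketch of this control loop, using the slide's formula L = M × P. The tolerable-loss threshold, the resize step, and the function names are assumptions for illustration, not the values or interfaces of the real implementation.

```c
/* Performance-loss controller state (illustrative). */
struct perf_loss_ctl {
    unsigned long ghost_hits;       /* M: EAR misses that hit the ghost buffer */
    double        avg_miss_penalty; /* P: observed average I/O latency (sec)   */
    double        tolerable_loss;   /* per-interval loss budget (sec), assumed */
    unsigned long ear_pages;        /* current Energy Aware Region size        */
    unsigned long ear_step;         /* resize granularity, assumed             */
};

/* Called on a memory miss that hits the ghost buffer. */
static void ghost_buffer_hit(struct perf_loss_ctl *c)
{
    c->ghost_hits++;
}

/* Periodically re-evaluate: L = M x P, then grow or shrink the EAR. */
static void tune_ear_size(struct perf_loss_ctl *c)
{
    double loss = c->ghost_hits * c->avg_miss_penalty;

    if (loss < c->tolerable_loss)
        c->ear_pages += c->ear_step;          /* room for more energy saving */
    else if (loss > c->tolerable_loss && c->ear_pages > c->ear_step)
        c->ear_pages -= c->ear_step;          /* protect performance         */

    c->ghost_hits = 0;                        /* start a new measurement interval */
}
```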
Performance Evaluation • Implementation • Linux Kernel 2.6.21.5 • 5,500 lines of code in buffer cache management and generic block layer • Experimental Setup • Intel Pentium 4 3.0GHz • 1024 MB memory • Western Digital WD160GB 7200RPM hard disk drive • RedHat Linux WS4 • Linux 2.6.21.5 kernel • Ext3 file system
Performance Evaluation • Methodology • Workloads run on the experiment machine • Disk activities are collected in the kernel on the experiment machine • Disk events are sent via netconsole to a monitoring machine • Disk energy consumption is calculated offline from the collected log of disk events, using disk power models [Figure: the experiment machine streams the disk activity log over a Gigabit LAN to the monitoring machine]
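The offline energy calculation can be reproduced from an event log with a simple state-based power model plus transition costs. The sketch below shows the idea with placeholder power numbers; it is not the exact model or log format used in the evaluation.

```c
#include <stdio.h>

/* One logged disk interval: how long the disk stayed in each state. */
enum disk_state { DISK_ACTIVE, DISK_IDLE, DISK_STANDBY };

struct disk_event {
    enum disk_state state;
    double          seconds;    /* time spent in this state        */
    int             transition; /* 1 if a spin-up/down followed    */
};

/* Placeholder power model (watts / joules), not real drive numbers. */
static const double power[] = { [DISK_ACTIVE]  = 2.0,
                                [DISK_IDLE]    = 0.6,
                                [DISK_STANDBY] = 0.15 };
static const double transition_joules = 5.0;

static double total_energy(const struct disk_event *ev, int n)
{
    double joules = 0.0;
    for (int i = 0; i < n; i++) {
        joules += power[ev[i].state] * ev[i].seconds;
        if (ev[i].transition)
            joules += transition_joules;
    }
    return joules;
}

int main(void)
{
    struct disk_event trace[] = {
        { DISK_ACTIVE,  2.0,  0 },
        { DISK_IDLE,    8.0,  1 },   /* spin down after the time-out */
        { DISK_STANDBY, 60.0, 1 },   /* spin up on the next request  */
    };
    printf("energy: %.1f J\n", total_energy(trace, 3));
    return 0;
}
```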
Performance Evaluation • Emulated Disk Models • Hitachi DK23DA Laptop Disk • IBM UltraStar 36Z15 SCSI Disk
Performance Evaluation • Eight applications • 3 applications w/ bursty data accesses • 5 applications w/ non-bursty data accesses • Three Case Studies • Programming • Multi-media Processing • Multi-role servers
Performance Evaluation • Case Study I – Programming • Applications: grep, make, and vim • grep – bursty workload • make, vim – non-bursty workloads • C-Burst schemes protect the data set of make from being evicted by grep • Disk idle intervals are effectively extended [Charts: the fraction of idle intervals longer than 16 sec grows from nearly 0% to over 50%, giving over 30% energy saving]
Performance Evaluation • Case Study II – Multimedia Processing • Applications: transcode, mpg123, and scp • mpg123 – its disk accesses serve as deterministic accesses • scp – bursty disk accesses • transcode – non-bursty accesses • PC-Burst achieves better performance than HC-Burst • PC-Burst can accommodate deeper prefetching by efficiently using caching space [Charts: around 30% energy saving; over 70% of idle intervals longer than 16 sec]
Performance Evaluation • Case Study III – Multi-role Server • Applications: TPC-H query 17 and CVS • TPC-H – non-bursty disk accesses (very random I/O) • CVS – bursty disk accesses • The data set of TPC-H is protected in memory • Performance of TPC-H is significantly improved [Charts: no improvement in disk idle intervals, but reduced I/O latency and over 35% energy saving]
Our Contributions • We design a set of comprehensive energy-aware caching policies, called C-Burst, which leverage the filtering effect of the buffer cache to manipulate disk accesses • Our scheme does not rely on complicated disk power models and requires no disk specification data, which means high compatibility with different hardware • Our scheme does not assume any specific disk hardware, such as multi-speed disks, so it benefits existing hardware • Our scheme provides flexible performance guarantees to avoid unacceptable performance degradation • Our scheme is fully implemented in the Linux 2.6.21.5 kernel, and experiments under realistic scenarios show up to 35% energy saving with minimal performance loss
Conclusion • Energy efficiency is a critical issue in computer system design • Increasing disk access burstiness is the key to disk energy conservation • Leveraging the filtering effect of the buffer cache can effectively shape disk accesses into a desired pattern • The HC-Burst scheme distinguishes the different access patterns of tasks and creates a bursty stream of disk accesses • The PC-Burst scheme further predicts blocks' re-access times and manipulates the timing of future disk accesses • Our implementation of the C-Burst schemes in the Linux 2.6.21.5 kernel and our experiments show up to 35% energy saving with minimal performance loss
References • [USENIX04] A. E. Papathanasiou and M. L. Scott. Energy Efficient Prefetching and Caching. In Proc. of the USENIX Annual Technical Conference, 2004. • [EMC99] EMC Symmetrix 3000 and 5000 Enterprise Storage Systems Product Description Guide. http://www.emc.com, 1999.
Memory Regions • The buffer cache is segmented into two regions • Priority Region (PR) • Hot blocks are managed using an LRU-based scheme • Blocks w/ strong locality are protected • Overwhelming memory misses can be avoided • Energy Aware Region (EAR) • Cold blocks are managed using C-Burst schemes • Victim blocks are always reclaimed from the EAR • A re-accessed block is promoted into the PR • Accordingly, the coldest block in the PR is demoted • Region sizes are tuned online • Both performance and energy saving are considered [Figure: new blocks are inserted into the EAR; re-accessed blocks are promoted into the PR while the coldest PR block is demoted back; cold blocks are evicted from the EAR]
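A sketch of the promotion/demotion flow between the two regions. The list handling loosely mirrors the intrusive-list idiom used in the Linux kernel, but the structures, function names, and the fixed PR size limit are simplified assumptions rather than the actual patch.

```c
/* A cached block lives on exactly one region list (illustrative). */
struct cblock {
    struct cblock *prev, *next;
};

struct region {
    struct cblock head;   /* sentinel: head.next is the MRU end, head.prev the LRU end */
    unsigned long nr;
};

static void region_init(struct region *r)
{
    r->head.prev = r->head.next = &r->head;
    r->nr = 0;
}

static void list_del_block(struct cblock *b)
{
    b->prev->next = b->next;
    b->next->prev = b->prev;
}

static void list_add_mru(struct region *r, struct cblock *b)
{
    b->next = r->head.next;
    b->prev = &r->head;
    r->head.next->prev = b;
    r->head.next = b;
}

/* New blocks enter the Energy Aware Region. */
static void insert_new_block(struct region *ear, struct cblock *b)
{
    list_add_mru(ear, b);
    ear->nr++;
}

/* A re-accessed EAR block shows locality: promote it to the Priority */
/* Region and, if the PR is now over its limit, demote the coldest    */
/* PR block back into the EAR.                                        */
static void promote_on_reaccess(struct region *pr, struct region *ear,
                                struct cblock *b, unsigned long pr_limit)
{
    list_del_block(b);
    ear->nr--;
    list_add_mru(pr, b);
    pr->nr++;

    if (pr->nr > pr_limit) {
        struct cblock *coldest = pr->head.prev;   /* LRU end of the PR */
        list_del_block(coldest);
        pr->nr--;
        list_add_mru(ear, coldest);
        ear->nr++;
    }
}
```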
Motivation • Limitations • An energy-unaware caching policy can significantly disturb the periodic bursty patterns created by prefetching • Improperly evicting a block can easily break a long disk idle interval • Aggressive prefetching shrinks the available caching space and demands highly effective caching • Effective prefetching needs a large amount of memory, which raises memory contention with caching • An energy-aware caching policy can effectively complement prefetching • When prefetching works unsatisfactorily, caching can help by carefully selecting blocks for eviction
Motivation • Prefetching • Predict the data that are likely to be accessed in the future • Preload the to-be-used data into memory • Directly condense disk accesses into a sequence of I/O bursts • Both energy efficiency and performance can be improved • Caching • Identify the data that are unlikely to be accessed in the future • Evict the not-to-be-used data from memory • Traditional caching policies are performance-oriented • Designed for reducing the number of disk accesses • No consideration of creating bursty disk access patterns
Prediction-based C-Burst (PC-Burst) • Prediction of Deterministic Disk Accesses • Track each task's access history and assign each task a credit in [-32, 32] • Feedback-based prediction • Compare the observed interval with the predicted interval • If a prediction proves wrong, reduce the task's credit • If a prediction proves right, increase the task's credit • A task whose credit drops below 0 is treated as unpredictable • Repeated mispredictions increase the credit charge exponentially • Occasional system dynamics only charge a task's credit slightly • A real pattern change quickly drains a task's credit
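A sketch of that feedback rule: credits live in [-32, 32], a correct prediction restores credit slowly and resets the penalty, and repeated mispredictions are charged exponentially so that a genuine pattern change quickly disqualifies the task. The exact constants and the initial credit are assumptions made for illustration.

```c
/* Per-task prediction credit (illustrative constants). */
#define CREDIT_MAX 32
#define CREDIT_MIN (-32)

struct pred_credit {
    int credit;   /* in [-32, 32]; below 0 means "unpredictable" */
    int penalty;  /* current mis-prediction charge               */
};

static void pred_credit_init(struct pred_credit *p)
{
    p->credit = CREDIT_MAX;   /* assumed initial value */
    p->penalty = 1;
}

static void prediction_correct(struct pred_credit *p)
{
    p->penalty = 1;                   /* forgive past mistakes        */
    if (p->credit < CREDIT_MAX)
        p->credit++;                  /* recover credit gradually     */
}

static void prediction_wrong(struct pred_credit *p)
{
    p->credit -= p->penalty;          /* charge the current penalty   */
    if (p->credit < CREDIT_MIN)
        p->credit = CREDIT_MIN;
    if (p->penalty < CREDIT_MAX)
        p->penalty *= 2;              /* repeated misses cost exponentially more */
}

static int task_is_predictable(const struct pred_credit *p)
{
    return p->credit >= 0;
}
```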