Memory Resource Allocation for File System Prefetching -- From a Supply Chain Management Perspective Zhe Zhang (NCSU), Amit Kulkarni (NCSU), Xiaosong Ma (NCSU/ORNL), Yuanyuan Zhou (UIUC)
Aggressive prefetching: an idea whose time has come* • Enlarging processor-I/O gap • Processing power doubling every 18 to 24 months • Disparity between growth of disk latency and throughput • Latency improving 10% per year while throughput improving 40% per year [Hennessy 03] • Large memory cache sizes • Usually 0.05% ~ 0.2% of storage capacity [Hsu 04] * [Papathanasiou 05]
… and whose challenges follow • Systems facing large numbers of concurrent requests (e.g., #1-ranked site Facebook) • Servers handling large numbers of clients (e.g., Jaguar @ Oak Ridge National Lab: Lustre, 11,000 compute nodes, 72 I/O nodes, 18 DDN S2A9500 couplets) • How to manage file systems' memory resource for aggressive prefetching?
All streams are not created equal • Allocating memory resource according to access rate? (e.g., MP3: 128 kbps, YouTube: 200 kbps, YouTube HQ: 900 kbps) • Related work • Access pattern detection: rate not detected [Lee 87, Li 04, Soundararajan 08] • Aggressiveness control: based on sequentiality [Patterson 95, Kaplan 02, Li 05] • Multi-stream prefetching: rate not sufficiently utilized [Cao 96, Tomkins 97, Gill 07]
Similar story in grocery stores! • Allocating storage resource according to consumption rate? (e.g., milk: 200 per day, beer: 80 per day, $300 wine: 1 per year) • Studied in Supply Chain Management (SCM) • Demand rate measurement/analysis/prediction • Dates back to the earliest wars, yet still active • Wal-Mart: $24M on a satellite network for instant inventory control • Dell: aiming at "zero inventory"
Our contributions • A mapping between data prefetching and SCM problems • Novel rate-aware multi-stream prefetching techniques based on SCM heuristics • Implementation and performance evaluation • Modified Linux 2.6.18 kernel • Extensive experiments with modern server and multiple workloads • Coordinated multi-level prefetching • Based on multi-echelon inventory control • Extending application access pattern to lower level • Evaluation with combinations of state-of-the-art single level algorithms
Outline • Motivation • Background and problem mapping • Algorithms • Performance evaluation • Conclusions
Background – Inventory cycles • Inventory theory • Task: manage inventory for goods • Goal: satisfy customer demands [Figure: inventory level over time; each cycle the level falls from the order quantity through the cycle inventory to the reorder point, faster or slower depending on demand rate; an order placed at the reorder point arrives after the lead time, with the safety inventory as a buffer against shortage]
Background – Prefetching basics [Figure: memory cache backed by disk; a new prefetch is issued when the remaining prefetched pages fall to the trigger distance, and each prefetch reads prefetch-degree pages from disk]
Background – Prefetching cycles • Prefetching techniques: • Task: manage the cache for data blocks • Goal: satisfy application requests [Figure: the same cycle diagram relabeled for prefetching: order quantity becomes the prefetch degree, the reorder point becomes the trigger distance (cycle part Tc plus safety part Ts), and the lead time becomes the disk access time]
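The cycle analogy above can be sketched numerically: the reorder point (trigger distance) is the stock expected to be consumed during the lead time plus a safety buffer. A minimal sketch; the function name and all numbers are illustrative assumptions, not from the talk.

```python
def reorder_point(lead_time, avg_demand_rate, safety_inventory):
    # Stock consumed while the order is in flight, plus the safety buffer.
    return lead_time * avg_demand_rate + safety_inventory

# Prefetching reading of the same formula (assumed numbers):
disk_access_time = 0.01   # "lead time": seconds per prefetch request
avg_rate = 1000           # pages/second consumed by the stream
Ts = 50                   # safety inventory in pages, chosen ad hoc

Tc = disk_access_time * avg_rate              # cycle part of the trigger distance
trigger_distance = reorder_point(disk_access_time, avg_rate, Ts)
print(Tc, trigger_distance)                   # 10.0 60.0
```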
Challenges in mapping • Data requests ⇔ customer demands • Data blocks are unique • "Linear sequence of blocks" in detected streams • GroceryStore::getMilk(); vs. FileSystem::getNextBlock(); / FileSystem::getBlock(Position p); • Prefetched data blocks ⇔ inventory • Accessed data blocks remain in the cache • But as "second-class citizens" [Gill 05, Li 05]
Outline • Motivation • Background and problem mapping • Algorithms • Performance evaluation • Conclusions
Performance metrics and objectives • SCM optimization objective: improve fill rate • Fraction of demand satisfied from inventory: fill rate = 1 − ESC / Q • ESC: expected shortage per cycle • Q: order quantity • Prefetching optimization objective: improve cache hit rate • Dynamically adjust • Trigger distance • Prefetch degree
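The fill-rate relation above comes from standard inventory theory: the fraction of demand missed per cycle is ESC/Q. A minimal sketch in our own notation:

```python
def fill_rate(esc, q):
    # Fraction of demand served from stock: 1 - ESC/Q, where ESC is the
    # expected shortage per cycle and Q the order quantity (the prefetch
    # degree in the cache analogy).
    return 1.0 - esc / q

# e.g. 2 pages missed per cycle with a prefetch degree of 48 (assumed numbers)
print(fill_rate(2, 48))
```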
Rate-aware prefetching algorithms • Task: calculating Tc and Ts • Tc: lead time × average consumption rate • Ts: based on estimation of uncertainty [Figure: prefetching cycle diagram showing the prefetch degree, the trigger distance split into Tc and the safety inventory Ts, and fast/slow/average demand slopes]
Algorithm 1: Equal Time Supplies (ETS) • Safety inventory for all goods set to the same time supply (e.g., the amount of goods consumed in 5 days) • With "standard" distribution shapes, uncertainty is proportional to the mean value • Ts: set to be proportional to the average data access rate • Trigger distance of stream i = (average rate of stream i ÷ sum of all streams' average rates) × total allowed trigger distance
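The ETS allocation above can be sketched as rate-proportional sharing of a fixed trigger-distance budget; function name and budget are assumptions for illustration, not the paper's kernel code.

```python
def ets_trigger_distances(rates, total_trigger_distance):
    # Equal Time Supplies: each stream's trigger distance is proportional
    # to its average access rate, so every stream holds the same "time
    # supply" of prefetched pages.
    total_rate = sum(rates)
    return [total_trigger_distance * r / total_rate for r in rates]

# Two streams in pages/second sharing a 96-page budget (assumed numbers)
print(ets_trigger_distances([1000, 3000], 96))  # [24.0, 72.0]
```

The faster stream drains its prefetched pages sooner, so it earns the larger share of the budget.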
Algorithm 2: Equal Safety Factors (ESF) • Safety inventory set to maintain the same safety factor across all goods • Ts: set to be proportional to the standard deviation of the access rate • Implementation challenges • Measurement and calculation overhead • Limited floating-point calculation in kernel
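By analogy with ETS, the ESF idea above can be sketched as deviation-proportional sharing of the safety budget. This is a hypothetical user-space illustration; the actual kernel implementation must avoid floating point, as the slide notes.

```python
import math

def esf_trigger_distances(rate_samples_per_stream, total_trigger_distance):
    # Equal Safety Factors: allocate safety inventory in proportion to each
    # stream's rate standard deviation, so all streams share one safety factor.
    stds = []
    for samples in rate_samples_per_stream:
        mean = sum(samples) / len(samples)
        var = sum((x - mean) ** 2 for x in samples) / len(samples)
        stds.append(math.sqrt(var))
    total = sum(stds)  # sketch assumes at least one stream varies
    return [total_trigger_distance * s / total for s in stds]

# A perfectly steady stream vs. a bursty one, sharing 48 pages (assumed numbers)
print(esf_trigger_distances([[5, 5, 5, 5], [3, 7, 3, 7]], 48))  # [0.0, 48.0]
```

The steady stream needs no safety buffer at all, while the bursty stream gets the entire budget; this is what drives ESF's gains on unstable streams in the evaluation.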
Outline • Motivation • Background and problem mapping • Algorithms • Performance evaluation • Conclusions
Comparing with Linux native prefetching • Linux prefetching algorithm (kernel 2.6.18), denoted "32-32" • Trigger distance (T) = prefetch degree (P) • T and P doubled on each sequential hit • Upper bounds: T = P = 32 (pages) • Implementation of SCM-based algorithms, denoted "24-48" • Principle: maintain the same memory consumption as the original algorithm • Default parameters: Tdefault = 24, Pdefault = 48
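The 32-32 ramp-up described above can be sketched as follows; the initial 4-page window is an assumption for illustration, and the real kernel readahead logic has more cases.

```python
def linux_readahead_sizes(sequential_hits, cap=32):
    # Sketch of the 2.6.18-era baseline: trigger distance T equals prefetch
    # degree P, and both double on each sequential hit up to the 32-page cap.
    size = 4  # assumed initial window
    sizes = []
    for _ in range(sequential_hits):
        size = min(size * 2, cap)
        sizes.append(size)
    return sizes

print(linux_readahead_sizes(5))  # [8, 16, 32, 32, 32]
```

The SCM-based variants replace this fixed doubling with the rate-aware Tc/Ts computation while holding total memory use at the same level.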
Experimental setup • Platform • Linux server • 2.33GHz quad-core CPU, 16GB memory • Comparing 32-32, 24-48, ETS and ESF algorithms • Workloads • Synthetic benchmarks • Linux file transfer applications • HTTP web server workload • Server benchmarks • SPC2-VOD-like (sequential) • TPC-H (random)
Two streams with different rates • Rate of stream 1 fixed at 1000 pages/second • Rate of stream 2 varied between 3000 and 7000 pages/second • Average response time: ETS achieves a 19%~25% improvement over 32-32 • Cache misses per prefetch cycle (ESC): ETS uses the same number of cycles as 24-48 with ESC similar to 32-32
Two streams with different deviations • SD of stream 1 fixed at the square root of its rate • SD of stream 2 varied between 3 and 7 times the average rate • Average response time: ESF achieves a 20%~35% improvement over ETS • Response time of individual streams: ESF gives a large improvement for the unstable stream with only a small degradation for the stable stream
Throughput of server benchmarks • SPC2-VOD-like (sequential streams) • TPC-H (random accesses) • Random application throughput: ETS never worse than 32-32; 2.5% average improvement • Sequential+random application throughput: ETS achieves a 6%~53% improvement over 32-32 • Memory consumption of the sequential+random applications also compared
Conclusions and future work • Observations • File blocks can be managed like apples! • Simple approaches such as ETS seem to perform well • Future work • Awareness of both access rate and delivery time • Adjusting the prefetch degree • Acknowledgements • Anonymous reviewers • Our shepherd: George Candea • Our sponsors: NSF and DOE Office of Science