A Hybrid Caching Strategy for Streaming Media Files
Jussara M. Almeida, Derek L. Eager, Mary K. Vernon
University of Wisconsin-Madison • University of Saskatchewan
November 2001
Outline
• Characteristics of Streaming Media (SM) files
• Delivery of SM files
• Hypothesis and Assumptions
• Previous Caching Policies
• New Performance Comparison
• New Caching Policies
• Conclusions and Future Work
Characteristics of SM Files
• Large file size
  • cache on disk
• Sustained I/O bandwidth
  • inserting and reading new content
• Clients access partial files
  • initial portion
  • favored segment
  • base + variable number of layers of layered encoding
Delivery of SM Files
• Unicast streaming
  • server bandwidth is linear in client request rate
  • goal: maximize byte hit ratio
• Multicast streaming
  • saves server and network bandwidth
  • cost sharing introduces new tradeoffs
Caching for Multicast Streams: Tradeoffs
• Example: 10 distributed proxy servers, each serving a local region; 100 requests (on average) arrive per region during a given popular video
  • caching the video at the proxies: 7 multicast streams needed per region (70 in total)
  • serving it from the remote server: 12 multicast streams in total
Caching for Multicast Streams: Tradeoffs
• caching popular content reduces the load on the remote server and network
• delivering popular content from the remote server amortizes the cost of a stream over more clients
• earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions
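To make the trade-off concrete, here is a small arithmetic sketch using the numbers from the example slide above; the per-region and remote-server stream counts (7 and 12) are taken directly from the slide, not derived here.

```python
# Arithmetic sketch of the multicast caching trade-off from the example above.
# The stream counts per region / at the remote server come from the slide's
# multicast delivery model; only the totals are computed here.
regions = 10
requests_per_region = 100            # average requests per region for the popular video

streams_per_region_if_cached = 7     # multicast streams each proxy needs (from slide)
streams_if_served_remotely = 12      # multicast streams at the remote server (from slide)

total_streams_if_cached = regions * streams_per_region_if_cached   # 70 streams in total
total_streams_if_remote = streams_if_served_remotely               # 12 streams in total

# Caching at the proxies removes load from the remote server and the backbone network,
# but gives up cost sharing: 70 streams are delivered instead of 12.
print(f"cached at proxies: {total_streams_if_cached} streams; "
      f"served remotely: {total_streams_if_remote} streams")
```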
New Caching Policies Research
• Hypothesis: a popularity-based strategy will outperform a replacement-based strategy
  • a significant fraction of requests to uncached files may be for files that are accessed very sporadically
• Assumptions:
  • limited disk space implies limited disk bandwidth
  • proxy bandwidth for delivering cached streams equals the min of proxy disk bandwidth and proxy network bandwidth (call this the proxy disk bandwidth)
Current Web Caching Policies
• Replacement based (cache on each miss)
• Top replacement candidate is an ad hoc combination of:
  • large files
  • least recently accessed or lower access frequency
  • miss penalty (server latency, bandwidth)
• Cache whole file or none
• Unicast
• Ignore limited disk bandwidth
Previous SM Caching Policies
• Interval Caching [DaSi93, KaRT95]
• Resource Based Caching (RBC) [TVDS98]
• Least Frequently Used (LFU)
• Block-based insertion and deletion [AcSm00]
• Popularity-based caching for layered encoding [RYHE00]
• Prefix and segment caching for smoothing [SeRT99, WZDS98]
Interval Caching
• Cache smallest intervals
• Target: memory caches (lots of insertions)
(figure: timelines from 0 to T showing streams S1, S2, S3 of file f and the intervals between consecutive streams)
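A minimal sketch of the interval-caching idea, assuming a cache sized in seconds of media and a greedy preference for the smallest intervals; the class and method names are illustrative, not from the paper.

```python
# Minimal interval-caching sketch: when stream S2 starts shortly after stream S1
# for the same file, caching the media between them (the "interval") lets S2 be
# served from data written by S1. The policy prefers the smallest intervals.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Interval:
    length: float                      # seconds of media between consecutive streams
    file_id: str = field(compare=False)

class IntervalCache:
    def __init__(self, capacity_seconds: float):
        self.capacity = capacity_seconds
        self.used = 0.0
        self._heap = []                # max-heap (by negated length): largest cached interval on top

    def offer(self, iv: Interval) -> bool:
        """Cache the interval if it fits, or if evicting the largest cached interval makes room."""
        if self.used + iv.length <= self.capacity:
            heapq.heappush(self._heap, (-iv.length, iv))
            self.used += iv.length
            return True
        if self._heap:
            largest_len = -self._heap[0][0]
            if largest_len > iv.length and self.used - largest_len + iv.length <= self.capacity:
                heapq.heappop(self._heap)            # evict the largest cached interval
                self.used -= largest_len
                heapq.heappush(self._heap, (-iv.length, iv))
                self.used += iv.length
                return True
        return False
```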
Resource Based Caching (RBC)
• Cache entire files and intervals/runs
• Goal: efficiently utilize the limited resource
  • limited space: cache smallest space requirement
  • limited bandwidth: cache smallest write overhead
• Pre-allocate bandwidth to each cached entity
• Complex algorithm
  • complex implementation
  • high time complexity
RBC Algorithm
Step 1: Selecting entity x ∈ {interval, run, file} of file i
  1) If Ubw > Uspace + Δ: choose the entity with the lowest write (bandwidth) overhead
  2) If Uspace > Ubw + Δ: choose the entity with the minimum space requirement Si,x
  3) If Uspace − Δ < Ubw < Uspace + Δ: choose the entity with the largest caching goodness
Step 2: Caching decision for entity x
  1) If enough unallocated space and unallocated bandwidth: cache entity x
  2) If enough unallocated space but bandwidth constrained: use the bandwidth goodness list to select candidates for eviction
  3) If enough unallocated bandwidth but space constrained: use the space goodness list to select candidates for eviction
  4) If both bandwidth and space constrained: walk both lists; at each step, remove an entity from the bandwidth goodness list or from the space goodness list
Step 3: Allocate space and bandwidth for entity x
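A hedged Python sketch of the Step 1 selection logic above, under the interpretation that the bandwidth-constrained branch picks the entity with the smallest write overhead (as stated on the previous slide) and the balanced branch picks the entity with the largest goodness; Δ, the entity attributes, and the goodness function are placeholders, not the paper's definitions.

```python
# Sketch of RBC's Step-1 entity selection, following the slide's structure.
# DELTA, the entity attributes, and goodness() are illustrative placeholders.
DELTA = 0.05   # tolerance separating "bandwidth constrained" from "space constrained"

def select_entity(candidates, u_bw, u_space, goodness):
    """candidates: entities with .write_bandwidth and .space_required attributes."""
    candidates = list(candidates)
    if not candidates:
        return None
    if u_bw > u_space + DELTA:                       # bandwidth constrained
        return min(candidates, key=lambda e: e.write_bandwidth)
    if u_space > u_bw + DELTA:                       # space constrained
        return min(candidates, key=lambda e: e.space_required)
    return max(candidates, key=goodness)             # roughly balanced: highest caching goodness
```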
Least Frequently Used
• Different implementation options:
  • What to do on the first access to an object?
  • How to estimate frequency?
• Version studied: Currently Most Popular (CMP)
  • Insert only the most frequently accessed (file or segment)
  • On-line popularity estimate: future research
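A minimal sketch of the Currently Most Popular idea, assuming file popularities are known in advance (the slide defers on-line estimation to future work); function and parameter names are illustrative.

```python
# CMP sketch: keep only the most frequently accessed files that fit in the cache.
def select_cmp_contents(popularity: dict, file_size: dict, cache_size: float) -> set:
    """Greedily cache files in decreasing order of access frequency until the cache is full."""
    cached, used = set(), 0.0
    for f in sorted(popularity, key=popularity.get, reverse=True):
        if used + file_size[f] <= cache_size:
            cached.add(f)
            used += file_size[f]
    return cached

# Example: with room for 2.5 "file units", the two most popular files are cached.
contents = select_cmp_contents({"a": 0.5, "b": 0.3, "c": 0.2},
                               {"a": 1.0, "b": 1.0, "c": 1.0},
                               cache_size=2.5)
```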
Previous Comparison: RBC vs. CMP [TVDS98]
• Fixed file access frequencies
• RBC outperforms CMP for all parameter values studied
• Limited design space
  • e.g., total cache size 16 GB
• Inconsistent results
New Performance Comparison
• Re-evaluate byte hit ratio of CMP and RBC
  • simulation with synthetic workload
  • broad design space
• New Pooled RBC
• New simple hybrid CMP/interval caching (CMP/IC) policy
System Assumptions
• Arrivals: Poisson(λ)
  • extra experiments with Pareto(α, k)
• File access frequency: Zipf(θ)
• Perfect file popularity
  • extra experiments with approximate file popularity
• Uniform file size and delivery rate
  • extra experiments with variable file size and delivery rate
• Load balanced across multiple disks
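A small sketch of the synthetic workload these assumptions describe: Poisson request arrivals and Zipf-distributed file popularity. The Zipf parameterization shown (p_i proportional to 1/i^(1-θ), a common convention in the VOD literature, with θ = 0 giving pure Zipf) is an assumption, as the slides do not spell out the exact form.

```python
# Synthetic-workload sketch: Poisson(lambda) request arrivals, Zipf(theta) file popularity.
import numpy as np

def zipf_probabilities(n_files: int, theta: float) -> np.ndarray:
    """Zipf-like popularities p_i proportional to 1 / i^(1 - theta); theta = 0 gives pure Zipf (assumed form)."""
    ranks = np.arange(1, n_files + 1)
    weights = 1.0 / ranks ** (1.0 - theta)
    return weights / weights.sum()

def generate_requests(lam: float, duration: float, n_files: int, theta: float, seed: int = 0):
    """Poisson(lam) arrivals over `duration`; each request picks a file according to its popularity."""
    rng = np.random.default_rng(seed)
    n_requests = rng.poisson(lam * duration)
    arrival_times = np.sort(rng.uniform(0.0, duration, n_requests))
    files = rng.choice(n_files, size=n_requests, p=zipf_probabilities(n_files, theta))
    return list(zip(arrival_times, files))
```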
System Parameters
• n: number of files
• θ: Zipf parameter
• N: arrival rate (avg. number of requests per avg. file duration T); N = λT
• C: cache size (fraction of media data accessed)
System Parameters
• B: normalized disk bandwidth (fraction of the average number of simultaneous streams needed to deliver the data that is cached by CMP)
  • B depends on N, θ, n, C, and disk technology
  • relative performance of the policies depends mainly on B
• B = 1.0: CMP system is bandwidth balanced
• B < 1.0: CMP system is bandwidth deficient
• B > 1.0: CMP system is bandwidth abundant
Normalized Disk Bandwidth (B): Example
• Ultrastar 72ZX disk:
  • disk space: 116.76 hours of MPEG-1 video (73.4 GB)
  • disk bandwidth: 108 MPEG-1 streams (22-37 MB/s)
• Assume: 100 requests/hour for cached files
• If the cache contains 2-hour movies:
  • need 200 streams
  • B = 108/200 = 0.54
• If the cache contains 30-minute TV shows:
  • need 50 streams for cache content
  • B = 108/50 = 2.16
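The slide's arithmetic spelled out, using the Little's-law observation that the number of concurrent streams equals the request rate times the stream duration.

```python
# Normalized disk bandwidth B for the Ultrastar 72ZX example on the slide.
disk_streams = 108                       # concurrent MPEG-1 streams the disk can sustain
requests_per_hour = 100                  # request rate for cached files

# 2-hour movies: 100 req/hour x 2 hours -> 200 concurrent streams needed
streams_movies = requests_per_hour * 2.0
B_movies = disk_streams / streams_movies        # 0.54 -> bandwidth deficient

# 30-minute TV shows: 100 req/hour x 0.5 hours -> 50 concurrent streams needed
streams_tv = requests_per_hour * 0.5
B_tv = disk_streams / streams_tv                # 2.16 -> bandwidth abundant
```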
RBC vs. CMP
(N = 450, n = 100, θ = 0)
• CMP outperforms RBC if B ≥ 1.0
• RBC slightly outperforms CMP if B < 1.0 and the cache is small
Files Cached by RBC
• Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)
(plots for B = 0.75, B = 1.0, B = 2.0)
Space and Bandwidth Utilization
(plots for B = 0.75, B = 1.0, B = 2.0)
Pooled RBC
• Three improvements over RBC:
  • simpler rule to select the entity to cache
  • can keep cached intervals when deleting a full file
  • pool of pre-allocated bandwidth
• Complexity similar to RBC
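A minimal sketch of the third improvement, the shared bandwidth pool: instead of pre-allocating bandwidth to each cached entity as RBC does, streams served from the disk cache draw on one shared pool. The class is hypothetical, not the paper's implementation.

```python
# Shared bandwidth pool sketch: cached entities admit streams against one pool
# rather than against per-entity bandwidth reservations.
class BandwidthPool:
    def __init__(self, total_streams: int):
        self.total = total_streams        # disk bandwidth expressed as concurrent streams
        self.in_use = 0

    def try_start_stream(self) -> bool:
        """Admit a new stream from the disk cache if the pool has spare capacity."""
        if self.in_use < self.total:
            self.in_use += 1
            return True
        return False

    def end_stream(self) -> None:
        self.in_use = max(0, self.in_use - 1)
```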
Pooled RBC, RBC and LFU
(N = 450, n = 100, θ = 0)
• Pooled RBC ≈ CMP
• BUT Pooled RBC is much more complex than CMP
Hybrid CMP/IC Policies
• Do interval caching on a separate (small) cache
• Interval cache in main memory: CMP/ICmem and Pooled RBC/ICmem
• Interval cache on disk: CMP/ICdisk
  • e.g., 5% of the disk cache
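A minimal sketch of how a hybrid CMP/IC proxy might answer requests, combining a CMP-managed main cache with a small separate interval cache (main memory, or roughly 5% of the disk cache); the class name and the has_interval_for helper are hypothetical.

```python
# Hybrid CMP/IC sketch: check the CMP-managed cache first, then the small
# interval cache, otherwise fetch the stream from the remote server.
class HybridCMPIC:
    def __init__(self, cmp_files: set, interval_cache):
        self.cmp_files = cmp_files            # files selected by the CMP policy
        self.interval_cache = interval_cache  # small separate cache (memory, or ~5% of disk)

    def serve(self, file_id: str) -> str:
        if file_id in self.cmp_files:
            return "hit: CMP cache"
        if self.interval_cache.has_interval_for(file_id):   # hypothetical helper
            return "hit: interval cache"
        return "miss: stream from remote server"
```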
CMP/ICmem vs. Pooled RBC/ICmem
(N = 450, n = 100, θ = 0)
• Memory cache improves CMP and Pooled RBC
• B < 1.0: greater improvement for CMP
CMP/ICdisk vs. Pooled RBC
(N = 450, n = 100, θ = 0)
• CMP/ICdisk ≈ Pooled RBC ≈ CMP
Conclusions
• Simple CMP
  • simple to implement
  • performance similar to Pooled RBC and CMP/ICdisk (for static file popularities)
• Hybrid CMP/IC policy
  • performance ≈ Pooled RBC
  • simple to implement
  • possibly more robust (imperfect and dynamic popularity measures)
Future Work
• Develop an on-line estimate of file popularity
• Server log analysis
  • client behavior and workloads (NOSSDAV'01 paper)
  • more logs!
• Caching policies for multicast streams
  • a popular file has greater cost-sharing if not cached
  • determine the cache content that minimizes per-client cost
  • caching principles / on-line policy (coming up soon)
• Prototype, experimental (live) workloads