An Analysis of Facebook Photo Caching, by Huang et al., SOSP 2013. Presented by Phuong Nguyen. Some animations and figures are borrowed from the original paper and presentation.
Photos on Facebook: Overview • Album • Feed • Profile • 250 billion photos, as of Sep 2013
Photos on Facebook: Overview • Figure: the photo-serving stack (Akamai CDN, Facebook cache layers, storage backend) • Full-stack study
Client-based Browser Cache • A client first tries a local fetch from its browser cache
Geo-distributed Edge Cache (FIFO) • Browser-cache misses go to an Edge Cache at a Point of Presence (PoP) • Scale: millions of clients, tens of PoPs
Single Global Origin Cache (FIFO) • Edge Cache misses go to the Origin Cache, with the data center chosen by Hash(url) • Scale: millions of clients, tens of PoPs, four data centers
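Routing by Hash(url) means every Edge sends a given photo URL to the same data center, so each photo has exactly one home in the global Origin Cache. Below is a minimal sketch, assuming MD5 as the hash and four placeholder data-center names. The slides do not say which hash function Facebook uses, and the names `ORIGIN_DATA_CENTERS` and `origin_for_url` are hypothetical.

```python
import hashlib

# Hypothetical names for the four Origin data centers (the slide gives only the count).
ORIGIN_DATA_CENTERS = ["dc0", "dc1", "dc2", "dc3"]


def origin_for_url(url: str) -> str:
    """Pick the Origin Cache data center for a photo URL by hashing the URL.

    MD5 is an illustrative choice, not Facebook's documented hash.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer and map it onto a data center.
    index = int.from_bytes(digest[:8], "big") % len(ORIGIN_DATA_CENTERS)
    return ORIGIN_DATA_CENTERS[index]


if __name__ == "__main__":
    for url in ["/photos/123_n.jpg", "/photos/456_n.jpg"]:
        print(url, "->", origin_for_url(url))
```

Because the mapping depends only on the URL, no coordination between Edges is needed. The trade-off is that an Edge may reach a distant data center for a given photo.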
Haystack Backend • Origin Cache misses are fetched from Haystack, the photo storage backend
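The lookup path through the stack (browser, Edge, Origin, Haystack) can be sketched as a chain of caches. The sketch assumes that every layer that missed is filled on the way back. Treating the browser cache as FIFO is a simplification: the slides specify FIFO only for the Edge and Origin caches. `FIFOCache` and `fetch` are hypothetical names, and a plain dict stands in for Haystack.

```python
from collections import OrderedDict


class FIFOCache:
    """Fixed-capacity cache that evicts the earliest-inserted key (FIFO)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str):
        # FIFO: a hit does not change the eviction order.
        return self.items.get(key)

    def put(self, key: str, value: bytes) -> None:
        if key in self.items:
            return
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the earliest insertion
        self.items[key] = value


def fetch(photo_id, layers, backend):
    """Try each (name, cache) layer in order. On a full miss, read the backend.
    Fill every layer that missed. Returns (photo bytes, name of the serving layer)."""
    missed = []
    for name, cache in layers:
        value = cache.get(photo_id)
        if value is not None:
            break
        missed.append(cache)
    else:
        name, value = "haystack", backend[photo_id]
    for cache in missed:
        cache.put(photo_id, value)
    return value, name


if __name__ == "__main__":
    backend = {"p1": b"jpeg-bytes"}  # stand-in for Haystack
    layers = [("browser", FIFOCache(2)), ("edge", FIFOCache(4)), ("origin", FIFOCache(8))]
    print(fetch("p1", layers, backend)[1])  # haystack
    print(fetch("p1", layers, backend)[1])  # browser
```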
Trace Collection: Instrumentation Scope • Figure: instrumentation spans the browser cache, Edge Cache, Origin Cache, and Haystack backend • Objective: collect a representative sample that permits correlating the events related to the same request
Sampling Strategies • Request-based: sample requests at random • Biased toward popular content • Object-based: focus on a subset of photos selected by a deterministic test on photoId • Fair coverage of unpopular photos • Enables cross-stack analysis
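The two sampling strategies differ in what decides inclusion. In request-based sampling, it is a coin flip per request. In object-based sampling, it is a fixed function of photoId. This is a minimal sketch. SHA-1, the 1% default rate, and the function names are illustrative assumptions rather than the study's actual test.

```python
import hashlib
import random


def request_sampled(rng: random.Random, rate: float) -> bool:
    """Request-based sampling: keep each request independently with probability `rate`.
    Popular photos receive many requests, so they dominate the sample."""
    return rng.random() < rate


def object_sampled(photo_id: int, rate_per_10000: int = 100) -> bool:
    """Object-based sampling: a deterministic test on photoId.
    Every request for a selected photo is kept at every layer, so its path can be
    followed across the whole stack. SHA-1 and the 1% default are assumptions."""
    digest = hashlib.sha1(str(photo_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < rate_per_10000
```

Because `object_sampled` needs no shared state, the browser, Edge, Origin, and Haystack loggers can each apply it independently and still agree on which photos are in the sample.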
Analysis Objectives • Traffic-sheltering effects of the caches • Photo popularity distribution • Geographic traffic distribution & collaborative caching • Can we make the cache better? • Impact of cache size & algorithm • Can we know which photos to cache?
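Traffic Sheltering • Browser Cache: 77.2M requests, 65.5% hit ratio, 65.5% of traffic served • Edge Cache: 26.6M requests, 58.0% hit ratio, 20.0% of traffic served • Origin Cache: 11.2M requests, 31.8% hit ratio, 4.6% of traffic served • Backend (Haystack): 7.6M requests, 9.9% of traffic served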
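The request counts and traffic shares on this slide follow from the top-level request count and each layer's hit ratio. Each layer sees only what the layer above missed. A short sketch of that arithmetic (`traffic_shares` is a hypothetical helper):

```python
def traffic_shares(total_requests, hit_ratios):
    """Given the requests entering the top layer and each cache's hit ratio,
    return (layer, requests seen, share of all traffic served) tuples.
    Whatever misses the last cache is served by the backend."""
    rows = []
    remaining = total_requests
    for layer, hit_ratio in hit_ratios:
        served = remaining * hit_ratio
        rows.append((layer, remaining, served / total_requests))
        remaining -= served
    rows.append(("backend", remaining, remaining / total_requests))
    return rows


if __name__ == "__main__":
    layers = [("browser", 0.655), ("edge", 0.580), ("origin", 0.318)]
    for layer, seen, share in traffic_shares(77.2e6, layers):
        print(f"{layer:8s} {seen / 1e6:5.1f}M requests  {share:6.1%} of traffic")
```

With the slide's inputs (77.2M requests; hit ratios of 65.5%, 58.0%, and 31.8%), this reproduces the 26.6M, 11.2M, and 7.6M request counts. It also reproduces the 65.5%, 20.0%, 4.6%, and 9.9% traffic shares to the precision shown.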
Popularity Distribution • The skew of photo popularity decreases after each layer of cache
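One way to see why skew drops is that a cache absorbs repeat requests for the most popular photos, so the traffic passed down to the next layer is flatter. The sketch below makes assumptions throughout. It uses a synthetic Zipf-like trace (weight 1/rank) in place of real photo traffic, a single FIFO layer, and top-10 share as a crude skew measure. All function names are hypothetical.

```python
import random
from collections import Counter, OrderedDict


def zipf_trace(num_objects, num_requests, seed=7):
    """Synthetic request stream with Zipf-like popularity (weight 1/rank)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(num_objects)]
    return rng.choices(range(num_objects), weights=weights, k=num_requests)


def misses_through_fifo(trace, capacity):
    """Return the requests that miss a FIFO cache: the traffic the next layer sees."""
    cache = OrderedDict()
    misses = []
    for key in trace:
        if key in cache:
            continue
        misses.append(key)
        if len(cache) >= capacity:
            cache.popitem(last=False)
        cache[key] = True
    return misses


def top_k_share(trace, k):
    """Fraction of requests that go to the k most requested objects."""
    counts = Counter(trace)
    return sum(c for _, c in counts.most_common(k)) / len(trace)


if __name__ == "__main__":
    requests = zipf_trace(num_objects=1000, num_requests=50_000)
    after_cache = misses_through_fifo(requests, capacity=100)
    print(f"top-10 share before cache: {top_k_share(requests, 10):.1%}")
    print(f"top-10 share after cache:  {top_k_share(after_cache, 10):.1%}")
```

With FIFO, an object can miss again only after at least `capacity` other misses have pushed it out. This caps how much of the miss stream any single popular photo can account for.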
ANALYSIS: GEOGRAPHIC TRAFFIC DISTRIBUTION & COLLABORATIVE CACHING
Substantial Remote Traffic at Edge • Map figure: share of requests served by the local Edge for Miami, Chicago, NYC, Atlanta, LA, and Dallas, ranging from 18% to 60% local
Substantial Remote Traffic at Edge • Atlanta example: 20% served locally, 35% by D.C., 20% by Miami, 10% by Chicago, 5% by NYC, 5% by Dallas, 5% by California • Atlanta has 80% of its requests served by remote Edges
Impact of Using Collaborative Edge • A collaborative Edge, in which the Edge Caches act as one shared cache, increases the hit ratio by 18%
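The benefit of a collaborative Edge comes from not caching the same photo separately at several PoPs. The sketch below compares independent per-Edge FIFO caches with one shared FIFO cache of the combined capacity. The shared-capacity model and all names are simplifying assumptions, and the tiny trace is made up for illustration.

```python
from collections import OrderedDict


def fifo_hits(trace, capacity):
    """Count hits for a FIFO cache of the given capacity over a trace of keys."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)
        cache[key] = True
    return hits


def independent_vs_collaborative(requests, capacity_per_edge):
    """requests: list of (edge_name, photo_id) in arrival order.
    Independent: each Edge has its own cache. Collaborative: one shared cache
    whose capacity is the sum of all Edges' capacities.
    Returns (independent hit ratio, collaborative hit ratio)."""
    per_edge = {}
    for edge, photo in requests:
        per_edge.setdefault(edge, []).append(photo)
    independent = sum(fifo_hits(t, capacity_per_edge) for t in per_edge.values())
    shared_capacity = capacity_per_edge * len(per_edge)
    collaborative = fifo_hits([photo for _, photo in requests], shared_capacity)
    return independent / len(requests), collaborative / len(requests)


if __name__ == "__main__":
    trace = [("atl", "p1"), ("dc", "p1"), ("atl", "p2"),
             ("dc", "p2"), ("atl", "p1"), ("dc", "p1")]
    print(independent_vs_collaborative(trace, capacity_per_edge=1))
```

In this toy trace, each one-slot Edge thrashes between p1 and p2 and never hits. The shared two-slot cache holds both photos and serves 4 of the 6 requests.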
Potential Improvement Study • Methodology: cache simulation • Replay the trace, using the first 25% to warm up the cache • Evaluate on the remaining 75% • Improvement factors: • Cache size • Caching algorithm • Evaluation metric: hit ratio
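The warm-up split matters because a cache that starts empty misses on everything at first, which would understate its steady-state hit ratio. Below is a minimal replay sketch that warms on the first 25% of the trace and scores only the rest. LRU is an assumed policy here, and `simulate_hit_ratio` is a hypothetical name.

```python
from collections import OrderedDict


def simulate_hit_ratio(trace, capacity, warmup_fraction=0.25):
    """Replay a trace through an LRU cache. Requests in the first
    `warmup_fraction` of the trace only warm the cache; the hit ratio is
    measured on the remaining requests."""
    cache = OrderedDict()
    warmup_end = int(len(trace) * warmup_fraction)
    hits = evaluated = 0
    for i, key in enumerate(trace):
        hit = key in cache
        if hit:
            cache.move_to_end(key)  # LRU: refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
        if i >= warmup_end:
            evaluated += 1
            hits += hit
    return hits / evaluated if evaluated else 0.0


if __name__ == "__main__":
    trace = ["a", "b", "c", "d", "a", "b", "c", "d"]
    print(simulate_hit_ratio(trace, capacity=4))       # scores the last 6 requests
    print(simulate_hit_ratio(trace, capacity=4, warmup_fraction=0.0))  # scores all 8
```

On this trace the warmed measurement is 4/6 (about 0.67), versus 0.5 when the cold-start misses are counted.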
Edge Cache with Different Sizes & Algorithms • Figure: hit ratio versus cache size for each algorithm, with an infinite cache as the upper bound • A higher-performing algorithm can achieve the same hit ratio with a smaller cache
Edge Cache with Different Sizes & Algorithms • A more sophisticated algorithm achieves a higher hit ratio with the same cache size
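To make "better algorithm, same size" concrete, the sketch below runs FIFO (the policy the slides give for the Edge) and LRU at the same capacity. LRU is only a simple stand-in for a more sophisticated algorithm than FIFO. The trace is a made-up example, and `hit_count` is a hypothetical helper.

```python
from collections import OrderedDict


def hit_count(trace, capacity, policy):
    """Count cache hits for policy 'fifo' or 'lru' at a fixed capacity."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)  # LRU refreshes recency; FIFO does not
            continue
        if len(cache) >= capacity:
            # Evict the front entry: oldest insertion (FIFO) or least recent use (LRU).
            cache.popitem(last=False)
        cache[key] = True
    return hits


if __name__ == "__main__":
    trace = list("ABACABAC")
    for policy in ("fifo", "lru"):
        print(policy, hit_count(trace, capacity=2, policy=policy))
```

With two slots, FIFO gets 2 hits and LRU gets 3. FIFO evicts the hot photo A simply because it was inserted first, while LRU keeps A because it keeps being requested.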
Intuitions • Properties intuitively associated with photo traffic: • The age of photos • The number of Facebook followers associated with the owner
Content Age Effect • An age-based cache replacement algorithm could be effective • Fresh content is popular and tends to be cached effectively throughout the hierarchy
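The slide suggests, but does not specify, an age-based replacement algorithm. Below is one hypothetical reading: when the cache is full, evict the cached photo with the oldest upload time, on the premise that older photos are requested less. `AgeBasedCache` is an illustrative name, and the linear-scan eviction is chosen for clarity, not speed.

```python
class AgeBasedCache:
    """Hypothetical age-based replacement: when full, evict the cached photo
    with the oldest upload time."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.upload_time = {}  # photo_id -> upload timestamp (seconds)

    def access(self, photo_id: str, uploaded_at: float) -> bool:
        """Return True on a hit. On a miss, insert the photo, evicting the
        oldest-uploaded cached photo if the cache is full."""
        if photo_id in self.upload_time:
            return True
        if len(self.upload_time) >= self.capacity:
            oldest = min(self.upload_time, key=self.upload_time.get)
            del self.upload_time[oldest]
        self.upload_time[photo_id] = uploaded_at
        return False


if __name__ == "__main__":
    cache = AgeBasedCache(capacity=2)
    cache.access("old", uploaded_at=100)
    cache.access("new", uploaded_at=900)
    cache.access("newer", uploaded_at=950)  # evicts "old"
    print(sorted(cache.upload_time))        # ['new', 'newer']
```

This sketch always admits the incoming photo, even when it is older than everything already cached. A fuller design might decline to admit such photos, or combine age with recency.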
Social Effect • The more popular a photo's owner is, the more likely the photo is to be accessed • Browser caches tend to have lower hit ratios for photos of popular users (the “viral” effect)
DISCUSSIONS
Discussions • Evaluation method: • Only desktop clients are considered; mobile clients are excluded • Trends in user mobility • Sampling: object-based sampling might not represent a realistic workload • Impact of caching done by the Akamai CDN • The method for correlating requests is not perfect • Latency issues • Evaluation focuses mainly on hit ratio and traffic sheltering, not latency • Latency of collaborative caching is not evaluated
Discussions (cont.) • Other potential improvements: • Improved caching algorithms that take photo metadata into account • Optimal placement of resizing functionality along the stack • Clairvoyant caching might become feasible by predicting future accesses • E.g., photos from the same album, photos appearing in the news feed, etc. • Address geographic diversity by improving the routing policy (e.g., giving more weight to locality)
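Clairvoyant caching usually refers to Belady's MIN policy: on eviction, drop the item whose next request is farthest in the future. It needs knowledge of future requests, which is why the slide ties it to predicting accesses. The sketch below computes MIN's hits on a known trace, as an offline upper bound. `clairvoyant_hits` is a hypothetical name, and the trace is a made-up example.

```python
def clairvoyant_hits(trace, capacity):
    """Belady's MIN (clairvoyant) policy: on a miss with a full cache, evict the
    cached key whose next request is farthest in the future (or never)."""
    # For each position, record the index of the next request for the same key.
    next_use = [0] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # key -> index of its next request
    hits = 0
    for i, key in enumerate(trace):
        if key in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)  # farthest next use
            del cache[victim]
        cache[key] = next_use[i]
    return hits


if __name__ == "__main__":
    print(clairvoyant_hits(list("ABCABC"), capacity=2))
```

On the cyclic trace A B C A B C with two slots, FIFO and LRU both get 0 hits, while the clairvoyant policy gets 2. A predictor that approximated next-use times (for example, from album membership or feed placement) would try to capture part of that gap.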
THANK YOU!