How The Cloud Works

Presentation Transcript


  1. How The Cloud Works Cornell University Ken Birman

  2. Consider Facebook • Popular social networking site (currently blocked in China), with > 1B users, growing steadily • Main user interface: very visual, with many photos, icons, ways to comment on things, “like/dislike” • Rapid streams of updates

  3. Facebook Page • Page itself was downloaded from Facebook.com • They operate many data centers, all can serve any user • Page was full of URLs • Each URL triggered a further download. Many fetched photos or videos • User’s wall: a continuously scrolling information feed with data from her friends, news, etc

  4. Facebook image fetching architecture • [Diagram: image requests flow from the browser’s local cache to the Facebook Edge sites (each with an Edge Cache) and the Akamai cloud (Akamai caches), then to the Facebook Resizer (Resizer Cache), and finally to Facebook Haystack]

  5. The system... • Operates globally, on every continent • Has hundreds of Facebook Edge sites • Dozens of Resizer locations just in the USA, many more elsewhere • A “few” Haystack systems for each continent • Close relationships with Akamai, other network providers

  6. Things to notice • The cloud isn’t “about” one big system • We see multiple systems that talk easily to one another, all using web-page standards (XML, HTML, MPLS...) • They play complementary roles • Facebook deals with Akamai for caching of certain kinds of very popular images, AdNet and others for advertising, etc • And within the Facebook cloud are many, many, interconnected subsystems

  7. Why so many caches? • To answer, need to start by understanding what a cache does • A cache is a collection of web pages and images, fetched previously and then retained • A request for an image already in the cache will be satisfied from the cache • Goal is to spread the work of filling the Facebook web page so widely that no single element could get overloaded and become a bottleneck

  8. But why so many layers? • Akamai.com is a web company dedicated to caching for rapid data (mostly images) delivery • If Facebook uses Akamai, why would Facebook ever need its own caches? • Do the caches “talk to each other”? Should they? • To understand the cloud we should try to understand the answers to questions like these

  9. Memcached: A popular cache • Stands for “In-Memory Caching Daemon” • A simple, very common caching tool • Each machine maintains its own (single) cache

    function get_foo(foo_id)
        foo = memcached_get("foo:" . foo_id)
        return foo if defined foo
        foo = fetch_foo_from_database(foo_id)
        memcached_set("foo:" . foo_id, foo)
        return foo
    end

  10. Should we use Memcached everywhere? • Cached data can be stale (old/incorrect) • A cache is not automatically updated when data changes; need to invalidate or update the entry yourself • And you may have no way to know that the data changed • When a cache gets full, we must evict less popular content. What policy will be used? (Memcached: LRU) • When applications (on one machine) share Memcached, they need to agree on the naming rule they will use for content • Otherwise could end up with many cached copies of Angelina Jolie and Brad Pitt, “filling up” the limited cache space
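A minimal Python sketch of these issues (not Memcached itself, just an illustration): LRU eviction when the cache fills, explicit invalidation because nothing updates the cache automatically, and a shared key-naming rule. The photo_key convention shown is a hypothetical example.

    from collections import OrderedDict

    class TinyCache:
        # A toy look-aside cache: LRU eviction plus explicit invalidation.
        def __init__(self, capacity=3):
            self.capacity = capacity
            self.entries = OrderedDict()          # keeps recency order

        def get(self, key):
            if key not in self.entries:
                return None                       # miss: caller fetches from the database
            self.entries.move_to_end(key)         # mark as recently used
            return self.entries[key]

        def put(self, key, value):
            self.entries[key] = value
            self.entries.move_to_end(key)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry

        def invalidate(self, key):
            # The cache is not updated when the database changes;
            # the application must invalidate (or overwrite) the entry itself.
            self.entries.pop(key, None)

    # Every application sharing the cache must agree on one naming rule,
    # otherwise the same image gets cached many times under different keys.
    def photo_key(photo_id, size):
        return f"photo:{photo_id}:{size}"         # hypothetical convention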

  11. Fun with Memcached • There are systems built over Memcached that have become very important • Berkeley Spark system is a good example • Spark ≈ Memcached + a nice rule for naming what the cache contains • Spark approach focuses on in-memory caching on behalf of the popular MapReduce/Hadoop computing tool

  12. MapReduce/Hadoop • Used when searching or indexing very large data sets constructed from collections of web data • For example, all the web pages in the Internet • Or all the friending relationships in all of Facebook • Idea is to spread the data over many machines, then run highly parallel computations on unchanging data • The actual computation tends to consist of simple programs • Map step: spreads the computing out. Reduce step: combines intermediary results. The final result aggregates exactly one copy of each intermediary output • Often iterative: the second step depends on output of the first step
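A toy word-count example in Python (not Hadoop itself) showing the two steps: Map processes each document independently, Reduce combines the intermediary (word, 1) pairs into one aggregated count per word.

    from collections import defaultdict

    def map_step(document):
        # Map spreads the work out: each document can be processed on a different machine.
        return [(word, 1) for word in document.split()]

    def reduce_step(mapped_pairs):
        # Reduce combines intermediary results: exactly one output per word.
        counts = defaultdict(int)
        for word, n in mapped_pairs:
            counts[word] += n
        return dict(counts)

    documents = ["the cloud works", "the cloud scales"]
    intermediate = [pair for doc in documents for pair in map_step(doc)]
    print(reduce_step(intermediate))   # {'the': 2, 'cloud': 2, 'works': 1, 'scales': 1}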

  13. Spark: Memcached for MapReduce • Spark developers reasoned that if MapReduce uses files for the intermediary results, file I/O would be a performance-limiting cost • Confirmed this using experiments • It also turned out that many steps recompute nearly the identical thing (for example by counting words in a file) • Memcached can help... if • MapReduce can find the precomputed results • and if we “place” tasks to run where those precomputed results are likely to be found

  14. Spark “naming convention” • Key idea in Spark: rather than name intermediate results using URLs or file names, they use the “function that produced the result” • Represented in a functional programming notation based on the Microsoft LINQ syntax • In effect: “This file contains f(X)” • Since the underlying data X is unchanging, e.g. a file, “X” has the same meaning at all times • Thus f(X) has a fixed meaning too • By cleverly keeping subresults likely to be reused, Spark obtained huge speedups, often 1000x or more!
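A small sketch of the naming idea (not Spark’s actual implementation): cache each intermediate result under the expression that produced it. Because the input never changes, “word_count(weblog.txt)” always denotes the same value, so a cached copy can safely be reused. The function name, file name, and stand-in computation below are all hypothetical.

    result_cache = {}

    def cached_apply(func_name, input_name, compute):
        key = f"{func_name}({input_name})"     # name = function + immutable input
        if key not in result_cache:
            result_cache[key] = compute()      # compute once, keep the subresult
        return result_cache[key]

    # First call computes; later calls naming the same expression hit the cache.
    counts = cached_apply("word_count", "weblog.txt",
                          lambda: {"GET": 120, "POST": 30})   # stand-in computation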

  15. Spark has become very important • While the idea is simple, providing a full range of control over what is in the cache, when it is searched, and when things are evicted is complex • Spark functionality was initially very limited but has become extremely comprehensive • There is a user community of tens of thousands • In the cloud, when things take off, they “go viral”! • … even software systems

  16. Key insight • In-memory caching can have huge and very important performance implications • Caching, in general, is of vital importance in the cloud, where many computations run at high load and data rates and immense scale • But not every cache works equally well!

  17. Back to Facebook • Seeing how important Spark became for MapReduce, we can ask questions about the Facebook caching strategy • Are these caches doing a good job? What hit rate do they achieve? • Should certain caches “focus” on retaining only certain kinds of data, while other caches specialize in other kinds of data? When a Facebook component encounters a photo or video, can we “predict” the likely value of caching a copy?

  18. Using Memcached in a pool of machines • Facebook often has hundreds or thousands of machines in one spot, each can run Memcached • They asked: why not share the cache data? • Leads to a distributed cache structure • They built one using ideas drawn from prior research • A distributed hash table offers a simple way to share data in a large collection of caches

  19. How it works • We all know how a HashMap or HashTable works in a language like Java or C# or C++ • You take the object you want to save and compute a HashCode for it. This is an integer and will look “random” but is deterministic for any single object • For example, it could be the XOR of the bytes in a file • Hashcodes are designed to spread data very evenly over the range of possible integers
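The byte-XOR rule mentioned on the slide, sketched in Python. Real hash functions mix the bits far more thoroughly so that codes spread evenly over the whole integer range, but even this toy version is deterministic for any given object.

    def xor_hashcode(data: bytes) -> int:
        # XOR all the bytes of the object together.
        code = 0
        for b in data:
            code ^= b
        return code

    print(xor_hashcode(b"ken"))   # always the same value for this object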

  20. Network communication • It is easy for a program on biscuit.cs.cornell.edu to send a message to a program on “jam.cs.cornell.edu” • Each program sets up a “network socket” • Each machine has an IP address; you can look them up, and programs can do that too via a simple Java utility • Pick a “port number” (this part is a bit of a hack) • Build the message (must be in binary format) • Java’s standard libraries provide utilities for making the request
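The same steps sketched in Python rather than Java (the hostnames come from the slide; the port number 5150 is an arbitrary choice, and the example assumes some process on jam is actually listening there):

    import socket

    peer_ip = socket.gethostbyname("jam.cs.cornell.edu")   # look up the IP address
    message = "hello from biscuit".encode("utf-8")         # message must be in binary format

    # Open a socket to the chosen port and send the message.
    with socket.create_connection((peer_ip, 5150)) as sock:
        sock.sendall(message)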

  21. Distributed Hash Tables • It is easy for a program on biscuit.cs.cornell.edu to send a message to a program on “jam.cs.cornell.edu” • ... so, given a key and a value • Hash the key • Find the server that “owns” the hashed value • Store the key,value pair in a “local” HashMap there • To get a value, ask the right server to look up key
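Putting the steps on this slide together as a toy DHT in Python (Python’s built-in hash stands in for a real hash code, and four empty dicts stand in for the per-machine HashMaps):

    servers = [{}, {}, {}, {}]                  # one local HashMap per machine

    def owner(key):
        return hash(key) % len(servers)         # hash the key, find the owning server

    def dht_put(key, value):
        servers[owner(key)][key] = value        # store the pair at the owner

    def dht_get(key):
        return servers[owner(key)].get(key)     # ask the right server to look it up

    dht_put("ken", 2110)
    print(dht_get("ken"))                       # 2110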

  22. List of machines • There are several ways to track the machines in a network. Facebook just maintains a table • In each FB data center there is a big table of all machines currently in use • Every machine has a private copy of this table, and if a machine crashes or joins, the table is quickly updated (within seconds) • Can we turn our table of machines into a form of HashMap?

  23. From a table of machines to a DHT • Take the healthy machines • Compute the HashCode for each using its name, or ID • These are integers in range [Int.MinValue, Int.MaxValue] • Rule: an object O will be placed on the K machines whose hash codes are closest to HashCode(O)
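A sketch of that placement rule (the machine names and K=2 are illustrative, and real systems usually measure “closest” on a circular ID space rather than by simple absolute difference):

    machines = ["biscuit.cs.cornell.edu", "jam.cs.cornell.edu",
                "scone.cs.cornell.edu", "crumpet.cs.cornell.edu"]

    def placement(obj_key, k=2):
        obj_code = hash(obj_key)
        # Rank machines by how close their hash code is to the object's.
        ranked = sorted(machines, key=lambda m: abs(hash(m) - obj_code))
        return ranked[:k]                       # the K closest machines hold the object

    print(placement("photo:12345"))             # e.g. two of the four machines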

  24. Side remark about tracking members • This Facebook approach uses a “group” of machines and a “view” of the group • We will make use of this in later lectures too • But it is not the only way! Many DHTs track just log(N) of the members and build a routing scheme that takes log(N) “hops” to find an object (See: Chord, Pastry...) • The FB approach is an example of a 1-hop DHT • Cloud systems always favor 1-hop solutions if feasible

  25. Distributed Hash Tables • [Diagram: four machines 123.45.66.781 through 123.45.66.784 whose IP addresses hash (mod N) to 98, 77, 13, and 175; dht.Put(“ken”, 2110) hashes “ken” to 77, so the pair (“ken”, 2110) lands in the hashmap kept by 123.45.66.782; dht.Get(“ken”) computes the same hash and is routed to that same machine]

  26. Facebook image “stack” • We decided to study the effectiveness of caching in the FB image stack, jointly with Facebook researchers • This stack’s role is to serve images (photos, videos) for FB’s hundreds of millions of active users • About 80B large binary objects (“blobs”) / day • FB has a huge number of big and small data centers • “Point of presence” or PoP: some FB owned equipment normally near the user • Akamai: A company FB contracts with that caches images • FB resizer service: caches but also resizes images • Haystack: inside data centers, has the actual pictures (a massive file system)

  27. What we instrumented in the FB stack • Think of Facebook as a giant distributed HashMap • Key: photo URL (id, size, hints about where to find it...) • Value: the blob itself

  28. Facebook traffic for a week • Client activity varies daily.... • ... and different photos have very different popularity statistics

  29. Facebook cache effectiveness • Existing caches are very effective... • ... but different layers are more effective for images with different popularity ranks

  30. Facebook cache effectiveness • Each layer should “specialize” in different content. • Photo age strongly predicts effectiveness of caching

  31. Hypothetical changes to caching? • We looked at the idea of having Facebook caches collaborate at national scale… • … and also at how to vary caching based on the “busyness” of the client

  32. Social networking effect? • Hypothesis: caching will work best for photos posted by famous people with zillions of followers • Actual finding: not really

  33. Locality? • Hypothesis: FB probably serves photos from close to where you are sitting • Finding: Not really... • … just the same, if the photo exists, it finds it quickly

  34. Can one conclude anything? • By learning what patterns of access arise, and how effective it is to cache given kinds of data at various layers, we can customize cache strategies • Each layer can look at an image and ask “should I keep a cached copy of this, or not?” • Smart decisions → Facebook is more effective!

  35. Strategy varies by layer • Browser should cache less popular content but not bother to cache the very popular stuff • Akamai/PoP layer should cache the most popular images, etc... • We also discovered that some layers should “cooperatively” cache even over huge distances • Our study discovered that if this were done in the resizer layer, cache hit rates could rise 35%!

  36. … many research questions arise • Can we design much better caching solutions? • Are there periods with bursts of failures? What causes them and what can be done? • How much of the data in a typical cache gets reused? Are there items dropped from cache that should have been retained?

  37. Overall picture in cloud computing • Facebook example illustrates a style of working • Identify high-value problems that matter to the community because of the popularity of the service, the cost of operating it, the speed achieved, etc • Ask how best to solve those problems, ideally using experiments to gain insight • Then build better solutions

  38. Learning More? • We have a paper with more details and data at the 2013 ACM Symposium on Operating Systems Principles (SOSP) • First author is Qi Huang, a Chinese student who created the famous PPLive system, was studying for his PhD at WUST, then came to Cornell to visit • Qi will eventually earn two PhD degrees! One awarded by Cornell, one by WUST after he finishes • An amazing and talented cloud computing researcher

  39. More about caching • Clearly, caching is central to modern cloud computing systems! • But the limitation that we are caching static data is worrying • MapReduce/Hadoop use purely static data • FB images and video are static data too • But “general” cloud computing will have very dynamic, rapidly changing kinds of data

  40. Coherent Cache • We say that a cache is coherent if it is always a perfect real-time replica of the “true” data • True object could be in a database or file system • Or we could dispense with the true object and use only the in-memory versions. In this case the cache isn’t really a cache but is actually an in-memory replication scheme • In the cloud, file system access is too slow! • So we should learn more about coherent caching

  41. What could a coherent cache hold? • A standard cache just has “objects” from some data structure • A coherent cache could hold the entire data structure! • A web graph with pages and links • A graph of social network relationships, Twitter feeds and followers, etc • Objects on a Beijing street, so that a self-driving car can safely drive to a parking area, park itself, and drive back later to pick you up

  42. Coherent data replication • Clearly we will need to spread our data widely: “Partitioning” is required • We partition data in space (like with a DHT) • Also in time (e.g. version of the database at time T, T+1, …) • Sometimes hierarchically (e.g. users from the US North East, US Central, US North West…) • Famous paper by Jim Gray & others: The Dangers of Replication and a Solution • Shows that good partitioning functions are critically needed • Without great partitioning, replication slows a system down!

  43. Aspects of coherent replication • A partitioning method • Many servers, small subsets for each partition (“shard” has become the common term) • Synchronization mechanisms for conflicting actions • A method for updating the data • A way to spread the “read only” work over the replicas • Shard membership tracking • Handling of faults that cause a whole shard to crash
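A toy sketch of the core idea behind these requirements (not a real protocol: no synchronization, membership tracking, or fault handling): every update is applied to all replicas in a shard before it completes, so any replica can serve a read with current data, and read-only work is spread round-robin.

    import itertools

    class Shard:
        def __init__(self, replica_count=3):
            self.replicas = [{} for _ in range(replica_count)]       # in-memory copies
            self._next = itertools.cycle(range(replica_count))       # spread read-only work

        def update(self, key, value):
            for replica in self.replicas:       # apply the update to every replica
                replica[key] = value

        def read(self, key):
            return self.replicas[next(self._next)].get(key)          # any replica is current

    shard = Shard()
    shard.update("profile:ken", "photo-v2")
    print(shard.read("profile:ken"))            # "photo-v2"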

  44. Does Facebook have coherency? • Experiments reveal answer: “no” • Create a Facebook account and put many images on it • Then form networks with many friends • Now update your Facebook profile image a few times • Your friends may see multiple different images of you on their “wall” for a long period of time! • This reveals that it takes a long time (hours) for old data to clear from the FB cache hierarchy!

  45. Inconsistencies in the cloud

  46. In fact this is common in today’s cloud • We studied many major cloud provider systems • Some guarantee coherency for some purposes but the majority are at best weakly coherent • When data changes, they need a long time to reflect the updates • They cache data heavily and don’t update it promptly

  47. CAP Theorem • Proposed by Berkeley Professor Eric Brewer • “You can have just 2 of Consistency, Availability, and Partition (fault) tolerance” • He argues that consistency is the guarantee to relax • We will look at this more closely later • Many people adopt his CAP-based views • This justifies non-coherent caching • But their systems can’t solve problems in ways guaranteed to be safe

  48. High assurance will need more! • Remember that we are interested in the question of how one could create a high assurance cloud! • Such a cloud needs to make promises • If a car drives on a street it must not run over people • If a power system reconfigures it must not explode the power generators • A doctor who uses a hospital computing system needs correct and current data

  49. So… we must look at coherent caching • In fact we will focus on “data replication” • Data that should be in memory, for speed • With a few shard members holding the information • But with guarantees: if you compute using it, the data is current • Why not a full database? • Our coherent replication methods would live in structures like the Facebook infrastructure: big, complex • We need to build these in ways optimized to the uses • Hence databases might be elements but we can’t just hand the whole question to a database system

  50. Summary • We looked at the architecture of a typical very large cloud computing system (Facebook) • We saw that it uses caching extremely aggressively • Caching is fairly effective, but could improve • Coherent caching needed in high assurance systems but seems to be a much harder challenge
