
Scalable Data Management@facebook

Scalable Data Management@facebook. Srinivas Narayanan 11/13/09. Scale. Over 300 million active users. > 3.9 trillion feed actions processed per day. >200 billion monthly page views. 100 million search queries per day. Over 1 million developers in 180 countries.

Presentation Transcript


  1. Scalable Data Management@facebook • Srinivas Narayanan • 11/13/09

  2. Scale

  3. Over 300 million active users • > 3.9 trillion feed actions processed per day • > 200 billion monthly page views • 100 million search queries per day • Over 1 million developers in 180 countries • #2 site on the Internet (time on site) • More than 232 photos… • 2 billion pieces of content per week • 6 billion minutes per day

  4. [Chart: growth rate of active users in 2009, reaching 300M]

  5. Social Networks

  6. The social graph links everything

  7. Scaling Social Networks • Much harder than typical websites where... • Typically 1-2% online: easy to cache the data • Partitioning & scaling relatively easy • What do you do when everything is interconnected?

  8. [Diagram: many interconnected objects, each needing its name, status, privacy settings, and a profile photo or video thumbnail]

  9. System Architecture

  10. Architecture • Load Balancer (assigns a web server) • Web Server (PHP assembles data) • Memcache (fast, simple) • Database (slow, persistent)

  11. Memcache • Simple in-memory hash table • Supports get, set, delete, multiget, multiset • Not a write-through cache • Pros and Cons • The Database Shield! • Low latency, very high request rates • Can be easy to corrupt, inefficient for very small items
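To make "not a write-through cache" and "the database shield" concrete, here is a minimal look-aside read path in Python. FakeMemcache, CountingDB, and fetch are hypothetical names; plain dicts stand in for memcache and MySQL, and this is a sketch under those assumptions, not the PHP client code the slides refer to.

```python
from typing import Dict, Iterable, List


class FakeMemcache:
    """Hypothetical in-process stand-in for memcache: a plain hash table."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get_multi(self, keys: Iterable[str]) -> Dict[str, str]:
        return {k: self._data[k] for k in keys if k in self._data}

    def set_multi(self, items: Dict[str, str]) -> None:
        self._data.update(items)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class CountingDB:
    """Hypothetical database stub that counts how often it is queried."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.queries = 0

    def load_many(self, keys: List[str]) -> Dict[str, str]:
        self.queries += 1
        return {k: self.rows[k] for k in keys if k in self.rows}


def fetch(keys: List[str], cache: FakeMemcache, db: CountingDB) -> Dict[str, str]:
    """Look-aside read: one multiget to the cache, one batched query for the misses."""
    found = cache.get_multi(keys)
    misses = [k for k in keys if k not in found]
    if misses:
        loaded = db.load_many(misses)
        cache.set_multi(loaded)  # the application fills the cache; the DB never writes through it
        found.update(loaded)
    return found
```

Calling fetch twice for the same keys queries the database only once; the second call is served entirely from the cache, which is the shielding effect the slide names.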

  12. Memcache Optimization • Multithreading and efficient protocol code - 50k req/s • Polling network drivers - 150k req/s • Breaking up stats lock - 200k req/s • Batching packet handling - 250k req/s • Breaking up cache lock - future

  13. Network Incast [Diagram: a PHP client sends many small get requests through a switch to several memcache servers]

  14. Network Incast [Diagram: the memcache servers reply with many big data packets, all converging on the switch in front of the PHP client]

  15. Network Incast [Diagram continued: memcache servers, switch, PHP client]

  16. Network Incast [Diagram continued: memcache servers, switch, PHP client]

  17. Memcache Clustering • Many small objects per server • Many servers per large object

  18. Memcache Clustering [Diagram: a PHP client fetches 10 objects from a single memcache server]

  19. Memcache Clustering [Diagram: 10 objects split 5/5 across two memcache servers] • 2 round trips total, 1 round trip per server

  20. Memcache Clustering [Diagram: 10 objects split 4/3/3 across three memcache servers] • 3 round trips total, 1 round trip per server
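The round-trip counts in slides 18 to 20 follow from a simple model: a multiget costs one round trip per server that holds at least one requested key. The CRC32-modulo placement and the function names below are assumptions for illustration, not the placement scheme used in production.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def server_for(key: str, num_servers: int) -> int:
    """Hypothetical placement: CRC32 of the key modulo the number of servers."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def plan_multiget(keys: List[str], num_servers: int) -> Dict[int, List[str]]:
    """Group the requested keys by the server that holds them."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        batches[server_for(key, num_servers)].append(key)
    return dict(batches)


def round_trips(keys: List[str], num_servers: int) -> int:
    # One batched request (one round trip) per server that is touched at all
    return len(plan_multiget(keys, num_servers))


keys = [f"obj:{i}" for i in range(10)]
for n in (1, 2, 3, 8):
    print(n, "servers ->", round_trips(keys, n), "round trips")
```

For a fixed 10-object request, more servers can only keep or raise the number of servers touched, which is the trade-off slide 17 summarizes: spreading objects out adds capacity but also adds round trips per request.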

  21. Memcache Pool Optimization • Currently a manual process • Replication for obvious hot data sets • Interesting problem: Optimize the allocation based on access patterns

  22. Vertical Partitioning of Object Types [Diagram: Specialized Replica 1 and Specialized Replica 2, each split into Shard 1 and Shard 2; a general pool with wide fanout across Shard 1, Shard 2, Shard 3, ..., Shard n]
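One way to read the vertical-partitioning diagram is as a routing rule: a hot object type gets a dedicated pool that is both replicated and sharded, and everything else hashes across a wide general pool. The pool layout, the "profile" type, the host names, and the reader_id parameter below are all hypothetical.

```python
import zlib
from typing import Dict, List

# Hypothetical layout: object type -> replicas, each replica a list of shard hosts
SPECIALIZED_POOLS: Dict[str, List[List[str]]] = {
    "profile": [["prof-r1-s1", "prof-r1-s2"], ["prof-r2-s1", "prof-r2-s2"]],
}
# Hypothetical general pool with wide fanout
GENERAL_POOL: List[str] = [f"general-s{i}" for i in range(1, 9)]


def _bucket(key: str, n: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % n


def route(obj_type: str, key: str, reader_id: int) -> str:
    """Pick the memcache host for a key, based on its object type."""
    replicas = SPECIALIZED_POOLS.get(obj_type)
    if replicas is None:
        # Unspecialized types are sharded across the general pool
        return GENERAL_POOL[_bucket(key, len(GENERAL_POOL))]
    # Spread reads across replicas, then shard within the chosen replica
    replica = replicas[reader_id % len(replicas)]
    return replica[_bucket(key, len(replica))]
```

Because every replica is sharded the same way, a given key lands on the same shard index in each replica; only the replica choice varies between readers.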

  23. [Figure: multiple nodes labeled "Scribe"] • MySQL has played a role from the beginning • Thousands of MySQL servers in two datacenters

  24. MySQL Usage • Pretty solid transactional persistent store • Logical migration of data is difficult • Logical-Physical db mapping • Rarely use advanced query features • Performance • Database resources are precious • Web tier CPU is relatively cheap • Distributed data - no joins! • Sound administrative model

  25. MySQL is better because it is Open Source • We can enhance or extend the database • ...as we see fit • ...when we see fit • Facebook extended MySQL to support distributed cache invalidation for memcache INSERT table_foo (a,b,c) VALUES (1,2,3) MEMCACHE_DIRTY key1,key2,...
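The MEMCACHE_DIRTY extension ties cache invalidation to the database write itself instead of leaving it to separate application calls. The Python sketch below models only that idea: a write names the keys it dirties, and the database deletes them from every registered cache cluster. The classes are hypothetical stand-ins; how the extended MySQL actually propagates these invalidations is not described in the slides.

```python
from typing import Dict, List, Sequence


class MemcacheStub:
    """Hypothetical stand-in for one memcache cluster."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}


class DatabaseWithDirtyKeys:
    """Sketch of the MEMCACHE_DIRTY idea: each write carries the cache keys it invalidates."""

    def __init__(self, caches: Sequence[MemcacheStub]) -> None:
        self.tables: Dict[str, List[tuple]] = {}
        self.caches = list(caches)

    def insert(self, table: str, row: tuple, dirty_keys: Sequence[str]) -> None:
        # Apply the write first ...
        self.tables.setdefault(table, []).append(row)
        # ... then delete every dirty key from every cache cluster
        for cache in self.caches:
            for key in dirty_keys:
                cache.data.pop(key, None)
```

Usage mirrors the slide's statement: db.insert("table_foo", (1, 2, 3), ["key1", "key2"]) stores the row and removes key1 and key2 from all registered caches, leaving unrelated keys untouched.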

  26. Scaling across datacenters [Diagram: West Coast: SC and SF, each with a web tier, memcache, and a memcache proxy, plus SC MySQL. East Coast: VA web tier, VA memcache, a memcache proxy, and VA MySQL. MySQL replication links SC MySQL and VA MySQL.]

  27. Other Interesting Issues • Application-level batching and parallelization • Super hot data items • Cache key versioning with continuous availability

  28. Photos

  29. Photos + Social Graph = Awesome!

  30. Photos: Scale • 20 billion photos x4 = 80 billion • Would wrap around the world more than 10 times! • Over 40M new photos per day • 600K photos / second

  31. Photos Scaling - The easy wins • Upload tier: handles uploads, scales images, stores them on NFS • Serving tier: images served from NFS via HTTP • However... • File systems are not good at supporting a large number of files • Metadata is too large to fit in memory, causing too many IOs for each file read • Limited by I/O, not storage density • Easy wins • CDN • Cachr (HTTP server + caching) • NFS file handle cache

  32. Photos: Haystack • Overlay file system • Index in memory • One IO per read
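A minimal sketch of the Haystack idea, assuming one large append-only volume file and an in-memory dict mapping photo_id to (offset, size). Serving a photo then needs no per-photo filesystem metadata lookup, only one seek and one read. HaystackSketch is a hypothetical name; the real system's on-disk record format, replication, and deletion handling are omitted.

```python
import os
from typing import Dict, Tuple


class HaystackSketch:
    """Hypothetical append-only photo store: one big volume file plus an in-memory index."""

    def __init__(self, path: str) -> None:
        # "a+b": writes always append; reads can seek anywhere in the volume
        self._volume = open(path, "a+b")
        self.index: Dict[int, Tuple[int, int]] = {}  # photo_id -> (offset, size)

    def put(self, photo_id: int, data: bytes) -> None:
        offset = self._volume.seek(0, os.SEEK_END)
        self._volume.write(data)
        self._volume.flush()
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id: int) -> bytes:
        offset, size = self.index[photo_id]  # in-memory lookup, no per-photo metadata IO
        self._volume.seek(offset)
        return self._volume.read(size)        # one read for the photo bytes

    def close(self) -> None:
        self._volume.close()
```

The design choice is to keep many photos inside one file, so the filesystem tracks a handful of large files whose metadata stays cached, while the per-photo index lives in application memory.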

  33. Data Warehousing

  34. Data: How much? • 200 GB per day in March 2008 • 2+ TB (compressed) raw data per day in April 2009 • 4+ TB (compressed) raw data per day today

  35. The Data Age • Free or low cost of user services • Consumer behavior hard to predict • Data and analysis are critical • More data beats better algorithms

  36. Deficiencies of existing technologies • Analysis/storage on proprietary systems too expensive • Closed systems are hard to extend

  37. Hadoop & Hive

  38. Hadoop • Superior availability/scalability/manageability despite lower single-node performance • Open system • Scalable costs • Cons: Programmability and Metadata • Map-reduce hard to program (users know SQL/bash/Python/Perl) • Need to publish data in well-known schemas

  39. Hive • A system for managing and querying structured data built on top of Hadoop • Components • Map-Reduce for execution • HDFS for storage • Metadata in an RDBMS

  40. Hive: New Technology, Familiar Interface
      hive> select key, count(1) from kv1 where key > 100 group by key;
      vs.
      $ cat > /tmp/reducer.sh
      uniq -c | awk '{print $2"\t"$1}'
      $ cat > /tmp/map.sh
      awk -F '\001' '{if($1 > 100) print $1}'
      $ bin/hadoop jar contrib/hadoop-0.19.2-dev-streaming.jar -input /user/hive/warehouse/kv1 -mapper map.sh -file /tmp/reducer.sh -file /tmp/map.sh -reducer reducer.sh -output /tmp/largekey -numReduceTasks 1
      $ bin/hadoop dfs -cat /tmp/largekey/part*
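The Hive query and the streaming job in slide 40 compute the same thing, and the data flow can be modeled in a few lines of Python: filter on the first \001-delimited field, sort (as the streaming shuffle does), then count runs of equal keys (as uniq -c does). mapper, reducer, and run_job are hypothetical names, and the in-memory sort stands in for Hadoop's distributed shuffle.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Like map.sh: split on Hive's \\001 field delimiter, emit keys greater than 100."""
    for line in lines:
        key = line.rstrip("\n").split("\x01")[0]
        if int(key) > 100:
            yield key


def reducer(sorted_keys: Iterable[str]) -> List[Tuple[str, int]]:
    """Like `uniq -c | awk ...`: count runs of identical, already-sorted keys."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(sorted_keys)]


def run_job(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Streaming sorts map output by key before it reaches the reducer
    return reducer(sorted(mapper(lines)))
```

The reducer is only correct because its input is sorted: uniq -c, like groupby, counts adjacent duplicates, so without the shuffle's sort the counts would be split.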

  41. Hive: Sample Applications • Reporting • E.g., daily/weekly aggregations of impression/click counts • Measures of user engagement • Ad hoc analysis • E.g., how many group admins, broken down by state/country • Machine learning (assembling training data) • Ad optimization • E.g., user engagement as a function of user attributes • Lots more

  42. Hive: Server Infrastructure • 4800 cores, storage capacity of 5.5 PB, 12 TB per node • Two-level network topology • 1 Gbit/sec from node to rack switch • 4 Gbit/sec to top-level rack switch

  43. Hive & Hadoop: Usage Stats • 4 TB of compressed new data added per day • 135 TB of compressed data scanned per day • 7500+ Hive jobs per day • 80K compute hours per day • 200 people run jobs on Hadoop/Hive • Analysts (non-engineers) use Hadoop through Hive • 95% of jobs are Hive jobs

  44. Hive: Technical Overview

  45. Hive: Open and Extensible • Query your own formats and types with your own Serializer/Deserializers • Extend the SQL functionality through User Defined Functions • Do any non-SQL transformations through TRANSFORM operator that sends data from Hive to any user program/script
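As an illustration of the TRANSFORM operator, the hypothetical Python script below reads tab-separated rows on stdin (Hive's default row format for TRANSFORM) and writes tab-separated rows to stdout. The table search_log, its columns, and the script name tokenize.py are assumptions for the example.

```python
import sys
from typing import Iterable, Iterator, TextIO


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Explode each (user_id, query) row into one (user_id, token) row per word."""
    for line in lines:
        user_id, query = line.rstrip("\n").split("\t", 1)
        for token in query.lower().split():
            yield f"{user_id}\t{token}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    # Hive streams input rows to the script's stdin and reads output rows from its stdout
    for row in transform(stdin):
        stdout.write(row + "\n")
```

In Hive the script would be shipped and invoked along the lines of ADD FILE tokenize.py; followed by SELECT TRANSFORM(user_id, query) USING 'python tokenize.py' AS (user_id, token) FROM search_log;. For deployment the script also needs a module-level call to main().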

  46. Hive: Smarter Execution Plans • Map-side Joins • Predicate Pushdown • Partition Pruning • Hash based Aggregations • Parallel execution of operator trees • Intelligent Scheduling
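Of the plan improvements listed, map-side joins are the easiest to show in miniature: when one side of a join is small, each mapper loads it into a hash table and probes it while streaming its split of the large side, so no shuffle or reduce phase is needed. This is a single-process Python illustration with hypothetical tables (user_id, country_id) and (country_id, country_name), not Hive's implementation.

```python
from typing import Dict, Iterable, Iterator, Tuple


def map_side_join(
    small_table: Iterable[Tuple[int, str]],  # hypothetical (country_id, country_name)
    big_table: Iterable[Tuple[int, int]],    # hypothetical (user_id, country_id)
) -> Iterator[Tuple[int, str]]:
    # Build phase: the small table fits in memory as a hash table
    lookup: Dict[int, str] = dict(small_table)
    # Probe phase: stream the big table row by row; no shuffle or reduce needed
    for user_id, country_id in big_table:
        name = lookup.get(country_id)
        if name is not None:  # inner-join semantics: unmatched rows are dropped
            yield user_id, name
```

In Hive of that era, a map join could be requested explicitly with the /*+ MAPJOIN(table) */ hint on the small table.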

  47. Hive: Possible Future Optimizations • Pipelining? • Finer operator control (controlling sorts) • Cost based optimizations? • HBase

  48. Spikes: The Username Launch

  49. System Design • Database tier cannot handle the load • Dedicated memcache tier for assigned usernames • Cache miss => username is available • Avoid database hits altogether • Blacklists: bucketize, local tier cache • Timeout

  50. Username Memcache Tier • Parallel pool in each data center • Writes replicated to all nodes • 8 nodes per pool • Reads can go to any node (hashed by uid) [Diagram: a PHP client reading from username memcache nodes UN0, UN1, ..., UN7]
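Slides 49 and 50 together describe a read path that never touches the database: assigned usernames are written to every node of an 8-node pool, a read goes to the node selected by uid, and a cache miss means the name is free. The Python model below is a hypothetical sketch of that logic; lowercasing names and representing the blacklist as an in-process set (without the bucketing or timeouts the slide mentions) are assumptions.

```python
from typing import List, Set

NODES_PER_POOL = 8  # from the slide: 8 nodes per pool


class UsernamePool:
    """Hypothetical model of one data center's username memcache pool."""

    def __init__(self, num_nodes: int = NODES_PER_POOL) -> None:
        self.nodes: List[Set[str]] = [set() for _ in range(num_nodes)]

    def record_assigned(self, username: str) -> None:
        # Writes are replicated to every node, so any node can answer a read
        for node in self.nodes:
            node.add(username.lower())

    def is_available(self, username: str, uid: int, blacklist: Set[str]) -> bool:
        name = username.lower()
        if name in blacklist:  # blacklisted names are never available
            return False
        node = self.nodes[uid % len(self.nodes)]  # read hashed by the requesting uid
        return name not in node  # a miss means available; no database query at all
```

Replicating every write to all eight nodes trades write volume for read fan-out: each read hits exactly one node, so read load spreads evenly across the pool even under a launch-day spike.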
