-Nagarjuna K CORONA nagarjuna@outlook.com
What is happening at Facebook • 1,000 people, both technical and non-technical, access the custom-built data infrastructure • More than 500 TB of new data arrives per day • Workloads: ad-hoc queries (Hive), custom MapReduce jobs, and data pipelines
What is happening at Facebook • The largest cluster holds more than 100 PB of data • More than 60,000 queries per day • The data warehouse is now 2,500 times larger than it used to be
Limitations of Hadoop MR scheduling • The Job Tracker has two responsibilities: managing cluster resources and scheduling all user jobs • The Job Tracker cannot handle both responsibilities adequately • At peak load, cluster utilization dropped precipitously due to scheduling overhead
Limitations of Hadoop MR scheduling • Another problem: pull-based scheduling • Task trackers send a heartbeat with their status to the job tracker in order to get tasks to run • The heartbeat is periodic, so there is always a delay before a task is scheduled • For small jobs, this delay is wasted time
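The cost of periodic heartbeats can be illustrated with a small model. This is only a sketch under simplifying assumptions: the 3-second interval, the function names, and the idea that a ready task simply waits for the next heartbeat are all hypothetical and not taken from Hadoop itself.

```python
import math

HEARTBEAT_INTERVAL = 3  # seconds; hypothetical value chosen for illustration


def pull_wait(ready_time, interval=HEARTBEAT_INTERVAL):
    """Seconds a ready task waits until the next periodic heartbeat."""
    next_beat = math.ceil(ready_time / interval) * interval
    return next_beat - ready_time


def scheduling_overhead(ready_time, work_seconds, interval=HEARTBEAT_INTERVAL):
    """Fraction of a task's wall-clock time spent waiting to be scheduled."""
    wait = pull_wait(ready_time, interval)
    return wait / (wait + work_seconds)


# Tasks that become ready at t = 1, 4 and 6 seconds.
waits = [pull_wait(t) for t in (1, 4, 6)]

# The same 2-second wait hurts a short task far more than a long one.
short_task = scheduling_overhead(ready_time=1, work_seconds=2)
long_task = scheduling_overhead(ready_time=1, work_seconds=198)
```

In this model the 2-second wait is half of the short task's total time but only 1% of the long task's, which is why small jobs suffer most from pull-based scheduling.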
Limitations of Hadoop MR scheduling • Another problem: static slot-based resource management • A MapReduce cluster is divided into a fixed number of map and reduce slots based on a static configuration • Slots are wasted any time the cluster workload does not fit the static configuration
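A toy calculation shows how a static split wastes capacity. The slot counts and the function name below are hypothetical, chosen only to make the effect visible.

```python
def slot_usage(map_slots, reduce_slots, map_demand, reduce_demand):
    """Return (idle_slots, waiting_tasks) for a statically partitioned cluster.

    Map tasks may only use map slots and reduce tasks only reduce slots,
    so idle slots of one kind cannot serve waiting tasks of the other kind.
    """
    used_map = min(map_slots, map_demand)
    used_reduce = min(reduce_slots, reduce_demand)
    idle = (map_slots - used_map) + (reduce_slots - used_reduce)
    waiting = (map_demand - used_map) + (reduce_demand - used_reduce)
    return idle, waiting


# A map-heavy workload on a cluster configured with 8 map and 4 reduce slots.
idle, waiting = slot_usage(map_slots=8, reduce_slots=4,
                           map_demand=12, reduce_demand=0)
```

Here four reduce slots sit idle while four map tasks wait, even though the cluster as a whole has enough capacity for all twelve tasks.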
Limitations of Hadoop MR scheduling • Another problem: the job tracker design requires hard downtime (all running jobs are killed) during a software upgrade • Every software upgrade therefore results in significant wasted computation
Limitations of Hadoop MR scheduling • Another problem: traditional analytic databases have had advanced resource-based scheduling for a long time • Hadoop needs this too
A better scheduling framework • Better scalability and cluster utilization • Lower latency for small jobs • Ability to upgrade without disruption • Scheduling based on actual task resource requirements rather than a count of map and reduce tasks
CORONA • Cluster Manager: tracks the nodes and the free resources in the cluster • Job Tracker: a dedicated job tracker for each job, running either in the client process or as a separate process in the cluster
CORONA • Push-based implementation • The Cluster Manager receives resource requests from a Job Tracker • The Cluster Manager pushes resource grants back to the Job Tracker • The Job Tracker then creates tasks and pushes them to task trackers for execution • No periodic heartbeat is involved, so scheduling latency is minimized
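The request, grant and push steps can be sketched as follows. This is a minimal illustration, not Corona's actual code: the class and method names are hypothetical, and each node is assumed to offer exactly one unit of resource.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    name: str
    running: list = field(default_factory=list)

    def run(self, task):
        # Tasks are pushed to the tracker; it never polls for work.
        self.running.append(task)


@dataclass
class JobTracker:
    job_id: str
    pending: list

    def on_grant(self, node):
        # Called by the cluster manager as soon as a resource is granted.
        if self.pending:
            node.run(f"{self.job_id}:{self.pending.pop(0)}")


class ClusterManager:
    """Tracks free nodes only; it knows nothing about MapReduce or job progress."""

    def __init__(self, nodes):
        self.free = list(nodes)

    def request(self, tracker, count):
        # Push up to `count` grants back to the requesting job tracker.
        granted = 0
        while self.free and granted < count:
            tracker.on_grant(self.free.pop(0))
            granted += 1
        return granted


nodes = [TaskTracker("n1"), TaskTracker("n2")]
manager = ClusterManager(nodes)
job = JobTracker("job1", ["map0", "map1", "map2"])
granted = manager.request(job, 3)  # only two nodes are free
```

With two free nodes, two of the three tasks start immediately and the third stays pending until a resource is released. Note that the cluster manager only hands out resources; deciding which task runs where is left to the per-job tracker.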
CORONA • The Cluster Manager does not track the progress of jobs and is agnostic about MapReduce; the Job Tracker takes care of that • Each Job Tracker now tracks only one job, which reduces code complexity • With this change, the Cluster Manager can manage many jobs simultaneously • Better cluster utilization
Benefits of Corona • Greater scalability • Lower latency • No-downtime upgrades • Better resource management
Some metrics measured at Facebook • Average time to refill a slot • During the given period, MapReduce took around 66 seconds to refill a slot, while Corona took around 55 seconds (an improvement of approximately 17%)
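The 17% figure follows directly from the two refill times quoted on the slide. A quick check (the function name is just for illustration):

```python
def relative_improvement(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Slot refill times in seconds, as quoted above.
improvement = relative_improvement(before=66, after=55)
print(f"{improvement:.1%}")  # prints 16.7%
```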
Some metrics measured at Facebook • Cluster utilization • Under heavy workloads, utilization in the Hadoop MapReduce system topped out at 70%, while Corona was able to reach more than 95%
Some metrics measured at Facebook • Further improvements in: • Scheduling fairness • Job latency
More about CORONA • http://goo.gl/XJRNN
Why not YARN?
Corona usage • Storage: 100 PB of data • Analysis: 105 TB every 30 minutes
What about the NameNode? • Facebook eliminated the single point of failure in the HDFS platform using a creation it calls AvatarNode • Later, open-source Hadoop introduced the HA NameNode, which is based on a similar concept • More about AvatarNode: • http://gigaom.com/cloud/how-facebook-keeps-100-petabytes-of-hadoop-data-online/ • https://www.facebook.com/notes/facebook-engineering/under-the-hood-hadoop-distributed-filesystem-reliability-with-namenode-and-avata/10150888759153920
Corona: concerns • Facebook will soon outgrow this cluster • Its 900 million members are perpetually posting new status updates, photos, videos, and comments • What if the data grows to 10,000 PB?
Solutions • What if a Hadoop cluster spanned multiple data centers? • Feasibility • Network packets cannot travel between data centers fast enough • Limitation of the present architecture: • All the machines in a cluster must be close to one another
Solutions • Feasibility • Spreading a cluster across data centers introduces tens of milliseconds of delay, which slows down the system
Prism • Named after the way a prism refracts a single ray of light into multiple rays • Replicates and moves data wherever it is needed across a vast network of computing facilities • The facilities are physically separate but logically the same
Prism • Can move warehouses around • Not bound by limitations of the data center
Prism status • Still in development • Not yet deployed
Timeline of this technology • 23 October 2009 • http://www.theregister.co.uk/2009/10/23/google_spanner/ • Google: "Google Spanner: instamatic redundancy for 10 million servers?" • Is Prism similar to Spanner? • Very little is known about Google Spanner
Like Spanner, Facebook Prism could be used to instantly relocate data in the event of a data center meltdown.