The Hadoop Fair Scheduler Matei Zaharia Cloudera / Facebook / UC Berkeley
Outline • Motivation / Hadoop usage at Facebook • Fair scheduler basics • Configuring the fair scheduler • Future plans
Motivation • Provide short response times to small jobs in a shared Hadoop cluster • Improve utilization over private clusters / Hadoop On Demand (HOD)
Hadoop Usage at Facebook • Data warehouse running Hive • 600 machines, 4800 cores, 2.4 PB disk • 3200 jobs per day • 50+ engineers have used Hadoop
Facebook Data Pipeline [architecture diagram]: Web Servers → Scribe Servers → Network Storage → Hadoop Cluster → Summaries → MySQL / Oracle RAC, with Analysts running Hive Queries against the Hadoop Cluster
Facebook Job Types • Production jobs: load data, compute statistics, detect spam, etc. • Long experiments: machine learning, etc. • Small ad-hoc queries: Hive jobs, sampling • GOAL: Provide fast response times for small jobs and guaranteed service levels for production jobs
Outline • Motivation / Hadoop usage at Facebook • Fair scheduler basics • Configuring the fair scheduler • Future plans
Fair Scheduler Basics • Group jobs into “pools” • Assign each pool a guaranteed minimum share (split up among its jobs) • Split excess capacity evenly between jobs
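To make the arithmetic concrete, here is a toy worked example (the numbers and pool names are made up, and it follows the pre-0.20 behavior of splitting excess between jobs, noted later): a 100-slot cluster, pool A with a 40-slot minimum share and 2 jobs, pool B with no minimum and 3 jobs.
// Toy illustration of the sharing policy above: each pool's minimum share is
// split among its own jobs, and leftover capacity is split evenly across all
// running jobs (pre-0.20 behavior). All numbers here are hypothetical.
public class FairShareExample {
    public static void main(String[] args) {
        int totalSlots = 100;
        int poolAMin = 40, poolAJobs = 2;  // pool A: 40-slot guarantee, 2 jobs
        int poolBMin = 0,  poolBJobs = 3;  // pool B: no guarantee, 3 jobs

        int excess = totalSlots - poolAMin - poolBMin;        // 60 slots of excess capacity
        int perJobExcess = excess / (poolAJobs + poolBJobs);  // 12 extra slots per job

        int poolAJobShare = poolAMin / poolAJobs + perJobExcess;  // 20 + 12 = 32
        int poolBJobShare = poolBMin / poolBJobs + perJobExcess;  //  0 + 12 = 12

        System.out.println("Each pool A job gets " + poolAJobShare + " slots");
        System.out.println("Each pool B job gets " + poolBJobShare + " slots");
    }
}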
Pools • Determined from a configurable job property • Default (before 0.20): mapred.queue.name • At Facebook: user.name (one pool per user) • Unmarked jobs go into a “default pool” • Pools have properties: • Minimum map slots • Minimum reduce slots • Limit on # of running jobs
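As a minimal sketch (the pool and job names are hypothetical), a job is placed into a pool simply by setting the pool property on its JobConf before submission, here the pre-0.20 default mapred.queue.name:
import org.apache.hadoop.mapred.JobConf;

// Minimal sketch: assign this job to the "ads" pool by setting the pool
// property on its JobConf (the pre-0.20 default, mapred.queue.name).
public class SubmitToPool {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SubmitToPool.class);
        conf.setJobName("ads-report");           // hypothetical job name
        conf.set("mapred.queue.name", "ads");    // pool the fair scheduler reads
        // Mapper, reducer and input/output paths would be configured here as
        // usual, then the job submitted with JobClient.runJob(conf).
        System.out.println("Pool: " + conf.get("mapred.queue.name"));
    }
}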
Scheduling Algorithm • Divide each pool’s min share among its jobs • Divide excess capacity among all jobs* • When a slot needs to be assigned: • If there is any job below its min share, schedule it • Else schedule the job that we’ve been most unfair to (based on “deficit”) * Fair schedulers from Hadoop 0.20 on will share equally between pools, not jobs; patch available at https://issues.apache.org/jira/browse/HADOOP-4789
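The slot-assignment step can be sketched as follows; this is a simplified illustration using a made-up JobInfo bookkeeping class, not the actual FairScheduler source:
import java.util.List;

// Simplified illustration of the slot-assignment rule described above; JobInfo
// is a hypothetical bookkeeping class, not a Hadoop API type.
class JobInfo {
    int minShare;      // this job's slice of its pool's minimum share
    int runningTasks;  // slots the job currently holds
    long deficit;      // accumulated gap between the job's fair share and what it received
}

class SlotAssigner {
    // Called when a task slot frees up: choose which job gets it.
    static JobInfo pickJob(List<JobInfo> jobs) {
        // 1. Any job still below its minimum share is scheduled first.
        for (JobInfo job : jobs) {
            if (job.runningTasks < job.minShare) {
                return job;
            }
        }
        // 2. Otherwise pick the job with the largest deficit, i.e. the one the
        //    scheduler has been most unfair to so far.
        JobInfo mostUnfairlyTreated = null;
        for (JobInfo job : jobs) {
            if (mostUnfairlyTreated == null || job.deficit > mostUnfairlyTreated.deficit) {
                mostUnfairlyTreated = job;
            }
        }
        return mostUnfairlyTreated;
    }
}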
Scheduler Dashboard [screenshots]: reassign a job's pool, change job priority, switch to FIFO mode (for testing)
Additional Features • Job weights for unequal sharing: • Based on priority (each level is 2x more) • Based on size (mapred.fairscheduler.sizebasedweight) • Limits on # of running jobs: • Per user • Per pool
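To illustrate the priority-based weighting: only the doubling per level is from the slide; the enum below and the choice of NORMAL as the 1.0 baseline are assumptions for illustration.
// Rough sketch of "each priority level is 2x more": weights double per level,
// with NORMAL taken as 1.0. The enum and baseline are assumptions, not the
// scheduler's source.
enum Priority { VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH }

class PriorityWeight {
    static double weight(Priority p) {
        int levelsAboveNormal = p.ordinal() - Priority.NORMAL.ordinal();
        return Math.pow(2.0, levelsAboveNormal);  // 0.25, 0.5, 1.0, 2.0, 4.0
    }

    public static void main(String[] args) {
        for (Priority p : Priority.values()) {
            System.out.println(p + " -> weight " + weight(p));
        }
    }
}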
Outline • Motivation / Hadoop usage at Facebook • Fair scheduler basics • Configuring the fair scheduler • Future plans
Installing the Scheduler • Compile it: • ant package • Place it on the classpath: • cp build/contrib/fairscheduler/*.jar lib • Alternatively, add the JAR to HADOOP_CLASSPATH in conf/hadoop-env.sh
Configuration Files • Hadoop config (conf/hadoop-site.xml) • Contains scheduler options, pointer to pools file • Pools file (pools.xml) • Contains min share allocations and limits on pools • Reloaded every 15 seconds to allow reconfiguring pools at runtime
Minimal hadoop-site.xml <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</value> </property> <property> <name>mapred.fairscheduler.allocation.file</name> <value>/path/to/pools.xml</value> </property>
Minimal pools.xml <?xml version="1.0"?> <allocations> </allocations>
Configuring a Pool <?xml version="1.0"?> <allocations> <pool name="ads"> <minMaps>10</minMaps> <minReduces>5</minReduces> </pool> </allocations> • Any pools not configured in pools.xml will have minMaps=0 and minReduces=0
Setting Running Job Limits <?xml version="1.0"?> <allocations> <pool name="ads"> <minMaps>10</minMaps> <minReduces>5</minReduces> <maxRunningJobs>3</maxRunningJobs> </pool> <user name="matei"> <maxRunningJobs>1</maxRunningJobs> </user> </allocations>
Default Jobs Limit for Users <?xml version="1.0"?> <allocations> <pool name="ads"> <minMaps>10</minMaps> <minReduces>5</minReduces> <maxRunningJobs>3</maxRunningJobs> </pool> <user name="matei"> <maxRunningJobs>1</maxRunningJobs> </user> <userMaxJobsDefault>10</userMaxJobsDefault> </allocations>
Other hadoop-site.xml Properties mapred.fairscheduler.assignmultiple: • Assign a map and reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true
Other hadoop-site.xml Properties mapred.fairscheduler.poolnameproperty: • Which jobconf property to use to determine what pool a job is in • Default: mapred.queue.name (queue name) • Another useful option: user.name • Can also make up your own, e.g. “project”
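For instance, if poolnameproperty were pointed at a made-up jobconf key called "project", each job would simply set that key before submission; a minimal sketch:
import org.apache.hadoop.mapred.JobConf;

// Minimal sketch assuming the JobTracker's hadoop-site.xml sets
// mapred.fairscheduler.poolnameproperty to "project" (a made-up key).
public class ProjectPoolExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ProjectPoolExample.class);
        conf.set("project", "ads");  // this job would land in the "ads" pool
        System.out.println("project = " + conf.get("project"));
    }
}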
Other hadoop-site.xml Properties mapred.fairscheduler.weightadjuster: • Allows modifying job weights through a plugin class; one useful example is provided, NewJobWeightBooster, which boosts new jobs so that short jobs finish faster (see the README for details): <property> <name>mapred.fairscheduler.weightadjuster</name> <value>org.apache.hadoop.mapred.NewJobWeightBooster</value> </property>
Outline • Motivation / Hadoop usage at Facebook • Fair scheduler basics • Configuring the fair scheduler • Future plans
Future Plans • Share equally between pools, not jobs (Hadoop 0.20 release, HADOOP-4789) • Preemption if a job is starved of its min or fair share for some timeout (HADOOP-4665) • Locality wait optimization (HADOOP-4667)
Future Plans • Simpler scheduling model (HADOOP-4803) • FIFO pools (HADOOP-4803, HADOOP-5186) • Delayed job initialization (HADOOP-5186) • Scalability and operational improvements
Thanks! • The Fair Scheduler is available in Hadoop 0.19; docs in src/contrib/fairscheduler/README • Hadoop 0.17 and 0.18 versions at http://issues.apache.org/jira/browse/HADOOP-3746 • matei@cloudera.com