1 / 22

Spark and YARN: Better Together

Dive into the better management of Apache Spark clusters on YARN, focusing on optimizing resource usage, container sizing, dynamic resource allocation, and resilience to failures. Learn about CPU scheduling, label-based scheduling, and accessing Kerberized environments for improved performance and debugging ease.

gretac
Download Presentation

Spark and YARN: Better Together

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SparkandYARN:BetterTogether • JerryShao, JeffZhang • sshao@hortonworks.com, jzhang@hortonworks.com • August6,2016

  2. JerryShao • MemberofTechnicalStaff@Hortonworks • FocusonOpenSourceBigDataarea • ActiveApacheSparkContributor JeffZhang • MemberofTechnicalStaff@Hortonworks • ApachePig/TezCommitterand Spark Contributor • Open source evangelist

  3. SparkonYARNRecap

  4. Overview of SparkCluster Executor Executor Executor ClusterManager Driver

  5. SparkRunningOnYARN (Client Mode) ResourceManager NM NM NM Container Container Container Container Executor AM Executor Executor Driver Client

  6. SparkRunningOnYARN (Cluster Mode) ResourceManager NM NM NM Container Container Container Executor Executor Executor NM Container AM Client Driver

  7. DifferenceComparedtoOtherCluster Manager • Applicationhastobesubmittedintoaqueue • Jars/files/archivesaredistributedthroughdistributedcache • AdditionalApplicationMaster • …

  8. BetterRunSparkOnYARN

  9. WhatDoWeConcernAbout? • Betterusetheresources • Betterrunoncluster • Easytodebug

  10. CalculateContainerSize • WhatisthesizeofaContainer? • Memory • CPU# spark.executor.memory spark.memory.fraction(0.75) spark.memory.storageFraction (0.5) execution memory spark.yarn.executor.memoryOverhead(offheap) spark.memory.offHeap.size containermemory = spark executor memory + overhead memory yarn.scheduler.minimum-allocation-mb<= container memory <= yarn.nodemanager.resource.memory-mb container memory will be round toyarn.scheduler.increment-allocation-mb #DependonCPUschedulingisenabledornot

  11. Calculate Container Size (Cont’d) • Enable CPU Scheduling • Capacity Scheduler with DefaultResourceCalculator (default) • Only takes memory into account • CPU requirements are ignored when carrying out allocations • The setting of “--executor-cores” is controlled by Spark itself • Capacity Scheduler with DominantResourceCalculator • CPU will also take into account when calculating • Containervcores=executorcores container cores <= nodemanger.resource.cpu-vcores

  12. Isolate Container Resource Containers should only be allowed to use resource they get allocated, they should not be affected by other containers on the node • How do we ensure containers don’t exceed their vcore allocation? • What’s stopping an errant container from spawning bunch of threads and consume all the CPU on the node? CGroups With the setting of LinuxContainerExecutor and others, YARN could enable CGroups to constrain the CPU usage (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html).

  13. LabelBasedScheduling Howtospecifyapplicationstorunonspecificnodes? • Labelbasedschedulingiswhatyouwant. • Touseit: • EnablenodelabelandlabelschedulinginYARNside(Hadoop2.6+) • ConfigurenodelabelexpressioninSparkconf: • spark.yarn.am.nodeLabelExpression • spark.yarn.executor.nodeLabelExpression

  14. DynamicResourceAllocation Howtousetheresourcemoreeffectivelyandmoreresiliently? • Sparksupportsdynamicallyrequestingorreleasingexecutorsaccordingtothecurrentloadofjobs. • Thisisespeciallyusefulforlong-runningapplicationslikeSparkshell,ThriftServer,Zeppelin. • ToEnableDynamicResourceAllocation spark.streaming.dynamicAllocation.enabledtrue spark.shuffle.service.enabledtrue <property> <name>yarn.nodemanager.aux-services.spark_shuffle.class</name> <value>org.apache.spark.network.yarn.YarnShuffleService</value> </property>

  15. ResilienttoServiceRestart/Failure • ResilienttoResourceManagerrestart/failure • RMisasinglepointoffailure • Configuringyarn.resourcemanager.ha.enabledtoenableRMHA • Enableyarn.resourcemanager.recovery.enabledtoberesilienttoRMchangeorrestart • NonworkpreservingRMrestart(Hadoop2.4) • WorkpreservingRMrestart(Hadoop2.6) • ResilienttoNodeManagerrestart • Enableyarn.nodemanager.recovery.enabledtoberesilienttoNMrestart(Hadoop2.6)

  16. AccessKerberizedEnvironment • Spark on YARN supports accessingKerberized Hadoop environment. • It will automatically get tokens from NN if the Hadoop environment is security enabled. • To retrieve the delegation tokens for non-HDFS services when security is enabled, configure spark.yarn.security.tokens.{hive|hbase}.enabled. • You could also specify --principal and --keytabto let Spark on YARN to do kinit and token renewal automatically, this is useful for long running service like Spark Streaming.

  17. Fast Debug YARN Application • What we usually meet when running Spark on YARN? • Classpath problem • Java parameters does not work • Configuration doesn‘ttakeaffect. • yarn.nodemanager.delete.debug-delay-sec • ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}/container_${contid}

  18. FutureworksofSparkOnYARN

  19. More Advanced Dynamic Resource Allocation • The problem of current algorithm: • Current algorithm is based on the load of tasks, the more tasks submitted the more executor will be requested. This will introduce resource starvation for other applications. • Current algorithm is an universal algorithm doesn't consider the specific features of cluster manager. • To improve current dynamic resource allocation: • Make current algorithm pluggable, to be adapted to customized algorithms. • Incorporate more YARN specific features to improve the current algorithm • Container resizing • Cooperative preemption • …

  20. Integrate YARN Application Timeline Service • Timeline Service - Storage and retrieval of applications' current as well as historic information in a generic fashion is solved in YARN through the Timeline Server. This serves two responsibilities: • Generic information about completed applications • Per-framework information of running and completed applications • We’re working on integrating Spark’s history server with YARN application timeline service. • Already released with HDP,continueimprovingATSv2intheyarnsidetobringinbetterperformanceandstability.

  21. Pluggable Security Services • CurrentProblems: • HowtoaccesssecurityservicesotherthanHDFS,HBaseandHive • HowtosupportsecuritykeysotherthanDelegationToken • Toimprovecurrenttokenmanagementmechanism: • Provideaconfigurablesecuritycredentialmanager • Makesecuritycredentialproviderpluggablefordifferentservices. • SPARK-14743

  22. Thanks

More Related