
Varys

Varys. Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica. UC Berkeley. Communication is Crucial. Performance. Facebook analytics jobs spend 33% of their runtime in communication 1. As in-memory systems proliferate,


Presentation Transcript


  1. Varys: Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica. UC Berkeley.

  2. Communication is Crucial. Performance: Facebook analytics jobs spend 33% of their runtime in communication [1]. As in-memory systems proliferate, the network is likely to become the primary bottleneck. [1] Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011.

  3. Optimizing Communication Performance: Networking Approach. “Let systems figure it out.” Flow: a sequence of packets between two endpoints. Independent unit of allocation, sharing, load balancing, and/or prioritization.

  4. Optimizing Communication Performance: Systems Approach. “Let users figure it out.” # Comm. Params*: Spark 1.0.1: 6; Hadoop 1.0.4: 10; YARN 2.3.0: 20. *Lower bound. Does not include many parameters that can indirectly impact communication, e.g., number of reducers. Also excludes control-plane communication/RPC parameters.

  5. Optimizing Communication Performance: Systems Approach. “Let users figure it out.” Optimizing Communication Performance: Networking Approach. “Let systems figure it out.” • A collection of parallel flows • Completion time depends on the last flow to complete • Distributed endpoints • Each flow is independent

  6. Coflow [1]. • A collection of parallel flows • Completion time depends on the last flow to complete • Distributed endpoints • Each flow is independent. [1] Coflow: A Networking Abstraction for Cluster Applications, HotNets’2012.
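The definition on this slide can be sketched in a few lines of Python. This is only an illustration: the `Flow` class, its field names, and `coflow_completion_time` are hypothetical and do not come from the Varys code base.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    # Hypothetical fields, chosen for illustration only.
    src: str            # sending endpoint
    dst: str            # receiving endpoint
    finish_time: float  # time at which this flow completes

def coflow_completion_time(flows):
    """A coflow is done only when its last flow is done."""
    return max(f.finish_time for f in flows)

shuffle = [Flow("m1", "r1", 2.0), Flow("m2", "r1", 5.0), Flow("m2", "r2", 3.0)]
print(coflow_completion_time(shuffle))
```

Speeding up the flows that finish at 2.0 or 3.0 would not change the result; only the slowest flow matters.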

  7. How to schedule coflows … #1 … for faster completion of coflows? #2 … to meet more deadlines? Coflow [1]: • A collection of parallel flows • Completion time depends on the last flow to complete • Distributed endpoints • Each flow is independent. [Figure: machines 1 … N on each side of the DC Fabric.]

  8. Varys Enables coflows in data-intensive clusters

  9. Benefits of Inter-Coflow Scheduling. [Figure: Coflow 1 and Coflow 2 share two links, with flows of 3 units and 3-ε units on Link 1 and 6 units on Link 2; three timelines (L1 and L2 over time) compare the policies.] Fair Sharing: Coflow1 comp. time = 6, Coflow2 comp. time = 6. Flow-level Prioritization [1, 2]: Coflow1 comp. time = 6, Coflow2 comp. time = 6. The Optimal: Coflow1 comp. time = 3, Coflow2 comp. time = 6. [1] Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012. [2] pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.
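The completion times on this slide can be reproduced with a small fluid-model sketch. The flow layout is an assumption, since the figure itself is lost: Coflow 1 is taken to have one 3-unit flow on Link 1, and Coflow 2 a (3-ε)-unit flow on Link 1 plus a 6-unit flow on Link 2, with every link running at one unit per time step. All function and variable names are hypothetical.

```python
EPS = 0.01

# Assumed layout: (coflow, link, size). Not taken verbatim from the slide.
FLOWS = [
    ("C1", "L1", 3.0),
    ("C2", "L1", 3.0 - EPS),
    ("C2", "L2", 6.0),
]

def serial_finish_times(flows, order_key):
    """Serve flows one at a time on each link, in the order given by order_key."""
    clock, finish = {}, {}
    for coflow, link, size in sorted(flows, key=order_key):
        clock[link] = clock.get(link, 0.0) + size
        finish[coflow] = max(finish.get(coflow, 0.0), clock[link])
    return finish

def fair_share_finish_times(flows):
    """Split each link equally among its unfinished flows."""
    finish = {}
    for link in {l for _, l, _ in flows}:
        on_link = sorted((size, coflow) for coflow, l, size in flows if l == link)
        now, served = 0.0, 0.0
        for i, (size, coflow) in enumerate(on_link):
            active = len(on_link) - i           # flows still sharing the link
            now += (size - served) * active     # time until this flow drains
            served = size
            finish[coflow] = max(finish.get(coflow, 0.0), now)
    return finish

print("fair sharing       ", fair_share_finish_times(FLOWS))
print("smallest flow first", serial_finish_times(FLOWS, lambda f: f[2]))
print("coflow C1 first    ", serial_finish_times(FLOWS, lambda f: f[0]))
```

Under these assumptions, fair sharing and smallest-flow-first both leave Coflow 1 finishing at about 6 (exactly 6-ε), while serving Coflow 1 first lets it finish at 3 without delaying Coflow 2 beyond 6.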

  10. Benefits of Inter-Coflow Scheduling. [Figure: Coflow 1 and Coflow 2 share two links, with flows of 3 units and 3-ε units on Link 1 and 6 units on Link 2; three timelines (L1 and L2 over time) compare the policies.] Fair Sharing: Coflow1 comp. time = 6, Coflow2 comp. time = 6. Flow-level Prioritization [1, 2]: Coflow1 comp. time = 6, Coflow2 comp. time = 6. The Optimal: Coflow1 comp. time = 3, Coflow2 comp. time = 6. • Concurrent Open Shop Scheduling [3] • Tasks on independent machines • Examples include job scheduling and caching blocks • Use an ordering heuristic. [1] Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012. [2] pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013. [3] A note on the complexity of the concurrent open shop problem, Journal of Scheduling, 9(4):389–396, 2006.

  11. Inter-Coflow Scheduling. [Figure: Coflow 1 and Coflow 2 with flows of 3 units and 3-ε units on Link 1 and 6 units on Link 2, redrawn across the DC Fabric between Ingress Ports (Machine Uplinks) and Egress Ports (Machine Downlinks).] • Concurrent Open Shop Scheduling [1] • Tasks on independent machines • Examples include job scheduling and caching blocks • Use an ordering heuristic. [1] A note on the complexity of the concurrent open shop problem, Journal of Scheduling, 9(4):389–396, 2006.

  12. Inter-Coflow Scheduling is NP-Hard. [Figure: Coflow 1 and Coflow 2 with flows of 3 units and 3-ε units on Link 1 and 6 units on Link 2, drawn across the DC Fabric between Ingress Ports (Machine Uplinks) and Egress Ports (Machine Downlinks).] • Concurrent Open Shop Scheduling with coupled resources • Flows on dependent links • Consider ordering and matching constraints. Characterized COSS-CR. Proved that list scheduling might not result in an optimal solution.

  13. Varys Employs a two-step algorithm to minimize coflow completion times

  14. Ordering Heuristic: SEBF (Smallest-Effective-Bottleneck-First). Also listed: Shortest-First, Narrowest-First, Smallest-First. [Figure: two timelines of coflows C1 and C2 on ports P1, P2, and P3, marking when C1 and C2 end.]

  15. Ordering Heuristic: SEBF (Smallest-Effective-Bottleneck-First). Allocation Algorithm. Also listed: Shortest-First, Narrowest-First, Smallest-First. [Figure: two timelines of coflows C1 and C2 on ports P1, P2, and P3, marking when C1 and C2 end.]
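A minimal sketch of a smallest-effective-bottleneck-first ordering, assuming every port has the same bandwidth and that a coflow's effective bottleneck is the time its most heavily loaded ingress or egress port needs to drain. The function names and the two example coflows are hypothetical and are not the ones drawn in the slide's figure.

```python
from collections import defaultdict

def effective_bottleneck(flows, bandwidth=1.0):
    """flows: list of (ingress_port, egress_port, size) for one coflow."""
    ingress, egress = defaultdict(float), defaultdict(float)
    for src, dst, size in flows:
        ingress[src] += size   # data this uplink must send
        egress[dst] += size    # data this downlink must receive
    return max(max(ingress.values()), max(egress.values())) / bandwidth

def sebf_order(coflows, bandwidth=1.0):
    """Return coflow names sorted by increasing effective bottleneck."""
    return sorted(coflows, key=lambda name: effective_bottleneck(coflows[name], bandwidth))

# Hypothetical example: C1 is narrower and smaller in total bytes,
# but C2 has the smaller bottleneck (4 units on egress P2 versus 5 units).
coflows = {
    "C1": [("P1", "P1", 5.0)],
    "C2": [("P1", "P2", 2.0), ("P2", "P2", 2.0), ("P3", "P3", 3.0)],
}
print(sebf_order(coflows))
```

In this example an ordering by total size or by width would pick C1 first, whereas the bottleneck-based ordering picks C2, which is the distinction the slide draws between the listed heuristics.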

  16. Allocation Algorithm: MADD. A coflow cannot finish before its very last flow. Finishing flows faster than the bottleneck cannot decrease a coflow’s completion time. Ensure minimum allocation to each flow for it to finish at the desired duration; for example, at the bottleneck’s completion, or at the deadline.
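The allocation rule on this slide can be sketched as follows, assuming equal-bandwidth ports and a single coflow in isolation: every flow gets just enough rate to finish at the desired duration, which is either the bottleneck's completion time or a supplied deadline. The names `bottleneck_time` and `madd_rates` are hypothetical.

```python
from collections import defaultdict

def bottleneck_time(flows, bandwidth=1.0):
    """Time the most loaded ingress or egress port needs at full bandwidth."""
    ingress, egress = defaultdict(float), defaultdict(float)
    for src, dst, size in flows:
        ingress[src] += size
        egress[dst] += size
    return max(max(ingress.values()), max(egress.values())) / bandwidth

def madd_rates(flows, bandwidth=1.0, deadline=None):
    """Give each flow the minimum rate that finishes it at the desired duration."""
    gamma = bottleneck_time(flows, bandwidth)
    duration = gamma if deadline is None else deadline
    if duration < gamma:
        raise ValueError("deadline is shorter than the bottleneck time")
    return [size / duration for _, _, size in flows]

flows = [("P1", "P2", 2.0), ("P2", "P2", 2.0), ("P3", "P3", 3.0)]
print(madd_rates(flows))                 # all flows finish together at the bottleneck time
print(madd_rates(flows, deadline=8.0))   # slower rates that still meet the deadline
```

Because no flow runs faster than it needs to, the bandwidth left over on non-bottleneck ports stays available for other coflows.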

  17. Varys Enables frameworks to take advantage of coflow scheduling

  18. Evaluation: a 3000-node trace-driven simulation matched against a 100-node EC2 deployment. • Does it improve performance? • Can it beat non-preemptive solutions? YES

  19. Faster Jobs. Comm. Heavy [1]. Comm. Improv., Job Improv. Avg.: 3.16X 2.50X 1.85X 1.25X. 95th: 1.74X 1.15X 2.94X 3.84X. [1] 26% of jobs spend at least 50% of their duration in communication stages.

  20. Better than Non-Preemptive Solutions. W.r.t. FIFO [1]: Avg. 5.65X, 95th 7.70X. What About Perpetual Starvation? NO. [1] Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011.

  21. Four Challenges in the Context of Multipoint-to-Multipoint Coflows. #1 Coflow Dependencies: • Multi-stage jobs • Multi-wave stages. #2 Unknown Flow Information: • Pipelining between stages • Task failures and restarts. #3 Decentralized Varys: • Master failure • Low-latency analytics

  22. #4 Theory Behind “Concurrent Open Shop Scheduling with Coupled Resources”

  23. Varys: Greedily schedules coflows without worrying about flow-level metrics. • Consolidates network optimization of data-intensive frameworks • Improves job performance by addressing the COSS-CR problem • Increases predictability through informed admission control. http://varys.net/ Mosharaf Chowdhury - @mosharaf
