270 likes | 304 Views
Coflow. A Networking Abstraction For Cluster Applications. Mosharaf Chowdhury Ion Stoica. UC Berkeley. Cluster Applications. Multi-Stage Data Flows Computation interleaved with communication Computation Distributed Runs on many machines Communication Structured Between machine groups.
E N D
Coflow A Networking Abstraction For Cluster Applications Mosharaf Chowdhury Ion Stoica UC Berkeley
Cluster Applications Multi-Stage Data Flows • Computation interleaved with communication Computation • Distributed • Runs on many machines Communication • Structured • Between machine groups Driver
Communication Abstraction A Flow • Sequence of packets • Independent • Often the unit for network scheduling, traffic engineering, load balancing etc. Multiple Parallel Flows • Independent • Yet, semantically bound • Shared objective Driver Minimize Completion Time
Coflow ‘ A collection of flows between two groups of machines that are bound together by application-specific semantics Captures • Structure • Shared Objective • Semantics
We Want To… Better schedule the network • Intra-coflow • Inter-coflow Write the communication layer of a new application • Without reinventing the wheel Add unsupported coflows to an application, or Replace an existing coflow implementation • Independent of applications
Cluster Applications Coflow API The Network (Physically or Logically Centralized Controller)
Coflow API Goals • Separate intent from mechanisms • Convey application-specific semantics to the network
Coflow API Jobfinishes terminate(handle) Shuffle finishes get(handle, id) content Driver put(handle, id, content) MapReduce create(SHUFFLE) handle
Choice of algorithms • Default • WSS1 Choice of mechanism • App vs. Network layer • Pull vs. Push Coflow Flexibility reducers shuffle mappers 1. Orchestra, SIGCOMM’2011
@driver bcreate(BCAST) … put(b, id, content) … terminate(b) Coflow Flexibility reducers shuffle mappers @mapper get(b, id) … broadcast driver (JobTracker)
@driver bcreate(BCAST) screate(SHUFFLE, ord=[b ~> s]) put(b, id, content) … terminate(b) terminate(s) Coflow Flexibility reducers shuffle mappers @mapper get(b, id) put(s, ids1) … broadcast driver (JobTracker)
Throughput-Sensitive Applications After 2 seconds Minimize Completion Time
Throughput-Sensitive Applications After 4 seconds After 7 seconds After 2 seconds Minimize Completion Time
Throughput-Sensitive Applications After 7 seconds Free up resources without hurting application-perceived communication time After 2 seconds Minimize Completion Time
Latency-Sensitive Applications HotNets 2012 Top-level Aggregator Mid-level Aggregators Workers
Latency-Sensitive Applications HotNets-XI: Home Page conferences.sigcomm.org/hotnets/2012/ The Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI) will bring together people with interest in computer networks to engage in a lively debate ... HotNets Workshop | acm sigcomm www.sigcomm.org/events/hotnets-workshop The Workshop on Hot Topics in Networks (HotNets) was created in 2002 to discuss early-stage, creative ... HotNets-XI, Seattle, WA area, October 29-30, 2012. HotNets-XI: Call for Papers conferences.sigcomm.org/hotnets/2012/cfp.shtml The Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI) will bring together researchers in computer networks and systems to engage in a lively ... Coflow accepted at HotNets'2012 www.mosharaf.com/blog/2012/09/.../coflow-accepted-at-hotnets201... Sep 13, 2012 – Update: Coflow camera-ready is available online! Tell us what you think! Our position paper to address the lack of a networking abstraction for ... HotNets 2012 Top-level Aggregator Meet Deadline1,2 Mid-level Aggregators Meet Deadline1,2 Workers Limit impact to as few requests as possible 1. D3, SIGCOMM’2011 2. PDQ, SIGCOMM’2012
One More Thing… • Critical Path Scheduling • OpenTCP • Structured Streams • …
Coflow • A semantically-bound collection of flows • Conveys application intent to the network • Allows better management of network resources • Provides greater flexibility in designing applications Mosharaf Chowdhuryhttp://www.mosharaf.com/ UC Berkeley
Critical Path Scheduling Communication of a cluster application is represented by a partially-ordered set of coflows Network allocation takes place among these partially-ordered sets of coflows A A S S B S S S
Coflow API
Throughput-Sensitive Applications Job finishes Reduce Stage Shuffle finishes Local shuffle finishes Local shuffle finishes Minimize Completion Time1 Map Stage MapReduce Framework Data Flow 1. Orchestra, SIGCOMM’2011
reducers reducers Coflow Resource Allocation 1. Weights [Across Apps] Weighted sharing between coflows @driver shuffle1 create(SHUFFLE, weight=1) shuffle2 create(SHUFFLE, weight=2) … shuffle2 shuffle1 mappers mappers Job 2 Job 1
reducers reducers Strict priorities @driver shuffle1 create(SHUFFLE, pri=3) shuffle2 create(SHUFFLE, pri=5) … Coflow Resource Allocation 2. Priorities [Across Apps] shuffle2 shuffle1 mappers mappers Job 2 Job 1
reducers reducers Coflow Resource Allocation 3. Dependencies [Within Apps] finishes_before (~>) @driver bcreate(BCAST) shuffle2create(SHUFFLE, ord=[b ~> shuffle2]) agg create(AGGR, ord=[shuffle2 ~> agg]) shuffle2 shuffle1 aggregation(agg) mappers mappers broadcast (b) driver Job 2 Job 1
Coflow Resource Allocation Communication of a cluster application is represented by a partially-ordered set of coflows Network allocation takes place among these partially-ordered sets of coflows A A S S S B S S