EventWave : Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications

EventWave: Programming Model and Runtime Support forTightly-Coupled Elastic Cloud Applications Wei-Chiu Chuang, Bo Sang, Sunghwan Yoo, Rui Gu, Charles Killian, Milind Kulkarni

Motivation clients world building building room room server room room # clients Response time Time

Motivation Scale up clients Elasticity is hard server # clients Response time Time

Objectives A programming model which supports: • Statefulcomputation Transparent elasticity Simple sequential semantics

Related Work MapReduce [Dean et. al. OSDI ‘04] stateless No simple sequential semantics Data Flow Dryad [Isard et. al. EuroSys ‘07] No Stateful Computation No transparent elasticity CIEL [Murray et. al. NSDI ‘11] Live Migration of Virtual Machines [Clark et. al. NSDI ‘05] Does not change scale: “split”/”merge” state Zephyr [Elmore et. al. SIGMOD ‘11] Live Migration Transactional, reconcile conflicts Orleans [Bykov et. al. SoCC ‘11] Scalable programming model

Event Driven Systems client Event queue server Event 1 commits Event 2 commits Event 3 commits 1 2 3 • Typical event driven systems are not scalable.

Context • Scalability comes from parallelism • Partition program state into `contexts` • An event accesses one or more contexts • Events accessing disjoint contexts can run in parallel world world building building building room room room Contexts enable implicit parallelism hallway room room

EventWave Event 1 commits Event 1 finishes • Stateful • Sequential semantics • Parallelism Context 1 Event 2 commits Event 2 finishes Context 2 3 2 1 Context 3 Event 3 finishes Enforce sequential ordering Event 2 can not commit until Event 1 commits Event 3 commits

Access Multiple Contexts • A player can move from one room to another • Remove it from source room • Insert it into destination room An event may access multiple contexts world Player list building building Room 1 Room 2 room room Bob Bob Alice Bob room room

Access Multiple Contexts • Must ensure • Sequential semantics • parallelism To be scalable, events can not access contexts arbitrarily Event 2 can’t start before event 1 finishes 2 1 Context 1 Event 2 commits Context 2 Event 1 finishes Context 3

Hierarchical Contexts • Contexts are not completely independent • The world has many buildings • A building has many rooms world Building<1> Building<2> Room<1> Room<1> Room<2> Room<2>

Wave of Events • Must access contexts from top to bottom The hierarchical access enables parallelism world Building<1> Building<2> Room<1> Room<1> Room<2> Room<2>

Wave of Events • Move a player from room 1 to room 2 world Building<1> Building<2> Room<1> Room<1> Room<2> Room<2>

Wave of Events Allow the next event to access Building<1> world Building<1> Building<2> Room<1> Room<1> Room<2> Room<2>

Wave of Events world Building<1> Building<2> Enter Room<1> Room<1> Room<1> Room<2> Room<2>

Wave of Events world Building<1> Building<2> Remove player Release exclusive access Room<1> Room<1> Room<2> Room<2>

Wave of Events world Building<1> Building<2> Enter Room<2> Room<1> Room<1> Room<2> Room<2>

Wave of Events world Event finishes, releasing all contexts Building<1> Building<2> Insert player Room<1> Room<1> Room<2> Room<2>

Wave of Events world Building<1> Building<2> Room<1> Room<1> Room<2> Room<2>

Wave of Events Event commits, releasing snapshot world Building<1> Building<2> Room<1> Room<1> Room<2> Room<2>

Distributed Execution • Scale more by executing events across multiple nodes • Map contexts • Head node world Head node Building<1> Building<2> Room<1> Room<3> Room<2> Room<4>

Distributed Execution • Logical Node: a set of physical nodes Server Logical Node Client Logical Node #1 Client Logical Node #2

Distributed Execution

Elasticity Request nodes from cloud scheduler Update context mapping world Building<2> Building<1> Room<1> Room<1> Room<2> Room<2>

Elasticity Transfer contexts to the new node world Building<2> Building<1> Room<1> Room<1> Room<2> Room<2>

Evaluation In the paper Key-value store Does it scale? • Microbechmarks • Scalability What is the cost of migration? • Microbechmarks • Migration latency Case study • Multi-player game server

Microbenchmark-Scalability Setup • One logical node, fixed context mapping • EC2 Small Instances • 1 vCPU, 1.7GB RAM, 160 GB local disk • Distribute 160 contexts to physical nodes Measures • Throughput

Microbenchmark-Scalability Takeaway: Throughput grows w.r.t. # of nodes P: workload

Microbenchmark-Migration Latency Setup • 2 x 8-core 2.0 GHz Xeon, 8GB RAM • 1Gb Ethernet connection Scale does not change

Microbenchmark-Migration Latency Migrate a 100MB context The migration event commits Measure • Throughput of events Finished events must wait for migration event

Multi-player Game Server Setup • Server logical node • 1 x Extra Large Instance (head) • 64 x Small Instances • Client logical nodes • 128 clients on 16 EC2 Small Instances Measure • Latency

Multi-player Game Server Synthetic workload Server contexts spread to 64 physical nodes Server contexts merge to 1 physical nodes

Conclusion • Elasticity is crucial for cloud applications. • Our programming model enables transparent elasticity for tightly-coupled applications • Case studies show EventWave is efficient http://www.macesystems.org

Backups

Language Construct state_variables{ Hallway hw; vector<Room> rooms; } context Hallway{ int x; } context Room<int>{ inty; } Declare implicit parallelism Mace [Killian et. al. PLDI ‘07] Hallway Hallway Room[0] Room[1] … Room<0> Room<1>

Event Handler upcall deliver(Message m){ } Annotation Specify what context to access Message(roomID = 2) upcall[Room<m.roomID>]deliver(Message m){ } Context Room<2>

Key-value store Setup • 2 x 8-core 2.0 GHz Xeon, 8GB RAM • 1Gb Ethernet connection Measure • Latency

Key-value store

Microbenchmark-Migration Latency Setup • 2 x 8-core 2.0 GHz Xeon, 8GB RAM • 1Gb Ethernet connection Scale does not change Context

Context Migration Update context-node mapping Head Event 3 goes to the new node 1 3 M Event 1 goes to the old node Old node Copy context state New node Replicate context state

EventWave : Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications

EventWave : Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications

Presentation Transcript

Applications and Runtime for multicore/manycore

Communication in Tightly Coupled Systems

Tightly-Coupled Opportunistic Navigation for Deep Urban and Indoor Positioning

Variability-Tolerance in Tightly-Coupled Parallel Computing Units

Securing Elastic Applications on Mobile Devices for Cloud Computing

Designing Energy Efficient Communication Runtime Support for Data Centric Programming Models

Programming Model Support for Dependable, Elastic Cloud Applications

Fault-Tolerant Communication Runtime Support for Data-Centric Programming Models

Elastic Applications in the Cloud

Variation-Tolerant OpenMP Tasking on Tightly-Coupled Processor Clusters

Tightly-Coupled Multi-Layer

Simulation of Tightly Coupled INS/GPS Navigator

Compiler (and Runtime) Support for CyberInfrastructure

Communication Libraries for Parallel-Programming-Model Runtime Systems

GridSuperscalar A programming model for GRID applications

Platforms for HPJava: Runtime Support for Scalable Programming in Java

AWS Elastic Cloud Computing

GridSuperscalar A programming model for GRID applications

Applications and Runtime for multicore/manycore

Platforms for HPJava: Runtime Support for Scalable Programming in Java

Tightly-Coupled Multi-Layer