Fault Tolerant Video-On-Demand Services Tal Anker, Danny Dolev, Idit Keidar, The Transis Project
VoD Service [Diagram: clients send requests to the VoD service provider, which streams video from the movies disk(s) to each client] • VoD: full VCR control • 1 video stream per client
High Availability • Multiple servers • at different sites • Fault tolerance: • servers can crash • Managing the load: • new servers can be brought up / down • load should be re-distributed “on the fly” • migration of clients
The Challenges [Diagram: clients C1 and C2 connected to a multi-server VoD service; when a server fails, its clients migrate transparently to the remaining servers] • Low overhead • Transparency • How do clients know whom to connect to? • "abstract" service • Clients should be unaware of migration
Buffer Management and Flow Control • Overcome jitter, message re-ordering and migration periods • Re-fill buffers quickly after migration • avoid buffer overflow • Minimize buffers • minimize pre-fetch bandwidth • Dynamically adjust transmission rate to client capabilities • Re-negotiation of QoS
Features of our solution • Use group communication in the control plane • connection establishment • fault tolerance and migration • Flow control explicitly handles migration • Low overhead • ~1/1000 of the bandwidth • Negligible memory and CPU overhead • Commodity hardware and publicly available network technologies
Environment • Implementation • UDP/IP over 10 Mbit/s switched Ethernet • Transis • Sun SPARC and BSDI PCs as video servers • Win NT machines as video clients • MPEG-1 & 2 hardware decoders • Machine and network failures
Implementing the abstract service • Use group communication • clients communicate with a well-known group name (logical entity) • unaware of the number and identity of the servers in the group • Servers periodically share information about clients (every 1/2 sec) • If a server crashes (or is overloaded), another server transparently takes over
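The takeover on this slide can be sketched in C. The data layout, the membership-callback shape, and the round-robin re-assignment rule are illustrative assumptions, not the project's actual code; the key point is that every surviving server runs the same deterministic rule on the same shared state, so all of them agree on the new assignment without extra coordination.

```c
#include <assert.h>

/* Sketch: each client is owned by one server; servers share this
 * table periodically over the group.  MAX_CLIENTS and the
 * round-robin policy are assumptions for illustration. */
#define MAX_CLIENTS 8

struct service_state {
    int nservers;           /* server ids 0..nservers-1          */
    int owner[MAX_CLIENTS]; /* owner[c] = server serving client c */
};

/* Membership callback: server `failed` crashed.  Re-assign its
 * clients round-robin over the surviving servers.  Because every
 * server sees the same membership event (virtual synchrony) and
 * the same table, all compute identical new assignments. */
void on_server_failure(struct service_state *s, int failed)
{
    int next = 0;
    for (int c = 0; c < MAX_CLIENTS; c++) {
        if (s->owner[c] != failed)
            continue;               /* client unaffected */
        if (next == failed)         /* skip the dead server's id */
            next = (next + 1) % s->nservers;
        s->owner[c] = next;         /* orphaned client adopted */
        next = (next + 1) % s->nservers;
    }
}
```

The client never sees this step: it keeps multicasting to the same logical group name, and only the server answering it changes.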
Group Communication • Reliable Group Multicast (Group Abstraction) • Message Ordering • Dynamic Reconfiguration • Membership with Strong Semantics (Virtual Synchrony) Systems: Transis, Horus, Ensemble, Totem, Newtop, RMP, ISIS, Psync, Relacs
Transis Allows Simple Design Group abstractionforconnection establishment and transparent migration Reliable group multicast allows servers to consistently share information Membership services detects conditions for migration Reliable messages for control • Server takes ~2500 C++ code lines • Client takes ~4000 C code lines (excluding GUI and display)
Flow Control • Feedback-based flow control (sparse): • FC messages are sent to the logical server (session group) • The client determines the required changes in the flow
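The client-side rule on this slide can be sketched as a watermark check on the playout buffer. The watermark values and the fixed ±2 fps delta are assumptions for illustration; the talk only states that FC messages are sparse and that the client determines the change in the flow.

```c
#include <assert.h>

/* Assumed watermarks on the client's playout buffer (percent full). */
#define LOW_WATERMARK  25  /* below this: ask the servers for more  */
#define HIGH_WATERMARK 75  /* above this: ask the servers for fewer */

/* Returns the fps change the client multicasts to the session
 * group; 0 means no FC message is sent, keeping feedback sparse. */
int fc_delta(int buffer_fill_percent)
{
    if (buffer_fill_percent < LOW_WATERMARK)
        return +2;  /* buffer draining: speed the stream up */
    if (buffer_fill_percent > HIGH_WATERMARK)
        return -2;  /* buffer filling: slow the stream down */
    return 0;       /* within bounds: stay silent */
}
```

Sending the delta to the session group rather than to a specific server is what lets flow control survive migration: whichever server currently streams to the client receives the feedback.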
Emergency Flow Control • When the server receives an emergency message: • The server changes the fps rate: fps = latest-known-fps + emergency quantity • The emergency quantity decays every second (by a factor) • While the quantity is above zero, the server ignores FC messages from the client
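The three rules above fit in a few lines of C. The decay factor value and the field names are assumptions; the slide only says the quantity "decays every second (by a factor)".

```c
#include <assert.h>

#define DECAY_FACTOR 0.5   /* assumed per-second decay factor */

struct server_fc {
    double fps;                /* current frames-per-second rate */
    double emergency_quantity; /* pending emergency adjustment   */
};

/* Rule 1: on an emergency message, jump the rate immediately:
 * fps = latest-known-fps + emergency quantity. */
void on_emergency(struct server_fc *s, double quantity)
{
    s->emergency_quantity = quantity;
    s->fps += quantity;
}

/* Rule 2: once per second, decay the emergency quantity. */
void on_tick(struct server_fc *s)
{
    s->emergency_quantity *= DECAY_FACTOR;
    if (s->emergency_quantity < 1e-9)  /* treat tiny residue as zero */
        s->emergency_quantity = 0.0;
}

/* Rule 3: while the quantity is above zero, ordinary FC
 * messages from the client are ignored. */
int fc_message_accepted(const struct server_fc *s)
{
    return s->emergency_quantity <= 0.0;
}
```

The decaying quantity acts as a hold-off: right after an emergency (e.g. during a migration-induced buffer refill) the server trusts the emergency correction and suppresses the sparse feedback loop until the transient has died out.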
Performance Measurements • On the HUJI network (LAN) • Servers at TAU and clients at HUJI (WAN) • The measurements show the system is robust and supports our transparency claims
Summary • Scalable VoD service • Load balancing • Tolerating machine and network failures • All of the above achieved practically for free: • ~1/1000 of the total bandwidth • Negligible memory and CPU overhead
Thanks to ... • Gregory Chockler • The other members of the Transis project