A Server-less Architecture for Building Scalable, Reliable, and Cost-Effective Video-on-demand Systems Presented by: Raymond Leung Wai Tak Supervisor: Prof. Jack Lee Yiu-bun Department of Information Engineering The Chinese University of Hong Kong
Contents • 1. Introduction • 2. Challenges • 3. Server-less Architecture • 4. Reliability Analysis • 5. Performance Modeling • 6. System Dimensioning • 7. Multiple Parity Groups • 8. Conclusion
1. Introduction • Traditional Client-server Architecture • Clients connect to the server to request videos • Server capacity limits the system capacity • Cost increases with system scale
1. Introduction • Server-less Architecture • Motivated by the availability of powerful user devices • Each user node (STB) serves both as a client and as a mini-server • Each user node contributes to the system • Memory • Processing power • Network bandwidth • Storage • Costs shared by users
1. Introduction • Architecture Overview • The system is composed of autonomous clusters of user nodes
2. Challenges • Video Data Storage Policy • Retrieval and Transmission Scheduling • Fault Tolerance • Distributed Directory Service • Heterogeneous User Nodes • System Adaptation – node joining/leaving
3. Server-less Architecture • Storage Policy • Video data is divided into fixed-size blocks (Q bytes) • Data blocks are distributed among nodes in the cluster (data striping) • Low storage requirement and load balancing • Capable of fault tolerance using redundant blocks (discussed later)
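To make the striping idea concrete, here is a minimal sketch, assuming simple round-robin placement of fixed-size blocks; the block size and node count are illustrative, not values from the presentation:
```python
# Round-robin striping of a video into Q-byte blocks across N nodes.
# A minimal illustration; block and node numbering are hypothetical.

Q = 64 * 1024  # retrieval block size in bytes (example value)

def stripe(video: bytes, num_nodes: int, block_size: int = Q):
    """Split a video into fixed-size blocks and assign block i to node i mod N."""
    placement = {node: [] for node in range(num_nodes)}
    for i in range(0, len(video), block_size):
        block = video[i:i + block_size]
        placement[(i // block_size) % num_nodes].append(block)
    return placement

# Each node stores roughly 1/N of the video, which balances both
# storage and retrieval load across the cluster.
placement = stripe(b"\x00" * (10 * Q), num_nodes=4)
print({node: len(blocks) for node, blocks in placement.items()})
```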
3. Server-less Architecture • Retrieval and Transmission Scheduling • Round-based scheduler • Grouped Sweeping Scheduling (GSS)¹ • Composed of macro rounds and micro rounds • Tradeoff between disk efficiency and buffer requirement ¹ P.S. Yu, M.S. Chen & D.D. Kandlur, “Grouped Sweeping Scheduling for DASD-based Multimedia Storage Management”, ACM Multimedia Systems, vol. 1, pp. 99–109, 1993
3. Server-less Architecture • Retrieval and Transmission Scheduling • Data retrieved in the current micro round is transmitted in the next micro round • Each retrieval block is divided into b transmission blocks for transmission • Transmission block size: U = Q/b • Transmission of a retrieval block lasts for one macro round
3. Server-less Architecture • Retrieval and Transmission Scheduling • Macro round length • Defined as the time required by all nodes to transmit one retrieval block • Number of requests served: N • Macro round length: Tf = NQ/Rv, where Rv is the video bitrate • Micro round length • Each macro round is divided into g micro rounds • Number of requests served per micro round: N/g • Micro round length: Tf/g = NQ/(gRv)
3. Server-less Architecture • Modification in Storage Policy • Since retrieval blocks are divided into transmission blocks for transmission • Video data is striped at the granularity of transmission blocks instead of retrieval blocks
3. Server-less Architecture • Fault Tolerance • Recovers not only from a single node failure but also from multiple simultaneous node failures • Redundancy by Forward Error Correction (FEC) codes • e.g. Reed-Solomon Erasure Code (REC)
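Reed-Solomon coding itself is too involved for a slide-sized example, but the recovery principle can be sketched with the simplest erasure code, a single XOR parity block (the h = 1 case); all data here is illustrative:
```python
# Single-parity erasure coding (h = 1): the simplest special case of the
# FEC idea described above. Real deployments would use Reed-Solomon codes
# to tolerate h > 1 simultaneous failures.

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]     # blocks stored on 3 nodes
parity = xor_blocks(data_blocks)              # stored on a 4th node

# If any single node fails, its block is the XOR of all surviving blocks.
lost = 1
survivors = [b for i, b in enumerate(data_blocks) if i != lost] + [parity]
assert xor_blocks(survivors) == data_blocks[lost]
```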
3. Server-less Architecture • Impact of Fault Tolerance on Block Size • Tolerate up to h simultaneous failures • To maintain the same amount of video data transmitted in each macro round, the block size is increased to Qr = QN/(N−h) • Similarly, the transmission block size is increased to Ur = UN/(N−h)
4. Reliability Analysis • Reliability Analysis • Determine the system mean time to failure (MTTF) • Assuming independent node failure and repair rates • Tolerate up to h failures by redundancy • Analysis by a Markov chain model
4. Reliability Analysis • Reliability Analysis • Assume independent failures (rate λ per node) and repairs (rate μ per failed node), so the number of failed nodes forms a Markov chain • Let Ti be the expected time the system takes to reach state h+1 (system failure) from state i • Then Th+1 = 0 and, for 0 ≤ i ≤ h: [(N−i)λ + iμ] Ti = 1 + (N−i)λ Ti+1 + iμ Ti−1
4. Reliability Analysis • Reliability Analysis • Solving this set of linear equations yields the system MTTF, T0 • With a target system MTTF, we can find the redundancy h required
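As a minimal numerical sketch of this procedure, the first-passage equations can be solved as a linear system and the smallest adequate h found by search; the failure and repair rates below are assumed example values, and the 10,000-hour target echoes the dimensioning study later in the deck:
```python
import numpy as np

def system_mttf(N: int, h: int, lam: float, mu: float) -> float:
    """Expected time to reach state h+1 (system failure) from state 0.

    State i = number of failed nodes. From state i, a failure occurs at
    rate (N - i) * lam and a repair at rate i * mu. Solves
    [(N-i)lam + i*mu] Ti = 1 + (N-i)lam * Ti+1 + i*mu * Ti-1, with Th+1 = 0.
    """
    A = np.zeros((h + 1, h + 1))
    b = np.ones(h + 1)
    for i in range(h + 1):
        fail, repair = (N - i) * lam, i * mu
        A[i, i] = fail + repair
        if i + 1 <= h:
            A[i, i + 1] = -fail      # Ti+1 term (Th+1 = 0 drops out at i = h)
        if i - 1 >= 0:
            A[i, i - 1] = -repair    # Ti-1 term
    return np.linalg.solve(A, b)[0]  # T0

def min_redundancy(N, lam, mu, target):
    """Smallest h whose system MTTF meets the target."""
    for h in range(N):
        if system_mttf(N, h, lam, mu) >= target:
            return h
    return None

# Example: 200 nodes, node MTTF 50,000 h, repair time 24 h, target 10,000 h.
print(min_redundancy(200, 1 / 50_000, 1 / 24, 10_000))
```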
4. Reliability Analysis • Redundancy Level • Defined as the proportion of nodes serving redundant data (h/N) • Redundancy level versus number of nodes required to achieve the target system MTTF
5. Performance Modeling • Storage Requirement • Network Bandwidth Requirement • Buffer Requirement • System Response Time • Assumptions: • Zero network delay • Zero processing delay • Bounded clock jitter among nodes
5. Performance Modeling • Storage Requirement • Let SA be the combined size of all video titles to be stored in the cluster • With redundancy h, additional storage is required for the parity blocks • The storage requirement per node: SN = SA/(N−h)
5. Performance Modeling • Bandwidth Requirement • Assume a video bitrate of Rv bps • Without redundancy, each node transmits (N−1) streams of video data to the other nodes in the cluster • Each stream consumes a bitrate of Rv/N bps • With redundancy h, additional bandwidth is required, raising the per-stream rate to Rv/(N−h) • The bandwidth requirement per node: CR = (N−1)Rv/(N−h)
5. Performance Modeling • Buffer Requirement • Composed of the sender buffer requirement and the receiver buffer requirement • Sender Buffer Requirement • Under GSS scheduling, data retrieved in one micro round must be buffered until it is transmitted in the next
5. Performance Modeling • Receiver Buffer Requirement • Store the data temporarily before playback • Absorb the deviations in data arrival time caused by clock jitter • Total Buffer Requirement • One data stream is for local playback rather than transmission • Buffer sharing for this local playback stream • Subtract b buffer blocks of size Ur from the receiver buffer
5. Performance Modeling • System Response Time • Time from sending out a request until playback begins • Scheduling delay + prefetch delay • Scheduling delay under GSS • Time from sending out a request until data retrieval starts • Can be analyzed using an urns model • Detailed derivation available in Lee’s work² ² Lee, J.Y.B., “Concurrent push: a scheduling algorithm for push-based parallel video servers”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, pp. 467–477, April 1999
5. Performance Modeling • Prefetch Delay • Time from the start of data retrieval until playback begins • One micro round to retrieve a data block, plus the buffering time to fill the receiver's prefetch buffer • Additional delay is incurred by clock jitter among nodes
6. System Dimensioning • Storage Requirement • What is the minimum number of nodes required to store a given amount of video data? • For example: • video bitrate: 4 Mb/s • video length: 2 hours • storage required for 100 videos: 351.6 GB • If each node can allocate 2 GB for video storage, then • 176 nodes are needed (without redundancy); or • 209 nodes are needed (with 33 nodes added for redundancy) • This sets the lower limit on the cluster size
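A quick arithmetic sketch reproducing these figures, assuming (as the 351.6 GB value implies) binary units, i.e. Mb = 2^20 bits and GB = 2^30 bytes:
```python
# Dimensioning arithmetic for the storage example above.
# Assumes binary units (Mb = 2**20 bits, GB = 2**30 bytes), which is what
# the 351.6 GB figure implies.

bitrate = 4 * 2**20            # 4 Mb/s in bits per second
length = 2 * 3600              # 2 hours in seconds
titles = 100

total_bytes = titles * bitrate * length / 8
total_gb = total_bytes / 2**30
print(f"storage for {titles} videos: {total_gb:.1f} GB")   # -> 351.6 GB

per_node_gb = 2                        # each node allocates 2 GB
nodes = -(-total_gb // per_node_gb)    # ceiling division
print(f"nodes without redundancy: {nodes:.0f}")            # -> 176

# The extra 33 redundant nodes (209 total) come from the reliability
# analysis in Section 4, not from this storage calculation.
```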
6. System Dimensioning • Network Capacity • How many nodes can be connected given a certain network switching capacity? • For example: • video bitrate: 4 Mb/s • If the network switching capacity is 32 Gbps, and assuming 60% utilization • up to 2412 nodes can be supported (without redundancy) • Network switching capacity is not a bottleneck
6. System Dimensioning • Disk Access Bandwidth • Determine the values of Q and g to evaluate the buffer requirement and the system response time • Finite disk access bandwidth limits the values of Q and g • Disk Model of Disk Service Time • Time required to retrieve data blocks for transmission • Depends on seek overhead, rotational latency and data block size • Suppose k requests are served per GSS group • Worst-case maximum service time: tmax(k) = α + tseek(k) + k(W−1 + Qr/rmin), where tmax – maximum round service time, α – fixed overhead, tseek(k) – maximum seek time for k requests, W−1 – rotational latency, rmin – minimum transfer rate, Qr – data block size
6. System Dimensioning • Constraint for Smooth Data Flow • A disk service round must finish before its transmission begins • Disk service time shorter than the micro round length: tmax(N/g) ≤ Tf/g
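A small feasibility-check sketch using the disk model above; every parameter value here is an illustrative placeholder rather than a measurement from the presentation:
```python
# Check the smooth-data-flow constraint: the worst-case disk service time
# for the N/g requests in a GSS group must fit inside one micro round.
# Parameter values below are illustrative placeholders.

def max_service_time(k, alpha, t_seek, rot_period, Qr, r_min):
    """Worst-case time to serve k requests of Qr bytes each."""
    return alpha + t_seek(k) + k * (rot_period + Qr / r_min)

def micro_round(N, g, Qr, Rv):
    """Micro round length Tf/g, with Tf = N * Qr / Rv."""
    return N * Qr / (Rv / 8) / g       # Rv in bits/s, Qr in bytes

N, g = 200, 10
Qr = 64 * 1024                         # block size in bytes
Rv = 4 * 2**20                         # 4 Mb/s video bitrate
seek = lambda k: 0.015 + 0.001 * k     # crude aggregate seek model (seconds)

t_disk = max_service_time(N // g, alpha=0.001, t_seek=seek,
                          rot_period=1 / 120, Qr=Qr, r_min=40e6)
print(t_disk <= micro_round(N, g, Qr, Rv))  # is this (Q, g) feasible?
```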
6. System Dimensioning • Buffer Requirement • Decrease the block size (Qr) and increase the number of groups (g) to minimize the system response time, provided the smooth-data-flow constraint remains satisfied
6. System Dimensioning • System Response Time • System response time versus number of nodes in the cluster
6. System Dimensioning • Scheduling Delay • Relatively constant while system scales up • Prefetch Delay • Time required to receive the first group of blocks from all nodes • Increases linearly with system scale – not scalable • Ultimately limits the cluster size • What is the Solution? • Multiple parity groups
7. Multiple Parity Groups • Primary Limit in Cluster Scalability • Prefetch delay dominates the system response time • Multiple Parity Groups • Instead of a single parity group, the redundancy is encoded over multiple parity groups • Decreases the number of blocks that must be received before playback • Playback begins after receiving the data of the first parity group • Reduces the prefetch delay, as sketched below
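A toy sketch of the prefetch saving, using the 1500-node cluster from the evaluation later in this section; the group counts are illustrative. With a single parity group the receiver must collect a block from every node before playback, while with p groups it needs only the first group's N/p blocks:
```python
# Prefetch delay scales with the number of blocks that must arrive before
# playback can begin. With p parity groups, playback starts after the
# first group's blocks arrive. Group counts are illustrative.

def prefetch_blocks(N: int, parity_groups: int) -> int:
    """Blocks to receive before playback when the cluster is split into groups."""
    return N // parity_groups

N = 1500
for p in (1, 5, 15):
    print(f"{p:>2} parity group(s): wait for {prefetch_blocks(N, p)} blocks")

# 1 group -> 1500 blocks; 15 groups -> 100 blocks: the prefetch delay drops,
# but each smaller group needs a higher redundancy level for the same MTTF.
```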
7. Multiple Parity Groups • Multiple Parity Groups • Transmission of the different parity groups is staggered
7. Multiple Parity Groups • Impact on Performance • Buffer requirement • System response time • Redundancy requirement • Buffer Requirement • The number of blocks within the same parity group is reduced • The receiver buffer requirement is reduced
7. Multiple Parity Groups • System Response Time • Playback begins after receiving the data of first parity group • System response time is reduced
7. Multiple Parity Groups • Redundancy Requirement • The cluster is divided into parity groups with fewer nodes each • A higher redundancy level is needed to maintain the same system MTTF • Tradeoff between response time and redundancy level
7. Multiple Parity Groups • Performance Evaluation • Buffer requirement and system response time versus redundancy level at a cluster size of 1500 nodes • Both the system response time and the buffer requirement decrease as redundancy increases (i.e. with more parity groups)
7. Multiple Parity Groups • Cluster Scalability • What are the system configurations if the system a. achieves an MTTF of 10,000 hours, b. stays under a response time constraint of 5 seconds, and c. stays under a buffer requirement of 8 or 16 MB?
7. Multiple Parity Groups • Cluster Scalability • The cluster is divided into more parity groups whenever it exceeds either • the response time constraint, or • the buffer constraint • The redundancy level remains relatively constant: the larger cluster improves redundancy efficiency, which compensates for the extra redundancy overhead incurred by the multiple parity group scheme (e.g. the 16 MB buffer constraint)
7. Multiple Parity Groups • Shifted Bottleneck in Cluster Scalability • The transmission buffer increases linearly with cluster scale and cannot be reduced by the multiple parity group scheme • The system is forced to divide into more parity groups to keep the receiver buffer requirement within the buffer constraint • The redundancy overhead then increases sharply while the system response time is sharply reduced (e.g. the 8 MB buffer constraint) • Eventually the total buffer requirement exceeds the buffer constraint even when the cluster is divided into more parity groups • Scalability bottleneck shifts to the buffer requirement • The system can be further scaled up by forming autonomous clusters
8. Conclusion • Server-less Architecture • Scalable • Acceptable redundancy level to achieve a reasonable response time within a cluster • Further scale-up by forming new autonomous clusters • Reliable • Fault tolerance through redundancy • Reliability comparable to a high-end server, as shown by the Markov chain analysis • Cost-Effective • The dedicated server is eliminated • Costs are shared by all users
8. Conclusion • Future Work • Distributed Directory Service • Heterogeneous User Nodes • Dynamic System Adaptation • Node joining/leaving • Data re-distribution
End of Presentation • Thank you • Question & Answer Session