Sections 14.1 - 14.4: Streaming Media on Demand and Live Broadcast. Multimedia over IP and Wireless Networks: Compression, Networking, and Systems, Mihaela van der Schaar & Philip A. Chou. Presented by H. Mark Okada, CMPT 820, February 18, 2009.
Streaming Media • Media on demand: a user scenario characterised by audio or video playback, as if locally from a CD or DVD • interactive controls: fast forward, pause, seek, etc. • Live broadcast: a user scenario characterised by tuning in to a radio or television program • the user can only join or leave a session • Both are prevalent on the Internet today, e.g. • interactive music and video playback • Internet radio • Chapter 14 looks at how these services are provided • Sections 14.2-14.4 cover only media on demand
Overview Section 14.2 • Overview of • Architectures • Protocols • Format issues Section 14.3 • Buffering and timing fundamentals Section 14.4 • How media data is communicated for streaming on demand NOT COVERED - Section 14.5 • Live broadcast
Architectures - 14.2.1 • Streaming media on demand and live broadcast require different architectures Figure 14.1
Streaming media on demand • source media is encoded offline into a media file • streamed using different protocols (Section 14.2.2) • the media file may be specialized to support various modes of streaming (Section 14.2.3) • the client temporarily buffers encoded media in a decoder buffer • and decoded media in a render buffer • the render buffer is fairly short (a frame or two) because decoded frames are large • the user controls the experience through playback commands • play, FF, stop, seek Communication between server & client is tailored to the • client's resources • network connection Figure 14.1a
Progressive downloading • a type of streaming in which media can be streamed faster than playback, i.e. the entire file is downloaded • if the client can decode sequentially • progressive downloading can use simple file transfer protocols • e.g. FTP or HTTP, both over TCP/IP (i.e. from an FTP server or a web server) • if the client buffer is limited • progressive downloading can rely on simple TCP flow control • the client accepts data from TCP only when there is space in the media buffer • popularised by SHOUTcast, an early music streaming service • Requires: network bandwidth > media content bit rate (the source coding rate)
Progressive downloading • must still account for network jitter and temporary interference • want the highest possible source coding rate (not less than the worst-case network bandwidth) • these are the main issues for media on demand and for the communication protocol between client and server • Requires: network bandwidth > media content bit rate (the source coding rate)
Live broadcast • the encoder may be directly connected to the server through an encoder buffer • the encoder buffer holds limited data to maintain a fixed and short end-to-end delay • the server accesses data at the playback point, not at an arbitrary point in a file • this restricts adaptivity, which matters when there are multiple receivers • interactive access to the media is not possible • difficult to adapt the transmission rate to varying clients** • difficult for the server to use retransmission-based error control • due to the negative acknowledgement (NAK) implosion problem • error control becomes a delicate issue for live broadcast **Receiver-driven layered multicast (RLM) allows adaptation of the transmission rate. Also see: S. R. McCanne. Scalable Compression and Transmission of Internet Multicast Video. Ph.D. thesis, University of California, Berkeley, CA, December 1996. S. R. McCanne, V. Jacobson, and M. Vetterli. "Receiver-Driven Layered Multicast," in Proc. ACM SIGCOMM, pages 117–130, Stanford, CA, August 1996.
Protocols - 14.2.2 • streaming on demand requires many protocols at different levels • this section covers a subset of the protocols described in week 2 of this class • RTP: Real-time Transport Protocol • RTSP: Real Time Streaming Protocol • RTCP: RTP Control Protocol • SIP: Session Initiation Protocol
Real Time Streaming Protocol (RTSP) • RFC 2326 At the topmost level: • application-level protocols • protocols for content discovery • connection to a specific streaming media server Content discovery is done "out of band", e.g. http://www.microsoft.com/directory/contentname.asx http://www.realnetworks.com/directory/contentname.ram http://www.apple.com/directory/contentname.mov • the URL points to a metadata file on a web server, which references the separate media file • the file type differs by system: asx, ram, mov The client contacts the server using the URL for the content, e.g. rtsp://wms.microsoft.com/directory/contentname.wmv rtsp://helixserver.example.com/audio1.rm?start=55&end=1:25 rtsp://qtserver.apple.com/directory/contentname.mov • Prefix (URL scheme): indicates the streaming protocol used • Suffix (query string): information for the server, e.g. seek position, play speed, etc.
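The prefix/suffix split described above can be sketched with Python's standard `urllib.parse`. The helper `describe_stream_url` is hypothetical; it only separates the parts, and how a server interprets hints such as `start`/`end` is up to that server.

```python
from urllib.parse import urlsplit, parse_qs


def describe_stream_url(url: str) -> dict:
    """Split a streaming URL into protocol prefix, server, path and suffix hints."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,        # prefix, e.g. "rtsp" or "mms"
        "server": parts.hostname,        # streaming server to contact
        "path": parts.path,              # media file on that server
        "hints": parse_qs(parts.query),  # suffix: options for the server
    }


info = describe_stream_url("rtsp://helixserver.example.com/audio1.rm?start=55&end=1:25")
print(info["protocol"], info["server"], info["path"], info["hints"])
```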
Example of auxiliary file
Microsoft ASX file:
<ASX Version="3.0">
  <ENTRY>
    <REF HREF="mms://streamingmedia/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
  </ENTRY>
  <ENTRY>
    <REF HREF="mms://winmedianw/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
  </ENTRY>
</ASX>
RealNetworks RAM file:
# First URL that opens a related info pane.
rtsp://helixserver.example.com/video3.rm?rpcontextheight=350&rpcontextwidth=300&rpcontexturl="http://www.example.com/relatedinfo2.html"&rpcontexttime=5.5&rpvideofillcolor=rgb(30,60,200)
#
# Second URL that keeps the same related info pane,
# but changes the media playback pane's background color.
rtsp://helixserver.example.com/video4.rm?rpcontexturl=_keep&rpvideofillcolor=red
Figure 14.2
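A client that receives such an auxiliary file must extract the stream URLs before contacting a streaming server. A minimal sketch for the ASX example, assuming the file is well-formed XML with upper-case tag names (real ASX files are not always well-formed, so a production parser would need to be more lenient); `stream_urls_from_asx` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ASX = """<ASX Version="3.0">
  <ENTRY>
    <REF HREF="mms://streamingmedia/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
  </ENTRY>
  <ENTRY>
    <REF HREF="mms://winmedianw/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
  </ENTRY>
</ASX>"""


def stream_urls_from_asx(text: str) -> list:
    """Return the HREF of every <REF> element, in document order."""
    root = ET.fromstring(text)
    return [ref.get("HREF") for ref in root.iter("REF")]


for url in stream_urls_from_asx(ASX):
    print(url)
```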
Streaming protocol • commands are typically sent reliably over a TCP connection (in many forms) • the Real Time Streaming Protocol (RTSP) is widely adopted (RFC 2326) • the idea is simple, but SET_PARAMETER can be complicated, since • a media file may have multiple audio and video streams for different languages, subtitles, source coding rates, etc.
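RTSP requests are text messages in the style of HTTP: a request line (`METHOD url RTSP/1.0`), a mandatory `CSeq` header, and optional headers such as `Session` and `Range` (RFC 2326). The sketch below only formats a request; the session identifier and the `npt` range are made-up example values, and `rtsp_request` is a hypothetical helper, not a library API.

```python
from typing import Optional


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[dict] = None) -> str:
    """Format an RTSP/1.0 request: request line, headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The header block ends with an empty line (CRLF CRLF).
    return "\r\n".join(lines) + "\r\n\r\n"


# Ask the server to play seconds 55-85 of a session set up earlier.
print(rtsp_request("PLAY", "rtsp://helixserver.example.com/audio1.rm", 3,
                   {"Session": "12345678", "Range": "npt=55-85"}))
```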
Real-time Transport Protocol (RTP) • the client can specify which lower-level data transport protocol to use • data transport is usually either • RTP over UDP, or • RTP over TCP • both are preferred for bandwidth efficiency • RTP over UDP: the application must provide its own transmission rate control and error control • there is no standard means of transmission rate control and error control for RTP • HTTP over TCP may be used to avoid firewall issues
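RTP itself only adds a small header to each media packet: the sequence number lets the receiver detect loss and reordering, and the timestamp carries media timing, but nothing in the header performs rate or error control. A sketch of packing the 12-byte fixed header defined in RFC 3550, assuming version 2 with no padding, extension, or CSRC list; payload type 96 is just an example from the dynamic range.

```python
import struct


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
               marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (RFC 3550, Section 5.1)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    first = 2 << 6                                # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | payload_type    # M bit + payload type
    return struct.pack("!BBHII", first, second,
                       seq & 0xFFFF,              # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,    # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)         # 32-bit source identifier


header = rtp_header(payload_type=96, seq=1000, timestamp=90_000, ssrc=0x1234ABCD)
print(len(header), header.hex())
```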
RTP Control Protocol (RTCP) • RFC 3550 • often used with RTP • receivers typically provide statistical feedback to the sender (receiver reports) • the mix of interoperable and proprietary features limits its use as a standard
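One statistic carried in RTCP receiver reports is the interarrival jitter. The running estimate below follows the algorithm in RFC 3550: J is updated as J + (|D| - J)/16, where D is the change in relative transit time between consecutive packets. Send and arrival times are assumed to be in the same units (RTP timestamp units in practice); the function name is hypothetical.

```python
def interarrival_jitter(send_times, arrival_times):
    """Running RTCP jitter estimate, J += (|D| - J) / 16 (RFC 3550).

    D is the difference in relative transit time (arrival - send)
    between consecutive packets; both lists use the same time units.
    """
    jitter = 0.0
    prev_transit = None
    for sent, arrived in zip(send_times, arrival_times):
        transit = arrived - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16
        prev_transit = transit
    return jitter


# Packets sent 10 units apart; the second one is delayed by 16 extra units.
print(interarrival_jitter([0, 10], [0, 26]))  # 1.0
```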
Windows Media system • RTP over UDP • the transmission rate is normally controlled to match the source coding rate of the content • the client can detect congestion • and signal the server to switch to a lower or higher source coding rate
Alternative methods of transmission rate control 1) TFRC: TCP-friendly rate control 2) TCP-like congestion control algorithm • both are being standardised as two profiles of the Datagram Congestion Control Protocol (DCCP) • must be paired with a source coding rate control algorithm so that the coding rate matches the transmission rate • e.g. the rate-distortion optimised (RaDiO) scheduling algorithm • error control in Windows Media uses selective retransmission • a gap in the received packets triggers a NAK (negative acknowledgement) to the server, causing retransmission • audio has higher priority than video • Windows Media Player stalls if audio packets are missing and waits for them to arrive
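TFRC computes its allowed sending rate from the TCP throughput equation (RFC 5348, Section 3.1), so that a UDP flow gets roughly what a TCP flow would under the same loss and delay. A sketch of that equation with the RFC's recommended simplifications b = 1 and t_RTO = 4·RTT; per the slide, the source coding rate would then have to be chosen to fit under this rate.

```python
import math


def tfrc_rate(s: float, rtt: float, p: float, b: float = 1.0) -> float:
    """TCP throughput equation used by TFRC (RFC 5348).

    s: segment size in bytes, rtt: round-trip time in seconds,
    p: loss event rate (0 < p <= 1), b: packets acknowledged per ACK.
    Returns the allowed sending rate in bytes per second.
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    t_rto = 4 * rtt  # simplification recommended by RFC 5348
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom


rate = tfrc_rate(s=1460, rtt=0.1, p=0.01)
print(f"{8 * rate / 1e6:.2f} Mbit/s")
```

Higher loss or longer round-trip time both lower the allowed rate, which is what forces the source coding rate down under congestion.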
File formats - 14.2.3 Challenging to adapt a fixed media file to varying network and client conditions • encoding must be done before streaming (with no knowledge of the context) • so build flexibility into the media file Unrealistic to: • compress or transcode to the needs of every client • the better way is to allow the server to select which parts of the file to stream
Some streaming formats The major players • MPEG-4 format • QuickTime format (the basis of MPEG-4) • RealMedia format • Microsoft Advanced Streaming Format (ASF) All can contain/multiplex multiple media and multiple versions of each medium • each recorded in a track (MPEG-4/QT) or stream (ASF) • data units: made of chunks (MPEG-4/QT) or packets (ASF)
Streaming formats • each has a header containing metadata about the overall file and specific tracks or streams • title, author, date, encryption, rights management, table of contents, track/stream enumeration and descriptions • information on individual track/stream properties • start time, duration, bit rate, buffer size, sampling rate, picture size, scalability capabilities • time-varying metadata can be associated with each track/stream • network packetisation, decoding and presentation timestamps, SMPTE time codes, key frames, switch frames • two types of metadata • static metadata: size independent of the length of the data, inexpensive to transmit over the network • time-varying metadata: size grows with the data, expensive to transmit
Streaming formats • … • provide a structure that lets the server select which parts of the data to transmit, either • coarse grained: the server streams only a particular subset of the streams to the client • fine grained: in addition, a fraction of the data within a stream can be chosen • a Lagrange multiplier parameter can be set that determines which data units are not transmitted
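With fine-grained selection, a Lagrange multiplier λ trades distortion against bits: a data unit is worth sending only if the distortion it removes exceeds λ times its size. The sketch below assumes data units are independent (real streams have decoding dependencies, which schedulers such as RaDiO account for); the unit sizes and distortion values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataUnit:
    name: str
    bits: int                # cost of transmitting the unit
    distortion_drop: float   # distortion removed if the unit is received


def select_units(units, lam):
    """Keep units with distortion_drop > lam * bits.

    For independent units this minimises D + lam * R, because each unit
    adds lam * bits - distortion_drop to that cost when it is sent.
    """
    return [u for u in units if u.distortion_drop - lam * u.bits > 0]


units = [DataUnit("I0", 40_000, 100.0),
         DataUnit("P3", 20_000, 30.0),
         DataUnit("B1", 8_000, 4.0)]
for lam in (1e-4, 1e-3, 1.0):
    print(lam, [u.name for u in select_units(units, lam)])
```

Raising λ makes bits more expensive, so fewer units survive; the server can tune λ to hit a target rate.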
Encoding media into a stream Two methods 1) Multibit rate (MBR) • multiple independent encodings (each at a different coding rate) are stored in separate streams (in the same file) • choice of which streams to play 2) Scalable coding • covered later in Section 14.3.3
Data units • use packets • e.g. H.264/AVC uses Network Abstraction Layer (NAL) units • In general, file formats designed for local playback/storage are not suitable for streaming • hard for the server to choose the right portions of the file to stream • difficult to randomly access (seek to) arbitrary points in the stream
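In H.264/AVC each NAL unit starts with a one-byte header: a forbidden zero bit, a 2-bit `nal_ref_idc`, and a 5-bit `nal_unit_type`. A server can inspect it without decoding the payload; for instance, a NAL unit with `nal_ref_idc` = 0 is not used as a reference, which makes it a natural candidate to drop when thinning a stream. A minimal parser sketch; the example bytes are typical header values, not taken from the source, and the function names are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int     # 0 => not used as a reference
    nal_unit_type: int   # e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte H.264/AVC NAL unit header into its fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x1,
        nal_ref_idc=(first_byte >> 5) & 0x3,
        nal_unit_type=first_byte & 0x1F,
    )


def droppable(first_byte: int) -> bool:
    """True if no other picture uses this NAL unit as a reference."""
    return parse_nal_header(first_byte).nal_ref_idc == 0


for b in (0x67, 0x65, 0x41, 0x01):
    print(hex(b), parse_nal_header(b), droppable(b))
```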
Fundamental abstractions - 14.3 Fundamental abstractions of streaming media on demand (Section 14.3) • Section covers • leaky bucket models of bit streams • constant bit rate (CBR) vs. variable bit rate (VBR) • compound (multiple media) streams • preroll delay • playback speed • timing • clocks • decoding and presentation timestamps • Goal: know when it is safe for the client to begin playback
Buffering and leaky bucket models Scenario 1 - constant bit rate (CBR) • isochronous** noiseless communication channel • encoder buffer between the encoder and the channel • decoder buffer between the channel and the decoder • schedule: the sequence of times at which successive bits of an encoded bit stream pass a given point in the pipeline **isochronous: equal amounts of data are communicated in equal amounts of time Figures 14.3 and 14.4: B bits = encoder buffer + decoder buffer
Buffer tube • the previous scenario can be viewed as a buffer tube • characterised by 3 parameters • R: slope • B: height in bits • Fe: offset (fullness) from the bottom of the tube • or by Fd: offset from the top of the tube • Fd = B - Fe • From a buffer point of view • overflow of the encoder buffer => decoder buffer underflow • underflow of the encoder buffer => decoder buffer overflow • B = encoder buffer + decoder buffer • Fe: initial fullness of the encoder buffer • managed by a rate control algorithm • assigns a number of bits b(n) to each frame n
Buffer tube • managed by a rate control algorithm • assigns a number of bits b(n) to each frame n • B = encoder buffer + decoder buffer • Fe: initial fullness of the encoder buffer • De = Fe/R: initial delay before data enters the channel • Dd = Fd/R: delay between data leaving the channel and being extracted by the decoder • (R,B,F) tube • aim to keep the decoder buffer delay Dd = Fd/R low Figure 14.5
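The tube relations are simple arithmetic; note that De + Dd = (Fe + Fd)/R = B/R, so for a given tube the total buffering delay is fixed and Fe only splits it between encoder and decoder. A sketch with made-up numbers (`tube_delays` is a hypothetical helper):

```python
def tube_delays(R: float, B: float, Fe: float):
    """Return (Fd, De, Dd) for an (R, B, Fe) buffer tube.

    R in bits/s, B and Fe in bits; delays in seconds.
    """
    if not 0 <= Fe <= B:
        raise ValueError("initial fullness must lie inside the tube")
    Fd = B - Fe      # offset from the top of the tube
    De = Fe / R      # delay before data enters the channel
    Dd = Fd / R      # decoder buffer delay
    return Fd, De, Dd


Fd, De, Dd = tube_delays(R=500_000, B=1_000_000, Fe=200_000)
print(Fd, De, Dd, De + Dd)   # De + Dd is B / R (2 s here)
```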
Variable bit rate stream (VBR) Scenario 2 - variable bit rate stream (VBR) • unlike CBR, VBR has a variable amount of data per time segment • higher bit rate for complex segments • lower bit rate for less complex segments • tends to need wider buffer tubes => larger start-up delay • part of an overall problem: it is difficult to determine the average bit rate of the stream
Variable bit rate stream (VBR) • recall the (R,B,F) tube • the parameters are not unique for a given bit stream • defining the average rate is non-trivial, e.g. • fit the closest slope along the staircase (the cumulative bit count), or • number of bits in the stream / duration of the stream
Variable bit rate • the encoder does not use the channel continuously • the channel has a peak transmission rate R higher than the average stream bit rate • when needed, it sends packets at rate R • otherwise at 0 • typical of packet networks and shared channels • best modelled by a leaky bucket defined by (R, B, Fe) • n: frame number • b(n): number of bits placed in the leaky bucket for frame n • τ(n): time at which frame n is processed • R: rate at which bits leak out of the bucket • Fe(n): fullness of the encoder buffer before frame n is added • Be(n): fullness of the encoder buffer after frame n is added • has a schedule
Leaky bucket • Be(n): fullness of the encoder buffer after frame n is added to the bucket • Fe(n): fullness of the encoder buffer before frame n is added to the bucket • no overflow: Be(n) ≤ B for all n = 0, 1, …, N • the aim is to find the smallest decoder buffer size and the smallest decoder buffer delay
Leaky bucket For a given stream, define: • the minimum bucket capacity with leak rate R and given initial fullness Fe: $B_{\min}(R, F_e) = \max_n B_e(n)$ • initial decoder buffer fullness • it follows that there is a minimum capacity B as well as a minimum decoder buffer delay Dd = Fd/R, provided the bucket starts with initial fullness Fe = Femin(R) • source coding rate (Rc): the maximum leak rate R such that a leaky bucket (R, B, Fe) does not underflow with initial fullness Fe = Femin(R) • larger leak rates R => smaller required capacity
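The leaky bucket recursion can be simulated directly. The sketch assumes the usual leaky bucket update (as used in the generalized hypothetical reference decoder of H.264/AVC), which the slides do not spell out: Be(n) = Fe(n) + b(n) and Fe(n+1) = max{0, Be(n) - R(τ(n+1) - τ(n))}. Bmin(R, Fe) is then the largest Be(n). Frame sizes and times are invented; times are in milliseconds so the arithmetic stays exact, and the function names are hypothetical.

```python
def leaky_bucket(bits, tau_ms, R, Fe=0.0):
    """Be(n) for every frame of a leaky bucket with leak rate R (bits/s).

    bits[n] is b(n); tau_ms[n] is tau(n) in milliseconds; Fe is the
    initial fullness Fe(0) in bits.
    """
    fullness = Fe
    after = []
    for n, b in enumerate(bits):
        level = fullness + b                          # Be(n) = Fe(n) + b(n)
        after.append(level)
        if n + 1 < len(bits):
            drained = R * (tau_ms[n + 1] - tau_ms[n]) / 1000
            fullness = max(0.0, level - drained)      # Fe(n+1)
    return after


def b_min(bits, tau_ms, R, Fe=0.0):
    """Smallest capacity that never overflows: max over n of Be(n)."""
    return max(leaky_bucket(bits, tau_ms, R, Fe))


bits = [4000, 1000, 1000, 4000]   # invented frame sizes b(n)
tau_ms = [0, 100, 200, 300]       # 10 frames per second
for R in (10_000, 20_000):
    print(R, b_min(bits, tau_ms, R))   # larger R needs a smaller bucket
```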
Leaky bucket • if the transmission rate R > the source coding rate Rc • the decoder buffer size is reduced • the decoder buffer delay is also reduced • the client can determine the required buffer size and preroll delay • using the functions Bmin(R) and Fdmin(R) • computed offline at a set of transmission rates R1 < R2 < · · · < RL • stored in the bit stream header as a set of leaky bucket parameters (Ri, Bi, Fi) • where Bi = Bmin(Ri) and Fi = Fdmin(Ri) • each i = 1, …, L indexes a breakpoint of the piecewise linear functions Bmin(R) and Fdmin(R) • by linear interpolation (and extrapolation at the ends), Bmin(R) and Fdmin(R) can be estimated at any rate R Figure 14.7
Leaky bucket Linear interpolation of Bmin(R) and Fdmin(R)
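Given the breakpoints (Ri, Bi, Fi) from the header, a client can estimate Bmin(R) and Fdmin(R) at its actual rate, and from Fdmin(R) the preroll delay Fdmin(R)/R. A sketch with a hypothetical three-entry table; clamping extrapolated values at zero is an added assumption, since a buffer size cannot be negative.

```python
from bisect import bisect_left

# Hypothetical leaky bucket table (Ri in bits/s, Bi and Fi in bits), sorted by Ri.
RATES = [250_000, 500_000, 1_000_000]
B_MIN = [4_000_000, 2_000_000, 1_000_000]
FD_MIN = [3_000_000, 1_500_000, 500_000]


def interpolate(rates, values, R):
    """Piecewise-linear estimate at rate R; extrapolates from the end segments."""
    i = bisect_left(rates, R)
    i = min(max(i, 1), len(rates) - 1)   # segment [rates[i-1], rates[i]]
    r0, r1 = rates[i - 1], rates[i]
    v0, v1 = values[i - 1], values[i]
    estimate = v0 + (v1 - v0) * (R - r0) / (r1 - r0)
    return max(0.0, estimate)            # assumption: never below zero


def preroll_seconds(R):
    """Preroll delay Fdmin(R) / R at transmission rate R."""
    return interpolate(RATES, FD_MIN, R) / R


R = 750_000
print(interpolate(RATES, B_MIN, R), preroll_seconds(R))
```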
Compound streams (Section 14.3.2) • compound streams encapsulate many streams meant to be played and streamed concurrently • view them as a single compound stream with a set of leaky buckets • a compound leaky bucket (R,B,F) is the sum of its component leaky buckets • e.g. if audio has bucket (Ra,Ba,Fa) and video has bucket (Rv,Bv,Fv), the parameters sum: • R = Ra + Rv • B = Ba + Bv • F = Fa + Fv • find a combination of component leaky buckets such that the combined leaky bucket won't overflow
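Summing component buckets is component-wise addition; a sketch with a hypothetical `LeakyBucket` type and invented audio and video parameters:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakyBucket:
    R: float   # leak rate, bits/s
    B: float   # capacity, bits
    F: float   # initial fullness, bits

    def __add__(self, other: "LeakyBucket") -> "LeakyBucket":
        # Component-wise sum, as in the audio + video example.
        return LeakyBucket(self.R + other.R, self.B + other.B, self.F + other.F)


audio = LeakyBucket(R=64_000, B=128_000, F=32_000)
video = LeakyBucket(R=936_000, B=2_000_000, F=500_000)
print(audio + video)
```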
Compound streams • find a combination of component leaky buckets such that the combined leaky bucket won't overflow • i.e. a combination of i ∈ {1, …, La} and j ∈ {1, …, Lv} • minimising with a Lagrangian shows that at most La + Lv index pairs lie on the optimal set • this extends to M concurrent media streams
Multibit rate (MBR) • multiple independent encodings (each at a different coding rate) are stored in separate streams (in the same file) • choice of which streams to play • streams are mutually independent, each at a different source coding rate • consider all combinations of mutually exclusive streams (e.g. Na audio and Nv video streams), each combination with a different leaky bucket • most of the Na × Nv combinations are unlikely to be useful; typically only about Na + Nv are • use a distortion-rate approach
Distortion-rate approach Decide which streams to pair • assign a distortion $D^a_i$ and source coding rate $R^a_i$ to each audio stream i = 0, …, Na • assign a distortion $D^v_j$ and source coding rate $R^v_j$ to each video stream j = 0, …, Nv • for each combined stream (i,j), define the distortion and source coding rate $D_{ij} = \alpha D^a_i + D^v_j$ and $R_{ij} = R^a_i + R^v_j$ • where α is an arbitrary weight of audio distortion relative to video distortion • using a Lagrangian again, we can find the combination with the lowest total distortion among all combinations with the same or lower total bit rate • this extends to other sets of media
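Assuming the combined distortion and rate are additive, D_ij = αD^a_i + D^v_j and R_ij = R^a_i + R^v_j, the Lagrangian cost D_ij + λR_ij separates into an audio term and a video term. Minimising it over all Na × Nv pairs therefore gives the same answer as minimising each term on its own, which is why sweeping λ visits only about Na + Nv pairs. A sketch that checks this on invented (distortion, rate) values:

```python
from itertools import product

# Hypothetical (distortion, rate in kbit/s) for each encoding.
AUDIO = [(10.0, 32), (6.0, 64), (4.0, 128)]
VIDEO = [(50.0, 300), (30.0, 700), (20.0, 1500)]
ALPHA = 2.0   # weight of audio distortion relative to video distortion


def best_pair_bruteforce(lam):
    """Minimise alpha*Da + Dv + lam*(Ra + Rv) over all audio/video pairs."""
    def cost(pair):
        (da, ra), (dv, rv) = AUDIO[pair[0]], VIDEO[pair[1]]
        return ALPHA * da + dv + lam * (ra + rv)
    return min(product(range(len(AUDIO)), range(len(VIDEO))), key=cost)


def best_pair_separable(lam):
    """Same minimum, found by optimising audio and video independently."""
    i = min(range(len(AUDIO)), key=lambda k: ALPHA * AUDIO[k][0] + lam * AUDIO[k][1])
    j = min(range(len(VIDEO)), key=lambda k: VIDEO[k][0] + lam * VIDEO[k][1])
    return i, j


for lam in (0.01, 0.04, 0.1):
    print(lam, best_pair_bruteforce(lam), best_pair_separable(lam))
```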
Temporal coordinate systems and timestamps (Section 14.3.4) • each frame has a decoding timestamp (DTS) (MPEG terminology) • instructs the client when to decode it • also acts as a decoding deadline • a presentation buffer holds decoded frames before the renderer • each frame is assigned a presentation timestamp (PTS), which instructs when to play it • critical for synchronising different streams • PTS operate at a layer above the DTS • note that presentation order ≠ decoding order • e.g. I0, B1, B2, P3, B4, B5, P6, ... (presentation order) I0, P3, B1, B2, P6, B4, B5, ... (decoding order) • assumes frames are timestamped with both DTS and PTS • the book uses only DTS
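Reordering decoded frames for display amounts to sorting by PTS. A sketch using the frame pattern from the slide, where each label carries its presentation index and the list order stands in for DTS order; P3 is decoded before B1 and B2 because they are predicted from it.

```python
# Frames in decoding (DTS) order, as (label, pts) with pts in frame periods.
DECODING_ORDER = [("I0", 0), ("P3", 3), ("B1", 1), ("B2", 2),
                  ("P6", 6), ("B4", 4), ("B5", 5)]


def presentation_order(frames):
    """Return frame labels in display order, i.e. sorted by PTS."""
    return [label for label, pts in sorted(frames, key=lambda f: f[1])]


print(presentation_order(DECODING_ORDER))
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```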
Clocks (temporal coordinate systems) • media time τ: clock of the device used to capture and timestamp the original content (real time) • client time t: clock of the device playing the content, e.g. • τDTS(0), τDTS(1), etc. • tDTS(0), tDTS(1), etc. Conversion is done by $t = t_0 + (\tau - \tau_0)/v$ where • v is the playback rate (v = 2 => playing at 2x speed) • t0 and τ0 are the client and media times of a common initial event (e.g. the first frame after seeking or rebuffering)
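The conversion between media time and client time is linear. A sketch of both directions, assuming the mapping t = t0 + (τ - τ0)/v (so that v = 2 plays content twice as fast); the seek time and clock values are invented and the function names are hypothetical.

```python
def media_to_client(tau: float, tau0: float, t0: float, v: float) -> float:
    """Client time at which media time tau is reached: t0 + (tau - tau0) / v."""
    if v <= 0:
        raise ValueError("playback rate must be positive")
    return t0 + (tau - tau0) / v


def client_to_media(t: float, tau0: float, t0: float, v: float) -> float:
    """Inverse mapping: tau0 + v * (t - t0)."""
    return tau0 + v * (t - t0)


# After seeking to media time 60 s at client time 1000 s, play at 2x:
t_dts = media_to_client(70.0, tau0=60.0, t0=1000.0, v=2.0)
print(t_dts)                                                # 1005.0
print(client_to_media(t_dts, tau0=60.0, t0=1000.0, v=2.0))  # 70.0
```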
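Leaky bucket update • in client time, the leaky bucket update becomes $F_e(n+1) = \max\{0,\; B_e(n) - R'\,(t(n+1) - t(n))\} = \max\{0,\; B_e(n) - (R'/v)\,(\tau(n+1) - \tau(n))\}$ where • R´ = Rv is the arrival rate of bits at the client (unit: bits per unit of client time) • R = R´/v is the rate that must be used to compute the required buffer size Bmin(R) and the initial decoder buffer fullness Fdmin(R) • preroll delay is Fdmin(R)/R´ = Fdmin(R)/(Rv) • for a given R, larger playback speed => smaller preroll delay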
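The claim that the bucket can be evaluated with R = R'/v in media time can be checked numerically: draining at R' over client-time intervals removes the same number of bits as draining at R'/v over the corresponding media-time intervals. A sketch using exact fractions, with an invented frame sequence at 10 frames/s played at v = 2; it assumes the same max{0, ·} leaky bucket update as the encoder-side model.

```python
from fractions import Fraction


def bucket_levels(bits, times, rate):
    """Be(n) for a leaky bucket that drains at `rate` between frame times."""
    levels, fullness = [], Fraction(0)
    for n, b in enumerate(bits):
        level = fullness + b                              # Be(n) = Fe(n) + b(n)
        levels.append(level)
        if n + 1 < len(bits):
            drained = rate * (times[n + 1] - times[n])
            fullness = max(Fraction(0), level - drained)  # Fe(n+1)
    return levels


bits = [4000, 1000, 1000, 4000]
tau = [Fraction(n, 10) for n in range(len(bits))]   # media time, 10 frames/s
v = Fraction(2)                                     # playback at 2x
R_client = Fraction(40_000)                         # R' in bits per client second
t = [(x - tau[0]) / v for x in tau]                 # client time with t0 = 0

client_view = bucket_levels(bits, t, R_client)
media_view = bucket_levels(bits, tau, R_client / v)
print(client_view == media_view, [int(x) for x in media_view])
```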
Packet networks - 14.4 • Rc: source coding rate • Rs: sending rate, the rate at which data is injected into the transport layer • measured in bits/s of client time • Rx: transmission rate, the rate at which the transport layer (TCP or UDP) injects data into the network layer • Rx - Rs = error control overhead • Rs / Rx = channel coding rate • Ra: arrival rate • assumed equal to Rs • usually set to Ra = vRc • decoupling Rc and Ra has advantages Figure 14.8a
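The rate relations on this slide, in a short sketch with invented numbers (500 kbit/s content, 10% error control overhead). It assumes, as the slide does, that the arrival rate equals the sending rate and that Ra = vRc; `rate_budget` is a hypothetical helper.

```python
def rate_budget(Rc: float, v: float, Rx: float) -> dict:
    """Relate source, sending, arrival and transmission rates (bits/s of client time)."""
    Ra = v * Rc        # arrival rate matched to playback: Ra = v * Rc
    Rs = Ra            # sending rate assumed equal to the arrival rate
    if Rx < Rs:
        raise ValueError("transmission rate cannot be below the sending rate")
    return {
        "Ra": Ra,
        "Rs": Rs,
        "error_control_overhead": Rx - Rs,   # Rx - Rs
        "channel_coding_rate": Rs / Rx,      # Rs / Rx
    }


print(rate_budget(Rc=500_000, v=1.0, Rx=550_000))
```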
Decoupling Ra = vRc • adjusting the source coding rate is the problem of source coding rate control • choose Rc as a function of Ra • and of the client buffer duration and its history • have a variety of average bit rates R(1), R(2), … • each with a tight buffer tube (R(i),B(i),Fe(i)) • can delay playback to guarantee continuous playback
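With several average bit rates R(1), R(2), … available, the simplest policy is to pick the highest one the arrival rate can sustain at the current playback speed. This threshold rule is only a baseline sketch, not the control-theoretic controller of Section 14.4.2: it ignores the client buffer duration and its history, which that controller also uses. The rates and the name `pick_rate` are invented.

```python
def pick_rate(available, Ra, v=1.0):
    """Highest R(i) with v * R(i) <= Ra; fall back to the lowest rate."""
    affordable = [r for r in sorted(available) if v * r <= Ra]
    return affordable[-1] if affordable else min(available)


RATES = [300_000, 700_000, 1_500_000]
print(pick_rate(RATES, Ra=1_000_000))          # 700000
print(pick_rate(RATES, Ra=1_000_000, v=2.0))   # 300000
print(pick_rate(RATES, Ra=100_000))            # 300000 (nothing fits)
```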
Control theoretic model - 14.4.2.1 • client buffer duration: the gap between frame n's arrival time ta(n) and its playback deadline td(n) • overflow when the gap is too large • underflow when the gap is too small • if the gap shrinks, Rc must be reduced to adjust tb(n) Figure 14.9
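The client buffer duration per frame is just td(n) - ta(n); watching it shrink is the cue to lower Rc. A sketch with invented arrival times and deadlines in milliseconds; the overflow limit `max_gap_ms` is a hypothetical parameter standing in for the client's buffer capacity.

```python
def buffer_durations(arrival_ms, deadline_ms):
    """Gap td(n) - ta(n) for each frame, in milliseconds."""
    return [td - ta for ta, td in zip(arrival_ms, deadline_ms)]


def classify(arrival_ms, deadline_ms, max_gap_ms):
    """Label each frame 'underflow', 'overflow' or 'ok' from its gap."""
    labels = []
    for gap in buffer_durations(arrival_ms, deadline_ms):
        if gap < 0:
            labels.append("underflow")   # frame arrives after its deadline
        elif gap > max_gap_ms:
            labels.append("overflow")    # more buffered than the client can hold
        else:
            labels.append("ok")
    return labels


arrival = [500, 600, 900, 1400, 2500]
deadline = [2000, 2100, 2200, 2300, 2400]
print(buffer_durations(arrival, deadline))   # [1500, 1500, 1300, 900, -100]
print(classify(arrival, deadline, max_gap_ms=1400))
```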
Control objective - 14.4.2.2 • underflow is prevented by controlling the gap (previous section) • quality fluctuates with the complexity of the content • the target schedule has a margin of safety • penalties in the cost function for • deviation of the buffer tube from the target schedule • coding rate difference between successive frames
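One way to write such a cost, assuming quadratic penalties (the slide names the two terms but not their form, and the weight `sigma` is a hypothetical tuning parameter): J = Σ(tb(n) - tT(n))² + σ Σ(r(n) - r(n-1))². A sketch that evaluates it for a candidate rate sequence:

```python
def schedule_cost(tb, tT, rates, sigma):
    """Quadratic penalty on schedule deviation plus rate changes.

    tb: times of the buffer tube's upper bound, tT: target schedule,
    rates: source coding rate per frame, sigma: smoothness weight
    (it also absorbs the difference in units between the two terms).
    """
    deviation = sum((b - t) ** 2 for b, t in zip(tb, tT))
    smoothness = sum((r1 - r0) ** 2 for r0, r1 in zip(rates, rates[1:]))
    return deviation + sigma * smoothness


print(schedule_cost(tb=[1, 2, 3], tT=[1, 2, 4], rates=[10, 10, 12], sigma=0.5))  # 3.0
```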
Target schedule design - 14.4.2.3 • want the smallest client buffer duration at start-up • start with a small delay and increase the gap over time • the slope is the ratio of the average source coding rate to the average arrival rate • if the upper bound of the buffer tube aligns with the target schedule • tb(n) = tT(n) Eventually want logarithmic growth of the buffer duration Figure 14.10
Controller design - 14.4.2.4 • adjusts the source coding rate • at time n, the controller sets the coding rate of frame n+2 • uses an error e(n) and a vector feedback gain G • the optimal gain G* is solved for
Controller interpretation - 14.4.2.6 • a virtual frame rate is used to reduce the feedback rate, and because it is difficult to specify a frame rate for merged streams • start with a source coding rate of 1/2 the arrival rate to build up the client buffer duration Figure 14.11a
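Why half the arrival rate builds the buffer: at playback rate v, each client second delivers Ra/Rc media seconds of content while playback consumes v media seconds, so the buffer grows by Ra/Rc - v media seconds per client second. With Rc = Ra/2 and v = 1 that is one extra second of buffer per second of playback. A sketch with invented rates, assuming a constant Ra; the function name is hypothetical.

```python
def buffered_media_seconds(Ra, Rc, v, elapsed, start=0.0):
    """Media seconds in the client buffer after `elapsed` client seconds of playback."""
    growth = Ra / Rc - v    # media seconds gained per client second
    return start + growth * elapsed


print(buffered_media_seconds(Ra=1_000_000, Rc=500_000, v=1.0, elapsed=10))    # 10.0
print(buffered_media_seconds(Ra=1_000_000, Rc=1_000_000, v=1.0, elapsed=10))  # 0.0
```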