This article discusses RTP packet headers, timestamp and sequence number in RTP, playout delay compensation, and the implementation of playout buffers to handle jitter and reordering. It also explores methods to predict future late loss and excessive delay.
RTP and playout delay compensation
Henning Schulzrinne
Dept. of Computer Science, Columbia University
Fall 2003
RTP packet header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
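To make the bit layout concrete, here is a minimal parsing sketch. The names `RtpHeader` and `parse_rtp_header` are illustrative and not part of any library; the sketch reads only the fixed 12-byte header plus the CSRC list and ignores header extensions and padding.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the fixed RTP header (hypothetical container type)."""
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed header and CSRC list from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CSRC count, 0..15
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

A receiver would typically check that `version` is 2 before trusting the remaining fields.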
RTP: timestamp
• Timestamp is measured in sample units
  • reflects the nominal sampling time of the first sample in the packet
  • e.g., a 20 ms block of 8,000 Hz audio → 160 timestamp units per packet
• Video always uses a 90 kHz clock
  • e.g., 3000 timestamp units per frame at 30 fps
  • 3600 at 25 fps
  • 3750 at 24 fps
• Timestamps advance at the nominal rate even if the real system clock is slower or faster
• Note: the 32-bit integer may wrap around
  • if starting at 0, after about 6 days for 8 kHz audio, ½ day for video
  • but the starting value is supposed to be random
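The arithmetic behind these numbers can be checked with a short sketch. The function names are illustrative; the clock rates and intervals are the ones quoted above.

```python
def ticks_per_packet(clock_rate_hz: int, packet_ms: int) -> int:
    """Timestamp increment for an audio packet of packet_ms milliseconds."""
    return clock_rate_hz * packet_ms // 1000


def ticks_per_frame(clock_rate_hz: int, fps: int) -> int:
    """Timestamp increment between video frames at an integer frame rate."""
    return clock_rate_hz // fps


def wrap_seconds(clock_rate_hz: int) -> float:
    """Time until a 32-bit timestamp that starts at 0 wraps around."""
    return 2**32 / clock_rate_hz


audio_step = ticks_per_packet(8000, 20)         # 20 ms of 8 kHz audio
video_steps = [ticks_per_frame(90000, f) for f in (30, 25, 24)]
audio_wrap_days = wrap_seconds(8000) / 86400    # roughly 6.2 days
video_wrap_hours = wrap_seconds(90000) / 3600   # roughly 13.3 hours
```

Because the real starting value is random, a receiver cannot assume the wrap happens that long after the first packet; it may come at any time.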
RTP sequence number
• Counts packets actually sent
• Wraps around much quicker (only 16 bits)
  • e.g., for 20 ms packets, in about 22 minutes
• Also uses a random starting value
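Since the 16-bit counter wraps so quickly, comparisons have to be done modulo 2^16. The sketch below shows the wrap time and one common way to compute a signed, wrap-aware difference; `seq_delta` is an illustrative helper, not a standard API.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def seq_wrap_minutes(packet_ms: int) -> float:
    """Minutes until the sequence number wraps at one packet per packet_ms."""
    return SEQ_MOD * packet_ms / 1000 / 60


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a, in the range -32768..32767.

    Positive means a is newer than b, even across a wrap-around.
    """
    return ((a - b + SEQ_MOD // 2) % SEQ_MOD) - SEQ_MOD // 2


wrap_20ms = seq_wrap_minutes(20)   # about 21.8 minutes
across_wrap = seq_delta(2, 65534)  # 4: packet 2 follows 65534 after a wrap
reordered = seq_delta(65534, 2)    # -4: an old packet arriving late
```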
RTP timestamp vs. sequence number
• Related, but different purposes
• timestamp for timing reconstruction:
  • playout delay compensation (later)
  • synchronization with other sources (later)
• sequence number for loss measurements and gap detection
• t = s*b + c
  • where t = timestamp
  • s = sequence number
  • b = sample units per packet
  • offset c is constant within a talkspurt, but changes after each talkspurt or after a transmission gap
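The relation t = s*b + c suggests a simple receiver-side check, sketched below. It reads s as the sequence number and b as the timestamp units per packet (an interpretation of the slide), and it assumes packets are compared in order with no duplicates. The function name is hypothetical.

```python
def compare_packets(prev_seq, prev_ts, cur_seq, cur_ts, b):
    """Return (lost, offset_changed) for two packets received in order.

    lost           -- packets missing between the two (sequence-number gap)
    offset_changed -- True if the timestamp advanced by something other
                      than b per packet, i.e. the offset c changed
                      (new talkspurt or transmission gap)
    """
    ds = (cur_seq - prev_seq) % (1 << 16)  # sequence numbers wrap at 2^16
    dt = (cur_ts - prev_ts) % (1 << 32)    # timestamps wrap at 2^32
    lost = max(ds - 1, 0)
    offset_changed = dt != ds * b
    return lost, offset_changed


# 20 ms packets of 8 kHz audio: b = 160 timestamp units per packet
same_spurt = compare_packets(10, 1600, 11, 1760, 160)
two_lost = compare_packets(10, 1600, 13, 2080, 160)
new_spurt = compare_packets(10, 1600, 11, 5000, 160)
```

In the last case the sequence number advances by one but the timestamp jumps, which is how a silence period shows up without being mistaken for loss.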
Playout delay
• Converts variable network delay (“jitter”) into fixed delay
  • thus, end-to-end delay is max(jitter) + propagation delay
  • or, if willing to tolerate some late packets:
  • e.g., delay = 95th percentile of jitter + propagation delay (about 5% of packets arrive late)
• Propagation delay is invisible to the receiver
  • and hard to measure without synchronized clocks
  • about 5 ms/1000 km one way
• Total delay should be less than 150 ms one-way
• End-to-end delay must remain constant within a talkspurt
  • otherwise gaps
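The trade-off between delay and late packets can be illustrated with a small sketch that picks a playout delay from a set of observed delays. The function name and the sample data are hypothetical; a real receiver only sees relative delays (arrival time minus timestamp), which is enough for this purpose.

```python
def playout_delay(delays_ms, late_percent=5):
    """Smallest delay that lets at most late_percent % of packets be late.

    With late_percent=0 this is simply the maximum observed delay.
    """
    ordered = sorted(delays_ms)
    allowed_late = len(ordered) * late_percent // 100
    return ordered[len(ordered) - allowed_late - 1]


# Hypothetical sample: 100 packets with delays of 1..100 ms
observed = list(range(1, 101))
no_loss = playout_delay(observed, late_percent=0)
five_percent = playout_delay(observed, late_percent=5)
late = sum(1 for d in observed if d > five_percent)
```

Accepting a few late packets shortens the fixed delay, which helps stay under the 150 ms one-way budget.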
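Playout delay
[Figure: packets arriving with variable jitter along a time axis and held for a fixed playout delay; a packet that arrives after its playout time is late, which counts as lost.]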
Playout buffer
• Logically infinite buffer
• Implemented as a “circular buffer”, with wrap-around
• Takes care of jitter and re-ordering based on RTP timestamp t
• Playout point p = t*b + c
  • p = buffer position, measured in samples (typically 16 bits each if decoding is done before playout)
  • b = buffer positions per sample (usually 1)
  • c = offset
• Usually, best to think of each talkspurt as an independently schedulable unit
• p = p0 + (t – t0) * b
  • t0 = timestamp of first packet in talkspurt
  • p0 = position for first packet in talkspurt
[Figure: playout buffer fed by a decoder (G.729, L16); labels: silence, decoder.]
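A minimal sketch of such a buffer follows, assuming decoded samples are stored as plain integers and 0 stands for silence. The class and method names are illustrative. Positions follow p = p0 + (t – t0) * b, reduced modulo the buffer size.

```python
class PlayoutBuffer:
    """Circular sample buffer addressed by RTP timestamp (illustrative)."""

    def __init__(self, size: int, b: int = 1) -> None:
        self.samples = [0] * size  # 0 stands for silence
        self.size = size
        self.b = b                 # buffer positions per sample
        self.t0 = 0
        self.p0 = 0

    def start_talkspurt(self, t0: int, p0: int) -> None:
        """Fix the insertion point p0 for the first packet (timestamp t0)."""
        self.t0, self.p0 = t0, p0

    def position(self, t: int) -> int:
        # Signed 32-bit difference, so timestamp wrap-around and packets
        # that arrive out of order are both handled.
        dt = ((t - self.t0 + 2**31) % 2**32) - 2**31
        return (self.p0 + dt * self.b) % self.size

    def insert(self, t: int, decoded: list) -> None:
        """Place decoded samples at the position given by timestamp t."""
        p = self.position(t)
        for i, sample in enumerate(decoded):
            self.samples[(p + i * self.b) % self.size] = sample

    def play(self, start: int, count: int) -> list:
        """Read count positions from start and reset them to silence."""
        out = []
        for i in range(count):
            idx = (start + i) % self.size
            out.append(self.samples[idx])
            self.samples[idx] = 0
        return out


buf = PlayoutBuffer(size=8)
buf.start_talkspurt(t0=100, p0=6)
buf.insert(102, [3, 4])  # arrives first, although it is the later packet
buf.insert(100, [1, 2])
played = buf.play(6, 4)
```

Resetting played positions to silence means a packet that never arrives is heard as a gap rather than as stale audio from an earlier pass around the buffer.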
Playout buffer, cont’d.
• Thus, the hard part is computing the insertion point for the first packet in a talkspurt
• Trying to predict future late loss vs. excessive delay
• Conceptually, two approaches:
  • look at the current playout point when the first packet arrives
    • then, leave some margin of error
    • may be too conservative
  • compute based on the last talkspurt and change c
    • avoids overestimation due to a slow first packet
    • deals less well with jumps in delay after long pauses
• Simple method: assume a roughly normal distribution and take n times the standard deviation of the delay (= jitter)
  • this becomes the extra delay
• Other mechanisms:
  • spike detection
  • optimal value for last talkspurt
[Figure: playout buffer along axis t with insert and play pointers; labels t=100 and t=140.]
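The simple method can be sketched as follows, using the delays observed during the last talkspurt. The sketch uses the standard deviation as the spread measure so that the margin has units of time; the function names, the default n = 4, and the sample data are assumptions for illustration.

```python
import statistics


def extra_delay(delays_ms, n=4.0):
    """Safety margin: n times the spread of the observed delays."""
    return n * statistics.pstdev(delays_ms)


def next_playout_delay(delays_ms, n=4.0):
    """Playout delay for the next talkspurt: mean delay plus the margin."""
    return statistics.mean(delays_ms) + extra_delay(delays_ms, n)


# Hypothetical relative delays (ms) measured during the last talkspurt
last_talkspurt = [48, 52]
margin = extra_delay(last_talkspurt)        # 4 * 2 ms
target = next_playout_delay(last_talkspurt)  # 50 ms mean + margin
```

A larger n lowers late loss at the cost of more delay. A delay spike in the last talkspurt inflates the estimate, which is one motivation for the spike-detection mechanisms mentioned above.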