A Pragmatic Spatial-Random-Access-Enabled Video Coding Scheme

Presentation Transcript


  1. A Pragmatic Spatial-Random-Access-Enabled Video Coding Scheme Piyush Agrawal EE398A – Project Presentation

  2. High-resolution video – challenges
     • Rise of high-resolution videos
        • Better digital imaging sensors
        • Increasing storage capacity
        • Algorithms and systems for stitching ultra-high-resolution videos from multiple cameras
     • Challenges in using such videos
        • Lack of network bandwidth
        • Lack of high-resolution display screens
     • Solution: interactive region-of-interest video streaming

  3. Agenda
     • Spatial-random-access-enabled video coding
     • Related work
     • Proposed schemes
     • Experimental results
     • Discussion of the pros and cons of the different schemes

  4. Spatial-random-access-enabled video coding – why?
     • One way of providing interactive region-of-interest streaming
        • Decode the entire high-resolution video on the fly, for each user
        • Crop the relevant part of the high-resolution frame
        • Re-encode the relevant part and transmit it
     • Drawbacks
        • Multiple encodings required
        • Does not scale as the number of simultaneous viewers grows
     • Required
        • A scheme that encodes the video only once and can then serve any user, any number of times

  5. Related work – EUSIPCO’07
     Source: Aditya Mavlankar, Pierpaolo Baccichet, David Varodayan, and Bernd Girod. Optimal slice size for streaming regions of high resolution video with virtual pan/tilt/zoom functionality. In Proc. 15th European Signal Processing Conference (EUSIPCO’07), pages 1275–1279, 2007.

  6. Related work – PCS’09
     Source: Aditya Mavlankar, Peer-to-Peer video streaming with interactive region-of-interest, PhD dissertation (unpublished)
     • 85% reduction in storage requirements compared to the EUSIPCO’07 scheme

  7. Drawbacks of the related schemes
     • Not compliant with any video coding standard
     • Require a custom encoder and decoder
     • Decoder complexity
        • Cross-resolution-layer dependencies
        • Scaling operation for rendering each frame
     • Difficult to implement a multi-thread/multi-process/multi-CPU parallel encoder for real-time encoding
        • Entire frame processed as a whole

  8. Proposed scheme – ViewXtreme

  9. Parallel encoder
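Slide 16 lists a highly parallel encoder built from off-the-shelf tools as a benefit of ViewXtreme. The sketch below illustrates that idea under assumptions the slides do not confirm: each 480x270 slice of the 1920x1080 layer is cropped and encoded as an independent H.264 stream with ffmpeg/libx264, using the GOP settings from the experimental setup (30 frames, no B-frames). The helpers `slice_commands` and `encode_all`, the output file names, and the default of four workers (matching the quad-core test machine) are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed geometry (from the experimental setup): 1920x1080 layer, 480x270 slices.
FRAME_W, FRAME_H = 1920, 1080
SLICE_W, SLICE_H = 480, 270
GOP_SIZE = 30


def slice_commands(src: str) -> list[list[str]]:
    """Build one ffmpeg command per slice; output names are hypothetical."""
    commands = []
    for y in range(0, FRAME_H, SLICE_H):
        for x in range(0, FRAME_W, SLICE_W):
            out = f"slice_r{y // SLICE_H}_c{x // SLICE_W}.mp4"
            commands.append([
                "ffmpeg", "-y", "-i", src,
                "-an",                                             # drop audio
                "-filter:v", f"crop={SLICE_W}:{SLICE_H}:{x}:{y}",  # crop=w:h:x:y
                "-c:v", "libx264",
                "-g", str(GOP_SIZE),      # maximum distance between I-frames
                "-sc_threshold", "0",     # no extra scene-cut I-frames
                "-bf", "0",               # no B-frames, as in the experiments
                out,
            ])
    return commands


def encode_all(src: str, workers: int = 4) -> list[int]:
    """Run the per-slice encodes concurrently and return ffmpeg exit codes."""
    # Threads are enough here: each one only waits on a separate ffmpeg process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode,
                             slice_commands(src)))
```

Because the 16 slice streams share no state, the encodes can be spread over however many cores are available; the resolution layers and packaging are outside the scope of this sketch.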

  10. Optimizing the ViewXtreme scheme with Adaptive Skip Mode

  11. How to detect static segments?
     • Consecutive frame differencing
        • Calculate the mean pixel difference between consecutive frames
        • If it is below a fixed threshold, declare the segment static
     • Smoothing
        • Video shot in bad lighting conditions contains too much noise
        • This leads to a high frame difference even with no “actual” motion
        • Apply a Gaussian smoothing filter to each tested frame
     • How to find the fixed threshold
        • An MSE of 1 gives a PSNR of about 48 dB (for 8-bit video)
        • Treat two consecutive frames as the original and the reconstructed signal, respectively
        • A PSNR of 48 dB means the two signals look alike, i.e., there is no motion between the two frames
     • Other ideas
        • Structural Similarity Index Measure (SSIM)
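The detection procedure on this slide can be sketched as follows. Assumptions not stated on the slide: frames are 8-bit grayscale images given as lists of rows, the Gaussian filter is approximated by a 3x3 binomial kernel with replicated edges, and the "mean pixel difference" is measured as the MSE so that the threshold follows directly from PSNR = 10·log10(255²/MSE), which is about 48.13 dB for MSE = 1. All function names are hypothetical.

```python
import math

# Hypothetical helpers; frames are 8-bit grayscale images stored as lists of rows.
KERNEL_1D = (1, 2, 1)  # outer product / 16 gives a 3x3 binomial (Gaussian-like) kernel


def gaussian_smooth(frame: list[list[float]]) -> list[list[float]]:
    """Blur with a 3x3 binomial kernel, replicating edge pixels."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in (-1, 0, 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL_1D[dy + 1] * KERNEL_1D[dx + 1] * frame[yy][xx]
            out[y][x] = acc / 16.0
    return out


def mse(a, b) -> float:
    """Mean squared error between two equally sized frames."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n


def psnr(mse_value: float, peak: float = 255.0) -> float:
    """PSNR in dB; infinite when the frames are identical."""
    return math.inf if mse_value == 0 else 10.0 * math.log10(peak ** 2 / mse_value)


def is_static(prev, curr, mse_threshold: float = 1.0) -> bool:
    """Static if the smoothed frames differ by MSE <= 1, i.e. PSNR >= ~48 dB."""
    return mse(gaussian_smooth(prev), gaussian_smooth(curr)) <= mse_threshold
```

For example, on an 8x8 frame, zero-mean ±2 checkerboard noise differs from a flat frame by an MSE of 4 (about 42 dB) and would fail the raw test, but after smoothing the pair falls well below the threshold and is treated as static; a uniform brightness change of 20 is still detected as motion.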

  12. Experimental setup
     • Four schemes compared
        • ViewXtreme
        • ViewXtreme Adaptive Skip
        • UpwardPredictionOnly (EUSIPCO’07)
        • BE-LTMMCP (PCS’09)
     • Test video: 600 frames, classroom scene
     • Results only for the highest resolution layer (1920x1080)
     • Slice size: 480x270 pixels
     • QP for base layer = 27
        • Affects performance of the BE-LTMMCP and UpwardPredictionOnly schemes
     • GOP size = 30 frames
        • Affects performance of the ViewXtreme and ViewXtreme Adaptive Skip schemes
     • Encoded video: 30 frames per second
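Spatial random access rests on the slice grid: with the 1920x1080 layer and 480x270 slices used here, each frame is a 4x4 grid, and a client viewing a region of interest needs only the slices that overlap it. A minimal sketch of that lookup follows, assuming the ROI is an axis-aligned rectangle in pixel coordinates of the highest resolution layer; `slices_for_roi` is a hypothetical helper, not part of any of the compared schemes.

```python
# Assumed geometry from the experimental setup.
FRAME_W, FRAME_H = 1920, 1080
SLICE_W, SLICE_H = 480, 270


def slices_for_roi(x: int, y: int, w: int, h: int) -> list[tuple[int, int]]:
    """Return (row, col) indices of every slice overlapping the ROI.

    The ROI is clipped to the frame; an ROI entirely outside it needs no slices.
    """
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, FRAME_W), min(y + h, FRAME_H)  # exclusive bounds
    if x0 >= x1 or y0 >= y1:
        return []
    first_col, last_col = x0 // SLICE_W, (x1 - 1) // SLICE_W
    first_row, last_row = y0 // SLICE_H, (y1 - 1) // SLICE_H
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

A 640x360 viewport at the top-left corner, for instance, touches four of the sixteen slices, so only those four slice streams need to be sent.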

  13. Coding efficiency

  14. Coding efficiency – benefits of skip mode

  15. Encoding speed
     • Encoding done on a quad-core machine with 4 GB RAM

  16. Pros and cons of the proposed schemes
     • Pros
        • Standard-compliant encoder and decoder
        • Simplified decoder
        • Highly parallel encoder possible using off-the-shelf encoding tools
        • Significantly better (~66%) coding efficiency, leading to lower network bandwidth requirements
     • Cons
        • Expected to provide a lower degree of spatial-temporal random access than the other schemes
           • Due to the use of motion-compensated predictive coding
        • Can we confirm this?

  17. A deeper dive into random access
     • Logical operations performed to render a random frame
        1. Download the bits required to decode the single random frame
        2. Decode the bits and create the reconstructed frame in memory
        3. Render the reconstructed frame on the client’s display
     • Rendering the reconstructed frame (step 3) is independent of the coding scheme and can be ignored
     • The ViewXtreme and BE-LTMMCP schemes differ in steps 1 and 2

  18. Differences
     • BE-LTMMCP
        • Each frame is independent of the other frames on the same resolution layer
        • Only the encoded bits of the single random frame need to be downloaded
        • Only that single frame needs to be decoded
        • The mean size of a random frame can be estimated from the bits per pixel at different quality levels
     • ViewXtreme
        • The number of bits to be downloaded depends on the GOP structure

  19. ViewXtreme – dependence on GOP
     • Frames to be downloaded
        • Frame 1: 1 I-frame
        • Frame 4: 1 I-frame + 2 P-frames + 1 B-frame
        • Frame 6: 1 I-frame + 4 P-frames + 1 B-frame
     • All frames of a GOP are equally likely to be requested
        • Estimate the number of bits to be downloaded if the median frame is requested
     • No B-frames used in the experiments, GOP size = 30 frames
        • For the 15th frame: 1 I-frame + 14 P-frames required
     • Mean sizes of a single I-frame and a single P-frame measured in the experiments
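The download-size comparison of the last two slides can be written out as a small calculation. The sketch assumes a GOP of one I-frame followed only by P-frames (no B-frames, as in the experiments), so requesting frame k of a GOP requires the I-frame plus k − 1 P-frames; for BE-LTMMCP it assumes a requested frame costs bits-per-pixel × width × height, defaulting to one 480x270 slice. Frame sizes and bits-per-pixel values must come from measurements; none are given here, and the function names are hypothetical.

```python
def viewxtreme_bits(k: int, i_bits: float, p_bits: float) -> float:
    """Bits needed to decode frame k (1-based) of an I P P P ... GOP."""
    if k < 1:
        raise ValueError("frame index is 1-based")
    return i_bits + (k - 1) * p_bits


def viewxtreme_mean_bits(gop: int, i_bits: float, p_bits: float) -> float:
    """Expected bits when every frame of the GOP is equally likely to be requested."""
    return sum(viewxtreme_bits(k, i_bits, p_bits) for k in range(1, gop + 1)) / gop


def be_ltmmcp_bits(bpp: float, width: int = 480, height: int = 270) -> float:
    """Bits for one independently decodable frame, from a bits-per-pixel figure."""
    return bpp * width * height
```

With a 30-frame GOP the expected cost works out to I + 14.5 P, close to the I + 14 P of the median (15th) frame used on the slide.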

  20. Data needed to render single random frame

  21. Effect of decoding multiple frames
     • For a random frame to be displayed
        • On average, about 15 frames need to be decoded (GOP size = 30)
     • Benchmark on a single-core (2.4 GHz) client
        • Decoding rate of up to 500 fps for a 480x270-pixel video encoded with H.264
        • Time needed to decode 15 frames = 15 × 1/500 s = 30 ms
        • Less than the inter-frame interval of 33 ms (for playing video at 30 fps)
     • Decoding time is negligible compared to data download time
     • Conclusion: the ViewXtreme scheme provides a higher degree of spatial-temporal random access
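The timing argument can be folded into a simple latency model: download time plus decode time, with rendering ignored as on slide 17. This is an illustrative sketch only; the bandwidth is a free parameter, no measured values are implied, and the function names are hypothetical.

```python
def decode_time_ms(frames: int, decode_fps: float) -> float:
    """Time to decode `frames` frames at a sustained decoding rate."""
    return 1000.0 * frames / decode_fps


def access_latency_ms(bits: float, bandwidth_bps: float,
                      frames: int, decode_fps: float) -> float:
    """Download time plus decode time for one random access; rendering is ignored."""
    return 1000.0 * bits / bandwidth_bps + decode_time_ms(frames, decode_fps)


FRAME_INTERVAL_MS = 1000.0 / 30  # about 33.3 ms between frames at 30 fps
```

With the slide's numbers, `decode_time_ms(15, 500)` is 30 ms, just under `FRAME_INTERVAL_MS`; how much the download term dominates then depends only on the number of bits and the link bandwidth.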

  22. Conclusions
     • Proposed two new coding schemes for spatial random access
     • Compared them with two state-of-the-art schemes
     • Showed that the proposed schemes outperform the other schemes in terms of
        • Coding efficiency
        • Standard compliance
        • Encoder and decoder complexity
        • Degree of spatial-temporal random access
     • Future work
        • Better ways of detecting static segments
        • Better architectural designs for encoders running on commodity machines

  23. Acknowledgements
     • Prof. Bernd Girod
     • Mina Makar
     • Aditya Mavlankar
     • Derek Pang
     Thank You!
