
Scalable Media Coding Applications and New Directions


Presentation Transcript


  1. Scalable Media Coding Applications and New Directions
  David Taubman, School of Electrical Engineering & Telecommunications, UNSW Australia

  2. Overview of Talk
  • Backdrop and Perspective
    • interactive remote browsing of media
    • JPEG2000, JPIP and emerging applications
  • Approaches to scalable video
    • fully embedded approaches based on wavelet lifting
    • partially embedded approaches based on inter-layer prediction
    • important things to appreciate
  • Key challenges for scalable video and other media
    • focus on motion, depth and natural models
  • Early work on motion for scalable coding
    • focus: reduce artificial boundaries
  • Current focus: everything is imagery
    • depth, motion, boundary geometry
    • scalable compression of breakpoints (sparse innovations)
    • estimation algorithms for joint discovery of depth, motion and geometry
    • temporal flows based on breakpoints
  • Summary

  3. Emerging Trends
  • Video formats
    • QCIF (25 Kpel), CIF (100 Kpel), 4CIF/SDTV (½ Mpel), HDTV (2 Mpel), UHDTV 4K (10 Mpel) and 8K (32 Mpel)
    • Cinema: 24/48/60 fps; UHDTV: potentially up to 120 fps
    • HFR: helmet cams & professional cameras now do 240 fps
  • Displays
    • “retina” resolutions (200 to 400 pixels/inch)
    • what resolution video do I need for an iPad? (2048x1536?)
  • Internet and mobile devices
    • vast majority of internet traffic is video
    • 100’s of separate non-scalable encodings of a video
    • http://gigaom.com/2012/12/18/netflix-encoding
  • New media: multi-view video, depth, …

  4. Embedded media
  • One code-stream, many subsets of interest
  [Figure: a single compressed bit-stream with embedded subsets at low/medium/high resolution, low/medium/high quality and low/medium/high frame rate]
  • Heterogeneous clients (classic application)
    • match client bandwidth, display, computation, …
  • Graceful degradation (another classic)
    • more important subsets protected more heavily, …
  • Robust offloading of sensitive media
    • backup most important elements first

  5. Our focus: Interactive Browsing
  • Image quality progressively improves over time
  • Video quality improves each time we go back
  • Region (window) of interest accessibility
    • in space, in time, across views, …
    • can yield enormous effective compression gains
    • prioritized streaming based on “degree of relevance”
    • some elements contribute only partially to the window of interest
  • Related application: early retrieval of surveillance
    • can’t send everything over the air
    • can’t wait until the plane lands

  6. Scalable images – things that work well
  • Multi-resolution transforms
    • 2D wavelet transforms work well
  • Embedded coding
    • successive refinement through bit-plane coding (see the sketch below)
    • multiple coding passes per bit-plane improve embedding
  [Figure: R-D curves for embedded coding (EC) with a dead-zone quantizer (DZQ): bit-plane coding (truncation) with 2 coding passes per bit-plane, versus step-size modulation]
  • Accessibility through partitioned coding of subbands
    • region of interest access without any blocking artefacts
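A minimal numpy sketch of the successive-refinement idea behind bit-plane coding: for a dead-zone quantizer, dropping the p least-significant magnitude bit-planes is equivalent to quantizing with a step size 2^p times larger, so quality improves smoothly as bit-planes are added. This only illustrates the embedding principle, not the actual EBCOT coding passes; the coefficient values are illustrative.

```python
import numpy as np

def deadzone_quantize(coeffs, step):
    """Dead-zone scalar quantization: sign plus magnitude index."""
    return np.sign(coeffs), np.floor(np.abs(coeffs) / step).astype(int)

def dequantize(signs, mags, step):
    """Mid-point reconstruction (zero for coefficients in the dead zone)."""
    return signs * np.where(mags > 0, (mags + 0.5) * step, 0.0)

def drop_bitplanes(mags, p):
    """Keep only the most significant bit-planes; equivalent to using step * 2**p."""
    return mags >> p

# toy wavelet coefficients; quality improves as bit-planes are added (p decreases)
coeffs = np.array([7.3, -0.4, 12.9, -3.7, 0.9, -25.1])
signs, mags = deadzone_quantize(coeffs, step=1.0)
for p in (4, 2, 0):
    rec = dequantize(signs, drop_bitplanes(mags, p), step=1.0 * 2**p)
    print(f"{p} LSB planes dropped -> {np.round(rec, 2)}")
```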

  7. JPEG2000 – more than compression
  Decoupling and embedding
  [Figure: two-level DWT subbands (LL2, HL2, LH2, HH2, HL1, LH1, HH1), each partitioned into code-blocks with embedded code-block bit-streams]

  8. JPEG2000 – more than compression
  Spatial random access

  9. JPEG2000 – more than compression
  Quality and resolution scalability
  [Figure: subbands LL2, HL2, LH2, HH2, HL1, LH1, HH1 with their code-block contributions distributed across quality layers 1, 2 and 3]

  10. JPEG2000 – JPIP interactivity (IS 15444-9)
  • Client sends “window requests”
    • spatial region, resolution, components, … (see the example request below)
  • Server sends “JPIP stream” messages
    • self-describing, arbitrarily ordered
    • pre-emptable, server-optimized data stream
  • Server typically models client cache
    • avoids redundant transmission
  • JPIP also does metadata
    • scalability & accessibility for text, regions, XML, …
  [Figure: JPIP architecture; the application passes a window of interest to the JPIP client (client cache, decompress/render), the client issues a window request to the JPIP server (cache model, target file or code-stream), and the server returns a JPIP stream plus response headers; window imagery and status flow back to the application]
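To make the “window request” concrete, here is a hedged sketch of the kind of HTTP query a JPIP client might issue. The field names (target, fsiz, roff, rsiz, layers, cnew) are the JPIP request fields as I recall them from ISO/IEC 15444-9; the server path, image name and numeric values are purely illustrative.

```python
from urllib.parse import urlencode

# Hypothetical JPIP window request (field names per ISO/IEC 15444-9 as recalled;
# the target name and values below are illustrative only).
params = {
    "target": "aerial.jp2",   # resource on the server (illustrative name)
    "fsiz":   "1024,768",     # requested full-frame size, i.e. resolution of interest
    "roff":   "256,128",      # offset of the window of interest at that resolution
    "rsiz":   "512,384",      # size of the window of interest
    "layers": "4",            # cap on the number of quality layers to return
    "cnew":   "http",         # ask the server to open a new JPIP channel over HTTP
}
print("GET /jpip?" + urlencode(params, safe=",") + " HTTP/1.1")
```

The server answers with self-describing, arbitrarily ordered JPIP stream messages and, because it models the client's cache, later requests for overlapping windows only receive the data the client does not already hold.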

  11. What can you do with JPIP?
  • Highly efficient interactive navigation within
    • large images (giga-pixel, even tera-pixel)
    • medical volumes
    • virtual microscopy
    • window of interest access, progressive to lossless
    • interactive metadata
  • Interactive video
    • frame of interest
    • region of interest
    • frame rate and resolution of interest
    • quality improves each time we go back over content
  Demos: Aerial, Catscan, Album, Campus, Panoramic Video

  12. Overview of Talk (recurring outline slide; see slide 2)

  13. Wavelet-like approaches: Motion Compensated Temporal Lifting
  • Lifting factorization exists for any FIR temporal wavelet transform
  • Warp frames prior to each lifting filter lj (see the sketch below)
    • motion compensation aligns frame features
    • invertibility unaffected (any motion model)
  [Figure: lifting ladder; even and odd frames are warped before each lifting filter l1, l2, …, lL-1, lL, producing low-pass and high-pass frames]
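As a concrete illustration of the “warp, then lift” idea and of why invertibility survives any warping operator, here is a minimal numpy sketch of one motion-compensated Haar-style lifting step. The warping operators, toy frames and the ½ update weight are illustrative assumptions, not the talk's actual configuration.

```python
import numpy as np

def mc_haar_lift(even, odd, warp_fwd, warp_bwd):
    """One motion-compensated Haar-style temporal lifting step.
    warp_fwd maps the even frame onto the odd frame's coordinates and
    warp_bwd maps back; the transform is invertible for *any* warping,
    because each step only adds a function of the other channel."""
    high = odd - warp_fwd(even)          # predict: motion-compensated residual
    low = even + 0.5 * warp_bwd(high)    # update: motion-compensated average
    return low, high

def mc_haar_unlift(low, high, warp_fwd, warp_bwd):
    even = low - 0.5 * warp_bwd(high)    # undo the update step
    odd = high + warp_fwd(even)          # undo the predict step
    return even, odd

# toy check: a global 1-pixel shift modelled with np.roll stands in for real warping
even = np.arange(8, dtype=float)
odd = np.roll(even, 1) + 0.1             # "next frame": shifted, slightly changed
fwd, bwd = (lambda f: np.roll(f, 1)), (lambda f: np.roll(f, -1))
low, high = mc_haar_lift(even, odd, fwd, bwd)
e, o = mc_haar_unlift(low, high, fwd, bwd)
assert np.allclose(e, even) and np.allclose(o, odd)   # perfect reconstruction
```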

  14. Motion Compensated Temporal Lifting
  • Equivalent to filtering along motion trajectories
    • subject to good MC interpolation filters
    • subject to existence of 2D motion trajectories
  • Handles expansive/contractive motion flows
    • motion trajectory filtering interpretation still valid for all but the highest spatial frequencies
    • subject to disciplined warping to minimize aliasing effects
  • Need good motion!
    • smooth, bijective mappings wherever possible
    • avoid unnecessary motion discontinuities (e.g., blocks)

  15. t+2D Structure with MC Lifting
  [Figure: t+2D analysis and synthesis; input frames pass through MC lifting to temporal subbands H1, H2 and L2, each followed by a spatial DWT, giving spatio-temporal subbands]
  • Dimensions of scalability: temporal resolution, spatial resolution, quality
  • NB: other interesting structures exist

  16. Inter-layer prediction (SVC/SHVC)
  [Figure: hierarchical prediction structures (temporal levels 0–4) in a low-resolution layer and a high-resolution layer, with inter-layer prediction between them]
  • Multiple predictors
    • pick one?
    • blend?
  • Implications for embedding …

  17. Important Principles
  • Scalability is a multi-dimensional phenomenon
    • not just a sequential list of layers
  • Quality scalability is critical for embedded schemes
    • the low-resolution subset must be coded at much higher quality than low-resolution viewing requires, because it is reused when decoding at high resolution
  • Physical motion scales naturally
    • not generally true for block-based approximations
    • but 2D motion is discontinuous at boundaries
  • Prediction alone is sub-optimal
    • full spatio-temporal transforms preferred
    • improve the quality of embedded resolutions and frame rates
    • noise orthogonalisation; ideally orthogonal transforms
    • note the body of work by Flierl et al. (MMSP’06 through PCS’13)

  18. Temporal transforms: why prediction alone is sub-optimal
  [Figure: bi-directional prediction of odd frames from even frames, viewed as a forward transform (lifting weights 1, ½, -½), quantization, and reverse transform]
  • Redundant spanning of low-pass content by both channels ⇒ high-pass quantization noise has unnecessarily high energy gain

  19. Reduced noise power through lifting
  • Inject a negative fraction of the high band into the low-band synthesis path (see the numeric sketch below)
    • removes low-frequency noise power from the synthesized high band
  • Add a compensating step in the forward transform
    • does not affect the energy-compacting properties of prediction
  [Figure: even/odd frame lifting structure with the added update step]
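To see the effect of slides 18 and 19 numerically, here is a minimal 1D sketch using a 5/3-style lifting pair with periodic extension (predict weight ½, update weight ¼, one unit of quantization error injected into a single high-band coefficient); these weights and the toy signal length are assumptions for illustration. Without the update step (prediction only), the error synthesizes to a full-energy impulse with a DC component; with the update step, the synthesized error has lower energy and no DC leakage.

```python
import numpy as np

def synth_53(low, high, use_update=True):
    """Inverse of one 5/3-style temporal lifting level (periodic extension).
    With use_update=False this is plain bi-directional prediction
    (no update step), as in hierarchical B-frame structures."""
    even = low - 0.25 * (np.roll(high, 1) + high) if use_update else low.copy()
    odd = high + 0.5 * (even + np.roll(even, -1))
    x = np.empty(2 * len(low))
    x[0::2], x[1::2] = even, odd
    return x

low = np.zeros(8)
high = np.zeros(8); high[4] = 1.0        # unit quantization error in one high-band coeff
for upd in (False, True):
    x = synth_53(low, high, use_update=upd)
    print(f"update step: {upd},  synthesized noise energy: {np.sum(x**2):.3f},"
          f"  DC leakage: {x.sum():.3f}")
```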

  20. Overview of Talk (recurring outline slide; see slide 2)

  21. Motion for Scalable Video
  • Fully scalable video requires scalable motion
    • reduce motion bit-rate as video quality reduces
    • reduce motion resolution as video resolution reduces
  • First demonstration (Taubman & Secker, 2003)
    • 16x16 triangular mesh motion model
    • wavelet transform of mesh node vectors
    • EBCOT coding of mesh subbands
    • model-based allocation of motion bits to quality layers
    • pure t+2D motion-compensated temporal lifting

  22. Scalable motion – very early results
  [Figure: luminance PSNR (dB) versus bit-rate (kbit/s, 0–1200) for CIF Bus and for CIF Bus decoded at QCIF resolution, comparing non-scalable, brute-force, model-based and lossless motion coding]
  [Figure: luminance PSNR (dB) versus bit-rate (0.2–2.0 Mbit/s) for CIF Mobile, comparing H.264 (high complexity), 5/3 temporal lifting and 1/3 (hierarchical B-frames)]
  • H.264 reference results: CABAC, 5 previous and 3 future reference frames, multi-hypothesis testing (courtesy of Markus Flierl)

  23. Motion challenges
  • Issues
    • smooth motion fields scale well
    • a mesh is guaranteed to be smooth and invertible everywhere
    • but real motion fields have discontinuities
  • Hierarchical block-based schemes
    • produce a massive number of artificial discontinuities
    • not invertible – i.e., there are no motion trajectories
    • non-physical – hence not easy to scale
    • but easy to optimize for energy compaction
    • particularly effective at lower bit-rates
  • Depth/disparity has all the same issues as motion
    • both tend to be piecewise smooth media
    • NB: bandlimited sampling considerations may not apply
    • boundary discontinuity modeling is more important for these media

  24. Aliasing challenges
  [Figure: analysis filter responses of the popular 9/7 wavelet transform; the fundamental constraint for perfect reconstruction is that the product filter be half-band]
  [Figure: spatial aliasing visible when the LL subband is extracted]
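The “fundamental constraint” on this slide is presumably the standard two-channel perfect-reconstruction condition; a hedged restatement (not taken verbatim from the deck): with analysis filters $H_0(z)$, $H_1(z)$ and synthesis filters $G_0(z)$, $G_1(z)$,

$$ G_0(z)H_0(-z) + G_1(z)H_1(-z) = 0, \qquad G_0(z)H_0(z) + G_1(z)H_1(z) = 2, $$

so the product filter $P(z) = G_0(z)H_0(z)$ must be half-band, $P(z) + P(-z) = 2$. As a consequence $H_0$ cannot be an ideal low-pass filter, and simply extracting the LL subband leaves some spatial aliasing, which appears to be the point of the slide.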

  25. Spatial scalability – t+2D
  • Temporal transform uses full spatial resolution (temporal predict and update steps operate on full-resolution frames)
  • At reduced spatial resolution
    • temporal synthesis steps are missing high-resolution information
    • if motion trajectories are wrong/non-physical ⇒ ghosting
    • if trajectories are valid ⇒ temporal synthesis reduces aliasing
    • less aliasing than a regular spatial DWT at reduced resolution

  26. Overview of Talk (recurring outline slide; see slide 2)

  27. Block-based schemes with merging (Mathew and Taubman, 2006)
  • Linear & affine models
    • encourage larger blocks
  • Merging of quad-tree nodes (leaf merging)
    • encourages larger regions and improves efficiency
    • merging approach later picked up by the HEVC standard
  • Hierarchical coding
    • works very well with merging; provides resolution scalability
  [Figure: leaf merging illustrated for pure translation, linear and affine motion models]

  28. Boundary geometry and merging
  • Model motion & boundary
    • no merging: Hung et al. (2006); Escoda et al. (2007)
    • with merging: Mathew & Taubman (2007); separate quad-trees (2008)
  [Figure: motion compensation with models M1 and M2; motion only, motion + boundary, separate quad-trees, and the combined result]

  29. Indicative Performance
  • Things that reduce artificial discontinuities:
    • modeling geometry as well as motion
    • separately pruned trees for geometry and motion
    • merging nodes from the pruned quad-trees
  • These schemes are practical and resolution scalable
    • readily optimized across the hierarchy
  [Figure: R-D comparison of a single quad-tree with motion only; a single quad-tree with motion + merging; a single quad-tree with motion, geometry and merging; and two quad-trees with motion, geometry and merging]

  30. Overview of Talk (recurring outline slide; see slide 2)

  31. Geometry from arc breakpoints
  • Breaks can represent segmentation contours
    • but don’t need a segmentation
    • breaks are all we need for adaptive transforms
    • direct R-D optimisation is possible
  • Natural resolution scalability
    • finer-resolution arcs embedded in coarser arcs
  [Figure: coarse and finer grids of gridpoints and arcs]

  32. Breakpoint induction and scalability
  • Explicitly identify a subset of breakpoints – “vertices”
    • remaining breakpoints get induced
    • the position of a break on its arc impacts induction
  • Representation is fully scalable
    • resolution improves as we add vertices at finer scales
    • quality improves as we add precision to breakpoint positions
  • Breakpoint representation is image-like
    • vertex density closely related to image resolution
    • vertex accuracy is like sample accuracy in an image hierarchy
  [Figure: coarse and finer grids; direct induction onto sub-arcs and spatial induction onto root-arcs]

  33. Breakpoint Adaptive Transforms (BPA-DWT)
  • A sequence of non-separable 2D lifting steps (P1, U1, P2, U2) maps the original field samples to a field-sample pyramid, driven by an arc breakpoint pyramid
  • Breakpoints drive an adaptive DWT (a 1D sketch of the idea follows below)
    • basis functions do not cross a discontinuity along an arc
  • At most one breakpoint per arc
    • so the adaptive transform is well defined
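The 2D non-separable BPA-DWT itself is not spelled out on the slide; as a hedged illustration of the underlying idea, here is a 1D breakpoint-adaptive predict step in which an odd sample is never predicted from an even neighbour that lies across a break, so no basis function crosses a discontinuity. The weights and the periodic extension are illustrative assumptions, not the actual transform.

```python
import numpy as np

def bpa_predict(even, odd, brk_left, brk_right):
    """1D breakpoint-adaptive predict step (periodic extension).

    even[n], odd[n]       : interleaved field samples x[2n], x[2n+1]
    brk_left[n]  == True  : a breakpoint lies on the arc between x[2n] and x[2n+1]
    brk_right[n] == True  : a breakpoint lies on the arc between x[2n+1] and x[2n+2]

    Each odd sample is predicted only from even neighbours on its own side of
    any break; since the breakpoints are side information, the step remains
    perfectly invertible (odd = detail + prediction from even samples).
    """
    right_even = np.roll(even, -1)                 # x[2n+2] with periodic wrap
    wl = (~brk_left).astype(float)                 # usable left neighbour?
    wr = (~brk_right).astype(float)                # usable right neighbour?
    denom = np.maximum(wl + wr, 1.0)               # avoid division by zero
    pred = (wl * even + wr * right_even) / denom   # weights 1, ½/½, or none
    return odd - pred                              # high-pass (detail) samples

# toy usage: a step edge with a breakpoint placed exactly on the edge
even = np.array([1.0, 1.0, 9.0, 9.0])
odd  = np.array([1.0, 1.0, 9.0, 9.0])              # interleaved x = 1 1 1 1 9 9 9 9
brk_left  = np.array([False, False, False, False])
brk_right = np.array([False, True,  False, True])  # break on the edge; wrap treated as break
print(bpa_predict(even, odd, brk_left, brk_right)) # all details are zero: no leakage across the break
```

Without the break, the odd sample next to the edge would be predicted as the average of 1 and 9, leaving a large detail coefficient; with the break, the edge produces no detail energy at all.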

  34. Arc-bands and sub-bands
  [Figure: the vertex representation is organised into arc-bands (root arcs at level 0, non-root arcs at levels 1 and 2) and the BPA-transformed field samples into sub-bands; both are partitioned into code-blocks]

  35. Embedded Block Coding – for scalability and ROI accessibility
  • Sub-band stream (field samples)
    • sub-bands divided into code-blocks
    • coded using EBCOT (JPEG2000)
    • bit-planes assigned to quality layers
  • Vertex stream
    • arc-bands divided into code-blocks
    • coding scheme similar to EBCOT
    • bit-planes refine vertex locations
    • bit-planes assigned to quality layers
  [Figure: bit-planes of the sub-band stream and the vertex stream interleaved into quality layers λ1, λ2, λ3]

  36. Fully Scalable Depth Coding – field samples are depth; breaks are depth discontinuities
  • JPEG2000, 50 kbits: resolution scalable, quality scalable, no blocks, but poorly suited to discontinuities in depth/motion fields
  • Proposed, 50 kbits: resolution scalable, quality scalable, no blocks, and well suited to discontinuous depth/motion fields

  37. R-D Results for Depth Coding
  • JPEG2000: 5 levels of decomposition with the 5/3 DWT
  • Breakpoint-adaptive: vertex stream truncated at a quality level; sub-band stream decoded progressively

  38. Model-based quality layering
  • Scaled by discarding sub-band and arc-band quality layers
    • fully automatic model-based quality layer formation
    • model-based interleaving of all quality layers for optimal embedding

  39. Model-based quality layering
  • Compared with a segmentation-based approach (Zanuttigh & Cortelazzo, 2009)
    • not scalable; sensitive to the initial choice of segmentation complexity

  40. Fully scalable motion coding – preliminary
  • Field samples are motion vectors
    • motion coded using EBCOT after the BPA-DWT
  • Breakpoints and motion jointly estimated
    • compression-regularised optical flow (objective reconstructed below)
    • coded length L provides the prior (regularisation)
      • reflects the cost of scalably coding arc breakpoints
      • plus the cost of coding motion wavelet coefficients after the BPA-DWT
    • distortion D provides the observation model
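The formula for the compression-regularised optical flow objective did not survive the transcript; a hedged reconstruction consistent with the two bullets (coded length L as the prior, distortion D as the observation model) is the Lagrangian

$$ (\hat{m}, \hat{B}) \;=\; \arg\min_{m,\,B} \; D(m) \;+\; \lambda\, L(m, B), $$

where $m$ is the motion field, $B$ the breakpoint field, $D(m)$ the motion-compensated prediction distortion, and $L(m,B)$ the scalable coding cost of the breakpoints plus the BPA-DWT motion coefficients. The exact form used in the talk may differ.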

  41. Optimisation framework
  • Approach based on loopy belief propagation (generic sketch below)
  • Breakpoints/vertices connected via inducing rules
    • turns out to be very efficient
    • complexity grows only linearly with image size
  • Breakpoints/motion connected via the BPA-DWT
    • initial simplification: motion at each pixel location drawn from a small finite set (cardinality 5)
    • initial motion candidates generated by block search with varying block sizes
  • System always converges rapidly
  • Use modes of the marginal beliefs at each node
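For readers unfamiliar with the machinery, here is a generic min-sum loopy belief propagation sketch on a 4-connected pixel grid with a small finite label set, as mentioned on the slide. The Potts smoothness term is a stand-in I have substituted for the breakpoint-dependent coding cost of the actual scheme; the message schedule, weights and iteration count are illustrative assumptions.

```python
import numpy as np

def loopy_bp_labels(unary, lam=1.0, iters=20):
    """Min-sum loopy BP on a 4-connected grid.

    unary : (H, W, K) data costs for K candidate labels per pixel
            (e.g. K = 5 candidate motion vectors, as on the slide).
    lam   : Potts smoothness weight (stand-in for the real coding-cost prior).
    Returns the label of minimum marginal belief at each pixel (the "mode").
    """
    H, W, K = unary.shape
    # msg[d][i, j] = message arriving at pixel (i, j) from its neighbour in
    # direction d ("u" = from above, "d" = from below, "l"/"r" = from left/right)
    msg = {d: np.zeros((H, W, K)) for d in "udlr"}

    def potts(m):
        # outgoing Potts message: min(cost[k], min_k cost[k] + lam), normalised
        out = np.minimum(m, m.min(axis=-1, keepdims=True) + lam)
        return out - out.min(axis=-1, keepdims=True)

    for _ in range(iters):
        belief = unary + msg["u"] + msg["d"] + msg["l"] + msg["r"]
        new = {d: np.zeros_like(unary) for d in "udlr"}
        # a message sent downward from row i arrives at row i+1 "from above";
        # the recipient's own previous message is excluded before sending
        new["u"][1:, :] = potts(belief[:-1, :] - msg["d"][:-1, :])
        new["d"][:-1, :] = potts(belief[1:, :] - msg["u"][1:, :])
        new["l"][:, 1:] = potts(belief[:, :-1] - msg["r"][:, :-1])
        new["r"][:, :-1] = potts(belief[:, 1:] - msg["l"][:, 1:])
        msg = new

    belief = unary + msg["u"] + msg["d"] + msg["l"] + msg["r"]
    return belief.argmin(axis=-1)

# toy usage: noisy preference for label 0 on the left half, label 3 on the right
rng = np.random.default_rng(0)
unary = rng.random((32, 32, 5))
unary[:, :16, 0] -= 1.0
unary[:, 16:, 3] -= 1.0
labels = loopy_bp_labels(unary, lam=0.5)
print(labels[0, :8], labels[0, -8:])      # smooth labelling with a clean boundary
```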

  42. Initial scalable motion results
  • 2 frames, inter-prediction only; occlusions removed
  • newer work eliminates the restriction to a finite motion set
  [Figure: R-D comparison of H.264, JPEG2000, truncating the motion bits, and truncating the vertex and motion bits]

  43. Visualising the breaks & motion

  44. Temporal induction of motion – preliminary
  • Given:
    • motion from f1 to f2
    • plus breakpoint fields in f1 and f2
  • Induce:
    • motion from f2 to f1 (see the sketch below)
    • resolve ambiguities, find occlusions
  [Figure: frames f1 and f2 with the given forward and induced reverse motion]
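A hedged sketch of the basic mechanics of inducing a reverse motion field by forward mapping; holes in the mapping are flagged as occlusions. The actual scheme resolves ambiguities (pixels mapped onto more than once) using the breakpoint fields, whereas this toy version simply lets the last writer win; rounding to the nearest pixel is also an illustrative simplification.

```python
import numpy as np

def induce_reverse_flow(flow_12):
    """Induce a reverse motion field f2 -> f1 from a forward field f1 -> f2.

    flow_12 : (H, W, 2) array; flow_12[y, x] = (dy, dx) maps pixel (y, x) of
              frame f1 to (y + dy, x + dx) of frame f2.
    Returns (flow_21, occluded): pixels of f2 that nothing in f1 maps onto are
    flagged as occluded (they were uncovered between the two frames).
    """
    H, W, _ = flow_12.shape
    flow_21 = np.zeros((H, W, 2))
    hit = np.zeros((H, W), dtype=bool)
    ys, xs = np.mgrid[0:H, 0:W]
    ty = np.rint(ys + flow_12[..., 0]).astype(int)   # destination rows in f2
    tx = np.rint(xs + flow_12[..., 1]).astype(int)   # destination columns in f2
    ok = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)
    # naive "last writer wins" splat; real ambiguity resolution would use breakpoints
    flow_21[ty[ok], tx[ok]] = -flow_12[ok]
    hit[ty[ok], tx[ok]] = True
    return flow_21, ~hit
```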

  45. Synthetic example
  [Figure: inferred horizontal and vertical motion, resolved ambiguities, and inferred occlusions]

  46. Temporal induction of breaks – preliminary
  • Given:
    • motion from frame f1 to frame f3
    • plus breakpoint fields in f1 and f3
    • plus motion from f1 to f2 (real or estimated)
  • Induce:
    • breakpoint field in frame f2
    • hence estimate all other motion fields
    • note: induced occlusions are temporal breaks
  [Figure: frames f1, f2 and f3]

  47. Synthetic example
  [Figure: horizontal breaks in frames 1 and 3, and the induced horizontal breaks in frame 2]

  48. Things we are working on
  • Message approximation strategies for joint inference of breakpoint fields with motion
    • compression-regularized optical flow
    • continuous motion, notionally at infinite precision
  • Advanced breakpoint coding techniques
    • including differential coding schemes, i.e., geometry transforms
    • including inter-block conditional coding schemes that do not break the EBCOT paradigm
  • Breakpoint-adaptive spatio-temporal transforms
    • breakpoints potentially address most of the open issues surrounding motion-compensated lifting schemes
  • Integration within our JPEG2000 SDK (Kakadu)
    • interactive access, JPIP for remote browsing, etc.

  49. Demo placeholder
  • Interactive browsing of breakpoint media over JPIP

  50. Arc breakpoints vs graph edges
  • arcs = possible edges on a spatial graph; breaks = missing edges
  • main point of divergence
    • arc breakpoints have locations (key to induction & scalability)
    • graph edges may be weighted (equivalent to “soft” breaks)
  • Related approaches:
    • “Graph based transforms for depth video coding”, Kim, Narang and Ortega, ICASSP’12
    • “Video coder based on lifting transforms on graphs”, Martinez-Enriquez, Diaz-de-Maria & Ortega, ICIP’11
  • Edges (binary breaks) from segmentation (JBIG encoded)
  • Motion (block based) produces temporal edges
  • Spatio-temporal video transform adapts to the graph
    • hierarchical, but not scalable in the usual sense
