
Thread-Parallel MPEG-2, MPEG4 and H.264 Video Encoders for SoC Multi-Processor Architecture

Thread-Parallel MPEG-2, MPEG4 and H.264 Video Encoders for SoC Multi-Processor Architecture. Tom R. Jacobs, Vassilios A. Chouliaras, and David J. Mulvaney. IEEE Transactions on Consumer Electronics.


Presentation Transcript


  1. Thread-Parallel MPEG-2, MPEG4 and H.264 Video Encoders for SoC Multi-Processor Architecture. Tom R. Jacobs, Vassilios A. Chouliaras, and David J. Mulvaney. IEEE Transactions on Consumer Electronics.

  2. Outline • Introduction • Background knowledge • Main purpose • Previous work • Methodology • Experimental results • Conclusions

  3. Introduction: Background Knowledge (1/5) • A number of lossy video compression standards have been developed: MPEG-1, MPEG-2, MPEG-4 Part 2, H.264. • Maintaining image quality while reducing bit-rates → additional computation and power consumption.

  4. Introduction: Background Knowledge (2/5) • Such processing-intensive consumer application algorithms are generally implemented in System-on-Chip (SoC) devices. • Parallelism • DLP: Data-Level Parallelism • TLP: Thread-Level Parallelism

  5. Introduction: Background Knowledge (3/5) • Data-Level Parallelism (DLP) • Distributing the data across different parallel processing nodes.

      program
        ...
        if CPU = "a" then
          low_limit = 1; upper_limit = 5
        else if CPU = "b" then
          low_limit = 6; upper_limit = 10
        end if
        do i = low_limit, upper_limit
          Task on d(i)
        end do
        ...
      end program

  6. Introduction: Background Knowledge (4/5) [Figure: data array D of size 10 divided between two processing nodes.]
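The data-level split described on the two slides above can be sketched in Python. This is only an illustration and not the authors' code: `partition`, `task`, `process_chunk` and `run_data_parallel` are hypothetical names, the per-element task is a placeholder, and indices are 0-based where the slide's pseudocode is 1-based.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_items, n_workers):
    """Return half-open (low, high) index ranges that cover range(n_items)."""
    base, extra = divmod(n_items, n_workers)
    ranges, low = [], 0
    for w in range(n_workers):
        high = low + base + (1 if w < extra else 0)
        ranges.append((low, high))
        low = high
    return ranges


def task(x):
    """Placeholder for the per-element work ("Task on d(i)" on the slide)."""
    return x * x


def process_chunk(d, low, high):
    """Apply the task to one worker's share of the data."""
    return [task(d[i]) for i in range(low, high)]


def run_data_parallel(d, n_workers=2):
    """Give each worker a contiguous slice of d and join results in order."""
    ranges = partition(len(d), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = pool.map(lambda r: process_chunk(d, *r), ranges)
        return [y for chunk in chunks for y in chunk]


if __name__ == "__main__":
    data = list(range(1, 11))          # data array D of size 10
    print(partition(len(data), 2))     # two nodes, five elements each
    print(run_data_parallel(data))
```

In CPython the threads mainly illustrate the structure of the split; CPU-bound work like this does not run faster under the global interpreter lock.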

  7. Introduction: Background Knowledge (5/5) • Thread-Level Parallelism (TLP) • TLP is the parallelism inherent in an application that runs multiple threads at once. • Benefit: the workload of a single high-performance processor can be distributed among a number of slower and simpler processor cores.

  8. Introduction: Main Purpose (1/2) • Use thread-level parallel (TLP) techniques to improve video coding performance. • Reduce the dynamic instruction count (DIC). • How to improve? • Distribute the workload among a number of processors executing in parallel.

  9. Introduction: Main Purpose (2/2) • The results presented demonstrate that reductions in dynamic instruction count can be achieved.

  10. Previous Work • The majority of previous research has focused on coarse-granularity TLP exploitation, with the workload most commonly distributed at GOP level. • Little inter-node communication. [Figure: multi-threading across a sequence of GOPs.]

  11. Previous Work • In 1995, K. Shen, L. A. Rowe, and E. J. Delp implemented a parallel MPEG-1 encoder at GOP level. • In 1996, S. Bozoki, S. J. P. Westen, R. L. Lagendijk and J. Biemond compared GOP-level and slice-level parallelism for MPEG-1.

  12. Previous Work • In 1997, A. Bilas, J. Fritts and J. P. Singh evaluated the performance of MPEG-2 decoders on a shared-memory system. • Akramullah, Ahmad and Liou implemented a threaded MPEG-2 encoder at the MB level using local memory.

  13. Methodology: Overview • The threaded MPEG-2, MPEG-4 and H.264 encoders were compiled for a multi-context instruction-set simulator (MT-ISS) based on the SimpleScalar infrastructure. • The most important issues • Data dependencies between processors. • Avoiding race hazards.

  14. Methodology: Race hazards [Figure: Thread 1 and Thread 2 each perform i+1 on a shared integer i. Expected condition: i goes 0 → 1 → 2. Error condition (race hazard): both threads read 0, so i ends at 1.]
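The race hazard on this slide can be replayed deterministically. The sketch below is an illustration written for this transcript, not code from the paper: `interleaved_increment` and `locked_increment` are hypothetical names. The first function models each increment as a separate read and write so the two orderings can be compared; the second shows the usual remedy of guarding the read-modify-write with a lock.

```python
import threading


def interleaved_increment(schedule):
    """Replay two threads' read/write steps on a shared integer i.

    schedule is a list of (thread_id, op) pairs where op is "read" or "write".
    Each thread reads i into a private copy, then writes copy + 1 back.
    """
    i = 0
    local = {}
    for tid, op in schedule:
        if op == "read":
            local[tid] = i
        else:
            i = local[tid] + 1
    return i


# Expected condition: thread 1 finishes before thread 2 starts.
EXPECTED = [(1, "read"), (1, "write"), (2, "read"), (2, "write")]
# Error condition: both threads read 0 before either writes.
RACE = [(1, "read"), (2, "read"), (1, "write"), (2, "write")]


def locked_increment(n_threads=2, repeats=1000):
    """Increment a shared counter from several threads under a lock."""
    counter = {"i": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(repeats):
            with lock:                 # makes the read-modify-write atomic
                counter["i"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["i"]


if __name__ == "__main__":
    print(interleaved_increment(EXPECTED))   # both increments take effect
    print(interleaved_increment(RACE))       # one increment is lost
    print(locked_increment())
```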

  15. Methodology: Thread-parallel MPEG-2 (1/5) • Test Model 5 (TM5) of the MPEG-2 encoder is used. • Computation analysis (QCIF) • DIST1 → 52% to 73% of total DIC for search windows of 6 to 62 pels respectively. • FullSearch → 3.5% to 23.2% of total DIC. • Can be improved by a less complex ME algorithm (such as 3-step, 4-step or diamond search). • FDCT and IDCT → 2.1% to 21% of total DIC.
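To see why the distance kernel dominates as the search window grows, it helps to look at an exhaustive search in miniature. The sketch below assumes that DIST1 is a sum-of-absolute-differences (SAD) block distance, which the slide does not state; `sad` and `full_search` are hypothetical names, frames are plain lists of rows, and the block size is a parameter rather than the 16x16 macroblock of a real encoder.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    """Test every displacement within +/- search_range and keep the best.

    Returns ((dx, dy), cost). Candidates falling outside ref are skipped.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # 8x8 reference with a unique value per pixel; the 2x2 block at (2, 2)
    # of the current frame is copied from position (3, 4) of the reference.
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    cur = [[0] * 8 for _ in range(8)]
    for y in range(2, 4):
        for x in range(2, 4):
            cur[y][x] = ref[y + 2][x + 1]
    print(full_search(cur, ref, 2, 2, size=2, search_range=3))
```

The number of candidate positions, and therefore of distance evaluations, grows as (2R + 1)^2 for a search range of R, which is consistent with the distance kernel taking a larger share of the instruction count at wider windows.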

  16. Methodology: Thread-parallel MPEG-2 (2/5)

  17. Methodology: Thread-parallel MPEG-2 (3/5) • Motion Estimation • The kernel implementation can take advantage of data-parallel techniques. • Store the information in the mbinfo structure for motion compensation. • Maintain exclusivity of all variables during the parallel sections.

  18. Methodology: Thread-parallel MPEG-2 (4/5) • Forward transform • FDCT first scans the MBs row by row, processing the MBs in each row individually. • It determines the prediction error and applies the DCT to each block. • The thread-parallel transform function can be performed at block level.

  19. Methodology: Thread-parallel MPEG-2 (5/5) • Inverse transform • IDCT scans the MBs first row by row and then block by block. • There are no data dependencies between blocks → they can be executed in parallel.
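Because each block is transformed without reference to any other, the transform stage maps naturally onto a pool of workers. The following is a minimal sketch under that assumption, not the TM5 code: `dct_2d` is a naive orthonormal DCT-II written for clarity rather than speed, and `transform_blocks` is a hypothetical helper that hands one block to each task.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of lists)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def transform_blocks(blocks, n_workers=4):
    """Transform each block in its own task; results keep the input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(dct_2d, blocks))


if __name__ == "__main__":
    # Three constant 4x4 blocks: only the DC coefficient should be non-zero.
    blocks = [[[k] * 4 for _ in range(4)] for k in (1, 2, 3)]
    for coeffs in transform_blocks(blocks):
        print(round(coeffs[0][0], 6), round(coeffs[0][1], 6))
```

No locking is needed here because every task reads only its own input block and writes only its own output.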

  20. Methodology: Thread-parallel MPEG-4 (1/8) • The implementation is based on the XviD project with the Advanced Simple Profile (ASP). • Bidirectional frames • Quarter-pel motion compensation • Global motion compensation • Trellis quantization • Custom quantization matrices

  21. Methodology: Thread-parallel MPEG-4 (2/8) • Computation analysis (QCIF)

  22. Methodology: Thread-parallel MPEG-4 (3/8) • The nature of the XviD encoder • Intra-frame encoding • Inter-frame encoding

  23. Methodology: Thread-parallel MPEG-4 (4/8) • Intra-frame encoding • FrameCodeI (processes the MBs row by row) • Parallelize the loop that encodes the MBs in a row of the image. • MB data structure → pMB. • Shared memory array. • The function with the highest DIC in FrameCodeI is MBTransQuantIntra.

  24. Methodology: Thread-parallel MPEG-4 (5/8) • MBTransQuantIntra • Forward transformation, quantization and inverse transformation. • Shared data structure → pEnc • Includes a count of quantization values. • Serial code section. • Transforms each MB's pixel data into the frequency domain independently. • MBPrediction and MBCoding • Responsible for VLC and writing to the bitstream.
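The pattern on these two slides, independent per-MB work followed by a serial update of shared state, can be sketched as below. This is a toy model and not XviD code: `quantize_mb`, `encode_row` and the `stats` dictionary are hypothetical stand-ins (the dictionary plays the role of a shared encoder structure such as pEnc), and the quantizer is a simple uniform one chosen for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def quantize_mb(coeffs, quant):
    """Uniform quantization of one macroblock's coefficients (toy model)."""
    return [int(c / quant) for c in coeffs]   # truncates toward zero


def encode_row(row_of_mbs, quant, stats, n_workers=4):
    """Quantize all MBs of one row in parallel, then update shared stats.

    stats stands in for a shared encoder structure; it is only touched in
    the serial section after the parallel map has finished.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda mb: quantize_mb(mb, quant), row_of_mbs))

    # Serial section: a single thread updates the shared counter.
    for q in results:
        stats["nonzero"] += sum(1 for v in q if v != 0)
    return results


if __name__ == "__main__":
    stats = {"nonzero": 0}
    row = [[10, -10, 3, 0], [8, 1, -1, 16]]
    print(encode_row(row, quant=4, stats=stats))
    print(stats)
```

Keeping the counter update outside the parallel region avoids a lock at the cost of a short serial tail; guarding the update with a lock inside each task would be the alternative.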

  25. Methodology: Thread-parallel MPEG-4 (6/8) • Inter-frame encoding • FrameCodeP • Part 1 → Motion Estimation • Part 2 → Transformation → Quantization → MC

  26. Methodology: Thread-parallel MPEG-4 (7/8) • Motion Estimation • Determines an MV for every MB and applies certain criteria to indicate when intra coding should be used. • Scans in raster-line order. • Two kinds of process • Motion prediction from the current frame. • ME relative to reference frames.

  27. Methodology: Thread-parallel MPEG-4 (8/8) • Motion Prediction • Examines the MVs in neighbouring MBs to determine an initial estimate for ME. [Figure: ideal pattern, typical pattern and TLP pattern.]
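The slide does not say how the initial estimate is formed from the neighbours. A common choice in MPEG-4 and H.264 encoders is the component-wise median of the left, top and top-right neighbours' vectors, and the sketch below assumes that choice; `median3` and `predict_mv` are hypothetical names, and edge cases such as missing neighbours at frame borders are left out.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(left, top, top_right):
    """Component-wise median of three neighbouring motion vectors (x, y)."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))


if __name__ == "__main__":
    # Neighbour vectors are made-up values for illustration.
    print(predict_mv((1, 5), (3, -2), (2, 0)))
```

Since the prediction for an MB depends on MBs to its left and above, those neighbours must already have been processed, which constrains the order in which MBs can be handled in parallel.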

  28. Methodology: H.264 (1/6) • x264 is used for the implementation. • Frame slicing • Main problems of MB-level parallelism • Wide variation in processor workload. • Modification of the prediction algorithm is needed.

  29. Methodology: H.264 (2/6) • Slice group in H.264 • A group of MBs in a frame. • Can be encoded or decoded separately from the remainder of the frame. • Motion prediction across slice boundaries is not allowed. • Drawback • The required bit-rate increases.
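One simple way to form slices for thread-level parallelism is to give each thread a contiguous band of MB rows. The sketch below assumes that row-based split, which the slides do not specify and which is not claimed to be what x264 does; `mb_grid` and `slice_rows` are hypothetical helpers, and 16x16 macroblocks are assumed.

```python
def mb_grid(width, height, mb_size=16):
    """Number of macroblock columns and rows, rounding partial MBs up."""
    return (-(-width // mb_size), -(-height // mb_size))


def slice_rows(mb_rows, n_slices):
    """Split MB rows into n_slices contiguous groups of near-equal size."""
    base, extra = divmod(mb_rows, n_slices)
    slices, start = [], 0
    for s in range(n_slices):
        count = base + (1 if s < extra else 0)
        slices.append(range(start, start + count))
        start += count
    return slices


if __name__ == "__main__":
    cols, rows = mb_grid(176, 144)            # QCIF
    print(cols, rows)
    print([list(r) for r in slice_rows(rows, 4)])
```

With nine MB rows and four slices the bands cannot be equal, which is one small source of the workload imbalance that slicing is otherwise meant to reduce.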

  30. Methodology: H.264 (3/6) • Comparison of different slice numbers

  31. Methodology: H.264 (4/6) • Comparison of different slice numbers

  32. Methodology: H.264 (5/6) • Different resolutions with 4 slices

  33. Methodology: H.264 (6/6) • Computation analysis

  34. Experimental Results: MPEG-2 Search Range

  35. Experimental Results: MPEG-4 Quality Setting

  36. Experimental Results: H.264 Quantization Parameter

  37. Experimental Results: Comparative results

  38. Conclusions • The DIC of the MPEG-2, MPEG-4 and H.264 encoders can be greatly reduced by TLP. • For HD sequences, the improvement is around 84%, 92% and 96% respectively. • TLP has become more significant with each new generation of video encoders.
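To put the percentages in perspective, a reduction can be converted into a ratio. The sketch below assumes that each figure is the reduction in dynamic instruction count relative to the serial encoder, which is one reading of "improvement" on this slide; `equivalent_ratio` is a hypothetical helper, and the result says nothing about wall-clock time, memory traffic or synchronization cost.

```python
def equivalent_ratio(reduction_percent):
    """Serial DIC divided by reduced DIC for a given percentage reduction."""
    return 100 / (100 - reduction_percent)


if __name__ == "__main__":
    for name, pct in (("MPEG-2", 84), ("MPEG-4", 92), ("H.264", 96)):
        print(name, equivalent_ratio(pct))
```

Under that assumption, 84%, 92% and 96% correspond to ratios of 6.25, 12.5 and 25.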
