Highly Parallel Framework for HEVC Motion Estimation on Many-core Platform

Data Compression Conference 2013 • Highly Parallel Framework for HEVC Motion Estimation on Many-core Platform • Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li

Outline • Introduction • Related Work • Proposed Method • Experimental Results • Conclusion

Introduction(1/2) • HEVC • coding tree unit (CTU)

Introduction(2/2) • Local parallel method (LPM) • The maximum parallelism of LPM is less than or equal to 8. • Independent PUs (IPUs) • Directed acyclic graph (DAG)

Related Work(1/2) • Local parallel method (LPM) [16] • Motion estimation region (MER) • [16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012.

Related Work(2/2) • Local parallel method (LPM) • M = 16 or 8 8

Proposed Method • A. Data Dependency Analysis • B. DAG for CTUs • C. Highly Parallel Framework

Proposed Method.A(1/3) • Independent PUs (IPUs) • The IPU’s left boundary and the MER’s left boundary do not overlap. • The IPU’s upper boundary and the MER’s upper boundary do not overlap.
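The two IPU conditions can be sketched as a simple geometric test. This is a minimal illustration, assuming that both conditions must hold at once and that "overlap" means the PU edge lies on the MER edge; the `Rect` type and the `is_ipu` name are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned block in pixel coordinates (hypothetical helper type)."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def is_ipu(pu: Rect, mer: Rect) -> bool:
    """Return True if the PU touches neither the left nor the upper MER edge.

    Assumption: a PU is an IPU only when BOTH slide conditions hold.
    """
    touches_left = pu.x == mer.x
    touches_top = pu.y == mer.y
    return not touches_left and not touches_top


mer = Rect(0, 0, 16, 16)
print(is_ipu(Rect(8, 8, 8, 8), mer))  # PU in the interior of the MER
print(is_ipu(Rect(0, 8, 8, 8), mer))  # PU lying on the MER's left boundary
```

Under this reading, only PUs that sit strictly inside the MER on both its left and upper sides count as independent.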

Proposed Method.A(2/3)

Proposed Method.A(3/3) • Neighboring CTUs • left • upper • upper-left • upper-right
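The four neighboring CTUs listed above can be written as coordinate offsets. The sketch below assumes CTUs are indexed as (row, column) and that neighbors outside the frame are simply ignored; the names `DEPENDENCY_OFFSETS` and `dependencies` are illustrative, not taken from the slides.

```python
# (row, column) offsets of the CTUs a given CTU depends on.
DEPENDENCY_OFFSETS = [
    (0, -1),   # left
    (-1, 0),   # upper
    (-1, -1),  # upper-left
    (-1, 1),   # upper-right
]


def dependencies(i, j, rows, cols):
    """List the in-frame CTUs that CTU (i, j) depends on."""
    result = []
    for di, dj in DEPENDENCY_OFFSETS:
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            result.append((ni, nj))
    return result


# A frame of 3 CTU rows by 4 CTU columns.
print(dependencies(0, 0, 3, 4))  # top-left CTU has no in-frame dependencies
print(dependencies(1, 1, 3, 4))  # interior CTU has all four
```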

Proposed Method • A. Data Dependency Analysis • B. DAG for CTUs • C. Highly Parallel Framework

Proposed Method.B(1/4) • Generate a DAG to capture the dependency relationships of CTUs.

Proposed Method.B(2/4) • DAG • Consists of a set of vertices V and edges E. • Each data dependency corresponds to an edge. • When a CTU has been processed, its vertex is removed.

Proposed Method.B(3/4) • Condition matrix (CM)
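A condition matrix of this kind can be initialized by counting, for each CTU, how many of its left, upper, upper-left and upper-right neighbors lie inside the frame. The following is a sketch under that assumption, with (row, column) indexing; `build_condition_matrix` is a hypothetical name.

```python
# (row, column) offsets: left, upper, upper-left, upper-right.
DEPENDENCY_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]


def build_condition_matrix(rows, cols):
    """Return a rows x cols matrix of in-frame dependency counts."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di, dj in DEPENDENCY_OFFSETS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    cm[i][j] += 1
    return cm


for row in build_condition_matrix(3, 4):
    print(row)
```

Only the top-left CTU starts with a count of zero, so it is the only CTU that can be processed immediately.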

Proposed Method.B(4/4)

Proposed Method • A. Data Dependency Analysis • B. DAG for CTUs • C. Highly Parallel Framework

Proposed Method.C(1/5)

Proposed Method.C(2/5) • Step 1: Initialize DQ and CM. DQ is a waiting queue. CM records, for each CTU, the number of CTUs it depends on. • Step 2: When a value in CM becomes zero, get the corresponding coordinates and push them into DQ.

Proposed Method.C(3/5) • Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform. • Step 4: Update CM. When the CTU at coordinate (i, j) in CM has been processed, the values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) in CM are each decremented by one. • Step 5: Repeat steps 2 to 4 until all CTUs in the frame have been processed.
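Steps 1 to 5 can be simulated in a few lines. This is only a sketch, not the authors' implementation: it assumes (row, column) indexing, replaces the actual motion estimation with a no-op, and drains DQ in synchronous batches, whereas a real many-core scheduler would likely dispatch CTUs as soon as they become ready. The function name `schedule` is hypothetical.

```python
from collections import deque

# CTUs that (i, j) depends on: left, upper, upper-left, upper-right.
DEPENDENCIES = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
# CTUs whose counters are decremented in Step 4 once (i, j) is done.
DEPENDENTS = [(1, 0), (1, -1), (0, 1), (1, 1)]


def schedule(rows, cols):
    """Return the batches of CTUs that could be processed in parallel."""
    def inside(i, j):
        return 0 <= i < rows and 0 <= j < cols

    # Step 1: initialize CM with dependency counts and DQ with ready CTUs.
    cm = [[sum(inside(i + di, j + dj) for di, dj in DEPENDENCIES)
           for j in range(cols)] for i in range(rows)]
    dq = deque((i, j) for i in range(rows) for j in range(cols)
               if cm[i][j] == 0)

    batches = []
    while dq:  # Step 5: repeat until no CTU is left.
        # Step 3: everything currently in DQ is mutually independent.
        batch = list(dq)
        dq.clear()
        batches.append(batch)
        # Step 4: update CM for the CTUs that were just processed.
        for i, j in batch:
            for di, dj in DEPENDENTS:
                ni, nj = i + di, j + dj
                if inside(ni, nj):
                    cm[ni][nj] -= 1
                    if cm[ni][nj] == 0:  # Step 2: newly ready CTU.
                        dq.append((ni, nj))
    return batches


batches = schedule(3, 4)
print([len(b) for b in batches])  # number of CTUs runnable per batch
```

The batch sizes give a rough picture of how parallelism ramps up and then falls off across a frame under this simplified model.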

Proposed Method.C(4/5) • Maximum parallelism of CTU • Maximum parallelism of highly parallel framework • Average parallelism of highly parallel framework

Proposed Method.C(5/5)

Experimental Results(1/5)

Conclusion(1/1) • The highly parallel framework provides sufficient parallelism for many-core platforms. • It uses the DAG-based order to parallelize CTUs.
