Data Compression Conference 2013 Highly Parallel Framework for HEVC Motion Estimation on Many-core Platform • Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li
Outline • Introduction • Related Work • Proposed Method • Experimental Results • Conclusion
Introduction(1/2) • HEVC • coding tree unit (CTU)
Introduction(2/2) • Local parallel method (LPM) • The maximum parallelism of LPM is less than or equal to 8. • Independent PUs (IPUs) • Directed acyclic graph (DAG)
Related Work(1/2) • Local parallel method (LPM) [16] • Motion estimation region (MER) [16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012.
Related Work(2/2) • Local parallel method (LPM) • M = 16 or 8 8
Proposed Method • A. Data Dependency Analysis • B. DAG for CTUs • C. Highly Parallel Framework
Proposed Method.A(1/3) • Independent PUs (IPUs) • The IPU’s left boundary and the MER’s left boundary do not overlap. • The IPU’s upper boundary and the MER’s upper boundary do not overlap.
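The two boundary conditions above can be read as a simple position test. The sketch below is only an illustration and makes several assumptions that the slide does not state: that both conditions must hold together, that MERs are square with side M and aligned to multiples of M, and that a PU is identified by the coordinates of its top-left sample. The function name and parameters are hypothetical.

```python
def is_independent_pu(pu_x: int, pu_y: int, mer_size: int) -> bool:
    """Return True if neither the PU's left nor upper edge lies on its MER's edge.

    pu_x, pu_y: top-left sample position of the PU (hypothetical parameters).
    mer_size:   side length M of the MER, assumed aligned to multiples of M.
    """
    left_overlaps = pu_x % mer_size == 0    # PU left edge coincides with MER left edge
    upper_overlaps = pu_y % mer_size == 0   # PU upper edge coincides with MER upper edge
    # Assumption: a PU counts as independent only when BOTH edges are interior.
    return not left_overlaps and not upper_overlaps
```

With M = 16, a PU at (8, 8) passes the test, while PUs at (0, 8) or (8, 16) do not, because one of their edges lies on an MER boundary.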
Proposed Method.A(3/3) • Neighboring CTUs • left • upper • upper-left • upper-right
Proposed Method • A. Data Dependency Analysis • B. DAG for CTUs • C. Highly Parallel Framework
Proposed Method.B(1/4) • Generate a DAG to capture the dependency relationships of CTUs.
Proposed Method.B(2/4) • DAG • Consists of a set of vertices V and edges E. • A data dependency between two CTUs <=> an edge. • A CTU is processed <=> its vertex is removed.
Proposed Method.B(3/4) • Condition matrix (CM)
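One plausible way to fill the condition matrix is to count, for every CTU, how many of its left, upper, upper-left and upper-right neighbors exist inside the frame, which is the in-degree of that CTU's vertex in the DAG. The sketch below assumes that reading, with (i, j) meaning (row, column); the names `DEPENDENCY_OFFSETS` and `build_condition_matrix` are hypothetical and not taken from the slides.

```python
# Offsets (d_row, d_col) of the neighbors a CTU depends on:
# left, upper, upper-left, upper-right.
DEPENDENCY_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]


def build_condition_matrix(rows: int, cols: int) -> list[list[int]]:
    """Return a rows x cols matrix holding each CTU's number of in-frame dependencies."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count only neighbors that fall inside the frame.
            cm[i][j] = sum(
                1
                for di, dj in DEPENDENCY_OFFSETS
                if 0 <= i + di < rows and 0 <= j + dj < cols
            )
    return cm


if __name__ == "__main__":
    for row in build_condition_matrix(3, 4):
        print(row)
```

Under these assumptions only the top-left CTU starts with a count of zero, so it is the only CTU that can be processed immediately.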
Proposed Method • A. Data Dependency Analysis • B. DAG for CTUs • C. Highly Parallel Framework
Proposed Method.C(2/5) • Step 1: Initialize DQ and CM. DQ is a waiting queue. CM records, for each CTU, the number of related CTUs (the CTUs it depends on). • Step 2: When values in the CM become zero, get the corresponding coordinates and push them into DQ.
Proposed Method.C(3/5) • Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform. • Step 4: Update CM. When the CTU with coordinate (i, j) in CM has been processed, the values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) in CM are each decremented by one. • Step 5: Repeat steps 2 to 4 until all CTUs in the frame have been processed.
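Steps 1 to 5 can be sketched as a small scheduling loop. This is a minimal illustration, not the authors' implementation: a Python thread pool stands in for the many-core platform, (i, j) is taken to mean (row, column), all CTUs currently in DQ are processed as one batch, and `schedule_ctus`, `process_ctu` and `max_workers` are hypothetical names.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# A CTU depends on its left, upper, upper-left and upper-right neighbors.
DEPENDENCY_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
# Entries updated in Step 4 for a processed CTU (i, j):
# (i, j+1), (i+1, j), (i+1, j+1), (i+1, j-1).
DEPENDENT_OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def schedule_ctus(rows, cols, process_ctu, max_workers=4):
    """Process every CTU in dependency order; return the batches that ran in parallel."""
    def inside(i, j):
        return 0 <= i < rows and 0 <= j < cols

    # Step 1: initialize CM (dependency counts) and DQ (waiting queue).
    cm = [[sum(inside(i + di, j + dj) for di, dj in DEPENDENCY_OFFSETS)
           for j in range(cols)] for i in range(rows)]
    dq = deque()

    # Step 2 (initial pass): push every CTU whose count is already zero.
    for i in range(rows):
        for j in range(cols):
            if cm[i][j] == 0:
                dq.append((i, j))

    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while dq:  # Step 5: repeat until no CTU is left.
            # Step 3: take all ready coordinates and process them in parallel.
            batch = list(dq)
            dq.clear()
            list(pool.map(lambda ij: process_ctu(*ij), batch))
            batches.append(batch)

            # Step 4: decrement the counts of the dependent CTUs.
            for i, j in batch:
                for di, dj in DEPENDENT_OFFSETS:
                    ni, nj = i + di, j + dj
                    if inside(ni, nj):
                        cm[ni][nj] -= 1
                        if cm[ni][nj] == 0:
                            dq.append((ni, nj))  # Step 2: newly ready CTU.
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(schedule_ctus(3, 4, lambda i, j: None)):
        print(step, batch)
```

Processing DQ in whole batches keeps the sketch short. A real scheduler could instead hand each CTU to a free core as soon as its count reaches zero, without waiting for the rest of the batch to finish.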
Proposed Method.C(4/5) • Maximum parallelism of CTU • Maximum parallelism of highly parallel framework • Average parallelism of highly parallel framework
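CTU-level parallelism can also be estimated empirically by simulating the dependency order and counting how many CTUs are ready at each step. The sketch below does not reproduce the formulas from the slide. It assumes every CTU takes one time step, covers only the CTU level (not the IPUs inside a CTU), and defines average parallelism as the total number of CTUs divided by the number of steps. All names are hypothetical.

```python
# A CTU depends on its left, upper, upper-left and upper-right neighbors.
DEPENDENCY_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
# Mirror image: the CTUs that wait on a given CTU.
DEPENDENT_OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def wavefront_sizes(rows: int, cols: int) -> list[int]:
    """Return the number of CTUs that can run concurrently at each time step."""
    def inside(i, j):
        return 0 <= i < rows and 0 <= j < cols

    # Dependency count of every CTU (the condition matrix).
    cm = [[sum(inside(i + di, j + dj) for di, dj in DEPENDENCY_OFFSETS)
           for j in range(cols)] for i in range(rows)]
    ready = [(i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0]

    sizes = []
    while ready:
        sizes.append(len(ready))
        next_ready = []
        for i, j in ready:
            for di, dj in DEPENDENT_OFFSETS:
                ni, nj = i + di, j + dj
                if inside(ni, nj):
                    cm[ni][nj] -= 1
                    if cm[ni][nj] == 0:
                        next_ready.append((ni, nj))
        ready = next_ready
    return sizes


def parallelism_stats(rows: int, cols: int) -> tuple[int, float]:
    """Return (maximum, average) CTU-level parallelism under the one-step assumption."""
    sizes = wavefront_sizes(rows, cols)
    return max(sizes), sum(sizes) / len(sizes)


if __name__ == "__main__":
    # Example frame: 1920x1080 with 64x64 CTUs gives 17 rows and 30 columns.
    print(parallelism_stats(17, 30))
```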
Conclusion(1/1) • The highly parallel framework provides sufficient parallelism for many-core platforms. • It uses the DAG-based order to parallelize CTUs.