CE6.d: Parallel Intra Coding. JCTVC-F605. Jie Zhao and Andrew Segall. Parallel Prediction Unit: Parallel Intra Prediction. Intra prediction is a serial bottleneck for high-resolution applications. Prediction is based on reconstructed pixels from left/top neighbors.
CE6.d: Parallel Intra Coding
JCTVC-F605
Jie Zhao and Andrew Segall
Parallel Prediction Unit
• Parallel Intra Prediction
  • Intra prediction is a serial bottleneck for high-resolution applications
  • Prediction is based on reconstructed pixels from left/top neighbors
  • Small blocks (4x4) are important for visual and R-D performance at high resolutions
  • We seek to improve parallelism to reduce the complexity of worst-case intra decode (high resolution with a large number of 4x4 blocks)
• Our solution: Parallel Prediction Unit (PPU)
  • Defines a CU size within an LCU that can be encoded/decoded in parallel
  • For CUs larger than the defined PPU size, traditional sequential prediction is performed
  • Benefit: enables parallelism as dependencies increase within an LCU
[Figure: an LCU divided into CUs and CU/PU/TU blocks, with the PPU marked]
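The size rule on this slide can be sketched as a small dispatch function. This is only an illustration: the function name, the string labels, and the use of a single edge length to represent a square CU are assumptions, not part of the proposal.

```python
def intra_prediction_mode(cu_size: int, ppu_size: int) -> str:
    """Pick the prediction order for a square CU of edge length cu_size.

    Hypothetical helper: per the slide, CUs larger than the PPU size use
    traditional sequential prediction; CUs at or below the PPU size are
    assumed here to be eligible for parallel prediction.
    """
    if cu_size > ppu_size:
        return "sequential"
    return "parallel"


# Example with an assumed PPU size of 8 samples per side.
modes = {size: intra_prediction_mode(size, ppu_size=8) for size in (32, 16, 8)}
```

With these assumed sizes, the 32x32 and 16x16 CUs fall back to sequential prediction and only the 8x8 CU is handled as a PPU.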
Parallel Intra Prediction
• Goal of CE6.d
  • Compare the performance of three configurations of parallel intra prediction
  • Configuration #1: Checker-board partition
    • 2X parallelism
  • Configuration #2: Stripe partition
    • 2X parallelism
  • Configuration #3: No 4x4 intra prediction
    • Suggested at the previous meeting for comparison
    • 8x8 intra prediction combined with 4x4 residual
• Note: We originally used bi-directional prediction for block 0 in the first configuration. We have removed it to ‘seek a simple/design structure’, as requested at the last meeting.
[Figures: Fig. 1 Checker-board partition; Fig. 2 Stripe partition; Fig. 3 No 4x4 prediction. Blocks are labeled as first-set and second-set blocks.]
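The two partition patterns can be illustrated by labeling each 4x4 block in a PPU with a set index. This is a sketch under assumptions: the figures did not survive extraction, so which blocks belong to the first set, and whether the stripes run along rows or columns, is a guess made for illustration. The function name and arguments are hypothetical.

```python
def partition_ppu(blocks_per_side: int, pattern: str) -> list[list[int]]:
    """Label each block of a square PPU as first set (0) or second set (1).

    Assumptions (not confirmed by the slides):
      - "checkerboard" alternates sets along both rows and columns;
      - "stripe" alternates sets row by row (horizontal stripes).
    """
    grid = []
    for row in range(blocks_per_side):
        labels = []
        for col in range(blocks_per_side):
            if pattern == "checkerboard":
                labels.append((row + col) % 2)
            elif pattern == "stripe":
                labels.append(row % 2)
            else:
                raise ValueError(f"unknown pattern: {pattern}")
        grid.append(labels)
    return grid


# An 8x8 PPU holds 2x2 blocks of size 4x4.
checker = partition_ppu(2, "checkerboard")  # [[0, 1], [1, 0]]
stripe = partition_ppu(2, "stripe")         # [[0, 0], [1, 1]]
```

In both patterns half of the blocks land in each set, which is consistent with the 2X parallelism quoted for configurations #1 and #2.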
Result (PIP Stripe)
• Results
  • “Stripe” partition performed best
  • Average BD-rate impact, all resolutions
    • Intra (HE/LC): 1.4%
    • RA (HE/LC): 0.5%
    • LD-B (HE/LC): 0.2%
  • Average BD-rate impact, HD only
    • Intra (HE/LC): 1.0%
    • RA (HE/LC): 0.5%
    • LD-B (HE/LC): 0.2%
Results
• Results (for reference)
  • No 4x4 prediction
[Tables: Stripe partition; Disable 4x4 prediction]
• Parallel intra prediction outperforms disabling 4x4 prediction by 2.6% and 3.8% (AI-HE and AI-LC, respectively).
Conclusions
• Conclusions
  • Parallel intra prediction reduces serial dependencies within the current design
  • Provides parallelism for both encoder and decoder
  • Parallelism is achieved by partitioning a PPU into two sets
    • First set predicted from boundaries of the PPU
    • Second set predicted from all available pixels
  • Impact on average BD rate (Stripe configuration)
    • Intra (both HE and LC): 1.4%
    • Random access (both HE and LC): 0.5%
    • Low delay B (both HE and LC): 0.2%
• Related Documents
  • Verification by Toshiba, Qualcomm and ETRI (JCTVC-F328, F583, F628)
  • Request to improve parallelization in current design from Zoran (JCTVC-F736)
• Propose to adopt this technique to HM.
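The two-set structure can be read as a two-pass schedule: every first-set block is predicted in one pass, then every second-set block in the next. The sketch below only counts passes. It assumes that blocks within a set are fully independent and take equal time, and the set labels used are illustrative rather than taken from the proposal's figures.

```python
def decode_passes(set_labels):
    """Group block coordinates by set index; each group is one parallel pass."""
    passes = {}
    for r, row in enumerate(set_labels):
        for c, label in enumerate(row):
            passes.setdefault(label, []).append((r, c))
    # Lower set index first: the first set must be reconstructed
    # before the second set can use its pixels.
    return [passes[key] for key in sorted(passes)]


# Assumed labels for an 8x8 PPU with four 4x4 blocks (0 = first set, 1 = second set).
labels = [[0, 0],
          [1, 1]]
schedule = decode_passes(labels)

# Sequential prediction needs one step per block; the schedule needs one per pass.
total_blocks = sum(len(blocks) for blocks in schedule)
speedup = total_blocks / len(schedule)
```

Under these assumptions four blocks are handled in two passes, giving the 2X figure quoted for the partitioned configurations.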