HW-Accelerated HD video playback under Linux

Presentation Transcript


  1. HW-Accelerated HD video playback under Linux. Zou Nan hai, Open Source Technology Center

  2. Diagram labels: Command Streamer, 3D, Media (Video Front End), Media Engine, Thread Spawner, Thread Dispatcher, Thread payload, Indirect data, URB, EU, Kernel, Sampler, Data port, Video memory

  3. Mode of operation. Diagram labels: coded data, VLD, IS, IQ, IDCT, MC, output pixel, VFE or host, EU kernels

  4. Current XvMC implementation. Diagram labels: coded data, VLD, IS, IQ, IDCT, MC, output pixel, host software, per-slice data, per-macroblock data, EU kernels

  5. XvMC. Diagram labels: MPEG stream, media application, decode slice of macroblocks, XvMC lib, X Server, DRI interface, render, sync, resource management, media commands, video memory management, graphics hardware

  6. Video Memory Layout. Diagram labels: media surfaces, surface states, binding tables, VFE state, interface descriptors, selected interface, EU kernel instructions, command stream (media pointer command, media object command, flush command)

  7. Execution Unit introduction • SIMD code (variable execution size up to 16) with predication and control mask • Float and integer data types • Region-based direct and indirect register addressing • Supports scalar and immediate source operands

  8. EU Registers • GRF (General Register File): 256 bits per register (g0, g1, g2, gxx) • MRF (Message Register File): 256 bits per register (m0, m1, m2, mx), write-only; used to pass payload from a thread to a shared function unit • ARF (Architecture Register File): e.g. null, ip and flag registers • Immediate: encoded in the instruction

  9. Register Region. Syntax: Regnum.Subregnum<VertStride,Width,HorzStride>Type. Example g5.2<16,8,2>w: origin regnum=5, subregnum=2, HorzStride=2, VertStride=16, Width=8, Type=w. Second example: g15.3<16,16,1>UB. [Diagram: element positions 0-15 within 256-bit registers, starting from g0]
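
The region notation on this slide can be read as an address computation. The following is a minimal sketch, not the hardware's definition: it assumes that the subregister number and both strides are counted in elements of the given type, that a GRF register is 32 bytes (256 bits, per slide 8), and that channels fill a row of `Width` elements before stepping by `VertStride`. The function name `region_elements` and the `TYPE_SIZE` table are hypothetical.

```python
# Hypothetical helper: list the (register, byte offset) touched by each channel
# of a region written as Regnum.Subregnum<VertStride,Width,HorzStride>Type.
GRF_BYTES = 32  # 256 bits per GRF register (slide 8)

# Assumed element sizes in bytes for the type suffixes used on the slides.
TYPE_SIZE = {"ub": 1, "b": 1, "uw": 2, "w": 2, "ud": 4, "d": 4, "f": 4}


def region_elements(regnum, subregnum, vert_stride, width, horz_stride,
                    type_name, exec_size):
    """Return one (register number, byte offset) pair per execution channel."""
    size = TYPE_SIZE[type_name.lower()]
    # Assumption: subregnum is expressed in elements, not bytes.
    origin = regnum * GRF_BYTES + subregnum * size
    elements = []
    for channel in range(exec_size):
        row, col = divmod(channel, width)
        byte = origin + (row * vert_stride + col * horz_stride) * size
        elements.append(divmod(byte, GRF_BYTES))
    return elements


if __name__ == "__main__":
    # g5.2<16,8,2>w with 8 channels: every second word starting at word 2 of g5.
    for reg, offset in region_elements(5, 2, 16, 8, 2, "w", 8):
        print(f"g{reg} byte {offset}")
```

Under these assumptions the last channel of `g5.2<16,8,2>w` lands at the start of g6, which shows how a strided region can cross a register boundary.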

  10. Data operation • Structure of arrays (pixel shader and media code): each register holds the same component of several vectors (X X X X, Y Y Y Y, Z Z Z Z, W W W W) • Array of structures (vertex shader): each register holds the X, Y, Z, W components of whole vectors. [Diagram: registers 0-3 in both layouts]
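
To make the two layouts concrete, here is a small sketch that regroups array-of-structures data into structure-of-arrays form. It is illustrative only: Python lists stand in for registers, and the function name `aos_to_soa` is hypothetical.

```python
# Hypothetical illustration of the two layouts named on the slide.
def aos_to_soa(vectors):
    """Regroup [(x, y, z, w), ...] into one list per component.

    Expects at least one 4-component vector.
    """
    xs, ys, zs, ws = (list(component) for component in zip(*vectors))
    return {"x": xs, "y": ys, "z": zs, "w": ws}


if __name__ == "__main__":
    aos = [(1, 2, 3, 4), (5, 6, 7, 8)]  # array of structures: one vector per entry
    print(aos_to_soa(aos))              # structure of arrays: one list per component
```

In the structure-of-arrays form a single SIMD instruction can apply the same operation to the X component of many pixels at once, which fits the per-channel model of the media kernels.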

  11. Instruction sample: (f0) add.sat(16) g28.0<2>ub g3.0<16, 16, 1>f g10.0<16, 16, 1>w {align1}. Callout labels: predicate register (f0), execution size, register number, subregister number, VertStride, Width, HorzStride, type, access mode

  12. Instruction set • Normal SIMD instructions: add, mul, avg, mov etc.; dp3, dp4 etc. • Branch control instructions: if, else, do, while, jmpi etc.; branching is needed in media code • Send instructions: communicate with shared function units; media kernels use them to control the thread life cycle and to read from and write to surfaces

  13. Instruction example: add.sat(16) g28.0<2>UB g3.0<16, 16, 1>f g10.0<16, 16, 1>W {align1}. [Diagram: 16 channels, X + Y = Z per channel; registers g3, g4, g10, g28]
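
The per-channel behaviour of this instruction can be sketched in software. This is an approximation under stated assumptions, not a model of the hardware: the float-to-byte conversion here simply truncates with `int()`, whereas the real rounding rule is not given on the slide, and the function name `add_sat_ub` is hypothetical. The destination stride of 2 mirrors `g28.0<2>UB`.

```python
# Hypothetical software model of: add.sat(16) g28.0<2>UB g3.0<...>f g10.0<...>W
def add_sat_ub(src0, src1, dst, dst_stride=2):
    """Add two sources per channel, saturate to 0..255, store every dst_stride bytes."""
    for channel, (a, b) in enumerate(zip(src0, src1)):
        total = a + b
        # Assumption: truncate toward zero before clamping to the unsigned byte range.
        dst[channel * dst_stride] = max(0, min(255, int(total)))
    return dst


if __name__ == "__main__":
    floats = [200.0] * 16          # stands in for the float source (g3, g4)
    words = [100] * 16             # stands in for the word source (g10)
    result = add_sat_ub(floats, words, bytearray(32))  # stands in for g28
    print(list(result))
```

Saturation matters here because adding a residual to a predicted pixel can overshoot the 8-bit range; without `.sat` the value would wrap instead of clamping.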

  14. An example: input and output. Diagram labels: payload register passed from inline data (x, y, mv, field flags etc.), constant data, indirect data payload (input Y0-Y3, input U, input V), media read from reference surface (reference Y, reference U, reference V), tmp registers, result registers organized in YUV420 format, media write to destination surface

  15. Planar data vs packed data • Easy for a media kernel to handle • Hard to apply some filters • Cannot be used directly as a sampler source in the hardware implementation
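
As an illustration of the difference between the two layouts, the sketch below repacks planar YUV 4:2:0 data (separate Y, U and V planes) into a packed Y0 U Y1 V byte order. It is a hypothetical example rather than anything from the presentation: the function name `planar420_to_packed` is made up, each chroma row is simply reused for two luma rows, and even width and height are assumed.

```python
# Hypothetical repacking of planar YUV 4:2:0 into packed Y0 U Y1 V order.
def planar420_to_packed(y_plane, u_plane, v_plane, width, height):
    """Interleave planes; width and height are assumed to be even."""
    chroma_width = width // 2
    packed = bytearray()
    for row in range(height):
        chroma_row = row // 2  # assumption: reuse one chroma row for two luma rows
        for col in range(0, width, 2):
            chroma_index = chroma_row * chroma_width + col // 2
            packed += bytes((
                y_plane[row * width + col],
                u_plane[chroma_index],
                y_plane[row * width + col + 1],
                v_plane[chroma_index],
            ))
    return bytes(packed)


if __name__ == "__main__":
    # A 2x2 frame: four luma samples share one U and one V sample.
    print(list(planar420_to_packed(bytes([1, 2, 3, 4]), bytes([9]), bytes([7]), 2, 2)))
```

In the planar form each plane is a plain 2D array of one component, which is what makes it straightforward for a kernel to address.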

  16. Work flow. Diagram labels: frame sequence B I P P I P, slice of macroblocks, inline data, indirect data, DCT data, kernel, forward reference frame, backward reference frame, media read message, media write message, destination surface

  17. About the XvMC API • Post-processing is missing from the XvMC API design • So is a video output mixer

  18. High-level language • Why is a high-level language preferred for media kernels? Easy to debug; easy to reuse code; hides platform details, so code is easier to understand and maintain • Possible choices: GLSL is not suitable; a simple C extension?

  19. H.264 • Kernels become much more complex because of the different MC and DCT size combinations • Not suitable for a slice-level API, because of intra prediction • Media threads need scheduling and dependency control, because of intra prediction
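
One common way to picture the dependency control mentioned here is a wavefront schedule. The sketch below is illustrative and not taken from the presentation: it assumes each macroblock depends on its left, top-left, top and top-right neighbours, so macroblock (x, y) may run at step x + 2*y. The function name `wavefront_schedule` is hypothetical.

```python
# Hypothetical wavefront ordering for macroblocks with intra-prediction dependencies.
def wavefront_schedule(mb_width, mb_height):
    """Group macroblocks (x, y) into steps; blocks in the same step are independent."""
    steps = {}
    for y in range(mb_height):
        for x in range(mb_width):
            # Assumption: left, top-left, top and top-right neighbours must finish first.
            steps.setdefault(x + 2 * y, []).append((x, y))
    return [steps[step] for step in sorted(steps)]


if __name__ == "__main__":
    for step, blocks in enumerate(wavefront_schedule(3, 2)):
        print(step, blocks)
```

Every neighbour a block depends on falls in a strictly earlier step, so all blocks within one step could be dispatched as parallel threads.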

  20. VAAPI • Picture-level API • Covers MPEG-2, H.264 and VC-1 from different entry points • Post-processing and a video output mixer are missing

  21. TODO • IDCT code optimization • MPEG-2 XvMC VLD extension • VAAPI for MPEG-2 • VAAPI for AVC • Video post-processing and mixer

  22. Q&A Thank You!
