
Performance tips for Windows Store apps using DirectX and C++

Performance tips for Windows Store apps using DirectX and C++. Max McMullen Principal Development Lead – Direct3D Microsoft Corporation 4-102. Agenda. Overview Measuring rendering performance Power efficient GPU characteristics Optimizing for power efficient GPUs. Overview.


Presentation Transcript


  1. Performance tips for Windows Store apps using DirectX and C++ Max McMullen Principal Development Lead – Direct3D Microsoft Corporation 4-102

  2. Agenda • Overview • Measuring rendering performance • Power efficient GPU characteristics • Optimizing for power efficient GPUs

  3. Overview

  4. Optimizing for the Windows 8/RT OS • New form factors and platforms require new optimizations • Windows uses DirectX to get every pixel on screen • Direct3D 11.1 provides new APIs to optimize rendering

  5. Use optimized Windows 8/RT platforms • All Windows Store apps use DirectX for rendering • WWA & XAML optimized use of Direct2D and Direct3D 11.1 • Direct2D and Direct2D Effects fully leverage Direct3D 11.1 • But sometimes you really need to use Direct3D itself…

  6. What you should know • Basics of building a C++ Windows Store app • Direct3D fundamentals

  7. Measuring rendering performance

  8. How do you measure rendering performance? • Many useful tools for Windows performance optimization: • Visual Studio Performance Profiler, Visual Studio Graphics Diagnostics, hardware partner tools… • Two primary tools used to optimize Direct3D usage in the Windows 8/RT OS: • Basic: FPS/time measurement in app/microbenchmarks • Advanced: GPUView

  9. Frames per second (FPS) • Quick but sometimes misleading • C++/DirectX Windows Store apps sync to the display refresh • Measure render time, not present • Call ID3D11DeviceContext::Flush instead of IDXGISwapChain::Present • Infrequent output: file output • Frequent output: look at FPSCounter.cpp in the GeometryRealization sample
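
The advice above is to time the render work itself rather than the present. A minimal sketch of such a timer follows. `FrameTimer` is a hypothetical helper, not part of the talk or of the GeometryRealization sample, and it assumes `std::chrono::steady_clock` is an acceptable clock. In an app, the timed callable would issue the draw calls and then call ID3D11DeviceContext::Flush; that part is left out so the sketch stays portable.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the talk): accumulates the time spent in
// a render callable and reports averages.
class FrameTimer {
public:
    // Times one invocation of renderFrame and adds it to the running total.
    template <typename RenderFn>
    void TimeFrame(RenderFn&& renderFrame) {
        const auto start = std::chrono::steady_clock::now();
        renderFrame();  // in an app: draw calls followed by Flush
        const auto end = std::chrono::steady_clock::now();
        totalSeconds_ += std::chrono::duration<double>(end - start).count();
        ++frameCount_;
    }

    std::size_t FrameCount() const { return frameCount_; }

    // Average render time per frame in milliseconds (0 if nothing was timed).
    double AverageMilliseconds() const {
        return frameCount_ == 0 ? 0.0 : 1000.0 * totalSeconds_ / frameCount_;
    }

    // Frames per second implied by the average render time.
    double AverageFps() const {
        return totalSeconds_ > 0.0 ? frameCount_ / totalSeconds_ : 0.0;
    }

private:
    double totalSeconds_ = 0.0;
    std::size_t frameCount_ = 0;
};
```

Reporting the average over many frames, rather than per frame, keeps the output infrequent enough to write to a file.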

  10. Demo: FPS measurement

  11. GPUView • Part of the Windows Performance Toolkit • ETW Logging of CPU and GPU work • Measures graphics performance • FPS, startup time, glitching, render time, latency • Enables detailed analysis of CPU and GPU workloads and interdependencies

  12. GPUView – Record and Analyze • Install • x86: Windows Performance Toolkit • ARM: Windows Kits\8.0\Windows Performance Toolkit\Redistributables\WPTarm-arm_en-us.msi • Record • Run log.cmd to start • Perform action • Run log.cmd to stop • Analyze • Data captured in merged.etl, load in GPUView

  13. GPUView – Interface • GPU Hardware Queue • Flip Queue • CPU Queues • CPU Threads

  14. GPUView Interface: GPU Hardware Queue • The GPU Hardware Queue shows command buffers rendering on the GPU. • CPU Queue command buffers are moved to the GPU Hardware Queue when the hardware is ready to receive more commands.

  15. Demo: GPUView

  16. Power efficient GPU characteristics

  17. What to expect with power efficient GPUs • Feature level 9_1 or 9_3 • Limited available bandwidth • Both immediate render and tiled render GPUs • Limited shader instruction throughput

  18. Feature Level 9.x (FL9.1, FL9.3) • Real-time render limitations generally occur before reaching these maximums

  19. GPU Memory Bandwidth • Baseline requirement: 1.9 GB/sec benchmarked • 7.5 I/O operations per screen pixel, 1366x768x32bpp @ 60 Hz
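
The 1.9 GB/sec baseline can be reproduced from the other figures on this slide. The sketch below is a worked calculation, assuming 1 GB means 10^9 bytes; the function and constant names are illustrative.

```cpp
#include <cstdint>

// Bytes per second needed to touch every screen pixel `opsPerPixel`
// times per frame at the given resolution, depth and refresh rate.
constexpr double RequiredBandwidthBytesPerSec(std::uint32_t width,
                                              std::uint32_t height,
                                              std::uint32_t bitsPerPixel,
                                              double refreshHz,
                                              double opsPerPixel) {
    return static_cast<double>(width) * height * (bitsPerPixel / 8.0) *
           refreshHz * opsPerPixel;
}

// Figures from the slide: 1366x768, 32 bpp, 60 Hz, 7.5 I/O operations per pixel.
// 1366 * 768 * 4 bytes * 60 * 7.5 = 1,888,358,400 bytes/sec, about 1.9 GB/sec.
constexpr double kBaselineBytesPerSec =
    RequiredBandwidthBytesPerSec(1366, 768, 32, 60.0, 7.5);
```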

  20. Immediate render GPU (diagram: shader cores, graphics memory, memory bus)

  21. Tiled render GPU (diagram: shader cores, graphics memory, memory bus)

  22. Tiled render GPU (diagram: shader cores, graphics memory, memory bus)

  23. Tiled render GPU (diagram: shader cores, graphics memory, memory bus)

  24. Shader instruction throughput • Fill rates on GPUs depend on a number of factors • Memory bandwidth • Blend mode • Shader cores • Shader complexity • Etc. • Power efficient GPUs become shader throughput bound at approximately 4 pixel shader instructions

  25. Optimizing for low power GPUs

  26. Bandwidth optimization: basics • Render opaque objects front-to-back with z-buffering • Disable alpha blending for opaque objects • Use geometry to trim large transparent areas

  27. Bandwidth optimization: compress resources • Direct3D supports texture compression at all feature levels • BC1 4-bits/pixel for RGB formats - 6x compression ratio • BC2,3 8-bits/pixel for RGBA formats - 4x compression ratio • Smaller resources also means faster downloads of your app
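
The compression ratios above follow from the block sizes of the BC formats. The sketch below works them out for a single mip level; the function names are illustrative. Note that the 6x figure for BC1 is measured against 24-bit RGB and the 4x figure for BC2/BC3 against 32-bit RGBA.

```cpp
#include <cstddef>

// Size in bytes of one mip level of a block-compressed texture.
// BC formats encode 4x4 texel blocks: 8 bytes per block for BC1,
// 16 bytes per block for BC2 and BC3.
constexpr std::size_t BlockCompressedSize(std::size_t width,
                                          std::size_t height,
                                          std::size_t bytesPerBlock) {
    return ((width + 3) / 4) * ((height + 3) / 4) * bytesPerBlock;
}

// Size in bytes of one mip level of an uncompressed texture.
constexpr std::size_t UncompressedSize(std::size_t width,
                                       std::size_t height,
                                       std::size_t bitsPerPixel) {
    return width * height * bitsPerPixel / 8;
}
```

For a 1024x1024 texture, 24-bit RGB takes 3,145,728 bytes against 524,288 bytes for BC1, and 32-bit RGBA takes 4,194,304 bytes against 1,048,576 bytes for BC2 or BC3.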

  28. Bandwidth optimization: quantize resources • Use the 16 bit formats added to Direct3D 11.1: • DXGI_FORMAT_B5G6R5_UNORM • DXGI_FORMAT_B5G5R5A1_UNORM • DXGI_FORMAT_B4G4R4A4_UNORM
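
As an illustration of what quantizing to one of these formats involves, here is a minimal sketch that packs 8-bit-per-channel RGB into the DXGI_FORMAT_B5G6R5_UNORM bit layout. `PackB5G6R5` is a hypothetical helper, not a Direct3D API; real content pipelines would typically do this conversion offline.

```cpp
#include <cstdint>

// Packs 8-bit-per-channel RGB into the DXGI_FORMAT_B5G6R5_UNORM layout:
// blue in bits 0-4, green in bits 5-10, red in bits 11-15.
constexpr std::uint16_t PackB5G6R5(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    // Rescale each channel to its smaller range, rounding to nearest.
    const std::uint32_t r5 = (r * 31u + 127u) / 255u;
    const std::uint32_t g6 = (g * 63u + 127u) / 255u;
    const std::uint32_t b5 = (b * 31u + 127u) / 255u;
    return static_cast<std::uint16_t>((r5 << 11) | (g6 << 5) | b5);
}
```

Each texel shrinks from 32 bits to 16, halving the bandwidth needed to read it, at the cost of fewer distinct levels per channel.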

  29. Bandwidth optimization: flip present • Must use DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL • OS automatically uses “fullscreen” flips when: • Swapchain buffer dimensions match the desktop resolution • Swapchain format is DXGI_FORMAT_B8G8R8A8_UNORM* • App is the only content onscreen • Buffer dimensions need to be converted correctly from device independent pixels (DIPs) • Just create the swapchain with zero width and height to get the right size

  30. using namespace Windows::Graphics::Display;

      float ConvertDipsToPixels(float dips)
      {
          static const float dipsPerInch = 96.0f;
          return floor(dips * DisplayProperties::LogicalDpi / dipsPerInch + 0.5f);
      }
      …
      Platform::Agile<Windows::UI::Core::CoreWindow> m_window;
      float swapchainWidth = ConvertDipsToPixels(m_window->Bounds.Width);
      float swapchainHeight = ConvertDipsToPixels(m_window->Bounds.Height);
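
To see what the conversion does at different display scales, here is a portable variant of the same formula. It is a sketch only: the DPI is passed as a parameter (an assumption made so the code runs outside a Windows Store app) instead of being read from DisplayProperties::LogicalDpi.

```cpp
#include <cmath>

// Portable variant of the slide's helper: the logical DPI is an explicit
// parameter rather than a value queried from the display.
inline float ConvertDipsToPixels(float dips, float logicalDpi) {
    constexpr float dipsPerInch = 96.0f;
    // Adding 0.5 before floor rounds to the nearest whole pixel.
    return std::floor(dips * logicalDpi / dipsPerInch + 0.5f);
}
```

At 96 DPI one DIP is one pixel, so a 1366-DIP-wide window needs a 1366-pixel-wide buffer. At 192 DPI a 683-DIP-wide window needs the same 1366 pixels.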

  31. Demo: Optimized flip presents

  32. Bandwidth optimization: tiled render GPUs • Minimize command buffer flushes • Don’t map resources in use by the GPU, use DISCARD and NO_OVERWRITE • Minimize scene flushes • Visit RenderTargets only once per frame • Don’t update resources in use by the GPU from the CPU, use DISCARD and NO_OVERWRITE with ID3D11DeviceContext1::CopySubresourceRegion1 • Use scissors when updating small portions of a RenderTarget

  33. Bandwidth optimization: tiled render GPUs • New Direct3D APIs provide hints to avoid unnecessary copies • Rendering artifacts if used incorrectly

  34. Bandwidth optimization: Discard* APIs • Use ID3D11DeviceContext1::DiscardView and ID3D11DeviceContext1::DiscardResource to prevent unnecessary tile copies • Artifacts if used incorrectly

      m_swapChain->Present(1, 0);             // present the image on the display
      ComPtr<ID3D11View> view;
      m_renderTargetView.As(&view);           // get the view on the RT
      m_d3dContext->DiscardView(view.Get());  // discard the view

  35. Tiled render GPU (diagram: shader cores, graphics memory, memory bus)

  36. Tiled render GPU (diagram: shader cores, graphics memory, memory bus)

  37. Shader instruction throughput • Power efficient GPUs have limited throughput for full precision • Minimum precision hints increase throughput when precision doesn’t matter • Specifies minimum rather than actual precision • min16float, min16int, min10int • Don’t change precision often • 20-25% improvement in practice with min16float

  38. Minimum precision

      static const float brightThreshold = 0.5f;
      Texture2D sourceTexture : register(t0);

      float4 DownScale3x3BrightPass(QuadVertexShaderOutput input) : SV_TARGET
      {
          float3 brightColor = 0;
          // Gather 16 adjacent pixels (each bilinear sample reads a 2x2 region)
          brightColor  = sourceTexture.Sample(linearSampler, input.tex, int2(-1,-1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1,-1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2(-1, 1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1, 1)).rgb;
          brightColor /= 4.0f;
          // Brightness thresholding
          brightColor = max(0, brightColor - brightThreshold);
          return float4(brightColor, 1.0f);
      }

  39. Minimum precision

      static const min16float brightThreshold = (min16float)0.5;
      Texture2D<min16float4> sourceTexture : register(t0);

      float4 DownScale3x3BrightPass(QuadVertexShaderOutput input) : SV_TARGET
      {
          min16float3 brightColor = 0;
          // Gather 16 adjacent pixels (each bilinear sample reads a 2x2 region)
          brightColor  = sourceTexture.Sample(linearSampler, input.tex, int2(-1,-1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1,-1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2(-1, 1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1, 1)).rgb;
          brightColor /= (min16float)4.0;
          // Brightness thresholding
          brightColor = max(0, brightColor - brightThreshold);
          return float4(brightColor, 1.0f);
      }

  40. Minimum precision – bad usage

      static const min16float brightThreshold = (min16float)0.5;
      Texture2D<min16float4> sourceTexture : register(t0);

      float4 DownScale3x3BrightPass(QuadVertexShaderOutput input) : SV_TARGET
      {
          min16float3 brightColor = 0;
          // Gather 16 adjacent pixels (each bilinear sample reads a 2x2 region)
          brightColor  = sourceTexture.Sample(linearSampler, input.tex, int2(-1,-1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1,-1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2(-1, 1)).rgb;
          brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1, 1)).rgb;
          // Only difference from the previous slide: the divisor is cast to
          // min10int, which changes precision in the middle of min16float math
          brightColor /= (min10int)4.0;
          // Brightness thresholding
          brightColor = max(0, brightColor - brightThreshold);
          return float4(brightColor, 1.0f);
      }

  41. Wrap-up • Optimize! • Use the right tools and techniques to measure performance • Tune for power efficient GPUs’ unique performance characteristics • Direct3D 11.1 and Windows 8 provide the APIs to fully leverage power efficient GPUs

  42. Resources

  43. Build 2012 Talk: 3-113 Graphics with the Direct3D11.1 API made easy • Build 2012 Talk: 3-109 Developing a Windows Store app using C++ and DirectX • Visual Studio 2012 Remote Debugging: http://blogs.msdn.com/b/dsvc/archive/2012/10/26/windows-rt-windows-store-app-debugging.aspx • FPS Counter in GeometryRealization sample: http://code.msdn.microsoft.com/windowsapps/Geometry-Realization-963be8b7#content • GPUView: http://msdn.microsoft.com/en-us/library/windows/desktop/jj585574(v=vs.85).aspx • Direct3D11.1: http://msdn.microsoft.com/en-us/library/windows/desktop/hh404562(v=vs.85).aspx

  44. Resources • Develop: http://msdn.microsoft.com/en-US/windows/apps/br229512 • Design: http://design.windows.com/ • Samples: http://code.msdn.microsoft.com/windowsapps/Windows-8-Modern-Style-App-Samples • Videos: http://channel9.msdn.com/Windows Please submit session evals by using the Build Windows 8 app or at http://aka.ms/BuildSessions
