The power of C++ Project Austin app

The power of C++ Project Austin app Ale Contenti Visual C++ | Principal Dev Manager 4-001

Diving deep into project Austin • What’s Austin • Why we built it • C++ at work • Go build amazing apps!

Austin • Austin is a digital note-taking app for Windows 8 • You can add pages to your notebook, delete them, or move them around • You can use digital ink to write or draw things on those pages • You can add photos from your computer, from SkyDrive, or directly from your computer's camera • You can share the notes you create to other Windows 8 apps such as Email or SkyDrive • Beautiful and simple

Austin: just a pen and a piece of paper

Austin: why we built it • We used Visual C++ 2012 to build an amazing app: • Written in “modern C++” • DirectX, XAML for UI • C++/CX to interact with WinRT • Auto-vectorizer for faster ink smoothing • C++ AMP for faster page curling • …and it was fun (the code is available on CodePlex, too) • Showcase the power of Windows 8, the native platform and C++

Modern C++ • DirectX and XAML UI • C++/CX layer

Modern C++ • We strived to write Austin in a “modern” way: • C++ Standard Library, augmented with PPL and Boost • Smart pointers instead of raw pointers • Pervasive RAII pattern • Handle errors using C++ exceptions • Coding conventions inspired by Boost • No bare pointers, no delete
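A minimal sketch of what these bullets can look like in code. It is not taken from the Austin sources: the `Page` and `Notebook` types and their member names are hypothetical, chosen only to show ownership through smart pointers, cleanup through RAII, and error reporting through exceptions.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not from the Austin code base.
struct Page {
    explicit Page(std::string t) : title(std::move(t)) {}
    std::string title;
};

class Notebook {
public:
    // Ownership is expressed with unique_ptr, so no 'delete' appears anywhere.
    void add_page(std::string title) {
        pages_.push_back(std::unique_ptr<Page>(new Page(std::move(title))));
    }

    // Errors are reported with exceptions rather than error codes.
    void remove_page(std::size_t index) {
        if (index >= pages_.size())
            throw std::out_of_range("no such page");
        pages_.erase(pages_.begin() + static_cast<std::ptrdiff_t>(index));
    }

    std::size_t page_count() const { return pages_.size(); }
    const Page& page(std::size_t index) const { return *pages_.at(index); }

private:
    // RAII: every Page is released when the Notebook goes out of scope.
    std::vector<std::unique_ptr<Page>> pages_;
};
```

Because the vector owns its pages, removing an entry or destroying the notebook releases memory without any explicit cleanup code, even when an exception unwinds the stack.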

DirectX and XAML • DirectX to create an immersive, fluid user interface that's built as a 3D scene with lights, shadows, and a camera • On the DirectX render target, we draw the notebook's pages, photos, ink strokes, and background • A 3D engine library abstracts some of the DirectX complexity • DirectX for a fast, fluid, real-to-life experience • XAML UI is used for the settings menu, the app bar, and the rest of the user interface • The SwapChainBackgroundPanel hosts the 3D scene inside the XAML UI page

C++/CX • C++/CX is used at the “boundary”, to interact with Windows (via the WinRT objects) and to leverage XAML UI • Used for loading and saving images, file picker, camera, storage files and folders (SkyDrive, etc.), implementing the “share” contract • Very useful for XAML UI: UI elements and event hook-ups • We were careful not to let C++/CX code “bleed” too much into our Standard C++ code (15 files out of 350) • Windows is the RunTime

Ink smoothing and auto-vectorizer

Ink smoothing: the problem • We have on the order of 5 ms or less to smooth the strokes • In real time, please…

Ink smoothing: the code • The C++ compiler is obsessed with optimization: in this case, it will auto-vectorize the loop

    for (int j = 0; j < numPoints; j++)
    {
        float t = (float)j / (float)(numPoints - 1);
        smoothedPressure[j] = (1 - t) * p2p + t * p3p;
        smoothedPoints_X[j] = (2*t*t*t - 3*t*t + 1) * p2x
                            + (-2*t*t*t + 3*t*t) * p3x
                            + (t*t*t - 2*t*t + t) * L * (p3x - p1x)
                            + (t*t*t - t*t) * L * (p4x - p2x);
        smoothedPoints_Y[j] = (2*t*t*t - 3*t*t + 1) * p2y
                            + (-2*t*t*t + 3*t*t) * p3y
                            + (t*t*t - 2*t*t + t) * L * (p3y - p1y)
                            + (t*t*t - t*t) * L * (p4y - p2y);
    }

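Auto-vectorizer (super simplified view) • The scalar loop

    for (i = 0; i < 1000; i++) { C[i] = A[i] + B[i]; }

• is compiled as if it processed four elements per iteration

    for (i = 0; i < 1000; i += 4) { C[i:i+3] = A[i:i+3] + B[i:i+3]; }

• Each iteration then maps to a single SSE instruction, “addps xmm1, xmm0”: xmm0 + xmm1 -> xmm1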
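The `C[i:i+3]` notation on this slide is shorthand, not C++. A hand-written picture of the same transformation, in portable C++, might look like the sketch below. The function names are made up for illustration, and a real vectorizing compiler emits SIMD instructions instead of four scalar statements; the point is only the shape of the loop, including the scalar tail that handles leftover elements when the count is not a multiple of four.

```cpp
#include <cstddef>

// Scalar reference: one addition per iteration.
void add_scalar(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Illustration of the vectorized shape: four lanes per iteration,
// followed by a scalar tail for the remaining n % 4 elements.
void add_four_wide(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // A vectorizing compiler can replace these four adds with one addps.
        c[i + 0] = a[i + 0] + b[i + 0];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)  // tail: elements that do not fill a full group of four
        c[i] = a[i] + b[i];
}
```

Both functions produce the same results; the second only makes the grouping explicit.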

Auto-vectorizer: info from the compiler • When does the auto-vectorizer kick in? • On the command line: /Qvec-report:1 reports the vectorized loops • /Qvec-report:2 reports both vectorized and non-vectorized loops, and the reason why some loops were not vectorized • Refer to the Vectorizer and Parallelizer Messages in MSDN • From the build output, with /Qvec-report:1:

    ink_renderer.cpp(1092) : info C5001: loop vectorized

Auto-vectorizer: it’s not always easy

    #include <vector>

    void test1()
    {
        std::vector<int> a(10000), b(10000), c(10000);
        for (int i = 0; i < a.size(); ++i)
        {
            a[i] = b[i] + c[i];
        }
    }

• info C5002: loop not vectorized due to reason '501' • The a.size() call in the loop condition is the problem: the compiler cannot prove that the upper bound stays the same across iterations

Auto-vectorizer: it’s not always easy • Hoist the upper bound into a local variable

    #include <vector>

    void test1()
    {
        std::vector<int> a(10000), b(10000), c(10000);
        // iMax is computed once, so the upper bound is loop-invariant
        for (int i = 0, iMax = static_cast<int>(a.size()); i < iMax; ++i)
        {
            a[i] = b[i] + c[i];
        }
    }

• info C5001: loop vectorized

Auto-vectorizer at work in Austin • The compiler will analyze the loop and emit the right code • For the ink-smoothing algorithm, we got a 30% speed-up • For the first part of the page curling algorithm, we got a 175% speed-up • Auto-vectorizer can analyze very complex loops • Always measure with a profiler to understand which loops you need to speed up • Leverage the Vectorizer and Parallelizer Messages guide for help

Page curling and C++ AMP

Page curling: calculating normals • Lots of triangles: we have less than 15 ms to “turn a page” in real time; we need to parallelize this algorithm • C++ AMP is a good candidate, since the data size is pretty large

    // pseudo-code
    for each triangle
    {
        Position vertex1Pos = triangle.vertex1.position;
        Position vertex2Pos = triangle.vertex2.position;
        Position vertex3Pos = triangle.vertex3.position;

        Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
        triangleNormal.normalize();

        triangle.vertex1.normal += triangleNormal;
        triangle.vertex2.normal += triangleNormal;
        triangle.vertex3.normal += triangleNormal;
    }

Page curling: calculating normals • We’re looping over each triangle • The first set of operations is safe, because it works on a single triangle at a time: no races • But the final updates touch vertices that are shared between triangles -> race! • This algorithm only works on a single thread

    // pseudo-code
    for each triangle
    {
        // safe: reads and writes only this triangle's data
        Position vertex1Pos = triangle.vertex1.position;
        Position vertex2Pos = triangle.vertex2.position;
        Position vertex3Pos = triangle.vertex3.position;

        Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
        triangleNormal.normalize();

        // race: these vertices are shared with neighboring triangles
        triangle.vertex1.normal += triangleNormal;
        triangle.vertex2.normal += triangleNormal;
        triangle.vertex3.normal += triangleNormal;
    }

Page curling: split the loop to make it parallelizable • Before, a single loop: for each triangle { calculate triangle normals; calculate vertex normals } • After, two loops: for each triangle { calculate triangle normals; cache triangle normals }, then for each vertex { calculate vertex normals }
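The split can be sketched in plain, single-threaded C++ to show why each loop becomes safe to parallelize. This is an illustration under assumptions, not the Austin implementation: `Vec3`, `Triangle`, and both function names are hypothetical, and the per-vertex adjacency list built in the second pass is one possible way to find the triangles around a vertex.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical helper type

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalized(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

typedef std::array<std::size_t, 3> Triangle;  // three vertex indices

// Pass 1: iteration i writes only out[i], so iterations are independent.
std::vector<Vec3> triangle_normals(const std::vector<Vec3>& pos,
                                   const std::vector<Triangle>& tris) {
    std::vector<Vec3> out(tris.size());
    for (std::size_t i = 0; i < tris.size(); ++i) {
        const Triangle& t = tris[i];
        out[i] = normalized(cross(pos[t[1]] - pos[t[0]], pos[t[2]] - pos[t[0]]));
    }
    return out;
}

// Pass 2: each vertex gathers from the cached triangle normals and writes
// only its own slot, so there is no shared write and no race.
std::vector<Vec3> vertex_normals(std::size_t vertexCount,
                                 const std::vector<Triangle>& tris,
                                 const std::vector<Vec3>& triNormals) {
    // Assumed lookup structure: which triangles touch each vertex.
    std::vector<std::vector<std::size_t>> touching(vertexCount);
    for (std::size_t i = 0; i < tris.size(); ++i)
        for (std::size_t k = 0; k < 3; ++k)
            touching[tris[i][k]].push_back(i);

    std::vector<Vec3> out(vertexCount);
    for (std::size_t v = 0; v < vertexCount; ++v) {
        Vec3 sum = {0.0f, 0.0f, 0.0f};
        for (std::size_t t : touching[v])
            sum = sum + triNormals[t];
        out[v] = normalized(sum);
    }
    return out;
}
```

The design choice is to turn a scatter (each triangle adds into shared vertices) into a gather (each vertex reads from its triangles), at the cost of a temporary array of triangle normals.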

First, loop for each triangle… • We use C++ AMP • Same as before, we calculate the normals for each triangle • We collect the normals into a temporary array, which stays in GPU memory

    // simplified for the slide; the full version is in the appendix
    c::array<b::float32, 2> tempTriangleNormals(3, (int)triangleCount());
    parallel_for_each(extent<1>(triangleCount()), [=, &tempTriangleNormals](index<1> idx) restrict(amp)
    {
        Position vertex1Pos = triangle.vertex1.position;
        Position vertex2Pos = triangle.vertex2.position;
        Position vertex3Pos = triangle.vertex3.position;

        Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
        triangleNormal.normalize();

        tempTriangleNormals[idx] = triangleNormal;
    });

…then, loop for each vertex • We go over each vertex, so no races • In sumTriangleNormals, we fetch the normals from tempTriangleNormals, i.e., the temporary we kept in GPU memory

    parallel_for_each(extent<2>(vertexCountY, vertexCountX), [=](index<2> idx) restrict(amp)
    {
        Normal vertexNormal = vertexNormalView(idx);

        // go find the normals from nearby triangles
        vertexNormal += sumTriangleNormals(idx);
        vertexNormal.normalize();

        vertexNormalView(idx) = vertexNormal;
    });

Page curling: C++ AMP at work • Massive parallelism with GPU and WARP • Running this algorithm on the GPU yields between 3x and 7x speed-ups • CPU is now free to execute other code • Even when DirectX 11 capable GPU hardware is not present, C++ AMP will fall back to WARP, which leverages multi-core and SSE2

Key takeaways

Key takeaways • Use modern C++: RAII, r-value references, lambdas, const, Standard C++ Libraries, Boost, other 3rd party libraries, etc. • DirectX for fast and powerful graphics • XAML UI for standard UI elements • C++/CX to talk to Windows, to other components and to other languages (e.g., JS) • Auto-vectorizer and PPL to distribute work on the CPU • C++ AMP to leverage the GPU's massively parallel compute power • C++ Rocks! Go write great apps!!

Resources

Related Sessions • Tue/5:45/B92 Odyssey: Connecting C++ Apps to the Cloud via Casablanca • Wed/11:15/B92 Odyssey: It’s all about performance: Using Visual C++ 2012 to make the best use of your hardware • Wed/1:45/B92 Stinger: DirectX Graphics Development with Visual Studio 2012

Related Sessions • Wed/5:15/B33 Cascade: Diving deep into C++/CX and WinRT • Thu/5:15/B92 Nexus/Normandy: Building a Windows Store app using XAML and C++ - Photo app, the Hilo project • Fri/12:45/B33 McKinley: The Future of C++

Resources • vcblog • Project Austin Part 1 of 6: Introduction • Project Austin on CodePlex • Auto-Vectorizer in Visual Studio 2012 • C++ AMP in a nutshell • Parallel Patterns Library (PPL) • alecont@microsoft.com • Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions

Participate in Design Research Experience development tools and features early in their design and development Influence future design decisions MICROSOFT DEVELOPER DIVISION DESIGN RESEARCH FILL IT ONLINE AT http://bit.ly/x6dtHt ENROLL TODAY!

Appendix

Ink smoothing: the math • The line must be continuous, as must its first and second derivatives • We approximate with the “cardinal” spline solution • With auto-vectorizer, we get a nice 30% speed-up
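The formula behind the loop on the “Ink smoothing: the code” slide can be read off the code directly. The notation below is an assumption made for this write-up: p_1 to p_4 are four consecutive input points, L is the tension factor that appears in the code, and t runs from 0 to 1 across the segment between p_2 and p_3.

```latex
\begin{aligned}
P(t) &= (2t^3 - 3t^2 + 1)\,p_2 + (-2t^3 + 3t^2)\,p_3 \\
     &\quad + (t^3 - 2t^2 + t)\,L\,(p_3 - p_1) + (t^3 - t^2)\,L\,(p_4 - p_2),
     \qquad t \in [0, 1] \\[4pt]
P(0) &= p_2, \qquad P(1) = p_3, \\
P'(0) &= L\,(p_3 - p_1), \qquad P'(1) = L\,(p_4 - p_2)
\end{aligned}
```

The endpoint values show that each segment starts and ends on an input point, and that the tangent at a shared point is the same for the segment arriving at it and the segment leaving it, which gives a continuous first derivative. The pressure channel in the code is a plain linear blend, (1 - t) p_2 + t p_3.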

Page curling: how do we turn the page • Brilliant paper by Hong et al., Turning Pages of 3D Electronic Books • Turning a page of a physical book can be simulated as deforming a page around a cone • Each “page” in Austin is made of a bunch of triangles • In C++, we apply the page turning algorithm to all triangles • The auto-vectorizer comes to the rescue again with a sweet 1.7x speed-up

Page curling: vertex normals and shading • Vertex normals are typically calculated as the normalized average of the surface normals of all triangles containing the vertex • Using this approach, computing the vertex normals on the CPU simply involves iterating over all triangles depicting the page surface and accumulating the triangle normals in the normals of the respective vertices • To me, the above screams “massively parallel”

Page curling: C++ AMP

    // first calculate the triangle normals
    c::array<b::float32, 2> triangleNormals(3, (int)triangleCount());
    c::parallel_for_each(c::extent<1>(triangleCount()), [=, &triangleNormals](c::index<1> idx) restrict(amp)
    {
        b::float32 v1PosX = vertexPositionArray(0, indexArray(2, idx[0])[0]);
        b::float32 v1PosY = vertexPositionArray(1, indexArray(2, idx[0])[0]);
        b::float32 v1PosZ = vertexPositionArray(2, indexArray(2, idx[0])[0]);

        b::float32 v2PosX = vertexPositionArray(0, indexArray(1, idx[0])[0]);
        b::float32 v2PosY = vertexPositionArray(1, indexArray(1, idx[0])[0]);
        b::float32 v2PosZ = vertexPositionArray(2, indexArray(1, idx[0])[0]);

        b::float32 v3PosX = vertexPositionArray(0, indexArray(0, idx[0])[0]);
        b::float32 v3PosY = vertexPositionArray(1, indexArray(0, idx[0])[0]);
        b::float32 v3PosZ = vertexPositionArray(2, indexArray(0, idx[0])[0]);

        b::float32 x1 = v2PosX - v1PosX;
        b::float32 y1 = v2PosY - v1PosY;
        b::float32 z1 = v2PosZ - v1PosZ;

        b::float32 x2 = v3PosX - v1PosX;
        b::float32 y2 = v3PosY - v1PosY;
        b::float32 z2 = v3PosZ - v1PosZ;

        // cross them
        b::float32 x3 = y1 * z2 - z1 * y2;
        b::float32 y3 = z1 * x2 - x1 * z2;
        b::float32 z3 = x1 * y2 - y1 * x2;

        NORMALIZE(x3, y3, z3);

        triangleNormals(0, idx[0]) = x3;
        triangleNormals(1, idx[0]) = y3;
        triangleNormals(2, idx[0]) = z3;
    });
