Scaling Charts with Design and GPUs

Superconductor: Scaling Charts with Design and GPUs
Leo Meyerovich (@LMeyerov), CEO of Graphistry.com | UC Berkeley

Visibility

Visibility through design + speed

[Figure: Histogram of voter turnout by town. x-axis: voter turnout, 0% to 100%; y-axis: number of towns. Annotations: "Most towns had ~40% of people vote" and "Ballot box stuffing?"]

Each tiny square shows a town's size (area) and vote (color: incumbent or opposition).

Filter for towns with high turnout

Tag suspicious towns in black
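The turnout view reduces to two data operations: binning towns into a histogram, and filtering the high-turnout tail so it can be tagged. Below is a minimal JavaScript sketch of both. It assumes hypothetical town records with `registered` and `ballots` fields, and a hypothetical 90% cutoff for "high turnout". The sample numbers are made up for illustration and are not the election data from the demo.

```javascript
// Hypothetical town records (illustrative only): registered voters and ballots cast.
const towns = [
  { name: "A", registered: 1000, ballots: 410 },
  { name: "B", registered: 500, ballots: 195 },
  { name: "C", registered: 800, ballots: 792 },
  { name: "D", registered: 300, ballots: 123 },
];

function turnout(t) {
  return t.ballots / t.registered;
}

// Count towns per turnout bin: bin 0 = [0%, 10%), ..., bin 9 = [90%, 100%].
function histogram(towns, binCount = 10) {
  const bins = new Array(binCount).fill(0);
  for (const t of towns) {
    // Clamp so a 100% turnout lands in the last bin instead of overflowing.
    const b = Math.min(binCount - 1, Math.floor(turnout(t) * binCount));
    bins[b]++;
  }
  return bins;
}

// Filter step: names of towns at or above the (hypothetical) cutoff,
// i.e. the ones a viewer would tag in black.
function highTurnout(towns, threshold = 0.9) {
  return towns.filter((t) => turnout(t) >= threshold).map((t) => t.name);
}

console.log(histogram(towns)); // [ 0, 0, 0, 1, 2, 0, 0, 0, 0, 1 ]
console.log(highTurnout(towns)); // [ 'C' ]
```

In an interactive tool, both functions would rerun on every brush or filter change. That is why the deck pairs this design with speed.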

For visibility: speed + design

Problem: Plot 10+ Time Series Signals

Design: 200 Time Series Signals
[Figure: 200 time series signals plotted at once; time axis in seconds.]

Speed: Pan/Zoom Interactions
[Figure: panning and zooming over the same signals; time axis in seconds.]

CPU Bottlenecks: naïve and offline
[Figure: timeline of the Parse, Transform, Layout, and Render stages on a 0 to 1600 ms scale; real-time is 30 ms.]

Optimize: Binary Data, Multicore Layout, GPU Render
• Real-time interaction
• Stream from server at 12 MB+/s
[Figure: timeline of the Prep, Layout, and Render stages on the same 0 to 1600 ms scale.]
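"Binary data" attacks the Parse stage. Instead of sending text that the client must parse value by value, the server sends raw bytes that the client can view directly as a typed array, ready for GPU upload. Below is a minimal sketch under an assumed, hypothetical wire format: a 4-byte little-endian point count followed by interleaved float32 x,y pairs. It is not Graphistry's actual protocol.

```javascript
// Hypothetical wire format: [uint32 count, little-endian][float32 x0, y0, x1, y1, ...]
function encodePoints(points) {
  const buf = new ArrayBuffer(4 + points.length * 8);
  new DataView(buf).setUint32(0, points.length, true);
  // Float32Array uses the platform byte order (little-endian on mainstream hardware).
  const coords = new Float32Array(buf, 4, points.length * 2);
  points.forEach(([px, py], i) => {
    coords[2 * i] = px;
    coords[2 * i + 1] = py;
  });
  return buf;
}

function decodePoints(buf) {
  const count = new DataView(buf).getUint32(0, true);
  // No per-value parsing: this is just a view over the received bytes.
  return new Float32Array(buf, 4, count * 2);
}

const wire = encodePoints([[1.5, 2], [3, -4.25]]);
const xy = decodePoints(wire);
console.log(wire.byteLength, Array.from(xy)); // 20 [ 1.5, 2, 3, -4.25 ]
```

Decoding costs a constant amount of work no matter how many points arrive. That property is what lets a client keep up with a multi-megabyte-per-second stream.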

Graphs: Placing Nodes and Edges

Direct Feedback on Settings

Uber: Trip Start to End

Direct Edge Placement: Overplotting

Speed + Design: Edge Bundling

The Web

Bare Metal in the Browser
• Sequential: 5X
• SIMD: 4 lanes
• Multicore: 4+ cores
• GPU: 1024 lanes

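Superconductor: Parallel JS Viz Engine
[Figure: architecture diagram. A webpage (HTML, CSS styling, JS script) runs through the browser's Parser, Selectors, Layout, Renderer, and JavaScript VM. A data viz (data, data styling, widgets) runs through SUPERCONDUCTOR.js, whose Compiler produces Parser.js, Selectors.CL, Layout.CL, and Renderer.GL, targeting the JavaScript VM and the GPU to produce pixels.]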

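Layout as Parallel Tree Traversals
[Figure: a tree whose leaves hold sizes (w,h). Sizes flow up to the root through logical joins; positions (x,y) then flow down through logical spawns.]
Parallelism in each traversal!
1. Works for all data sets
2. Compiler: CSS to traversal schedule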
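A minimal sketch of the two-traversal idea follows. It reads the slide's "logical joins" as a bottom-up pass that computes sizes and its "logical spawns" as a top-down pass that computes positions. The layout rule is a hypothetical vertical stack: a parent is as wide as its widest child and as tall as its children stacked. This is not Superconductor's compiler output. The plain recursion stands in for the parallel traversals, in which independent subtrees could be processed concurrently.

```javascript
// Hypothetical node shape: intrinsic w,h for leaves; x,y filled in by layout.
function node(w, h, children = []) {
  return { w, h, x: 0, y: 0, children };
}

// Traversal 1, bottom-up: a parent's size depends only on its children's sizes.
// Sibling subtrees are independent, so they could be visited in parallel.
function computeSizes(n) {
  if (n.children.length === 0) return; // leaves keep their intrinsic w,h
  n.children.forEach(computeSizes);
  n.w = Math.max(...n.children.map((c) => c.w));
  n.h = n.children.reduce((sum, c) => sum + c.h, 0);
}

// Traversal 2, top-down: a child's position depends on its parent's position
// and on the heights of earlier siblings (already known after traversal 1).
function computePositions(n) {
  let cursorY = n.y;
  for (const c of n.children) {
    c.x = n.x;
    c.y = cursorY;
    cursorY += c.h;
    computePositions(c);
  }
}

const root = node(0, 0, [
  node(10, 5),
  node(0, 0, [node(4, 2), node(6, 3)]),
  node(8, 1),
]);
computeSizes(root);
computePositions(root);
console.log(root.w, root.h, root.children.map((c) => c.y)); // 10 11 [ 0, 5, 10 ]
```

Because every attribute depends only on values from a previous pass or a parent, the two passes can be scheduled without locks. The slide credits the compiler with deriving such a schedule from the layout specification.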

GPU Traversals: Flat & Level-Synchronous
• Flat: nodes stored in arrays, one array per attribute (w, h, x, y)
• Level-synchronous: one parallel for loop per tree level, from level 1 to level n
• The compiler handles the transform of both code and data
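The flat, level-synchronous form can be sketched in JavaScript as follows. The tree is flattened breadth-first into one typed array per attribute. BFS order keeps each node's children contiguous and groups nodes by level. Each pass then runs one loop per level. The function names and the vertical-stack layout rule are hypothetical. On a GPU, each inner `for (const i of levels[L])` loop would be a kernel launch, for example via WebCL. Here it runs sequentially.

```javascript
// Flatten a nested tree ({ w, h, children }) into structure-of-arrays form.
function flatten(root) {
  const nodes = [root], depth = [0], firstChild = [], childCount = [];
  for (let i = 0; i < nodes.length; i++) {
    const kids = nodes[i].children || [];
    firstChild.push(nodes.length); // children are appended contiguously
    childCount.push(kids.length);
    for (const c of kids) {
      nodes.push(c);
      depth.push(depth[i] + 1);
    }
  }
  const n = nodes.length;
  const w = new Float32Array(n), h = new Float32Array(n);
  const x = new Float32Array(n), y = new Float32Array(n);
  nodes.forEach((node, i) => { w[i] = node.w || 0; h[i] = node.h || 0; });
  const levels = [];
  depth.forEach((d, i) => {
    if (!levels[d]) levels[d] = [];
    levels[d].push(i);
  });
  return { w, h, x, y, firstChild, childCount, levels };
}

function layoutLevelSynchronous(t) {
  const { w, h, x, y, firstChild, childCount, levels } = t;
  // Bottom-up, deepest level first. Iteration i writes only w[i] and h[i],
  // so the inner loop is a race-free "parallel for".
  for (let L = levels.length - 1; L >= 0; L--) {
    for (const i of levels[L]) {
      if (childCount[i] === 0) continue; // leaf keeps its intrinsic size
      let maxW = 0, sumH = 0;
      for (let c = firstChild[i]; c < firstChild[i] + childCount[i]; c++) {
        maxW = Math.max(maxW, w[c]);
        sumH += h[c];
      }
      w[i] = maxW;
      h[i] = sumH;
    }
  }
  // Top-down, root level first. Iteration i writes only its own children's x/y.
  for (let L = 0; L < levels.length; L++) {
    for (const i of levels[L]) {
      let cursorY = y[i];
      for (let c = firstChild[i]; c < firstChild[i] + childCount[i]; c++) {
        x[c] = x[i];
        y[c] = cursorY;
        cursorY += h[c];
      }
    }
  }
  return t;
}

const t = layoutLevelSynchronous(flatten({
  children: [{ w: 10, h: 5 }, { children: [{ w: 4, h: 2 }, { w: 6, h: 3 }] }, { w: 8, h: 1 }],
}));
console.log(t.w[0], t.h[0], Array.from(t.y)); // 10 11 [ 0, 0, 5, 10, 5, 7 ]
```

The data transform is what makes this GPU-friendly. Pointer-chasing over objects becomes indexed reads from flat arrays, and each level exposes all of its nodes as independent work items.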

More Scalable Designs
• imMens (Stanford)
• Nanocubes (AT&T)
• MapD (MIT)
• Abstract Rendering (Continuum)
• Synerscope

Achieve data visibility through hardware-accelerated designs (and deploy on the web)

Graphistry: Visualize Magnitudes More Data in the Browser
Leo Meyerovich (@LMeyerov), CEO of Graphistry.com | UC Berkeley

Today’s Supercomputer-in-a-Pocket Phone
• 16-lane CPU: 4 cores, each with 4-way SIMD and a 32 KB L1d cache; 1 MB L2
• 1024-lane GPU: 4 GPGPU cores, each 256-way SIMT
• Prefetch engine; 2 GB RAM
Challenge: Parallelize Data Visualization

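Problem: Dynamic Memory Allocation on GPU?
Each drawing call sizes its own buffer from its data:

    function circ(x, y, r) {
      const buffer = new Array(r * 10);
      for (let i = 0; i < r * 10; i++) buffer[i] = Math.cos(i);
      return buffer;
    }

circ(…); oval(…); rect(…); … line(…); … square(…); rect(…); …
[Figure: each call fills its own dynamically allocated buffer of values.]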

Dynamic Allocation as SIMD Traversals
Each shape splits into an allocation step and a fill step: allocCirc(…) requests 4 slots, allocRect(…) 7, allocRect(…) 6, and allocLine(…) 6; fillCirc(…), fillRect(…), and fillLine(…) then write into them.
1. Prefix sum for needed space
2. Allocate buffers
3. Distribute offsets & compute
4. Give OpenGL buffer pointer
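The four steps can be sketched in JavaScript using the slot counts from the slide (4, 7, 6, 6). A sequential exclusive prefix sum stands in for the parallel scan a GPU would run. The shape objects and their `fill` callbacks are hypothetical placeholders for fillCirc/fillRect/fillLine. Step 4 is only indicated in a comment, since it depends on the WebGL context.

```javascript
// Step 1: exclusive prefix sum turns per-shape sizes into start offsets.
// (Sequential stand-in for a parallel scan.)
function exclusivePrefixSum(sizes) {
  const offsets = [];
  let running = 0;
  for (const s of sizes) {
    offsets.push(running);
    running += s;
  }
  return { offsets, total: running };
}

function allocateAndFill(shapes) {
  const sizes = shapes.map((s) => s.size); // the "alloc" pass: report needs
  const { offsets, total } = exclusivePrefixSum(sizes);
  // Step 2: a single allocation for all shapes.
  const buffer = new Float32Array(total);
  // Step 3: each shape fills only its own slice. Slices never overlap,
  // so on a GPU these fills could run in parallel.
  shapes.forEach((shape, i) => {
    shape.fill(buffer.subarray(offsets[i], offsets[i] + sizes[i]));
  });
  // Step 4 (not shown): hand `buffer` to the renderer, e.g. a WebGL bufferData call.
  return { buffer, offsets };
}

// Hypothetical shapes; each fill just writes a marker value.
const shapes = [
  { kind: "circ", size: 4, fill: (out) => out.fill(1) },
  { kind: "rect", size: 7, fill: (out) => out.fill(2) },
  { kind: "rect", size: 6, fill: (out) => out.fill(3) },
  { kind: "line", size: 6, fill: (out) => out.fill(4) },
];
const { buffer, offsets } = allocateAndFill(shapes);
console.log(offsets, buffer.length); // [ 0, 4, 11, 17 ] 23
```

The prefix sum replaces many small, unpredictable allocations with one allocation of known size. That one step is what removes the need for a GPU-side `new`.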

CPU vs. GPU for Election Treemap: 5 traversals over 100K nodes
• WebCL: 70X
• WebCL: 30X
• Combined: 54X!
