SQL vs GPU

SQL vs GPU Richard Wilton Johns Hopkins University Department of Physics and Astronomy

Rationale • Terabyte-scale data is best managed in a DBMS • Some classes of computation are best implemented using a GPU

How to do GPU computation in a terabyte-scale database • Do the computation “outside” the database (export the data / compute / re-import the data) • Do the computation “inside” the database • Something in between

SQLCLR out-of-process server The basic concept • Implement computational functionality in a separate process from the SQL Server process • Access that functionality using interprocess communication (IPC) SQL Server Out-of-process server IPC SQL code Special-case functionality SQLCLR procedure or function Why? • Avoid memory, threading, and permissions restrictions on SQLCLR implementations • Load dynamic-link libraries • Invoke native-code methods • Exploit lower-level APIs (e.g. SqlClient, bulk insert) to facilitate data movement between SQL Server and non-SQL computational resources

SQLCLR out-of-process server  implementation SQL Server Out-of-process server IPC SQL code CUDA functionality SQLCLR procedure SQL declare @sql nvarchar(max) set @sql = N'exec SqA1.dbo.SWGPerf @qOffset=311, @chrNum=7, @dimTile=128' exec dbo.ExecSWG @sqlCmd=@sql, @targetTable='##tmpX08' SQLCLR implementation (in C#) • Allocates IPC buffer • Initializes IPC buffer with the result set from a specified SQL query • Signals out-of-process server • Waits for response from out-of-process server • Returns data from the IPC buffer as a SQL result set Out-of-process server implementation (in C#) • Loads the CUDA implementation (dynamic link library) • Invokes a method in the DLL with a reference to the IPC buffer • Waits for completion • Signals the SQLCLR implementation • Stand-alone application or Windows service CUDA implementation (in C++) • Implements GPU functionality • Uses the IPC buffer for both incoming and outgoing data

A test case: genomic sequence data • Nucleic acid sequence data • Data is processed in a “loose” workflow by a sequence of software tools • Example: • Align short sequences to a long reference sequence • Identify alignment regions of interest • Map regions of interest to annotation data (genome map, known variations, etc.)

A test case: genomic sequence data • Typical data-management scenario: • All data is maintained in file-system files • Processing software reads and writes files • “Database-like” operations (e.g. joins) are performed using command-line tools (e.g. Perl scripts) • Can we use a DBMS as a repository? • All data is maintained in the DBMS • Processing software reads and writes tables • “Database-like” operations (e.g. joins) are performed using SQL

A test case: genomic sequence data • Nucleic acid sequence data • DBMS: data repository • GPU: implementation of sequence alignment algorithms • Gapped alignment: Smith Waterman • Non-gapped alignment: bit vector mapping through a hash table

A test case: genomic sequence data • Results: so far, so good … • Sequence data can be managed using a DBMS • Use low-level (C#, C++) implementations to manipulate string and binary data • Sequence data from the DBMS can be processed using GPU implementations • GPU sequence alignment implementations can exploit task parallelism and execute 10x (or more) faster than equivalent CPU implementations • But … • DBMS data abstractions (tables, columns) must be mapped to and from native GPU data formats and layouts • Data movement out of and into the DBMS is slow

Going forward… • Compromise: maintain GPU-formatted data as BLOBs in the file system • BLOBs are in database “native” format • BLOBs correspond to database tables or table partitions • Much programming and testing remain to be done…

SQL vs GPU

SQL vs GPU

Presentation Transcript

GPU

Accelerating SQL Database Operations on a GPU with CUDA

GPU

Amazon RDS vs. SQL Azure

Why GPU?

SQL Trace vs Extended Events

GPU Computing

SQL Server 2012 vs. DataKeeper

A Discussion of CPU vs. GPU

GPU Brainstorming

GPU

Oracle vs. SQL Server

Cloud SQL Google vs EC2

GPU vs. CPU

FPGA vs. GPU for Sparse Matrix Vector Multiply

Oracle vs. SQL Server

GPU

GPU Manufacturers -JETALL GPU

GPU MANUFACTURERS WITH JETALL GPU

Azure SQL Database vs Azure SQL Data Warehouse

GPU