A Relational Algebra Processor
6.375 Final Project
Ming Liu, Shuotao Xu
Motivation
• Today’s Database Management Systems (DBMS) are software running on a standard operating system on a general-purpose CPU
• DBMSs are frequently used in analytics and scientific computing, but are bottlenecked by:
  • Processor speed, software overhead, latency and bandwidth
• Proposal: an FPGA-based Relational Algebra Processor
[Figure: Host PC (DBMS), FPGA Relational Algebra Processor, Physical Storage]
Background | Relational Algebra (RA)
• Many database queries decompose into five basic RA operators: selection, projection, Cartesian product, union and difference
  • SQL itself is capable of much more
• Design a dedicated processor on the FPGA for each operator
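To make the decomposition concrete, the hypothetical query `SELECT id FROM people WHERE age < 40` is a projection applied to a selection, π_id(σ_age<40(people)). The Python sketch below models the five operators over a toy table of integer entries (in line with the 32-bit-entry assumption). The table, column names and function names are illustrative only, and duplicates are removed only in `union`; deduplication is otherwise a separate step.

```python
# Illustrative model: a table is (column names, list of row tuples).
from typing import Callable, List, Tuple

Row = Tuple[int, ...]
Table = Tuple[List[str], List[Row]]


def select(table: Table, pred: Callable[[dict], bool]) -> Table:
    """Selection (sigma): keep rows whose named values satisfy pred."""
    cols, rows = table
    return cols, [r for r in rows if pred(dict(zip(cols, r)))]


def project(table: Table, keep: List[str]) -> Table:
    """Projection (pi): keep only the listed columns (no deduplication)."""
    cols, rows = table
    idx = [cols.index(c) for c in keep]
    return keep, [tuple(r[i] for i in idx) for r in rows]


def product(a: Table, b: Table) -> Table:
    """Cartesian product: every row of a concatenated with every row of b."""
    return a[0] + b[0], [ra + rb for ra in a[1] for rb in b[1]]


def union(a: Table, b: Table) -> Table:
    """Union of schema-compatible tables, keeping first occurrences only."""
    return a[0], list(dict.fromkeys(a[1] + b[1]))


def difference(a: Table, b: Table) -> Table:
    """Difference: rows of a that do not appear in b."""
    return a[0], [r for r in a[1] if r not in b[1]]


people = (["id", "age"], [(1, 35), (2, 52), (3, 28)])
print(project(select(people, lambda r: r["age"] < 40), ["id"]))
# (['id'], [(1,), (3,)])
```

Composing the operators this way is what lets each one map onto its own hardware unit.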
Project Goal
• Design and implement an in-memory relational algebra processor on the FPGA
• Explore the types of queries that can benefit from FPGA acceleration
• Secondary goal: outperform SQLite!
• Assumptions:
  • 32-bit-wide table entries
  • Tables fit in memory
  • At most 32 columns
  • Read-only
Microarchitecture | Top-Level RA Processor
[Figure: Host PC (C++ functions) connected over PCIe to the RA Processor and its DRAM, shown beside the Host PC (DBMS) / RA Processor / Physical Storage overview]
Microarchitecture | Row Marshaller
• Exposes a simple interface for operators to access tables in DRAM
• Handles address translation, burst aggregation, truncation and alignment
• Multiplexes requests from the operators
• Table values are sent and received as 32-bit bursts
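The slides do not give the table layout or the DRAM access width, so the following Python model is a hypothetical sketch. It assumes row-major tables of 32-bit little-endian entries and a DRAM accessed in aligned 64-byte words (`DRAM_WORD_BYTES` is an invented parameter), and it illustrates only address translation, alignment and truncation; burst aggregation and request multiplexing are left out.

```python
# Hypothetical model of Row Marshaller bookkeeping. Assumptions (not from the
# slides): row-major tables of 32-bit little-endian entries, and a DRAM that
# is accessed in aligned words of DRAM_WORD_BYTES bytes.
from typing import List, Tuple

ENTRY_BYTES = 4       # 32-bit table entries (a stated project assumption)
DRAM_WORD_BYTES = 64  # assumed DRAM access width; illustrative only


def row_span(base: int, num_cols: int, row: int) -> Tuple[int, int]:
    """Address translation: byte range [start, end) holding one row."""
    start = base + row * num_cols * ENTRY_BYTES
    return start, start + num_cols * ENTRY_BYTES


def dram_words(start: int, end: int) -> List[int]:
    """Alignment: addresses of the aligned DRAM words covering [start, end)."""
    first = start - start % DRAM_WORD_BYTES
    return list(range(first, end, DRAM_WORD_BYTES))


def extract_entries(start: int, end: int, words: List[bytes]) -> List[int]:
    """Truncation: drop bytes outside the row and split it into 32-bit values."""
    first = start - start % DRAM_WORD_BYTES
    row_bytes = b"".join(words)[start - first:end - first]
    return [int.from_bytes(row_bytes[i:i + ENTRY_BYTES], "little")
            for i in range(0, len(row_bytes), ENTRY_BYTES)]


# Row 3 of a 5-column table at 0x1000 straddles two 64-byte DRAM words.
start, end = row_span(0x1000, 5, 3)
print(hex(start), hex(end), [hex(a) for a in dram_words(start, end)])
# 0x103c 0x1050 ['0x1000', '0x1040']
```

The example shows why truncation and alignment are needed at all: a row rarely starts or ends on a DRAM word boundary.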
Microarchitecture | Selection
• Filters rows based on predicates (e.g. age < 40)
• 16 predicate evaluators, each internally a comparator
• A tree of gates combines the predicate results to qualify each row
• Maximum: an OR of 4 terms, each an AND of 4 predicates
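The "4 ORs of 4 ANDs" limit describes a sum-of-products (disjunctive normal form) filter over at most 16 comparator outputs. Below is a hedged Python model of that shape; the predicate encoding (column index, comparison, constant) and the function names are assumptions for illustration, not the hardware's actual command format.

```python
import operator
from typing import List, Sequence, Tuple

# Hypothetical predicate encoding: (column index, comparison, 32-bit constant).
Predicate = Tuple[int, str, int]
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

MAX_TERMS = 4     # at most 4 OR-ed terms ...
MAX_PER_TERM = 4  # ... each an AND of at most 4 predicates (16 comparators)


def check_filter(terms: List[List[Predicate]]) -> None:
    """Reject filters that do not fit the 4-ORs-of-4-ANDs gate tree."""
    if len(terms) > MAX_TERMS or any(len(t) > MAX_PER_TERM for t in terms):
        raise ValueError("filter exceeds 4 ORs of 4 ANDs")


def qualifies(row: Sequence[int], terms: List[List[Predicate]]) -> bool:
    """OR over terms of (AND over that term's comparator results)."""
    # Hardware would evaluate all comparators in parallel; software simply
    # computes each comparison and reduces the results the same way.
    return any(all(OPS[op](row[col], const) for col, op, const in term)
               for term in terms)


# (age < 40 AND salary > 50000) OR id == 7, with columns (id, age, salary)
terms = [[(1, "<", 40), (2, ">", 50000)], [(0, "==", 7)]]
check_filter(terms)
rows = [(1, 35, 60000), (2, 45, 90000), (7, 60, 1000), (4, 30, 20000)]
print([r[0] for r in rows if qualifies(r, terms)])  # [1, 7]
```

Any filter that can be rewritten into this form with at most 16 predicates fits the evaluator array.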
Microarchitecture | Projection
• Selects columns of a table
• Column mask is one-hot encoded per column (one bit per column)
• No need to buffer the row; operates directly on data bursts
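Because each column has its own mask bit, projection can decide word by word whether to forward or drop an entry. The sketch below reads the mask as "bit i selects column i", which is our interpretation of the slide; `project_stream` is a hypothetical name and the input is modeled as a flat stream of 32-bit entries, row after row.

```python
from typing import Iterable, Iterator

MAX_COLS = 32  # from the project assumptions: at most 32 columns


def project_stream(bursts: Iterable[int], num_cols: int, mask: int) -> Iterator[int]:
    """Forward only the 32-bit words whose column bit is set in `mask`.

    No row is buffered: each word is either forwarded or dropped as it
    arrives, and a column counter wraps back to 0 at each row boundary.
    """
    if not 0 < num_cols <= MAX_COLS:
        raise ValueError("num_cols must be in 1..32")
    col = 0
    for word in bursts:
        if (mask >> col) & 1:
            yield word
        col = (col + 1) % num_cols  # next column, wrapping at end of row


table_words = [10, 20, 30, 11, 21, 31]  # two rows of three columns
print(list(project_stream(table_words, 3, 0b101)))  # [10, 30, 11, 31]
```

The only state is the column counter, which matches the slide's point that projection never needs to hold a whole row.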
Microarchitecture | Binary Operators
• Cartesian Product, Union, Difference and Deduplication
• Nested-loop implementation
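A nested-loop operator compares every row of one input against every row of the other, costing O(n·m) comparisons but needing no hash table or sort buffer. The Python sketch below illustrates that technique for difference and deduplication; the function names are hypothetical and it is not the project's hardware description.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def nl_difference(a: Sequence[Row], b: Sequence[Row]) -> List[Row]:
    """Rows of a with no equal row in b; b is rescanned for every row of a."""
    out = []
    for ra in a:              # outer loop: stream table a once
        found = False
        for rb in b:          # inner loop: re-read all of table b
            if ra == rb:
                found = True
                break
        if not found:
            out.append(ra)
    return out


def nl_dedup(a: Sequence[Row]) -> List[Row]:
    """Keep a row only if no earlier row of a is equal to it."""
    out = []
    for i, ra in enumerate(a):
        if all(a[j] != ra for j in range(i)):  # compare with every earlier row
            out.append(ra)
    return out


print(nl_difference([(1,), (2,), (3,)], [(2,)]))  # [(1,), (3,)]
print(nl_dedup([(1, 2), (3, 4), (1, 2)]))         # [(1, 2), (3, 4)]
```

On an FPGA the inner loop translates into repeated reads of the second table from DRAM, which is one reason these operators are sensitive to memory bandwidth.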
Microarchitecture | Inter-operator Bypassing
• Operators are enabled concurrently; data is passed directly between operators
• No intermediate storage
• Conditions:
  • A singly linked chain of unary operators
  • Each operator has a single target output
  • No structural hazards
• Software reorders and schedules the RA commands
• Data source/destination is encoded in each command
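The bypass conditions can be phrased as a check that software runs before chaining commands. This Python sketch is hypothetical: the `Command` fields, the operator names, and the reading of "structural hazard" as two commands needing the same operator unit are our assumptions.

```python
from dataclasses import dataclass
from typing import List

UNARY_OPS = {"select", "project"}  # hypothetical unary operator units


@dataclass
class Command:
    op: str   # operator unit that executes this command, e.g. "select"
    src: str  # data source: a DRAM table or another operator's output
    dst: str  # single data destination: a DRAM table or another operator


def can_bypass(chain: List[Command]) -> bool:
    """True if the commands can run concurrently as one bypassed chain."""
    # A single `dst` field per command already enforces one target output.
    if any(c.op not in UNARY_OPS for c in chain):
        return False  # only unary operators may be chained
    if len({c.op for c in chain}) != len(chain):
        return False  # structural hazard: one operator unit used twice
    # Singly linked: each command feeds exactly the next one in the chain.
    return all(prev.dst == nxt.op and nxt.src == prev.op
               for prev, nxt in zip(chain, chain[1:]))


chain = [Command("select", "table:people", "project"),
         Command("project", "select", "table:out")]
print(can_bypass(chain))  # True
```

A scheduler could reorder commands until such chains appear and fall back to writing intermediate tables to DRAM otherwise.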
Microarchitecture | Inter-operator Bypassing
• Multiple 32-bit-wide output FIFOs feeding other operators
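Python generators are a convenient software analogy for FIFO-connected stages: each stage pulls 32-bit words from its upstream as they are produced, so no intermediate table is materialized. This is a sketch under our own assumptions (for instance, that selection holds one row while its predicates are evaluated) and does not model FIFO depth or timing; all function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def scan(rows: Iterable[List[int]]) -> Iterator[int]:
    """Source stage: emit a table as a flat stream of 32-bit words."""
    for row in rows:
        yield from row


def select_stage(words: Iterable[int], num_cols: int,
                 pred: Callable[[List[int]], bool]) -> Iterator[int]:
    """Collect one row (assumed), then forward its words if it qualifies."""
    row: List[int] = []
    for w in words:
        row.append(w)
        if len(row) == num_cols:
            if pred(row):
                yield from row
            row = []


def project_stage(words: Iterable[int], num_cols: int, mask: int) -> Iterator[int]:
    """Forward word i of each row only if bit i of mask is set."""
    for i, w in enumerate(words):
        if (mask >> (i % num_cols)) & 1:
            yield w


table = [[1, 35, 60000], [2, 52, 90000], [3, 28, 40000]]  # (id, age, salary)
# pi_{id,salary}(sigma_{age<40}(table)): stages are chained directly, so
# qualifying words flow from selection into projection without a temp table.
pipeline = project_stage(select_stage(scan(table), 3, lambda r: r[1] < 40), 3, 0b101)
print(list(pipeline))  # [1, 60000, 3, 40000]
```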
Implementation Evaluation
• Timing
  • Maximum frequency: 55.786 MHz
  • Critical path: Row Marshaller mux
• Area
  • Slice registers: 50%
  • LUTs: 85%
  • BRAM/FIFOs: 47%
Performance Benchmark | Setup
• SQLite
  • SQLite's internal timer reports the execution time of each query
  • ThinkPad T430, Core i7-3520M @ 2.90 GHz, 1x8GB DDR3-1600
• RA Processor
  • Performance counters: cycles from an operator's start to its acknowledgment
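Since the RA Processor reports cycle counts while SQLite reports elapsed time, the counts have to be converted at the design's clock frequency before the two can be compared. A minimal sketch, assuming the operators run at the reported maximum of 55.786 MHz (the clock actually used in the benchmarks is not stated); the one-million-cycle input is illustrative, not a measured result.

```python
F_CLK_HZ = 55.786e6  # assumed clock: the reported maximum frequency


def cycles_to_ms(cycles: int, f_clk_hz: float = F_CLK_HZ) -> float:
    """Convert a start-to-ack cycle count into milliseconds."""
    return cycles / f_clk_hz * 1e3


print(f"{cycles_to_ms(1_000_000):.2f} ms")  # 17.93 ms for an illustrative count
```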
Performance Benchmark | Results
• Limitation: memory bandwidth, 200 MB/s (RA Processor) vs 12.8 GB/s (SQLite host)
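The size of this gap can be worked out directly. Assuming the laptop's single DDR3-1600 module sits on one standard 64-bit (8-byte) channel, its theoretical peak bandwidth and its ratio to the RA Processor's 200 MB/s are:

$$
1600 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}} \times 8\ \tfrac{\text{B}}{\text{transfer}} = 12.8\ \text{GB/s},
\qquad
\frac{12\,800\ \text{MB/s}}{200\ \text{MB/s}} = 64
$$

So the software baseline has 64 times the memory bandwidth available to the RA Processor.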
Performance Benchmark | Results
• The Select operator is the most competitive with SQLite
• What happens with more predicates?
Improvements
• Increase the data burst width
  • 32-bit to 256-bit: potential 8x speedup
  • Cost: larger area and a longer critical path
• Maximize memory bandwidth
  • Additional row buffers to buffer data from the DDR2 memory
  • Larger, faster DRAM; higher clock speed
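The 8x figure follows from the width ratio 256/32, provided one burst still moves per cycle. The sketch below computes the resulting peak streaming bandwidth, assuming the clock stays at the current 55.786 MHz; the slide itself notes the critical path would grow, so this is an upper bound, and `stream_bandwidth_mb_s` is a hypothetical helper.

```python
F_CLK_HZ = 55.786e6  # reported maximum frequency of the current design


def stream_bandwidth_mb_s(width_bits: int, f_clk_hz: float = F_CLK_HZ) -> float:
    """Peak MB/s of a datapath that moves one width_bits burst per cycle."""
    return width_bits / 8 * f_clk_hz / 1e6


for width in (32, 64, 128, 256):
    print(f"{width:>3}-bit: {stream_bandwidth_mb_s(width):7.1f} MB/s")
# 32-bit is about 223.1 MB/s; 256-bit is about 1785.2 MB/s
```

Even at 256 bits this is well below the host's 12.8 GB/s, which is why the slide also lists faster DRAM and a higher clock speed.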
Conclusion & Future Work
• Complex filtering operations perform well on the FPGA
  • Better than SQLite with sufficient memory bandwidth
• Data-intensive operators do not perform well
• Future opportunities:
  • An accelerator alongside SQLite
  • Integration with an HDD/SSD controller