A Relational Algebra Processor

6.375 Final Project, Ming Liu and Shuotao Xu


  1. A Relational Algebra Processor
     6.375 Final Project
     Ming Liu, Shuotao Xu

  2. Motivation
     • Today’s Database Management Systems (DBMS): software running on a standard operating system on a general-purpose CPU
     • DBMS are frequently used in analytics and scientific computing, but are bottlenecked by:
       • Processor speed, software overhead, latency and bandwidth
     • Proposal: an FPGA-based Relational Algebra Processor
     [Figure: system diagram with Host PC (DBMS), FPGA Relational Algebra Processor, and Physical Storage]

  3. Background | Relational Algebra (RA)
     • Many database queries are fundamentally decomposable into five basic RA operators: selection, projection, Cartesian product, union, and set difference
       • Although SQL is capable of much more
     • Design dedicated processors on the FPGA for each operator
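
As a small illustration (the table `Employees` and its columns `name` and `age` are hypothetical), the SQL query `SELECT name FROM Employees WHERE age < 40` reduces to a selection followed by a projection, two of the basic RA operators:

$$
\pi_{\text{name}}\bigl(\sigma_{\text{age} < 40}(\text{Employees})\bigr)
$$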

  4. Project Goal
     • Design and implement an in-memory relational algebra processor on the FPGA
     • Explore the types of queries that can benefit from FPGA acceleration
     • Secondary: outperform SQLite!
     • Some assumptions:
       • 32-bit-wide table entries
       • Tables fit in memory
       • At most 32 columns per table
       • Read-only access
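
The assumptions above can be captured in a small host-side table type. This is a minimal sketch, not the project's actual data structure: the class name `Table`, the row-major word layout, and the `at` accessor are all hypothetical. It enforces the 32-column limit (which lets a per-column bit mask fit in one `uint32_t`) and offers read access only.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory table: every entry is a 32-bit word, stored row-major.
class Table {
public:
    static constexpr uint32_t kMaxColumns = 32;  // one bit per column fits in a uint32_t mask

    Table(uint32_t numCols, std::vector<uint32_t> words)
        : numCols_(numCols), words_(std::move(words)) {
        if (numCols_ == 0 || numCols_ > kMaxColumns)
            throw std::invalid_argument("column count must be in [1, 32]");
        if (words_.size() % numCols_ != 0)
            throw std::invalid_argument("word count is not a multiple of the column count");
    }

    uint32_t columns() const { return numCols_; }
    std::size_t rows() const { return words_.size() / numCols_; }

    // Read-only access: the class has no mutating member functions.
    uint32_t at(std::size_t row, uint32_t col) const {
        if (col >= numCols_) throw std::out_of_range("column index out of range");
        return words_.at(row * numCols_ + col);
    }

private:
    uint32_t numCols_;
    std::vector<uint32_t> words_;
};
```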

  5. Microarchitecture
     [Figure: block diagram with Host Software and FPGA components]

  6. Microarchitecture | Top-Level RA Processor
     [Figure: top-level diagram with Host PC (DBMS, C++ functions), PCIe, RA Processor, DRAM, and Physical Storage]

  7. Microarchitecture | Row Marshaller
     • Exposes a simple interface for operators to access tables in DRAM
     • Handles address translation, burst aggregation, truncation and alignment
     • Multiplexes memory requests from the operators
     • Table values are sent and received as 32-bit bursts
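
A software model can make the Row Marshaller's address translation and burst aggregation concrete. This sketch assumes a row-major layout starting at a base byte address and a maximum burst length `maxBurstWords`; `BurstRequest`, `entryAddress`, and `aggregateRows` are hypothetical names, and the slide does not describe the real hardware's layout or burst limits.

```cpp
#include <cstdint>
#include <vector>

// One DRAM request: a start address and a count of consecutive 32-bit words.
struct BurstRequest {
    uint64_t address;  // byte address in DRAM
    uint32_t words;    // number of consecutive 32-bit words
};

// Address translation under an ASSUMED row-major layout:
// entry (row, col) lives at base + (row * numCols + col) * 4.
uint64_t entryAddress(uint64_t base, uint32_t numCols, uint64_t row, uint32_t col) {
    return base + (row * numCols + col) * sizeof(uint32_t);
}

// Burst aggregation: rows [firstRow, firstRow + rowCount) are contiguous in memory,
// so they are covered by as few requests as possible, each at most maxBurstWords long.
std::vector<BurstRequest> aggregateRows(uint64_t base, uint32_t numCols, uint64_t firstRow,
                                        uint64_t rowCount, uint32_t maxBurstWords) {
    std::vector<BurstRequest> bursts;
    if (maxBurstWords == 0) return bursts;  // avoid an infinite loop on a bad limit
    uint64_t addr = entryAddress(base, numCols, firstRow, 0);
    uint64_t remaining = rowCount * numCols;
    while (remaining > 0) {
        uint32_t n = remaining < maxBurstWords ? static_cast<uint32_t>(remaining) : maxBurstWords;
        bursts.push_back({addr, n});
        addr += uint64_t{n} * sizeof(uint32_t);
        remaining -= n;
    }
    return bursts;
}
```

For example, reading two rows of a 3-column table with a 4-word limit yields a 4-word burst at byte 0 and a 2-word burst at byte 16.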

  8. Microarchitecture | Selection
     • Filters rows based on predicates (e.g. age < 40)
     • 16 predicate evaluators, internally implemented as comparators
     • A tree of gates combines the predicate results to qualify each row
       • Maximum: an OR of 4 terms, each an AND of 4 predicates (4 × 4 = 16)
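
The selection logic is a sum of products: up to four OR terms, each an AND of up to four comparisons. The sketch below models it in software; the operator set, unsigned comparison, and the `enabled` flag for unused evaluator slots are assumptions rather than details from the slides, and all names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <vector>

enum class Op { LT, LE, EQ, NE, GE, GT };

// One predicate evaluator: compares a column value against a constant.
struct Predicate {
    bool enabled = false;  // disabled slots do not participate
    uint32_t column = 0;
    Op op = Op::EQ;
    uint32_t constant = 0;
};

// 4 OR terms, each an AND of up to 4 predicates: 16 evaluators in total.
using PredicateTree = std::array<std::array<Predicate, 4>, 4>;

// A single comparator (unsigned comparison is an assumption).
bool evaluate(uint32_t value, Op op, uint32_t constant) {
    switch (op) {
        case Op::LT: return value < constant;
        case Op::LE: return value <= constant;
        case Op::EQ: return value == constant;
        case Op::NE: return value != constant;
        case Op::GE: return value >= constant;
        case Op::GT: return value > constant;
    }
    return false;
}

// A row qualifies if some OR term has at least one enabled predicate
// and all of that term's enabled predicates hold.
bool qualifies(const std::vector<uint32_t>& row, const PredicateTree& tree) {
    for (const auto& term : tree) {
        bool anyEnabled = false;
        bool allTrue = true;
        for (const Predicate& p : term) {
            if (!p.enabled) continue;
            anyEnabled = true;
            allTrue = allTrue && evaluate(row.at(p.column), p.op, p.constant);
        }
        if (anyEnabled && allTrue) return true;
    }
    return false;
}
```

In hardware, all 16 comparators evaluate in parallel and the gate tree reduces their outputs in one pass; the loops here only reproduce the same Boolean function.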

  9. Microarchitecture | Projection
     • Selects columns of a table
     • Column mask is one-hot encoded (one bit per column)
     • Does not need to buffer rows; operates directly on data bursts
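
Because the mask has one bit per column, projection can decide burst by burst whether to forward a word, using only a column counter rather than a row buffer. A minimal sketch, where `project` and its parameters are hypothetical names and the input is a flat row-major stream of 32-bit bursts:

```cpp
#include <cstdint>
#include <vector>

// Forwards only the bursts whose column bit is set in columnMask.
// numCols must be in [1, 32]; a counter tracks the current column, so no row is buffered.
std::vector<uint32_t> project(const std::vector<uint32_t>& bursts, uint32_t numCols,
                              uint32_t columnMask) {
    std::vector<uint32_t> out;
    uint32_t col = 0;
    for (uint32_t word : bursts) {
        if ((columnMask >> col) & 1u) out.push_back(word);
        col = (col + 1 == numCols) ? 0 : col + 1;  // wrap at the end of each row
    }
    return out;
}
```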

  10. Microarchitecture | Binary Operators
      • Cartesian product, union, difference and deduplication
      • Nested-loop implementation
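
A nested-loop strategy compares every row of one input against every row of the other, which is simple to build but costs on the order of |A| × |B| comparisons. The sketch below models the four operators in software under set semantics; the function names are hypothetical, and `unionOf` assumes both inputs are already duplicate-free (run `deduplicate` first otherwise).

```cpp
#include <cstdint>
#include <vector>

using Row = std::vector<uint32_t>;
using Relation = std::vector<Row>;

// Inner loop shared by the operators below: linear scan for an equal row.
bool contains(const Relation& rel, const Row& row) {
    for (const Row& r : rel)
        if (r == row) return true;
    return false;
}

// Every row of a concatenated with every row of b.
Relation cartesianProduct(const Relation& a, const Relation& b) {
    Relation out;
    for (const Row& ra : a)
        for (const Row& rb : b) {
            Row joined = ra;
            joined.insert(joined.end(), rb.begin(), rb.end());
            out.push_back(joined);
        }
    return out;
}

// Rows of a that do not appear in b.
Relation difference(const Relation& a, const Relation& b) {
    Relation out;
    for (const Row& ra : a)
        if (!contains(b, ra)) out.push_back(ra);
    return out;
}

// Keeps the first occurrence of each distinct row.
Relation deduplicate(const Relation& a) {
    Relation out;
    for (const Row& ra : a)
        if (!contains(out, ra)) out.push_back(ra);
    return out;
}

// Set union of two duplicate-free inputs: all of a, then rows of b not in a.
Relation unionOf(const Relation& a, const Relation& b) {
    Relation out = a;
    for (const Row& rb : b)
        if (!contains(a, rb)) out.push_back(rb);
    return out;
}
```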

  11. Microarchitecture | Inter-operator Bypassing
      • Operators are enabled concurrently; data is passed directly between operators
      • No intermediate storage
      • Conditions:
        • A singly linked chain of unary operators
        • Each operator has a single target output
        • No structural hazards
      • Software reorders and schedules the RA commands
      • Data source and destination are encoded in each command

  12. Microarchitecture | Inter-operator Bypassing
      • Multiple 32-bit-wide output FIFOs to other operators
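
Bypassing means one operator's output feeds the next through a FIFO instead of being written back to DRAM. The sketch below models a selection-to-projection chain in software, with a `std::queue` standing in for a 32-bit output FIFO; the stage functions and their signatures are hypothetical, and real hardware would run the stages concurrently rather than alternately.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

using Row = std::vector<uint32_t>;
using Fifo = std::queue<uint32_t>;  // stands in for a 32-bit-wide hardware FIFO

// Selection stage: pushes a row's words into the downstream FIFO only if it qualifies.
void selectStage(const Row& row, const std::function<bool(const Row&)>& predicate, Fifo& out) {
    if (predicate(row))
        for (uint32_t word : row) out.push(word);
}

// Projection stage: drains the FIFO word by word, keeping columns whose mask bit is set.
// Assumes the FIFO holds whole rows, so the column counter restarts at 0 on each call.
void projectStage(Fifo& in, uint32_t numCols, uint32_t columnMask, Row& result) {
    uint32_t col = 0;
    while (!in.empty()) {
        uint32_t word = in.front();
        in.pop();
        if ((columnMask >> col) & 1u) result.push_back(word);
        col = (col + 1 == numCols) ? 0 : col + 1;
    }
}

// Chains the two stages: no intermediate table is materialized,
// and the FIFO never holds more than one row at a time.
Row selectThenProject(const std::vector<Row>& table,
                      const std::function<bool(const Row&)>& predicate,
                      uint32_t numCols, uint32_t columnMask) {
    Fifo link;
    Row result;
    for (const Row& row : table) {
        selectStage(row, predicate, link);
        projectStage(link, numCols, columnMask, result);
    }
    return result;
}
```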

  13. Implementation Evaluation
      • Timing
        • Maximum frequency: 55.786 MHz
        • Critical path: Row Marshaller mux
      • Area (FPGA utilization)
        • Slice registers: 50%
        • LUTs: 85%
        • BRAM/FIFOs: 47%

  14. Performance Benchmark | Setup
      • SQLite
        • SQLite's internal timer reports the execution time of each query
        • ThinkPad T430, Core i7-3520M @ 2.90 GHz, 1 × 8 GB DDR3-1600
      • RA Processor
        • Performance counters: cycles from the start to the acknowledgment of an operator

  15. Performance Benchmark | Results
      • Limitation: memory bandwidth, 200 MB/s (RA Processor) vs. 12.8 GB/s (SQLite host)
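
The bandwidth gap follows from peak-bandwidth arithmetic: bytes per transfer times transfers per second. For the host, DDR3-1600 on a single 64-bit channel gives 8 B × 1600 MT/s = 12.8 GB/s. For the RA Processor, one 4-byte burst per cycle at 50 MHz gives 200 MB/s; the 50 MHz clock is an assumption chosen to reproduce the quoted figure, since the slides do not state the operating clock.

```cpp
// Peak bandwidth in bytes/second = bytes per transfer x transfers per second.
constexpr double peakBandwidth(double bytesPerTransfer, double transfersPerSecond) {
    return bytesPerTransfer * transfersPerSecond;
}

// RA Processor: one 32-bit (4-byte) burst per cycle at an ASSUMED 50 MHz clock.
constexpr double kFpgaBytesPerSec = peakBandwidth(4.0, 50e6);    // 2.0e8 B/s = 200 MB/s

// SQLite host: DDR3-1600, one 64-bit (8-byte) channel at 1600 MT/s.
constexpr double kHostBytesPerSec = peakBandwidth(8.0, 1600e6);  // 1.28e10 B/s = 12.8 GB/s

// Ratio of the two peaks: the host can stream data 64x faster.
constexpr double kBandwidthGap = kHostBytesPerSec / kFpgaBytesPerSec;  // 64.0
```

The same arithmetic at the reported 55.786 MHz maximum frequency gives about 223 MB/s, still roughly 57× below the host.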

  16. Performance Benchmark | Results
      • The Select operator is the most competitive with SQLite
      • What happens with more predicates?

  17. Improvements
      • Increase the data burst width
        • 32-bit to 256-bit: potential 8x speedup
        • Cost: larger area and a longer critical path
      • Maximize memory bandwidth
        • Additional row buffers to buffer data from the DDR2 memory
        • Larger, faster DRAM; higher clock speed
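
For a bandwidth-bound operator, the cycle count is roughly the table size divided by the burst width, so widening bursts from 32 to 256 bits cuts cycles by up to 8×. The sketch below assumes one burst per cycle and ignores DRAM stalls; the helper names are hypothetical. `cyclesToSeconds` converts a start-to-ack performance-counter value into time at a given clock.

```cpp
#include <cstdint>

// Cycles needed to stream tableBits through a datapath moving burstBits per cycle,
// rounding up for a partial final burst. burstBits must be nonzero.
constexpr uint64_t streamCycles(uint64_t tableBits, uint64_t burstBits) {
    return (tableBits + burstBits - 1) / burstBits;
}

// Converts a cycle count (e.g. from a start-to-ack performance counter) into seconds.
constexpr double cyclesToSeconds(uint64_t cycles, double clockHz) {
    return static_cast<double>(cycles) / clockHz;
}
```

For a hypothetical 1-million-row, 8-column table (256 million bits), that is 8,000,000 cycles with 32-bit bursts versus 1,000,000 with 256-bit bursts; at 55.786 MHz the former takes about 0.143 s. If the wider datapath lowers the maximum clock, as the slide warns, the net gain is smaller than 8×.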

  18. Conclusion & Future Work
      • Complex filtering operations perform well on the FPGA
        • Better than SQLite given sufficient memory bandwidth
      • Data-intensive operators do not perform well
      • Future opportunities:
        • An accelerator alongside SQLite
        • Integration with an HDD/SSD controller
