
Ingres/VectorWise

Doug Inkster – Ingres Development

Abstract: Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users.


Presentation Transcript


  1. Doug Inkster – Ingres Development Ingres/VectorWise

  2. Abstract • Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users.

  3. Overview • What is VectorWise? • Significance of the project (to existing users) • Column store vs. row store • VectorWise innovations • Ingres/VectorWise interface details

  4. VectorWise • Small Dutch startup spun off from CWI • Currently 6-7 employees • Exciting development project based on the Ph.D. research of Marcin Zukowski under the guidance of Peter Boncz • Ingres provided seed money and currently has exclusive rights to their technology

  5. New Ingres Storage Type • VectorWise technology will take the form of a new Ingres storage type • Column store – but turbo-charged • Users will simply define tables as usual, but with the new storage type • Extremely fast

  6. Performance in Traditional RDBMSs • TPC-H query 1: 6 million rows • MySQL: 26.2 seconds • DBMS x: 28.1 seconds • Hand-coded C program: 0.2 seconds • Prompted investigation into why traditional RDBMSs are so slow

  7. Row Stores – Storage Model • Row store: data stored on disk and processed by the query engine in the form of rows • Ingres and all other commercial RDBMSs follow the same model • Data stored on disk as rows packed into blocks/pages • Extracted from pages and passed between query plan operations (joins, sorts, etc.) as rows • Techniques have improved over time, but same basic model as the research engines of the 1970s (System R, Berkeley Ingres)

  8. Row Stores – Storage Model • Select a, b from c: • Even if c has 200 columns, the whole row image is read for all qualified rows • Inefficient use of disk (excessive I/O) • Inefficient use of memory (large row-sized buffers)

  9. Row Stores – Execution Model • Select a*b from c where d+e >= 250: • Execution model turns column expressions into pseudo-code (ADF in Ingres) • Pseudo-code evaluation almost always by interpretation – one row at a time • For each instruction, operand addresses are built (using known buffers) – all row-induced overhead • Big switch on “opcode” (integer addition, float compare, type coercions, etc.) • Very poor locality (code and data), very inefficient • No benefit from modern hardware features (cache, instruction pipelines, etc.)
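A toy interpreter makes the per-row overhead concrete. This is a hypothetical sketch, not Ingres's actual ADF code, but the shape is the same: the big opcode switch is dispatched again for every instruction of every row:

```c
/* Tuple-at-a-time interpretation sketch (hypothetical opcodes). */
enum opcode { OP_ADD_INT, OP_MUL_INT, OP_GE_INT };

typedef struct { enum opcode op; int lhs, rhs, dst; } instr_t;

/* Evaluate one instruction against one row's operand slots. */
static void eval(const instr_t *in, int *row) {
    switch (in->op) {                /* dispatched once per row! */
    case OP_ADD_INT: row[in->dst] = row[in->lhs] + row[in->rhs]; break;
    case OP_MUL_INT: row[in->dst] = row[in->lhs] * row[in->rhs]; break;
    case OP_GE_INT:  row[in->dst] = row[in->lhs] >= row[in->rhs]; break;
    }
}

/* "where d+e >= 250" for one row: two interpreted instructions,
 * plus operand-slot bookkeeping, for a single comparison. */
int qualifies(int d, int e) {
    int row[4] = { d, e, 250, 0 };      /* slots: d, e, const, temp */
    instr_t add = { OP_ADD_INT, 0, 1, 3 };
    instr_t cmp = { OP_GE_INT,  3, 2, 3 };
    eval(&add, row);
    eval(&cmp, row);
    return row[3];
}
```

The switch and the per-row function calls are exactly the poor-locality, high-cycles-per-tuple pattern the slide describes.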

  10. Row Stores - Performance • Poor disk bandwidth • High instructions per tuple • Poor locality (data and instructions), poor exploitation of newer hardware – high cycles per instruction • Extremely high cycles per tuple!! • Designed for OLTP, not large scale analytical processing

  11. Column Stores – Storage Model • Derived from research in the 1990s • One of the earliest was MonetDB – Peter Boncz’s thesis • Data stored in columns (all values from a column in contiguous storage) • Ordinal position in the column dictates the row • Only columns actually requested are accessed (compare to the “select a, b from c” example) • Far more efficient disk usage
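A minimal C sketch of the layout (the column names and data are hypothetical): each column is its own contiguous array, and row i is simply index i in every column, so a query touching only a and b never reads the other arrays:

```c
/* Column-store layout sketch: one array per column. */
#define NROWS 5

static int a[NROWS] = { 10, 20, 30, 40, 50 };
static int b[NROWS] = {  1,  2,  3,  4,  5 };
/* ...columns not named in the query live in their own arrays
 * and are simply never touched... */

/* "select a+b from c": only the a and b arrays are scanned,
 * and the loop index i is the ordinal row identifier. */
long sum_a_plus_b(void) {
    long total = 0;
    for (int i = 0; i < NROWS; i++)
        total += a[i] + b[i];
    return total;
}
```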

  12. Column Stores – Execution Model • Some column stores (Sybase IQ, Vertica) just read columns from disk and re-compose rows in memory • Execution model is the same as row stores • They address only the I/O problem

  13. MonetDB – Improved Performance • Column store improved I/O performance • New execution model improved CPU performance • Column-wise (not row-wise) computation • Exploitation of newer hardware and compiler features

  14. MonetDB – Improved Performance • CPU efficiency depends on “nice” code • out-of-order execution • few dependencies (control, data) • compiler support • Compilers love simple loops over arrays • loop pipelining • automatic SIMD

  15. MonetDB – Expression Execution

      SELECT id, name, (age-30)*50 as bonus
      FROM people
      WHERE age > 30

      /* Returns a vector of oids satisfying the comparison
      ** and the count of entries in the vector. */
      int select_gt_float(oid *res, float *column, float val, int n) {
          for (int j = 0, i = 0; i < n; i++)
              if (column[i] > val)
                  res[j++] = i;
          return j;
      }

      • Compiles into a loop that performs many (10, 100, 1000) comparisons in parallel
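A companion primitive (a sketch in the same style, not actual MonetDB source) would then compute the bonus expression column-wise over the oids selected by `select_gt_float`, again as a simple array loop the compiler can pipeline:

```c
typedef unsigned int oid;   /* assumed definition for this sketch */

/* Compute (age-30)*50 for each selected oid: one tight loop,
 * no per-tuple interpretation, no opcode dispatch. */
void map_bonus(float *bonus, const float *age, const oid *sel, int n) {
    for (int i = 0; i < n; i++)
        bonus[i] = (age[sel[i]] - 30.0f) * 50.0f;
}
```

The whole query becomes a pipeline of such primitives: a selection producing an oid vector, then maps over it.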

  16. MonetDB – Problem • Operations required on whole columns at a time • Lots of expensive intermediate result materialization • Memory requirements • Doesn’t scale to very large databases • Still far faster than anything before • Q1: MySQL – 26.2 seconds • MonetDB – 3.7 seconds

  17. MonetDB/X100 – Innovations • Fixes the MonetDB scaling problem • Columns broken into “chunks” or vectors (sized to fit cache) • Expression operators execute on vectors • Most primitives take just 0.5 (!) to 10 cycles per tuple • 10-100+ times faster than tuple-at-a-time • Other performance improvements • Tuned to disk/memory/cache • Lightweight compression to maximize throughput • Cooperative scans (future) • Q1: MySQL – 26.2 seconds • MonetDB – 3.7 seconds • X100 – 0.6 seconds!
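The vector-at-a-time idea can be sketched as follows (hypothetical C, not X100 source, and the vector size is illustrative): the outer loop walks the column in cache-sized chunks, and the primitive is a tight loop per chunk, so intermediates stay cache resident instead of being materialized as whole columns:

```c
/* Tuples per vector: chosen so a vector of values plus its
 * intermediates fits in the CPU cache (illustrative size). */
#define VECTOR_SIZE 1024

/* Primitive: count qualifying tuples in one vector.
 * A simple branch-free loop the compiler can pipeline/SIMD-ize. */
static int count_gt_chunk(const int *v, int n, int threshold) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (v[i] > threshold);
    return count;
}

/* Operator: process the whole column one vector at a time,
 * amortizing interpretation overhead over VECTOR_SIZE tuples. */
int count_gt(const int *column, int nrows, int threshold) {
    int total = 0;
    for (int off = 0; off < nrows; off += VECTOR_SIZE) {
        int n = nrows - off < VECTOR_SIZE ? nrows - off : VECTOR_SIZE;
        total += count_gt_chunk(column + off, n, threshold);
    }
    return total;
}
```

One function call per vector instead of per tuple is where the 100x reduction in interpretation overhead comes from.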

  18. MonetDB/X100 – Hardware Trends • CPU • Increased cache, memory (64-bit) • Instruction pipelining, multiple simultaneous instructions, SIMD (single instruction, multiple data) • Disk • Increased disk capacity • Increased disk bandwidth • Increased random access speeds – but not as much as bandwidth • Sequential access increasing in importance

  19. MonetDB/X100 – Innovations • Reduced interpretation overhead • 100x fewer function calls • Good CPU cache use • High locality in the primitives • Cache-conscious data placement • No tuple navigation • Primitives only see arrays • Vectorization allows algorithmic optimization • CPU- and compiler-friendly function bodies • Multiple work units, loop pipelining, SIMD…

  20. Ingres/VectorWise Integration • Ingres offers: • Infrastructure • User interfaces, SQL compilation, utilities, etc. • Existing user community • VectorWise offers: • Execution engine (QEF/DMF/ADF equivalent) • Its own relational algebraic language

  21. Ingres/VectorWise Integration • SQL query parsed, optimized in Ingres • Ingres builds query plan from OPF-generated QEP • Query plan contains cross-compiled VectorWise query syntax • QEF passes query to VectorWise engine (and gets out of the way!) • VectorWise passes result set back to Ingres to be returned to caller (array of result rows – mapped to Ingres fetch buffers, many rows at a time)

  22. Ingres/VectorWise Integration • Currently: • Create table including unique/referential constraint definitions (with structure = …) • Copy table • Drop table • Select • Insert • Update • Delete • Create table … as select … • Insert into … select … • Create index (VectorWise style)

  23. Ingres/VectorWise – Initial Release • Individual queries limited to either VectorWise or Ingres native tables • Limitation will probably be lifted in the near future • As seamless as possible • Integrated ingstart/ingstop • Same utilities • Same user interfaces • Same recovery processes

  24. Ingres/VectorWise – Conclusions • A tremendously exciting development for Ingres • Industry-leading technology • Opens up new applications • Reduces pressure for new hardware
