240 likes | 495 Views
Doug Inkster – Ingres Development. Ingres/ VectorWise. Abstract. Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users.
E N D
Doug Inkster – Ingres Development Ingres/VectorWise
Abstract • Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users
Overview • What is VectorWise • Significance of project (to existing users) • Column store v.s. row store • VectorWise innovations • Ingres/VectorWise interface details
VectorWise • Small Dutch startup spun from CWI • Currently 6-7 employees • Exciting development project based on Ph. D. research of Marcin Zukowski under guidance of Peter Boncz • Ingres provided seed money and currently has exclusive rights to their technology
New Ingres Storage Type • VectorWise technology will take the form of new Ingres storage type • Column store – but turbo-charged • Users will just define tables as usual, but with new storage type • Extremely fast
Performance in Traditional RDBMS • TPC H query 1: 6 million rows • MySQL: 26.2 seconds • DBMS x: 28.1 seconds • Hand coded C program: 0.2 seconds • Initiated thought about why traditional RDBMS is so slow
Row Stores – Storage Model • Row store : data stored on disk and processed by query engine in the form of rows • Ingres and all other commercial RDBMS’ follow same model • Data stored on disk as rows packed into blocks/pages • Extracted from pages and passed between query plan operations (joins, sorts, etc.) as rows • Techniques improved over time, but same basic model as research engines of 1970’s (System R, Berkeley Ingres)
Row Stores – Storage Model • Select a, b from c: • Even if 200 columns in c, reads whole row image for all qualified rows • Inefficient use of disk (excessive I/O) • Inefficient use of memory (large row-sized buffers)
Row Stores – Execution Model • Select a*b from c where d+e >= 250: • Execution model turns column expressions into pseudo-code (ADF in Ingres) • Pseudo-code evaluation almost always by interpretation – one row at a time • For each instruction, operand addresses are built (using known buffers) – all row induced overhead • Big switch on “opcode” (integer addition, float compare, type coercions, etc.) • Very poor locality (code and data), very inefficient • No benefit from modern hardware features (cache, instruction pipelines, etc.)
Row Stores - Performance • Poor disk bandwidth • High instructions per tuple • Poor locality (data and instructions), poor exploitation of newer hardware – high cycles per instruction • Extremely high cycles per tuple!! • Designed for OLTP, not large scale analytical processing
Column Stores – Storage Model • Derived from research in 1990’s • One of the earliest was MonetDB – Peter Boncz’s thesis • Data stored in columns (all values from a column in contiguous storage) • Ordinal position in column store dictates row • Only columns actually requested are accessed (compare to “select a, b from c” example) • Far more efficient disk usage
Column Stores – Execution Model • Some column stores (Sybase IQ, Vertica) just read columns from disk and re-compose rows in memory • Execution model same as row stores • Only deal with I/O problem
MonetDB – Improved Performance • Column store improved I/O performance • New execution model improved CPU performance • Column wise (not row wise) computation • Exploitation of newer hardware, compiler features
MonetDB – Improved Performance • CPU Efficiency depends on “nice” code • out-of-order execution • few dependencies (control,data) • compiler support • Compilers love simple loops over arrays • loop-pipelining • automatic SIMD
MonetDB – Expression Execution SELECT id, name, (age-30)*50 as bonus FROM people WHERE age > 30 /* Returns vector of oid’s satisfying comparison ** and count of entries in vector. */ int select_gt_float( oid* res, float* column, float val, int n) { for(int j=0,i=0; i<n; i++) if (column[i] >val) res[j++] = i; return j; } Compiles into loop that performs many (10, 100, 1000) comparisons in parallel
MonetDB - Problem • Operations required on whole columns at a time • Lots of expensive intermediate result materialization • Memory requirements • Doesn’t scale into very large databases • Still way faster than anything before • Q1: MySQL – 26.2 seconds • MonetDB - 3.7 seconds
MonetDB/X100 - Innovations • Fix MonetDB scaling problem • Columns broken into “chunks” or vectors (sized to fit cache) • Expression operators execute on vectors • Most primitives take just 0.5 (!) to 10 cycles per tuple • 10-100+ times faster than tuple-at-a-time • Other performance improvements • Tuned to disk/memory/cache • Lightweight compression to maximize throughput • Cooperative scans (future) • Q1: MySQL – 26.2 seconds • MonetDB – 3.7 seconds • X100 – 0.6 seconds!
MonetDB/X100 – Hardware Trends • CPU • Increased cache, memory (64-bit) • Instruction pipelining, multiple simultaneous instructions, SIMD (single instruction, multiple data) • Disk • Increased disk capacity • Increased disk bandwidth • Increased random access speeds – but not as much as bandwidth • Sequential access increasing in importance
MonetDB/X100 - Innovations • Reduced interpretation overhead • 100x less Function Calls • Good CPU cache use • High locality in the primitives • Cache-conscious data placement • No Tuple Navigation • Primitives only see arrays • Vectorization allows algorithmic optimization • CPU and compiler-friendly function bodies • Multiple work units, loop-pipelining, SIMD…
Ingres/VectorWise Integration • Ingres offers: • Infrastructure • User interfaces, SQL compilation, utilities, etc. • Existing user community • VectorWise offers: • Execution engine (QEF/DMF/ADF equivalent) • Its own relational algebraic language
Ingres/VectorWise Integration • SQL query parsed, optimized in Ingres • Ingres builds query plan from OPF-generated QEP • Query plan contains cross-compiled VectorWise query syntax • QEF passes query to VectorWise engine (and gets out of the way!) • VectorWise passes result set back to Ingres to be returned to caller (array of result rows – mapped to Ingres fetch buffers, many rows at a time)
Ingres/VectorWise Integration • Currently: • Create table including unique/referential constraint definitions (with structure = …) • Copy table • Drop table • Select • Insert • Update • Delete • Create table … as select … • Insert into … select … • Create index (VectorWise style)
Ingres/VectorWise – Initial Release • Individual queries limited to either VectorWise or Ingres native tables • Probably lifted in the near future • Seamless as possible • Integrated ingstart/ingstop • Same utilities • Same user interfaces • Same recovery processes
Ingres/VectorWise - Conclusions • Tremendously exciting development in Ingres • Industry leading technology • Opens up new applications • Reduces pressure for new hardware