150 likes | 353 Views
Ingres VectorWise. Presented b y Marie-Gisele Assigue Hon Shea Thursday, March 31 st 2011. VECTORWISE - Overview. A relational database software for reporting, data analysis and Business Intelligence Meaning it has to analyze terabytes of data
E N D
Ingres VectorWise Presented by Marie-Gisele Assigue Hon Shea Thursday, March 31st 2011
VECTORWISE - Overview • A relational database software for reporting, data analysis and Business Intelligence • Meaning it has to analyze terabytes of data • VectorWise recently set a new record on the TPC-H benchmark
VECTORWISE USE CASES • Financial services like banks, wall street • Desirable to query historical data as well as current positions • Data volume is simply too large to store cost effectively in memory • VectorWise delivers in-memory performance with data stored on-disk • Social media: for example for advertisement • E-commerce • Performance of database suffers as the amount of historical data grows • VectorWise is able to deliver good performance even when analyzing large amount of data
VECTORWISE KEY FEATURES • SIMD(SINGLE INSTRUCTION MULTIPLE DATA) • SIMD instructions allows the same operation performed on multiple data simultaneously • Traditional databases process data one tuple at a time • Vectorwise processes vector of hundreds of element at once • Using large CPU cache as execution memory • Size of vector is tuned to fit into cache • HARDWARE ACCELERATED STRING-BASED OPERATIONS • Supported by Intel Xeon processor • Speeds up operations like: • Selections on strings using wild card matching • Aggregations on string-based values • Joins or sorts using string keys • Up to 2 – 4 times faster
VECTORWISE KEY FEATURES • Use of COLUMN-BASED storage • For data warehouse databases, most queries retrieve many rows • Row-based storage would generate a lot of unnecessary I/O • Column-based storage is generally accepted as a superior storage model for this type of workload
VECTORWISE KEY FEATURES • VectorWise’s Hybrid Column Store • By default data is stored column by column • For tables that are indexed on more than one column, indexed columns are stored together in the same block • This data storage model is known as PAX (Partition Attributes Across) • PAX delivers better cache performance
VECTORWISE KEY FEATURES • Guarantees ACID properties • Supports multi-version read consistency • Use of POSITIONAL DELTA TREES(PDTs) • Small inserts, updates or deletes is expensive in column-based database (as opposed to large bulk data load operations) • PDT is an in-memory structure that stores the position and the change (delta) at that position • PDTs use a configurable amount of memory. Once the memory pool is exhausted, updates are written to disk
VECTORWISE KEY FEATURES(CONT’D) DATA COMPRESSION - VectorWise compresses data on a column-by-column basis using these any one of these algorithm: RLE(Run Length Encoding, PFOR(Patched Frame Of Reference or delta encoding on top of PFOR)) - For instance the VectorWise Innovated use of data compression in order to improve performance by allocating a portion of physical memory for a memory-bases disk buffer called the CBM(Column Buffer Manager). The data is automatically pr-fetched from disk and stored in the CBM.
VECTORWISE KEY FEATURES(CONT’D) STORAGE INDEXES storage indexes in extreme cases can provide the same benefit as data partitioning does for other databases w/o the overhead of multiple database object or maintaining a partitioning strategy. • VectorWise automatically maintains a storage index per column storing minimum and maximum values for the data block. • Very efficient in determining whether a database block is a candidate block for a particular query.
VECTORWISE KEY FEATURES(CONT’D) PARALLEL EXECUTION • Parallel execution provides the greatest performance improvements in DSS (Decision support system) and data warehousing environments. The VectorWise engine is able to sustain a large amount of concurrent queries efficiently on a multi-core system • Ex.of Parallel Execution Server Connections and Buffers
performance • New record set by Ingres for the TPC-H benchmark at the 100GB scale factor is an astounding 3.4 times faster than the old mark. • New record of 251,561 QphH (Queries per hour) for 100 GB of data was set by Ingres's VectorWise database running on one HP ProLiant DL380 G7.
Advantage of using VectorWise software • Enables you to a workload on a server • Can lower the cost instantly by better utilizing your hardware (dynamic). • Achieve extremely fast performance for typical data warehouse and data mart workload.
WORK CITED • http://www.itwire.com/business-it-news • INGRES VectorWise Whitepaper, a technical whitepaper • http://uhesse.wordpress.com • http://kerryosborne.oracle-guy.com/2010/08/oracle-exadata-storage-indexes/ • http://www.openexpo.ch/fileadmin/documents/2010Bern/Slides/25_OlafLaber.pdf • http://www.wikipedia.org • Ailamaki, Anastassia. A Storage Model to Bridge the Processor/Memory Speed Gap. Carnegie Mello University, 2001 • ANY QUESTION?