A Perfect Hybrid: Split Query Processing in Polybase
biaobiaoqi (shenyapeng@gmail.com)
2013/4/25
Outline • Background • Related Work • PDW • Polybase • Performance Evaluation
Background
• Structured data & unstructured data
• RDBMS & Big Data
• [Diagram: combining data from the RDBMS and Hadoop to produce insight]
Related Work
• Sqoop: transfers bulk data between Hadoop and structured data stores such as relational databases
• Teradata & Aster Data
• Greenplum & Vertica: external tables
• Oracle: external tables and OLH (Oracle Loader for Hadoop)
• IBM: a split mechanism that uses MapReduce to access the appliance
• Hadapt (HadoopDB): designed from the outset to support the execution of SQL-like queries across both unstructured and structured data sets
PDW Architecture • Parallel Data Warehouse • Shared-nothing system
Components in PDW
• Node
• SQL Server instance on each node
• Data are hash-partitioned across the compute nodes
• Control node [the PDW Engine runs here]:
• query parsing
• optimization
• creating the distributed execution (DSQL) plan for the compute nodes
• tracking the execution steps of the plan on the compute nodes
• Compute node:
• storage
• query processing
• DMS: Data Movement Service
• (1) repartitioning rows of a table among the SQL Server instances on the PDW compute nodes (a minimal sketch follows below)
• (2) converting fields of rows being loaded into the appliance into the appropriate ODBC types
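As a rough illustration of DMS repartitioning, the sketch below hashes a row's distribution column to pick a target compute node. The row values, hash function, and node count are hypothetical; this is not Polybase's internal code.

```java
import java.util.List;

// Illustrative only: route each row to a compute node by hashing its
// distribution column, as DMS does when repartitioning a table.
public class ShuffleSketch {
    static int targetNode(long distributionKey, int numComputeNodes) {
        // floorMod keeps the bucket non-negative even for negative hashes.
        return Math.floorMod(Long.hashCode(distributionKey), numComputeNodes);
    }

    public static void main(String[] args) {
        List<Long> custKeys = List.of(1L, 42L, 9001L); // hypothetical c_custkey values
        int numComputeNodes = 8;                        // hypothetical appliance size
        for (long key : custKeys) {
            System.out.printf("row with c_custkey=%d -> compute node %d%n",
                              key, targetNode(key, numComputeNodes));
        }
    }
}
```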
Overview of Polybase
• A new feature in PDW V2
• Uses standard SQL
• Deals with both structured and unstructured data (in SQL Server and Hadoop)
• Split query processing paradigm
• Leverages the capabilities of SQL Server PDW, especially its cost-based parallel query optimizer and execution engine
Assumptions in Polybase
• 1. Polybase makes no assumptions about where the HDFS data is stored
• 2. Nor any assumptions about the OS of the data nodes
• 3. Nor about the format of the HDFS files (i.e. TextFile, RCFile, custom, …)
Core Components
• External tables
• HDFS Bridge in DMS
• Cost-based query optimizer (wrapping the one in V1)
External Table
• Create cluster instance:
CREATE HADOOP_CLUSTER GSL_CLUSTER WITH
  (namenode = 'hadoop-head', namenode_port = 9000,
   jobtracker = 'hadoop-head', jobtracker_port = 9010);
• Create file format:
CREATE HADOOP_FILEFORMAT TEXT_FORMAT WITH
  (INPUT_FORMAT = 'polybase.TextInputFormat',
   OUTPUT_FORMAT = 'polybase.TextOutputFormat',
   ROW_DELIMITER = '\n', COLUMN_DELIMITER = '|');
• Create external table:
CREATE EXTERNAL TABLE hdfsCustomer
  (c_custkey bigint not null,
   c_name varchar(25) not null,
   …
   c_comment varchar(117) not null)
WITH (LOCATION = '/tpch1gb/customer.tbl',
      FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
                      EXTERNAL_FILEFORMAT = TEXT_FORMAT));
HDFS Bridge
• The HDFS Bridge is a component of DMS
• Goal: transferring data in parallel between the nodes of the Hadoop and PDW clusters
• HDFS shuffle phase (reading data from Hadoop):
• 1. Communicate with the namenode to get the file's metadata
• 2. Balance the number of bytes read by each DMS instance, based on the HDFS metadata and the number of DMS instances (see the sketch after this list)
• 3. Invoke openRecordReader(); the RecordReader instance communicates directly with the datanodes
• 4. Read the data and convert it into ODBC types (may be done in a MapReduce job)
• 5. Apply a hash function to determine the target node for each record
• Writing to Hadoop is almost symmetric, invoking openRecordWriter()
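To make step 2 concrete, here is a minimal sketch that uses the standard Hadoop InputFormat API to enumerate a file's input splits and deal them out round-robin to a fixed number of readers. The namenode address and file path are taken from the external-table example earlier in the deck; the reader count is hypothetical, and the real HDFS Bridge balances by bytes rather than by split count.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Illustrative only: assign HDFS input splits to DMS reader instances.
public class SplitAssignment {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop-head:9000"); // namenode from CREATE HADOOP_CLUSTER
        Job job = Job.getInstance(conf);
        FileInputFormat.addInputPath(job, new Path("/tpch1gb/customer.tbl"));

        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        int dmsInstances = 8; // hypothetical number of DMS readers
        for (int i = 0; i < splits.size(); i++) {
            System.out.printf("split %d (%d bytes) -> DMS reader %d%n",
                              i, splits.get(i).getLength(), i % dmsInstances);
        }
    }
}
```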
Optimizer & Compilation
• Parsing
• A Memo data structure of alternative serial plans
• Parallel optimization [in PDW V1]
• A bottom-up optimizer inserts data movement operators into the serial plans
• Cost-based query optimizer [decides whether to push work to Hadoop]
• Based on statistics, the relative sizes of the two clusters, and other factors (a toy sketch of this decision follows below)
• Semantic compatibility
• Data types
• SQL semantics
• Error handling
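The slides do not give the optimizer's cost formulas, so the following is only a toy model of the decision it must make: push a selection to Hadoop, or import the whole table and filter inside PDW. All constants and formulas here are invented for illustration.

```java
// Toy model only, not Polybase's optimizer: compare the estimated cost of
// pushing a selection to Hadoop against importing the table and filtering in PDW.
public class PushdownChoice {
    public static void main(String[] args) {
        double tableBytes = 2e12;            // ~2 TB, matching the evaluation setup
        double selectivity = 0.10;           // assumed fraction of rows that survive
        double hadoopScanCostPerByte = 2.0;  // invented relative cost unit
        double importCostPerByte = 1.0;      // invented relative cost unit

        // Plan A: run the selection in Hadoop, import only the matching rows.
        double pushdownCost = tableBytes * hadoopScanCostPerByte
                            + tableBytes * selectivity * importCostPerByte;
        // Plan B: import everything and evaluate the selection inside PDW.
        double importAllCost = tableBytes * importCostPerByte;

        System.out.println(pushdownCost < importAllCost
                ? "push the selection to Hadoop"
                : "evaluate the selection in PDW");
    }
}
```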
Statistics
• Define statistics on an external table:
CREATE STATISTICS hdfsCustomerStats ON hdfsCustomer (c_custkey);
• Steps to obtain statistics on HDFS data (steps 3–4 are sketched below):
• 1. Read a block-level sample of the data via DMS or map jobs
• 2. Partition the samples across the compute nodes
• 3. Each node calculates a histogram on its portion
• 4. Merge the histograms and store the result in the database catalog
• An alternative implementation:
• In Hadoop V2, let the Hadoop cluster itself calculate the histograms (costly)
• Makes the best use of the computational resources of the Hadoop cluster
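A minimal sketch of steps 3 and 4, assuming a simple equi-width histogram over sampled c_custkey values; the bucket width and sample values are made up, and the real system computes richer SQL Server histograms.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: each "node" builds a tiny equi-width histogram over its
// sample of keys, then the per-node histograms are merged by summing bucket
// counts, mirroring steps 3-4 above.
public class HistogramSketch {
    static final long BUCKET_WIDTH = 1000; // hypothetical bucket width

    static Map<Long, Long> build(List<Long> sampleKeys) {
        Map<Long, Long> hist = new TreeMap<>();
        for (long k : sampleKeys) hist.merge(k / BUCKET_WIDTH, 1L, Long::sum);
        return hist;
    }

    static Map<Long, Long> mergeAll(List<Map<Long, Long>> perNode) {
        Map<Long, Long> merged = new TreeMap<>();
        for (Map<Long, Long> h : perNode)
            h.forEach((bucket, count) -> merged.merge(bucket, count, Long::sum));
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, Long> node1 = build(List.of(10L, 20L, 1500L));
        Map<Long, Long> node2 = build(List.of(30L, 2500L));
        System.out.println(mergeAll(List.of(node1, node2))); // {0=3, 1=1, 2=1}
    }
}
```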
Semantic Compatibility
• Data types
• Java primitive types
• Non-primitive types
• Third-party types that can be implemented
• Types that cannot be implemented in Java are marked [can only be processed in PDW]
• SQL semantics
• Expressions are translated into Java that preserves SQL semantics
• Null propagation: e.g. A + B becomes (A == null || B == null) ? null : (A + B)
• Expressions that cannot be translated are marked [can only be processed in PDW]
• Error handling
• Exceptions that would be raised in SQL must also be thrown by the generated Java (a minimal sketch follows below)
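A minimal sketch of the translation described above, applying the slide's null-propagation rule for A + B and letting integer division throw where the corresponding SQL expression would raise an error; the method names are hypothetical.

```java
// Illustrative only: how generated Java might preserve SQL's null and
// error semantics for arithmetic expressions.
public class SqlSemantics {
    // SQL: A + B is NULL if either operand is NULL.
    static Integer add(Integer a, Integer b) {
        return (a == null || b == null) ? null : a + b;
    }

    // SQL: A / B raises a divide-by-zero error; Java integer division
    // already throws ArithmeticException, matching the required behavior.
    static Integer divide(Integer a, Integer b) {
        if (a == null || b == null) return null;
        return a / b; // throws ArithmeticException when b == 0
    }

    public static void main(String[] args) {
        System.out.println(add(3, null));  // null, as in SQL
        System.out.println(divide(6, 2));  // 3
    }
}
```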
Example
• SELECT count(*) FROM Customer WHERE acctbal < 0 GROUP BY nationkey
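If the optimizer pushes this query to Hadoop, the generated work is roughly the job sketched below: the mapper filters on acctbal and emits (nationkey, 1), and the reducer (also usable as a combiner) sums the counts. This is not Polybase's actual generated code; the '|' delimiter comes from the file-format example earlier, and the column positions assume the TPC-H customer layout.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative only: SELECT count(*) FROM Customer WHERE acctbal < 0 GROUP BY nationkey
public class CustomerCountJob {

    public static class SelectMapper
            extends Mapper<LongWritable, Text, IntWritable, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = line.toString().split("\\|"); // '|' delimiter from TEXT_FORMAT
            int nationkey = Integer.parseInt(cols[3]);    // assumed TPC-H column positions
            double acctbal = Double.parseDouble(cols[5]);
            if (acctbal < 0) {
                ctx.write(new IntWritable(nationkey), ONE); // row survives the WHERE clause
            }
        }
    }

    public static class SumReducer
            extends Reducer<IntWritable, LongWritable, IntWritable, LongWritable> {
        @Override
        protected void reduce(IntWritable nationkey, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) total += c.get();
            ctx.write(nationkey, new LongWritable(total)); // count(*) per nationkey
        }
    }
}
```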
MapReduce Join
• Distributed hash join
• Supports equi-joins
• Implementation (a minimal sketch follows below):
• Build side: the input with the smaller amount of data; it is materialized in HDFS
• Probe side: the other input
• The build side is partitioned so that each partition fits in memory, to speed up the join
• The build side may also be replicated
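A minimal sketch of the hash-join core described above, assuming both inputs fit in one process: the build side is loaded into an in-memory hash table on the join key, and the probe side streams past it. The row shape and values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: in-memory equi-join of a small build side against a probe side.
public class HashJoinSketch {
    record Row(int joinKey, String payload) {}

    static List<String> hashJoin(List<Row> build, List<Row> probe) {
        // Build phase: hash table on the join key (the smaller input).
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row b : build) {
            table.computeIfAbsent(b.joinKey(), k -> new ArrayList<>()).add(b);
        }
        // Probe phase: stream the larger input and look up matches.
        List<String> out = new ArrayList<>();
        for (Row p : probe) {
            for (Row b : table.getOrDefault(p.joinKey(), List.of())) {
                out.add(b.payload() + " joined with " + p.payload());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> build = List.of(new Row(1, "b1"), new Row(2, "b2"));
        List<Row> probe = List.of(new Row(2, "p2"), new Row(3, "p3"));
        System.out.println(hashJoin(build, probe)); // [b2 joined with p2]
    }
}
```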
Performance Evaluation
• Test configurations:
• C-16/48: 16-node PDW cluster, 48-node Hadoop cluster
• C-30/30: 30-node PDW cluster, 30-node Hadoop cluster
• C-60: 60-node PDW cluster and 60-node Hadoop cluster
• Test database:
• Two identical tables T1 and T2
• 10 billion rows each
• 13 integer attributes and 3 string attributes (~200 bytes/row)
• About 2 TB uncompressed
• One copy of each table in HDFS
• HDFS block size of 256 MB
• Stored as a compressed RCFile
• RCFiles store rows "column-wise" inside a block
• One copy of each table in PDW
• Block-wise compression enabled
Selection on HDFS Table
• SELECT u1, u2, u3, str1, str2, str4 FROM T1 (in HDFS) WHERE (u1 % 100) < sf
• [Chart: execution times of Polybase Phase 1 (PB.1) and Polybase Phase 2 with split processing (SP) across selectivity factors]
• Crossover point: above a selectivity factor of ~80%, PB Phase 2 is slower
Join HDFS Table with PDW Table
• SELECT * FROM T1 (HDFS), T2 (PDW) WHERE T1.u1 = T2.u2 AND (T1.u2 % 100) < sf AND (T2.u2 % 100) < 50
• [Chart: execution times of Polybase Phase 1 (PB.1) and Polybase Phase 2 with split processing (SP) across selectivity factors]
Join Two HDFS Tables
• SELECT * FROM T1 (HDFS), T2 (HDFS) WHERE T1.u1 = T2.u2 AND (T1.u2 % 100) < SF AND (T2.u2 % 100) < 10
• Plans compared:
• PB.1: all operators on PDW
• PB.2P: selections on T1 and T2 pushed to Hadoop; join performed on PDW
• PB.2H: selections & join on Hadoop
• [Chart: execution times of PB.1, PB.2P, and PB.2H across selectivity factors]
Performance Wrap-up • Split query processing really works! • Up to 10X performance improvement! • A cost-based optimizer is clearly required to decide when an operator should be pushed • Optimizer must also incorporate relative cluster sizes in its decisions
Reference
• Split Query Processing in Polybase. SIGMOD '13, June 22–27, 2013, New York, NY, USA. Microsoft Corporation.
• Polybase: What, Why, How (slides). Microsoft Corporation.
• Query Optimization in Microsoft SQL Server PDW. SIGMOD '12, May 20–24, 2012, Scottsdale, Arizona, USA. Microsoft Corporation.