Parallel Universe

Parallel Universe Fast Parallel MySQL Server

Target Markets • Database Servers • Data Warehouse Servers • Data Analytics Servers

Parallel Universe Parallel Universe is the industry’s only SQL server with fast parallel query engine. It is created by extending MySQL server architecture.Speed is achieved by processing tables in parallel utilizing multiple core/CPU of server hardware.Because of fast query processing being available to data analysis, it is an ideal data warehouse server.With Parallel Universe, you'll also be able to deploy less costly server hardware for the same query load/task.Parallel Universe is released under the GPL license and fully compatible with MySQL and Percona servers.Also available as part of Linux OS images at Amazon Web Services and www.GoGrid.com .

New Technology Today microprocessors which provide computational resource to RDBMS (Relational Data Base Management System) contain multiple CPU cores where each core is capable of executing its own code independently. If RDBMS server can break down its task into a number of smaller subtasks then these subtasks may be performed by those multiple cores concurrently resulting in faster execution. This new technology speeds up the execution phase of query.

Parallel Universe isan Extension to MySQL Server MySQL Server Fast Parallel Query Engine + Parallel Universe

MySQL Query Processing 1. Background Information MySQL query determines combinations of records from given tables which satisfy a given condition. SELECT field_list FROM table_list WHERE condition where field_list represents fields from the tables to output, each table consists of a number of records and each record is comprised of a number of fields(attributes) and the condition specifies field relationships. The server parses and translates the query into the optimum query execution plan which specifies a order by which the tables are processed, how records are read from the tables, conditions which must be satisfied by these records and process/output record method.

Current Technology All operations carried out by a single thread*. Optimization Phase Execution Phase Table Order Table Access Methods Record Match Conditions #Process Record Method Parse & Optimize Execute #Process Records Query Record Combinations Result Query Execution Plan *Thread is a flow of code execution scheduled by the operating system to run on a particular core of the CPU.

New Technology The plan is executed by multiple (n) threads. Execution Phase Table Order Table Access Methods Record Match Conditions Process Record Method Execute Thread 1~n Process Records Thread n Record Combinations Result Query Execution Plan

2. Example of Current Technology with 3 Tables, recursive execution by a single thread 1st table(t1) Read a record (may use an index) and store (used fields only) and if it satisfies the condition associated with this table then move down to the next table otherwise continue reading records when there is no more record to read, the procedure is finished. 2nd table(t2) Read a record (may use an index derived from record of t1) and if it satisfies the condition associated with this table which depends on records from this table and t1 then move down to the next table otherwise continue reading records when there is no more record to read, move back up to the previous table. Last table(t3) Read a record (may use an index derived from records of t2 and t1) and if it satisfies the condition associated with this table which depends on records from this table, t2 and t1 then process and output this particular combination of records according to the process record method of the query execution plan in any case continue reading records when there is no more record to read, move back up to the previous table.

Current Technology Table 1 Table 2 Table 3 Records Records Records Processing by Thread 1 Processing by Thread 1 Processing by Thread 1 Tables take turns in being processed by Thread 1.

3. Same Example, New Technology executed by 3 threads 1st table(t1, executed by thread 1) Read a record and if it satisfies the condition then insert this record into inter table buffer between this table and the next (possibly waits in case the buffer is full). Continue reading records when there is no more record to read, processing of this table is finished. 2nd table(t2, executed by thread 2) Wait for a record from t1 to be available in the inter table buffer between this table and the previous, read a record of this table and if it satisfies the condition then insert this record and the record of t1 into the inter table buffer. Continue reading records when there is no more record to read, remove the record of t1 from the buffer and wait for the next record. Last table(t3, executed by thread 3) Wait for a record set of t2 and t1 to be available in the inter table buffer then read a record of this table and if it satisfies the condition then process and output this particular combination of records. Continue reading records when there is no more record to read, remove the record set of t2 and t1 from the buffer and wait for the next record set.

New Technology - Fast Parallel Query Engine Tables in query are processed in parallel utilizing multiple core/CPU. Table 1 Table 2 Table 3 Records Records Records Processing by Thread 1 Processing by Thread 2 Processing by Thread 3 Record Sets* Record Sets *Record set is a set of records from tables processed thus far.

Table Processing by Thread i Records If Record Match then Append and Output Inter Table Buffer* Record Sets Record Sets Record Sets *Inter Table Buffer is a queue for record sets between tables to allow 2 threads to operate independently.

MySQL Compatible • Use Existing Databases • Use Existing Queries Specify Tables to be Processed in Parallel set @parallel_table_list=“mydb.t1,mydb.t2, mydb.t3” (default=null, parallel processing disabled)

Benchmarks Intel Dual Xeon Processors (2x6 cores) w/ 24GB Memory on Centos 6.2-64bit OS Using warm cache: 2nd and subsequent runs where tables have already been loaded into the memory. For Innodb benchmarks --innodb_buffer_pool_size=16G t1 ( `region` char(1) DEFAULT NULL, (A thru F and repeats) `idn` int(11) DEFAULT NULL, (0 thru 999,999) `rev_idn` int(11) DEFAULT NULL, (999,999 thru 0) `grp` int(11) DEFAULT NULL, (0 thru 99 in steps of 3, modulo 100) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 (1 million records)

t2 same as t1 with a key (idn) t3 same as t1 with a key (rev_idn) t4 same as t1 with a key (idn) t5 same as t1 with a key (rev_idn) t6 same as t1 with a key (idn) t7 same as t1 with a key (rev_idn) t8 same as t1 with a key (idn) 4 Innodb Tables Query SELECT count(*) FROM t1,t2,t3,t4 WHERE t1.idn=t2.idn and t2.rev_idn=t3.rev_idn and t3.idn=t4.idn and t4.grp>=0; Execution Plan

8 Innodb Tables Query SELECT count(*) FROM t1,t2,t3,t4,t5,t6,t7,t8 WHERE t1.idn=t2.idn and t2.rev_idn=t3.rev_idn and t3.idn=t4.idn and t4.rev_idn=t5.rev_idn and t5.idn=t6.idn and t6.rev_idn=t7.rev_idn and t7.idn=t8.idn and t8.grp>=0; Execution Plan

For MyISAM benchmarks t9 same as t1 with ENGINE=MyISAM (1 million records) same as t1 t10 same as t9 with a key (idn) t11 same as t9 with a key (rev_idn) t12 same as t9 with a key (idn) . . t23 same as t9 with a key (rev_idn) t24 same as t9 with a key (idn) 4 MyISAM Tables Query SELECT count(*) FROM t9,t10,t11,t12 WHERE t9.idn=t10.idn and t10.rev_idn=t11.rev_idn and t11.idn=t12.idn and t12.grp>=0; Execution Plan same as 4 Innodb Tables Query

8 MyISAM Tables Query SELECT count(*) FROM t9,t10,t11,t12,t13,t14,t15,t16 WHERE t9.idn=t10.idn and t10.rev_idn=t11.rev_idn and t11.idn=t12.idn and t12.rev_idn=t13.rev_idn and t13.idn=t14.idn and t14.rev_idn=t15.rev_idn and t15.idn=t16.idn and t16.grp>=0; Execution Plan same as 8 Innodb Tables Query

16 MyISAM Tables Query Amazon Web Services Cluster Compute Eight Extra Large Instance Server: 2 Intel Xeon E5-2670 Processors (2x8 cores) with 60.5 GB of memory and 3370 GB of instance storage (cc2.8xlarge) running Cluster Compute Amazon Linux 64 bit OS. SELECT straight_join count(*) FROM t9,t10,t11,t12,t13,t14,t15,t16,t17,t18,t19,t20,t21,t22,t23,t24 WHERE t9.idn=t10.idn and t10.rev_idn=t11.rev_idn and t11.idn=t12.idn and t12.rev_idn=t13.rev_idn and t13.idn=t14.idn and t14.rev_idn=t15.rev_idn and t15.idn=t16.idn and t16.rev_idn=t17.rev_idn and t17.idn=t18.idn and t18.rev_idn=t19.rev_idn and t19.idn=t20.idn and t20.rev_idn=t21.rev_idn and t21.idn=t22.idn and t22.rev_idn=t23.rev_idn and t23.idn=t24.idn and t24.grp>=0; Straight join is used to reduce query optimization time. Execution Plan similar to 8 Innodb Tables Query

Parallel Universe

Parallel Universe

Presentation Transcript

Using the Parallel Universe beyond MPI

Universe

Commercializing Access to the Parallel Universe of Connotative Meaning

“I feel like I’m diagonally parked in a parallel universe”

Expanding Universe

Expanding Universe

The Universe

The universe

Newton’s Universe

Parallel Literature, Parallel Language

Early Universe

The Universe

The Universe

Universe,

UNIVERSE

The Universe

Preliminary tests with co-scheduling and the Condor parallel universe

The Universe

Universe

Universe

Early Universe

Universe