The ScaleDB Storage Engine

The ScaleDB Storage Engine Enabling high performance and scalability, using a Multi-Table Index, and a Shared-Disk Clustering Architecture Moshe Shadmonmoshe@scaledb.com

Agenda • Overview • ScaleDB’s Clustering Architecture • Shared-Disk vs. Shared-Nothing • MySQL and a Shared-Disk Storage Engine • ScaleDB Installation • Demo • ScaleDB’s Indexing Technology • Multi-Table Index • Enabling Multi-Table Index in MySQL • Demo • Summary • ScaleDB Status & Product Availability

Overview • Plug-in Storage Engine for MySQL • Main Features: • Shared-Disk Architecture • Innovative Multi-Table Indexing • Transactional • Row-Level Locking • ACID Compliant • Atomicity: All tasks of a transaction performed or none of them are. • Consistency: The database is in a consistent state before and after the transaction. • Isolation: Data is not available in an intermediate state during a transaction • Durability: When a transaction completes, the transaction’s data will persist • Disk-Based Storage Engine

Shared-Disk vs. Shared-Nothing • Manageability • Adaptability • Availability/Fault-Tolerance • Scalability • Performance • Total Cost of Ownership (TCO)

Shared-Nothing: DatabaseInstance 1 Table A Table B Table C Vertical Partitioning DatabaseInstance 1 Table A DatabaseInstance 2 Table B DatabaseInstance 3 Table C

Shared Nothing: Partitioning Your Data…How • Predict usage patterns, application evolution, data growth patterns…all are moving targets • Avoid data skew: bottlenecks caused by frequently accessed data on just a few nodes • Avoid data shipping between nodes • Avoid delays from distributed 2-phase commit • Searches outside the partition column require participation by all nodes • Scaling becomes an exercise in fire fighting

Shared-Nothing:Horizontal Partitioning Logical View salary name age Partitioned by Salary salary name age salary name age salary name age Physical View Horizontal Partitioning – Salary % 3

Shared-Nothing:Horizontal Partitioning Pitfalls • Selections with equality predicates referencing the partitioning attribute are directed to a single node: • Retrieve Emp where salary = 60K • SELECT FROM Emp WHERE salary=60K • Equality predicates referencing a non-partitioning attribute and range predicates are directed to all nodes: • Retrieve Emp where age = 20 • Retrieve Emp where salary < 20K • SELECT FROM Emp WHERE salary<20K

Table A Table C Table B Shared-Disk:No Partitioning, Full Access to Data DatabaseInstance 1 Table A Table B Table C DB ClusterNode 1 DB ClusterNode 2 High-Speed Interconnect Shared DiskSubsystem DB ClusterNode 3

Scalability & Availability Shared Nothing Slave A Node A Node B Slave B Node C Slave C

Scalability & Availability Shared Disk MySQL Servers with ScaleDB Engine Node A Node B Node C Node D Node E Data

Shared-Disk:Summarizing Shared-Disk Benefits • Grow by simply adding nodes to the cluster • Servers can be added and removed dynamically according to your needs • No interruption to your application • High-Availability with dynamic failover • Existing nodes automatically take over • Significantly reduced maintenance costs • Can be built on low-cost commodity hardware • No data partitioning • No need for slaves • Low Total Cost of Ownership (TCO)

Shared-Disk:Making it work with MySQL Node 1 Node 2 Server Instance A Server Instance B ScaleDB Engine Instance A ScaleDB Engine Instance B Cluster Interconnect ClusterManager Comm. Layer ClusterManager Comm. Layer Buffer Manager Buffer Manager Shared Disk Sub-system

Shared-Disk: Insert New Row Node 1 Node 2 Server Instance A Server Instance B ScaleDB Engine Instance A ScaleDB Engine Instance B Cluster Interconnect ClusterManager Comm. Layer ClusterManager Comm. Layer Buffer Manager Buffer Manager Shared Disk Sub-system

Shared-Disk: Select Node 1 Node 2 Server Instance A Server Instance B ScaleDB Engine Instance A ScaleDB Engine Instance B Cluster Interconnect ClusterManager Comm. Layer ClusterManager Comm. Layer Buffer Manager Buffer Manager Shared Disk Sub-system

Shared-Disk: Create Table Node 1 Node 2 Table A Meta-Data Table A Meta-Data Server Instance A Server Instance B ScaleDB Engine Instance A ScaleDB Engine Instance B Cluster Interconnect ClusterManager Comm. Layer ClusterManager Comm. Layer Buffer Manager Buffer Manager Shared Disk Sub-system

ScaleDB Installation • Define cluster = true in ScaleDB Config file: • ScaleDB.cnf is at the same directory as my.cnf: • Cluster params: • cluster = true • nodes_in_cluster = 2 • node_id = 1 • this_machine_port = 100 • next_machine_ip_address = 192.168.0.101 • next_machine_port = 100 • log_directory = /share/logs/

Demo - Sysbench • ScaleDB cluster – one node – show throughput • ScaleDB cluster – 2nd node – show throughput

ScaleDB: Multi-Table Indexing #2 #1 #1 #2 #3 #3 #4 #4 #5 #5 B-tree: Only indexes the data in tables ScaleDB: Indexes the data and relationships Index #1 Index #2 Index #3 Index #4 Index #5 ScaleDB Index • Advantages: • Faster • Smaller • Referential integrity

Example • Scenario: Select information that is spread across 3 tables: Colleges, Students and Enrollment • Relationships: Students are enrolled in courses within departments of colleges SELECT c1.CollName, s.StudName, c2.CourseName , e.Grade FROM College AS c1 JOIN Student AS s JOIN Enrollment AS e JOIN Course AS c2 ON ( c1.CollNo = s.CollNo AND s.CollNo = e.CollNo AND s.StudentNo = e.StudentNo AND e.CollNo = c2.CollNo AND e.DeptNo = c2.DeptNo AND e.CourseNum = c2.CourseNum ) WHERE c1.CollNo = X AND s.StudentNo = Y ;

Option #1: Conventional Joins College Table Get College information Students Table Get Student information Enrollment Table Search enrollment by College & Student

Option #2: Materialized View . . .

Mapping Foreign Keys to Data Views • Create College Table College • Create Department Table • Foreign key – College • Create Course Table • Foreign Key – Department Department Students • Create Students Table • Foreign key – College Course Enrollment • Create Enrollment Table • Foreign key - Students • The Parent-Child tables are Created in MySQL Such that MySQL is able to operate over the new tables • The data of the Parent-Child tables is assembled on the fly from the source tables

Mapping Foreign Keys to Data Views College Department College Department Course College Students Enrollment College Students Meta-Data Tables: 1. College 2. College-Dept 3. College-Dept-Course 4. College-Students 5. College-Students-Enrollment 6. Department 7. Students 8. Course 9. Enrollment ScaleDB Physical files: 1. College 2. Department 3. Student 4. Course 5. Enrollment

Enabling the MySQL optimizer to use a Multi-Table Index SELECT c1.CollName, s.StudName, c2.CourseName , e.Grade FROM College AS c1 JOIN Student AS s JOIN Enrollment AS e JOIN Course AS c2 ON ( c1.CollNo = s.CollNo AND s.CollNo = e.CollNo AND s.StudentNo = e.StudentNo AND e.CollNo = c2.CollNo AND e.DeptNo = c2.DeptNo AND e.CourseNum = c2.CourseNum ) WHERE c1.CollNo = X AND s.StudentNo = Y ; CREATE TABLE sdb_view_college_course_student ( L1_CollNo INT NOT NULL, L1_CollName CHAR(32) NOT NULL, L1_CollBudget INT NOT NULL, L1_CollDescription CHAR(60) NOT NULL, … Table College Columns L2_StudNo INT NOT NULL, L2_StudName CHAR(48) NOT NULL, … Table Student Columns L3_CourseNum CHAR(9) NOT NULL, L3_Grade CHAR(2) NOT NULL, … Table Enrollment Columns PRIMARY KEY ( L1_CollNo, L2_StudtNo, L3_CourseNum)) ENGINE = SCALEDB; Select L1_CollName, L2_StudName, L3_CourseName, L3_Grade FROM sdb_view_college_course_student WHERE l1_CollNo = X AND l2_StudentNo = Y ;

Option #3: Multi-Table Index ScaleDB Multi-Table Index College Enrollment Departments Students Students Enrollment Colleges Courses Enrollment 001 Agriculture $1,234,567 Nice place to visit 002 Arts $5,432,567 Sports not so good 003 Business $9,999,666 Cool logo 004 Education $3,234,567 Ugh Worcester 005 Engineering $8,238,568 Serious work 006 Law $7,237,767 Jumpy students 007 Liberal Arts $9,898,777 Pretty campus 008 Medicine $5,987,004 In Texas 56-8033 008 Mike Hogan Caucasian 56-8045 008 Moshe Smith Caucasian 56-8044 008 Sally Shadmon Native American 56-8055 008 Billy Fleegle African American 56-8037 008 Saul Goode African American 56-8122 008 Tim Collins Polynesian 56-8233 008 Sam Gee Asian 56-8334 008 Rod Paulino Asian 008 4455 56-8037 B+ 008 4455 56-8033 C 008 4455 56-8045 B+ 008 4456 56-8044 A- 008 4456 56-8122 B- 008 4454 56-8233 C 008 4455 56-8334 F 008 4454 56-8055 D Col_ID# Col_Name Col_Budget Col_Description Student_ID# College_ID# Student_Name Student_Desc Coll_ID# Coll_Name Coll_Budget Coll_Description College_ID# Dept_ID# Student_ID# Grade

The Multi-Table Index • Multi-Table Index appears to MySQL as a data table • ScaleDB does not maintain data file associated with the Multi-Table Index • For a query using virtual table, ScaleDB assembles the rows on the fly using the Multi-Table Index • ScaleDB indexes are different than B-tree indexes • ScaleDB indexes provide the same functionality as B-tree, plus… • They maintain referential integrity with minimal overhead • They allow you to search for the data and relationships • They are much smaller in size

Demo • Query with join • Query with Multi-Table Index • 2nd node virtual table

Benchmarking ScaleDB Index

Summary • ScaleDB Cluster • Multiple ScaleDB instances share the same physical data. • Connecting to the cluster is similar to connecting to a single node. • For the application, the cluster appears as a single node. • Transparent application failover • Transparent Scalability • ScaleDB Indexes • Provide the B-tree functionality • High performance • Map relationships • Maintain referential integrity • Smaller footprint • Independent of the key size

ScaleDB Status and Product Availability • Started Beta Process • We are looking for beta companies • Product launch is scheduled for June timeframe • Please talk to us if you are developer interested in working with ScaleDB • moshe@scaledb.com

The ScaleDB Storage Engine

The ScaleDB Storage Engine

Presentation Transcript

Distributed Data Storage and Parallel Processing Engine

HYRISE – A Main Memory Hybrid Storage Engine

ScaleDB Technical Presentation

A Storage Engine for Amazon S3

ScaleDB: Persistence for Stream Data

Overland Storage The Storage Conundrum

THE COSMIC ENGINE

Using Q4M a message queue storage engine for MySQL

The Inefficient Engine

The Storage Hierarchy

The Steam Engine

The ScaleDB Storage Engine

XtraDB OpenSource Storage Engine for MySQL

ScaleDB

The Storage

ScaleDB Technical Presentation

The Stirling engine

Storage Engine for Semantic Web

Moshe Shadmon ScaleDB

ScaleDB

The Steam Engine

Find the Best Self-Storage Storage