410 likes | 677 Views
The Art of Database Sharding. Maxym Kharchenko Amazon.com. April 22-26, 2012 Mandalay Bay Convention Center Las Vegas, Nevada, USA. www.collaborate12.org www.collaborate12.ioug.org. When your data grows …. New System. Problem. Old System. The Big Data problem.
E N D
The Artof Database Sharding Maxym Kharchenko Amazon.com
April 22-26, 2012Mandalay Bay Convention CenterLas Vegas, Nevada, USA www.collaborate12.org www.collaborate12.ioug.org
When your data grows … New System Problem Old System
The Big Data problem One machine is not enough
Running on >1 machines 10,000,000 1 Courtesy: John Rauser @amazon.com
Split your datainto small independent chunks And run each chunkon cheap commodity hardware
How to split your data Data Data Data Data Data
Step 2: Chose sharding key and function
Bad Sharding Can we partition collaborate participants by last name ? CREATE TABLE Collaborate_Participants(last_namevarchar2(30) PRIMARY KEY,signup_datedate)
Avalanche Effect i.e. MD5
Hashes and Buckets MOD MOD MOD
Resharding 75 % bad 3 shards • Adding 4th shard
Logical Shards MOD MOD MOD MOD
Implementing Shards: Standbys Apps Read Only Unsharded Shard 1 Standby Shard 2
Implementing Shards: Tables Apps Read Only Create materialized view … as select … from a@shard1 Drop materialized view … preserve table Shard 2 Shard1 MVA TabA TabA
Why shards are awesome • Small data, small load • Better caching, faster queries • Smaller load, fewer surprises • Faster maintenance, i.e. restores • Eggs not in one basket: • Availability redefined • Safer maintenance • Multiple points of view: • SQL performance • System load
Why shards are NOT so great • More systems • Power, rack space etc • Needs automation … bad • More likely to fail overall • Some operations become impractical: • Joins across shards • Foreign keys across shards • More work: • Applications, developers, DBAs • High skill, DIY everything
Implementing Shards:Moving “data head” Apps Shard 1 Shard 2 Shard 3 Shard 4
Bad Sharding. Example 2 Can we shard customers by meaningless sequence ? CREATE TABLE Orders (order_id number PRIMARY KEY,customer_fname varchar2(30),customer_lname varchar2(30),order_date date) order_id: 10000 - 20000 order_id: 20001 - 30000 order_id: 30001 - 40000 order_id: 40001 - 50000