720 likes | 1.09k Views
The Art of Database Sharding. Maxym Kharchenko Amazon.com. Whoami. Started as a database kernel developer Network database: db_VISTA ORACLE DBA for ~ 10-12 years Starting with ORACLE 8 Last 3 years: Sr. Persistence Engineer @ Amazon.com OCM, ORACLE Ace Associate
E N D
The Artof Database Sharding Maxym Kharchenko Amazon.com
Whoami • Started as a database kernel developer • Network database: db_VISTA • ORACLE DBA for ~ 10-12 years • Starting with ORACLE 8 • Last 3 years: Sr. Persistence Engineer @Amazon.com • OCM, ORACLE Ace Associate • Blog: http://intermediatesql.com • Twitter: @maxymkh
Agenda • The “big data” scaling problem • Solving scaling with “sharding” • Practical sharding • Your sharding experience: Good and bad
How to scale a database New System Problem Old System 2013 2014 2015 2016 2017
Use lots of cheap machines Not bigger machines
Commodity hardware = $$$$$ $$
Split your datainto small independent chunks And run each chunkon cheap commodity hardware
Sharding CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200) );
Sharding CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200) ) SHARD BY <method> (<shard_key>) ( SPLIT SIZE evenly SPLIT LOAD evenly DISCOURAGE CROSS SHARD ACCESS DISCOURAGE DATA MOVE USING 4 DATABASES );
Split size evenly SHARD BY LIST ( first_letter(author) ) ( SPLIT SIZE evenly ); H-M A-G N-T U-Z
Split load evenly SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly ); 1-100 101-200 201-300 301-400
Split load evenly SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly ); 0 1 2 3
Discourage cross shard access SHARD BY HASH (id)( DISCOURAGE CROSS SHARD ACCESS ); SELECT title FROM books WHERE id = 34567876;
Discourage cross shard access SHARD BY HASH (id)( DISCOURAGE CROSS SHARD ACCESS ); SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
Discourage cross shard access SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS ); SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title; 0 1 2 3
Discourage data move SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE ); 0 1 2 3
Discourage data move SHARD BY mod(hash_function(author), 6) ( DISCOURAGE DATA MOVE ); 4 5 0 1 2 3
Physical and Logical shards SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE ); DB 1 DB 2 DB 3 DB 4
Executing queries defshard_query(sql, binds, shard_key): """ Execute query in the correct db """ shard_hash = hash(shard_key) logical_bucket = mod(shard_hash, TOTAL_BUCKETS) physical_db = memcached_get_db(logical_bucket) execute_query(physical_db, sql, binds) SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
Implementing Shards: Standbys Apps Read Only Unsharded Shard 1 Standby Shard 2 Drop non-qualifying data Drop non-qualifying data
Implementing Shards: Tables Apps Read Only Create materialized view … as select … from a@shard1 Drop materialized view … preserve table Shard 2 Shard1 TabA MVA TabA
Implementing Shards:Moving “data head” Apps Shard 1 Shard 2 Shard 3 Shard 4
Data protection App App Shard 1 Shard 2 Shard 3 Shard 4 Stb1 Stb 2 Stb 3 Stb 4
Why shards are awesome • (potentially) Unlimited scaling • Local ACID + relational • Better maintenance • Eggs not in one basket • “Apples to apples comparison” with other shards
Why shards are NOT so great • More systems • Power, rack space etc • Needs automation … bad • More likely to fail overall • Some operations become difficult: • Transactions across shards • Foreign keys across shards • More work: • Applications, developers, DBAs • High skill, DIY everything