250 likes | 368 Views
About Us. ScaleBase builds the first Database Load Balancer Let you scale your MySQL – totally transparently Version 1.2 released Q3-2011. Why To Shard. How To Shard MySQL. Analyze your schema Split your data Update your code We’ll use a sample application as an example.
E N D
About Us • ScaleBase builds the first Database Load Balancer • Let you scale your MySQL – totally transparently • Version 1.2 released Q3-2011
How To Shard MySQL • Analyze your schema • Split your data • Update your code • We’ll use a sample application as an example
Our Sample Application • DayTrader application • Standard benchmark application by the Apache Group
Splitting Your Schema • Collect information • Table list and size • Tables relationships • SQL Query log • See how-to collect this data at
Table Policies • Split • Global • Master
Deciding Which Tables To Shard • Look at the biggest tables in your schema.There is no exact number of tables to look at. Most schemas have several big tables, and the rest are very small. Table sizes are not divided evenly. • Look at the SQL log – are there joins between those tables? • If there are, take the smaller table, and make it a global table. • If not, those tables will serve as your shard environment. • Look at the SQL log • If the tables are not accessed frequently, make them global tables. • Look at the most accessed tables (especially those with many write commands) and mark them as sharded. Go to step 2 to make sure they can be sharded.
Day Trader Sharding • So, how will the DayTrader schema will look like sharded?
Copying The Data • Have the database cloned in all shards. It can be done by cloning a VM, or copying the physical files, or using mysqldump to export once and import to all shards. • For each shard (on shard tables only): • Drop all indexes. • Delete the irrelevant data from the shard. Note: This action creates a lot of fragmentation. You might consider creating temporary a table, inserting to it only the relevant rows, drop the original table and rename the temporary one to the real name. • Create all indexes.
Changing The Code • Data access layer has to be re-written
Without ORM • Upgrading Connection Pool • Your first task is to write a connection pool that is “sharding” aware. The class should look something like this: public class ShardingAwareDatasource { public static Connection getConnection(Object shardingKey) {…} public static Connection getAnyConnection() {…} }
Without ORM • Changing DML • Shard tables – direct DML to correct shard • Global tables – DML on all shards, XA transactions • Changing Queries • Go over all of your queries. • For each query: • Identify whether it runs on a global or shard table. • If it runs on a global table, make sure the connection used is from getAnyConnection. • If it is based on a shard table, check if it contains the shard key. • If so, then use that shard key in the getConnection method. • If the query uses other tables, break it down into multiple queries. • If not, then split the query into multiple queries, so that each contains a shard key. • Make sure your code merges data that is gathered across multiple queries. • Usually, you’ll see that if the query is not trivial (contains only one table; if the table is sharded, it must contain the shard key in the where clause; etc.) it will have to be changed. It’s a lot of work, but it pays off in performance.
Queries Limitations • Cross shard queries • With group by? Order by? • Aggregate functions?
With ORM • Since most ORMs don’t support sharding, you’re out of luck. Most likely you’ll have to rewrite your ORM code, directly use SQL, and handle the object mapping by yourself. Not an easy task.
In DayTrader • Since ejb3 (ORM) was used, complete DAL rewrite was necessary. The following slides explain the required changes.
AccountDataBean • Account table is not sharded • AccountDataBean holds a collection of OrderDataBean and HoldingDataBean • Sharded. Must run against the correct database • Update operations must execute on all shards • Changes: • Implement shard-aware getOrders/setOrders • Implement custom, cross database Insert/Update/Remove functionality (not in EntityManager). • Alternative – extend EntityManager to support Global tables
HoldingDataBean • Must be rewritten • Can’t use EntityManager • Or implement Shard aware EntityManager
Implementing Custom EntityManager • Implement your own PersistenceProvider that create EntityManagerFactory that creates your EntityManager • Configure that PersistenceProvider in the persistence.xml file • Extend EntityManagerImpl • Implement find method • Check Class is mapped to sharded table (can be done using annotations or central configuration) • If so – run find on correct shard • If not – run on any shard • Implement merge, createQuery, etc • Implement broker that is sharding aware
Benefits Of ScaleBase ScaleBase Planner • ScaleBase • Analyzer • Shard • Reviver • Data • Federator • Results • Aggregator • Shard • Dispatcher
ScaleBase Architecture ScaleBase performs as Database Proxy 100% of ScaleBase functions become transparent
Summary • Implementing Sharding is hard • And ScaleBase makes it transparent – so give it a try
What’s Next • Join us for our next webinar “The Benefits Of Sharding MySQL” on December 13th. • Follow registration instructions on our site. • Send additional questions to liran.zelkha@scalebase.com