Database Scalability, Elasticity, and Autonomy in the Cloud • Agrawal et al. • Oct 24, 2011
Framing • Survey paper • Identifies necessary qualities of cloud storage • Scalability • Sensible consistency / programming model • Scale-down and migration • Autonomic management • Pointers to different work in the space
Scalability • Add more resources, get more performance • Handle more requests per second • Store more data • Achievable with scale-up or scale-out • Scale-out is the only paradigm for the cloud • App’s parallelism is limited by Amdahl’s Law
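The Amdahl's Law bullet can be made concrete with a small calculation. This is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of workers. The function name and the 95% figure are illustrative assumptions, not values from the slides.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the original speed; only the parallel
    # part is divided across the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: an application that is 95% parallelizable.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With a 5% serial portion the speedup can never exceed 1 / 0.05 = 20x, no matter how many nodes are added, which is why scale-out alone does not guarantee scalability.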
Finding the right design point • What’s the right consistency / programming model? • Pure key-value stores are too weak • Only have transactions on single records • Traditional RDBMSs are too strong • Can’t just run MySQL at scale • Instead, provide strong consistency within a portion of the data • Megastore • Vertica, Aster, Teradata, Greenplum, …
Data Fusion vs. Data Fission • [Figure: systems placed on a consistency spectrum from weak to strong] • Weak: Dynamo, BigTable, PNUTS • Fusion: Megastore, G-Store • Fission: Azure, ElasTraS, Rel Cloud • Strong: MySQL
Data Fusion • Start with a key-value store • Partition records into groups • Provide multi-record updates within a group • Cross-group operations handled separately • Assumes that cross-group ops are rare
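The fusion idea can be sketched as a toy in-memory key-value store that allows atomic multi-record updates only when every key falls in the same group. This is an illustration under stated assumptions, not the design of Megastore or G-Store: the class name, the `group_of` function, and the "user/record" key layout are all hypothetical.

```python
import threading
from typing import Callable, Dict, Optional


class GroupedKVStore:
    """Toy key-value store whose records are partitioned into groups."""

    def __init__(self, group_of: Callable[[str], str]):
        self._group_of = group_of                      # maps a key to its group
        self._data: Dict[str, Dict[str, object]] = {}  # group -> {key: value}
        self._locks: Dict[str, threading.Lock] = {}    # one lock per group
        self._meta = threading.Lock()                  # guards the lock table

    def _lock_for(self, group: str) -> threading.Lock:
        with self._meta:
            return self._locks.setdefault(group, threading.Lock())

    def get(self, key: str) -> Optional[object]:
        return self._data.get(self._group_of(key), {}).get(key)

    def update_group(self, updates: Dict[str, object]) -> None:
        """Apply a multi-record update atomically within a single group."""
        if not updates:
            return
        groups = {self._group_of(key) for key in updates}
        if len(groups) != 1:
            # Cross-group operations are assumed rare and need a separate path.
            raise ValueError("update spans several groups")
        (group,) = groups
        with self._lock_for(group):
            self._data.setdefault(group, {}).update(updates)


# Assumed key layout "user/record": everything owned by one user is one group.
store = GroupedKVStore(lambda key: key.split("/")[0])
store.update_group({"alice/inbox": ["hi"], "alice/unread": 1})
```

Because each group has its own lock, updates to different groups never contend, which is what lets this kind of design scale out as long as cross-group operations stay rare.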
Data Fission • Start with a relational database • Partition tables into shards • Provide ACID within each shard • Cross-shard ops are expensive • Assumes that cross-shard ops are rare
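The fission approach can be sketched with Python's built-in `sqlite3` module, treating each in-memory database as one shard. This is only an illustration: the `accounts` table, the shard count, and the choice of the customer name as sharding key are assumptions, and no real system in this space is implemented this way.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # assumed shard count for the sketch

# Each shard is an independent relational database with full ACID semantics.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute(
        "CREATE TABLE accounts ("
        " customer TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0),"
        " PRIMARY KEY (customer, name))"
    )


def shard_for(customer: str) -> sqlite3.Connection:
    """Route by the sharding key so one customer's rows share a shard."""
    return shards[zlib.crc32(customer.encode("utf-8")) % NUM_SHARDS]


def open_account(customer: str, name: str, balance: int) -> None:
    db = shard_for(customer)
    with db:  # commit on success, roll back on error
        db.execute("INSERT INTO accounts VALUES (?, ?, ?)", (customer, name, balance))


def balance(customer: str, name: str) -> int:
    row = shard_for(customer).execute(
        "SELECT balance FROM accounts WHERE customer = ? AND name = ?",
        (customer, name),
    ).fetchone()
    return row[0]


def transfer(customer: str, src: str, dst: str, amount: int) -> None:
    """Single-shard transaction: both updates commit or neither does."""
    db = shard_for(customer)
    with db:
        db.execute(
            "UPDATE accounts SET balance = balance + ? WHERE customer = ? AND name = ?",
            (amount, customer, dst),
        )
        # If this violates the CHECK constraint, the credit above is rolled back.
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE customer = ? AND name = ?",
            (amount, customer, src),
        )


open_account("alice", "checking", 100)
open_account("alice", "savings", 0)
transfer("alice", "checking", "savings", 40)
```

A transfer between two different customers could touch two shards, and nothing in this sketch makes that atomic. That is the expensive cross-shard case the slide assumes is rare.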
What’s the difference? • Is Fusion vs. Fission a worthwhile distinction? • Seems like they both arrive at the same place • Megastore “Fusion” vs. ElasTraS “Fission” • Shard tables based on a table’s primary key • Shard is co-located on the same machine • ACID transactions within a shard • Primary and secondary indexes • All Megastore is missing is an SQL interface!
The difference • Different targeted users • Fusion is for people who own datacenters • Fission is for people who want SQL in the cloud • Different exposed API • Fusion is more explicit about performance • Fission tries to hide partitioning from user • Anything else?
Elasticity • Dynamically scaling up and down on demand • Important with pay-as-you-go cloud pricing • Consolidate to reduce costs • Expand to increase performance • Need to move state and processing duties around within the system
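The consolidate/expand trade-off can be illustrated with a simple sizing rule: pick the smallest number of nodes that keeps each node below a target utilization. This is a minimal sketch; the function name, the 70% target, and the capacity figures are assumptions chosen for illustration and do not come from the paper.

```python
import math


def target_node_count(
    request_rate: float,
    per_node_capacity: float,
    target_utilization: float = 0.7,
    min_nodes: int = 1,
) -> int:
    """Smallest node count that keeps each node under the target utilization."""
    if per_node_capacity <= 0 or not 0.0 < target_utilization <= 1.0:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    needed = math.ceil(request_rate / (per_node_capacity * target_utilization))
    return max(min_nodes, needed)


# Assumed load trace (requests/second) with nodes that handle 1000 req/s each.
for load in (300, 5000, 1200):
    print(load, "req/s ->", target_node_count(load, 1000), "node(s)")
```

Every change in the node count implies moving state between nodes, so a real controller would also weigh the cost of that migration before acting.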
Live migration of databases • Shared-disk • “Global disk” shared by all DB nodes • Just need to copy in-memory state • Iterative copy: sync up cached pages + transaction state to minimize the availability hit • Shared-nothing • Each DB node is its own separate DB instance • Need to copy both local disk state and memory • Push/pull: gradually shift new requests to the new node, sync state in the background
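The iterative-copy idea for the shared-disk case can be simulated in a few lines: copy the cached pages while the source keeps serving, re-copy whatever was dirtied in the meantime, and hand over only once the remaining delta is small. This is a toy model under stated assumptions; `iterative_copy`, the `workload` callback, and the thresholds are hypothetical names and values, not taken from any specific system.

```python
from typing import Callable, Dict, Set, Tuple


def iterative_copy(
    source: Dict[int, bytes],
    workload: Callable[[int], Set[int]],
    stop_threshold: int = 2,
    max_rounds: int = 10,
) -> Tuple[Dict[int, bytes], int, int]:
    """Copy cached pages to a new node while the source keeps serving.

    `workload(round_no)` stands in for live traffic: it modifies `source`
    and returns the ids of the pages it dirtied during that copy round.
    Returns (destination pages, copy rounds, pages copied at handover).
    """
    dest: Dict[int, bytes] = {}
    to_copy: Set[int] = set(source)
    rounds = 0
    while len(to_copy) > stop_threshold and rounds < max_rounds:
        for page in to_copy:
            dest[page] = source[page]
        rounds += 1
        to_copy = workload(rounds)  # pages dirtied while this round ran
    # Handover: the source stops serving and only the small delta is copied,
    # so the unavailability window is proportional to len(to_copy).
    for page in to_copy:
        dest[page] = source[page]
    return dest, rounds, len(to_copy)


# Assumed workload: each round dirties half as many pages as the previous one.
pages = {i: b"v0" for i in range(8)}

def shrinking_workload(round_no: int) -> Set[int]:
    dirty = set(range(8 >> round_no))
    for page in dirty:
        pages[page] = f"v{round_no}".encode()
    return dirty

copied, rounds, handover_pages = iterative_copy(pages, shrinking_workload)
```

The `max_rounds` cap matters when the write rate is high enough that the dirty set never shrinks; in that case the sketch falls back to a larger handover copy rather than looping forever.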
Database Autonomy • Need management to be more automatic • Elasticity and load balancing based on usage and ML predictions • Performance modeling • Migration costs (availability, performance, $$$) • Resource isolation (consolidated services) • SLAs
Tree schema • Primary table’s primary key used for sharding • Secondary tables are sharded into row groups • Row groups are co-located and transactional • Global tables are write-rarely, and replicated on all nodes
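The placement rules of a tree schema can be sketched as a routing function: rows of the primary table and of its secondary tables are placed by the primary table's key, so a row group always lands on one node, while global tables are copied to every node. The table names, the `customer_id` column, and the node count below are hypothetical and only serve the illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_NODES = 3                        # assumed cluster size
PRIMARY = "customer"                 # hypothetical primary (root) table
SECONDARY = {"orders", "payments"}   # hypothetical tables keyed by the root key
GLOBAL = {"currency"}                # hypothetical write-rarely lookup table

# node index -> table name -> rows stored on that node
nodes: List[Dict[str, list]] = [defaultdict(list) for _ in range(NUM_NODES)]


def node_for(root_key: str) -> int:
    """All rows that share a root key hash to the same node."""
    return zlib.crc32(root_key.encode("utf-8")) % NUM_NODES


def insert(table: str, row: dict) -> List[int]:
    """Store a row and return the indexes of the nodes that received it."""
    if table in GLOBAL:
        for node in nodes:  # replicate on all nodes
            node[table].append(row)
        return list(range(NUM_NODES))
    if table == PRIMARY or table in SECONDARY:
        target = node_for(row["customer_id"])  # shard by the primary table's key
        nodes[target][table].append(row)
        return [target]
    raise ValueError(f"unknown table: {table}")


placed_customer = insert("customer", {"customer_id": "c42", "name": "Ada"})
placed_order = insert("orders", {"customer_id": "c42", "order_id": 1})
placed_currency = insert("currency", {"code": "USD"})
```

Because the customer row and its orders end up on the same node, a transaction over one row group never needs cross-node coordination, and reads of global tables are always local.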