Megastore Scalable Highly Available Storage for Interactive Systems. Megastore: Providing Scalable Highly Available Storage for Interactive Services.
Megastore Scalable Highly Available Storage for Interactive Systems. Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, Vadim Yushprakh. CIDR 2011. Presented By Ajit Padukone
Megastore - Agenda • Motivation • Design of Megastore • Data Model • Data Storage • Transactions and Concurrency Control • How Megastore achieves Availability and Scalability. • PAXOS. • Megastore’s approach.
Megastore - Motivation • Storage requirements of today’s interactive online applications. • Scalability. • Rapid Development. • Responsiveness (Low Latency). • Durability and Consistency. • Fault Tolerance. • These requirements are in conflict!
Megastore - Motivation • Available systems • Relational DBMS – Rich set of features, expressive language helps development, but difficult to scale. E.g.: MySQL, PostgreSQL, MS SQL Server, Oracle RDB. • NoSQL Systems – Highly scalable, but limited API and loose consistency models. E.g.: Google’s BigTable, Apache Hadoop’s HBase, Facebook’s Cassandra. • Megastore blends the scalability of NoSQL with the convenience of a traditional RDBMS.
Megastore – Data Model • API Design Requirements • Predictable Runtimes rather than Expressiveness. • Reads dominate Writes. • Storing and Querying Hierarchical data is easier in BigTable. • Hence • Data is not normalized but stored hierarchically. • Joins are not supported – they have to be implemented in application code.
Megastore – Data Model • Lies between the abstract tuples of an RDBMS and the concrete row-column storage of NoSQL. • Every table is either an entity group root table or a child table. • Entity Group – consists of a root entity along with all the child entities that reference it. • There can be several root tables – leading to several classes of Entity Groups.
Megastore – Data Model
CREATE SCHEMA PhotoApp;
CREATE TABLE User {
  required int64 user_id;
  required string name;
} PRIMARY KEY(user_id), ENTITY GROUP ROOT;
CREATE TABLE Photo {
  required int64 user_id;
  required int32 photo_id;
  required int64 time;
  required string full_url;
  optional string thumbnail_url;
  repeated string tag;
} PRIMARY KEY(user_id, photo_id),
  IN TABLE User,
  ENTITY GROUP KEY(user_id) REFERENCES User;
Megastore – Data Storage How is it stored in BigTable? “A Bigtable is a sparse, distributed, persistent multidimensional sorted map” !!!
Megastore – Data Storage
A Sorted Map
{ "1" : "x", "aaaaa" : "y", "aaaab" : "world", "xyz" : "hello", "zzzzz" : "woot" }
A Map
{ "zzzzz" : "woot", "xyz" : "hello", "aaaab" : "world", "1" : "x", "aaaaa" : "y" }
Megastore – Data Storage
A Sorted Multidimensional Map
{
  "1" : { "A" : "x", "B" : "z" },
  "aaaaa" : { "A" : "y", "B" : "w" },
  "aaaab" : { "A" : "world", "B" : "ocean" },
  "xyz" : { "A" : "hello", "B" : "there" },
  "zzzzz" : { "A" : "woot", "B" : "1337" }
}
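The value of keeping keys sorted can be illustrated with a minimal Python sketch: rows with nearby keys sit next to each other, so a range scan reads one contiguous slice. The `SortedMap` class and its `put`/`scan` methods are hypothetical names invented for this illustration, not Bigtable's API.

```python
import bisect


class SortedMap:
    """Toy sorted multidimensional map (hypothetical helper, not Bigtable's API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in lexicographic order
        self._rows = {}   # row key -> {column: value}

    def put(self, key, column, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][column] = value

    def scan(self, start, end):
        """Return (key, row) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedMap()
# Insert in arbitrary order; the map keeps the keys sorted.
for key, a, b in [("xyz", "hello", "there"), ("1", "x", "z"),
                  ("aaaab", "world", "ocean"), ("zzzzz", "woot", "1337"),
                  ("aaaaa", "y", "w")]:
    table.put(key, "A", a)
    table.put(key, "B", b)

# Every row whose key starts with "aaaa" falls in one contiguous slice.
nearby = table.scan("aaaa", "aaab")
```

Here `nearby` holds the rows for "aaaaa" and "aaaab" only; an unsorted map would need to inspect every key to answer the same question.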
Megastore – Data Storage
A BigTable – Column families are static, columns are not.
"aaaaa" : { "A" : { "foo" : "y", "bar" : "d" }, "B" : { "" : "w" } },
"aaaab" : { "A" : { "foo" : "world", "bar" : "domination" }, "B" : { "position" : "ocean" } }
Megastore – Data Model Example:
User{ user_id:101, name:'John' }
Photo{ user_id:101, photo_id:500, time:2009, full_url:'john-pic1', tag:'vacation', tag:'holiday', tag:'Paris' }
Photo{ user_id:101, photo_id:501, time:2010, full_url:'john-pic2', tag:'office', tag:'friends', tag:'pub' }
User{ user_id:102, name:'Mary' }
Photo{ user_id:102, photo_id:600, time:2009, full_url:'mary-pic1', tag:'office', tag:'picnic', tag:'Paris' }
Photo{ user_id:102, photo_id:601, time:2011, full_url:'mary-pic2', tag:'birthday', tag:'friends' }
Megastore – Data Storage
How is it stored in BigTable?
"user_id" : { "User" : { "name" : "<name>" } },
"user_id, photo_id" : { "Photo" : { "time" : "<time>", "full_url" : "<url>", "thumbnail_url" : "<thumbnail_url>", "tag" : "<tag 1>", "tag" : "<tag 2>", "tag" : "<tag 3>", … } }
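A rough Python sketch of the effect of this layout: because the row key is the primary key and a Photo key begins with its User key, sorting the rows places each user's photos directly after the user row. The tuple encoding in `row_key` is an assumption made for this illustration (the real key encoding is not shown on the slides), and the sample values only loosely follow the User/Photo example.

```python
def row_key(*primary_key):
    """Toy key encoding: a tuple of primary-key components.
    Tuples compare component by component, and a shorter tuple sorts
    before any longer tuple it is a prefix of."""
    return tuple(primary_key)


# Rows from both tables share one map; columns are named Table.property.
rows = {
    row_key(102, 600): {"Photo.time": 2009, "Photo.full_url": "mary-pic1"},
    row_key(101): {"User.name": "John"},
    row_key(101, 501): {"Photo.time": 2010, "Photo.full_url": "john-pic2"},
    row_key(102): {"User.name": "Mary"},
    row_key(101, 500): {"Photo.time": 2009, "Photo.full_url": "john-pic1"},
}

ordered_keys = sorted(rows)


def entity_group(user_id):
    """Keys of one entity group: the root row followed by its child rows."""
    return [k for k in ordered_keys if k[0] == user_id]
```

In `ordered_keys`, the row for user 101 is followed immediately by that user's two photo rows, then the row for user 102 and her photo, so one entity group can be read with a single contiguous scan.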
Megastore – Data Storage How is it stored in BigTable?
Megastore – Data Storage • Indexing • Local Index – finds data within an Entity Group. CREATE LOCAL INDEX PhotosByTime ON Photo(user_id, time); • Global Index – spans Entity Groups. CREATE GLOBAL INDEX PhotosByTag ON Photo(tag) STORING (thumbnail_url); • The ‘Storing’ Clause • Faster retrieval of certain properties. • Repeated Index • Efficient alternative to child tables. • Inline Index • Useful for extracting slices of data from child tables in parent tables.
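The index kinds above can be approximated with a short Python sketch. It models index entries as sorted tuples, which is an assumption for illustration only; the thumbnail values are made-up sample data, and the variable names simply mirror the two index names from the DDL.

```python
photos = [
    {"user_id": 101, "photo_id": 500, "time": 2009,
     "thumbnail_url": "john-thumb1",            # hypothetical sample value
     "tag": ["vacation", "holiday", "Paris"]},
    {"user_id": 102, "photo_id": 600, "time": 2009,
     "thumbnail_url": "mary-thumb1",            # hypothetical sample value
     "tag": ["office", "picnic", "Paris"]},
]

# Local index on (user_id, time): every entry starts with the entity group
# key, so entries for one user stay together.
photos_by_time = sorted(
    ((p["user_id"], p["time"]), (p["user_id"], p["photo_id"])) for p in photos
)

# Global index on the repeated field `tag`: one entry per tag value,
# spanning entity groups. Carrying thumbnail_url in the entry mimics the
# STORING clause, so a lookup does not need to fetch the Photo row.
photos_by_tag = sorted(
    (tag, (p["user_id"], p["photo_id"]), p["thumbnail_url"])
    for p in photos
    for tag in p["tag"]
)

paris = [(pk, thumb) for tag, pk, thumb in photos_by_tag if tag == "Paris"]
```

The lookup for "Paris" returns photos from both users together with their stored thumbnails, which is the cross-entity-group behaviour a global index is meant to provide.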
Megastore – Data Storage How is it stored in BigTable? (Figures: layouts of the PhotosByTime and PhotosByTag indexes.)
Megastore – Data Storage
Inline Indexes - How is it stored in BigTable?
"user_id" : { "User" : {
  "name" : "<name>",
  "PhotosByTime" : "<user_id>,<time1>,<user_id>,<photo_id1>",
  "PhotosByTime" : "<user_id>,<time2>,<user_id>,<photo_id2>",
  "PhotosByTime" : "<user_id>,<time3>,<user_id>,<photo_id3>",
  "PhotosByTime" : "<user_id>,<time4>,<user_id>,<photo_id4>"
} }
Megastore – Data Storage • Transactions and Concurrency Control • Each Entity Group acts as a mini-database and provides ACID semantics. • Transaction management using Write Ahead Logging. • BigTable feature – ability to store multiple values for the same row/column with different timestamps. • Multiversion Concurrency Control using timestamps – reads and writes do not block each other. Source: http://paprika.umw.edu/~ernie/cpsc321/10312006.html
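A minimal Python sketch of the timestamp idea, assuming a single cell and integer timestamps: each write adds a version instead of overwriting, and a reader picks the newest version at or before its read timestamp. `VersionedCell` is a hypothetical class for illustration, not a Bigtable or Megastore interface.

```python
class VersionedCell:
    """Toy cell that keeps every written value under its timestamp."""

    def __init__(self):
        self._versions = {}  # timestamp -> value

    def write(self, timestamp, value):
        # A write adds a new version; older versions remain readable.
        self._versions[timestamp] = value

    def read(self, as_of):
        """Return the newest value with timestamp <= as_of, or None."""
        visible = [ts for ts in self._versions if ts <= as_of]
        return self._versions[max(visible)] if visible else None


name = VersionedCell()
name.write(10, "John")
name.write(20, "Johnny")

old_view = name.read(15)   # a reader at timestamp 15 still sees "John"
new_view = name.read(25)   # a later reader sees "Johnny"
```

Because the reader at timestamp 15 never looks at the version written at 20, the later write does not have to wait for it, and the read does not have to wait for the write.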
Megastore – Availability / Scalability • Availability • Fault Tolerance achieved by Replication. • Fault Tolerant replication of logs. Adapted the PAXOS algorithm. • Scalability • Performance maximized by partitioning based on Entity Groups. • Transactions within an entity group – single phase using PAXOS. • Transactions across entity groups – two-phase commit or, more typically, asynchronous messaging through queues. • Indexes – ACID within an Entity Group, looser semantics across Entity Groups.
Megastore – Availability / Scalability • Replication Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al. CIDR 2011
Megastore – Availability / Scalability • Operations: Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al. CIDR 2011
Megastore – Replication • PAXOS Algorithm • A way to reach consensus among a group of replicas on a single value. • Tolerates delayed or reordered messages and replicas that fail by stopping. • With N = 2F + 1 replicas, tolerates up to F failures, since a majority must remain reachable. • The original PAXOS algorithm is ill-suited for high-latency network links because it demands multiple rounds of communication, so Megastore uses an improved version. • Use? • Databases typically use PAXOS to replicate a transaction log, where a separate instance of PAXOS is used for each position in the log.
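The two rounds of communication can be sketched in Python as below. This is basic single-decree Paxos run in memory with no networking, retries, or leader, so it illustrates the textbook algorithm rather than Megastore's optimized variant; the `Acceptor` class and `propose` function are names chosen for this sketch.

```python
class Acceptor:
    """Single-decree Paxos acceptor (in-memory, no networking)."""

    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots; report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(reachable, total, ballot, value):
    """Try to get a value chosen; return the chosen value or None."""
    quorum = total // 2 + 1  # a majority of all replicas
    replies = [a.prepare(ballot) for a in reachable]
    granted = [accepted for ok, accepted in replies if ok]
    if len(granted) < quorum:
        return None
    prior = [acc for acc in granted if acc is not None]
    if prior:
        # A value may already be chosen: re-propose the one from the
        # highest-numbered earlier ballot instead of our own.
        value = max(prior, key=lambda bv: bv[0])[1]
    acks = sum(a.accept(ballot, value) for a in reachable)
    return value if acks >= quorum else None


replicas = [Acceptor() for _ in range(3)]
first = propose(replicas[:2], 3, ballot=1, value="A")    # one replica down
second = propose(replicas, 3, ballot=2, value="B")       # later proposer
stalled = propose(replicas[:1], 3, ballot=3, value="C")  # no majority
```

With three replicas, `first` succeeds despite one failure, `second` ends up with "A" rather than "B" because the earlier value was already chosen, and `stalled` fails because a single replica is not a majority. Replicating a log would run one such instance per log position.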
Megastore – Data Storage • PAXOS Algorithm • A Master-Slave model is generally used, where the Master handles the replication of all writes. • But the Master then becomes a bottleneck. Source: http://en.wikipedia.org/wiki/Paxos_(computer_science)
Megastore – Replication • Megastore Architecture Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al. CIDR 2011
Megastore – Replication • Megastore Read Process Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al. CIDR 2011
Megastore – Replication • Megastore Write Process Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al. CIDR 2011
Megastore Experience: • Megastore has been deployed within Google for several years; more than 100 production applications use it as their storage service. • Most customers see extremely high levels of availability (at least five nines) despite a steady stream of machine failures, network hiccups, datacenter outages, and other faults. • Average read latencies are tens of milliseconds, depending on the amount of data, showing that most reads are local.
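To put "five nines" in perspective, the small Python calculation below converts an availability figure into downtime per year. It assumes a 365-day year and is plain arithmetic on the stated percentage, not a measurement from the paper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)   # about 5.26 minutes
three_nines = downtime_minutes_per_year(0.999)    # about 525.6 minutes
```

Five nines therefore leaves a budget of roughly five minutes of unavailability per year, compared with almost nine hours at three nines.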
Megastore Performance Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al. CIDR 2011
Megastore Questions?