PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni (Yahoo! Research)
Presented by Mina Farid, University of Waterloo, CS 848 Presentation, 8 February 2010
Outline
• Motivation
• Data and Query Model
• Consistency
• System Architecture
• Applications
• Experiments
Motivation
• Scalability
• Response time (SLAs)
• High availability and fault tolerance
• Relaxed consistency guarantees
  • Serializable transactions: the strong extreme, costly across geographically distributed replicas
  • Eventual consistency: any replica can be updated and all updates are propagated to all replicas, but potentially in different orders
Data and Query Model
• Simplified relational data model (tables, records, attributes)
• Flexible schemas
• Queries: selection and projection from a single table
• Designed for specific applications
  • Queries scan only a few records
  • No ad-hoc queries
• Support for hashed and ordered tables
Consistency
• PNUTS sits in between general serializability and eventual consistency
• Updates touch one record at a time
• Per-record timeline consistency: all replicas of a record apply updates in the same order
• For a given version, all replicas contain the same information
Consistency (cont’d)
• Each record has a master replica
• Updates are forwarded to this master replica
• The master record carries the version info
• Consistency API calls:
  • Read-any
  • Read-critical(required_version)
  • Read-latest
  • Write
  • Test-and-set-write(required_version)
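The API calls above can be illustrated with a minimal single-record sketch. This is not the PNUTS implementation: the class names `Replica` and `Record`, the `propagate` helper, and the choice to represent versions as positions in a list are all assumptions made for illustration. The sketch only shows how each call relates to the master's version and to lagging replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a single record (hypothetical): values applied in order."""
    versions: list = field(default_factory=list)  # versions[i] holds version i + 1

    def apply(self, value):
        self.versions.append(value)

    @property
    def version(self):
        return len(self.versions)

    def current(self):
        return self.version, (self.versions[-1] if self.versions else None)


class Record:
    """Hypothetical record with one master copy and lagging replicas."""

    def __init__(self, n_replicas=2):
        self.master = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, value):
        # All writes go through the master, which defines the version order.
        self.master.apply(value)
        return self.master.version

    def test_and_set_write(self, required_version, value):
        # Write only if the record is still at required_version.
        if self.master.version != required_version:
            return None
        return self.write(value)

    def propagate(self, replica_index):
        # Deliver the next missing update to one replica, in master order.
        r = self.replicas[replica_index]
        if r.version < self.master.version:
            r.apply(self.master.versions[r.version])

    def read_any(self, replica_index):
        # May return a stale (but valid) version.
        return self.replicas[replica_index].current()

    def read_latest(self):
        # Served by the master copy in this sketch.
        return self.master.current()

    def read_critical(self, required_version):
        # Return any copy that is at least as new as required_version.
        for r in self.replicas + [self.master]:
            if r.version >= required_version:
                return r.current()
        raise ValueError("required_version has not been written yet")
```

In this sketch, a client that has just written version 2 can call `read_critical(2)` to avoid seeing an older copy, while `read_any` trades freshness for the ability to read from whichever replica is closest.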
System Architecture
[Diagram: a single region containing a Tablet Controller, a Message Broker, Routers, and Storage Units 1 through N]
System Architecture – Data Storage and Retrieval
• Each region holds a full complement of system components and a complete copy of the data
• Tables are partitioned into tablets
• A tablet is a group of records from one table
• Tablets are stored on storage unit (SU) servers
• Storage units respond to get(), scan(), and set() requests
Routers’ Mapping – Ordered Table
• Routers decide:
  • Which tablets contain which records
  • Which SU holds which tablets
[Diagram: an ordered table divided into Tablet 1, Tablet 2, Tablet 3, and Tablet 4]
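The two routing decisions for an ordered table can be sketched as an interval lookup. The class name `OrderedTableRouter`, the split keys, and the tablet-to-SU assignment below are made up for illustration; the sketch assumes each tablet covers a contiguous key range and uses a binary search over the range boundaries.

```python
import bisect


class OrderedTableRouter:
    """Hypothetical router for an ordered table."""

    def __init__(self, boundaries, tablet_to_su):
        # boundaries: sorted split keys. Tablet i (0-based) holds keys k with
        # boundaries[i - 1] <= k < boundaries[i]; the first and last tablets
        # are open-ended.
        assert len(tablet_to_su) == len(boundaries) + 1
        self.boundaries = boundaries
        self.tablet_to_su = tablet_to_su

    def tablet_for(self, key):
        # Decision 1: which tablet contains the record.
        return bisect.bisect_right(self.boundaries, key)

    def su_for(self, key):
        # Decision 2: which storage unit holds that tablet.
        return self.tablet_to_su[self.tablet_for(key)]


# Four tablets (indices 0..3) spread over three storage units.
router = OrderedTableRouter(
    boundaries=["g", "n", "t"],
    tablet_to_su=["SU1", "SU2", "SU1", "SU3"],
)
```

Here `router.su_for("mango")` resolves to the tablet covering keys from "g" up to "n" and then to the storage unit assigned to it. Keeping the two mappings separate means a tablet can move to another storage unit without changing the key ranges.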
System Architecture
[Diagram: two regions, Region 1 and Region 2, each with its own Tablet Controller, Message Broker, Routers, and Storage Units]
System Architecture – Replication and Consistency
1- Yahoo! Message Broker (YMB)
• Reliable topic-based publish/subscribe
• Updates are asynchronously propagated to all replicas
• Provides partial ordering:
  • Messages published to a particular YMB cluster are delivered to all subscribers in the same order
  • Messages published to different YMB clusters may be delivered in any order
• Solution: per-record mastership
System Architecture – Replication and Consistency
2- Consistency and Record Mastership
• One copy of each record is designated the master
• Updates are forwarded to that master copy
• The master publishes the update to the message broker (this is the commit point)
• Different records in the same table can be mastered in different clusters
• Which copy is the master, and how is it selected?
  • Each record carries metadata identifying its master copy (changeable)
  • Mastership goes to the copy receiving the most updates
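A small simulation can show why per-record mastership works around the broker's partial ordering. Everything here is a simplified assumption rather than the actual PNUTS or YMB code: the classes `Broker`, `Region`, and `Cluster` are hypothetical, the first writer of a key is arbitrarily made its master, and delivery happens only when `propagate` is called, standing in for asynchronous propagation.

```python
from collections import deque


class Broker:
    """Hypothetical stand-in for one message broker: delivers in publish order."""

    def __init__(self):
        self.queue = deque()

    def publish(self, update):
        self.queue.append(update)

    def deliver_all(self, regions):
        while self.queue:
            update = self.queue.popleft()
            for region in regions:
                region.apply(update)


class Region:
    def __init__(self, name):
        self.name = name
        self.broker = Broker()
        self.store = {}  # key -> (sequence number, value)
        self.log = []    # every update applied here, in arrival order

    def apply(self, update):
        key, seq, value = update
        self.store[key] = (seq, value)
        self.log.append(update)


class Cluster:
    def __init__(self, region_names):
        self.regions = {name: Region(name) for name in region_names}
        self.master_of = {}  # per-record metadata: key -> master region name
        self.next_seq = {}

    def write(self, origin, key, value):
        # Assumption: the first region to write a key becomes its master.
        master_name = self.master_of.setdefault(key, origin)
        master = self.regions[master_name]
        seq = self.next_seq.get(key, 0) + 1
        self.next_seq[key] = seq
        # The write is forwarded to the master and committed by publishing
        # to the master region's broker.
        master.broker.publish((key, seq, value))
        return master_name, seq

    def propagate(self):
        # Asynchronous propagation, simulated as an explicit step.
        everyone = list(self.regions.values())
        for region in everyone:
            region.broker.deliver_all(everyone)
```

Because every update to a given record is published through a single broker (the one in the master's region), all regions see that record's updates in the same order, even though updates to different records may pass through different brokers.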
Query Processing
• Multi-record querying
• Scatter-gather engine (part of the router)
  • Splits a multi-record request into multiple single-record requests
  • Initiates the requests in parallel
  • Assembles and evaluates the results, and sends them back to the client
• Handles range and scan queries (also supports top-k)
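The scatter-gather steps can be sketched with a thread pool. The in-memory `STORAGE_UNITS` and `KEY_TO_SU` tables, the sample records, and the function names `get` and `multi_get` are assumptions for illustration only; a real router would issue network requests to storage units instead of dictionary lookups.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage units and key placement.
STORAGE_UNITS = {
    "SU1": {"alice": 30, "carol": 25},
    "SU2": {"bob": 41, "dave": 19},
}
KEY_TO_SU = {"alice": "SU1", "carol": "SU1", "bob": "SU2", "dave": "SU2"}


def get(key):
    """Single-record request sent to the storage unit that holds the key."""
    su = KEY_TO_SU[key]
    return key, STORAGE_UNITS[su].get(key)


def multi_get(keys, max_workers=4):
    """Scatter: one single-record request per key, issued in parallel.
    Gather: assemble the results into one response for the client."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(get, keys))
    return dict(results)
```

Calling `multi_get(["alice", "bob", "dave"])` fans out three single-record lookups and returns one combined mapping, so the client sees a single request and a single response.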
Applications
• User databases: millions of records, frequent updates, important data, relaxed consistency
• Social applications: flexible schemas, large number of small updates, no real-time requirements (relaxed consistency)
• Content metadata: manages structured metadata; scalable and consistent
• Session data: scalable storage for managing session state, with low consistency requirements
Experiments
• Main criterion: average request latency (response time)
• Experiment setup: 3 regions (2 West, 1 East)
• Experiments:
  1- Inserting data
  2- Varying load
  3- Varying number of storage units
Future Enhancements
Planned features include:
• Indexing and materialized views
• Bundled updates (atomic, non-isolated updates to multiple records)
Conclusion
Thank You! Questions?
Google BigTable
• Record-oriented access to very large tables
• Does not support:
  • Geographic replication
  • Secondary indexes
  • Materialized views
  • Hash-organized tables
Dynamo
• Focuses on availability
• Provides geographic replication via a gossip mechanism
• Its eventual consistency model does not suit all applications
  • “Updates are committed in different orders at different replicas”; replicas are then eventually reconciled (updates may roll back)
• Does not support:
  • Ordered tables
Boxwood
• Provides a B-tree implementation
• The design favors consistency over scalability (tens of machines)