
PNUTS: Yahoo!’s Hosted Data Serving Platform

Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni. Yahoo! Research. VLDB 2008.


Presentation Transcript


  1. PNUTS: Yahoo!’s Hosted Data Serving Platform • Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni • Yahoo! Research • VLDB 2008

  2. Motivation • Web application requirements: • Scalability • Low latency and geographic scope • High availability and fault tolerance • Low operational cost • Web applications need only: • Simplified queries • No joins or aggregations • Relaxed consistency • Applications can tolerate stale data

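  3. Example: Social network updates • [Figure: table of update records keyed by ID, each with a truncated XML payload: 6 Jimi <ph.., 8 Mary <re.., 12 Sonja <ph.., 15 Brandon <po.., 16 Mike <ph.., 17 Bob <re..; one payload shown in full: <photo> <title>Flower</title> <url>www.flickr.com</url> </photo>]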

  4. What is PNUTS? [Platform for Nimble Universal Table Storage] • Massively parallel database system • Hosted, centrally managed infrastructure shared by multiple applications • Geographically distributed service • Focus is on data serving for web applications • Used by Yahoo!’s web applications

  5. What is PNUTS? • Structured, flexible schema, e.g. CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) • Geographic replication • Parallel database • Hosted, managed infrastructure

  6. Contributions of the paper • Record-level, asynchronous geographic replication • Use of a guaranteed message-delivery service rather than a persistent log • Consistency model • Sits between eventual consistency and full serializability • Careful choice of features to include or exclude • Delivery of data management as a hosted service

  7. System Architecture • Tables are partitioned horizontally into “tablets”, which are scattered across many servers • Within a region, each tablet is stored on exactly one server • Router • Caches the interval mapping • Maps each tablet to a storage unit (SU) • Tablet Controller • Owns the interval mapping and keeps it up to date • Automatic load balancing • Server failure • Hotspot

  8. System Architecture: One region • [Figure: data-path components: clients, REST API, routers, message broker, tablet controller, storage units]

  9. Detailed Architecture • [Figure: local region (clients, REST API, routers, YMB, tablet controller, storage units) and remote regions]

  10. Tablet splitting and balancing • [Figure: a storage unit holding tablets] • Each storage unit has many tablets (horizontal partitions of the table) • A storage unit may become a hotspot • Tablets may grow over time • Overfull tablets split • Shed load by moving tablets to other servers
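The two balancing actions on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the size threshold `MAX_KEYS`, the median split point, and the "move the smallest tablet from the most loaded unit to the least loaded one" policy are all hypothetical choices, not something the slides or the paper specify.

```python
MAX_KEYS = 4  # hypothetical tablet size limit, chosen only for the example


def split_if_overfull(tablet):
    """Split a sorted list of keys at its midpoint when it exceeds MAX_KEYS."""
    if len(tablet) <= MAX_KEYS:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def shed_load(units):
    """Move one tablet from the most loaded storage unit to the least loaded.

    `units` maps a storage unit name to its list of tablets. Load is
    approximated by the number of keys held, which is an assumption.
    Returns (hot_unit, cold_unit), or None if nothing was moved.
    """
    def load(name):
        return sum(len(t) for t in units[name])

    hot = max(units, key=load)
    cold = min(units, key=load)
    if hot == cold or len(units[hot]) < 2:
        return None
    tablet = min(units[hot], key=len)  # move the smallest tablet
    units[hot].remove(tablet)
    units[cold].append(tablet)
    return hot, cold
```

A real system would measure load by request rate rather than key count; key count is used here only to keep the sketch self-contained.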

  11. Functionality: Data and Query model • Simplified relational data model • Schemas are flexible • Records are stored as parsed JSON objects • Per-record operations: Get, Set, Delete • Multi-record operations: Multiget, Scan • Web service (RESTful) API

  12. Functionality: Consistency Model • Hides the complexity of replication • Per-record timeline consistency • All replicas of a given record apply all updates in the same order • Supports a range of API calls with different levels of consistency • Read-any • Read-critical(required_version) • Read-latest • Write • Test-and-set-write(required_version)

  13. Consistency Model: Read • [Figure: timeline of versions v.1 through v.8 of one record within Generation 1, with stale versions and the current version marked; a plain read may return a stale version or the current version]

  14. Consistency Model: Read up-to-date • [Figure: same version timeline; the read returns the current version]

  15. Consistency Model: Read-critical(required version) • Read ≥ v.6 • [Figure: same version timeline; the read returns v.6 or any newer version]

  16. Consistency Model: Write • [Figure: same version timeline, showing a write]

  17. Consistency Model: Test-and-set-write(required version) • Write if = v.7 • ERROR • [Figure: same version timeline; the conditional write is rejected with ERROR]
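The API calls from the consistency slides can be illustrated with a toy, single-process model. This is a sketch under assumptions, not the PNUTS implementation: the class `TimelineRecord`, its method names, the single lagging replica, and the rule that read-critical falls back to the master when the replica is too stale are all hypothetical.

```python
class VersionMismatch(Exception):
    """Raised when a required version cannot be honoured."""


class TimelineRecord:
    """Toy model of one record: a master timeline plus one lagging replica."""

    def __init__(self, initial):
        self.timeline = [initial]   # timeline[i] holds version i + 1
        self.replica_applied = 1    # versions the replica has applied so far

    @property
    def latest_version(self):
        return len(self.timeline)

    def write(self, value):
        self.timeline.append(value)
        return self.latest_version

    def test_and_set_write(self, required_version, value):
        # Succeeds only if the current version is exactly the required one.
        if self.latest_version != required_version:
            raise VersionMismatch("current version is %d" % self.latest_version)
        return self.write(value)

    def replicate(self, upto):
        # The replica applies updates strictly in timeline order.
        upto = min(upto, self.latest_version)
        self.replica_applied = max(self.replica_applied, upto)

    def read_any(self):
        v = self.replica_applied    # may be stale
        return v, self.timeline[v - 1]

    def read_latest(self):
        v = self.latest_version
        return v, self.timeline[v - 1]

    def read_critical(self, required_version):
        if required_version > self.latest_version:
            raise VersionMismatch("version does not exist yet")
        if self.replica_applied >= required_version:
            return self.read_any()
        return self.read_latest()   # assumed fallback to the master
```

For example, with eight versions written and the replica caught up only to v.5, `read_any()` returns v.5, `read_critical(6)` returns v.8, and `test_and_set_write(7, ...)` raises `VersionMismatch`, mirroring the ERROR on slide 17 if the current version there is v.8.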

  18. Data Storage and Retrieval • Ordered table • The primary-key (PK) space of a table is divided into intervals • For a given PK, binary search over the interval boundaries finds the tablet • Hash-organized table • n-bit hash function H(), 0 ≤ H() < 2^n • The range [0, 2^n) is divided into intervals • Hash the key, then binary-search the set of intervals
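Both lookup schemes reduce to a binary search over sorted interval lower bounds. The sketch below uses Python's standard `bisect` module; the boundary lists, the storage unit assignments, the 16-bit hash width and the use of SHA-256 as H() are assumptions made for illustration, with the ordered boundaries modeled on the fruit keys of the range-query slide.

```python
import bisect
import hashlib

# Ordered table: sorted lower bounds of each tablet ("" stands for MIN)
# and the storage unit that holds the tablet. Hypothetical values.
ORDERED_BOUNDS = ["", "Cantaloupe", "Lime", "Strawberry"]
ORDERED_UNITS = ["SU1", "SU3", "SU2", "SU1"]


def find_ordered(key):
    """Binary-search the interval whose lower bound is the largest one <= key."""
    i = bisect.bisect_right(ORDERED_BOUNDS, key) - 1
    return ORDERED_UNITS[i]


# Hash-organized table: [0, 2^n) split into four equal intervals.
N_BITS = 16
HASH_BOUNDS = [0, 1 << 14, 1 << 15, 3 << 14]
HASH_UNITS = ["SU1", "SU2", "SU3", "SU1"]


def h(key):
    """n-bit hash H(key); SHA-256 truncated to N_BITS is an assumption."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:N_BITS // 8], "big")


def find_hashed(key):
    """Hash the key, then binary-search the hash intervals."""
    i = bisect.bisect_right(HASH_BOUNDS, h(key)) - 1
    return HASH_UNITS[i]
```

With these boundaries, `find_ordered("Grape")` resolves to SU3 and `find_ordered("Lime")` to SU2, since each interval includes its lower bound.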

  19. Query Processing: Single Read • [Figure: four numbered steps: “Get key k” is sent and forwarded to one of three storage units (SU), and “Record for key k” is returned along the same path]

  20. Query Processing: Bulk Read • [Figure: two numbered steps: the key set {k1, k2, … kn} is sent to a scatter/gather server, which issues Get k1, Get k2, Get k3 to the storage units (SU)]

  21. Query Processing: Range Queries • Router interval map: MIN-Cantaloupe on SU1, Cantaloupe-Lime on SU3, Lime-Strawberry on SU2, Strawberry-MAX on SU1 • The query Grapefruit…Pear? is split into Grapefruit…Lime? and Lime…Pear? • [Figure: storage units 1 to 3 holding the keys Apple, Avocado, Banana, Blueberry, Cantaloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Strawberry, Tomato, Watermelon]

  22. Query Processing • Range scans can span tablets • Handled by the scatter-gather engine, using simple incremental scanning • Only one tablet is scanned at a time • The client may not need all results at once • A continuation object is returned to the client to indicate where the range scan should continue • Notification • External clients can subscribe to YMB to receive updates to data • The client knows about tables but not about tablets • It is automatically subscribed to all tablets, even as tablets are added or removed
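Incremental range scanning with a continuation object can be sketched as below. The tablet layout reuses the fruit keys from the range-query slide; the function name `scan`, the half-open range convention, and the choice to represent the continuation as "the lower bound of the next tablet to visit" are assumptions for illustration.

```python
import bisect

# Hypothetical tablets: (lower bound, sorted keys); "" stands for MIN.
TABLETS = [
    ("", ["Apple", "Avocado", "Banana", "Blueberry"]),
    ("Cantaloupe", ["Cantaloupe", "Grape", "Kiwi", "Lemon"]),
    ("Lime", ["Lime", "Mango", "Orange"]),
    ("Strawberry", ["Strawberry", "Tomato", "Watermelon"]),
]
LOWER_BOUNDS = [low for low, _ in TABLETS]


def scan(start, end, continuation=None):
    """Scan [start, end) one tablet per call.

    Returns (keys, continuation). The continuation is None once the range
    is exhausted; otherwise pass it back in to resume the scan.
    """
    pos = start if continuation is None else continuation
    i = bisect.bisect_right(LOWER_BOUNDS, pos) - 1
    keys = [k for k in TABLETS[i][1] if pos <= k < end]
    if i + 1 < len(TABLETS) and LOWER_BOUNDS[i + 1] < end:
        return keys, LOWER_BOUNDS[i + 1]
    return keys, None
```

Scanning Grapefruit to Pear takes two calls: the first reads the Cantaloupe-Lime tablet and returns a continuation pointing at Lime, the second reads the Lime-Strawberry tablet and returns no continuation.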

  23. Replication & Consistency • Data in PNUTS is replicated across sites • For each record, one replica is designated the master • Mastership is per record, so records in the same tablet can have masters in different regions • All updates are directed to the master copy • Write locality • A hidden field in each record stores the identity of the current master • Updates can be submitted to any copy • They are forwarded to the master and applied in the order the master receives them • Each record also has a hidden field for the origin of the last few updates • A change of master is announced by publishing a message to YMB that identifies the new master • A mastership change is simply a record update
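A rough sketch of per-record mastership follows. The two hidden fields come from the slide; everything else is an assumption: the class `MasteredRecord`, the history length of three, and in particular the migration policy (move mastership once every one of the last few updates originated in the same non-master region) are hypothetical stand-ins for whatever policy the real system uses.

```python
from collections import deque


class MasteredRecord:
    """Toy record with hidden 'master' and 'recent origins' fields."""

    def __init__(self, master_region, history=3):
        self.master = master_region            # hidden field: current master
        self.origins = deque(maxlen=history)   # hidden field: recent update origins
        self.value = None
        self.version = 0
        self.published = []                    # messages that would go to YMB

    def update(self, origin_region, value):
        """Apply an update at the master; return True if it had to be forwarded."""
        forwarded = origin_region != self.master
        self.version += 1
        self.value = value
        self.origins.append(origin_region)
        self.published.append(("update", self.version, value))
        self._maybe_move_master()
        return forwarded

    def _maybe_move_master(self):
        # Hypothetical policy: all recent updates came from one other region.
        full = len(self.origins) == self.origins.maxlen
        if full and len(set(self.origins)) == 1 and self.origins[0] != self.master:
            self.master = self.origins[0]
            # A mastership change is itself a versioned record update.
            self.version += 1
            self.published.append(("new_master", self.version, self.master))
```

After three consecutive updates from "east" to a record mastered in "west", the sketch moves mastership to "east", so later writes from "east" no longer need forwarding.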

  24. Replication & Consistency • The system uses asynchronous replication, which keeps update latency low • Yahoo! Message Broker (YMB) • Distributed publish-subscribe service • Guarantees delivery once a message is published • Logs each message to multiple disks on multiple servers • A message is not purged from the log until the update has been applied to all replicas • Guarantees that messages published to a particular cluster are delivered in the same order at all other clusters • Record updates are published to YMB by the master copy • All replicas subscribe to the updates and receive them in the same order for a particular record • One pub-sub topic per tablet
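The broker guarantees listed above (per-topic ordering, and no purge until every replica has applied a message) can be modeled in a few lines. This is an in-memory sketch, not YMB: the classes `Broker` and `Replica`, the pull-style `deliver` call and the topic name used in the usage note are all assumptions, and durability across disks and servers is not modeled.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}      # key -> latest applied value
        self.applied = {}   # topic -> count of messages applied


class Broker:
    """Toy ordered pub-sub log with one topic per tablet."""

    def __init__(self):
        self.logs = {}         # topic -> retained messages, in publish order
        self.base = {}         # topic -> count of messages already purged
        self.subscribers = {}  # topic -> list of replicas

    def subscribe(self, topic, replica):
        self.logs.setdefault(topic, [])
        self.base.setdefault(topic, 0)
        self.subscribers.setdefault(topic, []).append(replica)
        replica.applied[topic] = self.base[topic]

    def publish(self, topic, key, value):
        self.logs.setdefault(topic, []).append((key, value))
        self.base.setdefault(topic, 0)

    def deliver(self, topic, replica, limit=None):
        """Apply pending messages to the replica in publish order."""
        start = replica.applied[topic] - self.base[topic]
        pending = self.logs[topic][start:]
        if limit is not None:
            pending = pending[:limit]
        for key, value in pending:
            replica.data[key] = value
            replica.applied[topic] += 1

    def purge(self, topic):
        """Drop only messages that every subscriber has applied."""
        low = min(r.applied[topic] for r in self.subscribers[topic])
        dropped = low - self.base[topic]
        del self.logs[topic][:dropped]
        self.base[topic] = low
        return dropped
```

If one replica has applied three messages on a topic such as "tablet-7" and another only one, `purge` drops just the first message; the slower replica can still catch up from the retained log and converge to the same state.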

  25. Asynchronous Replication

  26. Replication & Consistency: Data Updates • [Figure: eight numbered steps through routers, storage units (SU) and message brokers: “Write key k” is forwarded, a sequence number for key k is returned, and SUCCESS is reported]

  27. Recovery • Recovering from a failure means copying lost tablets from another replica • The tablet controller requests a copy from a remote replica • A checkpoint message is published to YMB • The tablet is copied to the destination region • Tablet boundaries should be kept synchronized across replicas

  28. Experimental Results: Varying Load

  29. Experimental Results: Varying Read/Write Ratio

  30. Experimental Results: Varying Number of Storage Units

  31. Positive Points • Simplicity • Exploits the properties of web applications to provide a scalable, robust and highly available data store • Applications can specify the degree of consistency they need • Per-record timeline consistency and asynchronous replication achieve low latency • Low operational cost

  32. Negative Points • Single YMB in each region • Is this a bottleneck? • One replica per region • Is this enough? • The paper claims there is no need to recover any data from the failed storage unit itself • What if the master fails? • With no master replica, the record is orphaned, and a few updates might be lost while it is in this state • Will this violate timeline consistency? • Is the current record partitioning mechanism efficient? • Is there a cleverer way of choosing tablet boundaries? • Currently, customers are assigned separate clusters of storage units and YMB for performance isolation • Does this waste resources?
