1 / 38

PNUTS: Yahoo!’s Hosted Data Serving Platform

Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni Yahoo! Research. PNUTS: Yahoo!’s Hosted Data Serving Platform. With some additions by S. Sudarshan. How do I build a cool new web app?.

damaris
Download Presentation

PNUTS: Yahoo!’s Hosted Data Serving Platform

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni Yahoo! Research PNUTS: Yahoo!’s Hosted Data Serving Platform With some additions by S. Sudarshan

  2. How do I build a cool new web app? • Option 1: Code it up! Make it live! • Scale it later • It gets posted to slashdot • Scale it now! • Flickr, Twitter, MySpace, Facebook, …

  3. How do I build a cool new web app? • Option 2: Make it industrial strength! • Evaluate scalable database backends • Evaluate scalable indexing systems • Evaluate scalable caching systems • Architect data partitioning schemes • Architect data replication schemes • Architect monitoring and reporting infrastructure • Write application • Go live • Realize it doesn’t scale as well as you hoped • Rearchitect around bottlenecks • 1 year later – ready to go!

  4. What are my friends up to? Sonja: Brandon: Example: social network updates Brian Sonja Jimi Brandon Kurt

  5. Example: social network updates 6 Jimi <ph.. 8 Mary <re.. 12 Sonja <ph.. 15 Brandon <po.. 16 Mike <ph.. <photo> <title>Flower</title> <url>www.flickr.com</url> </photo> 17 Bob <re..

  6. What do we need from our DBMS? • Web applications need: • Scalability • And the ability to scale linearly • Geographic scope • High availability • Web applications typically have: • Simplified query needs • No joins, aggregations • Relaxed consistency needs • Applications can tolerate stale or reordered data

  7. What is PNUTS?

  8. A 42342 E A 42342 E B 42521 W B 42521 W F 15677 E D 12352 E E 75656 C C 66354 W B 42521 W A 42342 E C 66354 W C 66354 W D 12352 E D 12352 E E 75656 C E 75656 C F 15677 E F 15677 E What is PNUTS? Indexes and views CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Geographic replication Parallel database Structured, flexible schema Hosted, managed infrastructure

  9. Query model • Per-record operations • Get • Set • Delete • Multi-record operations • Multiget • Scan • Getrange • Web service (RESTful) API

  10. Data-path components Detailed architecture Clients REST API Routers Message Broker Tablet controller Storage units

  11. Detailed architecture Local region Remote regions Clients REST API Routers YMB Tablet controller Storage units

  12. Storage unit Tablet Tablet splitting and balancing Each storage unit has many tablets (horizontal partitions of the table) Storage unit may become a hotspot Tablets may grow over time Overfull tablets split Shed load by moving tablets to other servers

  13. Query processing

  14. MIN-Canteloupe SU1 Canteloupe-Lime SU3 Lime-Strawberry SU2 Strawberry-MAX SU1 Grapefruit…Pear? Grapefruit…Lime? Lime…Pear? Storage unit 1 Storage unit 2 Storage unit 3 Range queries Apple Avocado Banana Blueberry Canteloupe Grape Kiwi Lemon Router Lime Mango Orange Strawberry Tomato Watermelon

  15. Write key k SU SU SU 3 8 5 4 2 6 1 7 Updates Sequence # for key k Write key k Routers Message brokers Write key k Sequence # for key k SUCCESS Write key k

  16. Yahoo Message Bus • Distributed publish-subscribe service • Guarantees delivery once a message is published • Logging at site where message is published, and at other sites when received • Guarantees messages published to a particular cluster will be delivered in same order at all other clusters • Record updates are published to YMB by master copy • All replicas subscribe to the updates, and get them in same order for a particular record

  17. Asynchronous replication and consistency

  18. Asynchronous replication

  19. Consistency model • Goal: make it easier for applications to reason about updates and cope with asynchrony • What happens to a record with primary key “Brian”? Record inserted Delete Update Update Update Update Update Update Update v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Time Generation 1

  20. Consistency model Read Stale version Current version Stale version v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1

  21. Consistency model Read up-to-date Stale version Current version Stale version v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1

  22. Consistency model Read-critical(required version): Read ≥ v.6 Stale version Current version Stale version v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1

  23. Consistency model Write Stale version Current version Stale version v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1

  24. Consistency model Write if = v.7 Test-and-set-write(required version) ERROR Stale version Current version Stale version v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1

  25. Mechanism: per record mastership Consistency model Write if = v.7 ERROR Stale version Current version Stale version v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1

  26. Record and Tablet Mastership • Data in PNUTS is replicated across sites • Hidden field in each record stores which copy is the master copy • updates can be submitted to any copy • forwarded to master, applied in order received by master • Record also contains origin of last few updates • Mastership can be changed by current master, based on this information • Mastership change is simply a record update • Tablets mastership • Required to ensure primary key consistency • Can be different from record mastership

  27. Other Features • Per record transactions • Copying a tablet (on failure, for e.g.) • Request copy • Publish checkpoint message • Get copy of tablet as of when checkpoint is received • Apply later updates • Tablet split • Has to be coordinated across all copies

  28. Query Processing • Range scan can span tablets • Only one tablet scanned at a time • Client may not need all results at once • Continuation object returned to client to indicate where range scan should continue • Notification • One pub-sub topic per tablet • Client knows about tables, does not know about tablets • Automatically subscribed to all tablets, even as tablets are added/removed. • Usual problem with pub-sub: undelivered notifications, handled in usual way

  29. Experiments

  30. Experimental setup • Production PNUTS code • Enhanced with ordered table type • Three PNUTS regions • 2 west coast, 1 east coast • 5 storage units, 2 message brokers, 1 router • West: Dual 2.8 GHz Xeon, 4GB RAM, 6 disk RAID 5 array • East: Quad 2.13 GHz Xeon, 4GB RAM, 1 SATA disk • Workload • 1200-3600 requests/second • 0-50% writes • 80% locality

  31. Inserts • Inserts • required 75.6 ms per insert in West 1 (tablet master) • 131.5 ms per insert into the non-master West 2, and • 315.5 ms per insert into the non-master East.

  32. 10% writes by default

  33. Scalability

  34. Request skew

  35. Size of range scans

  36. Related work • Distributed and parallel databases • Especially query processing and transactions • BigTable, Dynamo, S3, SimpleDB, SQL Server Data Services, Cassandra • Distributed filesystems • Ceph, Boxwood, Sinfonia • Distributed (P2P) hash tables • Chord, Pastry, … • Database replication • Master-slave, epidemic/gossip, synchronous…

  37. Conclusions and ongoing work • PNUTS is an interesting researchproduct • Research: consistency, performance, fault tolerance, rich functionality • Product: make it work, keep it (relatively) simple, learn from experience and real applications • Ongoing work • Indexes and materialized views • Bundled updates • Batch query processing

  38. Thanks! • cooperb@yahoo-inc.com • research.yahoo.com

More Related