Bigtable: A Distributed Storage System for Structured Data

Bigtable: A Distributed Storage System for Structured Data. Fay Chang et al. (Google, Inc.). Presenter: Kyungho Jeon (kyunghoj@buffalo.edu). Motivation and design goal: a distributed storage system for structured data that scales to petabytes of data on thousands of (commodity) machines.

Presentation Transcript


  1. Bigtable: A Distributed Storage System for Structured Data • Fay Chang et al. (Google, Inc.) • Presenter: Kyungho Jeon (kyunghoj@buffalo.edu) • Fall 2012: CSE 704 Web-scale Data Management

  2. Motivation and Design Goal • Distributed Storage System for Structured Data • Scalability • Petabytes of data on thousands of (commodity) machines • Wide Applicability • Throughput-oriented and latency-sensitive workloads • High Performance • High Availability

  3. Data Model

  4. Data Model • Not a full relational data model • Provides a simple data model • Supports dynamic control over data layout • Allows clients to reason about the locality properties of their data

  5. Data Model – A Big Table • A table in Bigtable is a sparse, distributed, persistent, multidimensional sorted map

  6. Data Model • Data is indexed using row and column names • Data is treated as uninterpreted strings • (row:string, column:string, time:int64) → string • Data locality can be controlled through careful choices of the schema
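
The mapping on this slide can be pictured as a dictionary keyed by (row, column, timestamp). The following is a minimal in-memory sketch, not Bigtable's implementation; the class name `ToyCellMap` and its methods are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional, Tuple

# (row key, column key, timestamp) -> uninterpreted bytes
CellKey = Tuple[str, str, int]


class ToyCellMap:
    """Hypothetical toy model of the (row, column, time) -> string map."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Each (row, column, timestamp) triple names exactly one stored value.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        # Return the newest version at or before `timestamp`
        # (or the newest version overall when no timestamp is given).
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions)[1]
```

A real table is sparse and distributed across many machines; the sketch only shows how the three-part key addresses a value and how several timestamped versions can coexist.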

  7. Data Model • Rows • Data is maintained in lexicographic order by row key • Tablet: a range of rows with consecutive keys • Tablets are the unit of distribution and load balancing • Columns • Column families • Column key: family:qualifier • Cells • Timestamps

  8. Data Model – WebTable Example • A large collection of web pages and related information

  9. Data Model – WebTable Example: Row Key • Tablet: a group of rows with consecutive keys; the unit of distribution • Bigtable maintains data in lexicographic order by row key
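
To make the link between sorted row keys and tablets concrete, here is a small sketch. It assumes reversed-hostname row keys (for example `com.cnn.www`) so that pages from one domain sort next to each other, and it splits by row count purely for simplicity; the function names are hypothetical.

```python
import bisect
from typing import List


def reverse_hostname(host: str) -> str:
    # "www.cnn.com" -> "com.cnn.www", so rows of one domain become neighbours.
    return ".".join(reversed(host.split(".")))


def split_into_tablets(row_keys: List[str], rows_per_tablet: int) -> List[List[str]]:
    # Rows are kept in lexicographic order; each tablet is a consecutive range.
    ordered = sorted(row_keys)
    return [ordered[i:i + rows_per_tablet]
            for i in range(0, len(ordered), rows_per_tablet)]


def locate_tablet(tablets: List[List[str]], row_key: str) -> int:
    # Binary search over each tablet's last row key to find the covering tablet.
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect.bisect_left(end_keys, row_key)
    return min(index, len(tablets) - 1)
```

Because a tablet holds a contiguous key range, a read of a short row range typically touches only a small number of tablets, which is why the choice of row key controls locality.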

  10. Data Model – WebTable Example: Column Family • The column family is the unit of access control

  11. Data Model – WebTable Example: Column • A column key is specified as “family:qualifier”

  12. Data Model – WebTable Example: Column • A column can be added to a column family once that family has been created

  13. Data Model – WebTable Example: Cell • Cell: the storage referenced by a particular row key, column key, and timestamp

  14. Data Model – WebTable Example • Each cell in a table can contain multiple versions of the same data, indexed by timestamp

  15. API

  16. API • Write or delete values in Bigtable • Look up values from individual rows • Iterate over a subset of the data in a table

  17. API – Update a Row

  18. API – Update a Row • Open a table

  19. API – Update a Row • Create a mutation for the row

  20. API – Update a Row • Store a new item under the column key “anchor:www.c-span.org”

  21. API – Update a Row • Delete the item under the column key “anchor:www.abc.com”

  22. API – Update a Row • Apply the mutation atomically
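
The update steps on slides 18 to 22 can be imitated with a toy model. This is a sketch in Python, not the real client API: the classes `ToyRowMutation` and `ToyRowStore`, and the row key `com.cnn.www`, are assumptions made for illustration. Atomicity is modelled by building a new copy of the row and swapping it in with a single assignment.

```python
from typing import Dict, List, Optional, Tuple


class ToyRowMutation:
    """Hypothetical batch of changes that all target one row."""

    def __init__(self, row_key: str) -> None:
        self.row_key = row_key
        self.ops: List[Tuple[str, str, Optional[str]]] = []

    def set(self, column: str, value: str) -> None:
        self.ops.append(("set", column, value))

    def delete(self, column: str) -> None:
        self.ops.append(("delete", column, None))


class ToyRowStore:
    """Hypothetical in-memory table: row key -> {column key: value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}

    def apply(self, mutation: ToyRowMutation) -> None:
        # Work on a private copy so readers never observe a half-applied mutation.
        new_row = dict(self.rows.get(mutation.row_key, {}))
        for op, column, value in mutation.ops:
            if op == "set":
                new_row[column] = value
            else:
                new_row.pop(column, None)
        # Single assignment publishes every change in the mutation at once.
        self.rows[mutation.row_key] = new_row
```

A usage pattern mirroring the slides would create a `ToyRowMutation("com.cnn.www")`, call `set("anchor:www.c-span.org", "CNN")` and `delete("anchor:www.abc.com")`, and then pass it to `apply`.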

  23. API – Iterate over a Table • Create a Scanner instance

  24. API – Iterate over a Table • Access the “anchor” column family

  25. API – Iterate over a Table • Specify “return all versions”

  26. API – Iterate over a Table • Specify a row key

  27. API – Iterate over a Table • Iterate over rows
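
The scan steps on slides 23 to 27 can be approximated by a generator over a dictionary of cells. The function `scan_family`, its parameters, and the sample column keys in the test data are hypothetical; this sketch only shows the filtering and ordering, not the real Scanner interface.

```python
from typing import Dict, Iterator, Tuple

CellKey = Tuple[str, str, int]  # (row key, column key, timestamp)


def scan_family(cells: Dict[CellKey, str], row_key: str, family: str,
                all_versions: bool = True) -> Iterator[Tuple[str, int, str]]:
    """Yield (column, timestamp, value) for one row and one column family."""
    matching = [
        (column, timestamp, value)
        for (row, column, timestamp), value in cells.items()
        if row == row_key and column.split(":", 1)[0] == family
    ]
    # Order by column key, newest timestamp first within each column.
    matching.sort(key=lambda entry: (entry[0], -entry[1]))
    seen_columns = set()
    for column, timestamp, value in matching:
        if not all_versions and column in seen_columns:
            continue  # keep only the newest version of each column
        seen_columns.add(column)
        yield column, timestamp, value
```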

  28. API – Other Features • Single-row transactions • Execution of client-supplied scripts in the address space of the servers • Input source and output target for MapReduce jobs

  29. A Typical Google Machine

  30. A Google Cluster

  31. A Google Cluster

  32. Building Blocks • Chubby • Highly available and persistent distributed lock service • GFS • Stores logs and data files • SSTable • Google’s immutable file format • A persistent, ordered, immutable map from keys to values • http://code.google.com/p/leveldb/

  33. Chubby • Highly available and persistent distributed lock service • 5 replicas, one of which is elected master • Uses Paxos to keep the replicas consistent • Provides a namespace that consists of directories and small files

  34. Implementation • Client Library • Master • one and only one! • Tablet Servers • Many

  35. Implementation - Master • Responsible for assigning tablets to tablet servers • Detects the addition and removal of tablet servers • Balances tablet-server load • Garbage-collects files in GFS • Handles schema changes • Single-master system (like GFS)

  36. Tablet Server • Manages a set of tablets • Handles read and write requests to the tablets • Splits tablets that have grown too large

  37. How Does a Client Find a Tablet?

  38. Tablet Assignment • Each tablet is assigned to at most one tablet server at a time • When a tablet is unassigned and a tablet server is available, the master assigns the tablet by sending a tablet load request • Bigtable uses Chubby to keep track of tablet servers

  39. Tablet Assignment • Detecting a tablet server that is no longer serving its tablets • The master periodically asks each tablet server for the status of its lock • If a tablet server reports that it has lost its lock, or if the master cannot reach a tablet server: • The master attempts to acquire an exclusive lock on the server’s file • If the lock is acquired, Chubby is alive, so the tablet server must have a problem • The master deletes the server’s file in Chubby to ensure the tablet server can never serve again • The master then moves all the tablets that were previously assigned to that server into the set of unassigned tablets
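
The failure-handling steps on this slide can be sketched with a stand-in for Chubby. `ToyChubby` is a hypothetical in-memory lock table, not the real service, and `handle_suspect_server` is an illustrative name; the sketch only shows the order of operations the master follows.

```python
from typing import Dict, Optional, Set


class ToyChubby:
    """Hypothetical stand-in: file path -> current lock holder (None if free)."""

    def __init__(self) -> None:
        self.files: Dict[str, Optional[str]] = {}

    def create_and_lock(self, path: str, holder: str) -> None:
        self.files[path] = holder

    def release(self, path: str) -> None:
        if path in self.files:
            self.files[path] = None

    def try_acquire(self, path: str, holder: str) -> bool:
        # Exclusive lock: succeeds only if the file exists and nobody holds it.
        if path in self.files and self.files[path] is None:
            self.files[path] = holder
            return True
        return False

    def delete(self, path: str) -> None:
        self.files.pop(path, None)


def handle_suspect_server(chubby: ToyChubby, server_file: str,
                          assignments: Dict[str, Set[str]],
                          unassigned: Set[str]) -> bool:
    """Run the master's steps for a server that lost its lock or is unreachable."""
    if not chubby.try_acquire(server_file, "master"):
        return False  # the server still holds its lock; leave it alone
    # Lock acquired: Chubby is alive, so the problem is with the tablet server.
    chubby.delete(server_file)  # the server can never serve again
    unassigned.update(assignments.pop(server_file, set()))
    return True
```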

  40. Tablet Assignment • When a master is started, it: • Grabs a unique master lock in Chubby • Scans the servers directory in Chubby to find the live servers • Communicates with every live tablet server to discover which tablets are already assigned • Scans the METADATA table and adds any tablets not already assigned to the set of unassigned tablets
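
The four startup steps can be condensed into one function over plain Python data. Everything here is a simplifying assumption: the lock table is a dictionary, the lock name `master-lock` is invented, and the servers' answers are passed in as a pre-built mapping instead of being fetched over RPC.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def master_startup(chubby_locks: Dict[str, Optional[str]],
                   live_servers: Iterable[str],
                   server_reports: Dict[str, Set[str]],
                   metadata_tablets: Iterable[str]
                   ) -> Tuple[Dict[str, str], Set[str]]:
    """Return (tablet -> server assignments, set of unassigned tablets)."""
    # Step 1: grab the unique master lock so that only one master is active.
    if chubby_locks.get("master-lock") is not None:
        raise RuntimeError("another master already holds the lock")
    chubby_locks["master-lock"] = "master"

    # Steps 2 and 3: ask every live server which tablets it already serves.
    assigned: Dict[str, str] = {}
    for server in live_servers:
        for tablet in server_reports.get(server, set()):
            assigned[tablet] = server

    # Step 4: any tablet in METADATA that no live server reported is unassigned.
    unassigned = set(metadata_tablets) - set(assigned)
    return assigned, unassigned
```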

  41. Tablet Serving

  42. Tablet Serving • Memtable • A sorted buffer • Maintains the updates on a row-by-row basis • Each row is copy-on-write to maintain row-level consistency • Older updates are stored in a sequence of SSTables

  43. Tablet Serving

  44. Tablet Serving - Write • Write operation • The server checks that the operation is valid • A valid mutation is written to the commit log • After the write has been committed, its contents are inserted into the memtable
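
The write path above (validate, log, then update the memtable) can be sketched as follows. The class `ToyTabletWriter` is hypothetical, the commit log is a Python list standing in for a file in GFS, and the validity check (the column family must already exist) is an assumed placeholder for whatever checks the server actually performs.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, int, str]  # (row, column, timestamp, value)


class ToyTabletWriter:
    """Hypothetical sketch of the order of steps in a tablet write."""

    def __init__(self, allowed_families: Iterable[str]) -> None:
        self.allowed_families = set(allowed_families)
        self.commit_log: List[Record] = []  # stand-in for the log in GFS
        self.memtable: Dict[Tuple[str, str, int], str] = {}

    def write(self, row: str, column: str, timestamp: int, value: str) -> None:
        # 1. Validity check (assumed rule: the column family must exist).
        family = column.split(":", 1)[0]
        if family not in self.allowed_families:
            raise ValueError(f"unknown column family: {family}")
        # 2. Record the mutation in the commit log first, for durability.
        self.commit_log.append((row, column, timestamp, value))
        # 3. Only after the log write is the memtable updated.
        self.memtable[(row, column, timestamp)] = value
```

Logging before touching the memtable means an acknowledged write can be replayed from the log if the server fails before the memtable is flushed.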

  45. Tablet Serving

  46. Tablet Serving - Read • Read operation • The server checks that the operation is valid • A valid operation is executed on a merged view of the sequence of SSTables and the memtable • The merged view can be formed efficiently because the SSTables and the memtable are lexicographically sorted data structures
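
Because every source is already sorted, the merged view can be produced by a streaming k-way merge. The sketch below uses Python's `heapq.merge`; the function name `merged_view` and the representation of each source as a sorted list of (key, value) pairs are assumptions for illustration.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple

Entry = Tuple[str, str]  # (key, value)


def merged_view(memtable: Sequence[Entry],
                sstables: List[Sequence[Entry]]) -> Iterator[Entry]:
    """Merge sorted sources; `sstables` is ordered newest first.

    When the same key appears in several sources, the newest source wins.
    """
    sources = [memtable] + list(sstables)  # index 0 is the newest source
    # Tag each entry with its source's age so ties on key sort newest first.
    tagged = [
        [(key, age, value) for key, value in source]
        for age, source in enumerate(sources)
    ]
    last_key = object()  # sentinel that never equals a real key
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:
            yield key, value
            last_key = key
```

The merge never loads or re-sorts whole sources, so its cost is proportional to the number of entries actually read.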

  47. Tablet Serving - Recover

  48. Tablet Serving - Recover • Recovering a tablet • The tablet server reads the tablet’s metadata from the METADATA table • The metadata contains the list of SSTables that comprise the tablet and a set of redo points • The server reads the indices of the SSTables into memory and reconstructs the memtable by applying all of the updates that have committed since the redo points
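
Rebuilding the memtable amounts to replaying the tail of the commit log. This sketch simplifies the set of redo points to a single sequence number and represents the log as a list of (sequence number, key, value) tuples; both are assumptions, as is the function name.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str, str]  # (sequence number, key, value)


def recover_memtable(commit_log: List[LogRecord], redo_point: int) -> Dict[str, str]:
    """Replay every update committed after `redo_point`, in commit order."""
    memtable: Dict[str, str] = {}
    for sequence_number, key, value in commit_log:
        # Updates at or before the redo point are already stored in SSTables.
        if sequence_number > redo_point:
            memtable[key] = value
    return memtable
```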

  49. Compaction • Minor compaction • When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable • Major compaction • Rewrites all SSTables into exactly one SSTable
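
Both compaction types can be sketched on a toy tablet. The class `ToyCompactingTablet` is hypothetical, SSTables are modelled as sorted tuples of (key, value) pairs, and the threshold counts entries where a real system would measure size in bytes.

```python
from typing import Dict, List, Tuple

SSTable = Tuple[Tuple[str, str], ...]  # immutable, sorted (key, value) pairs


class ToyCompactingTablet:
    """Hypothetical sketch of minor and major compaction."""

    def __init__(self, memtable_limit: int) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        self.sstables: List[SSTable] = []  # newest first

    def write(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self) -> None:
        # Freeze the current memtable, start a new one, emit a sorted SSTable.
        frozen, self.memtable = self.memtable, {}
        self.sstables.insert(0, tuple(sorted(frozen.items())))

    def major_compaction(self) -> None:
        # Fold all SSTables into exactly one; apply oldest first so that
        # newer values overwrite older ones.
        merged: Dict[str, str] = {}
        for sstable in reversed(self.sstables):
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]
```

Minor compactions bound memory use and shorten log replay; a major compaction bounds the number of SSTables a read has to consult.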

  50. Compaction • [Figure labels: Write Op; memtable (Memory); Commit Log and SSTables (GFS)]
