
Bigtable: A Distributed Storage System for Structured Data

Bigtable: A Distributed Storage System for Structured Data. 0256803 高睿鴻. Introduction. Petabytes of data across thousands of commodity servers. Goals: wide applicability, scalability, high performance, and high availability. Products: Google Analytics, Google Earth, Personalized Search, …


Presentation Transcript


  1. Bigtable: A Distributed Storage System for Structured Data 0256803 高睿鴻

  2. Introduction • Bigtable manages petabytes of data across thousands of commodity servers. • Goals: wide applicability, scalability, high performance, and high availability. • Products: Google Analytics, Google Earth, Personalized Search, …

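  3. Data Model • A Bigtable is a sparse, distributed, persistent, multidimensional sorted map. • The map is indexed by row key, column key, and timestamp; each value is an uninterpreted array of bytes.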
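
As a rough illustration of the (row key, column key, timestamp) indexing on this slide, the sketch below models a table as a plain dictionary. `TinyTable`, its method names, and the example keys are hypothetical and only stand in for the idea; they are not Bigtable's API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for a table; not Bigtable's real API.
Cell = Tuple[str, str, int]  # (row key, column key, timestamp)


class TinyTable:
    def __init__(self) -> None:
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Each (row, column, timestamp) triple addresses one uninterpreted byte string.
        self._cells[(row, column, timestamp)] = value

    def get_latest(self, row: str, column: str) -> Optional[bytes]:
        """Return the value with the highest timestamp for (row, column)."""
        versions = [(ts, value) for (r, c, ts), value in self._cells.items()
                    if r == row and c == column]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


# Illustrative keys only.
table = TinyTable()
table.put("com.example.www", "contents:", 3, b"<html>v3")
table.put("com.example.www", "contents:", 5, b"<html>v5")
table.put("com.example.www", "anchor:other.example", 9, b"Example")
latest = table.get_latest("com.example.www", "contents:")
```

Keeping the timestamp in the key is what lets several versions of the same cell coexist; a real system would also keep the keys sorted, which this sketch skips.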

  4. Chubby • Highly available, persistent distributed lock service. • Used by Bigtable to ensure that there is at most one active master at any time. • Used to discover tablet servers and finalize tablet server deaths. • Stores Bigtable schema information. • Stores access control lists.
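
The "at most one active master" guarantee can be pictured as an exclusive lock that only one candidate can hold. The sketch below is a single-process toy under that assumption; `ToyLockService`, the lock name, and the candidate names are all hypothetical and do not reflect Chubby's real interface.

```python
class ToyLockService:
    """Hypothetical stand-in for a lock service; not Chubby's real interface."""

    def __init__(self):
        self._holders = {}  # lock name -> current owner

    def try_acquire(self, lock_name, owner):
        # setdefault stores `owner` only if the lock is currently free.
        return self._holders.setdefault(lock_name, owner) == owner

    def release(self, lock_name, owner):
        # Only the current holder may release the lock.
        if self._holders.get(lock_name) == owner:
            del self._holders[lock_name]


locks = ToyLockService()
candidates = ["master-a", "master-b"]

# Both candidates try to become master; only the first one succeeds.
active = [c for c in candidates if locks.try_acquire("master-lock", c)]

# Once the active master gives up the lock, another candidate can take over.
locks.release("master-lock", "master-a")
took_over = locks.try_acquire("master-lock", "master-b")
```

A real lock service also has to cope with sessions expiring and with network partitions, which this toy ignores entirely.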

  5. Tablet • A table consists of a set of tablets. • Each tablet contains all data associated with a row range. • Each tablet is approximately 100-200 MB in size by default.
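
Because each tablet covers a contiguous row range, finding the tablet for a row key amounts to a search over sorted range boundaries. The following is a minimal sketch of that idea; the boundary keys, tablet names, and `find_tablet` are invented for illustration.

```python
import bisect

# Hypothetical boundaries: tablet i holds rows up to and including
# tablet_end_keys[i]; the last tablet holds everything after the final key.
tablet_end_keys = ["g", "n", "t"]
tablet_names = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]


def find_tablet(row_key: str) -> str:
    """Return the name of the tablet whose row range contains row_key."""
    # bisect_left finds the first end key that is >= row_key.
    return tablet_names[bisect.bisect_left(tablet_end_keys, row_key)]
```

For example, `find_tablet("apple")` lands in the first range and `find_tablet("zebra")` falls past the last boundary into the final tablet.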

  6. Tablet

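  7. SSTable • The SSTable file format is used internally to store Bigtable data. • Provides a persistent, ordered, immutable map from keys to values. • Disk vs. memory: a lookup takes a single disk seek using the in-memory block index; optionally, an SSTable can be mapped entirely into memory.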
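
To make the "ordered immutable map" idea concrete, here is a small sketch that stores sorted entries in fixed-size blocks and keeps an index of each block's first key. `ToySSTable`, the block size, and the index layout are assumptions for illustration; this is not the actual SSTable file format.

```python
import bisect


class ToySSTable:
    """Hypothetical immutable sorted map; blocks stand in for on-disk blocks."""

    def __init__(self, items, block_size=2):
        entries = sorted(items.items())
        # Tuples keep the contents immutable after construction.
        self._blocks = [tuple(entries[i:i + block_size])
                        for i in range(0, len(entries), block_size)]
        # Small index: the first key of every block.
        self._index = [block[0][0] for block in self._blocks]

    def get(self, key):
        # Pick the last block whose first key is <= key, then search only it.
        pos = bisect.bisect_right(self._index, key) - 1
        if pos < 0:
            return None
        for k, v in self._blocks[pos]:
            if k == key:
                return v
        return None

    def scan(self):
        # Iterate over all entries in key order.
        for block in self._blocks:
            yield from block


sst = ToySSTable({b"b": b"2", b"a": b"1", b"d": b"4", b"c": b"3", b"e": b"5"})
```

Only one block is examined per lookup, which mirrors why a small index kept in memory can limit a read to a single block fetch.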

  8. Tablet representation

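  9. Compactions • The memtable grows as write operations execute. • Minor compaction: when the memtable reaches a size threshold, it is frozen, converted to an SSTable, and written to GFS (old memtable → SSTable → GFS). • Merging compaction: a few SSTables and the memtable are merged into a new SSTable (old SSTables + memtable → new SSTable).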
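
The two compaction steps on this slide can be sketched with plain Python containers. In this toy, an in-memory list stands in for SSTables stored in GFS and the threshold counts entries rather than bytes; `ToyTablet` and all of its names are hypothetical.

```python
class ToyTablet:
    """Hypothetical tablet with a memtable and a list of immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted tuple of (key, value)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        # Freeze the memtable into a new SSTable and start an empty memtable.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def merging_compaction(self):
        # Merge every SSTable plus the memtable; newer values overwrite older ones.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        merged.update(self.memtable)
        self.sstables = [tuple(sorted(merged.items()))]
        self.memtable = {}

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


tablet = ToyTablet(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 9), ("c", 3), ("d", 4)]:
    tablet.write(key, value)
sstables_before = len(tablet.sstables)
tablet.merging_compaction()
```

Reads must consult every SSTable until a merge runs, which is the motivation for periodically folding them into one.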

  10. Architecture

  11. Performance • For random reads, a tablet server executes approximately 1200 reads per second. • Per-server throughput drops significantly when going from 1 to 50 tablet servers.

  12. Performance • Load imbalance in multi-server configurations. • Other processes contend for CPU and network. • Aggregate throughput grows by about 100-fold while the number of tablet servers grows 500-fold. • Random reads transfer a 64 KB block over the network for every 1000-byte read.
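
The figures on the two performance slides can be combined into a quick back-of-the-envelope calculation. The sketch below assumes 64 KB means 64 × 1024 bytes and treats the 100-fold and 500-fold numbers as exact, which the slides only state approximately; the variable names are illustrative.

```python
READS_PER_SECOND = 1200        # random reads per tablet server (slide 11)
BLOCK_BYTES = 64 * 1024        # block transferred per read (assumes 64 KB = 65,536 bytes)
VALUE_BYTES = 1000             # bytes actually requested per read

# Bytes pulled over the network per second by one tablet server.
transferred_per_second = READS_PER_SECOND * BLOCK_BYTES
transferred_mib_per_second = transferred_per_second / (1024 * 1024)

# Fraction of the transferred bytes that the reader actually asked for.
useful_fraction = (READS_PER_SECOND * VALUE_BYTES) / transferred_per_second

# If aggregate throughput grows 100-fold on 500 times as many servers,
# each server delivers roughly this fraction of single-server throughput.
per_server_efficiency = 100 / 500

print(f"{transferred_mib_per_second:.1f} MiB/s transferred per server")
print(f"{useful_fraction:.1%} of transferred bytes are useful")
print(f"{per_server_efficiency:.0%} per-server efficiency at 500 servers")
```

Under these assumptions each server moves about 75 MiB/s to serve roughly 1.2 MB/s of requested data, which helps explain why random reads scale worst as the network becomes shared.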

  13. Real Applications • Google Analytics: JavaScript, raw click table, summary table • Google Earth: satellite imagery, imagery table • Personalized Search: web search, images, news

  14. Conclusion • Google gained a substantial amount of flexibility from designing its own data model for Bigtable. • Control over the implementation means bottlenecks and inefficiencies can be removed as they arise.
