Bigtable: A Distributed Storage System for Structured Data 0256803 高睿鴻
Introduction • Petabytes of data across thousands of commodity servers. • Goals: wide applicability, scalability, high performance, and high availability. • Products: Google Analytics, Google Earth, Personalized Search, and others.
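Data Model • A Bigtable is a sparse, distributed, persistent, multidimensional sorted map. • The map is indexed by row key, column key, and timestamp. • Each value is an uninterpreted array of bytes.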
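The indexing scheme on this slide can be sketched in a few lines of Python. This is only an illustration of the (row key, column key, timestamp) → bytes mapping, not Bigtable's API: the class name `TinyTable` and its `put`/`get` methods are hypothetical, and everything lives in one in-memory dictionary.

```python
from typing import Dict, Optional, Tuple


class TinyTable:
    """Hypothetical in-memory model of a (row, column, timestamp) -> bytes map."""

    def __init__(self) -> None:
        # (row key, column key) -> {timestamp: value}
        self._cells: Dict[Tuple[str, str], Dict[int, bytes]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3</html>")
table.put("com.cnn.www", "contents:", 5, b"<html>v5</html>")
latest = table.get("com.cnn.www", "contents:")
older = table.get("com.cnn.www", "contents:", timestamp=4)
```

Here `latest` is the version written at timestamp 5 and `older` is the one written at timestamp 3, which shows how several versions of the same cell coexist.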
Chubby • Highly available and persistent distributed lock service. • Ensures that there is at most one active master at any time. • Discovers tablet servers and finalizes tablet server deaths. • Stores Bigtable schema information. • Stores access control lists.
Tablet • A table consists of a set of tablets. • Each tablet contains all data associated with a row range. • Each tablet is approximately 100 to 200 MB in size by default.
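Because each tablet owns a contiguous row range, finding the tablet for a row reduces to a binary search over the range boundaries. The sketch below assumes each tablet is identified by its inclusive end row key and that a final tablet covers everything after the last boundary; `TabletMap` and `locate` are hypothetical names, and the real system resolves locations through a metadata hierarchy that is not modelled here.

```python
import bisect
from typing import List


class TabletMap:
    """Hypothetical row-range index: tablet i ends at end_keys[i] (inclusive)."""

    def __init__(self, end_keys: List[str]) -> None:
        # Sorted inclusive end keys; one extra tablet covers all later rows.
        self._end_keys = sorted(end_keys)

    def locate(self, row_key: str) -> int:
        """Return the index of the tablet whose row range contains row_key."""
        return bisect.bisect_left(self._end_keys, row_key)


# Three tablets: rows up to "g", rows after "g" up to "p", and rows after "p".
tablets = TabletMap(["g", "p"])
first = tablets.locate("apple")
second = tablets.locate("hotel")
third = tablets.locate("zebra")
```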
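SSTable • The SSTable file format is used internally to store Bigtable data. • It provides a persistent, ordered, immutable map from keys to values. • Disk vs. memory: a lookup uses an in-memory block index and a single disk seek; an SSTable can optionally be mapped entirely into memory.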
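The "ordered immutable map" idea can be illustrated with a small in-memory stand-in. This sketch assumes byte-string keys and values and keeps everything in tuples; it does not model the on-disk block layout or block index, and `SSTableSketch` is a hypothetical name.

```python
import bisect
from typing import Iterator, Mapping, Optional, Tuple


class SSTableSketch:
    """Hypothetical immutable, sorted key -> value map with point and range reads."""

    def __init__(self, items: Mapping[bytes, bytes]) -> None:
        pairs = sorted(items.items())
        # Tuples are never modified after construction, so the map is immutable.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key: bytes) -> Optional[bytes]:
        """Point lookup by binary search over the sorted keys."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield pairs with start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1


sst = SSTableSketch({b"c": b"3", b"a": b"1", b"b": b"2"})
hit = sst.get(b"b")
window = list(sst.scan(b"a", b"c"))
```

Keeping the keys sorted is what makes both the point lookup and the range scan cheap, and immutability means readers need no synchronization.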
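Compactions • The memtable grows as write operations execute. • Minor compaction: when the memtable reaches a size threshold, it is frozen, converted to an SSTable, and written to GFS. • Merging compaction: a few existing SSTables and the memtable are read and written out as one new SSTable.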
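The two compaction steps above can be sketched as plain functions. This is a simplified model under stated assumptions: a memtable is a Python dict, an SSTable is a sorted list of (key, value) pairs, newer data overrides older data, and deletions and timestamps are ignored. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

SSTable = List[Tuple[str, str]]  # sorted (key, value) pairs, treated as immutable


def minor_compaction(memtable: Dict[str, str]) -> SSTable:
    """Freeze a memtable into a sorted run (stands in for writing to GFS)."""
    return sorted(memtable.items())


def merging_compaction(sstables_oldest_first: List[SSTable],
                       memtable: Dict[str, str]) -> SSTable:
    """Merge several SSTables and the memtable into one new SSTable."""
    merged: Dict[str, str] = {}
    for sstable in sstables_oldest_first:
        merged.update(sstable)   # later (newer) tables override earlier ones
    merged.update(memtable)      # the memtable holds the newest writes
    return sorted(merged.items())


old_run = minor_compaction({"row1": "a", "row2": "b"})
newer_run = minor_compaction({"row2": "B", "row3": "c"})
compacted = merging_compaction([old_run, newer_run], {"row3": "C", "row4": "d"})
```

After the merge, each key appears once with its most recent value, which bounds the number of SSTables a read has to consult.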
Performance • For random reads, a tablet server executes approximately 1200 reads per second. • Per-server throughput drops significantly as the number of tablet servers grows from 1 to 50.
Performance • Load imbalance in multi-server configurations. • Other processes contend for CPU and network. • Aggregate throughput grows by roughly a factor of 100 while the number of tablet servers grows by a factor of 500. • Random reads transfer a 64 KB block over the network for every 1000-byte read.
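The cost of the 64 KB transfer per 1000-byte read is easy to quantify. The short calculation below combines it with the figure of roughly 1200 random reads per second per tablet server; it assumes 64 KB means 64 × 1024 bytes, and the function names are illustrative.

```python
BLOCK_BYTES = 64 * 1024      # assumed: 64 KB block = 65,536 bytes
VALUE_BYTES = 1000           # bytes actually requested per read
READS_PER_SECOND = 1200      # approximate random reads per tablet server


def read_amplification(block_bytes: int = BLOCK_BYTES,
                       value_bytes: int = VALUE_BYTES) -> float:
    """Bytes moved over the network per byte the client asked for."""
    return block_bytes / value_bytes


def network_bytes_per_second(reads_per_second: int = READS_PER_SECOND,
                             block_bytes: int = BLOCK_BYTES) -> int:
    """Network traffic one tablet server generates for random reads."""
    return reads_per_second * block_bytes


amplification = read_amplification()
traffic_mib = network_bytes_per_second() / (1024 * 1024)
```

Under these assumptions each read moves about 65 times more data than it returns, and a single server pulls about 75 MiB per second, which helps explain why random reads scale worst as servers share network links.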
Real Applications • Google Analytics: JavaScript, raw click table, summary table. • Google Earth: satellite imagery, imagery table. • Personalized Search: web search, images, news.
Conclusion • Designing its own data model for Bigtable gave Google a substantial amount of flexibility. • Controlling the implementation lets Google remove bottlenecks and inefficiencies as they arise.