Explore the features and benefits of BigTable, a distributed storage system designed for structured data. Learn about its data model, scalability, and implementation.
BigTable: Distributed Storage System for Structured Data
• Sergejs Melderis
Dennis Kafura – CS5204 – Operating Systems
Unstructured Data vs. Structured Data
• Unstructured data is computerized information that does not have a data model
  • plain text, audio
• Structured data can be described by a data model
  • Flat
  • Hierarchical
  • Network
  • Relational
  • Dimensional
  • Object-relational
Relational Model and RDBMS
• most popular model for organizing structured data
• model based on first-order predicate logic
• provides a declarative method for specifying data and queries via SQL
• data is organized in tables of fixed-length records
• variety of open source and commercial implementations
• provides ACID properties
NoSQL
• not a relational database
• no fixed table schemas
• no join operations
• no SQL
• flexible data model, or none at all
• usually does not provide ACID properties
• scales horizontally
BigTable
• distributed, high-performance, fault-tolerant NoSQL storage system built on top of the Google File System
• designed to scale to a very large size on low-cost commodity hardware
• designed by Google and used in various projects (e.g., web indexing)
• the paper was published in 2006
• related implementations
  • HBase
  • Hypertable
  • Apache Cassandra
  • Neptune
BigTable Data Model
• sparse, distributed, persistent multi-dimensional sorted map
• the map is indexed by a row key, column family, column key, and a timestamp
• { row : { column_family : { column : { timestamp : value } } } }
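The nested-map view above can be sketched with plain Python dictionaries. This is only an illustration of the indexing scheme, not BigTable's storage format; the helper names `put` and `get_latest` are hypothetical, and the sample row is the student row used later in these slides.

```python
def put(table, row, family, column, timestamp, value):
    """Store one version of a cell: row -> family -> column -> timestamp -> value."""
    cell = table.setdefault(row, {}).setdefault(family, {}).setdefault(column, {})
    cell[timestamp] = value


def get_latest(table, row, family, column):
    """Return the value with the largest timestamp, or None if the cell is absent."""
    versions = table.get(row, {}).get(family, {}).get(column, {})
    return versions[max(versions)] if versions else None


table = {}  # sparse: only cells that were written take up space
put(table, "905514", "info", "first_name", 1, "Sergejs")
put(table, "905514", "info", "major", 1, "Comp Science")
print(get_latest(table, "905514", "info", "first_name"))
```

Reading a missing cell goes through `dict.get`, so lookups never create empty entries and the map stays sparse.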
Webtable
[figure: row “com.cnn.www” with columns “contents”, “anchor:cnnsi.com”, and “anchor:my.look.ca”; cell versions labeled t6, t9, t9]
Relational Data Model
Student table
[figure: row key is student_id; each column family contains column qualifiers]
Course table
[figure: row key is crn; each column family contains column qualifiers]
Example
• row “905514”: info:first_name, info:last_name, info:major, courses:96322, courses:96320
• row “96322”: info:course, info:title, info:instructor_id, students:905514, students:905520
Students data view in JSON
{
  "905514": {
    "info": {
      "first_name": { "t1": "Sergejs" },
      "last_name": { "t1": "Melderis" },
      "major": { "t1": "Comp Science" }
    },
    "courses": {
      "96322": { "t1": "YES" },
      "96320": { "t2": "NO" }
    }
  }
}
Rows
• row keys are arbitrary strings of up to 64 KB
• every read or write of data under a single row key is atomic
• rows are kept in lexicographic order by row key
• the row range is dynamically partitioned into blocks called tablets
• tablets are the units of distribution and load balancing
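The ordering and partitioning described above can be sketched as follows. This is a simplified illustration: the sketch splits by a fixed number of rows per tablet, which is an assumption made for brevity (BigTable splits by tablet size), and the function names and all row keys except “com.cnn.www” are made up.

```python
import bisect


def split_into_tablets(row_keys, max_rows_per_tablet):
    """Partition lexicographically sorted row keys into contiguous ranges."""
    ordered = sorted(row_keys)
    return [ordered[i:i + max_rows_per_tablet]
            for i in range(0, len(ordered), max_rows_per_tablet)]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key.

    Assumes at least one tablet. Each tablet covers keys up to and including
    its last key; keys beyond the final end key fall into the last tablet.
    """
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect.bisect_left(end_keys, row_key)
    return min(index, len(tablets) - 1)


keys = ["com.cnn.www", "org.apache.hbase", "com.bbc.www", "edu.vt.cs", "com.google.maps"]
tablets = split_into_tablets(keys, 2)
print(tablets)
print(find_tablet(tablets, "com.cnn.www"))
```

Because the keys are sorted, rows with a common prefix (here, pages under the same reversed domain) land in the same or adjacent tablets, which keeps short range scans on few servers.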
Columns
• column keys are grouped into column families
• the column family is the basic unit of access control
• all data stored in a column family is usually of the same type
• the number of column families should be small
• a table can have an unbounded number of columns
• a column key is named using the syntax family:qualifier
Timestamps
• BigTable can store multiple versions of the same data
• timestamps are 64-bit integers assigned by BigTable or by the client
• the client can specify that only the last n versions of the data be kept
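The "keep up to n versions" setting can be illustrated with a small sketch. It assumes a cell is a dictionary from timestamp to value and that old versions are dropped eagerly on every write; both are simplifications for illustration, and `write_version` is a hypothetical name.

```python
def write_version(cell, timestamp, value, max_versions):
    """Add a version to a cell and keep only the max_versions newest timestamps."""
    if max_versions < 1:
        raise ValueError("max_versions must be at least 1")
    cell[timestamp] = value
    # sorted(cell) lists timestamps oldest first; everything before the
    # last max_versions entries is dropped.
    for old_timestamp in sorted(cell)[:-max_versions]:
        del cell[old_timestamp]


cell = {}
for ts in (1, 2, 3, 4):
    write_version(cell, ts, f"v{ts}", max_versions=2)
print(cell)
```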
Implementation
• client library
• one master server
• distributed lock service called Chubby
• many tablet servers, each serving several tablets
• tablet server
  • handles read and write requests
  • automatically splits tablets that have grown too large (100-200 MB)
• client data goes directly to the tablet servers
Tablet Location
• a three-level hierarchy stores tablet locations
• the first level is a file in the lock service (Chubby) that holds the location of the root tablet
• the root tablet contains the locations of the METADATA tablets
• METADATA tablets contain the locations of user tablets
[figure: Lock Service -> Root tablet -> METADATA tablets -> UserTable1, UserTable2]
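The three-level lookup can be sketched as below. Everything here is an assumption made for illustration: each tablet is modeled as a sorted list of `(end_key, target)` pairs, keys are `(table, row_key)` tuples, `MAX_KEY` is a sentinel for "end of table", and names such as `meta-1` and `tabletserver-1` are invented.

```python
import bisect

MAX_KEY = "\uffff"  # sentinel that sorts after ordinary row keys in this sketch


def lookup(index, key):
    """Return the target of the first entry whose end_key is >= key."""
    end_keys = [end_key for end_key, _ in index]
    position = bisect.bisect_left(end_keys, key)
    if position == len(index):
        raise KeyError(key)
    return index[position][1]


def locate_tablet(chubby, stored, table, row_key):
    """Walk lock service -> root tablet -> METADATA tablet -> user tablet location."""
    key = (table, row_key)
    root_tablet = stored[chubby["root_tablet_location"]]  # level 1
    metadata_tablet = stored[lookup(root_tablet, key)]    # level 2
    return lookup(metadata_tablet, key)                   # level 3


chubby = {"root_tablet_location": "root"}
stored = {
    "root": [(("UserTable1", MAX_KEY), "meta-1"), (("UserTable2", MAX_KEY), "meta-2")],
    "meta-1": [(("UserTable1", "m"), "tabletserver-1"),
               (("UserTable1", MAX_KEY), "tabletserver-2")],
    "meta-2": [(("UserTable2", MAX_KEY), "tabletserver-3")],
}
print(locate_tablet(chubby, stored, "UserTable1", "apple"))
```

Only the first hop touches the lock service; the remaining hops read ordinary tablets, so a client that caches these answers can skip most of the walk on later requests.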
Distribution of data
• one master server
• Chubby distributed lock service
• hundreds or thousands of tablet servers
• each tablet contains a contiguous range of rows
• the master distributes tablets across the tablet servers
• each tablet server holds tablets with different row ranges
Tablet Representation
[figure: memtable (in memory); tablet log and SSTables (in GFS); Read Op; Write Op]
Compactions
• compactions move data from the memtable into SSTables and merge existing SSTables
• a minor compaction writes the memtable to a new SSTable
  • shrinks the memory usage of the tablet server
  • reduces the amount of commit log that must be read during recovery
• a merging compaction merges several SSTables
• a major compaction rewrites all SSTables into exactly one SSTable
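The write path and the compactions above can be sketched with a toy in-memory tablet. This is a deliberately simplified model, not the real implementation: SSTables are represented as key-sorted dictionaries, deletions and timestamps are ignored, and the class and method names are hypothetical. The major compaction here also flushes the memtable first, which is an assumption of this sketch.

```python
class ToyTablet:
    def __init__(self):
        self.log = []        # commit log entries not yet covered by an SSTable
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # key-sorted dicts treated as immutable, oldest first

    def write(self, key, value):
        self.log.append((key, value))  # log first, then apply to the memtable
        self.memtable[key] = value

    def read(self, key):
        # Check newest data first: memtable, then SSTables from newest to oldest.
        for source in [self.memtable, *reversed(self.sstables)]:
            if key in source:
                return source[key]
        return None

    def minor_compaction(self):
        """Freeze the memtable into a new SSTable and truncate the log."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.log.clear()

    def major_compaction(self):
        """Rewrite everything into exactly one SSTable."""
        self.minor_compaction()
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values win
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))] if merged else []


tablet = ToyTablet()
tablet.write("b", 2)
tablet.write("a", 1)
tablet.minor_compaction()
tablet.write("a", 3)
tablet.major_compaction()
print(tablet.sstables)
```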
API
• create and delete tables and column families
• write or delete values
• look up values from individual rows
• scan over a subset of the data in a table
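To make the four groups of operations concrete, here is a toy in-memory table. It is not the BigTable client API: `ToyTable` and all of its method names are invented for this sketch, timestamps are left out, and the sample data reuses the student rows from the earlier example.

```python
class ToyTable:
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row_key -> {"family:qualifier": value}

    def add_family(self, family):
        self.families.add(family)

    def delete_family(self, family):
        self.families.discard(family)
        for columns in self.rows.values():
            for column in [c for c in columns if c.split(":", 1)[0] == family]:
                del columns[column]

    def write(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column] = value

    def delete(self, row_key, column):
        self.rows.get(row_key, {}).pop(column, None)

    def lookup(self, row_key):
        """Return a copy of all columns stored under one row key."""
        return dict(self.rows.get(row_key, {}))

    def scan(self, start_key, end_key):
        """Yield (row_key, columns) for start_key <= row_key < end_key, in key order."""
        for row_key in sorted(self.rows):
            if start_key <= row_key < end_key:
                yield row_key, dict(self.rows[row_key])


students = ToyTable(["info", "courses"])
students.write("905514", "info:first_name", "Sergejs")
students.write("905514", "courses:96322", "YES")
students.write("905520", "courses:96322", "YES")
print(students.lookup("905514"))
print(list(students.scan("905514", "905520")))
```

Column families must exist before a write succeeds, while qualifiers are created on first use, which mirrors the "few families, many columns" guidance on the Columns slide.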