LevelDB. Riak. Some key-value stores using log-structure. Zhichao Liang frankey0207@gmail.com. Outline. Why log structure? Riak: log-structure hash table Rethinkdb : log-structure b-tree Leveldb : log-structure merge tree Conclusion. Outline. Why log structure?
LevelDB Riak Some key-value stores using log-structure Zhichao Liang frankey0207@gmail.com
Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
Log Structure • A log-structured file system is a file system design first proposed in 1988 by John K. Ousterhout and Fred Douglis. • It is designed for high write throughput: all updates to data and metadata are written sequentially to a continuous stream, called a log. • Conventional file systems, by contrast, tend to lay out files with great care for spatial locality and make in-place changes to their data structures.
Log Structure for SSD • Random writes degrade system performance and shorten the lifetime of an SSD. • Log structure is natively SSD-friendly! • [Figure: how updates to data blocks are applied on magnetic disk, SSD, and RAM; blocks are labeled free, data, new data, and erased]
Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
Riak ? • Riak is an open source, highly scalable, fault-tolerant distributed database. • Supported core features: - operates in highly distributed environments - no single point of failure - highly fault-tolerant - scales simply and intelligently - high data availability - low cost of operations
Bitcask • A Bitcask instance is a directory, and only one operating system process will open that Bitcask for writing at a given time. • The active file is only written by appending, which means that sequential writes do not require disk seeking.
Hash Index: keydir • A keydir is simply a hash table that maps every key in a Bitcask to a fixed-size structure giving the file, offset and size of the most recently written entry for that key.
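The append-only active file and the keydir can be illustrated with a minimal sketch. This is not Riak's or Bitcask's actual code or on-disk format: the class name `TinyBitcask`, the 8-byte record header, and the single-file layout are assumptions made only for illustration.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 4-byte key size, 4-byte value size, key, value.
HEADER = struct.Struct(">II")


class TinyBitcask:
    """Toy store: one append-only data file plus an in-memory keydir."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}                      # key -> (file, value offset, value size)
        self.active = open(path, "ab")        # the active file is only appended to
        self.end = os.path.getsize(path)      # current end of the active file

    def put(self, key, value):
        record = HEADER.pack(len(key), len(value)) + key + value
        self.active.write(record)
        self.active.flush()
        value_offset = self.end + HEADER.size + len(key)
        # The keydir always points at the most recently written entry.
        self.keydir[key] = (self.path, value_offset, len(value))
        self.end += len(record)

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        path, offset, size = entry
        with open(path, "rb") as f:           # one seek, one read
            f.seek(offset)
            return f.read(size)

    def close(self):
        self.active.close()


# Usage: overwriting a key appends a new record; the old one stays on disk.
with tempfile.TemporaryDirectory() as d:
    db = TinyBitcask(os.path.join(d, "active.data"))
    db.put(b"k", b"v1")
    db.put(b"k", b"v2")
    latest = db.get(b"k")
    db.close()
```

In this sketch a read costs a single hash lookup plus one seek, and stale records are left behind in the file until something reclaims them.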
Merge • The merge process iterates over all non-active files and produces as output a set of data files containing only the “live” or latest versions of each present key.
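The merge step can be sketched as follows. This is a simplified illustration, not Bitcask's implementation: data files are modeled as in-memory lists of `(key, value)` records, and the `TOMBSTONE` marker for deleted keys and the `max_records_per_file` parameter are hypothetical.

```python
TOMBSTONE = None  # hypothetical marker meaning "this key was deleted"


def merge(immutable_files, max_records_per_file=2):
    """Compact non-active files, keeping only the latest live value per key.

    immutable_files: list of files, oldest first; each file is a list of
    (key, value) records, oldest first.
    """
    latest = {}
    for records in immutable_files:
        for key, value in records:
            latest[key] = value               # later records overwrite earlier ones
    live = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]
    # Split the live records into a set of output files.
    return [live[i:i + max_records_per_file]
            for i in range(0, len(live), max_records_per_file)]


# Usage: "a" was overwritten and "b" was deleted in the newer file.
old_files = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4), ("b", TOMBSTONE)],
]
merged = merge(old_files)
```

After a real merge the keydir would also have to be repointed at the new files; that step is omitted here.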
Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
RethinkDB ? • RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol. • Powerful technology: - Linear scaling across cores - Fine-grained durability control - Instantaneous recovery on power failure • Supported core features: - Atomic increment/decrement - Values up to 10MB in size - Multi-GET support - Up to one million transactions per second on commodity hardware
Installation & usage • RethinkDB works on modern 64-bit distributions of Linux: - Ubuntu 10.04.1 x86_64 - Ubuntu 10.10 x86_64 - Red Hat Enterprise Linux 5 x86_64 - CentOS 5 x86_64 - SUSE Linux 10 • Default installation path: /usr/bin/rethinkdb-1.0 • Running the rethinkdb server: - ./rethinkdb-1.0 -f /u01/rethinkdb_data - ./rethinkdb-1.0 -f /u01/rethinkdb_data -c 4 -p 11500 - ./rethinkdb-1.0 -f /u01/rethinkdb_data -f /u03/rethinkdb_data -c 4 -p 11500
The methodology • Firstly, the lack of mechanical parts makes random reads on SSD significantly more efficient! • Secondly, random writes trigger more erases, making these operations expensive and decreasing the drive lifetime! • RethinkDB takes an append-only approach to storing data, pioneered by log-structured file systems! What are the consequences of append-only?
Append-only consequences • Benefits: - Data Consistency - Hot Backups - Instantaneous Recovery - Easy Replication - Lock-Free Concurrency - Live Schema Changes - Database Snapshots • Costs: 1) eliminating data locality requires a larger number of disk accesses 2) a large amount of data quickly becomes obsolete in an environment with a heavy insert or update workload
Append-only B-tree • [Figure: a data file to which new copies of B-tree pages (Page 1, Page 2, Page 3, holding keys 5, 9, 15 and 19) are appended as the tree is updated]
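The idea of an append-only B-tree can be sketched with path copying: an update never overwrites a page; it appends new copies of the modified leaf and of every page on the path up to the root. The code below is an illustrative sketch under that assumption, not RethinkDB's implementation. The class name, the page tuple layout, and the tiny `MAX_KEYS` capacity are hypothetical, and the "data file" is modeled as a Python list.

```python
import bisect

MAX_KEYS = 2  # hypothetical, tiny page capacity so that splits happen quickly


class AppendOnlyBTree:
    def __init__(self):
        self.log = []  # the "data file": pages are appended, never modified
        self.root = self._append(("leaf", [], []))

    def _append(self, page):
        self.log.append(page)
        return len(self.log) - 1  # page id is the position in the log

    def get(self, key, root=None):
        page_id = self.root if root is None else root
        while True:
            kind, keys, payload = self.log[page_id]
            if kind == "leaf":
                i = bisect.bisect_left(keys, key)
                return payload[i] if i < len(keys) and keys[i] == key else None
            page_id = payload[bisect.bisect_right(keys, key)]

    def _insert(self, page_id, key, value):
        """Return (new_id,) or, after a split, (left_id, separator, right_id)."""
        kind, keys, payload = self.log[page_id]
        keys, payload = list(keys), list(payload)  # copy; old page stays intact
        if kind == "leaf":
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                payload[i] = value
            else:
                keys.insert(i, key)
                payload.insert(i, value)
        else:
            i = bisect.bisect_right(keys, key)
            result = self._insert(payload[i], key, value)
            if len(result) == 1:
                payload[i] = result[0]          # point at the new child copy
            else:
                left, sep, right = result
                payload[i:i + 1] = [left, right]
                keys.insert(i, sep)
        if len(keys) <= MAX_KEYS:
            return (self._append((kind, keys, payload)),)
        mid = len(keys) // 2
        if kind == "leaf":
            left = self._append(("leaf", keys[:mid], payload[:mid]))
            right = self._append(("leaf", keys[mid:], payload[mid:]))
        else:
            left = self._append(("node", keys[:mid], payload[:mid + 1]))
            right = self._append(("node", keys[mid + 1:], payload[mid + 1:]))
        return (left, keys[mid], right)

    def put(self, key, value):
        result = self._insert(self.root, key, value)
        if len(result) == 1:
            self.root = result[0]
        else:
            left, sep, right = result
            self.root = self._append(("node", [sep], [left, right]))
        return self.root


# Usage: every put yields a new root; older roots remain readable snapshots.
tree = AppendOnlyBTree()
tree.put(15, "a")
tree.put(5, "b")
snapshot = tree.put(19, "c")
tree.put(9, "d")
```

Because old pages are never touched, reading through `snapshot` still sees the tree as it was before key 9 was inserted, which is one way to picture the snapshot and recovery properties listed earlier. The cost is that superseded pages accumulate in the log.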
Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
LevelDB ? • LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. • Supported core features: - Data is stored sorted by key - Multiple changes can be made in one atomic batch - Users can create a transient snapshot to get a consistent view of data - Data is automatically compressed using the Snappy compression library
Installation & usage • LevelDB works with snappy, which is a compression/decompression library. • It is a library, not a database server! • Install snappy: - download snappy from http://code.google.com/p/snappy/ - cd snappy-1.0.4 - ./configure && make && make install • Build LevelDB: - svn checkout http://leveldb.googlecode.com/svn/trunk/ leveldb-read-only - cd leveldb-read-only - make && cp libleveldb.a /usr/local/lib && cp -r include/leveldb /usr/local/include
Log-structure merge tree • LevelDB
Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
Conclusion • Log-structure