
Why and How to Build a Trusted Database System on Untrusted Storage?


Presentation Transcript


  1. Why and How to Build a Trusted Database System on Untrusted Storage?
  Radek Vingralek, STAR Lab, InterTrust Technologies
  In collaboration with U. Maheshwari and W. Shapiro

  2. What?
  • Trusted storage
    • can be read and written only by trusted programs

  3. Why? Digital Rights Management
  [Diagram: a contract governing access to content]

  4. What? Revisited
  [Diagram: a processor with volatile memory, a small amount of trusted storage (<50 B), and bulk untrusted storage]

  5. What? Refined
  • Must also protect against accidental data corruption
    • atomic updates
    • efficient backups
    • type-safe interface
    • automatic index maintenance
  • Must run in an embedded environment
    • small footprint
  • Must provide acceptable performance

  6. What? Refined
  • Can assume a single-user workload
    • no concurrency control, or only a simple scheme
    • optimized for response time, not throughput
    • lots of idle time (can be used for database reorganization)
  • Can assume a small database
    • 100 KB to 10 MB
    • can cache the working set
    • no-steal buffer management

  7. A Trivial Solution
  [Diagram: plaintext data is encrypted and hashed before being handed to a COTS DBMS; the key and hashes sit in trusted storage, the db in untrusted storage]
  • Critique:
    • does not protect metadata
    • cannot use sorted indexes

  8. A Better Solution
  [Diagram: the (COTS) DBMS operates on plaintext data; the whole database is encrypted and hashed as one unit, with the key and H(db) in trusted storage and the db in untrusted storage]
  • Critique:
    • must scan, hash and crypt the entire db to read or write

  9. Yet A Better Solution
  [Diagram: a hash tree over the database; trusted storage holds the key and H(A) for the root A; A stores H(B) and H(C); B and C store H(D) through H(G); the leaves D, E, F, G hold the data in untrusted storage]
  • Open issues:
    • could we do better than a logarithmic overhead?
    • could we integrate the tree search with data location?
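
  The logarithmic overhead comes from validating one root-to-leaf path per access. A minimal sketch of that idea in C++, assuming a binary tree of untrusted nodes and a toy placeholder hash() standing in for SHA-1:

    // Sketch: validating a root-to-leaf path in a binary hash tree.
    // hash() is a toy placeholder (NOT cryptographic); TDB uses SHA-1.
    #include <cstddef>
    #include <functional>
    #include <stdexcept>
    #include <string>
    #include <vector>

    static std::string hash(const std::string& s) {
        return std::to_string(std::hash<std::string>{}(s));   // placeholder only
    }

    struct Node {                          // lives in untrusted storage
        std::string payload;               // leaf data, or "leftHash|rightHash"
        Node* left = nullptr;
        Node* right = nullptr;
    };

    // Follow `path` (false = left, true = right) from the root to a leaf,
    // validating every node against the hash its parent expects.
    // Only the root hash needs to live in trusted storage.
    std::string verifiedRead(const Node* node, const std::string& expectedHash,
                             const std::vector<bool>& path, std::size_t depth = 0) {
        if (hash(node->payload) != expectedHash)
            throw std::runtime_error("tamper detected");
        if (depth == path.size())
            return node->payload;                        // validated leaf data
        std::size_t bar = node->payload.find('|');
        std::string childHash = path[depth] ? node->payload.substr(bar + 1)
                                            : node->payload.substr(0, bar);
        const Node* child = path[depth] ? node->right : node->left;
        return verifiedRead(child, childHash, path, depth + 1);
    }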

  10. TDB Architecture
  [Diagram: layered architecture spanning trusted and untrusted storage]
  • Collection Store (operates on collections of objects)
    • index maintenance
    • scan, match, range
  • Object Store (operates on objects, an abstract type)
    • object cache
    • concurrency control
  • Backup Store
    • full / incremental
    • validated restore
  • Chunk Store (operates on chunks: byte sequences, 100 B - 100 KB)
    • encryption, hashing
    • atomic updates

  11. Chunk Store - Specification
  • Interface
    • allocate() -> ChunkId
    • write( ChunkId, Buffer )
    • read( ChunkId ) -> Buffer
    • deallocate( ChunkId )
  • Crash atomicity
    • commit = [ write | deallocate ]*
  • Tamper detection
    • raise an exception if chunk validation fails
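
  Rendered as a C++ interface, the specification might look like the sketch below; the names and types are illustrative, not the actual TDB API:

    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    using ChunkId = std::uint64_t;
    using Buffer  = std::vector<std::uint8_t>;

    struct TamperDetected : std::runtime_error {
        TamperDetected() : std::runtime_error("chunk validation failed") {}
    };

    class ChunkStore {
    public:
        virtual ~ChunkStore() = default;
        virtual ChunkId allocate() = 0;
        virtual void    write(ChunkId id, const Buffer& data) = 0;  // part of the current commit set
        virtual Buffer  read(ChunkId id) = 0;                       // throws TamperDetected on validation failure
        virtual void    deallocate(ChunkId id) = 0;
        virtual void    commit() = 0;   // commit = [ write | deallocate ]*, applied atomically
    };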

  12. Chunk Store – Storage Organization
  • Log-structured storage organization
    • no static representation of chunks outside of the log
    • the log resides in untrusted storage
  • Advantages
    • traffic analysis cannot link updates to the same chunk
    • atomic updates for free
    • easily supports variable-sized chunks
    • copy-on-write snapshots for fast backups
    • integrates well with hash verification (see next slide)
  • Disadvantages
    • destroys clustering (cacheable working set)
    • cleaning overhead (expect plenty of idle time)

  13. Chunk Store - Chunk Map
  • Integrates the hash tree and the location map
    • Map: ChunkId → Handle
    • Handle = ‹Hash, Location›
    • MetaChunk = Array[Handle]
  [Diagram: trusted storage holds H(R); meta chunks R, S, T form the tree; data chunks X and Y are the leaves]
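
  A rough data-structure rendering of the chunk map entries; the field widths are assumptions (the 12-byte hash follows the truncated SHA-1 mentioned on the crypto-parameters slide):

    #include <array>
    #include <cstdint>
    #include <vector>

    using ChunkId = std::uint64_t;

    struct Handle {
        std::array<std::uint8_t, 12> hash;   // truncated SHA-1 of the chunk version
        std::uint64_t location;              // position of the version in the log
    };

    // A meta chunk is just an array of handles; the chunk map is the tree of
    // meta chunks rooted at R, combining the hash tree with the location map.
    struct MetaChunk {
        std::vector<Handle> handles;         // indexed by a slice of the ChunkId
    };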

  14. Chunk Store - Read
  • Basic scheme: dereference handles from the root down to X
    • dereference = use the location to fetch, use the hash to validate
  • Optimized scheme
    • trusted cache: ChunkId → Handle
    • look for a cached handle upward from X
    • dereference handles down to X
    • avoids validating the entire path
  [Diagram: trusted storage holds H(R); the path R, S, ..., X is validated, stopping at the first cached handle]
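
  A sketch of the optimized read path, building on the Handle/MetaChunk rendering above; parentOf, handleFor, fetchAndValidate and fetchAndValidateData are hypothetical helpers left as declarations:

    #include <map>
    #include <vector>

    class ChunkMapReader {
    public:
        Buffer read(ChunkId id) {
            std::vector<ChunkId> path;              // ids from `id` up to the first cached ancestor
            ChunkId cur = id;
            while (cache.find(cur) == cache.end()) {
                path.push_back(cur);
                cur = parentOf(cur);                // the root handle is always cached
            }
            Handle h = cache[cur];
            for (auto it = path.rbegin(); it != path.rend(); ++it) {
                MetaChunk meta = fetchAndValidate(h);   // throws if the hash does not match
                h = handleFor(meta, *it);
                cache[*it] = h;                         // remember the validated handle
            }
            return fetchAndValidateData(h);             // finally fetch and validate X itself
        }
    private:
        std::map<ChunkId, Handle> cache;                // trusted cache: ChunkId -> Handle
        ChunkId   parentOf(ChunkId id);
        Handle    handleFor(const MetaChunk& m, ChunkId id);
        MetaChunk fetchAndValidate(const Handle& h);
        Buffer    fetchAndValidateData(const Handle& h);
    };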

  15. Chunk Store - Write
  • Basic scheme: write chunks from X up to the root
  • Optimized scheme
    • buffer the dirty handle of X in the cache
    • defer upward propagation
  [Diagram: trusted storage holds H(R); X's new handle is marked dirty in the cache instead of being propagated through S and R immediately]
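
  A matching sketch of the optimized write path, reusing ChunkId, Buffer and Handle from the earlier sketches; Log with append/hashOf is a hypothetical helper, and dirty handles are simply remembered until the next checkpoint:

    #include <array>
    #include <cstdint>
    #include <map>
    #include <set>

    struct Log {
        std::uint64_t append(ChunkId id, const Buffer& data);      // encrypt + append, returns log position
        std::array<std::uint8_t, 12> hashOf(const Buffer& data);   // truncated SHA-1
    };

    // The new chunk version goes to the log immediately, but only its handle is
    // updated (and marked dirty) in the trusted cache; parent meta chunks are
    // rewritten lazily, when the dirty handles are checkpointed.
    void writeChunk(ChunkId id, const Buffer& data,
                    std::map<ChunkId, Handle>& cache,
                    std::set<ChunkId>& dirty, Log& log) {
        Handle h;
        h.location = log.append(id, data);
        h.hash     = log.hashOf(data);
        cache[id]  = h;           // readers see the new version via the cache
        dirty.insert(id);         // upward propagation deferred to the checkpoint
    }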

  16. Chunk Store - Checkpointing the Map
  • When dirty handles fill the cache
    • write the affected meta chunks to the log
    • write the root chunk last
  [Diagram: meta chunks S and R are rewritten to the log after the dirty data chunk versions; trusted storage holds H(R)]

  17. Chunk Store - Crash Recovery
  • Process the log starting from the last root chunk
    • checkpointed log: everything up to the last root chunk
    • residual log: everything written after it, up to the crash
  • Must validate the residual log
  [Diagram: the log up to the last root chunk R is the checkpointed log; the tail with Y, T, X, ..., S written before the crash is the residual log]

  18. Chunk Store - Validating the Log
  • Keep an incremental hash of the residual log in trusted storage
    • updated after each commit
  • The hash protects all current chunks
    • in the residual log: directly
    • in the checkpointed log: through the chunk map
  [Diagram: trusted storage holds H*(residual-log); the residual log runs from the last root chunk R to the crash point]
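
  One way to realize the incremental hash is to chain it over each committed record; a minimal sketch, with a toy placeholder hash instead of SHA-1:

    #include <functional>
    #include <string>

    static std::string H(const std::string& s) {
        return std::to_string(std::hash<std::string>{}(s));   // placeholder, NOT cryptographic
    }

    // Incremental hash of the residual log, held in trusted storage.
    struct ResidualLogHash {
        std::string h;                                    // reset when the map is checkpointed
        void onCommit(const std::string& committedBytes) {
            h = H(h + committedBytes);                    // h := H(h || new log records)
        }
        bool matches(const std::string& recomputed) const { return h == recomputed; }
    };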

  19. Chunk Store - Counter-Based Log Validation
  • A commit chunk is written with each commit
    • contains a sequential hash of the commit set
    • signed with the system secret key
  • A one-way counter is used to prevent replays
  • Benefits:
    • allows a bounded discrepancy between trusted and untrusted storage
    • doesn't require writing to trusted storage after each transaction
  [Diagram: commit chunks 73 and 74 carry hashes of their commit sets in the residual log before the crash]
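
  A rough rendering of a commit chunk and the recovery-time check; the field layout and the signature/counter plumbing are assumptions, not the actual TDB format:

    #include <cstdint>
    #include <string>

    struct CommitChunk {                   // appended with every commit set
        std::uint64_t counter;             // one-way counter value at commit time
        std::string   commitHash;          // sequential hash over the commit set
        std::string   signature;           // over (counter, commitHash), with the system secret key
    };

    // During recovery a commit set is accepted only if its commit chunk verifies
    // and its counter is the next expected one; the one-way counter bounds how far
    // an attacker can roll the log back, without a trusted-storage write per commit.
    bool acceptCommit(const CommitChunk& cc, bool signatureOk,
                      std::uint64_t expectedCounter) {
        return signatureOk && cc.counter == expectedCounter;
    }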

  20. Chunk Store - Log Cleaning
  • The log cleaner creates free space by reclaiming obsolete chunk versions
  • Segments
    • the log is divided into fixed-sized regions called segments (~100 KB)
    • segments are securely linked in the residual log for recovery
  • Cleaning step (see the sketch below)
    • read one or more segments
    • check the chunk map to find live chunk versions
      • ChunkIds are in the headers of chunk versions
    • write live chunk versions to the end of the log
    • mark the segments as free
  • May not clean segments in the residual log
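
  A sketch of one cleaning step under the liveness test the slide describes (a version is live iff the chunk map still points at its log position); ChunkMap and Log are hypothetical helpers, and ChunkId, Buffer and Handle are reused from the earlier sketches:

    #include <cstdint>
    #include <vector>

    struct ChunkVersion {
        ChunkId       id;        // taken from the version header
        std::uint64_t position;  // where this version sits in the log
        Buffer        data;
    };

    struct ChunkMap {
        Handle lookup(ChunkId id);
        void   update(ChunkId id, const Handle& h);
    };
    struct Log {
        std::uint64_t append(ChunkId id, const Buffer& data);
    };

    void cleanSegment(const std::vector<ChunkVersion>& segment,
                      ChunkMap& map, Log& log) {
        for (const ChunkVersion& v : segment) {
            Handle h = map.lookup(v.id);
            if (h.location == v.position) {          // still the live version?
                h.location = log.append(v.id, v.data);
                map.update(v.id, h);                 // point the map at the new copy
            }
        }
        // once all live versions are copied, the segment can be marked free
    }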

  21. Chunk Store - Multiple Partitions
  • Partitions may use separate crypto parameters (algorithms, keys)
  • Enables fast copy-on-write snapshots and efficient backups
  • Makes it more difficult for the cleaner to test chunk version liveness
  [Diagram: a partition map points to per-partition position maps for P and Q, which in turn point to the data chunks; after a copy-on-write snapshot the partitions share chunks (D) until one is updated (D2)]

  22. Chunk Store - Cleaning and Partition Snapshots
  [Diagram: example scenario: partition P is snapshotted as Q; P then updates chunk c; the cleaner moves Q's copy of c; a checkpoint is taken and a crash follows, leaving several versions of P.c in the residual log]

  23. Backup Store
  • Creates and restores backups of partitions
    • supports both full and incremental backups
  • Backup creation uses snapshots to guarantee backup consistency (wrt concurrent updates) without locking
  • During a restore, the Backup Store must verify
    • the integrity of the backup (using a signature)
    • the correct sequencing of incremental restores

  24. Object Store
  • Provides type-safe access to named C++ objects
    • objects provide pickle and unpickle methods for persistence
    • but no transparent persistence
  • Implements full transactional semantics
    • in addition to atomic updates
  • Maps each object into a single chunk
    • less data written to and read from the log
    • simplifies concurrency control
  • Provides an in-memory cache of decrypted, validated, unpickled, type-checked C++ objects
  • Implements a no-steal buffer management policy
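
  The pickle/unpickle contract might look roughly like this; the Account type and its byte layout are purely illustrative, not the actual TDB interface:

    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    using Buffer = std::vector<std::uint8_t>;

    // Illustrative persistent object: the application supplies explicit
    // pickle/unpickle methods; persistence is not transparent.
    class Account {
    public:
        std::uint32_t id = 0;
        std::int64_t  balance = 0;
        std::string   owner;

        Buffer pickle() const {                        // object -> chunk bytes
            Buffer b(sizeof id + sizeof balance + owner.size());
            std::memcpy(b.data(), &id, sizeof id);
            std::memcpy(b.data() + sizeof id, &balance, sizeof balance);
            std::memcpy(b.data() + sizeof id + sizeof balance, owner.data(), owner.size());
            return b;
        }
        static Account unpickle(const Buffer& b) {     // chunk bytes -> object (assumes a well-formed buffer)
            Account a;
            std::memcpy(&a.id, b.data(), sizeof a.id);
            std::memcpy(&a.balance, b.data() + sizeof a.id, sizeof a.balance);
            a.owner.assign(b.begin() + sizeof a.id + sizeof a.balance, b.end());
            return a;
        }
    };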

  25. Collection Store
  • Provides access to indexed collections of C++ objects using scan, exact-match and range queries
  • Performs automatic index maintenance during updates
    • implements insensitive iterators
  • Uses functional indices (see the sketch below)
    • an extractor function is used to obtain a key from an object
  • Collections and indexes are represented as objects
    • index nodes are locked according to 2PL
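
  A self-contained sketch of a functional index driven by an extractor function; the Account type and the std::multimap representation are assumptions for illustration:

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    struct Account { std::uint32_t id; std::int64_t balance; std::string owner; };

    // A functional index: the key is obtained by applying an extractor
    // function to the object, so no key needs to be stored redundantly.
    template <typename Key, typename Obj>
    class FunctionalIndex {
    public:
        explicit FunctionalIndex(std::function<Key(const Obj&)> extractor)
            : extract_(std::move(extractor)) {}

        void insert(const Obj& o) { index_.emplace(extract_(o), o); }

        // Range query over [lo, hi].
        std::vector<Obj> range(const Key& lo, const Key& hi) const {
            std::vector<Obj> out;
            for (auto it = index_.lower_bound(lo); it != index_.end() && it->first <= hi; ++it)
                out.push_back(it->second);
            return out;
        }
    private:
        std::function<Key(const Obj&)> extract_;
        std::multimap<Key, Obj> index_;
    };

    // Usage: index accounts by owner name.
    // FunctionalIndex<std::string, Account> byOwner([](const Account& a) { return a.owner; });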

  26. Performance Evaluation - Benchmark
  • Compared TDB to BerkeleyDB using TPC-B
  • Used TPC-B because:
    • an implementation is included with BerkeleyDB
    • BerkeleyDB's functionality limited the choice of benchmarks (e.g., 1 index per collection)

  27. Performance Evaluation - Setup
  • Evaluation platform
    • 733 MHz Pentium III, 256 MB RAM
    • Windows NT 4.0, NTFS files
    • EIDE disk, 8.9 ms (read) / 10.9 ms (write) seek time, 7200 RPM (4.2 ms avg. rotational latency)
    • one-way counter: a file on NTFS
  • Both systems used a 4 MB cache
  • Crypto parameters (for the secure version of TDB):
    • SHA-1 for hashing (hash truncated to 12 B)
    • 3DES for encryption

  28. Performance Evaluation - Results
  • Response time (avg. over 100,000 transactions in a steady state)
    • TDB utilization was set to 60%
  [Bar chart: average response time (ms) for BerkeleyDB, TDB, and TDB-S; the plotted values are 3.8, 5.8, and 6.8 ms]

  29. Response Time vs. Utilization
  • Measured response times for different TDB utilizations

  30. Related Work
  • Theoretical work
    • Merkle trees (Merkle 1980)
    • Checking the correctness of memory (Blum et al. 1992)
    • Secure audit logs (Schneier & Kelsey 1998)
      • append-only data, read sequentially
  • Secure file systems
    • Cryptographic FS (Blaze '93)
    • Read-only SFS (Fu et al. '00)
    • Protected FS (Stein et al. '01)

  31. A Retrospective Instead of Conclusions
  • Got lots of mileage from using log-structured storage
  • Partitions add lots of complexity
  • Cleaning is not a big problem
  • Crypto overhead is small on modern PCs (< 6%)
  • Code footprint is too large for many embedded systems
    • needs to be within 10 KB
    • GnatDb (see a TR)
  • For more information:
    • OSDI 2000: "How to Build a Trusted Database System on Untrusted Storage." U. Maheshwari, R. Vingralek, W. Shapiro
    • Technical reports available at http://www.star-lab.com/tr/

  32. Database Size vs. Utilization

