NFS on a Database: Structure and Performance

Alan Halverson, Babis Samios

Motivation
• Goal: NFS Server / Database Backend
• Why Database?
  • Transactions provide idempotency naturally
  • Graceful backup/recovery
• Why NFS?
  • Nearly universal client availability
  • Transparent access for existing applications
  • Ease of implementation

Approach
• Implement standard UNIX file API
  • open(), read(), write(), etc.
  • All routines talk to the database
• Modify NFS server to use new API
• …
• Profit!

Main Results
• Efficient implementation is possible
  • Read/write performance is within the same order of magnitude as the native file system
• Choice of database schema is important
• Server cache usage is critical
  • Avoids database round-trips

Roadmap
• Approach
  • NFS server choices
  • Database choices
• Architecture/Design
• Experimental Setup & Results
• Summary/Conclusions

Database Choices
• Many DBMSs are available
• We chose PostgreSQL
  • Free, open source
  • Our work was inspired by the Inversion File System, which is also implemented on top of Postgres
  • Uses client/server model

NFS Server Choices
• Kernel mode
  • Pros: included in Linux, supports NFS v3
  • Cons: difficult to debug
• User mode (UNFSD)
  • Pros: easier to debug, communication with PostgreSQL is possible
  • Cons: only supports NFS v2
• Our choice: user mode

Architecture

Database Schema
• meta-data -> file_attributes
• dir hierarchy -> naming
• data -> Many options
  • Table/File (used by Inversion FS)
  • Single Table (avoids table creation overhead)
  • Intermediate solutions (e.g. table/dir)

Single Table Schema
[Figure: schema diagram relating the tables file_attributes, naming, and all_files through 1:N relationships]
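The single-table layout can be sketched as follows. This is a minimal illustration only, not the authors' schema: the three table names come from the slides, but every column name, the chunk size, the `ROOT_ID` constant, and the helper functions are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the sketch runs without a server.

```python
import sqlite3

CHUNK_SIZE = 4096  # hypothetical; the slides do not state the chunk size used
ROOT_ID = 1        # hypothetical id of the root directory


def create_schema(conn):
    """Create the three tables; all column names are assumptions."""
    conn.executescript("""
        CREATE TABLE file_attributes (      -- meta-data, one row per file
            file_id INTEGER PRIMARY KEY,
            mode    INTEGER NOT NULL,
            size    INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE naming (               -- directory hierarchy
            parent_id INTEGER NOT NULL REFERENCES file_attributes(file_id),
            name      TEXT    NOT NULL,
            file_id   INTEGER NOT NULL REFERENCES file_attributes(file_id),
            PRIMARY KEY (parent_id, name)
        );
        CREATE TABLE all_files (            -- data of every file, in chunks
            file_id  INTEGER NOT NULL REFERENCES file_attributes(file_id),
            chunk_no INTEGER NOT NULL,
            data     BLOB    NOT NULL,
            PRIMARY KEY (file_id, chunk_no)
        );
        INSERT INTO file_attributes (file_id, mode) VALUES (1, 16877);
    """)  # 16877 == 0o40755, a directory mode for the root entry


def create_file(conn, parent_id, name, mode=0o100644):
    """Add an attribute row and a directory entry in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO file_attributes (mode) VALUES (?)", (mode,))
        file_id = cur.lastrowid
        conn.execute(
            "INSERT INTO naming VALUES (?, ?, ?)", (parent_id, name, file_id))
    return file_id


def write_file(conn, file_id, payload):
    """Replace a file's contents, one row per chunk, in one transaction."""
    with conn:
        conn.execute("DELETE FROM all_files WHERE file_id = ?", (file_id,))
        for chunk_no, offset in enumerate(range(0, len(payload), CHUNK_SIZE)):
            conn.execute(
                "INSERT INTO all_files VALUES (?, ?, ?)",
                (file_id, chunk_no, payload[offset:offset + CHUNK_SIZE]))
        conn.execute(
            "UPDATE file_attributes SET size = ? WHERE file_id = ?",
            (len(payload), file_id))


def read_file(conn, file_id):
    """Concatenate a file's chunks in order."""
    rows = conn.execute(
        "SELECT data FROM all_files WHERE file_id = ? ORDER BY chunk_no",
        (file_id,))
    return b"".join(row[0] for row in rows)
```

Because all data lives in one table, creating a file only inserts rows; no per-file table has to be created, which is the overhead the single-table option avoids.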

Caching
• Old story: client-side caching
  • Buffer cache
• New story: server-side caching (major contribution)
  • Minimize the number of round-trips to the DB by maintaining three different caches:
    • Stat cache
    • Naming cache
    • Buffer cache (significantly beneficial only in a multi-client environment)
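How a server-side cache saves round-trips can be illustrated with a small stat cache. This is a sketch under stated assumptions, not the authors' implementation: the class name, the write-through policy, the lack of eviction, and the two callback parameters that stand in for database access are all hypothetical.

```python
class StatCache:
    """Hypothetical write-through cache of file attributes, keyed by file id."""

    def __init__(self, fetch_from_db, store_to_db):
        self._fetch = fetch_from_db   # callable: file_id -> attributes
        self._store = store_to_db     # callable: (file_id, attributes) -> None
        self._entries = {}
        self.round_trips = 0          # counts simulated database accesses

    def getattr(self, file_id):
        """Return attributes, contacting the database only on a miss."""
        if file_id not in self._entries:
            self.round_trips += 1
            self._entries[file_id] = self._fetch(file_id)
        return self._entries[file_id]

    def setattr(self, file_id, attrs):
        """Write through to the database, then refresh the cached copy."""
        self.round_trips += 1
        self._store(file_id, attrs)
        self._entries[file_id] = attrs

    def invalidate(self, file_id):
        """Drop an entry, e.g. after the file is removed."""
        self._entries.pop(file_id, None)


# Usage: a plain dict plays the role of the database.
fake_db = {7: {"size": 10}}
cache = StatCache(fake_db.__getitem__, fake_db.__setitem__)
cache.getattr(7)  # miss: one round-trip
cache.getattr(7)  # hit: served from memory
```

A naming cache could follow the same pattern with `(parent_id, name)` keys; with a single server process in front of the database, entries only go stale through the server's own writes, which the write-through path covers.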

Binary Data
• SQL statements issued to PostgreSQL must contain ASCII data only
• PostgreSQL provides an escaping function
  • size(escape(data)) ≤ 4 × size(data)
• We used base64 encoding
  • size(base64(data)) ≈ 4/3 × size(data)
• Every write therefore sends 4/3 as many bytes as the raw data, so best-case raw write time is 4/3 that of a native file system write
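The 4/3 expansion factor can be checked with Python's standard `base64` module. The 8192-byte chunk below is an arbitrary example size, not a value taken from the slides, and `base64_size` is a helper introduced here for illustration.

```python
import base64


def base64_size(n):
    """Length in bytes of the padded base64 encoding of n input bytes."""
    return 4 * ((n + 2) // 3)


chunk = bytes(range(256)) * 32       # 8192 bytes covering every byte value
encoded = base64.b64encode(chunk)    # ASCII-only, safe inside an SQL string

ratio = len(encoded) / len(chunk)    # close to 4/3; padding adds a few bytes
```

The ratio is fixed regardless of content, whereas an escaping scheme expands only the bytes that need escaping, so its cost depends on the data.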

Experimental Setup

Summary/Conclusions
• Design and implementation of NFS operating on top of PostgreSQL
• Use of 3-tier architecture for maximum flexibility
• Performance comparable to native UNIX FS for read/write operations
• Factors that affect performance
  • Caching (both server and client side)
  • Chunk size and NFS r/w message size
  • Database Schema

Things we will not do
• Asynchronous database writes (for both data and meta-data)
• Compare recovery times with both ext2 and ext3
• Test multi-client environment
• Add mechanism for querying system meta-data
