Full-Text Support in a Database Semantic File System Kristen LeFevre & Kevin Roundy Computer Sciences 736
Leveraging DBs in File Systems What do databases have to offer? • Transactions • Concurrency control • Crash recovery • Query power (metadata) • Extensibility – add new objects/modules • Efficient Search!
Re-thinking Directories • Current state of directories: • User remembers what, not where Our System: • Search tools for grouping related files • Semantically meaningful directories [Semantic FS] • Files are stored in tables • Directories are just for looks LAME!
Related Work • Semantic Filesystems • Use a DB [Inversion Filesystem] • NFS Meets Databases [Halverson] • NFS for portability, transparency, existing code support, familiar semantics • Server-side caching for performance Bringing ideas together: • Use [Halverson]’s infrastructure to implement semantic filesystem ideas
Roadmap • Overview of System Design and Implementation • Virtual Directories and Full-Text Queries • Live Demonstration • Conclusions & Future Work
System Architecture [Diagram: standard NFS clients talk to an NFS server made up of an NFS front end and a custom backend; the backend sits on an object-relational database with modules labeled M and TS2, on top of storage]
Postgres Capabilities • An object-relational DB such as Postgres lets you define and add modules • Case in point: Tsearch2 • New type: tsvector • Related function: to_tsvector • to_tsvector('a b a c') returns 'a':1,3 'b':2 'c':4 • Related index: idxFTI • Set triggers to do updates
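The to_tsvector example above maps each word to the positions where it occurs. A minimal Python sketch of that mapping follows. It is not the Tsearch2 implementation: the function names `to_positions` and `format_tsvector` are hypothetical, and the sketch only splits on whitespace, with no stemming or stopword removal.

```python
from collections import defaultdict


def to_positions(text):
    """Map each whitespace-separated token to its 1-based positions."""
    positions = defaultdict(list)
    for pos, token in enumerate(text.lower().split(), start=1):
        positions[token].append(pos)
    return dict(positions)


def format_tsvector(positions):
    """Render the mapping in the 'word':p1,p2 style shown on the slide."""
    return " ".join(
        f"'{token}':{','.join(str(p) for p in plist)}"
        for token, plist in sorted(positions.items())
    )


vector = to_positions("a b a c")
rendered = format_tsvector(vector)
print(rendered)
```

Storing positions rather than bare word sets is what later allows a rank function to ask whether two search terms sit near each other.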
[Halverson] Schema [Diagram: tables fileatt, naming, and allfiles, connected by relationships labeled 1 and N]
Database Schema [Diagram: tables fileatt, naming, and allfiles, connected by relationships labeled 1 and N, annotated with the predicate strstr(a, ".txt")]
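Database Schema [Diagram: tables fileatt, naming, and allfiles, connected by relationships labeled 1 and N, annotated with the predicate strstr(a, ".txt"); adds a table allfiles_txt and a tsearch2 index, connected by relationships labeled 1]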
Roadmap • Overview of System Design and Implementation • Virtual Directories and Full-Text Queries • Live Demonstration • Conclusions & Future Work
Virtual Directories and Text Search • Want to handle 2 types of text queries • Boolean keyword queries • e.g. (‘Kristen’ | ‘Kevin’ | ‘Remzi’) & ‘file’ & ‘system’ • IR rank queries • e.g. Rank files with respect to (‘computer’ & ‘architecture’) • More powerful than grep! • Virtual directories proposed for Semantic File systems • Incorporate full-text queries without “breaking” NFS interface for existing applications
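The Boolean query on this slide can be evaluated with set operations over an inverted index. The sketch below illustrates that idea only: the helper names (`build_index`, `any_of`, `all_of`) and the sample documents are made up for illustration, and it does no stemming, unlike a real text index.

```python
def build_index(docs):
    """Build an inverted index: lowercase word -> set of document names."""
    index = {}
    for name, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index


def any_of(index, *words):
    """OR: documents containing at least one of the words."""
    result = set()
    for word in words:
        result |= index.get(word.lower(), set())
    return result


def all_of(*doc_sets):
    """AND: documents present in every set."""
    return set.intersection(*doc_sets) if doc_sets else set()


# Hypothetical sample documents, not taken from the talk.
docs = {
    "notes.txt": "Kevin wrote a file system summary",
    "todo.txt": "Remzi asked about the file format",
    "paper.txt": "Kristen compares file layout across each system",
}
index = build_index(docs)

# ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
matches = all_of(
    any_of(index, "Kristen", "Kevin", "Remzi"),
    any_of(index, "file"),
    any_of(index, "system"),
)
print(sorted(matches))
```

Here todo.txt drops out because it lacks the word "system", which is the kind of combined condition that a single grep pattern handles poorly.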
DBMS Full-Text Support • Keyword Search • Text indices support search over keywords • Words extracted from document, stemmed, “stopwords” removed • Rank • Used existing rank() function as a black-box • rank() counts number of times each word appears in document, and whether search terms are near one another • Optionally, normalize by document length • Other notions of IR rank could easily be substituted
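To make the effect of length normalization concrete, here is a simplified stand-in for a rank function. It is not the rank() function the slide refers to: the signature is an assumption, it only counts term occurrences, and it ignores proximity between search terms.

```python
def rank(text, terms, normalize=False):
    """Count occurrences of the search terms; optionally divide by length."""
    words = text.lower().split()
    hits = sum(words.count(term.lower()) for term in terms)
    if normalize and words:
        return hits / len(words)
    return float(hits)


# Hypothetical sample documents, not taken from the talk.
docs = {
    "long.txt": "computer architecture notes on computer design",
    "short.txt": "computer architecture",
}
terms = ["computer", "architecture"]

by_count = sorted(docs, key=lambda n: rank(docs[n], terms), reverse=True)
by_density = sorted(
    docs, key=lambda n: rank(docs[n], terms, normalize=True), reverse=True
)
print(by_count, by_density)
```

Raw counts favor the longer document, while the normalized score favors the short document that consists entirely of the search terms. This is why normalization is offered as an option rather than fixed.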
Semantics of Virtual Directories • Encountered some tradeoffs • What we did: • Static virtual directories (search once on mkdir) • Directory contents as a snapshot at one point in time • Hard links [Diagram: example directory tree rooted at /CS736, with labels in order: project, papers, reading questions, %nfs%, writeup, talk, outline, NFS Thread ideas, NFS vs AFS]
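The snapshot semantics described above can be sketched with an in-memory model. This is an illustration under assumptions, not the NFS proxy's code: the class `StaticStore` and its methods are hypothetical, matching is a plain word test on file contents, and the file names are borrowed from the slide's example tree on the assumption that they are file names. Hard links are not modeled.

```python
class StaticStore:
    """Toy file store whose virtual directories are frozen at mkdir time."""

    def __init__(self):
        self.files = {}          # file name -> text contents
        self.virtual_dirs = {}   # keyword -> frozen set of file names

    def write(self, name, text):
        self.files[name] = text

    def mkdir_virtual(self, keyword):
        # The search runs exactly once, here; the result is a snapshot.
        kw = keyword.lower()
        self.virtual_dirs[kw] = frozenset(
            name for name, text in self.files.items()
            if kw in text.lower().split()
        )

    def readdir(self, keyword):
        return sorted(self.virtual_dirs[keyword.lower()])


store = StaticStore()
store.write("NFS Thread ideas", "ideas about nfs threads")
store.write("NFS vs AFS", "comparing nfs and afs")
store.write("outline", "talk outline")
store.mkdir_virtual("nfs")

# A matching file created after mkdir does not appear in the snapshot.
store.write("late notes", "more nfs notes")
listing = store.readdir("nfs")
print(listing)
```

The cost of the query is paid once per mkdir, so readdir stays cheap, at the price of listings that can go stale.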
Semantics of Virtual Directories • Encountered some tradeoffs • Alternatives (all also valid): • Static virtual directory creation with symbolic links • leads to dangling (broken) links • Process query lazily on readdir command • Semantics used in Semantic File System paper • Dynamically update contents of virtual directories on file creation, deletion, or write • Can be implemented using database triggers • More expensive, heavier back-end load
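For contrast, the lazy alternative above (process the query on readdir) can be sketched the same way. Again this is a hypothetical in-memory model, not code from the project: `LazyStore` and its methods are invented names, and matching is a plain word test on file contents.

```python
class LazyStore:
    """Toy file store that re-runs the query on every readdir."""

    def __init__(self):
        self.files = {}       # file name -> text contents
        self.queries = set()  # keywords that have a virtual directory

    def write(self, name, text):
        self.files[name] = text

    def delete(self, name):
        self.files.pop(name, None)

    def mkdir_virtual(self, keyword):
        # Only the query is recorded; no search happens yet.
        self.queries.add(keyword.lower())

    def readdir(self, keyword):
        kw = keyword.lower()
        if kw not in self.queries:
            raise KeyError(keyword)
        return sorted(
            name for name, text in self.files.items()
            if kw in text.lower().split()
        )


lazy = LazyStore()
lazy.write("a.txt", "nfs design")
lazy.mkdir_virtual("nfs")
lazy.write("b.txt", "nfs caching")   # created after mkdir
after_write = lazy.readdir("nfs")
lazy.delete("a.txt")
after_delete = lazy.readdir("nfs")
print(after_write, after_delete)
```

Listings are always current and never dangle, but every readdir pays for a search. The trigger-based alternative moves that cost to file creation, deletion, and writes instead.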
Roadmap • Overview of System Design and Implementation • Virtual Directories and Full-Text Queries • Live Demonstration • Conclusions & Future Work
Conclusions • Benefits of our proxy architecture: • Standard NFS clients • Postgres as black box • Simple to expose functionality of DB • Use & add DB objects at will
Future Work • Performance evaluation to understand the overhead of new functionality • Dynamic index maintenance (file creation & modification) • Virtual directory creation and text querying • Block-level text writes and caching • Query support for other file types • Mechanisms for extracting and indexing meta-data from additional file types (e.g., image files) • Performance Monitoring, Adaptive Indexing and storage format within the NFS Proxy
Thanks! Questions? Special Thanks: Remzi Arpaci-Dusseau, Alan Halverson, David DeWitt