This article discusses the problem faced by users in accessing data on distributed systems and proposes a solution using Tactical Storage Systems. The system allows users to create, reconfigure, and tear down abstractions without administrator intervention.
Separating Abstractions from Resources in a Tactical Storage System
Douglas Thain
University of Notre Dame
http://www.nd.edu/~ccl
Abstract
• Users of distributed systems encounter many practical barriers between their jobs and the data they wish to access.
• Problem: Users have access to many resources (disks), but are stuck with the abstractions (cluster NFS) provided by administrators.
• Solution: Tactical Storage Systems allow any user to create, reconfigure, and tear down abstractions without bugging the administrator.
The Standard Model
[Figure: a transparent distributed filesystem on top of a shared disk.]
The Standard Model
[Figure: several transparent distributed filesystems, with shared disks and private disks; data moves between them via FTP, SCP, RSYNC, HTTP, ...]
Problems with the Standard Model
• Users encounter partitions in the WAN.
  • Easy to access data inside cluster, hard outside.
  • Must use different mechanisms on different links.
  • Difficult to combine resources together.
• Resources go unused.
  • Disks on each node of a cluster.
  • Unorganized resources in a department/lab.
• Unnecessary cross-talk between users.
  • User A demands async NFS for performance.
  • User B demands sync NFS for consistency.
• A global file system is not possible!
What if...
• Users could easily access any storage?
• I could borrow an unused disk for NFS?
• An entire cluster could be used as storage?
• Multiple clusters could be combined?
• I could reconfigure structures without root?
  • (Or bugging the administrator daily.)
• Solution: Tactical Storage System (TSS)
Outline
• Problems with the Standard Model
• Tactical Storage Systems
  • File Servers, Catalogs, Abstractions, Adapters
• Applications:
  • Remote Dynamic Linking in HEP Simulation
  • Remote Database Access in HEP Simulation
  • Expandable Filesystem for Experimental Data
  • Expandable Database for Bioinformatics Simulation
• Ongoing Work
  • Malloc, Dynamic Views, DACLs, PINS
• Final Thought
Tactical Storage Systems (TSS)
• A TSS allows any node to serve as a file server or as a file system client.
• All components can be deployed without special privileges, but with security.
• Users can build up complex structures.
  • Filesystems, databases, caches, ...
• Two Independent Concepts:
  • Resources: The raw storage to be used.
  • Abstractions: The organization of storage.
[Figure: TSS architecture. Each App runs on an Adapter. Adapters connect to abstractions: a Central Filesystem, a Distributed Filesystem Abstraction, a Distributed Database Abstraction, and one marked "???". The abstractions are built on file servers, each running on a UNIX file system. Captions: "Cluster administrator controls policy on all storage in cluster." "Workstation owners control policy on each machine."]
Components of a TSS:
1 – File Servers
2 – Catalogs
3 – Abstractions
4 – Adapters
1 – File Servers
• Unix-Like Interface
  • open/close/read/write
  • getfile/putfile to stream whole files
  • opendir/stat/rename/unlink
• Complete Independence
  • choose friends
  • limit bandwidth/space
  • evict users?
• Trivial to Deploy
  • run server + setacl
  • no privilege required
  • can be thrown into a grid system
• Flexible Access Control
[Figure: file server A and file server B, each exporting a local file system over the Chirp protocol; each server is controlled by its own owner (owner of server A, owner of server B).]
Access Control in File Servers
• Unix Security is not Sufficient
  • No global user database possible/desirable.
  • Mapping external credentials to Unix gets messy.
• Instead, Make External Names First-Class
  • Perform access control on remote, not local, names.
  • Types: Globus, Kerberos, Unix, Hostname, Address
• Each directory has an ACL:
    globus:/O=NotreDame/CN=DThain RWLA
    kerberos:dthain@nd.edu RWL
    hostname:*.cs.nd.edu RL
    address:192.168.1.* RWLA
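The per-directory ACL above can be illustrated with a minimal Python sketch. It is not the file server's actual code: the function names are hypothetical, and two behaviours are assumptions made for illustration, namely that subjects are matched with shell-style wildcards and that rights from all matching entries are combined.

```python
from fnmatch import fnmatchcase

# ACL entries copied from the slide: (subject pattern, rights string).
ACL = [
    ("globus:/O=NotreDame/CN=DThain", "RWLA"),
    ("kerberos:dthain@nd.edu", "RWL"),
    ("hostname:*.cs.nd.edu", "RL"),
    ("address:192.168.1.*", "RWLA"),
]

def rights_for(subject, acl=ACL):
    """Return the set of rights granted to subject.

    Assumption: rights from every matching entry are unioned.
    """
    granted = set()
    for pattern, rights in acl:
        # Assumption: '*' in an ACL subject behaves like a shell wildcard.
        if fnmatchcase(subject, pattern):
            granted.update(rights)
    return granted

def allowed(subject, right, acl=ACL):
    """True if subject holds the single-letter right on this directory."""
    return right in rights_for(subject, acl)
```

With these assumptions, a client identified only as hostname:laptop.cs.nd.edu can read and list the directory but cannot write to it.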
Problem: Shared Namespace
[Figure: a file server with the ACL globus:/O=NotreDame/* RWLAX; test.c, test.dat, a.out, and cms.exe all land in one shared namespace.]
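Solution: Reservation (V) Right
[Figure: the file server's ACL is O=NotreDame/CN=* V(RWLA), which allows mkdir only. /O=NotreDame/CN=Monk and /O=NotreDame/CN=Ted each run mkdir. Each new directory carries its creator's own ACL (/O=NotreDame/CN=Monk RWLA and /O=NotreDame/CN=Ted RWLA) and holds that user's own test.c and a.out.]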
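The reservation idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: the Directory class, parse_reserve, and mkdir are hypothetical names, and the rule modelled here is that a subject matching a V(...) entry may only create a directory, which then starts with an ACL granting the parenthesized rights to that subject alone.

```python
from fnmatch import fnmatchcase

class Directory:
    """Hypothetical in-memory directory with a per-directory ACL."""
    def __init__(self, acl):
        self.acl = dict(acl)   # subject pattern -> rights string
        self.children = {}     # name -> Directory

def parse_reserve(rights):
    """Return the rights inside V(...), or None if there is no V right."""
    start = rights.find("V(")
    if start < 0:
        return None
    end = rights.index(")", start)
    return rights[start + 2:end]

def mkdir(parent, name, subject):
    """Create a child directory owned by subject, if the parent ACL reserves it."""
    for pattern, rights in parent.acl.items():
        reserved = parse_reserve(rights)
        if reserved is not None and fnmatchcase(subject, pattern):
            # The new directory's ACL names only the creator.
            child = Directory({subject: reserved})
            parent.children[name] = child
            return child
    raise PermissionError(f"{subject} may not mkdir here")
```

Two users calling mkdir on the same parent end up with separate directories, each with a private ACL, which is how the reservation right avoids the shared-namespace problem.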
2 - Catalogs
[Figure: two catalog servers receive periodic UDP updates and are queried over HTTP, returning XML, TXT, or ClassAds.]
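A catalog that is fed by periodic updates can be sketched as a table of last-seen reports, where servers that stop reporting age out. The sketch below leaves out the UDP and HTTP transports entirely; the Catalog class, the lifetime parameter, and the property names are hypothetical and chosen only for illustration.

```python
import time

class Catalog:
    """Hypothetical catalog: remembers the latest report from each file server."""
    def __init__(self, lifetime=300.0):
        self.lifetime = lifetime   # seconds of silence before a server is dropped (assumed value)
        self.servers = {}          # server name -> (time of last report, properties)

    def update(self, name, properties, now=None):
        """Record one periodic report from a file server."""
        now = time.time() if now is None else now
        self.servers[name] = (now, dict(properties))

    def listing(self, now=None):
        """Return the servers that have reported within the lifetime."""
        now = time.time() if now is None else now
        return {
            name: props
            for name, (seen, props) in self.servers.items()
            if now - seen <= self.lifetime
        }
```

Because entries simply expire, a server that crashes or is torn down disappears from the listing without any explicit deregistration step.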
3 - Abstractions
• An abstraction is an organizational layer built on top of one or more file servers.
• End Users choose what abstractions to employ.
• Working Examples:
  • CFS: Central File System
  • DSFS: Distributed Shared File System
  • DSDB: Distributed Shared Database
• Others Possible?
  • Distributed Backup System
  • Striped File System (RAID/Zebra)
CFS: Central File System
[Figure: three applications, each running on an adapter with CFS, all access files on a single file server.]
DSFS: Dist. Shared File System
[Figure: two applications, each running on an adapter with DSFS, use three file servers that hold files and pointers (ptr). Steps shown: lookup file location, then access data.]
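The two steps in the DSFS figure, lookup of the file location followed by data access, can be sketched as follows. This is a toy model rather than the DSFS implementation: it assumes pointers are kept on one directory server as "host:path" strings, and the placement rule (pick the server with the fewest files) and all names are hypothetical.

```python
class FileServer:
    """Hypothetical stand-in for a file server: a named path -> bytes store."""
    def __init__(self, name):
        self.name = name
        self.files = {}

def dsfs_write(directory, data_servers, path, data):
    """Store data on one server and leave a pointer on the directory server."""
    # Assumed placement policy: the data server currently holding the fewest files.
    target = min(data_servers, key=lambda s: len(s.files))
    data_path = f"/data/{len(target.files)}"
    target.files[data_path] = data
    directory.files[path] = f"{target.name}:{data_path}".encode()

def dsfs_read(directory, data_servers, path):
    """Step 1: look up the pointer. Step 2: fetch the data it points to."""
    pointer = directory.files[path].decode()
    host, data_path = pointer.split(":", 1)
    server = next(s for s in data_servers if s.name == host)
    return server.files[data_path]
```

The application only ever names the logical path; which server holds the bytes is decided by the abstraction and can change as servers are added.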
DSDB: Dist. Shared Database
[Figure: two applications, each running on an adapter with DSDB, use a database server that holds the index and file servers that hold the files. Operations shown: insert, query, create file, direct access.]
4 - Adapter
• Like an OS Kernel
  • Tracks procs, files, etc.
  • Adds new capabilities.
  • Enforces owner’s policies.
• Delegated Syscalls
  • Trapped via ptrace interface.
  • Action taken by Parrot.
  • Resources charged to Parrot.
• User Chooses Abstraction
  • Appears as a filesystem.
  • Option: Timeout tolerance.
  • Option: Consistency semantics.
  • Option: Servers to use.
  • Option: Auth mechanisms.
[Figure: processes (tcsh, cat, vi) issue system calls that are trapped via ptrace by the adapter, Parrot, which keeps a process table and a file table and provides the abstractions CFS, DSFS, and DSDB.]
Performance Summary
• Nothing comes for free!
  • System calls: order of magnitude slower.
  • Memory bandwidth overhead: extra copies.
  • TSS can drive network/switch to limits.
• Compared to NFS Protocol:
  • TSS slightly better on small operations. (no lookup)
  • TSS much better in network bandwidth. (TCP)
  • NFS caches, TSS doesn’t (today), mixed blessing.
• On real applications:
  • Measurable slowdown
  • Benefit: far more flexible and scalable.
Outline
• Problems with the Standard Model
• Tactical Storage Systems
  • File Servers, Catalogs, Abstractions, Adapters
• Applications:
  • Remote Dynamic Linking in HEP Simulation
  • Remote Database Access in HEP Simulation
  • Expandable Filesystem for Astrophysics Data
  • Expandable Database for Mol. Dynamics Simulation
• Ongoing Work
  • Malloc, Dynamic Views, DACLs, PINS
• Final Thoughts
Remote Dynamic Linking
Credit: Igor Sfiligoi @ Fermi National Lab
• Modular Simulation Needs Many Libraries
  • Developed on workstations, then ported to grid.
  • Selection of library depends on analysis technique.
• Solution: Dynamic Link with TSS and FTP:
  • LD_LIBRARY_PATH=/ftp/server.name/libs
[Figure: the application's ld.so runs under the adapter with an FTP driver and, across the WAN, selects several MB from 60 GB of libraries (liba.so, libb.so, libc.so) on an FTP server's file system, using anonymous login. Callout: "Send adapter along with job."]
Related Work
• Lots of file services for the Grid:
  • GridFTP, Freeldr, NeST, IBP, SRB, RFIO, ...
  • Adapter interfaces with many of these!
• Why have another file server?
  • Reason 1: Must have precise Unix semantics!
    • Apps distinguish ENOENT vs EACCES vs EISDIR.
    • FTP always returns error 550, regardless of error.
  • Reason 2: TSS focused on easy deployment.
    • No privilege required, no config files, no rebuilding, flexible access control, ...
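Why precise Unix semantics matter can be seen from how an application tells failures apart. The following Python sketch, which assumes a POSIX system and uses hypothetical helper names, opens a path and reports the specific errno. A protocol that collapses every failure into one code cannot give the application this distinction.

```python
import errno
import os
import tempfile

def classify(path):
    """Open path for reading and report the symbolic errno, or 'ok'."""
    try:
        with open(path, "rb"):
            return "ok"
    except OSError as e:
        # Map the numeric errno to its symbolic name, e.g. 2 -> 'ENOENT'.
        return errno.errorcode.get(e.errno, str(e.errno))

def demo():
    """Classify a missing file and a directory inside a scratch directory."""
    with tempfile.TemporaryDirectory() as d:
        return classify(os.path.join(d, "missing")), classify(d)
```

A dynamic linker searching a library path is one example: a missing file means "try the next directory", while a permission error usually deserves a different reaction.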
Remote Database Access
Credit: Sander Klous @ NIKHEF
• HEP Simulation Needs Direct DB Access
  • App linked against Objectivity DB.
  • Objectivity accesses filesystem directly.
  • How to distribute application securely?
• Solution: Remote Root Mount via TSS:
  • parrot -M /=/chirp/fileserver/rootdir
  • DB code can read/write/lock files directly.
[Figure: sim.exe runs under the adapter with CFS and connects across the WAN, using GSI auth, to a TSS file server. Labels on the server's file system: script, libdb.so, DB data.]
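The effect of the remote root mount can be pictured as a path rewrite: every path the application opens is mapped to a path under the remote directory. The Python sketch below illustrates that idea only. It is not Parrot's code, and the remap function and its longest-prefix rule are assumptions made for illustration.

```python
def remap(path, mounts):
    """Rewrite path using the longest matching mount prefix.

    mounts maps a local prefix to the location that replaces it.
    Paths that match no prefix are returned unchanged.
    """
    # Try longer prefixes first so a specific mount overrides "/".
    for prefix, target in sorted(mounts.items(), key=lambda kv: len(kv[0]), reverse=True):
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return target.rstrip("/") + path[len(base):]
    return path
```

With the mount from the slide, an open of /etc/passwd by the database code would be directed to /chirp/fileserver/rootdir/etc/passwd, so the application sees the remote tree as its root.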
Expandable Filesystem for Experimental Data
Project GRAND http://www.nd.edu/~grand
Credit: John Poirer @ Notre Dame Astrophysics Dept.
[Figure: the analysis code reads from a buffer disk; the 25-year archive is kept on daily tapes. 10 GB/day today, could be lots more! Result: can only analyze the most recent data.]
Expandable Filesystem for Experimental Data
Project GRAND http://www.nd.edu/~grand
Credit: John Poirer @ Notre Dame Astrophysics Dept.
[Figure: the analysis code runs on an adapter with a Distributed Shared Filesystem built on several file servers, alongside the buffer disk and the 25-year archive of daily tapes. 10 GB/day today, could be lots more! Result: can analyze all data over large time scales.]
Appl: Distributed MD Database
• State of Molecular Dynamics Research:
  • Easy to run lots of simulations!
  • Difficult to understand the “big picture”
  • Hard to systematically share results and ask questions.
• Desired Questions and Activities:
  • “What parameters have I explored?”
  • “How can I share results with friends?”
  • “Replicate these items five times for safety.”
  • “Recompute everything that relied on this machine.”
• GEMS: Grid Enabled Molecular Sims
  • Distributed database for MD simulations at Notre Dame.
  • XML database for indexing, TSS for storage/policy.
GEMS Distributed Database
Credit: Jesus Izaguirre and Aaron Striegel, Notre Dame CSE Dept.
[Figure: a database server maps XML metadata to file locations (XML -> host6:fileX host2:fileY host5:fileZ; XML -> host1:fileA host7:fileB host3:fileC). A query (Temp>300K, Mol==CH4) returns locations host5:fileZ and host6:fileX. Other labels: "XML+", "data", files A, B, C, X, Y, Z, and two catalog servers.]
GEMS and Tactical Storage
• Dynamic System Configuration
• Add/remove servers, discovered via catalog
• Policy Control in File Servers
• Groups can Collaborate within Constraints
• Security Implemented within File Servers
• Direct Access via Adapters
• Unmodified Simulations can use Database
• Alternate Web/Viz Interfaces for Users.
Ongoing Work
• Malloc() for the Filesystem
  • Resource owners want to limit users. (quota)
  • End users need space assurance. (alloc)
  • Need per-user allocations, not just global limits.
• Dynamic Data Views
  • Convert from DB to FS and back again.
• Distributed Access Control
  • ACLs refer to group definitions elsewhere.
  • What’s new? Fault tolerance / policy management.
• Processing in Storage (PINS)
  • Move computation to data.
  • Needs new programming (scripting) model.
Malloc in the Filesystem
• Paper: “Grid3: Principles and Practice”
  • 90% of jobs would fail, most due to disk!
• Users need to alloc disk like anything else.
  • (Not accessible to user: quotas, loopback)
• Allocation integrated with directory tree:
[Figure: a directory tree annotated with allocations. Labels: scratch 100 GB; job1 10 GB; job2 80 GB; job3 20 GB; input; output; taska 40 GB; taskb 40 GB.]
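One way to read "allocation integrated with directory tree" is that each directory can carry a reserved size, and a child allocation is carved out of its parent's remaining space. The Python sketch below models only that bookkeeping. The Allocation class is hypothetical, the carve-from-parent rule is an assumption, and actual file usage inside an allocation is ignored.

```python
import errno

class Allocation:
    """Hypothetical space allocation attached to a directory."""
    def __init__(self, name, size_gb, parent=None):
        self.name = name
        self.size_gb = size_gb
        self.children = []
        if parent is not None:
            # Assumption: a child must fit in the parent's unreserved space.
            if size_gb > parent.free_gb():
                raise OSError(errno.ENOSPC, f"no space left for {name}")
            parent.children.append(self)

    def free_gb(self):
        """Space not yet handed out to child allocations."""
        return self.size_gb - sum(c.size_gb for c in self.children)
```

Using sizes from the figure, reserving job1 (10 GB) and job2 (80 GB) inside a 100 GB scratch area leaves 10 GB, so under this model a later 20 GB request fails at allocation time rather than partway through a job.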
Dynamic Data Views
• The same data can be perceived as either a file system or a database.
• Example:
  • DB: get files s.t. (T>300K) && (Mol==“CH4”)
  • FS: then process using scripts and shell
  • DB: associate derived files with original
  • FS: export and tar files for others.
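The first step of the example, a database-style query that yields a set of files, can be sketched as a filter over metadata records. The records, field names, and temperature values below are made up for illustration; only the host:file location style and the query condition come from the slides.

```python
# Hypothetical metadata records; the values are illustrative only.
RECORDS = [
    {"temp_k": 310, "mol": "CH4", "location": "host6:fileX"},
    {"temp_k": 280, "mol": "CH4", "location": "host2:fileY"},
    {"temp_k": 350, "mol": "CH4", "location": "host5:fileZ"},
    {"temp_k": 400, "mol": "H2O", "location": "host1:fileA"},
]

def query(records, predicate):
    """DB view: return the locations of all records matching predicate."""
    return [r["location"] for r in records if predicate(r)]

def hot_methane(record):
    """The example condition: (T > 300K) && (Mol == "CH4")."""
    return record["temp_k"] > 300 and record["mol"] == "CH4"
```

The resulting list of locations is what the filesystem view would then expose as ordinary files for scripts and shell tools to process.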
Dynamic Data Views
[Figure: an App queries the database server (Temp>300K, Mol==CH4) and receives XML -> host6:fileX host2:fileY host5:fileZ; the files (A, B, C, X, Y, Z) are then reached through a Distributed Filesystem Abstraction.]
Distributed Access Control Lists
• Users are very comfortable with the ACL and group model.
• Can it be adapted to a grid environment?
• Yes: an ACL can refer to a group defined on a remote server.
• Challenges: failures, caching, sharing policy.
[Figure: a TSS client contacts file server A, whose Access Control List reads hostname:*.nd.edu RL and group:serverB/presidents RWL. File server B defines the group “Presidents”: /O=NotreDame/CN=Jenkins, /O=Purdue/CN=Jischke, /O=Indiana/CN=Herbert.]
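An ACL entry that refers to a group on another server can be sketched as below. The remote lookup is replaced by a local dictionary, so failures and caching, which the slide lists as the real challenges, are not modelled. The function names and the entry-parsing rules are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Stand-in for group definitions held by remote servers (from the figure).
GROUPS = {
    "serverB": {
        "presidents": [
            "/O=NotreDame/CN=Jenkins",
            "/O=Purdue/CN=Jischke",
            "/O=Indiana/CN=Herbert",
        ]
    }
}

def fetch_group(server, group):
    """Hypothetical remote call; here it just reads the local table."""
    return set(GROUPS[server][group])

def rights_for(subject, acl):
    """Union of rights from matching entries, resolving group: entries remotely."""
    granted = set()
    for entry, rights in acl:
        kind, _, value = entry.partition(":")
        if kind == "group":
            server, _, group = value.partition("/")
            if subject in fetch_group(server, group):
                granted.update(rights)
        elif fnmatchcase(subject, entry):
            granted.update(rights)
    return granted
```

With the ACL from the figure, /O=Purdue/CN=Jischke gains RWL on server A purely through membership in a group that server B maintains.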
PINS: Processing in Storage
• Observation:
  • Traditional clusters separate CPU and storage into two distinct systems/problems.
  • Distributed computing is always some direct combination of CPU and I/O needs.
• Idea: PINS
  • Cluster HW is already a tightly integrated complex of CPU and I/O. Make the SW reflect the HW.
  • Key: Always compute in the same place that the data is located. Leave newly created data in place.
Processing in Storage
[Figure: a database server holds an XML index of data files; file servers S1, S2, S3, S4 (X 200) hold data files A, B, C, D, and X. Steps: 1. Compute Y = F(X). 2. Dispatch F to S3. 3. Y is stored on S3.]
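The three steps in the figure can be sketched as follows: consult the index to find where X lives, run F on that server, and record that Y now lives there too. This is a toy model with hypothetical names. In particular, "dispatch" here is just a local method call standing in for remote execution.

```python
class StorageServer:
    """Hypothetical file server that can also run a function on its own data."""
    def __init__(self):
        self.files = {}

    def run(self, func, src, dst):
        # Compute next to the data and leave the result in place.
        self.files[dst] = func(self.files[src])

def pins_compute(index, servers, func, src, dst):
    """Compute dst = func(src) on whichever server holds src."""
    host = index[src]                 # look up the input in the index
    servers[host].run(func, src, dst) # dispatch the function to that server
    index[dst] = host                 # record where the output now lives
    return host
```

Neither X nor Y crosses the network in this model; only the function and the index update move.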
Tactical Storage Systems
• Separate Abstractions from Resources
• Components:
  • Servers, catalogs, abstractions, adapters.
  • Completely user level.
  • Performance acceptable for real applications.
• Independent but Cooperating Components
  • Owners of file servers set policy.
  • Users must work within policies.
  • Within policies, users are free to build.