Tactical Storage Systems (TSS) allow users to access multiple independent data sources easily and securely. TSS provides simple, secure, and semantic access to remote data, enabling users to build complex storage structures. This system solves the problem of accessing data across different services, protocols, and locations.
Tactical Storage: Simple, Secure, and Semantic Access to Remote Data • Prof. Douglas Thain • University of Notre Dame • http://www.cse.nd.edu/~dthain
Plentiful Computing Power • http://www.cs.wisc.edu/condor/map • As of 25 April 2006... • Condor Worldwide: 56,682 CPUs / ??? TB / 1758 sites • TeraGrid: 15,328 CPUs / 220 TB / 6 sites • Open Science Grid: 21,156 CPUs / 83 TB / 61 sites • EGEE Grid: Lots???
Complex Ecology of Storage • [Diagram labels: private disks, shared disks, Shared Filesystem, Independent Cluster Disks; access via HTTP, FTP, RFIO, gLite, SRB, SCP, RSYNC, ...]
Problems Accessing Data • Large Burden on the User • User may not be able/willing to state files in advance. • Different services/protocols available at different sites. • Programs not modified to take advantage of services. • Different access modes for different purposes. • File transfer: preparing system for intended use. • File system: access to data for running jobs. • Resources go unused. • Disks on each node of a cluster. • Unorganized resources in a department/lab. • Would like to combine disks into larger structures. • A global file system can’t satisfy everyone! • (Global means different things to different people.) • Both a technical and social problem.
What’s the Problem? • We often assume that the site administrator is responsible for making the site comfortable for the user. (Not possible on the grid!) • Rather, the user should be able to bring along a mechanism to access multiple independent (remote?) data sources. • Of course, we have to make it easy!
Tactical Storage Systems (TSS) • A TSS allows any node to serve as a file server or as a file system client. • All components can be deployed without special privileges – but with security. • Users can build up complex structures. • Filesystems, databases, caches, ... • Admins need not know/care about larger structures. • Two Independent Concepts: • Resources – The raw storage to be used. • Abstractions – The organization of storage.
[Diagram: applications reach file servers by file transfer or through Parrot, using a simple filesystem, a distributed filesystem abstraction, a distributed database abstraction, or a still-open "???" abstraction. Each file server runs as a UNIX process over a local file system; 3PT links file servers.] • Cluster administrator controls policy on all storage in cluster. • Workstation owners control policy on each machine.
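The split between resources and abstractions can be sketched in a few lines. This is a minimal illustration only, not the TSS implementation: `FileServer` and `DistributedFilesystem` are hypothetical in-memory stand-ins, and placing files by hashing the path is an assumed policy chosen for brevity.

```python
import hashlib


class FileServer:
    """Hypothetical stand-in for one resource: a node exporting raw storage."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # path -> bytes, held in memory for illustration

    def put(self, path, data):
        self.files[path] = data

    def get(self, path):
        return self.files[path]


class DistributedFilesystem:
    """Hypothetical abstraction: one namespace spread over many file servers."""

    def __init__(self, servers):
        self.servers = list(servers)

    def _locate(self, path):
        # Assumed placement policy: hash the path to choose a server.
        digest = hashlib.sha1(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def write(self, path, data):
        server = self._locate(path)
        server.put(path, data)
        return server.name  # report where the data landed

    def read(self, path):
        return self._locate(path).get(path)
```

The servers know nothing about the larger structure; only the abstraction decides how paths map onto them, which is the separation the slide describes.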
Key Properties • Tactical Storage is Simple: • Appears as an ordinary filesystem. • Applies to unmodified applications and data w/out code changes, relinking, kernel modules, etc... • Tactical Storage is Secure: • Authentication with standard GSI or Kerberos. • Rich distributed access control system. • Tactical Storage is Semantic: • Name data by meaning, not by location. • Supports external name resolution mechanisms.
Remote Database Access • HEP Simulation Needs Direct DB Access • App linked against Objectivity DB. • Objectivity accesses filesystem directly. • How to distribute application securely? • Solution: Remote Root Mount via Parrot: parrot -M /=/chirp/fileserver/rootdir • DB code can read/write/lock files directly. • [Diagram labels: script, sim.exe, libdb.so, Parrot, GSI Auth, WAN, file server, file system, DB data, Simple FS] • Credit: Sander Klous @ NIKHEF
Remote Application Loading • Modular Simulation Needs Many Libraries • Devel. on workstations, then ported to grid. • Selection of library depends on analysis tech. • Constraint: Must use HTTP for file access. • Solution: Dynamic Link with TSS+HTTP: • /home/cdfsoft -> /http/dcaf.fnal.gov/cdfsoft • Select several MB from 60 GB of libraries. • [Diagram labels: appl, Parrot, liba.so, libb.so, libc.so, proxy, proxy, HTTP, HTTP server, file system] • Credit: Igor Sfiligoi @ Fermi National Lab
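Both the remote root mount (`/=/chirp/fileserver/rootdir`) and the library mapping (`/home/cdfsoft -> /http/dcaf.fnal.gov/cdfsoft`) amount to rewriting path prefixes before a file is opened. The sketch below shows that idea with longest-prefix matching. It is not Parrot's code: the `translate` function, the dictionary mount table, and the example file names are assumptions made for illustration.

```python
import posixpath


def translate(path, mounts):
    """Rewrite path using the longest matching mount prefix (illustrative only).

    mounts maps a logical prefix to the location that actually serves it.
    """
    path = posixpath.normpath(path)
    best = None
    for prefix, target in mounts.items():
        prefix = posixpath.normpath(prefix)
        matches = (
            prefix == "/"
            or path == prefix
            or path.startswith(prefix + "/")
        )
        if matches and (best is None or len(prefix) > len(best[0])):
            best = (prefix, target)
    if best is None:
        return path  # no mount applies; leave the path untouched
    prefix, target = best
    rest = path[len(prefix):].lstrip("/")
    return posixpath.join(target, rest) if rest else target


# Mappings taken from the slides; the file names passed in are hypothetical.
library_mounts = {"/home/cdfsoft": "/http/dcaf.fnal.gov/cdfsoft"}
root_mounts = {"/": "/chirp/fileserver/rootdir"}

print(translate("/home/cdfsoft/lib/liba.so", library_mounts))
print(translate("/data/events.db", root_mounts))
```

Matching on whole path components keeps a name such as `/home/cdfsoftware` from being captured by the `/home/cdfsoft` mount.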
Technical Problem • HTTP is not a filesystem! (No directories) • Advantages: Firewalls, caches, admins. • [Diagram: Appl calls opendir(/home); Parrot's HTTP Module sends GET /home HTTP/1.0 to the HTTP Server and gets back an HTML page (<HTML> <HEAD> <H1> ...), not a directory listing. Server tree labels: root; home, etc, bin; alice, babar, cms]
Technical Problem • Solution: Turn the directories into files. • Can be cached in ordinary proxies! • Hierarchical SHA1 integrity check. • [Diagram: "make httpfs" adds a .dir file to each directory on the HTTP Server. Appl calls opendir(/home); Parrot's HTTP Module sends GET /home/.dir HTTP/1.0 and receives the listing: alice babar cms. Server tree labels: root; home, etc, bin; alice, babar, cms]
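A preparation step in the spirit of "make httpfs" can be sketched as follows. The slides give only the `.dir` file name and the idea of a hierarchical SHA1 check, so the line format used here (`kind checksum name`) and both function names are assumptions, not the real tool's format. Each directory's listing records the SHA1 of every child, and a subdirectory is represented by the SHA1 of its own `.dir` file, so one checksum at the top covers the whole tree.

```python
import hashlib
import os

DIR_FILE = ".dir"  # file name from the slide; the contents format is assumed


def sha1_of_file(path):
    """Return the hex SHA1 of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_listing(directory):
    """Write directory/.dir for the whole subtree and return its SHA1.

    Children are processed first, so a parent's listing embeds the checksum
    of each child's listing (the hierarchical check). Symbolic links are
    followed, so this sketch assumes a tree without link cycles.
    """
    lines = []
    for name in sorted(os.listdir(directory)):
        if name == DIR_FILE:
            continue  # never list the listing itself
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            kind, digest = "d", make_listing(path)
        else:
            kind, digest = "f", sha1_of_file(path)
        lines.append(f"{kind} {digest} {name}\n")
    listing_path = os.path.join(directory, DIR_FILE)
    with open(listing_path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return sha1_of_file(listing_path)


def read_listing(directory):
    """Parse directory/.dir into [kind, sha1, name] entries, as a client might."""
    with open(os.path.join(directory, DIR_FILE), encoding="utf-8") as f:
        return [line.rstrip("\n").split(" ", 2) for line in f]
```

Because the listing is an ordinary file, an unmodified HTTP server and any proxy in between can serve and cache it like other content.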
Logical Access to Bio Data • Many databases of biological data in different formats around the world: • Archives: Swiss-Prot, TrEMBL, NCBI, etc... • Replicas: Public, Shared, Private, ??? • Users and applications want to refer to data objects by logical name, not location! • Access the nearest copy of the non-redundant protein database, don’t care where it is. • Solution: EGEE data management system maps logical names (LFNs) to physical names (SFNs). • Credit: Christophe Blanchet, Bioinformatics Center of Lyon, CNRS IBCP, France http://gbio.ibcp.fr/cblanchet, Christophe.Blanchet@ibcp.fr
Logical Access to Bio Data • Run BLAST on LFN://ncbi.gov/nr.data • open(LFN://ncbi.gov/nr.data) • Where is LFN://ncbi.gov/nr.data? • Find it at: FTP://ibcp.fr/nr.data • open(FTP://ibcp.fr/nr.data) • RETR nr.data • [Diagram labels: BLAST, Parrot (RFIO, gLite, HTTP, FTP), EGEE File Location Service, gLite Server, Chirp Server, FTP Server, copies of nr.data]
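The lookup sequence on this slide can be sketched as a small resolver. This is an illustration under stated assumptions, not the EGEE or Parrot API: the in-memory `REPLICA_CATALOG` stands in for the file location service, `resolve` is a hypothetical helper, and the scheme list simply mirrors the protocol labels in the diagram.

```python
from urllib.parse import urlsplit

# Hypothetical catalog standing in for the EGEE file location service.
# The single entry is the example from the slide.
REPLICA_CATALOG = {
    "LFN://ncbi.gov/nr.data": ["FTP://ibcp.fr/nr.data"],
}

# Protocols the client is assumed to speak (labels from the diagram).
SUPPORTED_SCHEMES = ("rfio", "glite", "http", "ftp")


def resolve(name, catalog=REPLICA_CATALOG):
    """Map a logical file name to the first replica the client can open."""
    if not name.upper().startswith("LFN://"):
        return name  # already a physical name; nothing to look up
    for replica in catalog.get(name, []):
        if urlsplit(replica).scheme.lower() in SUPPORTED_SCHEMES:
            return replica
    raise FileNotFoundError(f"no usable replica for {name}")


# The application asks for the logical name; the open is redirected.
print(resolve("LFN://ncbi.gov/nr.data"))
```

A fuller resolver would rank replicas by proximity, as the "nearest copy" goal on the previous slide suggests; this sketch just takes the first one with a supported protocol.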
Current Work • Now that we can easily use any storage... • Much easier to arrange data/jobs arbitrarily. • Idea: combine cluster storage / cluster comp! • Goal: keep jobs close to data that they need. • PINS: Processing in Storage • Example: GEMS Distributed Databank • Facility for creating, storing, and analyzing molecular dynamics data in a cluster. • Goal: Be able to easily scale both CPU and storage capacity by adding commodity nodes. • Credit: Jesus Izaguirre and Aaron Striegel @ Notre Dame
[Diagram: an App uses an Adapter over the Distributed Filesystem Abstraction to fetch D1. Data sets D1-D4 are replicated across UNIX file servers, each over a local file system. A meta-data database answers the query (Mol=="CH4") && (T>300K). Jobs J1-J4 are placed on the servers, and F(D1) is computed where D1 is stored.]
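The query and placement steps in this diagram can be sketched together. Everything concrete below is hypothetical: the `Record` fields, the node names, the metadata values, and the least-loaded placement rule are assumptions for illustration, not the GEMS design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of a hypothetical meta-data database."""
    dataset: str
    molecule: str
    temperature_k: float
    servers: tuple  # nodes holding a replica of this data set


# Made-up metadata for the four data sets shown in the diagram.
METADATA = [
    Record("D1", "CH4", 350.0, ("node1", "node2")),
    Record("D2", "CH4", 280.0, ("node2", "node3")),
    Record("D3", "H2O", 400.0, ("node3", "node4")),
    Record("D4", "CH4", 310.0, ("node1", "node4")),
]


def query(records, molecule, min_temperature_k):
    """Equivalent of (Mol == molecule) && (T > min_temperature_k)."""
    return [
        r for r in records
        if r.molecule == molecule and r.temperature_k > min_temperature_k
    ]


def place_jobs(matches, load):
    """Assign each data set's job to the least-loaded node holding a replica."""
    placement = {}
    for record in matches:
        server = min(record.servers, key=lambda s: load.get(s, 0))
        placement[record.dataset] = server
        load[server] = load.get(server, 0) + 1
    return placement


matches = query(METADATA, "CH4", 300.0)
print(place_jobs(matches, load={}))
```

Each job lands on a node that already stores its input, so only the result F(D) needs to move, which is the "keep jobs close to data" goal stated on the previous slide.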
More Open Problems • Resource Management • How to prevent overcommitment -> badput? • Security • How to easily express complex policies for sharing and controlling combined CPU/disk? • Reliability • How to deal with disconnection, erasure, rejection, unexpected performance, etc... • Garbage Collection • What’s to prevent me from filling every disk everywhere with computations that I might need? • Debugging • How do we dig the state relevant to a complex workflow out of numerous, noisy, distributed logs?
Conclusion • Tactical storage allows end users to build large structures out of simple building blocks without getting stuck on the ugly details.
Acknowledgments • Science Collaborators: • Christophe Blanchet • Patrick Flynn • Sander Klous • Peter Kunszt • Erwin Laure • John Poirier • Igor Sfiligoi • CS Collaborators: • Jesus Izaguirre • Aaron Striegel • CS Students: • Paul Brenner • James Fitzgerald • Jeff Hemmes • Paul Madrid • Chris Moretti • Gerhard Niederwieser • Phil Snowberger • Justin Wozniak
For more information... • Cooperative Computing Lab: http://www.cse.nd.edu/~ccl • Cooperative Computing Tools: http://www.cctools.org • Douglas Thain • dthain@cse.nd.edu • http://www.cse.nd.edu/~dthain