Disk Farms at Jefferson Lab
Bryan Hess, Bryan.Hess@jlab.org
Background
• Data stored in a 6,000-tape StorageTek silo
• Data throughput > 2 TB per day
• Batch farm of ~250 CPUs for data reduction and analysis
• Interactive analysis as well
User Needs
• Fast access to frequently used data from the silo
• Automatic staging of files for the batch farm (distinct disk pool for the farm)
• Tracking of disk cache use
  • Disk use
  • File use
  • Access patterns
Cache Disk Management
• Read-only area with a subset of silo data
• Unified NFS view of cache disks in /cache
• User interaction with cache
  • jcache (request files)
  • jcache -g halla (request for a specific group)
  • jcache -d (early deletion)
Cache Disk Policies
• Disk pools are divided into groups
• Management policy set per group:
  • Cache: LRU files removed as needed
  • Stage: reference counting
  • Explicit: manual addition and deletion
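
The "Cache" policy above can be illustrated with a minimal sketch of least-recently-used eviction. This is not the Jefferson Lab implementation: the class name `LruPoolSketch`, its methods, and the byte-based capacity accounting are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an LRU-managed disk pool; names are illustrative only. */
class LruPoolSketch {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder=true keeps entries ordered from least to most recently used.
    private final LinkedHashMap<String, Long> files =
            new LinkedHashMap<>(16, 0.75f, true);

    LruPoolSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Records a read, making the file the most recently used. */
    boolean touch(String path) {
        return files.get(path) != null;
    }

    /** Adds a file, evicting least recently used files until it fits. */
    List<String> add(String path, long size) {
        if (size > capacityBytes) {
            throw new IllegalArgumentException("file larger than pool");
        }
        Long previous = files.remove(path);
        if (previous != null) {
            usedBytes -= previous;
        }
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = files.entrySet().iterator();
        while (usedBytes + size > capacityBytes && it.hasNext()) {
            Map.Entry<String, Long> oldest = it.next();
            usedBytes -= oldest.getValue();
            evicted.add(oldest.getKey());
            it.remove();
        }
        files.put(path, size);
        usedBytes += size;
        return evicted;
    }

    /** Checks membership without changing the recency order. */
    boolean contains(String path) {
        return files.containsKey(path);
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

A "Stage" pool would instead keep a per-file reference count and refuse to evict any file whose count is above zero; that variant is not shown here.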
Architecture: Hardware
• Linux 2.2.17 (now 2.2.18prexx?)
• Dual 750 MHz Pentium III
• Asus motherboards
• Mylex RAID controllers
• 11 x 73 GB disks ≈ 800 GB RAID 0
• Gigabit Ethernet to Foundry BigIron switch
• …about 3¢/MB
Architecture: Software
• Java 1.3
• Cache manager on each node
• MySQL database used by all servers
• Protocol for file transfers (more shortly)
• Writes to cache never go over NFS
• Reads from cache may go over NFS
Protocol for file moving
• Simple extensible protocol for file copies
• Messages are Java serialized objects
• Protocol is synchronous: all calls block
  • Asynchrony by threading
• Falls back to raw data transfer for speed: faster and more fair than NFS
• A session may make many connections
Protocol for file moving
• Cache server extends the basic protocol
  • Adds database hooks for cache
  • Adds hooks for cache policies
  • Additional message types were added
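
To make "messages are Java serialized objects" and "additional message types" concrete, here is a minimal sketch of a base message, a cache-specific subclass, and a round trip through Java serialization. The class names `Message`, `LocateRequest`, and `WireSketch` and their fields are hypothetical; the slides do not show the real message classes. In-memory byte arrays stand in for a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical base message of the file-copy protocol. */
class Message implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;

    Message(String path) {
        this.path = path;
    }
}

/** Hypothetical message type added by the cache server extension. */
class LocateRequest extends Message {
    private static final long serialVersionUID = 1L;
    final String group;

    LocateRequest(String path, String group) {
        super(path);
        this.group = group;
    }
}

class WireSketch {
    /** Serializes a message to bytes, as it would be written to a connection. */
    static byte[] encode(Message message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads one message back; on a real socket readObject() would block. */
    static Message decode(byte[] bytes) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Message) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the receiver gets back the concrete subclass, a server can dispatch on the message type, which is one plausible way a cache server could add message types without changing the base protocol.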
Example: Get from cache using our protocol (1)
• cacheClient.getFile("/foo", "halla");
  • Send locate request to any server
  • Receive locate reply
[Diagram: client (farm node), servers cache1 to cache4, and the database. Messages shown: "Where is /foo?" and "Cache3 has /foo".]
Example: Get from cache using our protocol (2)
• cacheClient.getFile("/foo", "halla");
  • Contact appropriate server
  • Initiate direct transfer
  • Returns true on success
[Diagram: client (farm node), servers cache1 to cache4, and the database. Messages shown: "Get /foo" and "Sending /foo".]
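
The two-step get (locate, then direct transfer) can be sketched as an in-memory simulation. Everything here is an assumption for illustration: the class `GetFlowSketch`, the `"group:path"` database key, and the extra destination argument to `getFile` are hypothetical, and maps stand in for the MySQL database and the cache servers.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the locate-then-transfer get. */
class GetFlowSketch {
    // Stand-in for the shared database: "group:path" -> name of holding server.
    private final Map<String, String> database = new HashMap<>();
    // Stand-in for the cache servers: server name -> (path -> file contents).
    private final Map<String, Map<String, byte[]>> servers = new HashMap<>();

    void store(String server, String group, String path, byte[] data) {
        servers.computeIfAbsent(server, name -> new HashMap<>()).put(path, data);
        database.put(group + ":" + path, server);
    }

    /** Step 1: any server can answer a locate request from the database. */
    String locate(String path, String group) {
        return database.get(group + ":" + path);
    }

    /** Step 2: contact the holding server directly; returns true on success. */
    boolean getFile(String path, String group, ByteArrayOutputStream dest) {
        String holder = locate(path, group);
        if (holder == null) {
            return false; // not in cache
        }
        byte[] data = servers.get(holder).get(path);
        if (data == null) {
            return false; // database entry is stale
        }
        dest.write(data, 0, data.length);
        return true;
    }
}
```

The point of the split is that the locate step is cheap and can be answered by any server, while the bulk data moves only between the client and the one server that holds the file.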
Example: Simple put to cache using our protocol
• putFile("/quux", "halla", 123456789);
[Diagram: client (data mover), servers cache1 to cache4, and the database. Messages shown: "Where can I put /quux?" and "Cache4 has room".]
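
The placement question in the put example ("Where can I put /quux?") can be sketched as a search for a server with enough free space. The slide only says that cache4 "has room"; the rule used below (pick the server with the most free space) and the names `PutPlacementSketch`, `addServer`, and `reserve` are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical placement logic for a put; the selection rule is an assumption. */
class PutPlacementSketch {
    // Server name -> free bytes, standing in for what the database would track.
    private final Map<String, Long> freeBytes = new LinkedHashMap<>();

    void addServer(String name, long free) {
        freeBytes.put(name, free);
    }

    /**
     * Reserves space for a file of the given size on the server with the
     * most free space. Returns the server name, or null if nothing fits.
     */
    String reserve(long size) {
        String best = null;
        long bestFree = -1;
        for (Map.Entry<String, Long> entry : freeBytes.entrySet()) {
            long free = entry.getValue();
            if (free >= size && free > bestFree) {
                best = entry.getKey();
                bestFree = free;
            }
        }
        if (best != null) {
            freeBytes.put(best, bestFree - size);
        }
        return best;
    }
}
```

Passing the file size with the request, as the third argument to `putFile` appears to do, lets the space be reserved before any data is sent.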
Fault Tolerance
• Dead machines do not stop the system
  • Only impact is on NFS clients
• Exception handling for
  • Receive timeouts
  • Refused connections
  • Broken connections
  • Complete garbage on connections
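
One way a client could keep working when a server is dead is to catch the listed failures and move on to the next server. The sketch below assumes that approach; `ServerCall`, `FailoverSketch`, and `firstSuccess` are hypothetical names, and the mapping from the four failure kinds to standard JDK exception classes is my own.

```java
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.List;

/** Hypothetical blocking call against one named server. */
interface ServerCall<T> {
    T call(String server) throws IOException;
}

class FailoverSketch {
    /**
     * Tries each server in order and returns the first successful result,
     * or null if every server failed. Failure reasons are appended to log.
     */
    static <T> T firstSuccess(List<String> servers, ServerCall<T> call,
                              List<String> log) {
        for (String server : servers) {
            try {
                return call.call(server);
            } catch (SocketTimeoutException e) {
                log.add(server + ": receive timeout");
            } catch (ConnectException e) {
                log.add(server + ": connection refused");
            } catch (StreamCorruptedException e) {
                log.add(server + ": garbage on connection");
            } catch (IOException e) {
                log.add(server + ": broken connection");
            }
        }
        return null;
    }
}
```

The specific exceptions are caught before the general `IOException` because all three are subclasses of it.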
Authorization and Authentication
• Shared secret for each file transfer session
• Session authorization by policy objects
  • Example: receive 5 files from user@bar
• Plug-in authenticators
  • Establish shared secret between client and server
  • No cleartext passwords
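
The slides do not say how a session proves knowledge of the shared secret. One common way to do so without sending the secret in cleartext is a keyed hash over a server-supplied challenge; the sketch below assumes that technique (HMAC-SHA256 from the standard `javax.crypto` API, which is newer than Java 1.3) and uses hypothetical names throughout.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical challenge-response check; not the authenticator from the slides. */
class SessionAuthSketch {
    /** Client side: answers a challenge using the shared secret. */
    static byte[] respond(byte[] secret, byte[] challenge) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(challenge);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: recomputes the answer and compares the two byte arrays. */
    static boolean verify(byte[] secret, byte[] challenge, byte[] response) {
        return MessageDigest.isEqual(respond(secret, challenge), response);
    }
}
```

Only the challenge and the response cross the network, so an eavesdropper never sees the secret itself.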
Bulk Data Transfers
• Model supports parallel transfers
  • Many files at once, but not bbftp style
• For bulk data transfer over WANs
  • Web-based class loader: zero-pain updates
  • Firewall issues
  • Client initiates all connections
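
"Many files at once" combined with a blocking protocol suggests one thread per in-flight file. The sketch below assumes that reading and uses a fixed thread pool from `java.util.concurrent` (which postdates Java 1.3); `ParallelTransferSketch`, `transferAll`, and the `Predicate` standing in for a blocking per-file transfer are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical parallel driver for blocking, whole-file transfers. */
class ParallelTransferSketch {
    /**
     * Runs one blocking transfer per file on a small thread pool and
     * returns the paths that succeeded, in input order.
     */
    static List<String> transferAll(List<String> paths,
                                    Predicate<String> blockingTransfer,
                                    int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<Boolean> task = () -> blockingTransfer.test(path);
                pending.add(pool.submit(task));
            }
            List<String> succeeded = new ArrayList<>();
            for (int i = 0; i < paths.size(); i++) {
                try {
                    if (pending.get(i).get()) {
                        succeeded.add(paths.get(i));
                    }
                } catch (ExecutionException e) {
                    // This transfer threw; treat it as failed and continue.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each file still travels over a single connection here; the parallelism is across files rather than across pieces of one file.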
Additional Information
• http://cc.jlab.org/scicomp
• Bryan.Hess@jlab.org
• Ian.Bird@jlab.org
• Andy.Kowalski@jlab.org