
Disk Farms at Jefferson Lab




  1. Disk Farms at Jefferson Lab Bryan Hess Bryan.Hess@jlab.org

  2. Background • Data stored in a 6,000-tape StorageTek silo • Data throughput > 2 TB per day • Batch farm of ~250 CPUs for data reduction and analysis • Interactive analysis as well

  3. User Needs • Fast access to frequently used data from silo • Automatic staging of files for the batch farm (distinct disk pool for farm) • Tracking of disk cache use • Disk Use • File Use • Access patterns

  4. Cache Disk Management • Read-only area with subset of silo data • Unified NFS view of cache disks in /cache • User interaction with cache • jcache (request files) • jcache -g halla (request for specific group) • jcache -d (early deletion)

  5. Cache disk policies • Disk Pools are divided into groups • Management policy set per group: • Cache – LRU files removed as needed • Stage – Reference counting • Explicit – manual addition and deletion
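To make the three policies concrete, here is a minimal Java sketch of how per-group deletion decisions could be expressed; the class and method names are illustrative assumptions, not the actual cache-manager code (an explicit group would simply never pick deletion candidates on its own).

// Hypothetical sketch of per-group cache policies; names are illustrative only.
import java.util.*;

interface CachePolicy {
    // Pick files that may be removed to free the requested number of bytes.
    List<CachedFile> selectForDeletion(List<CachedFile> files, long bytesNeeded);
}

class CachedFile {
    String path;
    long size;
    long lastAccess;   // from the MySQL bookkeeping tables
    int refCount;      // batch-farm jobs still using the file
}

// "cache" group: least-recently-used files are removed as needed.
class LruPolicy implements CachePolicy {
    public List<CachedFile> selectForDeletion(List<CachedFile> files, long bytesNeeded) {
        files.sort(Comparator.comparingLong(f -> f.lastAccess));
        List<CachedFile> victims = new ArrayList<>();
        long freed = 0;
        for (CachedFile f : files) {
            if (freed >= bytesNeeded) break;
            victims.add(f);
            freed += f.size;
        }
        return victims;
    }
}

// "stage" group: a file is removable only when no farm job references it.
class StagePolicy implements CachePolicy {
    public List<CachedFile> selectForDeletion(List<CachedFile> files, long bytesNeeded) {
        List<CachedFile> victims = new ArrayList<>();
        long freed = 0;
        for (CachedFile f : files) {
            if (freed >= bytesNeeded) break;
            if (f.refCount == 0) { victims.add(f); freed += f.size; }
        }
        return victims;
    }
}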

  6. Architecture: Hardware • Linux 2.2.17 (now 2.2.18prexx?) • Dual 750 MHz Pentium III • Asus motherboards • Mylex RAID controllers • 11 x 73 GB disks ≈ 800 GB RAID 0 • Gigabit Ethernet to Foundry BigIron switch • …about 3¢/MB

  7. Architecture: Software • Java 1.3 • Cache manager on each node • MySQL database used by all servers • Protocol for file transfers (more shortly) • Writes to cache are never NFS • Reads from cache may be NFS
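As a rough idea of the bookkeeping the shared MySQL database can provide, here is a hedged JDBC sketch; the table and column names are assumptions, not the lab's actual schema.

// Hypothetical JDBC sketch of cache bookkeeping; table and column names are assumed.
import java.sql.*;

public class CacheDb {
    private final Connection conn;

    public CacheDb(String url, String user, String pass) throws SQLException {
        conn = DriverManager.getConnection(url, user, pass);
    }

    // Record that a file now lives on a given cache node, in a given group.
    public void addFile(String path, String group, String node, long size) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO cache_files (path, grp, node, size, last_access) VALUES (?, ?, ?, ?, NOW())")) {
            ps.setString(1, path);
            ps.setString(2, group);
            ps.setString(3, node);
            ps.setLong(4, size);
            ps.executeUpdate();
        }
    }

    // Touch a file on read so LRU decisions see up-to-date access times.
    public void touch(String path) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE cache_files SET last_access = NOW() WHERE path = ?")) {
            ps.setString(1, path);
            ps.executeUpdate();
        }
    }
}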

  8. Protocol for file moving • Simple, extensible protocol for file copies • Messages are Java serialized objects • Protocol is synchronous – all calls block • Asynchrony by threading • Fall back to raw data transfer for speed – faster and fairer than NFS • Session may make many connections
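The slides do not spell out the wire exchange, but a blocking, serialized-object request/reply might look roughly like the following sketch (the message classes are invented for illustration). Callers that want asynchrony run several such exchanges from separate threads.

// Hypothetical sketch of a synchronous serialized-object exchange; classes are illustrative.
import java.io.*;
import java.net.Socket;

class LocateRequest implements Serializable {
    final String path;
    LocateRequest(String path) { this.path = path; }
}

class LocateReply implements Serializable {
    final String node;    // cache node that holds the file, or null if not cached
    LocateReply(String node) { this.node = node; }
}

public class ProtocolClient {
    // Synchronous call: blocks until the reply object arrives.
    public static LocateReply locate(String host, int port, String path) throws Exception {
        // Create the output stream first so its header is flushed before reading.
        try (Socket s = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            out.writeObject(new LocateRequest(path));
            out.flush();
            return (LocateReply) in.readObject();
        }
    }
}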

  9. Protocol for file moving • Cache server extends the basic protocol • Add database hooks for cache • Add hooks for cache policies • Additional message types were added
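One plausible shape for that extension is sketched below: a cache-specific server subclass that recognizes an extra message type and calls policy and database hooks while handling it. None of these class names come from the slides; they are assumptions.

// Hypothetical sketch: a cache server layered on a generic message-handling base class.
import java.io.Serializable;

// Generic base: dispatch one deserialized request and return the reply to send back.
abstract class BaseServer {
    abstract Serializable handle(Serializable request) throws Exception;
}

// An additional message type understood only by the cache server.
class SpaceRequest implements Serializable { String group; long bytes; }
class SpaceReply implements Serializable { boolean granted; }

class CacheServer extends BaseServer {
    Serializable handle(Serializable request) throws Exception {
        if (request instanceof SpaceRequest) {
            SpaceRequest r = (SpaceRequest) request;
            SpaceReply reply = new SpaceReply();
            // Policy hook: free space in r.group according to its policy (LRU, stage, explicit).
            // Database hook: record the reservation in the MySQL bookkeeping tables.
            reply.granted = true;
            return reply;
        }
        // Other messages would be delegated to the basic protocol; this sketch just rejects them.
        throw new IllegalArgumentException("unsupported message: " + request.getClass());
    }
}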

  10. Example: Get from cache using our Protocol (1) • cacheClient.getFile(“/foo”, “halla”); • Send locate request to any server • Receive locate reply • [Diagram: the client (farm node) asks one of the cache servers (cache1 through cache4) “Where is /foo?”; the contacted server consults the database and replies “Cache3 has /foo”]

  11. Example: Get from cache using our Protocol (2) • cacheClient.getFile(“/foo”, “halla”); • Contact appropriate server • Initiate direct xfer • Returns true on success • [Diagram: the client (farm node) sends “Get /foo” to cache3, which answers “Sending /foo” and streams the file directly]
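Putting slides 10 and 11 together, the client-side get is a locate exchange followed by a direct raw transfer from the node that holds the file. The sketch below is an assumption-laden illustration: the text command, port number, and buffer size are not from the slides.

// Hypothetical sketch of the two-step get: locate via any server, then a direct transfer.
import java.io.*;
import java.net.Socket;

public class GetExample {
    // Step 1 (slide 10): ask any cache server where the file lives; in the example
    // the locate reply names cache3. The serialized-object exchange is as sketched earlier.
    static String locate(String anyServer, String path) {
        return "cache3";   // placeholder for the real locate reply
    }

    // Step 2 (slide 11): contact that node and stream the file directly (raw transfer,
    // not NFS), returning true on success.
    static boolean fetch(String node, String path, File dest) throws IOException {
        try (Socket s = new Socket(node, 9000);              // port is an assumption
             OutputStream ctl = s.getOutputStream();
             InputStream data = s.getInputStream();
             FileOutputStream out = new FileOutputStream(dest)) {
            ctl.write(("GET " + path + "\n").getBytes("US-ASCII"));
            ctl.flush();
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = data.read(buf)) != -1) {
                out.write(buf, 0, n);                        // raw byte copy of the file
            }
        }
        return true;
    }
}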

  12. Example: Simple put to cache using our Protocol • putFile(“/quux”, “halla”, 123456789); • [Diagram: the client (data mover) asks “Where can I put /quux?”; a cache server consults the database and replies “Cache4 has room”]
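The put side is symmetric: first a “where can I put it?” negotiation against the database-backed servers, then a direct transfer to the node the reply names (cache4 in the example). A hedged sketch of the negotiation step, with invented message classes and port:

// Hypothetical sketch of the put negotiation; names and port are assumptions.
import java.io.*;
import java.net.Socket;

class PutSpaceRequest implements Serializable { String path; String group; long size; }
class PutSpaceReply implements Serializable { String node; }   // e.g. "cache4"

public class PutExample {
    // Ask any cache server for a node with room; the reply reflects the database's view.
    static String whereCanIPut(String anyServer, String path, String group, long size)
            throws Exception {
        PutSpaceRequest req = new PutSpaceRequest();
        req.path = path;
        req.group = group;
        req.size = size;
        try (Socket s = new Socket(anyServer, 9000);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            out.writeObject(req);
            out.flush();
            return ((PutSpaceReply) in.readObject()).node;
        }
    }
}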

  13. Fault Tolerance • Dead machines do not stop the system • Only impact is on NFS clients • Exception handling for • Receive timeouts • Refused connections • Broken connections • Complete garbage on connections
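A minimal sketch of what that client-side exception handling could look like, assuming a list of known servers and an invented port; a node that times out, refuses the connection, drops it, or returns garbage is simply skipped, so the system keeps running.

// Hypothetical sketch of client-side fault tolerance: try each known server in turn.
import java.io.*;
import java.net.*;

public class FaultTolerantLocate {
    static Object locateOnAnyServer(String[] servers, Serializable request) {
        for (String host : servers) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, 9000), 5000); // refused connections
                s.setSoTimeout(10000);                              // receive timeouts
                ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
                ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                out.writeObject(request);
                out.flush();
                return in.readObject();
            } catch (SocketTimeoutException | ConnectException e) {
                // Dead or unreachable node: move on to the next server.
            } catch (IOException | ClassNotFoundException | ClassCastException e) {
                // Broken connection or complete garbage on the connection: skip this node.
            }
        }
        return null; // no server answered; only NFS readers would notice a dead node
    }
}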

  14. Authorization and Authentication • Shared secret for each file transfer session • Session authorization by policy objects • Example: receive 5 files from user@bar • Plug-in authenticators • Establish shared secret between client and server • No cleartext passwords
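The slides give the ingredients rather than the mechanics, so the following is only a sketch of one way they could fit together: a plug-in authenticator interface that yields a per-session secret, and an HMAC over each request so no cleartext password ever crosses the wire. The algorithm choice and names are assumptions.

// Hypothetical sketch of per-session shared-secret authentication; names are assumed.
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

// Plug-in point: establish a shared secret between client and server for one session
// (e.g. via an external authentication exchange); never a cleartext password.
interface Authenticator {
    byte[] establishSessionSecret(String principal) throws Exception;
}

public class SessionAuth {
    // Both sides compute an HMAC over each request with the session secret; the server
    // recomputes it and, together with its policy objects (e.g. "receive 5 files from
    // user@bar"), decides whether to honor the request.
    static byte[] sign(byte[] secret, byte[] message) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        return mac.doFinal(message);
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = new byte[20];
        new SecureRandom().nextBytes(secret);                   // per-session shared secret
        byte[] tag = sign(secret, "GET /foo".getBytes("US-ASCII"));
        System.out.println("request tag: " + tag.length + " bytes");
    }
}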

  15. Bulk Data Transfers • Model supports parallel transfers • Many files at once, but not bbftp style • For bulk data transfer over WANs • Web-based class loader – zero-pain updates • Firewall issues • Client initiates all connections
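A sketch of the “many files at once” idea: each file gets its own worker thread and its own outbound, client-initiated connection, which keeps the model firewall-friendly. The thread-pool style below is a modern convenience, not necessarily how the Java 1.3 code was written.

// Hypothetical sketch of parallel bulk transfers; one client-initiated connection per file.
import java.util.*;
import java.util.concurrent.*;

public class BulkTransfer {
    static void fetchAll(List<String> paths, int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (final String path : paths) {
            pool.submit(() -> {
                // Each task would locate the file and stream it over its own
                // outbound connection, as in the earlier get sketch.
                System.out.println("fetching " + path);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}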

  16. Additional Information http://cc.jlab.org/scicomp Bryan.Hess@jlab.org Ian.Bird@jlab.org Andy.Kowalski@jlab.org
