Data Management after LS1
Brief overview of current DM
• Replica catalog: LFC
  • LFN -> list of SEs
• SEs are defined in the DIRAC Configuration System
  • For each protocol: end-point, SAPath, [space token, WSUrl] (see the sketch below)
  • Currently only SRM and rfio are used
• File placement according to Computing Model
  • FTS transfers from original SE (asynchronous)
• Disk replicas and archives completely split:
  • Only T0D1 and T1D0, no T1D1 SE any longer
• Production jobs:
  • Input file download to WN (max 10 GB) using gsiftp
• User jobs:
  • Protocol access from SE (on LAN)
• Output upload:
  • From WN to (local) SE (gsiftp); upload policy defined in the job
• Job splitting and brokering:
  • According to LFC information
  • If a file is unavailable, the job is rescheduled
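As an illustration of the SE description above, here is a toy Python sketch of how an access URL can be assembled from the per-protocol fields (end-point, SAPath, optional space token and WSUrl). The SE names, hosts, paths and the LFN are invented placeholders, not actual LHCb configuration.

```python
# Toy sketch: build an access URL from a DIRAC-CS-style SE description.
# All SE names, hosts and paths below are invented placeholders.
SE_CONFIG = {
    'CERN-RAW': {
        'srm': {
            'endpoint': 'srm://srm-lhcb.example.ch:8443',
            'WSUrl': '/srm/managerv2?SFN=',
            'SAPath': '/castor/example.ch/grid',
            'spaceToken': 'LHCb-Tape',
        },
        'rfio': {
            'endpoint': 'rfio://castorlhcb.example.ch',
            'SAPath': '/castor/example.ch/grid',
        },
    },
}

def access_url(se, protocol, lfn):
    """Concatenate end-point, optional WSUrl, SAPath and LFN."""
    cfg = SE_CONFIG[se][protocol]
    return cfg['endpoint'] + cfg.get('WSUrl', '') + cfg['SAPath'] + lfn

lfn = '/lhcb/data/2012/RAW/FULL/90000000/file.raw'   # hypothetical LFN
print(access_url('CERN-RAW', 'srm', lfn))
print(access_url('CERN-RAW', 'rfio', lfn))
```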
Caveats with current system
• Inconsistencies between FC, SE catalog and actual storage
  • Some files are temporarily unavailable (server down)
  • Some files are lost (unrecoverable disk, tape)
• Consequences:
  • Wrong brokering of jobs: cannot access files
    • Except for download policy if another replica is on disk/cache (see the sketch below)
  • SE overload
    • Busy, or not enough movers
    • As if files are unavailable
  • Jobs are rescheduled
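A toy sketch of the brokering consequence described above: with protocol access the job is bound to the matched SE and has to be rescheduled if that SE is unavailable, whereas the download policy can fall back to any other available disk replica. All LFNs, SE names and availability flags are invented for illustration.

```python
# Toy illustration: the download policy can recover from an unavailable SE,
# protocol access cannot. Replica and availability data are invented.
replicas = {'/lhcb/example/file.dst': ['CERN-DST', 'CNAF-DST']}   # from the FC
se_available = {'CERN-DST': False, 'CNAF-DST': True}              # CERN server down

def resolve_input(lfn, matched_se, policy):
    """Return the SE to read from, or None if the job must be rescheduled."""
    if se_available.get(matched_se):
        return matched_se
    if policy == 'download':
        # download policy: any other available disk replica will do
        for se in replicas.get(lfn, []):
            if se_available.get(se):
                return se
    return None   # no usable replica at the matched site: reschedule

lfn = '/lhcb/example/file.dst'
print(resolve_input(lfn, 'CERN-DST', 'protocol'))   # None -> job rescheduled
print(resolve_input(lfn, 'CERN-DST', 'download'))   # CNAF-DST
```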
Future of replica catalog
• We probably still need one
  • Job brokering:
    • Don’t want to transfer files all over the place (even with caches)
  • DM accounting:
    • Want to know what/how much data is where
• But…
  • Should not need to be as highly accurate as now
  • Allow files to be unavailable without the job failing
• Considering the DIRAC File Catalog (see the sketch below)
  • Mostly replica location (as used in LFC)
  • Built-in space usage accounting per directory and SE
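A minimal sketch of how the DIRAC File Catalog could cover both needs, replica location and per-directory/per-SE space accounting. It assumes the DIRAC FileCatalog client with getReplicas and getDirectorySize methods; the catalogue name, method names and LFNs are assumptions to be checked against the installed DIRAC version, not a definitive implementation.

```python
# Minimal sketch, assuming the DIRAC File Catalog client API of the time.
from DIRAC.Core.Base import Script
Script.parseCommandLine()                      # initialise the DIRAC configuration

from DIRAC.Resources.Catalog.FileCatalog import FileCatalog

fc = FileCatalog(catalogs=['FileCatalog'])     # use the DIRAC File Catalog only

# Replica location, as used today with the LFC
lfns = ['/lhcb/MC/2012/ALLSTREAMS.DST/00012345/0000/file_1.allstreams.dst']  # hypothetical LFN
res = fc.getReplicas(lfns)
if res['OK']:
    for lfn, seDict in res['Value']['Successful'].items():
        print('%s -> %s' % (lfn, sorted(seDict)))   # SEs holding a replica

# Built-in space usage accounting per directory and SE
res = fc.getDirectorySize(['/lhcb/MC/2012'])        # hypothetical directory
if res['OK']:
    print(res['Value']['Successful'])
```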
Access and transfer protocols
• Welcome gfal2 and FTS3!
  • Hopefully transparent protocol usage for transfers (see the sketch below)
  • However, transfer requests should be expressed with compatible URLs
• Access to T1D0 data
  • 99% for reconstruction or re-stripping, i.e. download
  • Read once, therefore still requires a sizeable staging pool
  • Unnecessary to copy to T0D1 before copying to WN
• xroot vs http/webdav
  • No strong feelings
  • What is important is a unique URL, redirection and WAN access
  • However, why not use (almost) standard protocols?
    • CVMFS experience is very positive, so why not http for data?
  • Of course better if all SEs provide the same protocol
    • http/webdav for EOS and Castor?
  • We are willing to look at the http ecosystem
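A minimal sketch, assuming the gfal2 Python bindings (gfal2-python): a single filecopy() call is used whatever the protocols in the source and destination URLs, which is what transparent protocol usage would look like from the DM code. The URLs are hypothetical placeholders.

```python
# Minimal sketch with the gfal2 Python bindings: one filecopy() call,
# the plugin is selected from the URL scheme (srm, gsiftp, xroot, http...).
import gfal2

ctx = gfal2.creat_context()
params = ctx.transfer_parameters()
params.overwrite = True
params.checksum_check = True
params.timeout = 3600

src = 'srm://srm-lhcb.example.ch/castor/example.ch/grid/lhcb/prod/file.dst'
dst = 'https://eoslhcb.example.ch//eos/lhcb/grid/prod/file.dst'

ctx.filecopy(params, src, dst)   # raises gfal2.GError on failure
print('copy done')
```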
Other DM functionality
• File staging from tape
  • Currently provided by SRM
  • Keep SRM for T1D0 handling
    • Limited usage, for bringOnline (see the sketch below)
    • Not used for getting tURLs
• Space tokens
  • Can easily be replaced by different endpoints
  • Preferred to using the namespace!
• Storage usage
  • Also provided by SRM
  • Is there a replacement?
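A minimal sketch of this residual SRM usage, assuming the gfal2 Python bindings expose bring_online() / bring_online_poll(); the SURL, pin time and status handling are assumptions to be checked against the gfal2 documentation.

```python
# Minimal sketch: keep SRM only for staging (bringOnline), never ask it
# for a tURL. Assumes gfal2-python's bring_online()/bring_online_poll().
import time
import gfal2

ctx = gfal2.creat_context()
surl = 'srm://srm-lhcb.example.ch/castor/example.ch/grid/lhcb/data/file.raw'  # placeholder

pintime = 86400    # how long the replica should stay pinned on disk (s)
timeout = 3600     # timeout of the staging request itself (s)

# Asynchronous request: returns a status and a request token
status, token = ctx.bring_online(surl, pintime, timeout, True)

while status == 0:                       # assumption: 0 = still queued, >0 = online
    time.sleep(60)
    status = ctx.bring_online_poll(surl, token)

print('%s staged, ready for gsiftp download to the WN' % surl)
```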
Next steps
• Re-implement DIRAC DM functionality with gfal2
• Exploit new features of FTS3 (see the sketch below)
• Migrate to DIRAC File Catalog
  • In parallel with LFC
• Investigate http/webdav for file location and access
  • First, use it for healing
    • Still brokering using a replica catalog
  • Usage for job brokering (replacing replica catalog)?
    • Scalability?
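A minimal sketch of a transfer submission through the FTS3 REST Python client (fts3.rest.client.easy); the endpoint, SURLs and keyword arguments are assumptions to be validated against the deployed client version.

```python
# Minimal sketch of an FTS3 submission with the fts3-rest Python client.
import fts3.rest.client.easy as fts3

endpoint = 'https://fts3.example.ch:8446'   # hypothetical FTS3 server
context = fts3.Context(endpoint)            # uses the grid proxy by default

transfer = fts3.new_transfer(
    'srm://srm-lhcb.example.ch/castor/example.ch/grid/lhcb/prod/file.dst',   # source (placeholder)
    'srm://srm.gridka.example.de/pnfs/example.de/data/lhcb/prod/file.dst',   # destination (placeholder)
)
job = fts3.new_job([transfer], verify_checksum=True, retry=3)

job_id = fts3.submit(context, job)
print('submitted FTS3 job %s' % job_id)

status = fts3.get_job_status(context, job_id)
print(status['job_state'])                  # e.g. SUBMITTED, ACTIVE, FINISHED
```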
What else?
• Dynamic data caching
  • Not clear yet how to best use this without replicating everything everywhere
  • When do caches expire?
  • Job brokering?
    • Don’t want to hold jobs while a dataset is replicated
• Data popularity
  • Information collection is in place
  • Can it be used for automatic replication/deletion?
    • Or better as a hint for Data Managers?
  • What is the metric to be used? (see the sketch below)
    • What if 10 files out of a 100 TB dataset are used for tests, but no one is interested in the rest?
    • Fraction of dataset used, or absolute number of accesses?
    • Very few analysis passes on the full dataset
    • Many iterative uses of the same subset
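A toy comparison of the two candidate metrics mentioned above, fraction of the dataset touched versus absolute number of accesses; the access log and dataset sizes are invented numbers for illustration only.

```python
# Toy comparison of two popularity metrics: absolute accesses vs the
# fraction of a dataset's files actually touched. All numbers are invented.
from collections import defaultdict

# (dataset, file index) pairs, one entry per recorded access
access_log = [('DS-A', i % 10) for i in range(5000)]       # 10 test files hammered
access_log += [('DS-B', i) for i in range(400)]            # 400 files read once

dataset_files = {'DS-A': 100000, 'DS-B': 500}              # files per dataset

n_accesses = defaultdict(int)
touched = defaultdict(set)
for dataset, index in access_log:
    n_accesses[dataset] += 1
    touched[dataset].add(index)

for dataset in sorted(dataset_files):
    fraction = len(touched[dataset]) / float(dataset_files[dataset])
    print('%s: %5d accesses, %.4f of the dataset touched'
          % (dataset, n_accesses[dataset], fraction))

# DS-A looks hot by absolute accesses, but only 1e-4 of it is ever used;
# DS-B has fewer accesses, but essentially the whole dataset is read.
```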