
Data Management after LS1



  1. Data Management after LS1

  2. Brief overview of current DM

• Replica catalog: LFC
  • LFN -> list of SEs
• SEs are defined in the DIRAC Configuration System
  • For each protocol: end-point, SAPath, [space token, WSUrl] (see sketch below)
  • Currently only used: SRM and rfio
• File placement according to Computing Model
  • FTS transfers from original SE (asynchronous)
• Disk replicas and archives completely split:
  • Only T0D1 and T1D0, no T1D1 SE any longer
• Production jobs:
  • Input file download to WN (max 10 GB) using gsiftp
• User jobs:
  • Protocol access from SE (on LAN)
• Output upload:
  • From WN to (local) SE (gsiftp). Upload policy defined in the job
• Job splitting and brokering:
  • According to LFC information
  • If file is unavailable, the job is rescheduled
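
As an illustration of the SE description above, here is a minimal sketch of how an LFN could be combined with CS-style SE definitions (end-point, SAPath, optional space token) to build a storage URL. The SE names, paths and the build_surl helper are hypothetical examples, not the actual DIRAC Configuration System schema or API.

```python
# Hypothetical, simplified view of SE definitions as described on the slide:
# per SE and protocol an end-point, an SAPath and optionally a space token.
SE_DEFINITIONS = {
    "CERN-DST": {
        "Protocol": "srm",
        "Endpoint": "srm://srm-lhcb.cern.ch:8443",
        "SAPath": "/castor/cern.ch/grid",
        "SpaceToken": "LHCb-Disk",
    },
    "CNAF-RAW": {
        "Protocol": "rfio",
        "Endpoint": "rfio://storm-fe.cr.cnaf.infn.it",
        "SAPath": "/lhcb",
        "SpaceToken": None,
    },
}

def build_surl(lfn, se_name):
    """Concatenate end-point + SAPath + LFN into a storage URL (sketch only)."""
    se = SE_DEFINITIONS[se_name]
    return "%s%s%s" % (se["Endpoint"], se["SAPath"], lfn)

if __name__ == "__main__":
    # Hypothetical LFN, resolved against one of the SE definitions above
    lfn = "/lhcb/data/2012/DST/00012345/0000/00012345_00000001_1.dst"
    print(build_surl(lfn, "CERN-DST"))
```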

  3. Caveats with current system

• Inconsistencies between FC, SE catalog and actual storage (see check sketched below)
  • Some files are temporarily unavailable (server down)
  • Some files are lost (unrecoverable disk, tape)
• Consequences:
  • Wrong brokering of jobs: cannot access files
    • Except for download policy if another replica is on disk/cache
  • SE overload
    • Busy, or not enough movers
    • As if files are unavailable
  • Jobs are rescheduled
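
A minimal sketch of the kind of catalogue-vs-storage check implied above, assuming the gfal2 Python bindings are available; the replica map and URLs below are invented examples, not real catalogue content.

```python
import gfal2

# Hypothetical LFN -> replica URLs, as would be read from the replica catalog
replicas = {
    "/lhcb/data/2012/DST/00012345_00000001_1.dst": [
        "srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/data/2012/DST/00012345_00000001_1.dst",
    ],
}

ctx = gfal2.creat_context()
for lfn, urls in replicas.items():
    for url in urls:
        try:
            info = ctx.stat(url)  # stat the replica directly on the SE
            print("OK   %s (%d bytes)" % (url, info.st_size))
        except gfal2.GError as err:
            # Server down, file lost, or catalogue inconsistency:
            # the replica is registered but not usable right now.
            print("MISS %s (%s)" % (url, err))
```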

  4. Future of replica catalog

• We probably still need one
  • Job brokering:
    • Don’t want to transfer files all over the place (even with caches)
  • DM accounting:
    • Want to know what/how much data is where
• But…
  • Should not need to be as highly accurate as now
  • Allow files to be unavailable without the job failing
• Considering the DIRAC File Catalog (see sketch below)
  • Mostly replica location (as used in LFC)
  • Built-in space usage accounting per directory and SE
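
A rough sketch of a replica lookup through the DIRAC FileCatalog client, as it could be used for brokering. It assumes a configured DIRAC installation with a valid proxy; the exact client interface and return structure may differ between DIRAC versions, and the LFN is hypothetical.

```python
# Standard DIRAC script initialisation (loads the Configuration System)
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)

from DIRAC.Resources.Catalog.FileCatalog import FileCatalog

fc = FileCatalog()
lfns = ["/lhcb/data/2012/DST/00012345_00000001_1.dst"]  # hypothetical LFN

res = fc.getReplicas(lfns)
if res["OK"]:
    for lfn, ses in res["Value"]["Successful"].items():
        # Treat the catalogue as a hint for brokering: a temporarily
        # unavailable replica should not fail the job outright.
        print(lfn, "->", list(ses))
    for lfn, reason in res["Value"]["Failed"].items():
        print("No replica information for", lfn, ":", reason)
else:
    print("Catalog query failed:", res["Message"])
```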

  5. Access and transfer protocols

• Welcome gfal2 and FTS3! (copy sketch below)
  • Hopefully transparent protocol usage for transfers
  • However, transfer requests should be expressed with compatible URLs
• Access to T1D0 data
  • 99% for reconstruction or re-stripping, i.e. download
  • Read once, therefore still require a sizeable staging pool
  • Unnecessary to copy to T0D1 before copying to WN
• xroot vs http/webdav
  • No strong feelings
  • What is important is a unique URL, redirection and WAN access
  • However, why not use (almost) standard protocols?
  • CVMFS experience is very positive, why not http for data?
  • Of course better if all SEs provide the same protocol
    • http/webdav for EOS and Castor?
  • We are willing to look at the http ecosystem
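
A minimal sketch of a protocol-agnostic copy with the gfal2 Python bindings, in the spirit of the "transparent protocol usage" above; the source and destination URLs are hypothetical, and the parameter names follow the gfal2 bindings as we understand them.

```python
import gfal2

# Hypothetical source (xroot) and destination (http/webdav) URLs
src = "root://eoslhcb.cern.ch//eos/lhcb/grid/prod/lhcb/data/file.dst"
dst = "https://webdav.example-se.org/lhcb/data/file.dst"

ctx = gfal2.creat_context()
params = ctx.transfer_parameters()
params.overwrite = True       # replace an existing destination file
params.checksum_check = True  # verify checksums after the copy
params.timeout = 3600         # per-file timeout in seconds

# gfal2 picks the plugin (xroot, http/webdav, srm, gsiftp, ...) from the
# URL scheme, which is what makes the transfer request protocol-agnostic.
ctx.filecopy(params, src, dst)
```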

  6. Other DM functionality

• File staging from tape
  • Currently provided by SRM
  • Keep SRM for T1D0 handling
  • Limited usage for bringOnline (sketch below)
  • Not used for getting tURL
• Space tokens
  • Can easily be replaced by different endpoints
  • Preferred to using namespace!
• Storage usage
  • Also provided by SRM
  • Is there a replacement?
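
A sketch of staging a T1D0 file with bringOnline through the gfal2 Python bindings; the SURL is hypothetical, and the return conventions of bring_online/bring_online_poll shown here are assumptions that may differ between gfal2 versions.

```python
import gfal2

# Hypothetical SURL of a tape-resident (T1D0) file
surl = "srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/data/raw/run.raw"

ctx = gfal2.creat_context()
pintime = 86400  # seconds the file should stay pinned on disk
timeout = 3600   # how long the staging request itself may take

# async=True queues the request and returns a token to poll with later
# (assumed return shape: a (status, token) tuple)
status, token = ctx.bring_online(surl, pintime, timeout, True)

if status == 0:
    # Request queued: poll with the token until the file is staged
    done = ctx.bring_online_poll(surl, token)
    print("staged" if done else "still staging, token %s" % token)
else:
    print("file already online")
```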

  7. Next steps

• Re-implement DIRAC DM functionality with gfal2
• Exploit new features of FTS3 (submission sketch below)
• Migrate to DIRAC File Catalog
  • In parallel with LFC
• Investigate http/webdav for file location and access
  • First, use it for healing
    • Still brokering using a replica catalog
  • Usage for job brokering (replacing replica catalog)?
  • Scalability?
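
A sketch of what an FTS3 submission could look like with the fts3-rest "easy" Python client; the endpoint and URLs are hypothetical, and the options shown are only a small subset of what FTS3 offers.

```python
import fts3.rest.client.easy as fts3

# Hypothetical FTS3 endpoint and source/destination SURLs
endpoint = "https://fts3.cern.ch:8446"
source = "srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/data/file.dst"
destination = "srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/file.dst"

context = fts3.Context(endpoint)  # authenticates with the grid proxy
transfer = fts3.new_transfer(source, destination)
job = fts3.new_job([transfer], verify_checksum=True, retry=3)

job_id = fts3.submit(context, job)
print("Submitted job", job_id)
print("State:", fts3.get_job_status(context, job_id)["job_state"])
```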

  8. What else?

• Dynamic data caching
  • Not clear yet how to best use this without replicating everything everywhere
  • When do caches expire?
  • Job brokering?
    • Don’t want to hold jobs while a dataset is replicated
• Data popularity
  • Information collection in place
  • Can it be used for automatic replication/deletion?
    • Or better as a hint for Data managers?
  • What metrics should be used? (toy comparison below)
    • What if 10 files out of a 100 TB dataset are used for tests, but nobody is interested in the rest?
    • Fraction of dataset used or absolute number of accesses?
  • Very few analysis passes on the full dataset
    • Many iterative uses of the same subset
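
A toy comparison of the two candidate popularity metrics mentioned above (fraction of the dataset used vs. absolute number of accesses), with an invented access log; it only illustrates why the two metrics can rank datasets differently.

```python
# dataset name -> {filename: number of accesses}; values are invented
access_log = {
    "BIG-100TB": {"f%04d" % i: 0 for i in range(10000)},
    "SMALL-SUBSET": {"f%02d" % i: 50 for i in range(20)},
}
# 10 files of the big dataset are used heavily for tests, the rest untouched
for i in range(10):
    access_log["BIG-100TB"]["f%04d" % i] = 200

for dataset, counts in access_log.items():
    touched = sum(1 for n in counts.values() if n > 0)
    fraction_used = float(touched) / len(counts)
    total_accesses = sum(counts.values())
    print("%-12s fraction used = %5.3f, total accesses = %d"
          % (dataset, fraction_used, total_accesses))

# BIG-100TB scores high on absolute accesses (2000) but only 0.001 on
# fraction used, so the two metrics would drive replication differently.
```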
