This discussion covers various points related to the deployment, support, validation, monitoring, optimization, maintenance, and information systems for LFC and DPM. Topics include local catalogs, Tier 2 support, validation and monitoring tools, optimization strategies, maintenance considerations, and more.
DPM and LFC: Discussion Points from a Site Perspective. Graeme Stewart, GridPP, University of Glasgow
Outline of Discussion Points • Deployment • Supporting Tier 2s • Validation and Monitoring • Does it work? • Optimisation • Does it work well? • Maintenance and Utilities • Will it stay working? • Information Systems • How do I tell you what I’m offering?
Deployment: LFC Local Catalogs • Deployed through the YAIM installer at most (Tier2) sites • Configuration very easy: LFC_HOST=my-lfc.$DOMAIN LFC_DB_PASSWORD=XXXXXXX LFC_CENTRAL="" LFC_LOCAL="" • Installing by hand is easy – twiki instructions • Add your LFC to BDII_REGIONS on the CE to publish your catalog
Deployment: DPM • YAIM install for Tier2s. Functionality improved in 2.7.0 – now handles DPM disk servers with multiple filesystems. DPMDATA="/gridstore0" DPMMGR=dpmmgr DPMUSER_PWD=XXXX DPMFSIZE=200M DPM_HOST="dpm-admin.$MY_DOMAIN" DPMPOOL=myPool DPMPOOL_NODES="svr1.$MY_DOMAIN:/gridstore0 \ svr1.$MY_DOMAIN:/gridstore1 svr2.$MY_DOMAIN:/bigdisk" • Conversion of Classic SE possible – excellent migration route for smaller sites. • Improvements: • Simplify YAIM (remove DPMDATA), rename DPMPOOL_NODES to DPM_FILESYSTEMS. • Service is now straightforward to set up.
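The DPMPOOL_NODES value above packs several host:/filesystem pairs into one string, which is how a disk server with multiple filesystems is expressed. A minimal Python sketch of how a site might sanity-check such a value before running the installer follows. The function name and the example.org hostnames are assumptions for illustration; this is not part of YAIM.

```python
from collections import defaultdict


def parse_pool_nodes(value: str) -> dict[str, list[str]]:
    """Group 'host:/path' entries from a DPMPOOL_NODES-style string by host."""
    servers = defaultdict(list)
    # Treat shell line-continuation backslashes as plain whitespace.
    for entry in value.replace("\\", " ").split():
        host, sep, path = entry.partition(":")
        if not sep or not host or not path.startswith("/"):
            raise ValueError(f"expected host:/path, got {entry!r}")
        servers[host].append(path)
    return dict(servers)


# Hypothetical value with $MY_DOMAIN already expanded to example.org.
nodes = parse_pool_nodes(
    "svr1.example.org:/gridstore0 \\ svr1.example.org:/gridstore1 "
    "svr2.example.org:/bigdisk"
)
for host, paths in nodes.items():
    print(host, paths)
```

A malformed entry (for example a missing colon) raises ValueError, so a typo is caught before any pool is configured.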
Deployment: Tier 2 Support • Deployment of SRMs is not trivial – especially if things go wrong! (daemon headcount: Classic SE 1, DPM 6) • Good support relationship between Tier2s and Tier1 essential • Tier1 can have experts to help Tier2s • Tier2s also build up a support community • UKI Example • Storage group at RAL • Mailing list and weekly phone conferences • Wiki to collect documentation
Validation • Local LFC • Not yet monitored in any SFT • Nameserver functionality can be tested • Sites need a tool to check registrations lcg-cr --catalog=MY_LFC ? • DPM • Monitored through SFTs, but indirectly • Sites can conduct tests “by hand” (e.g., http://wiki.gridpp.ac.uk/wiki/DPM_Testing) • Often difficult to tell where an ST rm test actually fails • Multiple SEs?
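One shape a registration-checking tool could take is a plain comparison between what the catalog lists and what the storage element holds. The Python sketch below assumes both listings have already been dumped to lists of paths by some site-specific means; the function name and the sample paths are hypothetical, and no LFC or DPM client call is made.

```python
def compare_registrations(catalog_entries, storage_files):
    """Report entries present on only one side of the catalog/storage pair."""
    catalog = set(catalog_entries)
    storage = set(storage_files)
    return {
        # Registered in the catalog but no longer on disk.
        "missing_on_storage": sorted(catalog - storage),
        # On disk but unknown to the catalog.
        "unregistered": sorted(storage - catalog),
    }


# Hypothetical listings for one VO.
catalog_dump = [
    "/dpm/example.org/home/dteam/file1",
    "/dpm/example.org/home/dteam/file2",
]
storage_dump = [
    "/dpm/example.org/home/dteam/file2",
    "/dpm/example.org/home/dteam/file3",
]
report = compare_registrations(catalog_dump, storage_dump)
print(report)
```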
Optimisation • Problem of deploying DPM mostly solved. • But can Tier2s provide the performance the experiments want through DPM? • Do we know what this is? srm_write_rate(KSI2K, VO) • Wrong choices at deployment are more costly to recover from once the service is established: • Running DPM head node too hard can cause FTS failures • Better to isolate SRM daemons from disk servers
Maintenance • LFC and DPM both hold metadata in their databases – different, e.g., from the Classic SE • Databases must be backed up. • Frequency/Recovery time? • What are the experiments' expectations of T2 storage? • Are we saying anything about the T2s' hardware yet? • For both services ensuring certificates are valid is still a headache
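The certificate headache can be reduced by a periodic expiry check. The Python sketch below takes the notAfter date of a host certificate as a string (for example as printed by `openssl x509 -noout -enddate`) and reports how many days remain. The function names and the 30-day warning threshold are assumptions, not a site policy.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between 'now' (epoch seconds) and a certificate's notAfter date."""
    expiry = ssl.cert_time_to_seconds(not_after)
    return (expiry - now) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (assumed threshold)."""
    return days_until_expiry(not_after, now) < warn_days


# Example dates only; a real check would pass time.time() as 'now'.
not_after = "Jan  5 09:34:43 2018 GMT"
now = ssl.cert_time_to_seconds("Dec 26 09:34:43 2017 GMT")
print(days_until_expiry(not_after, now), needs_renewal(not_after, now))
```

Run from cron on both the LFC and DPM hosts, a check of this kind gives warning before a service starts failing on an expired certificate.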
DPM: Care and Feeding As DPM rolls out to T2s we move from deployment to service maintenance. What tools do sites need from the DPM developers? • Filesystem draining utilities • Allows reconfiguration of DPM. Removal of servers for maintenance, etc. • Per-VO Quotas • Reserved pools probably not flexible enough • Database Tools • Removal of files from dead filesystem • Load Balancing • Better filesystem selection algorithms to express hardware differences known to the sites
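To make the load-balancing request concrete, here is a minimal Python sketch of one possible selection rule: pick a filesystem at random, weighted by free space times a site-assigned weight. This is an illustration only and not DPM's actual algorithm; the field names, weights, and sizes are all hypothetical.

```python
import random


def choose_filesystem(filesystems, rng=random):
    """Pick a filesystem name, weighted by free space and a site-set weight."""
    candidates = [
        fs for fs in filesystems if fs["free_bytes"] > 0 and fs["weight"] > 0
    ]
    if not candidates:
        raise RuntimeError("no filesystem available for new files")
    # A faster server gets a larger weight; a weight of zero excludes it.
    scores = [fs["free_bytes"] * fs["weight"] for fs in candidates]
    return rng.choices(candidates, weights=scores, k=1)[0]["name"]


# Hypothetical pool: svr2 is faster hardware, /gridstore1 is being emptied.
pool = [
    {"name": "svr1:/gridstore0", "free_bytes": 200 * 10**9, "weight": 1.0},
    {"name": "svr1:/gridstore1", "free_bytes": 500 * 10**9, "weight": 0.0},
    {"name": "svr2:/bigdisk", "free_bytes": 800 * 10**9, "weight": 2.0},
]
print(choose_filesystem(pool))
```

Setting a weight to zero keeps new files off a filesystem, which is the first step a draining utility would also need; moving the existing files is outside this sketch.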
Tier2 SEs and SRM: Glue sticks it together • Currently Tier2s publish 1 SA per VO. • GlueSAType attribute refers only to SRM storage model (permanent, durable, volatile). • GlueSEArchitecture hints at underlying hardware (disk, multidisk, tape) but is too vague for experiments. And it may vary for each SA! • Should we publish SEs as abstract Glue services? Might help with debugging? • Should we add additional fields describing storage’s “durability” (low, medium, high, archival?). Will need to be per SA – no field in Glue1.2. (ref: GDB presentations from Jeff and Laurence). • Can T2s usefully advertise volatile space to the experiments?
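The per-SA point can be illustrated with a small Python sketch of a storage-area record. The type and architecture values are those listed above; the durability field is the proposed addition and does not exist in Glue 1.2. The class and field names are hypothetical and are not Glue schema attribute names.

```python
from dataclasses import dataclass

SA_TYPES = {"permanent", "durable", "volatile"}
ARCHITECTURES = {"disk", "multidisk", "tape"}
DURABILITY = {"low", "medium", "high", "archival"}  # proposed, not in Glue 1.2


@dataclass(frozen=True)
class StorageArea:
    vo: str
    sa_type: str       # SRM storage model
    architecture: str  # underlying hardware, recorded per SA here
    durability: str    # hypothetical per-SA field

    def __post_init__(self):
        # Reject values outside the vocabularies named in the discussion.
        if self.sa_type not in SA_TYPES:
            raise ValueError(f"unknown SA type: {self.sa_type}")
        if self.architecture not in ARCHITECTURES:
            raise ValueError(f"unknown architecture: {self.architecture}")
        if self.durability not in DURABILITY:
            raise ValueError(f"unknown durability: {self.durability}")


# Two SAs on the same hypothetical SE with different properties.
areas = [
    StorageArea("atlas", "permanent", "multidisk", "high"),
    StorageArea("dteam", "volatile", "disk", "low"),
]
for sa in areas:
    print(sa.vo, sa.sa_type, sa.architecture, sa.durability)
```

Because each record carries its own architecture and durability, a single SE-wide value is no longer forced to describe storage areas that differ.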