Summary of the last GridKa Cloud Meeting (07 July 2010)
Marc Goulette (University of Geneva)
Swiss WLCG Operations Meeting

Cloud Status (Guenter Duckeck)
* New ATLAS contact Andreas Petzold at GridKa since 1st July
* Operations running smoothly in June, with some problems:
  - GridKa tape reading tests failed, tape library broken
  - Freiburg extended downtime due to cooling problems
  - DESY-HH: ATLASSCRATCHDISK size/overload
  - DESY-ZN: observed ATLAS jobs with excessive memory usage
* Amsterdam Jamboree: WLCG meeting on evolution of data and storage element
  - Trend to more dynamic data distribution (caching) rather than static placement
  - Several demonstrator projects in the next months
  - Might change/increase network usage
* TAB and HGF-Grid PB meetings:
  - Discussed network situation in DE cloud
  - Started first analysis of ATLAS data transfer patterns using log information provided by sites
    + GridKa dominates
    + DE T2-T2 traffic low (<10%)
    + Interpretation difficult as GridKa numbers also include FTS 3rd-party transfers between sites (but this is expected to be a small contribution)
    + Some variations between sites (DESY has relatively large non-DE & CERN transfer fraction)
  - Discussion of network situation in DE cloud (see http://indico.cern.ch/getFile.py/access?contribId=3&resId=0&materialId=0&confId=100512 for details). Mixed situation wrt. network connectivity in Germany.
  - J. Schultes provided a script to parse dCache billing logs

Cloud Status (Guenter Duckeck)
* ATLAS GridKa F2F operations meeting on June 24
  - Agenda and minutes: http://indico.cern.ch/conferenceDisplay.py?confId=98902
  - Extensive and productive discussion of operation areas, monitoring, testing, documentation
  - Template for operations wiki (to be filled): https://twiki.cern.ch/twiki/bin/view/Sandbox/GridKaSquadPage
  - We should extend our cloud monitoring page http://happyface-goegrid.gwdg.de/cloudmon/CloudMon.html
    + Job info (e.g. running, queued, CPU/Walltime for prod and user)
    + Storage info (e.g. space token usage, IO rates, movers)
    + Will discuss if/how sites could provide this information
* ATLAS DE cloud computing meeting on July 19/20
  - Main focus on user analysis experience and support
  - Plan to hold a 2-hour T1/T2 operations meeting beforehand
  - Preliminary agenda: https://indico.desy.de/conferenceOtherViews.py?view=standard&confId=3161

TIER1 OPERATIONS (Gen Kawamura, Andreas Petzold):
-------------------------------------------------
* dCache milestone file space is ready (ongoing this week, 2/3 ready, 1/3 yet to come)
* FTS updated to latest release including OS upgrade
* CREAM CE:
  - cream-3-fzk available (CREAM 1.6 / SL5)
  - cream-2-fzk had been drained and was updated
* OPS tests switched off, nagios probes used instead
* Upgrade of VOBOX
* LFC: new 1.7.5 on SL5 will be installed (no date yet)
* dCache access statistics:
  - dcap access to all space tokens becoming more important
  - Most accessed files: COND, DBRELEASE, group.phys-top.D2PD (on SCRATCHDISK), DATADISK (see T1 report pdf file)
* Tape problems of last month: All problems fixed, tape library back online but still not as reliable as required

PRODUCTION OPERATIONS:
----------------------
* In general no problems to report, almost no production in June, "missing pilots"-problems under investigation

DATA MANAGEMENT (Cedric Serfon):
--------------------------------
* Smooth operation in June
* Overall transfer efficiency in the last 30 days: 96% (95% last month)
* Transfer volume somewhat lower (~1.7M files [June], ~2.5M [May], 136MB/s [June], 300MB/s [May])
* 2 file losses
  - Wuppertal (~19000 files) due to problem with disk controller
  - LRZ (~9000 files) backplane burned
* It was recently proposed not to export MC (DATA) to T2s that do not have at least 50TB for ATLASMCDISK (ATLASDATADISK)
  - Until now only a proposal, will probably be discussed during software week
  - Current situation: 3 sites in the DE cloud still need to reach this threshold for at least one of their space tokens: Cyfronet, MPPMU, Innsbruck
    + CYFRONET: Will get new hardware this year and will be able to increase tokens to 50TB
    + MPPMU: Expects new hardware in September, increased one token to 50TB
    + Innsbruck: Will add new hardware
* LOCALGROUPDISK usage:
  - 22 users over 1TB (17 in May), Total space used: 173TB (110TB in May, 89TB in April)
  - Could run into problems soon with LOCALGROUPDISK filling up
  - Quota system still under development (probably not available before end of summer)
* Discussion of provenance of files at Tier2 sites (obtained from Dashboard/site services):
  - Most of the transfer volume comes from GridKa
  - Exception for CSCS, where more than 1/2 of the files come from CERN (caused by a group user doing production at CERN)
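
The LOCALGROUPDISK totals quoted in the data management report (89TB, 110TB, 173TB) can be turned into month-on-month growth figures, which illustrates why the disk filling up is a concern. The following is only a minimal sketch: the function name `monthly_growth` is hypothetical, and it assumes the 173TB figure refers to June. No total capacity is given in the report, so no fill date is projected.

```python
# Total LOCALGROUPDISK usage in TB, as quoted in the data management report.
# Assumption: the 173 TB figure is the June value.
usage_tb = {"April": 89, "May": 110, "June": 173}


def monthly_growth(usage):
    """Return (month, increase in TB, increase in %) for each consecutive pair."""
    months = list(usage)  # dicts preserve insertion order
    steps = []
    for prev, curr in zip(months, months[1:]):
        delta = usage[curr] - usage[prev]
        steps.append((curr, delta, 100.0 * delta / usage[prev]))
    return steps


growth = monthly_growth(usage_tb)
for month, delta, pct in growth:
    print(f"{month}: +{delta} TB ({pct:.1f}%)")
# May: +21 TB (23.6%)
# June: +63 TB (57.3%)
```

Under these assumptions the increase tripled from one month to the next (21TB to 63TB), which is consistent with the warning that LOCALGROUPDISK may fill up before the quota system is available.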

TIER2 REPORT (Jan Erik Sundermann):
-----------------------------------
* Discussion on space token usage (see pdf file)
  - CYFRONET MCDISK close to full
* Accounting (see pdf file)
  - Almost no production in June
  - Increased user activity seen (mainly via PANDA pilots)

SOFTWARE INSTALLATION (Joerg Meyer), see pdf file:
--------------------------------------------------
* Most sites have the latest releases installed. Some smaller problems under investigation