
Status of the WLCG Tier-2 Centres

M.C. Vetterli, Simon Fraser University and TRIUMF. WLCG Overview Board, CERN, October 27th 2008.



Presentation Transcript


  1. Status of the WLCG Tier-2 Centres
     M.C. Vetterli, Simon Fraser University and TRIUMF
     WLCG Overview Board, CERN, October 27th 2008

  2. Sources of Information
     • Discussions with experiment representatives in July
     • APEL monitoring portal: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
     • WLCG reliability reports: http://lcg.web.cern.ch/LCG/accounts.htm
     • October GDB meeting, dedicated to Tier-2 issues: http://indico.cern.ch/conferenceDisplay.py?confId=20234
     • Talks from the last OB & LHCC
     Slides labeled with a * are from MV’s LHCC rapporteur talk.

  3. Tier-2 Performance Summary*
     • Overall, the Tier-2s are contributing much more now
     • Significant fractions of the Monte Carlo simulations are being done in the T2s for all experiments
     • Reliability is better, but still needs to improve
     • The CCRC’08 exercise is generally considered a success for the Tier-2s

  4. Tier-2 Centres in CCRC’08 – General*
     • Overall, the Tier-2s and the experiments considered the CCRC’08 exercise to be a success
     • The networking/data transfers were tested extensively; some FTS tuning was needed, but it worked out
     • Experiments tended to continue other activities in parallel, which is a good test of the system, although the load was not as high as anticipated
     • While CMS did include significant user analysis activities, the chaotic use of the Grid by a large number of inexperienced people is still to be tested

  5. Tier-2 Issues/Concerns
     As of the CB and meetings with experiments this summer
     • Communications: Do Tier-2s have a voice? Is there a good mechanism for disseminating information?
     • Better monitoring: Pledges vs actual vs used
     • Hardware acquisitions: What should be bought? kSI2006?
     • Tier-2 capacity: Size of datasets? Effect of LHC delay?
     • …

  6. Tier-2 Issues/Concerns
     • Upcoming onslaught of users: Some user analysis tests have been done, but scaling is a concern
     • User support: A ticketing system exists, but it is not really used for user support issues. This affects Tier-2s especially.
     • Federated Tier-2s: Tools to federate? Monitoring? (averaging)
     • Interoperability of EGEE, OSG, and NDGF should be improved
     • Software/middleware updates: Could be smoother; too frequent

  7. Communications for Tier-2s
     • Identified by the T2s at the last CB as a serious problem. Interesting to me that many in experiment computing management did not share this concern.
     • Should communication be organized according to experiment or to Tier-1 association? There are also differing opinions on this. There are two issues:
       - Grid middleware/operations
       - Experiment software
     • My view after studying this is that the situation is OK for “tightly coupled” Tier-2s, but not for remote and smaller Tier-2s that are not well coupled to a Tier-1.

  8. Communications for Tier-2s
     • Many lines of communication do indeed exist.
     • Some examples are:
       CMS has two Tier-2 coordinators: Ken Bloom (Nebraska) and Giuseppe Bagliesi (INFN)
       - attend all operations meetings
       - feed T2 issues back to the operations group
       - write T2-relevant minutes
       - organize T2 workshops
       ALICE has designated 1 Core Offline person in 3 to have privileged contact with a given T2 site manager
       - weekly coordination meetings
       - Tier-2 federations provide a single contact person
       - A Tier-2 coordinates with its regional Tier-1

  9. Communications for Tier-2s ATLAS uses its cloud structure for communications- Every Tier-2 is coupled to a Tier-1 - 5 national clouds; others have foreign members (e.g. “Germany” includes Krakow, Prague, Switzerland; Netherlands includes Russia, Israel, Turkey) - Each cloud has a Tier-2 coordinatorRegional organizations, such as:+ France Tier-2/3 technical group:- coordinates with Tier-1 and with experiments - monthly meetings - coordinates procurement and site management+ GRIF:Tier-2 federation of 5 labs around Paris+ Canada:Weekly teleconferences of technical personnel (T1 & T2) to share information and prepare for upgrades, large production, etc.+ Many others exist; e.g. in the US and the UK

  10. Communications for Tier-2s
     • Tier-2 Overview Board reps: Michel Jouvin and Atul Gurtu have just been appointed to the OB to give the Tier-2s a voice there.
     • Tier-2 mailing list: Actually exists and is being reviewed for completeness & accuracy
     • Tier-2 GDB: The October GDB was dedicated to Tier-2 issues
       + reports from experiments: role of the T2s; communications
       + talks on regional organizations
       + discussion of accounting
       + technical talks on storage, batch systems, middleware
       Seems to have been a success; repeat a couple of times per year?

  11. Tier-2 Installed Resources
     • But how much of this is a problem of under-use rather than under-contribution? A task force has been set up to extract installed capacities from the Glue schema.
     • Monthly APEL reports still undergo significant modifications from first draft.
       - Good, because communication with T2s is better
       - Bad, because APEL accounting still has problems
       Accounting seems to be very finicky; it breaks when the CE or MON box is upgraded.
     • How are jobs distributed to the Tier-2s?
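The "pledged vs installed vs used" comparison raised on this slide can be sketched in a few lines. This is an illustration only, not the task force's method: it assumes installed CPU capacity is approximated as logical CPUs times the per-core SpecInt2000 rating (as published in GLUE 1.3 attributes such as `GlueSubClusterLogicalCPUs` and `GlueHostBenchmarkSI00`). The `SubCluster` type, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SubCluster:
    """One homogeneous group of worker nodes (hypothetical container)."""
    logical_cpus: int      # assumed to come from GlueSubClusterLogicalCPUs
    si00_per_cpu: float    # assumed to come from GlueHostBenchmarkSI00


def installed_ksi2k(subclusters: Iterable[SubCluster]) -> float:
    """Sum logical CPUs x SI00 rating over all subclusters, in kSI2K."""
    return sum(sc.logical_cpus * sc.si00_per_cpu for sc in subclusters) / 1000.0


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio; returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical site: all values below are made up for illustration.
site = [SubCluster(logical_cpus=200, si00_per_cpu=1500.0),
        SubCluster(logical_cpus=100, si00_per_cpu=2000.0)]
pledged_ksi2k = 600.0   # hypothetical pledge
used_ksi2k = 250.0      # hypothetical average usage from accounting

installed = installed_ksi2k(site)
print(f"installed: {installed:.0f} kSI2K")
print(f"installed / pledged: {ratio(installed, pledged_ksi2k):.2f}")  # under-contribution?
print(f"used / installed:    {ratio(used_ksi2k, installed):.2f}")     # under-use?
```

Separating the two ratios is the point: a low installed/pledged value suggests under-contribution, while a low used/installed value suggests under-use.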

  12. Tier-2 Hardware Questions
     • How does the LHC delay affect the requirements and pledges for 2009?
       + We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU capacity and we have seen this for storage as well

  13. Tier-2 Hardware Questions
     • How does the LHC delay affect the requirements and pledges for 2009?
       + We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU and we are now starting to see this for storage as well
     • We need to use something other than SpecInt2000!
       + this benchmark is totally out-of-date & useless for new CPUs
       + continued delays in SpecHEP can cause sub-optimal decisions

  14. Tier-2 Hardware Questions
     • Networking to the nodes is now an issue.
       + with 8 cores per node, 1 GigE connection ≈ 16.8 MB/sec/core
       + Tier-2 analysis jobs run on reduced data sets and can do rather simple operations; have seen 7.5 MB/sec at ATLAS and much more (x10?)
       + Do we need to go to Infiniband?
       + We certainly need increased capability for the uplinks; we should have a minimum of fully non-blocking GigE to the worker nodes.
     We need more guidance from the experiments.
     The next round of purchases is now!
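The per-core bandwidth figure on this slide is simple arithmetic, which the following sketch reproduces. The function names are hypothetical, and reading "x10" as 75 MB/sec per job is an assumption. Note that 16.8 MB/sec/core follows if 1 Gbit is taken as 2^30 bits; with the nominal GigE rate of 10^9 bits/sec the share is about 15.6 MB/sec/core.

```python
def per_core_bandwidth_mb_s(link_bits_per_s: float, cores: int) -> float:
    """Share of a node's network link per core, in MB/s (1 MB = 10^6 bytes)."""
    bytes_per_s = link_bits_per_s / 8
    return bytes_per_s / cores / 1e6


def node_demand_gbit_s(cores: int, job_rate_mb_s: float) -> float:
    """Aggregate link rate (Gbit/s) needed if every core runs one job at job_rate_mb_s."""
    return cores * job_rate_mb_s * 1e6 * 8 / 1e9


CORES = 8  # cores per worker node, as on the slide

print(f"nominal GigE (10^9 bit/s): {per_core_bandwidth_mb_s(1e9, CORES):.1f} MB/s per core")
print(f"binary giga (2^30 bit/s):  {per_core_bandwidth_mb_s(2**30, CORES):.1f} MB/s per core")

# Observed ATLAS rate from the slide, and an assumed ten-times-higher rate.
for rate in (7.5, 75.0):
    print(f"{rate:5.1f} MB/s per job -> {node_demand_gbit_s(CORES, rate):.2f} Gbit/s per node")
```

Under these assumptions, eight jobs at 7.5 MB/sec need roughly half of a GigE link, while eight jobs at ten times that rate would need several Gbit/sec per node, which is why the slide asks about Infiniband and non-blocking uplinks.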

  15. Summary
     • The role of the Tier-2 centres has increased markedly in the last year; >50% of Monte Carlo simulation is done in the T2s now.
     • The CCRC’08 exercise is considered a success by the Tier-2s and by the experiments.
     • Availability and reliability are up, but still need improvement.
     • Resource acquisition vs pledges is better but still needs work
     • Issues for Tier-2s:
       - communication should be (& is being) improved
       - work should ramp up on chaotic user analysis
       - reporting actual resources should be established
       - improved user support is needed
