User Board Overview
Dan Tovey, University of Sheffield
Tier-1 Planning
• Quarterly UB meeting in April (see minutes) produced updated Tier-1 planning figures.
• A shortfall of Tier-1 resources in future years (especially 2008) is evident.
• Will need to consider whether experiment requirements can be met by Tier-2 resources; experiments need to demonstrate a clear need for Tier-1 functionality.
• Requests which can be met by Tier-2 resources are to be discussed with the Tier-2 Board.
• The 'Other Experiments' line has been removed from the Tier-1 schedule following the detailed Tier-1 Board plan; all users must now make representation to the UB to get access to resources.
Tier-1 Planning
• Tier-1 utilisation figures frequently fall significantly short of both requests and allocations; this sends the wrong message (see the accounting sketch below).
• Often this is not the fault of the experiments (e.g. middleware / operational problems), but experiments must work to produce more realistic estimates.
• The move to strict allocation of disk resources (no over-allocation) helps the Tier-1 team.
• Also synchronise allocations with the spending cycle; the aim is to ensure complete use of all new resources as soon as they come on-line.
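To make the utilisation point concrete, here is a minimal sketch in Python of the request / allocation / usage comparison the UB reviews. The experiment names are real GridPP users, but every figure below is a made-up placeholder, not a number from the minutes.

# Compare usage against requests and allocations per experiment.
# All figures are illustrative placeholders, not UB data.
resources = {
    # experiment: (requested_TB, allocated_TB, used_TB)
    "ATLAS": (40.0, 30.0, 18.0),
    "CMS":   (35.0, 30.0, 24.0),
    "LHCb":  (20.0, 15.0, 12.0),
}

for expt, (req, alloc, used) in resources.items():
    print(f"{expt}: used {used/req:.0%} of request, "
          f"{used/alloc:.0%} of allocation")

A report like this makes the gap between paper requests and delivered work visible per experiment, which is the "wrong message" the bullet above refers to.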
DB Links
• Stronger links with the Deployment Board are seen as vital; there is a standing invitation for DB representation at UB meetings.
UB Concerns
• How are experiments that globally are not moving to the Grid to be handled?
• Site stability and user support.
• Balance of effort at the Tier-1: much is used for CMS (SRM) and later the LCG SC (Service Challenge), but what about smaller user communities?
• 'Non-standard' operating systems at Tier-2 sites can render them useless to some experiments; the UB and Tier-2 Board need to persuade sites to work towards standardisation.
Questionnaire
• User Board questionnaire updated for the latest OsC process.
• No big changes from February.
• Some new comments/concerns:
  • fragmented support structure
  • 'all stick and no carrot'
  • held up by problems with establishing the VO
  • not all experiments supported by large Tier-2s
• Further details at: http://www.gridpp.ac.uk/eb/workdoc/gridusebyexpts_0605.doc
Pleasure: LHCb
• Shared data (LHCb RTTC production, May/June).
• The data reported are preliminary (accuracy at the 5% level).
• 5% produced with plain DIRAC sites.
• 95% produced with LCG sites.
Pleasure: ATLAS
• Using the Grid for 100% of Simulation, Digitisation and Reconstruction.
• 8.5M fully simulated ATLAS events produced.
• 20% of LCG jobs run in the UK.
• Overall throughput good, and improving …
Pain: ATLAS
• But … experience has been painful!
• Significant throughput problems experienced in January/February; production goals were descoped (15M events planned vs. 8.5M events actual).
• Identified problems (highlights – see also questionnaire):
  • The system appears to function best when only one person is submitting jobs!
  • Lack of a distributed mechanism for prioritising jobs.
  • Lack of inter-operability between LCG and other Grids: load balancing and data replication have to be done 'by hand', leading to production errors (e.g. the same sample produced multiple times on different Grids).
  • Too much human intervention required to set, adjust and enforce priorities.
  • Could not saturate CPU resources on LCG easily (the rate doubled with a simple change of scripts/person!): production time does not scale with CPU requirements.
  • Job definition/submission is very (expert) labour intensive; a submission sketch follows below.
  • Absolute need for an SE/SRM solution for small files.
  • Urgent need for VOMS, integrated with other Grid tools, for resource allocation/access/monitoring/accounting.
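To illustrate how labour-intensive per-job definition was, here is a minimal sketch of the kind of wrapper a production shifter might use to batch up LCG-2 submissions. The JDL attributes and the edg-job-submit command are standard LCG-2 UI features; the run_atlas_sim.sh script, the sample numbering and the file names are hypothetical.

# Generate one JDL per simulation job and submit it via the LCG-2 UI.
# run_atlas_sim.sh and the sample/event numbering are hypothetical.
import subprocess

JDL_TEMPLATE = """\
Executable    = "run_atlas_sim.sh";
Arguments     = "{sample} {first_event}";
StdOutput     = "sim_{sample}.out";
StdError      = "sim_{sample}.err";
InputSandbox  = {{"run_atlas_sim.sh"}};
OutputSandbox = {{"sim_{sample}.out", "sim_{sample}.err"}};
"""

for sample in range(10):   # ten jobs instead of ten hand-edited JDLs
    jdl = f"sim_{sample}.jdl"
    with open(jdl, "w") as f:
        f.write(JDL_TEMPLATE.format(sample=sample, first_event=sample * 1000))
    # -o appends the returned job ID to a file for later status polling
    subprocess.run(["edg-job-submit", "-o", "jobids.txt", jdl], check=True)

This assumes a valid grid proxy on an LCG-2 User Interface node; the point is only that a ten-line wrapper replaces repeated hand-editing and re-submission, which is exactly the expert labour the bullet above complains about.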
H1 Tests
• 30 jobs failed: 22 of these due to Grid problems (gridproxy / misc.).