Status of GridPP3 David Britton 1/Nov/06
Timeline – 1
• 31st March – PPARC Call
• 16th June – GridPP16 at QMUL
• 13th July – Bid Submitted
• 6th September – 1st PPRP review
• 1st November – GridPP17
• 8th November – PPRP "visiting panel"
[The original slide showed these milestones on an Apr–Oct timeline bar, with the proposal-writing phase followed by proposal defence, and CB and OC meetings marked.]
GridPP3
Referee Comments The proposal was reviewed by 2 referees prior to the 1st review on Sep 6th, and by 5 (!) additional referees to date. GridPP has responded to 120 referee comments (albeit some of them were identical and some of the answers were as trivial as "we agree"). Won't go through these … other than to quote a favourite: "Universities are beginning to realise that the power and cooling are a significant cost that they have not budgeted for and is a liability they were not expecting". We've reassured the referee that the cost of Dave Colling was fully anticipated and that he is in no way a liability.
PPRP Feedback PPRP feedback arrived in mid September in the form of 10 questions. GridPP has prepared a 22-page written response, with particular consultation with the three large LHC experiments on four of the questions. This draft has been presented to, and approved by, the PMB and CB.
PPRP Question-1 1. The Panel would like to further understand the advantages of the proposed overarching GridPP model for operations (as opposed to development) as against each experiment making its own arrangements.
GridPP has previously documented the "value added" by having a coordinated project, and this is summarised as part of the written response. Briefly:
• The GridPP Identity.
• Enabling the LCG Project.
• Leading contributions to Grid Middleware.
• The Tier Centre structures.
• The Deployment Team.
• The UK Particle Physics Grid.
In addition, statements have been received (and presented in full to the PPRP) from the three large LHC experiments, which:
• All support the concept.
• See no viable alternative.
PPRP Question-2 2. The Panel would like to explore the priorities and potential options for descope.
• If funding were only available to support 30%, 50% or 70% of the total request, what would be the priority areas for investment in terms of obtaining the best UK science return?
• What would be the political and experimental impacts of funding at a much lower level?
• How would you prioritise the work packages?
This is a "standard" PPRP question, asked of other proposals.
Input to Scenario Planning – Resource Requirements Changes in the LHC schedule have prompted another round of resource planning, presented to the CRRB on Oct 24th. New UK resource requirements have been derived and incorporated in the scenario planning.
Input to Scenario Planning – Hardware Costing Hardware prices have been re-examined following a recent Tier-1 purchase (CPU was much cheaper than expected). Our "best empirical estimate" may be a bit aggressive, but we have also increased the declared contingency on hardware spend from 15% to 25% over the lifetime of the project.
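The contingency change amounts to a simple margin on the hardware budget. A minimal sketch, in which the 15% and 25% rates come from the slide but the base figure is an arbitrary placeholder:

```python
def with_contingency(base_cost, rate):
    """Hardware budget including the declared contingency margin."""
    return base_cost * (1 + rate)

base = 100.0  # placeholder hardware spend, arbitrary units (not from the slide)
old_budget = with_contingency(base, 0.15)  # 15% contingency, as originally declared
new_budget = with_contingency(base, 0.25)  # 25% contingency, as revised
print(old_budget, new_budget)
```

Moving from 15% to 25% adds a further 10% of the base cost to the declared budget, offsetting the risk that the "best empirical estimate" of hardware prices proves too aggressive.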
Input to Scenario Planning - ATLAS The priority of the ATLAS-UK collaboration, to ensure the best science return, is the hardware and its operation. Within this, ATLAS notes that UK Tier-2 resources contribute directly to UK output, whereas shortages in Tier-1 resources affect all ATLAS physicists globally. For Tier-1 resources, ATLAS regard the 70% scenario as barely manageable, while 50% would do serious damage to the analysis capacity of the large UK physics community and would also threaten the calibration and commissioning of the SCT. To reduce the Tier-2 hardware, cuts would have to be made in simulation, calibration, and then analysis capability, but even the first of these will degrade physics output. Tier-2 cannot be cut below the 70% scenario. ATLAS has derived the UK fraction of the global requirements by noting that UK authorship is 12.5% of the global ATLAS Tier-1 authorship and that 4 out of 30 (13.3%) of the ATLAS Tier-2s are in the UK.
Input to Scenario Planning - CMS The priority of the CMS-UK collaboration is access to Tier-2 resources in the UK and access to Tier-1 resources, preferably in the UK. CMS argue that, given the savings in hardware due to changes in the cost estimates and the change in the LHC schedule, the 70% scenario could be achieved with only a small reduction in the level of hardware compared to the request. This would be at the threshold for CMS to host a UK Tier-1. In the 50% scenario, the priority for CMS would be to protect their Tier-2 resources, which would then have to be hosted by a Tier-1 external to the UK. The revised CMS UK hardware request is based on a more detailed algorithm than a simple fraction of the global requirements. The scale is set by the dual requirements of (a) a minimum size for a CMS Tier-1 of 50% of the average CMS Tier-1 (~7% of global requirements) and (b) the UK fraction of Tier-1 authors (same basis as ATLAS) of ~8%. The details follow from the dual requirements: accepting 4 out of CMS's 50 data-streams (8%) and the need for the Tier-1 to serve an entire AOD dataset.
Input to Scenario Planning - LHCb The LHCb collaboration has a somewhat different computing model from ATLAS and CMS, with most analysis performed at the Tier-1 and the Tier-2s used predominantly for Monte Carlo simulation. LHCb prioritises Tier-1 hardware and its operation, followed by Tier-2 hardware and its operation, and finally support etc. As with the other experiments, the revised hardware requests from UK LHCb are based on the new global requirements presented to the CRRB on Oct 24th 2006. The UK fraction is calculated from the UK authorship fraction of 18.6% (revised from 16.6% at the time of the GridPP3 submission). The Tier-2 resource request also includes 18.6% of the global LHCb Tier-2 resource shortfall of 30%, giving a total of about 24% of the global Tier-2 requirements. It is noted that any fall below the global authorship fraction of 18.6% at either the Tier-1 or Tier-2 would have to be negotiated in a global context.
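The UK-fraction derivations quoted in the three experiment slides above reduce to simple arithmetic. A sketch combining them as a consistency check (all input percentages are taken from the slides; the variable names are ours):

```python
# ATLAS: Tier-1 share set by UK authorship, Tier-2 share by the UK's
# count of ATLAS Tier-2 sites.
atlas_tier1 = 0.125        # 12.5% of global ATLAS Tier-1 authorship
atlas_tier2 = 4 / 30       # 4 of 30 ATLAS Tier-2s -> ~13.3%

# CMS: scale set by the larger of a minimum viable Tier-1 (~7% of the
# global requirement) and the UK Tier-1 author fraction (~8%), realised
# as 4 of CMS's 50 data-streams.
cms_tier1 = max(0.07, 4 / 50)   # -> 8%

# LHCb: authorship fraction, plus the same share of the 30% global
# Tier-2 shortfall for the Tier-2 request.
lhcb_authors = 0.186
lhcb_tier2 = lhcb_authors * (1 + 0.30)   # -> ~24.2% of global Tier-2

print(f"ATLAS T2 {atlas_tier2:.1%}, CMS T1 {cms_tier1:.1%}, "
      f"LHCb T2 {lhcb_tier2:.1%}")
```

The computed values reproduce the figures quoted on the slides: ~13.3%, 8% and ~24% respectively.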
70% Scenario An example 70% scenario based on Experiment Inputs.
50% Scenario An example 50% scenario based on Experiment Inputs.
30% Scenario GridPP has examined the original PPARC call and has determined that it is unable to form a proposal that meets any of the criteria listed with funding at the 30% level:
2. a) Underpin the particle physics programme by delivering the functional Tier 1 centre for the LHC experiments and for the other experiments where UK groups will require computing GRID access and facilities.
The 50% scenario presented above already fails to meet this criterion because the Tier-1 would be sub-threshold for at least one of the LHC experiments. At the 30% funding level there could only be a Tier-1 for (probably) one LHC experiment. Most likely, in a 30% scenario there would be no Tier-1 and the resources would be used as a Tier-2 (though it is not clear what to do about LHCb). Etc. (see document)
Q2 Summary GridPP has taken input from the 3 large LHC experiments as guidance in an attempt to design a GridPP3 project under 70% and 50% funding scenarios. The outcome is a 74% funding scenario that preserves 85% of the hardware (just about the threshold for a UK CMS Tier-1), and a 55% scenario that basically does not work (it does not respect the criteria of the call, and there are large political and financial unknowns associated with delivering less than a pro-rata share of LHC hardware). Given the latter, we did not attempt to address the 30% scenario in a detailed manner. We do not regard the fine details of these scenarios as fixed, but they are proposed as starting points for addressing reduced funding outcomes.
PPRP Question-3 3. The UK would like to play a key role in this important project but the current financial constraints necessitate focusing on the crucial areas and what needs to be done. The Panel would like to identify these areas, giving consideration to the current LHC timescale, and to understand the implications of delaying parts of the project, especially with regard to hardware (e.g. same CPU performance with fewer, faster processors).
Identifying crucial areas is covered by the Scenario Planning presented in response to Question-2 and by each of the responses to GridPP from the three large LHC experiments. The new LHC timescale has been included in the new resource requirements prepared by the LHC experiments and presented to the CRRB on October 24th 2006. These new global requirements have been used to derive new UK requirements, as described in the response to Question-2 and in the experiment documents. The resource requirements are effectively shifted in time which, combined with the reduced hardware cost estimates used by GridPP, has resulted in about a 10% saving on the project cost. This is embedded in the 70% and 50% scenario plans.
PPRP Question-4 4. The Panel wishes to understand better the apparent disparity between the estimated Tier-1 needs of CMS and ATLAS. It seems that ATLAS requires roughly twice the CPU and disk resource, but less tape than CMS. Given the similar computing models of the two experiments, relatively small differences in the parameters chosen seem to have significant implications for the assessment of need and hence cost. (Differences now much smaller)
• How has GridPP interacted with the experiments to ensure that the most cost-effective solution has been arrived at? (Rely on the LHCC/CRRB)
• The Panel wishes to understand the levels of requests for Tier-1 facilities by the different experiments relative to the UK contribution to each experiment. (Noted earlier)
PPRP Question-5 5. The Panel would like the applicants to justify the rationale behind the proposed regional Tier-2 structure in GridPP3 and to set out the pros and cons of other possible structures, for example, experiment based or rationalised structure with fewer Tier-2 sites, or fewer institutes. The Panel would like the applicants to consider possible cost savings and improvements in efficiency and service delivery that different structures might produce. Need to discuss The Past, The Present, and The Future.
PPRP Question-5 History of the Tier-2 Structure The current Tier-2s were formed naturally in response to local and regional funding opportunities and other geo-political considerations. Many assumed (and used as leverage) a continuing relationship with the Particle Physics community. It is natural that all Particle Physics groups wished to be associated with a Tier-2, but this was not a GridPP requirement; clearly, though, it was uniformly perceived as beneficial for the local physicists and the institute. In GridPP1 there was no PPARC funding for Tier-2s; in GridPP2 there was PPARC funding for some manpower at Tier-2s (plus some specialised servers) but not for the bulk of the computing resources. Nevertheless, large amounts of resources were made available. GridPP has interacted with the four Tier-2 centres through their management boards. The overhead of having more than one site within a Tier-2 is, to first order, an internal choice (though the JeS submission requirement for the GridPP3 proposal broke this model).
PPRP Question-5 Current Status of Tier-2 Structure:
There are currently 17 Institutes organised into 4 Distributed Tier-2s. Of the 17 Institutes, 4 have no GridPP manpower, 8 have less than one FTE, and 5 have one or more FTEs of GridPP manpower. The total of 9 FTE funded by GridPP for hardware support (plus 5.5 FTE specialist posts) is clearly a very cost-effective situation given the 3703 KSI2K of CPU and 263 TB of disk available (06Q1 numbers). For comparison, the Tier-1 made available 830 KSI2K and 180 TB in the same period.
Performance measures are being developed (within GridPP and wLCG). The UK is probably ahead of the game here. There are more details in the written response, but the UK Tier-2 performance is:
• good relative to other countries;
• improving even though the hurdles are getting higher;
• on track to meet the MOU requirements.
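The cost-effectiveness claim can be made concrete. Using only the 06Q1 figures quoted above (the per-FTE ratios are our arithmetic, not from the slide):

```python
# Per-FTE delivery implied by the 06Q1 Tier-2 figures quoted above.
tier2_cpu_ksi2k = 3703        # CPU made available across the 4 Tier-2s
tier2_disk_tb = 263           # disk made available across the 4 Tier-2s
gridpp_fte = 9 + 5.5          # hardware-support FTE plus specialist posts

cpu_per_fte = tier2_cpu_ksi2k / gridpp_fte    # ~255 KSI2K per funded FTE
disk_per_fte = tier2_disk_tb / gridpp_fte     # ~18 TB per funded FTE
print(f"{cpu_per_fte:.0f} KSI2K/FTE, {disk_per_fte:.1f} TB/FTE")
```

For comparison, the Tier-1's 830 KSI2K and 180 TB in the same period were delivered by a separately funded (and larger) operations effort, so the Tier-2 leverage of institute-owned hardware is considerable.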
PPRP Question-5 Future of Tier-2 Structure:
GridPP proposes to continue to develop 4 Regional Tier-2 centres. GridPP would like to remain neutral on the number of sites and institutions within each Tier-2, and simply offer a package of hardware money and effort to each Tier-2 in return for the delivery of a specified quantity of resource and a specified service level. We believe this approach:
• Allows a market-driven optimisation of resources according to constraints which are outside the control and knowledge of GridPP (e.g. other sources of funding; institutional priorities and strategies; prior commitments and aspirations).
• Builds upon a system that is both viewed and measured as successful.
• Is in the best interests of physicists at all Institutes: allowing some small measure of local control whilst enabling Grid access to vast resources, and providing on-site expertise in as many places as possible.
PPRP Question-5 Future of Tier-2 Structure:
Alternative structures have been considered:
• Fewer Tier-2s – we foresee no advantage in having the same number of institutes associated with fewer Tier-2s. Clear disadvantages.
• Fewer Institutes – Hardware and manpower costs remain the same; running and infrastructure costs are likely to become more visible. Some gains in the efficiency of staff effort by concentration of resources (though this means less levered effort, not less GridPP effort; service levels may be easier to achieve). May alienate some institutes; will result in less leverage of resources; will leave some institutes without local expertise. Conclusion: it would cost more and deliver fewer resources; service levels might be better but physicists would be less supported. Not the optimisation we chose.
• Experiment-based Tier-2s – runs against the grain and would leave the UK at odds with the rest of the wLCG; not a sensible Grid structure, and would limit the peak resources available to individual experiments. Would most likely lead to a divergence from standards and a fragmented UK Grid.
PPRP Question-6 6. The Panel would like to explore the impact to the UK of leadership roles within LCG. What are the benefits and costs to the UK of this, particularly with regard to middleware? The Big Picture: Roles and duties for the LCG project must be shared between the members. This allows the common project to benefit from all the available skills and expertise; it provides a contribution in kind that should broadly reflect the size of the contributing group; it demonstrates the engagement of all partners; and in return, it enables strategic influence and other benefits. Appendix-D of the proposal listed 86 external roles of members of GridPP within related projects, 17 of which are specifically LCG related, 22 are within EGEE, and a further 8 associated with computing within the LHC Experiment collaborations. Specific Examples: a) David Kelsey: Coordinator of LCG Grid Security, Chair of Joint (LCG/EGEE/OSG) Security Policy Group and Deputy Director of EGEE Security.
PPRP Question-6 Specific Examples (continued): b) Jeremy Coles: Secretary of LCG Grid Deployment Board. c) John Gordon – UK Representative on LCG Management Board and a Deputy Chair. d) Neil Geddes – UK member of the LCG Oversight Board (OB) and LCG Collaboration Board Chair. e) EGEE: Project Executive Board: Frank Harris; Dave Kelsey, and previously Pete Clarke. Project Management Board Chair: Robin Middleton (to summer 06). Project Collaboration Board: Dave Colling; John Gordon; Jeff Tseng; Tony Doyle; and Roger Barlow. f) EGEE JRA1 (Middleware re-engineering) Cluster Leader (UK): Steve Fisher.
PPRP Question-6 Related Examples: i) Nick Brook (formerly GridPP UB Chair and PMB member) is the LHCb computing coordinator. ii) Roger Jones (currently GridPP Applications Coordinator and PMB member) is the chair of the ATLAS International Computing Board. iii) Dave Newbold (formerly GridPP UB chair and PMB member) is the chair of the CMS Computing Committee. Conclude: as a consequence of investment and hard work over the last five years, the current overall influence of the UK in computing for the LCG is high in all areas. This ultimately benefits UK physicists and has been a good investment.
PPRP Question-7 7. Before making a recommendation to the office about the extension to GridPP2, the Panel would like more information about each of the posts and to know whether they are core activities. What are the implications of not funding these posts, and what evidence is there that a delay in resolving this will lead to a loss of staff who might be expected to continue into GridPP3?
Detailed information on the areas covered by the GridPP extension was provided in the GridPP3 proposal. Specific information on each individual post was provided on the Institutional JeS forms submitted to PPARC; the latter has been extracted and collated for PPARC. All these posts are considered core to the current programme during the 7-month period of GridPP2+, which is needed for the build-up of the Production Grid prior to LHC data-taking.
If not funded: we will lose our entire pool of highly skilled staff; the UK will not be ready for LHC data; much of the current work will be abandoned and large amounts of resources will have been wasted.
Evidence: 25% turnover of staff since proposal submission, c.f. ~10% p.a. previously.
PPRP Question-8 8. The Panel would like to see a full justification for each of the posts requested in GridPP3 and to see the cost to PPARC (including estates and indirect costs) of each post. A separate document has been provided for PPARC staff including full details extracted from the Institutional JeS submissions. This incorporates a compilation of the Institute submissions organised by work package, giving the justification and costs for each post that should be read in conjunction with the proposal and relevant appendices.
PPRP Question-9 9. The Panel would like to explore the issues of quality assurance in both Tier-1 and Tier-2 activities. How will the applicants ensure that GridPP3 provides an adequate and cost-effective service to its users? The service levels at the Tier-1 and Tier-2 are defined by the International Memorandum of Understanding. The Tier-1/A Management Board, including PPARC representation, advises all stakeholders on whether the Tier-1/A Service at RAL is delivering its objectives on time and making appropriate use of its available resources. The main instrument for assuring quality and levels of service at the Tier-2s will be a new Memorandum of Understanding between GridPP and the institutes as described in the Tier-2 Appendix to the GridPP3 Proposal. This would set out the required levels of services in order for the UK to meet its WLCG MoU commitments and provide the necessary service to UK physicists. (continued…)
PPRP Question-9 Quality Assurance is performed by monitoring the performance of the Tier-1 and Tier-2s against MOU commitments, and their performance compared to international partners. As previously described, monitoring is already advanced and being developed further. We currently monitor:
- CPU and storage usage;
- Site functional tests;
- Configuration tests;
- Ticket response times;
- Upgrade timescales;
- Scheduled downtime;
- VO support;
- Transfer tests.
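A minimal sketch of how such monitoring might be compared against an MoU commitment, for one of the metrics above. The 95% target and the site figures are hypothetical illustrations, not values from the slide; actual WLCG MoU targets are defined per tier and per service:

```python
# Hypothetical sketch: flag sites whose measured availability falls
# below an assumed MoU target. Target and site data are illustrative.
MOU_AVAILABILITY_TARGET = 0.95  # assumed target, not from the slide

sites = {
    "SiteA": 0.97,   # fraction of site functional tests passed (made up)
    "SiteB": 0.91,
}

below_target = {name: avail for name, avail in sites.items()
                if avail < MOU_AVAILABILITY_TARGET}
print(below_target)  # -> {'SiteB': 0.91}
```

In practice a check like this would be driven by the site-functional-test and downtime feeds listed above, aggregated over an agreed reporting period.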
PPRP Question-10 10. The Panel would like information on where the Tier-1 centre will be housed at RAL.
• Is any construction or refurbishment of an appropriate building on the critical path for the GridPP project?
• Will the centre have sufficient space available to meet GridPP's requirements?
• What are the risks associated with this?
• How will this be funded?
The Atlas Centre at RAL has sufficient capacity to house the full GridPP3 requirements for 2008 LHC running as given in the proposal. CCLRC has approved construction of a new computer building at RAL, budgeted at approximately £17M and funded from the CCLRC Capital Investment Plan. Completion is due in the summer of 2008, in time for the autumn delivery that will meet the 2009 data-taking requirements. The new building has sufficient space for capacity to grow until 2012, by which time the number of racks is expected to have reached a steady state.
PPRP Question-10 The main risks are:
a) Late completion. There is some slack in the schedule to meet the data-taking requirements for April 2009, which mitigates this risk.
b) The power and cooling required to deliver the required resources exceed the estimates. This is mitigated by the inclusion of chilled water mains in the new building to allow direct water cooling of the hottest racks if power densities exceed current estimates.
c) Electricity charges for power and cooling, which are currently met by CCLRC overhead charges. It is possible that at some future time these may be attributed directly to GridPP. This is explicitly listed as a potential call on contingency in the GridPP3 proposal.
Timeline – 2
• 8th Nov – PPRP Visiting Panel
• 6th Dec – PPRP recommendation to Science Committee
[The original slide showed a Nov–May timeline bar running from the PPRP through the Science Committee and PPARC Council to grants etc.]