Topics for the future
Ian Bird
WLCG Workshop, Barcelona, 9th July 2014
Introduction
• Successfully negotiated Run 1 – with great success
  • Computing has become a recognised key tool
  • Significant experience and lessons gathered
• Many ideas for evolution for Run 2
  • Documented in the Computing Model Update document
  • Roles of Tiers, use of networks, federated data, clouds, opportunistic resources, etc.
  • Huge efforts in the experiments to improve efficiency – huge gains in performance and memory use
  • Consolidation of operations/support activities
• Still much effort needed to live within likely resources (hardware and personnel)
• Now also time to look further forward
LHC roadmap: schedule beyond LS1
[Figure: LHC schedule chart. Legend: physics, shutdown, beam commissioning (BC), technical stop, (extended) year-end technical stop ((E)YETS)]
• Runs and long shutdowns shown: Run 2, LS2, Run 3, LS3, Run 4, LS4, Run 5, LS5, grouped into Phase 1 and Phase 2
• LS2: starting in 2018 (July), 18 months + 3 months BC
• LS3, LHC: starting in 2023, 30 months + 3 months BC
• LS3, injectors: in 2024, 13 months + 3 months BC
• Integrated luminosity milestones: 30 fb-1, 300 fb-1, 3'000 fb-1
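The LS2 figures on the roadmap slide translate into a restart date with simple month arithmetic. A minimal sketch, assuming LS2 begins at the start of July 2018 and that beam commissioning follows the 18-month shutdown directly; `add_months` is a hypothetical helper, and the results only restate the 2014 planning numbers.

```python
def add_months(year: int, month: int, months: int) -> tuple[int, int]:
    """Return (year, month) shifted by `months`; month runs 1-12."""
    index = year * 12 + (month - 1) + months
    return index // 12, index % 12 + 1

# LS2 as stated on the slide: starts July 2018, 18 months shutdown + 3 months BC.
# Assumption: counting from the start of July 2018.
ls2_start = (2018, 7)
after_shutdown = add_months(*ls2_start, 18)   # first month after the shutdown
after_bc = add_months(*after_shutdown, 3)     # first month after beam commissioning
print("first month after LS2 shutdown:", after_shutdown)
print("first month after beam commissioning:", after_bc)

# LS3 start months are not given on the slide, so only total durations are computed.
ls3_total_months = {"LHC": 30 + 3, "injectors": 13 + 3}
print("LS3 total months:", ls3_total_months)
```

Working in a single month index avoids special-casing year boundaries.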
Scale of challenge …
[Figure: projected data and compute needs from 2010 to 2023 (10-year horizon), compared with what we think is affordable unless we do something differently]
• Data: from ~25 PB/yr to 400 PB/yr
• Compute: growth > x50
Ian Bird; Cloud Expo 2014
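To see why flat budgets look "very optimistic", it helps to convert the slide's growth factors into annual rates. A minimal sketch: the x16 data factor (~25 to 400 PB/yr), the x50 compute factor and the 10-year horizon come from the slide, while the ~20%/yr capacity gain at fixed cost is a hypothetical assumption added only for comparison.

```python
def annual_growth(factor: float, years: float) -> float:
    """Constant annual growth rate r such that (1 + r) ** years == factor."""
    return factor ** (1.0 / years) - 1.0

HORIZON_YEARS = 10        # "10 Year Horizon" from the slide
DATA_FACTOR = 400 / 25    # ~25 PB/yr -> 400 PB/yr, i.e. x16
COMPUTE_FACTOR = 50       # "Growth > x50", so this is a lower bound

# Hypothetical assumption, NOT from the slide: a flat budget buys about
# 20% more capacity each year through technology improvements.
ASSUMED_TECH_GAIN = 0.20

flat_budget_factor = (1 + ASSUMED_TECH_GAIN) ** HORIZON_YEARS
gap = COMPUTE_FACTOR / flat_budget_factor

print(f"data growth:    {annual_growth(DATA_FACTOR, HORIZON_YEARS):.1%} per year")
print(f"compute growth: {annual_growth(COMPUTE_FACTOR, HORIZON_YEARS):.1%} per year (at least)")
print(f"flat budget buys x{flat_budget_factor:.1f}; compute shortfall about x{gap:.1f}")
```

Under that assumed technology gain, a flat budget covers only a small fraction of the projected compute growth, which is the gap the slide says must be closed by doing something differently.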
What do we need to do?
In the face of a huge increase in resource needs:
• Reduce costs
  • Flat budgets are very optimistic
  • Must optimise the use of what we have (and will have)
  • And easily make use of opportunistic and other resources
• Reduce the cost of operation
  • Needs to be self-sustaining, with minimal staffing
  • Effort is limited, and we do not have large EC (or similar) projects
• Move with evolving technology
  • Clouds, IPv6/networks, processor architectures
• Long-term planning for the HL-LHC timescale
  • Probably needs significant changes in the models
Optimisation
• Software improvements – to make better use of new CPUs
  • But these need to be optimised against memory, storage, I/O performance, network, etc.
• We need a global optimisation
  • Needs some quantitative results for further planning
• Some work started
  • Dirk Duellmann – TEG, and subsequent efforts
  • Some useful investigations on CERN-Wigner
• Probably need to strengthen this work and start to look at some key metrics
  • Must be able to justify in concrete terms how we improve overall performance / €
  • What are the metrics? They must be useful and meaningful for us, and understandable to FAs (funding agencies), etc.
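The point about global optimisation can be made concrete with a toy "performance / €" metric. A minimal sketch, assuming the simplest possible model in which the slowest component (CPU or I/O) limits throughput; `Config`, its fields and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Config:
    """A hypothetical hardware configuration; all numbers are illustrative."""
    name: str
    cpu_rate: float   # events/s if CPU were the only bottleneck
    io_rate: float    # events/s if storage/network I/O were the only bottleneck
    cost_keur: float  # total cost in k€

    def throughput(self) -> float:
        # Simplifying assumption: the slowest component limits the whole chain.
        return min(self.cpu_rate, self.io_rate)

    def perf_per_keur(self) -> float:
        return self.throughput() / self.cost_keur

configs = [
    Config("baseline", cpu_rate=1000, io_rate=1200, cost_keur=100),
    Config("more CPU only", cpu_rate=2000, io_rate=1200, cost_keur=150),
    Config("balanced", cpu_rate=1500, io_rate=1500, cost_keur=130),
]

# Rank configurations by delivered events/s per k€, best first.
for c in sorted(configs, key=Config.perf_per_keur, reverse=True):
    print(f"{c.name:14s} {c.throughput():6.0f} ev/s  {c.perf_per_keur():5.2f} ev/s per k€")
```

In this toy model, doubling CPU alone lowers performance per € because I/O caps the gain, which is why the slide argues for optimising CPU, memory, storage and network together.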
HEP Software Collaboration / Foundation
Summary of the workshop held at CERN on 3-4 April 2014
HEP SW: Goals
The goals of the initiative are to:
• better meet the rapidly growing needs for simulation, reconstruction and analysis of current and future HEP experiments;
• further promote the maintenance and development of common software projects and components for use in current and future HEP experiments;
• enable the emergence of new projects that aim to adapt to new technologies, improve performance, provide innovative capabilities or reduce the maintenance effort;
• enable potential new collaborators to become involved;
• identify priorities and roadmaps;
• promote collaboration with other scientific and software domains.
Status today
• 12 white papers received (including the original)
  • http://hep-software-foundation.web.cern.ch/documents/white-papers-contributed-discussion-hep-software-foundation
• A lot of commonality: broad agreement on the goals, the need, and a lightweight structure
• The authors have been contacted to meet and try to reach a consensus on goals, structure, and how to set up a foundation
  • 1st meeting next week
  • Aim for a 2nd workshop with a solid proposal later this year
Clouds … ???
Clouds all the way?
• Many sites are deploying cloud stacks
• Experiments have used many cloud instances
  • WLCG sites; HLT farms; Helix Nebula; opportunistic uses; Amazon, Google, etc.
• What is our long-term strategy here?
• Some observations:
  • Use of commercial and other resources is likely to be via some sort of cloud-like software
  • The same holds for opportunistic use of academic or public e-infrastructures
  • Grid software requires support from our community
    • No one else will do it
    • It is not obvious that we have the effort available now
  • Many cloud stacks have significant support/development/user communities behind them
Options
• Do nothing in particular
• Continue to run grid services on top of these cloud installations
• Move to fully embrace cloud software as the primary means to submit jobs to our sites
• Focus on pilot factories as the site entry point and not worry (as a community) about how the site is managed
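The pilot-factory option relies on late binding: the site, whether it runs a batch system or a cloud stack, only has to start generic pilots, and each pilot then pulls real payloads from a central queue. A toy sketch under that assumption; `Job`, `Pilot`, `run_pilot` and the `max_jobs` lifetime limit are hypothetical and do not correspond to any experiment's actual pilot framework.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: int
    cores: int  # cores the payload needs

@dataclass
class Pilot:
    site: str      # e.g. a cloud VM or a batch slot; the pilot does not care which
    cores: int     # cores available where the pilot landed
    max_jobs: int  # hypothetical lifetime limit, counted in payloads
    ran: list = field(default_factory=list)

def run_pilot(pilot: Pilot, queue: deque) -> None:
    """Late binding: pull payloads from the central queue until it is empty
    or the pilot reaches its lifetime limit."""
    skipped = []
    while queue and len(pilot.ran) < pilot.max_jobs:
        job = queue.popleft()
        if job.cores <= pilot.cores:
            pilot.ran.append(job.job_id)  # "execute" the payload
        else:
            skipped.append(job)           # does not fit here; leave it for another pilot
    # Return unmatched jobs to the front of the queue, in their original order.
    queue.extendleft(reversed(skipped))

central_queue = deque([Job(1, 1), Job(2, 8), Job(3, 1), Job(4, 1)])
cloud_pilot = Pilot("cloud-vm", cores=1, max_jobs=2)
batch_pilot = Pilot("batch-slot", cores=8, max_jobs=10)
run_pilot(cloud_pilot, central_queue)
run_pilot(batch_pilot, central_queue)
print(cloud_pilot.ran, batch_pilot.ran)
```

The matching logic lives entirely in the pilot and the central queue, so in this model the site needs no knowledge of the experiment's workload.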
But:
• We can't avoid some level of fairly complex scheduling
  • We are not elastic – we are always resource-constrained
• We should clarify our strategy with respect to
  • Cloud interfaces, pilot factories, CEs (computing elements)
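Because the pool is fixed rather than elastic, some policy must split capacity between competing demands. One common textbook policy is weighted max-min fairness ("water-filling"); the sketch below illustrates it and is a swapped-in example, not the algorithm of any particular batch system or WLCG tool. The consumer names, shares and demands are hypothetical.

```python
def weighted_fair_share(capacity: float, shares: dict, demands: dict) -> dict:
    """Weighted max-min fair allocation of a fixed capacity.

    Each consumer gets capacity in proportion to its share, but never more
    than it demands; capacity left over by satisfied consumers is redistributed.
    """
    alloc = {name: 0.0 for name in shares}
    active = {name for name in shares if demands[name] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_share = sum(shares[n] for n in active)
        offers = {n: remaining * shares[n] / total_share for n in active}
        satisfied = {n for n in active if demands[n] <= offers[n]}
        if not satisfied:
            # Everyone wants more than their slice: hand out all that is left.
            for n in active:
                alloc[n] = offers[n]
            break
        # Cap satisfied consumers at their demand and redistribute the rest.
        for n in satisfied:
            alloc[n] = float(demands[n])
            remaining -= demands[n]
        active -= satisfied
    return alloc

slots = weighted_fair_share(100, {"A": 50, "B": 30, "C": 20}, {"A": 80, "B": 10, "C": 40})
print({n: round(v, 1) for n, v in slots.items()})
```

Here B asks for less than its share, so its unused slice is redistributed to A and C in proportion to their shares; with total demand above capacity, every slot ends up allocated.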
Reducing operational costs
Possible actions
• Reduce the amount of middleware that we need to support ourselves
  • Moving to clouds may help – but in the short term there may be more issues to address
  • BUT: there is middleware we require that lacks support
• Continue to simplify site management and the services that are needed
  • See Wahid's comment about storage
  • ATLAS comment about small sites
• Use a BOINC-like service to make better use of sites and opportunistic resources
Can the EC help?
• CERN has been in discussion with the EC, in the context of its Helix Nebula activities, about support for the procurement of IT products and services by the public sector.
• It is our understanding that the EC intends to publish details of a funding call, ICT-8, in November 2014, with a submission deadline in April 2015.
• We believe that the ICT-8 call could offer an opportunity for EU-T0 to gain support from the EC for the procurement of state-of-the-art hardware.
• The proposal should focus on preparing and executing a cross-national procurement process linked to the foreseen upgrades of the EU-T0 sites across Europe.
• This would imply the sites agreeing on a common technical specification for procurement and committing to procure a stated amount.
• The EC would then provide an additional 20% to supplement the procurement engagement.
• An important innovation that EU-T0 could bring would be to consider specifications compatible with the Open Compute Project (http://www.opencompute.org/) and to show how this cross-national procurement model could serve as a model for ESFRI research infrastructures in different disciplines.
External funding
• To benefit, it is clear that HEP-only proposals are unlikely to succeed
  • In Europe and in the USA
• We need to engage with initiatives like EU-T0, OSG and EGI, and, via WLCG members, with other sciences
• We need to show that our experience can be useful to others
• We need to ensure that our developments keep an eye on being broadly useful outside the LHC
• But: we need to be careful that our effort is used in directions that help us
  • We cannot afford a scatter-gun approach
WLCG scope
• Several discussions on how "WLCG" could support other HEP and similar experiments
  • e.g. requests from Belle2, ILC, others
• The WLCG organisation is scoped for the LHC
  • But the infrastructure should be common
• There has been no clear answer to this
• Perhaps EU-T0 is a mechanism for this in Europe
  • But that omits the global reach of WLCG and HEP
• ???
Summary
• Lots of preparation for Run 2
  • Run 2 is largely an evolution of Run 1
• Need to start thinking about the longer term
  • Set up or reinvigorate working groups on specific areas
  • Brainstorm about the 10-year future
• Resources will be highly constrained in the coming years
  • We cannot afford duplicate efforts
  • We have to collaborate on common tools as much as possible
  • We have the structure to collaborate – we must take advantage of it
  • We will also need to strongly prioritise where we spend the available effort