CHEP 2003 General Summary Torre Wenaus, BNL/CERN CHEP 2003, UC San Diego, La Jolla March 28, 2003
I agree with all the other summaries. Thank you to the organizers, and have a safe journey home
Outline: The CHEP03 Zeitgeist • Themes and observations • Rising trends • Important developments • Receding trends • Underrepresented • Open questions • Concerns • Major challenges • Conclusions • Thanks zeit·geist | Pronunciation: 'tsIt-"gIst, 'zIt | Function: noun | Etymology: German, from Zeit (time) + Geist (spirit) | Date: 1884 | Meaning: the general intellectual, moral, and cultural climate of an era Google zeitgeist: http://www.google.com/press/zeitgeist.html
Themes and observations • Lesson from the past: Make it simple (R. Brun) • No more complex than necessary • Users want consolidation, ease of use, and stability • Must consider also needs of the future; longer view of maintainability and evolution • In the interests of long term stability • OO and C++ is the accepted paradigm • No major OO/C++ migration or usage angst at this conference, it is done and accepted • Offline and online: “Triumph of C++ for HEP DAQ confirmed” – DAQ summary • Now we are hearing reports on Nth generation C++ software • L. Sexton Kennedy, CDF: Every component has been rewritten at least once. Implementations have now stabilized such that every new arrival doesn’t start by discarding and rewriting software • “Many more talks about redesign than about design” – Data management summary • And on the maturation and emergence of tools as broad standards, after years of development and refinement • e.g. Geant4, ROOT I/O
Themes and observations • The tyranny of Moore’s Law • Wolbers: it is not a substitute for more efficient & faster code, smaller data size • it works against thinking before doing • Optimize wherever possible • Addressing the digital divide in networking (H.Newman) • HEP is obligated as a community to work on this • A world problem in which our field can have visible impact • Farm challenges • Don’t underestimate farm installation and operations (R.Divia) • Big issues are power, cooling, space! (S.Wolbers) • Watts/$ steadily rising (R.Mount) • Tape-disk random access performance gap in analysis is receding as an issue, but disk-memory gap is hardly being addressed (R.Mount)
Rising trends • ROOT • For analysis, I/O, and much else • Now fully supported at CERN: EP/SFT section • Close interaction with experiments on new developments • Run II, RHIC, ALICE, LCG, BaBar, … • Foreign classes, PROOF, geometry, grid integration, … • Mentioned in 47+ talks at this conference • Open source databases (MySQL, Postgres, …) • Metadata, distributed computing, conditions, … • Empowering software: easy and potent • MySQL mentioned in 37 talks! Postgres in 8, Oracle in 27 • Online – offline continuum • Similar Linux farm environments, attainable time budgets • Same framework, maybe same algorithms, in HLT as in offline (V.Boisvert, ATLAS) • Stringent performance/robustness requirements on software
Rising trends • Common projects • Joint projects one of the CDF/D0 successes (Wolbers) • But hard to align running experiments with LHC • LHC Computing Grid project • Grid projects in general • Laudable but difficult; increasingly forced by the circumstances • Resource constraints and increasing scale and complexity make go-it-alone N times too costly • cf. comments in online/DAQ context by G. Dubois-Felsmann today: somewhat less success in online where it is even harder than offline, but possible LHC inroads • Related is software reuse… • Respect what we know about long software development timescales
LCG must effectively re-use and leverage existing software, or fail This is the approach taken: cf. POOL, SEAL talks. Time will tell! cf. next CHEP LCG?
Rising trends • Modular component architectures • Many examples in offline; also in online/DAQ (XDAQ – CMS) [also in open source…] • Associated infrastructure: white boards, centrality of dictionary, plug-ins, … • XML • The no-brainer for small scale structured data storage and exchange. • [The more humane applications leave the XML generation to the computer and not the humans] • ASCII lovers [count me in] now have their standard • Many talks in many areas involving XML applications • Detector description, conditions info, configuration, monitoring, graphics, object models, data/object interchange, dictionary generation, not to mention layered apps (e.g. SOAP)… • 37 talks mention XML (same 37 as MySQL?) • But XML in itself does not define common format/schema, and much divergence and duplication exists in how XML is used • e.g. detector description • We heard (I.Foster) about an OGSA community clearing house, we have similar things ourselves (CLHEP, FreeHEP), maybe we need one for XML applications
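The point about leaving XML generation to the computer can be illustrated with a minimal Python sketch using only the standard library. The element and attribute names (`detector`, `volume`, `material`, `thickness_mm`) are hypothetical and do not come from any experiment's detector description schema; the sketch only shows structured data being written to XML and read back by a program rather than by hand.

```python
import xml.etree.ElementTree as ET


def detector_to_xml(name, volumes):
    """Serialize plain Python data to XML (hypothetical toy schema)."""
    root = ET.Element("detector", attrib={"name": name})
    for vol in volumes:
        ET.SubElement(root, "volume", attrib={
            "name": vol["name"],
            "material": vol["material"],
            # XML attributes are strings, so numbers are converted explicitly.
            "thickness_mm": str(vol["thickness_mm"]),
        })
    return ET.tostring(root, encoding="unicode")


def xml_to_volumes(text):
    """Parse the XML produced above back into a list of dictionaries."""
    root = ET.fromstring(text)
    return [
        {
            "name": v.get("name"),
            "material": v.get("material"),
            "thickness_mm": float(v.get("thickness_mm")),
        }
        for v in root.findall("volume")
    ]


# Illustrative input only; not real detector parameters.
layers = [
    {"name": "layer1", "material": "silicon", "thickness_mm": 0.3},
    {"name": "layer2", "material": "silicon", "thickness_mm": 0.5},
]
xml_text = detector_to_xml("toy_tracker", layers)
round_trip = xml_to_volumes(xml_text)
```

Note that the schema here lives only in the two functions, which is exactly the divergence risk the slide raises: XML fixes the syntax, not a common format.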
Rising trends • Open source in general • “Open source, please. Your interests rarely in commercial vendor’s interests” (M.Purschke, PHENIX) • In the CDF/D0 success column, similarly all over • DBs, Qt, utility libraries, … and Linux, it goes without saying • Extraordinary capability and quality • Java, to a degree • Important limitations being addressed, e.g. manageable C++ interoperability (JACE autogeneration of interface) • JAS, NLC sw, IceCube, CDMS DAQ, … • But not broadly competing with C++ in usage so far • HENP as CS partner and collaborator • To our mutual benefit in the Grid and in networking
Rising trends • “New” simulation engines: Geant4, FLUKA • Geant4 as a production tool • In production in BaBar: EM validation in hand, hadronic beginning, robust and reasonably fast • ATLAS on the way to completing G4 transition after two years of physics validation • CMS, LHCb also transitioning over next year • GLAST using LHCb/Gaudi Geant4 interface • FLUKA not new – established and widely used – but new integration efforts as a detector simulation engine for the four LHC experiments • FLUGG interface to G4 geometry • ALICE Virtual Monte Carlo as uniform interface to multiple engines (FLUKA, Geant4, Geant3) • Interest from other experiments; joint LCG project starting • Used for Geant4 testing • FLUKA integration in progress
Rising trends • Automation in software development/management • Heard about several automated tools for code building and testing, release integration & tag management, configuration management • Popular new software web portal at CERN LCG/SPI • http://savannah.cern.ch • Automated textual and statistical analysis of test outputs
Rising trends – The Grid • The central importance of distributed computing to future (increasingly, present) HENP is long known • ‘The Grid’ as the means to that is now established • Major, broad successes in funding and in attracting collaboration with CS • F.Berman, Grid 2003: “HEP has set a model for integration, focus, coordination” • Progress in applying Grid software and infrastructure to real problems • Batch production • Clearly the chosen path; success to be proven, but has promise and broad commitment
The Grid • F.Berman, Grids on the horizon: • Must be useful, usable, stable; supported • More cooperative than competitive • [Not always the case today!] • Applications are key to success • Not a “Field of Dreams” “build it and they will come” R&D field any more • Grid killer app: a focus on data. Good match to us • Still a long way to go
The Grid • Miron Livny: • Benefit to science: democratization of computing • Still very manpower intensive: when the support team goes on holiday, so does the Grid (CMS testbed in Dec) • Best practice middleware requires • True collaboration, “open minds” (cf. Berman) • Testing, deployment/adoption, evaluation metrics, robustness, professional support, longevity, responsiveness to show stoppers, … • Much to do and improve but important progress • E.g. VDT as standard middleware suite
Important developments • Community consensus on a C++ object store: ROOT I/O • Though many approaches to its use • Combined with RDBMS for physics data storage • CDF, RHIC, LHC, BaBar, GLAST, … • Software engineering is catching up to us – F. Carminati • “High ceremony processes” are not an obvious success • And we are not alone… • “Agile methodologies”, Extreme Programming (XP), is SE’s response • Extremely close to a successful HENP working model • Adaptive, simple, incremental, tight iterations, plan for change, adjust the methodology for your environment • “I just learned we use XP”, comment from CDF • Means of responsibly formalizing and addressing – in a useful way – software engineering in HEP, and software management • Both must be effective and lightweight: Agile
Important developments • Major strides in networking • HENP a leading applications driver and a co-developer of global networks (H. Newman) • Require rapid global access to event samples and analyzed physics results drawn from massive data stores • PB by 2002, ~100 PB by 2007, ~1 Exabyte by ~2012 • Rate of Progress >> Moore’s Law • Factor of ~1M in 1985-2005 (~5k during 1995-2005) in global HENP network bandwidth • Factor of 25-100 Gain in max sustained throughput in 15 months on some US+TransAtlantic routes • Network providers see us as an opportunity because we push real production applications • Future promise: Optiputer (P.Papadopoulos) • “Key driving applications keep the IT research focused”
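The "Rate of Progress >> Moore's Law" claim can be checked with a little arithmetic on the factors quoted above. The sketch below converts each total growth factor into an equivalent per-year factor. The reading of Moore's Law as a doubling every 18 months is an assumption made here for the comparison; the slide does not state which form it uses.

```python
def annual_factor(total_factor, years):
    """Per-year growth factor equivalent to total_factor over the given years."""
    return total_factor ** (1.0 / years)


# Figures quoted on the slide for global HENP network bandwidth.
network_1985_2005 = annual_factor(1e6, 20)   # roughly 2.0x per year
network_1995_2005 = annual_factor(5e3, 10)   # roughly 2.3x per year

# Assumption: Moore's Law taken as a doubling every 18 months.
moore_per_year = annual_factor(2, 1.5)       # roughly 1.6x per year
moore_over_20_years = moore_per_year ** 20   # roughly 1e4, versus ~1e6 for networks

print(f"network 1985-2005: {network_1985_2005:.2f}x per year")
print(f"network 1995-2005: {network_1995_2005:.2f}x per year")
print(f"Moore (assumed):   {moore_per_year:.2f}x per year")
print(f"Moore over 20 yr:  {moore_over_20_years:.0f}x total")
```

Under that assumption, the quoted network growth outpaces Moore's Law by about two orders of magnitude over the 20-year span.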
Important developments • The LHC Computing Grid Project • Major new internationally supported effort to build the distributed computing environment of the LHC • Encompasses • the distributed computing facility • Site fabrics (facilities), middleware selection, integration, testing, deployment at distributed sites, operations and support, … • the common physics applications software • Persistency, core libraries and services, physics analysis interfaces, simulation and other frameworks, all in a distributed environment • Must succeed if LHC computing is to succeed! • An impressive effort by the experiments together with CERN to work in accord across the scope of computing • Managed so as to ensure comprehensive oversight by the experiments • First testbed deployment is this summer (LCG-1) • Including the first major applications deployment, POOL persistency framework (ROOT I/O + MySQL hybrid)
Important developments • Success of mass stores • Castor “reliable and effective” (ALICE) • D0/CDF convergence on successful Enstore/SAM • HPSS successful at RHIC • Exciting new generation of specialized lattice gauge computers (B. Sugar) • Two tracks: • QCD on a chip: QCDOC, a “technical marvel”, project with IBM • $1M/Tflop, aiming at 10+ Tflop at BNL in 04 • Optimized commodity clusters • Pentium 4, Myrinet/Gbit Ethernet • 10+ Tflop at FNAL and JLAB by 06 • SCIDAC grant to improve software usability
Receding trends • Objectivity and ODBMS in general • “Jury still out” at CHEP 2000 (P.Sphicas), but now clear • Objectivity dropped or being phased out by LHC experiments, COMPASS, BaBar event store • In PHENIX “becoming a liability” (compiler issues); augmented with RDBMSs • Not due to technical failure but a mix of technical problems, commercial concerns, manpower costs, availability of an alternative • Its replacements are not other ODBMSes but files (often ROOT) + RDBMS (MySQL, Oracle, Postgres…) for metadata • Magnetic tape (apart from archival) • PASTA: “unlimited” multi-PB disk caches technically possible but true cost is unclear (reliability, manageability) • File system access under urgent investigation • “tapes as random access device no longer a viable option” – large disk caches needed for LHC analysis
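The "files + RDBMS for metadata" pattern can be sketched as follows: bulk event data stays in files, while a small relational table records which file holds what. This is only an illustration under stated assumptions. SQLite (from the Python standard library) stands in for MySQL, Oracle or Postgres, and the table layout, column names and file paths are hypothetical, not those of any experiment.

```python
import sqlite3


def make_catalog():
    """Create an in-memory metadata catalog (SQLite as a stand-in RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_files ("
        " path TEXT PRIMARY KEY,"
        " run INTEGER NOT NULL,"
        " n_events INTEGER NOT NULL,"
        " stream TEXT NOT NULL)"
    )
    return conn


def register_file(conn, path, run, n_events, stream):
    """Record the metadata for one event file; the file itself is not touched."""
    conn.execute(
        "INSERT INTO event_files (path, run, n_events, stream) VALUES (?, ?, ?, ?)",
        (path, run, n_events, stream),
    )
    conn.commit()


def files_for_runs(conn, first_run, last_run, stream):
    """Return the paths of files in a run range, to be opened by the analysis job."""
    rows = conn.execute(
        "SELECT path FROM event_files"
        " WHERE run BETWEEN ? AND ? AND stream = ?"
        " ORDER BY run, path",
        (first_run, last_run, stream),
    )
    return [row[0] for row in rows]


# Hypothetical file names, for illustration only.
catalog = make_catalog()
register_file(catalog, "/data/run1001_muon.root", 1001, 50000, "muon")
register_file(catalog, "/data/run1002_muon.root", 1002, 42000, "muon")
register_file(catalog, "/data/run1002_jet.root", 1002, 61000, "jet")
selected = files_for_runs(catalog, 1001, 1002, "muon")
```

The design choice is that queries touch only the small metadata table; the large files are opened only after the selection has been made.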
Receding trends • Commercial software? No… • Some in decline (Objy, LHC++), but new prospects opening (IBM, Sun, MS, …) in Grid • Open source now has an important commercial element we derive great benefit from (even post-.com crash) • Red Hat, MySQL, Qt, …
Underrepresented • Collaborative tools • Was represented this week, but only lightly • Vital for distributed collaboration on software development and physics analysis • H. Newman: need culture of collaboration • Distributed and remote collaboration should be the norm • Not solely, or even predominantly?, a matter of tool development in the community • How is the exponential commercial side evolving and how can we leverage it • What is the evolutionary path, strategy, role for community-developed tools such as VRVS • Why is the user experience often poor • Poor physical facilities/configurations, instabilities, heterogeneous tools/protocols, support issues, … • Current experience sometimes competes unsuccessfully with the telephone, despite all the shortcomings
Open questions • Distributed analysis • What will it look like? What development line(s) are taking us there? Still very much R&D pursued in multiple directions • Several models (e.g. R.Brun) with varying degrees of Grid exploitation/distributed character • H.Newman – where is the comprehensive global system architecture? M.Livny – have to proceed incrementally, step by step, from the bottom up • Some efforts were reported which are incrementally extending established analysis tools into Grid-based analysis • PROOF, JAS • Others working from various starting points • Genius, Ganga, Clarens, …
Open questions • Distributed analysis continued • Production: environments better defined, tools more advanced, a few in production, varying levels of middleware usage • AliEn (ALICE), SAM-Grid (Run2), CMS tool suite, GRAT, Magda (ATLAS), DIRAC (LHCb) … • Not a lot of sharing/collaboration above the middleware level!! • Necessary precursor to the more complex analysis environment, and hard in itself • What analysis improvements will the Grid really provide? (panel discussion)
What analysis improvements will the Grid really provide? (panel discussion) • Some of the comments… (what I heard, not what was said) • Murphy’s Law needs to be beaten, not Moore’s Law (V.Innocente) • From a technical point of view, the realization of a successful grid will be a single integrated distributed computing center (R.Mount) • But beyond the technical, a successful grid will grow human resources, drawing in distributed people not otherwise involved, as well as material resources (M. Kasemann) • The grid is more than this. The LHC will build the first global collaboration, reaching out to uninvolved countries. This places an obligation on us. Through the grid we must make their participation possible and their resources useful. (H. Newman) • It is an unprecedented opportunity to screw up. But we have no choice, we cannot put it all in one place. Focus on reliability.
Grid panel 2 • The grid is something new. We can’t let a ‘one virtual computing center’ be the dominating thing. There should be no dominant force and we should avoid centralized decision making. This will help analysis. (L. Robertson) • Grids enable collaboration at a scale not attempted before. Distributed efforts are motivated to compete with one another and with the central site, and this brings benefits and resources. Analysis groups are teams, spread across continents and time zones. How do they collaborate? The grid should provide the solution. Also, provenance is largely overlooked, but it is key to analysis. (P. Avery) • We have no model for how 5000 users will use a globally distributed facility. System issues must be addressed now. (H. Newman) • Physicists should not see the grid at all. It should be transparent. (P. Mato)
Grid panel 3 • The grid will be successful if we make it simple. Will force some coherence in the development of distributed analysis tools. Too much process will kill the process. There is not enough prototyping going on. (R.Brun) • Agree, we need more prototyping. We need candidate strategies, then build prototypes, and see what works. You have to do this before you will be able to abstract from experience and automate & make transparent the approaches that work. (H. Newman) • Funding agencies, computer scientists, other sciences are excited by the HEP grid work, eg. on provenance. Possibility of infusion of funding. Could pursue google-like response to what now takes 3 months. (R. Mount) • The grid will enable collaborative work and harness distributed brainpower. It will allow HENP to be more present as a field at the home university. This is important for the health of our field. (H. Newman) • There is lots to learn from existing experiments. (R. Brun)
Open questions • Impact of facility security on Grid computing • Site security in the grid era – Dane Skow • Avoid complexity in designing security; it is the bane of secure systems • Must be agile in the face of change; resistant to attack • Risk management, not elimination; must accept some risk to carry on work • No clear answers in the bottom line, there is much yet to be resolved and understood, and many are working • Workable resolution is vital, since you don’t have a usable grid if the walls don’t have sockets
Open questions • Impact of OGSA migration (Globus) on middleware • Open Grid Services Architecture • Leveraging industry standard web services • Much industry involvement • IBM, Sun, NEC, Oracle, … • Attention given to backward compatibility • Promising approach; may the migration go well! • Alpha is under test; production release in June • Major dependency given Globus’ foundation role in our middleware • Current Globus2 will be supported for some time but we will be interested in new functionality
Open questions • Utility and practicality of generate-on-demand virtual data (‘virtual data by materialization’) • Networking going well; cost/complexity equation favors copying • Interesting talk (C.Jones) on successful implementation and use for many years in CLEO • Relies on user discipline to ensure regenerated data is trustworthy • Utility of data provenance management, needed for secure trust of on-demand data, is a separate question • Should have important utility, not only for virtual data (reproducibility, trust) but as a communication mechanism in widely distributed collaborations • Cannot allow reliance on hallway conversations with production gurus
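The idea of generate-on-demand virtual data with provenance can be sketched in a few lines: a dataset is declared by its recipe (a transformation plus parameters), materialized only when first requested, and returned together with a provenance record. This is a toy illustration, not the CLEO implementation or any Grid virtual data system; the class `VirtualDataCatalog`, its methods, and the example transformation are all hypothetical names introduced here.

```python
import hashlib
import json


class VirtualDataCatalog:
    """Toy catalog: datasets are recipes, materialized on first request."""

    def __init__(self):
        self._recipes = {}          # dataset name -> (function, parameters)
        self._cache = {}            # provenance hash -> materialized data
        self.materializations = 0   # how many times a recipe was actually run

    def declare(self, name, func, **params):
        """Register how a dataset would be produced, without producing it."""
        self._recipes[name] = (func, params)

    def provenance(self, name):
        """Describe how the dataset is derived (must be JSON-serializable)."""
        func, params = self._recipes[name]
        return {
            "dataset": name,
            "transformation": func.__name__,
            "parameters": params,
        }

    def get(self, name):
        """Return (data, provenance), running the recipe only if not cached."""
        record = self.provenance(name)
        key = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in self._cache:
            func, params = self._recipes[name]
            self._cache[key] = func(**params)
            self.materializations += 1
        return self._cache[key], record


def select_above(threshold, values):
    """Hypothetical transformation: keep values above a threshold."""
    return [v for v in values if v > threshold]


vdc = VirtualDataCatalog()
vdc.declare("high_pt", select_above, threshold=20.0, values=[5.0, 25.0, 40.0])
first, prov = vdc.get("high_pt")    # materialized here
second, _ = vdc.get("high_pt")      # served from the cache
```

The provenance record is what replaces the hallway conversation: anyone holding the data can see which transformation and parameters produced it. Whether regenerated data can be trusted still depends on the transformation itself being versioned and reproducible, which this sketch does not address.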
Concerns • Data analysis as “the last wheel of the car” (R. Brun) • Clear message from current generation (e.g. Run 2, BaBar): don’t leave data analysis systems and infrastructure too late, it will lead to problems • Vastly more true when we are talking about doing globally distributed analysis, for the first time • with unprecedented volume and complexity, e.g. Terabyte scale at the LHC • Making distributed analysis both very difficult and mandatory • We cannot bootstrap ourselves into a global analysis system, it will take long incremental work, so we had better be working in a coordinated & effective way now • R. Brun: Will not converge on one system; will be multiple competing systems, and that will not be bad [hopefully a small number]
Concerns • Are we doing enough to ensure senior people can contribute directly to physics analysis? • How do we interpret the fact (R. Brun) that PAW usage is still rising? • Has everyone bought the C++/OO paradigm shift? • Are we developing and/or providing the right tools? • Is there enough engagement of senior physicists in the (limited) exploratory work being done on future physics analysis environments? • Almost certainly no, and may be difficult to attract their attention unless/until attractive prototypes can be turned loose on them
Major Challenges • Storage architecture “possibly biggest challenge for LHC” (PASTA) • Seamless integration from CPU caches to deep archive • Currently very poor data management tools for storage systems • More architectural work needed in next 2 years
Future ALICE Data Challenges (R. Divia) • [Figure: data challenge throughput in MBytes/s] • New technologies: CPUs, servers, network
Conclusions (1) • Coming experiments must learn from prior generations: give early (ie for LHC, immediate) attention to data analysis • It will take generations of incremental iterations of design, prototyping and stressful deployment to get it right • Particularly in the unprecedented global collaborative environment of the LHC • C++ is a mature and accepted standard • Several generations of C++ code in production experiments (BaBar, Run 2, …) • Maturation of tools into broad usage (Geant4, ROOT I/O) • No sign of a major new language migration so far [thank goodness] • But beware excessive complexity and remember the promise of accessible, usable software
Conclusions (2) • Grids and networking are making great strides • HENP is a successful and valued partner with CS • We provide a community focused on challenging large-scale deployments in real research settings • But Murphy’s Law is a potent adversary today; far from robust transparency, and much much more to do • Global collaborative computing must become a successful norm for us • Down to the global researcher at the home institute • Rich leadership potential for our field • Important new common endeavours like the Grid and LCG have much invested in their success… will be interesting to measure the degree of success at next CHEP
Thanks • Thanks to Jim Branson and his team of organizers for giving us • A stimulating program and comfortable schedule • More-than-pleasant facilities and surroundings • Terrific banquet, I hear! • A very successful conference. • I for one will return to La Jolla any time