280 likes | 353 Views
Technical Evolution in LHC Computing (with an ATLAS slant). Torre Wenaus Brookhaven National Laboratory July 18, 2012 GRID ’ 2012 Conference JINR, Dubna. Fabiola Gianotti , ATLAS Spokesperson Higgs search seminar July 4 2012, CERN. This Talk. LHC computing – an outstanding success!
E N D
Technical Evolution in LHC Computing(with an ATLAS slant) Torre Wenaus Brookhaven National Laboratory July 18, 2012 GRID’2012 Conference JINR, Dubna
FabiolaGianotti, ATLAS Spokesperson Higgs search seminar July 4 2012, CERN
This Talk • LHC computing – an outstanding success! • Experiments, Grids, Facilities, Middleware providers, CERN IT collaborated closely and delivered • Where do we go from here? Drawing on what we’ve learned, and the technical evolution around us, what tools and technologies will best enable continued success? • A hot topic as we approach a long shutdown 2013-14 • Spring 2011: ATLAS established R&D programs across key technical areas • Fall 2011: WLCG established technical evolution groups (TEGs) across similar areas • This talk draws on both in surveying LHC computing technical evolution activities and plans • Subjectively, not comprehensively
Computing Model Evolution in ATLAS Originally: Static, strict hierarchy Multi-hop data flows Lesser demands on Tier 2 networking Virtue of simplicity Today: Flatter, more fluid, mesh-like Sites contribute according to capability Greater flexibility and efficiency More fully utilize available resources Principal enabler for the evolution: the network Excellent bandwidth, robustness, reliability, affordability
Requirements 2013+In Light of Experience and Practical Constraints • Scaling up: processing and storage growth with higher luminosity, energy, trigger rate, analysis complexity, … • With ever increasing premium on efficiency and full utilization as resources get tighter and tighter • Greater resiliency against site problems, especially in storage/data access, the most common failure mode • Effectively exploit new computer architectures, dynamic networking technologies, emergent technologies for large scale processing (NoSQL, clouds, whatever’s next…) • Maximal commonality: common tools, avoid duplicative efforts, consolidate where possible. Efficiency of cost and manpower • All developments must respect continuity of operations; life goes on even in shutdowns
WLCG Technical Evolution Groups (TEGs) • TEGs established Fall 2011 in the areas of databases, data management and storage, operations, security, and workload management • Mandate: survey present practice, lessons learned, future needs, technology evolution landscape, and recommend a 2-5 year technical evolution strategy • Consistent with continuous smooth operations • Premium on commonality • Reports last winter, follow-on activities recently defined • Here follows a non-comprehensive survey across the technical areas of • Principal TEG findings (relevant to technology evolution) and other technical developments/trends • Some ATLAS examples of technical evolution in action
Databases • Great success with CORAL generic C++ interface to back end DBs (Oracle, Frontier, MySQL, …). Will ensure support, together with the overlaid COOL conditions DB • Conditions databases evolved early away from direct reliance on RDBMS. Highly scalable thanks to Frontier web service interfacing Oracle to distributed clients served by proxy caches via a RESTful http protocol • Establishing Frontier/Squid as a full WLCG service • Distributed apps (NoSQL) • Oracle as the big iron DB will not change; use ancillary services like Frontier, NoSQL to offload and protect it • NoSQL: R&D and prototyping has led to convergence on Hadoop as a NoSQL ‘standard’ at least for the moment • CERN supporting a small instance now in pre-production • ATLAS is consolidating NoSQL apps around Hadoop
Hadoop in ATLAS • Distributed data processing framework: filesystem (HDFS), processing framework (MapReduce), distributed NoSQL DB (Hbase), SQL front end (Hive), parallel data-flow language (Pig), … • Hadoop in ATLAS data management system: • Aggregation and mining of log files – HDFS (3.5TB) and MapReduce (70min processing at 70 MB/s I/O) • Usage trace mining – HBase (300 insertions/sec, 80 GB/mo, MapReduced in 2 min) • Migrated from Cassandra in two days • File sharing service – copied to HDFS and served via Apache • Wildcard file metadata search – HDFS and MapReduce • Accounting – Pig data pipeline MapReduces 7GB to 500MB of output summaries for Apache publication in 8min • Stable, reliable, fast, easy to work with, robust against hardware failure • Complements Oracle by reducing load and expanding mining capability • Future potential: storage system, parallel access to distributed data
Data Management and Storage Evolution • Federated storage based on xrootd for easy to use, homogeneous, single-namespace data access • NFS 4.1 an option in principle but not near ready • Support for HTTP storage endpoints and data access protocol • Wide area direct data access to simplify data management and support simpler, more efficient workflow • Caching data dynamically and intelligently for efficient storage usage, minimal latencies, simplified storage managment • Point to point protocols needed: GridFTP, xrootd, HTTP, (S3?) • Without SRM overheads • Adapt FTS managed transfer service to new requirements: endpoint based configuration, xrootd and HTTP support (FTS3) • On their way out: LFC file catalog, SRM storage interface (except for archive)
Data Placement with Federation Diagram from the storage & data management TEGs
Why xrootd as Basis for Storage Federation? • Xrootd is mature, proven for over a decade, tuned to HEP requirements • Stable, responsive support • Commonality (Alice, ATLAS, CMS) • Very efficient at file discovery • Transparent use of redundant replicas to overcome file unavailability • Can be used to repair placed data • Works seamlessly with ROOT data formats used for event data • Close ROOT/xrootd collaboration • Supports efficient WAN data access • Interfaces to hierarchical caches, stores • Uniform interface to back end storage • dCache, DPM, GPFS, HDFS, xrootd… • Good monitoring tools Alice: xrootd in production for years CMS: federation in pre-production ATLAS: federation emerging from R&D ATLAS xrootd federation currently Now expanding to EU
WAN Data Access and Caching • I/O optimization work in ROOT and experiments in recent years makes WAN data access efficient and viable • Caching at source and destination further improve efficiency • Source: migrate frequently used data to a fast (e.g. SSD) cache • Destination: addressable shared cache reduces latency for jobs sharing/reusing data and reduces load on source. Requires cache awareness in brokerage to drive re-use • Asynchronous pre-fetch at the client further improves; WAN latencies don’t impede processing. Now supported by ROOT • Major simplification to data and workflow management; no more choreography in moving data to processing or vice versa • Demands ongoing I/O performance analysis and optimization, knowledge of access patterns (e.g. fraction of file read), I/O monitoring • Collaborative work between ROOT team and experiments • Will benefit from integrating network monitoring into workflow brokerage
Operations & Tools • Strong desire for “Computing as a Service” (CaaS) especially at smaller sites • Easy to install, easy to use, standards-compliant packaged services managed and sustained with very low manpower • Graceful reactions to fluctuations in load and user behavior • Straightforward and rapid problem diagnosis and remedy • “Cloud is the ultimate CaaS” • Being acted on in ATLAS: cloud-based Tier 3s currently being prototyped • More commonality among tools • Availability monitoring, site monitoring, network monitoring • Carefully evaluate how much middleware has a long term future • Adopt CVMFS for use as a shared software area at all WLCG sites • In production for ATLAS and LHCb, in deployment for CMS
CERNVM File System (CVMFS) Caching HTTP file system optimized for software delivery Efficient: compression over the wire, duplicate detection Scalable: works hand in hand with proxy caches ATLAS deployment Aim is 100% by end 2012 • Originally developed as lightweight distributed file system for CERNVM virtual machines • Keep the VM footprint small and provision software transparently via HTTP and FUSE, caching exactly what you need • Adopted well beyond CERNVM community and very successful for general software (and other file data) distribution • ATLAS, CMS, LHCb, Geant4, LHC@Home 2.0, … • Currently 75M objects, 5TB in CERN repositories
Security • No longer any security through obscurity • And lack of obscurity makes security incidents all the worse (press and publicity) • Need fine-grained traceability to quickly contain and investigate incidents, prevent recurrence • Bring logging up to standard across all services • Address in the future through private clouds? VM is isolated and is itself fully traceable • Incorporate identity federations (next slide) • Centralized identity management also enables easy banning
Identity Management Identity management today: • Complex for users, admins, VOs, developers • ID management is fragmented and generally non-local • Grid identity not easily used in other contexts Traditional access to grid services Federated ID management: • Common trust/policy framework • Single sign-on • ID managed in one place • ID provider manages attributes • Service provider consumes attributes • Support services besides grid: collab tools, wikis, mail lists, … Federated access to grid services R. Wartel WLCG Pilot project getting started
Workload Management • Pilots, pilots everywhere! • Approach spread from ALICE and LHCb to all experiments • Experiment-level workload management systems successful • No long term need for WMS as middleware component • But is WMS commonality still possible then? Yes… • DIRAC (LHCb) used by other experiments, as is AliEn (ALICE) • PanDA (ATLAS) has new (DOE) support to generalize as an exascale computing community WMS • ATLAS, CMS, CERN IT collaborating on common analysis framework drawing on PanDA, glideinWMS (pilot layer) • Pilot submission and management works well on foundation of Condor, glideinWMS (which is itself a layer over Condor) • Can the CE be simplified? Under active discussion
ATLAS/CMS Common Analysis Framework Proposed Architecture Workflow engine Status: CMS able to submit PanDA jobs to the ATLAS PanDA server
Workload Management 2 • Multi-core and whole node support • Support is pretty much there for simple usage modes – and simplicity is the aim – in particular whole node scheduling • Extended environmental info (eg. HS06, job lifetime) • Environment variables providing homogeneous information access across batch system flavors (could have done with this years ago) • Virtualization and cloud computing • Virtual CE: better support for “any” batch system. Essential. • Virtualization clearly a strong interest for sites, both for service hosting and WNs • Cloud computing interest/activity levels vary among experiments • ATLAS is well advanced in integrating and testing cloud computing in real analysis and production workflows, thanks to robust R&D program since spring 2010
Cloud Computing in ATLAS Concurrent jobs at Canadian sites • Several use cases prototyped or in production • PanDA based data processing and workload management • Centrally managed queues in the cloud • Elastically expand resources transparently to users • Institute managed Tier 3 analysis clusters • Hosted locally or (more efficiently) at shared facility, T1 or T2 • Personal analysis queues • User managed, low complexity (almost transparent), transient • Data storage • Transient caching to accelerate cloud processing • Object storage and archiving in the cloud PanDA production in the cloud Personal pilot factory Amazon analysis cloud status ActiveMQ Monitoring Personal PanDA Amazon analysis site prototype EC2 analysis cloud
Networking • Network has been key to the evolution to date, and will be key also in the future • Dynamic intelligent networks, comprehensive monitoring, experiment workflow and dataflow systems integrated with network monitoring and controls • Network-aware brokerage • Recent WLCG agreement to establish a working group on better integrating networking as an integral component of LHC computing • LHCONE: progressively coming online to provide entry points for T1, T2, T3 sites into a dedicated LHC network fabric • Complementing LHCOPN as the network that serves T0-T1 traffic
LHCONE and Dynamic Networks LHCONE initiative established 2010 to (first and foremost) better support and integrate T2 networking given the strong role of T2s in the evolving LHC computing models • Managed, segregated traffic for guaranteed performance • Traffic engineering, flow management to fully utilize resources • Leverage advanced networking • LHCONE services being constructed: • Multipoint, virtual network (logical traffic separation) • Static & dynamic point to point circuits for guaranteed bandwidth, high-throughput data movement • Comprehensive end-to-end monitoring and diagnostics across all sites, paths • Software Defined Networking (OpenFlow) is probable technology for LHCONE in long term; under investigation as R&D. Converge on 2 year timescale (end of shutdown) • In the near term, establish a point-to-point dynamic circuits pilot
Conclusions Focus for the future: • Leverage the network • Monitoring, diagnostics • Robustness, reliability • Simplicity, ease of use • Support, sustainability • Pilots, Condor • Federation • Oracle + NoSQL • Caching, WAN access • HTTP, RESTful APIs • Virtualization • Commonality • Open standards • Leverage beyond HEP • Left out of the focus areas: scalability • Of course it is a principal one, but the success of LHC computing to date exemplifies that we have a handle on scalability • Thanks to success to date, the future will be evolutionary, not revolutionary • For the future, focus on • Technical tools and approaches most key to having achieved scalability and manageability • New tools and technologies with the greatest promise to do the same in the future • So ensure sustainability in the face of ever more challenging requirements and shrinking manpower • On the foundation of powerful networking that we are still learning to exploit to its fullest • And on the foundation of the Grid, which has served us well and is evolving with us Farewells to: • LFC, WMS, SRM (mostly)
Thanks • This talk drew on presentations, discussions, comments, input from many • Thanks to all, including those I’ve missed • Artur Barczyk, Doug Benjamin, Ian Bird, Jakob Blomer, Simone Campana, Dirk Duellmann, Andrej Filipcic, Rob Gardner, Alexei Klimentov, Mario Lassnig, Fernando Megino, Dan Van Der Ster, Romain Wartel, …
Related Talks at GRID’12 • I. Bird (Monday) • A. Klimentov, Distributed computing challenges, HEP computing beyond the grid (Tuesday) • A. Vaniachine, Advancements in Big Data Processing (Tuesday) • O. Smirnova, Implementation of common technologies in grid middleware (Tuesday) • A. Tsaregorodtsev, DIRAC middleware for distributed computing systems (Tuesday) • K. De, Status and evolution of ATLAS workload management system PanDA (Tuesday) • ATLAS computing workshop (Tuesday) • D. Duellmann, Storage strategy and cloud storage evaluations at CERN (Wednesday)
More Information • WLCG Technical Evolution Groups https://twiki.cern.ch/twiki/bin/view/LCG/WebHome • Frontier talks at CHEP 2012 • Comparison to NoSQL, D. Dykstra http://goo.gl/n0fcp • CMS experience, D. Dykstra http://goo.gl/dYovr • ATLAS experience, A. Dewhurst http://goo.gl/OIXsH • NoSQL & Hadoop in ATLAS DDM, M. Lassnig, CHEP 2012 http://goo.gl/0ab4I • Xrootd http://xrootd.slac.stanford.edu/ • Next generation WLCG File Transfer Service (FTS), Z. Molnar, CHEP 2012 http://goo.gl/mZWtp • Evolution of ATLAS PanDA System, T. Maeno, CHEP 2012 http://goo.gl/lgDfU • AliEn http://alien2.cern.ch/ • DIRAC http://diracgrid.org/ • CVMFS http://cernvm.cern.ch/portal/filesystem • Status and Future Perspectives of CVMFS, J. Blomer, CHEP 2012 http://goo.gl/82csr • LHCONE http://lhcone.web.cern.ch/
Frontier/Squid for Scalable Distributed DB Access • LHC experiments keep time-dependent data on detector conditions, alignment etc in Oracle at CERN • ATLAS alone runs 120,000+ jobs concurrently around the world. Many of these need conditions data access • How to provide scalable access to centrally Oracle-resident data? • The answer, first developed by CDF/CMS at FNAL, and adopted for LHC: Frontier • Frontier is a web service that translates DB queries into HTTP • Far fewer round trips between client and DB than using Oracle directly; fast over WAN • Because it is HTTP based, caching web proxies (typically squid) can provide hierarchical, highly scalable cache based data access • Reuse of data among jobs is high, so caching is effective at off-loading Oracle CMS Frontier deployment