
Athena & the Grid Architectural View


Presentation Transcript


  1. Athena & the Grid Architectural View. Craig E. Tull, HCG/NERSC/LBNL. ATLAS/LHCb/GridPP Workshop, Cosener's House, May 23, 2002

  2. What this talk is: • What this talk is not: another presentation of GRAPPA (see Rob's talk of yesterday). • What this talk is: an ATLAS perspective on the view of the Grid from the Athena/Gaudi framework. • A seat-of-the-pants distillation of some impressions from this workshop's presentations. • Food for thought and discussion in this afternoon's session. • … and slightly random.

  3. Athena/GAUDI Architecture. [Diagram: the Application Manager drives a chain of Algorithms; Algorithms exchange data through the Transient Event Store, Transient Detector Store, and Transient Histogram Store; each store is backed by a Persistency Service with Converters reading and writing data files; supporting components include the Event Data Service, Detector Data Service, Histogram Service, Message Service, JobOptions Service, Particle Property Service, and other services.]

  4. Grid vs. Athena Services

  5. Bigger Picture. [Diagram: the same Athena/GAUDI architecture, now with an Event Selector feeding the Transient Event Store and surrounded by external components: a DataSet DB, an OS Job Service, Mass Storage, a Monitoring Service, a Configuration Service, an Event Database, a PDG Database, an Analysis Program, a Histogram Presenter, and others.]

  6. Bucket of Cold Water

  7. Grid: The new paradigm? • The Grid offers a vision of computer resources that are: distributed, heterogeneous, robust, and integrated. • Some concepts are qualitatively new: resource discovery, virtual data, reserved QoS. • Some concepts are quantitatively "new": the number of sites/jobs/nodes/users. • Some concepts are old wine in new skins: distributed processing. • Some are natural & "obvious" extensions of old concepts: Unix groups → VOs, LFNs.

  8. Grid Projects: Integrated? • We've heard here about: • GANGA, GRAPPA, BOSS, AliEn • CMT, Pacman, Packman, DAR • WP1 JSS, GriPhyN Planner • Magda, WP2 Replica Service • NetLogger, Prophesy, GMA, R-GMA, GridView, Ganglia • VDL/IVDL, WP1 JDL, Condor ClassAds • EDG, PPDG, GriPhyN, GridPP, InfoGrid, CrossGrid, GGF, MONARC, … • How do we take advantage of Grid capability while protecting ourselves from potential duplication and conflicts of roles & responsibilities?

  9. Grid: Ready for Prime Time? • CHEP'98 - first HENP Grid (Clipper) talk: #237, Directions and Issues for High Data Rate Wide Area Network Environments. • Many Grid projects are CS R&D, but production grids do exist (e.g. NASA InfoGrid), and indications are that Grid computing is gaining momentum in the non-HENP (i.e. mainstream) world. • IBM/Globus partnership - 12 developers.

  10. ATLAS SW & Grid Projects • The Grid does now offer advantages & functionality; more will certainly come. • We cannot afford to wait to be handed the solution. • APIs to Grid services need to be compatible with, or adapted to, Athena services. • ATLAS interests/requirements need to be communicated to Grid researchers/developers & DOE/NSF. • Timelines for ATLAS need to be defined. • The Grid timeline is not the same as some others. • Available FTE resources are a critical input. • Much current work concentrates on issues like: data volume, data set distribution, ATLAS resources (disk, CPU, HMS), network connectivity, $$$, FTE, etc. • The distributed computing model must be defined. • Control framework: Grid-compatible / Grid-aware, but not Grid-dependent.

  11. Grid aware, but not dependent. • Interface technologies: • Programmatic API (e.g. C, C++, etc.) • Scripting as glue à la Stallman (e.g. Python) • JobOptions.{txt,py} • Sandbox • Others? E.g. SOAP, CORBA, RMI, DCOM, .NET, etc. • International standards would help! (Global Grid Forum) • A staged approach is called for: start with a simple batch model, add simple Grid functionality via services, with continual feedback.

  12. Athena/Grid Interface • For the programmatic interface to Grid services, we are thinking in terms of Gaudi services that capture and present the functionality of the Grid services (not necessarily a one-to-one mapping, BTW). • I think it is important at this stage (maybe forever) to ensure that the framework is "grid-capable" without being "grid-dependent", i.e. we should always be able to run without Grid services available. • Gaudi's component architecture makes this approach to using the Grid quite natural. • How do we switch between Grid/non-Grid? (See the sketch below.)
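One plausible answer, shown as a minimal Python sketch (all class and option names here are hypothetical, not actual Gaudi/Athena code): both the Grid-aware and the standalone implementation present the same abstract service interface, so switching between them is purely a configuration choice.

    # Hypothetical sketch: one abstract interface, two interchangeable
    # implementations; client code never knows which one is active.

    class IReplicaSvc:
        """Abstract service: resolve a logical file name to a local path."""
        def resolve(self, lfn):
            raise NotImplementedError

    class GridReplicaSvc(IReplicaSvc):
        """Grid-aware implementation: would query a replica catalog."""
        def resolve(self, lfn):
            # ... replica catalog lookup and staging would happen here ...
            return "/stage/" + lfn.split("/")[-1]

    class LocalReplicaSvc(IReplicaSvc):
        """Standalone implementation: assumes files are already local."""
        def resolve(self, lfn):
            return "/data/" + lfn.split("/")[-1]

    # The jobOptions-style switch: Grid-capable, never Grid-dependent.
    use_grid = False   # a single configuration flag
    replicaSvc = GridReplicaSvc() if use_grid else LocalReplicaSvc()
    print(replicaSvc.resolve("lfn://atlas.org/run1234/events.root"))

Flipping use_grid is the only change needed to go from standalone to Grid operation; no Algorithm code is touched.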

  13. Jul’01: PSEUDOCODE FOR ATLAS SHORT TERM UC01

    Logical File Name:   LFN = "lfn://" hostname "/" any_string
    Physical File Name:  PFN = "pfn://" hostname "/" path
    Transfer File Name:  TFN = "gridftp://" PFN_hostname "/" path

    JDL:
      InputData = {LFN[]}
      OutputSE  = host.domain.name

    Worker Node:
      LFN[] = WP1.LFNList()
      for (i = 0; i < LFN.length; i++) {
        PFN[] = ReplicaCatalog.getPhysicalFileNames(LFN[i])
        j = Athena.eventSelectionSrv.determineClosestPFN(PFN[])
        localFile = GDMP.makeLocal(PFN[j], OutputSE)
        Athena.eventSelectionSrv.open(localFile)
      }

    Replica resolution chain:
      PFN[] = getPhysicalFileNames(LFN)
      PFN = getBestPhysicalFileName(PFN[], String[] protocols)
      TFN = getTransportFileName(PFN, String protocol)
      filename = getPosixFileName(TFN)

  14. WP2: Replica Manager API (old: pre-SFN terminology) • addPhysicalFileName(LogicalFileName, PhysicalFileName) • deletePhysicalFileName(LogicalFileName, PhysicalFileName) • SFN = getPhysicalFileNames(LogicalFileName) • copy(PhysicalFileName source, PhysicalFileName destination, String protocol) • copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol) • generatePhysicalFileName(LogicalFileName filename, PhysicalFileNamePattern) • estimateCostForCopy(PhysicalFileName source, PhysicalFileName destination, String protocol) • SFN = getLocationOfBestReplica(LogicalFileName) • getBestPhysicalFileName(PhysicalFileNameList, ProtocolList) • getTransportFileName(PhysicalFileName, Protocol)
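To make the intended call sequence concrete, here is a minimal Python sketch of the LFN → PFN → TFN resolution chain implied by this API. Only the method names come from the WP2 list above; the ReplicaManager class, its toy catalog, and the selection logic are invented for illustration.

    # Hypothetical stand-in for the WP2 Replica Manager; method names
    # follow the API list above, the bodies are invented.
    class ReplicaManager:
        def __init__(self):
            # toy catalog: LFN -> list of replica PFNs
            self.catalog = {
                "lfn://atlas.org/run1/f1": [
                    "pfn://cern.ch/data/f1",
                    "pfn://lbl.gov/data/f1",
                ]
            }

        def getPhysicalFileNames(self, lfn):
            return self.catalog[lfn]

        def getBestPhysicalFileName(self, pfns, protocols):
            # a real service would use copy-cost estimates; take the first
            return pfns[0]

        def getTransportFileName(self, pfn, protocol):
            return pfn.replace("pfn://", protocol + "://")

    rm = ReplicaManager()
    pfns = rm.getPhysicalFileNames("lfn://atlas.org/run1/f1")
    best = rm.getBestPhysicalFileName(pfns, ["gridftp"])
    tfn = rm.getTransportFileName(best, "gridftp")
    print(tfn)   # gridftp://cern.ch/data/f1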

  15. Athena Distributed Instrumentation • Part of the SuperComputing 2002 ATLAS demo. • IMonitorSvc: an IChronoStatSvc extension? • An abstract application monitoring service. • Prophesy (http://prophesy.mcs.anl.gov/): an infrastructure for analyzing & modeling the performance of parallel & distributed applications; normally a parse & auto-instrument approach (C & FORTRAN). • NetLogger (http://www-didc.lbl.gov/NetLogger/): end-to-end monitoring & analysis of distributed systems; C, C++, Java, Python, Perl, Tcl APIs; Web Service activation.
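A minimal Python sketch of what an abstract IMonitorSvc might look like (the names are hypothetical; tools like NetLogger would sit behind a concrete implementation): algorithms emit named, timestamped events, and the configured backend decides where they go, possibly nowhere.

    import time

    class IMonitorSvc:
        """Abstract monitoring interface an Algorithm codes against."""
        def event(self, name, **fields):
            raise NotImplementedError

    class NullMonitorSvc(IMonitorSvc):
        """No-op backend: keeps jobs runnable with monitoring disabled."""
        def event(self, name, **fields):
            pass

    class LogMonitorSvc(IMonitorSvc):
        """Toy backend emitting NetLogger-style timestamped records."""
        def event(self, name, **fields):
            tags = " ".join("%s=%s" % (k, v) for k, v in fields.items())
            print("ts=%.6f event=%s %s" % (time.time(), name, tags))

    monitor = LogMonitorSvc()          # or NullMonitorSvc() offline
    monitor.event("Athena.execute.begin", run=1234, evt=1)
    monitor.event("Athena.execute.end",   run=1234, evt=1)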

  16. WP1: Sandbox • A working area (input & output) replicated on each CE to which a Grid job is submitted. • Very convenient & natural. • My concerns: • Requires network access (with associated privileges) to all CEs on the Grid; could be a huge security issue with local administrators. • Not (yet) coordinated with WP2 services. • Sandbox contents are not customizable to the local (CE/SE/PFN) environment. • Temptation to abuse (not for data files).

  17. [Diagram: Grid system job flow. The ATLAS planner works with logical filenames (via Magda); the planner's job JDL specifies input for the WP1 JSS; the sandbox carries JobOptions referring to physical files resolved through the WP2 Replica Manager; GDB input and output fragments are registered as output.]

  18. ATLAS SW & the Grid • What are the implications of a distributed computing model and grids for: • The database domain? Extensive in almost any case. • The control framework? Depends upon the model (e.g., distributed data sources versus distributing executables versus distributed execution). • Other ATLAS software infrastructure? E.g. build & install tools & kits.

  19. Distributed Processing Models • Batch-like Processing (à la WP1) • Distributed Single Event (MPP) • Client-Server (interactive) • WAN Data Access (AMS, Clipper) • File Transfer and Local Processing (GDMP) • Agent-based Processing (distributed control) • Check-Point & Migrate (save & restore) • Scatter & Gather (parallel events) • Move the data or move the executable? No experiment is planning to write petabytes of code!

  20. ATLAS Distributed Processing Model • At this point, it is still not clear what the final ATLAS distributed computing model will be. Although newer ideas like Agent-based Processing have a great deal of appeal, they are as yet unproven in a large-scale production environment. • A conservative approach would be some combination of Batch-like Processing and File Transfer and Local Processing for batch jobs, with perhaps a Client-Server or Scatter-Gather approach for interactive/analysis jobs. • PPDG CS-11 - Interfacing and Integrating Interactive Data Analysis Tools with the Grid and Identifying Common Components and Services

  21. Data Access Patterns • Data access patterns of physics jobs also heavily influence our thinking about interacting with the Grid. It is likely that all possible data access patterns will be extant in ATLAS data processing at various stages in that processing. We may find that some data access patterns lend themselves to efficient use of the Grid much better than others. • Data access patterns include: • Sequential access (reconstruction) • Random access (interactive analysis) • File/data set driven (LFN-friendly) • Navigation driven (OODB-like) • Query driven (SQL/OQL/JDO/etc.)

  22. DB Architectural Elements • Events are write-once • Three capabilities to support optimization: • Event sharing • Data sharing • Data placement (clustering) • Therefore, different storage formats • Does not mean different technologies! • Different ways to represent events and sets of events. • Possible because navigation is separated from storage. • Examples… ATLAS DataBase Architecture - Ed Frank

  23. Architectural Motif - Extract & Transform • The architecture will express many storage formats. • Any job can read any of them without reconfiguration. • Can always extract events for transport, regardless of format; the cost depends upon the storage format. • Tier 0 is assigned responsibility for keeping a copy of the data in a format such that extraction costs are affordable: the archival data format. • Can always transform (write) data into a new format; store in a format for local optimization.

  24. Extract and Transform. [Diagram: data flows between Site 1, Site 2, and Site 3 illustrating the motif: "extract & transform", "just extract", "transport, transform & install", and "transport & install" paths. ATLAS DataBase Architecture - Ed Frank]

  25. Object Access vs. File Access • ATLAS (like others) is basing its Event Data Model (EDM) on a (transient) object data model. • This transient model maps onto a persistent object model (not necessarily 1-to-1). • We require users to think of objects in the transient store at the Algorithm level. • The Transient Data Store has data-access proxy concepts built in to read objects from persistency into the TDS. • Current Grid products are heavily oriented towards an LFN-like view of data. • Perfectly natural, as this is the system-level view of data & a convenient unit for atomic data transfer across the network (e.g. FTP, gridFTP). • BUT, if we want users to think objects, the object-to-LFN/PFN mapping has to live somewhere. (See the sketch below.)
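A minimal Python sketch of where that mapping could live (all names are hypothetical): the transient store hands the user a proxy; on first access the proxy resolves the object reference to an LFN, lets the replica machinery stage the file, and only then reads the object. The user never sees an LFN or PFN.

    # Hypothetical sketch: object-level access on top of file-level
    # Grid services; LFN/PFN handling stays inside the proxy.

    class ObjectCatalog:
        """Maps an object reference to the LFN that holds it."""
        def lfn_for(self, ref):
            return "lfn://atlas.org/run1/" + ref.split("#")[0]

    class Stager:
        """Stand-in for replica lookup + gridFTP transfer."""
        def make_local(self, lfn):
            print("staging", lfn)
            return "/stage/" + lfn.split("/")[-1]

    class DataProxy:
        """Proxy in the transient store: resolves on first access only."""
        def __init__(self, ref, catalog, stager):
            self.ref, self.catalog, self.stager = ref, catalog, stager
            self._obj = None

        def get(self):
            if self._obj is None:
                local = self.stager.make_local(self.catalog.lfn_for(self.ref))
                self._obj = ("object", self.ref, "read from", local)
            return self._obj

    proxy = DataProxy("events.root#Evt42/Tracks", ObjectCatalog(), Stager())
    print(proxy.get())   # staging happens here, transparently
    print(proxy.get())   # already resolved: no second staging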

  26. Ganga Scenarios • Scenario 1: • The user makes a "high-level" selection of data to process and defines the processing job. • "High-level" means based on event characteristics, not on file or event identity. • High-level event selection uses an ATLAS bookkeeping database (similar to the current LArC bookkeeping database or BNL's Magda) to select event & logical file identities. • Construct JDL for WP1 using LFNs. • Construct jobOptions.py using PFNs (with WP2). • Submit job(s) with JDL & jobOptions.py in the sandbox. (See the sketch below.) • Scenario 2: the same, except jobOptions.py now contains LFNs. This requires a Replica Service API-enabled EvtSelector or ConversionSrv.
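Scenario 1 could be scripted roughly as follows. This is a hedged Python sketch: the bookkeeping query, JDL fields, and file layout are invented for illustration, not taken from GANGA or WP1 documentation.

    # Hypothetical sketch of Scenario 1: from a high-level selection
    # down to a submittable (JDL, jobOptions.py) pair.

    def select_lfns(query):
        """Stand-in for the ATLAS bookkeeping DB / Magda query."""
        return ["lfn://atlas.org/run1/f1", "lfn://atlas.org/run1/f2"]

    def resolve_pfn(lfn):
        """Stand-in for the WP2 replica lookup (Scenario 1 only)."""
        return lfn.replace("lfn://atlas.org", "pfn://cern.ch/data")

    lfns = select_lfns("jets with Et > 100 GeV")

    # JDL keeps the logical names; the Grid decides where to run.
    jdl = 'InputData = {%s};\nOutputSE = "se.cern.ch";\n' % \
          ", ".join('"%s"' % l for l in lfns)

    # jobOptions.py carries physical names resolved up front.
    job_options = "EventSelector.InputCollections = %r\n" % \
                  ([resolve_pfn(l) for l in lfns],)

    print(jdl)
    print(job_options)
    # In Scenario 2 the jobOptions would carry the LFNs unchanged, and
    # the EvtSelector/ConversionSrv would resolve them on the worker node.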

  27. Observation about GUIs • Several projects are promoting GUIs. • WP1, Grappa, AliEn, others. • Independently written "native" GUIs are notoriously difficult to integrate/make coherent. • Web-based GUIs are easier to integrate, but offer limited functionality.

  28. Rule #1: Protect the User • Real data vs. virtual data. • LFN vs. PFN/TFN/SFN. • Grid-enabled vs. standalone. • We do not want the user of the framework to know or care about details like these. • Implies: uniform, abstract access to (and specification of) data sets (i.e. if real and virtual data are to be used). • Dummy (non-Grid) implementations of Grid-enabled services?

  29. Way Forward/Discussion • Goal: Give direction to new hires funded by GridPP to ensure that their work has the widest applicability in both ATLAS & LHCb. • Discussion Questions: • Data-File or Data-Object level access? • Heterogeneity - How much? (Client vs. Server) • Communication Protocols? • How to synchronize/coordinate? • ATLAS world-wide & Large Active US effort • LHCb - no US component => more EDG-centric • GAUDI/Athena - Where to draw the line? • Grid middleware/Svc Interfaces/Implementations • Balance Short-term Usability vs. Long-term Functionality - Remember the mainstream.
