
The ATLAS Tier2 Federation INFN



  1. The ATLAS Tier2 Federation INFN Aims, functions, structure. Schedule. Services and INFN Grid. L.Perini Workshop CCR @ Otranto

  2. Layout • The Tier2s for ATLAS in Italy • 3 slides from our dear Referee at CSN1 in April • Structure and functions of the Federation • The schedule for the near future • Mostly SC but not only.. • The Grid tools and services • Relation with INFN production Grid • Status of specific tools and services • No mention of Money whatsoever…. L.Perini Workshop CCR @ Otranto

  3. ATLAS Tier2s (referee Forti @ CSN1, April) • Full approval • Roma1 • Napoli, which has no infrastructure costs and a solid project • Approval SJ (sub judice) • Milano, which is asked to provide • an improved and clarified infrastructure project • a reassessment of the LHC schedule (expected for June 2006) and of the actual start-up of the machine • Incubator (Proto-Tier2) • LNF, whose weak points are: • significant funding required; somewhat limited technical and technologist manpower; grid experience to be improved. • Both the approved sites and the others will undergo periodic reviews • If a site does not perform, the Tier2 label is removed

  4. Referees' proposal @ CSN1, April • The computing model proposed by the experiments is reasonable • The total infrastructure cost is lower than one might have feared • Prudence and the uncertainties push us to approve no more than 2 Tier2s for now • INFN's resources are limited and are, as of today, not well known • They represent a question mark in everything that follows • We propose three levels of approval: • Full approval • Approval SJ • Tier2 incubator (Proto-Tier2) • The conditions for removing the SJ are: • the site must fix its weak points • reassessment of the LHC schedule (expected for June 2006) and the actual start-up of the machine • timescale O(6 months) • The conditions for leaving the incubator are: • the site must fix its weak points • the schedule of the experiment's computing needs must hold • validation of the experiment's distributed computing model • timescale O(12 months)

  5. Referees' proposal @ CSN1, April • Computing resources • must be assigned to all the sites • to meet the needs of the experiment • to keep the community active and take part in Grid and in the service/data challenges • to be ready when the data arrive • must be planned carefully • to avoid premature purchases • to allow the Italian groups to take on the sw responsibilities that follow from the hw commitment • Size of the funding to be discussed • the experiments must at this point present an updated plan

  6. Tier2 Federation Structure • Given the referee recommendation in the previous slide, the ATLAS federation also includes the Tier2 SJ and the Tier2 incubator • This choice is needed for organizing the practical work at hand • Organizing the Italian participation in SC4 (June-November) and the first ATLAS large test of distributed analysis (October-November) is the nearest major function of the federation (see next slides) • The analysis phase will require user training and the opening of user accounts (also for remote users) with some disk space, for experimenting with implementations of the analysis model • ATLAS Italy expects a decision about SJ in September • Thus, using the Milan resources (experienced people and hw) to support the about 20 users (more than half of them from Genova, Pavia, Pisa, Udine) who will be active in the analysis phase and had proposed to rely on the Milan Tier2, looks to us the only rational way to follow while the decision about SJ is pending • In case Referees/CSN1 etc. think we should proceed otherwise, we expect to be told and to have the opportunity to discuss with them how to proceed L.Perini Workshop CCR @ Otranto

  7. Structure and setting up • ATLAS-Italy is setting up a Tier2 federation now • Some aspects are already defined, some are being defined • Some of the material in these slides is fully agreed, some consists of proposals by me • A Federation Representative: L.Perini (Mi) • Typically a 1-year mandate, rotating among the Tier2s • A pool of federation referents for specific items: • Network: G. Lo Re (Na) • Sw distribution and related matters: A. De Salvo (Roma1) • SE and data architecture: still to be found… • Other areas may be identified in the near future • For each area, local referents in all candidate Tier2s • Defaulting to the local Tier2 responsible • Regular (bi-weekly) short phone conferences between the Fed. Rep., the local Tier2 responsibles (or deputies) and the fed. experts are being considered L.Perini Workshop CCR @ Otranto

  8. Aims and Functions - 1 • Facilitate the interface with LCG, ATLAS-Grid, INFN Grid • The relation with INFN as Funding Agency stays primarily with the National representative (and with the computing national rep.) • L.Mandelli and L. Luminari • Foster common solutions in the areas where choices are still to be made • E.g. choose how to implement the analysis model in Italy, as well as which storage system and which local monitoring tools • Represent the Federation when a unique voice is required • The functions on the next slide will be coordinated by the Grid area coordinator (in the ATLAS-Italy Computing structure this is L.Perini) but will require the active support of the Tier2 federation, especially in the initial phase L.Perini Workshop CCR @ Otranto

  9. Aims and Functions - 2 • Organize ATLAS-specific "computing operation" work as far as Grid/Tier2 are concerned • E.g. operate efficiently the continuous ATLAS production via ProdSys, thus freeing some more expert manpower for the needed tasks of new sw-mw testing and development • Organize the training required for the above step • Coordinate the ATLAS-Italy contribution to the deployment and development effort in the area of interfacing ATLAS-LCG-EGEE mw to the ATLAS sw • ATLAS use of VOMS, the LCG executor in ProdSys, ATLAS DDM • On the first 2 items the INFN effort is already the biggest one in ATLAS, but more is needed • To be done in close contact with ATLAS global and the ATLAS-Italy Computing representative • See the next slide for the needs L.Perini Workshop CCR @ Otranto

  10. Status of ATLAS developments in the LCG-EGEE area • ATLAS is about to start an action via the International Computing Board and the National Representatives to address a situation felt as increasingly risky • Manpower shortage on the ATLAS collaboration side to make full use of the LCG-EGEE mw, and to be able to proactively integrate and validate new functionality in the ATLAS applications running on the LCG-EGEE Grid • "Hero model" • INFN is today one of the major contributors, but we are relying on too few overloaded people, part of them shared with the EGEE work (which funds them) • Enlarging the pool of Grid developers, expert deployers and operators is mandatory also for ATLAS-Italy L.Perini Workshop CCR @ Otranto

  11. Schedule for the near future • Largely determined by the global schedule set up by ATLAS • SC4 is the next big engagement (see next slides) • All 4 existing sites are willing to participate • We count on the SE certificates for Naples and LNF coming soon • The one for Naples came last Monday • ATLAS continuous production is part of it • The first Distributed Analysis tests, scheduled for October-November, are of extreme interest for our community • Some Italy-specific work is scheduled too • Most important: calibration, not going into any detail here… L.Perini Workshop CCR @ Otranto

  12. SC4 – the Pilot LHC Service from June 2006 A stable service on which experiments can make a full demonstration of the experiment offline chain • DAQ → Tier-0 → Tier-1: data recording, calibration, reconstruction • Offline analysis - Tier-1 ↔ Tier-2 data exchange: simulation, batch and end-user analysis And sites can test their operational readiness • Service metrics → MoU service levels • Grid services • Mass storage services, including magnetic tape Extension to most Tier-2 sites Evolution of SC3 rather than lots of new functionality In parallel – • Development and deployment of distributed database services (3D project) • Testing and deployment of new mass storage services (SRM 2.1)

  13. LCG Service Deadlines (2006-2008) • Pilot services: stable service from 1 June 06 • LHC service in operation: 1 Oct 06, over the following six months ramping up to full operational capacity & performance (cosmics) • LHC service commissioned: 1 Apr 07 (first physics) • Full physics run in 2008

  14. ATLAS SC4 Schedule • June: from 19 June till 7 July, send 772 MB/s in total - "Raw" (at 320 MB/s), ESD (at 252 MB/s) and AOD (at 200 MB/s) - from Tier 0 to the ATLAS Tier 1 sites, a total of 90K files per day (see the quick check below). The "raw" is to go to tape. The Tier2s subscribe to fake AOD (20 MB/s). CDP = continuous distributed production of 2M MC events/week, requiring 2700 kSI2K. • CDP has been active in the last months (next slide by Ian Bird from the SA1 talk in the May Final EGEE EU review). All the INFN Tier2s are involved • Operated for > 50% by INFN people (<3!) on LCG resources • July: Distributed reconstruction setting up, using local stage-in from tape (1-2 drives required). CDP • August: Two 3-day slots of distributed reconstruction using local stage-in from tape (1-2 drives required). Distributed analysis tests - 20 MB/s incoming at each Tier 1. No CDP? • September: Tier 0 internal tests. CDP • October: Distributed reprocessing tests - 20 MB/s incoming at each Tier 1. AOD to Tier2s. CDP • November: Distributed analysis tests - 20 MB/s incoming at each Tier 1, at the same time as distributed reprocessing continues. Massive Tier2 involvement. CDP L.Perini Workshop CCR @ Otranto
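A quick sanity check of the rates quoted above (a minimal sketch in Python; all numbers are taken from the slide):

    raw_rate = 320            # MB/s of "Raw"
    esd_rate = 252            # MB/s of ESD
    aod_rate = 200            # MB/s of AOD
    total = raw_rate + esd_rate + aod_rate
    print(total)              # 772 MB/s, as quoted

    seconds_per_day = 86400
    print(total * seconds_per_day / 1e6)      # ~66.7 TB/day shipped from Tier 0
    print(total * seconds_per_day / 90_000)   # ~741 MB average file size for 90K files/day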

  15. Use of the infrastructure • Sustained & regular workloads of >30K jobs/day • spread across full infrastructure • doubling/tripling in last 6 months – no effect on operations Ian Bird, SA1, EGEE Final Review 23-24th May 2006

  16. Phase 19-6 to 8-7 • Basically the first distributed test of ATLAS DDM (DQ2) • All Tier1s involved + some Tier2s (many?) • VOBOX only at the Tier1s, DQ2 servers • Data are shipped from Castor @ CERN, using FTS, to a storage area at a site. This is dummy data (no physics value), so sites may scratch it later (fake ESD). Sites must report the SRM host/path where this data is to be written. In addition, we will use the LFC catalogs already available per Tier1 to catalog this dummy data - as with the real system. • DQ2 will be used to submit, manage and monitor - hopefully without significant user intervention - the Tier1 export. DQ2 is based on the concept of dataset subscriptions: a site is subscribed by the Tier0 management system @ CERN to a dataset that has been reprocessed. The DQ2 site service running at the site's VO BOX will then pick up subscriptions, submit and manage the corresponding FTS requests (sketched below). • Tier2s will subscribe to fake AOD (20 MB/s target) L.Perini Workshop CCR @ Otranto
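A minimal sketch, in Python, of the subscription-driven loop described above; the object and method names (catalog, fts, srm_path, …) are illustrative only, not the real DQ2/FTS APIs:

    import time

    def site_service_loop(site, catalog, fts):
        # Runs on the site's VO BOX: pick up dataset subscriptions for this
        # site and drive the corresponding FTS transfer requests.
        while True:
            for sub in catalog.pending_subscriptions(site):
                # Files of the subscribed dataset not yet present at the site
                missing = catalog.files_not_yet_at(site, sub.dataset)
                if missing:
                    # One bulk FTS request to the SRM host/path the site reported
                    job = fts.submit(sources=missing, destination=site.srm_path)
                    sub.track(job)   # monitor the request until it completes
            time.sleep(60)           # then poll again for new subscriptions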

  17. INFN Tier2s in SC4 • All 4 sites are involved • Important also for gaining experience with ATLAS DDM (new!) • In the phase up to 8 July the data are fake, but plentiful • 1.6 TB per day if the target is reached (see the quick check below) • The disk space free on average today would not last us even 2 days… • They are fake data: we will keep cleaning them out and we will survive • From October the data will be real • Having additional disk will then become indispensable L.Perini Workshop CCR @ Otranto
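The "1.6 TB per day" figure above follows directly from the 20 MB/s fake-AOD subscription target sustained around the clock (a simple check):

    target = 20                     # MB/s of fake AOD per Tier2
    daily_mb = target * 86400       # 1,728,000 MB in one day
    print(daily_mb / 1e6)           # ~1.7 TB/day (about 1.6 TiB), as on the slide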

  18. ATLAS SC4 • ATLAS intends to use SC4 • As a test of data transfer over the network (first phase) • But above all as a test of the various aspects of its computing model (in particular in the second phase) • Point 2 requires ATLAS sw and mw that as of 1 June is not yet in "production" • Mw: gLite RB, new FTS, VOMS-enabled fair share… • ATLAS sw: various parts of DDM (DQ2), an analysis system with a friendly interface (we have the Production System instead) • Delays (for both gLite 3.0 and DQ2) with respect to the original schedule • We are working to have a "production-like" system from October • Not easy but possible, perhaps with a shift of a couple of months? • After that, considerable development and effort will still be needed to bring the new features to production level L.Perini Workshop CCR @ Otranto

  19. Services-tools and INFN Grid • Rely on the tools and services developed by EGEE-LCG (INFN Grid) as much as possible • Take advantage of all the possible synergies with the INFN Grid Operation structure • Full integration in the ATLAS-LCG-EGEE system • Relatively easy in ATLAS, as some insulation of the EU Grid from the US Grid and NorduGrid is built into the ATLAS system • Specific tools and services are dealt with in the next slides L.Perini Workshop CCR @ Otranto

  20. RB, CE, SE, FTS, LFC, VObox • ATLAS is using the RB and Condor-G on the LCG resources • US and NorduGrid use different submission systems • The 3 interfaces ("executors") are part of the same ATLAS ProdSys (see the sketch below) • With Condor-G there is friendly competition; INFN people are fully engaged in RB use as developers and operators • The Condor-G workers are even fewer than our people… it is helping us win the competition… not good… • Our WMS interface ("Lexor executor") is now adapted to the new gLite RB • Test RB servers with all the latest fixes at Milan and CNAF seem OK NOW • Ready to start production with it in the next days! • Thanks also to the work of the ATLAS-LCG-EGEE task force L.Perini Workshop CCR @ Otranto
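An illustrative sketch of the executor arrangement described above: ProdSys sees one common interface, with the RB/WMS ("Lexor"), Condor-G and NorduGrid submission systems as interchangeable backends. The class and method names are hypothetical, not the actual ProdSys code:

    from abc import ABC, abstractmethod

    class Executor(ABC):
        # The common interface ProdSys talks to; each Grid flavour implements it.
        @abstractmethod
        def submit(self, job) -> str: ...     # returns a grid job identifier
        @abstractmethod
        def status(self, job_id) -> str: ...  # e.g. "running", "done", "failed"

    class LexorExecutor(Executor):
        # LCG backend: translate the ProdSys job into JDL and hand it to the
        # (gLite) Resource Broker; the Condor-G and NorduGrid executors wrap
        # their own submission systems behind the same two calls.
        def submit(self, job) -> str:
            ...  # build the JDL, call the RB/WMS submission service
        def status(self, job_id) -> str:
            ...  # query the RB logging & bookkeeping service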

  21. RB, CE, SE, FTS, LFC, VObox • We are using the LCG CE • The gLite and CREAM CEs have some interesting features • Plan to test them on the Pre-Prod TB in the Task Force • Different SEs are in use • For SC4 the INFN Tier2s will use DPM, as SRM is needed • ATLAS DDM uses FTS, and LFC both as central and as distributed catalogue • ATLAS VOboxes are only at the Tier1s and include only "less risk category" services (= class 1) • FTS plugins are being explored as a possibility for making the VObox "thinner"… still a way to go… L.Perini Workshop CCR @ Otranto

  22. VOMS, Accounting, HLR, Job Priority • ATLAS needs a system that • acknowledges the existence of VOMS groups and roles as defined by the VO; • uses the priorities as defined by sites and VO to distribute jobs; • uses the VOMS groups as a basis for data storage (a toy FQAN-to-share mapping is sketched below) • CPU and storage usage has to be accounted at the group and user level • These functions should not rely on a unique central DB • The accounting tool we plan for is the merged APEL+DGAS • A site HLR is needed • Test in the ATLAS TF asap, exploiting the setup already done in INFN GRID (HLR etc.). In production in October????? • For job priority and fair share the only promising tool I know of is GPbox • Preview TB testing is foreseen in the TF; production timing to be understood L.Perini Workshop CCR @ Otranto
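A toy example of the kind of mapping the first bullets call for: VOMS FQANs (the group/role strings carried in a user's proxy) mapped to fair-share weights. The FQAN syntax is real VOMS; the specific groups and weights are invented for illustration, and how GPbox or a site scheduler would actually consume such shares is not specified here:

    # Hypothetical fair-share table keyed on VOMS FQANs
    SHARES = {
        "/atlas/Role=production": 70,   # central production gets the bulk
        "/atlas/Role=software":    5,   # sw installation jobs
        "/atlas":                 25,   # ordinary analysis users
    }

    def share_for(fqans):
        # A proxy lists its FQANs most-specific first; take the first match.
        for fqan in fqans:
            if fqan in SHARES:
                return SHARES[fqan]
        return 0                        # unknown VO members get no share

    print(share_for(["/atlas/Role=production", "/atlas"]))   # -> 70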

  23. Monitoring and (local) management tools • The only ATLAS-specific monitoring tools are now for job monitoring using the ProdSys DB • Understand what needs to be developed in addition to GridIce and DGAS • Favour adopting solutions already in use in INFN, and common development if needed • Storage monitoring looks like a general need… • Participate in DGAS testing… • In any case it would be difficult to find ATLAS manpower for developing new solutions here… L.Perini Workshop CCR @ Otranto

  24. Conclusion • The months from now to the end of 2006 are critical for setting up the ATLAS data and analysis systems • And for having Italian users start exploiting them • A lot of work to be done • The federation will have an important role in helping organise the ATLAS-Italy effort in these areas • as well as in setting up the tools, services and structures needed for managing and running the Tier2s themselves • Our plan intends to use all our human and hw resources in the most efficient way for ATLAS-Italy as a whole L.Perini Workshop CCR @ Otranto
