The INFN Regional Center(s) as main node(s) of the INFN Grid
Mirco Mazzucato
On behalf of Expts Comp. Coord. and tech experts (~EB of INFN-Grid): Paolo Capiluppi, Domenico Galli, Alberto Masoni, Laura Perini, Fulvio Ricci, Antonia Ghiselli, Federico Ruggieri
M.Mazzucato – GR1 - Roma
Main conclusion of the “Hoffmann Review”
• Panel 1 recommends the multi-tier hierarchical model proposed by Monarc as one key element of the LHC computing model, with the majority of the resources not based at CERN: 1/3 in, 2/3 out
• About equal share between Tier0 at CERN, Tier1’s and lower level Tiers down to desktops
  • Tier0 / Σ(Tier1) / Σ(all Tier2 + …) = 1 / 1 / 1
• General consensus that GRID technologies developed by Datagrid can provide the way to efficiently realize this infrastructure
• All experiments should perform Data Challenges of increasing size and complexity until LHC start-up, involving also Tier2
  • EU Testbed: 30-50% of one LHC experiment by 2003 (matches well with INFN Grid assumptions: 10% of final size for each experiment)
• Limit heterogeneity: OS = Linux + backup solution, Persistency = 2 tools max
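The "1/3 in, 2/3 out" figure follows directly from the 1/1/1 ratio. A minimal sketch of that arithmetic is given here; the function name `tier_shares` and the dictionary labels are hypothetical names chosen for this example, and only the 1/1/1 ratio comes from the slide.

```python
from fractions import Fraction

def tier_shares(ratio=(1, 1, 1)):
    """Turn the Tier0 / sum(Tier1) / sum(Tier2 and below) ratio into fractions.

    The default 1/1/1 ratio is the one quoted on the slide; the labels
    are hypothetical names used only in this example.
    """
    labels = ("tier0_cern", "all_tier1", "tier2_and_below")
    total = sum(ratio)
    return {label: Fraction(part, total) for label, part in zip(labels, ratio)}

shares = tier_shares()
inside_cern = shares["tier0_cern"]
outside_cern = shares["all_tier1"] + shares["tier2_and_below"]
print(inside_cern, outside_cern)  # prints: 1/3 2/3
```

Exact fractions are used so that the result reads the same way as the slide does.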
HEP Regional Centre Hierarchy
[Figure: tier hierarchy diagram. Tier 0: CERN. Tier 1: UK, France, Italy, Fermilab. Tier 2: Tier2 centers. Tier 3: sites. Tier 4: desktops. Link speeds shown between levels: 2.5 Gbps, ~Gbps, 622 Mbps, 100 Mbps-1 Gbps. The lower levels are labelled INFN-GRID.]
Tier Computer Centers functionalities (LHC Computing Review)
The Monarc model (first approximation) definitions:
• Tier0 (CERN): raw data storage and first calibration + reconstruction. Very large storage capacity (MSS)
  • Tier0+Tier1 at CERN: 10 PB/yr tapes; 2 PB/yr disk; 2 M SI95
• Tier1: further calibration + reconstruction passes. Large fraction of simulation and analysis. Large storage capacity. Associated support. All services.
  • Typical Tier1: 3 PB/yr tapes; 0.5 PB/yr disk; 0.9 M SI95
• Tier2 and lower levels: simulation, analysis.
  • Typical Tier2 = 20-30% of Tier1
• Tier0 and Tier1 are basically open to all members of a Collaboration; a MoU will specify the conditions. Tier1 will then be needed for the lifetime of LHC
• Detailed quantitative estimates of resources and costs are not touched in this talk, but see F.Ruggieri talk for preliminary INFN estimates
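To get a feel for the size of a typical Tier2, the 20-30% figure can be applied to the typical Tier1 numbers. This is only a sketch: applying the same fraction uniformly to tape, disk and CPU is an assumption of this example, and the names `TYPICAL_TIER1` and `tier2_range` are hypothetical.

```python
# Figures for a "typical Tier1" as quoted on the slide.
TYPICAL_TIER1 = {
    "tape_pb_per_year": 3.0,
    "disk_pb_per_year": 0.5,
    "cpu_msi95": 0.9,
}

def tier2_range(tier1=TYPICAL_TIER1, low=0.20, high=0.30):
    """Return (low, high) estimates per resource for a typical Tier2.

    Assumption: the 20-30% fraction applies uniformly to tape, disk and CPU.
    """
    return {name: (value * low, value * high) for name, value in tier1.items()}

for name, (lo, hi) in tier2_range().items():
    print(f"{name}: {lo:.2f} to {hi:.2f}")
```

Under that assumption a typical Tier2 would sit at roughly 0.6-0.9 PB/yr of tape, 0.1-0.15 PB/yr of disk and 0.18-0.27 M SI95.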
R&D in Italy: THE INFN-GRID PROJECT
• The proposal was submitted to INFN management and CSNs at the end of July and presented at CSN in September
• The size of the project: 26 sites, ~200 people, ~70 FTE’s
• The requested funding for 3 years: ~10 M Euro
• Activities are organized at national level around WP’s developed by Datagrid (9.8 ME), testbeds and experiments applications. Aim to provide multi-tier integration.
• Large scale testbeds provided by LHC experiments (Regional Center prototypes) + Virgo
• Hope to receive recommendation of approval by CSNs <2000
• Preliminary reserved financing of basic HW infrastructure and R&D: ~2 M Euro for year 2001: OK compared to EU partners
Strategy of INFN-Grid
• Foster close collaboration between LHC experiments and computer professionals (~25 FTE’s in INFN Grid)
Software and Middleware
• Evaluation and deployment of existing GRID basic services (Globus, Condor, …), mainly looking for production quality
  • Robustness, scalability (hundreds of users, hundreds of jobs to run, huge data sets, …) and reliability are very important requirements
  • Use of GRID software for real applications and on real experiment production environments (CMS already started; Alice, ATLAS, LHCb coming soon)
  • Implement missing functionalities in close collaboration with US teams
  • Incremental deliveries
• Tight collaboration with EU partners in Datagrid to
  • Develop HEP middleware to implement the multi-Tier Monarc model for expts
  • Coordinate relations with US projects: Globus, GriPhyN, PPDG…
Strategy of INFN-Grid (Cont.)
Hardware infrastructure
• Deploy testbeds using resources of “all” INFN sites, connected to the DataGrid testbed as recommended by Panel 1 of the “Hoffmann” Review: “..involve most of the final distributed computing system”
• Develop Computing Fabric prototypes to understand issues related to:
  • Architecture
  • Hardware choices
  • Performance vs. cost of different disk technologies: SAN/RAID/SCSI/EIDE = 10/5/3/1 in cost
    • Can we do everything with EIDE disks? Which technology is more adequate for each data set?
  • Set up, maintenance and support of large systems
• Define network topology and evaluate network technology services
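The disk-technology question can be made concrete with a small cost comparison. The sketch below takes the 10/5/3/1 ratio from the slide and assumes it is a cost per unit of capacity; the example mix and the names `RELATIVE_COST` and `mix_cost` are hypothetical and serve only as illustration.

```python
# Relative cost from the slide: SAN/RAID/SCSI/EIDE = 10/5/3/1.
# Assumption: the ratio is a cost per unit of storage capacity.
RELATIVE_COST = {"SAN": 10, "RAID": 5, "SCSI": 3, "EIDE": 1}

def mix_cost(capacity_fractions):
    """Cost of a storage mix relative to putting everything on EIDE disks.

    capacity_fractions maps technology name -> fraction of total capacity;
    the fractions must sum to 1.
    """
    if abs(sum(capacity_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("capacity fractions must sum to 1")
    return sum(RELATIVE_COST[tech] * frac
               for tech, frac in capacity_fractions.items())

# Hypothetical mix, for illustration only: 10% SAN, 20% RAID, 70% EIDE.
example_mix = {"SAN": 0.1, "RAID": 0.2, "EIDE": 0.7}
print(round(mix_cost(example_mix), 2))
```

A result above 1 means the mix costs that many times more than an all-EIDE system of the same capacity; whether the extra cost buys the performance a given data set needs is the open question the prototypes are meant to answer.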
The target prototype Regional Centers for LHC experiments in INFN-Grid
• All 4 LHC experiments have chosen to concentrate the bulk of computing resources in one place during this prototyping phase (2001-2003)
  • ATLAS: Rome
  • CMS: Legnaro INFN National Laboratories
  • Alice: Torino
  • LHC-b (2001): Bologna
• But since INFN has positive, advanced experience of distributed computing (e.g. the INFN Condor pool) and since INFN manpower (physicists and computing experts) is spread out, several Tier1/2 functionalities are supported by a few different sites:
  • Atlas: Milano
  • CMS: Bologna, Bari, Padova, Pisa and Rome
  • Alice: Bologna, Bari, Catania
• No common discussion took place in INFN Grid about a possible solution for the final Tier1 centers. The decision was postponed to the end of the prototyping phase.
• Note: “Hoffmann” Panels conclusions, Comp. MoU end 2001; INFN management and CSN: anticipate decision on RC’s
“Hoffmann” Review outcome relevant for INFN RC’s
• For the first time an integrated world-wide computing system is planned:
  • Tier1s open to all members of the collaboration (up time and efficiency!!)
  • Very large weight given to Tier(>=2) Centers (contrary to the past!)
  • Share of resources between Tier1 and Tier(>2) centers is recognized to be an internal national affair
• CERN IT will provide and support, as for LEP, only the basic SW and HW infrastructure at CERN: Tier0 + Tier1 for each experiment. “Budget limited” personnel is foreseen to support experiments’ issues concerning the usage of this infrastructure.
• As for LEP, apparently there will be no IT support for experiment-specific activities: software distributions, productions, simulations, data replication and distribution, analysis etc. These are considered internal experiment issues
The issue: One vs more Tier1 in INFN
Some preliminary considerations…
• Technology (almost all commodity and scalable)
  • Computing and storage fabric are built up from commodity components
    • Simple PC’s
    • Inexpensive network attached disks
    • Standard network interfaces and probably standard LAN backbone (whatever Fast/Giga Ethernet will be in 2006)
  • High bandwidth WAN connections may not become a commodity, but the EU Community and Countries will probably support a high bandwidth Research Network for strategic reasons (see EU Geant project, Garr-B…)
  • Mass Storage will probably not be a commodity.. but future Optical Storage
• Very easy to split and re-group according to needs
• WAN in 2006 at Gbits like present PC bus
• Different from 1988 when the scene was dominated by mainframes
• Assume MSS is the only large, non-scalable single piece remaining
  • Do we need it in INFN? Need technical evaluation!
One vs more Tier1 in INFN (Cont.)
Can we learn from other experiences?
• A lot from experienced HEP computing centers (CERN.., Babar, CDF…)
  • Technology, manpower…
• But need to adapt the computing model to INFN conditions
  • INFN availability of computing professionals and know-how is not negligible at all (~25 FTE’s in Grid) but spread out over many different sites (26 in INFN Grid)
  • INFN Computing Services are used to supporting computing facilities and network connections (a lot of people and expertise… but again distributed)
• INFN-Grid provides a framework for a large common cooperative effort in computing. A common project allows us to profit best from all distributed expertise.
  • Very positive up to now. Sharing of work between sites seems as effective as for detector construction.
  • In the past, very successful INFN-wide collaboration took place with INFNet, Garr2 and Garr-B planning, and the INFN Condor pool.
  • Very important role again played by CNAF in providing central coordination and support to Datagrid middleware development and EU+INFN testbed deployment.
One vs more Tier1 in INFN (Cont.)
• INFN is unique in Europe (more similar to US)
• In France expertise and manpower are mostly concentrated in Lyon
  • Built up in the 80’s around a non-scalable mainframe
  • .. and very limited WAN bandwidth
  • Now trying to establish more sharing (Marseille, Grenoble..)
• In the UK computing professional manpower today is very limited.
• In Germany there is no central organization like INFN (computing is mainly done in University Computing Centers… like CINECA or CILEA) and DESY is not involved
• In the Netherlands computing skill is mostly concentrated in SARA (multi-discipline CC)..
OUR CONCLUSION ON RC’s
• The choice should be made with the sole objective of providing the most efficient and competitive way to perform data analysis for INFN groups, taking into account the INFN reality and making the best use of all available resources.
  • Look at other experiences, but no blind duplication!!
• The choice should take into account as much as possible
  • INFN structure
  • Up time and efficiency
  • Experiment computing model
  • Technology evolution and costs
• The solution should provide a complete, detailed implementation plan taking into account the role and commitments of the different sites, manpower available and foreseen, expertise etc.
OPTIONS TO BE EVALUATED
In the first meeting it was decided to limit the evaluation to 3 options
• 1 TIER-1 MULTI-EXPERIMENT (Alice, Atlas, Cms, Lhc-B, Virgo)
  • Located in the Bologna area in a new (built or rented) computing facility
  • Set up by CNAF
  • Preliminary document prepared by Federico Ruggieri (yet to be discussed)
• 2 TIER-1 MULTI-EXPERIMENT
  • Located in the National Labs of Frascati and Legnaro
  • Documents in preparation
• 3 (or more) TIER-1 according to experiment wishes
  • Compared to hypothesis 2, add (at least) 1 Tier1 in Torino
General consensus that Tier1’s have to be considered as special Grid node(s) which will provide to the other experiment nodes (Tier2..Tier4) the functionalities that are missing or not convenient to duplicate (e.g. mass storage, system support manpower etc.). What is important is the integrated throughput of the overall system and not that of any single component.
Preliminary set of questions
For each of the 3 alternatives expts were asked to provide a document answering the following preliminary set of questions
• Global estimate of resources needed in Italy / Total expt.
• Provide evaluation and detailed computing model containing
  • Role, resources, manpower etc. of each candidate Tier(n) site
  • Relations between Tier-1 and Tier-n (functionalities and localization)
  • Need of mass storage and localization
  • Space and services available and to be acquired
  • Available manpower by expts and Computing Services to support Tier
  • Manpower to support experiment applications (Sim, Rec, An.)
  • Implications on network connectivity of Tier location
  • Manpower for expts user support
  • Cost estimation
Candidate Tier 1 sites were asked to produce a document containing an estimate of space, available and total manpower, costs etc.
Present Status
• Too little time for expts and candidate centers to produce documents discussed and agreed within the collaborations
• A lot of activities going on:
  • Hoffmann Review
  • Datagrid
  • INFN Grid
  • Regional Center prototypes
• Need to define timescale to have study and proposal(s) completed (CMS: June 1st ?)
• Very preliminary answers received: raw material coming independently from each expt. Not at all discussed yet!
  • Shown to give the flavor of starting views.
MODELS CONSIDERED
• The ALICE collaboration discussed (21-22/02/00) the possible alternatives for the Regional Centres. The conclusions of the discussion were:
  • Having a Tier-1 centre dedicated to the experiment, with resources distributed among several Tier-2’s, appeared to be the solution that best meets the needs of ALICE
  • The optimal distribution of resources among the various Tiers will be decided on the basis of the prototyping experience
• A new discussion of the three models is currently under way
ITALIA
LOCATION OF THE CENTRES
• Hypotheses 1/2 (Tier-1 not dedicated to the experiment)
  • Tier-1
    • Hypothesis 1: CNAF
    • Hypothesis 2: LNF, LNL
  • These two hypotheses are currently at the stage of preliminary discussion within the collaboration
  • Certainly (cf. previous slides) in these two cases the organization of experiment-specific support becomes relevant; part of it could be moved to the Tier-2’s
• Hypothesis 3 (experiment Tier-1)
  • Tier-1: Torino
  • Tier-2: Bari, Bologna, Catania
  • Tier-3: Cagliari, Catania, Padova, Salerno, Trieste
• N.B. The final choices of sites and of the relative balance of resources will be made on the basis of the results of the prototyping experience
ITALIA
ATLAS and the INFN RC’s
ATLAS views about INFN Tier1’s
• No ATLAS INFN site has enough manpower available to support a Tier1, as far as system management and operation are concerned.
• According to a 1998 estimate, the total manpower of this kind that could be drawn at LHC start from all the Italian ATLAS sites amounts to a few FTE’s. In no site will as much as 2 full FTE’s be available.
• A new survey will be carried out in the next months, taking into account possible reassignment of resources, e.g. from LEP experiments. However, no huge variation is expected.
• ATLAS thus needs outsourcing of this kind of manpower. We would also accept a solution where INFN provides for the housing, system management and operation of the ATLAS computing h/w in one or more INFN center(s).
ATLAS and the INFN RC’s
The manpower for s/w maintenance and user support
• ATLAS thinks that the personnel who take care of the experiment s/w (installation, maintenance, user support etc.) and some core tools (DB, GRID etc.) have to work in close contact with the physicists taking care of analysis, simulation, reconstruction, etc.
• At least the leading part of this personnel needs to have gained real experience of the experiment s/w, normally having also taken part in the development of some parts.
• Personnel with this profile can only be located and trained in the sites where there are groups of physicists actively working in ATLAS s/w and computing.
• These requirements apply mostly to the initial phase; after the first years of running there may be less need for close contact with physicists.
ATLAS and the INFN RC’s
The ATLAS Tier2’s and Tier3’s
• “Desktops” alone will probably account for 10-15% of the CPU resources of an experiment. They will most probably be clustered around servers providing common s/w to all the users of a site (Tier3). In this context, a Tier2 differs from a Tier3 mainly because it serves users of different sites. Clearly the mode of operation will evolve with time, starting from one or a few prototypes used by all the groups.
• The ATLAS sites that have already expressed interest in participating in the GRID experimentation with the aim of assuming a TierN (N>1) role are:
  • Rome1, Milan, Naples, Pavia, Rome2 starting in 2001
  • Genova, Pisa starting in 2002
• The s/w experts could cluster around ATLAS Tier2 sites, which would provide h/w and system resources for testing and implementing the new developments (s/w, GRID, DB etc.)
ATLAS and the INFN RC’s
ATLAS Tier2’s functionality
• As for the functions and the h/w resources:
  • in the next months, we will finalize the layout for the next 3 years (the GRID project span);
  • for the LHC time, it will be decided in ~1 year from now (in time for the computing MoU).
CMS current understandings (1)
• CMS Italy prefers a Tier1 located where there is direct interest in the experiment
  • If that is not the case, management of the Centre becomes “critical”, and in addition personnel directly involved in CMS software and computing activities would have to be permanently present at the Centre
• CMS Italy believes that the Tier1 must be compared with:
  • The other Tier1’s of the Collaboration
  • The size of the Tier2’s (located at the INFN Sections)
  • The functionalities that modify the original MONARC model, adapting/integrating it with the GRID tools
• CMS believes that GRID-like tools can considerably enrich the potential of the computing, as well as of the personnel involved
  • And indeed CMS is strongly committed in this field
CMS current understandings (2)
• The new flexibility introduced into the MONARC hierarchical model requires that the Tiers “below” Tier1 also be taken into account. Therefore one cannot design and plan only the Tier1’s (and the Tier0 at CERN). The 1/3 2/3 rule must be applied correctly, also taking into account the institutions that will not have a Tier1 but must be able to do data analysis.
• Performing data analysis efficiently and competitively is the first and last purpose of CMS Computing (and obviously also of the Italian part of CMS)
• The model mentioned above (MONARC + GRID) is more flexible and efficient in the use of computing and personnel resources, and the Tier2’s, whose value CMS is promoting at every level of functionality, take on an “essential” role
• The Tier2’s allow researchers to interact more directly in the analysis and processing of the data
• The “distributed hierarchy” thus created also makes good use of the resources of the Tier3’s and even the Tier4’s (desktops), allowing the effective collaboration and full participation of all the physicists involved, including undergraduate thesis students!
CMS current understandings (3)
• CMS Computing does not intend to “conflict” with the responsibilities and personnel commitments on the detectors, but believes that the computing activity and its implementation are an integral part of CMS Italy
• CMS therefore intends to evaluate realistic implementations of Tier1 and Tier2 models (and is willing to collaborate in the evaluation) on a short timescale (a few months), with evaluations based on:
  • Personnel available at the various sites and/or to be acquired (including temporary staff)
  • Matching of resources and of service availability to the activities already under way today (simulations and physics studies)
  • Timescales
  • Infrastructure needed and available
  • Balancing of functions and resources between the Tier1 and the Tier2’s
  • Technical and investment feasibility
  • Solutions for managing the centres (including outsourcing if necessary)
  • Management of conflicts and of the dynamic allocation of resources in Tier1’s shared by several experiments
CMS current understandings (4)
• Within a year, the tasks, functionalities, sizes and locations of the prototype Centres (Tier1 and Tier2) for CMS Italy must be defined (also, but not only, in order to arrive at a Computing MoU)
• Today CMS proposes some temporary solutions that cover the needs of the “simulations” for the coming year (Legnaro, Bari, Bologna, Padova, Pisa, Roma1, Torino)
• CMS intends to use these solutions as “seeds” to evaluate the progress and the location of the prototypes
• The choice of sites, which is dynamic and to be discussed progressively with the referees appointed by INFN (as well as within the Collaboration), has already been made by CMS, setting out a development programme that spans all the Italian sites in a differentiated way
• CMS believes that the choice of a single Tier1 and/or one shared with other experiments must be made by INFN on the basis of a scientific comparison of different projects, on a short timescale.
LHCb Computing Model
• LHCb data storage
  • RAW and ESD data stored only in production centres (real data at CERN, MC data at the Tier-1 centre which produced them).
  • No systematic RAW/ESD distribution/replication (very small samples are sent on demand: 10% the first 2 years, 2% later).
• No group analyses identified
  • First stage in the analysis performed in common for all the analyses that subsequently follow.
  • AOD and TAG produced in production centres (CERN for real data, Tier-1 centres for MC).
  • AOD and TAG data systematically distributed to all Tier-1 centres via network as soon as they are produced (or regenerated) and stored there on disks.
  • No Tier-2 centres.
• 80 TB/a real data AOD exported from CERN.
• 120 TB/a MC AOD exported from every Tier-1.
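The export volumes above can be turned into an average network rate. The following is a rough sketch under stated assumptions: 1 TB is taken as 10^12 bytes, the transfer is spread evenly over a full year, and no protocol overhead or peak load is included. The function name `average_rate_mbit_s` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def average_rate_mbit_s(tb_per_year):
    """Average rate (Mbit/s) needed to move tb_per_year terabytes in one year.

    Assumptions: 1 TB = 10**12 bytes, and the transfer is spread evenly
    over the whole year (no peaks, no protocol overhead).
    """
    bits = tb_per_year * 10**12 * 8
    return bits / SECONDS_PER_YEAR / 10**6

cern_export = average_rate_mbit_s(80)    # real-data AOD exported from CERN
tier1_export = average_rate_mbit_s(120)  # MC AOD exported from each Tier-1
print(f"{cern_export:.1f} Mbit/s, {tier1_export:.1f} Mbit/s")
```

These are per-destination averages of roughly 20 and 30 Mbit/s; since the AOD are distributed to all Tier-1 centres, the outgoing rate at the source scales with the number of destinations, and real links would need headroom for bursts.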
LHCb Regional Centres
• LHCb-Italy plans for:
  • 1 “concentrated” Tier-1 computing centre;
  • 9 Tier-3 computing centres, located in Bologna, Cagliari, Ferrara, Firenze, Frascati, Genova, Milano, Roma1, Roma2.
• The LHCb Tier-1 can be housed indifferently:
  • in an LHCb-dedicated computing centre (like the 2001 LHCb setup in Bologna), provided that technical manpower will be hired and trained by INFN;
  • in a multi-experiment INFN national computing centre, provided that LHCb will have its own resources satisfying the experiment requirements.
• The LHCb Tier-1 is thought to operate efficiently with resources concentrated in only one site.
• The LHCb Tier-1 needs a tape library.
VIRGO needs for data analysis
Time constraints
- The Central Interferometer of VIRGO will produce data in 2001
- The full VIRGO interferometer will produce data in 2003
The VIRGO computing model
- 2 sites for the raw data storage: Tier 0 in Italy and Tier 1 in France
- Computing for VIRGO in Italy: 1 Tier 0, 2 Tier 2, 2 Tier 3
Tier functions
- Tier 0 (raw data storage): Cascina (Virgo site)
- Tier 2/Tier 3 (data base and computing for pulsar search): Roma/Firenze
- Tier 2/Tier 3 (coalescing binary system search): Napoli/Perugia
VIRGO needs for data analysis
• Summary of the VIRGO needs for computing and network connections

  Quantity               units        end 2001             end 2003
  CPU capacity           SI95         8,000 (350 Gflops)   8×10^4 (3.5 Tflops)
  estd. number of CPUs                400                  1000
  disk capacity          TBytes       10                   100
  disk I/O rate          GBytes/sec   5                    5
  sustained data rate    MBytes/sec   250                  250
  WAN links to Cascina   Mbits/sec    155                  2,500
  WAN links to labs      Mbits/sec    34                   622
  WAN links to France    Mbits/sec    34                   622
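Two derived quantities help when reading the CPU rows of the table: the flops-per-SI95 conversion implied by each column, and the average power per CPU. The sketch below only re-uses numbers from the table, taking the end-2003 CPU capacity as 8×10^4 SI95; the names `VIRGO` and `derived` are hypothetical.

```python
# Values copied from the VIRGO summary table (end 2001 and end 2003).
VIRGO = {
    2001: {"si95": 8_000, "flops": 350e9, "cpus": 400},
    2003: {"si95": 80_000, "flops": 3.5e12, "cpus": 1000},
}

def derived(year):
    """Return the implied Mflops per SI95 and the average SI95 per CPU."""
    row = VIRGO[year]
    return {
        "mflops_per_si95": row["flops"] / row["si95"] / 1e6,
        "si95_per_cpu": row["si95"] / row["cpus"],
    }

for year in sorted(VIRGO):
    d = derived(year)
    print(year, f"{d['mflops_per_si95']:.2f} Mflops/SI95,",
          f"{d['si95_per_cpu']:.0f} SI95 per CPU")
```

Both columns imply the same conversion factor (43.75 Mflops per SI95), while the average per-CPU power rises from 20 to 80 SI95, so the 2003 target relies on faster processors as well as more of them.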
VIRGO needs for data analysis
Manpower for the VIRGO test bed and the full deployment of the system
TIER 0: In Cascina there are already two system engineers. An operator staff of a few units (2/3) is needed to run the system. The European Gravitational Observatory (EGO) consortium can provide it.
TIER 2: INFN Section support is under discussion: if the computing needs of various experiments are merged on site, the manpower required will be reduced.
A realistic alternative for TIER 2 management is outsourcing: consortia (for example CASPUR for Roma La Sapienza) can provide the service for TIER 2.
VIRGO needs for data analysis
Network improvement is compulsory in any scenario. The IN2P3 computer center in Lyon has already explicitly required it. The Cascina site must be connected to the GARR-B backbone at higher speed immediately.
Once the Central Interferometer runs (2001), VIRGO has to start distributing the data via network to the VIRGO groups.
The analysis of data produced by the network of interferometric gravitational wave antennas also requires network improvements.
CONCLUSIONS
• Work to evaluate the Tier1 RC’s location and relation with other Tier(n) in INFN has started
• Need time to understand, discuss and agree on possible choices
• It cannot be an abstract solution but should be accompanied by a detailed implementation plan defining the role and responsibilities of each site
• Proposed date to finish the work is around June 2001