GRIF Status (http://grif.fr) Michel Jouvin, LAL / IN2P3, jouvin@lal.in2p3.fr
Objectives • Build a Tier2 facility for simulation and analysis in the Paris region • 80% LHC (4 experiments), 20% EGEE and local • LHC: analysis (2/3) and MC simulation (1/3) • Be ready at LHC startup (2nd half of 2007) • Resource goals (end of 2007) • CPU: 1500 kSI2K (1 kSI2K ~ P4 Xeon 2.8 GHz) • Storage: 350 TB of disk (disk only, no MSS) • Network: 10 Gb/s backbone inside the Tier2, 1 Gb/s external link GRIF Tier2 - HEPix - SLAC 2005
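As a rough illustration, the CPU goal above can be turned into a worker-node count. Only the 1500 kSI2K goal and the 1 kSI2K-per-CPU rating come from the slide; the dual-CPU worker-node assumption is ours.

```python
# Rough sizing sketch for the end-of-2007 CPU goal.
# The per-CPU rating (1 kSI2K ~ one P4 Xeon 2.8 GHz) is from the slide;
# the dual-CPU worker-node assumption is illustrative only.

CPU_GOAL_KSI2K = 1500       # total CPU goal from the slide
KSI2K_PER_CPU = 1.0         # one P4 Xeon 2.8 GHz ~ 1 kSI2K
CPUS_PER_WORKER_NODE = 2    # assumed dual-CPU nodes (not from the slide)

cpus_needed = CPU_GOAL_KSI2K / KSI2K_PER_CPU
nodes_needed = cpus_needed / CPUS_PER_WORKER_NODE

print(f"{cpus_needed:.0f} CPUs ~ {nodes_needed:.0f} dual-CPU worker nodes")
```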
Members • Project started by DAPNIA (CEA), LAL (IN2P3, Orsay) and LPNHE (IN2P3, Paris) in Fall 2004 • DAPNIA and LAL involved in the Grid effort since the beginning of EDG • 3 EGEE contracts (2 for operation support) • No lab big enough to run a T2 by itself • LLR (IN2P3, Palaiseau) and IPNO (IN2P3, Orsay) joined the project in Sept. 2005 • IPNO: nuclear physics (ALICE + AGATA) • LLR: CMS
Organization • 1 EGEE/LCG site, distributed over all labs • Computing and storage resources in each lab • Computing rooms and financing • IPNO will concentrate on non-LHC resource funding • 1 Gb/s link for IPNO, LAL, LPNHE; “soon” for DAPNIA • Technical Committee: people from every lab • 5 FTE in 2005, 6-7 in 2006, more in 2007 • Currently 15-20 people involved (several part time) • M. Jouvin (chairman), P. Micout, P.F. Honoré… • Scientific Committee (fund raising) • J.P. Meyer (DAPNIA/Atlas, chairman), 1 person / lab
Finances • Total budget estimated at 1.6 M€ (2005-2007) • 30% from the Region council • 30% from the National Research Agency (ANR) • 40% from the labs (CEA, CNRS, Paris 6 university) • No significant support from IN2P3 / LCG France (focused on T1) • Half of the budget still uncertain… first answers expected soon • Progressive investment: no HW replacement before 2009 • 2005: 150 K€, 2006: 450 K€, 2007: 1 M€ • If necessary, could use 2008 to spread the effort • 2009+: 300 K€/year expected from IN2P3 / LCG France
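A quick sanity check of the figures above: the yearly spending plan adds up to the total budget, and the three funding shares cover 100% of it. All numbers are taken from the slide.

```python
# Sanity check of the GRIF budget figures (all values from the slide,
# in K-euros): yearly plan vs. total, and funding shares vs. 100%.

yearly_plan = {"2005": 150, "2006": 450, "2007": 1000}
total_budget = 1600  # 1.6 M-euros over 2005-2007

shares = {"Region council": 0.30, "ANR": 0.30, "labs": 0.40}

assert sum(yearly_plan.values()) == total_budget
assert abs(sum(shares.values()) - 1.0) < 1e-9

for source, share in shares.items():
    print(f"{source}: {share * total_budget:.0f} K-euros")
```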
Current Status • EGEE/LCG GRIF site created • IN2P3-LAL decommissioned, resources moved to GRIF • 2 sites with resources, 2 sites ordering • DAPNIA: 20 WN CPUs, 12 TB, installation in progress • LAL: 26 WN CPUs, 8 TB (SRM/DPM), LCG services • 4.5 TB on order • LPNHE: 15 WN CPUs, 5 TB to be ordered soon • IPNO: 20 WN CPUs (dual-core blades) • End of 2005: 80 WN CPUs, 25 TB • Separate CE/SE on each site
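Summing the per-site worker-node CPU counts above (slide figures) is consistent with the quoted end-of-2005 total:

```python
# Per-site worker-node CPU counts from the slide; the sum comes to 81,
# consistent with the "80 WN CPUs" end-of-2005 figure after rounding.

wn_cpus = {"DAPNIA": 20, "LAL": 26, "LPNHE": 15, "IPNO": 20}
total = sum(wn_cpus.values())
print(total)  # 81, quoted as ~80 on the slide
```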
2005 Main Activities… • Setup of resources on each site • Global configuration consistency: Quattor chosen • Flexible site customization inside a unique database • Setup of a multi-site technical team • Tutorials for new site administrators • Sharing the management load (e.g. middleware upgrades) • Documentation written to share information and expertise (Trac)
… 2005 Main Activities • Evaluate DPM as a storage solution • Successful so far, easy to set up and manage • Quattor component written to manage the DPM configuration • Plan to evaluate a multi-site configuration • Disk servers on several sites • Current lack of srmcp is a problem with CMS/PhEDEx • Participation in LCG SC3 • Throughput phase: 35 MB/s sustained for 4 days • Plan to join the service phase mid-November
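For scale, the sustained SC3 throughput translates into a total transferred volume. The 35 MB/s rate and 4-day duration are from the slide; decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Total data volume implied by the SC3 throughput phase:
# 35 MB/s sustained for 4 days (decimal units assumed).

rate_mb_per_s = 35
duration_s = 4 * 24 * 3600       # 4 days in seconds

total_mb = rate_mb_per_s * duration_s
total_tb = total_mb / 1_000_000  # MB -> TB (decimal)

print(f"{total_tb:.1f} TB transferred")  # about 12 TB over the 4 days
```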
2006: Mini Tier2 • Main goal: set up 20+% of the final configuration • 300 WN CPUs, 70 TB • Exact size will depend on fundraising success… • Focus • Multi-site or mono-site CE/SE resources • Final choice of batch scheduler: evaluation of LSF and SGE • Final choice of SE architecture (DPM only, or DPM + Lustre) • Setup of monitoring tools: Nagios? Lemon? others? • Integration with local operations on each site • Miscellaneous • Continue active participation in the Service Challenges • Evaluate the feasibility and effectiveness of a 10 Gb/s link • Computer room requirements (electrical power, air cooling…)
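The "mini Tier2" targets line up with the stated 20+% of the end-of-2007 goals, assuming roughly 1 kSI2K per worker-node CPU as on the Objectives slide:

```python
# Check that the 2006 mini-Tier2 targets are ~20% of the 2007 goals.
# Assumes ~1 kSI2K per WN CPU, as stated on the Objectives slide.

cpu_fraction = 300 / 1500   # 300 WN CPUs vs 1500 kSI2K goal
disk_fraction = 70 / 350    # 70 TB vs 350 TB goal

print(cpu_fraction, disk_fraction)  # 0.2 0.2
```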
Storage Challenge • Efficient use and management of a large amount of storage seen as the main challenge • Access to data from 1000+ CPUs, no staging • Decided to start a partnership with HP on Lustre in the Grid (LCG) context • Performance with a large number of clients • Geographically distributed Lustre configuration • Replication of critical data (metadata) among sites • SRM and/or xrootd integration • Funds requested from ANR, answer expected soon… • Uncertainty due to HP troubles in France…
Batch Scheduler • 1 unified T2 means 1 batch scheduler • Required for a coherent view/publishing of resources • Main requirements • Efficient use of distributed resources • Handle 1000+ running jobs, 10K jobs in queues • Torque may not be appropriate • Scalability and robustness concerns, lack of dynamic reconfiguration • Looking at LSF • LAL has experience from its internal use (and contacts…) • MultiCluster may offer the flexibility of a globally unified resource while maintaining some job/resource affinity at each site • Evaluation to start soon: 1 cluster + CE per site + cross submission • Other candidates: SGE, Condor?
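The cross-submission idea behind the MultiCluster evaluation can be sketched as a toy routing rule: prefer the site holding the job's data (affinity), but fall back to another site with free slots. This is an illustration only, not LSF MultiCluster itself; the site names and slot counts are invented for the example.

```python
# Toy sketch of cross-submission with site affinity: prefer the site
# holding the job's data, otherwise route to the site with the most
# free slots. Site names and slot counts are illustrative, not the
# real GRIF configuration.

free_slots = {"LAL": 0, "DAPNIA": 12, "LPNHE": 5}

def route_job(preferred_site, slots):
    """Pick the preferred site if it has a free slot, else the site
    with the most free slots; return None if everything is full."""
    if slots.get(preferred_site, 0) > 0:
        chosen = preferred_site
    else:
        chosen = max(slots, key=slots.get)
        if slots[chosen] == 0:
            return None
    slots[chosen] -= 1
    return chosen

print(route_job("LAL", free_slots))  # LAL is full -> routed elsewhere
```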