
1. LCG & EGEE Status & Overview, GridPP9, February 4th 2004, Tony.Cass@CERN.ch

2. Agenda
• LCG
  • LHCC Review
  • Area Status: Applications, Fabric, Grid Deployment, Grid Technology
• LCG & GridPP2
• EGEE @ CERN

3. LCG – LHCC Review
• LHCC comprehensive review of the project, 24th/25th November.
  • See http://agenda.cern.ch/fullAgenda.php?ida=a035729
• Preceded by:
  • Applications Area review, as part of overall experiment software planning, 4th September.
    • See http://agenda.cern.ch/fullAgenda.php?ida=a032308
  • LCG internal review of the Grid & Fabric areas, 17th-19th November.
    • See http://agenda.cern.ch/fullAgenda.php?ida=a035728

4. LCG — Review Conclusions
• The LHCC noted significant progress: "It is realistic to expect the LCG project to have prototyped, built and commissioned the initial LHC computing environment".
• The LHCC noted the good progress in the Applications Area.
• No problems for Fabrics!
• The usual worries were raised about Grid deployment, middleware development and middleware directions (ARDA), but the review committee considered that the project is steering an appropriate course.
• GridPP-funded manpower is a substantial factor behind the progress noted by the LHCC reviewers!

5. LCG — Applications Area
• From the Applications Area report for Q4 2003:
  • "The applications area in this quarter continued to move through a period in which rapid-paced development and feedback-driven debugging are giving way to consolidation of the software in a number of areas and increased direct support for experiment integration."
• Internal Applications Area review in October, prior to the LHCC review.
  • Review report reflected in AA plans for 2004.
  • In particular, the recommendation for a closer relationship with the ROOT team is being followed up in the area of dictionary & maths libraries.
  • The SEAL and ROOT teams are developing proposed workplans for consideration in Q1 this year.

6. LCG — Applications Area
• POOL
  • Passed a major integration and deployment milestone with production use by CMS: millions of events written per week.
  • No longer on the CMS critical path to Data Challenge readiness, a major success for the POOL team and CMS.
• Simulation project
  • Completed important milestones (initial cycle of EM physics validation), drew close to completing others (revisiting of physics requirements, hadronic physics validation), and made an important clarification of the objectives and program of the generic simulation framework subproject.
  • Maybe not directly Grid related, but the LHCC review "underlined the importance of supporting the Monte Carlo generator codes for the experiments."
• Other items
  • SEAL and POOL now available for Windows; initial PI program essentially complete; ARDA RTAG report.

7. LCG — Fabric Area
• ELFms, the Extremely Large Fabric management system, is expanding its control of the CERN fabric.
  • quattor management of CPU servers is being extended to disk & tape servers (including CASTOR configuration). Disk configuration is stored in CDB using HMS.
  • Lemon: the EDG/WP4 repository has been in production since September.
  • LEAF: the new State Management System is being used to move systems into/out of production and to drive, e.g., kernel upgrades.
  • All computer centre machines are registered in quattor/CDB (a minimal sketch of the desired-state model follows below).
  • Use of ELFms tools, particularly quattor, for management of experiment farms is under discussion (and test).
• CERN Computer Centre upgrade continues.
  • Substation civil engineering almost complete; electrical equipment starts to arrive in March.
  • RHS of the machine room upgraded: major equipment migration to free the LHS is in preparation!
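The quattor approach is declarative: a central configuration database (CDB) holds the desired state of every node, and each node converges towards its profile. The Python sketch below is purely illustrative of that model; the node names, fields and package names are hypothetical, not the real CDB schema or Pan template language.

```python
# Illustrative sketch only: the quattor/CDB model in miniature.
# A central database holds each node's desired state; a node agent
# compares reality against the profile and derives corrective actions.

CDB = {
    "lxb0042": {                      # hypothetical disk server
        "kernel": "2.4.21",
        "packages": ["castor-stager", "lemon-agent"],
    },
}

def desired_profile(node):
    """Fetch the declared configuration for a node from the central DB."""
    return CDB[node]

def converge(node, actual):
    """Return the actions needed to bring a node to its desired state."""
    want = desired_profile(node)
    actions = []
    if actual.get("kernel") != want["kernel"]:
        actions.append("upgrade kernel to " + want["kernel"])
    for pkg in want["packages"]:
        if pkg not in actual.get("packages", []):
            actions.append("install " + pkg)
    return actions

print(converge("lxb0042", {"kernel": "2.4.20", "packages": ["lemon-agent"]}))
# -> ['upgrade kernel to 2.4.21', 'install castor-stager']
```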

8. LCG — Fabric Area
• Phase II purchasing process is starting now.
  • See the talk at http://agenda.cern.ch/fullAgenda.php?ida=a036781.
  • Long lead time before the 2006 purchases, given CERN rules. Install early in 2006. Volumes are large, especially for disk servers.
• Plan to qualify suppliers of "generic PCs".
  • "Intel-like architecture" is about the only requirement.
  • Selection criteria for companies are the major consideration at present. Plan careful evaluation of potential bidders in the autumn.
  • Expect CPU servers to be commodity equipment, as at present.
• The disk server area is a major concern.
  • Significant problems with EIDE servers in 2003; reasons not fully understood (yet!). Procedures and control much improved since November (end of 2003 data taking).
  • Still, these servers are significantly cheaper than the alternatives. We need to be able to deal with hardware failures in this area.
• CMS and ATLAS are watching our plans closely.
  • Common suppliers for Tier0/1 and online farms?

9. LCG — Grid Deployment
• The LCG1 service now covers 28 sites. Major productions ran for ATLAS and CMS during the Christmas holidays.
  • CMS: 600K events; sites mainly in Italy & Spain.
  • ATLAS: 14K events (although only 75 jobs).
  • US/ATLAS sent requests for job execution to LCG-1 from the US Grid3 infrastructure. After some work, events were successfully generated using the LCG-1 sites CERN, Turin and Brookhaven, with the output data staged at the University of Chicago and registered in the Globus RLS. (A sketch of such a submission follows below.)
• The LCG2 service starts on a smaller number of sites.
  • Avoids configuration and stability issues.
  • Requires commitment of sufficient support effort and compute resources.
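For flavour, here is a minimal, purely illustrative sketch of submitting a job to an LCG resource broker from a configured User Interface machine. The JDL attributes are standard EDG JDL, but the wrapper script name and arguments are hypothetical, and a valid grid proxy is assumed; this is not the actual ATLAS or CMS production machinery.

```python
# Illustrative only: submit a generic job via the EDG WP1 resource broker.
# Assumes an LCG UI with a valid proxy (grid-proxy-init already run).
import subprocess
import textwrap

# A minimal job description in EDG JDL. "generate_events.sh" is a
# hypothetical wrapper script, not a real production executable.
jdl = textwrap.dedent("""\
    Executable    = "generate_events.sh";
    Arguments     = "--nevents 200";
    StdOutput     = "std.out";
    StdError      = "std.err";
    InputSandbox  = {"generate_events.sh"};
    OutputSandbox = {"std.out", "std.err"};
""")

with open("myjob.jdl", "w") as f:
    f.write(jdl)

# The broker matches the job to a suitable site; the job ID is appended
# to jobids.txt for later edg-job-status / edg-job-get-output calls.
subprocess.run(["edg-job-submit", "-o", "jobids.txt", "myjob.jdl"], check=True)
```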

10. LCG2 Core sites and commitments
• Initial LCG-2 core sites [site list shown on the slide]
• Other firm commitments [site list shown on the slide]
• Will bring in the other 20 LCG-1 sites as quickly as possible.

11. LCG2 functionality
• General
  • CondorG: new grid manager (critical, now in the official VDT); gahp-server (critical, local, with the Condor team now); scheduler and memory usage issues (with the Condor team).
  • Globus: the RM wouldn't work behind the firewall (see the sketch after this slide); fixes to prevent occasional hangs of the CE; a number of errors in the handling of return status from various functions.
    • Refrained from putting all fixes into 2.2.x, knowing that they would be included in 2.4.3.
  • RB: the new WP1 release fixed a number of LCG-1 problems (reported by LCG); beyond this we fixed (with the WP1 team) memory leaks in Interlockd, the network server, and the filelist problem.
  • CE: memory leaks.
• Installation
  • WN installation is now independent of LCFGng (or other tools).
    • LCFGng is still required for service nodes.
  • Still require outbound IP connectivity from WNs.
    • Work to be done in the Replica Manager to address this.
    • Add a statement to the security policy to recognise the need, but limit it: applications must not rely on this.
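On the firewall point: the standard Globus remedy is to confine ephemeral data and callback ports to a fixed range via the GLOBUS_TCP_PORT_RANGE environment variable, which the site firewall then opens. A minimal, purely illustrative sketch, with an example range and hypothetical hosts and paths:

```python
# Illustrative only: pin Globus data/callback ports to a known range so
# a site firewall can permit them. The range and the transfer below are
# examples, not LCG-2 defaults; a valid grid proxy is assumed.
import os
import subprocess

os.environ["GLOBUS_TCP_PORT_RANGE"] = "20000,25000"

# Fetch a file from a (hypothetical) storage element with globus-url-copy.
subprocess.run(
    ["globus-url-copy",
     "gsiftp://se.example.org/data/run42.root",
     "file:///tmp/run42.root"],
    check=True,
)
```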

12. LCG2 Status
• Generally OK, but delayed by problems in the SE area.
  • The intention was to use SRM interfaces for CASTOR and dCache, but there are still problems…
  • Agreed now to continue for the present with gridftp access to storage. (A conceptual sketch of the difference follows below.)
  • dCache will be available as a space manager for sites without one, but not using the SRM interface initially.
• Joint testing with ALICE starts this week.
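To make the SRM point concrete: an SRM front end lets the client negotiate with the storage element, which manages space and staging and hands back a transfer URL, whereas plain gridftp access means addressing a physical path directly. The toy Python sketch below is purely conceptual; the class, method names and URLs are hypothetical and are not the SRM specification.

```python
# Conceptual toy only: SRM-style negotiated access vs direct gridftp.
# Class, methods and hosts here are hypothetical, not the SRM spec.

class ToySRM:
    """A toy storage element with an SRM-like front end."""
    def __init__(self):
        self.disk_pool = {}  # surl -> staged physical path

    def prepare_to_get(self, surl):
        # A real SE could recall the file from tape and manage pool
        # space here before returning a gsiftp transfer URL.
        path = "/pool/" + surl.rsplit("/", 1)[-1]
        self.disk_pool[surl] = path
        return "gsiftp://se.example.org" + path

se = ToySRM()

# SRM-style: the client asks for the file and is told where to fetch it.
turl = se.prepare_to_get("srm://se.example.org/castor/user/evts.root")
print(turl)  # gsiftp://se.example.org/pool/evts.root

# Interim gridftp-only flow: no negotiation, so the client must already
# know the physical location, and the SE does no space management.
turl_direct = "gsiftp://se.example.org/castor/user/evts.root"
```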

13. LCG — Grid Technology
• The key topic has been, of course, the direction of Grid middleware.
• ARDA started as an RTAG in the Applications Area to define the completion of the Physicist Interface programme (distributed analysis). There was much discussion, though, on the Grid middleware and its impact on the DA framework.
• An ARDA workshop was held at CERN, January 21st/22nd, to plan post-RTAG developments.
  • See the report later this afternoon,
  • and you've just heard from Tony!
  • Also http://agenda.cern.ch/fullAgenda.php?ida=a036745.

14. LCG & GridPP2
• Remember: the delay of the LHC to April 2007 means LCG and GridPP are now out of phase. LCG Phase II starts only in January 2006.
• A work programme and plan both exist, however, and there is a shortfall in resources, principally a lack of staff.
• Strong desire to maintain the UK contribution (and influence) and the links between GridPP & LCG.
• The UK message that a clear case must be made is understood. Discussions with the new CERN management are starting.
• £1M would support 5 FTE over the 3 years (cf. 25+ now, and 10 in the GridPP2 proposal). Work areas to be agreed.
• Existing GridPP-funded staff have ~1 year left before the end of their contracts. There will be a review of post effectiveness similar to that just completed for other GridPP posts.

15. EGEE @ CERN
• See Neil for the high-level politics!
• Implementation plan:
  • The initial service will be based on the LCG infrastructure (this will be the production service, with most resources allocated here).
    • Cross-membership of the LCG & EGEE project management boards.
  • Also need a certification test-bed system
    • for debugging and problem resolution on the production system.
  • In parallel, must deploy a development service.
    • Runs the candidate next software release for production.
    • Treated as a reliable facility (but with less support than the production service).
• EGEE All Activities Meeting, January 13th/14th
  • See http://agenda.cern.ch/fullAgenda.php?ida=a036343.
• Two areas managed by CERN:
  • JRA1/Middleware: Frederic Hemmer
  • SA1/Operations: Ian Bird
• Significant effort in the recruitment area over the last 2 months: four boards held, 19 job offers made to date. CERN support for at least one person prior to the April project start.

16. Conclusion
• Good progress in all areas :->
• As ever, strongly supported by GridPP-funded effort at CERN.
