gLite Status

gLite Status. Stephen Burke RAL GridPP 13 - Durham.

Presentation Transcript


  1. gLite Status Stephen Burke RAL GridPP 13 - Durham

  2. Overview • gLite releases • gLite deployment • WMS • DMS • R-GMA • VOMS • Outstanding issues • E&OE!

  3. Releases

  4. gLite releases so far • Release 1.0 on April 5th • Released to meet deadline • WMS + CE + Fireman + gLite i/o + R-GMA + VOMS • AliEn, GAS and package manager gone • Several things missing or not working well • No SE in gLite • Documentation is reasonable • Release 1.1 on May 12th • First versions of File Transfer Service (FTS), metadata catalogue • Secure file catalogues • Bug fixes

  5. Future releases • Release 1.2 should have been on June 1st • Delayed to end of June, now expected late July • Was expected to be in LCG July release • Have gLite R-GMA and VOMS as LCG upgrades • “Final” gLite release (2.0) for EGEE 1 by end of the year • Updated architecture/design/workplan documents • Code freeze October (?) • Maybe a 1.3 release (August?), time is tight

  6. Timelines [Timeline figure, June 2005 to March 2006. Labels: TODAY, Release 1.2, Func. freeze, Release 2.0, Integrated 2.0, Review, Mid Dec., Xmas Vacation, Final Report ?, End of EGEE 1. Months marked: June, October, November and December 2005, March 2006] • Consequences • ~2.5 months of development left • Probably only 1 or 2 releases between 1.2 and 2.0 • Focus on consolidating 1.2, plus small improvements requested by applications • Be very careful in introducing new services

  7. Release priorities • Driven by service challenges • Especially data management • LCG Baseline Services document • No time to change anything for EGEE 1 • EGEE PTF disbanded • Not seen as effective • Who collects requirements? • Do non-LCG VOs have influence?

  8. Deployment

  9. gLite deployments – JRA1 • gLite “prototype” system • Used by ARDA team, biomed, some others • Very small, basically just CERN • Not properly maintained • JRA1 testing testbed • Was CERN, RAL and NIKHEF • Two sites + manpower added at Imperial • One person subtracted at CERN • Still small and under-resourced • Releases are not sufficiently tested • 928 open bugs in Savannah, 84 critical • 281 “ready for test”, but no time to test!

  10. gLite deployments - LCG • Pre-production system now being installed • ~8 sites so far – more coming • None in UK? • Currently a “pure” gLite system • Role seems to change from week to week! • Partly working but many problems • Some users allowed in soon (now?) • Production system • Various plans considered • LCG 2.6 has R-GMA and VOMS • Next steps unclear (to me at least!)

  11. Status as of release 1.1

  12. Workload management • Broker is a development of the EDG/LCG RB • Seems to be largely backward-compatible • Main new feature is DAGMAN (composite jobs) • Push and pull job submission • No web services • Hybrid info system (CEMON + BDII) • Static configuration of WMS-CE relationships • Should change to R-GMA (?) • Condor-C replaces Globus gatekeeper on CE • Several security problems • Current performance is poor • Submissions often fail • Cryptic error messages
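
The slide names DAGMAN composite jobs as the main new WMS feature. To illustrate what a composite job implies, the Python (3.9+) sketch below treats one as a dependency graph and groups its sub-jobs into "waves" that could be submitted together. The job names are hypothetical, and this is neither gLite JDL nor the Condor DAGMan interface; it only shows the dependency-ordering idea.

```python
from graphlib import TopologicalSorter

# Hypothetical composite job: each key is a sub-job, each value is the set of
# sub-jobs that must finish before it may start (names are illustrative only).
composite_job = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}

def submission_waves(dag):
    """Group sub-jobs into waves that could run concurrently, in dependency order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sub-jobs whose parents are all done
        waves.append(ready)
        sorter.done(*ready)  # assume every sub-job in this wave succeeded
    return waves

print(submission_waves(composite_job))
# [['prepare'], ['simulate_a', 'simulate_b'], ['merge']]
```

A real broker would only mark a node done after the corresponding batch job actually finished, and would need a policy for failed nodes; the sketch skips both.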

  13. Data Management • First version of metadata catalogue • No command-line clients yet, MySQL only • Fireman file catalogue • Competes with new LCG File Catalogue • Various experiment-specific solutions • gLite i/o • Security model still under debate (delegation, file ownership) • Doesn’t yet work with dCache or DPM SRMs, only Castor! • FTS – developed for service challenges • Point-to-point reliable file transfer • No interaction with Fireman catalogue • No File Placement Service (FPS) yet, hence no replication! • No Data Scheduler • Interaction with WMS still under discussion
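
FTS is described as point-to-point reliable file transfer. A minimal Python sketch of the simplest meaning of "reliable", retrying a failed copy with exponential back-off, follows. `copy_file`, `TransferError` and the `srm://` paths are hypothetical stand-ins; the real FTS runs as a managed service and is not modelled here.

```python
import time

class TransferError(Exception):
    """Hypothetical error raised by a copy attempt that may succeed if retried."""

def reliable_transfer(copy_file, source, dest, max_attempts=3, delay_s=0.0):
    """Retry a point-to-point copy until it succeeds or attempts run out.

    copy_file is a hypothetical callable standing in for the real transfer;
    it must raise TransferError on failure. Returns the attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            copy_file(source, dest)
            return attempt
        except TransferError:
            if attempt == max_attempts:
                raise  # give up and report the last failure
            time.sleep(delay_s * 2 ** (attempt - 1))  # exponential back-off

# Usage: a simulated copy that fails twice, then succeeds.
outcomes = iter([True, True, False])  # True means "this attempt fails"

def flaky_copy(src, dst):
    if next(outcomes):
        raise TransferError(f"{src} -> {dst} failed")

print(reliable_transfer(flaky_copy, "srm://site-a/f", "srm://site-b/f", max_attempts=5))  # 3
```

Note that retries alone give point-to-point reliability only; replication across sites would still need something like the missing FPS.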

  14. R-GMA • Should be an information system • But both LCG and gLite still use BDII • New Service Discovery API • Still discussing service types and names • LCG now making substantial use of R-GMA for monitoring, accounting etc • Lots of pressure to fix bugs! • Some stability problems, needs more testing • Not ideal to test in production, but … • Seems generally in a good state

  15. Security • gLite VOMS server now used by LCG • Some problems with gLite installation scripts • WMS and DMS have limited support for VOMS • SRM, Condor-C and R-GMA don’t yet • Many test VOMS servers exist, but still not in production • Will probably need a long learning period to get the best use of VOMS • Not a panacea! • Security requirements mostly still not being addressed • Most date back to the start of EDG • Many known security vulnerabilities

  16. Outstanding Issues

  17. General • Error messages, logging and fault-tolerance • Still very poor • Proposal on common error handling by Steve Fisher • Configuration • gLite has a common config tool (python/XML) • Underlying config not unified • Still complex, fragile and error prone • Not clear if LCG will switch • May get many layers - YAIM -> XML -> m/w specific config files? • Monitoring • Getting better – but all from LCG, not in gLite • Single points of failure • Still have many, but some positive movement
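
The configuration concern is that each layer (YAIM -> XML -> m/w specific config files) is another translation step. As a rough Python sketch of one such step, the code below renders a site-level XML file into a service's KEY=value file. The XML element names, parameter names and host names are all hypothetical and do not follow the real gLite configuration schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical site configuration; element/attribute names are invented for
# illustration and are not the real gLite config tool's schema.
SITE_XML = """
<config>
  <param name="site.name" value="EXAMPLE-SITE"/>
  <param name="ce.host" value="ce.example.org"/>
  <param name="rgma.server" value="mon.example.org"/>
</config>
"""

def render_service_config(xml_text, prefix):
    """Pick the site-wide and service-specific parameters; emit KEY=value lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for param in root.iter("param"):
        name = param.get("name", "")
        if name.startswith("site.") or name.startswith(prefix + "."):
            key = name.replace(".", "_").upper()  # e.g. ce.host -> CE_HOST
            lines.append(f"{key}={param.get('value')}")
    return "\n".join(lines)

print(render_service_config(SITE_XML, "ce"))
# SITE_NAME=EXAMPLE-SITE
# CE_HOST=ce.example.org
```

Every such layer has its own naming convention and its own ways to fail silently (here, a mistyped prefix simply drops parameters), which is why stacking several of them is a worry.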

  18. WMS (issue: status) • Job submission rate too slow: not tested (?), but probably no change • Failover (RB goes down -> jobs lost): no change so far • Bulk job submission: partial support via DAGs; parameterised jobs coming • Space management on WNs: not being addressed • Access to output from running jobs: not yet • Advance reservation: some work, but not yet available • Interaction with data management (pre-staging): discussion but nothing yet • CPU speed, memory etc requirements not passed to batch system: may appear in future • Job distribution is poor (ERT etc): partly addressed by new Glue schema; still no direct support in broker
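
Parameterised jobs address bulk submission by expanding one job template into many jobs that differ only in a parameter. A minimal Python sketch of that expansion follows, assuming a `$param` placeholder; the JDL-like template text, the script name and the placeholder syntax are hypothetical, not the gLite parametric-job syntax (which the slide lists as still "coming").

```python
from string import Template

def expand_parametric(template, start, stop, step=1):
    """Expand one job template into one description per parameter value.

    Values run from start (inclusive) to stop (exclusive), as with range().
    """
    tmpl = Template(template)
    return [tmpl.substitute(param=value) for value in range(start, stop, step)]

# Hypothetical JDL-like template; $param marks where each value is inserted.
job_template = 'Executable = "analyse.sh"; Arguments = "run_$param.dat";'

for job in expand_parametric(job_template, 0, 3):
    print(job)
```

The point for bulk submission is that the broker would receive one template plus a parameter range instead of thousands of separate submissions.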

  19. DMS (issue: status) • Need a metadata solution: much discussion, seems to be converging • File catalogue performance, bulk operations: partly addressed by Fireman, LFC; LFC seems to have better performance but no bulk operations • Catalogue replication: Oracle replication by LCG; gLite working towards local catalogues • Small files: not being addressed • Reliable file replication: partly addressed by FTS, need FPS as well • File pinning: not yet in SRMs or FTS • Posix file access: may be addressed by gLite i/o; security model unclear • High level data management: not yet (wait for Data Scheduler in 2.0)

  20. Information systems • Not many issues! • Glue schema not ideal • Minor update just released • Maybe new major version in ~ 1 year? • Stability, scalability • Need to test in production - test systems too small

  21. Security (issue: status) • VO management, groups and roles: should come with VOMS • VO policies for CEs: some tools (LCAS, LCMAPS); needs experience • ACLs on files: should come with gLite File Access Service (FAS); not ready yet; need to check the security model satisfies sites; no support in SRM yet • No outbound IP access: some discussion, nothing yet • Secure file management: not needed for HEP, but strong need for biomed; some work, not there yet • Quotas: some work on measurement; enforcement? • Vulnerabilities: many known, little work; new group (Linda Cornwall)
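
VOMS expresses group membership and roles as Fully Qualified Attribute Names (FQANs) of the form `/vo/group[/subgroup]/Role=role/Capability=cap`, with `NULL` conventionally meaning "not set". To show concretely what "groups and roles" look like, the Python sketch below splits an FQAN into its parts. It is a hand-written parser for illustration only, not part of the VOMS client libraries, and the example VO, group and role names are illustrative.

```python
from typing import NamedTuple, Optional

class Fqan(NamedTuple):
    vo: str                    # first path component, e.g. "atlas"
    group: str                 # full group path, e.g. "/atlas/production"
    role: Optional[str]        # None when absent or "NULL"
    capability: Optional[str]  # None when absent or "NULL"

def parse_fqan(fqan: str) -> Fqan:
    """Split a VOMS FQAN such as '/atlas/production/Role=lcgadmin/Capability=NULL'."""
    if not fqan.startswith("/"):
        raise ValueError(f"FQAN must start with '/': {fqan!r}")
    groups, role, capability = [], None, None
    for part in fqan.split("/"):
        if not part:
            continue  # skip empty components from leading/double slashes
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            groups.append(part)
    if not groups:
        raise ValueError(f"FQAN has no VO component: {fqan!r}")
    # Treat the conventional "NULL" value as "not set".
    role = None if role == "NULL" else role
    capability = None if capability == "NULL" else capability
    return Fqan(groups[0], "/" + "/".join(groups), role, capability)

print(parse_fqan("/atlas/production/Role=lcgadmin/Capability=NULL"))
# Fqan(vo='atlas', group='/atlas/production', role='lcgadmin', capability=None)
```

Tools such as LCMAPS then map the parsed group and role onto local accounts; that mapping policy is site-specific and is not sketched here.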

  22. Summary • First gLite releases are out, but are buggy and incomplete • Next release is late, not much time to the end of EGEE 1 • Many long-standing issues not addressed • Developers tend to follow their own interests rather than user/sysadmin needs • Functionality is less than at the end of EDG! • Probably still >~ 1 year to get production quality • OK for EGEE if EGEE 2 is approved • Mismatch with LCG timescale • LHC experiments are building their own Grids • How much of gLite do they need? • Who decides requirements and priorities?
