EDG Retreat, tutorials and Budapest meeting

Steve Fisher / RAL




  1. EDG Retreat, tutorials and Budapest meeting • Steve Fisher / RAL

  2. No details • Much of the obvious material has already been mentioned or will be in the testbed talk • Most of my material is stolen • I will try to fill in the gaps

  3. Project Retreat • Project retreat held 27 & 28 August at Chevannes • ~45 participants • work package managers, architecture group, quality group, applications groups, M-ware experts, representatives from LCG, DataTAG, Globus & Condor • Agenda and material on the web: • http://documents.cern.ch/age?a021130 • 3 sessions addressing the most important aspects of the project's current work: • Software Release Process • Release 1.2 • Testbed 2

  4. EDG Tutorial • The tutorials are aimed at users wishing to "gridify" their applications using EDG software and are organized over 2 full consecutive days • DAY 1 • Tutorial introduction • Introduction to Grid computing and overview of the DataGrid project • Security • Testbed overview • Job Submission • lunch • hands-on exercises: job submission • DAY 2 • Data Management • LCFG, fabric mgmt & sw distribution & installation • Applications and Use cases • Future Directions • lunch • hands-on exercises: data mgmt

  5. Tutorial rehearsal • Rehearsal at CERN, 29 & 30 August • 19 participants to check material & approach • Lessons learnt • Can't cover as much material as we hoped • Explain why, not just how • Avoid details: participants can read them from references afterwards • Need as many helpers as possible for hands-on exercises • Participants have difficulties with certificate management • Generated a lot of enthusiasm in the participants and the EDG people doing the hands-on • Found genuine bugs during hands-on exercises • Recommend M-ware WPs send developers to help with hands-on exercises • New project people should follow the tutorial

  6. Tutorial Schedule • CERN School of Computing, Naples, 23-27 September • 80 participants. Hands-on exercises only (presentations by Carl Kesselman & Ian Foster) • CERN, October 3 & 4 • NeSC, Edinburgh, December • Maximum 30 participants (more for the presentations) • Could then accommodate more sites • Sites must provide support and handle logistics • Organisers/helpers must attend tutorial at another site first • The tutorial does represent some load on the testbed • For the future • The material must be kept up to date with each public release of the software

  7. Budapest • 5th EDG Project Conference PILISCSABA • Social event (folklore) - Sun • Cruise and dinner - IBM

  8. Budapest • Monday • General status of the project by Fabrizio Gagliardi • Technical status of the project by Bob Jones • WP meetings • Tuesday • WP meetings and ATF • Wednesday • Dissemination Session • Reports of WP1-5 • Thursday, 5 September • Reports of WP6-10 and Security • Report on GLUE • Report from Globus

  9. Application Status • WP8: High Energy Physics • LHC experiments doing tests now • ATLAS task force • WP9: Earth Observation • Installation of EDG 1.2 at ESA done • Testing to start in September • WP10: Biology • Initial tests made with EDG 1.2 • Overall comments: • General confusion about how best to use data mgmt tools • Software not yet stable enough and insufficient diagnostics information available • Too difficult to configure • Concern that EDG 1.2 in its current configuration will not scale easily to ~40 sites

  10. The Problem

  11. WP8-10 - General • Deployed Software must be supported • Acceptance criteria are not in place for most WPs • Need real tests - real apps, long jobs, "random" behaviour of users, and > 50 users • Delays with 3rd party patches are a problem • so we have to invent hacks. • Release procedures, though formally in place, were ignored. • Interfaces are too low level for the user • They want efforts in reliability • need defensive programming • need good diagnostics • avoid single points of failure. • Documentation needs revision

  12. Site Management • Sys-admin needs defined tests and procedures. • Installation • Some lcfg objects had never been tested - syntax errors! • Need manual checks on each node • many interactions/iterations with many people • Running • No test procedures to locate faulty services • Tracing a problem is hard - log files in odd places with odd formats • Error messages useless • ITeam mailing list is too busy • Need to find a more constructive way of solving problems. • Need to make more use of Bugzilla • Need to be able to cover vacations and conferences

  13. User Support • Since June only 20 questions asked • CRLs and CAs • requests for accounts • commands failing due to firewalls • technical questions about installation and configuration • Q. why not use an existing solution for user support desk? • Members of the support team must be experts • Cannot afford to provide dedicated people from the WPs. • Today's reality is that the ITeam list is the only way

  14. The Solution: “Quality”

  15. Existing Software Process • Over-simplification of the current situation: • Mware groups develop software in isolation • ITeam assembles it as best it can • Site managers are asked to install it • Application groups are asked to test it • Problems: • No place for the Mware groups to integrate software before delivering it to the ITeam • Inadequate software testing – leads to installation/configuration/execution faults • Running blind – no way to control or reliably plan software delivery

  16. Autobuild etc. • A release manager will be nominated with overall responsibility for ensuring the procedure is followed • Make autobuild tools the basis of the daily work of the M-ware groups and ITeam • Nightly build from CVS repository for all software • Problems must be fixed ASAP – checked by Quality Group reps • M-ware groups give ITeam CVS tags instead of RPMs • Tagged software must be documented • M-ware group must perform and supply unit tests • Integrated with nightly build • Tagged software that fails integration or testing, or is inadequately documented, will be rejected • M-ware group is responsible for fixing it
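The rejection rule on this slide can be sketched as a small check. This is only an illustration in Python, not the project's autobuild tooling: `TagReport`, its fields and `review_tag` are hypothetical names, and the three flags are assumed to be filled in by the nightly build.

```python
from dataclasses import dataclass


@dataclass
class TagReport:
    """Nightly-build outcome for one CVS tag (hypothetical structure)."""
    tag: str
    integrates: bool   # built and integrated with the rest of the release
    tests_pass: bool   # unit tests supplied by the M-ware group passed
    documented: bool   # documentation was delivered with the tag


def review_tag(report: TagReport) -> list[str]:
    """Return the reasons for rejecting a tag; an empty list means accepted."""
    reasons = []
    if not report.integrates:
        reasons.append("fails integration")
    if not report.tests_pass:
        reasons.append("fails testing")
    if not report.documented:
        reasons.append("inadequately documented")
    return reasons


if __name__ == "__main__":
    # Example tag name is made up for illustration.
    report = TagReport("example-tag", integrates=True, tests_pass=False, documented=True)
    problems = review_tag(report)
    print("accepted" if not problems else "rejected: " + ", ".join(problems))
```

Returning the list of reasons, rather than a bare yes/no, matches the slide's point that the M-ware group is responsible for fixing a rejected tag and so needs to know why it was rejected.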

  17. Quality Group • Recently formed Quality Group, convened by Gabriel Zaquine, is responsible for ensuring quality issues are addressed within the WPs • Ensure unit test plans are complete and followed • Follow-up on problems reported via bugzilla & in nightly builds • Organise running of appropriate code checking tools • Agree on adopted project developer-guidelines etc. • http://eu-datagrid.web.cern.ch/eu-datagrid/QAG/default.htm

  18. Testing • Strengthen the Testing Group • Identify leader and a small number of full-time testers • Assemble and maintain test suite integrated with autobuild tools • Automate installation and configuration of software releases • be able to auto install & configure a release on a pre-defined small example site • Needs improvements by M-ware WPs to simplify and complete installation & configuration of their software

  19. Technical Management • Architecture group documenting testbed 2 architecture • http://doc.cern.ch/archive/electronic/other/agenda/a021130/a021130s4t1/TB2Arch_v0_1.doc • Project Tech. Board addresses deliverables and relationships with other projects • Meets once per quarter • Need more frequent technical management forum • Authority to make technical & architectural decisions affecting sw development in WPs • This will be done by a refocused weekly WP managers’ meeting.

  20. Testbed Support • Strengthen user support group • People involved need sufficient knowledge of the software • Emphasis on the usefulness of the responses provided • Tools used for support are a secondary issue • Federate with equivalent groups from other projects • Clarify & document procedures • Site Installation (site managers & ITeam) • Steps for system manager and requirements for a site to join the testbed

  21. Releases

  22. Autobuild Procedure • On RH6.2 • All but ~5/30 packages build and are packaged without errors. • On RH7.2 • Around 10/30 packages fail the “make install” step. • All fail “make rpm” because of rpm command change. • Warning: • Won’t integrate packages unless autobuild procedure works.
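For a rough sense of scale, the counts on this slide can be turned into pass rates. The figures are the approximate ones quoted above ("~5" and "around 10" out of 30), so the results are indicative only; `pass_rate` is a hypothetical helper written for this illustration.

```python
def pass_rate(total: int, failed: int) -> float:
    """Fraction of packages that pass a build step."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - failed) / total


if __name__ == "__main__":
    total = 30  # approximate number of packages quoted on the slide
    steps = {
        "RH6.2 build and packaging": 5,   # "all but ~5/30" succeed
        "RH7.2 make install": 10,         # "around 10/30" fail
        "RH7.2 make rpm": 30,             # all fail
    }
    for step, failed in steps.items():
        print(f"{step}: {pass_rate(total, failed):.0%} pass")
```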

  23. Globus • Globus 2.1.2 • Has fix from CONDOR of GASS Cache problem. • Lot of work to apply to beta-21. • Includes many additional changes to the job manager. • WP1 logging patches no longer work. • Whole LRMS backend changes to perl framework. • Globus 2.2 • Exists, but… • Some question about what is happening with the MDS 2.2 in this. • Will make Globus Release-24c and test.

  24. Releases 1.2 • EDG 1.2 series NOT suitable for widespread deployment. • EDG 1.2.0 • Available now, known limitations. • EDG 1.2.1 • Deploying now: long jobs, low submission rate. • Deploy multiple resource brokers to reduce problem. • EDG 1.2.2 • GDMP replication fix. • Quick upgrade for sites at 1.2.1. • EDG 1.2.x • Other critical fixes, but very high threshold now.

  25. Releases > 1.2 • EDG 1.3 series will be widely deployed. • EDG 1.3.0 • Upgrade Globus—will contain GASS cache patch from Condor. • Hopefully will also have MDS 2.2. • Subject to testing, will be deployed on application testbed. • EDG 1.3.x • Clean up LCFG objects/configuration. • Modify setup to support new developer guidelines. • EDG 1.4 series will begin incremental inclusion of new middleware functionality • EDG 1.4.x

  26. Some possible increments • New LCFG - WP4 • GridFTP server access to MSS - WP5 • Giggle & Reptor – WP2 • LCAS with dynamic plug-in modules – WP4 • NetworkCost Function – WP7 • Integrate mapcentre (nordugrid?) and R-GMA – WP3 • GLUE modified info providers/consumers – WP1,4,5 • Res. Broker – WP1 • LCFG for RH 7.2 – WP4 • Integration with Condor as batch system – WP4

  27. Documentation • Release Notes: • Exist for 1.2.0 (will be updated for 1.2.1). • User’s Guide: • Exists, but should be considered draft. • please use Bugzilla for comments • Installation Guide: • Won’t be rewritten until Globus upgrade. • Tutorial materials also available.
