WLCG Service Requirements


Presentation Transcript


  1. WLCG Service Requirements
     WLCG Workshop, Mumbai
     Tim Bell, CERN/IT/FIO

  2. Agenda
     • LCG Memorandum of Understanding
     • Defining what needs to be delivered
     • Checking the plan
     • Tracking delivery using a dashboard
     Service Checklist Tim.Bell@cern.ch

  3. What the MoU provides
     • A high-level definition of the service
     • Basis for estimating Tier investments
     • Tier responsibilities
     • Overall capacity
     • Basic support structure
     • Implementation schedule
     • Governance
     • Roles

  4. Tier0 service levels

  5. Tier1 service levels

  6. The MoU is not …
     • An implementation bible
       • What grid services at which site
       • How to run the services
       • How to deploy
     • A magic recipe for service delivery
       • Application 99% = about 1.7 hours down / week
       • Administrator 40 hours/week = 24% up
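
The two figures on the slide above are plain arithmetic over a 168-hour week. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and not taken from any WLCG tool. A 1% unavailability budget is 1% of 168 hours, about 1.7 hours, while a 40-hour working week covers roughly 24% of the week.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def allowed_downtime_hours(availability: float, period_hours: float = HOURS_PER_WEEK) -> float:
    """Downtime permitted in the period for an availability target given as a fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


def coverage_fraction(staffed_hours: float, period_hours: float = HOURS_PER_WEEK) -> float:
    """Fraction of the period during which an administrator is on hand."""
    return staffed_hours / period_hours


if __name__ == "__main__":
    print(f"99% availability allows {allowed_downtime_hours(0.99):.2f} h down per week")
    print(f"A 40 h/week administrator covers {coverage_fraction(40):.0%} of the week")
```

Running it prints 1.68 h and 24%. The contrast is the point of the slide: a 99% target leaves far less room for outages than the hours during which an administrator is actually present.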

  7. What is your quest?

  8. We seek the holy grail!
     A stable and functional Grid

  9. Define the site services
     • What services do we provide?
     • Who is responsible?
     • What level of service is required?
     • What capacity of service?
     • What is the support structure?
     • Who pays for what?

  10. Service catalog approach
      • A service catalog consists of:
        • Service Class – Criticality
        • Calendar – Variation with time
        • Product – What application
        • Customer – Which VO
      • Service = Service Class x Calendar x Product x Customer
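
The formula Service = Service Class x Calendar x Product x Customer describes a Cartesian product. The minimal Python sketch below enumerates that space. The dimension values are hypothetical placeholders: the products and VOs are examples of the kind named elsewhere in this talk, and the class and calendar names are invented for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One point in the space Service Class x Calendar x Product x Customer."""
    service_class: str  # criticality
    calendar: str       # time period the class applies to
    product: str        # grid application
    customer: str       # VO


# Hypothetical dimension values, for illustration only.
SERVICE_CLASSES = ["critical", "high", "medium", "low"]
CALENDARS = ["accelerator-running", "working-hours", "out-of-hours"]
PRODUCTS = ["FTS", "LFC", "MyProxy"]
CUSTOMERS = ["ALICE", "ATLAS", "CMS", "LHCb"]


def service_space(classes, calendars, products, customers):
    """Every combination allowed by Service = Class x Calendar x Product x Customer."""
    return [Service(*combo) for combo in itertools.product(classes, calendars, products, customers)]


if __name__ == "__main__":
    space = service_space(SERVICE_CLASSES, CALENDARS, PRODUCTS, CUSTOMERS)
    print(len(space))  # 4 * 3 * 3 * 4 = 144 combinations
```

A filled-in catalog picks exactly one class for each (calendar, product, customer) cell. With these placeholder values, that means 3 x 3 x 4 = 36 catalog entries drawn from the 144 possible combinations.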

  11. Service class
      https://uimon.cern.ch/twiki/bin/view/LCG/ScFourServiceDefinition

  12. Class notes
      • Downtime defines the time between the start of the problem and the restoration of service at minimal capacity (i.e. basic function but capacity < 50%)
      • Reduced defines the time between the start of the problem and the restoration of a reduced-capacity service (i.e. > 50%)
      • Degraded defines the time between the start of the problem and the restoration of a degraded-capacity service (i.e. > 80%)
      • Availability defines the sum of the time that the service is up compared with the total time during the calendar period for the service. Site-wide failures are not considered as part of the availability calculations.
      • None means the service is running unattended

  13. Service calendar
      • Some services are critical only during accelerator shifts
      • Other services are less critical outside working hours
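
A service calendar amounts to a function from a point in time to a calendar slot, followed by a per-service lookup of the class required in that slot. The minimal Python sketch below makes several hypothetical choices: the slot names, the working-hours window of Monday to Friday 08:00-18:00, and the two example services. It also assumes that whether the accelerator is running is supplied by the caller rather than looked up.

```python
from datetime import datetime

# Hypothetical per-service calendars: class required in each calendar slot.
CALENDARS = {
    # critical only while the accelerator is running
    "service-a": {"accelerator-running": "critical", "working-hours": "medium", "out-of-hours": "medium"},
    # less critical outside working hours
    "service-b": {"accelerator-running": "high", "working-hours": "high", "out-of-hours": "medium"},
}


def calendar_slot(when: datetime, accelerator_running: bool) -> str:
    """Calendar slot in force at 'when'; accelerator running takes precedence."""
    if accelerator_running:
        return "accelerator-running"
    is_weekday = when.weekday() < 5  # Monday == 0 ... Sunday == 6
    in_hours = 8 <= when.hour < 18   # assumed working-hours window
    return "working-hours" if is_weekday and in_hours else "out-of-hours"


def required_class(service: str, when: datetime, accelerator_running: bool) -> str:
    """Service class the calendar requires for one service at a given time."""
    return CALENDARS[service][calendar_slot(when, accelerator_running)]


if __name__ == "__main__":
    saturday_morning = datetime(2024, 1, 6, 10)
    print(required_class("service-a", saturday_morning, accelerator_running=True))
    print(required_class("service-b", saturday_morning, accelerator_running=False))
```

Giving the accelerator slot precedence over working hours is a design choice. A site could instead define combined slots, such as accelerator running out of hours, if its staffing differs between those cases.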

  14. Products

  15. Products (cont)

  16. Products notes
      • Provides a first-level breakdown of the grid into smaller units
      • Surprisingly dynamic list: new products arrive weekly
      • Short codes provide the basis for naming conventions

  17. Service catalog
      • Match product with customer and service class in each calendar slot
      • Multiple services (e.g. production, test, site…) for a single product
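
The matching described above can be held as a lookup keyed by product, service instance, customer and calendar slot. The minimal Python sketch below uses hypothetical entries throughout. It shows one product with several service instances, a lookup that fails loudly for uncatalogued combinations, and a completeness check that lists catalogued services still missing a class for some calendar slot.

```python
# Hypothetical catalog entries, keyed by (product, instance, customer, calendar slot).
CATALOG = {
    ("FTS", "production", "ATLAS", "accelerator-running"): "critical",
    ("FTS", "production", "ATLAS", "out-of-hours"): "high",
    ("FTS", "test", "ATLAS", "accelerator-running"): "low",
}


def catalog_class(product: str, instance: str, customer: str, slot: str) -> str:
    """Service class agreed for one product instance, customer and calendar slot."""
    key = (product, instance, customer, slot)
    if key not in CATALOG:
        raise LookupError(f"no catalog entry for {key}")
    return CATALOG[key]


def instances_of(product: str) -> list[str]:
    """Distinct services (production, test, ...) defined for one product."""
    return sorted({instance for (prod, instance, _cust, _slot) in CATALOG if prod == product})


def missing_slots(slots: list[str]) -> list[tuple[str, str, str, str]]:
    """Catalogued (product, instance, customer) triples lacking a class for some slot."""
    triples = {(prod, instance, cust) for (prod, instance, cust, _slot) in CATALOG}
    return sorted(
        (prod, instance, cust, slot)
        for (prod, instance, cust) in triples
        for slot in slots
        if (prod, instance, cust, slot) not in CATALOG
    )


if __name__ == "__main__":
    print(catalog_class("FTS", "production", "ATLAS", "accelerator-running"))
    print(instances_of("FTS"))
    print(missing_slots(["accelerator-running", "out-of-hours"]))
```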

  18. Service catalog (cont)

  19. Questionnaire
      • Simple questions to assess readiness for production
      • It is not actually necessary to fill out the answers, but the questions should be asked
      • Focus is on the infrastructure

  20. Service questions
      • What service levels are required for each calendar period?
      • Who is providing support for the application?
      • Who supports the infrastructure?
      • How should the support be contacted?
      • What support service do they provide?

  21. Configuration questions
      • What are the application interfaces?
      • What server does the application run on?
      • Is there a picture of the configuration?
      • What are the application parameters and how are they set up?

  22. Facilities questions?

  23. Facilities questions
      • Are all systems in a machine room?
      • Is the room access controlled?
      • Is there good power provision?
        • UPS? Batteries?
      • What is the response time for facilities problems?

  24. Hardware questions
      • What kind of machine is required?
        • CPU, RAM, disk
      • Do we need redundancy?
        • Power supply, disk, …
      • Do maintenance contracts match the service?
      Currently, there are no capacity guides for each application. These are required to avoid purchasing inappropriate machines.

  25. Sample RB disk calculation

  26. Network questions
      • What network capacity?
        • OPN connectivity?
        • Bandwidth?
        • Firewall ports?
      Currently, there is no connectivity guide for each application. This is required for a secure setup and an appropriate network configuration.

  27. Sample CE ports sheet

  28. Database questions
      • What is your site's preferred database?
      • What are the options for each application?
      • Expected database size / growth?
      • High Availability options?

  29. Backup / Restore questions
      • What needs to be backed up for each service?
      • How do we ensure consistency in the event of a restore? e.g. RB / CE
      • Does the software corruption risk differ by application? e.g. LFC/SE vs Proxy
      • Has a restore test been done?
      There is currently no list of critical state data for each application, nor of the steps to be executed after a restore.

  30. Operations questions
      • How are problems identified?
        • Local console?
        • Grid Monitoring?
      • Who should be contacted to resolve the problem?
      • Who should be informed of the problem?
      • What new procedures / operations guides are required?
      • What is the local coverage for nights / weekends?
      • How do local and Grid operations interwork?

  31. Validation
      • Check that the service class matches the answers
        • A critical service cannot have its server in an office
      • Check the dependencies so that no critical service depends on a non-critical service
        • FTS is critical and requires MyProxy, therefore the MyProxy service must also be critical
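
Both validation rules can be checked mechanically once classes, dependencies and server locations are recorded. The minimal Python sketch below walks every dependency reachable from a critical service, so an indirect non-critical dependency is reported as well as a direct one. It also flags critical services not recorded as being in a machine room. The "DB" service, the "machine-room" location label and the input shapes are hypothetical.

```python
def dependency_violations(classes: dict[str, str], depends_on: dict[str, set[str]]) -> list[tuple[str, str]]:
    """(critical service, dependency) pairs where a dependency, direct or indirect, is not critical.

    Services missing from 'classes' count as non-critical.
    """
    found = []
    for service, service_class in classes.items():
        if service_class != "critical":
            continue
        stack = list(depends_on.get(service, ()))
        seen = set()
        while stack:  # depth-first walk over everything the critical service relies on
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if classes.get(dep) != "critical":
                found.append((service, dep))
            stack.extend(depends_on.get(dep, ()))
    return sorted(found)


def placement_violations(classes: dict[str, str], location: dict[str, str]) -> list[str]:
    """Critical services whose server is not recorded as being in a machine room."""
    return sorted(s for s, c in classes.items() if c == "critical" and location.get(s) != "machine-room")


if __name__ == "__main__":
    # FTS is critical and requires MyProxy, which is wrongly classed as "high".
    print(dependency_violations({"FTS": "critical", "MyProxy": "high"}, {"FTS": {"MyProxy"}}))
    print(placement_violations({"FTS": "critical"}, {"FTS": "office"}))
```

The `seen` set keeps the walk finite if the recorded dependencies happen to contain a cycle.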

  32. Implementation Tracking at CERN
      • A dashboard approach on the Wiki

  33. Common Themes
      • But it’s all green? What’s the problem?
        • Green does not mean no problems. We are often generous with assessments, since red/yellow everywhere does not highlight issues.
      • Operations
        • No operations or problem determination guides. Limited administration guides.
        • Support call-tree unclear
        • Backup/Restore details are missing
      • Hardware
        • Limited or no capacity planning information leads to incorrect server sizing
        • ‘Forgot a box’ problems, e.g. one per VO, not one per site
      • Development
        • Difficult to match the user expectations (e.g. a critical service) with the implementation (e.g. stateful)

  34. Summary
      • Complete a service catalog for your sites
      • Check the questions and prepare an action plan to address items under your control
      • Assess the status by service and concentrate on getting the reds to yellows
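
Assessing status by service and working on the reds first can be sketched as a small aggregation over a red/yellow/green dashboard. In the minimal Python sketch below, the dashboard contents are hypothetical, and the area names only loosely follow the questionnaire topics. A service's overall status is taken as the worst status among its areas, so a single red area is not hidden behind otherwise green ones.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def worst_status(areas: dict[str, str]) -> str:
    """Overall status of a service: the worst status among its checklist areas.

    Raises ValueError if no area has been assessed yet.
    """
    return max(areas.values(), key=lambda status: SEVERITY[status])


def red_items(dashboard: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """(service, area) pairs still red: the first items to bring to yellow."""
    return sorted(
        (service, area)
        for service, areas in dashboard.items()
        for area, status in areas.items()
        if status == "red"
    )


# Hypothetical dashboard contents, for illustration only.
DASHBOARD = {
    "FTS": {"hardware": "green", "backup/restore": "red", "operations": "yellow"},
    "LFC": {"hardware": "green", "backup/restore": "yellow", "operations": "green"},
}

if __name__ == "__main__":
    for service, areas in sorted(DASHBOARD.items()):
        print(service, worst_status(areas))
    print(red_items(DASHBOARD))
```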

  35. More Information
      • LCG MoU
        • http://lcg.web.cern.ch/lcg/C-RRB/MoU/WLCGMoU.pdf
      • SC4 Service Definitions for CERN
        • https://uimon.cern.ch/twiki/bin/view/LCG/ScFourServiceDefinition
      • SC4 CERN Dashboard
        • https://uimon.cern.ch/twiki/bin/view/LCG/WlcgScDash
