1 / 7

Job Priorities: status

Job Priorities: status. Simone Campana (CERN). Job Priority Mechanism. Several layers involved VOMS Batch system configuration Information System Workload Management System User gets a VOMS proxy with some group/role i.e. /atlas/Role=production The VOMS proxy gets delegated UI->WMS->CE

taline
Download Presentation

Job Priorities: status

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Job Priorities: status Simone Campana (CERN)

  2. Job Priority Mechanism Several layers involved • VOMS • Batch system configuration • Information System • Workload Management System User gets a VOMS proxy with some group/role • i.e. /atlas/Role=production The VOMS proxy gets delegated UI->WMS->CE At the site (CE), LCAS/LCMAPS translates the user FQAN into a local user depending on a configuration file • the first rule in the LCMAPS configuration file that matches the FQAN is used to do the mapping • Mapping of VOMS credentials to scheduler credentials The LRMS is configured to grant different priorities/shares to different users/groups • Assignment of scheduler share to these credentials VOViews provide a way for publicizing the relevant characteristics of a CE on a FQAN-based way • # of batch slots, # of waiting jobs, etc... published independently for different FQAN • The WMS considers the VOViews matching the VOMS FQAN for the matchmaking Simone Campana - Grid Deployment Board - 06 June 2007

  3. CEs and VOViews (1) dn:GlueCEUniqueID=hostname:2119/jobmanager-pbs-qshort,mds-vo-name=local,o=grid GlueCEAccessControlBaseRule: VO:atlas GlueCEAccessControlBaseRule: VO:cms GlueCEAccessControlBaseRule: VOMS:/cms/Higgs GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production GlueCEStateWaitingJobs: 1251 (2) dn:GlueVOViewLocalID=atlas,GlueCEUniqueID=hostname:2119/jobmanager-pbs-qshort,mds-vo-name=local,o=grid GlueCEAccessControlBaseRule: VO:atlas GlueCEStateWaitingJobs: 121 (3) dn:GlueVOViewLocalID=atlas_role_production,GlueCEUniqueID=hostname:2119/jobmanager-pbs-qshort,mds-vo-name=local,o=grid GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production GlueCEStateWaitingJobs: 151 LCG-RB: understands only 1)  you have a problem if the CE serves more than one VO …) gLite WMS: (2) and (3) are used depending on the user VOMS /atlas/Role=NULL  matches (2) (generic ATLAS user) /atlas/Role=software  matches (2) (a specific view does not exist, and still also ATLAS user) /atlas/Role=production  matches both(2) and (3) (specific view and ATLAS user) Simone Campana - Grid Deployment Board - 06 June 2007

  4. In the Information System, for ce01.ific.uv.es: (2) GlueVOViewLocalID: atlas GlueCEAccessControlBaseRule: VO:atlas GlueCEStateWaitingJobs: 0 (3) GlueVOViewLocalID: /atlas/Role=production GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production GlueCEStateWaitingJobs: 1000 Result of the list-match (Rank = -other.GlueCEStateWaitingJobs;): *CEId*                                    *Rank (3) - ce01.ific.uv.es:2119/jobmanager-pbs-short         -1000 (2) - ce01.ific.uv.es:2119/jobmanager-pbs-short         0 A VOMS with /atlas/Role=production matches both (2) and (3) The WMS matches (2) but LCMAPS will map the FQAN to the ACBR VOMS:/atlas/Role=production (3) Simone Campana - Grid Deployment Board - 06 June 2007

  5. Short Term Solution Make the VOViews mutually exclusive • on a CE not more than one VOView can match a FQAN • Introduce a DENY ruleon the Information system: GlueVOViewLocalID: atlas GlueCEAccessControlBaseRule: DENY:/atlas/Role=production GlueCEAccessControlBaseRule: VO:atlas [...] GlueVOViewLocalID: /atlas/Role=production GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production Requires simple changes in the WMS Requires careful publication of the rule on in the GIP Simone Campana - Grid Deployment Board - 06 June 2007

  6. Current status Short term solution: • WMS change done, but currently only in “experimental Services” • Need to get the WMS certified • Latest YAIM version configuring DENY tags has problems • New version being released with a quick fix. • In other words the short term solution is not there yet. Situation at the sites • All T1s deploy the FQAN VOViews • Since 6 weeks at least. • Probably too aggressive approach (especially since the mechanism is broken…) • Several T2s now deploy the FQAN VOViews • Despite ATLAS asked explicitly NOT to! • Comes automatically if you run the latest YAIM out of the box. • Several sites have lots of problems with local configuration • Mapping of right unix account with VOMS FQAN • Still not clear if requested shares are really implemented • Very hard to test Simone Campana - Grid Deployment Board - 06 June 2007

  7. Future Plans Hopefully, all issues in previous slides will be sorted out soon. How much this scales in the long timescale? • Including new groups and roles will be quite some work • Both at the level of local farms and Information System • And this has been proved to be error prone • Changing shares between different groups/roles of the same VO does not seem trivial either • Even small configuration problems cause massive problems in experiments production frameworks. Simone Campana - Grid Deployment Board - 06 June 2007

More Related