70 likes | 251 Views
Job Priorities: status. Simone Campana (CERN). Job Priority Mechanism. Several layers involved VOMS Batch system configuration Information System Workload Management System User gets a VOMS proxy with some group/role i.e. /atlas/Role=production The VOMS proxy gets delegated UI->WMS->CE
E N D
Job Priorities: status Simone Campana (CERN)
Job Priority Mechanism Several layers involved • VOMS • Batch system configuration • Information System • Workload Management System User gets a VOMS proxy with some group/role • i.e. /atlas/Role=production The VOMS proxy gets delegated UI->WMS->CE At the site (CE), LCAS/LCMAPS translates the user FQAN into a local user depending on a configuration file • the first rule in the LCMAPS configuration file that matches the FQAN is used to do the mapping • Mapping of VOMS credentials to scheduler credentials The LRMS is configured to grant different priorities/shares to different users/groups • Assignment of scheduler share to these credentials VOViews provide a way for publicizing the relevant characteristics of a CE on a FQAN-based way • # of batch slots, # of waiting jobs, etc... published independently for different FQAN • The WMS considers the VOViews matching the VOMS FQAN for the matchmaking Simone Campana - Grid Deployment Board - 06 June 2007
CEs and VOViews (1) dn:GlueCEUniqueID=hostname:2119/jobmanager-pbs-qshort,mds-vo-name=local,o=grid GlueCEAccessControlBaseRule: VO:atlas GlueCEAccessControlBaseRule: VO:cms GlueCEAccessControlBaseRule: VOMS:/cms/Higgs GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production GlueCEStateWaitingJobs: 1251 (2) dn:GlueVOViewLocalID=atlas,GlueCEUniqueID=hostname:2119/jobmanager-pbs-qshort,mds-vo-name=local,o=grid GlueCEAccessControlBaseRule: VO:atlas GlueCEStateWaitingJobs: 121 (3) dn:GlueVOViewLocalID=atlas_role_production,GlueCEUniqueID=hostname:2119/jobmanager-pbs-qshort,mds-vo-name=local,o=grid GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production GlueCEStateWaitingJobs: 151 LCG-RB: understands only 1) you have a problem if the CE serves more than one VO …) gLite WMS: (2) and (3) are used depending on the user VOMS /atlas/Role=NULL matches (2) (generic ATLAS user) /atlas/Role=software matches (2) (a specific view does not exist, and still also ATLAS user) /atlas/Role=production matches both(2) and (3) (specific view and ATLAS user) Simone Campana - Grid Deployment Board - 06 June 2007
In the Information System, for ce01.ific.uv.es: (2) GlueVOViewLocalID: atlas GlueCEAccessControlBaseRule: VO:atlas GlueCEStateWaitingJobs: 0 (3) GlueVOViewLocalID: /atlas/Role=production GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production GlueCEStateWaitingJobs: 1000 Result of the list-match (Rank = -other.GlueCEStateWaitingJobs;): *CEId* *Rank (3) - ce01.ific.uv.es:2119/jobmanager-pbs-short -1000 (2) - ce01.ific.uv.es:2119/jobmanager-pbs-short 0 A VOMS with /atlas/Role=production matches both (2) and (3) The WMS matches (2) but LCMAPS will map the FQAN to the ACBR VOMS:/atlas/Role=production (3) Simone Campana - Grid Deployment Board - 06 June 2007
Short Term Solution Make the VOViews mutually exclusive • on a CE not more than one VOView can match a FQAN • Introduce a DENY ruleon the Information system: GlueVOViewLocalID: atlas GlueCEAccessControlBaseRule: DENY:/atlas/Role=production GlueCEAccessControlBaseRule: VO:atlas [...] GlueVOViewLocalID: /atlas/Role=production GlueCEAccessControlBaseRule: VOMS:/atlas/Role=production Requires simple changes in the WMS Requires careful publication of the rule on in the GIP Simone Campana - Grid Deployment Board - 06 June 2007
Current status Short term solution: • WMS change done, but currently only in “experimental Services” • Need to get the WMS certified • Latest YAIM version configuring DENY tags has problems • New version being released with a quick fix. • In other words the short term solution is not there yet. Situation at the sites • All T1s deploy the FQAN VOViews • Since 6 weeks at least. • Probably too aggressive approach (especially since the mechanism is broken…) • Several T2s now deploy the FQAN VOViews • Despite ATLAS asked explicitly NOT to! • Comes automatically if you run the latest YAIM out of the box. • Several sites have lots of problems with local configuration • Mapping of right unix account with VOMS FQAN • Still not clear if requested shares are really implemented • Very hard to test Simone Campana - Grid Deployment Board - 06 June 2007
Future Plans Hopefully, all issues in previous slides will be sorted out soon. How much this scales in the long timescale? • Including new groups and roles will be quite some work • Both at the level of local farms and Information System • And this has been proved to be error prone • Changing shares between different groups/roles of the same VO does not seem trivial either • Even small configuration problems cause massive problems in experiments production frameworks. Simone Campana - Grid Deployment Board - 06 June 2007