Clusters, SubClusters and Queues: A Spotter's Guide
Chris Brew, HepSysMan, 06/11/2008
Current Default Setup
• YAIM sets up by default (sketched below):
  • One Cluster (the batch system)
  • One SubCluster (the set of WNs)
  • Multiple CEs (queues) pointing to that SubCluster
• Falls down with:
  • Non-identical worker nodes
  • Multiple CE nodes attached to the same batch system
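As a rough illustration (not from the talk; hostnames, queue names and the DN suffix are made up), the default publishing looks something like the Glue 1.x LDIF below: one GlueCluster, one GlueSubCluster describing all the worker nodes, and one GlueCE per queue, every entry keyed to the same cluster ID. Real entries carry many more attributes and object classes.

  # One Cluster per batch system
  dn: GlueClusterUniqueID=ce.example.org,mds-vo-name=resource,o=grid
  objectClass: GlueCluster
  GlueClusterUniqueID: ce.example.org

  # One SubCluster, which has to describe every worker node
  dn: GlueSubClusterUniqueID=ce.example.org,GlueClusterUniqueID=ce.example.org,mds-vo-name=resource,o=grid
  objectClass: GlueSubCluster
  GlueSubClusterUniqueID: ce.example.org
  GlueChunkKey: GlueClusterUniqueID=ce.example.org

  # One GlueCE per queue, all pointing back at the same Cluster
  dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-lcgpbs-long,mds-vo-name=resource,o=grid
  objectClass: GlueCE
  GlueCEUniqueID: ce.example.org:2119/jobmanager-lcgpbs-long
  GlueForeignKey: GlueClusterUniqueID=ce.example.org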
The way it's supposed to be
[Diagram: one CE node publishes a single Cluster containing three SubClusters, one per worker-node type (Type 1, 2 and 3); each queue (Q1-Q5) points at the SubCluster matching the nodes it runs on, and software Tags hang off the Cluster/SubClusters.]
The way it usually is
[Diagram: the same mix of Type 1, 2 and 3 worker nodes, but the CE node publishes one Cluster with a single SubCluster and one set of Tags; all the queues (Q1-Q5) point at that one SubCluster.]
How bad it can be
[Diagram: two CE nodes in front of the same mixed worker nodes; each publishes its own Cluster, SubCluster, Tags and queues (Q1-Q4), so the shared farm is described twice.]
Problem of Non-Identical Worker Nodes
• The default setup assumes that all worker nodes are identical
• Obviously not the case at most sites
• The SubCluster has to publish the lowest-spec WN (see the fragment below)
• Leads to:
  • Small-memory jobs wasting large-memory nodes
  • Inability to publish the existence of large-memory nodes
  • Differing CPU specs leading to inaccurate timing and accounting (CPU scaling helps here)
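To make the lowest-spec point concrete, a hedged fragment of the SubCluster attributes involved (Glue 1.x attribute names, invented values): with a mix of 1 GB and 2 GB nodes the single SubCluster can only honestly advertise the smaller memory figure, and likewise the slowest benchmark.

  # Mixed 1 GB / 2 GB farm: only the lowest values can be published
  GlueHostMainMemoryRAMSize: 1000
  GlueHostBenchmarkSI00: 1500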
Problem of Multiple CE Nodes
• Sites want to add multiple CE nodes for scaling and redundancy
• Should just add CEs (queue endpoints)
• Currently duplicates the Clusters and SubClusters (sketched below)
• Causes problems in CPU counting (gStat, GridMap, accounting reports, etc.)
• Various hacks to try to help with this
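A hedged sketch of the duplication (hostnames and CPU counts invented): each CE node publishes its own Cluster and SubCluster describing the same farm, so anything that sums CPUs over SubClusters counts the farm twice.

  # CE node 1 describes the farm ...
  dn: GlueSubClusterUniqueID=ce1.example.org,GlueClusterUniqueID=ce1.example.org,mds-vo-name=resource,o=grid
  GlueSubClusterLogicalCPUs: 400

  # ... and CE node 2 describes the very same farm again
  dn: GlueSubClusterUniqueID=ce2.example.org,GlueClusterUniqueID=ce2.example.org,mds-vo-name=resource,o=grid
  GlueSubClusterLogicalCPUs: 400

  # A naive sum over SubClusters sees 800 CPUs at a 400-CPU site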
Current Hacks
• Can already set up multiple Clusters and SubClusters to advertise different memory queues
• See the publishing for RAL-LCG2 and UKI-SOUTHGRID-RALPP
• Involves hand-crafted ldif files to set up the (Sub)Clusters and map queues to them (a sketch follows)
• Cannot let YAIM near them
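A hedged sketch of what such a hand-crafted file might contain (hostnames and the 2000 MB figure are illustrative, not the actual RAL files): an extra Cluster/SubCluster pair that advertises only the large-memory nodes, ready for the matching queue to be keyed to it.

  # Extra Cluster/SubCluster advertising only the 2 GB nodes
  dn: GlueClusterUniqueID=2000MB.ce.example.org,mds-vo-name=resource,o=grid
  objectClass: GlueCluster
  GlueClusterUniqueID: 2000MB.ce.example.org

  dn: GlueSubClusterUniqueID=2000MB.ce.example.org,GlueClusterUniqueID=2000MB.ce.example.org,mds-vo-name=resource,o=grid
  objectClass: GlueSubCluster
  GlueSubClusterUniqueID: 2000MB.ce.example.org
  GlueHostMainMemoryRAMSize: 2000
  GlueChunkKey: GlueClusterUniqueID=2000MB.ce.example.org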
Traylen Proposal
• Move the (Sub)Cluster publishing from the CE node to a new node type (a sketch of the split follows)
• Would probably share a node with the site-bdii
• The CE node GIP will associate queues to SubClusters
• Software Tags are currently associated with the CE node, not the (Sub)Cluster; they'll be fixed and published through the new node type
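Under the proposal the publishing might split roughly as below (a sketch under the assumption that the new node simply takes over the Cluster, SubCluster and Tags entries; hostnames and the tag value are made up): the new node publishes the Cluster, the SubClusters and the software Tags, while each CE node publishes only its GlueCE entries, keyed to the shared cluster ID.

  # Published by the new cluster-publishing node (possibly co-hosted with the site-bdii)
  dn: GlueClusterUniqueID=cluster.example.org,mds-vo-name=resource,o=grid
  objectClass: GlueCluster
  GlueClusterUniqueID: cluster.example.org

  dn: GlueSubClusterUniqueID=bigmem.example.org,GlueClusterUniqueID=cluster.example.org,mds-vo-name=resource,o=grid
  objectClass: GlueSubCluster
  GlueSubClusterUniqueID: bigmem.example.org
  GlueHostApplicationSoftwareRunTimeEnvironment: VO-example-software-1.0
  GlueChunkKey: GlueClusterUniqueID=cluster.example.org

  # Published by each CE node: just its queues, all keyed to the shared Cluster
  dn: GlueCEUniqueID=ce1.example.org:2119/jobmanager-lcgpbs-long,mds-vo-name=resource,o=grid
  objectClass: GlueCE
  GlueForeignKey: GlueClusterUniqueID=cluster.example.org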
How it may be
[Diagram: a single Glite-Cluster node publishes the Cluster, the three per-type SubClusters and their Tags; several CE nodes each publish only their queues (Q1-Q5), all pointing at the shared SubClusters for the Type 1, 2 and 3 worker nodes.]
Our Experience
• We've put in hand-crafted ldif files to define 500MB, 1000MB and 2000MB SubClusters
• grid[500|1000|2000] queues point at them on both CE nodes
• Technically it works: jobs with higher memory requirements only match the high-memory queues (see the sketch below)
• In practice it makes no difference: almost no jobs include memory requirements
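For concreteness, a hedged sketch of the queue mapping (hostnames illustrative): the grid2000 CE is keyed to the hand-crafted 2000 MB Cluster from the earlier sketch rather than the default one. Jobs only benefit if their JDL carries a matching requirement, something along the lines of Requirements = other.GlueHostMainMemoryRAMSize >= 2000, which is exactly what almost none of them include.

  # grid2000 queue published against the hand-crafted 2 GB Cluster
  dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-lcgpbs-grid2000,mds-vo-name=resource,o=grid
  objectClass: GlueCE
  GlueCEUniqueID: ce.example.org:2119/jobmanager-lcgpbs-grid2000
  GlueForeignKey: GlueClusterUniqueID=2000MB.ce.example.org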
Conclusion
• You're probably not doing it right at the moment
• But the fix is probably worse
• You can add hacks to publish more info about the batch system
• But it probably won't make any difference
• Things are likely to change (for the better) in the near future
• Wait until then