280 likes | 453 Views
OSG Information Services, VO Monitoring Services and Resource Selection Services. OSG Information Services Architecture The VO Resource Service (VORS) The OSG Resource Selection Service (ReSS) ClassAd Matchmaking How these affect the Sites How these affect the User.
E N D
OSG Information Services, VO Monitoring Services and Resource Selection Services • OSG Information Services Architecture • The VO Resource Service (VORS) • The OSG Resource Selection Service (ReSS) • ClassAd Matchmaking • How these affect the Sites • How these affect the User Gabriele Garzoglio, Chris Green, Computing Division, Fermilab Rob Quick, Indiana University OSG User Meeting & OSG Site Administrators Meeting July 2007 Gabriele Garzoglio, Rob Quick, Chris Green
Context • The OSG Information Services have 4 goals: • Provide static and “real-time” (where real-time is still evolving) information about Resource configurations and state. • Feed OSG-wide monitoring tools and provide interfaces to this information for Grid operations, VOs and Users. • Provide information for interoperation of OSG and EGEE for LHC Experiments and WLCG operations. • Provide information for resource selection by OSG VOs and Users. Gabriele Garzoglio, Rob Quick, Chris Green
Please ask Questions During this talkWe are looking for input, feedback and guidance. Gabriele Garzoglio, Rob Quick, Chris Green
VO / Grid Interface LDIF WLCG BDII Info Coll. Info Collection Info Display VORS Info Collection Job / Res. Match ReSS BDII Info Collection Grid / Site Interface Classad Classad Info Formatting Info Publishing Site Info Publisher (CEMon) VORS Probes Generic Info Providers (GIP) Info Gathering Info Gathering LDIF LDIF Static Info (LDIF) Info Providers Config … Configuration OSG IS Architecture EGEE Resource Broker (RB) Condor Schedd Condor Matchmaker VO Job Queue Job/Res. Match Job Queue Job/Res. Match Grid Instantiate… Site Gabriele Garzoglio, Rob Quick, Chris Green
VO / Grid Interface LDIF WLCG BDII Info Coll. Info Collection Job / Res. Match ReSS BDII Info Collection Grid / Site Interface Classad Classad Info Formatting Info Publishing Site Info Publisher (CEMon) Generic Info Providers (GIP) Info Gathering LDIF LDIF Static Info (LDIF) Info Providers Config … Configuration VORS in OSG IS EGEE Resource Broker (RB) Condor Schedd Condor Matchmaker VO Job Queue Job/Res. Match Job Queue Job/Res. Match Info Collection Info Display Grid VORS Instantiate… Site VORS Probes Info Gathering Gabriele Garzoglio, Rob Quick, Chris Green
What VORS does for you… • Allows VO users to pick which sites support their VO • Provides critical site info to a VO user • Gives users a snapshot of current grid and site status • Will provide a facility for users to look at other Grids from an OSG PO Gabriele Garzoglio, Rob Quick, Chris Green
VO / Grid Interface LDIF WLCG BDII Info Coll. Info Collection Info Display VORS BDII Info Collection Grid / Site Interface Classad Classad VORS Probes Generic Info Providers (GIP) Info Gathering Info Gathering LDIF LDIF Static Info (LDIF) Info Providers Config … Configuration ReSS in OSG IS EGEE Resource Broker (RB) Condor Schedd Condor Matchmaker VO Job Queue Job/Res. Match Job Queue Job/Res. Match Info Collection Job / Res. Match Grid ReSS Info Formatting Info Publishing Site Info Publisher (CEMon) Instantiate… Site Gabriele Garzoglio, Rob Quick, Chris Green
ReSS Motivations • Implement a light-weight cluster selector for push-based job handling services • Enable users to express requirements on the resources in the job description • Enable users to refer to abstract characteristics of the resources in the job description • Provide soft-registration for clusters • Use the standard characterizations of the resources via the Glue Schema Gabriele Garzoglio, Rob Quick, Chris Green
ReSS Technology • ReSS basis its central services on the Condor Match-making service • Users of Condor-G naturally integrate their scheduler servers with ReSS • Condor information collector manages resource soft registration • Resource characteristics is handled at sites by the EGEE gLite CE Monitor Service (CEMon) • CEmon registers with the central ReSS services at startup • Info is gathered by CEMon at sites running Generic Information Prividers (GIP) • GIP expresses resource information via the Glue Schema model • CEMon converts the information from GIP into old classad format. Other supported formats: XML, LDIF, new classad • CEMon publishes information using web services interfaces Gabriele Garzoglio, Rob Quick, Chris Green
VO / Grid Interface LDIF WLCG BDII Info Coll. Info Collection Info Display VORS BDII Info Collection Grid / Site Interface Classad Classad VORS Probes Generic Info Providers (GIP) Info Gathering Info Gathering LDIF LDIF Static Info (LDIF) Info Providers Config … Configuration A case study:VO Schedd to interact with ReSS EGEE Resource Broker (RB) Condor Schedd Condor Matchmaker VO Job Queue Job/Res. Match Job Queue Job/Res. Match Info Collection Job / Res. Match Grid ReSS Info Formatting Info Publishing Site Info Publisher (CEMon) Instantiate… Site Gabriele Garzoglio, Rob Quick, Chris Green
VO / Grid Interface job job What Gate? classads Gate 3 classads classads classads Grid / Site Interface Gate2 Gate1 Gate3 CEMon CEMon CEMon jobs jobs jobs info info info CE CE CE GIP GIP GIP job-managers job-managers job-managers job-managers job-managers job-managers job-managers job-managers job-managers CLUSTER CLUSTER CLUSTER VO Condor-Schedd interacts with ReSS • Info Gatherer is the Interface Adapter between CEMon and Condor ReSS Info Gatherer Condor Match Maker Condor Scheduler VO Grid Site Gabriele Garzoglio, Rob Quick, Chris Green
User Interacts with Schedd and ReSS Abstract Resource Characteristic universe = globus globusscheduler = $$(GlueCEInfoContactString) requirements = TARGET.GlueCEAccessControlBaseRule == "VO:DZero" executable = /bin/hostname arguments = -f queue MyType = "Machine" Name = "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf-dzero.-1194963282" Requirements = (CurMatches < 10) ReSSVersion = "1.0.6" TargetType = "Job" GlueSiteName = "TTU-ANTAEUS" GlueSiteUniqueID = "antaeus.hpcc.ttu.edu" GlueCEName = "dzero" GlueCEUniqueID = "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf-dzero" GlueCEInfoContactString = "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf" GlueCEAccessControlBaseRule = "VO:dzero" GlueCEHostingCluster = "antaeus.hpcc.ttu.edu" GlueCEInfoApplicationDir = "/mnt/lustre/antaeus/apps GlueCEInfoDataDir = "/mnt/hep/osg" GlueCEInfoDefaultSE = "sigmorgh.hpcc.ttu.edu" GlueCEInfoLRMSType = "lsf" GlueCEPolicyMaxCPUTime = 6000 GlueCEStateStatus = "Production" GlueCEStateFreeCPUs = 0 GlueCEStateRunningJobs = 0 GlueCEStateTotalJobs = 0 GlueCEStateWaitingJobs = 0 GlueClusterName = "antaeus.hpcc.ttu.edu" GlueSubClusterWNTmpDir = "/tmp" GlueHostApplicationSoftwareRunTimeEnvironment = "MountPoints,VO-cms-CMSSW_1_2_3" GlueHostMainMemoryRAMSize = 512 GlueHostNetworkAdapterInboundIP = FALSE GlueHostNetworkAdapterOutboundIP = TRUE GlueHostOperatingSystemName = "CentOS" GlueHostProcessorClockSpeed = 1000 GlueSchemaVersionMajor = 1 … Resource Requirements Job Description Resource Description Gabriele Garzoglio, Rob Quick, Chris Green
Does this sound like something you need to do ? (Users) • Does this sound reasonable to you? (Site Admins) Gabriele Garzoglio, Rob Quick, Chris Green
ReSS Deployment on OSG Click here for live URL Gabriele Garzoglio, Rob Quick, Chris Green
Status of ReSS • ReSS is a lightweight Resource Selection Service for push-based job handling systems • ReSS is deployed on OSG 0.6.0 as a general service: talk to us if you are interested! • DZero and Engagement VO use ReSS on OSG • ReSS is used by FermiGrid for campus-wide resource selection • More info at https://twiki.grid.iu.edu/twiki/bin/view/ResourceSelection/ Gabriele Garzoglio, Rob Quick, Chris Green
What Sites Need to Do • Configure GIPs correctly so show Green on GIP monitor http://gip-validate.grid.iu.edu/production/index.html • Make sure VORS reports correct info for your site http://vors.grid.iu.edu/cgi-bin/index.cgi • Make sure CEMon reports info from your site http://home.fnal.gov/~garzogli/ReSS/ReSS-prd-History.html • Ask for help from goc@opensciencegrid.org if you have any questions or problems! Gabriele Garzoglio, Rob Quick, Chris Green
What VOs and Users need to do • Understand parameters needed to select resource where your applications can run • Interface the Information services to your application • AND/OR • use one of the OSG provided resource selectors (details in hidden slides). Gabriele Garzoglio, Rob Quick, Chris Green
Conclusions • OSG Information Services exist and are used in patches but the information provided is not yet complete nor uniform. • We need the Sites to pay attention to the information content and configurations. • We support Users who want to use any or all of the tools. • OSG has a focus on Usability and Robustness over the next 12 months Gabriele Garzoglio, Rob Quick, Chris Green
Additional Slides for More Detailed Information Gabriele Garzoglio, Rob Quick, Chris Green
User Interaction with ReSS • The ReSS exposes information via condor collector interfaces • Programmatically: • via a Web Service interface • Command line, via condor_status • Examples: https://twiki.grid.iu.edu/twiki/bin/view/ResourceSelection/ReSSUserInterfaceTools • The Engagement VO gets OSG info from ReSS and does match making via a VO Match Making Service: https://twiki.grid.iu.edu/twiki/bin/view/ResourceSelection/ReSSForEngagementVO • Condor scheduler interaction with ReSS • See how to connect a scheduler directly to the OSG ReSS (à la DZero): https://twiki.grid.iu.edu/twiki/bin/view/ResourceSelection/SystemDeployment • See how FermiGrid uses ReSS for campus-wide resource selection: http://fermigrid.fnal.gov/matchmaking.html • Glue Schema Attributes definition: http://fermigrid.fnal.gov/attributes.html • FermiGrid classads: http://fermigrid.fnal.gov/classads.html Gabriele Garzoglio, Rob Quick, Chris Green
Glue Schema to old classad Mapping SubCluster1 SubCluster2 Cluster Site VO1 CE1 VO2 VO2 CE2 VO3 Mapping the Glue Schema “tree” into a set of “flat” classads: all possible combination of (Cluster, Subcluster, CE, VO) … Gabriele Garzoglio, Rob Quick, Chris Green
classad Site Cluster SubCluster1 CE1 VO1 Glue Schema to old classad Mapping SubCluster1 SubCluster2 Cluster Site VO1 CE1 VO2 VO2 CE2 VO3 Mapping the Glue Schema “tree” into a set of “flat” classads: all possible combination of (Cluster, Subcluster, CE, VO) … Gabriele Garzoglio, Rob Quick, Chris Green
classad classad Site Site Cluster Cluster SubCluster1 SubCluster2 CE1 CE1 VO1 VO1 Glue Schema to old classad Mapping SubCluster1 SubCluster2 Cluster Site VO1 CE1 VO2 VO2 CE2 VO3 Mapping the Glue Schema “tree” into a set of “flat” classads: All possible combination of (Cluster, Subcluster, CE, VO) … Gabriele Garzoglio, Rob Quick, Chris Green
classad classad Site Site Site Cluster Cluster Cluster SubCluster1 SubCluster1 SubCluster2 classad CE1 CE1 CE1 VO2 VO1 VO1 Glue Schema to old classad Mapping SubCluster1 SubCluster2 Cluster Site VO1 CE1 VO2 VO2 CE2 VO3 Mapping the Glue Schema “tree” into a set of “flat” classads: All possible combination of (Cluster, Subcluster, CE, VO) … Gabriele Garzoglio, Rob Quick, Chris Green
classad classad Site Site Site Site Site Site Cluster Cluster Cluster Cluster Cluster Cluster SubCluster1 SubCluster1 SubCluster1 SubCluster2 SubCluster2 SubCluster2 CE2 CE2 CE1 CE1 CE1 CE1 VO1 VO2 VO1 VO1 VO2 VO1 classad classad classad classad Glue Schema to old classad Mapping SubCluster1 SubCluster2 Cluster Site VO1 CE1 VO2 VO2 CE2 VO3 Mapping the Glue Schema “tree” into a set of “flat” classads: All possible combination of (Cluster, Subcluster, CE, VO) … Gabriele Garzoglio, Rob Quick, Chris Green