
Resource Selection in OSG & SAM-On-The-Fly

Parag Mhashilkar, Fermi National Accelerator Laboratory. Condor Week 2006, April 25, 2006.


Presentation Transcript


  1. Resource Selection in OSG & SAM-On-The-Fly. Parag Mhashilkar, Fermi National Accelerator Laboratory. Condor Week 2006, April 25, 2006.

  2. Resource Selection in OSG: Overview
     • Why Resource Selection Service?
     • Resource Selection Service in OSG
     • Collaborators Involved
     • Resource Selection Service Architecture
     • Current Status
     • Future Work

  3. Why Resource Selection Service?
     [Diagram: Job, Resource Selection Service, Resources 1, 2, 3, ..., N]
     • A job can have special requirements. Example: disk > 1 GB, memory > 256 MB.
     • Resources can provide special services. Example: disk > 5 GB, memory > 512 MB, software toolkit-X installed, etc.
     • Without a resource selection service, the user has to keep track of the availability of every resource that can run the job.
     • A resource selection service can
        • Gather information about the job and the resources and decide where the job should run.
        • Dereference abstract attributes to bind to the job at match-making or execution time.
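The matching idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the ReSS or Condor implementation: plain dictionaries stand in for classads, and the names `select_resource`, `disk_gb`, `memory_mb`, `software`, and the three clusters are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Ad = Dict[str, object]  # flat attribute map standing in for a classad


def select_resource(
    requirements: Callable[[Ad], bool],
    rank: Callable[[Ad], float],
    resources: List[Ad],
) -> Optional[Ad]:
    """Return the highest-ranked resource that satisfies the job, or None."""
    candidates = [ad for ad in resources if requirements(ad)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical resource descriptions, loosely following the slide's examples.
resources: List[Ad] = [
    {"name": "cluster-a", "disk_gb": 0.5, "memory_mb": 1024, "software": set()},
    {"name": "cluster-b", "disk_gb": 5.0, "memory_mb": 512, "software": {"toolkit-X"}},
    {"name": "cluster-c", "disk_gb": 20.0, "memory_mb": 2048, "software": set()},
]

# Job 1: disk > 1 GB and memory > 256 MB, preferring the largest disk.
plain_job = select_resource(
    requirements=lambda ad: ad["disk_gb"] > 1 and ad["memory_mb"] > 256,
    rank=lambda ad: ad["disk_gb"],
    resources=resources,
)

# Job 2: additionally needs toolkit-X installed.
toolkit_job = select_resource(
    requirements=lambda ad: ad["disk_gb"] > 1 and "toolkit-X" in ad["software"],
    rank=lambda ad: ad["memory_mb"],
    resources=resources,
)

# Job 3: asks for more disk than any resource offers, so nothing matches.
unmatched_job = select_resource(
    requirements=lambda ad: ad["disk_gb"] > 100,
    rank=lambda ad: 0.0,
    resources=resources,
)
```

The point of the sketch is the division of labour: the job states what it needs and how it ranks candidates, and the service, not the user, scans the resource descriptions.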

  4. Resource Selection Service (ReSS) in OSG
     • The Resource Selector is a component of the OSG Job Management Infrastructure.
     • Sponsored by PPDG, the project started in Sep 2005 with the aim of developing and deploying a Resource Selection Service that VOs with job management requirements similar to DZero's can use. Requirements that ReSS should support:
        • A community of 100 users submitting jobs to 10 job schedulers.
        • 10,000 jobs per day, with bursts of 2,000 per hour.
        • 100 clusters.
        • Job and resource descriptions in classad format with 200 attributes and 5Kb of information.
     • With ReSS
        • Emphasis is on supporting several Virtual Organizations (VOs) based on policies.
        • VOs can tag resources that are certified to run their jobs, making resource selection more manageable.
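To put the job-rate requirements on this slide in perspective, the short calculation below converts them to common units. It uses only the two figures quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide.
JOBS_PER_DAY = 10_000
BURST_JOBS_PER_HOUR = 2_000

SECONDS_PER_HOUR = 3_600
HOURS_PER_DAY = 24

# Average load implied by the daily total.
avg_jobs_per_hour = JOBS_PER_DAY / HOURS_PER_DAY
avg_jobs_per_second = avg_jobs_per_hour / SECONDS_PER_HOUR

# Burst load, and how it compares with the average.
burst_jobs_per_second = BURST_JOBS_PER_HOUR / SECONDS_PER_HOUR
burst_to_average_ratio = BURST_JOBS_PER_HOUR / avg_jobs_per_hour

# Hours of sustained bursting that would account for the whole daily total.
burst_hours_for_daily_total = JOBS_PER_DAY / BURST_JOBS_PER_HOUR

print(f"average: {avg_jobs_per_hour:.1f} jobs/hour ({avg_jobs_per_second:.3f} jobs/s)")
print(f"burst:   {BURST_JOBS_PER_HOUR} jobs/hour ({burst_jobs_per_second:.3f} jobs/s)")
print(f"burst is {burst_to_average_ratio:.1f}x the average rate")
```

The burst rate is roughly five times the daily average, and five hours of bursting alone would reach the 10,000 jobs per day target.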

  5. Collaborators Involved
     • VOs
     • DZero
     • Atlas
     • LIGO
     • FermiGrid
     • Fermilab
     • OSG TG-MIG group
     • CEMon group from INFN
     • Condor group from UW Madison
     • GLUE group from INFN

  6. Resource Selection Service Architecture
     [Architecture diagram. Labels: job, "What Gate?", "Gate 3", Info Gatherer, Condor Match Maker, Condor Scheduler, classads; Gate1, Gate2 and Gate3, each with CEMon, CE, GIP, jobs, info, job-managers and a CLUSTER.]

  7. Architecture …
     • The Generic Information Provider (GIP) describes resources in LDIF format using the GLUE Schema.
     • CEMon provides a flexible plug-in mechanism to translate classads.
     • Information Gatherer (IG)
        • Subscribes to several CEMons to gather information about the CEs and advertises it to several Condor pools.
        • Acts as an adapter between the CEMons and the Condor matchmaker.
     • Support for callouts to external match-making functions. These functions can make match-making more extensible.
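The translation step between LDIF-style resource descriptions and classad-style attributes can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the CEMon plug-in or the Information Gatherer code: the functions `parse_ldif_entry` and `to_classad_text` are hypothetical, the host name is made up, and the parser ignores LDIF features such as multi-valued attributes, line continuations, and base64 (`::`) values.

```python
from typing import Dict, Union

Value = Union[int, str]

# Made-up sample entry; the attribute names follow GLUE 1.x naming.
SAMPLE_LDIF = """\
dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-condor,mds-vo-name=local
GlueCEUniqueID: ce.example.org:2119/jobmanager-condor
GlueCEStateFreeCPUs: 42
GlueCEInfoTotalCPUs: 128
"""


def parse_ldif_entry(text: str) -> Dict[str, Value]:
    """Parse one simple 'key: value' LDIF entry into a flat attribute map."""
    ad: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # Later occurrences of a key overwrite earlier ones in this sketch.
        ad[key.strip()] = int(value) if value.isdigit() else value
    return ad


def to_classad_text(ad: Dict[str, Value]) -> str:
    """Render the attribute map as 'Name = value' lines, quoting strings."""
    lines = []
    for key, value in sorted(ad.items()):
        if isinstance(value, int):
            rendered = str(value)
        else:
            rendered = '"' + value.replace('"', '\\"') + '"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)


resource_ad = parse_ldif_entry(SAMPLE_LDIF)
classad_text = to_classad_text(resource_ad)
print(classad_text)
```

In the architecture described on the slide, the resulting attributes are what the matchmaker evaluates job requirements against; the sketch stops at producing the text and does not advertise anything to a Condor pool.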

  8. Current Status
     • The first release of ReSS is scheduled to be included in OSG ITB-0.5.0.
     • Focus on testing the functionality, scalability, and stress behavior of the Information Gatherer.
        • Validate classads from different sites so they can be used for common resource selection criteria.
        • Study the scalability and investigate how the IG handles O(10) CEMon registrations and the processing and transfer of O(100) classads to the condor_collector.
        • Stress test the IG. Simulate the load of the production environment by increasing tenfold the frequency of classad publication by the O(10) CEMons.
     • Stress test the match-making infrastructure by submitting O(1) job/sec for 1 hour. In particular, and push the limits …..
     • Evaluate the efficiency of the condor_negotiator when using call-outs to external code for match-making.

  9. Future Work
     • Work on deployment procedures for OSG production in the context of VDT.
     • Work with other VOs whose requirements are similar to those mentioned earlier, and extend ReSS support to them.
     • Improve the scalability of ReSS beyond the RunII experiments.
     • Have end-to-end Samgrid-OSG integration by OSG 0.6.0.

  10. SAM-on-the-fly: Overview
     • What is SAM?
     • Why SAM-on-the-fly?
     • Addressing the Challenges
     • Current Status

  11. What is SAM?
     • Samgrid consists of
        • Job Management (JIM)
        • Data Management (SAM)
     • SAM stands for 'Sequential Access via Metadata'.
     • The project was started in 1997 by DZero.
     • SAM is organized around the concept of a dataset (a catalog of file metadata).
     • Experiments: DZero, CDF, MINOS

  12. Why SAM-on-the-fly?
     • Sites have resources that are available for a longer duration. For example, a cluster at UW has 1 TB of disk for DZero users for the next 2 months.
     • SAM-on-the-fly tries to address the issue of making such resources available to users dynamically.
     • Before DZero users can use this resource, there is a need to
        • Deploy and configure SAM services such as
           • Station (collection of resources controlled by the SAM system)
           • Stager (service to handle staging of files on disk used by SAM)
           • FSS (service to interface with the FS)
           • File transfer services like gridftp, sam_fcp, etc.
        • Register SAM services with the central SAM DB.
        • Start and stop SAM station services.
        • Do the cleanup when the lease period expires.
        • Set up firewall and security configurations.

  13. Addressing the Challenges
     [Diagram labels: Job, Resource]
     • Deploy and configure SAM
     • Register SAM services with the SAM system
     • Start SAM services for the duration of the lease
     • When the lease expires, stop SAM
     • Do the cleanup
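The lease lifecycle on this slide maps naturally onto a setup/teardown pattern. The sketch below uses a Python context manager to show the ordering only; it is not the SAM-on-the-fly tooling. The function `sam_lease`, the site name, and the log entries are hypothetical, and each `log.append` stands in for a real deployment, registration, start, stop, or cleanup action.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def sam_lease(site: str, log: List[str]) -> Iterator[None]:
    """Bring SAM up for the duration of a lease and always tear it down."""
    # Setup steps, in the order given on the slide (placeholders only).
    log.append(f"deploy:{site}")
    log.append(f"register:{site}")
    log.append(f"start:{site}")
    try:
        # The lease period: jobs use the resource while this block is active.
        yield
    finally:
        # Teardown runs when the lease ends, even if a job failed.
        log.append(f"stop:{site}")
        log.append(f"cleanup:{site}")


events: List[str] = []
try:
    with sam_lease("uw-cluster", events):
        events.append("run-jobs")
        raise RuntimeError("simulated job failure during the lease")
except RuntimeError:
    pass

print(events)
```

Putting stop and cleanup in the `finally` branch reflects the requirement that the resource is released when the lease expires regardless of what happened during it.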

  14. Current Status
     • Automated the product deployment steps.
     • Semi-automated the SAM services registration steps.
     • Automated starting and stopping of SAM services.
     • This project is a work in progress.
     • People:
        • Fermi National Accelerator Laboratory
        • University of Wisconsin Madison: Alain Roy and Hidayat Teonadi.

  15. References
     • Resource Selection Service for OSG
        • http://www.opensciencegrid.org
        • http://osg.ivdgl.org/twiki/bin/view/ResourceSelection/WebHome
     • SAM
        • http://projects.fnal.gov/samgrid
     Thanks to Miron and the Condor Group for all the support! Questions?
