Use of Condor on the Open Science Grid
Chris Green, OSG User Group / FNAL
Condor Week, April 30 2008
Links
• OSG home page.
• VORS resource map and information.
• VDT (Virtual Data Toolkit) home page.
• Current use of OSG.

What is OSG?
• Collection of mostly US-based scientific / academic sites sharing computing and storage resources via common software stack.
• Job submission and management based around Globus / CondorG.
• "Virtual Organizations" (VOs): trust point for authorization; role-based personalities.
• Works with multiple underlying batch systems (Condor, PBS family, LSF, SGE).
OSG facts and figures
• 83 registered computing resources.
• 30 registered VOs.
• Usage breakdown for 2008/04/19 – 2008/04/25: [chart]
Survey of Condor use on OSG
• Out of the box:
  • CondorG for inter-site job transfer via Globus/GRAM: GT2 submissions via CondorG are still (by far) the most common method of grid job submission on OSG.
  • Task scheduling for site health monitoring.
  • One of several batch systems supported on OSG.
  • "ManagedFork" job management.
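As a rough illustration of the GT2-via-CondorG route listed above, the following Python sketch renders a minimal Condor-G submit description. The helper name `render_gt2_submit`, the gatekeeper host `gatekeeper.example.edu`, the `jobmanager-condor` suffix, and the output file names are placeholders chosen for this example, not values from the talk.

```python
def render_gt2_submit(gatekeeper, jobmanager, executable, arguments=""):
    """Return the text of a minimal Condor-G submit description (GT2/GRAM).

    All argument values are supplied by the caller; nothing here is
    specific to any real OSG site.
    """
    lines = [
        "universe      = grid",
        # grid_resource names the grid type (gt2) and the remote GRAM contact.
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable    = {executable}",
    ]
    if arguments:
        lines.append(f"arguments     = {arguments}")
    lines += [
        "output        = job.out",
        "error         = job.err",
        "log           = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper and jobmanager, for illustration only.
    print(render_gt2_submit("gatekeeper.example.edu", "jobmanager-condor", "/bin/hostname"))
```

The resulting text would typically be saved to a file and handed to `condor_submit` on a machine with a valid grid proxy.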
Survey of Condor use on OSG
• External projects:
  • Glidein/WMS: "pilot" job submission and management.
  • FermiGrid: job forwarding, "campus grid" management.
  • OSGMM / ReSS: job forwarding and attribute-based matchmaking across multiple OSG sites.
  • "condorview": enhanced job monitoring and control (not the web-based statistics client of the same name).
  • Complex workflows (e.g. LIGO: Pegasus/DAGMan).
  • Gratia: accounting system leverages features of Condor where available: condor_history, PER_JOB_HISTORY_DIR, DN.
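The attribute-based matchmaking idea behind OSGMM / ReSS can be sketched in a few lines of Python. This is a toy model, not code from either project: the site names, the attributes (`vos`, `free_slots`, `memory_mb`), and the ranking rule are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    attrs: dict = field(default_factory=dict)


def match_sites(job_requirements, sites, rank_key="free_slots"):
    """Return sites whose attributes satisfy every requirement, best rank first.

    job_requirements maps an attribute name to a predicate on its value.
    A site that lacks a required attribute never matches.
    """
    matched = [
        site for site in sites
        if all(
            attr in site.attrs and predicate(site.attrs[attr])
            for attr, predicate in job_requirements.items()
        )
    ]
    return sorted(matched, key=lambda s: s.attrs.get(rank_key, 0), reverse=True)


# Hypothetical site advertisements.
sites = [
    Site("SITE_A", {"vos": {"cdf", "cms"}, "free_slots": 40, "memory_mb": 2048}),
    Site("SITE_B", {"vos": {"ligo"}, "free_slots": 200, "memory_mb": 4096}),
    Site("SITE_C", {"vos": {"cms"}, "free_slots": 75, "memory_mb": 1024}),
]

# Hypothetical job requirements.
requirements = {
    "vos": lambda vos: "cms" in vos,      # site must support the job's VO
    "memory_mb": lambda mb: mb >= 1024,   # and offer enough memory
}

print([s.name for s in match_sites(requirements, sites)])  # ['SITE_C', 'SITE_A']
```

Real matchmaking in Condor is expressed with ClassAd `Requirements` and `Rank` expressions; plain Python predicates stand in for them here.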
More detail: Glidein/WMS
• Workload Management System (Igor Sfiligoi, FNAL) uses Condor glideins: a startd submitted as a grid job (a "pilot") makes remote batch nodes look like local ones.
• Two main components:
  • One or more glidein factories: manage available grid sites and submit pilot jobs.
  • One or more VO frontends: receive payload submissions from users for distribution to sites.
• Pilots receive user payloads as distributed by VO frontends.
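The division of labour between factory, frontend, and pilot described above can be modelled with a small Python toy. It is only a sketch of the pilot idea under simplifying assumptions: the class names, the round-robin `drain` helper, and the site and job names are hypothetical, and nothing here reflects the actual Glidein/WMS code or its protocols.

```python
from collections import deque


class Frontend:
    """Holds user payloads for one VO until a pilot asks for work."""

    def __init__(self, vo):
        self.vo = vo
        self.queue = deque()

    def submit(self, payload):
        self.queue.append(payload)

    def next_payload(self):
        return self.queue.popleft() if self.queue else None


class Pilot:
    """Stands in for a startd that has landed on a remote batch node."""

    def __init__(self, site):
        self.site = site


class Factory:
    """Knows the available grid sites and submits one pilot to each."""

    def __init__(self, sites):
        self.sites = list(sites)

    def submit_pilots(self):
        return [Pilot(site) for site in self.sites]


def drain(frontend, pilots):
    """Let pilots take turns pulling one payload each until the queue is empty."""
    results = []
    if not pilots:
        return results  # no pilots running, so nothing can be executed
    while frontend.queue:
        for pilot in pilots:
            payload = frontend.next_payload()
            if payload is None:
                break
            results.append((pilot.site, payload))
    return results


frontend = Frontend("cdf")
for name in ["job1", "job2", "job3"]:
    frontend.submit(name)

pilots = Factory(["SITE_A", "SITE_B"]).submit_pilots()
print(drain(frontend, pilots))
# [('SITE_A', 'job1'), ('SITE_B', 'job2'), ('SITE_A', 'job3')]
```

The point of the toy is that users only ever talk to the frontend; which site runs a payload is decided late, when a pilot that is already running asks for work.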
More detail: Glidein/WMS
• Uses GCB (Generic Connection Brokering) for firewall / NAT management.
• Intra-VO priority management.
• Works with gLExec, an application running on worker nodes that handles authorization and UID mapping for payloads, giving the site per-user accountability.
• Unaffected by grid site batch manager choice.
• v1.0 released Dec. '07; v1.1 Jan. '08.
• In use by: CDF; MINOS (FNAL); being commissioned for CMS.
More detail: "condorview"
• Michael Thomas, Caltech.
• Graphical tool for browsing and managing a Condor queue.
• Hooks to vacate and kill jobs.
• Hooks to ssh into job directory on worker node and print out process tree.
• Uses condor_q, condor_config_val, and condor_fetchlog.
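To give a feel for the kind of queue data a tool built on condor_q has to work with, here is a minimal Python sketch that parses ClassAd-style `Attr = value` text, the layout printed by `condor_q -long`, and summarises jobs per owner and status. The sample ads are invented, the function names are hypothetical, and this is not how condorview itself is implemented.

```python
from collections import Counter

# Common JobStatus codes in Condor job ads.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}

# Invented sample in the style of `condor_q -long` output.
SAMPLE = """\
ClusterId = 101
ProcId = 0
Owner = "alice"
JobStatus = 2

ClusterId = 102
ProcId = 0
Owner = "bob"
JobStatus = 1

ClusterId = 102
ProcId = 1
Owner = "bob"
JobStatus = 5
"""


def parse_classads(text):
    """Split `Attr = value` text into one dict per job ad (blank-line separated)."""
    ads, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                ads.append(current)
                current = {}
            continue
        key, _, value = line.partition(" = ")
        # Keep values as strings; drop the quotes around string attributes.
        current[key.strip()] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


def status_by_owner(ads):
    """Count jobs per (owner, status name)."""
    return Counter(
        (ad["Owner"], JOB_STATUS.get(int(ad["JobStatus"]), "Other")) for ad in ads
    )


ads = parse_classads(SAMPLE)
for (owner, status), count in sorted(status_by_owner(ads).items()):
    print(f"{owner:8s} {status:10s} {count}")
```

On a machine with Condor installed, the sample text could be replaced by the captured standard output of `condor_q -long`; the parser deliberately ignores ClassAd expressions and treats every value as a plain string.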
Concluding statements
• Condor is essential to the OSG.
• Condor use underpins connectivity of sites within the OSG.
• Close ties: Miron Livny is OSG PI; VDT team at Wisconsin; new Condor features are often a result of OSG needs.
• Widely used on OSG; many novel uses of and applications building on Condor features.
• More details in later talks!