The Use of Condor in the gLite Grid Middleware Erwin Laure Condor Week 14-15 March 2005
Contents
• Overview of EGEE and gLite
• gLite and Condor
• Future plans
Condor Week 2005
The EGEE Project
• EU funded (2 years until March 2006)
• EGEE offers the largest production grid facility in the world open to many applications (HEP, BioMedical, generic)
• Existing production service based on LCG
• Next generation open source web-services middleware being re-engineered taking into account production/deployment/management needs
• Well-defined, distributed support structure to provide eInfrastructure that is available to many application domains
• Middleware Activity
  • Re-engineer and harden Grid middleware
  • Provide production quality middleware
[Figure labels: Collaborations; Global Grid Operations, Support and training; Network infrastructure (GÉANT)]
www.eu-egee.org
EGEE Activities
• 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
• 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
• 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)
Emphasis in EGEE is on operating a production grid and supporting the end-users
Computing Resources: Feb 2005
[Map legend: country providing resources; country anticipating joining]
In LCG-2:
• 113 sites, 30 countries
• >10,000 CPUs
• ~5 PB storage
Includes non-EGEE sites:
• 9 countries
• 18 sites
gLite Grid Middleware guiding principles
• Service oriented approach
  • Allow for multiple interoperable implementations
• Lightweight (existing) services
  • Easily and quickly deployable
  • Use existing services where possible
    • Condor, EDG, Globus, LCG, …
• Portable
  • Being built on Scientific Linux and Windows
• Security
  • Sites and Applications
• Performance/Scalability & Resilience/Fault Tolerance
  • Comparable to deployed infrastructure
• Co-existence with deployed infrastructure
  • Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid services
• Site autonomy
  • Reduce dependence on ‘global, central’ services
• Open source license
From Development to Product
• Fast prototyping approach
  • Small scale testbed (initially CERN and Wisconsin)
• Single out individual components for deployment on pre-production service (originally LCG-2/EGEE-0 based)
  • These components need to go through integration and testing
  • To ensure they are deployable and basically work
[Figure: timeline 2004 to 2005, from prototyping to product; labels: LCG-2 (=EGEE-0), LCG-3, EGEE-1]
gLite Services and Responsible Clusters
[Diagram; responsible-cluster labels: JRA3, UK, CERN, IT/CZ]
Service groups: Access Services, Security Services, Information & Monitoring Services, Data Services, Job Management Services
Services shown: Grid Access Service, API, Authorization, Auditing, Authentication, Application Monitoring, Information & Monitoring, Metadata Catalog, File & Replica Catalog, Storage Element, Data Management, Job Provenance, Package Manager, Accounting, Workload Management, Computing Element, Site Proxy
gLite Services for Release 1
• Focus on key services
• Release date is March 31st 2005
[Diagram; responsible-cluster labels: JRA3, UK, CERN, IT/CZ]
Service groups: Access Services, Security Services, Information & Monitoring Services, Data Services, Job Management Services
Services shown: Grid Access Service, API, Authorization, Auditing, Authentication, Application Monitoring, Information & Monitoring, Metadata Catalog, File & Replica Catalog, Storage Element, Data Management, Job Provenance, Package Manager, Accounting, Workload Management, Computing Element, Site Proxy
Condor and gLite
• A design team with representatives from middleware providers (AliEn, Condor, EDG, Globus, …), including US partners, produced the middleware architecture and design
  • Takes into account input and experiences from applications, operations, and related projects
  • DJRA1.1 – EGEE Middleware Architecture (June 2004)
    • https://edms.cern.ch/document/476451/
  • DJRA1.2 – EGEE Middleware Design (August 2004)
    • https://edms.cern.ch/document/487871/
• Wisconsin is one of the sites of the development prototype
  • Using Condor pool as backend
  • Using Globus RLS
  • Using the VDT distribution of Condor and Globus
WMS
[Diagram; recoverable label: Condor-C]
The current gLite CE
• Collaboration of INFN, Univ. of Chicago, Univ. of Wisconsin-Madison, and the EGEE security activity (JRA3)
[Diagram labels: Gatekeeper; LCAS, LCMAPS; WSS; Launch Condor-C (shown twice); Submit job; CEMon; Notifications; Blahpd; Condor-C; CE; Local batch system: LSF, Condor, PBS/Torque]
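The CE slide names three local batch systems (LSF, Condor, PBS/Torque) behind a single CE. As a rough illustration only, the sketch below maps each of those names to the standard submit command of that batch system. It is not how Blahpd or Condor-C are implemented; the function name `submit_command` and the dictionary are hypothetical and exist only for this example.

```python
# Illustrative only: one CE front end, several local batch systems.
# The command names are the usual submit tools of each system;
# the mapping and the function are assumptions, not gLite code.
SUBMIT_COMMANDS = {
    "LSF": "bsub",
    "Condor": "condor_submit",
    "PBS/Torque": "qsub",
}


def submit_command(batch_system):
    """Return the native submit command for a local batch system name."""
    try:
        return SUBMIT_COMMANDS[batch_system]
    except KeyError:
        raise ValueError(f"unsupported batch system: {batch_system!r}") from None
```

A caller would look up the command once per site, since the local batch system is a property of the site rather than of the job.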
Data Management Services
• Efficient and reliable data storage, movement, and retrieval on the infrastructure
• Storage Element
  • Reliable file storage (SRM based storage systems)
  • POSIX-like file access (gLite I/O)
  • Transfer (GridFTP)
• File and Replica Catalog
  • Resolves logical filenames (LFN) to physical location of files (URL understood by SRM) and storage elements
  • Hierarchical file-system-like view in LFN space
  • Single catalog or distributed catalog (under development) deployment possibilities
• File Transfer and Placement Service
  • Reliable file transfer and transactional interactions with catalogs
  • Stork being evaluated
• Data Scheduler
  • Scheduled data transfer in the same spirit as jobs are being scheduled, taking into account e.g. network characteristics (collaboration with JRA4)
  • Under development
• Metadata Catalog
  • Limited metadata can be attached to the File and Replica Catalog
  • Interfaces to application specific catalogs have been defined
[Diagram labels: Data Scheduler; VOs; Catalog; MOM; LocalCat; Site boundary; FPS; Transfer Agent; SRM; GridFTP; I/O; Storage Element]
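The File and Replica Catalog described on this slide resolves a logical filename (LFN) to the physical locations of its replicas and offers a hierarchical view of the LFN space. A minimal in-memory sketch of that idea follows. It is not the gLite catalog API: the class `ReplicaCatalog`, its methods, and the example paths and `srm://` URLs are all hypothetical.

```python
from collections import defaultdict
from posixpath import dirname, normpath


class ReplicaCatalog:
    """Toy catalog (hypothetical): one LFN maps to zero or more replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, replica_url):
        """Record that a copy of `lfn` is stored at `replica_url`."""
        key = normpath(lfn)
        if replica_url not in self._replicas[key]:
            self._replicas[key].append(replica_url)

    def resolve(self, lfn):
        """Return all known replica URLs for `lfn` (empty list if unknown)."""
        return list(self._replicas.get(normpath(lfn), []))

    def list_directory(self, directory):
        """Return the LFNs directly inside `directory`, sorted by name."""
        parent = normpath(directory)
        return sorted(lfn for lfn in self._replicas if dirname(lfn) == parent)


# Example with made-up names: two replicas of one logical file.
catalog = ReplicaCatalog()
catalog.register("/grid/demo/run1/hits.root", "srm://se1.example.org/data/hits.root")
catalog.register("/grid/demo/run1/hits.root", "srm://se2.example.org/data/hits.root")
replicas = catalog.resolve("/grid/demo/run1/hits.root")
```

Keeping the LFN as the only key is what lets jobs refer to a file without knowing which storage element holds it; a distributed deployment would have to keep several such maps consistent, which this sketch does not attempt.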
Evolutions foreseen in 2005
The main topics we still need to address are listed here; details and other topics need to be worked out with operations and applications.
• WMS
  • WS Interface
  • Better support for bulk job submission
• CE
  • Head node monitoring
    • Guard and, if necessary, pause/resume services running on the head node
  • SUDO service
    • Currently one Condor-C instance is needed per user
    • Condor-C should run under a VO user and submit jobs via a sudo service to the local batch system
    • This is being done in collaboration with Condor and Globus
Evolutions foreseen in 2005
• Catalogs
  • Distributed and single deployment options
• FTS/FPS
  • Channel management
  • Clarify the role of Stork
• Data Scheduler
  • “broker” for data transfer “jobs”
• R-GMA Information system
  • Web service version
• Package Manager
Second major gLite release foreseen at the end of 2005
Summary
• Contributions of Condor to gLite span the whole process
  • Design – prototyping – product
• International collaborations in Grid middleware are essential
  • Grid middleware cannot be developed within a single corner
  • Effective exchange of ideas, requirements, solutions and technologies
  • Coordinated development of new capabilities
  • Open communication channels
  • Early detection of differences and disagreements
• Condor link to OSG is very important to gLite
  • A common integration and testing project is being defined
• The gLite process, in particular through the strong interactions between European (EGEE) and US (Condor and Globus) projects, is a first step towards truly international collaborative middleware development
More information
http://www.glite.org
http://cern.ch/egee-jra1