450 likes | 543 Views
Grid developments and middleware components Mike Mineter EGEE Training team mjm@nesc.ac.uk. http://egee-intranet.web.cern.ch. Acknowledgements.
E N D
Grid developments and middleware componentsMike MineterEGEE Training teammjm@nesc.ac.uk http://egee-intranet.web.cern.ch
Acknowledgements This presentation for the GGF Summer School, 2004 was prepared by the NeSC Edinburgh training team. It includes slides and information from many sources: • Roberto Barbera (Slides on middleware are based on presentations given in Edinburgh, April 2004) • Malcolm Atkinson and Ian Bird (Sites in LCG-2/EGEE-0 at GGF-11) • Other colleagues in EGEE (project overview slides) • The European DataGrid training team • Authors of the LCG-2 User Guide v. 2.0 : Antonio Delgado Peris, Patricia Méndez Lorenzo, Flavia Donno, Andrea Sciabà, Simone Campana, Roberto Santinelli https://edms.cern.ch/file/454439//LCG-2-UserGuide.html Grid developments and middleware components, 19 July 04 - 2
Outline • Grid developments from an EGEE perspective: • Creating e-Infrastructure • Building on and with other Grid projects • Towards service-orientation • Establishing a “production Grid” • Overview of the middleware of the current EGEE-0 system • Major components • Lifecycle of a job • Summary Grid developments and middleware components, 19 July 04 - 3
To underpin European science and technology in the service of society To link with and build on National, regional and international initiatives Emerging technologies (e.g. fibre optic networks) To foster international cooperation both in the creation and the use of the e-infrastructure Collaboration Pan-European Grid Operations, Support and training Network infrastructure(GÉANT) Towards a European e-Infrastructure Grid developments and middleware components, 19 July 04 - 4
In 2 years EGEE will: • Establish production quality sustained Grid services • 3000 users from at least 5 disciplines • over 20,000 CPU's, 50 sites • over 5 Petabytes (1015) storage • Demonstrate a viable general process to bring other scientific communities on board • Spend 32 Million Euros - started April 2004 • 70 institutions in 27 countries • Propose a second phase in mid 2005 to take over EGEE in early 2006 Initial New Grid developments and middleware components, 19 July 04 - 5
EGEE will: • Establish production quality sustained Grid services • 3000 users from at least 5 disciplines • over 20,000 CPU's • Demonstrate a viable general process to bring other scientific communities on board • Spend 32 Million Euros over 2 years starting April 2004 • 70 institutions in 28 countries • Propose a second phase in mid 2005 to take over EGEE in early 2006 (re)engineering, operations, outreach Initial domains Grid developments and middleware components, 19 July 04 - 6
Outline • Grid developments from an EGEE perspective: • Creating e-Infrastructure • Building on and with other Grid projects • Towards service-orientation • Establishing a “production Grid” • Overview of the middleware of the current EGEE-0 system • Major components • Lifecycle of a job • Summary Grid developments and middleware components, 19 July 04 - 8
Condor Globus MyProxy ... EDG . . . VDT LCG GridCC NextGrid EGEE … Future e-Infrastructure EGEE view of history 2001 … DataTAG CrossGrid AliEn ... SRM 2004 USA EU Used in Grid developments and middleware components, 19 July 04 - 9
Outline • Grid developments from an EGEE perspective: • Creating e-Infrastructure • Building on and with other Grid projects • Towards service-orientation • Establishing a “production Grid” • Overview of the middleware of the current EGEE-0 system • Major components • Lifecycle of a job • Summary Grid developments and middleware components, 19 July 04 - 10
Service orientation: building EGEE-1 • “gLite” - the new EGEE middleware (under test) • Service oriented - components that are : • Loosely coupled (by messages – examples tomorrow) • Accessible across network; modular and self-contained; clean modes of failure • So can change implementation without changing interfaces • Can be developed in anticipation of new uses • … and are based on standards. Opens EGEE to: • New middleware (plethora of tools now available) • Heterogeneous resources (storage, computation…) • Interact with other Grids (international, regional and national) Grid developments and middleware components, 19 July 04 - 11
Outline • Grid developments from an EGEE perspective: • Creating e-Infrastructure • Building on and with other Grid projects • Towards service-orientation • Establishing a “production Grid” • Overview of the middleware of the current EGEE-0 system • Major components • Lifecycle of a job • Summary Grid developments and middleware components, 19 July 04 - 12
LCG-1 LCG-2 EGEE-1 EGEE-2 Globus 2 based Web services based LCG and EGEE • LCG: Large Hadron Collider Computing Grid • LCG infrastructure running LCG-2 is “EGEE-0” • In parallel producing new web-service-oriented middleware (“gLite”) • Will replace LCG-2 as production facility in 2005 • New major releases each year Grid developments and middleware components, 19 July 04 - 13
Sites in LCG-2/EGEE-0 : June 4 2004 http://goc.grid-support.ac.uk/gppmonWorld/gppmon_maps/CERN_lxn1188.html • 22 Countries • 58 Sites (45 Europe, 2 US, 5 Canada, 5 Asia, 1 HP) • Coming: New Zealand, China, • other HP (Brazil, Singapore) • 3800 cpu Grid developments and middleware components, 19 July 04 - 14
Operations Infrastructure • A lot more than middleware!! • 40% of EGEE budget • 10 ROCs: coordinate deployment & operation, tasks include: • First point of contact for all new sites, new users, and user support; Issue certificates • Negotiate policies with resource providers • 5 CICs: tasks include provision of • VO services, core Grid services (RBs, UIs, database services, BDIIs) Grid developments and middleware components, 19 July 04 - 15
EGEE: adding a VO EGEE has a formal procedure for adding selected new user communities: • Negotiation with one of the Regional Operations Centres • Seek balance between the resources contributed by a VO and those that they consume. • Resource allocation will be made at the VO level. • Many resources need to be available to multiple VOs : shared use of resources is fundamental to a Grid Grid developments and middleware components, 19 July 04 - 16
Story so far: themes illustrated by EGEE • e-Infrastructure • Integrating networks, grids and emerging technologies • Based on standards • Underpinning research, industry, … the “knowledge economy” • International, collaborative effort • Moving to a Service Orientated Architecture • Focus: Production grids for multiple VOs • Demands massive effort in organisation and administration: • Operations • Support • Training Grid developments and middleware components, 19 July 04 - 17
Outline • Grid developments from an EGEE perspective: • Creating e-Infrastructure • Building on and with other Grid projects • Towards service-orientation • Establishing a “production Grid” • Overview of the middleware of the current EGEE-0 system • Major components • Lifecycle of a job • Summary Grid developments and middleware components, 19 July 04 - 22
User-view of EGEE: a multi-VO Grid User Interface User Interface Grid services Grid developments and middleware components, 19 July 04 - 23
Input “sandbox” DataSets info Output “sandbox” SE & CE info Job Submit Event Job Query Publish Job Status Storage Element Middleware components Replica Catalogue “User interface” Information Service Resource Broker Author. &Authen. Input “sandbox” + Broker Info Output “sandbox” Logging & Book-keeping Computing Element Job Status Grid developments and middleware components, 19 July 04 - 24
Workload Management System (WMS) • Distributed scheduling • multiple UI’s where you submit your job • multiple RB’s from where the job is sent to a CE • multiple CE’s where the job can be put in a queuing system • Distributed resource management • multiple information systems that monitor the state of the grid • Information from SE, CE, sites Grid developments and middleware components, 19 July 04 - 25
Authentication User obtains certificate from CA Connects to UI by ssh Downloads certificate Invokes Proxy server Single logon – to UI - then Secure Socket Layer with proxy identifies user to other nodes UI Authentication, Authorisation CA Personal VO mgr VO service • Authorisation - currently • User joins Virtual Organisation • VO negotiates access to Grid nodes and resources (CE, SE) • Authorisation tested by CE, SE: gridmapfile maps user to local account VO database SSL (proxy) Gridmapfiles On CE, SE nodes Grid developments and middleware components, 19 July 04 - 26
The user’s interface to the Grid Command-line interface to Proxy server Job operations To submit a job Monitor its status Retrieve output Data operations Upload file to SE Create replica Discover replicas Other grid services Also C++ and Java APIs To run a job user creates a JDL (Job Description Language) file UI JDL User Interface node Grid developments and middleware components, 19 July 04 - 27
A CE is a grid batch queuewith a “grid gate” front-end: “Compute element” in LCG-2 Job request I.S. Logging Logging Info system Globus gatekeeper gridmapfile Grid gate node Local resource management system:Condor / PBS / LSF master Homogeneous set of worker nodes Grid developments and middleware components, 19 July 04 - 28
gridmapfile Storage elements and files • Storage elements hold files: write once, read many • Replica files can be held on different SE: • “close” to CE; share load on SE • Replica Catalogue - what replicas exist for a file? • Replica Location Service - where are they? File transfer Requests Logging GridFTP EventLogging Globus gatekeeper Info system Local Info Disk arrays or tapes Grid developments and middleware components, 19 July 04 - 29
Resource Broker nodes • Run the Workload Management System • To accept job submissions • Dispatch jobs to appropriate Compute Element (CE) • Allow users • To get information about their status • To retrieve their output • A configuration file on each UI node determines which RB node(s) will be used • When a user submits a job, JDL options are to: • Specify CE • Allow RB to choose CE (using optional tags to define requirements) • Specify SE (then RB finds “nearest” appropriate CE, after interrogating Replica Location Service) Grid developments and middleware components, 19 July 04 - 32
Outline • Grid developments from an EGEE perspective: • Creating e-Infrastructure • Building on and with other Grid projects • Towards service-orientation • Establishing a “production Grid” • Overview of the middleware of the current EGEE-0 system • Major components • Lifecycle of a job • Summary Grid developments and middleware components, 19 July 04 - 35
UI RB node Replica Location Server Network Server Workload Manager Inform. Service Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job Status UI RB node submitted Replica Location Server Network Server Workload Manager Inform. Service UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs) Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
edg-job-submit myjob.jdl Myjob.jdl JobType = “Normal”; Executable = "$(CMS)/exe/sum.exe"; InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"}; OutputSandbox = {“sim.err”, “test.out”, “sim.log"}; Requirements = other. GlueHostOperatingSystemName == “linux" && other. GlueHostOperatingSystemRelease == "Red Hat 7.3“ && other.GlueCEPolicyMaxCPUTime > 10000; Rank = other.GlueCEStateFreeCPUs; Job Status UI RB node submitted Replica Location Server Network Server Workload Manager Inform. Service Job Contr. - CondorG Job Description Language (JDL) to specify job characteristics and requirements CE characts & status SE characts & status Computing Element Storage Element
NS: network daemon responsible for accepting incoming requests submitted waiting UI RB node Job Status Replica Location Server Network Server Job Input Sandbox files Workload Manager Inform. Service RB storage Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI RB node Job Status Replica Location Server Network Server Job Workload Manager Inform. Service RB storage WM: responsible to take the appropriate actions to satisfy the request Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI RB node Job Status Replica Location Server Network Server Match- Maker/ Broker Workload Manager Inform. Service RB storage Where must this job be executed ? Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI RB node Job Status Replica Location Server Network Server Matchmaker: responsible to find the “best” CE where to submit a job Match- Maker/ Broker Workload Manager Inform. Service RB storage Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI RB node Job Status Where are (which SEs) the needed data ? Replica Location Server Network Server Match- Maker/ Broker Workload Manager Inform. Service RB storage What is the status of the Grid ? Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI RB node Job Status Replica Location Server Network Server Match- Maker/ Broker Workload Manager Inform. Service RB storage CE choice Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI RB node Job Status Replica Location Server Network Server Workload Manager Inform. Service RB storage Job Adapter Job Contr. - CondorG CE characts & status JA: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, etc.) SE characts & status Computing Element Storage Element
Job submission submitted waiting UI ready RB node Job Status Replica Location Server Network Server Workload Manager Inform. Service RB storage Job Job Contr. - CondorG JC: responsible for the actual job management operations (done via CondorG) CE characts & status SE characts & status Computing Element Storage Element
Job submission submitted waiting UI ready scheduled RB node Job Status Replica Location Server Network Server Workload Manager Inform. Service RB storage Job Contr. - CondorG Input Sandbox files CE characts & status SE characts & status Job Computing Element Storage Element
Job submission submitted waiting UI ready scheduled running Job RB node Job Status Replica Location Server Network Server Workload Manager Inform. Service RB storage Job Contr. - CondorG Input Sandbox “Grid enabled” data transfers/ accesses Computing Element Storage Element
Job submission submitted waiting UI ready scheduled running done RB node Job Status Replica Location Server Network Server Workload Manager Inform. Service RB storage Job Contr. - CondorG Output Sandbox files Computing Element Storage Element
Job submission submitted waiting UI ready scheduled running done RB node Job Status edg-job-get-output <dg-job-id> Replica Location Server Network Server Workload Manager Inform. Service RB storage Job Contr. - CondorG Output Sandbox Computing Element Storage Element
Job submission UI RB node Job Status submitted Replica Location Server Network Server waiting ready Output Sandbox files Workload Manager Inform. Service RB storage scheduled Job Contr. - CondorG running done cleared Computing Element Storage Element
Job monitoring UI RB node edg-job-status <dg-job-id> edg-job-get-logging-info <dg-job-id> Network Server LB: receives and stores job events; processes corresponding job status Workload Manager Job status Logging & Bookkeeping Job Contr. - CondorG Log Monitor Log of job events LM: parses CondorG log file (where CondorG logs info about jobs) and notifies LB Computing Element
Summary… 1 • EGEE is creating a production-quality Grid as a step towards an emerging Europe-wide e-Infrastructure • Secure, reliable, sustainable • Wide spectrum of VOs • Integrating with national, regional, international grids and networks • EGEE is reengineering middleware, with Service Orientation • The LCG is providing a service now • EGEE-0 components, Job submission and life-cycle have been described…. Grid developments and middleware components, 19 July 04 - 64
Input “sandbox” DataSets info Output “sandbox” SE & CE info Job Submit Event Job Query Publish Job Status Storage Element Summary -2: EGEE components Replica Catalogue “User interface” Information Service Resource Broker Author. &Authen. Input “sandbox” + Broker Info Output “sandbox” Logging & Book-keeping Computing Element Job Status Grid developments and middleware components, 19 July 04 - 65