Explore the world of grid computing, from terminology to components like CE, SE, WN, and RB. Learn how EGEE grid elements collaborate to solve computational problems, manage resources, and create a virtual computing environment.
Computational grids and grid projects DSS, 4.4.2005 pesicka@kiv.zcu.cz
Content • Grid computing (terminology) • EGEE grid elements, how it works • Gilda testbed (example of simple job) • Grid projects
Grid computing • model for solving massive computational problems • makes use of otherwise unused resources (CPU cycles, disk storage, ...) • supports computation across administrative domains • unlike traditional clusters • creates a “virtual cluster” embedded in the network infrastructure • multi-user environment • issue of authorization – remote users must be allowed to control computing resources
Grid computing - resources • sharing heterogeneous resources • different platforms • hw / sw architectures • computer languages • located in different places • different administrative domains • connected through the network • virtualizing computing resources
Grid x cluster • grids – heterogeneous • can use ordinary desktops as well • cluster – homogeneous • located in data centres • Grids are built from Computing Elements (CE) • A cluster can act as a CE of the whole grid system
Global Grid Forum • GGF – defines specifications for grid computing • Globus Alliance – implements the standards in GT • Globus Toolkit (GT) – middleware for building grid services; de facto standard; only one part of a grid
Globus – implemented services • Resource management • GRAM (Grid Resource Allocation and Management) • Information services • MDS (Monitoring and Discovery Services) • Security Services • GSI (Grid Security Infrastructure) • Data Movement and Management • GridFTP, GASS (Global Access to Secondary Storage)
EGEE grid components • UI (User Interface) • user access to the computational grid • log on, start jobs, get info about the state of jobs • information about free resources • management of the user’s data • CE (Computing Element) • receives jobs for the given cluster or farm (homogeneous) • provides info about computational power and installed sw • passes the jobs to the local job management system (PBS, LSF, NQE, LoadLeveler, Condor); the LJMS later sends the job to the worker nodes
EGEE grid components II. • SE (Storage Element) • interface for storing user data inside the grid • access to the files • replication of files • a file is registered inside the grid under an internal name (independent of its name and location) • RC (Replica Catalog) • RLS (Replica Location Server) • info about file replicas, selection of the appropriate replica
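The replica catalogue idea above can be sketched as a mapping from one internal (logical) file name to several physical copies. This is a minimal in-memory sketch only: the class name `ReplicaCatalog`, the `lfn:` naming scheme, the example URLs, and the "prefer a copy on the same site" rule are illustrative assumptions, not the EGEE or RLS interface.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalogue: one logical file name maps to many physical replicas."""

    def __init__(self):
        # logical file name -> list of (site, physical URL) pairs
        self._replicas = defaultdict(list)

    def register(self, lfn, site, url):
        """Record that a copy of `lfn` is stored at `url` on `site`."""
        self._replicas[lfn].append((site, url))

    def select(self, lfn, preferred_site):
        """Return a replica URL, preferring one on `preferred_site`."""
        replicas = self._replicas.get(lfn)
        if not replicas:
            raise KeyError(f"no replica registered for {lfn}")
        for site, url in replicas:
            if site == preferred_site:
                return url
        # No copy on the preferred site: fall back to the first registered one.
        return replicas[0][1]


# Hypothetical sites and URLs, for illustration only.
catalog = ReplicaCatalog()
catalog.register("lfn:dataset-01", "site-a", "gsiftp://se.site-a.example/data/f1")
catalog.register("lfn:dataset-01", "site-b", "gsiftp://se.site-b.example/data/f1")
```

The job refers only to the logical name; which physical copy it reads is decided at selection time.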
EGEE grid components III. • WN (Worker Nodes) • computation nodes, the place where the computation runs • have access to the application software (mounted from a server) • can manipulate data stored on the SE • accessible only from the CE, not from the whole environment
EGEE grid components IV. • IS (Information Service) • state information about elements of the grid (CE, SE, ...) • monitoring of the state of the jobs • RB (Resource Broker) • scheduler, finds the proper resources for the job requirements • distributes jobs to the CEs, sending JDL (Job Description Language) • uses the IS for its decisions
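The broker's use of the information service can be sketched as a simple matchmaking step. This is only an illustration under stated assumptions: the `ComputingElement` record, the `choose_ce` function, the CE names, and the "most free CPUs wins" ranking are hypothetical and do not describe the actual RB algorithm.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    free_cpus: int
    installed_software: frozenset


def choose_ce(info_service, required_software, min_free_cpus=1):
    """Pick the matching CE with the most free CPUs, or None if none matches.

    `info_service` stands in for the IS: a list of current CE records.
    """
    candidates = [
        ce for ce in info_service
        if ce.free_cpus >= min_free_cpus
        and required_software <= ce.installed_software  # subset test
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce.free_cpus)


# Hypothetical snapshot of what the IS might report.
info_service = [
    ComputingElement("ce-a", 4, frozenset({"gcc"})),
    ComputingElement("ce-b", 16, frozenset({"gcc", "root"})),
    ComputingElement("ce-c", 32, frozenset({"gcc"})),
]
best = choose_ce(info_service, required_software={"root"})
```

Here `ce-c` has the most free CPUs but lacks the required software, so the requirement filter runs before the ranking.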
[Diagram: GILDA testbed. Students' terminals enter the grid through UIs (PKI X.509 certificate keys, JDL files); components shown: RB, RLS, SE, CE, WNs.]
How it all works together – step by step • User connects to the UI • a time-limited proxy certificate is created • User defines the computational job and submits it to the resource broker • by means of a JDL file • the JDL file may refer to input data (larger datasets – SE) • Resource broker queries the IS and finds a proper CE • Resource broker creates the job and sends it to the CE
How it all works together II. • CE receives the job and sends it to the local job management system • The job runs on the WN (worker nodes) • when using larger datasets – data are copied from the SE • new large output data – copied to the SE and registered with the RLS (Replica Location Server) • At the end of the job, the output (stdout, stderr) is copied back to the RB
How to try it and participate • Genius portal – access to the grid • Gilda • demo applications • latest versions of the middleware sw • https://grid-demo.ct.infn.it/
Example – hostname.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
RetryCount = 7;
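A JDL file like the one above is a flat list of `Name = value;` attributes, so it can be generated from a dictionary. The sketch below is a hedged illustration: the helpers `jdl_value` and `render_jdl` are hypothetical, cover only the value types seen in this example (strings, integers, lists of strings), and do not escape quotes inside strings.

```python
def jdl_value(value):
    """Render a Python value in the syntax used by the example JDL file."""
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(jdl_value(item) for item in value) + "}"
    # Everything else is written as a double-quoted string (no escaping).
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Return one `Name = value;` line per attribute, in insertion order."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


# Same attributes as hostname.jdl above.
job = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "/bin/hostname",
    "StdOutput": "hostname.out",
    "StdError": "hostname.err",
    "OutputSandbox": ["hostname.err", "hostname.out"],
    "Arguments": "-f",
    "RetryCount": 7,
}
jdl_text = render_jdl(job)
```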
Example – log after job submission
Let the GILDA Resource Broker choose
Selected Virtual Organisation name (from UI conf file): gilda
Connecting to host grid004.ct.infn.it, port 7772
Logging to host grid004.ct.infn.it, port 9002
================================ edg-job-submit Success =====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg
The edg_jobId has been saved in the following file:
/home/demo03/.genius/.tmp_submittedjob_demo03
==================================================
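The job identifier in the log above is what later status queries need, so a script would typically pull it out of the submission output. A minimal sketch, assuming the identifier is the first `https://` URL printed after the "Your job identifier" line; the function name `extract_job_id` is hypothetical and the sample text is copied from the log above.

```python
import re

# Assumption: the identifier is the first https:// URL that follows the
# "Your job identifier" line, as in the log shown above.
JOB_ID_PATTERN = re.compile(r"Your job identifier.*?(https://\S+)", re.DOTALL)


def extract_job_id(log_text):
    """Return the edg_jobId URL from submission output, or None if absent."""
    match = JOB_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


sample_log = """\
The job has been successfully submitted to the Network Server.
Your job identifier (edg_jobId) is:
- https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg
The edg_jobId has been saved in the following file:
/home/demo03/.genius/.tmp_submittedjob_demo03
"""
job_id = extract_job_id(sample_log)
```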
Example – job queue • Status of the job can be checked in job queue • ready • scheduled • running • done – Get Output • cleared (after GetOutput) • Output • hostname.err0 • hostname.out.txt24 • Hostname.out.txt • testbed010.cnaf.infn.it {Heureka! We got it!}
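The job states listed above can be read as a simple linear progression. The sketch below assumes exactly that: a job only ever advances to the next state in the list. Real middleware may have further states (for example failure states) that this slide does not show, and the function names are hypothetical.

```python
# States in the order listed on the slide. Assumption: a job only ever
# advances to the next state in this list.
JOB_STATES = ["ready", "scheduled", "running", "done", "cleared"]


def next_state(state):
    """Return the state that follows `state`, or None for the final state."""
    index = JOB_STATES.index(state)  # raises ValueError for unknown states
    if index + 1 < len(JOB_STATES):
        return JOB_STATES[index + 1]
    return None


def can_get_output(state):
    """Output can be fetched only once the job is done and not yet cleared."""
    return state == "done"
```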
Grid Projects • EGEE (Enabling Grids for E-sciencE) • connects European grids, creates a production grid • started on 1 April 2004 • 70 partners (EU, USA, Russia) • 7 federations (CE (Central Europe) federation – Czech Rep.) • CERN – a federation of its own • CESNET – scheduling and state monitoring part of the middleware
Project Geneva • CoreGrid, Akogrimo, DataMiningGrid • GridCoord, HPC4U, IntelliGrid • K-WF Grid, NextGrid, OntoGrid • Provenance, SIMDAT, UniGridS
Literature, Materials • Wikipedia • http://egee.cesnet.cz • http://www.globus.org