Computational grids and grid projects DSS, 4.4.2005 pesicka@kiv.zcu.cz
Content • Grid computing (terminology) • EGEE grid elements, how it works • GILDA testbed (example of a simple job) • Grid projects
Grid computing • model for solving massive computational problems • uses otherwise unused resources (CPU cycles, disk storage, ...) • supports computation across administrative domains • unlike traditional clusters • creates a “virtual cluster” embedded in the network infrastructure • multi-user environment • issue of authorization – allowing remote users to control computing resources
Grid computing – resources • sharing heterogeneous resources • different platforms • hw / sw architectures • computer languages • located in different places • different administrative domains • connected through the network • virtualizing computing resources
Grid vs. cluster • grids – heterogeneous • can use ordinary desktops as well • cluster – homogeneous • located in data centres • grids are built from Computing Elements (CE) • a cluster can act as a CE of the whole grid system
Global Grid Forum • GGF – defines specifications for grid computing • Globus Alliance – implements the standards in the Globus Toolkit (GT) • Globus Toolkit – middleware for building grid services; de facto standard; only one part of a complete grid
Globus – implemented services • Resource management • GRAM (Grid Resource Allocation Management) • Information services • MDS (Monitoring and Discovery Services) • Security Services • GSI (Grid Security Infrastructure) • Data Movement and Management • GridFTP, GASS (Global Access to Secondary Storage)
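A minimal command-line sketch of these services in use, assuming a standard Globus Toolkit installation (the hosts ce.example.org and se.example.org and the file paths are hypothetical):

# GSI: create a short-lived proxy certificate from the user's X.509 credentials
grid-proxy-init

# GRAM: run a command on a remote resource
globus-job-run ce.example.org /bin/hostname

# GridFTP: copy a local file to a remote GridFTP server
globus-url-copy file:///home/user/input.dat gsiftp://se.example.org/tmp/input.dat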
EGEE grid components • UI (User Interface) • user access to the computational grid • logon, start jobs, info about the state of jobs • information about free resources • management of user’s data (a sample UI session is sketched below) • CE (Computing Element) • receives jobs for the given cluster or farm (homogeneous) • info about computational power and installed software • passes the jobs to the local job management system (PBS, LSF, NQE, LoadLeveler, Condor); the LJMS later sends the job to the worker nodes
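As an illustration, a typical session on the UI might look like this (a sketch only; hostname.jdl is the job description shown later in the example slide):

# create a time-limited proxy certificate on the UI
grid-proxy-init

# submit the job described in hostname.jdl to the resource broker
edg-job-submit hostname.jdl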
EGEE grid components II. • SE (Storage Element) • interface for storing user data inside the grid • access to the files • replication of files • a file is registered inside the grid under an internal name (independent of its physical name and location) – see the data-management sketch below • RC (Replica Catalog) • RLS (Replica Location Server) • info about file replicas, selection of the appropriate replica
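A hedged sketch of storing and replicating a file with the lcg data-management utilities found on EGEE/LCG-2 user interfaces (the VO gilda comes from the GILDA examples; file names and SE hosts are hypothetical):

# copy a local file to a storage element and register it under a logical file name (LFN)
lcg-cr --vo gilda -d se.example.org -l lfn:myresults.dat file:/home/demo03/results.dat

# list the physical replicas registered for that logical name
lcg-lr --vo gilda lfn:myresults.dat

# create an additional replica on another storage element
lcg-rep --vo gilda -d se2.example.org lfn:myresults.dat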
EGEE grid components III. • WN (Worker Nodes) • computation nodes, the place where the computation actually runs • have access to the application software (mounted from a server) • can manipulate data stored on the SE (a sample job script is sketched below) • accessible only from the CE, not from the whole environment
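A hedged sketch of what a job script running on a WN might do with data on an SE (the lcg commands are the standard LCG-2 data-management utilities; the logical file names, the program analyse and the VO are illustrative):

#!/bin/sh
# fetch an input dataset from a storage element to the local disk of the WN
lcg-cp --vo gilda lfn:input-dataset.dat file:$PWD/input-dataset.dat

# run the actual computation
./analyse input-dataset.dat > results.dat

# store the large output on an SE and register it in the replica catalogue
lcg-cr --vo gilda -l lfn:results.dat file:$PWD/results.dat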
EGEE grid components IV. • IS (Information Service) • state information about the grid elements (CE, SE, ...) • monitoring of the state of the jobs • RB (Resource Broker) • scheduler, finds resources matching the job requirements • dispatches jobs to the CEs, sending JDL (Job Description Language) – see the matching example below • uses the IS for its decisions
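Before submitting, the broker can be asked which CEs satisfy a job's requirements (a sketch; hostname.jdl is the example job shown later):

# ask the resource broker which computing elements match the job
edg-job-list-match hostname.jdl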
[Diagram: GILDA testbed – student terminals with PKI X.509 certificate keys and JDL files connect through UIs to the grid; the RB dispatches jobs to the CE and its WNs; the SE and RLS handle storage and replica information.]
How it all works together – step by step • User connects to the UI • a time-limited proxy certificate is created • User defines the computational job and submits it to the resource broker • by means of a JDL file • the JDL file may carry some input data with it; larger datasets stay on an SE (a JDL sketch follows) • Resource broker consults the IS and finds a proper CE • Resource broker creates the job and sends it to the CE
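A hedged JDL sketch showing how requirements and input data can be expressed (the attribute names follow the EDG/LCG JDL; the script, dataset and requirement values are hypothetical):

Type = "Job";
JobType = "Normal";
Executable = "analyse.sh";
Arguments = "input-dataset.dat";
StdOutput = "analyse.out";
StdError = "analyse.err";
InputSandbox = {"analyse.sh"};
OutputSandbox = {"analyse.out","analyse.err"};
InputData = {"lfn:input-dataset.dat"};
DataAccessProtocol = {"gsiftp"};
Requirements = other.GlueCEPolicyMaxCPUTime > 120;

Small files travel with the job in the InputSandbox/OutputSandbox; the large dataset is only referenced by its logical file name and stays on an SE close to the chosen CE.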
How it all works together II. • CE receives the job and sends it to the local job management system • The job runs on the WNs (worker nodes) • larger datasets are copied from the SE • new large output data are copied to the SE and registered with the RLS (Replica Location Server) • At the end of the job, the output (stdout, stderr) is copied back to the RB (retrieving it from the UI is sketched below)
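From the UI, the user can follow the job and retrieve the output sandbox once it is done (a sketch; the job identifier is the one returned by edg-job-submit, as in the log on the later slide):

# check the current state of the job
edg-job-status https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg

# once the job is Done, retrieve stdout/stderr from the RB
edg-job-get-output https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg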
How to try it and participate • Genius portal – web access to the grid • GILDA • demo applications • latest versions of the middleware software • https://grid-demo.ct.infn.it/
Example – hostname.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
RetryCount = 7;
Example – log after job submission
Let the GILDA Resource Broker choose
Selected Virtual Organisation name (from UI conf file): gilda
Connecting to host grid004.ct.infn.it, port 7772
Logging to host grid004.ct.infn.it, port 9002
================================ edg-job-submit Success =====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg
The edg_jobId has been saved in the following file:
/home/demo03/.genius/.tmp_submittedjob_demo03
==================================================
Example – job queue • Status of the job can be checked in the job queue • Ready • Scheduled • Running • Done – Get Output • Cleared (after Get Output) • Output files • hostname.err • hostname.out.txt – contains testbed010.cnaf.infn.it, the fully qualified name of the worker node that ran the job (Eureka! We got it!)
Grid Projects • EGEE (Enabling Grids for E-sciencE) • connects European grids, creates a production grid • started on 1 April 2004 • 70 partners (EU, USA, Russia) • 7 federations (Czech Rep. belongs to the CE federation) • CERN – a federation by itself • CESNET – the scheduling and state-monitoring part of the middleware
Project Geneva • CoreGrid, Akogrimo, DataMiningGrid • GridCoord, HPC4U, IntelliGrid • K-WF Grid, NextGrid, OntoGrid • Provenance, SIMDAT, UniGridS
Literature, Materials • Wikipedia • http://egee.cesnet.cz • http://www.globus.org