This workshop discusses concepts, job categories, levels of gridification, and software remote installation for porting applications to EU-IndiaGrid. It also covers application toolkits, higher-level grid services, and grid applications development.
Porting applications to EU-IndiaGrid: EGEE Marco Verlato (@pd.infn.it) EU-IndiaGrid Workshop, 16-18 April 2007, Bangalore, India
Outline • Concepts • Job categories • Levels of Gridification • Software remote installation • Applications invocation
Grid Applications Development • Layers, from top to bottom: Application; Application toolkits, …; Higher-level grid services (brokering, …); Basic Grid services: AA, job submission, info, … • Where computer science meets the application communities! • VO-specific developments built on higher-level tools and core services • Makes Grid services useable by non-specialists • Grids provide the compute and data storage resources • Production grids provide these core services
Job concept • gLite middleware follows the job submission concept for application execution and resource sharing • A job is a self-contained entity that packages and conveys all the information and artifacts required for the successful remote execution of an application: • Executable files • Input/Output data • Parameters • Environment • Infrastructure Requirements • Workflows
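As a minimal sketch of how the items above can be packaged into one job description, the following Python renders a dictionary of attributes as JDL-style text. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are standard JDL attributes; the file names and the helper functions jdl_value and build_jdl are hypothetical and only serve the illustration.

```python
def jdl_value(value):
    """Render a Python value as a JDL literal (list, number, or quoted string)."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value) + '"'


def build_jdl(attributes):
    """Turn a dict of job attributes into 'Name = value;' lines."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


# Hypothetical job: a wrapper script, one input file, and the files to retrieve.
job = {
    "Executable": "run_app.sh",
    "Arguments": "input.dat 100",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["run_app.sh", "input.dat"],
    "OutputSandbox": ["std.out", "std.err", "result.dat"],
}

jdl_text = build_jdl(job)
print(jdl_text)
```

The executable, its parameters, and the input/output files all travel in a single text description, which is what makes the job self-contained.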
Type of applications • The work done to port the application depends on its characteristics, complexity and the level of interactions with external grid services • Size of input/output data • Metadata request • Access to external software (database, libraries...)
Applications categories • Simulation • Bulk Processing • Responsive Apps. • Workflow • Parallel Jobs
Simulation • Examples • LHC Monte Carlo simulation • Fusion • Characteristics • Jobs are CPU-intensive • Large number of independent jobs • Run by few (expert) users • Small input; large output • Needs • Batch-system services • Minimal data management for storage of results (Figures: ATLAS, ITER)
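The "large number of independent jobs" pattern can be sketched as a parameter sweep: one job description per random seed, with nothing shared between jobs. This is an illustrative sketch only; the executable name and the --seed and --events options are hypothetical, as is the helper simulation_jobs.

```python
def simulation_jobs(executable, n_jobs, events_per_job, base_seed=1000):
    """Create one independent job description per random seed."""
    jobs = []
    for index in range(n_jobs):
        seed = base_seed + index  # distinct seed so runs are statistically independent
        jobs.append({
            "Executable": executable,
            "Arguments": f"--seed {seed} --events {events_per_job}",
            "StdOutput": f"sim_{seed}.out",
            "StdError": f"sim_{seed}.err",
            # Only the small log files come back in the sandbox; the large
            # simulation output is assumed to be stored by the job itself.
            "OutputSandbox": [f"sim_{seed}.out", f"sim_{seed}.err"],
        })
    return jobs


jobs = simulation_jobs("mc_sim.sh", n_jobs=5, events_per_job=10000)
for job in jobs:
    print(job["Arguments"])
```

Because the jobs do not communicate, they can be submitted in any order and run on any site that offers plain batch-system services.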
Bulk Processing • Examples • HEP processing of raw data, analysis • Earth observation data processing • Characteristics • Widely-distributed input data • Significant amount of input and output data • Needs • Job management tools (workload management) • Meta-data services • More sophisticated data management
Responsive Apps. (I) • Examples • Prototyping new applications • Monitoring grid operations • Characteristics • Small amounts of input and output data • Not CPU-intensive • Short response time (few minutes) • Needs • Configuration which allows “immediate” execution (QoS) • Services must treat jobs with minimum latency
Responsive Apps. (II) • Grid as a backend infrastructure: • gPTM3D: interactive analysis of medical images • GPS@: bioinformatics via web portal • GATE: radiotherapy planning • DILIGENT: digital libraries • Characteristics • Rapid response: a human waiting for the result! • Many small but CPU-intensive tasks • User is not aware of “grid”! • Needs • Interfacing (data & computing) with non-grid application or portal • User and rights management between front-end and grid
Workflow • Examples • “Bronze Standard”: medical images registration • Flood prediction • Characteristics • Use of grid and non-grid services • Complex set of algorithms for the analysis • Complex dependencies between individual tasks • Needs • Tools for managing the workflow itself • Standard interfaces for services (e.g. web services)
Parallel Jobs • Examples • Climate modeling • Earthquake analysis • Computational chemistry • Characteristics • Many interdependent, communicating tasks • Many CPUs needed simultaneously • Use of MPI libraries • Needs • Configuration of resources for flexible use of MPI • Pre-installation of optimized MPI libraries
Universal Needs • Security • Ability to control access to services and to data • Fine-grained access control lists • Encryption & logging for more demanding disciplines • Access control consistently implemented over all services • VO Management • Management of users, groups, and roles • Changing the priority of jobs for different users, groups, roles • Quota management for users, groups, roles • Definition and access to special resources • Application-level services • Responsive queues (guaranteed, low-latency execution)
Levels of “Gridification” • No development: wrap existing applications as jobs. No source code modification is required • Minor modifications: the application has minimal interaction with the grid services (e.g. Data Management) • Major modifications: a large portion of the code is rewritten to adapt to the new environment (e.g. parallelization, metadata, information) • Pure grid applications: developed from scratch. They extensively exploit existing grid services to provide new capabilities customized for a specific domain (e.g. metadata, job management, credential management)
No development • This is the simplest case: the application has no external dependencies and just needs computational power • Once it is certain that the application runs on the Worker Node, the executable and any input files it needs are shipped in the Input Sandbox and executed there • Although the application can be run directly, it is recommended to invoke it from a supplied script which prepares the environment before execution
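A wrapper of the kind recommended above might look like the following sketch, written in Python rather than shell. It assumes the sandbox files land in the job's working directory; the variable APP_MODE and the function run_wrapped are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys


def run_wrapped(command, extra_env=None, workdir="."):
    """Prepare the environment, then run the unmodified application."""
    env = dict(os.environ)
    env.update(extra_env or {})
    # Put the working directory first on PATH so that executables shipped
    # in the Input Sandbox are found before anything installed on the node.
    env["PATH"] = os.path.abspath(workdir) + os.pathsep + env.get("PATH", "")
    result = subprocess.run(command, env=env, cwd=workdir)
    return result.returncode


# Stand-in for the real application: a child process that succeeds only
# if it sees the variable prepared by the wrapper.
check = "import os, sys; sys.exit(0 if os.environ.get('APP_MODE') == 'batch' else 1)"
exit_code = run_wrapped([sys.executable, "-c", check], {"APP_MODE": "batch"})
print("application exit code:", exit_code)
```

Returning the application's exit code lets the wrapper pass success or failure back to the job's own exit status.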
Minor modifications • The application has minimal interaction with grid services: it is self-consistent but... • e.g. takes input files from an SE • output files are stored on an SE • performs I/O from a metadata catalog • the application itself can be downloaded from an SE • The user provides an adapter script which executes the needed tasks, sets the environment and runs the application
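The adapter script described above can be sketched as a three-step plan: stage in, run, stage out. The sketch assumes the LCG data management commands lcg-cp and lcg-cr are available on the Worker Node (the slide does not name specific tools); the VO name, LFN paths, SE host name and application command are all hypothetical. The plan is only built here, and the optional run_plan helper would execute it.

```python
import subprocess


def stage_in_command(vo, lfn, local_path):
    """Copy a grid file, addressed by logical file name, to a local path."""
    return ["lcg-cp", "--vo", vo, f"lfn:{lfn}", f"file://{local_path}"]


def stage_out_command(vo, local_path, lfn, storage_element):
    """Copy a local file to an SE and register it under a logical file name."""
    return ["lcg-cr", "--vo", vo, "-d", storage_element,
            "-l", f"lfn:{lfn}", f"file://{local_path}"]


def adapter_plan(vo, input_lfn, output_lfn, storage_element, workdir):
    """Return the ordered commands: fetch input, run the app, store output."""
    local_in = f"{workdir}/input.dat"
    local_out = f"{workdir}/output.dat"
    return [
        stage_in_command(vo, input_lfn, local_in),
        ["./my_app", local_in, local_out],  # hypothetical application
        stage_out_command(vo, local_out, output_lfn, storage_element),
    ]


def run_plan(plan):
    """Execute each step in order, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)


plan = adapter_plan(
    vo="euindia",
    input_lfn="/grid/euindia/demo/input.dat",
    output_lfn="/grid/euindia/demo/output.dat",
    storage_element="se.example.org",
    workdir="/tmp/job",
)
for command in plan:
    print(" ".join(command))
```

The application itself stays unchanged; only the adapter knows that its input and output live on a Storage Element.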
Major modifications • Part of the application software is rewritten to take the new environment into account, e.g. because of parallelization or run-time interaction with grid services • Use of APIs • Web Services invoked • CLI wrapping • Source code is recompiled to take these changes into account
Pure grid application • Entirely developed from scratch, taking grid capabilities into account • Better exploits grid facilities • Needs deep knowledge of the grid environment and its possibilities in order to engineer the whole application logic over a grid environment • Very rarely encountered...
Additional software installed • Regardless of the integration level, applications may need external software (libraries) that, like the application itself, can be provided in several ways • at run time • carrying it in the InputSandbox • downloading it from a Storage Element • permanent installation on the CE • as a tar.gz installed by the VO software manager • as an rpm that is part of the gLite release
Software installation on CE • The Experiment Software Manager (ESM) is the member of the experiment VO entitled to install Application Software at the different sites. • The ESM is generally acknowledged through a VOMS role (SoftwareManager) • The ESM can manage (install, validate, remove...) Experiment Software on a site at any time through a normal Grid job, without previous communication to the site administrators. • Such a job has in general no scheduling priority and will be treated as any other job of the same VO in the same queue. • lcg-asis is a good generator for such installation jobs
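The run-time option can be sketched as follows: a tar.gz of libraries arrives with the job, is unpacked on the Worker Node, and its lib directory is placed first on the library search path. This is a minimal sketch under stated assumptions: the archive layout (a top-level lib directory), the file names and the function unpack_runtime_software are hypothetical, and the demo builds its own small archive so that it can run anywhere.

```python
import os
import tarfile
import tempfile


def unpack_runtime_software(tarball, target_dir):
    """Unpack a software tarball and return an environment that uses it."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(target_dir)
    lib_dir = os.path.join(os.path.abspath(target_dir), "lib")
    env = dict(os.environ)
    # Prepend the unpacked lib directory, keeping any existing search path.
    parts = [lib_dir] + [p for p in [env.get("LD_LIBRARY_PATH", "")] if p]
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


# Demo: build a tiny stand-in tarball, then unpack it as a job would.
with tempfile.TemporaryDirectory() as scratch:
    source_lib = os.path.join(scratch, "src", "lib")
    os.makedirs(source_lib)
    with open(os.path.join(source_lib, "libdemo.txt"), "w") as handle:
        handle.write("placeholder for a shared library\n")
    tarball = os.path.join(scratch, "libs.tar.gz")
    with tarfile.open(tarball, "w:gz") as archive:
        archive.add(source_lib, arcname="lib")

    env = unpack_runtime_software(tarball, os.path.join(scratch, "sw"))
    unpacked_ok = os.path.isfile(os.path.join(scratch, "sw", "lib", "libdemo.txt"))
    first_entry = env["LD_LIBRARY_PATH"].split(os.pathsep)[0]

print("unpacked:", unpacked_ok)
```

The returned environment would then be passed to the process that launches the application, so nothing on the node itself is modified.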
Software installation on CE • The site provides a dedicated space where each supported VO can install or remove software. The amount of available space must be negotiated between the VO and the site • An environment variable holds the path of this space. Its format is the following: VO_<name_of_VO>_SW_DIR
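A job can locate the VO software area by building the variable name from the VO name and reading it from the environment, as in this sketch. Two details are assumptions not stated on the slide: that the VO name is upper-cased, and that characters such as dots or dashes in a VO name are replaced by underscores. The path in the usage example is hypothetical.

```python
import os
import re


def sw_dir_variable(vo_name):
    """Build the VO_<name_of_VO>_SW_DIR variable name for a VO.

    Assumption: the name is upper-cased and every character that is not
    a letter or digit becomes an underscore.
    """
    return "VO_" + re.sub(r"[^A-Za-z0-9]", "_", vo_name).upper() + "_SW_DIR"


def vo_software_dir(vo_name, environ=None):
    """Return the VO software directory, or None if the site does not set it."""
    environ = os.environ if environ is None else environ
    return environ.get(sw_dir_variable(vo_name))


# Usage with a fake environment standing in for a Worker Node.
fake_env = {"VO_CMS_SW_DIR": "/opt/exp_soft/cms"}
print(sw_dir_variable("cms"))
print(vo_software_dir("cms", fake_env))
```

Returning None rather than raising lets a job fall back to run-time installation on sites where the variable is not defined.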
Software validation • Once the software is installed, the ESM can perform the validation • The validation is a process verifying that the software was properly installed and works as expected • It can be performed in the same job, after installation (validation on the fly) or later on, via a dedicated job.
Software validation • If the ESM judges that the software was properly installed, he publishes in the Information System (IS) the tag which unequivocally identifies the software, e.g.: VO-cms-CMSSW_1_2_2 • Such a tag is added to the GlueHostApplicationSoftwareRunTimeEnvironment attribute of the IS, using the GRIS running in the CE • Jobs requesting a particular piece of software can be directed to the appropriate CE just by setting special requirements in the JDL, e.g.: Requirements=Member("VO-cms-CMSSW_1_2_2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
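The tag and the matching JDL requirement can be generated programmatically, as in the sketch below. The tag layout (VO-<vo>-<software>_<version with underscores>) is inferred from the single example VO-cms-CMSSW_1_2_2 and may differ between VOs; the helper names, and the second tag in the usage example, are hypothetical. Combining several tags with && relies on the ordinary boolean operators of JDL expressions.

```python
GLUE_ATTRIBUTE = "other.GlueHostApplicationSoftwareRunTimeEnvironment"


def software_tag(vo, software, version):
    """Build a tag such as VO-cms-CMSSW_1_2_2 (layout inferred from one example)."""
    return f"VO-{vo}-{software}_{version.replace('.', '_')}"


def member_clause(tag):
    """One Member(...) test against the published software tags of a CE."""
    return f'Member("{tag}", {GLUE_ATTRIBUTE})'


def software_requirement(tags):
    """Requirements line that matches only CEs publishing every given tag."""
    return "Requirements = " + " && ".join(member_clause(tag) for tag in tags) + ";"


tag = software_tag("cms", "CMSSW", "1.2.2")
single = software_requirement([tag])
# "VO-cms-geant_4" is a made-up second tag, used only to show the && form.
combined = software_requirement([tag, "VO-cms-geant_4"])
print(single)
print(combined)
```

With a single tag the output matches the example requirement on the slide, apart from spacing around the equals sign.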
Invoke Application • From the UI • Command Line Interfaces / Scripts • APIs • Higher level tools • From portals • Accessible from any browser • Tailored to applications
References (I) https://grid.ct.infn.it/twiki/bin/view/GILDA/APIUsage
References (II) https://glite-demo.ct.infn.it/
Acknowledgements • Thanks to Mike Mineter (NeSC), Valeria Ardizzone and Emidio Giorgio of the INFN GILDA team