Explore workflow development and execution using P-GRADE Portal for scalable grid applications. Learn about its history, motivations, and capabilities in enabling parallelism and grid interoperation.
Grid application development with gLite and P-GRADE Portal Miklos Kozlovszky MTA SZTAKI m.kozlovszky@sztaki.hu
Contents • P-GRADE Portal in a nutshell • Workflow development with the Portal • Workflow execution with the Portal • Scaling up to a parametric workflow
Short History of P-GRADE portal • Parallel Grid Application and Development Environment • Initial development started in the Hungarian SuperComputing Grid project in 2003 • It has been continuously developed since 2003 • Detailed information: http://www.portal.p-grade.hu/ • Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/
Downloads of the OSS P-GRADE portal • 110 downloads within the first month • ~697 total downloads to date
Main P-GRADE related projects • EU SEE-GRID-1 (2004-2006) • Integration with LCG-2 and gLite • EU SEE-GRID-2, SEE-GRID-SCI (2006-2008 / 2008-2010) • Parameter sweep extension • EU CoreGrid (2005-2008) • To solve grid interoperation for job submission • To solve grid interoperation for data handling: SRB, OGSA-DAI • GGF GIN (2006) • Providing the GIN Resource Testing portal • EGEE 2, 3 (2006-2010) • RESPECT program tool used for training and application development • ICEAGE (2006-2008) • P-GRADE portal is used for training as the official portal of the GILDA training infrastructure • EU EDGeS (2008-2009) • Transparent access to any EGEE and Desktop Grid systems
Motivations for developing P-GRADE portal • P-GRADE portal should • Hide the complexity of the underlying grid middlewares • Provide a high-level graphical user interface that is easy-to-use for e-scientists • Support many different grid programming approaches: • Simple Scripts & Control (sequential and MPI job execution) • Scientific Application Plug-ins • Complex Workflows • Parameter sweep applications: both on job and workflow level • Interoperability: transparent access to grids based on different middleware technology (both computing and data resources) • Support several levels of parallelism
Layers in a Grid system • Application • Application toolkits, standards • Higher-level grid services (brokering, …) • Basic Grid services: AA, job submission, info, … • Access levels shown beside the layers: Graphical interface • P-GRADE Portal services • Command line tools • Grid middleware
What is a P-GRADE Portal workflow? • a directed acyclic graph where • Nodes represent jobs (batch programs to be executed on a computing element) • Ports represent input/output files the jobs expect/produce • Arcs represent file transfer operations • semantics of the workflow: • A job can be executed if all of its input files are available
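The execution rule on this slide (a job can start once all of its input files are available) can be sketched in a few lines of Python. This is only an illustrative model, not the portal's implementation: the `Job` class, the `runnable_jobs` and `run_order` helpers, and all job and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A workflow node; ports are modelled as plain file names (hypothetical model)."""
    name: str
    inputs: set = field(default_factory=set)    # input ports
    outputs: set = field(default_factory=set)   # output ports


def runnable_jobs(jobs, available_files, finished):
    """Return the names of unfinished jobs whose input files are all available."""
    return [j.name for j in jobs
            if j.name not in finished and j.inputs <= available_files]


def run_order(jobs, initial_files):
    """Simulate execution: each step runs every job that is currently runnable."""
    available, finished, steps = set(initial_files), set(), []
    while len(finished) < len(jobs):
        ready = runnable_jobs(jobs, available, finished)
        if not ready:
            raise ValueError("remaining jobs can never start (missing input or cycle)")
        steps.append(sorted(ready))
        for job in jobs:
            if job.name in ready:
                finished.add(job.name)
                available |= job.outputs   # arcs: outputs feed the next jobs
    return steps


# Hypothetical diamond-shaped workflow
workflow = [
    Job("split", {"data.in"}, {"part1", "part2"}),
    Job("left", {"part1"}, {"res1"}),
    Job("right", {"part2"}, {"res2"}),
    Job("merge", {"res1", "res2"}, {"final.out"}),
]
print(run_order(workflow, {"data.in"}))
```

Jobs listed in the same step have no file dependency on each other, which is where branch parallelism inside a workflow comes from.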
Three levels of parallelism • Job level: Parallel execution inside a workflow node (MPI job as workflow component) • Each job can be a parallel program • Workflow level: Parallel execution among workflow nodes (WF branch parallelism) • Multiple jobs can run in parallel • PS workflow level: Parameter study execution of the workflow • Multiple instances of the same workflow can process different data files
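The parameter study (PS) level can be pictured as expanding one workflow into many independent instances, one per combination of input values. The sketch below only illustrates that idea; the slides do not describe how the portal implements it, and `expand_parameter_study`, the workflow name and the parameter names are hypothetical.

```python
from itertools import product


def expand_parameter_study(workflow_name, parameters):
    """Yield one (instance_name, binding) pair per combination of parameter values."""
    names = sorted(parameters)
    combos = product(*(parameters[n] for n in names))
    for i, values in enumerate(combos):
        yield f"{workflow_name}#{i}", dict(zip(names, values))


# Hypothetical sweep: 3 input files x 2 time steps = 6 workflow instances
sweep = {"input": ["a.dat", "b.dat", "c.dat"], "dt": [0.1, 0.2]}
instances = list(expand_parameter_study("chem", sweep))
for name, binding in instances:
    print(name, binding)
```

Each instance is independent of the others, so all of them can be submitted at the same time.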
Example 1: Computational Chemistry • Department of Chemistry, University of Perugia • Solution of the Schrödinger equation for triatomic systems using a time-dependent (RWAVEPR) or time-independent (ABC) method • Sequential Fortran 90 • A single execution can take between 5 and 10 hours • Many simulations at the same time (25 times)
Example 2: Ultra-short range weather forecast • Hungarian Meteorological Service • Forecasting dangerous weather situations (storms, fog, etc.), a crucial task in the protection of life and property • Processed information: surface level measurements, high-altitude measurements, radar, satellite, lightning, results of previously computed models • Requirements: • Execution time < 10 min • High resolution (1 km) • Multiplicities shown in the workflow diagram: 25 x, 10 x, 5 x, 25 x
Grid interoperation by P-GRADE • Accessing Globus, gLite and ARC based grids simultaneously via the P-GRADE portal
Typical user scenario: Compilation phase • UPLOAD SOURCE(S) • COMPILE – EDIT • DOWNLOAD BINARY(IES) • Components: Certificate servers, Grid services, Portal server
Typical user scenario: Application development phase • START EDITOR • OPEN & EDIT or DEVELOP WORKFLOW • SAVE WORKFLOW • Components: Certificate servers, Grid services, Portal server
Typical user scenario: Workflow execution phase • DOWNLOAD PROXY CERTIFICATES • TRANSFER FILES, SUBMIT JOBS • MONITOR JOBS • VISUALIZE JOBS and WORKFLOW PROGRESS • DOWNLOAD (SMALL) RESULTS • Components: Certificate servers, Grid services, Portal server
P-GRADE Portal structural overview • Client: Web browser, Java Web Start workflow editor • P-GRADE Portal server: • User interface layer: presents the user interface • Internal layer (Java classes): represents the internal concepts • Grid layer (gLite and Globus command line tools): interfacing with grid services • Grid: EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, GridFTP, …)
Interface layer • Client: Web browser, Java Web Start workflow editor • P-GRADE Portal server, user interface layer: • Web server • GridSphere Web portal framework • GridSphere portlets • P-GRADE portlets • Workflow monitor: Java applet generator • Workflow editor: Java Web Start application
Interface layer functionalities • Workflow portlet • Workflow manager, Storage, Upload • Certificate portlet • Upload, download and other operations • Settings portlet • Grid settings, Quota settings • File management • Manage files in the grid • Compiler portlet • Compile jobs on portal server • Login • Welcome • ... • Client: Web browser, Java Web Start workflow editor • P-GRADE Portal server, user interface layer: Web server, GridSphere Web portal framework, GridSphere portlets, P-GRADE portlets • Workflow monitor: Java applet generator • Workflow editor: Java Web Start application
P-GRADE vs. Non-P-GRADE portlets GridSphere 2.x Grid Portal framework P-GRADE Portal portlets
Portlets/functionalities of P-GRADE portal • Settings (portlet) • Certificate and proxy management (portlet) • Information system visualization (portlet) • Graphical workflow editing • Workflow manager (portlet) • LFC (EGEE) file management (portlet) • Compilation support (portlet) • Fault-tolerance support
Settings Portlet • Portal administrator can • connect the portal to several grids • register the basic resources of the connected grids
Settings Portlet • User can customize the connected grids by adding and removing resources
Certificate and proxy management Portlet • User can upload certificates of various grids to the MyProxy server • User can download proxies and allocate them to grids • User can simultaneously use as many proxies as there are grids connected to the portal • As a result, parallel branches of a workflow can be executed simultaneously in several grids • Screenshot labels: HUNGRID access, SEE-GRID access
MyProxy interaction in P-GRADE: Certificate Manager Certificates portlet • To start your session on the Grid you must create a proxy certificate on the portal server • “Certificates” portlet: • to upload a proxy into MyProxy servers • to download a proxy from MyProxy into the portal server
Certificate Manager: Downloading a proxy • MyProxy server access details: • Hostname • Port number • User name (from upload) • Password (from upload) • Proxy parameters: • Lifetime • Comment • Grid association
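For readers used to the command line, the access details above correspond to the options of the standard `myproxy-logon` client (`-s` host, `-p` port, `-l` user name, `-t` lifetime in hours, `-o` output file). The sketch below only assembles such a command; the slides do not say how the portal itself performs the download. The helper name, the host name and the user name are hypothetical, and the "Comment" and "Grid association" fields are portal concepts with no counterpart here.

```python
def myproxy_logon_argv(hostname, username, port=7512, lifetime_hours=12, out=None):
    """Build (but do not run) a myproxy-logon command line.

    The password is deliberately left out: the client prompts for it.
    """
    argv = ["myproxy-logon",
            "-s", hostname,             # MyProxy server host
            "-p", str(port),            # MyProxy server port (7512 is the usual default)
            "-l", username,             # user name chosen at upload time
            "-t", str(lifetime_hours)]  # lifetime of the downloaded proxy, in hours
    if out:
        argv += ["-o", out]             # where to store the downloaded proxy
    return argv


# Hypothetical server and account
print(" ".join(myproxy_logon_argv("myproxy.example.org", "jdoe", lifetime_hours=24)))
```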
Certificate Manager: Associating the proxy with a grid • This operation displays the details of the certificate and the list of available Grids (defined by the portal administrator)
Solving Grid interoperation by P-GRADE Portal • Different jobs can be executed in parallel in different grids • Diagram labels: P-GRADE Portal, EGEE Grid, UK NGS, London, Paris, Athens
Interoperation vs. Interoperability • As defined by the GIN (Grid Interoperation Now) CG (Community Group) of the OGF (Open Grid Forum) • Interoperation: short-term solution that defines what needs to be done to achieve interoperation between current production grids using existing technologies • Interoperability: native ability of Grids and Grid middleware to interact directly via common open standards • Diagram: Grid 1, Grid 2 and Grid 3 shown twice, once connected through the P-GRADE Portal (interoperation) and once connected directly (interoperability)
Graphical workflow editing • The aim is to define a DAG of batch jobs: • Drag & drop components: jobs and ports • Define their properties • Connect ports by channels (no cycles, no loops, no conditions) • The editor automatically generates the JDL file
Workflow Editor: Properties of a job • Binary executable • Type of executable • Number of required processors • Command line parameters • The resource to be used for the execution: • Grid/VO • (Computing element)
Direct resource selection: Which computing element to use? • “I still don’t know which resource to use!” • The information system portlet queries BDII and GIIS servers
Automatic resource selection • Select a broker Grid/VO for the job (e.g. GILDA_LCG2_broker/GILDA_gLite_broker) • (Describe the ranks & requirements of the job in JDL) • The portal will use the broker to find the best resource for the job!
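To make the "ranks & requirements" step concrete, here is a minimal sketch that renders a job description in gLite JDL syntax. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`, `Requirements`, `Rank`) and the Glue attributes in the example are standard JDL. The `render_jdl` helper, the file names and the chosen expressions are hypothetical, and the JDL that the portal actually generates is not shown in the slides.

```python
def quote(value):
    """Wrap a string in JDL double quotes, escaping embedded quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def jdl_list(values):
    """Render a Python sequence as a JDL list, e.g. {"a", "b"}."""
    return "{" + ", ".join(quote(v) for v in values) + "}"


def render_jdl(executable, arguments="", input_sandbox=(), output_sandbox=(),
               requirements=None, rank=None):
    """Return a JDL job description as text; requirements and rank are raw JDL expressions."""
    out_files = ["std.out", "std.err", *output_sandbox]
    lines = ["Executable = " + quote(executable) + ";"]
    if arguments:
        lines.append("Arguments = " + quote(arguments) + ";")
    lines.append('StdOutput = "std.out";')
    lines.append('StdError = "std.err";')
    if input_sandbox:
        lines.append("InputSandbox = " + jdl_list(input_sandbox) + ";")
    lines.append("OutputSandbox = " + jdl_list(out_files) + ";")
    if requirements:
        lines.append("Requirements = " + requirements + ";")  # which resources qualify
    if rank:
        lines.append("Rank = " + rank + ";")                  # which qualifying one is best
    return "\n".join(lines)


# Hypothetical job: needs more than 600 minutes of CPU time, prefers fast response
jdl = render_jdl(
    "sim.sh",
    arguments="11-04.dat",
    input_sandbox=["sim.sh", "11-04.dat"],
    output_sandbox=["result.dat"],
    requirements="other.GlueCEPolicyMaxCPUTime > 600",
    rank="-other.GlueCEStateEstimatedResponseTime",
)
print(jdl)
```

The broker first filters computing elements with `Requirements`, then picks the one with the highest `Rank` value.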
Workflow Editor: Defining broker jobs • Select a Grid with a broker! (*_BROKER) • Ignore the resource field! • If the default JDL is not sufficient, use the built-in JDL editor!
Workflow Editor: Built-in JDL editor • For JDL, see the gLite Users’ manual!
Workflow Editor: Defining input-output files • File properties: • Type: input (the job reads it) or output (the job generates it) • File type: local (comes from my desktop) or remote (comes from an SE) • File: location of the file • Internal file name: the executable reads the file under this name, e.g. fopen("file.in", …) • File storage type (output files only): Permanent (final result) or Volatile (only data channel)
How to refer to an I/O file? • Input file: • Local file: client side location, e.g. c:\experiments\11-04.dat • Remote file: LFC logical file name (LFC file catalog is required; EGEE VOs), e.g. lfn:/grid/gilda/kozlovszky/11-04.dat • Remote file: GridFTP address (in Globus Grids), e.g. gsiftp://somengshost.ac.uk/mydir/11-04.dat • Output file: • Local file: client side location, e.g. result.dat • Remote file: LFC logical file name (LFC file catalog is required; EGEE VOs), e.g. lfn:/grid/gilda/kozlovszky/11-04_-_result.dat • Remote file: GridFTP address (in Globus Grids), e.g. gsiftp://somengshost.ac.uk/mydir/result.dat
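The three reference styles on this slide can be told apart by their prefix. A minimal sketch, assuming only the `lfn:` and `gsiftp://` prefixes shown in the examples; the `classify_file_reference` helper and its return labels are hypothetical and not part of the portal.

```python
def classify_file_reference(ref):
    """Classify an I/O file reference by its prefix.

    Returns (kind, is_remote), where kind is 'lfc', 'gridftp' or 'local'.
    """
    if ref.startswith("lfn:"):
        return "lfc", True          # LFC logical file name (EGEE VOs)
    if ref.startswith("gsiftp://"):
        return "gridftp", True      # GridFTP address (Globus grids)
    return "local", False           # anything else: a client side path


# References taken from the slide
for ref in (r"c:\experiments\11-04.dat",
            "lfn:/grid/gilda/kozlovszky/11-04.dat",
            "gsiftp://somengshost.ac.uk/mydir/result.dat"):
    print(ref, "->", classify_file_reference(ref))
```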
Local vs. remote files • Diagram labels: LOCAL INPUT FILES & EXECUTABLES; REMOTE INPUT FILES; REMOTE OUTPUT FILES; LOCAL OUTPUT FILES (only the permanent files!) • Diagram components: Portal server, Grid services, Computing elements, Storage elements • Your binary can access data services directly too • GridFTP API • GFAL API • lfc-*, lcg-* commands
Workflow manager • Lists available workflows • Enables • Submitting • Aborting • Deleting existing workflows • Shows status, logs and results of workflow executions • Orchestrates job executions inside a workflow
Workflow Management (workflow portlet) • The portlet presents the status, size and output of the available workflows in the “Workflow” list • It has a Quota manager to control the users’ storage space on the server • The portlet also contains the “Abort”, “Attach”, “Details”, “Delete” and “Delete all” buttons to handle the execution of workflows • The “Attach” button opens the workflow in the Workflow Editor • The “Details” button gives an overview of the jobs of the workflow
Workflow Execution (observation by the workflow portlet) • White/red/green color means the job is in the initial/running/finished state