P-GRADE Portal Family for e-Science Communities • Peter Kacsuk, MTA SZTAKI / Univ. of Westminster • www.lpds.sztaki.hu/pgportal • pgportal@lpds.sztaki.hu
The community aspects of e-science • Web 2.0 is about creating and supporting web communities • Grid is about creating virtual organizations where e-science communities • can share resources and • can collaborate • A portal should support e-science communities in their collaborations and resource sharing • And even more: it should provide simultaneous access to any accessible • Resources • Databases • Legacy applications • Workflows, etc., no matter which grid they are operated in.
Who are the members of an e-science community? • End-users (e-scientists) • Execute the published applications with custom input parameters by creating application instances, using the published applications as templates • Grid Application Developers • Develop grid applications using the portal • Publish the completed applications for end-users • Grid Portal Developers • Develop the portal core services (job submission, etc.) • Develop higher level portal services (workflow management, etc.) • Develop specialized/customized portal services (grid testing, rendering, etc.) • Write technical, user and installation manuals
What does an individual e-scientist need? • Access to a large set of ready-to-run scientific applications (services) by transparently accessing a large set of various IT resources from the e-science infrastructure • Using a portal to parameterize and run these applications • (Diagram: Portal and App. Repository on top of the e-science infrastructure – supercomputer-based SGs (DEISA, TeraGrid), cluster-based service grids (SGs) (EGEE, OSG, etc.), desktop grids (DGs) (BOINC, Condor, etc.), clouds, local clusters and supercomputers.)
What does an e-science community need? • The same as an individual scientist, but in collaboration with other members of the community (e-scientists and application developers) • (Diagram: the same portal, application repository and e-science infrastructure shared by the whole community.)
Collaboration between e-scientists and application developers • Application Developers • Develop e-science applications via the portal in collaboration with e-scientists • Publish the completed applications for end-users via an application repository • End-users (e-scientists) • Specify the problem/application needs • Execute the published applications via the portal with custom input parameters by creating application instances
Collaboration between application developers • Application developers use the portal to develop complex applications (e.g. parameter sweep workflows) for the e-science infrastructure • They publish templates, legacy code applications and half-finished applications in the repository, to be continued by other application developers
Collaboration between e-scientists • Sharing parameterized applications via the repository • Jointly running applications via the portal on the e-science infrastructure • Joint observation and control of application execution via the portal
Requirements for an e-science portal from the e-scientists' point of view • It should be able to • Support a large number of e-scientists (~100) with good response time • Enable storing and sharing ready-to-run applications • Enable parameterizing and running applications • Enable observing and controlling application execution • Provide a reliable application execution service even on top of unreliable infrastructures (such as grids) • Provide specific user community views • Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.) • Support users' collaboration via sharing: • Applications (legacy, workflow, etc.) • Databases
Requirements for an e-science portal from the application developers' point of view • It should be able to • Support a large number of application developers (~100) with good response time • Enable storing and sharing half-finished applications and application templates • Provide graphical application development tools (e.g. a workflow editor) to develop new applications • Enable parameterizing and running applications • Enable observing and controlling application execution • Provide methods and an API to customize the portal interface towards specific user community needs by creating user-specific portlets • Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.) • Support application developers' collaboration via sharing: • Applications (legacy, workflow, etc.) • Databases • Enable the integration/calling of other services
Choice of an e-science portal • Basic question for a community: • Buy a commercial portal? (Usually expensive) • Download an OSS portal? (A good choice, but will the OSS project survive for a long time?) • Develop an own portal? (Requires a long time and can become very costly) • The best choice: download an OSS portal that has an active development community behind it
The role of the Grid portal developers' community • Grid Portal Developers • Jointly develop the portal core services (e.g. GridSphere, OGCE, Jetspeed-2, etc.) • Jointly develop higher level portal services (workflow management, data management, etc.) • Jointly develop specialized/customized portal services (grid testing, rendering, etc.) • Never build a new portal from scratch; use the power of the community to create really good portals • Unfortunately, we are not quite there yet: • Hundreds of e-science portals have been developed • Some of them are really good: Genius, Lead, etc. • However, not many of them are OSS (see the SourceForge list on the next slide) • Even fewer are actively maintained • Even fewer satisfy the generic requirements of a good e-science portal
P-GRADE portal family • The goal of the P-GRADE portal family • To meet all the requirements of end-users and application developers listed above • To provide a generic portal that can be used by a large set of e-science communities • To provide a community code base from which the portal developers' community can start to develop specialized and customized portals
P-GRADE portal family timeline (2008–2010), open source since Jan. 2008 • P-GRADE portal 2.4 – basic concept, GEMLCA (Grid Legacy Code Architecture) • P-GRADE portal 2.5 – parameter sweep • NGS P-GRADE portal – GEMLCA, repository concept • P-GRADE portal 2.8 – current release • P-GRADE portal 2.9 – under development • WS-PGRADE Portal 3.3 – beta release • WS-PGRADE Portal 3.4 – next release
P-GRADE Portal in a nutshell • General purpose, workflow-oriented Grid portal. Supports the development and execution of workflow-based Grid applications – a tool for Grid orchestration • Based on GridSphere-2 • Easy to expand with new portlets (e.g. application-specific portlets) • Easy to tailor to end-user needs • Basic Grid services supported by the portal are summarized on the following slides
The typical user scenario, Part 1 – development phase • The user starts the workflow editor from the portal server, opens and edits (or develops) a workflow, then saves the workflow and uploads the local input files to the portal server • (Diagram: client, portal server, certificate servers, grid services.)
The typical user scenario, Part 2 – execution phase • The user submits the workflow; the portal downloads proxy certificates from the certificate servers, transfers files and submits jobs to the grid services, monitors the jobs, visualizes job and workflow progress, and downloads the (small) results • (Diagram: portal server between the certificate servers and the grid services.)
P-GRADE Portal architecture • Client: web browser, Java Web Start workflow editor • Frontend layer (P-GRADE Portal server): Tomcat hosting the P-GRADE Portal portlets (JSR-168 GridSphere-2 portlets) • Backend layer: DAGMan workflow manager, information system clients, CoG API & scripts, shell scripts, grid middleware clients • Grid: grid middleware services (gLite WMS, LFC, …; Globus GRAM, …), gLite and Globus information systems, MyProxy server & VOMS
P-GRADE portal in a nutshell • Certificate and proxy management • Grid and Grid resource management • Graphical editor to define workflows and parametric studies • Access to resources in multiple VOs • Built-in workflow manager and execution visualization • GUI customizable to specific applications
What is a P-GRADE Portal workflow? • A directed acyclic graph where • Nodes represent jobs (batch programs to be executed on a computing element) • Ports represent input/output files the jobs expect/produce • Arcs represent file transfer operations and job dependencies • Semantics of the workflow: • A job can be executed if all of its input files are available
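To make these semantics concrete, here is a minimal sketch (not P-GRADE code; the job and file names are invented) of a DAG whose jobs become runnable as soon as all of their input files exist:

```python
# Minimal sketch of the workflow semantics above (illustrative only, not
# P-GRADE code): nodes are jobs, ports are the files they expect/produce,
# and a job may run as soon as all of its input files are available.
workflow = {
    "preprocess": {"inputs": ["raw.dat"],    "outputs": ["clean.dat"]},
    "simulate":   {"inputs": ["clean.dat"],  "outputs": ["result.dat"]},
    "visualize":  {"inputs": ["result.dat"], "outputs": ["plot.png"]},
}

def runnable(workflow, available, done):
    """Jobs not yet executed whose input files are all available."""
    return [name for name, job in workflow.items()
            if name not in done and all(f in available for f in job["inputs"])]

available, done = {"raw.dat"}, set()      # only the initial input exists at start
while len(done) < len(workflow):
    ready = runnable(workflow, available, done)
    if not ready:                          # nothing can proceed any more
        break
    for job in ready:                      # the portal would submit these to a grid
        print("running", job)
        available.update(workflow[job]["outputs"])
        done.add(job)
```

The real portal performs the same readiness check per node, but the "run" step is a grid job submission rather than a local call.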
Introducing three levels of parallelism • Parallel execution inside a workflow node: each job can be a parallel program • Parallel execution among workflow nodes: multiple jobs run in parallel • Parameter study execution of the workflow: multiple instances of the same workflow run with different data files
Parameter sweep (PS) workflow execution based on the black box concept • One PS port holds 4 instances of its input file, another PS port holds 3 instances • 1 PS workflow execution therefore expands into 4 × 3 = 12 normal executable workflows (e-workflows) • This provides the 3rd level of parallelism, resulting in a very large demand for Grid resources • (A small expansion example follows below.)
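As a rough illustration of this expansion (assumed file names, not portal output), two PS ports with 4 and 3 file instances yield the 12 e-workflow input combinations mentioned above:

```python
# Sketch of the black-box PS expansion: every combination of the PS ports'
# file instances becomes one executable workflow (e-workflow).
# The file names below are made up for the example.
from itertools import product

port_a = [f"inputA_{i}.dat" for i in range(4)]   # PS port with 4 file instances
port_b = [f"inputB_{j}.dat" for j in range(3)]   # PS port with 3 file instances

e_workflows = [{"portA": a, "portB": b} for a, b in product(port_a, port_b)]
print(len(e_workflows))   # 4 x 3 = 12 e-workflows from a single PS workflow
```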
Workflow parameter studies in P-GRADE Portal • Generator component(s): generate or cut the initial input data into smaller pieces • Core workflow: executed as e-workflows, one per generated input • Collector component(s): aggregate the results • Input files are placed in the same LFC catalog (e.g. /grid/gilda/sipos/myinputs); results are produced in the same catalog
Generic structure of PS workflows and their execution • Generator jobs generate the set of input files; the core workflow is executed as a PS; collector jobs collect and process the set of output files • Execution proceeds in three phases: • 1st phase: executing all Generators in parallel • 2nd phase: executing all generated e-workflows in parallel • 3rd phase: executing all Collectors in parallel • (A sketch of this ordering follows below.)
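The three-phase ordering can be sketched as follows (hypothetical stand-in functions; in the portal each call would be a grid job or a whole e-workflow):

```python
# Sketch of the three execution phases: all generators first, then all
# generated e-workflows, then all collectors. The functions are hypothetical
# placeholders for jobs the portal would submit to the e-science infrastructure.
from concurrent.futures import ThreadPoolExecutor

def generator(i):                 # 1st phase: produce one input file
    return f"input_{i}.dat"

def e_workflow(input_file):       # 2nd phase: run the core workflow on one input
    return f"output_of_{input_file}"

def collector(outputs):           # 3rd phase: aggregate all outputs
    return f"aggregated {len(outputs)} results"

with ThreadPoolExecutor() as pool:
    inputs  = list(pool.map(generator, range(4)))     # generators in parallel
    outputs = list(pool.map(e_workflow, inputs))      # e-workflows in parallel
print(collector(outputs))                              # collectors run last
```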
Integrating P-GRADE portal with DSpace repository • Goal: to make workflow applications available to the whole P-GRADE portal user community • Solution: integrating the P-GRADE portal with a DSpace repository • Functions: • Application developers can publish their ready-to-use and half-finished applications in the repository • End-users can download, parameterize and execute the applications stored in the repository • Advantages: • Application developers can collaborate with end-users • Members of a portal user community can share their workflows • Different portal user communities can share their workflows
(Diagram: portals upload workflows to and download workflows from the shared DSpace repository.)
Creating application specific portals from the generic P-GRADE portal • Creating an application specific portal does not mean developing it from scratch • P-GRADE is a generic portal that can quickly and easily be customized to any application type • Advantages: • You do not have to develop the generic parts (workflow editor, workflow manager, job submission, monitoring, etc.) • You can concentrate on the application specific part • Much shorter development time
Concept of creating application specific portals • Client: web browser used by the end user • Custom User Interface (written in Java, JSP, JSTL) – created by the application developer • Application Specific Module (ASM) – provided by the P-GRADE portal developer • Services of P-GRADE Portal (workflow management, parameter study management, fault tolerance, …) on the P-GRADE Portal server – provided by the P-GRADE portal developer • Grid: EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, …)
Roles of people in creating and using customized P-GRADE portals (the roles can be played by the same group) • End User • Executes the published application with custom input parameters by creating application instances, using the published application as a template • Grid Application Developer • Develops a grid application with P-GRADE Portal • Sends the application to the grid portal developer • Grid Portal Developer • Creates new classes from the ASM for P-GRADE by changing the names of the classes • Develops one or more GridSphere portlets that fit the application's I/O pattern and the end users' needs • Connects the GUI to P-GRADE Portal using the programming API of the P-GRADE ASM • Publishes the grid application and its GUI for end users using the ASM
Application Specific P-GRADE portals • OMNeT++ portal by SZTAKI • Traffic simulation portal by Univ. of Westminster • Rendering portal by Univ. of Westminster
Grid interoperation by P-GRADE portal • P-GRADE Portal enables the simultaneous usage of several production Grids at workflow level • Currently connectable grids: • LCG-2 and gLite: EGEE, SEE-GRID, BalticGrid • GT-2: UK NGS, US OSG, US TeraGrid • In progress: • Campus Grids with PBS or LSF • BOINC desktop Grids • ARC: NorduGrid • UNICORE: D-Grid
Simultaneous use of production Grids at workflow level • Jobs of one workflow can run in several grids at once, e.g. on the UK NGS (GT2 resources at Leeds and Manchester) and on EGEE-VOCE (gLite resources at Budapest, Athens and Brno), driven by the same P-GRADE Portal server at SZTAKI • Supports both direct and brokered (WMS) job submission
P-GRADE Portal references • P-GRADE Portal services: • SEE-GRID, BalticGrid • Central European VO of EGEE • GILDA: Training VO of EGEE • Many national Grids (UK, Ireland, Croatia, Turkey, Spain, Belgium, Malaysia, Kazakhstan, Switzerland, Australia, etc.) • US Open Science Grid, TeraGrid • Economy-Grid, Swiss BioGrid, Bio and Biomed EGEE VOs, MathGrid, etc. Portal services and account request: • portal.p-grade.hu/index.php?m=5&s=0
Community based business model for the sustainability of P-GRADE portal • Some of the developments are related to EU projects. Examples: • PS feature: SEE-GRID-2 • Integration with DSpace: SEE-GRID-SCI • Integration with BOINC: EDGeS, CancerGrid • There is an open Portal Developer Alliance with the current active members: • Middle East Technical Univ. (Ankara, Turkey) • gLite file catalog management portlet • Univ. of Westminster (London, UK) • GEMLCA legacy code service extension • SRB integration (workflow and portlet) • OGSA-DAI integration (workflow and portlet) • Embedding Taverna, Kepler and Triana WFs into the P-GRADE workflow • All these features are available in the UK NGS P-GRADE portal
Business model for the sustainability of P-GRADE portal • Some of the developments are ordered by customer academic institutes: • Collaborative workflow editor: Reading Univ. (UK) • Accounting portlet: MIMOS (Malaysia) • Separation of front-end and back-end: MIMOS • Shibboleth integration: ETH Zurich • ARC integration: ETH Zurich • Benefits for the customer academic institutes: • They basically like the portal but have some special needs that require extra development • Instead of developing a new portal from scratch (many person-months), they pay only for the small extension/modification they require • Solving their problem gets priority • They become experts in the internal structure of the portal and will be able to develop it further according to their needs • Joint publications
Main features of NGS P-GRADE portal • Extends P-GRADE portal with • GEMLCA legacy code architecture and repository • SRB file management • OGSA-DAI database access • WF level interoperation of grid data resources • Workflow interoperability support • All these features are provided as production service for the UK NGS
Interoperation of grid data resources • A single workflow engine drives jobs (J1–J5) spread over Grid 1 and Grid 2, where the jobs read and write different file storage systems (FS, e.g. SRB or SRM) and database management systems (DB, based on OGSA-DAI)
Workflow level interoperation of local, SRB, SRM and GridFTP file systems • Jobs of the same workflow can run in various grids (UK NGS, EGEE, OSG) and can read and write files stored in different grid systems managed by different file management systems • Examples from the diagram: jobs running at the NGS read from NGS SRB, NGS GridFTP and local files and write to NGS SRB; a job running at EGEE writes to an EGEE SRM • (A hypothetical sketch of scheme-based file staging follows below.)
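One way to picture this interoperation (a hypothetical sketch, not the NGS portal's implementation) is a dispatcher that picks a storage handler from the scheme of each file reference; the handler names and URLs below are invented:

```python
# Hypothetical sketch of scheme-based file staging across storage systems
# (local, SRB, SRM, GridFTP). The handlers only describe what a real client
# tool would be asked to do; nothing here talks to actual middleware.
from urllib.parse import urlparse

def stage_local(url):   return f"copy {url} from the local file system"
def stage_srb(url):     return f"fetch {url} with an SRB client"
def stage_srm(url):     return f"fetch {url} with an SRM client"
def stage_gridftp(url): return f"fetch {url} with a GridFTP client"

HANDLERS = {"file": stage_local, "srb": stage_srb,
            "srm": stage_srm, "gsiftp": stage_gridftp}

def stage_in(file_url):
    scheme = urlparse(file_url).scheme or "file"
    if scheme not in HANDLERS:
        raise ValueError(f"no handler for storage scheme '{scheme}'")
    return HANDLERS[scheme](file_url)

print(stage_in("srb://srb.example.org/home/user/input.dat"))
print(stage_in("gsiftp://gridftp.example.org/data/model.bin"))
```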
Workflow interoperability: P-GRADE workflow embedding Triana, Taverna and Kepler workflows • A P-GRADE workflow can host Triana, Taverna and Kepler workflows as nodes • Available for UK NGS users as a production service
WS-PGRADE and gUSE • New product in the P-GRADE portal family: • WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment) • WS-PGRADE uses the high-level services of the gUSE (Grid User Support Environment) architecture • Integrates and generalizes P-GRADE portal and NGS P-GRADE portal features • Advanced data-flows (PS features) • GEMLCA • Workflow repository • gUSE features • Scalable architecture (can be installed on one or more servers) • Various grid submission services (GT2, GT4, LCG-2, gLite, BOINC, local) • Built-in inter-grid broker (seamless access to various types of resources) • Comfort features • Different separated user views supported by the gUSE application repository
gUSE: service-oriented architecture • Graphical User Interface: WS-PGRADE (GridSphere portlets) • Autonomous services (high level middleware service layer): workflow engine, workflow storage, file storage, application repository, gUSE information system, submitters, meta-broker, logging • Resources (middleware service layer): local resources, service grid resources, desktop grid resources, web services, databases
Ergonomics • Users can be grid application developers or end-users • Application developers design sophisticated dataflow graphs • Embedding to any depth, recursive invocations, conditional structures, generators and collectors at any position • Publish applications in the repository at certain stages of work: • Applications • Projects • Concrete workflows • Templates • Graphs • End-users see the WS-PGRADE portal as a science gateway • List of ready-to-use applications in the gUSE repository • Import and execute applications without knowledge of programming, dataflow or grid
Dataflow programming concept for application developers • Cross & dot product data-pairing (concept similar to Taverna): all-to-all vs. one-to-one pairing of data items • Any component can be a generator, PS node or collector, with no ordering restriction • Conditional execution based on equality of data • Nesting, recursion • (Diagram: an example dataflow graph whose generators and collectors expand into 7042 tasks.) • (A small pairing illustration follows below.)
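The difference between the two pairing modes can be illustrated in a few lines (illustrative item names, not gUSE code):

```python
# Cross product pairs every item of one port with every item of the other
# (all-to-all); dot product pairs items position by position (one-to-one).
from itertools import product

ligands   = ["lig_A", "lig_B", "lig_C"]
receptors = ["rec_1", "rec_2"]

cross_pairs = list(product(ligands, receptors))  # 3 x 2 = 6 task inputs
dot_pairs   = list(zip(ligands, receptors))      # stops at the shorter list: 2 pairs

print(cross_pairs)
print(dot_pairs)
```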
Current users of gUSE beta release • CancerGrid project • Predicting various properties of molecules to find anti-cancer leads • Creating science gateway for chemists • EDGeS project (Enabling Desktop Grids for e-Science) • Integrating EGEE with BOINC and XtremWeb technologies • User interfaces and tools • ProSim project • In silico simulation of intermolecular recognition • JISC ENGAGE program (UK)
The CancerGrid infrastructure • Portal and Desktop Grid server: the gUSE portal executes workflows, stores files in the portal storage and supports browsing molecules; local jobs go to a local resource (Job 1 … Job N), desktop grid (DG) jobs go through the 3G Bridge to a BOINC server (WU 1 … WU N) • DG clients from all partners: BOINC clients run the legacy application on work units, using GenWrapper for batch execution • Molecule database server: hosts the molecule database
CancerGrid workflow • Executed on the local desktop grid • With N = 30K and M = 100 the generator jobs fan the work out from x1 to xN and further to N × M = 3 million task instances, which amounts to about 0.5 year of execution time
gUSE in the ProSim project • Protein molecule simulation on the Grid • Grid Computing team of the Univ. of Westminster
The user scenario • Inputs: PDB file 1 (receptor) and PDB file 2 (ligand) • Check (MolProbity) • Energy minimization (Gromacs) • Perform docking (AutoDock) • Validate (MolProbity) • Molecular dynamics (Gromacs)
The workflow in gUSE • Parameter sweeps in phases 3 and 4 • Executed on 5 different sites of the UK NGS