WS-PGRADE portal and its usage in the CancerGrid project
M. Kozlovszky, P. Kacsuk
Computer and Automation Research Institute of the Hungarian Academy of Sciences
PUCOWO, Zurich, Switzerland, 10-11/06/2010
Motivations for creating gUSE
• To overcome (most of) the limitations of the P-GRADE portal:
  • To provide better modularity, so that any service can be replaced
  • To improve scalability to millions of jobs
  • To enable advanced dataflow patterns
  • To interface with a wider range of resources
  • To separate the Application Developer view from the Application User view
WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment) and gUSE (Grid User Support Environment) architecture
WS-PGRADE/gUSE
• Creating complex workflows and parameter sweeps
• Seamless access to various types of resources:
  • clusters
  • service grids
  • desktop grids
  • databases
• Scalable architecture
• Advanced dataflows
• Creating complex applications using embedded workflows and legacy codes
• Comfort features:
  • Separated views
  • Community components from the workflow repository
www.wspgrade.hu
WS-PGRADE architecture
• Graphical User Interface: WS-PGRADE (Gridsphere portlets)
• gUSE autonomous services (high-level middleware service layer): file storage, workflow storage, gUSE information system, workflow engine, application repository, submitters (several), meta-broker, logging
• Resources (middleware service layer): local resources, service grid VOs, desktop grid resources, Web services, databases
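The architecture places several submitters below the workflow engine, which fits the stated goal of being able to replace any service. A minimal sketch of that idea, assuming a plugin-style registry: the engine never talks to middleware directly and instead hands each job to the submitter registered for the job's resource type. All class names, function names and resource-type labels here are hypothetical and are not the gUSE API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    resource_type: str  # hypothetical labels such as "local" or "boinc"


# A submitter turns a job into a (hypothetical) submission handle string.
Submitter = Callable[[Job], str]


class WorkflowEngine:
    """Hands each job to the submitter registered for its resource type."""

    def __init__(self) -> None:
        self._submitters: Dict[str, Submitter] = {}

    def register(self, resource_type: str, submitter: Submitter) -> None:
        # Adding or replacing a submitter only changes this registry,
        # not the engine's scheduling logic.
        self._submitters[resource_type] = submitter

    def submit_all(self, jobs: List[Job]) -> List[str]:
        handles = []
        for job in jobs:
            submitter = self._submitters.get(job.resource_type)
            if submitter is None:
                raise ValueError(f"no submitter registered for {job.resource_type!r}")
            handles.append(submitter(job))
        return handles


engine = WorkflowEngine()
engine.register("local", lambda job: f"local:{job.name}")
engine.register("boinc", lambda job: f"boinc-wu:{job.name}")
print(engine.submit_all([Job("prep", "local"), Job("dock", "boinc")]))
# ['local:prep', 'boinc-wu:dock']
```

Supporting a new kind of resource in this sketch means registering one more submitter; the engine code itself stays unchanged.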
Application lifecycle in WS-PGRADE
• Define the workflow structure
• Configure the workflow
• Define content for the tasks
• Run a test
• Use local resources, Web services, databases
• Scale the workflow for large simulations
• Use batch systems, cluster grids, desktop grids
• Fix some parameters, leave some open
• Result: an application-specific science gateway for end users
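The step "fix some parameters, leave some open" can be pictured with a small sketch. It assumes a hypothetical `Template` type that stores the values the application developer has fixed and the names of the parameters left open for the end user. Instantiating it checks that every open parameter is supplied and that nothing else is overridden. This illustrates the idea only and is not WS-PGRADE's actual template mechanism.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Template:
    """Hypothetical template: some parameters fixed, some left open."""
    fixed: Dict[str, str]        # chosen by the application developer
    open_params: FrozenSet[str]  # must be supplied by the end user

    def instantiate(self, user_values: Dict[str, str]) -> Dict[str, str]:
        """Merge user values into the fixed ones, rejecting invalid input."""
        given = set(user_values)
        missing = set(self.open_params) - given
        if missing:
            raise ValueError(f"missing open parameters: {sorted(missing)}")
        not_open = given - set(self.open_params)
        if not_open:
            # Covers both fixed parameters and unknown parameter names.
            raise ValueError(f"parameters not open for editing: {sorted(not_open)}")
        return {**self.fixed, **user_values}


template = Template(fixed={"model": "model-A"}, open_params=frozenset({"input_list"}))
print(template.instantiate({"input_list": "my-molecules"}))
# {'model': 'model-A', 'input_list': 'my-molecules'}
```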
WS-PGRADE application: acyclic dataflow
• Job to run on a dedicated machine
• Job to run in a gLite VO
• Job to run in a Globus 2/4 VO
• Task to run in a BOINC grid
• Web service invocation
• Database operation (R/W)
• File from the client host
• File from a GridFTP site
• File from an LFC catalog
• Input string from a task or service
• Result of a database query
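Because the application graph is acyclic, one valid execution order is a topological order of its nodes: each job runs only after every node feeding its inputs has finished. A minimal sketch using Python's standard `graphlib` module (Python 3.9+). The node names, and the node kinds suggested by those names, are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of nodes whose outputs feed its inputs.
# Hypothetical workflow: a database query feeds a local job, whose output
# goes to a BOINC task; a Web service call combines both results.
dataflow = {
    "db_query": set(),
    "local_prep": {"db_query"},
    "boinc_task": {"local_prep"},
    "ws_call": {"local_prep", "boinc_task"},
}


def execution_order(graph):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(dataflow))
# ['db_query', 'local_prep', 'boinc_task', 'ws_call']
```

In a real engine, independent nodes (those with no path between them) could run in parallel; `TopologicalSorter` also offers `get_ready()`/`done()` for that style of scheduling.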
Dataflow programming with gUSE
[Figure: example workflow containing Generator and Collector nodes; 7042 tasks in total]
• Separate application logic from data
• Cross- and dot-product data pairing
  • Concept from Taverna
  • All-to-all vs. one-to-one pairing of data items
• Generator components: produce many output files from 1 input file
• Collector components: produce 1 output file from many input files
• Any component can be a generator or a collector
• Conditional execution based on equality of data
• Nesting, cycles, recursion
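The two pairing modes and the generator/collector roles can be sketched in a few lines. Dot product pairs the i-th item of one input with the i-th item of the other, while cross product pairs every item with every item. The generator and collector functions below are hypothetical stand-ins, and treating unequal dot-product lengths as an error is an assumption of this sketch, not necessarily how gUSE or Taverna handle it.

```python
from itertools import product
from typing import List, Sequence, Tuple


def dot_pairs(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """One-to-one pairing: the i-th item of a with the i-th item of b."""
    # Assumption: unequal lengths are treated as an error in this sketch.
    if len(a) != len(b):
        raise ValueError("dot product needs inputs of equal length")
    return list(zip(a, b))


def cross_pairs(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """All-to-all pairing: every item of a with every item of b."""
    return list(product(a, b))


def generator(item: str, n: int) -> List[str]:
    """Hypothetical generator: one input yields n outputs."""
    return [f"{item}.part{i}" for i in range(n)]


def collector(items: Sequence[str]) -> str:
    """Hypothetical collector: many inputs are merged into one output."""
    return "+".join(items)


molecules = ["m1", "m2", "m3"]
settings = ["s1", "s2", "s3"]
print(len(dot_pairs(molecules, settings)))    # 3 jobs
print(len(cross_pairs(molecules, settings)))  # 9 jobs

# Generator -> one job per part -> collector.
parts = generator("library", 4)
print(collector(parts))  # library.part0+library.part1+library.part2+library.part3
```

In this sketch, a generator producing k items whose output is cross-paired with m other items makes the downstream job run k × m times, which shows how job counts inside one workflow can grow quickly.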
Ergonomics
• Users can be grid application developers or end users.
• Application developers design sophisticated dataflow graphs:
  • Embedding to any depth, recursive invocations, conditional structures, generators and collectors at any position
  • Publish applications in the repository at certain stages of work:
    • Applications
    • Projects
    • Concrete workflows
    • Templates
    • Graphs
• End users see WS-PGRADE & gUSE as a science gateway:
  • List of ready-to-use applications in the repository
  • Import and execute applications without knowledge of programming, dataflow or grids
Current users of gUSE
• EDGeS project (Enabling Desktop Grids for e-Science)
  • Integrating EGEE with BOINC and XtremWeb technologies
  • User interfaces and tools
• ProSim project
  • In silico simulation of intermolecular recognition
  • See next presentation
• University of Westminster Desktop Grid
  • Using AutoDock on institutional PCs
• CancerGrid project
  • Predicting various properties of molecules to find anti-cancer leads
  • Creating a science gateway for chemists
Motivation to use gUSE and WS-PGRADE for CancerGrid
• Arbitrary number of generators (and collectors) within one workflow, at arbitrary locations
• Scalability: the number of jobs within one workflow is in the range 100K…1M!
• Import of the existing end-user configuration GUI (easy-to-use, web-based, user-specific): an application-specific portlet for end users was not needed.
[Figure: CancerGrid system architecture. Portal and Desktop Grid server: the portal (browsing molecules, executing workflows, portal storage) sends local jobs (Job 1 … Job N) to a local resource and DG jobs via the 3G Bridge to the BOINC server as work units (WU 1 … WU N). DG clients from all partners: BOINC clients run the legacy applications through GenWrapper for batch execution. Molecule database server: the molecule database.]
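The architecture shows DG jobs reaching BOINC clients as work units, with GenWrapper used for batch execution of the legacy applications. One plausible reason to batch is to spread per-work-unit overhead across many molecules. A minimal sketch of splitting a molecule list into fixed-size batches, one per work unit; the function name, batch size and ID format are assumptions, not the CancerGrid configuration.

```python
from typing import List, Sequence


def make_work_units(molecule_ids: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split molecule IDs into consecutive batches, one batch per work unit."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(molecule_ids[i:i + batch_size])
        for i in range(0, len(molecule_ids), batch_size)
    ]


ids = [f"mol{i}" for i in range(10)]
work_units = make_work_units(ids, 4)
print([len(wu) for wu in work_units])  # [4, 4, 2]
```

For N molecules and batch size k, this produces ⌈N/k⌉ work units, so a larger k means fewer (but longer-running) work units on the desktop grid.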
CancerGrid workflows
• Descriptor calculation
• Property prediction
• Screening
• Model building
Working on the CancerGrid portal: step by step
• Initial state: molecules/structures are stored in the DB, organised into lists
• User selects a list of molecules/structures
• User selects/downloads a workflow from the repository
• User configures the workflow to take the list as input
• User optionally updates parameters of the modules
• User submits the workflow
• User optionally monitors its status
• When the workflow has finished, the results are stored in the DB
Conclusions
• WS-PGRADE: implemented on top of the scalable, WS-based gUSE architecture
• More expressive dataflow patterns
• Transparent access to:
  • Local resources
  • Service grids
  • Desktop grids
  • Databases
  • Web services
• Application repository
  • Service for collaboration between developers and end users
Next steps at www.guse.hu
• User manual
• Request a user account
Thank you for your attention! Questions?
www.wspgrade.hu
Acknowledgement: CancerGrid EU FP6 project (FP6-2005-LIFESCIHEALTH-7), http://www.cancergrid.eu
Applications in CancerGrid
• Flexmol: an XML-based molecular language
• Molecule 2D/3D converter (Cmol3D)
• Molecule 3D conformation generator (Cmol3D)
• MOPAC (Molecular Orbital PACkage): a semiempirical quantum chemistry program based on Dewar and Thiel's NDDO approximation
• Codessa Pro (Comprehensive Descriptors for Structural and Statistical Analysis): a software suite for developing quantitative structure-activity/property relationships
• Matrix former
• QSAR model builder
  • Quantitative structure-activity relationship (QSAR) modelling is the process by which chemical structure is quantitatively correlated with a well-defined process, such as biological activity or chemical reactivity.
• (Chemical) property predictor
• File format converters (to integrate the previous tools into a workflow)
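To make the QSAR definition concrete: one common and simple QSAR form is a multiple linear regression of measured activity on molecular descriptors, fitted by least squares. The slide does not say which model the CancerGrid QSAR model builder fits, so this is only a sketch of the idea. Here x_ij is descriptor j of molecule i (for example a descriptor computed by Codessa Pro), y_i is the measured activity of molecule i, and X is the descriptor matrix with a leading column of ones; the closed form assumes X^T X is invertible.

```latex
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j \, x_{ij},
\qquad
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2
  = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```

A property predictor built on such a model would then apply the fitted coefficients to the descriptors of new, unmeasured molecules.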
The CancerGrid infrastructure
• PRODUCTION system
  • gUSE portal: https://grid.cancergrid.eu/gridsphere/gridsphere
  • BOINC server (private desktop grid with firewall and controlled donor access): https://grid.cancergrid.eu/cancergrid
  • Monitoring info: https://grid.cancergrid.eu/cancergrid/hostinfo.php
  • 69 machines (AMRI 10, SZTAKI 56, UPF 2, UoJ 1)
• TEST system
  • gUSE portal: https://cancergrid.lpds.sztaki.hu/gridsphere
  • BOINC server (private desktop grid with firewall and controlled donor access): https://cancergrid.lpds.sztaki.hu/cgrid/ops/
  • Monitoring info: https://cancergrid.lpds.sztaki.hu/cgrid/hostinfo.php
Performance measurements: