Using Scientific Workflows in GEON Efrat Jaeger, Ilkay Altintas
Mission of scientific workflow systems • Promote “scientific discovery” by providing tools and methods to generate scientific workflows • Create a generic, customizable graphical user interface for scientists from different scientific domains • Support computational experiment creation, execution, sharing, reuse and provenance • Design frameworks which define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources • Large-scale resource sharing and management • Collaborative and distributed applications • Gluing it all together on the user’s monitor!
Utilizing Kepler in GEON • An extensible, easy to use, workflow design and prototyping tool • Integrating heterogeneous local and remote tools in a single interface: • Web and Grid services • GIS services • Legacy application integration via Shell-Command actor • Remote tools via SSH, SCP and GridFTP • Relational and spatial databases access • Reusable generic and domain specific actors • Support for High Performance Computations: • Job submission and monitoring • Logging of execution trace and registering intermediate products • Data provenance and failure recovery • Portal accessibility. • Deployment of workflows to the GEON portal • Harvesting data and tools from repositories: • Direct access to data and tools registered to the GEON portal • A web service harvester • Storage Resource Broker (SRB) • Reverse engineering of existing approaches
Actor-Oriented Design • Actor • Encapsulation of parameterized actions • Interface defined by ports and parameters • Port • Communication between input and output data • Without call-return semantics • Composite Actors • Abstract information • Sub-workflows • Model of computation (Director) • Communication semantics among ports • Flow of control
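The actor, port and director roles listed above can be illustrated with a small sketch. This is not Kepler's API (Kepler is a Java system); the class names `Port`, `Actor` and `Director`, and the `scale` and `offset` actions, are hypothetical and exist only for this illustration. The point it tries to show is that actors never call each other: they exchange tokens through ports, and a separate director decides when each actor fires.

```python
from collections import deque


class Port:
    """A FIFO queue of tokens; actors communicate through ports, not calls."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.sinks = []  # downstream ports this port feeds

    def connect(self, other):
        self.sinks.append(other)

    def send(self, token):
        for sink in self.sinks:
            sink.queue.append(token)

    def has_token(self):
        return bool(self.queue)

    def get(self):
        return self.queue.popleft()


class Actor:
    """Encapsulates a parameterized action behind one input and one output port."""

    def __init__(self, name, action, **parameters):
        self.name = name
        self.action = action
        self.parameters = parameters
        self.input = Port(name + ".in")
        self.output = Port(name + ".out")

    def fire(self):
        token = self.input.get()
        self.output.send(self.action(token, **self.parameters))


class Director:
    """Model of computation: here, fire any actor that has a waiting token."""

    def __init__(self, actors):
        self.actors = actors

    def run(self):
        fired = True
        while fired:
            fired = False
            for actor in self.actors:
                if actor.input.has_token():
                    actor.fire()
                    fired = True


def scale(token, factor=1):
    return token * factor


def offset(token, amount=0):
    return token + amount


# Search -> drag and drop -> link via ports, in miniature.
scaler = Actor("scale", scale, factor=10)
shifter = Actor("shift", offset, amount=1)
result = Port("result")
scaler.output.connect(shifter.input)
shifter.output.connect(result)

scaler.input.queue.extend([1, 2, 3])
Director([scaler, shifter]).run()
outputs = list(result.queue)  # [11, 21, 31]
```

Swapping in a different `Director` changes the flow of control without touching the actors, which is the separation the slide describes.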
Workflow Design and Prototyping • Vergil is the graphical user interface for Kepler • Actor ontology and semantic search for actors • Search -> Drag and drop -> Link via ports • Metadata-based search for datasets [Screenshot labels: Data Search, Actor Search]
Actor Search • Kepler Actor Ontology • Used in searching actors and creating conceptual views (= folders) • Currently more than 200 Kepler actors added!
Data Search and Usage of Results • Kepler DataGrid • Discovery of data resources through local and remote services • SRB, • Grid and Web Services, • Db connections • Registry of datasets on the fly using workflows
Integrating heterogeneous local and remote tools in a single interface • Generic Web Service Client and Web Service Harvester • GIS Services • Legacy Application Integration via Command Line wrapper tools, e.g. GMT • RDBMS and Spatial Databases Access • Remote Tools Access via SSH, SCP and GridFTP • Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator • Generic and domain-oriented actors: • Classification and interpolation algorithms • Native R support • Imaging, Gridding, Vis Support • Textual and Graphical Output • …more …
Some Features • Support for High Performance Computations • Job submission and monitoring • Logging of execution trace and registering intermediate products • Data provenance and failure recovery • Portal accessibility • Deployment of workflows to the GEON portal • Harvesting data and tools from repositories • Direct access to data and tools registered to the GEON portal • A web service harvester • Storage Resource Broker (SRB)
GEON Mineral Classification Workflow An “early” example: Classification for naming Igneous Rocks.
PointInPolygon algorithm
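The slide names a point-in-polygon test but does not say which variant the workflow uses. A minimal sketch of one common variant, ray casting, is given below, assuming each classification diagram is stored as named polygons in the plane. The region names and coordinates are hypothetical and are not taken from any igneous rock diagram.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def classify(x, y, regions):
    """Return the name of the first region containing the point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical diagram with two rectangular fields.
regions = {
    "field A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "field B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

print(classify(1, 1, regions))  # field A
print(classify(6, 2, regions))  # field B
print(classify(9, 9, regions))  # None
```

Points lying exactly on a shared boundary are assigned to whichever region the half-open comparison happens to favor, so a production classifier would need an explicit rule for them.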
Enter initial inputs, Run and Display results
Output Visualizers Browser display of results
Integration Scenario: A-type query • Classifying A-types from an Igneous rock database • Integrating between Relational and Spatial (shapefiles) databases to query and interactively display GIS results • Reusing existing and generic Kepler components (Classifier, JDBC) Ghulam Memon, Ashraf Memon
Reusing the Mineral Classifier • Classification sub-workflow runs for each body, each sample and each diagram
Extraction of Datasets on the Fly • SQL database access (JDBC): query the UTEPgravity database for Bouguer anomalies • Translating the query XML response to the web service XML input format [Diagram labels: worldImage, XML, SOAP response]
Extraction of Datasets on the Fly • Creating shapefiles on the fly using ESRI mapping services • Translating the query XML response to the web service XML input format • Displaying an image of the shapefile in a browser interface [Diagram labels: worldImage, XML, SOAP response]
Image of the resulting dataset Sample
GEON Dataset Registration (as in geonSearch) [Screenshot label: Annotation form]
GEON Dataset Registration [Screenshot labels: ADN metadata, Metadata display validation, Registering]
Beach Balls Workflow GOAL: Integrate seismic focal mechanisms with image services
Gravity Modeling Workflow • Defining possible depth distribution of plutons • Source: (GEON) Dogan Seber, Randy Keller [Diagram labels: Observed Gravity, Topography, Pluton map, Sediments, Moho, Densities, Difference calculator, Output Residual Map, Interactive 3D model]
Kepler as a Modeling Tool: Gravity Modeling Workflow • Comparing synthetic and observed gravity models built from heterogeneous data sources, creating a residual map of the difference using ESRI services, and displaying it in a web browser • Portrays Kepler as a prototyping tool (“ToDo”) • Parameters are adjustable • Joint work between SDSC and UTEP
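The comparison step can be sketched as a cell-by-cell difference between two grids of the same shape. This is only an illustration of the residual idea, assuming both models have already been resampled onto a common grid. The function name `residual_map` and the sample values are hypothetical; the workflow in the slides performs this step with ESRI services.

```python
def residual_map(observed, synthetic):
    """Return observed minus synthetic for two equally shaped 2D grids."""
    same_rows = len(observed) == len(synthetic)
    same_cols = same_rows and all(
        len(o_row) == len(s_row) for o_row, s_row in zip(observed, synthetic)
    )
    if not same_cols:
        raise ValueError("grids must have the same shape")
    return [
        [o - s for o, s in zip(o_row, s_row)]
        for o_row, s_row in zip(observed, synthetic)
    ]


# Hypothetical gravity values on a 2 x 2 grid.
observed = [[-120, -118], [-115, -110]]
synthetic = [[-119, -118], [-117, -111]]

residual = residual_map(observed, synthetic)
print(residual)  # [[-1, 0], [2, 1]]
```

Cells with a residual near zero are where the synthetic model already explains the observation; larger values point at parameters worth adjusting in the next run.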
ToDo Gravity Modeling Workflow
LiDAR Introduction [Diagram labels: Survey; Point Cloud x, y, zn, …; Process & Classify; Interpolate / Grid; Analyze / “Do Science”. Image credits: R. Haugerud, U.S.G.S; D. Harding, NASA]
The Computational Challenge: • LiDAR generates massive data volumes; billions of returns are common. • Distributing these volumes of point cloud data to users via the internet is a significant challenge. • Processing and analysis of these data require significant computing resources not available to most geoscientists. • Interpolation of these data challenges typical GIS / interpolation software. • Our tests indicate that ArcGIS, Matlab and similar software packages struggle to interpolate even a small portion of these data. • Traditionally: Popularity > Resources
A Three-Tier Architecture • GOAL: Efficient LiDAR interpolation and analysis using GEON infrastructure and tools • GEON Portal • Kepler Scientific Workflow System • GEON Grid • Use scientific workflows to glue/combine different tools and the infrastructure Portal Grid
Kepler can be used as a batch execution engine • Configuration phase • Subset: DB2 query on DataStar • Interpolate: Grass RST, Grass IDW, GMT… • Visualize: Global Mapper, FlederMaus, ArcIMS [Diagram labels: Portal; Monitoring/Translation; Scheduling/Output Processing; Grid; Subset, Analyze, Visualize; move, process, move, render, display]
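Of the interpolation options listed, inverse distance weighting (IDW) is the simplest to show in a few lines. The sketch below is a naive textbook version, not the GRASS implementation: it weights every input point by 1/d^power, whereas production tools restrict the sum to nearby points. The function names `idw` and `grid` and the sample points are hypothetical.

```python
import math


def idw(points, qx, qy, power=2.0):
    """Estimate z at (qx, qy) from (x, y, z) points, weighting each by 1 / distance**power."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in points:
        d = math.hypot(qx - x, qy - y)
        if d == 0.0:
            return z  # the query coincides with a sample point
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total


def grid(points, xs, ys, power=2.0):
    """Interpolate onto a regular grid: one row per y value, one column per x value."""
    return [[idw(points, x, y, power) for x in xs] for y in ys]


# Two hypothetical LiDAR returns on a line.
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]

print(idw(points, 1.0, 0.0))                 # 15.0, the midpoint weights both samples equally
print(grid(points, [0.0, 1.0, 2.0], [0.0]))  # [[10.0, 15.0, 20.0]]
```

Each grid cell here costs one pass over all points, which is exactly the scaling problem the slides describe once the point cloud reaches billions of returns.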
Lidar Processing Workflow (using Fledermaus) [Diagram labels: Subset, Analyze, Visualize; Datastar, IBM DB2, Arizona Cluster, NFS Mounted Disk; Fledermaus, Create Scene file, iView3D/Browser; move, process, move, render, display; sd, d1, d2 (grid file)]
Lidar Processing Workflow (using Global Mapper) [Diagram labels: Subset, Analyze, Visualize; Datastar, IBM DB2, Arizona Cluster, NFS Mounted Disk; Global Mapper, Get image for grid file, Browser; move, process, move, render, display; d1, d2 (grid file)]
Lidar Processing Workflow (using ArcIMS) [Diagram labels: Subset, Analyze, Visualize; Datastar, IBM DB2, Arizona Cluster, NFS Mounted Disk; ArcInfo, ArcSDE, ArcIMS; move, process, move, render, display; d1, d2 (grid file)]
Lidar Workflow Portlet • User selections from GUI • Translated into a query and a parameter file • Uploaded to remote machine • Workflow description created on the fly • Workflow response redirected back to portlet
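The translation step in the portlet, from GUI selections to a query and a parameter file, might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the table name `lidar_points`, the selection keys, and the XML element names are hypothetical, and a plain bounding-box predicate stands in for the DB2 spatial query used by the real workflow.

```python
import xml.etree.ElementTree as ET

ALLOWED_TABLES = {"lidar_points"}  # hypothetical table name


def build_query(selection):
    """Turn a bounding-box selection into a parameterized SQL string plus its values."""
    table = selection["table"]
    if table not in ALLOWED_TABLES:
        # Table names cannot be bound as parameters, so check against a whitelist.
        raise ValueError("unknown table: " + table)
    sql = (
        "SELECT x, y, z FROM " + table
        + " WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?"
    )
    params = (
        selection["xmin"], selection["xmax"],
        selection["ymin"], selection["ymax"],
    )
    return sql, params


def build_parameter_xml(selection):
    """Serialize the processing choices as a small XML parameter file."""
    root = ET.Element("parameters")
    for key in ("algorithm", "resolution", "format"):
        ET.SubElement(root, key).text = str(selection[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical selections as they might arrive from the portlet form.
selection = {
    "table": "lidar_points",
    "xmin": 100, "xmax": 200, "ymin": 300, "ymax": 400,
    "algorithm": "idw", "resolution": 6, "format": "ascii_grid",
}

sql, params = build_query(selection)
parameter_xml = build_parameter_xml(selection)
print(sql)
print(params)
print(parameter_xml)
```

Keeping the values separate from the SQL text means user input from the form is never spliced into the query string.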
Lidar Post-Processing Workflow Portlet [Diagram labels: Client/GEON Portal; Kepler workflow; Create Workflow Description; Map Parameters; Parameter xml; DB2 Spatial query (x, y, z and attribute); raw data; Grass Functions; Grass surfacing algorithms: Spline, IDW, block mean, …; Map onto the grid (Pegasus); Compute Cluster; submit; process output; Binary grid, ASCII grid, Text file, Tiff/Jpeg/Gif; Render Map; ArcSDE, ArcInfo, ArcIMS; Download data; NFS Mounted Disk]
GLW Monitoring • Job management • A unified interface to follow up on the status of submitted jobs • The system lets users: • View job metadata • Zoom to a specific bounding box location • Track errors • Modify a job and re-submit • View the processing results • In the future, register desired workflow products • Useful for publication • GLW is exposed to a high risk of component failures • Long-running process • Distributed computational resources under diverse controlling authorities • Provides transparent/background error handling using provenance data and ‘smart’ reruns
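One way to read "smart reruns using provenance data" is that a rerun skips every step whose product was already recorded and resumes at the first step that failed. The sketch below illustrates that reading only; it is not the GLW implementation. The function `run_with_provenance`, the step names and the simulated failure are all hypothetical.

```python
def run_with_provenance(steps, provenance, inputs):
    """Run (name, func) steps in order, reusing any product already recorded in provenance."""
    data = inputs
    executed = []
    for name, func in steps:
        if name in provenance:
            data = provenance[name]  # reuse the recorded intermediate product
            continue
        data = func(data)            # may raise; earlier products stay recorded
        provenance[name] = data
        executed.append(name)
    return data, executed


attempts = {"interpolate": 0}


def subset(values):
    return [v for v in values if v >= 0]


def interpolate(values):
    attempts["interpolate"] += 1
    if attempts["interpolate"] == 1:
        raise RuntimeError("simulated node failure")
    return sum(values) / len(values)


def render(value):
    return "mean elevation: {:.1f}".format(value)


steps = [("subset", subset), ("interpolate", interpolate), ("render", render)]
provenance = {}

try:
    run_with_provenance(steps, provenance, [3, -1, 5])
except RuntimeError:
    pass  # the first attempt fails in the interpolate step

result, executed = run_with_provenance(steps, provenance, [3, -1, 5])
print(result)    # mean elevation: 4.0
print(executed)  # ['interpolate', 'render']
```

This version keys provenance by step name alone, so it assumes the inputs and parameters are unchanged between attempts; a modified and re-submitted job would need the record to include them.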
Examples • Searching for actors and datasets • Actor search for ‘gis’ • Data search for ‘volcanic’ • Create a “Hello World!” workflow • <KEPLER_DIR>/demos/getting-started/04-HelloWorld.xml • Use of GEON data source and portal search • Search for ‘Igneous’ • Relational Database Access and Query • Connect to VT Igneous rocks database: • Database format: DB2 • URL: jdbc:db2://data.sdsc.geongrid.org:60000/IGNEOUS • User: readonly • Passwd: read0n1y • Web service based workflows • <KEPLER_DIR>/demos/getting-started/06-WebServicesAndDataTransformation.xml • Composite actors • Invoke a remote application – SSH • ls to a remote directory • Using various interpolation algorithms • interpolation actor • invoking a perl script through ssh • through a web service
Atype Workflow Demo