Dataflows in SRB using SDSC Matrix • Arun Jagatheesan, Architect & Team Lead, SDSC Matrix, San Diego Supercomputer Center • 10th Annual NPACI/SDSC Summer Computing Institute, August 23-27, 2004, San Diego, California, USA
Talk Outline • Introduction to Gridflows • Introduction to SDSC Matrix Project • Data Grid Language • Architecture of SDSC Matrix • Matrix Usage • What can you do for Matrix?
Acknowledgement • Jonathan Weinberg • Daniel Moore • Allen Ding • Reena Mathew • Erik Vandekieft • SRB Team • You! (hey, your name can be here) • SDSC SRB, NSF GriPhyN, NSF SCEC, DoE Portals Project
Gridflows (Grid Workflow) • Automation of an execution pipeline • Data and/or tasks processed by multiple autonomous grid resources • According to a set of procedural rules • Confluence of multiple autonomous administrative domains • GridFlow Execution Servers • Are themselves from autonomous administrative domains • P2P (Distributed) Control
Talk Outline • Introduction to Gridflows • Introduction to SDSC Matrix Project • Data Grid Language • Architecture of SDSC Matrix • Matrix Usage • What can you do for Matrix?
SDSC Matrix Project • CS Research & Development • Gridflow Description, Data Grid Administration Rules • Gridflow P2P protocols for Gridflow Server Communication • Development • SRB Data Grid Web Services • SRB Datagrid flow automation and provenance • Theory to Practice • Help in customized development & deployment of gridflow concepts in scientific / grid applications • Visibility and assistance in standardization efforts at GGF
Advantages from SRB Perspective • Reduces Client-Server Communication • The whole execution logic is sent to the server • Fewer WAN messages • Our experiments show a significant increase in performance • Datagrid Information Lifecycle Management • Autonomic: “Move data at 9:00 PM on weekdays and on weekends” • Data Grid Administration • Power Users and Sophisticated Users • Data Grid Administrator (rules to manage the data grid) • Scientist or Librarian (visualized data flow programming)
Talk Outline • Introduction to Gridflows • Introduction to SDSC Matrix Project • Data Grid Language • Architecture of SDSC Matrix • Matrix Usage • What can you do for Matrix?
What do they want? • “We know the business (scientific) process” • “Cyberinfrastructure is all we care about (why bother about atoms or DNA)”
What do they want? • Use DGL to describe your process logic with abstract references to datagrid infrastructure dependencies
Why a Gridflow Language? • Infrastructure-independent description • Abstract references to hardware and cyberinfrastructure • Description of execution flow logic • Separate the execution flow logic from the application logic • (e.g., Monte Carlo is an application; executing it 10 times, or until a variable becomes zero, is execution logic) • Procedural rules associated with execution flow • Provenance • What happened, when, who, how …? (and querying)
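The Monte Carlo example on this slide can be illustrated with a minimal Python sketch. It is not DGL and not part of Matrix; the function names `monte_carlo_step`, `run_n_times`, and `run_until_zero` are hypothetical and exist only to show the application logic kept apart from the execution flow logic that drives it.

```python
import random


def monte_carlo_step(rng):
    """Application logic: one Monte Carlo estimate of pi from 1000 samples."""
    samples = 1000
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples


def run_n_times(task, n):
    """Execution logic: run the task a fixed number of times."""
    return [task() for _ in range(n)]


def run_until_zero(task, counter):
    """Execution logic: run the task until a control variable reaches zero."""
    results = []
    while counter > 0:
        results.append(task())
        counter -= 1
    return results


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
estimates = run_n_times(lambda: monte_carlo_step(rng), 10)
```

Either driver can be swapped in without touching `monte_carlo_step`, which is the separation the slide argues a gridflow language should provide.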
Gridflow Language Requirements • High-level abstract descriptions • Abstract description of cyberinfrastructure dependencies • Simple yet flexible • Flexible enough to describe complex requirements (no brute force) • Gridflow dependency patterns • Based on execution structure and data semantics • (parallel, sequential, fork-new), (milestones, for-each, switch-case), … • Asynchronous execution • For long-running requests • Querying using an existing standard • XQuery
Gridflow Language Requirements • Process metadata and annotations • Runtime definition, update and querying of metadata • Runtime management of gridflows • Stop a gridflow at run time • Partitioning • Facility in the language to divide a gridflow request into multiple requests (excellent research topic) • Import descriptions • Refer to other gridflows in execution
Data Grid Language (DGL) • XML based gridflow description • Describes execution flow logic • ECA-based rule description for execution • ECA = Event, Condition, Action • Querying of Status of Gridflow • XQuery / Simple query of a Gridflow Execution • Scoped variables and gridflow patterns • For control of execution flow logic
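An ECA rule pairs an event with a condition and an action: when the event occurs and the condition holds, the action runs. The Python sketch below shows that evaluation order only. The `EcaRule` class, the `dispatch` function, the event name `afterIngest`, and the context keys are all assumptions made for illustration; they are not DGL syntax or the Matrix ECA rules handler.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EcaRule:
    event: str                          # name of the triggering event
    condition: Callable[[Dict], bool]   # evaluated against the event context
    action: Callable[[Dict], str]       # runs only if the condition holds


def dispatch(rules: List[EcaRule], event: str, context: Dict) -> List[str]:
    """Run the action of every rule that matches the event and whose condition is true."""
    return [
        rule.action(context)
        for rule in rules
        if rule.event == event and rule.condition(context)
    ]


rules = [
    # Hypothetical rule: replicate files larger than 100 MB after ingestion.
    EcaRule("afterIngest", lambda c: c["size_mb"] > 100, lambda c: f"replicate {c['name']}"),
    # Hypothetical rule: always log the ingestion.
    EcaRule("afterIngest", lambda c: True, lambda c: f"log {c['name']}"),
]

fired = dispatch(rules, "afterIngest", {"name": "run1.dat", "size_mb": 250})
```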
DGL Requests • Data Grid Flow • An XML structure that describes the execution logic, associated procedural rules and grid environment variables • Status Query • An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity • A DGL or Matrix client sends either of these to the Matrix Server
Data Grid Request • Annotations about the Data Grid Request • Can be either a Flow or a Status Query
Grid User
<GridUser>
  <userID>Matrix-demo</userID>
  <organization>
    <organizationName>sdsc</organizationName>
  </organization>
  <challenge-Response>******</challenge-Response>
  <homeDirectory>/home/Matrix-demo.sdsc</homeDirectory>
  <defaultStorageResource>sdsc-unix</defaultStorageResource>
  <phoneNumber>0</phoneNumber>
  <e-mail>arun@sdsc.edu</e-mail>
</GridUser>
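A client that builds requests programmatically could generate this element instead of typing it by hand. The sketch below uses only Python's standard `xml.etree.ElementTree` and the element names shown on the slide. The helper `build_grid_user` is hypothetical and is not the Matrix Java client API; it leaves out `challenge-Response` and `phoneNumber` for brevity.

```python
import xml.etree.ElementTree as ET


def build_grid_user(user_id, organization, home_dir, storage_resource, email):
    """Build a GridUser element using the element names from the slide."""
    user = ET.Element("GridUser")
    ET.SubElement(user, "userID").text = user_id
    org = ET.SubElement(user, "organization")
    ET.SubElement(org, "organizationName").text = organization
    ET.SubElement(user, "homeDirectory").text = home_dir
    ET.SubElement(user, "defaultStorageResource").text = storage_resource
    ET.SubElement(user, "e-mail").text = email
    return user


grid_user = build_grid_user(
    "Matrix-demo", "sdsc", "/home/Matrix-demo.sdsc", "sdsc-unix", "arun@sdsc.edu"
)
xml_text = ET.tostring(grid_user, encoding="unicode")
```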
Flow • Scoped variables that can control the flow • Logic used by the sub-members • Sub-members that are the real execution statements
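The three parts of a flow (scoped variables, flow logic, and sub-members) suggest a recursive structure. The Python sketch below models it under stated assumptions: the class names `Flow` and `Step`, the `logic` values `"sequential"` and `"parallel"`, and the rule that inner variables shadow outer ones are illustrative choices, not the DGL schema or the Matrix execution manager.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Step:
    """A leaf member: the real execution statement."""
    name: str
    run: Callable[[Dict], str]


@dataclass
class Flow:
    """A flow: its logic, its scoped variables, and its sub-members."""
    logic: str  # "sequential" or "parallel" (assumed values)
    variables: Dict = field(default_factory=dict)
    members: List[Union["Flow", Step]] = field(default_factory=list)


def execute(node, scope):
    """Execute a step or flow; results keep the order of the members."""
    if isinstance(node, Step):
        return node.run(scope)
    inner = {**scope, **node.variables}  # inner variables shadow outer ones
    if node.logic == "parallel":
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda member: execute(member, inner), node.members))
    return [execute(member, inner) for member in node.members]


flow = Flow(
    "sequential",
    {"collection": "/home/Matrix-demo.sdsc"},
    [
        Step("list", lambda s: f"list {s['collection']}"),
        Flow(
            "parallel",
            {"copies": 2},
            [
                Step("replicate", lambda s: f"replicate x{s['copies']}"),
                Step("checksum", lambda s: f"checksum in {s['collection']}"),
            ],
        ),
    ],
)
result = execute(flow, {})
```

The nested parallel flow sees both its own `copies` variable and the `collection` variable of the enclosing flow, which is one plausible reading of "scoped variables".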
Talk Outline • Introduction to Gridflows • Introduction to Matrix • Data Grid Language • Architecture of SDSC Matrix • Matrix Usage • What can you do for Matrix?
Matrix Gridflow Server Architecture • Event Publish Subscribe, Notification • JMS Messaging Interface • JAXM Wrapper • WSDL Description • SOAP Service for Matrix Clients • Matrix Data Grid Request Processor • Sangam P2P Gridflow Broker and Protocols • Transaction Handler • Workflow Query Processor • Status Query Handler • Flow Handler and Execution Manager • XQuery Processor • Gridflow Metadata Manager • ECA Rules Handler • Persistence (Store) Abstraction • JDBC • In-Memory Store • Matrix Agent Abstraction • SDSC SRB Agents • Other SDSC Data Services • Agents for Java, WSDL and other grid executables
Talk Outline • Introduction to Gridflows • Introduction to Matrix • Data Grid Language • Architecture of SDSC Matrix • Matrix Usage • What can you do for Matrix?
Using an XML Editor • Only an XML (DGL) file is required • All that is needed is a DGL file that is sent to the server • Use an XML editor to create the DGL file • XMLSpy® could be used • Send it to the Matrix Server • Use the Java program DGLSender.java
Using the Java API • Download our Matrix Java Client • Programmatically create a request • Use it in your Java program to interact with the grid and develop a local application • http://www.npaci.edu/DICE/SRB/matrix/Software/index.html
Using WSDL • Use the WSDL to create a SOAP-based client in any programming language of your preference
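Whatever language the client is written in, a SOAP-based client ultimately wraps its request in a SOAP envelope. The Python sketch below shows that wrapping with the standard library and the SOAP 1.1 envelope namespace. The payload element names `DataGridRequest` and `Flow` are placeholders chosen for illustration; the actual message format is defined by the Matrix WSDL, which this sketch does not read.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def wrap_in_soap(payload):
    """Wrap an XML payload element in a SOAP 1.1 Envelope/Body and serialize it."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="unicode")


# Placeholder payload; real element names come from the DGL schema and the WSDL.
request = ET.Element("DataGridRequest")
ET.SubElement(request, "Flow")
message = wrap_in_soap(request)
```

In practice a WSDL toolkit for the chosen language would generate this plumbing, so hand-built envelopes are mainly useful for debugging.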
Using DG-Modeler • GUI for dataflow programming
Gridflow Process I • End user using DGBuilder • Gridflow description in Data Grid Language
Gridflow Process II • Abstract gridflow using Data Grid Language • Planner • Concrete gridflow using Data Grid Language
Gridflow Process III • Concrete gridflow using Data Grid Language • Gridflow Processor • Gridflow P2P Network
Got ideas/suggestions? • Contact: SDSC Matrix project, arun@sdsc.edu • Google keyword: SDSC Gridflow