
Data Grid Services and Pipelines

Data Grid Services and Pipelines. Arun Jagatheesan Architect & Technical Lead, SDSC Matrix arun@sdsc.edu. NPACI Summer Computing Institute August 18, 2003, San Diego. Credit / Acknowledgements. Participants Allen Ding Lucas Gilbert Reena Mathew Erik Vandiekieft (IBM) Xi Cynthia Sheng


Presentation Transcript


  1. Data Grid Services and Pipelines Arun Jagatheesan Architect & Technical Lead, SDSC Matrix arun@sdsc.edu NPACI Summer Computing Institute August 18, 2003, San Diego

  2. Credit / Acknowledgements • Participants • Allen Ding • Lucas Gilbert • Reena Mathew • Erik Vandiekieft (IBM) • Xi Cynthia Sheng • Well Wishers • Reagan Moore & SRB Team • Kim Baldridge • YOU !!! • Sponsors • NSF GriPhyN, NSF SCEC, NPACI REU, NIH BIRN

  3. Lecture Outline • Concepts • Distributed Data Management • Process Flow Pipelines • Web Services; Grid Services • Theory • Data Grid Language (DGL) • Practice (Hands-on) • SDSC Matrix • Web Demo • Matrix Java API

  4. Grid as Utility Computing

  5. Logical Layers (bits, data, information, ...) • Inter-organizational Information Storage Management • Semantic data Organization (with behavior) • Virtual Data Transparency • Data Replica Transparency • Data Identifier Transparency • Storage Location Transparency • Storage Resource Transparency • Figure labels: myActiveNeuroCollection, patientRecordsCollection; image.cgi, image.wsdl, image.sql; image_0.jpg…image_100.jpg; E:\srbVault\image.jpg, /users/srbVault/image.jpg; Select … from srb.mdas.td where...
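
The transparency layers on this slide can be illustrated with a minimal sketch, assuming a hypothetical in-memory catalog. `LogicalCatalog` and its methods are invented for illustration and are not part of SRB or Matrix; the two physical paths are the ones shown on the slide. A client asks for a logical name and never sees which storage location answers.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalCatalog:
    """Hypothetical catalog mapping a logical name to its physical replicas."""
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def register(self, logical_name: str, physical_path: str) -> None:
        # Several physical copies may back the same logical name.
        self.replicas.setdefault(logical_name, []).append(physical_path)

    def resolve(self, logical_name: str) -> str:
        # Return the first registered replica; a real system would choose
        # by availability or locality (assumption, not from the slides).
        paths = self.replicas.get(logical_name)
        if not paths:
            raise KeyError(f"no replica registered for {logical_name!r}")
        return paths[0]


catalog = LogicalCatalog()
catalog.register("image.jpg", r"E:\srbVault\image.jpg")
catalog.register("image.jpg", "/users/srbVault/image.jpg")
print(catalog.resolve("image.jpg"))
```

The caller only ever uses `"image.jpg"`; moving or adding a replica changes the catalog, not the client code.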

  6. Is that all? We need more Hey, Who is this Guy?

  7. Digital entities Meta-data Services State Data Discovery • New data updates relationships among data in collections • Services are invoked to analyze the new relationships • DGMS applications are notified of state updates
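
The notification idea on this slide might look roughly like the following publish/subscribe sketch. `StateBus` is a hypothetical in-process stand-in, not a Matrix or DGMS component; it only shows applications registering interest in a collection and being called back when its state changes.

```python
from collections import defaultdict
from typing import Callable


class StateBus:
    """Hypothetical in-process publish/subscribe hub for state updates."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, collection: str, callback: Callable[[str, str], None]) -> None:
        self._subscribers[collection].append(callback)

    def publish(self, collection: str, event: str) -> None:
        # Notify every application that subscribed to this collection.
        for callback in self._subscribers[collection]:
            callback(collection, event)


received = []
bus = StateBus()
bus.subscribe("patientRecordsCollection", lambda c, e: received.append((c, e)))
bus.publish("patientRecordsCollection", "new data added")
bus.publish("otherCollection", "nobody is subscribed to this one")
print(received)
```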

  8. Distributed Data Management (Services, Data flow pipeline Management) • Data collecting: sensor systems, object ring buffers • Data organization: collections, manage data context • Data sharing: data grids, manage heterogeneity • Data publication: digital libraries, support discovery • Data preservation: persistent archives, manage technology evolution • Data analysis: processing pipelines, choreograph data and knowledge extraction • Data mediation: semantic data, mappings between data, information, knowledge

  9. Data process-flow pipelines • Coordinated execution amongst flows • Figure labels: Input, Compute, Research Archive, Digital Library

  10. Web Services • Web Page (HTML): searched and used by human beings; any computer; useful for dissemination of information on any topic • Web Service: searched and used by computer programs; any programming language, OS, etc.; useful for dissemination of services for any topic • Web page stack: HTML (describe data layout), HTTP (transport data), Google (discover data) • Web service stack: XML/WSDL (web service description), SOAP over HTTP/SMTP (transport/access), UDDI (discover)
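
To make the SOAP layer concrete, here is a minimal sketch that wraps a call in a SOAP 1.1 envelope using Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the `createCollection` operation and its `name` parameter are placeholders chosen for illustration and are not taken from any Matrix WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(operation: str, params: dict[str, str]) -> str:
    """Wrap a (hypothetical) operation call in a SOAP 1.1 Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)  # placeholder operation element
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("createCollection", {"name": "My-First-Collection"})
print(message)
```

In practice the element names and types would come from the service's WSDL, and the string would be sent as the body of an HTTP POST.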

  11. Lecture Outline • Concepts • Distributed Data Management • Process Flow Pipelines • Web Services; Grid Services • Theory • Data Grid Language (DGL) • Practice (Hands-on) • SDSC Matrix • Web Demo • Matrix Java API

  12. Need for Standard DGL • Database (DBMS): SQL (DDL, DML, DQL) • DGMS: DGL (XML based; invoke operations; subset of XQuery; process flow) • Figure labels: 121.Event, Thit.xml, Hits.sql, National Lab, University of Gators

  13. Data Grid Language • XML based asynchronous protocol • Describe data sets, collections, datagrid operations, ... • Access and Manage data grids, data-flow pipelines • Query on data resource (based on W3C XQuery) • Facilitates Grid Workflow • Sharing of granular state information about execution of each datagrid operation amongst different processes or services

  14. Data Grid Request (DReq) • Datagrid Request • Asynchronous requests for data/process-flow in datagrids • Requests are either a Transaction or a Status Query • Each Transaction consists of one or more Flows • Each Flow consists of one or more datagrid operations • Datagrid operation = data transformation or data query • A flow can be executed sequentially or in parallel
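
The Transaction / Flow / operation hierarchy above can be sketched as plain Python classes. This is an illustrative model only: the class names, the use of callables as operations, and the choice to run the flows of a transaction one after another are assumptions, not the Matrix implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Flow:
    """One flow: a list of operations run sequentially or in parallel."""
    operations: list[Callable[[], str]]
    parallel: bool = False

    def run(self) -> list[str]:
        if self.parallel:
            # Submit all operations at once; results keep the list order.
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(op) for op in self.operations]
                return [f.result() for f in futures]
        return [op() for op in self.operations]


@dataclass
class Transaction:
    """A transaction groups one or more flows (run in order here, by assumption)."""
    flows: list[Flow] = field(default_factory=list)

    def run(self) -> list[list[str]]:
        return [flow.run() for flow in self.flows]


txn = Transaction(flows=[
    Flow([lambda: "create collection", lambda: "create container"]),
    Flow([lambda: "rename collection", lambda: "query"], parallel=True),
])
results = txn.run()
print(results)
```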

  15. Data Grid Request

  16. Data Grid Response • Datagrid Response • Either Transaction Acknowledgement or Status Response • Status Response contains the results of a Transaction • Response could be received at any granular level • Status response is used for coordination of flows and inter-process notifications

  17. Data Grid Response (DRes)

  18. Lecture Outline • Concepts • Distributed Data Management • Process Flow Pipelines • Web Services; Grid Services • Theory • Data Grid Language (DGL) • Practice (Hands-on) • SDSC Matrix • Web Demo • Matrix Java API

  19. “Let's play who wants to be a coder” • Now it's your turn to take the red pill from Matrix • It gets interesting from here; let's all do some coding

  20. SDSC Matrix Architecture • SOAP Service Wrapper Abstraction • Event Publish Subscribe, Notification • JMS Messaging System • JAXM Wrapper • OGSA • RPC-Style for SOAP • Matrix Data Grid Request Processor • Status Query Handler • Pipeline Query Processor • Transaction Handler • Flow Handler and Execution Manager • XQuery Processor • Termination Handler • Data flow pipeline Meta data Manager • Matrix Agent Abstraction • Persistence (Store) Abstraction • SRB Agents • OGSA Agent • WSDL Agent • JDBC • In Memory Store

  21. Lesson – 1 : Data Grid Request Create Data Grid Request and its components

  22. Learn it yourself : Task - 1 • Create Flow(0) in a Data Grid Request [DGREQ] • Create a simple Data Grid Request using Web Demo • Add Flow • Make it Sequential • Add Step • Create Collection • Collection Name : <say My-First-Collection > • Click on Flow0 again to add one more step to this Flow0 • Create Container • Container Name : <say My-First-Container> • Click on DGRequest link to see Flow0 with 2 steps

  23. Learn it yourself : Task - 2 • Create Flow(1) in a Data Grid Request [DGREQ] • Click on DGRequest link to see Flow0 with 2 steps • Click on Add Flow • Make it of type parallel • Add Step • Rename Collection • Old Collection : <say My-First-Collection > to new name • Click on Flow1 link to add one more step to this Flow1 • Create Collection • Collection Name : <say MyCollection-2> • Click on DGRequest link to see 2 Flows with 2 steps each
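
The request built by clicking through Tasks 1 and 2 could be represented as XML along the following lines. The element and attribute names (`DataGridRequest`, `Flow`, `Step`, `type`, `operation`, `target`) are placeholders invented for this sketch; they are not the actual DGL schema, which the Web Demo generates for you.

```python
import xml.etree.ElementTree as ET


def add_flow(request: ET.Element, flow_type: str, steps: list[tuple[str, str]]) -> ET.Element:
    """Append a flow with an execution type and (operation, target) steps."""
    flow = ET.SubElement(request, "Flow", {"type": flow_type})
    for operation, target in steps:
        ET.SubElement(flow, "Step", {"operation": operation, "target": target})
    return flow


request = ET.Element("DataGridRequest")  # placeholder root, not the DGL schema

# Flow0 from Task 1: sequential, two steps.
add_flow(request, "sequential", [
    ("createCollection", "My-First-Collection"),
    ("createContainer", "My-First-Container"),
])
# Flow1 from Task 2: parallel, two steps.
add_flow(request, "parallel", [
    ("renameCollection", "My-First-Collection"),
    ("createCollection", "MyCollection-2"),
])
print(ET.tostring(request, encoding="unicode"))
```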

  24. Learn it yourself : Task - 3 • Add Doc Meta for [DGREQ] • Click on DOCMETA • Fill in your name (optional) • Press >> to save details • Doc Meta is just for reference • The Author is the process which created the request; the Author could have created the request on behalf of another user

  25. Learn it yourself : Task - 4 • Add USERINFO for [DGREQ] • Click on USERINFO • Add user id : <du22> • Organization: <npaci> • Challenge Response: <class (password)> • Home Directory: </home/du22.npaci> • Storage Resource: <hpss-sdsc> • Press >> to save

  26. Learn it yourself : Task - 5 • Add VOINFO for [DGREQ] • Click on VOINFO • Add Server : <srb.sdsc.edu> • Port: <5544> • Click >> to save this in our demo • VO Info is for Virtual Organization Information

  27. Learn it yourself : Task - 6 • Send Data Grid Request • First check if all components are ready • We just learnt the components of a DReq • They must all be [Y] in the demo, indicating they are ready • Click Send • If all the components are OK, the Data Grid Request is shown in XML • Click Send DGReq
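
The "all components must be [Y]" check can be pictured with a short sketch. The component keys and the rule that a component counts as ready when it is present and non-empty are assumptions made for illustration; the Web Demo's own readiness logic is not shown in the slides.

```python
# Components named in Tasks 1 to 5 (key names are placeholders).
REQUIRED = ("flows", "docmeta", "userinfo", "voinfo")


def readiness(request: dict) -> dict[str, str]:
    """Mark each required component 'Y' if present and non-empty, else 'N'."""
    return {name: "Y" if request.get(name) else "N" for name in REQUIRED}


def can_send(request: dict) -> bool:
    """A request may be sent only when every component is marked 'Y'."""
    return all(flag == "Y" for flag in readiness(request).values())


draft = {
    "flows": ["Flow0", "Flow1"],
    "docmeta": {"author": "demo client"},
    "userinfo": {"user": "du22", "organization": "npaci"},
    "voinfo": {},  # not filled in yet
}
print(readiness(draft), can_send(draft))

draft["voinfo"] = {"server": "srb.sdsc.edu", "port": 5544}
print(readiness(draft), can_send(draft))
```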

  28. Lesson – 2 : Data Grid Acknowledgement, Status Get Data Grid Acknowledgement, Send Status Request, Receive Status Response

  29. Data Grid Acknowledgement • Data Grid Requests are answered asynchronously • Data Grid Acknowledgement • Carries the Transaction ID used to get the status and result of the DGReq • All valid requests are answered with this acknowledgement before they are processed • Clients use the Transaction ID from this Acknowledgement • The ID may be passed to third parties, which can subscribe to these events (Grid Process Pipelines)

  30. Data Grid Status Req and Response • Transaction ID used to find status • Later versions can use publish/subscribe • Third party subscription also possible
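
The acknowledge-then-poll exchange described in Lesson 2 can be sketched with a fake in-memory service. `FakeMatrixService`, its method names, and the status strings are all hypothetical stand-ins; they only illustrate the pattern of receiving a transaction ID immediately and using it later to ask for status.

```python
import itertools


class FakeMatrixService:
    """Stand-in for a data grid server; this is not the Matrix API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._polls_left: dict[str, int] = {}

    def submit(self, request: str) -> str:
        # Acknowledge immediately with a transaction ID; work happens later.
        txn_id = f"txn-{next(self._ids)}"
        self._polls_left[txn_id] = 2  # pretend the work takes two polls
        return txn_id

    def status(self, txn_id: str) -> str:
        if txn_id not in self._polls_left:
            return "UNKNOWN"
        if self._polls_left[txn_id] > 0:
            self._polls_left[txn_id] -= 1
            return "RUNNING"
        return "COMPLETE"


service = FakeMatrixService()
txn_id = service.submit("<request/>")

# Poll with the transaction ID until the transaction is no longer running.
history = []
while True:
    state = service.status(txn_id)
    history.append(state)
    if state != "RUNNING":
        break
print(txn_id, history)
```

Because the ID is just a string, it can be handed to a third party that then polls or subscribes on the client's behalf.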

  31. Lesson – 3 : Query Data XQuery

  32. XQuery • W3C’s long-awaited answer – the next SQL? • As always, SDSC and our group lead the way • A subset of XQuery on the Data Grid has been implemented • Built our own XQuery parser • Demo CDL (in-house project for NPACI Chemistry Digital Library)

  33. Lesson – 4 : Java API Java API for Matrix

  34. Demo Java Program • Remember, it's for programmatic exchange of state information for coordinated execution of data flow pipelines • Java API sample program • Just download this zip file • Unzip the file • Type rundemo.bat • Type rundemoquery.bat

  35. Summary • Coordinated execution of process-flow pipelines is necessary in a Grid environment • Data Grid Language is to data grids what SQL is to databases • SDSC Matrix: process-flow pipelines • Dynamic control of SRB and other services • Discovery of processes based on the data • Check out our latest release • Imagine what we can do for your project
