The OGSA-DAI Project: Databases and the Grid. Neil Chue Hong, Project Manager, EPCC, Edinburgh. N.ChueHong@epcc.ed.ac.uk http://www.ogsadai.org.uk
What is OGSA-DAI? • It is a project: • OGSA Data Access and Integration: funded by the UK eScience Grid Core Programme • It is a vision: • From simple database access to truly virtualised data resources • It is a standard: • The GridDataService Specification from the Data Access and Integration Working Group (DAIS-WG) of the Global Grid Forum (GGF) • It is software that you can use: • Current version is R2.5
OGSA-DAI Objective • To define: • open standards and • open source based • uniform service interfaces • for accessing heterogeneous data sources • within the Open Grid Services Architecture (OGSA) framework • Why? • Because we increasingly want to integrate data sources from different organisations • The Grid, and OGSA, appear to provide a framework for producing software to do this
Who are we? • Contributing to the global grid computing community • Partners: EPCC & NeSC, IBM UK, IBM USA, Manchester e-SC, Newcastle e-SC, Oracle • £3 million, 18 months, started February 2002 • 373 man months • Funded by the Grid Core Programme • [Map of the UK; labels: Glasgow, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, IBM Hursley, Southampton]
What are we doing? [Layered diagram, built up over four slides. Layers: Data Intensive Application Scientists; Data Intensive Applications; Scientific Data Mining & Integration Technology; Grid Plumbing & Security Infrastructure; Data & Storage Resources. Function labels: Data Integration, Distributed Scheduling, Accounting, Authorisation, Data Access, Monitoring, Diagnosis, Logging, Structured Data. Role labels: App. Developers, Tech. Developers, Operations Team, Owners, Data Providers, Data Curators]
DAIS WG • GridDatabaseService Specification • DAIS WG of the GGF • Aim to produce a V1.0 specification by early 2004 • Defines an interface for a GridDatabaseService • Many contributors, not just the OGSA-DAI Project • OGSA-DAI (the software) seeks to be a reference implementation of this standard • But does not necessarily track it exactly just now • Requirements and Overview Informational documents also published
The OGSA-DAI Approach • Reuse existing technologies and standards • OGSA, Query languages, Java, transport • Three key services: • GridDataService • GridDataServiceFactory • DAIServiceGroupRegistry • Benefits: • Location independence • Hides heterogeneity • Scalable • Flexible • Dynamic
OGSA-DAI Positioning - Today [Layer diagram; labels: Data Format, Drivers, Query (Create, Retrieve, Update, Delete), OGSA-DAI Distributed Query, OGSA-DAI Basic Services (GDSF, DAISGR, GDS), Delivery, OGSA, Meta Data, Notification, Lifetime, Location, Technology (Database, Communication, OS…)]
OGSA-DAI To Date • Assuming that OGSA becomes the standard framework • Have adopted the OGSA approach • Have first concentrated on data access • Released software has only limited data integration so far • Distributed query processor prototype due in July • Implementation provides focus on basic functionality first • But architecturally we have tried to answer many pertinent questions • Functionality will increase over subsequent releases
GDS in action • Participants: Analyst, Registry (DAISGR), Factory (GDSF), Grid Data Service (GDS), Database (Xindice, MySQL, Oracle, DB2), Consumer • Diagram legend: SOAP/HTTP, service creation, API interactions • 1a. Request to Registry for sources of data about "x" • 1b. Registry responds with Factory handle • 2a. Request to Factory for access to database • 2b. Factory creates GridDataService to manage access • 2c. Factory returns handle of GDS to client • 3a. Client queries GDS with SQL, XPath, XQuery etc • 3b. GDS interacts with database • 3c. Results of query returned to client as XML, OR 3d. Results of query delivered to consumer as XML
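The registry, factory and service interaction on this slide can be sketched as a toy, in-process model. This is not the OGSA-DAI API: the class and method names (`Registry`, `find_factory`, `create_service`, `perform`), the topic string, the table columns and the result XML layout are all assumptions made for illustration, and an in-memory SQLite database stands in for the real database and the SOAP/HTTP transport.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Toy stand-in for a GDS: manages access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, sql, params=()):
        # 3b. The GDS interacts with the database.
        cursor = self._connection.execute(sql, params)
        columns = [description[0] for description in cursor.description]
        # 3c. Results are returned to the client as XML (hypothetical layout).
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_element, column).text = str(value)
        return ET.tostring(root, encoding="unicode")


class GridDataServiceFactory:
    """Toy stand-in for a GDSF: creates services for one database."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # 2b/2c. Create a service and hand it back to the client.
        return GridDataService(self._connection)


class Registry:
    """Toy stand-in for a DAISGR: maps a topic to a factory."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find_factory(self, topic):
        # 1a/1b. Look up a factory for data about the topic.
        return self._factories[topic]


def build_demo_registry():
    """Set up one in-memory database and register a factory for it."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    connection.executemany(
        "INSERT INTO littleblackbook VALUES (?, ?)",
        [(10, "Ada"), (11, "Grace")],
    )
    registry = Registry()
    registry.register("contacts", GridDataServiceFactory(connection))
    return registry


if __name__ == "__main__":
    registry = build_demo_registry()
    factory = registry.find_factory("contacts")       # steps 1a, 1b
    service = factory.create_service()                # steps 2a to 2c
    print(service.perform(                            # steps 3a to 3c
        "SELECT name FROM littleblackbook WHERE id = ?", (10,)))
```

In the real system each arrow is a remote call and the client holds service handles rather than Python objects; the sketch only shows the order of the steps.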
Activities • OGSA-DAI is structured around the concept of activities • This framework allows new functionality to be added easily • Three types of activity at present: • statement (e.g. SQLQuery, XUpdate) • transformation (e.g. XSL translation, compression) • delivery (e.g. GridFTP) • OGSA-DAI provides implementations of common functionality; others can extend it
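One way to picture the activity framework is as a registry of named steps chained into a pipeline: a statement activity produces data, a transformation activity reshapes it, and a delivery activity hands it on. The sketch below is only an illustration of that idea. The registry, the decorator, the activity names and their signatures are hypothetical and are not taken from the OGSA-DAI code base; delivery goes to a Python list rather than GridFTP.

```python
import sqlite3
import zlib

# Hypothetical registry mapping an activity name to its implementation.
ACTIVITIES = {}


def activity(name):
    """Register a function as a named activity."""
    def register(func):
        ACTIVITIES[name] = func
        return func
    return register


@activity("sqlQuery")
def sql_query(_data, connection, sql, params=()):
    """Statement activity: run a query and return the rows as text."""
    rows = connection.execute(sql, params).fetchall()
    return "\n".join(",".join(str(value) for value in row) for row in rows)


@activity("compress")
def compress(data):
    """Transformation activity: compress the text produced so far."""
    return zlib.compress(data.encode("utf-8"))


@activity("deliver")
def deliver(data, sink):
    """Delivery activity: append the payload to a sink (a list here)."""
    sink.append(data)
    return data


def run_pipeline(steps, data=None):
    """Run activities in order, feeding each output into the next step."""
    for name, options in steps:
        data = ACTIVITIES[name](data, **options)
    return data


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    connection.execute("INSERT INTO t VALUES (1, 'a')")
    received = []
    run_pipeline([
        ("sqlQuery", {"connection": connection, "sql": "SELECT id, name FROM t"}),
        ("compress", {}),
        ("deliver", {"sink": received}),
    ])
    print(zlib.decompress(received[0]).decode("utf-8"))
```

Adding a new capability in this model means registering one more function under a new name, which is the kind of extensibility the slide describes.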
Documents • Accessing a Grid Data Resource is done using Documents • caveat: this may change • A document allows you to: • define parameters • execute activities • deliver results • Written in XML, normally used by a client.
<gridDataServicePerform>
  <request name="myRequest">
    <parameter name="idname">
      <value name="idvalue">10</value>
    </parameter>
    <sqlQueryStatement name="myStatement">
      <sqlParameter position="1" from="idvalue"/>
      <expression> SELECT * FROM littleblackbook WHERE id=? </expression>
      <webRowSetStream name="statementresult"/>
    </sqlQueryStatement>
    <deliverToResponse name="d1">
      <fromLocal from="statementresult"/>
    </deliverToResponse>
  </request>
</gridDataServicePerform>
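To make the structure of such a document concrete, here is a minimal sketch of how a service might walk it: collect the named values, bind them to the statement by position, run the query, and deliver the named output. It is not the OGSA-DAI engine. The reading of the `from` and `position` attributes is an assumption based on the example alone, the function name and table columns are hypothetical, and results come back as Python tuples rather than a WebRowSet stream.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The perform document from the slide, with plain ASCII quotes.
PERFORM_DOCUMENT = """
<gridDataServicePerform>
  <request name="myRequest">
    <parameter name="idname">
      <value name="idvalue">10</value>
    </parameter>
    <sqlQueryStatement name="myStatement">
      <sqlParameter position="1" from="idvalue"/>
      <expression> SELECT * FROM littleblackbook WHERE id=? </expression>
      <webRowSetStream name="statementresult"/>
    </sqlQueryStatement>
    <deliverToResponse name="d1">
      <fromLocal from="statementresult"/>
    </deliverToResponse>
  </request>
</gridDataServicePerform>
"""


def run_perform_document(document, connection):
    """Interpret a perform document against a database connection."""
    request = ET.fromstring(document.strip()).find("request")

    # Define parameters: every <value> is addressable by its name.
    values = {v.get("name"): v.text.strip() for v in request.iter("value")}

    # Execute activities: bind parameters by position, then run the query.
    outputs = {}
    for statement in request.findall("sqlQueryStatement"):
        bindings = sorted(statement.findall("sqlParameter"),
                          key=lambda p: int(p.get("position")))
        params = [values[b.get("from")] for b in bindings]
        sql = statement.find("expression").text.strip()
        rows = connection.execute(sql, params).fetchall()
        outputs[statement.find("webRowSetStream").get("name")] = rows

    # Deliver results: map each delivery name to the output it reads from.
    response = {}
    for delivery in request.findall("deliverToResponse"):
        source = delivery.find("fromLocal").get("from")
        response[delivery.get("name")] = outputs[source]
    return response


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    connection.executemany("INSERT INTO littleblackbook VALUES (?, ?)",
                           [(10, "Ada"), (11, "Grace")])
    print(run_perform_document(PERFORM_DOCUMENT, connection))
```

The `?` placeholder in the expression is filled through the database driver's parameter binding rather than string substitution, which matches the intent of the `sqlParameter` element.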
OGSA-DAI Core Services • OGSA-DAI Release 2.5 – out now • Java, Tomcat, Globus Toolkit 3 Beta • Supports MySQL, DB2, Xindice; SQL92, XPath, XUpdate • OGSA-DAI Release 3 – end July • Java, Tomcat, Globus Toolkit 3.0 • Supports MySQL, DB2, Oracle, Xindice; SQL92, XPath, XUpdate • Adds Notification, Internationalisation, Transactions, Caching • Continue to track Globus Toolkit 3 releases • Experimental, then production, GT3 grids will help
Asynchronous Delivery • Asynchronous delivery – Pull • Asynchronous delivery – Push [Two diagrams, each with steps numbered 1 to 3 between Client, GDS Instance, GDT, DB and Consumer. Message labels in the first diagram: Q, Rs, DT, D + GDH, GSH/R + data id, Ra. Message labels in the second diagram: Q + D + GSH/R, Rs, DT, GSH/R, Ra]
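The slide gives only diagram labels, so the following is a generic sketch of the pull and push styles of asynchronous delivery, not a description of the OGSA-DAI protocol. It assumes that in the pull case the service keeps the result and returns an identifier that a consumer later redeems, and that in the push case the service sends the result straight to a consumer named in the request. All names (`ResultStore`, `perform_pull`, `perform_push`) are hypothetical.

```python
class ResultStore:
    """Holds results at the service until a consumer pulls them."""

    def __init__(self):
        self._results = {}
        self._next_id = 0

    def put(self, data):
        """Store data and return an identifier for later retrieval."""
        self._next_id += 1
        data_id = f"result-{self._next_id}"
        self._results[data_id] = data
        return data_id

    def get(self, data_id):
        """Hand the stored data to the consumer and forget it."""
        return self._results.pop(data_id)


def perform_pull(query, run_query, store):
    """Pull: run the query, keep the result, return only an identifier."""
    return store.put(run_query(query))


def perform_push(query, run_query, consumer):
    """Push: run the query and send the result to the given consumer."""
    consumer(run_query(query))
    return "delivered"


if __name__ == "__main__":
    def run_query(query):
        # Stand-in for a database call: echo the query in upper case.
        return [query.upper()]

    store = ResultStore()
    data_id = perform_pull("select 1", run_query, store)
    print("consumer pulls:", store.get(data_id))

    inbox = []
    print(perform_push("select 1", run_query, inbox.append), inbox)
```

In both styles the client that submits the request need not be the party that receives the data, which is why the diagrams show a separate Consumer.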
GDS Composition [Diagram: five numbered configurations (1 to 5), each composing Client, Operation, GDS and DB elements]
Distributed Query Service • A higher level service: • Extension of Polar* query processor, partitions and schedules queries • Sits on top of OGSA and OGSA-DAI • Defines new portTypes and services • GridDistributedQuery (GDQ) PortType • GridDistributedQueryService (GDQS) – wraps Polar* • GridQueryEvaluatorService (GQES) – performs subqueries • Currently based on OGSA-DAI Release 1.5
DQS Architecture
DQP in action
DQS: the future • The GridDistributedQueryService • is an example of a higher level data integration service which utilises OGSA-DAI core services • Assumes that GDSF, GDQS Factory and client live in different containers • Really requires a well-defined meta-model for the physical schema of a database • Being partially addressed in DAIS WG • Shows how a GDS can be both client and service • Service hierarchy and composition • DAIT (proposed follow-on to OGSA-DAI) would produce a robust reference implementation of the DQP components
Projects using OGSA-DAI • Industry: • FirstDIG: business process analysis (with First Transport Group) • OGSA-DAI with data mining • Collaborative: • Bridges: database integration over six geographically distributed genomics research sites (with IBM UK) • OGSA-DAI with DiscoveryLink • eDIKT: porting OGSA-DAI to other platforms • OGSA-DAI with performance • DEISA: linking Europe’s HPC centres • OGSA-DAI with distributed accounting • MS .Net Grid: porting OGSA-DAI to the .Net framework (with Microsoft Research UK) • OGSA-DAI with .Net
ODD Genes • OGSA-DAI used to query gene expression data resources at GTI and HGU • One data resource: low spatial resolution, high gene resolution • Other resource: high spatial resolution, low gene resolution • Query one database and use data to find correct data resource to run more detailed query and produce visualisation • Simple example of data integration at work [Diagram labels: Client, Query, Query, Render, GTI, HGU, GDS, GDS, EPCC]
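The two-step pattern described here, where the answer from one resource drives a more detailed query against another, can be sketched with two in-memory SQLite databases. This is a simplification under stated assumptions: the `expression` table, its columns, the gene and region values and the function names are all invented for illustration, and the real demonstrator went through Grid Data Services rather than direct connections.

```python
import sqlite3


def make_resource(rows):
    """Create an in-memory stand-in for one gene expression resource."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE expression (gene TEXT, region TEXT, level REAL)")
    connection.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    return connection


def find_regions(coarse, gene):
    """Step 1: ask the first resource where the gene is expressed at all."""
    rows = coarse.execute(
        "SELECT DISTINCT region FROM expression "
        "WHERE gene = ? AND level > 0 ORDER BY region", (gene,))
    return [row[0] for row in rows]


def detailed_levels(fine, gene, regions):
    """Step 2: use those regions to run a narrower query elsewhere."""
    if not regions:
        return []
    placeholders = ",".join("?" for _ in regions)
    sql = ("SELECT region, level FROM expression "
           f"WHERE gene = ? AND region IN ({placeholders}) ORDER BY region")
    return fine.execute(sql, (gene, *regions)).fetchall()


if __name__ == "__main__":
    coarse = make_resource([("g1", "brain", 0.9), ("g1", "heart", 0.0),
                            ("g2", "heart", 0.4)])
    fine = make_resource([("g1", "brain", 0.93), ("g1", "heart", 0.01)])
    regions = find_regions(coarse, "g1")
    print(regions, detailed_levels(fine, "g1", regions))
```

The integration logic lives in the client here; the slide presents this as a simple case, with richer integration left to higher level services.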
Project Timeline [Timeline figure with a "today" marker. Axis: Feb ’02, May ’02, Jul ’02, Sep ’02, Dec ’02, Feb ’03, May ’03, Sep ’03. Phases: Phase 1 Starts, Phase 2 Starts. Globus releases shown: TP4, TP5, GT3 A1, GT3 A2, GT3 A3, GT3 A4, GT3 Beta, GT3 Final. Milestone labels, in slide order: WS + GSI UK support ( > 100 downloads) • XML + OGSA Prototypes for Early Adopters • Design Documents & Demos for DAIS WG @ GGF5 • XML + OGSA Prototype Available • RDB + GT2 / OGSA Prototypes Available • GGF6 WG Papers & Prototypes • Early Adopters Workshop @ NeSC • Ship Release 1 (Jan 15th 2003) • OGSADAI Tutorial @ NeSC • Release 1.5 (Feb 28th 2003) • Tutorial @ GGF7 • Release 2 • Tutorial @ NeSC • Release 2.5 • Release 3]
A DAIT for the Future • DAIT (Data Access and Integration Two) • follow-on project from OGSA-DAI, funded for two years • continue to research, prototype and productise • release every six months, R4 in December 2003 • R4: • support for SQL Server and structured filesystems • extended DBMS management functionality (e.g. archive) • bulk load operations (where supported) • support for DFDL file access • triggers exposed through notification • R5: • Distributed Query Processing, Distributed Transactions • Virtualised views across databases
Further information • The OGSA-DAI Project Site: • http://www.ogsadai.org.uk • The DAIS-WG site: • http://cs.man.ac.uk/grid-db • OGSA-DAI Users Mailing list • users@ogsadai.org.uk • General discussion on grid data access and integration • Formal support for OGSA-DAI releases • http://www.ogsadai.org.uk/support + support@ogsadai.org.uk • OGSA-DAI training courses • http://www.ogsadai.org.uk/courses/