
Data access and integration with OGSA-DAI: OGSA-DQP




  1. Data access and integration with OGSA-DAI: OGSA-DQP
     Steven Lynden, University of Manchester

  2. Introduction
     • OGSA-DQP is a service-based distributed query processor
     • It evaluates queries over distributed data sources wrapped by OGSA-DAI
     • It is built using OGSA-DAI extensibility points
     • People involved:
       • University of Manchester: Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
       • University of Newcastle: Jim Smith, Arijit Mukherjee, Paul Watson
       • OGSA-DAI
     • Prototype release 3.0 available from the OGSA-DAI website
     • Install on OGSA-DAI WSRF/WS-I 2.1
     Data access & integration with OGSA-DAI: GGF 17

  3. OGSA-DQP high-level overview
     • OGSA-DQP uses a middleware approach.
     • It can be seen as a mediator over OGSA-DAI wrappers.
     • Usability: use it as an OGSA-DAI data service.
     • DQP can plan, schedule and execute distributed queries in parallel.
     • Calls to analysis (Web) services can be declared within queries and invoked by DQP.
     [Diagram: Query and Results at OGSA-DQP, which sits over two OGSA-DAI wrappers, each in front of a DBMS and its data]

  4. Using OGSA-DQP
     • All interactions are client-server based
     • Firstly, configure OGSA-DQP by specifying the data sources and analysis services to be used (administration)
     • DQP creates a global schema which can then be used to formulate queries
     • The user may then submit queries
     • Infrastructural requirements:
       • OGSA-DAI-wrapped relational databases
       • Analysis services (optional)
       • Evaluation infrastructure

  5. OGSA-DQP architecture
     [Diagram: an OGSA-DAI data service with the DQP activities installed receives perform documents and is linked to three Evaluators, each through a connection labelled QE]
     • OGSA-DAI data service with DQP activities installed: the “OGSA-DQP service”, Grid Distributed Query Service (GDQS), AKA “Coordinator”
     • Evaluator: AKA Grid Query Evaluation Service (GQES)

  6. OGSA-DQP architecture
     • DQP evaluator services:
       • Are plain Web services
       • Implement the QueryEvaluation port type:
         • evaluate: the input is a query plan partition, which is subsequently executed
         • receiveData: allows the evaluator to receive data from other evaluators
     • OGSA-DAI extensions:
       • DQP resource: a resource which encapsulates a distributed query infrastructure (DQP evaluator services, OGSA-DAI data services etc.). Implemented as a data resource accessor.
       • OQL query statement activity: enables the submission of a query in Object Query Language (OQL)
       • DQP factory activity: enables the creation and configuration of DQP resources

  7. Example query
     • Given two DBMSs and one analysis tool (i.e., a Web service):
       • goTerm: a Gene Ontology (GO) table in a remote MySQL DB, exposed by an OGSA-DAI data service
       • protein: a table in a protein sequence DB, exposed by an OGSA-DAI data service
       • Blast: a sequence alignment scoring Web service
     • We want to obtain alignment scores for a sequence against proteins of a certain kind
     • The user submits a single query referencing data stored at multiple sites.
     • The author of the query need not be aware of how or where data is stored.
     • Queries are written in Object Query Language (OQL):
         select p.proteinId, Blast(p.sequence)
         from protein p, goTerm t
         where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
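
To make the meaning of the example query concrete, here is a minimal single-machine sketch of what it computes: a selection on goTerm, a join on proteinId, and one call to the scoring function per joined row. It says nothing about how OGSA-DQP actually plans or distributes the work. The class names, the in-memory lists, the sample rows and the stand-in scoring function are all assumptions made for illustration; the real Blast service is a remote Web service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Single-machine illustration of what the example OQL query computes (hypothetical names). */
class ExampleQuerySketch {

    /** One row of the protein table, reduced to the columns the query uses. */
    static final class Protein {
        final String proteinId;
        final String sequence;
        Protein(String proteinId, String sequence) {
            this.proteinId = proteinId;
            this.sequence = sequence;
        }
    }

    /** One row of the goTerm table, reduced to the columns the query uses. */
    static final class GoTerm {
        final String termId;
        final String proteinId;
        GoTerm(String termId, String proteinId) {
            this.termId = termId;
            this.proteinId = proteinId;
        }
    }

    /** One result row: select p.proteinId, Blast(p.sequence). */
    static final class Scored {
        final String proteinId;
        final double score;
        Scored(String proteinId, double score) {
            this.proteinId = proteinId;
            this.score = score;
        }
    }

    /** Nested-loop evaluation of the selection, the join and the function call. */
    static List<Scored> run(List<Protein> proteins, List<GoTerm> goTerms,
                            String termId, ToDoubleFunction<String> blast) {
        List<Scored> out = new ArrayList<>();
        for (GoTerm t : goTerms) {
            if (!t.termId.equals(termId)) {
                continue; // where t.termId = <termId>
            }
            for (Protein p : proteins) {
                if (p.proteinId.equals(t.proteinId)) { // and p.proteinId = t.proteinId
                    out.add(new Scored(p.proteinId, blast.applyAsDouble(p.sequence)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows; not taken from the databases in the slides.
        List<Protein> proteins = Arrays.asList(
                new Protein("P1", "MKV"), new Protein("P2", "MKVLA"));
        List<GoTerm> goTerms = Arrays.asList(
                new GoTerm("GO:0005942", "P2"), new GoTerm("GO:0000001", "P1"));
        // Stand-in for the Blast Web service: scores a sequence by its length.
        ToDoubleFunction<String> fakeBlast = sequence -> sequence.length();
        for (Scored s : run(proteins, goTerms, "GO:0005942", fakeBlast)) {
            System.out.println(s.proteinId + " " + s.score);
        }
    }
}
```

In the distributed setting described in the slides, the two tables live behind different OGSA-DAI data services and the scoring call goes to a Web service, but the query author still writes only the single OQL statement.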

  8. Background: OQL
     • Why?
       • OGSA-DQP is based on a parallel distributed query processor for object databases (Polar*)
       • The standard query language of object databases is OQL
       • Polar* is still used by DQP to parse, optimise and schedule queries
     • Instead of querying object databases, we are now querying relational databases
     • OQL queries are compiled by Polar* into distributed query plans.
     • During the execution of the query plan, DQP will query relational data sources using SQL.

  9. Client interaction with OGSA-DQP
     • Two main client/server interactions:
       (1) Configuration: the client sends a perform document requesting the service to create a DQP data service resource
       (2) Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)
     • The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. It differs from typical OGSA-DAI data service resources, e.g. a relational data service resource.

  10. DQP configuration
      <perform>
        <DQPFactory>
          Evaluator URLs
          OGSA-DAI data service resources
          Web service URLs
        </DQPFactory>
      </perform>
      [Diagram: the perform document goes to the OGSA-DAI data service running the DQP factory activity, which issues GetRP calls to two other OGSA-DAI data services and creates a DQP DSR. Result: resource ID of created DSR]
      The DQP DSR holds:
      • Global schema of imported DBs & analysis services
      • Set of evaluators that can be used
      • Physical DB metadata (used to optimise queries)

  11. DQP query evaluation
      <perform>
        <OQLQueryStatement>
          <expression> OQL query </expression>
        </OQLQueryStatement>
      </perform>
      [Diagram: the perform document reaches the OQLQueryStatement activity on the DQP DSR, which is linked to three Evaluators through QE; the Evaluators send perform documents to OGSA-DAI data services, one Evaluator is linked to an Analysis service, and data is transported between Evaluators]
      Result: WebRowSet XML Stream
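
The perform document skeleton on this slide can be produced from a query string in a few lines. The sketch below is an illustration only: it reproduces just the three elements shown on the slide, and a perform document accepted by a real OGSA-DAI service is likely to need more (for example namespaces and activity attributes), which are not shown in the deck and are omitted here. The class and method names are hypothetical.

```java
/** Builds the simplified perform document shown on the slide (hypothetical helper). */
class PerformDocumentSketch {

    /** Escapes the characters that must not appear raw in XML element text. */
    static String escapeXml(String text) {
        // '&' is replaced first so that the entities added afterwards are not escaped again.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Wraps an OQL query in the perform / OQLQueryStatement / expression skeleton. */
    static String oqlPerformDocument(String oqlQuery) {
        return "<perform>\n"
             + "  <OQLQueryStatement>\n"
             + "    <expression>" + escapeXml(oqlQuery) + "</expression>\n"
             + "  </OQLQueryStatement>\n"
             + "</perform>\n";
    }

    public static void main(String[] args) {
        System.out.print(oqlPerformDocument("select i.id from i in go_goterms;"));
    }
}
```

Escaping matters because comparison operators such as `<` in a query would otherwise break the XML.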

  12. Interacting with an OGSA-DQP service
      • Three options:
        • A command line client
          • Allows configuration and query submission via the execution of Apache Ant scripts
        • Client toolkit classes
          • Allow you to integrate OGSA-DQP into your own applications
        • GUI client
      • The command line client and the client toolkit classes are part of the main OGSA-DQP download

  13. Command-line client
      Configuration example:
        $ ant factory -Ddqp.config.file=config.xml -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 -Dresource.id=dqp-factory
      Example of querying the global schema:
        $ ant getschemas -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 -Dresource.id=ogsadai-911acvd122

  14. Command-line client
      Query submission example:
        $ ant query -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 -Dresource.id=ogsadai-911acvd122 -Dclient.query="%print select i.id from i in go_goterms;" -Dclient.output.file=results.xml
      • Results will be saved as a WebRowSet, the standard XML representation of relational results used by OGSA-DAI
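
When the command-line client is driven from another program, shell quoting of the query string is easy to get wrong. A minimal sketch, assuming the Ant target and property names shown on the slide: it assembles the `ant query` invocation as an argument list, so the query needs no quoting at all. The class and method names are hypothetical, and the query text (including the `%print` prefix) is passed through exactly as the caller supplies it.

```java
import java.util.ArrayList;
import java.util.List;

/** Assembles the "ant query" invocation from the slide as an argument list (hypothetical helper). */
class AntQueryCommandSketch {

    /** Returns the command as separate arguments; each -D property is a single argument. */
    static List<String> queryCommand(String url, String resourceId,
                                     String clientQuery, String outputFile) {
        List<String> command = new ArrayList<>();
        command.add("ant");
        command.add("query");
        command.add("-Durl=" + url);
        command.add("-Dresource.id=" + resourceId);
        command.add("-Dclient.query=" + clientQuery);
        command.add("-Dclient.output.file=" + outputFile);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = queryCommand(
                "http://rpc122.cs.man.ac.uk/axis/services/service1",
                "ogsadai-911acvd122",
                "%print select i.id from i in go_goterms;",
                "results.xml");
        // Print one argument per line; nothing is executed here.
        for (String argument : command) {
            System.out.println(argument);
        }
    }
}
```

The resulting list can be handed to `java.lang.ProcessBuilder`, which takes arguments in this form and starts the process without going through a shell.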

  15. Client toolkit classes
      • Client toolkit classes are provided for the activities contributed by OGSA-DQP:
        • GDQSFactory class used to construct DQPFactory activities
        • OQLQuery class used to construct OQLQueryStatement activities
      • The client toolkit allows the integration of DQP with other applications and seamless interaction with the OGSA-DAI client toolkit
      • OGSA-DQP client toolkit is Java only…

  16. Query execution using client toolkit
        // Locate the data service exposing the DQP data service resource
        GenericServiceFetcher fetcher = GenericServiceFetcher.getInstance();
        DataService service = fetcher.getDataService(url, resourceID);
        // Build the OQL query activity and connect its output to an output stream activity
        OQLQuery oqlQuery = new OQLQuery(query);
        OutputStreamActivity outputStream = new OutputStreamActivity();
        outputStream.setInput(oqlQuery.getOutput());
        // Add the query activity to a request and send it to the service
        ActivityRequest request = new ActivityRequest();
        request.add(oqlQuery);
        service.perform(request);
        // Read the results as a standard JDBC result set
        oqlQuery.getResultSet();
        java.sql.ResultSet rs = outputStream.getResultSet();

  17. Demo: The GUI Client
      • The GUI allows you to:
        • Interact with OGSA-DQP services. The GUI is pre-configured with the URL of an OGSA-DQP service we have deployed at EPCC.
        • View the configuration parameters of DQP data service resources
        • View the global schema maintained by a DQP data service resource
        • Submit OQL queries to DQP data service resources
        • View the results of queries
        • View graphical and XML representations of query plans

  18. Services @ Newcastle University
      • giga01.ncl.ac.uk: Evaluator service, OGSA-DAI data service, GO Term DB
      • giga02.ncl.ac.uk: Evaluator service, OGSA-DAI data service, Protein interaction DB
      • giga03.ncl.ac.uk: Evaluator service, OGSA-DAI data service, Protein Term DB
      • giga04.ncl.ac.uk: Evaluator service, OGSA-DAI data service, Protein property DB

  19. Services @ Newcastle University
      [Diagram: hosts giga05.ncl.ac.uk, giga06.ncl.ac.uk, giga07.ncl.ac.uk, giga08.ncl.ac.uk and giga09.ncl.ac.uk; the placement of individual services on hosts is not recoverable from the transcript]
      • Five Evaluator services
      • One OGSA-DAI data service, Protein Sequence DB
      • Entropy analyser service

  20. Database tables
      • GO Terms, extent name: “goterms_goterms”
      • Protein interactions, extent name: “interaction_protein_interactions”

  21. Database tables
      • Protein terms, extent name: “protein_term_protein_goterm”
      • Protein properties, extent name: “protein_property_protein_propertys”
      • Protein sequence, extent name: “protein_sequence_protein_sequences”

  22. DQP service @ EPCC
      • Host: test.ogsadai.org.uk
      • OGSA-DAI data service with the DQP factory
      • GIGA resource ogsadai-1092f60c1e1: encapsulates the distributed query environment deployed at Newcastle

  23. Conclusion
      • OGSA-DQP is a service-based distributed query processor that is:
        • Exposed as a service
        • Implemented as an orchestration of services
      • It provides an example of how the OGSA-DAI extensibility points can be used:
        • The activity extensibility points are used
        • New data resource accessors are implemented
        • Dynamic resource deployment is used during configuration to create new resources
      • Benefits:
        • OGSA-DAI manages activity concurrency, so we didn’t need to write concurrent code
        • OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI
        • OGSA-DQP is insulated from multiple platforms (WS-I, WSRF) by OGSA-DAI
