Data access and integration with OGSA-DAI: OGSA-DQP
Steven Lynden, University of Manchester

Introduction
• OGSA-DQP is a service-based distributed query processor
• It evaluates queries over distributed data sources wrapped by OGSA-DAI
• It is built using OGSA-DAI extensibility points
• People involved:
  • University of Manchester: Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
  • University of Newcastle: Jim Smith, Arijit Mukherjee, Paul Watson
  • OGSA-DAI
• Prototype release 3.0 available from the OGSA-DAI website
• Install on OGSA-DAI WSRF/WS-I 2.1
Data access & integration with OGSA-DAI: GGF 17

OGSA-DQP high-level overview
• OGSA-DQP uses a middleware approach.
• It can be seen as a mediator over OGSA-DAI wrappers.
• Usability: it is used like any other OGSA-DAI data service.
• DQP plans, schedules and executes distributed queries in parallel.
• Calls to analysis (Web) services can be declared within queries and are invoked by DQP.
[Diagram: a query is sent to OGSA-DQP and results are returned; OGSA-DQP sits above two OGSA-DAI services, each wrapping a DBMS and its data.]

Using OGSA-DQP
• All interactions are client-server based
• First, configure OGSA-DQP by specifying the data sources and analysis services to be used (administration)
• DQP creates a global schema, which can then be used to formulate queries
• The user may then submit queries
• Infrastructural requirements:
  • OGSA-DAI-wrapped relational databases
  • Analysis services (optional)
  • Evaluation infrastructure

OGSA-DQP architecture
[Diagram: an OGSA-DAI data service with the DQP activities installed, labelled "perform", linked to three evaluators (QE).]
• The OGSA-DAI data service with the DQP activities installed is the “OGSA-DQP service”, also known as the Grid Distributed Query Service (GDQS) or “Coordinator”
• Each evaluator is also known as a Grid Query Evaluation Service (GQES)

OGSA-DQP architecture
• DQP evaluator services:
  • Are plain Web services
  • Implement the QueryEvaluation port type:
    • evaluate: the input is a query plan partition, which is subsequently executed
    • receiveData: allows the evaluator to receive data from other evaluators
• OGSA-DAI extensions:
  • DQP resource: a resource which encapsulates a distributed query infrastructure (DQP evaluator services, OGSA-DAI data services etc.). Implemented as a data resource accessor.
  • OQL query statement activity: enables the submission of a query in Object Query Language (OQL)
  • DQP factory activity: enables the creation and configuration of DQP resources

Example query
• Given two DBMSs and one analysis tool (i.e., a Web service):
  • goTerm: a Gene Ontology (GO) table in a remote MySQL DB, exposed by an OGSA-DAI data service
  • protein: a table in a protein sequence DB, exposed by an OGSA-DAI data service
  • Blast: a sequence alignment scoring Web service
• We want to obtain alignment scores for a sequence against proteins of a certain kind
• The user submits a single query referencing data stored at multiple sites.
• The author of the query need not be aware of how or where the data is stored.
• Queries are written in Object Query Language (OQL):

    select p.proteinId, Blast(p.sequence)
    from protein p, goTerm t
    where t.termId = 'GO:0005942' and p.proteinId = t.proteinId

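To make the meaning of the example query concrete, the sketch below computes the same result over small in-memory lists: it keeps the goTerm rows with the requested term ID, joins them to protein rows on proteinId, and applies a scoring function to each matching sequence. It is an illustration of the query's semantics only, not of how OGSA-DQP plans or executes it. The class name ExampleQuerySketch, the sample rows and the length-based stand-in for Blast are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustration only: emulates what the example OQL query computes, using
// in-memory rows. It does not show how OGSA-DQP plans or executes queries.
class ExampleQuerySketch {

    // One row of the protein table (only the columns the query uses).
    static final class Protein {
        final String proteinId;
        final String sequence;
        Protein(String proteinId, String sequence) {
            this.proteinId = proteinId;
            this.sequence = sequence;
        }
    }

    // One row of the goTerm table (only the columns the query uses).
    static final class GoTerm {
        final String termId;
        final String proteinId;
        GoTerm(String termId, String proteinId) {
            this.termId = termId;
            this.proteinId = proteinId;
        }
    }

    // One row of the query result: p.proteinId, Blast(p.sequence).
    static final class Result {
        final String proteinId;
        final double score;
        Result(String proteinId, double score) {
            this.proteinId = proteinId;
            this.score = score;
        }
    }

    // Equivalent of:
    //   select p.proteinId, Blast(p.sequence) from protein p, goTerm t
    //   where t.termId = <termId> and p.proteinId = t.proteinId
    static List<Result> run(List<Protein> proteins, List<GoTerm> goTerms,
                            String termId, ToDoubleFunction<String> blast) {
        List<Result> results = new ArrayList<>();
        for (GoTerm t : goTerms) {
            if (!t.termId.equals(termId)) {
                continue; // selection on t.termId
            }
            for (Protein p : proteins) {
                if (p.proteinId.equals(t.proteinId)) { // join on proteinId
                    results.add(new Result(p.proteinId,
                                           blast.applyAsDouble(p.sequence)));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not data from the databases in the slides.
        List<Protein> proteins = List.of(
            new Protein("P1", "MKVLA"),
            new Protein("P2", "MK"));
        List<GoTerm> goTerms = List.of(
            new GoTerm("GO:0005942", "P1"),
            new GoTerm("GO:0000001", "P2"));
        // Hypothetical stand-in for the Blast Web service: sequence length.
        for (Result r : run(proteins, goTerms, "GO:0005942", String::length)) {
            System.out.println(r.proteinId + " " + r.score);
        }
    }
}
```

In the real system the two tables live at different sites and Blast is a remote Web service, so the selection, join and service call would be distributed across evaluators rather than run in one loop.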
Background: OQL
• Why?
  • OGSA-DQP is based on a parallel distributed query processor for object databases (Polar*)
  • The standard query language of object databases is OQL
  • Polar* is still used by DQP to parse, optimise and schedule queries
• Instead of querying object databases, we are now querying relational databases
• OQL queries are compiled by Polar* into distributed query plans.
• During the execution of the query plan, DQP queries the relational data sources using SQL.

Client interaction with OGSA-DQP
• Two main client/server interactions:
  1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource
  2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)
• The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries.
• It differs from typical OGSA-DAI data service resources, e.g. a relational data service resource.

DQP configuration
Perform document sent by the client (schematic):

    <perform>
      <DQPFactory>
        Evaluator URLs
        OGSA-DAI data service resources
        Web service URLs
      </DQPFactory>
    </perform>

[Diagram: the perform document reaches the DQP factory activity on an OGSA-DAI data service; GetRP calls go to the other OGSA-DAI data services; the factory creates a DQP DSR. Result: resource ID of the created DSR.]
The created DQP DSR holds:
• Global schema of imported DBs & analysis services
• Set of evaluators that can be used
• Physical DB metadata (used to optimise queries)

DQP query evaluation
Perform document sent by the client (schematic):

    <perform>
      <OQLQueryStatement>
        <expression> OQL query </expression>
      </OQLQueryStatement>
    </perform>

[Diagram: the OQLQueryStatement activity runs against the DQP DSR; evaluators (QE) send perform requests to OGSA-DAI data services, with links labelled "transport" and an analysis service also shown. Result: WebRowSet XML stream.]

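As a rough sketch of the query-submission step, the Java below assembles a perform document with the simplified shape shown on the slide, escaping characters in the OQL text that would otherwise break the XML. The class PerformDocumentSketch and its methods are hypothetical helpers written for this example. A real OGSA-DAI perform document is likely to carry namespaces, activity names and output declarations that the slide omits, so this is not a document the service is guaranteed to accept.

```java
// Illustration only: builds the simplified perform document shown on the
// slide. The class and method names are hypothetical, and the element
// layout is taken from the slide, not from the OGSA-DAI schema.
class PerformDocumentSketch {

    // Escape the characters that are not allowed in XML text content.
    // '&' must be handled first so later entities are not re-escaped.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Wrap an OQL query in <perform><OQLQueryStatement><expression>.
    static String oqlPerformDocument(String oqlQuery) {
        return "<perform>"
             + "<OQLQueryStatement>"
             + "<expression>" + escapeXml(oqlQuery) + "</expression>"
             + "</OQLQueryStatement>"
             + "</perform>";
    }

    public static void main(String[] args) {
        System.out.println(
            oqlPerformDocument("select i.id from i in go_goterms;"));
    }
}
```

The OQLQuery client toolkit class described later in the deck builds this activity for you; hand-assembling the document is only useful for seeing what travels over the wire.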
Interacting with an OGSA-DQP service
• Three options:
  • A command-line client
    • Allows configuration and query submission via the execution of Apache Ant scripts
  • Client toolkit classes
    • Allow you to integrate OGSA-DQP into your own applications
  • GUI client
• The command-line client and the client toolkit classes are part of the main OGSA-DQP download

Command-line client
Configuration example:

    $ ant factory -Ddqp.config.file=config.xml \
        -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 \
        -Dresource.id=dqp-factory

Querying the global schema, example:

    $ ant getschemas \
        -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 \
        -Dresource.id=ogsadai-911acvd122

Command-line client
Query submission example:

    $ ant query \
        -Durl=http://rpc122.cs.man.ac.uk/axis/services/service1 \
        -Dresource.id=ogsadai-911acvd122 \
        -Dclient.query="%print select i.id from i in go_goterms;" \
        -Dclient.output.file=results.xml

• Results are saved as a WebRowSet, the standard XML representation of relational results used by OGSA-DAI

Client toolkit classes
• Client toolkit classes are provided for the activities contributed by OGSA-DQP:
  • GDQSFactory class, used to construct DQPFactory activities
  • OQLQuery class, used to construct OQLQueryStatement activities
• The client toolkit allows the integration of DQP with other applications and seamless interaction with the OGSA-DAI client toolkit
• The OGSA-DQP client toolkit is Java only…

Query execution using client toolkit

    // Obtain a handle on the data service exposing the DQP data service resource
    GenericServiceFetcher fetcher = GenericServiceFetcher.getInstance();
    DataService service = fetcher.getDataService(url, resourceID);

    // Build the OQL query activity and connect its output to an output stream activity
    OQLQuery oqlQuery = new OQLQuery(query);
    OutputStreamActivity outputStream = new OutputStreamActivity();
    outputStream.setInput( oqlQuery.getOutput() );

    // Send the request to the service
    ActivityRequest request = new ActivityRequest();
    request.add( oqlQuery );
    service.perform(request);

    // Read the results
    oqlQuery.getResultSet();
    java.sql.ResultSet rs = outputStream.getResultSet();

Demo: The GUI Client
• The GUI allows you to:
  • Interact with OGSA-DQP services. The GUI is pre-configured with the URL of an OGSA-DQP service we have deployed at EPCC.
  • View the configuration parameters of DQP data service resources
  • View the global schema maintained by a DQP data service resource
  • Submit OQL queries to DQP data service resources
  • View the results of queries
  • View graphical and XML representations of query plans

Services @ Newcastle University
[Diagram: four hosts, each running an evaluator service and an OGSA-DAI data service over one database. Databases in the order shown:]
• giga01.ncl.ac.uk: GO Term DB
• giga02.ncl.ac.uk: Protein interaction DB
• giga03.ncl.ac.uk: Protein Term DB
• giga04.ncl.ac.uk: Protein property DB

Services @ Newcastle University
[Diagram: hosts giga05.ncl.ac.uk, giga06.ncl.ac.uk, giga07.ncl.ac.uk, giga08.ncl.ac.uk and giga09.ncl.ac.uk. Components shown: five evaluator services, one OGSA-DAI data service over the Protein Sequence DB, and the Entropy analyser service.]

Database tables
• GO Terms, extent name: “goterms_goterms”
• Protein interactions, extent name: “interaction_protein_interactions”

Database tables
• Protein terms, extent name: “protein_term_protein_goterm”
• Protein properties, extent name: “protein_property_protein_propertys”
• Protein sequence, extent name: “protein_sequence_protein_sequences”

DQP service @ EPCC
[Diagram: host test.ogsadai.org.uk, with an OGSA-DAI data service, the DQP factory and the GIGA resource ogsadai-1092f60c1e1.]
• The GIGA resource encapsulates the distributed query environment deployed at Newcastle

Conclusion
• OGSA-DQP is a service-based distributed query processor that is:
  • Exposed as a service
  • Implemented as an orchestration of services
• It provides an example of how the OGSA-DAI extensibility points can be used…
  • The activity extensibility points are used
  • New data resource accessors are implemented
  • Dynamic resource deployment is used during configuration to create new resources
• Benefits:
  • OGSA-DAI manages activity concurrency, so we didn’t need to write concurrent code
  • OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI
  • OGSA-DQP is insulated from multiple platforms (WS-I, WSRF) by OGSA-DAI