OGSA-DQP Steven Lynden University of Manchester
Introduction
• OGSA-DQP is a service-based distributed query processor.
• It evaluates queries over distributed data sources wrapped by OGSA-DAI.
• It is built using OGSA-DAI extensibility points.
• People involved:
  • University of Manchester: Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
  • University of Newcastle: Jim Smith, Arijit Mukherjee, Paul Watson
  • OGSA-DAI
• Prototype release 3.0 is available from the OGSA-DAI website.
Data access & integration with OGSA-DAI: GGF 17
OGSA-DQP high-level overview
• OGSA-DQP uses a middleware approach.
• It can be seen as a mediator over OGSA-DAI wrappers.
• Usability: it is used as an OGSA-DAI data service.
• DQP plans, schedules and executes distributed queries in parallel.
• Calls to analysis (Web) services can be declared within queries and invoked by DQP.
[Diagram: a query goes in to OGSA-DQP and results come back; OGSA-DQP sits over two OGSA-DAI wrappers, each over a DBMS and its data.]
OGSA-DQP architecture
[Diagram: an OGSA-DAI data service with the DQP activities installed receives perform requests and is connected to three Evaluator (QE) services.]
• The OGSA-DAI data service with DQP activities installed is the “OGSA-DQP service”, the Grid Distributed Query Service (GDQS), AKA the “Coordinator”.
• An Evaluator (QE) is AKA a Grid Query Evaluation Service (GQES).
OGSA-DQP architecture
• DQP evaluator services:
  • Are plain Web services
  • Implement the QueryEvaluation port type:
    • evaluate: the input is a query plan partition, which is subsequently executed
    • receiveData: allows the evaluator to receive data from other evaluators
• OGSA-DAI extensions:
  • DQP resource: a resource which encapsulates a distributed query infrastructure (DQP evaluator services, OGSA-DAI data services, etc.). Implemented as a data resource accessor.
  • OQL query statement activity: enables the submission of a query in Object Query Language (OQL)
  • DQP factory activity: enables the creation and configuration of DQP resources
Example query
• Given two DBMSs and one analysis tool (i.e., a Web service):
  • goTerm: a table in a Gene Ontology (GO) database running as a remote MySQL DB, exposed by an OGSA-DAI data service
  • protein: a table in a protein sequence DB, exposed by an OGSA-DAI data service
  • Blast: a sequence alignment scoring Web service
• We want to obtain alignment scores for a sequence against proteins of a certain kind.
• The user submits a single query referencing data stored at multiple sites.
• The author of the query need not be aware of how or where the data is stored.
• Queries are written in Object Query Language (OQL):
    select p.proteinId, Blast(p.sequence)
    from protein p, goTerm t
    where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
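To make the meaning of the example query concrete, the following is a minimal Python sketch that evaluates the same select/from/where over small in-memory tables. The sample rows and the `blast` function are hypothetical stand-ins (the real Blast is a remote Web service and the real tables sit behind OGSA-DAI data services); nothing here reflects how OGSA-DQP itself executes the query.

```python
# Hypothetical sample rows standing in for the two remote tables.
protein = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIMCF"},
    {"proteinId": "P3", "sequence": "MKV"},
]
go_term = [
    {"termId": "GO:0005942", "proteinId": "P1"},
    {"termId": "GO:0000001", "proteinId": "P2"},
    {"termId": "GO:0005942", "proteinId": "P3"},
]


def blast(sequence):
    """Placeholder for the Blast Web service: returns the sequence length as a dummy score."""
    return len(sequence)


def run_query(protein_rows, go_rows, term_id):
    """Nested-loop evaluation of the OQL query's from/where/select clauses."""
    return [
        (p["proteinId"], blast(p["sequence"]))      # select p.proteinId, Blast(p.sequence)
        for p in protein_rows                        # from protein p,
        for t in go_rows                             #      goTerm t
        if t["termId"] == term_id                    # where t.termId = term_id
        and p["proteinId"] == t["proteinId"]         #   and p.proteinId = t.proteinId
    ]


results = run_query(protein, go_term, "GO:0005942")
print(results)  # [('P1', 8), ('P3', 3)]
```

The nested loop is the simplest correct reading of the query; a distributed processor would normally choose a cheaper join strategy and run the scans at the sites that hold the data.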
Client interaction with OGSA-DQP
• Two kinds of client/server interactions:
  • (1) Configuration: the client sends a perform document requesting the service to create a DQP data service resource
  • (2) Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)
• The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries.
• This differs from typical OGSA-DAI data service resources, e.g. a relational data service resource.
DQP configuration
    <perform>
      <DQPFactory>
        Evaluator URLs
        OGSA-DAI data service resources
        Web service URLs
      </DQPFactory>
    </perform>
[Diagram: the perform document is handled by the DQP factory activity, which issues GetRP requests to the OGSA-DAI data services and creates the DQP DSR. Result: resource ID of the created DSR.]
The DQP DSR holds:
• Global schema of imported DBs & analysis services
• Set of evaluators that can be used
• Physical DB metadata (used to optimise queries)
DQP query evaluation
    <perform>
      <OQLQueryStatement>
        <expression> OQL query </expression>
      </OQLQueryStatement>
    </perform>
[Diagram: the OQLQueryStatement activity, using the DQP DSR, drives several Evaluators (QE); the evaluators send perform requests to OGSA-DAI data services, call the analysis service, and pass data to one another (transport). Result: WebRowSet XML stream.]
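As an illustration of the query submission step, here is a small Python sketch that assembles a document with the element nesting shown on the slide (perform, OQLQueryStatement, expression). It is an assumption that nothing else is required: a real OGSA-DAI perform document is likely to need namespaces, attributes, and delivery activities that the slide does not show, so treat the helper name `build_oql_perform` and its output as a skeleton only.

```python
import xml.etree.ElementTree as ET


def build_oql_perform(oql):
    """Build the slide's perform/OQLQueryStatement/expression skeleton around an OQL string."""
    perform = ET.Element("perform")
    statement = ET.SubElement(perform, "OQLQueryStatement")
    expression = ET.SubElement(statement, "expression")
    expression.text = oql  # ElementTree escapes &, < and > in text on serialisation
    return ET.tostring(perform, encoding="unicode")


query = (
    "select p.proteinId, Blast(p.sequence) from protein p, goTerm t "
    "where t.termId = 'GO:0005942' and p.proteinId = t.proteinId"
)
document = build_oql_perform(query)
print(document)
```

Building the document with an XML library rather than string concatenation keeps it well-formed even if a query contains characters such as `<` or `&`.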
OQL Query Statement activity detail
[Diagram: query → parser → single-node optimiser (logical optimiser, physical optimiser) → multi-node optimiser (partitioner, scheduler) → evaluators → query results]
Logical optimisation
• Consider the query:
    select p.proteinId, Blast(p.sequence)
    from protein p, goTerm t
    where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
• The plan is expressed in a logical algebra.
• Multiple equivalent plans are generated.
[Diagram: logical plan, root first]
    reduce
      op_call (Blast)
        join (proteinId)
          reduce
            scan (protein)
          reduce
            scan termId=GO:0005942 (goTerm)
Physical optimisation
• The plan is expressed in a physical algebra.
• The plan is chosen by cost-ranking the equivalent plans.
[Diagram: physical plan, root first]
    reduce
      op_call (Blast)
        hash_join (proteinId)
          reduce
            table_scan (protein)
          reduce
            table_scan termId=GO:0005942 (goTerm)
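The physical plan names a hash_join on proteinId. As a reminder of what that operator does, here is a generic in-memory hash join in Python. It is a textbook sketch, not OGSA-DQP's implementation; the function name, the choice of build and probe sides, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Generic equi-join: hash the build input on `key`, then stream the probe input past it."""
    table = defaultdict(list)
    for row in build_rows:                       # build phase
        table[row[key]].append(row)
    for row in probe_rows:                       # probe phase
        for match in table.get(row[key], []):
            yield {**match, **row}               # merged tuple


# Hypothetical inputs: goTerm rows already filtered on termId, and protein rows.
go_term_filtered = [
    {"termId": "GO:0005942", "proteinId": "P1"},
    {"termId": "GO:0005942", "proteinId": "P3"},
]
protein = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIMCF"},
    {"proteinId": "P3", "sequence": "MKV"},
]

joined = list(hash_join(go_term_filtered, protein, "proteinId"))
print([row["proteinId"] for row in joined])  # ['P1', 'P3']
```

Building the hash table on the smaller (filtered) input keeps memory use down, which is the kind of decision a cost-based physical optimiser would weigh.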
Query partitioning
• The plan is transformed into a parallel algebra (physical operators + data exchange).
• Exchange operators are placed where data exchange must take place.
[Diagram: partitioned plan, root first]
    reduce
      op_call (Blast)
        exchange
          hash_join (proteinId)
            exchange
              reduce
                table_scan (protein)
            exchange
              reduce
                table_scan termId=GO:0005942 (goTerm)
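The placement of exchange operators can be sketched as a tree rewrite: wherever a parent operator and its child belong to different partitions, an exchange is inserted between them. The Python below assumes each operator already carries a hypothetical `partition` label (the slides do not say how OGSA-DQP derives partition boundaries), and the `Op` class and label values are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    partition: str                      # hypothetical label for the partition the operator belongs to
    children: list = field(default_factory=list)


def place_exchanges(op):
    """Return a copy of the plan with an exchange on every edge that crosses partitions."""
    new_children = []
    for child in op.children:
        child = place_exchanges(child)
        if child.partition != op.partition:
            child = Op("exchange", op.partition, [child])
        new_children.append(child)
    return Op(op.name, op.partition, new_children)


def render(op):
    """Render a plan as nested text, e.g. parent(child1, child2)."""
    if not op.children:
        return op.name
    return op.name + "(" + ", ".join(render(c) for c in op.children) + ")"


plan = Op("reduce", "C", [Op("op_call", "C", [Op("hash_join", "B", [
    Op("reduce", "A1", [Op("table_scan protein", "A1")]),
    Op("reduce", "A2", [Op("table_scan goTerm", "A2")]),
])])])

text = render(place_exchanges(plan))
print(text)
```

With the labels chosen here the rewrite yields three exchanges, in the same positions as in the slide's partitioned plan.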
Query scheduling
• Allocate operators to evaluator nodes
    select p.proteinId, Blast(p.sequence)
    from protein p, goTerm t
    where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
[Diagram: partitioned plan annotated with evaluator node numbers, root first; the diagram is labelled with both partitioned parallelism and pipelined parallelism]
    nodes 4, 5: reduce, op_call (Blast)
      exchange
        node 2: hash_join (proteinId)
          exchange
            node 1: reduce, table_scan (protein)
          exchange
            node 3: reduce, table_scan termId=GO:0005942 (goTerm)
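Partitioned parallelism means that several evaluator nodes run the same operator over disjoint parts of its input. One common way to split the input is to hash a key of each tuple, sketched below in Python. Whether OGSA-DQP distributes tuples this way is an assumption; the function name and sample rows are hypothetical, and `zlib.crc32` is used only because it gives the same result on every run.

```python
import zlib


def partition_rows(rows, key, n):
    """Split rows into n buckets by a stable hash of row[key], one bucket per evaluator node."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        index = zlib.crc32(str(row[key]).encode("utf-8")) % n
        buckets[index].append(row)
    return buckets


# Hypothetical join output to be spread over two evaluator nodes running op_call (Blast).
rows = [{"proteinId": f"P{i}"} for i in range(1, 7)]
buckets = partition_rows(rows, "proteinId", 2)
for node, bucket in enumerate(buckets):
    print(node, [row["proteinId"] for row in bucket])
```

Because the bucket depends only on the key, tuples with the same proteinId always reach the same node, and each node can call the analysis service on its share independently.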
Conclusion
• OGSA-DQP is a service-based distributed query processor that is:
  • Exposed as a service
  • Implemented as an orchestration of services
• Benefits:
  • Queries are executed in parallel
  • OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI
  • Web services can be invoked during query execution, merging data access with data analysis