context
• High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.
• Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available.
• Distributed query processing technology is one approach to delivering the services named in the first point, given the availability of the middleware named in the second.
EDBT'04: Service-Based Distributed Query Processing for the Grid (M N Alpdemir)
OGSA-DQP goals
• To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].
• To benefit from Grid abstractions for on-demand, transparent allocation of resources required for a task [OGSA/OGSI/GT3].
• To provide transparent, implicit parallelism and distribution [Polar*].
• To orchestrate the composition of data retrieval and analysis services using query mechanisms.
• To expose this orchestration capability as a Grid data service.
OGSA-DQP innovations
• OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.
• All available nodes can be allocated for query evaluation (not just the nodes with data sources).
• A distributed query execution plan is resourced on the fly.
• This allows runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
• The query plan is the outcome of optimising a declarative service orchestration expressed as a query.
• OGSA-DQP uses a parallel physical algebra; most mediator-based query processors do not.
OGSA-DQP provides two grid services
• Exposes to clients Grid Distributed Query Services (GDQSs) that: interact with clients; find and retrieve service descriptions; parse, compile, partition and schedule the query execution over a union of distributed data sources. A GDQS coordinates the GQESs into executing the plan; the query plan is an orchestration of GQESs.
• Coordinates transparently Grid Query Evaluation Services (GQESs) that: implement the physical query algebra; implement the query execution model and semantics; run a partition of a query execution plan generated by a GDQS; interact with other GQESs/GDSs/WSs but not with clients.
Brief tour: an illustration

<?xml version="1.0" encoding="UTF-8"?>
<databaseSchema xmlns="">
  <logicalSchema>
    <table name="goterm">
      <column fullName="goterm_id" length="32" name="id">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <column fullName="goterm_type" length="55" name="type">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <column fullName="goterm_name" length="255" name="name">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <primaryKey>
        <columnFullName>id</columnFullName>
      </primaryKey>
    </table>
  </logicalSchema>
  <physicalSchema>
    <hostMachine>130.88.192.230</hostMachine>
    <database join_buffer_size="131072" max_join_size="4294967295">
      <physTable avgRowLength="67" dataLength="766784" indexLength="126976" name="goterm" rowFormat="Dynamic" rows="11369"/>
    </database>
  </physicalSchema>
  <GDSFHandle>http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory</GDSFHandle>
</databaseSchema>

<?xml version="1.0" encoding="UTF-8"?>
<Partitions>
  <Partition>
    <evaluatorURI>http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450-1076603541049</evaluatorURI>
    <Operator operatorID="0" operatorType="TABLE_SCAN">
      <tupleType>
        <type>goterm</type> <name>goterm.OID</name>
        <type>string</type> <name>goterm.id</name>
        <type>string</type> <name>goterm.type</name>
        <type>string</type> <name>goterm.name</name>
      </tupleType>
      <TABLE_SCAN>
        <dataResourceName> goterms </dataResourceName>
        <GDSHandle> http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash-31056514-1076603576481</GDSHandle>
        <tableName> goterms </tableName>
        <predicateExpr>
          <predicate>
            <comparativeOperator>LIKE</comparativeOperator>
            <leftOperand name=" goterm.id" type="13"/>
            <rightOperand name=" GO:0000%" type="16"/>
          </predicate>
        </predicateExpr>
      </TABLE_SCAN>
    </Operator>
    . . .
  </Partition>
  . . .
</Partitions>

<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>
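As a rough illustration of how a client might read a GDQDataSourceList document like the one on this slide, the sketch below uses Python's standard xml.etree.ElementTree. The function name read_data_source_list and the variable names are hypothetical and are not part of OGSA-DQP; only the element names, the namespace and the URLs come from the slide.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the GDQDataSourceList document on the slide.
NS = {"gdqs": "http://dqp.ogsadai.org.uk/schema/gdqs"}

# The data source list from the slide, passed as bytes so the encoding
# declaration is honoured by the parser.
CONFIG = b"""<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>
"""


def read_data_source_list(xml_bytes):
    """Return (data service factory handles, WSDL URLs) found in the document.

    Hypothetical helper for illustration; not an OGSA-DQP API.
    """
    root = ET.fromstring(xml_bytes)
    factories = [
        e.text.strip()
        for e in root.findall("gdqs:importedDataSource/gdqs:GDSFactoryHandle", NS)
    ]
    services = [
        e.text.strip()
        for e in root.findall("gdqs:importedService/gdqs:wsdlURL", NS)
    ]
    return factories, services


factories, services = read_data_source_list(CONFIG)
for handle in factories:
    print("data source factory:", handle)
for url in services:
    print("analysis service WSDL:", url)
```

The two lists correspond to the "Select Data Sources" and "Select Web Services" inputs: three OGSA-DAI factory handles and one analysis web service.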
The Demonstration: Configuring the DQP
• Select DQP Factory
• Select Data Sources
• Select Web Services
• Import Metadata
The Demonstration: Example Query
Given two DBMSs and one analysis tool (e.g., a WS):
• Goterm: a GO (Gene Ontology) database running as a remote MySQL DB;
• proteinSequence: yeast protein sequences;
• EntropyAnalyser: an information content analyser.
We can obtain the information content of protein sequences of a certain kind, specified by certain gene ontology terms:
select p.ORF, go.id, calculateEntropy(p.sequence)
from p in protein_sequences, go in goterms, pg in protein_goterms
where go.id=pg.GOTermIdentifier and p.ORF=pg.ORF and p.ORF like "YBL06%" and go.id like "GO:0000%";
• Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid.
[Figure labels: "Parallelized on nodes 1 & 2"; "Partition boundaries"]
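To make the meaning of the example query concrete, here is a minimal single-process sketch in Python of what it computes: two prefix filters, a join through protein_goterms, and a call to an analysis function per result row. It is not how OGSA-DQP evaluates the query (that is distributed across GQESs). The helper names are hypothetical, Shannon entropy is only an assumed stand-in for whatever EntropyAnalyser actually computes, the sample rows are made up, and ORF and go.id are assumed to be unique keys.

```python
import math
from collections import Counter


def like_prefix(value, pattern):
    """Handle only the 'prefix%' form of LIKE used in the example query."""
    if not pattern.endswith("%") or "%" in pattern[:-1]:
        raise ValueError("only trailing-% patterns are supported in this sketch")
    return value.startswith(pattern[:-1])


def calculate_entropy(sequence):
    """Shannon entropy in bits per symbol (assumed stand-in for EntropyAnalyser)."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def run_query(protein_sequences, goterms, protein_goterms):
    """Evaluate the example query over in-memory rows (lists of dicts)."""
    # Apply the two LIKE predicates first, as the TABLE_SCAN operators would.
    go_ids = {g["id"] for g in goterms if like_prefix(g["id"], "GO:0000%")}
    proteins = {
        p["ORF"]: p["sequence"]
        for p in protein_sequences
        if like_prefix(p["ORF"], "YBL06%")
    }
    # Join through the linking table, then call the analysis function.
    return [
        (pg["ORF"], pg["GOTermIdentifier"], calculate_entropy(proteins[pg["ORF"]]))
        for pg in protein_goterms
        if pg["ORF"] in proteins and pg["GOTermIdentifier"] in go_ids
    ]


# Made-up rows, for illustration only.
protein_sequences = [
    {"ORF": "YBL061C", "sequence": "AB"},
    {"ORF": "YAL001C", "sequence": "AAAA"},
]
goterms = [{"id": "GO:0000001"}, {"id": "GO:0005575"}]
protein_goterms = [
    {"ORF": "YBL061C", "GOTermIdentifier": "GO:0000001"},
    {"ORF": "YBL061C", "GOTermIdentifier": "GO:0005575"},
    {"ORF": "YAL001C", "GOTermIdentifier": "GO:0000001"},
]

result = run_query(protein_sequences, goterms, protein_goterms)
print(result)
```

Filtering before joining mirrors the plan fragment shown earlier, where the LIKE predicate on goterm.id is attached to the table scan rather than applied after the join.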
where to find out more: software
• OGSA-DQP: Grid middleware to query distributed data sources. www.ogsadai.org.uk/dqp
• OGSA-DAI: Grid middleware to interface with data(bases). www.ogsadai.org.uk/
• Globus Toolkit: Open-source implementation of OGSA/OGSI. www.globustoolkit.org/