Automatic Deployment of Application-Specific Metadata and Code in MOCHA
Manuel Rodriguez-Martinez, Nick Roussopoulos
Introduction
• Database Middleware Systems:
  • Used to integrate data from multiple sources.
  • Help to keep clients simple
    • thin clients
    • economical ($$$) to deploy
    • Web-based GUI
  • Re-use existing servers
    • replacing them can be expensive and dangerous
  • Examples
    • TSIMMIS, Garlic, DISCO, Oracle, Sybase, ...
[Figure: architecture diagram. Labels: Client (x2), Integration Server, Catalog, Translator (x3), data sources Oracle, XML, Images]
M. Rodriguez-Martinez – N. Roussopoulos
Limitations of this Solution: Code Deployment Problem
• Code for data types and operators is user-defined
  • e.g. Polygon, Perimeter()
• Need to manually install the code on:
  • clients
  • integration servers
  • translators
• Must be ported (C/C++ code)
• Security (must not crash the system)
• Does not scale well as the number of sites increases
  • hard to deploy, upgrade and maintain the code
[Figure: architecture diagram. Labels: Client (x2), Integration Server, Catalog, Translator (x3), data sources Oracle, XML, Images]
Limitations of this Solution: Query Processing Problem
• Availability of code limits operator placement options.
  • not all sites can evaluate the operators in a query
• Integration server ends up doing most of the processing.
  • data must be shipped to it
• Too much data movement!
• Does not scale well
  • network becomes a major performance bottleneck
  • limited bandwidth increases query execution time
[Figure: the same architecture diagram, annotated with three 100MB data transfers]
The MOCHA Solution
• Middleware system automatically deploys the code
  • ship Java classes for data types and operators
  • done at run time in dynamic fashion
• Provide information on how to use the code
  • metadata and control in XML and RDF
• Exploit these features in query operator placement
  • place operators at sites that minimize data movement
  • remote data sources get operators that filter the data
  • integration server gets operators that expand the data
  • more on this: SIGMOD 2000 paper
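The placement rule on this slide can be illustrated with a minimal sketch. It assumes that the only cost is the number of bytes crossing the network and that size estimates are available per operator; the `Operator` class, the `place` and `bytes_shipped` functions, and the site names are hypothetical and are not taken from MOCHA itself. The actual optimizer is described in the SIGMOD 2000 paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description used only for this sketch."""
    name: str
    input_bytes: int   # estimated bytes the operator reads
    output_bytes: int  # estimated bytes the operator produces


def place(op: Operator) -> str:
    """Pick the site that ships the fewest bytes for a single operator.

    Assumption: network traffic between the data source and the
    integration server is the only cost considered.
    """
    # A data-reducing operator (filter, aggregate) runs at the data source,
    # so only its small output crosses the network.
    if op.output_bytes < op.input_bytes:
        return "data source"
    # A data-inflating operator runs at the integration server,
    # so only its smaller input crosses the network.
    return "integration server"


def bytes_shipped(op: Operator, site: str) -> int:
    """Bytes sent from the data source to the integration server."""
    return op.output_bytes if site == "data source" else op.input_bytes
```

With illustrative sizes, `place(Operator("Composite", 100_000_000, 200_000))` returns `"data source"`, so only the 200,000 output bytes would be shipped.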
MOCHA Architecture
[Figure: architecture diagram. Labels: Client (x2), QPC, Catalog, Code Repository, Network, DAP (x4), data sources XML Repository, Text Files, Oracle 8i, Informix]
Automatic Code Deployment
Query:
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
[Figure: deployment diagram. Labels: Client, QPC, Catalog, Code Repository, DAP (x2), Oracle, Informix, Internet; sites Texas, Virginia, Maryland]
Answering the Query
Query:
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
[Figure: the same deployment diagram, annotated with data volumes. Labels: tuples (100MB, 200MB); results (150KB, 200KB, 350KB)]
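The volumes labeled on this slide allow a rough worked calculation. The sketch below assumes one particular reading of the figure: without code shipping, the 100MB and 200MB of tuples would travel to the integration server, while with Composite() running at the DAPs only the 150KB and 200KB of results (350KB in total) travel. That reading is an assumption, not something the slide states outright, and decimal units are used for simplicity.

```python
MB = 1_000_000  # decimal megabyte, chosen for simplicity
KB = 1_000


def reduction(shipped_without: int, shipped_with: int) -> float:
    """Fraction of network traffic avoided."""
    return 1 - shipped_with / shipped_without


# Assumed reading of the figure labels (see the note above).
without_code_shipping = 100 * MB + 200 * MB  # raw tuples from both sources
with_code_shipping = 150 * KB + 200 * KB     # partial results from both DAPs

saved = reduction(without_code_shipping, with_code_shipping)
print(f"{saved:.2%}")  # prints 99.88%
```

Under that reading the saving is in line with the 99% reduction in data movement reported on the performance slide.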
Components of MOCHA
• Client Application
• QPC (Query Processing Coordinator)
  • parsing (SQL)
  • optimizing
  • catalog management
  • code deployment
  • query execution
• DAP (Data Access Provider)
  • data translation
  • query execution
• Data Server
  • storage server
[Figure: component diagram. Labels: Client, QPC, Catalog, Code Repository, Internet, DAP (x3), data sources XML, Oracle, Text]
Catalog Organization
• Holds information describing the structure and proper use of tables, data types and query operators.
  • Generically referred to as “resources”
• Each resource is uniquely identified by a URI:
  • mocha://cs1.umd.edu/EarthSci/Polygon
• Metadata is encoded using RDF (serialized as XML)
  • makes it easy to understand, use and exchange metadata
• Each resource has a catalog entry of the form: (URI, RDF File)
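The (URI, RDF File) entry form can be sketched as a small in-memory mapping. This is only an illustration under stated assumptions: the `Catalog` class, its method names, and the file name `Polygon.rdf` are hypothetical, and the slides do not say how MOCHA actually stores its catalog.

```python
from urllib.parse import urlparse


class Catalog:
    """Toy in-memory catalog: one (URI, RDF file) entry per resource."""

    def __init__(self):
        self._entries = {}  # maps resource URI -> RDF file name

    def register(self, uri, rdf_file):
        """Add an entry, insisting on a unique mocha:// URI."""
        parts = urlparse(uri)
        if parts.scheme != "mocha" or not parts.netloc or not parts.path.strip("/"):
            raise ValueError(f"not a mocha resource URI: {uri!r}")
        if uri in self._entries:
            raise ValueError(f"duplicate catalog entry: {uri!r}")
        self._entries[uri] = rdf_file

    def lookup(self, uri):
        """Return the RDF file describing the resource (KeyError if unknown)."""
        return self._entries[uri]


catalog = Catalog()
# "Polygon.rdf" is a made-up file name for illustration.
catalog.register("mocha://cs1.umd.edu/EarthSci/Polygon", "Polygon.rdf")
```

Keying the mapping on the full URI mirrors the slide's requirement that each resource be uniquely identified.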
Metadata Requirements
Table Rasters: columns location, image, week, band
Query:
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
1. What kind of metadata is needed?
2. How should it be specified?
RDF Model: Data Types
Resource: mocha://cs1.umd.edu/EarthSci/Raster
• mocha:Type = Raster
• mocha:Class = Raster.class
• mocha:Repository = cs1.umd.edu/EarthSci
• mocha:Size = 1 megabyte
• mocha:Creator = user1@cs.umd.edu
RDF Model: Query Operators
Resource: mocha://cs1.umd.edu/EarthSci/Composite
• mocha:Aggregate = Composite
• mocha:Class = Composite.class
• mocha:Repository = cs1.umd.edu/EarthSci
• mocha:Creator = user1@cs.umd.edu
• mocha:Result and mocha:Arguments point to nested nodes, each carrying a mocha:URI (. . .) and mocha:Type = Raster; the arguments are grouped in an rdf:Seq
RDF Model: Tables
Resource: mocha://cs1.umd.edu/EarthSciDB/Rasters
• mocha:Table = Rasters
• mocha:Owner = user1@cs.umd.edu
• mocha:Database = cs1.umd.edu/EarthSciDB
• mocha:Columns = rdf:Seq of column nodes, each carrying a mocha:URI (. . .), a mocha:Column name and a mocha:Type
  • column labels shown: image, location, . . .
  • type labels shown: Raster, Polygon, . . .
Metadata and Control Exchange
• QPC sends to each DAP:
  • metadata for the data types and operators it will receive
  • query plan specifying the task to perform
• Metadata is serialized as XML
  • RDF serialization syntax
• Plans
  • XML documents
  • easy to use and understand
  • can be mapped to a suitable form
    • tree, DAG, graph, etc.
  • prevents version inconsistencies caused by changes in Java classes
Example metadata for the Raster data type:
  <rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
    <mocha:Type>Raster</mocha:Type>
    <mocha:Class>Raster.class</mocha:Class>
    <mocha:Repository>cs1.umd.edu/EarthSci</mocha:Repository>
    <mocha:Size>1MB</mocha:Size>
    <mocha:Creator>user1@cs1.umd.edu</mocha:Creator>
  </rdf:Description>
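To show how a receiver might read such metadata, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The slide omits the enclosing `rdf:RDF` element and the namespace declarations, so both are supplied here as assumptions: the RDF namespace URI is the standard W3C one, while the `mocha` namespace URI (`http://example.org/mocha#`) and the helper name `read_descriptions` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MOCHA_NS = "http://example.org/mocha#"  # hypothetical; not given on the slide

# The wrapper element and xmlns attributes are assumptions added so the
# fragment from the slide parses as a complete XML document.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:mocha="{MOCHA_NS}">
  <rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
    <mocha:Type>Raster</mocha:Type>
    <mocha:Class> Raster.class </mocha:Class>
    <mocha:Repository> cs1.umd.edu/EarthSci </mocha:Repository>
    <mocha:Size>1MB</mocha:Size>
    <mocha:Creator>user1@cs1.umd.edu </mocha:Creator>
  </rdf:Description>
</rdf:RDF>"""


def read_descriptions(xml_text):
    """Map each resource URI to a dict of its mocha: properties."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{MOCHA_NS}}}"  # ElementTree spells tags as {namespace}local
    resources = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        props = {}
        for child in desc:
            if child.tag.startswith(prefix):
                # Strip the namespace and surrounding whitespace.
                props[child.tag[len(prefix):]] = (child.text or "").strip()
        resources[desc.get("about")] = props
    return resources


metadata = read_descriptions(DOC)
raster = metadata["mocha://cs1.umd.edu/EarthSci/Raster"]
```

Here `raster["Class"]` is `"Raster.class"` and `raster["Repository"]` is `"cs1.umd.edu/EarthSci"`, which is the information a site would need to locate the code for the type.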
Processing a Query in MOCHA
• Query Parsing
• Resource Discovery
• Query Optimization
• Metadata and Control Exchange
• Code Deployment Phase
• Query Execution
Table Rasters: columns location, image, week, band
Query:
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
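For readers unfamiliar with the running example, the query's semantics can be sketched in a few lines. The slides do not define Composite(), so this sketch assumes it is a per-pixel maximum over equally sized images; that choice, the list-of-integers image representation, the string stand-ins for locations, and the function names are all hypothetical.

```python
from collections import defaultdict


def composite(images):
    """Assumed aggregate: per-pixel maximum over equally sized images."""
    return [max(pixels) for pixels in zip(*images)]


def run_query(rasters, t1, t2):
    """Select location, Composite(image) From Rasters
    Where week BETWEEN t1 and t2 Group By location."""
    groups = defaultdict(list)
    for row in rasters:
        if t1 <= row["week"] <= t2:  # BETWEEN is inclusive on both ends
            groups[row["location"]].append(row["image"])
    return {location: composite(images) for location, images in groups.items()}


# Tiny made-up table; real rows would hold Polygon and Raster values.
rasters = [
    {"location": "A", "week": 1, "image": [1, 5, 2]},
    {"location": "A", "week": 2, "image": [4, 0, 3]},
    {"location": "B", "week": 2, "image": [7, 7, 7]},
    {"location": "B", "week": 9, "image": [9, 9, 9]},
]
result = run_query(rasters, 1, 2)
```

Each group collapses many images into one, which is why an aggregate like this is a natural candidate for execution close to the data.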
Performance of MOCHA
• shipping Composite() code to DAP
  • cuts data movement by 99%
  • 4-to-1 performance improvement
Query:
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
[Figure: chart of running time (secs) by middleware type, Non-MOCHA vs. MOCHA]
Benefits of MOCHA
• Middle-tier solution
• Extensible
• Java Code
  • Re-usability across platforms
• Automatic Code Deployment
  • “Plug-n-Play”
  • Easier to Administer
• XML-based Metadata
• XML-based Control
• Efficient Query Processing
  • data movement reduction
  • moving code vs. data
Conclusions
• Identified limitations in existing middleware systems
  • Code Deployment Problem
  • Query Processing Problem
• Proposed a new framework to automate the deployment of new functionality:
  • automatic code deployment
  • efficient query processing
• Described its implementation in MOCHA, based on well-accepted technologies: Java, XML, RDF.
http://www.cs.umd.edu/projects/mocha/