A Query Translation Scheme for Rapid Implementation of Wrappers. Yannis Papakonstantinou, Ashish Gupta, Hector Garcia-Molina, Jeffrey Ullman. Presented by Preetham Swaminathan, 03/22/2007.
Introduction • As part of the TSIMMIS project, many hard-coded wrappers have been developed for a variety of sources, including legacy systems. • Some observations • Only a small part of the code deals with the access details of the source. • Much of the code deals with communication, buffering, etc. • Other code implements query and data transformations that could be expressed in a high-level, declarative fashion.
Introduction • Based on these observations, a wrapper implementation toolkit for rapid wrapper building was developed. • The toolkit contains • A library of commonly used functions. • A facility to translate queries into source-specific commands and queries. • A facility to translate results into a model useful to the application. • The main focus here is the query translation component of the toolkit (the converter).
Converter • The converter is the query translation component of the toolkit. • The implementer gives the converter a set of templates. • These templates describe the queries accepted by the wrapper. • For each template, the implementer provides an action to run when an application query matches it. • The action is executed to produce the native query for the source, which answers the application query.
Example • Consider a data source that can only do selections on the attribute dept. • The source does not understand the notion of projecting attributes. • Template describing the source: select * from $X where $X.dept = 'toy' • The following query does not match this template because it contains a projection: select emp.name from emp where emp.dept = 'toy'
Example • The wrapper could process the above query as follows • Transform the query into one without a projection. • Perform the projection on the result of that query, a step known as filtering. • The wrapper toolkit can handle this type of query transformation. • The converter generates not only native queries for the source but also filters that describe additional processing on the results.
Converter • The converter in the toolkit targets the MSL query language. • MSL is a logic-based language for a simple object-oriented data model called OEM. • The converter is configured with templates written in QDTL. • Each template is associated with an action. • The converter takes an MSL query as input and generates • Commands for the source and • A filter to be applied to the results.
Converter • The converter will process • Directly supported queries: queries that syntactically match a template. • Logically supported queries: queries that are logically equivalent to a directly supported query. • Indirectly supported queries: queries that can be processed as a combination of a directly supported query and a filter.
OEM Model • OEM stands for Object Exchange Model. • OEM does not support classes, methods, or inheritance. • Classes and methods can be emulated. • Example:
<ob1, person, {sub1, sub2, sub3, sub4, sub5}>
<sub1, last_name, 'Smith'>
<sub2, first_name, 'John'>
<sub3, role, 'faculty'>
<sub4, department, 'CS'>
<sub5, telephone, '415-514-1292'>
OEM Model • At each source, top-level OEM objects are defined. • They provide entry points into the object structure. • Sub-objects are reached through these top-level objects, as in the following MSL query. (Q1) *P:-<P person {<L last_name 'Smith'>}> • The tail is a pattern of the form <object-id label value>. • Matching • When a field is a constant, the pattern binds only to objects that have the same constant in that field. • When a field is a variable, the pattern can bind to any OEM object.
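The matching rule above (constants must agree, variables bind to anything) can be sketched in Python. This is only an illustration: the class names Var, Obj and Pat and the function match are hypothetical and not part of the toolkit, and the set matching is greedy with no backtracking, which is a simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A pattern variable such as P or L (hypothetical helper type)."""
    name: str

@dataclass
class Obj:
    """An OEM object: object id, label, and an atomic value or a list of Obj."""
    oid: str
    label: str
    value: object

@dataclass
class Pat:
    """A pattern <oid label value>; each field is a constant, a Var, or a list of Pat."""
    oid: object
    label: object
    value: object

def bind(field, actual, env):
    """Match one pattern field; return the extended bindings or None on failure."""
    if isinstance(field, Var):
        if field.name in env:
            return env if env[field.name] == actual else None
        return {**env, field.name: actual}
    return env if field == actual else None  # constants must be equal

def match(pat, obj, env=None):
    """Match a pattern against an OEM object; return variable bindings or None."""
    env = {} if env is None else env
    for field, actual in ((pat.oid, obj.oid), (pat.label, obj.label)):
        env = bind(field, actual, env)
        if env is None:
            return None
    if isinstance(pat.value, list):
        if not isinstance(obj.value, list):
            return None
        for sub_pat in pat.value:  # every sub-pattern needs some matching sub-object
            for sub_obj in obj.value:
                new_env = match(sub_pat, sub_obj, env)
                if new_env is not None:
                    env = new_env
                    break
            else:
                return None
        return env
    return bind(pat.value, obj.value, env)

person = Obj("ob1", "person", [
    Obj("sub1", "last_name", "Smith"),
    Obj("sub2", "first_name", "John"),
])
q1 = Pat(Var("P"), "person", [Pat(Var("L"), "last_name", "Smith")])
print(match(q1, person))
```

Here q1 plays the role of the tail of Q1: the variables P and L bind to object ids, while the constants person, last_name and 'Smith' restrict which objects can match.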
A Detailed Query Translation Example • Build a wrapper for a university "lookup" facility that contains information about employees and students. • The facility is accessed from the command line and offers limited query capabilities. • It can return only the full records of persons, including all fields such as first name, last name, and telephone. • There is no way for the user to retrieve just one field.
Query Translation • The only queries accepted are • Retrieve person records by specifying the last name. (L2) lookup -ln Smith • Retrieve person records by specifying the first and last name. (L3) lookup -ln Smith -fn John • Retrieve all person records. (L4) lookup
Query Translation • Using the Query Description and Translation Language (QDTL), the description of the lookup facility can be written as below.
(D1)
(QT1.1) Query ::= *O:-<O person {<last_name $LN>}>
(QT1.2) Query ::= *O:-<O person {<last_name $LN> <first_name $FN>}>
(QT1.3) Query ::= *O:-<O person V>
• Identifiers preceded by $ are constant placeholders. • Upper-case identifiers are variable placeholders.
Query Translation • Each template describes many more queries than those that match it syntactically. • Each template describes the following classes of queries. • Directly supported queries. • Logically supported queries. • Indirectly supported queries.
Query Translation • Directly supported queries • A query q is directly supported by a template t if q can be derived by substituting constants for the constant placeholders of t and variables for the variables of t. • *P:-<P person {<last_name 'Smith'>}> is directly supported by template QT1.1, by substituting P for O and 'Smith' for $LN.
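Direct support is a purely syntactic check, so it can be sketched as a token-by-token comparison. The sketch below is an assumption-laden illustration, not the converter's algorithm: the tokenizer, the function names, and the convention that constants are single-quoted are all made up for the example, and it does not check that distinct template variables map to distinct query variables.

```python
import re

# Tokens: ':-', '$' placeholders, quoted constants, identifiers, punctuation.
TOKEN = re.compile(r"\s*(:-|\$\w+|'[^']*'|\w+|[*<>{}])")

def tokenize(text):
    """Split an MSL-like query or template into tokens."""
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_variable(tok):
    return tok[0].isupper()        # upper-case identifiers are variables

def is_constant(tok):
    return tok.startswith("'")     # assumption: constants are single-quoted

def directly_supported(template, query):
    """Return the placeholder substitution if query matches template, else None."""
    t, q = tokenize(template), tokenize(query)
    if len(t) != len(q):
        return None
    subst = {}
    for tt, qt in zip(t, q):
        if tt.startswith("$"):                 # constant placeholder
            if not is_constant(qt) or subst.setdefault(tt, qt[1:-1]) != qt[1:-1]:
                return None
        elif is_variable(tt):                  # variable placeholder
            if not is_variable(qt) or subst.setdefault(tt, qt) != qt:
                return None
        elif tt != qt:                         # everything else must be identical
            return None
    return subst

QT1_1 = "*O:-<O person {<last_name $LN>}>"
print(directly_supported(QT1_1, "*P:-<P person {<last_name 'Smith'>}>"))
```

A query with an extra condition, such as one that also constrains role, has a different token sequence and is rejected by this check; handling it requires the logical and indirect support described on the following slides.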
Query Translation • Logically supported queries • A query q is logically supported by a template t if q is logically equivalent to some query q' directly supported by t.
*O:-<O person {<first_name 'John'> <last_name 'Smith'>}>
*O:-<O person {<last_name 'Smith'> <first_name 'John'>}>
*O:-<O person {<LO last_name 'Smith'>}> AND <O person {<LO L V> <first_name 'John'>}>
• All of these queries are logically equivalent to *O:-<O person {<last_name 'Smith'> <first_name 'John'>}>, which is directly supported by QT1.2.
Query Translation • Indirectly supported queries • A query q is indirectly supported by a template t if q can be answered by running a query directly supported by t and then applying a filter to the results. (Q6) *Q:-<Q person {<last_name 'Smith'> <role 'student'>}> • The above query is not logically supported by any template in the description.
Query Translation • The converter realizes that the answer to the following query contains the answer to the original query (the answer to Q6 is a subset of the answer to Q7). (Q7) *Q:-<Q person {<last_name 'Smith'>}> • Thus the converter matches Q6 to template QT1.1 as if it were Q7, binding $LN to 'Smith', and generates the filter *O:-<O person {<role 'student'>}> • The filter is an MSL query that is applied to the result of Q7 to produce the result of Q6.
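The split of Q6 into a supported query plus a filter can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: queries are reduced to a dictionary of label/constant conditions, templates to the labels they can bind, and the rule "prefer the template that covers the most conditions" is an assumption made for the example.

```python
# Labels each template of description D1 can bind (simplified view).
TEMPLATES = {
    "QT1.1": ("last_name",),
    "QT1.2": ("last_name", "first_name"),
    "QT1.3": (),                      # matches every person query
}

def plan(conditions):
    """Split conditions into template bindings and a residual filter.

    conditions maps a label to the constant it must equal, e.g.
    {"last_name": "Smith", "role": "student"} for Q6.
    """
    best_name, best_labels = None, None
    for name, labels in TEMPLATES.items():
        if all(label in conditions for label in labels):
            # Assumption: push as many conditions as possible to the source.
            if best_labels is None or len(labels) > len(best_labels):
                best_name, best_labels = name, labels
    bindings = {label: conditions[label] for label in best_labels}
    residual = {l: v for l, v in conditions.items() if l not in best_labels}
    return best_name, bindings, residual

def apply_filter(records, residual):
    """Keep only the records that satisfy the conditions the source could not check."""
    return [r for r in records if all(r.get(l) == v for l, v in residual.items())]

template, bindings, residual = plan({"last_name": "Smith", "role": "student"})
source_result = [  # made-up records standing in for the answer to Q7
    {"last_name": "Smith", "first_name": "John", "role": "faculty"},
    {"last_name": "Smith", "first_name": "Ann", "role": "student"},
]
print(template, bindings, residual)
print(apply_filter(source_result, residual))
```

The source only ever sees the conditions it can evaluate; the role condition is applied afterwards, which mirrors how the filter is applied to the result of Q7.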
Native Query Formulation
(D2)
(QT2.1) Query ::= *O:-<O person {<last_name $LN>}>
(AC2.1) {sprintf(lookup_query, 'lookup -ln %s', $LN);}
(QT2.2) Query ::= *O:-<O person {<last_name $LN> <first_name $FN>}>
(AC2.2) {sprintf(lookup_query, 'lookup -ln %s -fn %s', $LN, $FN);}
(QT2.3) Query ::= *O:-<O person V>
(AC2.3) {sprintf(lookup_query, 'lookup');}
Nonterminals
(D4) /* A description with nonterminals */
(QT4.1) Query ::= *OP :- <OP person {_OptLN _OptFN _OptRole}> /* query template */
(NT4.2) _OptLN ::= <last_name $LN> /* nonterminal template */
(NT4.3) _OptLN ::= /* empty nonterminal template */
(NT4.4) _OptFN ::= <first_name $FN>
(NT4.5) _OptFN ::= /* empty */
(NT4.6) _OptRole ::= <role $R>
(NT4.7) _OptRole ::= /* empty */
Nonterminals - Actions
(D5)
(QT5.1) Query ::= *OP :- <OP person {_OptLN _OptFN _OptRole}>
(AC5.1) {sprintf(lookup_query, 'lookup %s %s %s', $_OptLN, $_OptFN, $_OptRole);}
(NT5.2) _OptLN ::= <last_name $LN>
(AC5.2) {sprintf($_OptLN, '-ln %s', $LN);}
(NT5.3) _OptLN ::=
(AC5.3) {$_OptLN = '';}
(NT5.4) _OptFN ::= <first_name $FN>
(AC5.4) {sprintf($_OptFN, '-fn %s', $FN);}
(NT5.5) _OptFN ::=
(AC5.5) {$_OptFN = '';}
(NT5.6) _OptRole ::= <role $R>
(AC5.6) {sprintf($_OptRole, '-role %s', $R);}
(NT5.7) _OptRole ::=
(AC5.7) {$_OptRole = '';}
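The effect of the actions in D5 can be imitated in a few lines of Python. This is a rough sketch of the idea only: the function names are hypothetical, the bindings dictionary stands in for the placeholders $LN, $FN and $R, and unlike the sprintf-based actions it drops empty fragments instead of leaving extra spaces in the command.

```python
def opt(flag, label, bindings):
    """Mimic one optional nonterminal: a flag fragment if bound, '' if empty."""
    return f"{flag} {bindings[label]}" if label in bindings else ""

def lookup_query(bindings):
    """Assemble the native lookup command from the nonterminal fragments."""
    fragments = [
        opt("-ln", "last_name", bindings),    # _OptLN
        opt("-fn", "first_name", bindings),   # _OptFN
        opt("-role", "role", bindings),       # _OptRole
    ]
    return " ".join(["lookup"] + [f for f in fragments if f])

print(lookup_query({"last_name": "Smith", "first_name": "John"}))
print(lookup_query({}))
```

Each call to opt corresponds to choosing between the non-empty and the empty production of a nonterminal, so one query template covers all eight combinations of optional conditions.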
Wrapper Architecture • A wrapper consists of • Implementer-provided parts: the implementer • Provides the driver, which has primary control of query processing. • Provides the QDTL description for the converter. • Provides the Data Extraction (DEX) template for the extractor component of the toolkit. • Converter • Driver
Wrapper Architecture • Wrappers generated with the toolkit behave as servers in a client-server architecture. • Clients use the client support library to issue queries and receive OEM results. • The server support library component of the toolkit receives queries and sends them to the driver component for processing. • The driver invokes the converter, which finds a query that supports the input query and returns native queries.
Wrapper Architecture • The driver submits the native queries to the information source and receives the result as OEM objects. • If a filter was generated during processing, the driver passes the OEM result and the filter to the filter processor. • The Data Extractor (DEX) is used to parse the result and identify the required data. • DEX is configured with a description of the source output and of which parts of that output need to be extracted.
Correspondence of OEM to Relational Models • OEM objects are represented relationally by flattening them into tuples of three relations: top, object, and member. • OEM objects can be converted using a few straightforward rules. • For an object o with object id oid, label l, and atomic value v, we introduce the tuple object(oid, l, v). • If o is a set object, the tuple becomes object(oid, l, set).
OEM to SQL • If o has sub-objects o_i, 1 ≤ i ≤ n, identified by oid_i, then we introduce the tuples member(oid, oid_i). • Finally, if o is a top-level object with object id oid, then we introduce the tuple top(oid). • The relational representation of an MSL query is obtained by querying the top, object, and member relations that represent the object structure referenced in the query.
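The flattening rules for top, object and member can be sketched directly. In this illustration an OEM object is written as a Python tuple (oid, label, value), where value is either an atomic value or a list of sub-objects; that encoding and the function name flatten are assumptions made for the example.

```python
def flatten(obj, top_level=True, rel=None):
    """Flatten an OEM object into the relations top, object and member."""
    if rel is None:
        rel = {"top": set(), "object": set(), "member": set()}
    oid, label, value = obj
    if top_level:
        rel["top"].add(oid)                       # top(oid)
    if isinstance(value, list):                   # set object
        rel["object"].add((oid, label, "set"))    # object(oid, l, set)
        for sub in value:
            rel["member"].add((oid, sub[0]))      # member(oid, oid_i)
            flatten(sub, top_level=False, rel=rel)
    else:                                         # atomic object
        rel["object"].add((oid, label, value))    # object(oid, l, v)
    return rel

person = ("ob1", "person", [
    ("sub1", "last_name", "Smith"),
    ("sub2", "first_name", "John"),
])
relations = flatten(person)
for name in ("top", "object", "member"):
    print(name, sorted(relations[name]))
```

Using sets for the three relations keeps the result independent of traversal order, which matches the relational reading of the rules.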
Example • Consider the query *O:-<O person {<LM last_name 'Smith'>}> • The above MSL query can be written as the following datalog query: answer(O) :- top(O), object(O, person, set), member(O, LM), object(LM, last_name, 'Smith') • The paper contains an algorithm that, for a given MSL query, finds supporting queries in the QDTL description and, if required, creates a filter to be applied to the resulting OEM objects.
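To see what the datalog rule computes, it can be evaluated by hand over small example relations. The data below is made up for illustration, and the evaluation is a plain nested loop in Python rather than anything the toolkit does.

```python
# Example relations: two top-level person objects, one named Smith.
top = {"ob1", "ob2"}
object_ = {
    ("ob1", "person", "set"), ("sub1", "last_name", "Smith"),
    ("ob2", "person", "set"), ("sub2", "last_name", "Jones"),
}
member = {("ob1", "sub1"), ("ob2", "sub2")}

def answer(top, object_, member):
    """answer(O) :- top(O), object(O, person, set),
                    member(O, LM), object(LM, last_name, 'Smith')"""
    return sorted({
        o
        for o in top
        if (o, "person", "set") in object_
        for (parent, lm) in member
        if parent == o and (lm, "last_name", "Smith") in object_
    })

print(answer(top, object_, member))
```

Each body atom of the rule becomes one membership test or one loop, and the shared variables O and LM become the join conditions between them.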
Conclusions • A toolkit that facilitates the implementation of wrappers was developed. • The heart of the toolkit is the converter, which maps incoming queries into native commands of the source. • The converter provides the translation flexibility of systems like Yacc but gives substantially more power (it translates a wider class of queries).