Exchanging Intensional XML Data. Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA); Fred Dang Ngoc (INRIA).
Outline • Introduction • The Model and The Problem • Exchanging Intensional Data • Safe Rewriting • Possible Rewriting • Implementation • Conclusion and Related Work
Introduction • What are intensional documents? • XML documents where: • some of the data is given explicitly • some parts are defined by programs (e.g., Web services) that generate data. • Materialisation of the programs: • the process of evaluating some of the programs included in an XML document and replacing them by their results.
Introduction (cont’d) • The goals of the paper: • Study the new issues raised by the exchange of intensional XML documents between applications • Decide which data should be materialised before it is sent and which should not
Introduction (cont’d) • Data exchange scenario for intensional documents [Figure: a sender and a receiver, each characterised by its capabilities, ACL, cost, ..., exchange documents containing calls to functions f, g, q and r under a data exchange schema.]
The Model and The Problem • Simple intensional XML • Model of intensional documents • Simple schema • Instance of a schema • About rewritings • A Richer Data Model • Function patterns • Restricted service invocations
The Model and The Problem Simple intensional XML • Model intensional XML documents as labelled trees consisting of two types of nodes: • Data nodes: nodes with a label in $\mathcal{L} \cup \mathcal{D}$ • Function nodes correspond to "service calls", that is, nodes with a label in $\mathcal{F}$: • The children subtrees of a function node are the function parameters • When the function is called: • these subtrees are passed to it • the return value replaces the function node in the document. • Assume the existence of the following disjoint domains: • $\mathcal{N}$: domain of nodes • $\mathcal{L}$: domain of labels • $\mathcal{F}$: domain of function names • $\mathcal{D}$: domain of data values
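The Model and The Problem Simple intensional XML (cont’d) • An example of an intensional XML document: [Figure: a newspaper tree with children title ("The Sun"), date ("04/10/2002"), a Get_Temp function node whose parameter is city ("Paris"), and a TimeOut function node whose parameter is "Exhibits"; invoking Get_Temp replaces it by temp ("16 ºC").]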
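To make the model concrete, here is a minimal Python sketch of an intensional tree and of materialisation: calling a function node passes its children subtrees as parameters, and the returned subtrees replace the node. The `Node` class, the `services` registry and the `get_temp` stub are hypothetical illustrations; a real system would issue a Web service call instead of calling a local function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    label: str                          # element label, or function name for a call
    is_function: bool = False           # True for a function (service call) node
    children: List["Node"] = field(default_factory=list)  # for calls: the parameters
    value: Optional[str] = None         # data value carried by a leaf

# Hypothetical registry: function name -> callable that takes the parameter
# subtrees and returns the list of subtrees that replace the call.
Services = Dict[str, Callable[[List[Node]], List[Node]]]

def invoke(parent: Node, index: int, services: Services) -> None:
    """Call parent.children[index] and splice its result in place of the node."""
    call = parent.children[index]
    assert call.is_function, "only function nodes can be invoked"
    result = services[call.label](call.children)   # parameters = children subtrees
    parent.children[index:index + 1] = result      # the return value replaces the call

def materialize(node: Node, services: Services) -> None:
    """Evaluate every call below node, left to right, until none remain."""
    i = 0
    while i < len(node.children):
        if node.children[i].is_function:
            invoke(node, i, services)   # returned nodes are re-examined at index i
        else:
            materialize(node.children[i], services)
            i += 1

def get_temp(params: List[Node]) -> List[Node]:
    # Hypothetical stub standing in for the Get_Temp Web service.
    return [Node("temp", value="16 ºC")]

doc = Node("newspaper", children=[
    Node("title", value="The Sun"),
    Node("date", value="04/10/2002"),
    Node("Get_Temp", is_function=True, children=[Node("city", value="Paris")]),
])
materialize(doc, {"Get_Temp": get_temp})
print([c.label for c in doc.children])   # ['title', 'date', 'temp']
```

`materialize` evaluates every call; the point of the paper is to materialise only some of them, which corresponds to calling `invoke` selectively.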
The Model and The Problem Simple intensional XML (cont’d) • Simple schema • A document schema s is an expression $(L, F, \tau)$ where: • $L \subseteq \mathcal{L}$: a finite set of labels • $F \subseteq \mathcal{F}$: a finite set of function names • $\tau$: a function that maps: • each label name $l \in L$ to a regular expression over $L \cup F$ or to the keyword "data" • each function name $f \in F$ to a pair of expressions: • $\tau_{in}(f)$, the input type of f • $\tau_{out}(f)$, the output type of f
The Model and The Problem Simple intensional XML (cont’d) • An example of a schema: • Data: • τ(newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit*) • τ(title) = data • τ(date) = data • τ(temp) = data • τ(city) = data • τ(exhibit) = data • Functions: • τin(Get_Temp) = city • τout(Get_Temp) = temp • τin(TimeOut) = data • τout(TimeOut) = (exhibit|performance)* • τin(Get_Date) = title • τout(Get_Date) = date
The Model and The Problem Simple intensional XML (cont’d) • Instances of a schema • An intensional document t is an instance of a schema $s = (L, F, \tau)$ if: • for each data node $n \in t$ with label $l \in L$, the labels of n's children form a word in $\mathrm{lang}(\tau(l))$, where $\mathrm{lang}(\tau(l))$ denotes the regular language defined by $\tau(l)$ • the same holds for each function node with label $f \in F$, using its input type $\tau_{in}(f)$.
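The instance check can be sketched in Python by translating each schema expression into a regular expression over the children's labels. This is a minimal sketch assuming that words are encoded as `<label1><label2>...` and that a "data" node carries a value but no labelled children; `to_regex`, `word_ok` and `is_instance` are hypothetical helpers, and the schema is the example above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    label: str                                   # label (data node) or function name
    is_function: bool = False
    children: List["Node"] = field(default_factory=list)
    value: Optional[str] = None                  # data value, if any

def to_regex(expr: str) -> str:
    """Turn a schema expression like 'title.(a|b*)' into a Python regex over
    words encoded as '<label1><label2>...'."""
    # Wrap each name in a group so that '*' and '|' apply to the whole label.
    wrapped = re.sub(r"[A-Za-z_][A-Za-z_0-9]*", lambda m: f"(?:<{m.group(0)}>)", expr)
    return wrapped.replace(".", "")              # '.' is plain concatenation

def word_ok(expr: str, labels: List[str]) -> bool:
    if expr == "data":
        return not labels                        # "data": a value, no labelled children
    encoded = "".join(f"<{lab}>" for lab in labels)
    return re.fullmatch(to_regex(expr), encoded) is not None

def is_instance(n: Node, tau: Dict[str, str], tau_fun: Dict[str, Tuple[str, str]]) -> bool:
    """Data nodes are checked against tau(l), function nodes against tau_in(f)."""
    expr = tau_fun[n.label][0] if n.is_function else tau[n.label]
    ok = word_ok(expr, [c.label for c in n.children])
    return ok and all(is_instance(c, tau, tau_fun) for c in n.children)

TAU = {"newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit*)",
       "title": "data", "date": "data", "temp": "data", "city": "data", "exhibit": "data"}
TAU_FUN = {"Get_Temp": ("city", "temp"), "TimeOut": ("data", "(exhibit|performance)*")}

doc = Node("newspaper", children=[
    Node("title", value="The Sun"),
    Node("date", value="04/10/2002"),
    Node("Get_Temp", True, [Node("city", value="Paris")]),
    Node("TimeOut", True, value="Exhibits"),
])
print(is_instance(doc, TAU, TAU_FUN))   # True
```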
The Model and The Problem Simple intensional XML (cont’d) • About rewritings • Let t, t' be trees • IF t' is obtained from t by: • selecting a function node v in t with some label f, and • replacing it by an arbitrary output instance of f • THEN we say that $t \xrightarrow{v} t'$
The Model and The Problem Simple intensional XML (cont’d) • About rewritings (cont’d) • IF $t \xrightarrow{v_1} t_1 \xrightarrow{v_2} t_2 \cdots \xrightarrow{v_n} t_n$ THEN: • we say that $t \xrightarrow{*} t_n$ (t rewrites into $t_n$) • the nodes $v_1, \ldots, v_n$ are called a rewriting sequence • the set of all trees t' such that $t \xrightarrow{*} t'$ is denoted ext(t).
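The rewriting relation and ext(t) can be illustrated on a simplified, flat view of a document: the word formed by the labels of the root's children. This sketch assumes that each function's possible outputs are given as a finite set of words (so ext(t) is finite and the search terminates) and that ext(t) includes t itself (zero steps); `one_step` and `ext` are hypothetical helpers.

```python
from collections import deque
from typing import Dict, Set, Tuple

Word = Tuple[str, ...]

def one_step(w: Word, outputs: Dict[str, Set[Word]]) -> Set[Word]:
    """All words t' with t -v-> t': pick one function node v and replace it
    by one of its possible outputs."""
    result = set()
    for i, label in enumerate(w):
        for out in outputs.get(label, ()):   # labels without outputs are data nodes
            result.add(w[:i] + out + w[i + 1:])
    return result

def ext(w: Word, outputs: Dict[str, Set[Word]]) -> Set[Word]:
    """ext(t): every word reachable by zero or more rewriting steps (BFS)."""
    seen = {w}
    queue = deque([w])
    while queue:
        for nxt in one_step(queue.popleft(), outputs):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

OUTPUTS = {"Get_Temp": {("temp",)}, "TimeOut": {("exhibit",), ("performance",)}}
W = ("title", "date", "Get_Temp", "TimeOut")
print(len(ext(W, OUTPUTS)))   # 6
```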
The Model and The Problem Simple intensional XML (cont’d) • About rewritings (cont’d) • Let: • t be a tree • s be a schema • 1. IF ext(t) contains some instance of s THEN t possibly rewrites into s. • 2. IF either t is already an instance of s, or there exists some node v in t such that all trees t' with $t \xrightarrow{v} t'$ safely rewrite into s, THEN we say that t safely rewrites into s.
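The two definitions translate almost literally into recursive checks. This sketch works on the same flat word view, and assumes finite, non-empty output sets and no cycles of calls (so the recursion terminates); note that an empty output set would make the `all(...)` test vacuously true. `make_checkers` and the two target predicates are hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, FrozenSet, Tuple

Word = Tuple[str, ...]

def make_checkers(outputs: Dict[str, FrozenSet[Word]], is_instance: Callable[[Word], bool]):
    def successors(w: Word, i: int):
        return [w[:i] + out + w[i + 1:] for out in outputs[w[i]]]

    @lru_cache(maxsize=None)
    def possibly(w: Word) -> bool:
        # some invocation sequence and SOME choice of outputs reaches an instance
        return is_instance(w) or any(
            possibly(nxt)
            for i, lab in enumerate(w) if lab in outputs
            for nxt in successors(w, i))

    @lru_cache(maxsize=None)
    def safely(w: Word) -> bool:
        # already an instance, or some node v whose EVERY result safely rewrites
        return is_instance(w) or any(
            all(safely(nxt) for nxt in successors(w, i))
            for i, lab in enumerate(w) if lab in outputs)

    return possibly, safely

OUTPUTS = {"Get_Temp": frozenset({("temp",)}),
           "TimeOut": frozenset({("exhibit",), ("performance",)})}

def in_r1(w: Word) -> bool:   # title.date.temp.(TimeOut|exhibit*)
    return w[:3] == ("title", "date", "temp") and (
        w[3:] == ("TimeOut",) or all(x == "exhibit" for x in w[3:]))

def in_r2(w: Word) -> bool:   # title.date.temp.exhibit*
    return w[:3] == ("title", "date", "temp") and all(x == "exhibit" for x in w[3:])

W = ("title", "date", "Get_Temp", "TimeOut")
possibly2, safely2 = make_checkers(OUTPUTS, in_r2)
_, safely1 = make_checkers(OUTPUTS, in_r1)
print(safely1(W), possibly2(W), safely2(W))   # True True False
```

With the second target, TimeOut may return a performance, so no choice of invocations is guaranteed to succeed, although one run may.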
The Model and The Problem Simple intensional XML (cont’d) • Safe rewriting of schemas • Let: • s be a schema • r be a distinguished label called the root label • IF all the instances t of s with root label r rewrite safely into instances of s' THEN we say that s safely rewrites into s'. Problems:
The Model and The Problem Simple intensional XML (cont’d) [Figure: the data exchange scenario, with a sender and a receiver (capabilities, ACL, cost, ...) exchanging documents with calls to f, g, q and r under a data exchange schema.]
The Model and The Problem A Richer Data Model • Function patterns • A function belongs to a pattern if its name satisfies the pattern's boolean predicate and its signature is the same as the required one • Example: • τname(Forecast) = UDDIF ∧ InACL • τin(Forecast) = city • τout(Forecast) = temp
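Pattern membership is a conjunction of a name predicate and a signature check. In this minimal sketch, the sets `UDDI_REGISTERED` and `ACL` are hypothetical stand-ins for the `UDDIF` and `InACL` predicates, and signatures are compared as plain strings, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Function:
    name: str
    tau_in: str     # input type, as a schema expression
    tau_out: str    # output type

@dataclass
class FunctionPattern:
    name_predicate: Callable[[str], bool]   # boolean predicate over function names
    tau_in: str
    tau_out: str

    def matches(self, f: Function) -> bool:
        # the name must satisfy the predicate AND the signature must be the required one
        same_signature = (f.tau_in, f.tau_out) == (self.tau_in, self.tau_out)
        return self.name_predicate(f.name) and same_signature

# Hypothetical stand-ins for the UDDIF and InACL predicates of the example.
UDDI_REGISTERED = {"Get_Temp", "WeatherNow"}
ACL = {"Get_Temp"}
forecast = FunctionPattern(lambda n: n in UDDI_REGISTERED and n in ACL, "city", "temp")

print(forecast.matches(Function("Get_Temp", "city", "temp")))    # True
print(forecast.matches(Function("WeatherNow", "city", "temp")))  # False: not in the ACL
```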
The Model and The Problem A Richer Data Model (cont’d) • Restricted Service Invocations • We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema. • This is not always the case, for reasons such as: • security • cost • access rights, etc. • THUS, function names/patterns in the schema can be partitioned into two disjoint groups: invocable and non-invocable ones. • A legal rewriting is then one that invokes only invocable functions.
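Restricting invocations only changes which nodes a rewriting step may select. This sketch reuses the flat word view; the `INVOCABLE` set (here excluding TimeOut) and both helpers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Word = Tuple[str, ...]

def is_legal(invoked: Iterable[str], invocable: Set[str]) -> bool:
    """A rewriting is legal iff it invokes only invocable functions."""
    return all(f in invocable for f in invoked)

def legal_one_step(w: Word, outputs: Dict[str, Set[Word]], invocable: Set[str]) -> Set[Word]:
    """One rewriting step that never touches non-invocable calls."""
    result = set()
    for i, label in enumerate(w):
        if label in outputs and label in invocable:
            for out in outputs[label]:
                result.add(w[:i] + out + w[i + 1:])
    return result

OUTPUTS = {"Get_Temp": {("temp",)}, "TimeOut": {("exhibit",), ("performance",)}}
INVOCABLE = {"Get_Temp"}        # hypothetical: TimeOut may not be called (e.g. cost)
W = ("title", "date", "Get_Temp", "TimeOut")

print(legal_one_step(W, OUTPUTS, INVOCABLE))          # {('title', 'date', 'temp', 'TimeOut')}
print(is_legal(["Get_Temp", "TimeOut"], INVOCABLE))   # False
```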
Outline • Introduction • The Model and The Problem • Exchanging Intensional Data • Safe Rewriting • Possible Rewriting • Schema Rewriting • Implementation • Conclusion and Related Work
Exchanging Intensional Data • Rewriting process • Safe rewriting • Possible rewriting • Mixed approach • Restriction
Exchanging Intensional Data Rewriting process • Safe rewriting: • check if t safely rewrites into s • if so, find a rewriting sequence • rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure • preferred sequence: the shortest/cheapest one
Exchanging Intensional Data Rewriting process (cont’d) • Possible rewriting: • IF a safe rewriting does not exist • check whether t may at least rewrite into s • IF it is acceptable to do so (the sender accepts that the rewriting may fail), • try to find a successful rewriting sequence, if one exists • preferred rewriting sequence: the one with the least cost.
Exchanging Intensional Data Rewriting process (cont’d) • Mixed approach: one could • first invoke some function calls • then attempt from there to find safe rewritings.
Exchanging Intensional Data Rewriting process (cont’d) • k-depth rewriting sequence • For a rewriting sequence $t \xrightarrow{v_1} t_1 \cdots \xrightarrow{v_n} t_n$: • IF the node $v_j$ was returned by the invocation of the function $v_i$ (i.e., $v_j$ belongs to the result that replaced $v_i$) • THEN we say that function node $v_j$ depends on function node $v_i$. • IF the dependency graph among the nodes contains no paths of length greater than k • THEN we say that the rewriting sequence is of depth k.
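The depth of a rewriting sequence can be computed from the "returned by" relation alone. This sketch assumes that path length is counted in nodes, so that invoking only the calls already present in the document gives depth 1 (matching the 1-depth automaton used later); the node ids and the `returned_by` map are hypothetical.

```python
from typing import Dict, List, Optional

def rewriting_depth(sequence: List[str], returned_by: Dict[str, Optional[str]]) -> int:
    """Longest dependency chain (counted in nodes) among the invoked nodes."""
    memo: Dict[str, int] = {}

    def chain(v: str) -> int:
        if v not in memo:
            parent = returned_by.get(v)          # None: v was in the original document
            memo[v] = 1 if parent is None else 1 + chain(parent)
        return memo[v]

    return max((chain(v) for v in sequence), default=0)

def is_k_depth(sequence: List[str], returned_by: Dict[str, Optional[str]], k: int) -> bool:
    return rewriting_depth(sequence, returned_by) <= k

# Hypothetical run: v2 was returned by the call v1; v1 and v3 were in the document.
SEQ = ["v1", "v2", "v3"]
DEP = {"v1": None, "v2": "v1", "v3": None}
print(rewriting_depth(SEQ, DEP), is_k_depth(SEQ, DEP, 1))   # 2 False
```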
Exchanging Intensional Data Restriction • RESTRICTION: "Consider only k-depth left-to-right rewritings."
Outline • Introduction • The Model and The Problem • Exchanging Intensional Data • Safe Rewriting • Possible Rewriting • Schema Rewriting • Implementation • Conclusion and Related Work
Safe Rewriting (DEC 16, 2004) • Algorithm for k-depth left-to-right safe rewriting • Given: • a word w • the output types $R_{f_1}, \ldots, R_{f_n}$ of the available functions • a target regular language R • Purpose of the algorithm: • to test if w can be safely rewritten into a word in R • if so, to find a safe rewriting sequence
Safe Rewriting (cont’d) • Note: for illustration purposes we use the newspaper document • w = title.date.Get_Temp.TimeOut: the word formed by the labels of newspaper's children • R = title.date.temp.(TimeOut|exhibit*): we look for a safe rewriting of w into a word in R • The algorithm's main idea is to put things in regular-language terms: the intersection of the language generated by the k-depth invocations with the complement of the target language R should be empty.
Safe Rewriting (cont’d) • 1. Build finite-state automata for the following regular languages: (1) $A_w$ accepting w = title.date.Get_Temp.TimeOut [Figure: the chain q0 -title-> q1 -date-> q2 -Get_Temp-> q3 -TimeOut-> q4] (2) automata $A_{f_i}$, each accepting the regular language $R_{f_i}$ (the output types of the available functions).
Safe Rewriting (cont’d) (3) Build an automaton $\bar{A}$ accepting the complement of the regular language R. The automaton should be deterministic and complete. [Figure: the complement automaton $\bar{A}$ for schema τ'(newspaper) = title.date.temp.(TimeOut|exhibit*): p0 -title-> p1 -date-> p2 -temp-> p3, p3 -TimeOut-> p4, p3 -exhibit-> p5 -exhibit-> p5; every other transition (*) leads to the sink state p6.]
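Determinism and completeness are what make the complement easy: once every (state, symbol) pair has a transition, swapping accepting and non-accepting states accepts exactly the words the original rejects. A minimal sketch of that construction, assuming a dict-based DFA for τ'(newspaper) with the state names of the figure; the alphabet is an illustrative choice.

```python
from typing import Dict, Iterable, Set, Tuple

Delta = Dict[Tuple[str, str], str]
SINK = "p6"   # name chosen to match the sink state in the figure

def complete(states: Set[str], alphabet: Set[str], delta: Delta) -> Tuple[Set[str], Delta]:
    """Add a sink so that every (state, symbol) pair has a transition."""
    full = dict(delta)
    all_states = set(states) | {SINK}
    for q in all_states:
        for a in alphabet:
            full.setdefault((q, a), SINK)
    return all_states, full

def complement(states: Set[str], alphabet: Set[str], delta: Delta,
               accepting: Set[str]) -> Tuple[Delta, Set[str]]:
    """Complete the DFA, then swap accepting and non-accepting states."""
    all_states, full = complete(states, alphabet, delta)
    return full, all_states - set(accepting)

def accepts(delta: Delta, start: str, accepting: Set[str], word: Iterable[str]) -> bool:
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting

ALPHABET = {"title", "date", "temp", "TimeOut", "exhibit", "performance", "Get_Temp"}
STATES = {"p0", "p1", "p2", "p3", "p4", "p5"}
DELTA = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
         ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"}
ACCEPT_R = {"p3", "p4", "p5"}   # title.date.temp.(TimeOut|exhibit*)

comp_delta, comp_accept = complement(STATES, ALPHABET, DELTA, ACCEPT_R)
print(accepts(comp_delta, "p0", comp_accept, ["title", "date", "temp", "performance"]))  # True
```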
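Safe Rewriting (cont’d) • 2. Construct the automaton $A_w^k$, which represents all the words that can be generated by such a k-depth rewriting process (built by iteration). [Figure: the 1-depth automaton $A_w^1$ for w = title.date.Get_Temp.TimeOut: the chain q0 -title-> q1 -date-> q2 -Get_Temp-> q3 -TimeOut-> q4, where q2 and q3 are fork nodes. Following the Get_Temp or TimeOut edge represents the choice of not invoking the function; following an ε-edge into the function's output automaton (q5 -temp-> q6 for Get_Temp, q7 with exhibit and performance loops for TimeOut) and back represents the choice of invoking it.]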
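The language of $A_w^1$ can also be written as a regular expression: each function symbol f of w becomes the choice "keep f, or any word of $R_f$", which is what a fork node encodes. This sketch builds that expression with Python's `re` module instead of an ε-automaton; the `<label>` word encoding and the helper names are assumptions. For k > 1 one would substitute again inside outputs that themselves contain function names.

```python
import re
from typing import Dict, List

def label_regex(expr: str) -> str:
    # wrap each name so that '*' and '|' apply to whole labels; '.' is concatenation
    return re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:<{m.group(0)}>)", expr).replace(".", "")

def one_depth_language(w: List[str], out_types: Dict[str, str]) -> str:
    parts = []
    for label in w:
        if label in out_types:
            # fork node: keep the call, or invoke it (any word of its output type)
            parts.append(f"(?:{label_regex(label)}|{label_regex(out_types[label])})")
        else:
            parts.append(label_regex(label))
    return "".join(parts)

def encode(word: List[str]) -> str:
    return "".join(f"<{lab}>" for lab in word)

A1 = one_depth_language(["title", "date", "Get_Temp", "TimeOut"],
                        {"Get_Temp": "temp", "TimeOut": "(exhibit|performance)*"})
print(bool(re.fullmatch(A1, encode(["title", "date", "temp", "exhibit", "performance"]))))  # True
```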
Safe Rewriting (cont’d) • 3. Construct the cartesian product automaton $A_\times = A_w^k \times \bar{A}$. [Figure 6: the product automaton for k = 1, whose states pair a state of $A_w^1$ with a state of $\bar{A}$, e.g. (q0,p0), (q1,p1), (q2,p2), (q5,p2), (q6,p3), (q3,p3), (q4,p4), (q7,p3), (q4,p3).]
Safe Rewriting (cont’d) • 4. Mark the nodes of $A_\times$. [Figure 6, with the marked nodes highlighted.]
Safe Rewriting (cont’d) • Try to obtain a SAFE REWRITING: • "A safe rewriting exists IFF the initial state is not marked" • Follow a non-marked path (corresponding to w) from the initial state of $A_\times$ to a state [q, p] where q is an accepting state of $A_w^k$ • the non-marked fork options on the path determine the rewriting choices (i.e., which functions to call) • when a function is invoked, we continue the path with the newly rewritten word rather than the word w
Safe Rewriting (cont’d) • To minimize the rewriting cost, choose a path with minimal number/cost of function invocations. • EXIT % End of the algorithm
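The steps above can be sketched for k = 1. Rather than building $A_w^1$, the product with $\bar{A}$ and the marking explicitly, this sketch uses a swapped-in formulation: a memoised recursion over (position in w, state of a complete DFA for R). Rejecting states of the DFA for R are exactly the accepting states of $\bar{A}$, so checking R directly is equivalent. At a function symbol we may keep it, or invoke it, in which case every word of its output type must still lead to success. DFAs are hypothetical dict encodings; output types are assumed non-empty.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Set, Tuple

# A DFA is (start, transitions, accepting); missing transitions go to a
# rejecting sink, which makes the automaton complete.
DFA = Tuple[str, Dict[Tuple[str, str], str], FrozenSet[str]]
SINK = "#sink"

def step(dfa: DFA, q: str, a: str) -> str:
    return dfa[1].get((q, a), SINK)

def reach_after_output(target: DFA, out: DFA, q: str) -> Set[str]:
    """Target states reachable from q by reading ANY word of the output type."""
    start = (out[0], q)
    seen, todo, result = {start}, [start], set()
    while todo:
        o, t = todo.pop()
        if o in out[2]:
            result.add(t)
        for (src, a), dst in out[1].items():
            if src == o:
                nxt = (dst, step(target, t, a))
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return result

def safe_rewriting(word: List[str], target: DFA, outputs: Dict[str, DFA]):
    n = len(word)

    @lru_cache(maxsize=None)
    def safe(i: int, q: str) -> bool:
        if i == n:
            return q in target[2]
        a = word[i]
        if safe(i + 1, step(target, q, a)):        # keep word[i] as it is
            return True
        if a in outputs:                           # invoke: EVERY output must stay safe
            return all(safe(i + 1, r) for r in reach_after_output(target, outputs[a], q))
        return False

    def decide(i: int, q: str) -> str:
        """Choice at position i in state q (meaningful only when safe(i, q))."""
        return "keep" if safe(i + 1, step(target, q, word[i])) else "invoke"

    return safe(0, target[0]), decide

OUTPUTS = {
    "Get_Temp": ("g0", {("g0", "temp"): "g1"}, frozenset({"g1"})),        # temp
    "TimeOut": ("t0", {("t0", "exhibit"): "t0", ("t0", "performance"): "t0"},
                frozenset({"t0"})),                                        # (exhibit|performance)*
}
BASE = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
        ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"}
R1 = ("p0", {**BASE, ("p3", "TimeOut"): "p4"}, frozenset({"p3", "p4", "p5"}))  # title.date.temp.(TimeOut|exhibit*)
R2 = ("p0", BASE, frozenset({"p3", "p5"}))                                       # title.date.temp.exhibit*
W = ["title", "date", "Get_Temp", "TimeOut"]

ok1, decide1 = safe_rewriting(W, R1, OUTPUTS)
print(ok1, decide1(2, "p2"), decide1(3, "p3"))   # True invoke keep
ok2, _ = safe_rewriting(W, R2, OUTPUTS)
print(ok2)                                        # False
```

On the first target the sketch invokes Get_Temp and keeps TimeOut; on title.date.temp.exhibit* it reports no safe rewriting, because TimeOut may return a performance.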
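Safe Rewriting (cont’d) • The complement automaton $\bar{A}$ for schema τ'(newspaper) = title.date.temp.exhibit* [Figure 7: the complement automaton for this schema, reading title, date, temp and then exhibit transitions, with every other transition (*) leading to a sink state.]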
Safe Rewriting (cont’d) • The cartesian product automaton $A_\times = A_w^1 \times \bar{A}$ [Figure 8: the product automaton for this schema, with states such as (q0,p0), (q1,p1), (q2,p2), (q5,p2), (q6,p3), (q3,p3), (q7,p3), (q4,p3).]
Possible Rewriting • The algorithm: • 1. Build finite-state automata for the following languages: • 1.1. an automaton $A_w^k$ • 1.2. an automaton A accepting the regular language R
Possible Rewriting (cont’d) • An automaton A for schema τ''(newspaper) = title.date.temp.exhibit* [Figure 10: p0 -title-> p1 -date-> p2 -temp-> p3 -exhibit-> p4, with p4 looping on exhibit.]
Possible Rewriting (cont’d) • 2. Construct the cartesian product automaton $A_\times = A_w^k \times A$.
Possible Rewriting (cont’d) • The cartesian product automaton for possible rewriting [Figure 11: states (q0,p0), (q1,p1), (q2,p2), (q5,p2), (q6,p3), (q3,p3), (q4,p3), (q7,p3), (q7,p4) and (q4,p4).]
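Possible rewriting only asks whether some run succeeds, so the "for every output" test of safe rewriting becomes "for some output". As before, this is a swapped-in memoised recursion over (position, DFA state) for k = 1, equivalent in spirit to searching $A_\times$ for a path to an accepting pair; the DFA encodings are hypothetical and the target is the automaton of Figure 10.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Set, Tuple

DFA = Tuple[str, Dict[Tuple[str, str], str], FrozenSet[str]]   # (start, delta, accepting)
SINK = "#sink"   # implicit rejecting sink for missing transitions

def step(dfa: DFA, q: str, a: str) -> str:
    return dfa[1].get((q, a), SINK)

def reach_after_output(target: DFA, out: DFA, q: str) -> Set[str]:
    """Target states reachable from q by reading some word of the output type."""
    start = (out[0], q)
    seen, todo, result = {start}, [start], set()
    while todo:
        o, t = todo.pop()
        if o in out[2]:
            result.add(t)
        for (src, a), dst in out[1].items():
            if src == o:
                nxt = (dst, step(target, t, a))
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return result

def possible_rewriting(word: List[str], target: DFA, outputs: Dict[str, DFA]) -> bool:
    n = len(word)

    @lru_cache(maxsize=None)
    def possible(i: int, q: str) -> bool:
        if i == n:
            return q in target[2]
        a = word[i]
        if possible(i + 1, step(target, q, a)):    # keep word[i]
            return True
        if a in outputs:                           # invoke: SOME output must lead to success
            return any(possible(i + 1, r) for r in reach_after_output(target, outputs[a], q))
        return False

    return possible(0, target[0])

OUTPUTS = {
    "Get_Temp": ("g0", {("g0", "temp"): "g1"}, frozenset({"g1"})),
    "TimeOut": ("t0", {("t0", "exhibit"): "t0", ("t0", "performance"): "t0"}, frozenset({"t0"})),
}
R = ("p0", {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
            ("p3", "exhibit"): "p4", ("p4", "exhibit"): "p4"},
     frozenset({"p3", "p4"}))                                      # title.date.temp.exhibit*
print(possible_rewriting(["title", "date", "Get_Temp", "TimeOut"], R, OUTPUTS))   # True
print(possible_rewriting(["title", "date", "TimeOut"], R, OUTPUTS))               # False
```

The first word has a possible (but, as the safe-rewriting example shows, not a safe) rewriting; the second cannot produce the required temp at all.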
Implementation • In the implementation: • an intensional XML document is a well-formed XML document • to distinguish the intensional parts from the rest of the document, the namespace http://www.activexml.com/ns/int is used; it is defined for function (service) calls.
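Implementation (cont’d) [Figure: the newspaper document, with data nodes title ("The Sun") and date ("04/10/2002"), a Get_Temp function node with parameter city ("Paris"), and a TimeOut function node with parameter "Exhibits".]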
Implementation (cont’d) [Figure: the XML serialization of the newspaper document, highlighting the namespace defined for function (service) calls and the data nodes title and date.] • Three attributes of the function nodes provide the information necessary to call the SOAP service: 1. the URL of the server, 2. the method name, 3. the associated namespace.
Implementation (cont’d) [Figure: the function node TimeOut and its three attributes: 1. URL of the server, 2. method name, 3. associated namespace.]
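Because function nodes live in their own namespace, a standard XML parser can separate them from data nodes. In this sketch, only the namespace URI comes from the slides; the element name `int:fun`, the attribute names `endpointURL`, `methodName` and `namespaceURI`, and the example.org URLs are assumptions for illustration. It uses Python's `xml.etree.ElementTree`, which reports namespaced tags as `{uri}name`.

```python
import xml.etree.ElementTree as ET

INT_NS = "http://www.activexml.com/ns/int"   # namespace from the slides

# Hypothetical serialization: element and attribute names are illustrative assumptions.
DOC = f"""<newspaper xmlns:int="{INT_NS}">
  <title>The Sun</title>
  <date>04/10/2002</date>
  <int:fun endpointURL="http://example.org/soap" methodName="Get_Temp"
           namespaceURI="urn:example:weather"><city>Paris</city></int:fun>
  <int:fun endpointURL="http://example.org/soap" methodName="TimeOut"
           namespaceURI="urn:example:timeout">Exhibits</int:fun>
</newspaper>"""

def is_call(el: ET.Element) -> bool:
    return el.tag.startswith("{" + INT_NS + "}")   # element in the intensional namespace

def service_calls(xml_text: str):
    """(server URL, method name, namespace) for every function node."""
    root = ET.fromstring(xml_text)
    return [(el.get("endpointURL"), el.get("methodName"), el.get("namespaceURI"))
            for el in root.iter() if is_call(el)]

def data_children(xml_text: str):
    """Labels of the root's children that are ordinary data nodes."""
    return [el.tag for el in ET.fromstring(xml_text) if not is_call(el)]

print([c[1] for c in service_calls(DOC)])   # ['Get_Temp', 'TimeOut']
print(data_children(DOC))                   # ['title', 'date']
```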
Implementation (cont’d) • Newspaper element with structure title.date.(Forecast|temp).(TimeOut|exhibit*)
Implementation (cont’d) • The role of the Schema Enforcement Module: • 1. verify whether the call parameters conform to the WSDLint description of the service • 2. if not, try to rewrite them into the required structure • 3. if step 2 fails, report an error. NOTE: • Similarly, before an ActiveXML service returns its answer, the Schema Enforcement Module performs the same three steps on the returned data.
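The three steps of the Schema Enforcement Module form a simple verify, rewrite, or fail pipeline. In this minimal sketch, `enforce`, `SchemaError` and the example `conforms`/`rewrite` functions are hypothetical; in practice the conformance test would come from the WSDLint description and the rewrite from a safe (or possible) rewriting algorithm.

```python
from typing import Callable, Optional, Tuple

Word = Tuple[str, ...]

class SchemaError(Exception):
    """Raised when data neither conforms nor can be rewritten (step 3)."""

def enforce(data: Word,
            conforms: Callable[[Word], bool],
            rewrite: Callable[[Word], Optional[Word]]) -> Word:
    if conforms(data):                          # 1. already matches the description
        return data
    rewritten = rewrite(data)                   # 2. try to rewrite into the required structure
    if rewritten is not None and conforms(rewritten):
        return rewritten
    raise SchemaError(f"cannot rewrite {data} into the required structure")  # 3. report

# Hypothetical example: the service expects a single 'temp' parameter.
def conforms(w: Word) -> bool:
    return w == ("temp",)

def rewrite(w: Word) -> Optional[Word]:
    # stands in for invoking Get_Temp, whose output type is temp
    return tuple("temp" if lab == "Get_Temp" else lab for lab in w)

print(enforce(("Get_Temp",), conforms, rewrite))   # ('temp',)
```

The same `enforce` call would be applied to the returned data before the answer is sent.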