EXCHANGING INTENSIONAL XML DATA. Tova Milo INRIA & Tel-Aviv U. ; Serge Abiteboul INRIA ; Bernd Amann Cedric-CNAM ; Omar Benjelloun INRIA ; Fred Dang Ngoc INRIA. H. GÜL ÇALIKLI 2002700743 MURAT KORAŞ 2002700797. INTRODUCTION.
INTRODUCTION • The emergence of Web Services as a standard means of publishing and accessing data on the web has introduced a new class of XML documents called “intensional documents”. • Intensional documents: XML documents in which • some parts of the document are defined explicitly • some parts are defined by programs that generate data.
INTRODUCTION • Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results. • GOAL of this PAPER: • Study the new issues raised by the exchange of intensional XML documents between applications • Decide which data should be materialised before it is sent and which should not
INTRODUCTION CONSIDERATIONS for MATERIALISATION • Performance: • current system load • cost of communication • Capabilities: • inability to handle the intensional parts of a document • lack of access rights (to a particular service) • Security: • invoking service calls from an untrusted party may cause severe security violations • Functionalities: • confidentiality reasons • calling services may involve fees to be paid.
INTRODUCTION [Figure: Data exchange scenario for intensional documents. A Sender and a Receiver, each with its own capabilities, ACL, cost, ..., exchange documents containing function calls (f, g, q, r) that must conform to an agreed Data Exchange Schema.]
THE MODEL and THE PROBLEM • SIMPLE INTENSIONAL XML: • Model intensional XML documents as labelled trees consisting of two types of nodes: • Data nodes • Function nodes, which correspond to “service calls” • Assume the existence of some disjoint domains: • N: domain of NODES • L: domain of LABELS • F: domain of FUNCTION NAMES • D: domain of DATA VALUES
THE MODEL and THE PROBLEM • SIMPLE INTENSIONAL XML (cont’d) • DEFINITION 1: An intensional document d is an expression (T,λ) where: • T = (N,E,<) is an ordered tree • N: a finite set of nodes (a finite subset of the node domain) • E ⊆ N × N: the edges • <: associates with each node in N a total order on its children • λ : N → L ∪ F ∪ D is a labelling function for the nodes, where L, F and D are the label, function-name and data-value domains. NOTE: only leaf nodes may be assigned data values from D
THE MODEL and THE PROBLEM • SIMPLE INTENSIONAL XML (cont’d) • Nodes with a label in L ∪ D are called data nodes. • Nodes with a label in F are called function nodes. • The children subtrees of a function node are the function parameters. • When the function is called: • these subtrees are passed to it • the return value replaces the function node in the document.
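The data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `SERVICES` table and the local `get_temp` stand-in for a remote Web service are hypothetical names, and the tree shape is one reading of the newspaper example used in these slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    label: str                       # element label, function name, or data value
    kind: str = "data"               # "data" or "function"
    children: List["Node"] = field(default_factory=list)


# Hypothetical local stand-in for a remote service: it receives the
# parameter subtrees and returns a forest (a list of trees).
def get_temp(params: List[Node]) -> List[Node]:
    return [Node("temp", children=[Node("16 ºC")])]


SERVICES: Dict[str, Callable[[List[Node]], List[Node]]] = {"Get_Temp": get_temp}


def materialise(node: Node, invoke: Set[str]) -> List[Node]:
    """Return the forest that replaces `node` once the function nodes whose
    names are in `invoke` have been called; all other nodes are copied."""
    new_children = [t for c in node.children for t in materialise(c, invoke)]
    if node.kind == "function" and node.label in invoke:
        # The parameter subtrees are passed to the function and its
        # return value takes the place of the function node.
        return SERVICES[node.label](new_children)
    return [Node(node.label, node.kind, new_children)]


doc = Node("newspaper", children=[
    Node("title", children=[Node("The Sun")]),
    Node("date", children=[Node("04/10/2002")]),
    Node("Get_Temp", "function", [Node("city", children=[Node("Paris")])]),
    Node("TimeOut", "function", [Node("Exhibits")]),
])

(result,) = materialise(doc, invoke={"Get_Temp"})
print([c.label for c in result.children])
```

Only `Get_Temp` is invoked here, so the `TimeOut` call stays intensional in the result, which is exactly the kind of partial materialisation the sender has to decide on.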
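THE MODEL and THE PROBLEM [Figure: example intensional document drawn as a labelled tree. Node labels: newspaper, title, date, temp, city, Get_Temp, TimeOut. Data values: “The Sun”, “04/10/2002”, “16 ºC”, “Paris”, “Exhibits”.]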
THE MODEL and THE PROBLEM • SIMPLE SCHEMA: • DEFINITION 2: A document schema s is an expression (L,F,τ) where • L: a finite set of labels (a finite subset of the label domain) • F: a finite set of function names (a finite subset of the function-name domain) • τ: a function that maps • each label name l ∈ L to a regular expression over L ∪ F or to the keyword data • each function name f ∈ F to a pair of expressions: • τin(f), the input type of f • τout(f), the output type of f
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d) • Example of a Schema: • data: • τ (newspaper) =title.date.(Get_Temp|temp) .(TimeOut|exhibit) • τ (title) = data • τ (date) = data • τ (temp) = data • τ (city) = data • τ (exhibit) = data
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d) • Example of a Schema (cont’d): • functions: • τin(Get_Temp) = city • τout(Get_Temp) = temp • τin(TimeOut) = data • τout(TimeOut) = (exhibit|performance) • τin(Get_Date) = title • τout(Get_Date) = date
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 3: An intensional document t is an instance of a schema s=(L,F,τ) if: • for each data node n ∈ t with label l ∈ L, the labels of n’s children form a word in lang(τ(l)) • for each function node n ∈ t with label f ∈ F, the labels of n’s children form a word in lang(τin(f)) • Here lang(τ(l)) denotes the regular language defined by τ(l).
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 3 (cont’d): • f: a function name • t1,...,tn: a sequence of intensional trees • IF the labels of the roots of t1,...,tn form a word in lang(τin(f)) (respectively lang(τout(f))) AND all the trees are instances of s (i.e. every subtree conforms to the same schema as the whole document) • THEN t1,...,tn is an input instance (respectively an output instance) of f.
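As a rough illustration of Definitions 2 and 3, the sketch below checks instance-hood by turning each schema expression into a Python regular expression over the sequence of child labels. The names `Node`, `TAU`, `FUNCS`, `to_regex`, `conforms`, `is_instance` and `is_output_instance` are hypothetical, the schema is copied from the example slides, and the document shape is an assumed reading of the newspaper example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# τ for labels and (τin, τout) for function names, from the example schema.
TAU: Dict[str, str] = {
    "newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit)",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
}
FUNCS: Dict[str, Tuple[str, str]] = {
    "Get_Temp": ("city", "temp"),
    "TimeOut": ("data", "(exhibit|performance)"),
    "Get_Date": ("title", "date"),
}


def to_regex(expr: str) -> "re.Pattern[str]":
    """Translate a schema expression ('.', '|', '*') into a Python regex
    over words encoded as 'label;label;...'."""
    body = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)};)", expr)
    return re.compile(body.replace(".", "").replace(" ", ""))


def word(forest: List[Node]) -> str:
    return "".join(t.label + ";" for t in forest)


def conforms(forest: List[Node], expr: str) -> bool:
    if expr == "data":  # exactly one leaf carrying a data value
        return (len(forest) == 1 and not forest[0].children
                and forest[0].label not in TAU and forest[0].label not in FUNCS)
    return (to_regex(expr).fullmatch(word(forest)) is not None
            and all(is_instance(t) for t in forest))


def is_instance(t: Node) -> bool:
    if t.label in TAU:       # data node: children must match τ(l)
        return conforms(t.children, TAU[t.label])
    if t.label in FUNCS:     # function node: parameters must match τin(f)
        return conforms(t.children, FUNCS[t.label][0])
    return False


def is_output_instance(f: str, forest: List[Node]) -> bool:
    return conforms(forest, FUNCS[f][1])


def leaf(label: str, value: str) -> Node:
    return Node(label, [Node(value)])


doc = Node("newspaper", [
    leaf("title", "The Sun"), leaf("date", "04/10/2002"),
    Node("Get_Temp", [leaf("city", "Paris")]),
    Node("TimeOut", [Node("Exhibits")]),
])
print(is_instance(doc), is_output_instance("Get_Temp", [leaf("temp", "16 ºC")]))
```

The `label;` encoding is just a convenient way to reuse `re`; an automaton over labels, as used later in the deck, would serve the same purpose.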
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 4: (about Rewritings) • t, t’: trees • IF t’ is obtained from t by • selecting a function node v in t with some label f and • replacing it by an arbitrary output instance of f • THEN we say that t →ᵥ t’ (t rewrites into t’ by invoking v)
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 4: (about Rewritings) (cont’d) • IF t →ᵥ₁ t1 →ᵥ₂ t2 ... →ᵥₙ tn THEN • we say that t →* tn (t rewrites into tn) • the nodes v1,...,vn are called a rewriting sequence • the set of all trees t’ such that t →* t’ is denoted ext(t).
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 5: (about Rewritings) • Let: • t be a tree • s be a schema • 1. IF ext(t) contains some instance of s THEN t possibly rewrites into s. • 2. IF either t is already an instance of s, or there exists some node v in t such that all trees t’ with t →ᵥ t’ safely rewrite into s, THEN we say that t safely rewrites into s.
THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 6: • Let: • s, s’ be schemas • r be a distinguished label called the root label • IF all the instances t of s with root label r safely rewrite into instances of s’ THEN we say that s safely rewrites into s’.
THE MODEL and THE PROBLEM • A Richer Data Model : Function Patterns: • The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document. • But sometimes, one does not know in advance which functions will be used at a given place. • A common intensional schema for such documents should not require the use of a particular function, but rather allow for a set of functions, which have a proper signature.
THE MODEL and THE PROBLEM • To specify such a set of functions we use Function Patterns • Function Patterns: a function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one • EX: • τname(Forecast) = UDDIF ∧ InACL • τin(Forecast) = city • τout(Forecast) = temp
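A function pattern can be pictured as a name predicate plus a required signature. The sketch below is illustrative only: `Service`, `FunctionPattern`, the two sets standing in for a UDDI registry and an access control list, and the choice to combine the two predicates by conjunction are all assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    name: str
    in_type: str
    out_type: str


@dataclass(frozen=True)
class FunctionPattern:
    name_ok: Callable[[str], bool]   # boolean predicate over function names
    in_type: str                     # required input type
    out_type: str                    # required output type

    def matches(self, s: Service) -> bool:
        # A function belongs to the pattern if its name satisfies the
        # predicate and its signature equals the required one.
        return (self.name_ok(s.name)
                and (s.in_type, s.out_type) == (self.in_type, self.out_type))


# Hypothetical registry and ACL contents, for illustration only.
UDDI_REGISTRY = {"Get_Temp", "Weather_Now"}
ACL = {"Get_Temp", "TimeOut"}


def uddif(name: str) -> bool:
    return name in UDDI_REGISTRY


def in_acl(name: str) -> bool:
    return name in ACL


# Assumption: the Forecast pattern requires both predicates to hold.
forecast = FunctionPattern(lambda n: uddif(n) and in_acl(n), "city", "temp")

print(forecast.matches(Service("Get_Temp", "city", "temp")))
print(forecast.matches(Service("Weather_Now", "city", "temp")))
```

With these made-up sets, `Get_Temp` belongs to the pattern while `Weather_Now` is rejected because it is not in the ACL.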
THE MODEL and THE PROBLEM • A Richer Data Model (cont’d): • Restricted Service Invocations: • We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema. • This is not always the case, for reasons such as: • security, • cost, • access rights, etc. • THUS, function names/patterns in the schema can be partitioned into two disjoint groups: invocable and non-invocable ones. • A legal rewriting is then one that invokes only invocable functions.
EXCHANGING INTENSIONAL DATA • Rewriting Process: 1. Safe Rewriting: • check if t safely rewrites to s • if so, find a rewriting sequence • rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure • preferred rewriting sequence: the shortest/cheapest one
EXCHANGING INTENSIONAL DATA • Rewriting Process (cont’d): 2. Possible Rewriting: • IF a safe rewriting does not exist • check whether t may at least possibly rewrite to s • IF it is acceptable to do so (the sender accepts that the rewriting may fail), • try to find a successful rewriting sequence if one exists • preferred rewriting sequence: the one with the least cost.
EXCHANGING INTENSIONAL DATA • Rewriting Process (cont’d): 3. Mixed Approach: In the mixed approach, one could • first invoke some function calls • then attempt from there to find safe rewritings.
EXCHANGING INTENSIONAL DATA • Rewriting Process (cont’d): • DEFINITION 7: • For a rewriting sequence t →ᵥ₁ t1 ... →ᵥₙ tn: • IF vj ∈ ti but vj ∉ ti-1 (i.e. vj was returned by the invocation of vi) • THEN we say that function node vj depends on function node vi. • IF the dependency graph among the nodes contains no paths of length greater than k • THEN we say that the rewriting sequence is of depth k.
EXCHANGING INTENSIONAL DATA RESTRICTION: “Consider only k-depth left-to-right rewritings.”
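The depth of a rewriting sequence can be computed directly from the dependency relation. The following is a minimal sketch with hypothetical names (`rewriting_depth`, `returned_by`, and node identifiers such as "v1"); it assumes that path length is counted in nodes, so that a sequence invoking only functions present in the original document has depth 1.

```python
from typing import Dict, List, Optional


def rewriting_depth(sequence: List[str],
                    returned_by: Dict[str, Optional[str]]) -> int:
    """Length (in nodes) of the longest dependency path in a rewriting sequence.

    `sequence` lists the invoked function nodes in invocation order;
    `returned_by[v]` is the node whose invocation returned v, or it is
    absent/None when v was already present in the original document.
    """
    depth: Dict[str, int] = {}
    for v in sequence:
        parent = returned_by.get(v)
        # A node can only be invoked after the node that returned it,
        # so depth[parent] is already known at this point.
        depth[v] = 1 if parent is None else depth[parent] + 1
    return max(depth.values(), default=0)


# v3 was returned by the call at v1, so the sequence has depth 2.
print(rewriting_depth(["v1", "v2", "v3"], {"v3": "v1"}))
```

A sequence is then a k-depth rewriting exactly when this value is at most k under the counting convention assumed above.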
SAFE REWRITING • Algorithm for k-depth left-to-right safe rewriting • The algorithm is decomposed into three parts: • 1. Rewriting Function Parameters: • to invoke a function • its parameters should be of the right type • if not • they should be rewritten to fit that type • when rewriting the parameters, the functions in them can be invoked ONLY IF their own parameters are of, or can be rewritten into, the expected input type.
SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 1. Rewriting Function Parameters (cont’d) • For the deepest functions • verify that their parameters are instances of the corresponding input types • if not, rewriting fails. • Move upward (repeat until all functions in the tree (forest) are done) • for each function f, try to safely rewrite f’s own parameters into the required structure • if this is not possible, rewriting fails.
SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 2. Top-Down Traversal: • In each iteration of the recursive procedure “Rewriting Function Parameters”, the parameters of the outermost functions of the tree (forest) are handled. • In this part, safely rewrite the tree (forest) by invoking only these outermost functions. • THUS: • traverse the tree (forest) top down • at each step treat a single node and its children.
SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 2. Top-Down Traversal (cont’d) • Consider a node n with children whose labels form a word w • The subtree rooted at node n can be rewritten into the target schema s=(L,F,τ) IF and ONLY IF: • 1. w can be safely rewritten into a word in lang(τ(label(n))) AND • 2. each of n’s children can be safely rewritten into an instance of s.
SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 3. Rewriting the children of a node n: • Given: • w: a word (the sequence of labels of n’s children) • Goal: • rewrite w so that it becomes a word in the regular language R = τ(label(n)) • The process of rewriting involves: • choosing some functions in w and replacing them by a possible output • then choosing some other functions (which might have been returned by previous calls) and replacing them by their output • and so on, up to depth k
SAFE REWRITING • Safe Rewriting Algorithm: • Given: • a word w • the output types Rf1,...,Rfn of the available functions • a target regular language R • Purpose of the algorithm: • to test if w can be safely rewritten into a word in R • if so, to find a safe rewriting sequence
SAFE REWRITING • Safe Rewriting Algorithm: • Note: for illustration purposes we use the newspaper document • w = title.date.Get_Temp.TimeOut (the word formed by the labels of the children of newspaper) • R = title.date.temp.(TimeOut|exhibit*) (we look for a safe rewriting of the above word into a word in R) • The Algorithm: • 1) Build finite state automata for the following regular languages • 1.1) An automaton Aw accepting w as a single word.
SAFE REWRITING • The Algorithm (cont’d) • 1.2) Build automata Afi, i=1,...,n, each accepting the regular language Rfi • 1.3) Build an automaton Ā accepting the complement of the regular language R. The automaton should be deterministic and complete.
SAFE REWRITING • The complement automaton Ā for schema τ’(newspaper) = title.date.temp.(TimeOut|exhibit*) [Figure: deterministic, complete automaton with states p0 to p6 and edges labelled title, date, temp, TimeOut, exhibit and *.]
SAFE REWRITING • The Algorithm (cont’d) • 2) Let Aw^k := Aw • 3) For j=1,...,k • consider all the edges e=(v,u) in Aw^k that are labelled by a function name fi and were not treated in previous iterations • 3.1) extend Aw^k by attaching a copy of the automaton Afi, with its initial and final states linked to v and u respectively by ε-moves • 3.2) denote v as a fork node (for the edge e) • 3.3) the two fork options of v are e itself and the new outgoing ε edge
SAFE REWRITING • The 1-depth automaton Aw^1 for the word w = title.date.Get_Temp.TimeOut [Figure: states q0 to q7; edges labelled title, date, Get_Temp, TimeOut, temp, exhibit, performance and ε. The source states of the Get_Temp and TimeOut edges are fork nodes: following the function edge represents the choice of not invoking the function, following the ε edge represents the choice of invoking it.]
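The construction of the k-depth automaton can be sketched as follows. This is a simplified illustration, not the exact construction of the paper: the function name `build_awk`, the edge-list representation and the state numbering are hypothetical, and each output type is assumed to be a finite list of words rather than an arbitrary regular language, so each attached copy is a bundle of chains between a shared initial and final state.

```python
from typing import Dict, List, Tuple

EPS = "ε"
Edge = Tuple[int, str, int]  # (source state, label, target state)


def build_awk(w: List[str], outputs: Dict[str, List[List[str]]], k: int):
    """Return (edges, forks, final_state) of the k-depth automaton for w.

    `outputs[f]` lists the words of f's output type (assumed finite here).
    `forks` maps each expanded function edge to its alternative ε edge.
    """
    edges: List[Edge] = [(i, a, i + 1) for i, a in enumerate(w)]
    final, next_state = len(w), len(w) + 1
    forks: Dict[Edge, Edge] = {}
    frontier = [e for e in edges if e[1] in outputs]
    for _ in range(k):
        new_frontier: List[Edge] = []
        for (v, f, u) in frontier:
            start, end = next_state, next_state + 1
            next_state += 2
            eps_in = (v, EPS, start)
            edges += [eps_in, (end, EPS, u)]   # link the copy to v and u
            forks[(v, f, u)] = eps_in          # v becomes a fork node
            for out in outputs[f]:
                if not out:                    # empty output word
                    edges.append((start, EPS, end))
                src = start
                for i, a in enumerate(out):
                    if i == len(out) - 1:
                        dst = end
                    else:
                        dst, next_state = next_state, next_state + 1
                    edge = (src, a, dst)
                    edges.append(edge)
                    if a in outputs:           # expand in the next round
                        new_frontier.append(edge)
                    src = dst
        frontier = new_frontier
    return edges, forks, final


w = ["title", "date", "Get_Temp", "TimeOut"]
outputs = {"Get_Temp": [["temp"]], "TimeOut": [["exhibit"], ["performance"]]}
edges, forks, final = build_awk(w, outputs, k=1)
print(len(edges), sorted({e[0] for e in forks}))
```

The state numbers produced here are an artefact of this sketch and need not coincide with the q-states drawn in the figure.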
SAFE REWRITING • The Algorithm (cont’d) • 4) Construct the cartesian product automaton Ax = Aw^k × Ā • The fork nodes and fork options in Ax reflect those of Aw^k: • 4.1) the fork nodes are the nodes [q p] ∈ Ax where q was a fork node in Aw^k • 4.2) a fork option in Ax consists of all edges originating from one fork option edge in Aw^k.
SAFE REWRITING • The cartesian product automaton Ax = Aw × Ā [Figure 6: product states [q,p], starting at [q0,p0]; edges labelled title, date, Get_Temp, TimeOut, temp, exhibit, performance and ε.]
SAFE REWRITING • The Algorithm (cont’d): • 5) Mark nodes in Ax: • 5.1) mark the states [q p] where q is an accepting state of Aw^k and p is an accepting state of Ā • 5.2) iteratively mark: • non-fork (regular) nodes: IF one of their outgoing edges points to a marked node • fork nodes: IF both of their fork options (for some fi) contain an edge that points to a marked node.
SAFE REWRITING • The Algorithm (cont’d): • 6) Try to obtain a SAFE REWRITING. • “A safe rewriting exists IFF the initial state is not marked” • 6.1) Follow a non-marked path (corresponding to w) starting from the initial state of Ax to a state [q p] where q is an accepting state of Aw^k • 6.1.1) the non-marked fork options on the path determine the rewriting choices (i.e. which functions to call) • 6.1.2) when a function is invoked, we continue the path with the new rewritten word rather than the word w
SAFE REWRITING • The Algorithm (cont’d): • 6.2) To minimize the rewriting cost, choose a path with minimal number/cost of function invocations. • EXIT % End of the algorithm
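To see what the marking procedure decides, the check can also be phrased as a small game: at each function symbol the sender either keeps the call or invokes it, and an invocation must work out for every possible answer. The sketch below uses this direct recursion instead of the product-automaton marking, and it is a deliberate simplification: output types are assumed to be finite lists of words, the target R is matched with Python's `re` rather than a complement automaton, and the names `to_regex` and `safely_rewrites` are hypothetical. It runs in exponential time and is meant only for small examples.

```python
import re
from typing import Dict, List, Tuple


def to_regex(expr: str) -> "re.Pattern[str]":
    """Translate an expression over labels ('.', '|', '*') into a Python
    regex over words encoded as 'label;label;...'."""
    body = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)};)", expr)
    return re.compile(body.replace(".", "").replace(" ", ""))


def safely_rewrites(w: List[str], target: str,
                    outputs: Dict[str, List[List[str]]], k: int) -> bool:
    """True if w has a k-depth left-to-right safe rewriting into `target`."""
    pattern = to_regex(target)

    def win(pending: Tuple[Tuple[str, int], ...], done: str) -> bool:
        # `pending` holds (symbol, remaining depth) pairs still to process;
        # `done` is the encoded prefix of the rewritten word fixed so far.
        if not pending:
            return pattern.fullmatch(done) is not None
        (a, d), rest = pending[0], pending[1:]
        # Fork option 1: do not invoke, keep the symbol as it is.
        if win(rest, done + a + ";"):
            return True
        # Fork option 2: invoke; every possible answer must lead to success.
        if a in outputs and d > 0:
            return all(win(tuple((b, d - 1) for b in out) + rest, done)
                       for out in outputs[a])
        return False

    return win(tuple((a, k) for a in w), "")


w = ["title", "date", "Get_Temp", "TimeOut"]
outputs = {"Get_Temp": [["temp"]], "TimeOut": [["exhibit"], ["performance"]]}
print(safely_rewrites(w, "title.date.temp.(TimeOut|exhibit*)", outputs, 1))
print(safely_rewrites(w, "title.date.temp.exhibit*", outputs, 1))
```

For the first target it suffices to invoke Get_Temp and keep TimeOut; for the second target TimeOut would have to be invoked and might return a performance, so no safe rewriting exists under these assumptions.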
SAFE REWRITING • The complement automaton Ā for schema τ’(newspaper) = title.date.temp.exhibit* [Figure 7: deterministic, complete automaton with edges labelled title, date, temp, exhibit and *.]
SAFE REWRITING • The cartesian product automaton Ax = Aw^1 × Ā [Figure 8: product states [q,p], starting at [q0,p0]; edges labelled title, date, Get_Temp, TimeOut, temp, exhibit, performance and ε.]
SAFE REWRITING • Complexity of the Algorithm: • s0: schema of the sender • s: agreed data exchange schema • The complexity is determined by the size of the cartesian product automaton. • 1. Construct the cartesian product • 2. Traverse and mark the nodes of the resulting product • THUS the complexity is bounded by: • O(|Ax|^2) = O((|Aw^k| × |Ā|)^2)
SAFE REWRITING • Complexity of the Algorithm: (cont’d) • O(|Ax|^2) = O((|Aw^k| × |Ā|)^2) • Maximum size of Aw^k: O((|s0|+|w|)^k) • The complexity is polynomial in the size of the schemas s and s0 (with the exponent determined by k)
POSSIBLE REWRITING • The Algorithm • 1. Build finite state automata for the following languages: • 1.1. An automaton Aw^k • 1.2. An automaton A accepting the regular language R
POSSIBLE REWRITING • An automaton A for schema τ’’(newspaper) = title.date.temp.exhibit* [Figure 10: states p0 to p4; edges labelled title, date, temp and exhibit.]
POSSIBLE REWRITING • The Algorithm (cont’d) • 2. Construct the cartesian product automaton Ax = Aw^k × A [Figure 11: product states [q,p], starting at [q0,p0]; edges labelled title, date, temp, exhibit and ε.]
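Possible rewriting only asks whether some sequence of invocations and some answers lead into R. A minimal sketch of that search is given below; like any brute-force search it is exponential, it assumes finite output types given as lists of words, it matches R with Python's `re` instead of building the product automaton, and the names `to_regex` and `possible_rewriting` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def to_regex(expr: str) -> "re.Pattern[str]":
    """Translate an expression over labels ('.', '|', '*') into a Python
    regex over words encoded as 'label;label;...'."""
    body = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)};)", expr)
    return re.compile(body.replace(".", "").replace(" ", ""))


def possible_rewriting(w: List[str], target: str,
                       outputs: Dict[str, List[List[str]]],
                       k: int) -> Optional[List[str]]:
    """Return the functions invoked on one k-depth left-to-right rewriting
    that can reach `target` (for suitable answers), or None if none exists."""
    pattern = to_regex(target)

    def search(pending: Tuple[Tuple[str, int], ...], done: str,
               calls: List[str]) -> Optional[List[str]]:
        if not pending:
            return calls if pattern.fullmatch(done) else None
        (a, d), rest = pending[0], pending[1:]
        # Try keeping the symbol first: fewer invocations are preferred.
        found = search(rest, done + a + ";", calls)
        if found is not None:
            return found
        # Otherwise invoke; a single favourable answer is enough here.
        if a in outputs and d > 0:
            for out in outputs[a]:
                found = search(tuple((b, d - 1) for b in out) + rest,
                               done, calls + [a])
                if found is not None:
                    return found
        return None

    return search(tuple((a, k) for a in w), "", [])


w = ["title", "date", "Get_Temp", "TimeOut"]
outputs = {"Get_Temp": [["temp"]], "TimeOut": [["exhibit"], ["performance"]]}
print(possible_rewriting(w, "title.date.temp.exhibit*", outputs, 1))
```

The returned sequence succeeds only if TimeOut happens to answer with an exhibit, which is the risk the sender accepts when settling for a possible rewriting.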