Active XML: A data-centric perspective on Web services • Omar Benjelloun, INRIA Futurs • With: Serge Abiteboul, Tova Milo, and many others. • April 30th, 2004
Active XML - Outline • Introduction • Active XML • Active XML documents • Active XML services • Novel issues • Exchanging Active XML data • Querying Active XML data • Active XML Peers • The peer as a client • The peer as a server • Theoretical foundations • Applications • Conclusion
Distributed data management in P2P • Information is everywhere: data warehouses, databases, Web sites, PCs, PDAs, cell phones, home appliances, cars… • [Figure: sources spread over the Internet, exposing XML data through Web services]
The golden triangle of distributed data management • XML • a standard for data representation & exchange • Extensible Markup Language • Labeled ordered trees • Types: XML Schema / tree automata • Query languages • XPath, XQuery • Web services • standards for distributed computing • SOAP, WSDL, UDDI • Activation of methods on remote servers • Many burgeoning standard proposals (choreography, QoS, user interface, etc.)
What is Active XML (AXML)? • AXML is a declarative language • for distributed information management • and • an infrastructure to support this language, • in a peer-to-peer framework.
Active XML documents • XML documents with embedded calls to Web services • Intensional • Some of the data is given explicitly • Some is given intensionally (i.e. the means to acquire data when needed are given) • Dynamic • If the external sources change, the same document will provide different information • Reaction to world changes
Not a new idea in databases, nor on the Web • Mixing calls and data is an old idea • Procedural attributes in relational systems • Basis of object-oriented databases • In Web programming • Sun’s JSP, PHP+MySQL • Calls to Web services inside documents • Macromedia FLEX, Apache Jelly, Microsoft XAML • What is new is the exploitation of the idea…
Web services in brief • A number of standards • XML • SOAP: Exchange of messages between applications • WSDL: Description of service interfaces (e.g. input/output types) • UDDI: Advertisement and discovery of services • … other proposed standards (choreography, security, etc.) • For us: means to provide, invoke and describe remote functions with XML input/output. • They make AXML documents universally understandable.
A sample AXML document
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the same document drawn as a tree, with GetTemp and GetEvents as call nodes]
• AXML documents may contain calls: • to any existing Web services (e-bay.net, google.com…) • to any AXML Web services (to be defined)
Materialization
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
SOAP call to Yahoo.GetTemp (Y!) returns: <temp>16°C</temp>
[Figure: the document tree after the call, with the new temp node "16°C"]
• We will see later that: • Replacing the call by its result is not the only option • Calls are not necessarily RPC-style synchronous invocations
AXML Web services • Parameters: AXML data • Result: AXML data • Distribute computations: by sending as parameters data containing service calls, one can delegate some work to other peers. • Partial computations: by returning data containing service calls, one can give the receiver control of these calls. • Great flexibility
Calling an AXML service
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <temp>16°C</temp>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
SOAP call (still…) to TimeOut.GetEvents (T!) returns:
<exhibits>
  <call svc="Yahoo.GetExhibits">
    <city>Paris</city>
  </call>
</exhibits>
[Figure: the document tree, where the returned exhibits node itself contains a GetExhibits call]
• Materialization is a recursive process • Termination is an issue
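The recursive nature of materialization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SERVICES` table replaces real SOAP invocations with local stubs, the result of `Yahoo.GetExhibits` is made up, and the `depth` parameter is a hypothetical guard against non-termination, not something the slides prescribe. The sketch also assumes that a result's root element is not itself a call.

```python
import xml.etree.ElementTree as ET

# Hypothetical local stand-ins for the remote SOAP services of the slides.
# Each takes the <call> element and returns an XML fragment as text.
SERVICES = {
    "Yahoo.GetTemp": lambda call: "<temp>16°C</temp>",
    "TimeOut.GetEvents": lambda call: (
        '<exhibits><call svc="Yahoo.GetExhibits">'
        "<city>Paris</city></call></exhibits>"
    ),
    "Yahoo.GetExhibits": lambda call: "<exhibit>Le Louvre</exhibit>",  # made-up result
}

NEWSPAPER = (
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
)


def materialize(node, depth=3):
    """Replace every <call> below node by its result, recursing into results.

    depth bounds how many levels of calls-returned-by-calls are followed,
    because termination is not guaranteed in general.
    """
    for position, child in enumerate(list(node)):
        if child.tag != "call":
            materialize(child, depth)
        elif depth > 0:
            result = ET.fromstring(SERVICES[child.get("svc")](child))
            result.tail = child.tail
            materialize(result, depth - 1)  # results may contain further calls
            node[position] = result         # one-for-one replacement in place
    return node
```

Calling `materialize(ET.fromstring(NEWSPAPER))` follows the nested `GetExhibits` call as well, while `depth=1` invokes only the top-level calls and leaves the nested one intensional.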
Organization • Novel issues raised by the AXML language • Exchange of AXML data • Querying AXML data • Supporting infrastructure • AXML peers: • Management of persistent AXML data • Declarative AXML services • Applications
Active XML - Outline • Introduction • Active XML • Active XML documents • Active XML services • Novel issues • Exchanging Active XML data (SIGMOD 2003) • Querying Active XML data • Active XML Peers • The peer as a client • The peer as a server • Theoretical foundations • Applications • Conclusion
To call or not to call? • Materialization can be performed • by the sender, before sending a document… • or by the receiver, after receiving it. • [Figure: two copies of the newspaper document tree, with the Yahoo (Y!) GetTemp call and its result temp = "16°C"]
Why control the materialization of calls? • For added functionality, e.g. • Intensional data lets the receiver get up-to-date information. • For security reasons or capabilities, e.g. • I don’t trust this Web service/domain, • I don’t have the right credentials to invoke it, • It costs money, • Maybe the receiver doesn’t know Active XML! • For performance reasons, e.g. • A proxy can invoke all the services on behalf of a PDA. • … and many more reasons you can think of!
How to control it? Using types • [Figure: a sender and a receiver, each with its own capabilities, ACL, cost, ..., agreeing on a data exchange schema] • We extend XML Schema with intensional types: XMLSchema_int • Static analysis algorithms use signatures of services: WSDL_int
The extended schema language • To simplify, we use here a DTD-like syntax • Data: • newspaper = title.date.(GetTemp|temp).(GetEvents|exhibit*) • title = data • date = data • temp = data • city = data • exhibit = title.(GetDate|date) • Functions: • GetTemp(city) -> temp • GetEvents(data) -> (exhibit|performance)* • GetDate(title) -> date • Rewriting: replace call(s) by an arbitrary output of the service.
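To make the DTD-like syntax concrete, here is a minimal sketch that checks whether the children of a node (viewed as a word of labels) match its intensional type. It is not the authors' implementation: translating each type into a Python regular expression is a shortcut chosen for brevity, and `compile_type` and `conforms` are hypothetical helper names.

```python
import re

# Types copied from the slide; leaf types ("data") are left out.
SCHEMA = {
    "newspaper": "title.date.(GetTemp|temp).(GetEvents|exhibit*)",
    "exhibit": "title.(GetDate|date)",
}


def compile_type(expr):
    """Turn a DTD-like type into a regex over space-terminated labels."""
    # Each label becomes a group ending in a space, so "temp" can never
    # match inside another label; "." (concatenation) is then dropped.
    pattern = re.sub(r"[A-Za-z]+", lambda m: f"(?:{m.group(0)} )", expr)
    return re.compile(pattern.replace(".", ""))


def conforms(label, children):
    """True if the sequence of child labels matches the type of label."""
    word = "".join(child + " " for child in children)
    return compile_type(SCHEMA[label]).fullmatch(word) is not None
```

Under this encoding, `conforms("newspaper", ["title", "date", "GetTemp", "GetEvents"])` holds for the fully intensional document, and a document with `temp` and two `exhibit` children conforms as well, whereas one containing a `performance` child does not.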
Rewritings • The goal: given • an intensional document d • a schema s, can we rewrite d so that it matches s? • Safe rewriting: one that for sure leads to s (we know without making any call). • Possible rewriting: one that may lead to s (depending on the answers of services).
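The difference between safe and possible rewritings can be illustrated with a brute-force sketch. It rests on strong assumptions that the slides do not make: each service has a small, finite set of possible outputs (`OUTPUTS` is hypothetical, whereas the real signature of GetEvents allows infinitely many), results are not rewritten further (depth 1), and calls are decided left to right. The choice to invoke a call is existential; in safe mode every possible output must then succeed, in possible mode one is enough.

```python
import re

# Hypothetical finite output sets standing in for service signatures.
OUTPUTS = {
    "GetTemp": [["temp"]],
    "GetEvents": [[], ["exhibit"], ["performance"], ["exhibit", "performance"]],
}


def matches(word, target):
    """Match a word of labels against a regex over space-terminated labels."""
    return re.fullmatch(target, "".join(label + " " for label in word)) is not None


def rewritable(word, target, safe, prefix=()):
    """Can word be rewritten left to right into a word matching target?"""
    if not word:
        return matches(prefix, target)
    head, rest = word[0], word[1:]
    # Choice 1: keep the symbol (data label or un-invoked call) as it is.
    if rewritable(rest, target, safe, prefix + (head,)):
        return True
    # Choice 2: invoke the call and continue with each possible output.
    if head in OUTPUTS:
        outcomes = (
            rewritable(rest, target, safe, prefix + tuple(output))
            for output in OUTPUTS[head]
        )
        return all(outcomes) if safe else any(outcomes)
    return False


WORD = ("title", "date", "GetTemp", "GetEvents")
TEMP_REQUIRED = r"title date temp (GetEvents |(exhibit )*)"
FULLY_EXTENSIONAL = r"title date temp (exhibit )*"
```

With these definitions, `WORD` has a safe rewriting into `TEMP_REQUIRED` (invoke GetTemp, keep GetEvents). Into `FULLY_EXTENSIONAL` it has only a possible rewriting, because GetEvents may return a `performance`.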
Difficulties • Infinite search space • Vertical • Horizontal • Main problem • The result of a Web service call is unknown, • We just know a signature (input/output types) • We want a very efficient solution. • Foundations of the problem • String & tree automata, • with existential and universal transitions.
Results • The general problem is undecidable [MSS03] • Restrictions on the considered rewritings • Left-to-right: No “going back and forth” • K-depth: bound on the nesting of function calls (Search space still infinite but finitely representable) • Under these restrictions • We have algorithms to find safe/possible rewritings. • They are PTIME (for deterministic schemas). • We can also do it between schemas. • Implementation • demo at VLDB 2003 (customizable news syndication)
Safe rewriting algorithm • Sketch • Deal with function parameters first, • Top-down traversal of the tree, • For each data node: • rewrite its children (viewed as a word), • to match the target type (a regular expression) • using regular automata techniques, and smart marking.
Safe rewriting algorithm (2) • Build an FSA that accepts all k-depth rewritings of the initial word. • Build an FSA that recognizes the complement of the target type. • [Figure: the two automata for the word title date GetTemp GetEvents, with states q0 to q7 and p0 to p6]
Safe rewriting algorithm (3) • Compute the intersection of these languages: • A smart marking determines whether a safe rewriting exists. • Then run the word on the marked automaton to find an actual rewriting. • Optimization: lazy construction of the automata • [Figure: the product automaton over state pairs, from (q0,p0) to (q7,p6)]
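The intersection step can be sketched as a product construction. The code below covers only one ingredient of the algorithm and is a simplification: for a single, fixed choice of invoked calls it tests whether every word of the rewriting automaton is accepted by the target type, by looking for a reachable state pair that is accepting on the left and rejecting on the right (that is, a word in the intersection with the complement). The marking that selects which calls to invoke is omitted, and all automata and names here are hand-built, hypothetical examples.

```python
from collections import deque

SINK = "sink"  # implicit rejecting state that completes the target DFA


def contained(nfa, nfa_start, nfa_final, dfa, dfa_start, dfa_final):
    """True if every word accepted by the NFA is accepted by the DFA."""
    seen = {(nfa_start, dfa_start)}
    queue = deque(seen)
    while queue:
        q, p = queue.popleft()
        if q in nfa_final and p not in dfa_final:
            return False  # a rewriting lands in the complement of the target
        for (state, symbol), successors in nfa.items():
            if state != q:
                continue
            p_next = dfa.get((p, symbol), SINK)  # missing transition: reject
            for q_next in successors:
                pair = (q_next, p_next)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return True


# Target type: title.date.(GetTemp|temp).(GetEvents|exhibit*)
TARGET = {
    (0, "title"): 1, (1, "date"): 2, (2, "GetTemp"): 3, (2, "temp"): 3,
    (3, "GetEvents"): 4, (3, "exhibit"): 5, (5, "exhibit"): 5,
}
TARGET_FINAL = {3, 4, 5}

# Rewritings when GetTemp is invoked and GetEvents is kept as a call.
KEEP_EVENTS = {(0, "title"): {1}, (1, "date"): {2}, (2, "temp"): {3},
               (3, "GetEvents"): {4}}
# Rewritings when both calls are invoked: GetEvents -> (exhibit|performance)*
INVOKE_EVENTS = {(0, "title"): {1}, (1, "date"): {2}, (2, "temp"): {3},
                 (3, "exhibit"): {3}, (3, "performance"): {3}}
```

Keeping GetEvents is contained in the target, so it is safe. Invoking it is not, because a returned `performance` drives the target automaton into its rejecting sink.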
Active XML - Outline • Introduction • Active XML • Active XML documents • Active XML services • Novel issues • Exchanging Active XML data • Querying Active XML data (SIGMOD 2004) • Active XML Peers • The peer as a client • The peer as a server • Theoretical foundations • Applications • Conclusion
Querying AXML Data • Given a (tree pattern) query: • /newspaper[temp > 18°C]/exhibits//exhibit[location="Le Louvre"] • Materialize the document? • Call only the services that may contribute data to the query answer. • The problem: lazy evaluation of service calls • To call or not to call, this time when evaluating a query • [Figure: the newspaper document tree with GetTemp, GetEvents, GetExhibits and getDate calls, and temp = "19°C"]
Lazy evaluation • Difficulties: • Calls can be found everywhere in the document • May appear dynamically (as a result of previous calls) • May become (ir)relevant due to previous invocations • Need to take signatures of calls into consideration • A possible approach: modify the query processor • Top-down evaluation • Trigger the calls found on the way • Not so great: • Computation is blocked • Optimization opportunities are lost
Our solution • Given a query to evaluate: • Derive a set of “node-focused” queries (NFQ) that find the relevant calls when evaluated on the document. • Need to be reevaluated, as the document evolves! • [Figure: the tree-pattern query (newspaper, temp > 18°C, exhibits, exhibit, location = “Le Louvre”) and node-focused queries derived from it, etc.]
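The idea of finding relevant calls can be illustrated with a much-simplified sketch. It handles only linear queries with child steps (no predicates, no descendant axis), so it is not the NFQ construction itself, and the `SIGNATURES` table of output labels is an assumption made for the example. A call is reported as relevant when it sits under a node matched by a query prefix and its signature says it may return the label the next step looks for.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
)

# Hypothetical signatures: the root labels each service may return.
SIGNATURES = {"Yahoo.GetTemp": {"temp"}, "TimeOut.GetEvents": {"exhibits"}}


def relevant_calls(root, steps, signatures):
    """Names of the calls that may contribute data to the path steps."""
    frontier = [root] if root.tag == steps[0] else []
    found = []
    for next_label in steps[1:]:
        next_frontier = []
        for node in frontier:
            for child in node:
                if child.tag == "call":
                    # Relevant only if its output may carry the wanted label.
                    if next_label in signatures.get(child.get("svc"), set()):
                        found.append(child.get("svc"))
                elif child.tag == next_label:
                    next_frontier.append(child)
        frontier = next_frontier
    return found
```

For the path `newspaper/temp` only `Yahoo.GetTemp` is reported, so GetEvents need not be invoked. As the slide notes, the search has to be repeated once results are merged in, since they may bring new calls.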
Optimizations • Service calls sequencing • Analysis of the relationship between calls (through the NFQ’s) • Layering, and parallelization inside each layer. • Refinement via type analysis • Matching output types of services with data expected of queries • “Pushing” queries to capable services • Acceleration: • Via relaxation: • NFQ approximation • Superset of the relevant calls • Via a special access structure, similar to a DataGuide: • Restricted to paths that lead to service calls • Indexes the calls • Experimental assessment • 10x speed-up when combining optimizations
Distributed data management in P2P • [Figure: the earlier picture of XML sources and Web services on the Web, now also showing AXML documents and AXML services]
What do we need from an AXML system? • Persistent, manageable, dynamic AXML data. • Easy ways to define services • Control of the exchanged data (parameters & results of service calls) • Peer-to-peer architecture, where each AXML peer: • Repository: manages persistent AXML data • Client: uses (AXML) Web services • Server: provides AXML services
Global architecture • [Figure: AXML peers S1, S2 and S3 communicating over SOAP; components shown: query engine, AXML engine, AXML store (read / update), service descriptions, SOAP wrapper, SOAP client, and a plain SOAP service exchanging XML]
Implementation • SUN’s Java SDK 1.4 (includes XML parser, XPath processor, XSLT engine) • Apache Tomcat 4.1 servlet engine • Apache Axis SOAP toolkit 1.1 • X-OQL query processor, persistent DOM repository • JSP-based Web user interface, using JSTL 1.0 standard tag library • Also, a lightweight implementation for PDA/phone (J2ME, CLDC profile), used for [ABB03demo].
Active XML - Outline • Introduction • Active XML • Active XML documents • Active XML services • New issues • Exchanging Active XML data • Querying Active XML data • Active XML Peers • The peer as a client • The peer as a server • Theoretical foundations • Applications • P2P auctions • News syndication • Other applications • Conclusion
Managing persistent AXML data • “Our newspaper should have its temperature information refreshed daily. New exhibits should be fetched every week and archived for 6 months” • Service call results enrich the document (calls can be kept for possible future reuse) • Main issues: • When to activate a service call? • What to do with its result?
When to activate a service call? • Explicit pull mode • Daily, weekly, or after some event: e.g., when another call occurs • This aspect of the problem is related to active databases • Implicit pull mode • Detect which intensional information (the service calls) may contribute to the answer of a query (lazy evaluation) • This aspect of the problem is related to deductive databases • Push mode • Based on a query subscription; the service provider pushes information to the client (e.g., for synchronization purposes) • This is related to stream and subscription queries
Managing service call results • How long does the returned data remain valid? • Just long enough to answer a query: mediation • 1 day, 1 week, … or unbounded: caching / warehousing • Various portions of the document may follow different policies: hybrid • For repeated service call invocations: merge policy • append, replace, fusion (using XML Schema-like keys) • Specific merge policies can be provided as Web services
Example: AXML document with control attributes
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents" mode="every Monday morning" valid="6 months" merge="append">
    exhibits
  </call>
</newspaper>
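How the `merge` attribute might be honored when a call is invoked repeatedly can be sketched as follows. This is an assumption-laden illustration, not the AXML peer's code: results are stored as siblings after the call (which is kept for reuse), and a hypothetical `origin` attribute records which call produced each result. The `valid` and `mode` attributes, as well as the fusion policy, are not handled.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">'
    "<city>Paris</city></call>"
    '<call svc="TimeOut.GetEvents" mode="every Monday morning" '
    'valid="6 months" merge="append">exhibits</call></newspaper>'
)


def store_result(parent, call, results):
    """Insert results after call in parent, following its merge policy."""
    svc = call.get("svc")
    if call.get("merge") == "replace":
        # Drop what earlier invocations of this call left behind.
        for old in [c for c in parent if c.get("origin") == svc]:
            parent.remove(old)
    # New results go right after the call and its surviving older results.
    position = 1 + max(
        i for i, c in enumerate(parent) if c is call or c.get("origin") == svc
    )
    for offset, result in enumerate(results):
        result.set("origin", svc)  # hypothetical bookkeeping attribute
        parent.insert(position + offset, result)
```

Storing two successive temperatures for the `replace` call leaves only the latest one, while two successive results for the `append` call accumulate in arrival order.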
Declarative AXML services • Services can be defined by queries or updates over the AXML documents of the repository (XQuery, XPath, XUpdate) • Which (lazy) service calls may contribute to the answer?
let service GetExhibitsByLocation($loc) be
  for $a in document("newspaper.xml")/newspaper/exhibits,
      $b in $a//exhibit
  where $b/@name = $loc
  return <exhibits> {$b} </exhibits>
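For readers less familiar with XQuery, the service above can be paraphrased in Python over an in-memory document. This is a sketch of the query's meaning only, assuming the document is fully materialized; it says nothing about how the AXML peer evaluates the query or triggers lazy calls, and the function name is simply a transliteration.

```python
import copy
import xml.etree.ElementTree as ET


def get_exhibits_by_location(newspaper, loc):
    """One <exhibits> wrapper per exhibit whose name attribute equals loc."""
    answers = []
    for exhibits in newspaper.findall("exhibits"):   # /newspaper/exhibits
        for exhibit in exhibits.iter("exhibit"):     # $a//exhibit
            if exhibit.get("name") == loc:           # where $b/@name = $loc
                wrapper = ET.Element("exhibits")
                wrapper.append(copy.deepcopy(exhibit))
                answers.append(wrapper)
    return answers
```

The function takes the root `newspaper` element and returns a list of new `exhibits` elements, leaving the stored document untouched.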
Other means to define services • Other programming languages: • XSLT transformations (through Apache Xalan) • Java classes (through Axis) • Composition of existing services: • BPEL4WS (through IBM’s BPEL4J implementation)
Active XML - Outline • Introduction • Active XML • Active XML documents • Active XML services • New issues • Exchanging Active XML data • Querying Active XML data • Active XML Peers • The peer as a client • The peer as a server • Theoretical foundations (PODS 2004) • Applications • Conclusion
Theoretical foundations: Positive AXML • Restricted framework • Data model • set-based (unordered) AXML trees • Call results are accumulated in documents • Services • Monotone • Positive: defined by conjunctive fragment of XQuery • Results • Well-defined (possibly infinite) fix-point semantics • Termination, lazy evaluation… • Connections to: • Regular (infinite) trees, Query-Sub-Query [AM04],…
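The fix-point reading of this framework can be illustrated with a toy sketch. It departs from the slides in that documents are modeled as flat sets of tuples instead of unordered trees, and `max_rounds` is a hypothetical cut-off added because the fix-point may be infinite. Services are monotone functions whose results are accumulated until nothing new appears.

```python
def fixpoint(facts, services, max_rounds=100):
    """Accumulate the results of monotone services until nothing changes."""
    facts = set(facts)
    for _ in range(max_rounds):
        new = set().union(*(service(facts) for service in services)) - facts
        if not new:
            return facts  # fix-point reached
        facts |= new      # results are accumulated, never retracted
    raise RuntimeError("no fix-point reached within max_rounds")


def compose(facts):
    """A monotone, conjunctive service: join pairs (a, b) and (b, d)."""
    return {(a, d) for (a, b) in facts for (c, d) in facts if b == c}
```

Starting from the pairs `(1, 2)`, `(2, 3)`, `(3, 4)`, repeated invocation of `compose` adds `(1, 3)`, `(2, 4)` and `(1, 4)` and then stabilizes, which mirrors how call results enrich a document until no call brings anything new.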
Demos • Peer-to-peer auctions (VLDB 2002 demo) • Discovery of new peers/auctions through intensional answers • RSS News syndication (VLDB 2003 demo 1) • Customization of services through schemas + news subscriptions • Distributed workspaces (VLDB 2003 demo 2) • Web warehousing (ECDL 2003 demo) • A powerful framework for the fast development of distributed, data-centric applications.
Other applications • E.dot, a dynamic warehouse on food risk management • Use AXML as the platform for the warehouse definition, construction and maintenance • Network configuration • Use AXML exchange of information to configure hardware/software components • Software distribution • Use AXML to customize distributions and keep your view of the software fresh • Decentralized user profile/patient data management • Use AXML to coordinate the integration of data, and privacy enforcement services in a uniform way