presented by: Irene Genitsaridh Univ. of Crete hy561 April 28, 2009. AXML. The Active XML project: an overview Serge Abiteboul · Omar Benjelloun · Tova Milo Lazy Query Evaluation for Active XML
Lazy Query Evaluation for Active XML: Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda
The problem • The problem addressed is Web data management. • Web characteristics: • High heterogeneity of data sources. • Autonomy of data sources. • The scale of the Web. • Result: • The Web revolution is setting new standards.
The solution • A language based on XML, Web services and XQuery for complex data management tasks. • XML is a suitable model for Web data exchange. • XQuery is a query language for XML promoted by the W3C (the "SQL of the Web"). • Web services are network-accessible programs that take XML parameters and return XML results.
Active XML Documents • Calls to Web services are embedded inside XML documents.
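A minimal Python sketch of an AXML document, assuming a simplified, hypothetical syntax in which an `<sc>` element marks an embedded service call and its `service` attribute names the Web service. The element, attribute and service names are illustrative assumptions, not the system's actual vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical AXML fragment: each <sc> element is an embedded service call
# (simplified syntax; the real system defines its own call vocabulary).
AXML_DOC = """
<goingOut>
  <movies>
    <sc service="getShows@movies.example"/>
  </movies>
  <restaurants>
    <restaurant><name>Le Bistro</name></restaurant>
    <sc service="getRestos@guide.example"/>
  </restaurants>
</goingOut>
"""


def service_calls(root):
    """Return (service name, call element) pairs in document order."""
    return [(sc.get("service"), sc) for sc in root.iter("sc")]


root = ET.fromstring(AXML_DOC)
for name, _ in service_calls(root):
    print(name)
```

The document stays ordinary XML; only the interpretation of `<sc>` elements as "data to be obtained by calling a service" makes it active.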
Materialization. The service is invoked using the SOAP protocol, and the result of the invocation is used to enrich the document. • Tree Representation. Because embedded calls may return different results when they are invoked at different times, the same document can have different semantics at different times.
Active XML Services • AXML services are Web services that accept AXML documents as input parameters and return AXML documents as results. • Materialization therefore becomes a recursive process, since calling an AXML service may return data that contains new service calls. (Figure: the document after invoking getEvents@TimeOut.com.)
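Recursive materialization can be sketched as a fixpoint loop. This assumes the hypothetical `<sc>` call syntax, stubs the remote services with a dictionary of canned XML replies (names modeled on getEvents@TimeOut.com but invented), replaces each call by its result, and adds a round limit because a service may keep returning new calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for remote AXML services: each returns XML that may
# itself contain further <sc> calls, which makes materialization recursive.
SERVICES = {
    "getEvents@TimeOut.example":
        '<events><event>Jazz night</event>'
        '<sc service="getShows@TimeOut.example"/></events>',
    "getShows@TimeOut.example": "<show><title>The Hours</title></show>",
}


def materialize(root, services, max_rounds=10):
    """Replace every <sc> by its result, round after round, until no call is
    left (a fixpoint). Returns the number of rounds that invoked calls."""
    rounds = 0
    while True:
        # Snapshot (parent, index, call) triples before mutating the tree.
        pending = [(parent, i, child)
                   for parent in root.iter()
                   for i, child in enumerate(parent) if child.tag == "sc"]
        if not pending:
            return rounds
        if rounds == max_rounds:
            raise RuntimeError("no fixpoint reached within max_rounds")
        for parent, i, sc in pending:
            # One-for-one replacement keeps the snapshotted indices valid.
            parent[i] = ET.fromstring(services[sc.get("service")])
        rounds += 1


doc = ET.fromstring('<city><sc service="getEvents@TimeOut.example"/></city>')
print(materialize(doc, SERVICES), ET.tostring(doc, encoding="unicode"))
```

The second round exists only because the first result contained a new call; the `max_rounds` guard is one simple way to make the loop halt when no fixpoint is reached.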
Exchanging AXML Data • The data exchanged by Web services is controlled by schemas for their input and output, specified within a WSDL description. Similarly, schemas are used to control the exchange of AXML data. Using a DTD-like syntax, the schema distinguishes between accepting a concrete type, e.g., a temperature element, and accepting a service call that returns data of this type. The actual syntax used in the system is an extension of XML Schema.
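A toy illustration of this distinction, in Python rather than the DTD-like or XML Schema syntax: a content rule lists the concrete elements it accepts and, separately, which service calls it accepts by their declared output type. The service signatures and the rule encoding are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical service signatures: the root tag each service returns.
SERVICE_OUTPUT = {
    "getTemp@weather.example": "temperature",
    "getEvents@TimeOut.example": "events",
}

# DTD-like content rules for <city>: a plain string allows that concrete
# element; ("call", T) allows a service call whose declared output is T.
STRICT_CITY = {"name", "temperature"}
LAZY_CITY = {"name", "temperature", ("call", "temperature")}


def child_ok(child, rule, outputs):
    if child.tag == "sc":
        return ("call", outputs.get(child.get("service"))) in rule
    return child.tag in rule


def validate(elem, rule, outputs=SERVICE_OUTPUT):
    """True if every child of elem is allowed by the content rule."""
    return all(child_ok(c, rule, outputs) for c in elem)


city = ET.fromstring('<city><name>Paris</name>'
                     '<sc service="getTemp@weather.example"/></city>')
print(validate(city, STRICT_CITY), validate(city, LAZY_CITY))  # prints: False True
```

The strict rule rejects the getTemp call even though it would eventually yield a temperature; the second rule accepts it without invoking it, which is the choice such a schema lets the receiver make.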
Query optimization on AXML documents • Consider a site about a city's nightlife (restaurants, movies). Query: /goingOut/movies//show[title="The Hours"]/schedule. • There is no point in materializing calls below the path /goingOut/restaurants. • Calls found below /goingOut/movies should also be avoided whenever they cannot contribute to the result. • Naive solution: materialize all the calls in the document recursively until a fixpoint is reached, then run the query over the resulting document.
Evaluation Approach • Evaluation approach: lazy evaluation, i.e., identifying in advance a tight superset of the service calls that actually need to be invoked to answer a query. • General problem: service calls may appear anywhere in the data, including dynamically in the results of previously materialized calls, so evaluation may not terminate. • Solution: impose sufficient conditions for termination, or halt the computation if a complete state is not reached within some time limit.
Relevant service calls • (Figure: a sample Active XML document.)
A sample query • Queries are modeled by tree patterns. The relevant functions in the AXML document above are 1, 3, 4 and 10.
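A small sketch of how a tree-pattern query could be represented and matched, using the query /goingOut/movies//show[title="The Hours"]/schedule from the nightlife example. The `PNode` class and the matching routine are a simplified model written for illustration (child/descendant edges, text-equality tests, output nodes), not the papers' formalism, and they ignore service calls entirely.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PNode:
    """One node of a tree-pattern query (simplified model)."""
    tag: str
    descendant: bool = False        # edge from parent: '//' if True, '/' otherwise
    value: Optional[str] = None     # optional equality test on the text
    output: bool = False            # matches of this node are returned
    children: List["PNode"] = field(default_factory=list)


def _candidates(elem, p):
    if p.descendant:
        return [d for d in elem.iter() if d is not elem]
    return list(elem)


def _match(elem, p):
    """Return output nodes if the pattern p embeds at elem, else None."""
    if elem.tag != p.tag:
        return None
    if p.value is not None and (elem.text or "").strip() != p.value:
        return None
    out = [elem] if p.output else []
    for c in p.children:
        hits = [m for d in _candidates(elem, c) if (m := _match(d, c)) is not None]
        if not hits:
            return None             # every pattern child must be embedded
        for m in hits:
            out.extend(m)
    return out


def evaluate(pattern, root):
    """Distinct output nodes, in the order they were found."""
    seen, result = set(), []
    for e in _match(root, pattern) or []:
        if id(e) not in seen:
            seen.add(id(e))
            result.append(e)
    return result


# /goingOut/movies//show[title="The Hours"]/schedule as a tree pattern
QUERY = PNode("goingOut", children=[
    PNode("movies", children=[
        PNode("show", descendant=True, children=[
            PNode("title", value="The Hours"),
            PNode("schedule", output=True)])])])

DOC = ET.fromstring("""<goingOut><movies>
  <cinema><show><title>The Hours</title><schedule>20:00</schedule></show></cinema>
  <show><title>Chicago</title><schedule>21:30</schedule></show>
</movies><restaurants/></goingOut>""")
print([s.text for s in evaluate(QUERY, DOC)])
```

The predicate branch (title) and the output branch (schedule) are siblings under show, which is what makes the query a tree rather than a path.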
1. Evaluation technique • Computing the set of relevant service calls: given a query, generate a set of auxiliary queries that, when evaluated on a document, retrieve all service calls relevant to the query. • Advantage: in contrast to the naive approach, only functions that may contribute to the query result are invoked. • Disadvantage: there is a tradeoff between accuracy and efficiency, since detecting exactly which calls are relevant is expensive. • The challenge is thus to find the right balance between the effort spent on ruling out irrelevant calls and the time actually saved by not invoking them.
2. Evaluation technique • Pruning via typing: the return types of services are used to rule out more irrelevant service calls. • Pushing queries • For instance, getNearbyRestos may return many restaurants, while we are only interested in five-star ones, and more precisely only in their names and addresses. • We can push to the function call a precise subquery specifying that it must apply the five-star rating selection and return only the relevant names and addresses.
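To illustrate query pushing, the sketch below stubs a hypothetical getNearbyRestos service in Python. Called plainly, it returns every restaurant with all fields; when the caller pushes a subquery (here a dictionary with an assumed `rating`/`project` format), the service applies the five-star selection and the name/address projection before replying, so less data is sent back.

```python
import xml.etree.ElementTree as ET

# Toy data behind a hypothetical getNearbyRestos service.
RESTOS = [
    {"name": "Le Bistro", "address": "1 Main St", "rating": "*****"},
    {"name": "Quick Bite", "address": "2 Elm St", "rating": "**"},
    {"name": "Chez Nous", "address": "3 Oak St", "rating": "*****"},
]


def get_nearby_restos(pushed=None):
    """Without a pushed subquery, return every restaurant with all fields.
    With one, apply its selection and projection on the service side."""
    out = ET.Element("restaurants")
    for r in RESTOS:
        if pushed and r["rating"] != pushed["rating"]:
            continue                      # selection: matching rating only
        keep = pushed["project"] if pushed else list(r)
        elem = ET.SubElement(out, "restaurant")
        for f in keep:                    # projection: chosen fields only
            ET.SubElement(elem, f).text = r[f]
    return out


full = get_nearby_restos()
pushed = get_nearby_restos({"rating": "*****", "project": ["name", "address"]})
print(len(full), len(pushed))
print(len(ET.tostring(full)), len(ET.tostring(pushed)))
```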
Complete relevant rewriting • Algorithms to find a complete relevant rewriting: • Linear path queries (LPQ) • /*() • /nyHotels/*() • /nyHotels/hotel/*() • /nyHotels/hotel/name/*() • /nyHotels/hotel/rating/*() • /nyHotels/hotel/nearby/*() • /nyHotels/hotel/nearby//*() • /nyHotels/hotel/nearby//restaurant/*() • /nyHotels/hotel/nearby//restaurant/name/*() • /nyHotels/hotel/nearby//restaurant/address/*() • /nyHotels/hotel/nearby//restaurant/rating/*()
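The LPQ construction can be reproduced with a short Python sketch. It assumes a simple, hypothetical query-tree encoding (`QNode`) for the hotel query behind the list above. It emits the root position `/*()`, one `/*()` query per query node and one `//*()` query per descendant edge, and it silently drops the value conditions, which is exactly why LPQs can be inaccurate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QNode:
    tag: str
    descendant: bool = False      # '//' edge from the parent
    value: Optional[str] = None   # filtering condition (ignored by LPQs)
    children: List["QNode"] = field(default_factory=list)


def linear_path_queries(root):
    """Root position, one LPQ per query node and one per '//' edge;
    value conditions and sibling branches are dropped."""
    out = ["/*()"]

    def visit(node, parent_path):
        path = parent_path + ("//" if node.descendant else "/") + node.tag
        out.append(path + "/*()")
        for child in node.children:
            if child.descendant and path + "//*()" not in out:
                out.append(path + "//*()")
            visit(child, path)

    visit(root, "")
    return out


QUERY = QNode("nyHotels", children=[QNode("hotel", children=[
    QNode("name"),
    QNode("rating", value="*****"),
    QNode("nearby", children=[QNode("restaurant", descendant=True, children=[
        QNode("name"), QNode("address"), QNode("rating", value="*****")])])])])

for q in linear_path_queries(QUERY):
    print(q)
```

Running it lists the eleven queries of the slide in the same order; the rating="*****" conditions never appear in any of them.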
Correct, but usually inaccurate: LPQs ignore filtering conditions on the path from the root or in other branches that could make some of the functions irrelevant (e.g., a getNearbyRestos() function node under a hotel cannot be relevant if the hotel's rating is not "*****"). One linear path query is constructed per query node.
Complete relevant rewriting • Node Focused Queries. Instead of constructing one linear path query per query node, an algorithm called NFQ builds queries that include the filtering conditions of the original query. In contrast with linear path queries, the function nodes relevant to a query q are now precisely the ones retrieved by the NFQs of q.
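A contrast between the two strategies on a toy document, sketched with ElementTree. `lpq_calls` plays the role of /nyHotels/hotel/nearby/*(), while `nfq_style_calls` keeps the hotel-rating filter. This only approximates NFQs under a stated assumption: a hotel whose rating is still hidden behind an unmaterialized call (the hypothetical `getRating`) cannot be ruled out, so its calls are kept.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""<nyHotels>
  <hotel><name>Grand Hotel</name><rating>*****</rating>
    <nearby><sc service="getNearbyRestos"/></nearby></hotel>
  <hotel><name>Budget Inn</name><rating>**</rating>
    <nearby><sc service="getNearbyRestos"/></nearby></hotel>
  <hotel><name>Hotel X</name><sc service="getRating"/>
    <nearby><sc service="getNearbyRestos"/></nearby></hotel>
</nyHotels>""")


def lpq_calls(root):
    """/nyHotels/hotel/nearby/*(): every call below any hotel's <nearby>."""
    return [(h.findtext("name"), sc)
            for h in root.findall("hotel") for sc in h.findall("nearby/sc")]


def rating_may_be_five_star(hotel):
    rating = hotel.find("rating")
    if rating is None:
        # A call directly under <hotel> might still return the <rating>.
        return hotel.find("sc") is not None
    if rating.find("sc") is not None:
        return True                       # rating value still to be computed
    return (rating.text or "").strip() == "*****"


def nfq_style_calls(root):
    """Same path, but keeps the hotel's rating="*****" filter."""
    return [(h.findtext("name"), sc)
            for h in root.findall("hotel") if rating_may_be_five_star(h)
            for sc in h.findall("nearby/sc")]


print([n for n, _ in lpq_calls(DOC)])
print([n for n, _ in nfq_style_calls(DOC)])
```

The getNearbyRestos call under Budget Inn is retrieved by the linear path query but filtered out once the rating condition is taken into account.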
3. Evaluation technique • Service call sequencing: the relationships among the calls are analyzed to derive an efficient sequence of call invocations for answering the query. • An NFQ-based algorithm, NFQA, computes a (possibly infinite) relevant rewriting. If it terminates, the obtained document is complete for the query q.
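The overall loop of lazy evaluation can be sketched as follows. The hand-written `relevant_calls` function is a hypothetical stand-in for evaluating the NFQs of the query, and the loop simply invokes all relevant calls of a round together instead of deriving an optimized sequence; the round limit reflects the fact that the rewriting may be infinite.

```python
import xml.etree.ElementTree as ET

SERVICES = {  # hypothetical services returning canned AXML
    "getShows": '<show><title>The Hours</title><sc service="getSchedule"/></show>',
    "getSchedule": "<schedule>20:00</schedule>",
    "getRestos": "<restaurant><name>Le Bistro</name></restaurant>",
}


def relevant_calls(root):
    """Stand-in for evaluating the NFQs of /goingOut/movies//show/schedule:
    only calls inside <movies> can contribute to the answer."""
    movies = root.find("movies")
    if movies is None:
        return []
    return [(p, i, c) for p in movies.iter()
            for i, c in enumerate(p) if c.tag == "sc"]


def lazy_evaluate(root, relevant, services, max_rounds=10):
    """Invoke only relevant calls, re-checking relevance after each round
    because results may bring new calls. Returns invoked service names."""
    invoked, rounds = [], 0
    while True:
        calls = relevant(root)
        if not calls:
            return invoked             # document is complete for the query
        if rounds == max_rounds:
            raise RuntimeError("relevant rewriting did not terminate")
        for parent, i, sc in calls:
            invoked.append(sc.get("service"))
            parent[i] = ET.fromstring(services[sc.get("service")])
        rounds += 1


doc = ET.fromstring('<goingOut><movies><sc service="getShows"/></movies>'
                    '<restaurants><sc service="getRestos"/></restaurants></goingOut>')
print(lazy_evaluate(doc, relevant_calls, SERVICES))
```

The getRestos call under `<restaurants>` is never invoked, and the call introduced by the getShows result is picked up in the next round.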
4. Evaluation technique • F-guide: a specialized access structure in the style of DataGuides is used to speed up the search for relevant calls. • The structure acts as an index, concisely summarizing the occurrences of functions (service calls) in the documents, hence its name. • The F-guide also holds the path extents: for each path, it keeps pointers to the corresponding function call nodes in the document.
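A minimal F-guide-like index, assuming the same hypothetical `<sc>` syntax: `build_fguide` maps each label path to the call nodes found directly under it (its extent), and `lookup` answers linear path queries of the form `/p/*()` and `/p//*()` from the summary instead of scanning the document. This only illustrates the idea; it is not the structure used in the system.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_fguide(root):
    """Summarize where calls occur: parent label path -> call nodes (extent).
    Calls nested inside another call's arguments are not indexed here."""
    guide = defaultdict(list)

    def walk(elem, path):
        for child in elem:
            if child.tag == "sc":
                guide[path].append(child)
            else:
                walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return dict(guide)


def lookup(guide, lpq):
    """Answer '/p/*()' by exact key, or '/p//*()' by path prefix."""
    if lpq.endswith("//*()"):
        base = lpq[: -len("//*()")]
        return [c for path, calls in guide.items()
                if path == base or path.startswith(base + "/")
                for c in calls]
    return list(guide.get(lpq[: -len("/*()")], []))


doc = ET.fromstring("""<goingOut>
  <movies><sc service="getShows"/><cinema><sc service="getShowtimes"/></cinema></movies>
  <restaurants><sc service="getRestos"/></restaurants>
</goingOut>""")
guide = build_fguide(doc)
print(sorted(guide))
print([c.get("service") for c in lookup(guide, "/goingOut/movies//*()")])
```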
AXML Peer Functionality • A distributed architecture based on the peer-to-peer paradigm is adopted to support the AXML language. • Each participant may act both as a client and as a server. • AXML peers essentially have three facets: • Repository. • Server (may provide Web services for other peers to use). • Client (may invoke the Web services that other peers provide).
The AXML Peer as Client • Suppose we want to enforce the following policy: temperature information is refreshed daily. • Simple constructs in the language specify when service calls are invoked, so the language can express this policy. • In this situation: • Service calls should generally be kept inside AXML documents for future reuse. • Materialization no longer replaces service calls by their results, but appends the results of each call next to it.
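A sketch of the client-side policy under explicit assumptions: the call stays in the document, its latest result is stored as the next sibling, and the service is re-invoked only when that result is more than a day old. The `from`/`at` bookkeeping attributes and `fake_get_temp` are hypothetical, and keeping only the latest result (rather than every appended result) is a simplification.

```python
import time
import xml.etree.ElementTree as ET

DAY = 24 * 3600  # refresh period for the "refreshed daily" policy


def refresh(parent, sc, call_service, now=None):
    """Keep the call in the document and store its result right after it.
    Re-invoke only if the stored result is older than one day; returns
    True if the service was called."""
    now = time.time() if now is None else now
    idx = list(parent).index(sc)
    nxt = parent[idx + 1] if idx + 1 < len(parent) else None
    cached = nxt is not None and nxt.get("from") == sc.get("service")
    if cached and now - float(nxt.get("at")) < DAY:
        return False                  # stored result is still fresh
    result = call_service(sc.get("service"))
    result.set("from", sc.get("service"))
    result.set("at", str(now))
    if cached:
        parent.remove(nxt)            # drop the stale result
    parent.insert(idx + 1, result)
    return True


calls = []


def fake_get_temp(service):
    calls.append(service)
    t = ET.Element("temperature")
    t.text = str(len(calls))          # stand-in for a fresh reading
    return t


city = ET.fromstring('<city><sc service="getTemp@weather.example"/></city>')
sc = city.find("sc")
print(refresh(city, sc, fake_get_temp, now=0),
      refresh(city, sc, fake_get_temp, now=3600),
      refresh(city, sc, fake_get_temp, now=DAY + 1))
```

Because the `<sc>` element is never removed, the same call can be reused at every refresh.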
Important features of AXML service calls • Where are the arguments of a call taken from? The arguments of a service call are specified as children of the call element. In the simplest case an argument is plain XML; more generally, arguments can be AXML data and may therefore themselves contain service calls. • When is a call activated? This is specified by a special attribute of the call element. • How long does data returned by a service call remain valid? This is also specified by a special attribute of the call element.
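The argument rule can be sketched in Python, assuming hypothetical services implemented as callables: the children of a call are its arguments, and since they are AXML themselves, any call inside them is evaluated first. The `valid` attribute (seconds of validity) is an invented name standing in for the special validity attribute; the activation attribute is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical services implemented as Python callables on XML elements.
TEMPS = {"Paris": "18"}


def get_city():
    e = ET.Element("city")
    e.text = "Paris"
    return e


def get_temp(city):
    e = ET.Element("temperature")
    e.text = TEMPS[city.text]
    return e


SERVICES = {"getCity": get_city, "getTemp": get_temp}


def resolve(elem, services):
    """Copy of elem in which every embedded call, at any depth, is replaced
    by its result (arguments are AXML, so they may contain calls)."""
    if elem.tag == "sc":
        return invoke(elem, services)
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(resolve(child, services))
    return copy


def invoke(sc, services):
    """The children of the call element are its arguments."""
    args = [resolve(arg, services) for arg in sc]
    return services[sc.get("service")](*args)


def is_stale(sc, fetched_at, now):
    """Hypothetical 'valid' attribute: seconds a returned result stays valid."""
    return now - fetched_at >= float(sc.get("valid", "0"))


plain = ET.fromstring('<sc service="getTemp"><city>Paris</city></sc>')
nested = ET.fromstring('<sc service="getTemp" valid="3600"><sc service="getCity"/></sc>')
print(invoke(plain, SERVICES).text, invoke(nested, SERVICES).text)
```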
The AXML Peer as Server • In an AXML peer, AXML services can be defined as parameterized queries or updates over the peer's AXML documents. (Figure: a sample service.)
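A service defined as a parameterized query can be sketched as a closure over the peer's document. The path template uses ElementTree's limited XPath support rather than XQuery or X-OQL, and `make_service`/`get_schedule` are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET

PEER_DOC = ET.fromstring("""<goingOut><movies>
  <show><title>The Hours</title><schedule>20:00</schedule></show>
  <show><title>Chicago</title><schedule>21:30</schedule></show>
</movies></goingOut>""")


def make_service(doc, path_template):
    """Turn a parameterized path query over a peer document into a callable
    'service' that returns its answers wrapped in a <result> element."""
    def service(**params):
        # Naive substitution: a value containing a quote would break the path.
        path = path_template.format(**params)
        result = ET.Element("result")
        for hit in doc.findall(path):
            result.append(copy.deepcopy(hit))   # don't share nodes with doc
        return result
    return service


get_schedule = make_service(PEER_DOC, "movies/show[title='{title}']/schedule")
print(ET.tostring(get_schedule(title="The Hours"), encoding="unicode"))
```

Exposing such a callable over SOAP is what would turn it into a Web service that other peers can invoke; that plumbing is outside this sketch.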
The AXML Peer Implementation • The AXML peer is implemented in Java. • It relies on the Apache Xerces XML parser to parse and manipulate documents, and on the Apache Xalan processor for XPath queries and XSLT transformations. • The Tomcat servlet engine is used because the AXML peer needs to act as a Web server. • AXML documents can be turned into a Web application through JavaServer Pages. • Axis is a Java toolkit that provides Web services functionality on both the server side and the client side. • The AXML peer relies on the X-OQL engine to execute complex queries on XML documents.
The AXML Peer: Experiences • Some of the applications developed using AXML peers: • Peer-to-peer auctions: the main goal of this application is to illustrate the flexible mechanism for discovering new peers and auctions. • Electronic patient record management: the goal of this application is to show that AXML can seamlessly manage distributed data and the privacy of this data, by combining the AXML language with the GUPster access-control framework. • Academic and industrial collaborations • Distribution of Mandriva Linux: aims at better managing the production and distribution of open source software.