Caching XML Web Services to Support Disconnected Operation Venugopalan Ramasubramanian Cornell University Doug Terry Microsoft Research, Silicon Valley
Web Services • method of providing and accessing services on the Internet • consumer services • hotmail, orbitz, mapquest, ebay, … • B to B services • supply chain management • request-response paradigm • RPCs on the internet
XML Web Services • W3C (World Wide Web Consortium) standards • Microsoft, IBM, HP, … • Microsoft .Net web services (HailStorm) • mycontacts, myprofile, myfavoritewebsites • TerraServer, CoolRooster • SOAP (Simple Object Access Protocol) • standard representation of web service requests/responses (SOAP-RPC) • WSDL (Web Services Description Language) • description of web services
Availability of Web Services GOAL make web services available despite frequent disconnections and limited bandwidth! • web service clients reside on all kinds of devices • desktop, laptop, PDA, smart phone • network outages (especially wireless) • bandwidth restriction
Governing Principles • cannot modify web services • cannot modify access protocols • can perhaps modify client • must also comply with existing clients • can interpose storage and computation client-side caching is a solution to improve availability!
XML Standards: SOAP • SOAP-RPC standard • encoding definitions for data types • success, failure definitions • SOAP-Envelope • outermost element • SOAP-Body • obligatory • request operation: name, parameters • response status: return value, failure • SOAP-Header • optional, may contain multiple header blocks • supplementary information, e.g. a Kerberos ticket • HTTP binding • HTTP request and response messages
example: SOAP request
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            xmlns:m="http://schemas.microsoft.com/hs/2001/10/myContacts"
            xmlns:c="http://schemas.microsoft.com/hs/2001/10/core"
            xmlns:mp="http://schemas.microsoft.com/hs/2001/10/myProfile">
  <s:Header>
    <licenses xmlns="http://schemas.xmlsoap.org/soap/security/2000-12">
      <c:identity>
        <c:kerberos>3240</c:kerberos>
      </c:identity>
    </licenses>
    <path xmlns="http://schemas.xmlsoap.org/rp/">
      <action>http://schemas.microsoft.com/hs/2001/10/core#request</action>
      <to>http://terry.microsoft.com</to>
      <fwd><via /></fwd>
      <rev><via /></rev>
      <id>b55528a4-5d63-49f1-87a2-5fab8d76f658</id>
    </path>
    <c:request service="myContacts" document="content" method="insert" genResponse="always">
      <key puid="3240" instance="1" cluster="1" />
    </c:request>
  </s:Header>
  <s:Body>
    <c:insertRequest select="/m:myContacts/m:contact[mp:name/mp:givenName = 'Terry']/mp:emailAddress">
      <mp:email>terry@microsoft.com</mp:email>
    </c:insertRequest>
  </s:Body>
</s:Envelope>
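The envelope structure described above (an outermost Envelope, an optional Header with several blocks, an obligatory Body whose first element names the operation) can be walked with a few lines of Python. This is only a sketch under assumptions: `SAMPLE_REQUEST` is a hypothetical, trimmed-down version of the request shown above, and `summarize` and `local_name` are illustrative helper names, not part of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical, trimmed-down request modelled on the example slide.
SAMPLE_REQUEST = """\
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            xmlns:c="http://schemas.microsoft.com/hs/2001/10/core">
  <s:Header>
    <c:request service="myContacts" document="content" method="insert"/>
  </s:Header>
  <s:Body>
    <c:insertRequest select="/myContacts/contact"/>
  </s:Body>
</s:Envelope>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(envelope_xml):
    """Return (names of the header blocks, name of the first Body element)."""
    root = ET.fromstring(envelope_xml)
    if root.tag != f"{{{SOAP_NS}}}Envelope":
        raise ValueError("outermost element is not a SOAP Envelope")
    header = root.find(f"{{{SOAP_NS}}}Header")  # optional
    body = root.find(f"{{{SOAP_NS}}}Body")      # obligatory
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body is missing or empty")
    blocks = [local_name(child.tag) for child in header] if header is not None else []
    return blocks, local_name(body[0].tag)


if __name__ == "__main__":
    print(summarize(SAMPLE_REQUEST))
```

A cache sitting between client and service would need roughly this much understanding of a message before it can decide anything about it.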
XML Standards: WSDL • concrete definition of the web service • data structures • interface offered by the web service • operation names and parameters • message formats (components of a message) • protocol binding (SOAP) • automatic generation of client-side stubs • Visual Studio .Net
Experiments with Web Cache • experiment with existing clients and services (Microsoft .Net web services) • check feasibility by building a cache to store HTTP requests/responses • [figure: MyContacts, MyServices, and MyProfile services with a cache]
Issues in Caching • web services are active • default HTTP cache directive is No Cache! • web services are diverse • unlike files and databases, web services have custom interfaces • fundamental questions • which requests are cacheable? • which operations have permanent side effects? • how to understand requests/responses? • services use different formats for requests/responses
example: SOAP request
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            xmlns:m="http://schemas.microsoft.com/hs/2001/10/myContacts"
            xmlns:c="http://schemas.microsoft.com/hs/2001/10/core"
            xmlns:mp="http://schemas.microsoft.com/hs/2001/10/myProfile">
  <s:Header>
    <licenses xmlns="http://schemas.xmlsoap.org/soap/security/2000-12">
      <c:identity>
        <c:kerberos>3240</c:kerberos>
      </c:identity>
    </licenses>
    <path xmlns="http://schemas.xmlsoap.org/rp/">
      <action>http://schemas.microsoft.com/hs/2001/10/core#request</action>
      <to>http://terry.microsoft.com</to>
      <fwd><via /></fwd>
      <rev><via /></rev>
      <id>b55528a4-5d63-49f1-87a2-5fab8d76f658</id>
    </path>
    <c:request service="myContacts" document="content" method="insert" genResponse="always">
      <key puid="3240" instance="1" cluster="1" />
    </c:request>
  </s:Header>
  <s:Body>
    <c:insertRequest select="/m:myContacts/m:contact[mp:name/mp:givenName = 'Terry']/mp:emailAddress">
      <mp:email>terry@microsoft.com</mp:email>
    </c:insertRequest>
  </s:Body>
</s:Envelope>
Issues in Caching contd. • consistency • later requests might invalidate responses cached earlier • read/write, write/write conflicts • how to specify consistency requirements for generic web services?
request 1: query request
<queryRequest select="myContacts/contact[name='terry']"/>
request 2: delete request
<deleteRequest select="myContacts/contact[name='terry']/phone[@cat='cell']"/>
More Issues… • user experience • user unaware of web service cache • operations reportedly successful could fail! • hoarding • keeping the cache hot • user controlled hoard requests • security • enforce access control
Our Approach • annotate WSDL description of web services to define cache properties • published by service providers or third party • no changes to server side code required • transparent cache for web services • acts as a web proxy on the client machine • no modifications of the client program necessary • custom cache managers for each web service • generated automatically from the annotated WSDL description
Architecture • [figure: Web Client 1 and Web Client 2; a Proxy Server with custom cache managers CCM 1, CCM 2, and CCM 3, a Cache, and a WBQ; the Internet; Web Service 1, Web Service 2, and Web Service 3] • CCM1: Custom Cache Manager 1 • WBQ: Write Back Queue
WSDL Annotations: for each Operation • cacheable: the operation can be cached • lifetime: the duration for which replies are cached • play-back: the operation has side effects and must be played back when connection is restored • default-response: a default response will be sent when connection is not available
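The four per-operation annotations above suggest a simple decision procedure inside a cache manager. The sketch below is one possible reading, not the prototype's code: `OperationPolicy`, `CacheManager`, the `send` callable, and the use of `ConnectionError` to signal a disconnection are all assumptions made for illustration.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class OperationPolicy:
    """Hypothetical in-memory form of the per-operation annotations."""
    cacheable: bool = False
    lifetime: float = 0.0           # seconds a cached reply stays valid
    playback: bool = False          # queue the request while disconnected
    default_response: bool = False  # answer with a default reply when offline


class CacheManager:
    def __init__(self, policies, send, clock=time.time):
        self.policies = policies    # operation name -> OperationPolicy
        self.send = send            # callable(request) -> reply; raises ConnectionError offline
        self.clock = clock
        self.cache = {}             # request key -> (reply, time stored)
        self.write_back_queue = deque()

    def handle(self, op, key, request):
        policy = self.policies[op]
        if policy.cacheable:
            hit = self.cache.get(key)
            if hit is not None and self.clock() - hit[1] <= policy.lifetime:
                return hit[0]       # fresh cached reply, no network needed
        try:
            reply = self.send(request)
        except ConnectionError:
            if policy.playback:
                self.write_back_queue.append(request)
            if policy.default_response:
                return f"default response for {op}"
            raise                   # nothing cached and no default: report the failure
        if policy.cacheable:
            self.cache[key] = (reply, self.clock())
        return reply

    def reconnect(self):
        """Play back queued requests in their original order."""
        while self.write_back_queue:
            self.send(self.write_back_queue[0])
            self.write_back_queue.popleft()
```

A request is removed from the queue only after it has been sent, so a second disconnection during play-back leaves the remaining requests queued.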
WSDL Annotations: for each Service • identify the operation (operationName) • XPath (XML query language) expression to extract the name of the operation • identify the request message (identifier) • portions of the request message, such as the date, should be ignored when matching cached requests • XPath expression to extract the parts of the message relevant for identification
snippet from annotated myContacts.wsdl
<binding name="myContactsBinding" type="tns:myContactsPort"
         operationName="substring-before(local-name(/senv:Envelope/senv:Body/*[1]), 'Request')"
         Identifier="/senv:Envelope/senv:Header/s0:licenses | /senv:Envelope/senv:Header/s1:request | /senv:Envelope/senv:Body">
  <s:binding transport="http://schemas.xmls.org/s/http" style="document"/>
  <operation name="insert" cacheable="false" playback="true" defaultResponse="true" cacheHeader="true">
    <s:operation sAction="http://schemas.microsoft.com/hs/2001/10/c#request"/>
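To make the two service-level annotations concrete, the following Python sketch mimics what they compute: the operation name is the local name of the first Body element up to 'Request', and the identifier keeps only the licenses header, the request header, and the Body. Python's standard library does not evaluate XPath functions such as substring-before, so the same logic is written by hand here. `make_request`, `operation_name`, and `cache_key` are hypothetical names, and hashing the serialized elements is an assumption about how a cache key might be formed.

```python
import hashlib
import xml.etree.ElementTree as ET

SENV = "http://schemas.xmlsoap.org/soap/envelope/"
SEC = "http://schemas.xmlsoap.org/soap/security/2000-12"
CORE = "http://schemas.microsoft.com/hs/2001/10/core"


def make_request(message_id, email):
    """Build a hypothetical, trimmed-down insert request for the demo."""
    return f"""<s:Envelope xmlns:s="{SENV}" xmlns:c="{CORE}">
  <s:Header>
    <licenses xmlns="{SEC}"><c:identity><c:kerberos>3240</c:kerberos></c:identity></licenses>
    <path xmlns="http://schemas.xmlsoap.org/rp/"><id>{message_id}</id></path>
    <c:request service="myContacts" method="insert"/>
  </s:Header>
  <s:Body>
    <c:insertRequest select="/myContacts/contact"><email>{email}</email></c:insertRequest>
  </s:Body>
</s:Envelope>"""


def operation_name(envelope_xml):
    """Local name of the first Body element, up to 'Request'."""
    body = ET.fromstring(envelope_xml).find(f"{{{SENV}}}Body")
    local = body[0].tag.rsplit("}", 1)[-1]
    head, found, _ = local.partition("Request")
    return head if found else ""  # substring-before yields '' when there is no match


def cache_key(envelope_xml):
    """Hash only the parts named by the identifier; the path header is ignored."""
    root = ET.fromstring(envelope_xml)
    selected = (
        root.findall(f"{{{SENV}}}Header/{{{SEC}}}licenses")
        + root.findall(f"{{{SENV}}}Header/{{{CORE}}}request")
        + root.findall(f"{{{SENV}}}Body")
    )
    digest = hashlib.sha256()
    for node in selected:
        digest.update(ET.tostring(node))  # plain serialization, not canonical XML
    return digest.hexdigest()
```

Two requests that differ only in the message id inside the path header map to the same key, which is the point of excluding parts of the message from the identifier. A production cache would probably want canonical XML rather than a plain serialization, so that insignificant whitespace does not change the key.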
Annotations for Consistency • when does request 2 invalidate the response of an earlier request 1 in the cache? • an insert could invalidate an earlier query response • consider requests to be functions with signatures
req_1: op_1(param_{1,1}, param_{1,2}, …, param_{1,n})
req_2: op_2(param_{2,1}, param_{2,2}, …, param_{2,m})
• the invalidate condition is an expression over req_1 and req_2: f(op_1, op_2, param_{1,1}, …, param_{2,1}, …)
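As an illustration of such an invalidate condition, here is a deliberately conservative predicate in Python. It assumes, purely for the sketch, that each request carries one path-like select parameter, that the updating operations are insert, delete, and replace, and that predicates contain no '/' characters. `may_invalidate` and `element_steps` are hypothetical names.

```python
UPDATE_OPS = {"insert", "delete", "replace"}


def element_steps(select):
    """Split a path into element names, dropping any [predicate] on each step."""
    return [step.split("[", 1)[0] for step in select.split("/") if step]


def may_invalidate(op1, select1, op2, select2):
    """True when request 2 could change the cached reply to request 1.

    Request 1 is the cached query; request 2 is the later request. Predicates
    are ignored, so the answer errs on the side of invalidating too often.
    """
    if op1 != "query" or op2 not in UPDATE_OPS:
        return False
    first, second = element_steps(select1), element_steps(select2)
    shared = min(len(first), len(second))
    # The two paths can touch the same data only if one is a prefix of the other.
    return first[:shared] == second[:shared]
```

For example, a cached query on `myContacts/contact[name='terry']` is reported as possibly invalidated by a later delete on `myContacts/contact[name='terry']/phone[@cat='cell']`, but not by an insert under `myProfile`.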
Annotations for Consistency: XSL Transformations • Extensible Stylesheet Language (XSL) • transforms XML documents into HTML/text/XML • Turing-complete language • cache transform: transforms a cached response • input: request1, reply1, request2, reply2 • output: transformed reply1 (null if invalidated) • more powerful than just specifying invalidations • can actually transform the old response
Cache Transform Example
request 1: query request
<queryRequest select="myContacts/contact[name='terry']"/>
request 2: delete request
<deleteRequest select="myContacts/contact[name='terry']/phone[@cat='cell']"/>
a smart cache transform would delete the cell phone number from the cached query response
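The effect of such a smart transform can be sketched in Python instead of XSL. The cached reply format below is invented for the example (the slides do not show a query response), `apply_delete` is a hypothetical name, and the path handling relies on the limited XPath subset of `xml.etree.ElementTree`, so it only covers selects of the simple shape used above.

```python
import xml.etree.ElementTree as ET

# Hypothetical cached reply; the real response format is not shown in the slides.
CACHED_REPLY = """\
<myContacts>
  <contact><name>terry</name><phone cat="cell">555-0100</phone><phone cat="work">555-0101</phone></contact>
  <contact><name>doug</name><phone cat="cell">555-0102</phone></contact>
</myContacts>
"""


def apply_delete(cached_reply_xml, delete_select):
    """Return the cached reply with the deleted nodes removed.

    Returns None, meaning "invalidate the entry", when the select cannot be
    matched against the cached document.
    """
    root = ET.fromstring(cached_reply_xml)
    first, _, rest = delete_select.partition("/")
    if first != root.tag or not rest:
        return None
    # Split off the last step; assumes predicates contain no '/' characters.
    parent_path, _, target = rest.rpartition("/")
    parents = root.findall(parent_path) if parent_path else [root]
    for parent in parents:
        for victim in parent.findall(target):  # findall returns a list, safe to mutate
            parent.remove(victim)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(apply_delete(CACHED_REPLY, "myContacts/contact[name='terry']/phone[@cat='cell']"))
```

Falling back to invalidation whenever the transform cannot reason about a request keeps the sketch safe: the cache never serves a reply it could not update.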
<xsl:template match="/">
  <xsl:variable name="service1" select="$req1/s:Header/c:request/@service"/>
  <xsl:variable name="service2" select="$req2/s:Header/c:request/@service"/>
  <xsl:variable name="opName1" select="substring-before(local-name($req1/s:Body/*[1]), 'Request')"/>
  <xsl:variable name="opName2" select="substring-before(local-name($req2/s:Body/*[1]), 'Request')"/>
  <xsl:choose>
    <xsl:when test="$service1 = $service2">
      <xsl:choose>
        <xsl:when test="$opName2 = 'query' and ($opName1 = 'insert' or $opName1 = 'delete' or $opName1 = 'replace')">
          <xsl:variable name="cleanQuery1">
            <xsl:call-template name="StripSegment">
              <xsl:with-param name="xpQuery" select="substring-after($req1/s:Body/c:*/@select, '/')"/>
            </xsl:call-template>
          </xsl:variable>
          <xsl:variable name="cleanQuery2">
            <xsl:call-template name="StripSegment">
              <xsl:with-param name="xpQuery" select="substring-after($req2/s:Body/c:queryRequest/c:xpQuery/@select, '/')"/>
            </xsl:call-template>
          </xsl:variable>
          <xsl:call-template name="CheckIntersection">
            <xsl:with-param name="xpQuery1" select="$cleanQuery1"/>
            <xsl:with-param name="xpQuery2" select="$cleanQuery2"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$rep2"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$rep2"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
Picking Level of Consistency • user-freedom in choosing consistency guarantees • multiple consistency transforms • strong consistency • less availability • better user experience • weak consistency • user experience could deteriorate • operations reportedly successful could fail! • optional cache header • better availability
More Transforms • response transform • response from the cache may have to be changed before returning to the client. • adding time-stamp, unique identifiers etc. • default response transform • generates a default response for a request. • default responses are returned when disconnected but request is queued for play-back
Optional Cache Header • cache provides information to the client using cache header • response from cache or server • age of cached response • request will be played back in the future • no changes to the definition of WSDL • would not affect existing clients in any way. • cache aware clients can provide additional information to the user
example: default response and cache header
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            xmlns:hs="http://schemas.microsoft.com/hs/2001/10/core">
  <s:Header>
    <path xmlns="http://schemas.xmlsoap.org/rp/">
      <action>http://schemas.microsoft.com/hs/2001/10/core#response</action>
      <rev />
      <from>http://terry.microsoft.com</from>
      <relatesTo>d978b559-aceb-4e9e-9747-b8a306234bc8</relatesTo>
    </path>
    <response xmlns="http://schemas.microsoft.com/hs/2001/10/core" />
    <cacheHeader defaultResponse="true" toPlayback="true" xmlns="http://localhost/wsdlannotation" />
  </s:Header>
  <s:Body>
    <hs:insertResponse status="success" selectedNodeCount="1" newChangeNumber="0" />
  </s:Body>
</s:Envelope>
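A cache-aware client could inspect the optional cache header roughly as follows, while existing clients simply ignore the extra header block. This is a sketch: `read_cache_header` is a hypothetical helper, `SAMPLE_RESPONSE` is a trimmed-down version of the example above, and only the two attributes shown on the slide are read.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
CACHE_NS = "http://localhost/wsdlannotation"  # namespace used in the example slide

# Hypothetical, trimmed-down default response carrying a cache header.
SAMPLE_RESPONSE = """\
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            xmlns:hs="http://schemas.microsoft.com/hs/2001/10/core">
  <s:Header>
    <cacheHeader defaultResponse="true" toPlayback="true"
                 xmlns="http://localhost/wsdlannotation"/>
  </s:Header>
  <s:Body>
    <hs:insertResponse status="success"/>
  </s:Body>
</s:Envelope>
"""


def read_cache_header(envelope_xml):
    """Return the cache header flags as booleans, or None if the header is absent."""
    root = ET.fromstring(envelope_xml)
    node = root.find(f"{{{SOAP_NS}}}Header/{{{CACHE_NS}}}cacheHeader")
    if node is None:
        return None  # reply came without a cache header
    return {
        name: node.get(name) == "true"
        for name in ("defaultResponse", "toPlayback")
    }


if __name__ == "__main__":
    print(read_cache_header(SAMPLE_RESPONSE))
```

With these flags a client can, for instance, tell the user that an insert was accepted locally and will be played back once the connection returns.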
Conclusion • built a prototype web services cache • experimented with Hailstorm web services and clients • annotated Hailstorm WSDL files • the prototype demonstrates custom cache managers in action for Hailstorm • couldn’t give a demo
Work for the Future • WSDL annotations for more web services • hard to find interesting web services with WSDL descriptions yet! • hoarding to enhance availability • specify user controlled hoard queries • hoard transform to obtain response from cached hoard requests • incorporate security constraints • tune cache performance