A Secure, Publisher-Centric Web Caching Infrastructure. April 19th, 2001. Selcuk Uluagac, Aravind Pavuluri
Outline • Dynamic Caching • Motivation & Gemini • Security Issues • Incremental Deployment • Design & Implementation • Performance • Conclusions & Discussion 18845-01
Outline (cont.) • Not finished yet! Two more papers follow: • Active Cache: Caching Dynamic Contents on the Web (Pei Cao et al.) • A Publishing System for Efficiently Creating Dynamic Data (Arun Iyengar et al.)
Dynamic Web Caching? • Content is generated on every request • Produced by server-side scripts and programs (Perl, CGI, Java, VBScript, etc.) • Used for personalization and e-commerce transactions • Presently not cached
General Approach
Gemini & Motivation • Drawbacks of the current cache infrastructure • Cannot report access statistics • Cannot handle dynamic content • Publishers lose control over their content • Not publisher-centric • Proposed solution: Gemini
Key Elements of Gemini Architecture • Node (Cache) • Security Architecture • Incremental Deployment Strategy • [Figure: Gemini node. Labels: Control Plane, Data Plane, Consistency Control, Filtering, Logging & Reporting, Versioning, QoS, Sandboxed VM, Access Control]
Security Issues • Why is a new security approach needed? • Caches are active participants, so end-to-end security alone is not enough • The cache is responsible for reporting logs • Design Goals • Protect the publisher as well as the cache • The publisher decides whom to trust • Publishers and clients eventually find out about attacks • The system should be incrementally deployable
Security Background • RSA (Rivest, Shamir, Adleman) • Encryption • Public key and private key • Public Key Infrastructure (X.509) • Digital signature • Verification • Certificate • Certificate Authority
A New Trust Model • Cache Authorization • Publishers explicitly specify which content a cache can generate • Cache Verification • Publishers and clients verify that authorized caches are performing correctly
Authorization & Content Generation Steps • The PKI distributes keys to clients, caches, and publishers • The publisher’s certificate identifies its web site and public key • Certificate: {P, K_P, Valid, Expires, CA} signed with K_CA^-1 • The publisher lists the authorized caches for each object • ACL: {URL, K_1, K_2, ..., K_n, Valid, Expires, P} signed with K_P^-1 • The publisher gives the cache: ACL, {Headers, Body} signed with K_P^-1 • Uses the Pragma header field so as not to confuse legacy caches • The cache generates the content using the Body • The cache sends the client: ACL, {URL, Cache, Client, H(Request), CurrDate, Body} signed with K_cache^-1
Authorization & Content Generation Steps (cont.) • The client can check the signature on the ACL and verify that the cache is authorized • The client verifies that the cache is in the ACL and that the cache’s signature is valid • Purpose of the cache signature • Tamper detection by the client • Identifies the cache that generated the content • Non-repudiation • The cache can perform access control on the content as the publisher requires (cookies, etc.)
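The client-side check described on these two slides can be sketched as follows. This is a toy illustration only: it uses textbook RSA over fixed Mersenne primes (not secure), the key values, URL, and field contents are made up, and validity dates are carried but not checked. The tuple layouts follow the ACL and reply formats on the slides; the function names are hypothetical.

```python
import hashlib

E = 65537  # common RSA public exponent


def make_key(p, q):
    """Build a toy RSA key (modulus n, private exponent d) from two primes. NOT secure."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return n, d


def digest(fields, n):
    """Hash a tuple of fields to an integer below the modulus n."""
    data = repr(fields).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(fields, n, d):
    """Sign with the private exponent: plays the role of {...}K^-1 on the slides."""
    return pow(digest(fields, n), d, n)


def verify(fields, sig, n):
    """Check a signature using only the public modulus and exponent."""
    return pow(sig, E, n) == digest(fields, n)


# Hypothetical keys for illustration; real deployments would use PKI-issued keys.
PUB_N, PUB_D = make_key(2**61 - 1, 2**89 - 1)        # publisher P
CACHE_N, CACHE_D = make_key(2**107 - 1, 2**127 - 1)  # one authorized cache

url = "http://publisher.example/page"

# ACL: {URL, K_1..K_n, Valid, Expires, P} signed by the publisher.
acl = (url, (CACHE_N,), "2001-04-01", "2001-05-01", "publisher.example")
acl_sig = sign(acl, PUB_N, PUB_D)

# Reply: {URL, Cache, Client, H(Request), CurrDate, Body} signed by the cache.
reply = (url, CACHE_N, "client-1", "h(request)", "2001-04-19", "<html>body</html>")
reply_sig = sign(reply, CACHE_N, CACHE_D)


def client_accepts(acl, acl_sig, reply, reply_sig, publisher_n):
    """Accept only if the ACL is genuine, the cache is listed, and the reply is intact."""
    if not verify(acl, acl_sig, publisher_n):
        return False               # ACL was not signed by the publisher
    cache_n = reply[1]
    if cache_n not in acl[1]:
        return False               # cache is not authorized for this URL
    return verify(reply, reply_sig, cache_n)
```

Because the reply is signed with the cache's own key, a client that detects a bad body also holds evidence of which cache produced it, which is what makes the feedback to the publisher meaningful.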
Verification • The client sends feedback to the publisher about a misbehaving cache • Similarly, inconsistencies in cache log reporting can be detected • The publisher removes the cache from the ACL • When to question cache responses? • Publisher-initiated (fake clients) • Client-initiated
Protecting the Cache • Publishers may send malicious code to caches • Countermeasures • Publisher code runs inside a sandboxed JVM • Only a limited API is exposed to publisher code • OS-level resource restrictions counter denial-of-service attacks
Incremental Deployment Strategy: Principles • Cache and document heterogeneity • Transparency to clients • Transparency to legacy caches • Proximity Leaf Cache
Discovering Gemini Documents • Publishers explicitly notify Gemini caches about regular documents that have associated Gemini documents • A notification contains • Server name • Pattern to match • Transformation • Notifications are piggybacked on HTTP responses • Caches store notifications as soft state
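A minimal sketch of how a cache might hold these notifications as soft state, assuming the pattern is a regular expression and the transformation is a regex replacement string. The class name, the time-to-live value, and the example paths are all hypothetical; the slides do not specify the pattern syntax or a lifetime.

```python
import re
import time


class NotificationTable:
    """Soft-state lookup table: entries expire unless a new notification refreshes them."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds   # assumed lifetime; the slides give no value
        self.entries = {}        # (server, pattern) -> (replacement, expiry time)

    def notify(self, server, pattern, replacement, now=None):
        """Record or refresh a notification piggybacked on an HTTP response."""
        now = time.time() if now is None else now
        self.entries[(server, pattern)] = (replacement, now + self.ttl)

    def translate(self, server, path, now=None):
        """Return the Gemini document path for a request, or None if no live entry matches."""
        now = time.time() if now is None else now
        for (srv, pattern), (replacement, expires) in list(self.entries.items()):
            if expires <= now:
                del self.entries[(srv, pattern)]   # stale entry: drop it silently
                continue
            if srv == server and re.fullmatch(pattern, path):
                return re.sub(pattern, replacement, path)
        return None
```

Soft state keeps the design simple: a cache that loses or expires an entry just forwards the request as a regular one until the next notification arrives.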
Serving a Request
Leaf Discovery • Leaf cache: the Gemini cache that translates a request for a regular document into a request for a Gemini document • With security, the leaf cache is the first cache that both has the proper lookup table entry and is authorized by the publisher
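The selection rule above can be written as a short sketch. It assumes "first" means first along the request path starting from the client, and that a request with no qualifying cache simply continues as a regular request; both are assumptions, and the names are illustrative.

```python
def find_leaf_cache(path_from_client, has_entry, authorized):
    """Return the first cache on the path that has a lookup entry and is authorized."""
    for cache in path_from_client:
        if cache in has_entry and cache in authorized:
            return cache
    return None  # assumed behavior: no leaf cache, request stays a regular request


# Example: cache "b" knows the document but is not authorized, so "c" becomes the leaf.
leaf = find_leaf_cache(["a", "b", "c"], has_entry={"b", "c"}, authorized={"a", "c"})
```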
Scalability • Leverages thousands of legacy caches to help deliver Gemini documents • Computational burden is pushed as close to the edge of the network as possible
Node Design & Implementation
Node Design & Implementation (cont.) • Platform: built on top of Squid • Runtime language: Java • Platform independent • Allows sandboxing • Partitioning of functionality • Squid process • Lookup table • Fetching Gemini documents • Forwarding Gemini requests • Gemini process • JVM • Security
Node Operation • The Squid front end receives the request from the client • It hands the request to the Gemini process via IPC • Gemini threads process the request (Dispatcher, Checker, Worker) • The worker thread signs the output and sends it to the client • The request is logged
Performance Evaluation • Response time degrades by a factor of 5 to 15 for non-active Gemini documents • Signing the reply accounts for 90% of processing time
Performance Evaluation (cont.)
Conclusions & Discussion • Gemini addresses the security issues in dynamic web caching • Provides a node implementation • Provides a publisher-centric architecture • Open question: end-user performance?
A Publishing System for Efficiently Creating Dynamic Data. Arun Iyengar et al., IBM Research, T.J. Watson Research Center
Problems with Dynamic Caching at First Glance • Several problems with dynamic data generation • Expensive to create • Overhead • Consistent updates (already discussed) • More?
Little Fragments • Fragments • Objects • Atomic vs. complex objects • Object Dependence Graph (ODG) • Dynamic pages • Embedded fragments are updated automatically • Atomic vs. incremental publication • Problems? • Three proposed algorithms
Publishing Process • Immediate fragments • Quality-controlled fragments • The Trigger Monitor is notified • It fetches new copies from the source • The ODG is updated • Graph traversal algorithms are applied • Bundles of web pages are written to the sink
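To make the graph-traversal step concrete, here is a generic sketch, not one of the paper's three algorithms: given an ODG whose edges point from a fragment to the objects that embed it, it finds everything affected by a change and orders the rebuild so that a fragment is regenerated before any page that includes it. The object names are invented for the example.

```python
from collections import deque


def affected_objects(odg, changed):
    """Return every object reachable from the changed fragments, in dependency order."""
    # Step 1: breadth-first search for everything that depends on a changed fragment.
    reachable, queue = set(changed), deque(changed)
    while queue:
        node = queue.popleft()
        for dependant in odg.get(node, ()):
            if dependant not in reachable:
                reachable.add(dependant)
                queue.append(dependant)

    # Step 2: topological sort of the affected subgraph (Kahn's algorithm).
    indegree = {n: 0 for n in reachable}
    for n in reachable:
        for dependant in odg.get(n, ()):
            indegree[dependant] += 1
    ready = deque(sorted(n for n in reachable if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for dependant in odg.get(n, ()):
            indegree[dependant] -= 1
            if indegree[dependant] == 0:
                ready.append(dependant)
    return order


# Hypothetical ODG: an edge "score" -> "medal_table" means medal_table embeds score.
odg = {
    "score": ["medal_table", "event_page"],
    "medal_table": ["home_page"],
    "photo": ["home_page"],
}
order = affected_objects(odg, ["score"])
```

Only the objects downstream of the change are rebuilt, which is the point of incremental publication: the unchanged "photo" fragment is never touched.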
Sample screen
Performance • Deployed on the 2000 Olympic Games web site
Performance (cont.) • Web sites are easier to design • Users specify and modify relationships among web pages and fragments • Performance improvement • Incremental publication • Faster with the three algorithms
Active Cache: Caching Dynamic Contents on the Web. April 19th, 2001. Selcuk Uluagac, Aravind Pavuluri
Motivation and Active Cache • Dynamic documents constitute an increasing percentage of the content on the web • This affects the scalability of the web • No existing approach caches dynamic content • Solution: Active Cache
Brief Overview • Migrates parts of the server processing for each user request to the caching proxy via “cache applets” • A cache applet is server-supplied code attached to a URL • On a user request, the proxy invokes the cache applet • Cache applets let servers benefit from proxy caching without losing the ability to track user accesses and tailor content presentation
The Active Cache Protocol • The web server associates a cache applet with a URL-named document by sending a new entity header, “CacheApplet”, with the document • CacheApplet: code=“code.class”, archive=“code.jar”, codebase=“codebase.url” • For security reasons, the applet’s codebase must have the same server URL as the document
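A proxy-side sketch of handling this header: parse the three fields, then apply the same-server rule. The parsing is a simple regex over the quoted key/value form shown on the slide, and "same server URL" is interpreted here as same scheme and host, which is an assumption. Function names and example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit


def parse_cache_applet(header_value):
    """Parse 'code="...", archive="...", codebase="..."' into a dict of fields."""
    return dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', header_value))


def same_server(document_url, codebase_url):
    """True when the codebase is served from the same scheme and host as the document."""
    doc, base = urlsplit(document_url), urlsplit(codebase_url)
    return (doc.scheme, doc.netloc.lower()) == (base.scheme, base.netloc.lower())


header = 'code = "code.class", archive="code.jar", codebase="http://www.example.com/applets/"'
fields = parse_cache_applet(header)
allowed = same_server("http://www.example.com/index.html", fields["codebase"])
```

A proxy would refuse to load the applet when the check fails, so that one server cannot attach code to another server's documents.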
The Active Cache Protocol (cont.) • Active Cache obligations • If a document is cached, the proxy either invokes the cache applet or sends the request directly to the server • If the applet’s execution fails for any reason, the request is sent to the server • If the applet’s execution succeeds, the proxy takes the appropriate action based on the return value of the FromCache method • Each applet can deposit information in a log object, and the proxy sends the log object back to the server
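The fallback behavior in these obligations can be sketched as below. The applet is modeled as a plain callable standing in for FromCache, and treating a None result as "forward to the server" is a hypothetical convention; the slides do not list the actual return values.

```python
def handle_request(request, cached_doc, applet, fetch_from_server, log):
    """Serve one request for a cached document that has a cache applet attached."""
    try:
        result = applet(request, cached_doc, log)   # stand-in for invoking FromCache
    except Exception:
        return fetch_from_server(request)           # applet failed: fall back to the server
    if result is None:
        return fetch_from_server(request)           # assumed "cannot serve" return value
    return result                                   # applet produced the response


def upper_applet(request, cached_doc, log):
    """Example applet: records the access, then serves a transformed copy."""
    log.append(request)
    return cached_doc.upper()
```

The important property is that a broken or misbehaving applet never leaves the client without a response: the worst case is an ordinary cache miss.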
The Proxy Decides • Whether to cache a document • Whether to invoke the applet • A cache applet may not process every request for the document • Some requests may go to the origin server • Which document or applet to evict from the cache at any time
Active Cache Interface • A cache applet must implement the “ActiveCacheInterface” • FromCache(user_http_request, client_ip, client_name, cache_file, new_file) • A cache applet can only call the ActiveProxy class to perform its functions • ActiveProxy provides methods for file access, cache query, locking and unlocking, and sending requests to the server
Active Cache Interface: Methods in ActiveProxy • public boolean is_in_cache(String url) • public int open(String url, int mode) • public int close(int fd) • public int create(String url, int mode) • public int read(int fd, byte[] buf, int size) • public int lock(int fd) • public String curtime()
Cache Applet Examples • Logging user requests • Logs are eventually sent to the server • Advertising banner rotation • Decides which banner to show according to the specifications • Access permission checking • The applet verifies whether the server signed the document • Client-specific information distribution • www.my.yahoo.com
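As an illustration of the first two examples, the sketch below combines request logging with banner rotation. It is written in Python rather than as a Java cache applet, uses round-robin as a stand-in for whatever rotation policy the server specifies, and the `<!--BANNER-->` placeholder is invented for the example.

```python
import itertools


def make_banner_applet(banners):
    """Return an applet that fills a placeholder with the next banner, round-robin."""
    rotation = itertools.cycle(banners)

    def applet(request, cached_doc, log):
        banner = next(rotation)
        log.append((request, banner))   # log entries are later sent back to the server
        return cached_doc.replace("<!--BANNER-->", banner)

    return applet
```

The cached page body is reused on every request; only the banner and the log entry change, so the server keeps its access statistics without regenerating the page.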
Security Mechanisms • Language-based protection • The ActiveProxy class implements the constraints • Java’s built-in security measures • Prevents illegal access to information belonging to other web servers • Resource accounting • The proxy tracks an applet’s resource consumption: storage size, disk bandwidth, network bandwidth, CPU usage, and virtual memory size • Upper limits on resources are set using setrlimit • Prevents denial-of-service attacks
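The setrlimit idea can be tried out with Python's standard `resource` module, which wraps the same Unix call. This is a sketch of the general technique, not the Active Cache implementation: the helper name and the limit values are arbitrary, and it only works on Unix-like systems.

```python
import resource
import subprocess
import sys


def run_limited(argv, cpu_seconds=2, memory_bytes=None):
    """Run a child process under a CPU-time limit and an optional address-space limit."""

    def apply_limits():
        # Runs in the child just before exec, so only the child is restricted.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        if memory_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    return subprocess.run(argv, preexec_fn=apply_limits, capture_output=True, timeout=30)


# A well-behaved child finishes normally within its limits.
result = run_limited([sys.executable, "-c", "print('ok')"])
```

A child that spins forever is terminated by the kernel once it exceeds its CPU budget, which is why a process-per-request design makes these limits easy to enforce.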
Implementation • Extends the CERN httpd proxy • Handles each request in a separate process • This makes it easy to set limits on resources • Implements the Active Cache protocol and the security mechanisms
Performance • Performance degrades by at least 50 to 75% • Client latency increases by a factor of 1.5 to 4 • The CPU becomes the bottleneck
Conclusions • Active Cache trades local CPU resources for network bandwidth savings • $6K to $10K/month for a T1 line vs. $2K for a high-end computer with sufficient CPU • Improves object hit ratio and byte hit ratio from 35% and 30% to 55% and 41%, respectively