Finding cacheable areas in your Web Site using Python and Selenium

David Elfi, Intel

What does this session talk about?
• Python
• Performance
• Web applications
• Hands-on session

Caching
• Hot topic in web applications because it gives:
    • Better response time across geographically distributed users
    • Better scalability
• Difficult to focus on at development time
• This session aims to help developers improve response time

Source: Steve Souders – Cache is King!

What to do
• Find text areas that repeat in a web resource (page, JSON response, other dynamic resources) in order to split them into separate responses
• Use the Cache-Control, Expires and ETag HTTP headers to control caching
• Identify all the dependencies of a given URL
    • Even AJAX calls
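The three headers named above can be illustrated with a minimal sketch that uses only the standard library. The function name `caching_headers` and its parameters are hypothetical and chosen for illustration; the ETag here is simply a truncated SHA-256 of the body, which is one possible choice among many.

```python
import hashlib
import time
from email.utils import formatdate


def caching_headers(body, max_age, visibility="public", now=None):
    """Build Cache-Control, Expires and ETag headers for a response body.

    body is bytes and max_age is a lifetime in seconds. now is a Unix
    timestamp; it defaults to the current time and exists mainly for tests.
    """
    if now is None:
        now = time.time()
    # Assumption: a truncated content hash is good enough as a validator.
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    return {
        "Cache-Control": "%s, max-age=%d" % (visibility, max_age),
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
        "ETag": etag,
    }


for name, value in caching_headers(b"<html>static part</html>", 3600).items():
    print("%s: %s" % (name, value))
```

Cache-Control takes precedence over Expires in clients that understand both, so sending the two together mainly helps older caches.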

Proposed Solution
• Take snapshots at different points in time
• Use Selenium to:
    • Download ALL the content
    • Run the JS code needed for AJAX
• Compare the snapshots, looking for similarities
• Split the similar text into separate HTTP responses
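The compare step can also be read the other way round: the regions that differ between two snapshots are the candidates to move out of the cacheable response. The sketch below is one possible way to list them with `difflib`; the function name `varying_regions` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def varying_regions(old, new):
    """Return (old_fragment, new_fragment) pairs where two snapshots differ."""
    # autojunk=False keeps difflib from ignoring very frequent characters
    # in long inputs, which matters for HTML.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical snapshots of one page taken at two points in time.
first = "HEADER-aaa"
second = "HEADER-bbb"
print(varying_regions(first, second))
```

Everything outside the returned fragments was identical in both snapshots and is therefore a candidate for a separately cached response.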

Solution – Snapshots
• Selenium through a forward proxy
[Diagram labels: Proxy (Twisted), Web Server, Store Content Data]

Running Selenium – Snapshots
• Call Selenium from Python
• Use of WebDriver

    >>> from selenium import webdriver
    >>> br = webdriver.Firefox()
    >>> br.get("http://www.intel.com")
    >>> br.close()

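Twisted Proxy – Snapshots

    from twisted.web import http, proxy

    class CacheProxyClient(proxy.ProxyClient):
        # Each hook calls the base class so the response still reaches the browser.
        def connectionMade(self):
            # Connection made. Prepare object properties.
            proxy.ProxyClient.connectionMade(self)

        def handleHeader(self, key, value):
            # Save response header.
            proxy.ProxyClient.handleHeader(self, key, value)

        def handleResponsePart(self, buf):
            # Store response data.
            proxy.ProxyClient.handleResponsePart(self, buf)

        def handleResponseEnd(self):
            # Finished response transmission. Store it.
            proxy.ProxyClient.handleResponseEnd(self)

    class CacheProxyClientFactory(proxy.ProxyClientFactory):
        protocol = CacheProxyClient

    class CacheProxyRequest(proxy.ProxyRequest):
        # A bytes key works on Python 2 and is required by Twisted on Python 3.
        protocols = {b"http": CacheProxyClientFactory}

    class CacheProxy(proxy.Proxy):
        requestFactory = CacheProxyRequest

    class CacheProxyFactory(http.HTTPFactory):
        protocol = CacheProxy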

Selenium + Twisted – Snapshots
• Run Selenium using the proxy

    >>> from selenium import webdriver
    >>> fp = webdriver.FirefoxProfile()
    >>> fp.set_preference("network.proxy.type", 1)
    >>> fp.set_preference("network.proxy.http", "localhost")
    >>> fp.set_preference("network.proxy.http_port", 8080)
    >>> br = webdriver.Firefox(firefox_profile=fp)

Selenium + Twisted – Snapshots
• Configure Twisted and run Selenium in an internal Twisted thread

    from twisted.internet import endpoints, reactor

    endpoint = endpoints.serverFromString(
        reactor, "tcp:%d:interface=%s" % (8080, "localhost"))
    d = endpoint.listen(CacheProxyFactory())
    # runSelenium blocks, so it runs in the reactor's thread pool.
    reactor.callInThread(runSelenium, url_str)
    reactor.run()

All together running

Comparison method
[Figure: snapshots 1 to n are compared against each other to produce the output.]

Comparison

    ''' Equal sequence searcher '''
    from difflib import SequenceMatcher

    def matchingString(s1, s2):
        '''Compare 2 sequences of strings and return the matching
        sequences concatenated'''
        matcher = SequenceMatcher(None, s1, s2)
        output = ""
        for (i, _, n) in matcher.get_matching_blocks():
            output += s1[i:i+n]
        return output

    def matchingStringSequence(seq):
        ''' Compare between pairs up to final result '''
        try:
            matching = seq[0]
            for s in seq[1:]:
                matching = matchingString(matching, s)
            return matching
        except TypeError:
            return ""
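A usage sketch of the comparison above, restated so that it runs on its own. The snake_case names, the `cacheable_fraction` helper and the sample snapshots are hypothetical. The sketch also passes `autojunk=False`, which is a deviation from the slide: by default `difflib` may treat very frequent characters as junk once an input reaches 200 items, and that can hide matches in long HTML documents.

```python
from difflib import SequenceMatcher
from functools import reduce


def matching_string(s1, s2):
    """Concatenate the blocks that s1 and s2 have in common, in order."""
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    return "".join(s1[i:i + n] for i, _, n in matcher.get_matching_blocks())


def common_content(snapshots):
    """Fold matching_string over all snapshots of one resource."""
    return reduce(matching_string, snapshots) if snapshots else ""


def cacheable_fraction(snapshots):
    """Share of the longest snapshot that is identical in every snapshot."""
    longest = max((len(s) for s in snapshots), default=0)
    return len(common_content(snapshots)) / longest if longest else 0.0


# Hypothetical snapshots of the same URL at three points in time.
snapshots = ["HEADER-aaa", "HEADER-bbb", "HEADER-ccc"]
print(common_content(snapshots))
print(cacheable_fraction(snapshots))
```

A fraction close to 1 suggests the whole resource can be cached as it is; a lower value suggests splitting the stable part from the varying part.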

Next Steps
• Split similar texts into separate HTTP responses
• Set Cache-Control
    • public
    • private
    • no-cache
• Set Expires
    • Depending on how long the response should stay cached
• Set ETag
    • If the response is big and does not change too often
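Why ETag pays off for big responses that rarely change can be shown with a small sketch of a conditional GET: when the validator sent by the client still matches, the server answers 304 with an empty body instead of resending the content. The `respond` and `make_etag` functions are hypothetical and not tied to any web framework; request headers are modelled as a plain dict.

```python
import hashlib


def make_etag(body):
    """Derive a quoted validator from the body (assumption: content hash)."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]


def respond(request_headers, body):
    """Return (status, headers, payload) for a GET of body."""
    etag = make_etag(body)
    # no-cache lets caches store the response but forces revalidation.
    headers = {"ETag": etag, "Cache-Control": "private, no-cache"}
    candidates = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        # If-None-Match uses weak comparison, so the W/ prefix is ignored.
        candidates.append(tag[2:] if tag.startswith("W/") else tag)
    if etag in candidates or "*" in candidates:
        return 304, headers, b""
    return 200, headers, body


body = b"x" * 100000
status, headers, payload = respond({}, body)
print(status, len(payload))
status, _, payload = respond({"If-None-Match": headers["ETag"]}, body)
print(status, len(payload))
```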

Advanced Features to be done
• Detect cache invalidation time from snapshots
• SSL support
• Wait for all AJAX calls
• Selenium scripting
    • Authenticated URLs
    • Full feature sequence
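The first item, detecting invalidation time, is listed as future work, so the following is only one possible starting point and not part of the presented code. It assumes snapshots are available as `(unix_time, content)` pairs sorted by time, and reports the shortest time observed between two consecutive content changes; the function name is hypothetical.

```python
def shortest_change_interval(snapshots):
    """Shortest observed time between two consecutive content changes.

    snapshots is a list of (unix_time, content) pairs sorted by time.
    Returns None when fewer than two changes were observed.
    """
    # Times at which the content differs from the previous snapshot.
    change_times = [
        t for (t, content), (_, previous) in zip(snapshots[1:], snapshots)
        if content != previous
    ]
    gaps = [later - earlier for earlier, later in zip(change_times, change_times[1:])]
    return min(gaps) if gaps else None


# Hypothetical snapshots taken every 10 to 20 seconds.
history = [(0, "a"), (10, "a"), (20, "b"), (30, "b"), (50, "c"), (60, "d")]
print(shortest_change_interval(history))
```

The result can be no finer than the sampling interval, so it is best read as a rough upper bound for a max-age value rather than an exact invalidation time.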

Summary
• If cacheable areas were not identified before development, this code could save time and effort in finding them
• Cacheable areas need to be analyzed to choose the best caching method (server cache, CDN, browser cache)
• Refactoring to maximize the amount of cacheable data is the next step

Q & A

Thank you! david.r.elfi@intel.com @elfoTech
