
Finding cacheable areas in your Web Site using Python and Selenium

Learn how to identify cacheable areas in web applications for improved performance and scalability. Utilize Python, Selenium, and Twisted to capture snapshots, analyze text similarities, and implement effective caching strategies.

rachelk


Presentation Transcript


  1. Finding cacheable areas in your Web Site using Python and Selenium • David Elfi, Intel

  2. What does this session talk about? • Python • Performance • Web applications • Hands-on session

  3. Caching • Hot topic in web applications because of • Better response time across geographic distribution • Better scalability • Difficult to focus on at development time • Help developers improve response time

  4. Source: Steve Souders, "Cache is King!"

  5. What to do • Find text areas repeated in a web resource (page, JSON response, other dynamic resources) in order to split them into separate responses • Use the Cache-Control, Expires and ETag HTTP headers for caching control • Identify all the dependencies for a given URL • Even AJAX calls
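
As a rough illustration of the three headers named on this slide, the sketch below builds them for one response body. The function name `caching_headers`, its parameters, and the choice of a truncated SHA-256 digest as the ETag value are assumptions made for this example; they are not taken from the talk.

```python
import hashlib
from email.utils import formatdate


def caching_headers(body, max_age, now, public=True):
    """Build Cache-Control, Expires and ETag values for one response.

    body    -- response payload as bytes
    max_age -- lifetime in seconds
    now     -- current time as a Unix timestamp (passed in for testability)
    """
    scope = "public" if public else "private"
    return {
        # Modern caches prefer max-age; Expires covers older ones.
        "Cache-Control": f"{scope}, max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),
        # A strong ETag derived from the content (hypothetical scheme).
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
    }


headers = caching_headers(b"<footer>Intel</footer>", max_age=3600, now=0.0)
for name, value in headers.items():
    print(f"{name}: {value}")
```

A fragment that never changes between snapshots could take a long `max_age`, while a per-user fragment would pass `public=False`.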

  6. Proposed Solution • Take snapshots at different points in time • Use Selenium to: • Download ALL the content • Run the JS code needed for Ajax calls • Compare the snapshots looking for similarities • Split the similar text into separate HTTP responses

  7. Solution – Snapshots • Selenium through a forward proxy • [Diagram labels: Proxy (Twisted), Web Server, Store Content Data]

  8. Running Selenium – Snapshots • Call Selenium from Python • Use of WebDriver

         >>> from selenium import webdriver
         >>> br = webdriver.Firefox()
         >>> br.get("http://www.intel.com")
         >>> br.close()

  9. Twisted Proxy - Snapshots

         from twisted.web import http, proxy

         class CacheProxyClient(proxy.ProxyClient):
             def connectionMade(self):
                 # Connection made. Prepare object properties.
                 proxy.ProxyClient.connectionMade(self)

             def handleHeader(self, key, value):
                 # Save response header.
                 proxy.ProxyClient.handleHeader(self, key, value)

             def handleResponsePart(self, buf):
                 # Store response data.
                 proxy.ProxyClient.handleResponsePart(self, buf)

             def handleResponseEnd(self):
                 # Finished response transmission. Store it.
                 proxy.ProxyClient.handleResponseEnd(self)

         class CacheProxyClientFactory(proxy.ProxyClientFactory):
             protocol = CacheProxyClient

         class CacheProxyRequest(proxy.ProxyRequest):
             # Twisted on Python 3 keys this mapping by bytes.
             protocols = {b"http": CacheProxyClientFactory}

         class CacheProxy(proxy.Proxy):
             requestFactory = CacheProxyRequest

         class CacheProxyFactory(http.HTTPFactory):
             protocol = CacheProxy
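
The slide leaves the "store" step as comments. One possible shape for it is sketched below: an in-memory store that joins the chunks seen by `handleResponsePart` and files each finished response under its URL. The `SnapshotStore` class, its methods, and the example URL are hypothetical and are not part of Twisted or of the original code.

```python
import time


class SnapshotStore:
    """Hypothetical in-memory store: URL -> list of (timestamp, headers, body)."""

    def __init__(self):
        self._snapshots = {}

    def add(self, url, headers, chunks, timestamp=None):
        """Record one finished response, joining the chunks it arrived in."""
        ts = time.time() if timestamp is None else timestamp
        entry = (ts, dict(headers), b"".join(chunks))
        self._snapshots.setdefault(url, []).append(entry)

    def bodies(self, url):
        """Return every stored body for a URL, oldest first."""
        return [body for _, _, body in self._snapshots.get(url, [])]


store = SnapshotStore()
# A proxy client could collect chunks as they arrive and call add() at the end.
store.add("http://example.com/", {"Content-Type": "text/html"},
          [b"<h1>", b"Hi</h1>"], timestamp=0)
store.add("http://example.com/", {"Content-Type": "text/html"},
          [b"<h1>Hi</h1>"], timestamp=60)
print(store.bodies("http://example.com/"))
```

Keeping the timestamp with each body makes it possible to compare snapshots taken at different points in time later on.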

  10. Selenium + Twisted - Snapshots • Run Selenium using Proxy

         >>> from selenium import webdriver
         >>> fp = webdriver.FirefoxProfile()
         >>> fp.set_preference("network.proxy.type", 1)
         >>> fp.set_preference("network.proxy.http", "localhost")
         >>> fp.set_preference("network.proxy.http_port", 8080)
         >>> br = webdriver.Firefox(firefox_profile=fp)

  11. Selenium + Twisted - Snapshots • Configure Twisted and run Selenium in an internal Twisted thread

         from twisted.internet import endpoints, reactor

         # Listen on localhost:8080 with the caching proxy from slide 9
         endpoint = endpoints.serverFromString(
             reactor, "tcp:%d:interface=%s" % (8080, "localhost"))
         d = endpoint.listen(CacheProxyFactory())
         # Run the Selenium session in a reactor-managed thread
         reactor.callInThread(runSelenium, url_str)
         reactor.run()

  12. All together running

  13. Comparison method • [Figure: snapshots 1, 2, 3, ..., n are compared and reduced to a single Output]

  14. Comparison

         ''' Equal sequence searcher '''
         from difflib import SequenceMatcher

         def matchingString(s1, s2):
             '''Compare two strings and return their matching blocks concatenated'''
             matcher = SequenceMatcher(None, s1, s2)
             output = ""
             for (i, _, n) in matcher.get_matching_blocks():
                 output += s1[i:i+n]
             return output

         def matchingStringSequence(seq):
             '''Fold the pairwise comparison over the sequence up to the final result'''
             try:
                 matching = seq[0]
                 for s in seq[1:]:
                     matching = matchingString(matching, s)
                 return matching
             except (TypeError, IndexError):
                 # Not a sequence of strings, or an empty sequence
                 return ""
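
To make the comparison concrete, here is a small variant applied to three made-up snapshots. It is a sketch, not the code from the slide: it compares lists of lines rather than raw characters so that matches align on line boundaries, and it passes `autojunk=False` because `difflib` otherwise treats very frequent items as junk once a sequence reaches 200 items. The names `matching_lines` and `common_lines`, and the snapshot contents, are assumptions for illustration.

```python
from difflib import SequenceMatcher


def matching_lines(a, b):
    """Return the lines of `a` that fall inside blocks also present in `b`."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    common = []
    for i, _, n in matcher.get_matching_blocks():
        common.extend(a[i:i + n])
    return common


def common_lines(snapshots):
    """Reduce the snapshots pairwise to the lines shared by all of them."""
    if not snapshots:
        return []
    common = snapshots[0].splitlines()
    for snapshot in snapshots[1:]:
        common = matching_lines(common, snapshot.splitlines())
    return common


# Three hypothetical snapshots of one page taken five minutes apart.
snapshots = [
    "<h1>Shop</h1>\n<p>Time: 10:00</p>\n<footer>Intel</footer>",
    "<h1>Shop</h1>\n<p>Time: 10:05</p>\n<footer>Intel</footer>",
    "<h1>Shop</h1>\n<p>Time: 10:10</p>\n<footer>Intel</footer>",
]
stable = common_lines(snapshots)
print(stable)  # ['<h1>Shop</h1>', '<footer>Intel</footer>']
```

The lines that survive the reduction are the candidates for caching; the line that drops out (the timestamp paragraph here) is the dynamic part that would go into a separate response.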

  15. Next Steps • Split similar texts into separate HTTP responses • Set Cache-Control • Public • Private • No-cache • Set Expires • Depending on how long it should be cached • Set ETag • If the response is big and does not change too often
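
The ETag bullet pays off through conditional requests: the client sends back the tag it holds in `If-None-Match`, and the server answers `304 Not Modified` without a body when the content is unchanged. The sketch below shows that exchange in plain Python. The functions `make_etag` and `respond` are hypothetical, and the comparison handles only a single tag, not the comma-separated lists or `*` that the header also allows.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers, body):
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    # no-cache lets the client store the response but forces revalidation.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # client copy is still valid
    return 200, headers, body


page = b"<ul><li>big, rarely changing catalogue</li></ul>"
status, headers, payload = respond({}, page)
print(status, len(payload))

# The second request revalidates with the tag from the first response.
status2, _, payload2 = respond({"If-None-Match": headers["ETag"]}, page)
print(status2, len(payload2))
```

For a large response that rarely changes, most revalidations end in the short 304 path, which is where the bandwidth saving comes from.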

  16. Advanced Features to be done • Detect cache invalidation time from snapshots • SSL support • Wait for all AJAX calls • Selenium Scripting • Authenticated URLs • Full feature sequence
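
The first item, detecting the invalidation time from snapshots, is listed as future work, so the following is only one way it might be approached. Given timestamped snapshots of a single resource, it records when the content was first seen to differ from the previous snapshot and takes the shortest gap between two such changes as a conservative lifetime. The function names and the sample data are assumptions, and the result can be no more precise than the sampling interval.

```python
def observed_change_times(snapshots):
    """Timestamps at which a snapshot first differs from the one before it.

    snapshots -- list of (timestamp_seconds, body), sorted by timestamp
    """
    changes = []
    for (_, previous), (ts, body) in zip(snapshots, snapshots[1:]):
        if body != previous:
            changes.append(ts)
    return changes


def estimate_max_age(snapshots):
    """Shortest gap between two observed changes, or None if unknown."""
    changes = observed_change_times(snapshots)
    if len(changes) < 2:
        return None  # not enough changes seen to estimate a lifetime
    return min(later - earlier for earlier, later in zip(changes, changes[1:]))


# Hypothetical samples of one fragment taken every 60 seconds.
samples = [(0, "a"), (60, "a"), (120, "b"), (180, "b"), (240, "b"), (300, "c")]
print(observed_change_times(samples))
print(estimate_max_age(samples))
```

Such an estimate could serve as a starting value for `max-age`, to be tightened or relaxed once real traffic is observed.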

  17. Summary • If caching areas have not been identified prior to development, this code could save time and effort in doing so • Caching areas need to be analyzed to find the best caching method (server cache, CDN, browser caching) • Refactoring to maximize cacheable data is the next step

  18. Q & A

  19. Thank you! david.r.elfi@intel.com @elfoTech
