190 likes | 213 Views
Learn how to identify cacheable areas in web applications for improved performance and scalability. Utilize Python, Selenium, and Twisted to capture snapshots, analyze text similarities, and implement effective caching strategies.
E N D
Finding cacheable areas in your Web Site using Python and Selenium David Elfi Intel
What does this session talk about? • Python • Performance • Web applications • Hands on session
Caching • Hot topic in web applications because • Better response time across geo distribution • Better scalability • Difficult to focus at development time • Help developers to improve response time
What to do • Find text areas repeated in a web resource (page, json response, other dynamic resources) in order to split them in different responses • Use Cache-Control, Expires and ETag HTTP Headers for caching control • Identify all the dependencies for a given URL • Even AJAX calls
Proposed Solution • Take snapshots in different points in time • Use selenium for: • Download ALL the content • Needs to run JS code for Ajax • Compare the snapshots looking for similarities • Split the similar text in different HTTP responses
Solution – Snapshots • Selenium through a forward proxy Proxy Twisted Web Server Store Content Data
Running Selenium – Snapshots • Call Selenium from Python • Use of WebDriver >>> from selenium import webdriver >>> >>> br = webdriver.Firefox() >>> >>> br.get(“http://www.intel.com”) >>> >>> br.close()
Twisted Proxy - Snapshots class CacheProxyClient(proxy.ProxyClient): defconnectionMade(self): # Connection Made. Prepare object properties defhandleHeader(self, key, value): # Save response header. defhandleResponsePart(self, buf): # Store response data. defhandleResponseEnd(self): # Finished response transmission. Store it class CacheProxyClientFactory(proxy.ProxyClientFactory): protocol = CacheProxyClient class CacheProxyRequest(proxy.ProxyRequest): protocols = dict(http=CacheProxyClientFactory) class CacheProxy(proxy.Proxy): requestFactory = CacheProxyRequest class CacheProxyFactory(http.HTTPFactory): protocol = CacheProxy
Selenium + Twisted - Snapshots • Run Selenium using Proxy >>> from selenium import webdriver >>> fp = webdriver.FirefoxProfile() >>> fp.set_preference("network.proxy.type", 1) >>> fp.set_preference("network.proxy.http", "localhost") >>> fp.set_preference("network.proxy.http_port", 8080) >>> br = webdriver.Firefox(firefox_profile=fp)
Selenium + Twisted - Snapshots • Configure Twisted and run Selenium in an internal Twisted thread from twisted.internetimport endpoints, reactor endpoint = endpoints.serverFromString(reactor, "tcp:%d:interface=%s" % (8080, "localhost")) d = endpoint.listen(CacheProxyFactory()) reactor.callInThread( runSelenium, url_str) reactor.run()
Comparison method Output = n = 2 = 1 2 3 n 1
''' Equal sequence searcher ''' defmatchingString(s1, s2): '''Compare 2 sequence of strings and return the matching sequences concatenated''' from difflibimport SequenceMatcher matcher = SequenceMatcher(None, s1, s2) output = "" for (i,_,n) in matcher.get_matching_blocks(): output += s1[i:i+n] return output defmatchingStringSequence( seq ): ''' Compare between pairs up to final result ''' try: matching = seq[0] for s in seq[1:len(seq)]: matching = matchingString(matching, s) return matching except TypeError: return "" Comparison
Next Steps • Split similar texts in different HTTP responses • Set Cache-Control • Public • Private • No-cache • Set Expires • Depending on the time it should be cache • Set ETag • If response is big and does change too often
Advanced Features to be done • Detect cache invalidation time from snapshots • SSL supports • Wait for all AJAX calls • Selenium Scripting • Authenticated URLs • Full feature sequence
Summary • If caching areas has not been identified previous to development, this code could save time and effort in doing so • Caching areas need to be analyzed for looking best cache method (server cache, CDN, browser caching) • Refactoring for maximizing caching data is the next step
Thank you! david.r.elfi@intel.com @elfoTech