
CSE 190: Internet E-Commerce

Learn about the essential operations required to keep a web site up and running, including deployment processes, monitoring techniques, maintenance procedures, load testing, browser compliance, and more.

marieh

Presentation Transcript


  1. CSE 190: Internet E-Commerce Lecture 14: Operations

  2. Operations
     • Everything it takes to keep a web site up and running, 24x7
     • Deployment process
     • Monitoring (SNMP)
     • Build system
     • Link rot
     • Maintenance window
     • Load testing
     • Browser compliance
     • Log rotation
     • Database backups
     • Disk failure
     • Router failure
     • Robots
     • Staffing
     • Data centers
     • Expense of running a high-availability site is comparable to running a physical storefront

  3. Deployment Process
     • Proceeds in three phases
       • Development
         • Within corporation, not accessible outside
       • Stage
         • Within internet environment
         • UAT run here
         • Only operations staff may access
       • Live
         • Accessible to outside world

  4. Monitoring
     • SNMP (Simple Network Management Protocol)
       • Used to monitor both hardware and software
       • Provides: counters, values, triggers, statistics
       • Remote control of services
       • Information stored in MIB (Management Information Base)
       • RMON sometimes used as alternative to SNMPv2
     • Software
       • HP OpenView
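As an illustration of the "counters" and "triggers" items above, here is a minimal sketch of how a monitoring script might turn two samples of a 32-bit SNMP counter into a rate and compare it with a threshold. It does not talk to a real SNMP agent: the sample values, the polling interval, and the function names are all assumptions made for the example. The one protocol fact it relies on is that SNMP Counter32 values wrap around at 2^32.

```python
COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap back to 0 at 2^32


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two samples, assuming at most one wrap-around."""
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int,
                    interval_seconds: float) -> float:
    """Average counter increase per second over one polling interval."""
    return counter_delta(previous, current) / interval_seconds


def trigger_fired(rate: float, threshold: float) -> bool:
    """Hypothetical trigger rule: fire when the rate exceeds the threshold."""
    return rate > threshold


if __name__ == "__main__":
    # Hypothetical samples of an error counter taken 60 seconds apart.
    rate = rate_per_second(previous=2 ** 32 - 10, current=590,
                           interval_seconds=60)
    print(rate, trigger_fired(rate, threshold=5.0))
```

Taking the difference modulo 2^32 keeps the rate correct across a single wrap; a counter that wraps more than once between polls cannot be recovered this way, so the polling interval has to be short enough to rule that out.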

  5. Maintenance Window
     • Installation
       • Standard: J2EE standard web service descriptor (XML file with tarball of files)
       • InstallShield
       • Custom installation scripts
     • Upgrades
       • Defined time on Friday or weekend to upgrade site, posted on web site
       • Process:
         • Front page linked to ‘Site down’
         • Load balancer redirected if appropriate
         • Application stops accepting new clients
         • (Pause) Application terminates all active sessions
         • Application upgraded
         • Sanity checks performed
         • Servers rebooted
         • Load balancer restored
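The upgrade process above is an ordered checklist, which can be sketched as a small runner that executes each step in turn. This is only an illustration: the step actions are placeholder stubs, and stopping at the first failed step (leaving the site on its 'Site down' page) is an assumption, not something the slide specifies.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_maintenance(steps: List[Step]) -> List[str]:
    """Run steps in order and return a log; stop at the first failure."""
    log: List[str] = []
    for name, action in steps:
        succeeded = action()
        log.append(f"{name}: {'ok' if succeeded else 'FAILED'}")
        if not succeeded:
            break  # assumption: do not continue past a failed step
    return log


def always_ok() -> bool:
    """Placeholder action; a real script would do the work and report success."""
    return True


# Step names follow the slide; every action here is a hypothetical stub.
UPGRADE_STEPS: List[Step] = [
    ("link front page to 'Site down'", always_ok),
    ("redirect load balancer", always_ok),
    ("stop accepting new clients", always_ok),
    ("terminate active sessions", always_ok),
    ("upgrade application", always_ok),
    ("run sanity checks", always_ok),
    ("reboot servers", always_ok),
    ("restore load balancer", always_ok),
]

if __name__ == "__main__":
    for entry in run_maintenance(UPGRADE_STEPS):
        print(entry)
```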

  6. Link Rot
     • Link rot: the continual process by which links become invalid over time
     • Tracked with custom tools
     • Best practice: Pages have permanent URLs
     • Referral field:
       • Tracking this in logs shows who’s linking to what URL on your site
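One way such a custom tool might use the referral field (the HTTP `Referer` header) is sketched below. It assumes the web server writes the common "combined" access-log layout, in which the request line, status code, response size, and quoted referrer appear in that order. The sample log lines, host names, and function names are invented for the example.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches: "METHOD /path PROTOCOL" STATUS SIZE "REFERER"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def inbound_links(lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each requested path to a count of the referrers linking to it."""
    result: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("referer") not in ("", "-"):
            result[match.group("path")][match.group("referer")] += 1
    return result


def dead_inbound_links(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """(path, referrer) pairs for requests that ended in 404: rotted links."""
    dead = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if (match and match.group("status") == "404"
                and match.group("referer") not in ("", "-")):
            dead.add((match.group("path"), match.group("referer")))
    return sorted(dead)


# Hypothetical log excerpt in combined log format.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /products.html HTTP/1.0" '
    '200 2326 "http://partner.example/links.html" "Mozilla/4.08"',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /old/catalog.html HTTP/1.0" '
    '404 512 "http://partner.example/links.html" "Mozilla/4.08"',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /products.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/4.08"',
]

if __name__ == "__main__":
    print(dict(inbound_links(SAMPLE_LOG)))
    print(dead_inbound_links(SAMPLE_LOG))
```

The 404 report is the part that addresses link rot directly: it lists external pages that still point at URLs the site no longer serves, which is the argument for keeping URLs permanent.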

  7. Load Testing
     • Network load (60% bandwidth max)
       • Average page size (~20-30 KB)
     • CPU load: Occurs at at least three levels
       • HTTP level
       • Application level
       • DB query level
     • Metrics: maximum number of simultaneous users, latency vs. users
     • Memory usage (256 MB to 1 GB per machine)
     • Disk I/O load
       • 1 Gb per machine typical
     • Tools
       • Mercury Interactive: WinRunner
       • Segue: SilkTest
       • Rational: SiteLoad
       • Microsoft: WCAT
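The network-load figures above combine into a simple capacity estimate: cap link utilization at 60% and divide the usable bandwidth by the average page size. The sketch below works through that arithmetic. The 10 Mbit/s link speed and the 25,000-byte page are assumed values chosen for illustration (the page size sits inside the slide's 20-30 KB range), and protocol overhead is ignored.

```python
def max_pages_per_second(link_bits_per_second: float,
                         avg_page_bytes: float,
                         max_utilization: float = 0.60) -> float:
    """Pages per second the link can carry at the utilization ceiling."""
    usable_bits_per_second = link_bits_per_second * max_utilization
    bits_per_page = avg_page_bytes * 8
    return usable_bits_per_second / bits_per_page


if __name__ == "__main__":
    # Assumed inputs: a 10 Mbit/s link and a 25,000-byte average page.
    # 10,000,000 * 0.60 = 6,000,000 usable bits/s; 25,000 * 8 = 200,000 bits/page.
    pages = max_pages_per_second(10_000_000, 25_000)
    print(f"{pages:.1f} pages per second")  # roughly 30
```

A load test that pushes page throughput past this estimate is measuring the network rather than the CPU or database, so the estimate is a useful sanity bound when reading latency-vs-users results.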

  8. Browser Compatibility
     • Cost of testing proportional to the number of platforms you’re compatible with
     • The same product isn’t the same on different operating systems
       • E.g. IE4.5 isn’t the same on Mac vs. Windows
     • Incompatible DOMs between MS, Netscape, Mozilla
     • Browser archive
       • http://browsers.evolt.org/

  9. Robots
     • Robots: Automatically traverse web pages to retrieve documents, link structure, data
     • Used for:
       • Indexing
       • HTML validation
       • Link validation
       • Mirroring
     • Problems:
       • Too much rapid access from single IP
       • May be indexing dynamic, obsolete data
     • Robot exclusion file:

         # /robots.txt file for mysite.com
         User-agent: webcrawler
         Disallow:

         User-agent: lycra
         Disallow: /

         User-agent: *
         Disallow: /jsp
         Disallow: /logs
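To check how a well-behaved robot would read the exclusion file above, the rules can be fed to Python's standard-library `urllib.robotparser`. This is a sketch for experimenting with the rules, not a statement of how any particular crawler is implemented. The agent name `otherbot` and the test URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The exclusion file from the slide, one directive per line.
ROBOTS_TXT = """\
# /robots.txt file for mysite.com
User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow: /

User-agent: *
Disallow: /jsp
Disallow: /logs
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("webcrawler", "http://mysite.com/jsp/cart.jsp"),   # empty Disallow: all allowed
        ("lycra", "http://mysite.com/index.html"),          # Disallow: / blocks everything
        ("otherbot", "http://mysite.com/logs/access.log"),  # falls under the * rules
        ("otherbot", "http://mysite.com/index.html"),
    ]
    for agent, url in checks:
        print(agent, url, parser.can_fetch(agent, url))
```

An empty `Disallow:` grants full access, `Disallow: /` blocks the whole site, and any robot not named explicitly falls through to the `*` record. The file is advisory only: it does nothing against a robot that chooses to ignore it.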

  10. Failure Models
      [Figure: failure rate curves. Hardware failure rate: burn in, useful life, wear out. Software failure rate: integration & test, useful life, obsolete.]
      • Mean Time To Failure (MTTF) = average amount of time the system is up
      • Mean Time Between Failures (MTBF) = average amount of time between failures
      • Mean Time To Repair (MTTR) = average amount of time the system is down after it fails, counting only active repair time (diagnostics and repair)
      • Mean Down Time (MDT) = average amount of time the system is down after it fails, counting active repair time + preventive maintenance + logistics time (time spent waiting for personnel, etc.)
      • Intrinsic availability = MTTF / (MTTF + MTTR)
      • Operational availability = MTBF / (MTBF + MDT)
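The two availability ratios above are easy to evaluate numerically. The sketch below does so and converts the result into expected downtime per year. The input figures (999 hours up, 1 hour of active repair, 4 hours of total down time) are hypothetical numbers chosen to make the arithmetic easy to follow.

```python
HOURS_PER_YEAR = 24 * 365


def intrinsic_availability(mttf: float, mttr: float) -> float:
    """MTTF / (MTTF + MTTR): counts only active repair time as downtime."""
    return mttf / (mttf + mttr)


def operational_availability(mtbf: float, mdt: float) -> float:
    """MTBF / (MTBF + MDT): also counts maintenance and logistics time."""
    return mtbf / (mtbf + mdt)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures, all in hours.
    a_intrinsic = intrinsic_availability(mttf=999, mttr=1)
    a_operational = operational_availability(mtbf=996, mdt=4)
    print(f"intrinsic:   {a_intrinsic:.4f}, "
          f"{downtime_hours_per_year(a_intrinsic):.2f} h/year down")
    print(f"operational: {a_operational:.4f}, "
          f"{downtime_hours_per_year(a_operational):.2f} h/year down")
```

With these inputs the intrinsic figure is 99.9% (about 8.8 hours of downtime per year) and the operational figure is 99.6% (about 35 hours), which shows how much of real-world downtime can come from waiting and scheduled maintenance rather than from the repair itself.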

  11. When things go wrong
      • Network operations
        • Software recovers from common failures
        • Network staff paged by email if server not available (via SNMP)
        • Usually rotating assignment
      • Application developers may be called in if restarting servers, etc. fails completely, and only if it doesn’t look like a network problem

  12. Data Centers
      • Data centers: Host your machines on their own premises
        • Also called “colocation”
      • Features
        • Security: controlled entrance, exit
        • Weather: maintained temperature, humidity
        • Power: Backup power, available circuits
        • Bandwidth: OC-192 connections
        • Monitoring: 24/7 staff, may reboot misbehaving machines
      • Machines typically arranged in “cages”; 1U, 2U machines
        • Server blades
      • Examples
        • NTT / Verio
        • Exodus / Global Crossing
