DM221: Build a Productive 24x7 Database Operation Infrastructure
• George Wang, Principal Consultant, Sybase Inc., zwang@sybase.com
• Melinda Meyers, Senior DBA, America Online Inc., mmeyersm@aol.com

Agenda
• AOL Business
• Challenge
• Architecture
• Future Direction
• Q & A
AOL Business
• World’s leader in interactive services, Web brands, Internet technologies, and e-commerce services
• AOL Mission Statement: “To build a global medium as central to people’s lives as telephone or television…and even more valuable.”
• AOL Vision Statement: “To build an interactive medium that improves the lives of people and benefits society as no other medium before it.”
AOL Business
• World’s #1 Web-based communication portal
• World’s #1 internet online service
• #1 value brand internet online service
• 1 of the top 5 web sites
• #1 local content network and community guide
• Nation’s #1 movie guide and ticketing service
AOL Business
[Graph: ASE and RS]
Note: The number of ASE and RS on the graph does not indicate production deployment.
Challenge
• Explosive growth
• Large-scale distributed deployment
• 24x7 operation
• Heterogeneous environment
  - Mixed versions of OS
  - Mixed versions of ASE and RS
• Dynamic configuration
• Staff
Architecture
• Operation
  - System monitoring
  - Problem detection
  - Maintenance
  - Performance analysis
• Repository
• Automation
• Notification
Operation – Minimize Down Time
• Standardization
  - Installation
  - Configuration
  - Procedures
• High-availability
• Fast response
• History analysis
• Proactive action
Operation Chain of Escalation
[Diagram: NOC SA On-call Designated SA Group]
System Monitoring: Ping
• ASE
  - Responding
  - Connectivity to BAK
• RS
  - Responding
  - Health of all threads
  - Stable device
[Diagram labels: ASE, RS, ASE, BAK, BAK]
System Monitoring: Data and Log Space
[Diagram: usage scale from 0 to 100 with thresholds at 50 (Warning), 75 (Warning), and 90 (Alarm)]
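The threshold scale on this slide can be sketched as a small classification function. This is a minimal sketch under stated assumptions, not the presenters' implementation: the name `classify_space_usage`, the percent-used input, and the choice of `>=` at each boundary are hypothetical. Only the 50/75/90 levels and their Warning/Warning/Alarm labels come from the slide.

```python
from typing import Optional, Tuple

# Levels from the slide: 50% and 75% raise a warning, 90% raises an alarm.
# Ordered from highest to lowest so the first match is the highest level crossed.
THRESHOLDS = [(90.0, "alarm"), (75.0, "warning"), (50.0, "warning")]


def classify_space_usage(percent_used: float) -> Optional[Tuple[float, str]]:
    """Return (threshold, severity) for the highest threshold crossed, or None."""
    if not 0.0 <= percent_used <= 100.0:
        raise ValueError("percent_used must be between 0 and 100")
    for threshold, severity in THRESHOLDS:
        # Assumption: a threshold counts as crossed when usage reaches it.
        if percent_used >= threshold:
            return threshold, severity
    return None
```

Returning the threshold along with the severity lets a caller tell the two warning levels apart, for example to notify only when usage moves from the 50% band into the 75% band.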
Problem Detection: LogChecker
• Rule-based
• Check ASE errorlog
  - Detect error messages
  - Filter out informational messages
• Check RS errorlog
  - Capture message tags
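A rule-based checker of the kind listed on this slide could look roughly like the following. This is an illustrative sketch only: the slides do not show LogChecker's rules or code, so the rule list, the `check_log` function, and the sample message patterns are all assumptions. The first matching rule decides whether a line is reported or filtered out.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical rules, evaluated in order; the first match wins.
# The patterns are illustrative and are not taken from the presentation.
RULES = [
    (re.compile(r"Error: \d+, Severity: \d+"), "alert"),
    (re.compile(r"\b(started|completed)\b", re.IGNORECASE), "ignore"),
]


def check_log(lines: Iterable[str], rules=RULES, default: str = "alert") -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that should be reported."""
    findings = []
    for number, line in enumerate(lines, start=1):
        action = default  # lines matching no rule fall back to the default action
        for pattern, rule_action in rules:
            if pattern.search(line):
                action = rule_action
                break
        if action == "alert":
            findings.append((number, line.rstrip("\n")))
    return findings
```

Defaulting unmatched lines to `alert` is a conservative design choice: a message nobody has written a rule for is surfaced rather than silently dropped, at the cost of more noise until the rule set matures.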
Problem Detection: RS Heartbeat & Latency
• Program flow
  - Insert a row at Time A
  - Detect the row at Time A+latency
  - Alarm if latency > threshold
• Detect health of RS
• Latency analysis
[Diagram labels: ASE, RS, ASE]
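The heartbeat flow on this slide (insert a row at Time A, detect it at Time A+latency, alarm if latency exceeds a threshold) can be sketched as below. The function name `heartbeat_status`, the `pending` state for a row that has not arrived yet, and the use of Python `datetime` values are assumptions for illustration; the slides do not show the actual program.

```python
from datetime import datetime, timedelta
from typing import Optional


def heartbeat_status(
    inserted_at: datetime,
    detected_at: Optional[datetime],
    threshold: timedelta,
    now: datetime,
) -> str:
    """Classify one heartbeat row as 'ok', 'pending', or 'alarm'.

    inserted_at: Time A, when the row was inserted at the source.
    detected_at: when the row was seen at the destination, or None if not yet seen.
    """
    if detected_at is None:
        # Row has not arrived: alarm once we have waited longer than the threshold.
        waited = now - inserted_at
        return "alarm" if waited > threshold else "pending"
    latency = detected_at - inserted_at
    return "alarm" if latency > threshold else "ok"
```

Both timestamps must come from clocks that agree with each other. If Time A is stamped on one host and detection on another, clock skew is added to the measured latency.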
Maintenance: Database & Transaction Dump
• Dump to file system
• Copy system tables
• Unix backup
• Monitor capacity
Maintenance: Miscellaneous
• Threshold of transaction log: 50%, 75% and 90%
• Update statistics
• Rotate errorlogs
• Database consistency check
Performance Analysis: Performance Data Collection
• ASE Monitor & Historical Server
  - CPU utilization
  - Stored procedure execution
  - IO activity
  - Cache activity
  - Object activity
  - Server status
  - Locking
Performance Analysis: Performance Data Analysis
• Exception
• Trend
• Capacity
• Load
• Benchmark
Repository
• Server inventory
• Maintenance log
• Problem history
• Performance warehouse
Server Inventory
• Server characteristics: name, hostname, SA, CPU, version, OS, PM, memory, type, subsystem, POC, connection
• Manual update via web pages
• Automatic update via collection agents
Maintenance Log
• Maintenance history
  - Installation
  - Upgrade
  - Bounce
  - OS maintenance
  - Configuration
  - Space allocation
• Update & query via web pages
• Help diagnose problems
Problem History
• Problem history
  - Symptom
  - Diagnostics
  - Solution
  - Case tracking
  - Workaround
• Update & query via web pages
• Benefits
  - Diagnose similar problems
  - Share knowledge and skills
Performance Warehouse
• Automatic data collection
• Automatic data summary
• Analysis model
  - Dynamic – On-demand analysis on the Web
  - Static – Pre-defined and complex data model
• Delivery via the Web
Operation & Repository
• SI: Server Inventory
• ML: Maintenance Log
• PH: Problem History
• PW: Performance Warehouse
Job Scheduling: Unix Cron
• Benefits
  - Simple
• Drawbacks
  - Failure detection
  - Job stream & dependency
  - Standalone vs. Distributed environment
Job Scheduling: Autosys
• Scheduling and operations automation for distributed environments
• Benefits
  - Centralized job scheduling & management
  - Flexible job scheduling and dependency
  - Uninterrupted job processing
  - Failure detection
  - Fault tolerance
Autosys
[Diagram labels: Ethernet, Client, Server, Remote Agent, Polls, Autosys Database, Event Processor]
• Event Processor
  - event found
  - starting conditions met
  - start up remote agent
• Remote Agent
  - start up
  - run job
  - return job status
  - exit
Job @ Autosys
• Job location
Job @ Autosys
• Start condition
• Min & max run time
• Automatic restart
• Grouping
• Dependency
• Standard input & output redirection
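The dependency and start-condition ideas on this slide can be illustrated with a small generic sketch. This is not Autosys code or its job definition syntax. The `run_jobs` function, the status strings, and the job names in the usage note are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable


def run_jobs(
    jobs: Dict[str, Callable[[], bool]],
    dependencies: Dict[str, Iterable[str]],
) -> Dict[str, str]:
    """Run each job after its dependencies; skip jobs whose dependencies did not succeed.

    jobs: job name -> callable returning True on success.
    dependencies: job name -> names of jobs that must succeed first.
    Every name used in dependencies must also be a key of jobs.
    """
    graph = {name: set(dependencies.get(name, ())) for name in jobs}
    status: Dict[str, str] = {}
    # static_order() yields each job only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        if all(status.get(dep) == "success" for dep in graph[name]):
            status[name] = "success" if jobs[name]() else "failure"
        else:
            status[name] = "skipped"
    return status
```

For example, with a hypothetical chain in which `copy_dump` depends on `dump_db` and `check_db` depends on `copy_dump`, a failure in `copy_dump` leaves `check_db` marked `skipped` instead of running against incomplete input. That is the failure-detection and dependency behavior the Unix cron slide lists as a drawback of standalone cron jobs.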
Email Notification
[Diagram labels: ASE, RS, ASE, Monitoring Host, Autosys, EMAIL, NOC SA On-call Designated SA Group]
Email Notification
• Benefits
  - Easy to configure
• Drawbacks
  - Large volume
  - Duplicate
  - Hard to prioritize
  - Broken escalation chain
  - Difficult to identify problem
Netcool/OMNIbus
[Diagram labels: Probes, Object Server, ASE, RS, ASE, Monitoring Host, Autosys, NOC SA On-call Designated SA Group]
Netcool/OMNIbus
• Real-time event monitoring & management
• Consolidates
• Integrates
• Configurable
• Meaningful
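Event consolidation, as opposed to one email per occurrence, can be sketched generically as follows. This does not use or describe the Netcool/OMNIbus API. The `Event` class, the `consolidate` function, the `(source, message)` deduplication key, and the numeric severity scale are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Event:
    source: str
    message: str
    severity: int  # assumption: a higher number means more severe
    count: int = 1


def consolidate(raw_events: Iterable[Tuple[str, str, int]]) -> List[Event]:
    """Merge repeated (source, message) events and sort the most severe first."""
    merged: Dict[Tuple[str, str], Event] = {}
    for source, message, severity in raw_events:
        key = (source, message)
        if key in merged:
            # Duplicate: bump the counter and keep the highest severity seen.
            merged[key].count += 1
            merged[key].severity = max(merged[key].severity, severity)
        else:
            merged[key] = Event(source, message, severity)
    return sorted(merged.values(), key=lambda e: (-e.severity, -e.count))
```

Collapsing repeats into one record with a counter reduces volume and duplicates, and sorting by severity gives the operator a priority order, which a stream of individual emails does not.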
[Table: Infrastructure Component vs. Web-enabled; components listed: Operation, Repository, Automation, Notification; one entry marked "Partially"]
Infrastructure
• Productivity
• Availability
• Reliability
• Scalability
Business Impact
• Improved quality of online services
• Overall system availability 99.6% in 1999
• Strong subscriber growth (10M to 20M in 2 years)
• Strong revenue growth (500M to 1.3B in 2 years)
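To put the 99.6% availability figure in concrete terms, the short calculation below converts an availability percentage into hours of downtime over a period. The function name `downtime_hours` and the assumption that the figure covers a full 365-day year (8,760 hours) are illustrative; the slide does not say how availability was measured.

```python
def downtime_hours(availability_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime implied by an availability percentage over a period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100.0 - availability_percent) / 100.0 * period_hours


# 99.6% over a 365-day year leaves 0.4% of 8,760 hours, roughly 35 hours.
hours_1999 = downtime_hours(99.6)
```

Under that assumption, 99.6% corresponds to about 35 hours of total unavailability across the year.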
Future Direction
• Web-based infrastructure
• Knowledge-centric event analysis
• Performance-based early detection
• Automated agent
• Problem auto-correction
• Enterprise management integration
Conclusion
• Questions & Answers
• Contact Information
  - George Wang <zwang@sybase.com>
  - Mendy Meyers <mmeyersm@aol.com>
Thank You