Lecture 30: Logging TDT4285 Planlegging og drift av IT-systemer Spring 2010 Anders Christensen, IDI TDT4285 Planl&drift IT-syst

Definition
Logging is the generation and collection of short, informative messages, stored chronologically in a standardized format, describing events or states on a computer system.

Examples
• Sep 15 15:34:47 roma kernel: da0 at ahc0 bus 0 target 0 lun 0
• Sep 15 15:34:47 roma kernel: da0: <QUANTUM ATLAS10K2-TY184J DDD6> Fixed Direct Access SCSI-3 device
• Sep 15 15:34:47 roma kernel: da0: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
• Sep 15 15:34:47 roma kernel: da0: 17510MB (35860910 512 byte sectors: 255H 63S/T 2232C)
• Sep 15 15:34:47 roma kernel: da1 at ahc0 bus 0 target 2 lun 0
• Sep 15 15:34:47 roma kernel: da1: <QUANTUM ATLAS10K2-TY184J DDD6> Fixed Direct Access SCSI-3 device
• Sep 15 15:34:48 roma kernel: da1: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
• Sep 15 15:34:48 roma kernel: da1: 17510MB (35860910 512 byte sectors: 255H 63S/T 2232C)
• Sep 15 15:34:48 roma kernel: SMP: AP CPU #1 Launched!
• Sep 15 15:34:48 roma kernel: Mounting root from ufs:/dev/da0s1a
• Sep 15 15:34:45 roma ntpd[429]: ntpd 4.1.1b-a Mon Feb 23 18:29:46 GMT 2004 (1)
• Sep 15 15:34:45 roma ntpd[429]: kernel time discipline status 2040
• Sep 15 15:38:03 roma ntpd[429]: kernel time discipline status change 2041
• Sep 15 21:05:39 roma su: adf to root on /dev/ttyp1
• Sep 15 21:17:15 roma syslogd: exiting on signal 15
• Sep 15 21:17:15 roma syslogd: kernel boot file is /boot/kernel/kernel
• Sep 15 21:14:10 charybdis Forwarded from furu: last message repeated 5 times
• Sep 15 21:26:19 charybdis Forwarded from furu: adftest: [ID 702911 daemon.notice] teasdf asdf
• Sep 15 21:26:35 roma syslogd: exiting on signal 15
• Sep 15 21:26:46 roma syslogd: kernel boot file is /boot/kernel/kernel
• Sep 15 21:26:56 charybdis Forwarded from furu: adftest: [ID 702911 daemon.notice] teasdf asdf
• Sep 15 21:27:02 charybdis Forwarded from furu: adftest: [ID 702911 daemon.notice] teasdf asdf
• Sep 15 21:27:03 charybdis Forwarded from furu: adftest: [ID 702911 daemon.notice] teasdf asd

Monitoring and management
Logging is necessary for good management:
• Without logging: there is less opportunity to detect faults, and thus to find and correct them before the users complain.
• With logging: you can detect a fault before it creates problems, before the users notice, and maybe even before the event occurs.

Logging, monitoring and statistics
[Diagram. Labels: Log; Batch: Aggregation to Statistics, Use, Analysis; Real-time: Monitoring (Interprets and filters), Maintains Present state, Generates Events, Triggering Alarms]

Methods for logging
• Polling: data is collected repeatedly and at regular intervals.
  • Qualitatively: to test functionality
  • Quantitatively: to measure performance
• Events: the log entry is a consequence of an event that happened.
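
The two methods can be contrasted in a minimal sketch. The names `poll`, `Component`, `check` and `measure` are hypothetical and chosen only for illustration: the poller asks on a schedule, both qualitatively (does it work?) and quantitatively (how fast is it?), while the component writes a log entry only when an event happens.

```python
import time


def poll(check, measure, rounds, interval=0.0):
    """Polling: ask repeatedly, at a regular interval, and log every answer."""
    entries = []
    for _ in range(rounds):
        ok = check()        # qualitative: does the service work?
        value = measure()   # quantitative: how well does it perform?
        status = "ok" if ok else "FAIL"
        entries.append(f"poll status={status} response_ms={value:.1f}")
        time.sleep(interval)
    return entries


class Component:
    """Event logging: an entry is written only as a consequence of an event."""

    def __init__(self):
        self.log = []

    def fail(self, reason):
        self.log.append(f"event error: {reason}")


# Hypothetical usage: three polls of a healthy service, and one event.
polled = poll(check=lambda: True, measure=lambda: 12.5, rounds=3)
component = Component()
component.fail("disk not responding")
```

Polling produces entries even when nothing is wrong, which gives a baseline for what is normal. Event logging stays silent until something happens.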

The use of logging
[Diagram. Labels: Visualization, Restoration, Billing, State, Surveillance, Usage data, Statistics, Events, Fault detection, Trends, Automatic repairs, Document SLA targets, Security, Alarms]

Post mortem analysis
"We know that the system crashed at 14:06:35, but what happened in the minutes and seconds before, and what happened elsewhere?"
• Other subsystems?
• Other machines?
• Just before?
• What would be normal?
• The error message?
• When did it happen?
• The order of events?
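
A post mortem usually starts by collecting what every machine logged in the minutes before the crash and putting it in time order. The sketch below assumes the classic syslog line layout shown in the examples slide. The helper names `parse` and `before_crash`, the year 2010, and the crash time in the usage lines are assumptions made for illustration.

```python
from datetime import datetime, timedelta

YEAR = 2010  # assumption: classic syslog timestamps carry no year


def parse(line):
    """Split 'Sep 15 21:17:15 roma syslogd: ...' into (time, host, message)."""
    stamp, rest = line[:15], line[16:]
    host, _, message = rest.partition(" ")
    when = datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, message


def before_crash(lines, crash, minutes=5):
    """Return the entries from all hosts in the window before the crash, in time order."""
    window_start = crash - timedelta(minutes=minutes)
    events = [parse(line) for line in lines]
    return sorted(e for e in events if window_start <= e[0] <= crash)


# Lines taken from the examples slide; the crash time is hypothetical.
lines = [
    "Sep 15 21:05:39 roma su: adf to root on /dev/ttyp1",
    "Sep 15 21:17:15 roma syslogd: exiting on signal 15",
    "Sep 15 21:14:10 charybdis Forwarded from furu: last message repeated 5 times",
]
window = before_crash(lines, crash=datetime(2010, 9, 15, 21, 17, 15), minutes=5)
```

The result is only as trustworthy as the clocks on the machines involved, since the merge relies on their timestamps.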

Automatic detection and repair
• An error is logged.
• A filter continually monitors the logs.
• When a log message about an error is detected, alarms or repairs can be initiated.
• The same mechanism can also be used to predict future problems.
[Diagram. Labels: Component, Prediction, Logging, Repair, Log, Filter]
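
The loop of logging, filtering and acting can be sketched as a list of rules, each pairing a pattern with an action. This is an illustration under stated assumptions, not a description of any particular tool: `watch` is a hypothetical name, the actions only record the line, and the "filesystem full" pattern is invented for the example.

```python
import re


def watch(lines, rules):
    """Pass each log line through the filter; run the action of every matching rule."""
    triggered = 0
    for line in lines:
        for pattern, action in rules:
            if pattern.search(line):
                action(line)
                triggered += 1
    return triggered


alarms, repairs = [], []
rules = [
    # Raise an alarm when syslogd reports that it is shutting down.
    (re.compile(r"syslogd: exiting on signal"), alarms.append),
    # Hypothetical message; a real repair action might clean up old files.
    (re.compile(r"filesystem full"), repairs.append),
]
log = [
    "Sep 15 21:05:39 roma su: adf to root on /dev/ttyp1",
    "Sep 15 21:17:15 roma syslogd: exiting on signal 15",
]
count = watch(log, rules)
```

In a real deployment the filter would read lines as they arrive rather than from a fixed list, and the actions would page someone or start a repair script.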

Analysis
• Traditional: filters are programmed, where the sysadmin traps specific messages, or focuses on the messages that remain after all known messages have been removed.
• Untraditional: computer analysis of the logs, where the interesting stuff is detected through general rules, not rules specific to the actual log or the program that generated it.
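
One possible example of a general rule is "messages whose shape is rare are interesting". The sketch below is only one such heuristic, picked for illustration and not taken from the lecture: it collapses numbers so that messages differing only in values share a template, and then reports messages whose template occurs at most `max_count` times. The names `template` and `rare_messages` are hypothetical.

```python
import re
from collections import Counter


def template(message):
    """Collapse numbers so that messages differing only in values share one template."""
    return re.sub(r"\d+", "<N>", message)


def rare_messages(messages, max_count=1):
    """Return the messages whose template occurs at most max_count times."""
    counts = Counter(template(m) for m in messages)
    return [m for m in messages if counts[template(m)] <= max_count]


messages = [
    "da0 at ahc0 bus 0 target 0 lun 0",
    "da1 at ahc0 bus 0 target 2 lun 0",
    "su: adf to root on /dev/ttyp1",
]
unusual = rare_messages(messages)
```

Nothing in the rule refers to disks or to `su`, which is the point: the same code can be run on logs from programs it has never seen.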

"Information overflow"
Volume and variation in the logs prevent an overview and make it nearly impossible to see patterns in what has been logged.
[Diagram. Labels: Many machines + Many programs, Volume; Variation + Change rate]

Filtering
Filtering a log may:
• Extract certain data
• Summarize data
• Remove irrelevant data
so that the interesting stuff is more visible.
Requires: programming!
[Diagram. Labels: Log, Filter, Hits, Summary, Rest]
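
The three outputs in the diagram (hits, summary, rest) can be sketched as follows. The function name `filter_log` and the chosen patterns are assumptions for illustration: known irrelevant lines are removed and only counted, lines matching an interesting pattern become hits, and everything else is left in the rest for a human to look at.

```python
import re
from collections import Counter


def filter_log(lines, interesting, irrelevant):
    """Split a log into hits and rest, and summarize how many lines went where."""
    hits, rest = [], []
    summary = Counter()
    for line in lines:
        if any(p.search(line) for p in irrelevant):
            summary["removed"] += 1      # known noise: drop it, but count it
        elif any(p.search(line) for p in interesting):
            hits.append(line)
            summary["hits"] += 1
        else:
            rest.append(line)            # unknown messages deserve a look
            summary["rest"] += 1
    return hits, summary, rest


# Lines from the examples slide; the patterns are hypothetical choices.
lines = [
    "Sep 15 15:34:48 roma kernel: SMP: AP CPU #1 Launched!",
    "Sep 15 15:38:03 roma ntpd[429]: kernel time discipline status change 2041",
    "Sep 15 21:05:39 roma su: adf to root on /dev/ttyp1",
    "Sep 15 21:17:15 roma syslogd: exiting on signal 15",
]
hits, summary, rest = filter_log(
    lines,
    interesting=[re.compile(r" su: ")],
    irrelevant=[re.compile(r" kernel: "), re.compile(r" ntpd\[")],
)
```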

Wishlist for filtering
• See subsystems in context
• Ignore follow-on errors
• Detect patterns from historic data
• Predict future development
• Be self-configuring
• Have no need for manual update
• Prioritize messages

Who notifies?
Three methods:
• The faulty component notifies
• An independent system monitors and notifies
• A combination of the two
There should always be more than one alarm on critical systems.

Some points in relation to monitoring
[Diagram. Labels: Circular logs, Privacy, Rotation, Number of copies, reuse, Backup, anonymization, Remote storage, Disk usage, Log condensation, less]
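
Rotation, number of copies and disk usage are tied together: the size limit per file times the number of copies kept bounds the disk space a log can use. A minimal sketch using `RotatingFileHandler` from Python's standard `logging.handlers` module is shown below. The logger name, file name and the small limits are assumptions chosen so that the rotation is easy to observe.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(path, max_bytes=200, copies=3):
    """Create a logger that starts a new file at max_bytes and keeps `copies` old files."""
    logger = logging.getLogger("rotation-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=copies
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


log_dir = tempfile.mkdtemp()
logger = make_rotating_logger(os.path.join(log_dir, "app.log"))
for i in range(100):
    logger.info("message number %d", i)

# Only the current file and the configured number of copies remain on disk;
# older entries have been discarded.
files = sorted(os.listdir(log_dir))
```

The oldest entries are gone for good once the last copy is rotated out, so backup or remote storage has to happen before that point.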

Key points regarding notifications
• Who is notified?
• How many are notified?
• Are notifications sent in parallel or serially?
• Coordination of notifications
• Escalation and repetition of notifications
• Acknowledgement of notifications
• Is the notification understandable?
• A new notification if the error state changes
Example notification text from the slide: BAS 337EL02XX04a PRI 5 (17.12 13:41).
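
Serial notification with repetition, escalation and acknowledgement can be sketched as a loop over an ordered contact list. Everything here is hypothetical: the function name, the `send` and `acknowledged` callbacks, and the contact names. A real system would also wait between attempts, which the sketch leaves out.

```python
def notify_with_escalation(alarm, contacts, send, acknowledged, repeats=2):
    """Notify contacts one at a time; repeat, then escalate, until someone acknowledges."""
    attempts = []
    for contact in contacts:            # escalation order: first contact first
        for _ in range(repeats):        # repetition before escalating further
            send(contact, alarm)
            attempts.append(contact)
            if acknowledged(contact):
                return contact, attempts
    return None, attempts               # nobody acknowledged the alarm


sent = []
who, attempts = notify_with_escalation(
    alarm="roma: syslogd exited",
    contacts=["operator", "sysadmin", "manager"],
    send=lambda contact, alarm: sent.append((contact, alarm)),
    acknowledged=lambda contact: contact == "sysadmin",
)
```

Returning `None` when the list is exhausted makes the unacknowledged case explicit, so the caller can decide what happens next.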

Some Trip Wires
• Synchronization of clocks
• Time zones and summer time
• Don't trust the logs, they can be doctored
• Syslog uses UDP, so it may lose data
• The disks will fill up
• Circular logs will be overwritten
• The write-only (read-never) log
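
The time zone trip wire can be shown with two entries from machines that log in different zones. The sketch assumes that one host logs in local summer time (UTC+2) and the other in UTC. The host names, timestamps and the helper `to_utc` are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: host-a logs in local summer time, two hours ahead of UTC.
SUMMER_TIME = timezone(timedelta(hours=2), "CEST")


def to_utc(stamp, tz):
    """Attach the zone the host logs in, then convert to UTC for comparison."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


raw = [
    ("host-b", "2010-09-15 14:00:00", timezone.utc),
    ("host-a", "2010-09-15 15:34:47", SUMMER_TIME),
]

# Sorting on the wall-clock text puts host-b first ...
by_wall_clock = [host for host, stamp, _ in sorted(raw, key=lambda r: r[1])]
# ... but in UTC the host-a entry (13:34:47) happened before host-b (14:00:00).
by_utc = [host for host, stamp, tz in sorted(raw, key=lambda r: to_utc(r[1], r[2]))]
```

Converting to UTC only helps if the clocks themselves are synchronized, which is why clock synchronization is the first item on the list.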