CIT 470: Advanced Network and System Administration
Debugging
Debugging
• Debugging
  • Learn the customer’s problem.
  • Find the problem’s cause and fix it.
  • Have the right tools.
• Fix things once
  • Fix something once rather than over and over.
  • Avoid the temporary fix trap.
  • Measure twice, cut once, and other advice.
Learn the Customer’s Problem
• Understand at a high level what the customer is attempting to do and which part is failing.
• Customer problem reports vary:
  • “My mail program is broken.”
  • “I can’t reach the mail server.”
  • “My mailbox disappeared!”
• The actual problem might be:
  • A network problem.
  • A power failure.
  • A DNS problem.
Find the Problem and Fix It
• Approach debugging systematically:
  • Form a hypothesis.
  • Test the hypothesis.
  • Record the results.
  • Modify the hypothesis based on the results.
• The problem is often in the last change.
  • The last configuration change, the last new hardware, etc.
• Avoid random changes and workarounds.
  • Rebooting is not a solution: it may clear the symptom without identifying the cause.
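Because the problem is often in the last change, one quick check is to list the files that changed most recently. The Python sketch below (Python is our choice here; the slides show no code) walks a directory tree and returns the newest files by modification time. The function name `recently_changed` is hypothetical, and a modification time only reflects edits that updated the timestamp, so treat the result as a list of suspects, not proof.

```python
import os
from typing import List, Tuple


def recently_changed(root: str, n: int = 10) -> List[Tuple[float, str]]:
    """Return the n most recently modified files under root, newest first.

    Each entry is (modification time in seconds since the epoch, path).
    """
    entries: List[Tuple[float, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                entries.append((os.path.getmtime(path), path))
            except OSError:
                # The file vanished or is unreadable (e.g. a dangling symlink); skip it.
                continue
    entries.sort(reverse=True)  # largest mtime (newest) first
    return entries[:n]
```

For example, `recently_changed("/etc", 10)` lists the ten newest files under /etc, newest first.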
Process of Elimination
• Remove parts of the system one by one until the problem disappears.
  • The problem must have been in the last component removed.
• Examples
  • Remove DIMMs one at a time to identify a bad memory module.
  • Remove drivers or applications one at a time to identify the source of a conflict.
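The process of elimination can be sketched as a loop that removes one component at a time and reruns a test. This is a minimal sketch assuming a single faulty component and a repeatable test; `find_by_elimination` and the `problem_present` callback are hypothetical names, and in practice "running the test" might mean powering down, pulling a DIMM, and rebooting.

```python
from typing import Callable, List, Optional, Sequence


def find_by_elimination(
    components: Sequence[str],
    problem_present: Callable[[List[str]], bool],
) -> Optional[str]:
    """Remove components one at a time until the problem disappears.

    problem_present(active) must run your test with only the `active`
    components installed and return True if the failure still occurs.
    Returns the component whose removal made the problem go away, or None
    if the problem never appears or never disappears.
    """
    active = list(components)
    if not problem_present(active):
        return None  # cannot reproduce the problem; nothing to eliminate
    for component in components:
        active.remove(component)
        if not problem_present(active):
            # The problem vanished right after this removal, so the
            # last component removed is the prime suspect.
            return component
    return None  # problem persists with everything removed: look elsewhere
```

If the failure needs two components together (a conflict), the loop stops at whichever of the pair it removes first, so confirm the suspect by restoring the other components and testing again.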
Successive Refinement
• Add one component at a time, verifying that it works correctly at each step.
  • Examine the output at each step along the way.
• Examples
  • traceroute: tests network connectivity one hop at a time until it encounters a problem or reaches the destination.
  • Pipelines: develop a set of piped commands by adding one command at a time to the pipeline.
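Successive refinement applied to a pipeline such as `grep ERROR app.log | awk '{print $1}' | sort | uniq -c` (app.log is a hypothetical file) means running it after each added stage and checking the output. The hedged Python sketch below mimics that pipeline with list-processing stages and prints the intermediate result after every stage; the stage functions are illustrative stand-ins, not exact reimplementations of the UNIX tools.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[str, Callable[[List[str]], List[str]]]


def grep(pattern: str) -> Callable[[List[str]], List[str]]:
    """Keep lines containing pattern (a plain substring, not a regex)."""
    return lambda lines: [line for line in lines if pattern in line]


def first_field(lines: List[str]) -> List[str]:
    """Roughly awk '{print $1}': keep the first field; blank lines are dropped."""
    return [line.split()[0] for line in lines if line.split()]


def uniq_c(lines: List[str]) -> List[str]:
    """Like uniq -c: count runs of adjacent identical lines."""
    return [f"{len(list(group))} {key}" for key, group in groupby(lines)]


def refine(lines: Sequence[str], stages: Sequence[Stage]) -> List[str]:
    """Apply stages one at a time, showing the output after each stage."""
    current = list(lines)
    for number, (name, stage) in enumerate(stages, start=1):
        current = stage(current)
        print(f"after stage {number} ({name}): {current}")
    return current
```

Running `refine` with only the first stage, then the first two, and so on, mirrors adding one command at a time: a wrong result points directly at the stage just added.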
Have the Right Tools
• Use tools that let you see inside devices and systems:
  • Network: packet sniffer, ping, traceroute, telnet/nc
  • Network services: netstat, rpcinfo
  • Operating system: log files
  • Processes: system call tracer, e.g. strace
  • Performance: top, ps, vmstat, iostat
• Know how your tools draw their conclusions.
  • Tools can make mistakes or mislead you, e.g. ping may report a host as unreachable when a firewall merely drops ICMP echo requests.
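Knowing how a tool reaches its conclusion matters: ping tests ICMP, while the customer's mail client needs a TCP connection to a specific port. The Python sketch below, built only on the standard `socket` module, checks name resolution first and then a TCP connection, roughly what a telnet or nc connection test to the service port tells you, and reports which step failed. The function name `check_service` and its result strings are our own, hypothetical choices.

```python
import socket


def check_service(host: str, port: int, timeout: float = 3.0) -> str:
    """Test name resolution, then a TCP connection, and report which step failed."""
    # Step 1: can the name be resolved at all? (A DNS or /etc/hosts problem fails here.)
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns: cannot resolve {host}: {exc}"
    # Step 2: does anything accept a TCP connection on that port?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:  # refused, unreachable, or timed out
        return f"tcp: cannot connect to {host}:{port}: {exc}"
```

Separating the two steps distinguishes a DNS problem from a network or service problem, which a single "can't reach the mail server" report does not.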
Fix Things Once
• Fixing something once is faster than fixing it over and over again.
• Corollaries:
  • Fix the problem permanently.
  • Don’t reinvent the wheel.
  • Fix the problem on all hosts at the same time.
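When one fix is rolled out to every host, it helps to confirm that every host really has the same fixed file. This sketch compares SHA-256 digests; it assumes you have already collected each host's copy of the file (how you fetch it is site-specific and not shown), and the host names and function names are hypothetical. Comparing digests rather than whole files means each host could instead report just its `sha256sum` output.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def hosts_out_of_sync(reference: bytes, copies: Dict[str, bytes]) -> List[str]:
    """Return the hosts whose copy differs from the reference, sorted by name."""
    want = digest(reference)
    return sorted(host for host, data in copies.items() if digest(data) != want)
```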
Avoid the Temporary Fix Trap
• Quick fixes aren’t quick.
  • A few minutes of your time every day adds up over a month or a year.
  • Temporary fixes accumulate until you spend your entire day doing one quick fix after another.
• Temporary fixes may be required:
  • When you lack resources (hardware/software) or time.
  • They must always be followed by permanent fixes.
  • Add the permanent fix to your calendar or request-tracking system to ensure that it happens.
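To put numbers on how a few minutes a day adds up, the small calculation below assumes 250 working days a year, a 5-minute daily quick fix, and a 2-hour permanent fix; all three figures are illustrative assumptions, not measurements.

```python
def hours_per_year(minutes_per_day: float, working_days: int = 250) -> float:
    """Time a recurring quick fix costs per year, in hours (250 working days assumed)."""
    return minutes_per_day * working_days / 60


def break_even_days(permanent_fix_minutes: float, minutes_per_day: float) -> float:
    """Working days after which a one-time permanent fix has paid for itself."""
    return permanent_fix_minutes / minutes_per_day
```

With these assumptions, a 5-minute daily fix costs 1250 minutes, about 20.8 hours a year, and a permanent fix that takes 120 minutes pays for itself after 120 / 5 = 24 working days.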
Learning from Carpenters
• Measure twice, cut once.
  • Double-check your work before making changes.
  • e.g. replace a destructive UNIX command such as rm with echo to see exactly which files it would affect.
  • e.g. reread the configuration file before restarting the server.
• Copy exactly.
  • Develop a correct solution and test it.
  • Copy the solution exactly to other hosts or sites.
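The echo trick generalizes to any script that changes files: compute the list of operations first, show it, and perform it only when asked. This Python sketch renames files by suffix and defaults to a dry run; the function names and the rename task itself are hypothetical examples.

```python
import os
from typing import List, Tuple


def plan_renames(directory: str, old_suffix: str, new_suffix: str) -> List[Tuple[str, str]]:
    """List (source, destination) pairs for regular files ending in old_suffix."""
    if not old_suffix:
        raise ValueError("old_suffix must be non-empty")
    plan = []
    for name in sorted(os.listdir(directory)):
        src = os.path.join(directory, name)
        if name.endswith(old_suffix) and os.path.isfile(src):
            dst = os.path.join(directory, name[: -len(old_suffix)] + new_suffix)
            plan.append((src, dst))
    return plan


def rename_files(
    directory: str, old_suffix: str, new_suffix: str, dry_run: bool = True
) -> List[Tuple[str, str]]:
    """Rename matching files; with dry_run=True only print what would happen."""
    plan = plan_renames(directory, old_suffix, new_suffix)
    for src, dst in plan:
        if dry_run:
            # The "echo" step: show the exact operation instead of performing it.
            print(f"would rename {src} -> {dst}")
        else:
            os.rename(src, dst)
    return plan
```

Making the dry run the default means a careless invocation only prints; the real change requires passing `dry_run=False` explicitly.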
Automate
• Automation can fix problems permanently.
  • A log rotation script ensures you don’t have to delete logs manually to keep disks from filling up.
  • A tape jukebox ensures you don’t forget to swap backup tapes manually.
• Avoid using automation for quick fixes.
  • Automation can perform a temporary fix without human intervention, e.g. killing runaway processes.
  • The underlying problem may then grow over time without your awareness, and automation can’t fix buggy software.
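Many systems already ship a log rotation tool (logrotate on many Linux distributions, for example), so "don't reinvent the wheel" applies; the sketch below only illustrates the mechanism. It assumes numbered copies named path.1, path.2, and so on, and it does not tell the program writing the log to reopen its file, which a real rotation setup must also handle. The function name `rotate` is hypothetical.

```python
import os


def rotate(path: str, keep: int = 3) -> None:
    """Rotate path -> path.1 -> path.2 ..., keeping at most `keep` old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)  # the oldest copy falls off the end
    # Shift the remaining copies up by one, highest number first, so nothing is overwritten.
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")
    # Start a fresh, empty log.
    open(path, "w").close()
```

Run from a scheduler such as cron, a script like this bounds disk usage at roughly `keep + 1` log files, which is the permanent fix the slide describes.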
Key Points
• Learn the customer’s problem.
• Systematically identify the cause and fix it.
  • Process of elimination.
  • Successive refinement.
• Fix the problem permanently.
• Don’t reinvent the wheel.
• Test your solution.
• Use the fix on all of your hosts.
• Use automation wisely.